Data scientists are the new rock stars of Silicon Valley. At a time when the words “big data” provide the fastest path to a VC’s wallet, guys like Cameron Marlow and Jeff Hammerbacher are finding themselves the subjects of laudatory articles that are turning them into pin-ups for the petabyte set. (Let’s not call them “petaphiles.”) The recent favorable press has also made data analysis framework Hadoop trendy, if not downright sexy.
But one up-and-coming founder warns that we shouldn’t be seduced by Hadoop – a platform that makes it possible to analyze huge sets of data on a broad level – and that the future of big data lies in a more narrowly defined approach.
At this point, it’s important to note that this man is Suhail Doshi, founder of mobile analytics company Mixpanel, which in May received $10 million from Andreessen Horowitz and Max Levchin in Series A funding. It’s also important to point out that Mixpanel is today launching a new feature called “People,” which takes data analytics down to its most micro level: The individual human. So there’s more than a little self-interest at play here. But, with those caveats, let’s carry on.
Mixpanel’s “People” feature lets site or app owners track users who have similar profiles and allows them to interact with users on an individual basis by email (with appropriate consent). That takes customer relationship management to its extreme. At its most basic level, though, the service allows site or app owners to track individual user behavior over time.
This granular approach within the vertical of mobile analytics, reckons Doshi somewhat unsurprisingly, is the best way forward for big data in most cases. That also applies to companies such as ChartBeat, Marketo, Eloqua, and RJMetrics, which have all picked a specific sector of the market on which to focus.
The Hadoop approach, on the other hand – favored by companies such as Hammerbacher’s Cloudera and by Platfora – is too general, Doshi contends. “A lot of people in big data are very hellbent on the idea that we just need tools to find insight, and once we find that insight we’ll make billions of dollars,” he says.
That line of thinking has led to an industry of Hadoop consultants and data scientists, when for most companies simpler solutions are at hand. “It’s all kind of wrong,” says Doshi. “It’s all being productized in a completely wrong way.”
With Hadoop, a lot of work is required to even get started. “Hadoop is a strange tool. You have to write a program, and then you have to write a program on that data, and then you’ll get some numbers, and you have to interpret that data yourself.”
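To make that description concrete, here is a minimal sketch of the map-and-reduce style of program that Hadoop jobs are typically built around. It is plain Python and does not use Hadoop's actual API; the click log, the function names, and the choice of counting events per user are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical click log: (user_id, page) pairs invented for this example.
EVENTS = [
    ("alice", "/home"),
    ("bob", "/pricing"),
    ("alice", "/pricing"),
    ("alice", "/signup"),
]


def map_phase(events):
    """Emit a (user_id, 1) pair for every event in the log."""
    for user_id, _page in events:
        yield user_id, 1


def reduce_phase(pairs):
    """Sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_events_per_user(events):
    """Run the map step, then the reduce step, over the whole log."""
    return reduce_phase(map_phase(events))


if __name__ == "__main__":
    # The program only produces raw counts per user.
    print(count_events_per_user(EVENTS))
```

Even in this toy version, the output is just a set of numbers per user. Deciding what those numbers mean is left to whoever wrote the job, which is the interpretive burden Doshi is describing.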
For organizations that just need to get through a mountain of data each day – the likes of Google, Netflix, Amazon, and Facebook – Hadoop is perfect, says Doshi. But for finding insight, it’s not so great. He brings up Google’s PageRank for search as an example. Lots of people think of it as a general algorithm, he says, but in reality PageRank is probably hundreds of algorithms, each customized according to what type of search you’re doing – weather, vacation research, country data, and so on. When none of the custom solutions work, says Doshi, PageRank serves as a good fallback. “Big data should be solved in the same way.”
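The structure Doshi describes, specialized solutions tried first with a general one as a fallback, can be sketched roughly as follows. This is not how Google's search actually works; the handlers, the keyword matching, and every name below are hypothetical and exist only to illustrate the pattern.

```python
def weather_handler(query):
    """Hypothetical specialist for weather questions."""
    return f"weather report for: {query}"


def vacation_handler(query):
    """Hypothetical specialist for vacation research."""
    return f"travel guide for: {query}"


def general_ranker(query):
    """Hypothetical general-purpose fallback."""
    return f"general results for: {query}"


# Each entry pairs a test for "does this specialist apply?" with its handler.
# Keyword matching is a stand-in for whatever real classification would be used.
SPECIALISTS = [
    (lambda q: "weather" in q.lower(), weather_handler),
    (lambda q: "vacation" in q.lower(), vacation_handler),
]


def answer(query):
    """Use the first specialist that applies; otherwise fall back."""
    for applies, handler in SPECIALISTS:
        if applies(query):
            return handler(query)
    return general_ranker(query)


if __name__ == "__main__":
    print(answer("Weather in Paris"))
    print(answer("best laptops"))
```

In this sketch, adding a new vertical means adding one more entry to the list, while anything unrecognized still gets an answer from the general path.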
Despite his criticisms of Hadoop, however, Doshi might find that the likes of Cloudera and Platfora don’t strongly disagree. “It’s perfectly valid,” says Charles Zedlewski, Cloudera’s vice president for product, of Doshi’s view. But he also says the problem isn’t new. “If you think about before Hadoop, this is a very old contrast,” says Zedlewski. People have long faced a choice between buying and building infrastructure as a way of solving different problems. “Both of those are very large and vibrant markets to this day.”
Zedlewski uses Omniture as an example. Like Cloudera, it processes data from clicks. But it offers a pre-packaged set of reports, whereas a Hadoop system allows more flexibility down the line. So if a customer wants to ask lots of different questions or have more choices in the long term, it might be best served by working with Hadoop. In that way, the offerings serve different needs. “I think both models are going to thrive for a long time to come,” says Zedlewski, who has worked in the past at both SAP and BEA, which have comparably diverging worldviews.
In fact, Zedlewski concludes, we’re going to see more systems ultimately relying on Hadoop anyway. He cites Qualcomm, which built messaging applications for telcos on Hadoop, as an example of what the future will look like. “You’re really going to be consuming Hadoop either way. The question’s only whether you want to customize it, or whether you want to consume a packaged application.”
[Image supplied by Mixpanel]