Boiling Down the Open Data Platform Debate

February 25, 2015 | By Jeff Kelly

Analysis, Big Data

The creation of the Open Data Platform has resulted in a pretty vigorous debate among industry players about the need (or not) for such an industry consortium to help speed Hadoop adoption in the enterprise.

Rather than take up a lot of space detailing the pros and cons of the Open Data Platform (you can watch John Furrier and me discuss it here, watch my presentation at #BigDataSV covering the topic here, or see the embedded video at the end of this post), let me just boil the debate that’s been playing out very publicly down to the core issue: money.

The overall Big Data market will top $50b in 2017, but the slice of the market for Hadoop is just 3% of that larger figure, and that includes Hadoop professional services. If you look at just Hadoop software and support subscription revenue, that slice of the market in 2017 will be somewhere around $677m. Cloudera and Hortonworks, who collectively have raised over $1.5b and are therefore under enormous pressure to deliver significant returns to their investors and shareholders, are understandably fighting tooth and nail for every penny. The two companies are taking very different approaches in this battle, however, which is why there was such an outcry over the Open Data Platform.

In our view, Hortonworks, from day one, has focused on developing a free enterprise-grade Apache Hadoop distribution and monetizing it via technical support subscriptions. While Hadoop is a revolutionary approach to data processing, by itself it isn’t of much use to mainstream enterprises. They require analytics and applications that work in conjunction with Hadoop to derive insights from all that Big Data. Hortonworks’ goal is to become the de facto standard Hadoop distribution for the enterprise and enable partners to innovate around the core. The Open Data Platform is a vehicle to further execute this strategy.

We believe Cloudera’s strategy is to try to own most, if not all, of the Big Data stack. By Big Data stack, we mean HDFS, MapReduce, Hadoop management and governance software, SQL-on-Hadoop database, Hadoop search, Hadoop-based analytics tools and data visualization software. The company, in our view, is competing with Hortonworks for Hadoop distribution-related revenue, but it has also developed its own proprietary management software, developed and open sourced its own SQL-on-Hadoop engine, and is acquiring machine learning, analytics and data visualization companies to round out its stack offerings. The Open Data Platform is naturally anathema to Cloudera because the ODP’s goal is to enable Cloudera’s self-described competitors, Pivotal and IBM, to better compete with Cloudera!

The non-Hortonworks members of the ODP rightly recognize that the way they will make money in Big Data is not by competing for their slice of the relatively small Hadoop distribution pie but by developing and selling analytics and applications software, as well as services, to enable enterprises to unlock all the value in Big Data. By comparison, while the Hadoop slice of the Big Data pie will top out at about $1.7b in 2017, the combined market for Big Data analytics software, applications and professional services will reach $25b. That’s why Pivotal, for example, laid off its Hadoop distribution team last fall – the real money is in analytics, apps and services. Not to say I told you so, but as far back as last July I suggested Pivotal forget trying to monetize its Hadoop distribution, throw its support behind Hortonworks and focus on building a business up the stack around Big Data analytics and applications.

why doesn’t @pivotal just go all in with HDP for #Hadoop layer and focus on driving revenue up the stack with HAWQ, GemFire XD etc.?

— Jeff Kelly (@jeffreyfkelly) July 28, 2014

Because of its chosen business model, Cloudera is now fighting what is effectively a two-front war. It’s battling Hortonworks for Hadoop distribution revenue on one front, and it’s fighting IBM, Pivotal, Teradata and others for analytics and applications revenue up the stack. Establishing itself as the leader in enterprise Hadoop (the former) is required to open up much more lucrative revenue opportunities up the stack (the latter). And for the non-Hortonworks members of the Open Data Platform, the new consortium is a vehicle for executing their Big Data monetization strategies and an effort to isolate Cloudera (and its largest investor, Intel).

Which approach – the Hortonworks-Open Data Platform approach or Cloudera’s “Own the Stack” approach – is better for the development of the Big Data market and for enterprise practitioners is open for debate (a debate that should and, I have no doubt, will continue). But from a business perspective, all the blustering around the establishment of the Open Data Platform is really about money.