Hadoop Pure-Play Business Models Explained

December 12, 2013 | By Jeff Kelly

Analysis, Big Data

The competition between Hadoop pure-play vendors continued unabated in 2013, with each vying for the top spot in a market expected to top $1.6 billion by 2017. The three competitors – Hortonworks, Cloudera and MapR – have unique approaches to commercializing the open source Big Data framework. Significant confusion remains in the market about just what each approach entails, particularly around Hortonworks’ business model.

With that, let’s look at each of the three competitors and their respective business models as 2013 comes to a close.

Hortonworks: Not All “Services” Are Created Equal
As mentioned, Hortonworks’ business model has caused the most confusion among market watchers. The company released its flagship Hadoop distribution, called “Hortonworks Data Platform”, in the summer of 2012. Hortonworks makes just one version of its platform, which is 100% open source Apache-based, and makes it available to all for free download. The company makes its money through annual subscription services contracts and education/professional services, and here is where the confusion starts.

Currently, annual subscription services revenue makes up about 70% of Hortonworks’ overall revenue, with education/professional services revenue making up the remaining 30%. The company’s eventual goal is an 80/20 split. The confusion stems from a misunderstanding of the term “services.”

Many believed Hortonworks, without any software license revenue, set out to build a professional services shop, with an army of consultants marching out to customer sites to train Hadoop practitioners and perform other hands-on, manpower-intensive services. This indeed would be a very difficult business to scale for a variety of reasons, including that specialized Hadoop consultants are both rare and expensive. Professional services can be a lucrative business, but easily scalable it is not.

That is not the business Hortonworks is building, however. Rather, Hortonworks is a software company, albeit one with an open source twist. The vast majority of its revenue comes via annual subscription services contracts, which are the equivalent of the maintenance and support contracts standard with virtually all enterprise software. Hortonworks offers three levels of support subscriptions – standard, enterprise and developer – which offer varying levels of phone and Web support as well as access to software patches, updates and upgrades.

This business is undeniably scalable, though with relatively low margins since it lacks the software license revenue to complement the maintenance & support revenue. For Hortonworks to win the Hadoop market, it must therefore extend the reach of HDP to as many enterprises as possible, putting it in position to win support contracts en masse when Hadoop proofs-of-concept move into mission-critical production. To help do this, Hortonworks has aggressively developed its channel, striking reseller agreements with major enterprise software vendors such as Teradata, Microsoft and SAP.

Here too lies a challenge, however. To make the most of the channel, Hortonworks’ resellers must incentivize their sales forces to sell the low-margin HDP support contracts over lucrative and in some cases competing software licenses. This is particularly true in the case of Teradata, whose data warehouse business is under significant threat from Hadoop. That said, there is no reason Hortonworks can’t build a successful business around annual subscription support revenue. It’s all about execution and patience as its low-margin support business puts price pressure on competitors.

Cloudera: Proprietary and Proud of It
Cloudera’s business model more closely follows the conventional open core enterprise software model practiced by vendors in other software markets. That is, Cloudera’s Hadoop distribution, called “Cloudera’s Distribution Including Apache Hadoop (CDH)”, is 100% Apache Hadoop*. Cloudera packages CDH with varying levels of its own proprietary Hadoop management software into two forms of its platform. Cloudera Standard is made up of CDH plus a limited version of Cloudera’s proprietary management software and is available for free download. Cloudera Enterprise is made up of CDH plus the full version of Cloudera’s proprietary management software and requires an annual for-pay subscription.

But that is only half of the story. Cloudera has been steadily developing add-on modules to its Hadoop distribution intended to extend the functionality of its platform, which it is marketing as an Enterprise Data Hub. These add-on modules include Impala for SQL-like interactive queries, Search for Google-like search functionality, and Navigator for data lineage, access management and auditing capabilities. Some of these add-on modules are open source and free to use (Impala and Search), but support from Cloudera requires the purchase of additional subscriptions.

Unlike Hortonworks, which is betting on the open source community (of which both Hortonworks and Cloudera engineers are an active part) and partners to extend the functionality of Hadoop, Cloudera has set about on its own to make its Hadoop-based platform a comprehensive data management and analytics platform. This is both daring and risky, with a high risk/reward quotient.

By taking this approach, Cloudera is taking on much larger, more mature and much more powerful foes in the database and analytics space such as IBM and Pivotal. It is also opening itself up to questions from customers about the potential for vendor lock-in – a particularly sore subject for many enterprises that have been paying through the nose to database vendors like Oracle over the years.

Further, it is logical to assume that Cloudera will spend much of its R&D budget on its internally developed components like Impala and Search rather than on the core open source components of Apache Hadoop. Interestingly, Cloudera has always been extremely vocal about the open source Apache nature of CDH, quick to defend its platform against critics who used the dreaded ‘P’ word to describe it. That all changed in October, when CSO Mike Olson penned an article on LinkedIn detailing the central role Cloudera’s proprietary components now play in the company’s strategy. Whether this will impact morale among the company’s developers and engineers, many of whom live and breathe open source, is an open question.

Having said all that, Cloudera has a number of factors in its favor. The company was the first commercial Hadoop vendor to hit the market, giving it a three-year head start over Hortonworks. During that time it has built a strong stable of enterprise customers and an undeniably robust platform. Its model also requires customers to pay for its fully functional platform and related services, as well as for-pay support subscriptions for add-ons such as Impala, giving the company more diverse and higher-margin revenue streams. Finally, its vision of a comprehensive data management platform, while daunting to achieve, is definitely compelling and likely appealing to enterprise CIOs tired of managing cluttered, complex and confusing data management infrastructures.

If it can indeed sell customers on its Enterprise Data Hub vision – and execute – Cloudera could potentially pull away from its pure-play Hadoop competitors … where IBM and Pivotal will be waiting.

MapR: Leave Your Ideology at the Door
While getting significantly less press attention than its two larger competitors, MapR has operated under the radar, building its business by focusing on enterprise customers ready to move to production Hadoop deployments. The company doesn’t waste much time with those just kicking Hadoop’s tires, nor does it subscribe to any type of open source ideology.

From a technology standpoint, MapR has always stated its goal is to provide enterprise-ready, production-grade Hadoop for analytics and transactional workloads. To do so, and to the horror of open source purists everywhere, the company developed its own Hadoop distribution that replaced some open source components with MapR’s proprietary bits. The most notable of these proprietary components is Direct Access NFS at the storage layer to provide real-time data access. It’s important to point out that MapR has also embraced important open source Apache projects, such as Apache Drill, and ensures that its Hadoop distributions are 100% API-compatible with Apache Hadoop.

MapR’s approach could be classified as a practical one, utilizing whichever software/components – open source or proprietary – the company’s engineers think will most benefit its platform and its customers in the here and now. There’s no room for ideology at MapR, one way or the other.

From a business model standpoint, MapR makes its basic Hadoop distribution, called “M3”, available for free download. It then sells two enterprise editions – “M5” and “M7” – via subscriptions. Both M5 and M7 include advanced features not available in the free M3 edition such as mirroring, snapshots and NFS high availability, as well as support & maintenance services. M7 also includes support for Apache HBase and related proprietary components to improve performance and reliability.

By all accounts MapR’s Hadoop distributions provide the high performance and reliability the company claims, and its customer count recently crossed the 500 mark. The challenge for MapR continues to be its ability to differentiate from its larger rivals and stand out in a very noisy market. Another risk is that as open source Hadoop gains parity with MapR’s proprietary components, specifically around high availability and performance, MapR’s value proposition will diminish.

Action Item: The pure-play Hadoop vendors will continue to battle it out in 2014, all three with an eye towards either an initial public offering or possible acquisition in the next 12 to 18 months. Beyond the particular business models of each pure-play vendor, two other factors will play important roles in the Hadoop market in 2014. The first is the impact of the cloud generally, and AWS specifically, on new Hadoop deployments. The second is how aggressively larger vendors such as IBM and Pivotal mark down their own Hadoop wares to both undercut the pure-play vendors and serve as gateways to their higher-margin database products.