Premise: The software, hardware, and professional services markets for big data are still in formative stages, with packaged applications emerging only slowly. But the demand for solutions is sufficiently large and growing fast enough to support a number of large vendors with a variety of packaging and go-to-market approaches.
The big data market exploded into the analytics universe about ten years ago and has been expanding at an impressive rate since, filling up any space not occupied in the analytics market, pushing established analytics categories aside, and weaving its way into a number of critical transaction application domains. Generally, the market looked pretty much the same in all directions: open source Hadoop and a mishmash of analytic pipeline tools. Today, however, the big data community is gaining significant experience regarding what does and doesn’t work, and proto-segments are starting to take shape:
- IBM, Palantir, Accenture, and Teradata offer consulting-led template solutions. These vendors are focused on creating semi-custom templates with varying degrees of repeatability. Each engagement typically contributes additional intellectual property that makes future engagements more repeatable and less custom. Some of that intellectual property is contributed back to the big data community as open source artifacts.
- Microsoft, Oracle, SAP, and Splunk offer big data tools and application-led platforms. The tools and platforms that support big data applications are undergoing the most rapid innovation the data management sector has seen in decades. Multi-product platforms that can handle different workloads and data types are growing at the expense of individual products. In addition, pricing models are changing and price levels are dropping dramatically under pressure from cloud metering and open source. Notably absent from this list? Cloudera, Hortonworks, and MapR, each of which continues to search for a business model that turns users into profitable revenue.
- Dell and HPE will offer converged infrastructure. Private cloud is going to be with us for many years, even as it starts to adopt some of the architectural elements of hyper-converged infrastructure rapidly emerging in the public cloud. However, private cloud won’t remain “vanilla.” HPE and Dell Tech, as leaders in this sector, will package their offerings to appeal to specific categories of buyers, including big data and IoT.
Table 1 shows that the overall big data market grew 22% from $18.3 billion in 2014 to $22.6 billion in 2015. The overall growth rate of the top 10 vendors was 24%, and the remaining vendors grew slightly slower at 21%.
Vendor | 2014 Worldwide Total Big Data Revenue | 2015 Worldwide Total Big Data Revenue | 2015/2014 Growth Rate | 2015 Market Share |
IBM | $1,771 | $2,104 | 19% | 9% |
SAP | $720 | $890 | 24% | 4% |
Oracle | $628 | $745 | 19% | 3% |
HPE | $578 | $680 | 18% | 3% |
Palantir | $544 | $672 | 24% | 3% |
Splunk | $451 | $644 | 43% | 3% |
Accenture | $390 | $507 | 30% | 2% |
Dell | $418 | $489 | 17% | 2% |
Teradata | $288 | $432 | 50% | 2% |
Microsoft | $289 | $396 | 37% | 2% |
All Other | $12,241 | $15,046 | 21% | 67% |
Total | $18,319 | $22,605 | 22% | 100% |
Table 1: Top Ten Vendors by 2015 Revenue ($ million). Source: © Wikibon 2016, Big Data Market Shares 2015, derived from Table 2 in Appendix.
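The growth-rate and market-share columns in Table 1 follow from simple arithmetic on the revenue columns. The sketch below recomputes them for three of the vendor rows. It is an illustration only: the function and variable names are made up for this example, and values recomputed from the rounded revenues in the table may not match every published percentage exactly, since Wikibon derives its figures from the more detailed Table 2 in the appendix.

```python
# Illustrative sketch: recompute growth rate and market share from Table 1.
# Revenue figures ($ million) are copied from the table; all names here are
# hypothetical and not part of Wikibon's methodology.

def growth_rate(prev: float, curr: float) -> float:
    """Year-over-year growth as a fraction, e.g. 0.19 for 19%."""
    return (curr - prev) / prev

def market_share(revenue: float, total: float) -> float:
    """Vendor revenue as a fraction of total market revenue."""
    return revenue / total

# (2014 revenue, 2015 revenue) for a few rows of Table 1
revenues = {
    "IBM": (1771, 2104),
    "Splunk": (451, 644),
    "Teradata": (288, 432),
}
TOTAL_2015 = 22605  # "Total" row, 2015 column

summary = {}
for vendor, (rev_2014, rev_2015) in revenues.items():
    growth_pct = round(100 * growth_rate(rev_2014, rev_2015))
    share_pct = round(100 * market_share(rev_2015, TOTAL_2015))
    summary[vendor] = (growth_pct, share_pct)
    print(f"{vendor}: growth {growth_pct}%, 2015 share {share_pct}%")
```

The same two functions can be applied to any other row of the table, or to the totals, to cross-check the percentages.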
The document linked here describes Wikibon’s definition of big data and big data categories. It also serves as the methodological underpinning for our big data market forecasts and other related forecasts, as well as for vendor big data market shares.
Service-led Plays
IBM is focusing most of its growth expectations on its analytics and machine learning businesses. Just like its pivot 16 years ago to embrace open source, when it migrated its middleware platforms and tools to Linux, today it is migrating most of its analytic tools and platforms to Spark. IBM uses its tools and platforms to build customizable industry-specific templates based on deep industry expertise. The expertise comes from years of serving large enterprises across many industries with its consulting organization. Each engagement deepens IBM’s domain expertise and makes the templates richer. While each new customer gets the benefit of the incremental functionality, IBM gets to carry the cumulative expertise forward. Expect to see IBM fill out its template solutions “library” with a portfolio of vertical and horizontal solutions with premium pricing for Big Blue accounts. One potential weakness is that its cloud strategy is built from top-to-bottom on IBM technology: from infrastructure through the application platform to the data management layer to analytics. IBM will need to keep the entire stack competitive with AWS, Azure, and Google or potentially put its differentiation in analytics at risk.
Palantir launched itself as the pioneer in solving extreme challenges in machine learning for customers in defense and intelligence, among others. Based on estimates that its revenues exceed $1bn, primarily on solutions built on custom programming, its value proposition clearly has some merit. But its growth period preceded the explosion of independent machine learning tools. Putting these tools in the hands of vendors like IBM may make a services-only approach less competitive. The key question will be to what extent Palantir can build its solutions on the rising tide of 3rd-party technology.
Accenture is like IBM, but unconstrained by the need to use in-house technologies. It can serve horizontal functions and vertical industries with solutions it assembles from a virtual bazaar of 3rd-party technology. The core of its custom templates is an analytic record, which defines the data required to support the analytics in its solutions. Its biggest challenge will be to keep up-leveling its solutions as 3rd-party technologies become more repeatable and require less customization, integration, and change management. At some point Accenture may return just to the business of implementation and change management consulting, as it did for client-server enterprise applications. But that day is likely far off since big data packaged applications are emerging very slowly.
Teradata gave birth to business intelligence and big data. Its business model of selling appliances delivered both maximum performance through tight integration with its database and high absolute gross margin dollars. Both powered its rise to prominence. Now, Hadoop and open source pricing are turning that business model inside out. Teradata offers a version of the database that can run on non-Teradata hardware, which means it can no longer expect to capture higher margins on its hardware. An explosion of open source Hadoop-based MPP SQL engines for business intelligence has brought pricing way down and started siphoning off workloads from the more carefully curated data warehouses. Teradata is answering the challenge by targeting three audiences, changing its delivery and pricing model, and transitioning from platforms to solutions. The three audiences are traditional business analysts focused on business intelligence, data scientists originally targeted by the rich functionality in its Aster Data acquisition, and data scientists who want the industry-standard functionality of Hadoop and can manage the complexity that comes with it. The delivery and pricing model is also going through a major transformation. Teradata will offer versions of its flagship database as well as Aster on AWS and Azure over the next several quarters with fully elastic scalability and metered pricing. And, finally, it is focused on going to market like IBM with consulting-led solutions, not platforms. This transition will be the hardest because changing the skills and relationships for any infrastructure sales force is extremely difficult. But in Teradata’s favor, it has thousands of consultants who can lead the transition.
Tools and Application-based Platform-led Plays
SAP shares a unique position with Oracle in having both a mission-critical database capable of big data analytics (the HANA in-memory database) and the industry’s broadest suite of packaged enterprise applications. The combination of applications and a database should drive adoption synergistically. Two issues have slowed HANA’s adoption. First, a database can take many years to mature for mission-critical uses and HANA is no exception. Second, and more unusual for an application vendor, SAP’s own core application suite, which hasn’t leveraged big data in the past, is slowly being rewritten over a period of years to take advantage of HANA. SAP has also been slow to build new applications that leverage the in-memory speed and big data analytics capabilities in HANA. It has been evangelizing HANA to 3rd parties as well, but adoption has been slow. HANA’s pricing puts it out of the reach of buyers accustomed to the open source ecosystem, though it’s comparable to Oracle’s flagship database. Expect to see SAP emphasize its complementary HANA Vora database, which works with data stored both in HANA and Hadoop, in order to open more opportunities.
Oracle’s core 12c DBMS remains the gold standard for mission-critical systems of record. However, it faces two challenges: (1) scalability on a shared-nothing cluster and (2) pricing. With respect to shared-nothing scalability for transaction processing, 12c has nothing to offer (nor do any of the OLTP database incumbents), and it’s not clear when it will. Oracle’s appliances stretch as much absolute performance from a shared-storage database as is possible. While the appliances deliver absolute scale, they also raise the second challenge, pricing. For those who want a price level closer to the open source ecosystem, Oracle offers the very low-end MySQL database. However, Oracle recently added a federated query capability to 12c that lets developers seamlessly access data in Hadoop, including its version of the Cloudera distribution, and NoSQL databases such as Cassandra. This query engine should allow Oracle to be competitive when its customers want to extend their solutions with open source databases. As for Oracle’s enterprise applications, they seem to be adding big data functionality opportunistically, just like SAP’s.
Splunk is the anti-Hadoop. It is an application framework for analyzing large volumes of machine data that works out of the box. It is easy to collect, analyze, and present data without the complex setup, administration, and development associated with Hadoop. Splunk also sells several packaged solutions including security and operations management. While Splunk’s greatest strength is its “no muss, no fuss” simplicity, that same strength by necessity makes it harder to stretch to unanticipated use cases. Splunk also has a relatively high price point, which might become problematic as data volumes continue to grow exponentially. Splunk is adding more general purpose capabilities, starting with the MongoDB database, to support a wider variety of applications. As long as it can continue to maintain its out-of-the-box simplicity, expect to see it as the most direct competitor to Hadoop on-premises.
Microsoft has come from behind in big data and the cloud by leveraging its on-premises enterprise products. Unlike Amazon and Google, Microsoft can now build enterprise products on Azure first and then periodically “snapshot” them into packaged products for on-premises deployment. That is only the first step in its strategy of offering hybrid solutions. It is also building a PaaS layer in which its many data management and analysis products form a data management “fabric” where each component reinforces the rest. The hybrid-cloud equivalent of Visual Studio and System Center will greatly simplify developing for and managing this platform. Microsoft will be able to make the boundary between private and public cloud very hard to distinguish. In addition, more than any other vendor, Microsoft will have great flexibility in pricing when bidding against either on-premises or public cloud competitors.
Converged Infrastructure Plays
HPE’s big data software offerings to date have largely centered on the highly regarded Vertica MPP SQL database it acquired. How the Autonomy technology that handles semi-structured data fits in has been less clear. Recently, HPE dropped its efforts to build an IaaS cloud and ported its HAVEn machine learning APIs to Microsoft’s Azure cloud. The APIs’ prospects aren’t clear since standalone machine learning tools and frameworks are widely available from dozens of vendors. Integrating the APIs with data sets that turn them into horizontal or vertical solutions would be a promising next step. Since dropping its plans to compete with AWS, HPE has continued to sell infrastructure for private cloud deployments.
Dell: Many people assume that x86 servers are low-margin commodity items. The reality is that Dell’s server business can take advantage of EMC’s higher-touch channel to sell sophisticated server networks to sophisticated customers. And as long as EMC’s products continue to be sophisticated, EMC will still need its high-touch channel. More intriguing, as systems continue to evolve from server-centric to storage-centric, the combined company will be able to design hyper-converged solutions better than either company could on its own for the large portion of the market that still wants and needs to manage private clouds. EMC also has a targeted but vibrant presence in analytics that Dell could turn into something higher volume.
Action Items: The big data universe has been undergoing a period of “adaptive stretch” since exploding into the analytics space. Users should identify their big data starting point — services, application platforms, or converged infrastructure — and establish the strategic relationships they need to progress with big data investments.