In a digital business, moving data is a strategic competency. However, the tests of competency aren’t bulk, batch, and basic, but compact, consistent, and cost-effective.
Digital businesses run on distributed data. If the right data isn’t where it needs to be at the right time, a digital business will do the wrong thing, every time. Traditional data movement technologies aren’t well suited to many digital business needs. They tend to be tool specific, which perpetuates data silos; bulk oriented, which drives up data communications costs; and batch mode, which can generate onerous business latencies. A new breed of data movement tools is emerging to address these issues. The new tools support “dynamic” data movement: continuous, high-volume movement of data among complex, high-value distributed systems. At the high end, serving the most challenging types of workloads, the market for this new breed of tools is small but poised to grow at 39% through 2021, to US$339 million (see Table 1). Growth and success in the continuous data movement market will be fueled by:
- Distribution of workloads. The cloud gold rush continues, as businesses consider moving additional workloads to the cloud. Big data analytics and even legacy applications are in play. Over the next few years, the creative capacity of business people and developers will catalyze an explosion of use cases, many of which will overwhelm today’s most popular data movement technologies.
- Architectural conventions. The best data movement is no data movement. However, data will become more distributed as businesses use digital technology to evolve not only technology infrastructure (e.g., cloud and on-premises) and application portfolios (e.g., SaaS and licensed software) but also, crucially, business models and partnerships.
- Technology advances. The fundamental costs of computing are moving data and sustaining the state of that data. Historically, the most advanced data movement technologies operated within monolithic applications (e.g., from database to app server). Intersystem data transfers tended to be bulky and made few consistency promises. New data movement technologies are emerging that are more fine-grained and capable of supporting highly distributed data without sacrificing high performance or scalability.
| (US$ millions) | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | CAGR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All dynamic data movement | 600 | 660 | 726 | 800 | 878 | 966 | 1,063 | 10% |
| High-performance segment | 48 | 66 | 91 | 125 | 173 | 241 | 339 | 39% |
| High-performance % of total | 7% | 10% | 13% | 16% | 20% | 25% | 32% | |

Table 1: Forecast for High-Performance Dynamic Data Movement Software
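The CAGR figures in Table 1 follow directly from the endpoint values; a minimal sketch of the compound annual growth rate calculation, using the table’s 2015 and 2021 figures over six annual periods:

```python
def cagr(start, end, periods):
    """Compound annual growth rate between two values over `periods` years."""
    return (end / start) ** (1 / periods) - 1

# Figures from Table 1 (US$ millions), 2015 -> 2021 = 6 annual periods.
total = cagr(600, 1063, 6)       # all dynamic data movement
high_perf = cagr(48, 339, 6)     # high-performance segment

print(f"Total market CAGR: {total:.0%}")          # ~10%
print(f"High-performance segment CAGR: {high_perf:.0%}")  # ~39%
```

Run against the table’s endpoints, the calculation reproduces the 10% and 39% growth rates shown in the CAGR column.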
Greater Distribution of Workloads Is Certain
As digital business imperatives advance, three things are now certain: death, taxes, and greater distribution of data. Spending on infrastructure technologies is shifting generally to the cloud, as companies like AWS, Microsoft, Oracle, and Google implement increasingly robust cloud technologies (see Figure 1). But as compelling as cloud economics are, perhaps a more important driver of workload distribution is the push toward new digital business models that seek out new sources of productivity via data-driven partnerships. New approaches to institutionalizing work demand unprecedented infrastructure plasticity, which many traditional infrastructure and data movement technologies cannot support at any scale.
Dynamic data movers will employ a variety of technologies, including copy management, replication, and streaming services, among others, to ensure data is available when, where, to whom, and in what form it is needed. Technology advances will drive each of these techniques forward. For example, Actifio is advancing copy management to provide an enterprise data-as-a-service offering; Wandisco is resetting the state of the art in data replication with a continuous replication technology that guarantees transactional ordering at all data sinks. Ultimately, data movement technologies will evolve in response to application and business model use cases. These include:
- Big data analytics. The best business solutions incorporate many different data types from a wide variety of internal, partner, and third-party sources. Adding these sources will require data movement technologies that can rapidly move large data sets while sustaining metadata regarding ownership, format, integrity, and control.
- Intercloud communication. The average enterprise has at least two to three cloud supplier relationships, and standards for moving dynamic data between clouds are modest. This will be an area of rich invention for the next few years, but users with high-end continuous and consistent data needs are already starting to adopt dynamic data movers. This requirement will only increase as enterprises adopt true private cloud technologies to extend cloud services on premises.
- Data-as-a-service. The market for third-party data is evolving rapidly as regulations and business practices change. In some domains, businesses can secure highly targeted records in low volumes (100s) for low prices (<$10). As data-as-a-service options improve, the need for secure and highly controlled dynamic data movers increases.
- Automation. One of the most important use cases will be compressing the time of feedback loops among operational, analytic, and engagement applications. These applications will present some of the most sophisticated requirements for dynamic data movement.
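The consistency guarantee cited above for continuous replication (transactional ordering preserved at all data sinks) can be illustrated with a minimal sketch. This is an illustrative model, not any vendor’s implementation: the source assigns every change a global sequence number, and each sink buffers out-of-order arrivals and applies only contiguous changes, so every sink converges on the same state.

```python
# Illustrative sketch of ordered continuous replication (hypothetical,
# not a real product's code): a global sequence number is assigned at
# the source, and each sink applies changes strictly in that order.
from dataclasses import dataclass, field

@dataclass
class Change:
    seq: int    # global total order assigned at the source
    key: str
    value: int

@dataclass
class Sink:
    applied: dict = field(default_factory=dict)
    next_seq: int = 1
    buffer: dict = field(default_factory=dict)  # out-of-order arrivals

    def receive(self, change: Change):
        self.buffer[change.seq] = change
        # Apply only contiguous changes, so every sink sees the same order.
        while self.next_seq in self.buffer:
            c = self.buffer.pop(self.next_seq)
            self.applied[c.key] = c.value
            self.next_seq += 1

log = [Change(1, "a", 10), Change(2, "b", 20), Change(3, "a", 30)]
east, west = Sink(), Sink()
for c in log:
    east.receive(c)            # arrives in order
for c in reversed(log):
    west.receive(c)            # arrives out of order
assert east.applied == west.applied == {"a": 30, "b": 20}
```

Even though the second sink receives the changes in reverse, both sinks apply them in sequence order and end in identical states, which is the property that makes transactionally ordered replication valuable for distributed systems.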
Data Movement Costs Are Real and Poorly Understood
How much does it cost to move data? Powerful groups in business, including most executives and a surprising percentage of application developers, don’t really know. That’s why many presume that all data, from all sources, will end up in the cloud for processing, but the truth is quite different. Data movement costs take multiple forms. The three greatest are:
- Physics. The costs imposed by physics are latency, bandwidth, and fidelity: data movement is not instantaneous, bandwidth is not unlimited, and transfers are subject to errors. Typically, the option that involves no data movement is the best option. Thus, our research shows that as much as 90% of data generated in place will stay in place, especially given current data networking costs.
- Data network suppliers. For many use cases, network supplier charges can represent the greatest share of application cost. For example, internet-of-things (IoT) applications that feature significant edge processing can experience dramatic differences in total costs due to data movement costs (see Figure 2).
- Regulation. An array of regulatory regimes, including privacy, reporting, and other local laws, governs data movement. Today, the software manifestation of the rules those regimes dictate typically runs within applications. Dynamic data movers will evolve the capabilities required to enforce these rules within movement events.
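The physics and network-supplier costs above can be made concrete with a back-of-the-envelope model. The bandwidth, link efficiency, and per-GB egress rate below are illustrative assumptions, not quoted prices:

```python
# Back-of-the-envelope data movement cost model. All rates here are
# hypothetical assumptions for illustration, not actual supplier pricing.
def transfer_hours(data_gb, bandwidth_gbps, efficiency=0.7):
    """Hours to move `data_gb` over a link running at `efficiency` utilization."""
    seconds = (data_gb * 8) / (bandwidth_gbps * efficiency)
    return seconds / 3600

def egress_cost_usd(data_gb, usd_per_gb=0.09):  # assumed egress rate
    return data_gb * usd_per_gb

data_gb = 10_000  # 10 TB of edge-generated data
print(f"{transfer_hours(data_gb, 1.0):.1f} hours at 1 Gbps")   # ~31.7 hours
print(f"${egress_cost_usd(data_gb):,.0f} in egress charges")   # ~$900
```

Numbers like these are why the no-movement option usually wins: at these assumed rates, moving 10 TB takes more than a day of wall-clock time and a real network bill, while processing it in place costs neither.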
These costs are not trivial and can be the difference between business success and failure. The high end of the dynamic data movement market appeals directly to enterprises suffering from the limits of traditional data movement technologies as they push the boundaries of distributed application complexity. Products that can deliver fine-grained data with near certainty over distance and at operational scale are poised for significant growth. Among these are Oracle’s GoldenGate, Attunity’s CloudBeam, and Wandisco’s Fusion. The market for these tools is about US$653 million in 2016, highly fragmented, and transforming quickly, as tools that can support high-end needs capture share from those offering more traditional capabilities.
Figure 2: IoT Edge Processing Costs Are Highly Sensitive to Data Movement Costs
Data Movement Technology Directions: Compact, Consistent, and Cost-Effective
Dynamic data movement technologies will co-evolve with advances in digital business platform and application development technologies. Both domains will accelerate the need for dynamic data movement that is more compact, provides greater data consistency across instances, and is keenly focused on optimizing data movement costs.
The digital business platform (DBP) catalyzes especially interesting requirements (see Figure 3). Digital businesses will operate differently from traditional businesses, being more dependent on high-quality, high-volume data flows among operating elements. At the center of any digital business platform are services that can handle data feedback loops among these operating elements. While not all of these loops will require high-performance dynamic data movers, the end-to-end nature of applications that combine operational, engagement, and activation data means, for example, that customer experience can be compromised by the lowest-quality service in a data feedback loop. These loops will be a major focus of technology, application, and business model innovation over the next few years, and will be a major driver of high-performance data movers.
The second catalyst in the data movement domain is new application development tools and methods. While any technology can be used to build poorly organized, poorly performing applications, container tooling is especially well suited to supporting the Agile development methods so crucial to today’s enterprise applications and digital services. Using containers, applications can be constructed as microservices, each operating independently and capable of being located anywhere within a pool of infrastructure resources (subject to the constraints of physics). The advantage of this approach is that application size and lifecycles can better map to function and value. Increasingly, these services will be woven into complex application webs, and the performance and scalability of any application web can be highly sensitive to the performance and consistency of data movement among its services. Again, as application development tools generate more granular elements, the requirement for more compact, consistent, and cost-effective dynamic data movers increases.
Figure 3: Data Feedback Loops Are the Foundation for Digital Business Platforms