
Data on Disk is Dead Data – How Flash is Changing Systems Design Forever

A new wave of applications is sweeping across global business, moving IT from the back office to the front lines of competitive advantage and changing how business is conducted forever. The key to these new applications is big data, whether generated internally about what customers want and don't want from the companies they trade with, or gathered externally from public social networks and other cloud-based sources.

But traditional disk storage is proving inadequate to power these new applications. What they need is an order-of-magnitude increase in data read/write speeds to escape the long latencies imposed by mechanical disk access.

Fortunately, the cost of flash storage is dropping steadily, driven by the growing volume of flash used in consumer products. And its overall business value when used for transaction data already far exceeds that of disk. As a result, Wikibon believes that data center storage architecture will move to what is effectively a two-tier physical structure, with active data, including unstructured data types, on solid state (in-memory flash or solid-state drives) and inactive data on SATA disk, which will maintain a per-gigabyte cost advantage. This newsletter, which grew out of the December 13 Peer Incite meeting on the impact of flash storage, explores some of the most important issues and implications of this generational change in storage architecture.

The Potential of Flash to Disrupt Whole Industries

Introduction

Mechanical disk drives have imprisoned data for the past 50 years. Technology has doubled the size of the prisons every 18 months over those 50 years and will continue to do so. The way in and out of prison is gated by the speed of the mechanical arms on a mechanically rotating disk drive. These prison guardians are almost as slow today as they were 50 years ago. The chances of remission for data are slim; there is little opportunity for data to be useful again. Data goes to disk to die.

Transactional Systems Limitations

Transactional systems have driven most of the enterprise productivity gains from IT. Bread-and-butter applications such as ERP have improved productivity by integrating business processes. Call centers and web applications have taken these applications directly to enterprise or government customers. The promise of transactional systems is to manage the "state" of an organization and to integrate its processes with a single, consistent view across the organization.

That promise of systems integration has not been realized. Because transactional applications change "state", that state must be written to the only suitable non-volatile medium, the disk drive. The amount of data that can be read and written in transactional systems is constrained by the elapsed time of access to disk (milliseconds) and by the bandwidth to disk. The fundamental architecture of transactional systems has not changed in half a century. The number of database calls per transaction has hardly changed, and it limits the scope of transactional systems. Transactional systems have to be loosely coupled and form part of an asynchronous data flow from one system to another. The result is "system sprawl" and multiple versions of the actual state of an enterprise.
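To make the constraint concrete, here is a minimal back-of-envelope sketch in Python. The per-access latency figures and the calls-per-transaction count are illustrative assumptions, not measurements from this research:

# Upper bound on throughput for one serial stream of synchronous DB calls.
# All figures below are illustrative assumptions.
DB_CALLS_PER_TXN = 10        # assumed database calls per transaction
DISK_LATENCY_S = 5e-3        # ~5 ms per random disk access (assumption)
FLASH_LATENCY_S = 100e-6     # ~100 microseconds per flash access (assumption)

def max_serial_tps(storage_latency_s):
    """Transactions/second if every DB call waits on storage."""
    return 1.0 / (DB_CALLS_PER_TXN * storage_latency_s)

print(f"Disk:  ~{max_serial_tps(DISK_LATENCY_S):,.0f} transactions/second")
print(f"Flash: ~{max_serial_tps(FLASH_LATENCY_S):,.0f} transactions/second")

Under these assumptions a serial stream manages only about 20 transactions per second against disk versus roughly 1,000 against flash, which is why applications work so hard to avoid synchronous disk writes.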

Read-heavy Application Limitations

Enterprise data warehouses were the first attempt to improve the process of extracting value from data, but a happy data warehouse manager is as rare as a two-dollar bill. The major constraint is bandwidth to the data. Big data applications are helping free some data by using large amounts of parallelism to extract it more efficiently in batch mode. Data-in-memory systems can keep small data marts (derived from data warehouses or big data applications) in memory and radically improve the ability to analyze data.
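A rough Python illustration of why bandwidth is the gating factor; the dataset size, per-disk bandwidth, and degree of parallelism are all assumed figures:

# How long does a full scan take? Content depends only on size / aggregate bandwidth.
DATASET_TB = 10           # assumed warehouse table size
DISK_STREAM_MB_S = 100    # assumed sequential bandwidth of one disk
PARALLEL_DISKS = 100      # assumed big-data-style degree of parallelism

def scan_hours(dataset_tb, mb_per_s, streams=1):
    """Hours to read the full dataset at the given aggregate bandwidth."""
    total_mb = dataset_tb * 1024 * 1024
    return total_mb / (mb_per_s * streams) / 3600

print(f"Single disk: {scan_hours(DATASET_TB, DISK_STREAM_MB_S):.1f} hours")
print(f"{PARALLEL_DISKS} disks in parallel: "
      f"{scan_hours(DATASET_TB, DISK_STREAM_MB_S, PARALLEL_DISKS) * 60:.1f} minutes")

On these assumptions a single disk takes about 29 hours to scan the dataset; spreading the work across 100 disks cuts that to under 20 minutes, which is exactly the batch-mode parallelism trick big data frameworks exploit.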

Overall, the promise of data warehousing has been constrained by access to data. The amount and percentage of data outside the data warehouse and imprisoned on disk are growing. Enterprise data warehouses would be better named data prisons.

Social & Search Breakthroughs

Social and search are the first disruptive applications of the twenty-first century. When "disk" is googled, the search bar shows "109,000,000 results (0.17 seconds)", a feat impossible to achieve if the data were on disk. The Google architecture includes extensive indexing and caching of data so that access to disk can be avoided wherever possible for these read-heavy applications, data without "state".
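The pattern is easy to sketch. The following is a minimal cache-aside illustration in Python, not Google's actual implementation; disk_fetch is a hypothetical stand-in for the slow backing store:

from functools import lru_cache

def disk_fetch(key):
    """Placeholder for an expensive, milliseconds-long disk read."""
    return f"value-for-{key}"

@lru_cache(maxsize=100_000)   # in-memory cache absorbs repeat reads
def lookup(key):
    return disk_fetch(key)    # only executed on a cache miss

lookup("disk")   # first call: goes to "disk"
lookup("disk")   # repeat call: served from memory, no disk access

For read-heavy, stateless data the hit rate of such caches can be very high, so the mechanical disk is touched only for the long tail of cold data.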

Social applications turn out to be a mixture of stateful and stateless components. All of them started by implementing scale-out architectures using commodity hardware and homegrown or open-source software. The largest reached a size at which the database portions of the infrastructure hit a wall: locking rates had maxed out at the limits of hard disk technology.

As examples, Facebook and Apple have used flash extensively within the server (mainly Fusion-io hardware and software) for the database portions of their infrastructure to enable the scale-out growth of services. Like Google, they have focused on extensive use of caching and indexing.

The end objective for both Google and Facebook is to ensure the quality of the end-user experience. Both have implemented continuous improvement of response time, with the objective of shaving milliseconds off response times while assuring consistency of data. End-user productivity ensures user retention, more clicks, and more revenue.

The Potential Impact on Health Care


Looking out to mid-decade, it is interesting to think through what could happen to the health care market. Assume for the moment that the technology and trust issues* have been addressed, and that by 2015 there exists a robust technology that can answer spoken queries about health care issues in real time. Two key questions:

  • Where and how could this technology be applied?
    • Direct use by doctors in a medical facility
    • Service provided by an insurance company
    • Service provided directly to consumers
  • What are the potential savings?
    • Reduction in costs might include reduced doctor time per patient, fewer tests, lower risk of malpractice suits, and avoidance of unnecessary treatments.
    • Improvement in outcomes might include additional revenue from new patients and improved negotiating positions with health insurance companies and government.
    • Perceived quality of care, or customer satisfaction, might include improved Yelp scores, improved customer retention, and additional revenue from new patients and health insurance companies.
    • Risk of adverse publicity: negative impact on revenue, brand, and customer satisfaction from misuse, faults found with the technology, etc.
Table 1: Analysis of potential impact of Data-in-Memory Systems on US Health Care

Total health care spend in the US is estimated to be 16% of GDP, about $2.5 trillion per year. Assuming that this technology can address the 40% of that spend ($1.0 trillion) that is attributable to doctor-initiated spending, the top half of Table 1 attempts to look at the differences in impact between the different deployment models. The bottom half of Table 1 takes the two cost cells and, based on some simple assumptions, attempts to ball-park the potential yearly impact.

The table indicates some interesting potential impacts on the health care industry. First, the savings might be 10 times higher if the service could be delivered directly to the consumer. If we assume some contribution from the cells not assessed, the potential benefit might be $100 billion per year, or $1 trillion over 10 years. And health care practitioners indicate that increasing consumer use would be easier than increasing doctors' use.
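The ball-park arithmetic can be reproduced in a few lines of Python. The spend and addressable-share figures come from the paragraphs above; the 10% savings rate is an illustrative assumption chosen to match the $100 billion figure, not a number taken from Table 1:

HEALTH_SPEND_T = 2.5       # US health care spend, $ trillions/year (from the text)
ADDRESSABLE_SHARE = 0.40   # doctor-initiated share the technology could reach
SAVINGS_RATE = 0.10        # illustrative assumption, not stated in Table 1

addressable_t = HEALTH_SPEND_T * ADDRESSABLE_SHARE   # ~$1.0T/year
savings_t = addressable_t * SAVINGS_RATE             # ~$0.1T = $100B/year

print(f"Addressable spend: ${addressable_t:.1f} trillion/year")
print(f"Potential savings: ${savings_t * 1000:.0f} billion/year, "
      f"${savings_t * 10:.1f} trillion over 10 years")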

Nobody is going to build a factory based on this analysis, but it shows the potential for broad-scale implementation of systems that rely on all the data being held in memory to meet a three-second response time. Flash would play a pivotal role in enabling cost-effective deployment.
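A simple latency budget shows why the data must be in solid state. Within a three-second response window, the number of serial random lookups each storage tier can serve differs by orders of magnitude (the per-lookup latencies below are rough, assumed values):

# Serial random lookups that fit in a fixed response-time budget.
BUDGET_S = 3.0
latencies_s = {"disk": 5e-3, "flash": 100e-6, "DRAM": 100e-9}  # assumed values

for tier, latency in latencies_s.items():
    print(f"{tier:>5}: ~{BUDGET_S / latency:,.0f} serial lookups in {BUDGET_S:.0f} seconds")

Roughly 600 lookups on disk versus 30,000 on flash and 30 million in DRAM; a complex spoken-query answer simply cannot be assembled from disk within the budget.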

The players that are able to develop and deploy these technologies will have a major impact on health care spend in general and major potential to create long-lasting direct and indirect revenue streams.

Action item: CEOs and CIOs should task their best and brightest with identifying integrated, high-performance applications that could disrupt their industry by increasing real-time access to large amounts of data. They should then work proactively with their current or new application suppliers on how such systems could be designed. Significant resources will need to be applied, based on the assumption that the first two attempts at defining any "killer" application will be off target.


Footnotes: *Trust issues include openness about the sources and funding behind information within the technology, trust in the brand of the suppliers and deployers of the technology, trust in the training to use the technology, trust in the security and confidentiality arrangements, trust that the system can be updated rapidly in the light of new information, trust in the reliability and validity of the outputs of the technology.
