In between meeting with customers, crowdchatting with our communities and hosting theCUBE, the research team at Wikibon, owned by the same company as SiliconANGLE, finds time to meet and discuss trends and topics regarding digital business transformation and technology markets. We look at topics from the standpoints of business, the Internet of Things, big data, application, cloud and infrastructure modernization. We use the results of our research meetings to explore new research topics, further current research projects and share insights. This is the seventh summary of findings from these regular meetings, which we plan to publish every week.
This week on the Wikibon Friday research meeting, we discussed a key challenge to building systems that combine edge and cloud computing: addressing data differences. Wikibon believes a new data taxonomy will inform hybrid cloud and edge computing system design and related business outcomes.
The digital business systems on the drawing board today are increasingly complex, highly distributed, and decentralized. Why? In part, because businesses are trying to exploit edge computing through IoT and mobile engagement applications.
Consequently, Wikibon believes that businesses must distinguish between three “data shapes”: primary data, secondary data, and tertiary data. Wikibon’s Neil Raden explains:
- Primary data. Edge devices typically create “primary data”: raw data, in the form of measurements or signals, that originates from edge sensors. Primary data puts the “thing” in the “internet of things.” However, most primary data will not migrate to the internet; rather, it will stay proximate to the thing being measured and controlled. Primary data can be both high volume and latency sensitive, a difficult engineering combination. Because of this, much of the processing will be moved to the primary data (e.g., via functional or serverless computing), and many actuators will operate without referencing other locations within the cloud.
- Secondary data. Results from processing primary data will converge as feeds that trigger more complex events. Secondary data will also involve low latency, but likely will be dramatically reduced in volume. For example, a video camera handling facial recognition at a turnstile will generate enormous volumes of data, but will process that data against models in the camera itself, transmitting perhaps only a millionth of the data it generates as secondary data: the event decision and any mandated lineage data (see the sketch following this list). Secondary data will tend to persist and will be a crucial input for model training and deployment.
- Tertiary data. We call integration, modeling, and management data “tertiary data.” This is the data that will tend to persist in the cloud, in either a public or true private cloud setting.
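To make the taxonomy concrete, here is a minimal sketch in Python of the turnstile example above. Every name in it (Frame, AccessEvent, recognize) is a hypothetical illustration, not any vendor’s actual API: raw frames (primary data) never leave the device, and only the sparse event decisions (secondary data) are emitted.

```python
# A minimal sketch of the primary-to-secondary reduction described above.
# All names here (Frame, AccessEvent, recognize) are hypothetical
# illustrations, not any vendor's actual API.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional

@dataclass
class Frame:
    """Primary data: raw, high volume, stays proximate to the device."""
    pixels: bytes
    captured_at: datetime

@dataclass
class AccessEvent:
    """Secondary data: tiny, persisted, an input to model training."""
    person_id: str
    device_id: str
    decided_at: datetime

def recognize(frame: Frame) -> Optional[str]:
    """Stand-in for an on-camera facial-recognition model; returns a
    person_id on a match, None otherwise. (Stub: always returns None.)"""
    return None

def edge_pipeline(frames: Iterable[Frame], device_id: str) -> Iterator[AccessEvent]:
    """Process primary data in place and emit only the event decision --
    perhaps one small event per millions of raw frames."""
    for frame in frames:                  # raw frames are never transmitted
        person_id = recognize(frame)
        if person_id is not None:
            yield AccessEvent(person_id, device_id, datetime.now(timezone.utc))
```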
Mapping Data Shapes to Device Classes
We expect to see many new combinations of edge-related devices deployed to work with primary data. These combinations will emanate from the operational technology (OT) companies that have been central to SCADA technology for decades, but they will also come from new classes of technology vendors that invent new ways to apply IT to real-time, primary data applications. However, a keen understanding of the problems being solved will be vastly more important than the technology being deployed. Wikibon’s David Floyer explains:
The IT industry will aggressively target the edge (e.g., Dell EMC’s recent IQT initiative). However, our expectation is that IT-related technologies and standards will diffuse into primary data applications faster than IT vendors can grab share. Generally, we expect that IT vendors will be successful down to the gateways that handle secondary data. Some of these gateways, however, will be significant systems in their own right, depending on data volume, data lineage, processing complexity, and, importantly, backup/restore requirements.
Tertiary data will end up in the cloud, either in public or true private cloud options. The cloud systems processing tertiary data will focus on modeling, management, overall data administration, and integration with other data and applications. While tertiary data will feature the least stringent latency constraints, data movement, data persistence, and mixed workload costs will be central considerations.
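At the secondary-to-tertiary boundary, a gateway might batch event decisions and forward only aggregates to the cloud. The sketch below, again in Python, assumes a hypothetical ingest endpoint and payload shape; it is meant to show where the volume reduction and latency relaxation happen, not to prescribe a production design.

```python
# A minimal sketch of a gateway rolling secondary events up into tertiary
# data for the cloud. The endpoint URL and payload shape are assumptions
# made for illustration only.
import json
import urllib.request
from collections import Counter

CLOUD_ENDPOINT = "https://example.com/tertiary/ingest"  # hypothetical

def summarize(events: list) -> dict:
    """Reduce a batch of secondary events to a tertiary summary record
    suitable for cloud-side modeling, management, and integration."""
    per_device = Counter(e["device_id"] for e in events)
    return {"event_count": len(events), "events_per_device": dict(per_device)}

def forward_to_cloud(events: list) -> int:
    """Ship only the summary upstream; latency matters far less at this
    tier than data movement and persistence costs."""
    body = json.dumps(summarize(events)).encode("utf-8")
    request = urllib.request.Request(
        CLOUD_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```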
The complexity of highly distributed, highly decentralized edge-related systems necessitates significant collaboration across multiple parties, including OT, IT, the business, and an array of suppliers and partners. Users that take a pure procurement approach to these systems will encounter significant challenges building and operating them. Instead, a strategic vendor management approach will be essential.
Business First, Technology Second
While the technology issues related to cost-effectively handling different classes of data are not trivial, the business issues remain a primary concern. Issues such as who owns the intellectual property being created, which level in the data hierarchy is central to the business’s operation and differentiation, and how liability and legal concerns will be addressed are all essential to edge-related system design. All firms will, in one form or another, be affected by the emergence of the edge as a dominant design consideration, not only for their infrastructure but also for their business. Using this simple data classification scheme can help professionals better understand the working relationships among OT, IT, the business, and – crucially – partners.
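One way to put the classification scheme to work in planning discussions is to encode it explicitly. The snippet below is our own simplified, hypothetical rendering of the taxonomy’s properties as described in this note; the ownership assignments are generalizations to seed OT/IT/business conversations, not rules.

```python
# A simplified, hypothetical encoding of the data taxonomy discussed in
# this note; ownership assignments are generalizations for planning
# discussions, not rules.
DATA_SHAPES = {
    "primary": {
        "where_it_lives": "proximate to the device",
        "latency": "low, with high volume",
        "likely_owner": "OT",
    },
    "secondary": {
        "where_it_lives": "edge gateways",
        "latency": "low, with dramatically reduced volume",
        "likely_owner": "OT and IT jointly",
    },
    "tertiary": {
        "where_it_lives": "public or true private cloud",
        "latency": "least stringent",
        "likely_owner": "IT and the business",
    },
}

def likely_owner(shape: str) -> str:
    """Look up which group likely owns a given data shape."""
    return DATA_SHAPES[shape]["likely_owner"]
```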
Action Item. Businesses must begin incorporating edge opportunities into their strategic planning. The tie between strategy and operations will be data characteristics. However, the central question must be: can we use the edge to create new business opportunities, whether by improving our engagement or our operations?