The IoT Challenges

From vehicles and buildings to factory floors and power plugs, everything around us is increasingly monitored and controlled through sensors. The number of data collection points (edge devices) is huge and growing fast. In the recent past, we dealt with dozens of connected devices: factory floor controllers, face recognition or fingerprint scanners, TV transponders, etc. In today's applications, the count of intelligent data collection points (“intelligent connected devices”) is often in the thousands!

The volume of data collected at the edge is increasingly high and reflects the increasing volume of real-world data. That data comes from various streaming sources (modern transponder frequencies are higher and sensors have higher precision than before), is generated by algorithms, and so on, before being passed to edge devices. The number of IoT devices can also change rapidly, even in a single setup: new sensors are brought online, often hundreds at a time, instruments and control mechanisms get replaced overnight, gateways are added or removed, etc. While all of this happens, the IoT environment must remain operational. This has promoted a tendency to extend the scalability of the IoT environment and its data management by moving data processing from the backend down to the edge device.

From the standpoint of database management, that means several things:

By nature, “embedded” edge devices are never going to have enough memory and CPU power to process much data on their own - at least not while fulfilling the devices’ main purpose. The purpose of the brakes on a car is to stop the car, not to analyze braking patterns to improve the overall efficiency of the brakes. Edge device analytics capabilities are always going to be constrained. Thus, modern edge devices are truly connected; the processing of data is often going to take place outside the device itself.

Connectivity

Edge nodes' physical connectivity can be intermittent and constrained. Some popular protocols have limited bandwidth (ZigBee, NFC and RFID, LPWAN, Bluetooth Low Energy, etc.), and edge device connectivity can be intermittent because of the physical device location (e.g. a mobile device) or because the device is battery operated and connects only at certain predefined times. From the standpoint of data management, unpredictable connectivity means that fully integrated support for “push” and “pull” protocols is important to preserve data collected at the edge.
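To make the “push” side of this concrete, here is a minimal store-and-forward sketch: readings are kept locally and pushed in order whenever the link is available. It is an illustration only and does not represent any particular product's API; the class name StoreAndForwardBuffer, the send callable, and the scripted link behavior are all assumptions made for the example.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Keeps readings locally until the uplink confirms delivery."""

    def __init__(self, max_items=1000):
        # Bounded queue: when full, the oldest reading is dropped first.
        self._pending = deque(maxlen=max_items)

    def record(self, reading):
        self._pending.append(reading)

    def flush(self, send):
        """Push pending readings in order; stop at the first failure.

        `send` is a hypothetical callable that returns True on success.
        Returns the number of readings delivered.
        """
        delivered = 0
        while self._pending:
            if not send(self._pending[0]):
                break  # link is down; keep the reading for the next attempt
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self):
        return len(self._pending)


# Usage: the link fails on the second send attempt, then recovers.
received = []
link_up = [True, False]  # scripted outcomes for the first two send attempts


def flaky_send(reading):
    ok = link_up.pop(0) if link_up else True
    if ok:
        received.append(reading)
    return ok


buf = StoreAndForwardBuffer()
for value in (20.1, 20.4, 20.9):
    buf.record({"sensor": "t1", "value": value})

first = buf.flush(flaky_send)   # delivers one reading, then the link drops
second = buf.flush(flaky_send)  # link is back; the remaining readings go out
```

A reading is removed from the buffer only after the send succeeds, so a dropped connection delays data rather than losing it (until the bounded queue overflows).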

Another connectivity issue is the fact that IoT devices are, by definition, connected to the outside world through the Internet. That means that their data is often accessible via Web client applications written in Java, Python and scripting languages and through other Web-related technologies; these applications should be easy to deploy and integrate with browsers and other web applications. A common technique is to implement access to IoT device databases through Web Services that provide end-point URLs and expose methods to access the device and its data through standard networks.
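As a rough illustration of the end-point idea, the sketch below maps a URL path such as /devices/thermostat-1 to a JSON response using only the Python standard library. The URL layout, the DEVICE_DATA dictionary standing in for the device database, and the function names are assumptions for the example, not a description of any specific product.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory stand-in for the device database.
DEVICE_DATA = {
    "thermostat-1": {"temperature_c": 21.5, "battery_pct": 87},
}


def handle_get(path):
    """Map an end-point URL path to (HTTP status, JSON body)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 2 and parts[0] == "devices":
        record = DEVICE_DATA.get(parts[1])
        if record is not None:
            return 200, json.dumps(record)
    return 404, json.dumps({"error": "not found"})


class DeviceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle_get(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Blocks and serves requests; call it only when running as a service."""
    HTTPServer(("127.0.0.1", port), DeviceHandler).serve_forever()
```

Keeping the routing logic in a plain function (handle_get) separates it from the transport, so it can be exercised without opening a socket.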

As mentioned previously, the IoT generates large volumes of data (“Big Data”). The overall purpose of collecting data on the edge has shifted from pure device control and monitoring to improving various service capabilities through real-time analysis. Artificial intelligence and machine learning are becoming the core of many everyday services (e.g. using your VISA card at an ATM in a foreign country without notifying the bank first). This, of course, requires adding analytical capabilities to node containers. However, mostly for the reasons discussed above, the ability to add sophisticated storage and analytical software toolsets at the edge is restricted; instead, these toolsets usually live at the backend. Given the size of the data being collected and analyzed, a database management system in the cloud needs specific methodologies geared towards better utilization of modern CPUs and storage media: vertical storage, vector processing, pipelining, sophisticated specialized indexing and improved SQL engine capabilities have become a necessity. Much greater emphasis is placed on processing data in parallel: utilizing the server’s multi-core capabilities through scalable data processing algorithms, executing complex queries in parallel, and using various lockless data structures and algorithms.
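The idea behind vertical (column-oriented) storage can be sketched in a few lines: instead of storing each reading as one record, every field is kept in its own contiguous typed array, so an aggregate reads only the columns it needs. The sample data and function names below are made up for illustration; a real engine adds compression, vectorized execution and much more.

```python
from array import array

# Row-oriented layout: each reading is stored as one record
# (sensor id, Unix timestamp, measured value).
rows = [
    (1, 1700000000, 21.5),
    (2, 1700000000, 19.0),
    (1, 1700000060, 22.5),
    (2, 1700000060, 19.5),
]

# Vertical (column-oriented) layout: one contiguous typed array per field.
sensor_ids = array("i", (r[0] for r in rows))
timestamps = array("q", (r[1] for r in rows))
values = array("d", (r[2] for r in rows))


def mean_value(values):
    """The aggregate touches only the `values` column, not whole records."""
    return sum(values) / len(values)


def mean_for_sensor(sensor_ids, values, wanted):
    """Filter on one column, aggregate another; timestamps are never read."""
    selected = [v for s, v in zip(sensor_ids, values) if s == wanted]
    return sum(selected) / len(selected)


overall = mean_value(values)                        # 20.625
sensor_1 = mean_for_sensor(sensor_ids, values, 1)   # 22.0
```

Because each column is a homogeneous, contiguous sequence, scans of this kind are also the natural input for vector processing and for splitting work across cores.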

Data produced by various IoT edge devices is quite often unstructured, i.e. it does not reside in traditional relational databases. Sensors and instruments communicate their measurements, multimedia content, and other readings to each other and to backend servers that analyze the content and make real-time decisions based on that analysis. The underlying storage should be able to support data presented in non-relational formats, like JSON, and optimize the storage for complex near real-time analytics.
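A small sketch shows why a fixed relational schema fits such data poorly: three hypothetical devices send JSON messages with different and partly nested fields, and the consumer has to tolerate fields that are absent. The message contents and the get_path helper are assumptions made for the example.

```python
import json

# Hypothetical messages from three different device types.
raw_messages = [
    '{"device": "t1", "type": "temperature", "celsius": 21.5}',
    '{"device": "d7", "type": "door", "open": true, "battery": {"pct": 64}}',
    '{"device": "t2", "type": "temperature", "celsius": 19.0, "humidity": 40}',
]

documents = [json.loads(m) for m in raw_messages]


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'battery.pct' through nested objects."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Aggregate over the documents that actually carry a temperature.
temps = [d["celsius"] for d in documents if d.get("type") == "temperature"]
average_celsius = sum(temps) / len(temps)              # 20.25

door_battery = get_path(documents[1], "battery.pct")   # 64
missing = get_path(documents[0], "battery.pct")        # None: field not sent
```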

Security

IoT devices span all aspects of modern life: industrial, automotive, medical, and commercial applications affect us all. We will not pretend to address IoT security in general, though - the topic is far too broad. Nor will we pretend to be security experts. Our perspective, for the purpose of this discussion, is that of data management. While not invincible, the existing Internet SSL/TLS technologies do a good job of protecting communication channels between edge nodes and back-end servers. In addition to secure communications and channel authentication, the integrity of data must be ensured by the database management system, and data encryption is an important component of that. Data encryption on the edge node is, in a way, more important than server-side encryption. Backend database servers normally have a full network security apparatus available and run behind sophisticated firewalls; there, security and authentication are overseen by security personnel. In contrast, edge nodes' storage containers are "out in the open", often running unattended; more often than not they have very limited resources and lack well-established firewall-like security mechanisms. Yet tampering with nodes’ data containers has already proven to be harmful (in recent years we have seen cases of spoofing the identity of insecure network nodes). In the end, you don’t really care why the lights went off, or why your in-car GPS led you into a lake. You are wet regardless of whether the GPS transmitter provided incorrect data or your car’s receiver was hacked!
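The tampering concern can be illustrated with a tamper-evidence sketch. Note that this shows integrity protection with an HMAC tag, not encryption: the Python standard library has no symmetric cipher, and a real deployment would combine encryption at rest with proper key management. The key, the record layout, and the seal and verify helpers are assumptions for the example.

```python
import hashlib
import hmac
import json

# Assumption: each device holds a secret key provisioned out of band.
DEVICE_KEY = b"example-key-do-not-use-in-production"


def seal(record, key=DEVICE_KEY):
    """Serialize a record and attach an HMAC-SHA256 tag."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode("utf-8"), "tag": tag}


def verify(sealed, key=DEVICE_KEY):
    """Return True only if the payload still matches its tag."""
    payload = sealed["payload"].encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, sealed["tag"])


stored = seal({"sensor": "gps-1", "lat": 47.6, "lon": -122.3})

# Someone edits the stored latitude without knowing the key.
tampered = dict(stored, payload=stored["payload"].replace("47.6", "48.6"))

stored_ok = verify(stored)      # True
tampered_ok = verify(tampered)  # False
```

A record modified on an unattended node fails verification at the backend, so incorrect data can at least be detected and rejected rather than silently acted upon.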

Edge Device Toolsets and eXtremeDB Active Replication Fabric Capabilities

An observation on a more social level: in the past, embedded systems programming was like magic (or hard-core, depending on the viewpoint). Embedded systems developers had to know the C language (or, God forbid, assembler!), understand registers, interrupts and other low-level details, and be able to use special debugging techniques. Nowadays, development has to be much simpler, and time-to-market is often shorter because of increased competition. It is true that some evaluations still take ages, but even there, increasingly, the decision to use or not to use a tool takes days. One typical example is a small hedge fund development group that makes its decision based on a demo thrown together in a few days. Computer knowledge, programming techniques, patience, and perseverance are not necessarily the virtues of modern developers. This means that the toolset we provide must include simple, easy-to-understand alternatives to programming in low-level languages like C/C++ or even Java. In other words, providing SQL, scripting languages, support for third-party BI tools, GUI-based access tools, and online documentation with extensive search capabilities and ready-to-use, copy-and-paste samples is a necessity.

For these reasons, eXtremeDB Active Replication Fabric offers: