Monday, 27 February 2017

Rule-Based Fault Management for Environmental Monitoring IoT system

Final report


Description of the project


The idea of the project was to build a fault management solution for distributed IoT systems. Fault management, along with security, is among the most important management features of IoT networks. The solution had to demonstrate three main fault management functions:
  • detect faults in the IoT system from symptoms (events and messages containing raw data about faults);
  • isolate and diagnose the causes of faults;
  • apply the fault recovery procedures.
The rule-based reasoning technique had to be used to implement these functions.
An environmental monitoring IoT system was selected as the example system to diagnose.


The current status


The following is already done:
  • the hardware part of environmental monitoring system created;
  • the sensor node software fully implemented;
  • the Raspberry Pi based field IoT gateway configured to work as the Wireless Access Point;
  • Mosquitto MQTT broker installed and configured on the IoT gateway component;
  • the IoT gateway software partially implemented (this work is in progress, see GitHub repo with the sources);
  • one DigitalOcean 1 GB CentOS 7 droplet created;
  • Docker and Docker Compose installed on the droplet;
  • Docker daemon secured with TLS;
  • Eclipse Hono v0.5-M3 deployed on the droplet and tested.
Well, there is still much to be done :)


Changes in the components of the solution


Current changes are related to the implementation of the sensor node and field gateway components:



The sensor node's ESP-12 modules were replaced with SparkFun ESP8266 Things. The Raspberry Pi based IoT gateway is now configured as the WAP for the sensor nodes and is connected to the router over a LAN connection.
I initially considered Kura for the IoT gateway, but after some research I think it is overcomplicated for this project. I needed to implement a bidirectional MQTT - AMQP bridge to connect the Mosquitto broker, which runs locally on the Raspberry Pi, to the Eclipse Hono AMQP telemetry endpoint, which runs on the DigitalOcean droplet. As far as I can see, there is no direct way to implement this with Kura: I would have to develop a bundle from scratch to support the request-response scenario for the local Mosquitto. Also, Kura's CloudService is EDC specific and does not seem to fit the Hono API. So I decided to implement a custom IoT micro-gateway using Apache Camel and run it as a Linux service on the IoT gateway component. This makes it possible to connect to Hono through the Hono API over the AMQP protocol.


Lessons learned


In hindsight, the declared scope of the project turned out to be too broad for the stated time frame. Unfortunately, since I was working full-time, it was not always possible to fully engage in the Challenge.

Saturday, 18 February 2017

Rule-Based Fault Management for Environmental Monitoring IoT system

Sensor node software


In this post I'm showing the software part of the sensors. The code is implemented with Arduino IDE. Libs used:
  • ESP - provides ESP8266 specific functions;
  • Wire - provides I2C protocol support;
  • ESP8266WiFi - WiFi related functions;
  • EEPROM - provides access to persistent data storage;
  • PubSubClient - a client library for MQTT support.
The software is divided into two parts: a) the BME280 I2C driver library and b) the Arduino sketch with the sensor functionality. The implementation of each part is described below.

BME280 driver


The BME280 sensor driver is implemented as an Arduino library. The BME280 datasheet was the main reference for the implementation. The library's functional spec is as follows:
  • I2C interface support only;
  • Forced mode support only;
  • allows setting the BME280 I2C address dynamically;
  • allows setting the SDA and SCL pins for the I2C interface dynamically;
  • oversampling isn't used (acceptable for environmental measurements);
  • IIR filter isn't used (also acceptable for environmental measurements).
The public API of the library includes two constructors and three methods:
The UML diagram depicts the library usage workflow:

The diagram is self-explanatory. One point is worth mentioning: according to the BME280 datasheet (see 3.3.3. Forced mode), Forced mode has to be selected again for each subsequent measurement cycle. For this purpose, the last statement of the readAll() method sets the sensor mode on every call (5.4.5 Register 0xF4 "ctrl_meas").
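As a rough illustration of that mode-setting step (not the library's actual code), the byte written to ctrl_meas can be composed from the bit fields given in the datasheet: osrs_t in bits 7..5, osrs_p in bits 4..2 and mode in bits 1..0. The helper name makeCtrlMeas and the constant names are hypothetical, and x1 oversampling for temperature and pressure is an assumption about what "oversampling isn't used" means here.

```cpp
#include <cassert>
#include <cstdint>

// Field values from the BME280 datasheet, section 5.4.5 (register 0xF4).
const std::uint8_t kRegCtrlMeas  = 0xF4;
const std::uint8_t kModeForced   = 0x01;  // mode[1:0] = 01 selects Forced mode
const std::uint8_t kOversampleX1 = 0x01;  // osrs = 001: one sample, no averaging

// Hypothetical helper: packs the three fields into one register byte.
std::uint8_t makeCtrlMeas(std::uint8_t osrsT, std::uint8_t osrsP, std::uint8_t mode) {
    assert(osrsT <= 0x07 && osrsP <= 0x07 && mode <= 0x03);
    return static_cast<std::uint8_t>((osrsT << 5) | (osrsP << 2) | mode);
}
```

With x1 oversampling for both channels and Forced mode the result is 0x25. On the device this byte would be written to register 0xF4 over I2C before every measurement, for example with Wire.beginTransmission(), Wire.write() and Wire.endTransmission().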
The BME280 ADC output values for temperature, pressure and humidity are compensated using formulas from the datasheet (4.2.3 Compensation formulas).
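For reference, the integer temperature formula from that datasheet section can be sketched as below. This follows the datasheet's 32-bit fixed-point version rather than the driver's own code; the struct and function names are hypothetical, and the calibration values in the usage note are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Temperature trimming parameters (dig_T1..dig_T3 in the datasheet).
struct TempCalibration {
    std::uint16_t digT1;
    std::int16_t  digT2;
    std::int16_t  digT3;
};

// Returns the temperature in units of 0.01 DegC (2508 means 25.08 DegC).
// tFine receives the fine-resolution value that the pressure and humidity
// formulas reuse. Relies on arithmetic right shift of negative values,
// which is what common compilers implement.
std::int32_t compensateTemperature(std::int32_t adcT,
                                   const TempCalibration& cal,
                                   std::int32_t& tFine) {
    assert(adcT >= 0 && adcT <= 0xFFFFF);  // the raw reading is a 20-bit value
    const std::int32_t t1 = cal.digT1;
    const std::int32_t var1 = (((adcT >> 3) - (t1 << 1)) * cal.digT2) >> 11;
    const std::int32_t d = (adcT >> 4) - t1;
    const std::int32_t var2 = (((d * d) >> 12) * cal.digT3) >> 14;
    tFine = var1 + var2;
    return (tFine * 5 + 128) >> 8;
}
```

For example, with dig_T1 = 27504, dig_T2 = 26435, dig_T3 = -1000 and a raw reading of 519888, the function returns 2508, i.e. 25.08 DegC.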
You can find sources for the BME280 driver in the GitHub repo.


Arduino sketch



The project's sensor node software is implemented as an Arduino sketch. The main features implemented in the sketch are:
  • wireless WiFi communication channel with the field IoT gateway;
  • MQTT client that provides connectivity with the MQTT broker running on the field IoT gateway;
  • bidirectional communication with BME280 sensor;
  • the use of a persistent storage for the current state of the sensor node during restarts;
  • three modes of operation:
    • active;
    • power save;
    • suspended.
  • one-way message exchange pattern implementation over MQTT for environmental telemetry data messages;
  • request-reply message exchange pattern implementation over MQTT for control messages sent from the field gateway to the sensor node.

Startup

The UML sequence diagram depicts the startup process after powering the sensor node:

The program starts by reading the MAC address, as it is used as the MQTT client name and is included in the MQTT topic tree structure. Then the BME280 is initialized over the I2C protocol. Next, the EEPROM memory is allocated - 5 bytes in total (4 bytes for the sleep period value in seconds + 1 byte for the current mode id). After that, a WiFi connection is established with the WiFi access point running on the Raspberry Pi. The same Raspberry Pi runs the local Mosquitto MQTT broker instance and the field IoT gateway software. The remaining steps set up the MQTT PubSub client callback function, initialize the MQTT topic names and process the current operating mode of the sensor node.
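The 5-byte persistent state described above could be laid out as in the following sketch. The field order, the little-endian byte order and all names are assumptions made for illustration; the sketch's real layout may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

const std::size_t kStateSize = 5;  // 4 bytes sleep period + 1 byte mode id

struct NodeState {
    std::uint32_t sleepSeconds;  // sleep period in seconds
    std::uint8_t  modeId;        // current operating mode id
};

// Serializes the state into 5 bytes (assumed little-endian, period first).
std::array<std::uint8_t, kStateSize> packState(const NodeState& s) {
    std::array<std::uint8_t, kStateSize> buf{};
    for (std::size_t i = 0; i < 4; ++i) {
        buf[i] = static_cast<std::uint8_t>(s.sleepSeconds >> (8 * i));
    }
    buf[4] = s.modeId;
    return buf;
}

// Restores the state from the same 5-byte layout.
NodeState unpackState(const std::array<std::uint8_t, kStateSize>& buf) {
    NodeState s{0, buf[4]};
    for (std::size_t i = 0; i < 4; ++i) {
        s.sleepSeconds |= static_cast<std::uint32_t>(buf[i]) << (8 * i);
    }
    return s;
}
```

On the ESP8266 the buffer would then be stored with the EEPROM library, for example EEPROM.begin(5), one EEPROM.write() per byte and a final EEPROM.commit().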
The following topics are configured by the sensor node:


Topic name template           | Topic name example           | Topic type | Purpose
------------------------------+------------------------------+------------+------------------------------------------------------
env/<MAC_address>/status      | env/5ccf7f2f1d04/status      | Publish    | Sensor node state messages. Possible values: 'off', 'sleeping', 'on'
env/<MAC_address>/temperature | env/5ccf7f2f1d04/temperature | Publish    | Telemetry data messages: temperature in DegC
env/<MAC_address>/pressure    | env/5ccf7f2f1d04/pressure    | Publish    | Telemetry data messages: atmospheric pressure in hPa
env/<MAC_address>/humidity    | env/5ccf7f2f1d04/humidity    | Publish    | Telemetry data messages: humidity in %RH
env/<MAC_address>/request     | env/5ccf7f2f1d04/request     | Subscribe  | Command messages: request
env/<MAC_address>/reply       | env/5ccf7f2f1d04/reply       | Publish    | Command messages: reply
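A minimal sketch of how such topic names can be derived from the MAC address is shown below. The helper names macToHex and topicFor are hypothetical; only the env/<MAC_address>/<name> pattern comes from the topic list above.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 6-byte MAC address as 12 lowercase hex digits without separators.
std::string macToHex(const std::uint8_t mac[6]) {
    char buf[13];
    std::snprintf(buf, sizeof(buf), "%02x%02x%02x%02x%02x%02x",
                  mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return std::string(buf);
}

// Builds a topic such as env/5ccf7f2f1d04/status.
std::string topicFor(const std::string& macHex, const std::string& name) {
    return "env/" + macHex + "/" + name;
}
```

On the ESP8266 the six bytes can be obtained with WiFi.macAddress(); the same hex string can also serve as the MQTT client name.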

The preprocessMode() function implements a conditional workflow that depends on the current sensor node mode. Three operational modes are supported:

According to SparkFun's ESP8266 Thing Hookup Guide, the XPD pin has to be connected to the DTR pin to enable the sleep capability.

The UML activity diagram represents the details of the workflow:

Two interesting points here:
1. As the Arduino Client for MQTT only supports Clean Sessions (see for example this note), command messages addressed to the sensor node can only be delivered while the node is connected to the local MQTT broker, i.e. the node never receives a command message that was sent while it was disconnected from the broker in the deep sleep state. As a workaround, before going to deep sleep in PWR_SAVE or SUSPENDED mode, the node waits for an incoming command message for a fixed interval of 5 seconds and then runs the PubSub client's loop() method to process incoming messages.
2. Here is the strange thing: I never managed to process an incoming request message when loop() was called only once. In that case the callback function was never called. I was able to fix it by inserting a second loop() call:


First integration test

I ran integration tests using the following setup:





I created several gifs to visualize the tests.
This gif reflects the following scenario:
1. The sensor node is configured to connect to the Mosquitto MQTT broker, which runs on the field IoT gateway based on the Raspberry Pi;
2. mqtt-spy utility is configured to connect to the same MQTT broker;
3. Four MQTT subscriptions are created in mqtt-spy for the following topics:
    env/5ccf7f2f1dc8/status
    env/5ccf7f2f1dc8/temperature
    env/5ccf7f2f1dc8/humidity
    env/5ccf7f2f1dc8/pressure
4. The sensor node runs in PWR_SAVE mode and is initially in the deep sleep state;
5. The payload of the message in the env/5ccf7f2f1dc8/status topic contains the 'off' string, which corresponds to the payload of the retained message of the sensor node;
6. When the power save period ends, the sensor node establishes a wireless MQTT connection with the broker;
7. The node publishes a new message to env/5ccf7f2f1dc8/status with a payload containing the 'on' string, indicating that the status of the node has changed.
8. The node reads new environmental data from the BME280 and publishes three new messages to the env/5ccf7f2f1dc8/temperature, env/5ccf7f2f1dc8/humidity and env/5ccf7f2f1dc8/pressure topics (see Preprocess Mode Activity Diagram).
9. The node publishes a new message to env/5ccf7f2f1dc8/status with a payload containing the 'sleeping' string, indicating that the status of the node has changed.
10. The node goes into the deep sleep state.
11. The broker closes the network connection because it doesn't receive any packets from the sensor node within one and a half times the Keep Alive time period.
12. The broker publishes the last will message of the node to env/5ccf7f2f1dc8/status with a payload containing the 'off' string.
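The delay between the node going to sleep and the 'off' status appearing follows from the Keep Alive rule in the last two steps. A small sketch of that arithmetic is given below; the function name is hypothetical, and the 15 second value in the usage note is only an example, not necessarily the node's setting.

```cpp
#include <cassert>
#include <chrono>

// Time after the last received packet at which the broker gives up on the
// client: one and a half times the Keep Alive period.
std::chrono::milliseconds brokerTimeout(std::chrono::seconds keepAlive) {
    assert(keepAlive.count() > 0);  // a Keep Alive of 0 turns the mechanism off
    return std::chrono::milliseconds(keepAlive) * 3 / 2;
}
```

With a Keep Alive of 15 seconds, for instance, the last will message would be published roughly 22.5 seconds after the node's final packet.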


Commands and command messages

The sensor node supports request-reply message exchange for command messages. The current implementation supports three commands:
  • getsensid - returns the BME280 sensor identifier in hex string format;
  • getbattery - returns the supply voltage (VCC) value;
  • setmode - sets the operational mode of the node.
A very lightweight application-level protocol was created for command messages. The payload of a command message is formatted as a CSV string.

Request command message format:
<msg_id>,<command_name>,<param 1>,<param 2>,...,<param n>
where
  msg_id - unique identifier of the message (required),
  command_name - predefined command name (required),
  param 1, param 2, param n - the command input parameters (optional). 

Reply command message format:
<correl_id>,<status>,<payload>
where
  correl_id - correlation identifier, must match the msg_id value of the corresponding request message (required),
  status - command execution status. Only one value, 200, is supported in the current implementation (required),
  payload - command output payload (optional). 

Examples:

  Request command message payload: 0001,getbattery
  Reply command message payload: 0001,200,3153

  Request command message payload: 0002,setmode,1
  Reply command message payload: 0002,200
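The two message formats can be parsed and produced with a few lines of code. The following is a minimal sketch written for a desktop compiler, not the code running on the node; all type and function names are hypothetical, and fields are assumed not to contain commas.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Request {
    std::string msgId;                // required
    std::string command;              // required
    std::vector<std::string> params;  // optional
};

// Splits a CSV string on commas.
std::vector<std::string> splitCsv(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}

// Parses "<msg_id>,<command_name>,<param 1>,...". Returns false when a
// required field is missing.
bool parseRequest(const std::string& payload, Request& out) {
    const std::vector<std::string> f = splitCsv(payload);
    if (f.size() < 2 || f[0].empty() || f[1].empty()) {
        return false;
    }
    out.msgId = f[0];
    out.command = f[1];
    out.params.assign(f.begin() + 2, f.end());
    return true;
}

// Builds "<correl_id>,<status>[,<payload>]".
std::string formatReply(const std::string& correlId, int status,
                        const std::string& payload = "") {
    std::string reply = correlId + "," + std::to_string(status);
    if (!payload.empty()) {
        reply += "," + payload;
    }
    return reply;
}
```

Parsing "0002,setmode,1" yields msgId "0002", command "setmode" and one parameter "1"; formatReply("0001", 200, "3153") produces "0001,200,3153".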

Command messages are handled by the message callback function of the MQTT PubSub client. The callback function dispatches the command from the message to the corresponding command handler function.
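One common way to structure such a dispatcher is a lookup table from command name to handler. The sketch below is an illustration under that assumption and is not taken from the node's sketch: the class, the handler bodies and the placeholder return values ("60", "3153") are hypothetical stand-ins for the real sensor and voltage reads.

```cpp
#include <cstdlib>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string> Params;
typedef std::function<std::string(const Params&)> Handler;

class CommandDispatcher {
public:
    // Registers a handler under a command name.
    void on(const std::string& name, Handler handler) {
        handlers_[name] = std::move(handler);
    }

    // Runs the handler for the given command. Returns false when the
    // command is unknown; otherwise output receives the reply payload.
    bool dispatch(const std::string& name, const Params& params,
                  std::string& output) const {
        std::map<std::string, Handler>::const_iterator it = handlers_.find(name);
        if (it == handlers_.end()) {
            return false;
        }
        output = it->second(params);
        return true;
    }

private:
    std::map<std::string, Handler> handlers_;
};

// Wires up the three commands with placeholder handlers. The mode variable
// must outlive the returned dispatcher.
CommandDispatcher makeDispatcher(int& mode) {
    CommandDispatcher d;
    d.on("getsensid", [](const Params&) { return std::string("60"); });
    d.on("getbattery", [](const Params&) { return std::string("3153"); });
    d.on("setmode", [&mode](const Params& p) {
        if (!p.empty()) {
            mode = std::atoi(p[0].c_str());  // no validation in this sketch
        }
        return std::string();  // setmode has no output payload
    });
    return d;
}
```

A table like this keeps the callback short: it only parses the payload, looks up the handler and publishes the reply, and adding a command means registering one more entry.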



You can find sources of the sketch in the GitHub repo.