Introduction
When sending data from Internet of Things (IoT) devices to a data lake, you may need to enrich the device data payload with additional metadata in the cloud for further data processing and visualization. There are several reasons this data might not exist in the device payload, such as minimizing the device payload in limited bandwidth environments or modifying it with business inputs in the cloud. For example, a machine on the factory floor might be assigned to different operators during the day. This variable business data would be stored in a database. In your data lake, you may need this information to be stored together with the payload.
In this blog post, you will learn how to ingest enriched IoT data to a data lake in near real-time.
Prerequisites
- An AWS account
- AWS Command Line Interface (AWS CLI). See AWS CLI quick setup for configuration.
Use case definition
Let’s assume that in your logistics company, you have containers equipped with sensor-enabled IoT devices. When the container is loaded into a ship, the container ID is associated with the ship ID. You need to store the IoT device payload with the ship ID in your data lake.
In such a use case, the sensor payload comes from the IoT device attached to the container. However, the associated ship ID is only stored in the metadata store. Therefore, the payload must be enriched with the ship ID before putting it into the data lake.
Solution architecture
In the architecture diagram:
- The IoT devices stream payloads to the AWS IoT Core message broker to a specific MQTT topic device/data/DEVICE_ID. The AWS IoT Core message broker allows devices to publish and subscribe to messages by using supported protocols.
- The AWS IoT rule is triggered when there is a payload in its topic. It is configured with an Amazon Kinesis Data Firehose action in this use case. You can use AWS IoT rules to interact with AWS services by calling them when there is a message in a specific MQTT topic, or directly by using the Basic Ingest feature.
- Amazon Kinesis Data Firehose buffers the device payloads before delivering them to the data store based on size or time, whichever happens first. Kinesis Data Firehose delivers real-time streaming data to destinations for storing or processing.
- Once the buffer hits the size or time threshold, Kinesis Data Firehose calls an AWS Lambda function to enrich the device payloads in batches with the metadata retrieved from an Amazon DynamoDB table. AWS Lambda is a serverless compute service that runs your code for any type of application. Amazon DynamoDB is a fully managed NoSQL database that provides fast performance.
- The enriched payloads are returned to Kinesis Data Firehose to deliver to the destination.
- The enriched payloads are put into an Amazon Simple Storage Service (Amazon S3) bucket as a destination. Amazon S3 is an object storage service which stores any amount of data for a range of use cases.
AWS CloudFormation template
Download the AWS CloudFormation template from the code repository.
The AWS CloudFormation template deploys all the necessary resources to run this example use case. Let’s have a closer look at the AWS IoT rule, Kinesis Data Firehose, and AWS Lambda function resources.
AWS IoT rule resource
IoTToFirehoseRule:
  Type: AWS::IoT::TopicRule
  Properties:
    TopicRulePayload:
      Actions:
        -
          Firehose:
            RoleArn: !GetAtt IoTFirehosePutRecordRole.Arn
            DeliveryStreamName: !Ref FirehoseDeliveryStream
            Separator: "\n"
      AwsIotSqlVersion: '2016-03-23'
      Description: This rule logs IoT payloads to S3 Bucket by aggregating in Kinesis Firehose.
      RuleDisabled: false
      Sql: !Ref IotKinesisRuleSQL
The AWS IoT rule takes a SQL parameter which defines the IoT topic that triggers the rule and the data to extract from the payload.
- In the example, the SQL parameter is set to SELECT *, topic(3) as containerId FROM 'device/data/+' by default. SELECT * means the whole payload is taken as it is, and containerId is generated from the third element of the MQTT topic and included in the payload, as illustrated in the sketch after this list.
- FROM 'device/data/+' describes the IoT topic that will trigger the AWS IoT rule. + is a wildcard character for MQTT topics, and the IoT devices will publish data payloads to the device/data/DEVICE_ID topic to trigger this rule.
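To make the topic extraction concrete, here is a minimal Python sketch that emulates what this rule SQL produces for a sample message. The simulate_rule helper is purely illustrative; the actual evaluation happens inside the AWS IoT Rules Engine.
# Illustrative sketch only: emulates the effect of
# SELECT *, topic(3) as containerId FROM 'device/data/+'
def simulate_rule(topic, payload):
    # topic(3) in AWS IoT SQL refers to the third topic segment
    enriched = dict(payload)
    enriched["containerId"] = topic.split("/")[2]
    return enriched

print(simulate_rule("device/data/cont1", {"temperature": 20, "humidity": 80}))
# {'temperature': 20, 'humidity': 80, 'containerId': 'cont1'}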
The AWS IoT rule also defines actions. In the example, you can see a Kinesis Data Firehose action which defines the target Kinesis Data Firehose delivery stream and the AWS Identity and Access Management (IAM) role needed to put records into this delivery stream. A separator can be chosen to separate each record; in the given example it is a newline character.
Kinesis Data Firehose delivery stream resource
FirehoseDeliveryStream:
  Type: AWS::KinesisFirehose::DeliveryStream
  Properties:
    ExtendedS3DestinationConfiguration:
      BucketARN: !GetAtt IoTLogBucket.Arn
      BufferingHints:
        IntervalInSeconds: 60
        SizeInMBs: 1
      Prefix: device-data/
      RoleARN: !GetAtt FirehosePutS3Role.Arn
      ProcessingConfiguration:
        Enabled: true
        Processors:
          - Type: Lambda
            Parameters:
              - ParameterName: LambdaArn
                ParameterValue: !Sub '${FirehoseTransformLambda.Arn}:$LATEST'
              - ParameterName: RoleArn
                ParameterValue: !GetAtt FirehoseLambdaInvokeRole.Arn
A Kinesis Data Firehose delivery stream must define a destination to put the stream into. It supports different types of destinations. You can find the available destination types and their usage in this documentation. In this example, you are going to use Amazon S3 as the destination.
The example delivery stream resource defines the following properties:
- BucketARN: the destination bucket which will store the aggregated data. The destination bucket is created by the CloudFormation stack.
- BufferingHints: the size and time thresholds for data buffering. In this example, they are set to 1MB and 60 seconds respectively to see the results faster. They can be adjusted according to business needs. Keeping these thresholds low will cause the Lambda function to be invoked more frequently. If the thresholds are high, the data will be ingested to the data store less frequently, so it will take longer to see the latest data in the data store. (See the sketch after this list for a way to inspect these values on the deployed stream.)
- Prefix: the created objects will be put under this prefix. Kinesis Data Firehose partitions the data based on the timestamp by default. In this example, the objects will be put under the device-data/YYYY/MM/dd/HH folder. Kinesis Data Firehose has advanced features for data partitioning, such as dynamic partitioning. The partitioning of the data is important when querying the data lake. For example, if you need to query the data on a per-device basis by using Amazon Athena, scanning only the partition of the relevant device ID will significantly reduce the scan time and the cost. You can find details on partitioning in this documentation.
- RoleARN: this is the IAM role that gives PutObject permission to Kinesis Data Firehose to be able to put aggregated data into the Amazon S3 bucket.
- ProcessingConfiguration: as described in the use case, a transform Lambda function will enrich the IoT data with the metadata. Processing Configuration defines the processor, which is a Lambda function in the example. For each batch of data, Kinesis Data Firehose will call this Lambda function to transform the data. You can read more about data processing in this documentation.
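If you want to verify these settings on the deployed stream, a short boto3 sketch like the following can help. The delivery stream name here is an assumed placeholder; look up the actual name in the CloudFormation stack resources.
import boto3

firehose = boto3.client('firehose')

# Assumed placeholder: replace with the deployed delivery stream name
DELIVERY_STREAM_NAME = 'IoTKinesisDataPath-FirehoseDeliveryStream-SAMPLE'

desc = firehose.describe_delivery_stream(DeliveryStreamName=DELIVERY_STREAM_NAME)
s3_config = desc['DeliveryStreamDescription']['Destinations'][0]['ExtendedS3DestinationDescription']

# Print the effective buffering thresholds and object prefix
print(s3_config['BufferingHints'])  # e.g. {'SizeInMBs': 1, 'IntervalInSeconds': 60}
print(s3_config['Prefix'])          # device-data/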
Transformation Lambda function
As you can see in the following Python code, Kinesis Data Firehose sends a batch of records where each record is a payload from the IoT devices. First, the base64-encoded payload data is decoded. Then, the corresponding ship ID is retrieved from the DynamoDB table based on the container ID. The payload is enriched with the ship ID and encoded back to base64. Finally, the record list is returned to Kinesis Data Firehose.
Once Kinesis Data Firehose receives the records, it puts them as an aggregated file into the Amazon S3 bucket.
import os
import json
import base64

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['METADATA_TABLE'])

def function_handler(event, context):
    records = []
    for record in event["records"]:
        # The data field of the record is a base64-encoded JSON string
        json_data = json.loads(base64.b64decode(record["data"]))
        container_id = json_data["containerId"]
        # Get the corresponding shipId from the DynamoDB table
        res = table.get_item(Key={'containerId': container_id})
        ddb_item = res["Item"]
        ship_id = ddb_item["shipId"]
        # Append shipId to the actual record data
        enriched_data = json_data
        enriched_data["shipId"] = ship_id
        # Encode the enriched record back to base64
        json_string = json.dumps(enriched_data).encode("ascii")
        b64_encoded_data = base64.b64encode(json_string).decode("ascii")
        # Create a record with enriched data and return it back to Firehose
        rec = {'recordId': record["recordId"], 'result': 'Ok', 'data': b64_encoded_data}
        records.append(rec)
    return {'records': records}
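If you want to exercise the function locally, you can construct a Firehose-style test event as in the following sketch. This is a hedged example: it assumes AWS credentials, the METADATA_TABLE environment variable, and the cont1 metadata item from the Testing section are in place, and the sample recordId is an arbitrary placeholder.
import base64
import json

# Build a sample event that mimics a Firehose transformation batch
payload = {"temperature": 20, "humidity": 80, "containerId": "cont1"}
event = {
    "records": [
        {
            "recordId": "49546986683135544286507457936321625675700192471156785154",
            "data": base64.b64encode(json.dumps(payload).encode("ascii")).decode("ascii"),
        }
    ]
}

response = function_handler(event, None)
# Decode the transformed record to verify that shipId was appended
print(json.loads(base64.b64decode(response["records"][0]["data"])))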
Deployment
Run the following command in a terminal to deploy the stack.
aws cloudformation deploy --stack-name IoTKinesisDataPath --template-file IoTKinesisDataPath.yml --parameter-overrides IotKinesisRuleSQL="SELECT *, topic(3) as containerId FROM 'device/data/+'" --capabilities CAPABILITY_NAMED_IAM
After the deployment is complete, run the following command in a terminal to see the output of the deployment.
aws cloudformation describe-stacks --stack-name IoTKinesisDataPath
Note the IoTLogS3BucketName and MetadataTableName output parameters.
Testing
After the deployment is complete, the first thing you need to do is create a metadata item for data enrichment. Run the following command to create an item in the DynamoDB table. It will create an item with cont1 as containerId and ship1 as shipId. Replace the IoTKinesisDataPath-MetadataTable-SAMPLE parameter with the DynamoDB table output parameter from the CloudFormation stack deployment.
aws dynamodb put-item --table-name IoTKinesisDataPath-MetadataTable-SAMPLE --item '{"containerId":{"S":"cont1"},"shipId":{"S":"ship1"}}'
In a real-life scenario, the devices publish the payloads to a specific MQTT topic. In this example, instead of creating IoT devices, you will use the AWS CLI to publish payloads to MQTT topics. Run the following command in a terminal to publish a sample data payload to AWS IoT Core. Pay attention to the payload field of the command; the only data provided by the device is the dynamic data.
aws iot-data publish --topic "device/data/cont1" --payload '{"temperature":20,"humidity":80,"latitude":0,"longitude":0}' --cli-binary-format raw-in-base64-out
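If you prefer scripting the test in Python over the AWS CLI, a minimal boto3 sketch like the following publishes the same kind of payload. It assumes your default credentials and region are configured; extend the tuple with more container IDs once matching metadata items exist in DynamoDB.
import json
import boto3

iot_data = boto3.client('iot-data')

# Publish a sample payload per container; only the dynamic data is sent
for container_id in ('cont1',):
    iot_data.publish(
        topic=f'device/data/{container_id}',
        qos=0,
        payload=json.dumps({"temperature": 20, "humidity": 80,
                            "latitude": 0, "longitude": 0}),
    )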
Now, navigate to Amazon S3 in the AWS Management Console and select the bucket that has been created with the CloudFormation stack. You should see the device-data folder in this bucket. It may take up to 1 minute for the data to appear due to the buffering configuration that is set for the Firehose delivery stream. If you navigate into the device-data/YYYY/MM/dd/HH folder, you will see that an object has been created. Go ahead and open this file. You will see that the content of the file is the data payload with the enriched shipId field.
{"temperature": 20, "humidity": 80, "latitude": 0, "longitude": 0, "containerId": "cont1", "shipId": "ship1"}
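Alternatively, you can fetch the newest delivered object programmatically, as in the following sketch. The bucket name here is an assumed placeholder and should be replaced with the IoTLogS3BucketName stack output.
import boto3

s3 = boto3.client('s3')

# Placeholder: replace with the IoTLogS3BucketName stack output
BUCKET_NAME = 'iotkinesisdatapath-iotlogbucket-SAMPLE'

resp = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix='device-data/')
objects = resp.get('Contents', [])
if objects:
    # Read the most recently delivered aggregated file
    latest = max(objects, key=lambda o: o['LastModified'])
    body = s3.get_object(Bucket=BUCKET_NAME, Key=latest['Key'])['Body'].read()
    print(latest['Key'])
    print(body.decode('utf-8'))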
Troubleshooting
In case of failure in the system, the following resources can be useful for analyzing the source of the problem.
To monitor the AWS IoT Core Rules Engine, you need to enable AWS IoT Core logging. This will give detailed information about the events happening in AWS IoT Core.
AWS Lambda can be monitored by using Amazon CloudWatch. The example CloudFormation template has the necessary permissions to create a log group for the Lambda function logging.
In case of failure in the AWS IoT Rules Engine action, the transform Lambda function, or the Amazon S3 bucket, Kinesis Data Firehose will create a processing-failed folder under the device-data prefix. The details of the failure can be read as JSON objects. You can find more information in this documentation.
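As a quick check for failed batches, you can list that prefix programmatically, as in the sketch below. The bucket name is the same assumed placeholder as before, and the exact error prefix may vary depending on the delivery stream's error output configuration.
import boto3

s3 = boto3.client('s3')
BUCKET_NAME = 'iotkinesisdatapath-iotlogbucket-SAMPLE'  # assumed placeholder

# Firehose writes failure details under this prefix when processing fails
resp = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix='device-data/processing-failed/')
for obj in resp.get('Contents', []):
    print(obj['Key'])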
Clean up
To clean up the resources that have been created, first empty the Amazon S3 bucket. Run the following command, changing the bucket-name parameter to the name of the bucket deployed by the CloudFormation stack. Important: this command will delete all the data inside the bucket irreversibly.
aws s3 rm s3://bucket-name --recursive
Then, you can delete the CloudFormation stack by running the following command in a terminal.
aws cloudformation delete-stack --stack-name IoTKinesisDataPath
Conclusion
In this blog, you have learned a common pattern of enriching IoT payloads with metadata and storing them affordably in a data lake in near real-time by using the AWS IoT Rules Engine and an Amazon Kinesis Data Firehose delivery stream. The proposed solution and the CloudFormation template can be used as a baseline for a scalable IoT data ingestion architecture.
You can read further about the AWS IoT Core Rules Engine and Amazon Kinesis Data Firehose. Best practices for using MQTT topics in the AWS IoT Rules Engine will guide you in defining your topic structures.