
Build streaming data pipelines with Amazon MSK Serverless and IAM authentication


Currently, MSK Serverless only directly supports IAM authentication using Java. This example shows how to use this mechanism. Additionally, it provides a pattern for creating a proxy that can easily be integrated into solutions built in languages other than Java.

The growing trend in today's tech landscape is the use of streaming data and event-driven architectures. They are being used in numerous ways, including tracking website traffic, monitoring industrial Internet of Things (IoT) devices, analyzing video game player behavior, and managing data for cutting-edge analytics systems.

Apache Kafka, a top-tier open-source tool, is making waves in this space. It is widely adopted by numerous users for building fast and efficient data pipelines, analyzing streaming data, merging data from different sources, and supporting essential applications.

Amazon's serverless Apache Kafka offering, Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless, is attracting a lot of interest. It is appreciated for its user-friendly approach, its ability to scale automatically, and its cost savings over other Kafka solutions. However, a hurdle encountered by many users is that MSK Serverless requires AWS Identity and Access Management (IAM) access control. At the time of writing, the Amazon MSK library for IAM is available only for Kafka libraries in Java, creating a challenge for users of other programming languages. In this post, we address this issue and show how you can use Amazon API Gateway and AWS Lambda to work around this obstacle.

SASL/SCRAM authentication vs. IAM authentication

Compared to traditional authentication methods like Salted Challenge Response Authentication Mechanism (SCRAM), the IAM integration in MSK Serverless provides several benefits. Before we delve into those, it is important to understand what SASL/SCRAM authentication is. Essentially, it is a traditional method used to confirm a user's identity before granting access to a system. This process requires users or clients to provide a user name and password, which the system then cross-checks against stored credentials (for example, in AWS Secrets Manager) to decide whether access should be granted.

Compared to this approach, IAM simplifies permission management across AWS environments, enables the creation and strict enforcement of detailed permissions and policies, and uses temporary credentials rather than the typical user name and password authentication. Another benefit of using IAM is that you can use it for both authentication and authorization. If you use SASL/SCRAM, you additionally have to manage ACLs through a separate mechanism. With IAM, you can use the IAM policy attached to an IAM principal to define fine-grained access control for that principal. All of these improvements make the IAM integration a more efficient and secure solution for most use cases.

However, for applications not built in Java, using MSK Serverless becomes challenging. The standard SASL/SCRAM authentication isn't available, and non-Java Kafka libraries have no way to use IAM access control. This calls for an alternative approach to connect to MSK Serverless clusters.

But there is an alternative pattern. Without having to rewrite your existing application in Java, you can employ API Gateway and Lambda as a proxy in front of a cluster. They can handle API requests and relay them to Kafka topics directly. API Gateway takes in producer requests and channels them to a Lambda function, written in Java using the Amazon MSK IAM library, which then communicates with the MSK Serverless Kafka topic using IAM access control. After the cluster receives the message, it can be processed further within the MSK Serverless setup.

You can also use Lambda on the consumer side of MSK Serverless topics, bypassing the Java requirement there as well. You do this by setting Amazon MSK as an event source for a Lambda function. When the Lambda function is triggered, the data sent to the function includes an array of records from the Kafka topic, so no direct contact with Amazon MSK is needed.
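To illustrate, the following is a minimal sketch of what such a consumer handler can look like in Java, assuming the aws-lambda-java-events library for the KafkaEvent type and an MSK event source mapping configured on the function. The class name and log output are illustrative only; the demo repository ships its own consumer implementation.

// Minimal sketch of an MSK-triggered Lambda consumer (illustrative, not the demo's exact code)
package software.amazon.samples.kafka.lambda;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KafkaEvent;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SketchKafkaConsumer implements RequestHandler<KafkaEvent, Void> {

    @Override
    public Void handleRequest(KafkaEvent event, Context context) {
        // Records arrive batched per topic-partition; each record value is base64 encoded
        event.getRecords().forEach((topicPartition, records) ->
                records.forEach(record -> {
                    String payload = new String(
                            Base64.getDecoder().decode(record.getValue()),
                            StandardCharsets.UTF_8);
                    context.getLogger().log("Received message: " + payload);
                }));
        return null;
    }
}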

Solution overview

This example walks you through how to build a serverless real-time stream producer application using API Gateway and Lambda.

For testing, this post includes a sample AWS Cloud Development Kit (AWS CDK) application. This creates a demo environment, including an MSK Serverless cluster, three Lambda functions, and an API Gateway that consumes the messages from the Kafka topic.

The following diagram shows the architecture of the resulting application, including its data flows.

The data flow contains the following steps:

  1. The infrastructure is defined in an AWS CDK application. By running this application, a set of AWS CloudFormation templates is created.
  2. AWS CloudFormation creates all infrastructure components, including a Lambda function that runs during the deployment process to create a topic in the MSK Serverless cluster and to retrieve the authentication endpoint needed for the producer Lambda function. On destruction of the CloudFormation stack, the same Lambda function is triggered again to delete the topic from the cluster.
  3. An external application calls an API Gateway endpoint.
  4. API Gateway forwards the request to a Lambda function.
  5. The Lambda function acts as a Kafka producer and pushes the message to a Kafka topic using IAM authentication.
  6. The Lambda event source mapping mechanism triggers the Lambda consumer function and forwards the message to it.
  7. The Lambda consumer function logs the data to Amazon CloudWatch.

Note that we don't need to worry about Availability Zones. MSK Serverless automatically replicates the data across multiple Availability Zones to ensure high availability of the data.

The demo additionally shows how to use Lambda Powertools for Java to streamline logging and tracing, and how to use the IAM authenticator for the simplified authentication process outlined in the introduction.

The following sections take you through the steps to deploy, test, and observe the example application.

Prerequisites

The example has the following prerequisites:

  • An AWS account. If you haven't signed up yet, create one first.
  • The required software installed on your development machine, or an AWS Cloud9 environment, which comes with all requirements preinstalled.
  • Appropriate AWS credentials for interacting with resources in your AWS account.

Deploy the solution

Complete the following steps to deploy the solution:

  1. Clone the project GitHub repository and change the directory to the subfolder serverless-kafka-iac:
git clone https://github.com/aws-samples/apigateway-lambda-msk-serverless-integration
cd apigateway-lambda-msk-serverless-integration/serverless-kafka-iac

  2. Configure environment variables:
export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query 'Account' --output text)
export CDK_DEFAULT_REGION=$(aws configure get region)

  3. Prepare the virtual Python environment:
python3 -m venv .venv

source .venv/bin/activate

pip3 install -r requirements.txt

  4. Bootstrap your account for AWS CDK usage:
cdk bootstrap aws://$CDK_DEFAULT_ACCOUNT/$CDK_DEFAULT_REGION

  5. Run cdk synth to build the code and check the requirements (make sure the Docker daemon is running on your machine):
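cdk synth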
  6. Run cdk deploy to deploy the code to your AWS account:
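cdk deploy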

Test the solution

To test the solution, we generate messages for the Kafka topic by sending calls through the API Gateway from our development machine or AWS Cloud9 environment. We then go to the CloudWatch console to observe incoming messages in the logs of the Lambda consumer function.

  1. Open a terminal on your development machine to test the API with the Python script provided under /serverless_kafka_iac/test_api.py.
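You can also exercise the endpoint manually, for example with curl. The URL, stage, and payload below are placeholders; substitute the API Gateway endpoint from your own CDK deployment outputs:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"message": "hello kafka"}' \
  https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/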

  2. On the Lambda console, open the Lambda function named ServerlessKafkaConsumer.

  3. On the Monitor tab, choose View CloudWatch logs to access the logs of the Lambda function.

  4. Choose the most recent log stream to access the log entries of the last run.

You can review the log entries for the received Kafka messages in the log of the Lambda function.

Trace a request

All components integrate with AWS X-Ray. With AWS X-Ray, you can trace the entire application, which is useful for identifying bottlenecks when load testing. You can also trace method runs at the Java method level.

Lambda Powertools for Java lets you shortcut this process by adding the @Tracing annotation to a method to see traces at the method level in X-Ray.

To trace a request end to end, complete the following steps:

  1. On the CloudWatch console, choose Service map in the navigation pane.
  2. Select a component to investigate (for example, the Lambda function where you deployed the Kafka producer).
  3. Choose View traces.
  4. Choose a single Lambda method invocation and investigate further at the Java method level.

Implement a Kafka producer in Lambda

Kafka natively supports Java. To stay open, cloud native, and free of third-party dependencies, the producer is written in that language. Currently, the IAM authenticator is only available for Java. In this example, the Lambda handler receives a message from an API Gateway source and pushes this message to an MSK topic called messages.

Typically, Kafka producers are long-lived, and pushing a message to a Kafka topic is an asynchronous process. Because Lambda is ephemeral, you must make sure a submitted message is fully flushed before the Lambda function ends by calling producer.flush():

// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: MIT-0
package software.amazon.samples.kafka.lambda;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import software.amazon.lambda.powertools.logging.Logging;
import software.amazon.lambda.powertools.tracing.Tracing;

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Future;

// This class is part of the AWS samples package and specifically deals with Kafka integration in a Lambda function.
// It serves as a simple API Gateway to Kafka proxy, accepting requests and forwarding them to a Kafka topic.
public class SimpleApiGatewayKafkaProxy implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

    // Specifies the name of the Kafka topic where the messages will be sent
    public static final String TOPIC_NAME = "messages";

    // Logger instance for logging events of this class
    private static final Logger log = LogManager.getLogger(SimpleApiGatewayKafkaProxy.class);

    // Factory to create properties for the Kafka producer
    public KafkaProducerPropertiesFactory kafkaProducerProperties = new KafkaProducerPropertiesFactoryImpl();

    // Instance of KafkaProducer, reused across warm invocations
    private KafkaProducer<String, String> producer;

    // Overridden method from the RequestHandler interface to handle incoming API Gateway proxy events
    @Override
    @Tracing
    @Logging(logEvent = true)
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent input, Context context) {

        // Creating a response object to send back
        APIGatewayProxyResponseEvent response = createEmptyResponse();
        try {
            // Extracting the message from the request body
            String message = getMessageBody(input);

            // Create a Kafka producer
            KafkaProducer<String, String> producer = createProducer();

            // Creating a record with topic name, request ID as key and message as value
            ProducerRecord<String, String> record = new ProducerRecord<String, String>(TOPIC_NAME, context.getAwsRequestId(), message);

            // Sending the record to the Kafka topic and getting the metadata of the record
            Future<RecordMetadata> send = producer.send(record);
            producer.flush();

            // Retrieve metadata about the sent record
            RecordMetadata metadata = send.get();

            // Logging the partition where the message was sent
            log.info(String.format("Message was sent to partition %s", metadata.partition()));

            // If the message was successfully sent, return a 200 status code
            return response.withStatusCode(200).withBody("Message successfully pushed to kafka");
        } catch (Exception e) {
            // In case of exception, log the error message and return a 500 status code
            log.error(e.getMessage(), e);
            return response.withBody(e.getMessage()).withStatusCode(500);
        }
    }

    // Creates a Kafka producer if it doesn't already exist
    @Tracing
    private KafkaProducer<String, String> createProducer() {
        if (producer == null) {
            log.info("Connecting to kafka cluster");
            producer = new KafkaProducer<String, String>(kafkaProducerProperties.getProducerProperties());
        }
        return producer;
    }

    // Extracts the message from the request body. If it's base64 encoded, it's decoded first.
    private String getMessageBody(APIGatewayProxyRequestEvent input) {
        String body = input.getBody();

        if (input.getIsBase64Encoded()) {
            body = decode(body);
        }
        return body;
    }

    // Decodes a base64-encoded request body (helper added so the listing is self-contained)
    private static String decode(String body) {
        return new String(Base64.getDecoder().decode(body), StandardCharsets.UTF_8);
    }

    // Creates an empty API Gateway proxy response event with predefined headers.
    private APIGatewayProxyResponseEvent createEmptyResponse() {
        Map<String, String> headers = new HashMap<>();
        headers.put("Content-Type", "application/json");
        headers.put("X-Custom-Header", "application/json");
        APIGatewayProxyResponseEvent response = new APIGatewayProxyResponseEvent().withHeaders(headers);
        return response;
    }
}

Connect to Amazon MSK using IAM authentication

This post uses IAM authentication to connect to the respective Kafka cluster. For information about how to configure the producer for connectivity, refer to IAM access control.

Because you configure cluster access via IAM, grant Connect and WriteData permissions to the producer so that it can push messages to Kafka:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect"
            ],
            "Resource": "arn:aws:kafka:region:account-id:cluster/cluster-name/cluster-uuid"
        }
    ]
}

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect",
                "kafka-cluster:DescribeTopic",
                "kafka-cluster:WriteData"
            ],
            "Resource": "arn:aws:kafka:region:account-id:topic/cluster-name/cluster-uuid/topic-name"
        }
    ]
}

This shows the Kafka excerpt of the IAM policy, which must be applied to the Kafka producer. When using IAM authentication, be aware of the current limits of IAM Kafka authentication, which affect the number of concurrent connections and IAM requests for a producer. Refer to Amazon MSK quota and follow the recommendation for authentication backoff in the producer client:

        Map<String, String> configuration = Map.of(
                "key.serializer", "org.apache.kafka.common.serialization.StringSerializer",
                "value.serializer", "org.apache.kafka.common.serialization.StringSerializer",
                "bootstrap.servers", getBootstrapServer(),
                "security.protocol", "SASL_SSL",
                "sasl.mechanism", "AWS_MSK_IAM",
                "sasl.jaas.config", "software.amazon.msk.auth.iam.IAMLoginModule required;",
                "sasl.client.callback.handler.class", "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
                "connections.max.idle.ms", "60",
                "reconnect.backoff.ms", "1000"
        );
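For completeness, the following is a minimal sketch of the KafkaProducerPropertiesFactory referenced by the handler, shown returning java.util.Properties because that is what the KafkaProducer constructor accepts. The interface shape and the environment variable used for the bootstrap endpoint are assumptions for illustration, not necessarily the repository's exact implementation.

package software.amazon.samples.kafka.lambda;

import java.util.Properties;

// Assumed single-method factory interface used by the handler
interface KafkaProducerPropertiesFactory {
    Properties getProducerProperties();
}

// Illustrative implementation: wraps the IAM/SASL_SSL settings shown above into Properties
class KafkaProducerPropertiesFactoryImpl implements KafkaProducerPropertiesFactory {

    // Assumption: the MSK Serverless bootstrap endpoint is passed in via an environment variable
    private String getBootstrapServer() {
        return System.getenv("BOOTSTRAP_SERVER");
    }

    @Override
    public Properties getProducerProperties() {
        Properties props = new Properties();
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("bootstrap.servers", getBootstrapServer());
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "AWS_MSK_IAM");
        props.put("sasl.jaas.config", "software.amazon.msk.auth.iam.IAMLoginModule required;");
        props.put("sasl.client.callback.handler.class", "software.amazon.msk.auth.iam.IAMClientCallbackHandler");
        props.put("connections.max.idle.ms", "60");
        props.put("reconnect.backoff.ms", "1000");
        return props;
    }
}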

Additional considerations

Each MSK Serverless cluster can handle 100 requests per second. To reduce IAM authentication requests from the Kafka producer, place it outside of the handler. For frequent calls, there is a chance that Lambda reuses the previously created class instance and only reruns the handler.

For bursting workloads with a high number of concurrent API Gateway requests, this can lead to dropped messages. Although this might be tolerable for some workloads, for others it is not.

In those cases, you can extend the architecture with a buffering technology like Amazon Simple Queue Service (Amazon SQS) or Amazon Kinesis Data Streams between API Gateway and Lambda.

To reduce latency, shorten cold start times for Java by changing the tiered compilation level to 1, as described in Optimizing AWS Lambda function performance for Java. Provisioned concurrency ensures that polling Lambda functions don't need to warm up before requests arrive.
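On Lambda, the tiered compilation setting is typically applied through the JAVA_TOOL_OPTIONS environment variable on the function, for example:

JAVA_TOOL_OPTIONS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1"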

Conclusion

In this post, we showed how to create a serverless integration Lambda function between API Gateway and MSK Serverless as a way to do IAM authentication when your producer is not written in Java. You also learned about the native integration of Lambda and Amazon MSK on the consumer side. Additionally, we showed how to deploy such an integration with the AWS CDK.

The general pattern is suitable for many use cases where you want to use IAM authentication but your producers or consumers are not written in Java, and you still want to take advantage of the benefits of MSK Serverless, like its ability to scale up and down with unpredictable or spiky workloads and its little to no operational overhead of running Apache Kafka.

You can also use MSK Serverless to reduce operational complexity by automating provisioning and the management of capacity needs, including the need to constantly monitor brokers and storage.

For more serverless learning resources, visit Serverless Land.

For more information on MSK Serverless, refer to the Amazon MSK Serverless documentation.


About the Authors

Philipp Klose is a Global Solutions Architect at AWS based in Munich. He works with enterprise FSI customers and helps them solve business problems by architecting serverless platforms. In his free time, Philipp spends time with his family and enjoys every geek hobby possible.

Daniel Wessendorf is a Global Solutions Architect at AWS based in Munich. He works with enterprise FSI customers and is primarily specialized in machine learning and data architectures. In his free time, he enjoys swimming, hiking, skiing, and spending quality time with his family.

Marvin Gersho is a Senior Solutions Architect at AWS based in New York City. He works with a wide range of startup customers. He previously worked for many years in engineering leadership and hands-on application development, and now focuses on helping customers architect secure and scalable workloads on AWS with a minimum of operational overhead. In his free time, Marvin enjoys cycling and strategy board games.

Nathan Lichtenstein is a Senior Solutions Architect at AWS based in New York City. Primarily working with startups, he ensures his customers build smart on AWS, delivering creative solutions to their complex technical challenges. Nathan has worked in cloud and network architecture in the media, financial services, and retail spaces. Outside of work, he can often be found at a Broadway theater.
