
Previously, data engineers used Kinesis Data Firehose to transfer data into blob storage (Amazon S3) and then loaded it into Snowflake using either Snowpipe or batch processing, which introduced latency into near-real-time data pipelines. Now, Amazon Kinesis Data Firehose (Firehose) offers direct integration with Snowflake Snowpipe Streaming, eliminating the need to stage data in an S3 bucket. With just a few clicks, users can set up a Firehose stream to deliver data directly to Snowflake. Through its integration with Snowpipe Streaming, Firehose sends records as soon as they are available, making data query-ready in Snowflake within seconds.

In this post, we will discuss how this integration transforms data ingestion workflows, explore practical use cases, and walk through a step-by-step guide to setting up your own Kinesis Firehose to Snowflake pipeline. By the end of this post, you’ll have a comprehensive understanding of how to leverage this integration to enhance your data architecture.

Technical:

Let’s delve into the process of setting up a pipeline using Kinesis:

Step 1: Generate the public and private key files required during the Kinesis setup.

openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8

openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

[Screenshot: key-pair generation]

Step 2: Create a table in Snowflake.

CREATE OR REPLACE TABLE DEMO_DB.PUBLIC.sales_data ( json_data VARIANT );

Step 3: Assign the public key to the Snowflake user and note its fingerprint (a SHA-256 hash).

Copy the key from the rsa_key.pub file generated in Step 1 (drop the BEGIN/END PUBLIC KEY lines and make sure the whole key sits on a single line with no line breaks), then run:

ALTER USER sachinsnowpro SET RSA_PUBLIC_KEY='<single-line public key>';

You can confirm the key was registered by running DESCRIBE USER sachinsnowpro and checking the RSA_PUBLIC_KEY_FP property, which holds the SHA-256 fingerprint.
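As an optional helper, a few lines of Python (using the cryptography package) can produce both the single-line key for the ALTER USER statement and the SHA-256 fingerprint Snowflake stores for the user. Treat this as a rough sketch that assumes the rsa_key.pub file from Step 1 sits in the working directory.

import base64, hashlib
from cryptography.hazmat.primitives import serialization

# Load the public key generated in Step 1.
with open("rsa_key.pub", "rb") as f:
    public_key = serialization.load_pem_public_key(f.read())

# Snowflake fingerprints the DER-encoded SubjectPublicKeyInfo form of the key.
der = public_key.public_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

# Single-line base64 body (what goes inside the quotes of ALTER USER ... RSA_PUBLIC_KEY).
print(base64.b64encode(der).decode())

# SHA-256 fingerprint, comparable to RSA_PUBLIC_KEY_FP from DESCRIBE USER.
print("SHA256:" + base64.b64encode(hashlib.sha256(der).digest()).decode())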

Step 4: Configure the Kinesis data stream and the Amazon Data Firehose (ADF) delivery stream for data streaming.

1. AWS Console -> Kinesis -> Create data stream.
[Screenshot: creating the Kinesis data stream]

2. Once the data stream is created, create a Firehose stream with the data stream as its source and Snowflake as its destination.

[Screenshots: Firehose stream creation, Firehose configuration, Snowflake destination settings, and target table]
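If you prefer scripting over console clicks, the same resources can be created with boto3. The sketch below is only illustrative: the stream names, role ARNs, bucket, and Snowflake account URL are placeholders, and the Snowflake destination field names should be double-checked against the current Firehose CreateDeliveryStream documentation.

import boto3

kinesis = boto3.client("kinesis")
firehose = boto3.client("firehose")

# Kinesis data stream that will act as the Firehose source.
kinesis.create_stream(
    StreamName="sales-data-stream",                       # placeholder name
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)

# Firehose stream reading from the data stream and delivering straight to
# Snowflake via Snowpipe Streaming. Verify field names against the current API docs.
firehose.create_delivery_stream(
    DeliveryStreamName="sales-data-to-snowflake",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/sales-data-stream",
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-source-role",
    },
    SnowflakeDestinationConfiguration={
        "AccountUrl": "https://<account_identifier>.snowflakecomputing.com",  # placeholder
        "User": "sachinsnowpro",
        "PrivateKey": "<contents of rsa_key.p8 without header/footer>",       # placeholder
        "Database": "DEMO_DB",
        "Schema": "PUBLIC",
        "Table": "SALES_DATA",
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-delivery-role",
        "S3BackupMode": "FailedDataOnly",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::111122223333:role/firehose-delivery-role",
            "BucketARN": "arn:aws:s3:::my-firehose-backup-bucket",            # placeholder
        },
    },
)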

At this point, the Kinesis and Firehose setup is complete. Next, we need to feed the data stream automatically rather than through manual intervention.

Step 5: Configure the Lambda function in AWS.

To feed the stream automatically, we will use an AWS Lambda function. Here’s the logic we aim to implement (a minimal handler sketch follows below):

  • When a file is uploaded to the S3 bucket, the Lambda function is triggered.
  • The Lambda function reads the source file from the S3 bucket and puts its records onto the Kinesis data stream.
  • From there, Firehose connects to Snowflake as the destination and loads the data directly into the target table.
[Screenshot: Lambda and Kinesis configuration]
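A minimal handler along these lines could look like the sketch below; the stream name is a placeholder, and the code assumes a standard S3 put-event notification payload with newline-delimited records in the uploaded file.

import boto3

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")

STREAM_NAME = "sales-data-stream"  # placeholder; use your data stream's name

def lambda_handler(event, context):
    for record in event["Records"]:                      # S3 put event(s)
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Read the uploaded file from S3.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # Push each line of the file onto the Kinesis data stream;
        # Firehose picks it up from there and delivers it to Snowflake.
        entries = [
            {"Data": line + b"\n", "PartitionKey": key}
            for line in body.splitlines() if line.strip()
        ]
        for i in range(0, len(entries), 500):            # PutRecords allows max 500 records per call
            kinesis.put_records(StreamName=STREAM_NAME, Records=entries[i:i + 500])

    return {"status": "ok"}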

Step 6: Attach the Lambda function to the S3 bucket as a trigger:

[Screenshot: attaching the Lambda trigger to the S3 bucket]
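The same trigger can also be wired up with boto3; the function name, account ID, and bucket below are placeholders.

import boto3

lambda_client = boto3.client("lambda")
s3 = boto3.client("s3")

# Allow the S3 bucket to invoke the Lambda function.
lambda_client.add_permission(
    FunctionName="s3-to-kinesis-loader",                       # placeholder
    StatementId="AllowS3Invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::my-sales-landing-bucket",           # placeholder
)

# Fire the function on every new object created in the bucket.
s3.put_bucket_notification_configuration(
    Bucket="my-sales-landing-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:s3-to-kinesis-loader",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)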

Step 7: Upload a file to the bucket. This triggers the Lambda function, which in turn pushes the records onto the Kinesis data stream.

[Screenshot: file upload to the S3 bucket]
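For a quick end-to-end test from a local machine, uploading a sample file with boto3 works just as well as the console (the bucket and file names are placeholders).

import boto3

s3 = boto3.client("s3")

# Drop a sample file into the landing bucket to trigger the pipeline.
s3.upload_file("sales_data.json", "my-sales-landing-bucket", "incoming/sales_data.json")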

Step 8: Verify the data in Snowflake:

[Screenshot: data loaded into Snowflake]
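Beyond the Snowflake UI, a quick scripted check with the snowflake-connector-python package and the key pair from Step 1 looks roughly like this; the account identifier is a placeholder, and the passphrase argument depends on whether rsa_key.p8 was encrypted in Step 1.

import snowflake.connector
from cryptography.hazmat.primitives import serialization

# Load the private key from Step 1; pass the passphrase as bytes if the key was encrypted.
with open("rsa_key.p8", "rb") as f:
    private_key = serialization.load_pem_private_key(f.read(), password=None)

conn = snowflake.connector.connect(
    account="<account_identifier>",          # placeholder
    user="sachinsnowpro",
    private_key=private_key.private_bytes(
        encoding=serialization.Encoding.DER,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    ),
    database="DEMO_DB",
    schema="PUBLIC",
)

# A handful of rows is enough to confirm the pipeline is landing data.
for row in conn.cursor().execute("SELECT json_data FROM DEMO_DB.PUBLIC.sales_data LIMIT 10"):
    print(row)

conn.close()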

Incorporating this setup into your data architecture not only streamlines operations but also enhances performance, scalability, and cost-efficiency. The ease of configuration and the powerful capabilities of Snowpipe Streaming make it an invaluable tool for data engineers looking to optimize their workflows.
