Triggered tasks in Snowflake offer a key advantage: they execute only when new data arrives, eliminating the need to run a warehouse or cloud service constantly and reducing the associated costs. By combining Snowflake streams with trigger-based task scheduling, we ensure data is loaded and validated as soon as it arrives, enabling near real-time processing. In this use case, we explore how Snowflake's triggered tasks can automate loading new order data into a history table. This approach not only minimizes costs but also maximizes efficiency by performing essential operations only when new data is available.
This use case walks through the setup of a Snowflake task called LOAD_ORDER_DATA, which performs automated data ingestion and validation. The process uses a Snowflake stream to track incoming order records and includes logic for handling duplicate data. It also identifies potential data quality issues, such as negative amounts.
Our goal is to build a task to:
- Automatically load new order records into a HIST_ORDER_DATA table.
- Deduplicate records to avoid re-processing.
- Identify and log entries with negative values for quick validation and resolution.
Implementation:
Step 1: Define the Tables
Two tables will be used in this setup (a minimal DDL sketch follows the list):
- ORDER_DLY_DATA: The source table where new order records are initially loaded.
- HIST_ORDER_DATA: The destination table that stores processed, deduplicated order records.
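The original walkthrough does not show the table definitions, so the sketch below infers the columns from the INSERT statement in Step 5; treat the exact types and lengths as assumptions.

-- Source table; column types are inferred, not taken from the original.
CREATE OR REPLACE TABLE ORDER_DLY_DATA (
    ORDER_ID     VARCHAR(20),
    CUSTOMER_ID  VARCHAR(20),
    ORDER_DATE   DATE,
    AMOUNT       NUMBER(12,2),
    ORDER_STATUS VARCHAR(20)
);

-- History table with the same shape as the source.
CREATE OR REPLACE TABLE HIST_ORDER_DATA LIKE ORDER_DLY_DATA;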
Step 2: Set Up a Stream for Change Tracking
The ORDER_DLY_DATA_STM stream allows Snowflake to track changes in the ORDER_DLY_DATA table, serving as a trigger for our task.
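A minimal stream definition might look like the following; the APPEND_ONLY setting is an assumption that suits an insert-only staging table.

-- Track new rows inserted into ORDER_DLY_DATA; the task reads from this stream.
CREATE OR REPLACE STREAM ORDER_DLY_DATA_STM
  ON TABLE ORDER_DLY_DATA
  APPEND_ONLY = TRUE;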
Step 3: Create the Task for Data Loading and Validation
The LOAD_ORDER_DATA task is triggered whenever new data is detected in the ORDER_DLY_DATA_STM stream. The task performs the following (a sketch appears after the list):
- Data Ingestion: Inserts new data into HIST_ORDER_DATA.
- Deduplication: Checks for existing ORDER_IDs to avoid re-processing.
- Validation: Logs any records with negative amounts.
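The original task definition is not reproduced here, so the following is a minimal sketch under stated assumptions: COMPUTE_WH is an assumed warehouse name, and ORDER_NEG_AMT_LOG is a hypothetical audit table (same columns as the source) that must exist before the task runs. A triggered task omits the SCHEDULE parameter and fires based on the WHEN clause.

CREATE OR REPLACE TASK LOAD_ORDER_DATA
  WAREHOUSE = COMPUTE_WH  -- assumed warehouse name; adjust to your environment
  WHEN SYSTEM$STREAM_HAS_DATA('ORDER_DLY_DATA_STM')
AS
BEGIN
  -- Run both statements in one transaction so they read the same stream
  -- snapshot and the stream offset advances only once, at commit.
  BEGIN TRANSACTION;
  -- Validation: log entries arriving with negative amounts.
  INSERT INTO ORDER_NEG_AMT_LOG (ORDER_ID, CUSTOMER_ID, ORDER_DATE, AMOUNT, ORDER_STATUS)
    SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, AMOUNT, ORDER_STATUS
    FROM ORDER_DLY_DATA_STM
    WHERE AMOUNT < 0;
  -- Ingestion with deduplication: skip ORDER_IDs already in history.
  INSERT INTO HIST_ORDER_DATA (ORDER_ID, CUSTOMER_ID, ORDER_DATE, AMOUNT, ORDER_STATUS)
    SELECT s.ORDER_ID, s.CUSTOMER_ID, s.ORDER_DATE, s.AMOUNT, s.ORDER_STATUS
    FROM ORDER_DLY_DATA_STM s
    WHERE NOT EXISTS (
      SELECT 1 FROM HIST_ORDER_DATA h WHERE h.ORDER_ID = s.ORDER_ID
    );
  COMMIT;
END;

-- Tasks are created suspended; resume to activate the trigger.
ALTER TASK LOAD_ORDER_DATA RESUME;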
Step 4: Verify the counts in the tables.
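A few quick checks before loading anything, for example:

SELECT COUNT(*) FROM ORDER_DLY_DATA;
SELECT COUNT(*) FROM HIST_ORDER_DATA;
-- Returns TRUE when the stream holds unconsumed rows.
SELECT SYSTEM$STREAM_HAS_DATA('ORDER_DLY_DATA_STM');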
Step 5: Insert data into the ORDER_DLY_DATA table; this will populate the stream as well. Note that ORD005 carries a negative amount, which the task's validation logic will pick up.
INSERT INTO ORDER_DLY_DATA (ORDER_ID, CUSTOMER_ID, ORDER_DATE, AMOUNT, ORDER_STATUS)
VALUES
    ('ORD001', 'CUST001', '2024-10-01', 100.0, 'New'),
    ('ORD002', 'CUST002', '2024-10-02', 150.0, 'New'),
    ('ORD003', 'CUST003', '2024-10-03', 200.0, 'New'),
    ('ORD004', 'CUST004', '2024-10-04', 50.0, 'New'),
    ('ORD005', 'CUST001', '2024-10-01', -100.0, 'New');
Step 6: The task will trigger automatically once the stream has data.
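To confirm that the task actually ran, you can query the task history, for example:

-- Most recent executions of the task, newest first.
SELECT NAME, STATE, SCHEDULED_TIME, COMPLETED_TIME
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(TASK_NAME => 'LOAD_ORDER_DATA'))
ORDER BY SCHEDULED_TIME DESC;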
Step 7: Verify the data in the tables and the stream.
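After the task completes, the history table should contain the five new orders, the hypothetical ORDER_NEG_AMT_LOG table from Step 3 should hold ORD005, and the stream should be empty because the task consumed it:

SELECT * FROM HIST_ORDER_DATA;
SELECT * FROM ORDER_NEG_AMT_LOG;  -- hypothetical audit table from Step 3
-- Expect FALSE: consuming the stream in the task advanced its offset.
SELECT SYSTEM$STREAM_HAS_DATA('ORDER_DLY_DATA_STM');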
Snowflake’s triggered task functionality, coupled with streams, offers a powerful, cost-effective way to manage data ingestion and validation processes. With no need for a constantly running warehouse, this approach provides substantial cost savings and optimizes resource usage.