In the modern data landscape, the common challenge is efficiently handling and integrating new data from various files into data warehouses. This blog post explores a real-world use case for a Snowflake stored procedure designed to dynamically create tables based on CSV file headers, load data, and provide insightful feedback on the data loading process.
Consider a company that receives daily CSV files containing customer transaction data. The files have varying structures, as different departments might add or remove columns based on their reporting needs. To streamline the ingestion process, the company needs a solution that can:
- Dynamically create tables based on the CSV file headers.
- Load data into these tables.
- Provide a summary of the loading process, including the number of records ingested.
File Structure:
We implemented a Snowflake stored procedure, Dynamic_File_Pipe
, to address these requirements. This procedure automates the table creation and data loading process, ensuring that the data ingests accurately and efficiently.
Here’s a breakdown of how the Dynamic_File_Pipe
stored procedure works:
- Generate Column List and Create Table:
- The procedure first examines the CSV file to dynamically generate a list of columns based on the headers.
- It uses Snowflake’s
FLATTEN
function to parse the headers and create aCREATE TABLE
statement. - The table is created in the
DEMO_DB.PUBLIC
schema with columns that match the CSV file structure.
- Load Data into the Table:
- The
COPY INTO
command is used to load the CSV data into the newly created table. - The file format for CSV is specified to handle the data appropriately, including delimiters and header skipping.
- The
- Count Records and Return Summary:
- After loading the data, the procedure counts the number of records inserted into the table.
- It returns a success message with the record count and the date of the load.
Code:
File_Format:
create or replace file format demo_db.public.csv_format_first
type = csv
null_if = ('NULL', 'null')
field_delimiter = ',';
Code:
Call the Procedure:
call DYNAMIC_FILE_PIPE();
This dynamic file pipe demonstrates the power of Snowflake’s stored procedures and JavaScript capabilities. By leveraging automation and dynamic schema generation, we can streamline real-time data ingestion, empowering businesses to gain valuable insights from their ever-evolving data landscape.