3 0
Read Time:2 Minute, 20 Second

Loading CSV files into Snowflake is a common data engineering task. However, a frequent challenge arises when CSV files contain more columns than their corresponding Snowflake tables. In such cases, the COPY INTO command with schema evolution (AUTO_CHANGE=TRUE) fails because it requires matching columns. To address this, Dynamic CSV Column Mapping with Stored Procedures can be used to create a flexible, automated process that maps additional columns in the CSV to the correct fields in the Snowflake table, making the data loading process smoother and more adaptable.

In this blog, we’ll address this challenge by building a metadata-driven solution using a JavaScript stored procedure that dynamically maps and loads only the required columns from multiple CSV files into their respective Snowflake tables.

The Challenge

Imagine you receive 10 CSV files daily, each corresponding to a Snowflake table. However:

  • Each CSV contains more columns than required by the target table.
  • Only a subset of columns from the CSV matches the table schema.
  • Schema Evolution (AUTO_CHANGE=TRUE) won’t work due to non-matching columns.

Goal: Automate loading only the necessary columns dynamically without manually adjusting each COPY INTO statement.

Implementation:

We will build a scalable, automated data loading process using:

  1. Metadata Table: Stores CSV-to-table column mappings.
  2. JavaScript Stored Procedure: Generates and executes COPY INTO commands dynamically.
  3. Internal Stage: Stores CSV files for ingestion.

Step 1: Create Metadata Table for Column Mappings
The metadata table acts as the “source of truth” for column mappings.

Metadata table

Step 2: Stage the CSV Files in Snowflake

create or replace file format public.csv_format type = csv skip_header = 1 null_if = ('NULL', 'null') empty_field_as_null = true;

create or replace stage dynamic_copy file_format = public.csv_format;

Step 2.1: CSV Files in Snowflake

Cust Table
Orders Tbl

Step 3: Create JavaScript Stored Procedure for Dynamic Loading
Now, let’s create a JavaScript stored procedure that reads mappings from file_column_mapping and dynamically constructs COPY INTO statements for all files.

Metadata Proc

Step 4: Execute the Stored Procedure.

CALL load_csv_to_tables();

Proc Completed

Step 5: Verify data.

Order Table
Cust Table

Conclusion:

  • Automated Column Mapping: The stored procedure reads column mappings from the metadata table.
  • Dynamic SQL Generation: COPY INTO statements are generated programmatically.
  • Scalability: Supports multiple CSV-to-table mappings simultaneously.
  • Reusability: Easily extended to new files by adding rows to file_column_mapping.

Average Rating

5 Star
0%
4 Star
0%
3 Star
0%
2 Star
0%
1 Star
0%

Leave a Reply

Your email address will not be published. Required fields are marked *