0 0
Read Time:2 Minute, 38 Second

In data-driven organizations, File Archival in Snowflake: A Snowpark-Powered Solution has become a game-changer. Handling feed files in data pipelines is a critical task for many organizations. These files, often stored in stages such as Amazon S3 or Snowflake internal stages, are the backbone of data ingestion workflows. Once processed, these files need to be archived to ensure cleanliness in the staging area and facilitate historical tracking. Without a proper archival strategy, these files can clutter staging areas, leading to operational challenges.

Snowflake, with its robust Snowpark API and features like COPY FILES and REMOVE, allows us to develop fully managed solutions without relying on external automation like AWS Lambda or lifecycle policies.

This blog explores a real-world use case where a Snowpark stored procedure automates the movement of processed feed files from one stage to another (either internal or external). This approach simplifies workflows, reduces manual intervention, and ensures a clean staging area.

Use Case:
Imagine a retail company that processes daily feed files containing transactional data. These files are staged in an internal staging location (@S3_STAGE/inbound/). After processing, the business needs to archive these files to:

  1. An internal archive stage (@S3_STAGE/archive/) .
  2. An external stage (@ext_csv_stage) for sharing with third-party vendors.

Solution Overview

Using a Snowpark stored procedure, we can:

  1. Dynamically choose between internal or external archiving.
  2. Move processed files using Snowflake’s COPY FILES command.
  3. Keep the staging area clean by generating REMOVE commands for execution.

Technical Implementation:

Technical Details:

Below is the Python code for the stored procedure, which automates file movement based on user preference:

Snowpark Copy

Execution:

Case1: Files in Internal staging and snowpark will move to S3 external staging area.

Internal stage Files

No Files in External stage.

External stage Before Snowpark

Execute the Procedure:

Snowpark Proc Output

Files Archived in External stage.

External stage After Snowpark

Files Archived in Internal stage:

Archived in internal stage

Case2: Files in Internal staging and snowpark will move to Internal staging area in archive folder.

Files in internal stage

Execute the Procedure:

Files Archived to Internal stage

Internal stage Archive Folder:

Archive Files to Internal stage

Enhance User Interaction:
Integrate this procedure with a Streamlit application to allow users to dynamically select archive types (internal or external).

Also to make it more dynamic we can have below code in our snowpark:

def main(session: snowpark.Session ,archive_type: str): 

and deploy the stored procedure with execute as CALLER and can call the procedure dynamically values.

CALL ARCHIVE_FILE(‘internal’);

CALL ARCHIVE_FILE(‘external’);

 

Average Rating

5 Star
0%
4 Star
0%
3 Star
0%
2 Star
0%
1 Star
0%

Leave a Reply

Your email address will not be published. Required fields are marked *