During this post we are going to discuss how we can read EXCEL file in Snowflake Python code. Snowflake offers seamless integration with a variety of file formats, such as CSV, JSON, Parquet, Avro, ORC, and XML, providing users with extensive options for data ingestion. However, what if your specific use case involves a file format that isn’t among Snowflake’s native offerings?
In such scenarios, SnowflakeFile.open comes to the rescue. This function serves as a versatile gateway, enabling users to work with a broader spectrum of file formats, even those not explicitly supported out of the box by Snowflake. By utilizing SnowflakeFile.open, users gain flexibility and adaptability, ensuring they can efficiently ingest and process data regardless of the file format’s native support status within the Snowflake ecosystem.
This feature enhances Snowflake’s capabilities and making it a powerful tool for handling a wide range of data formats in various real-world scenarios. We’ll explore practical use cases, demonstrating how Snowpark Python, in combination with SnowflakeFile.open, can be employed to seamlessly ingest data from an Excel file into a Snowflake table
Say we have below excel file present in my internal staging area:
Below is the Python procedure to read excel file from internal staging area i.e. @EXCEL_STAGE.
Call the Procedure:
CALL parse_excel_sp(build_scoped_file_url(@EXCEL_STAGE, 'Inv.xlsx'));
Output:
Now in extension to it, if we have multiple sheet in one excel file and we have to merge these sheets into one Dataframe. Finally this dataframe has to be imported into Snowflake table. Below is the procedure for the same.
Call the Procedure:
CALL read_excel_multfile(build_scoped_file_url(@EXCEL_STAGE, 'Inv1.xlsx'));
Output:
Final state of DB after two procs execution:
You can embed the build scoped file URL as a function to simplify the call signature of the procedure. With this function, you would call the procedure passing in the stage name and file path as parameters
def get_scoped_url(stage, file_path):
sql = f”CALL BUILD_SCOPED_FILE_URL(@{stage}, ‘{file_path}’)”
row = session.sql(sql).collect()
url = row[0][“BUILD_SCOPED_FILE_URL”]
return url