Schema Detection Use Case

Read Time:2 Minute, 37 Second

Schema Detection Automate: Consider the scenario where we are getting files (PARQUET format) from multiple sources and structure of Files is also varies depending on the source system. Every source file has different set of columns and as per need ,Ingest these files to Snowflake in a single table. Data pipeline should be robust enough that it should read the multiple file structure at run time and ingest them in a same table.

One of the upfront approach comes to our mind, will copy the file data into Snowflake by using VARIANT columns .Later on parse the data into rows-columns by SQL query. But limitation with this approach is that we need to provide the schema beforehand and only then we could able to parse the file. Also another challenge is that File Schema is different for every source and requires manual intervention to get the schema based on the Source system.

Snowflake latest feature Schema Detection came to the rescue, It automatically detecting the schema i.e. Schema Detection Automate in a set of staged semi-structured data files and retrieving the column definitions. Moreover ,Snowflake automatically create a table for you with the detected schema.

Snowflake provide below three functions for the implementation of Schema Detection:

INFER_SCHEMA retrieves and returns the schema from a set of staged files.
GENERATE_COLUMN_DESCRIPTION returns the list of columns necessary to create a table, external table, or view. Here you can modify the columns or data types before you complete the creation of the object.
CREATE TABLE … USING TEMPLATE expands upon Snowflake’s CREATE TABLE functionality to automatically create the structured table using the detected schema from the staged files with no additional input.

We will discuss more about these functions in detail in another blog post Schema detection overview….