3 0
Read Time:2 Minute, 0 Second

Snowflake announce the public preview of the Schema Detection feature with support for Parquet, Avro, and ORC  semi-structured file formats. Therefore, with schema detection, Snowflake will automatically determine and return the schema of your staged files for review. Helps to generate DDL which you can easily review and use to create table, external table, or view.

Therefore, Snowflake implemented this feature by introducing 3 new features:

  1. INFER_SCHEMA : Firstly, retrieves and returns the schema from a set of staged files.

select column_name,expression

from table (

infer_schema(

location=>'@my_parquet/parquet/userdata1.parquet',

file_format=>'my_parquet_format'

));

INFER_SCHEMA

However, supports named stages (internal or external) and user stages only. It does not support table stages.

2. GENERATE_COLUMN_DESCRIPTION: Secondly function, produces the list of columns from the staged files and leverage the INFER_SCHEMA output to create a table, external table, or view. Here you can modify the columns or data types before you complete the creation of the object.

select generate_column_description(array_agg(object_construct(*)), 'table') as columns
from table (
infer_schema(
location=>'@my_parquet/parquet/userdata1.parquet',
file_format=>'my_parquet_format'
)
);

COLUMN_DESCRIPTION TABLE

select generate_column_description(array_agg(object_construct(*)), 'external_table') as columns
from table (
infer_schema(
location=>'@my_parquet/parquet/userdata1.parquet',
file_format=>'my_parquet_format'
)
);

select generate_column_description(array_agg(object_construct(*)), 'View') as columns
from table (
infer_schema(
location=>'@my_parquet/parquet/userdata1.parquet',
file_format=>'my_parquet_format'
)
);

COLUMN_DESCRIPTION VIEW EXTERNAL

3. CREATE TABLE … USING TEMPLATE: Finally, expands upon Snowflake’s CREATE TABLE functionality to automatically create the structured table using the detected schema from the staged files with no additional input.

CREATE TABLE USING TEMPLATE

create or replace table PARQUET_TABLE1 using template
(select array_agg(object_construct(*)) from table
(
infer_schema(
location=>'@my_parquet/parquet/',
file_format=>'my_parquet_format'
)
))

However, An EMPTY table would be created and we need to load the DATA into the table using COPY command.

In addition, We have implemented above three functions in Snowflake Use Case where we are ingesting multiple formats Parquet file into a single table at snowflake.

Average Rating

5 Star
0%
4 Star
0%
3 Star
0%
2 Star
0%
1 Star
0%

Leave a Reply

Your email address will not be published. Required fields are marked *