Automating Snowflake External Tables Creation: A Better Way to Handle N+ Columns

Read Time:3 Minute, 51 Second

Automating Snowflake External Table Creation: If you’ve worked with Snowflake External Tables, you’ve likely encountered this scenario: you have a source file with 100+ columns in S3, and you need to create an external table. The traditional approach requires manually parsing and defining each column—a tedious, error-prone process that can take time.

Snowflake gives us two powerful tools:

INFER_SCHEMA – scans staged files and returns:
USING TEMPLATE + GENERATE_COLUMN_DESCRIPTION – can auto-generate column definitions.

These work beautifully when you’re creating regular tables. While out of the box they don’t fully solve the “wide external table” problem, there’s a disconnect

The Challenge:

For CSV external tables, using INFER_SCHEMA and USING TEMPLATE leads to an external table where only the VALUE column is populated and all derived columns are NULL.
Or, if you change the file format, you get data in the columns – but the column names become C1, C2, C3 instead of meaningful names.
For Parquet external tables, INFER_SCHEMA knows everything, but you still end up copy–pasting a huge column definition list unless you automate it yourself.
With 100+ columns, this manual process is:
- Time-consuming (hours of copy-paste-edit)
- Error-prone (typos in column names or data types)
- Frustrating (repetitive work that should be automated)

The CSV Experiment: Two Attempts, Two Different Problems

The CSV Experiment: Two Attempts, Two Different Problems

CREATE OR REPLACE FILE FORMAT public.csv_format1 TYPE = CSV PARSE_HEADER = TRUE SKIP_BLANK_LINES = TRUE TRIM_SPACE = TRUE NULL_IF = ('NULL', 'null') EMPTY_FIELD_AS_NULL = TRUE;

CREATE OR REPLACE FILE FORMAT public.csv_format TYPE = CSV SKIP_HEADER = 1 -- no PARSE_HEADER here NULL_IF = ('NULL', 'null') EMPTY_FIELD_AS_NULL = TRUE;

CREATE OR REPLACE STAGE public.external_csv_stage URL = 's3://sharedbucket29/countrycode/' STORAGE_INTEGRATION = s3_int FILE_FORMAT = public.csv_format1;

Attempt 1: Use header-aware format (csv_format1) in INFER_SCHEMA

CREATE OR REPLACE EXTERNAL TABLE ext_country_parse USING TEMPLATE ( SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*)) FROM TABLE ( INFER_SCHEMA( LOCATION=>'@public.external_csv_stage/', FILE_FORMAT=>'csv_format1' ) ) ) LOCATION=@public.external_csv_stage FILE_FORMAT=csv_format;

Attempt 2: Use non-header format (csv_format) for INFER_SCHEMA and external table

CREATE OR REPLACE EXTERNAL TABLE ext_country USING TEMPLATE ( SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*)) FROM TABLE ( INFER_SCHEMA( LOCATION=>'@public.external_csv_stage/', FILE_FORMAT=>'csv_format' ) ) ) LOCATION=@public.external_csv_stage FILE_FORMAT=csv_format;

Result:

External table is created.
VALUE and all projected columns now contain data
But the column names are generic (e.g., C1, C2, C3) instead of country_Code, Country_number, etc.

Solution : Let INFER_SCHEMA Generate Metadata, Then Build VALUE:cN Automatically

For CSV Files

CREATE OR REPLACE FILE FORMAT public.csv_infer_fmt TYPE = CSV PARSE_HEADER = TRUE SKIP_BLANK_LINES = TRUE TRIM_SPACE = TRUE NULL_IF = ('NULL', 'null') EMPTY_FIELD_AS_NULL = TRUE;

— File format for the actual external table (no header parsing)
CREATE OR REPLACE FILE FORMAT public.csv_ext_fmt TYPE = CSV SKIP_HEADER = 1 NULL_IF = ('NULL', 'null') EMPTY_FIELD_AS_NULL = TRUE;

— External stage pointing to S3
CREATE OR REPLACE STAGE public.external_csv_stage URL = 's3://sharedbucket29/countrycode/' STORAGE_INTEGRATION = s3_int FILE_FORMAT = public.csv_infer_fmt;

— File format for Parquet
For Parquet Files
CREATE OR REPLACE FILE FORMAT public.parquet_fmt TYPE = PARQUET;

— External stage for Parquet files
CREATE OR REPLACE STAGE public.parquet_stage URL = 's3://srcbucketsachin/parquet_vector/' STORAGE_INTEGRATION = s3_int FILE_FORMAT = public.parquet_fmt;

Before automating, verify that INFER_SCHEMA() works correctly:

For CSV Files:

For Parquet Files:

Automated External Table Creation: CSV:

Automated External Table Creation: Parquet:

Key Differences: CSV vs. Parquet:

While Snowflake’s INFER_SCHEMA() is powerful, bridging the gap to External Table creation required custom logic. This solution eliminates the manual drudgery of defining 100+ columns and makes External Table creation as simple as running a procedure.