For instance, Consider a scenario where we have unstructured data in our cloud storage. However, Unstructured I assume : PDF,JPEG,JPG,Images or PNG files. Therefore, As per the requirement, Business users wants to download the files from cloud storage. But due to compliance issue, users were not authorized to login to the cloud provider. However, They were bound to download files from Stage using Snowflake Web UI. In addition, Snowflake has introduced Directory tables to store a catalog of staged files in cloud storage. They are used to provide Snowflake access to unstructured datafiles and supports both of internal or external stage
Query a directory table helps to retrieve the snowflake hosted file URL for each file present in stage.
To use Directory tables with existing stages,need to perform below two actions:
- You have to enable them.
- You have to refresh them.
Command:
alter stage EXT_CSV_STAGE;
set directory = (enable = true AUTO_REFRESH = true);
alter stage EXT_CSV_STAGE refresh;
select * from directory(@EXT_CSV_STAGE);
The file URL has the following format:
https://<account_identifier>/api/files/<db_name>.<schema_name>.<stage_name>/<relative_path>
To use Directory tables at the STAGTE creation, need to perform below two actions:
create or replace stage demo_db.public.ext_csv_stage
URL = 's3://srcbucketsachin/POC'
STORAGE_INTEGRATION = s3_int
file_format = demo_db.public.csv_format
directory = (enable = true);
alter stage ext_csv_stage refresh;
Note:
In real time we don’t want to run stage Refresh command manually. Directory tables metadata should be refreshed automatically when underlying stage gets updated. To implement the same we need to configure the Event notification for Directory tables.
Steps to configure automatic refresh:
- DESC Stage EXT_CSV_STAGE;
- Log into the AWS Management Console
- Configure an event notification for your S3 bucket
- Name: Name of the event notification (e.g. Auto-ingestSnowflake).
- Events: Select the ObjectCreate (All)and ObjectRemoved
- Send to: Select SQS Queuefrom the dropdown list.
- SQS: Select Add SQS queue ARNfrom the dropdown list.
- SQS queue ARN: Paste the SQS queue name from the DESC STAGE output.
Now upload the file to your bucket and it will refresh automatically in your directory command.
Internal stages:
Currently, directory tables in internal stages require manual metadata refreshes.
In order to identify if the stage has been modified either append new files or removed the older files, Below function is used:
STAGE_DIRECTORY_FILE_REGISTRATION_HISTORY
SELECT *
FROM TABLE(information_schema.stage_directory_file_registration_history(
STAGE_NAME=>'EXT_CSV_STAGE'));
In next part of post we will discuss about
- BUILD_STAGE_FILE_URL
- BUILD_SCOPED_FILE_URL
- GET_PRESIGNED_URL