During this post we will discuss about how to troubleshoot Snowpipe data load error. As we know Snowpipe automates the process by enabling the data loading from files as soon as they’re available in a stage. But sometimes we come into the situation where Snowpipe executes successfully while records are partially loaded. Consider the scenario where feed file contains 100 records. Once the Snowpipe execution completes, we verify in snowflake tables and found record discrepancy compare to feed file. There are few records missing in the target table. So we need to identify the erroneous or missing records as well the failure reason so that can avoid in next run.
Now the challenge is how we can identify such records which are left during processing. To identify the root cause we can leverage the information_schema.copy_history and validate_pipe_load snowflake functions.
COPY_HISTORY function determines the status of your feed file (Success or partially loaded).
VALIDATE_PIPE_LOAD:returns details about any errors encountered during an attempted data load into Snowflake tables.
I was querying both functions one by one every time when my Snowpipe loads executed. This was really looking pain for me and though of automating the complete process.
I have written following stored procedure which perform the following actions.
- Read the COPY_HISTORY table .
- Track the errors feed file
- Traverse these Feed file
- Pass file name input to VALIDATE_PIPE_LOAD
- Capture the Errors
- Push into the output array
Snowpipe has executed below two files:
Our process will read only the erroneous file and find the rejected records along with reason.
This process has saved the time and just one call we get the history of snowipe process in past one day or hour.