In a data-driven world, maintaining data quality is paramount for organizations. Snowflake provides a powerful mechanism to assess and ensure data quality using Data Metric Functions (DMFs). These functions enable administrators to evaluate data in tables based on pre-defined or custom metrics. Large organizations often deal with vast datasets spread across multiple tables and schemas. Extracting DMF Details Across Schemas allows administrators to gain a comprehensive view of where and how DMFs are applied. Tracking this information ensures that data quality initiatives consistently implement and monitor across the entire organization.
- Which DMFs are applied?
- On which tables and columns?
- The results of these metric evaluations?
For a Snowflake AccountAdmin or Schema Owner, having a consolidated view of this information is crucial for auditing, reporting, and troubleshooting data quality issues.
This blog explores a real-time use case where we automate the extraction of detailed information about DMFs applied to tables and columns across a specified schema. This is achieved using a custom stored procedure in Snowflake, GET_DMF_DETAILS_FOR_SCHEMA
.
We developed the GET_DMF_DETAILS_FOR_SCHEMA
procedure that:
- Identifies all DMFs within a specified schema.
- Extracts the table and column details associated with each DMF using the
DATA_METRIC_FUNCTION_REFERENCES
table. - Returns a consolidated result for analysis.
Technical Implementation:
Technical Implementation:
- Automated DMF Discovery: Uses the
SHOW DATA METRIC FUNCTIONS
command to list all DMFs in the schema. - Detail Insights: For each DMF, retrieves the database, schema, and metric paths and identifies the specific columns evaluates.
- Dynamic Execution: Builds queries dynamically to fetch results for multiple DMFs within the schema.
Call the Procedure:
call GET_DMF_DETAILS_FOR_SCHEMA(‘PUBLIC’);
Output:
If we observe the output we can see DMF: CHECK_HIGH_TRANSACTION applies to the multiple tables i.e. Customer and DATA_QUALITY_TEST. The above proc will give the list of all tables and DMF inside the schema passes as input argument.
Practical Benefits
- Auditing: Maintain detailed records of DMFs applied across schemas.
- Troubleshooting: Quickly locate data issues by identifying impacted tables and columns.
- Governance: Demonstrate compliance by showing proactive monitoring of data quality metrics.