Data Masking with Snowflake Tags

Data governance is no longer optional; it is a regulatory and business imperative. Organizations handling sensitive data such as PII (Personally Identifiable Information) must ensure that the right people see the right data at the right time. But here is the challenge: how do you scale data masking across many tables and columns without manual intervention?

The answer lies in combining two powerful technologies:

  • Snowflake’s Tag-Based Masking Policies — for dynamic, role-aware data protection
  • dbt’s Post-Hooks — for automating governance as part of your data pipeline.

In this blog, I’ll walk you through a complete implementation where classification tags are automatically applied to columns every time a dbt model runs, ensuring your masking policies are always in sync with your data.

Why This Approach Matters

If masking is applied manually, the typical sequence is:

  • dbt creates the table
  • the masking tag is applied later

That delay creates a governance gap, where sensitive data can be visible temporarily.

So the goal of this solution is simple:

dbt builds the table → post-hook calls a Snowflake procedure → procedure reads metadata and applies tags → masking policies enforce visibility at query time.

The Solution: Tag-Based Masking + dbt Automation

By combining Snowflake tags with dbt, we achieve:

  • Single masking policy that adapts based on tag values
  • Metadata-driven classification — change a row in a table, not DDL
  • Automated tag application — governance happens at build time
  • Separation of concerns — data engineers build, governance team classifies

(Diagram: Tag architecture.)

Step-by-Step Implementation

Step 1: Set Up the Governance Database

First, we create a dedicated database for all governance objects. Keeping governance artifacts separate from business data is a best practice.

CREATE OR REPLACE DATABASE DATA_GOVERNANCE;

CREATE SCHEMA IF NOT EXISTS DATA_GOVERNANCE.PUBLIC;

Step 2: Create Sample Data

Let’s create a sample CUSTOMERS table with sensitive columns:

CREATE OR REPLACE TABLE DATA_GOVERNANCE.PUBLIC.CUSTOMERS (
  CUSTOMER_ID NUMBER,
  EMAIL STRING,
  DOB DATE,
  SEGMENT STRING
);

INSERT INTO DATA_GOVERNANCE.PUBLIC.CUSTOMERS
(CUSTOMER_ID, EMAIL, DOB, SEGMENT)
VALUES
(1001, 'alice@gmail.com', TO_DATE('1992-04-18'), 'RETAIL'),
(1002, 'bob@company.com', TO_DATE('1988-11-05'), 'ENTERPRISE'),
(1003, NULL, TO_DATE('1999-01-22'), 'SMB'),
(1004, 'david@yahoo.com', NULL, 'RETAIL'),
(1005, 'eva@provider.net', TO_DATE('1975-07-30'), NULL);

Step 3: Create the Classification Tag

Snowflake tags are key-value pairs that can be attached to objects and columns. We create a tag with predefined allowed values representing our classification levels:

CREATE TAG DATA_GOVERNANCE.PUBLIC.CLASSIFICATION
  ALLOWED_VALUES 'public', 'internal', 'confidential', 'restricted';

Step 4: Create the Metadata Configuration Table

This is the heart of our metadata-driven approach. Instead of hardcoding classifications, we store them in a configuration table:

CREATE OR REPLACE TABLE DATA_GOVERNANCE.PUBLIC.COLUMN_METADATA (
  table_name STRING,       -- target table name
  column_name STRING,      -- physical column name
  standard_name STRING,    -- logical/business name
  classification STRING,   -- public | internal | confidential | restricted
  is_active BOOLEAN DEFAULT TRUE,
  last_updated TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP
);

Populate it with your column classifications. Note that the rows reference CUSTOMERS_DETAIL, the dbt model we will build later on top of the raw CUSTOMERS table:

INSERT INTO DATA_GOVERNANCE.PUBLIC.COLUMN_METADATA
(table_name, column_name, standard_name, classification, is_active)
VALUES
('CUSTOMERS_DETAIL', 'EMAIL', 'email_address', 'confidential', TRUE),
('CUSTOMERS_DETAIL', 'DOB', 'date_of_birth', 'restricted', TRUE),
('CUSTOMERS_DETAIL', 'SEGMENT', 'customer_segment', 'internal', TRUE);

Why This Matters:

  • Governance teams can update classifications without touching code (see the example below)
  • The is_active flag allows disabling masking without deleting config
  • last_updated provides an audit trail
  • standard_name enables data catalog integration
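
For example, tightening EMAIL from confidential to restricted is a single row update; no DDL is involved (the values here are illustrative):

-- Hypothetical reclassification by the governance team.
UPDATE DATA_GOVERNANCE.PUBLIC.COLUMN_METADATA
SET classification = 'restricted',
    last_updated   = CURRENT_TIMESTAMP
WHERE table_name  = 'CUSTOMERS_DETAIL'
  AND column_name = 'EMAIL';

The next dbt run re-applies the tags, and the new classification takes effect without any change to the model or the policy.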

Step 5: Create Tag-Based Masking Policies

Now we create masking policies that read the tag value at query time:

CREATE OR REPLACE MASKING POLICY DATA_GOVERNANCE.PUBLIC.MASK_BY_CLASSIFICATION_STRING
AS (val STRING) RETURNS STRING ->
CASE
  WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('DATA_GOVERNANCE.PUBLIC.CLASSIFICATION') = 'public'
    THEN val
  WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('DATA_GOVERNANCE.PUBLIC.CLASSIFICATION') = 'internal'
    AND CURRENT_ROLE() IN ('DATA_INTERNAL_ROLE')
    THEN val
  WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('DATA_GOVERNANCE.PUBLIC.CLASSIFICATION') = 'confidential'
    AND CURRENT_ROLE() IN ('DATA_CONFIDENTIAL_ROLE', 'DATA_RESTRICTED_ROLE')
    THEN val
  ELSE '***MASKED***'
END;

The Magic of SYSTEM$GET_TAG_ON_CURRENT_COLUMN():

This function dynamically reads the tag value attached to the column being queried. This means:

  • One policy handles all classification levels
  • No need to modify the policy when classifications change
  • Tag changes take effect immediately
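
The policy only unmasks data for roles such as DATA_INTERNAL_ROLE and DATA_CONFIDENTIAL_ROLE, which must already exist. A minimal, illustrative setup (the grants are assumptions; adapt them to your RBAC model):

-- Illustrative only: role names match the policy above.
CREATE ROLE IF NOT EXISTS DATA_INTERNAL_ROLE;
CREATE ROLE IF NOT EXISTS DATA_CONFIDENTIAL_ROLE;
CREATE ROLE IF NOT EXISTS DATA_RESTRICTED_ROLE;

-- Each role needs standard access to query the governed objects (shown for one role).
GRANT USAGE ON DATABASE DATA_GOVERNANCE TO ROLE DATA_INTERNAL_ROLE;
GRANT USAGE ON SCHEMA DATA_GOVERNANCE.PUBLIC TO ROLE DATA_INTERNAL_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA DATA_GOVERNANCE.PUBLIC TO ROLE DATA_INTERNAL_ROLE;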

Step 6: Attach Masking Policy to the Tag

This is the key step that links everything together:

ALTER TAG DATA_GOVERNANCE.PUBLIC.CLASSIFICATION
SET MASKING POLICY DATA_GOVERNANCE.PUBLIC.MASK_BY_CLASSIFICATION_STRING;

Now, any column tagged with CLASSIFICATION will automatically have the masking policy applied. No need to attach policies to individual columns!
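
Before automating anything, you can verify the link by tagging a column manually; this is exactly the kind of statement the procedure in Step 7 will generate:

-- Illustrative manual tagging; the dbt post-hook automates this later.
ALTER TABLE DATA_GOVERNANCE.PUBLIC.CUSTOMERS
  MODIFY COLUMN EMAIL
  SET TAG DATA_GOVERNANCE.PUBLIC.CLASSIFICATION = 'confidential';

-- List the columns the policy now protects through the tag.
SELECT *
FROM TABLE(DATA_GOVERNANCE.INFORMATION_SCHEMA.POLICY_REFERENCES(
  POLICY_NAME => 'DATA_GOVERNANCE.PUBLIC.MASK_BY_CLASSIFICATION_STRING'));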

Step 7: Create the Tag Application Procedure

(Screenshot: the tag application procedure; see the sketch after the highlights below.)

Procedure Highlights:

  • Accepts database, schema, and table as parameters (perfect for dbt)
  • Reads classifications from metadata table
  • Respects is_active flag — can unset tags when disabled
  • Returns count of columns updated for logging
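
Here is a minimal sketch of APPLY_CLASSIFICATION_TAGS_MIN consistent with those highlights, assuming the COLUMN_METADATA layout from Step 4 (the exact body may differ):

-- Minimal sketch; adapt error handling and quoting to your environment.
CREATE OR REPLACE PROCEDURE DATA_GOVERNANCE.PUBLIC.APPLY_CLASSIFICATION_TAGS_MIN(
  DB_NAME STRING, SCHEMA_NAME STRING, TABLE_NAME STRING)
RETURNS STRING
LANGUAGE SQL
EXECUTE AS CALLER
AS
$$
DECLARE
  updated INTEGER DEFAULT 0;
BEGIN
  LET res RESULTSET := (
    SELECT column_name, classification, is_active
    FROM DATA_GOVERNANCE.PUBLIC.COLUMN_METADATA
    WHERE table_name = :TABLE_NAME
  );
  FOR rec IN res DO
    IF (rec.is_active = TRUE) THEN
      -- Active rows: set or update the classification tag on the column.
      EXECUTE IMMEDIATE
        'ALTER TABLE ' || DB_NAME || '.' || SCHEMA_NAME || '.' || TABLE_NAME ||
        ' MODIFY COLUMN ' || rec.column_name ||
        ' SET TAG DATA_GOVERNANCE.PUBLIC.CLASSIFICATION = ''' || rec.classification || '''';
    ELSE
      -- Inactive rows: unset the tag so the masking policy no longer applies.
      EXECUTE IMMEDIATE
        'ALTER TABLE ' || DB_NAME || '.' || SCHEMA_NAME || '.' || TABLE_NAME ||
        ' MODIFY COLUMN ' || rec.column_name ||
        ' UNSET TAG DATA_GOVERNANCE.PUBLIC.CLASSIFICATION';
    END IF;
    updated := updated + 1;
  END FOR;
  RETURN 'Columns updated: ' || updated;
END;
$$;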

The dbt Integration

Now comes the powerful part — automating this with dbt post-hooks.

Step 8: Create the dbt Macro

Create a macro in your dbt project (e.g., macros/classification_governance.sql):

{% macro classification_governance_post_hook() %}
  {{ return(
    "CALL DATA_GOVERNANCE.PUBLIC.APPLY_CLASSIFICATION_TAGS_MIN('"
    ~ this.database ~ "','"
    ~ this.schema ~ "','"
    ~ this.identifier ~ "')"
  ) }}
{% endmacro %}


How It Works:

  • this.database — the database where the model is materialized
  • this.schema — the schema of the model
  • this.identifier — the table/view name
  • These are dbt context variables available at runtime
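
For example, for a model materialized as DATA_GOVERNANCE.PUBLIC.CUSTOMERS_DETAIL, the post-hook renders to:

CALL DATA_GOVERNANCE.PUBLIC.APPLY_CLASSIFICATION_TAGS_MIN(
  'DATA_GOVERNANCE', 'PUBLIC', 'CUSTOMERS_DETAIL');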

Step 9: Apply to Your dbt Model

Now, any model that needs governance simply includes the post-hook:

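A minimal sketch of models/customers_detail.sql, assuming the model simply selects from the raw CUSTOMERS table (the SELECT body is illustrative):

-- models/customers_detail.sql
{{ config(
    materialized = 'table',
    post_hook = "{{ classification_governance_post_hook() }}"
) }}

SELECT
    CUSTOMER_ID,
    EMAIL,
    DOB,
    SEGMENT
FROM DATA_GOVERNANCE.PUBLIC.CUSTOMERS  -- hard-coded for brevity; prefer source()/ref()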

What Happens at Runtime

When you run dbt run --select customers_detail:

  1. dbt creates/replaces the table DATA_GOVERNANCE.PUBLIC.CUSTOMERS_DETAIL
  2. Post-hook fires calling APPLY_CLASSIFICATION_TAGS_MIN
  3. Procedure reads COLUMN_METADATA for CUSTOMERS_DETAIL
  4. Tags are applied to EMAIL, DOB, SEGMENT columns
  5. Masking is now active based on user’s role

Once we run the model, the post-hook applies the tags to the columns.

(Screenshots: dbt run output calling the procedure; query results as DATA_INTERNAL_ROLE and as DATA_CONFIDENTIAL_ROLE.)
Scaling Across Your dbt Project

Option 1: Apply to Specific Models

Add the post-hook to individual models that contain sensitive data:

{{ config(
    post_hook = "{{ classification_governance_post_hook() }}"
) }}

Option 2: Apply to Whole Folders via dbt_project.yml

# dbt_project.yml
models:
  your_project:
    staging:
      # applies to every model under models/staging/
      +post-hook: "{{ classification_governance_post_hook() }}"

Conclusion

By combining Snowflake’s tag-based masking with dbt’s post-hooks, we’ve created a governance framework that:

  • Scales automatically with your data models
  • Requires zero manual intervention after initial setup
  • Keeps governance in sync with your data pipeline
  • Empowers governance teams with metadata-driven control

This pattern transforms data governance from a bottleneck into an automated, integral part of your data platform.
