Data Masking with Snowflake Tags

Data governance is no longer optional; it is a regulatory and business imperative. Organizations handling sensitive data such as PII (Personally Identifiable Information) must ensure that the right people see the right data at the right time. But here is the challenge: how do you scale data masking across many tables and columns without manual intervention?

The answer lies in combining two powerful technologies:

  • Snowflake’s Tag-Based Masking Policies — for dynamic, role-aware data protection
  • dbt’s Post-Hooks — for automating governance as part of your data pipeline.

In this blog, I’ll walk you through a complete implementation where classification tags are automatically applied to columns every time a dbt model runs, ensuring your masking policies are always in sync with your data.

Why This Approach Matters

If masking is applied manually, the typical sequence is:

  • dbt creates the table
  • the masking tag is applied later

That delay creates a governance gap, where sensitive data can be visible temporarily.

So the goal of this solution is simple:

dbt builds the table → post-hook calls a Snowflake procedure → procedure reads metadata and applies tags → masking policies enforce visibility at query time.

The Solution: Tag-Based Masking + dbt Automation

By combining Snowflake tags with dbt, we achieve:

  • Single masking policy that adapts based on tag values
  • Metadata-driven classification — change a row in a table, not DDL
  • Automated tag application — governance happens at build time
  • Separation of concerns — data engineers build, governance team classifies

(Diagram: Tag architecture.)

Step-by-Step Implementation

Step 1: Set Up the Governance Database

First, we create a dedicated database for all governance objects. Keeping governance artifacts separate from business data is a best practice.

CREATE OR REPLACE DATABASE DATA_GOVERNANCE;

CREATE SCHEMA IF NOT EXISTS DATA_GOVERNANCE.PUBLIC;

Step 2: Create Sample Data

Let’s create a sample CUSTOMERS table with sensitive columns:

CREATE OR REPLACE TABLE DATA_GOVERNANCE.PUBLIC.CUSTOMERS (
  CUSTOMER_ID NUMBER,
  EMAIL STRING,
  DOB DATE,
  SEGMENT STRING
);

INSERT INTO DATA_GOVERNANCE.PUBLIC.CUSTOMERS
(CUSTOMER_ID, EMAIL, DOB, SEGMENT)
VALUES
(1001, 'alice@gmail.com', TO_DATE('1992-04-18'), 'RETAIL'),
(1002, 'bob@company.com', TO_DATE('1988-11-05'), 'ENTERPRISE'),
(1003, NULL, TO_DATE('1999-01-22'), 'SMB'),
(1004, 'david@yahoo.com', NULL, 'RETAIL'),
(1005, 'eva@provider.net', TO_DATE('1975-07-30'), NULL);

Step 3: Create the Classification Tag

Snowflake tags are key-value pairs that can be attached to objects and columns. We create a tag with predefined allowed values representing our classification levels:

CREATE TAG DATA_GOVERNANCE.PUBLIC.CLASSIFICATION
  ALLOWED_VALUES 'public', 'internal', 'confidential', 'restricted';

Step 4: Create the Metadata Configuration Table

This is the heart of our metadata-driven approach. Instead of hardcoding classifications, we store them in a configuration table:

CREATE OR REPLACE TABLE DATA_GOVERNANCE.PUBLIC.COLUMN_METADATA (
  table_name STRING,       -- target table name
  column_name STRING,      -- physical column name
  standard_name STRING,    -- logical/business name
  classification STRING,   -- public | internal | confidential | restricted
  is_active BOOLEAN DEFAULT TRUE,
  last_updated TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP
);

Populate it with your column classifications. Note that the rows reference CUSTOMERS_DETAIL, the dbt model we will build later on top of the raw CUSTOMERS table:

INSERT INTO DATA_GOVERNANCE.PUBLIC.COLUMN_METADATA
(table_name, column_name, standard_name, classification, is_active)
VALUES
('CUSTOMERS_DETAIL', 'EMAIL', 'email_address', 'confidential', TRUE),
('CUSTOMERS_DETAIL', 'DOB', 'date_of_birth', 'restricted', TRUE),
('CUSTOMERS_DETAIL', 'SEGMENT', 'customer_segment', 'internal', TRUE);

Why This Matters:

  • Governance teams can update classifications without touching code (see the example below)
  • The is_active flag allows disabling masking without deleting config
  • last_updated provides an audit trail
  • standard_name enables data catalog integration
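
For example, tightening EMAIL from confidential to restricted is a single row update; no DDL is involved (the values here are illustrative):

-- Hypothetical reclassification by the governance team.
UPDATE DATA_GOVERNANCE.PUBLIC.COLUMN_METADATA
SET classification = 'restricted',
    last_updated   = CURRENT_TIMESTAMP
WHERE table_name  = 'CUSTOMERS_DETAIL'
  AND column_name = 'EMAIL';

The next dbt run re-applies the tags, and the new classification takes effect without any change to the model or the policy.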

Step 5: Create Tag-Based Masking Policies

Now we create masking policies that read the tag value at query time:

CREATE OR REPLACE MASKING POLICY DATA_GOVERNANCE.PUBLIC.MASK_BY_CLASSIFICATION_STRING
AS (val STRING) RETURNS STRING ->
CASE
  WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('DATA_GOVERNANCE.PUBLIC.CLASSIFICATION') = 'public'
    THEN val
  WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('DATA_GOVERNANCE.PUBLIC.CLASSIFICATION') = 'internal'
    AND CURRENT_ROLE() IN ('DATA_INTERNAL_ROLE')
    THEN val
  WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('DATA_GOVERNANCE.PUBLIC.CLASSIFICATION') = 'confidential'
    AND CURRENT_ROLE() IN ('DATA_CONFIDENTIAL_ROLE', 'DATA_RESTRICTED_ROLE')
    THEN val
  ELSE '***MASKED***'
END;

The Magic of SYSTEM$GET_TAG_ON_CURRENT_COLUMN():

This function dynamically reads the tag value attached to the column being queried. This means:

  • One policy handles all classification levels
  • No need to modify the policy when classifications change
  • Tag changes take effect immediately
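
The policy only unmasks data for roles such as DATA_INTERNAL_ROLE and DATA_CONFIDENTIAL_ROLE, which must already exist. A minimal, illustrative setup (the grants are assumptions; adapt them to your RBAC model):

-- Illustrative only: role names match the policy above.
CREATE ROLE IF NOT EXISTS DATA_INTERNAL_ROLE;
CREATE ROLE IF NOT EXISTS DATA_CONFIDENTIAL_ROLE;
CREATE ROLE IF NOT EXISTS DATA_RESTRICTED_ROLE;

-- Each role needs standard access to query the governed objects (shown for one role).
GRANT USAGE ON DATABASE DATA_GOVERNANCE TO ROLE DATA_INTERNAL_ROLE;
GRANT USAGE ON SCHEMA DATA_GOVERNANCE.PUBLIC TO ROLE DATA_INTERNAL_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA DATA_GOVERNANCE.PUBLIC TO ROLE DATA_INTERNAL_ROLE;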

Step 6: Attach Masking Policy to the Tag

This is the key step that links everything together:

ALTER TAG DATA_GOVERNANCE.PUBLIC.CLASSIFICATION
SET MASKING POLICY DATA_GOVERNANCE.PUBLIC.MASK_BY_CLASSIFICATION_STRING;

Now, any column tagged with CLASSIFICATION will automatically have the masking policy applied. No need to attach policies to individual columns!
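
Before automating anything, you can verify the link by tagging a column manually; this is exactly the kind of statement the procedure in Step 7 will generate:

-- Illustrative manual tagging; the dbt post-hook automates this later.
ALTER TABLE DATA_GOVERNANCE.PUBLIC.CUSTOMERS
  MODIFY COLUMN EMAIL
  SET TAG DATA_GOVERNANCE.PUBLIC.CLASSIFICATION = 'confidential';

-- List the columns the policy now protects through the tag.
SELECT *
FROM TABLE(DATA_GOVERNANCE.INFORMATION_SCHEMA.POLICY_REFERENCES(
  POLICY_NAME => 'DATA_GOVERNANCE.PUBLIC.MASK_BY_CLASSIFICATION_STRING'));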

Step 7: Create the Tag Application Procedure

(Screenshot: the tag application procedure; see the sketch after the highlights below.)

Procedure Highlights:

  • Accepts database, schema, and table as parameters (perfect for dbt)
  • Reads classifications from metadata table
  • Respects is_active flag — can unset tags when disabled
  • Returns count of columns updated for logging
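
Here is a minimal sketch of APPLY_CLASSIFICATION_TAGS_MIN consistent with those highlights, assuming the COLUMN_METADATA layout from Step 4 (the exact body may differ):

-- Minimal sketch; adapt error handling and quoting to your environment.
CREATE OR REPLACE PROCEDURE DATA_GOVERNANCE.PUBLIC.APPLY_CLASSIFICATION_TAGS_MIN(
  DB_NAME STRING, SCHEMA_NAME STRING, TABLE_NAME STRING)
RETURNS STRING
LANGUAGE SQL
EXECUTE AS CALLER
AS
$$
DECLARE
  updated INTEGER DEFAULT 0;
BEGIN
  LET res RESULTSET := (
    SELECT column_name, classification, is_active
    FROM DATA_GOVERNANCE.PUBLIC.COLUMN_METADATA
    WHERE table_name = :TABLE_NAME
  );
  FOR rec IN res DO
    IF (rec.is_active = TRUE) THEN
      -- Active rows: set or update the classification tag on the column.
      EXECUTE IMMEDIATE
        'ALTER TABLE ' || DB_NAME || '.' || SCHEMA_NAME || '.' || TABLE_NAME ||
        ' MODIFY COLUMN ' || rec.column_name ||
        ' SET TAG DATA_GOVERNANCE.PUBLIC.CLASSIFICATION = ''' || rec.classification || '''';
    ELSE
      -- Inactive rows: unset the tag so the masking policy no longer applies.
      EXECUTE IMMEDIATE
        'ALTER TABLE ' || DB_NAME || '.' || SCHEMA_NAME || '.' || TABLE_NAME ||
        ' MODIFY COLUMN ' || rec.column_name ||
        ' UNSET TAG DATA_GOVERNANCE.PUBLIC.CLASSIFICATION';
    END IF;
    updated := updated + 1;
  END FOR;
  RETURN 'Columns updated: ' || updated;
END;
$$;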

The dbt Integration

Now comes the powerful part — automating this with dbt post-hooks.

Step 8: Create the dbt Macro

Create a macro in your dbt project (e.g., macros/classification_governance.sql):

{% macro classification_governance_post_hook() %}
  {{ return(
    "CALL DATA_GOVERNANCE.PUBLIC.APPLY_CLASSIFICATION_TAGS_MIN('"
    ~ this.database ~ "','"
    ~ this.schema ~ "','"
    ~ this.identifier ~ "')"
  ) }}
{% endmacro %}


How It Works:

  • this.database — the database where the model is materialized
  • this.schema — the schema of the model
  • this.identifier — the table/view name
  • These are dbt context variables available at runtime
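
For example, for a model materialized as DATA_GOVERNANCE.PUBLIC.CUSTOMERS_DETAIL, the post-hook renders to:

CALL DATA_GOVERNANCE.PUBLIC.APPLY_CLASSIFICATION_TAGS_MIN(
  'DATA_GOVERNANCE', 'PUBLIC', 'CUSTOMERS_DETAIL');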

Step 9: Apply to Your dbt Model

Now, any model that needs governance simply includes the post-hook:

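A minimal sketch of models/customers_detail.sql, assuming the model simply selects from the raw CUSTOMERS table (the SELECT body is illustrative):

-- models/customers_detail.sql
{{ config(
    materialized = 'table',
    post_hook = "{{ classification_governance_post_hook() }}"
) }}

SELECT
    CUSTOMER_ID,
    EMAIL,
    DOB,
    SEGMENT
FROM DATA_GOVERNANCE.PUBLIC.CUSTOMERS  -- hard-coded for brevity; prefer source()/ref()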

What Happens at Runtime

When you run dbt run --select customers_detail:

  1. dbt creates/replaces the table DATA_GOVERNANCE.PUBLIC.CUSTOMERS_DETAIL
  2. Post-hook fires calling APPLY_CLASSIFICATION_TAGS_MIN
  3. Procedure reads COLUMN_METADATA for CUSTOMERS_DETAIL
  4. Tags are applied to EMAIL, DOB, SEGMENT columns
  5. Masking is now active based on user’s role

Once we run the model, the post-hook applies the tags to the columns.

(Screenshots: dbt run output calling the procedure; query results as DATA_INTERNAL_ROLE and as DATA_CONFIDENTIAL_ROLE.)
Scaling Across Your dbt Project

Option 1: Apply to Specific Models

Add the post-hook to individual models that contain sensitive data:

{{ config(
    post_hook = "{{ classification_governance_post_hook() }}"
) }}

Option 2: Apply to Whole Folders via dbt_project.yml

# dbt_project.yml
models:
  your_project:
    staging:
      # applies to every model under models/staging/
      +post-hook: "{{ classification_governance_post_hook() }}"

Conclusion

By combining Snowflake’s tag-based masking with dbt’s post-hooks, we’ve created a governance framework that:

  • Scales automatically with your data models
  • Requires zero manual intervention after initial setup
  • Keeps governance in sync with your data pipeline
  • Empowers governance teams with metadata-driven control

This pattern transforms data governance from a bottleneck into an automated, integral part of your data platform.
