3 0
Read Time:2 Minute, 58 Second

Tag, Mask, Protect: PII Handling in Snowflake is no longer optional—it’s essential. As organizations grow, data governance at scale becomes critical — especially when handling personally identifiable information (PII). Manual tagging and masking do not scale across thousands of tables and columns. Imagine a global logistics company ingesting real-time shipment tracking data that includes:

  • Customer contact details (email, phone, name, address)
  • Geolocation (latitude, longitude)
  • Shipment tracking info (shipment_id, status, timestamp)

These fields require proper classification as IDENTIFIER or QUASI_IDENTIFIER for internal privacy compliance.

In this post, I’m excited to share a real-time, metadata-driven framework for:

  • Automatically detecting and classifying PII columns
  • Dynamically applying Snowflake tags
  • Enforcing masking policies using tags — all without hardcoding

This reusable framework is built 100% on native Snowflake capabilities using JavaScript stored procedures.

 Solution Overview

At a high level, this Snowflake-native pipeline does the following:

  1. Lineage Metadata Table: Capture column-level PII classification info (LOGISTICS_STG.COLUMN_LINEAGE_INFO).
  2. PII Tag Catalog: Central repository storing tagging decisions (PII_TAG_CATALOG).
  3. Auto-Tagging Procedure: JavaScript stored procedure that reads lineage info, maps categories to tags, and populates the catalog.
  4. Tag Application Procedure: Another JS procedure that applies Snowflake tags to actual columns from the catalog.
  5. Dynamic Masking Policy: Defines masking logic based on applied tags to protect sensitive data for unauthorized roles.
  6. Masking Application Procedure: Applies the masking policy to all tagged sensitive columns dynamically.

Technical Walkthrough (In Brief)

  1. Lineage Table: Capture detected PII categories for all columns as they arrive.
  2. Tag Catalog: Store tagging decisions centrally for audit and control.
  3. Dynamic Tagging Procedure: Uses a JS map to classify and insert tags based on lineage.
  4. Apply Tags: Automate Snowflake tags on columns from the catalog.
  5. Dynamic Masking: Mask sensitive columns for unauthorized users, driven by tags.
  6. Enforce Masking: Apply masking policies to tagged columns dynamically.

Implementation:

Source Table:

Source Table
Lineage Table

2.TAG CATALOG: This table acts as the single source of truth for data privacy tagging.

CREATE OR REPLACE TABLE PII_TAG_CATALOG (
database_name STRING,
schema_name STRING,
table_name STRING,
column_name STRING,
tag_name STRING,
tag_value STRING,
classification STRING,
category STRING,
environment STRING,
load_dts TIMESTAMP
);

Tag Procedure

CALL SP_AUTO_TAG_PII_COLUMNS_JS(‘PROD’);

Tag Proc Output
Apply Tag to Cols

call APPLY_TAGS_FROM_CATALOG();

Verify in Snowflake if Tag has been applied:

Verify Tag

5.Dynamic Masking: Mask sensitive columns for unauthorized users, driven by tags

CREATE OR REPLACE MASKING POLICY MASK_PII_DYNAMIC
AS (val STRING) RETURNS STRING ->
CASE
WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('PRIVACY_CATEGORY') IN ('IDENTIFIER', 'QUASI_IDENTIFIER')
AND CURRENT_ROLE() NOT IN ('ACCOUNTADMIN')
THEN '********'
ELSE val
END;

Enforce Masking

call APPLY_MASKING_POLICY_TO_TAGGED_COLUMNS();

Verify with Roles:

Verify with Roles

We’ve implemented an automated privacy tagging and masking system in Snowflake. It identifies sensitive fields like customer email, phone, and address using metadata, classifies them, tags them appropriately, and then applies dynamic masking policies. This ensures only authorized users can see PII, while others get masked values — helping us stay compliant with data protection regulations and internal security policies.

Average Rating

5 Star
0%
4 Star
0%
3 Star
0%
2 Star
0%
1 Star
0%

Leave a Reply

Your email address will not be published. Required fields are marked *