
Imagine you're building a data pipeline for a Fintech company that processes millions of payment transactions every day from platforms like Stripe and PayPal. These transactions flow continuously into your data warehouse through a streaming ingestion pipeline, for example S3 → Snowpipe → Snowflake. In such real-time environments, it's common to deal with late-arriving events, updated statuses, and duplicate records. This blog shows how to use DBT incremental models with incremental_predicates, the merge strategy, and auditable run tracking to handle these challenges efficiently and reliably.
In any real-time data system — especially in Fintech — things rarely arrive perfectly:
- A transaction might occur on May 22, but due to network retries, it only arrives in your warehouse on May 25.
- A record marked as PENDING yesterday may be updated to COMPLETED today.
- Systems may replay events, sending duplicate transaction_ids with newer timestamps.
That means your raw data often contains:
- Late-arriving records for past dates.
- Updated records that were already processed.
- Duplicate rows for the same transaction.
You need a pipeline that can:
- Process only recent transactions (e.g., last 3 days)
- Merge updates without duplication
- Avoid full table scans to reduce Snowflake costs
- Log each DBT run for observability and traceability
Solution: DBT + Snowflake Incremental Model with Two Superpowers
This is where DBT's incremental model helps. It:
- Scans only recent records (the past 3 days) using incremental_predicates
- Processes only new or updated rows by filtering on last_updated inside is_incremental()
- Merges changes into the target table using incremental_strategy = 'merge'
This lets us handle late data, updates, and replays — while keeping the pipeline fast, accurate, and cost-efficient.
Data Setup: Raw Payment Transactions
For demo purposes, we’ll use a simplified table RAW_TRANSACTIONS to simulate incoming payments.
CREATE OR REPLACE TABLE materialization_db.PUBLIC.RAW_TRANSACTIONS (
transaction_id STRING,
user_id STRING,
payment_date DATE,
transaction_amount NUMBER,
status STRING,
last_updated TIMESTAMP_NTZ
);
INSERT INTO materialization_db.PUBLIC.RAW_TRANSACTIONS VALUES
('T100', 'U001', '2025-05-22', 1200, 'PENDING', '2025-05-15 08:00:00'),
('T101', 'U002', '2025-05-21', 1300, 'PENDING', '2025-05-18 09:00:00'),
('T102', 'U003', '2025-05-20', 1400, 'COMPLETED', '2025-05-20 10:00:00');
Step 1: DBT View – Deduplicated Source
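This first model removes duplicates from the raw feed by keeping only the latest version of each transaction_id. Here is a minimal sketch of models/source_transactions.sql, assuming the view reads the raw table directly (your project may wrap it in a dbt source instead):
{{ config(materialized='view') }}

select
    transaction_id,
    user_id,
    payment_date,
    transaction_amount,
    status,
    last_updated
from materialization_db.PUBLIC.RAW_TRANSACTIONS
-- keep only the most recent row per transaction_id, based on last_updated
qualify row_number() over (
    partition by transaction_id
    order by last_updated desc
) = 1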

Step 2: Incremental DBT Model with Merge + Predicate
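Below is a sketch of models/incremental_transactions.sql consistent with the behavior described in this post. DBT_INTERNAL_DEST is the alias dbt-snowflake gives the target table inside the MERGE it generates, and the 3-day window is an assumption you can tune:
{{ config(
    materialized = 'incremental',
    unique_key = 'transaction_id',
    incremental_strategy = 'merge',
    incremental_predicates = [
        "DBT_INTERNAL_DEST.payment_date >= dateadd(day, -3, current_date)"
    ]
) }}

select
    transaction_id,
    user_id,
    payment_date,
    transaction_amount,
    status,
    last_updated
from {{ ref('source_transactions') }}

{% if is_incremental() %}
  -- on incremental runs, pick up only rows that changed since the last run
  -- and only for payment dates inside the 3-day window
  where last_updated > (select max(last_updated) from {{ this }})
    and payment_date >= dateadd(day, -3, current_date)
{% endif %}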

What This Does:
- incremental_predicates limits the scan to recent records only.
- is_incremental() + last_updated filters only new/changed transactions.
- merge strategy ensures deduplication and updates by transaction_id.
Step 3: Create the Audit Table
CREATE OR REPLACE TABLE materialization_db.PUBLIC.INCREMENTAL_RUN_AUDIT (
model_name STRING,
run_time TIMESTAMP_NTZ,
row_count NUMBER,
min_payment_date DATE,
max_payment_date DATE
);
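DBT does not fill this table by itself. One possible way to populate it is a post_hook on the incremental model, so every run appends a row after the merge completes. Here is a sketch of the Step 2 config block with the hook added; this simple version logs the state of the whole target table (its total row count and payment_date span) after each run:
{{ config(
    materialized = 'incremental',
    unique_key = 'transaction_id',
    incremental_strategy = 'merge',
    incremental_predicates = [
        "DBT_INTERNAL_DEST.payment_date >= dateadd(day, -3, current_date)"
    ],
    post_hook = "
        insert into materialization_db.PUBLIC.INCREMENTAL_RUN_AUDIT
        select
            'incremental_transactions',
            current_timestamp(),
            count(*),
            min(payment_date),
            max(payment_date)
        from {{ this }}
    "
) }}
An on-run-end macro in dbt_project.yml would work just as well if you prefer to keep audit logic out of the model file.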
Step 4: Run Initial DBT Build
dbt run --select source_transactions
dbt run --select incremental_transactions
Because this is the first time the model is built, DBT does a full table load — it processes all records, even if their payment_date is outside the 3-day window.
Note: incremental_predicates and the is_incremental() filter take effect only on subsequent runs, once the target table exists and DBT recognizes the model as an incremental target.
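If you ever need to repeat that full load later (for example, after changing the model's logic), you can force it with dbt's full-refresh flag:
dbt run --full-refresh --select incremental_transactions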
Step 5: Insert New + Updated Transactions
INSERT INTO materialization_db.PUBLIC.RAW_TRANSACTIONS VALUES
('T200', 'U010', '2025-05-29', 2000, 'PENDING', '2025-05-27 10:00:00'), -- new
('T101', 'U002', '2025-05-28', 1300, 'COMPLETED', '2025-05-29 11:00:00'), -- updated
('T300', 'U015', '2025-05-01', 1100, 'FAILED', '2025-05-28 12:00:00'); -- skipped (old payment_date)
Step 6: Incremental DBT Run
dbt run --select incremental_transactions
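For context, dbt-snowflake compiles this run into a MERGE statement. Roughly, and with the target schema and temporary relation names as placeholders, it has this shape, which is where the incremental predicate ends up:
merge into materialization_db.PUBLIC.INCREMENTAL_TRANSACTIONS as DBT_INTERNAL_DEST
using INCREMENTAL_TRANSACTIONS__DBT_TMP as DBT_INTERNAL_SOURCE
    -- the predicate from incremental_predicates limits how much of the target is scanned
    on (DBT_INTERNAL_DEST.payment_date >= dateadd(day, -3, current_date))
    and DBT_INTERNAL_SOURCE.transaction_id = DBT_INTERNAL_DEST.transaction_id
when matched then update set
    user_id            = DBT_INTERNAL_SOURCE.user_id,
    payment_date       = DBT_INTERNAL_SOURCE.payment_date,
    transaction_amount = DBT_INTERNAL_SOURCE.transaction_amount,
    status             = DBT_INTERNAL_SOURCE.status,
    last_updated       = DBT_INTERNAL_SOURCE.last_updated
when not matched then insert
    (transaction_id, user_id, payment_date, transaction_amount, status, last_updated)
values
    (DBT_INTERNAL_SOURCE.transaction_id, DBT_INTERNAL_SOURCE.user_id, DBT_INTERNAL_SOURCE.payment_date,
     DBT_INTERNAL_SOURCE.transaction_amount, DBT_INTERNAL_SOURCE.status, DBT_INTERNAL_SOURCE.last_updated);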
Why This Pattern Works
- incremental_predicates keeps the MERGE from scanning the full target table, which reduces Snowflake compute on large tables.
- The is_incremental() filter on last_updated means only new or changed rows are read from the source on each run.
- The merge strategy keyed on transaction_id absorbs late arrivals, status updates, and replayed events without creating duplicates.
- The audit table gives every run a traceable record of how much data was loaded and for which payment dates.
Snowflake Output:
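To inspect the result in Snowflake, query the model's target table (the schema shown here is a placeholder for wherever incremental_transactions builds):
select transaction_id, user_id, payment_date, transaction_amount, status, last_updated
from materialization_db.PUBLIC.INCREMENTAL_TRANSACTIONS
order by payment_date;
With the sample data above you would expect to see T100, T102, and the new T200, with T101 now showing COMPLETED, while T300 is absent because its payment_date falls outside the 3-day window.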


By combining incremental_predicates, the merge strategy, and audit logging, we've built a real-time DBT pipeline that is both efficient and reliable. This approach ensures that only the right data is processed, minimizing cost and maximizing accuracy.