
Imagine you're building a data pipeline for a Fintech company that processes millions of payment transactions every day from platforms like Stripe and PayPal. These transactions flow continuously into your data warehouse through a streaming ingestion pipeline, for example S3 → Snowpipe → Snowflake. In such real-time environments, it's common to deal with late-arriving events, updated statuses, and duplicate records. This blog shows how to use DBT incremental models with incremental_predicates, the merge strategy, and auditable run tracking to handle these challenges efficiently and reliably.
In any real-time data system — especially in Fintech — things rarely arrive perfectly:
- A transaction might occur on May 22, but due to network retries, it only arrives in your warehouse on May 25.
- A record marked as PENDING yesterday may be updated to COMPLETED today.
- Systems may replay events, sending duplicate transaction_ids with newer timestamps.
That means your raw data often contains:
- Late-arriving records for past dates.
- Updated records that were already processed.
- Duplicate rows for the same transaction.
You need a pipeline that can:
- Process only recent transactions (e.g., last 3 days)
- Merge updates without duplication
- Avoid full table scans to reduce Snowflake costs
- Log each DBT run for observability and traceability
Solution: DBT + Snowflake Incremental Model with Two Superpowers
This is where DBT's incremental model helps. It:
- Scans only recent records (the past 3 days) using incremental_predicates
- Processes only new or updated rows by filtering on last_updated inside is_incremental()
- Merges changes into the target table using incremental_strategy = 'merge'
This lets us handle late data, updates, and replays — while keeping the pipeline fast, accurate, and cost-efficient.
Data Setup: Raw Payment Transactions
For demo purposes, we’ll use a simplified table RAW_TRANSACTIONS to simulate incoming payments.
CREATE OR REPLACE TABLE materialization_db.PUBLIC.RAW_TRANSACTIONS (
transaction_id STRING,
user_id STRING,
payment_date DATE,
transaction_amount NUMBER,
status STRING,
last_updated TIMESTAMP_NTZ
);
INSERT INTO materialization_db.PUBLIC.RAW_TRANSACTIONS VALUES
('T100', 'U001', '2025-05-22', 1200, 'PENDING', '2025-05-15 08:00:00'),
('T101', 'U002', '2025-05-21', 1300, 'PENDING', '2025-05-18 09:00:00'),
('T102', 'U003', '2025-05-20', 1400, 'COMPLETED', '2025-05-20 10:00:00');
Step 1: DBT View – Deduplicated Source
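This first model removes duplicates from the raw feed by keeping only the latest version of each transaction_id. Here is a minimal sketch of models/source_transactions.sql, assuming the view reads the raw table directly (your project may wrap it in a dbt source instead):
{{ config(materialized='view') }}

select
    transaction_id,
    user_id,
    payment_date,
    transaction_amount,
    status,
    last_updated
from materialization_db.PUBLIC.RAW_TRANSACTIONS
-- keep only the most recent row per transaction_id, based on last_updated
qualify row_number() over (
    partition by transaction_id
    order by last_updated desc
) = 1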

Step 2: Incremental DBT Model with Merge + Predicate
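Below is a sketch of models/incremental_transactions.sql consistent with the behavior described in this post. DBT_INTERNAL_DEST is the alias dbt-snowflake gives the target table inside the MERGE it generates, and the 3-day window is an assumption you can tune:
{{ config(
    materialized = 'incremental',
    unique_key = 'transaction_id',
    incremental_strategy = 'merge',
    incremental_predicates = [
        "DBT_INTERNAL_DEST.payment_date >= dateadd(day, -3, current_date)"
    ]
) }}

select
    transaction_id,
    user_id,
    payment_date,
    transaction_amount,
    status,
    last_updated
from {{ ref('source_transactions') }}

{% if is_incremental() %}
  -- on incremental runs, pick up only rows that changed since the last run
  -- and only for payment dates inside the 3-day window
  where last_updated > (select max(last_updated) from {{ this }})
    and payment_date >= dateadd(day, -3, current_date)
{% endif %}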

What This Does:
- incremental_predicates limits the scan to recent records only.
- is_incremental() + last_updated filters only new/changed transactions.
- merge strategy ensures deduplication and updates by transaction_id.
Step 3: Create the Audit Table
CREATE OR REPLACE TABLE materialization_db.PUBLIC.INCREMENTAL_RUN_AUDIT (
model_name STRING,
run_time TIMESTAMP_NTZ,
row_count NUMBER,
min_payment_date DATE,
max_payment_date DATE
);
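DBT does not fill this table by itself. One possible way to populate it is a post_hook on the incremental model, so every run appends a row after the merge completes. Here is a sketch of the Step 2 config block with the hook added; this simple version logs the state of the whole target table (its total row count and payment_date span) after each run:
{{ config(
    materialized = 'incremental',
    unique_key = 'transaction_id',
    incremental_strategy = 'merge',
    incremental_predicates = [
        "DBT_INTERNAL_DEST.payment_date >= dateadd(day, -3, current_date)"
    ],
    post_hook = "
        insert into materialization_db.PUBLIC.INCREMENTAL_RUN_AUDIT
        select
            'incremental_transactions',
            current_timestamp(),
            count(*),
            min(payment_date),
            max(payment_date)
        from {{ this }}
    "
) }}
An on-run-end macro in dbt_project.yml would work just as well if you prefer to keep audit logic out of the model file.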
Step 4: Run Initial DBT Build
dbt run --select source_transactions
dbt run --select incremental_transactions
Because this is the first time the model is built, DBT does a full table load — it processes all records, even if their payment_date is outside the 3-day window.
Note: incremental_predicates and the is_incremental() filter take effect only on subsequent runs, once the target table exists and DBT recognizes the model as an incremental target.
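If you ever need to repeat that full load later (for example, after changing the model's logic), you can force it with dbt's full-refresh flag:
dbt run --full-refresh --select incremental_transactions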
Step 5: Insert New + Updated Transactions
INSERT INTO materialization_db.PUBLIC.RAW_TRANSACTIONS VALUES
('T200', 'U010', '2025-05-29', 2000, 'PENDING', '2025-05-27 10:00:00'), -- new
('T101', 'U002', '2025-05-28', 1300, 'COMPLETED', '2025-05-29 11:00:00'), -- updated
('T300', 'U015', '2025-05-01', 1100, 'FAILED', '2025-05-28 12:00:00'); -- skipped (old payment_date)
Step 6: Incremental DBT Run
dbt run --select incremental_transactions
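For context, dbt-snowflake compiles this run into a MERGE statement. Roughly, and with the target schema and temporary relation names as placeholders, it has this shape, which is where the incremental predicate ends up:
merge into materialization_db.PUBLIC.INCREMENTAL_TRANSACTIONS as DBT_INTERNAL_DEST
using INCREMENTAL_TRANSACTIONS__DBT_TMP as DBT_INTERNAL_SOURCE
    -- the predicate from incremental_predicates limits how much of the target is scanned
    on (DBT_INTERNAL_DEST.payment_date >= dateadd(day, -3, current_date))
    and DBT_INTERNAL_SOURCE.transaction_id = DBT_INTERNAL_DEST.transaction_id
when matched then update set
    user_id            = DBT_INTERNAL_SOURCE.user_id,
    payment_date       = DBT_INTERNAL_SOURCE.payment_date,
    transaction_amount = DBT_INTERNAL_SOURCE.transaction_amount,
    status             = DBT_INTERNAL_SOURCE.status,
    last_updated       = DBT_INTERNAL_SOURCE.last_updated
when not matched then insert
    (transaction_id, user_id, payment_date, transaction_amount, status, last_updated)
values
    (DBT_INTERNAL_SOURCE.transaction_id, DBT_INTERNAL_SOURCE.user_id, DBT_INTERNAL_SOURCE.payment_date,
     DBT_INTERNAL_SOURCE.transaction_amount, DBT_INTERNAL_SOURCE.status, DBT_INTERNAL_SOURCE.last_updated);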
Why This Pattern Works
- incremental_predicates keeps the MERGE from scanning the full target table, which reduces Snowflake compute on large tables.
- The is_incremental() filter on last_updated means only new or changed rows are read from the source on each run.
- The merge strategy keyed on transaction_id absorbs late arrivals, status updates, and replayed events without creating duplicates.
- The audit table gives every run a traceable record of how much data was loaded and for which payment dates.
Snowflake Output:
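To inspect the result in Snowflake, query the model's target table (the schema shown here is a placeholder for wherever incremental_transactions builds):
select transaction_id, user_id, payment_date, transaction_amount, status, last_updated
from materialization_db.PUBLIC.INCREMENTAL_TRANSACTIONS
order by payment_date;
With the sample data above you would expect to see T100, T102, and the new T200, with T101 now showing COMPLETED, while T300 is absent because its payment_date falls outside the 3-day window.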


By combining incremental_predicates, the merge strategy, and audit logging, we've built a real-time DBT pipeline that is both efficient and reliable. This approach ensures that only the right data is processed, minimizing cost and maximizing accuracy.