
Imagine you’re building a data pipeline for a Fintech company that processes millions of payment transactions every day from platforms like Stripe and PayPal. These transactions are constantly flowing into your data warehouse through a streaming ingestion pipeline — for example, S3 → Snowpipe → Snowflake. In such real-time environments, it’s common to deal with late-arriving events, updated statuses, and duplicate records. This blog showcases how to use DBT incremental models with incremental_predicates, merge strategy, and auditable run tracking to handle these challenges efficiently and reliably.

In any real-time data system — especially in Fintech — things rarely arrive perfectly:

  • A transaction might occur on May 22, but due to network retries, it only arrives in your warehouse on May 25.
  • A record marked as PENDING yesterday may be updated to COMPLETED today.
  • Systems may replay events, sending duplicate transaction_ids with newer timestamps.

That means your raw data often contains:

  • Late-arriving records for past dates.
  • Updated records that were already processed.
  • Duplicate rows for the same transaction.

You need a pipeline that:

  • Processes only recent transactions (e.g., the last 3 days)
  • Merges updates without creating duplicates
  • Avoids full table scans to reduce Snowflake costs
  • Logs each DBT run for observability and traceability

Solution: DBT + Snowflake Incremental Model with Two Superpowers

This is where DBT’s incremental model helps. Specifically, it:

  1. Scans only recent records (the past 3 days) using incremental_predicates
  2. Processes only new or updated rows by filtering on last_updated inside is_incremental()
  3. Merges changes into the target table using incremental_strategy = 'merge'

This lets us handle late data, updates, and replays — while keeping the pipeline fast, accurate, and cost-efficient.

Data Setup: Raw Payment Transactions

For demo purposes, we’ll use a simplified table RAW_TRANSACTIONS to simulate incoming payments.

CREATE OR REPLACE TABLE materialization_db.PUBLIC.RAW_TRANSACTIONS (
    transaction_id     STRING,
    user_id            STRING,
    payment_date       DATE,
    transaction_amount NUMBER,
    status             STRING,
    last_updated       TIMESTAMP_NTZ
);

INSERT INTO materialization_db.PUBLIC.RAW_TRANSACTIONS VALUES
('T100', 'U001', '2025-05-22', 1200, 'PENDING', '2025-05-15 08:00:00'),
('T101', 'U002', '2025-05-21', 1300, 'PENDING', '2025-05-18 09:00:00'),
('T102', 'U003', '2025-05-20', 1400, 'COMPLETED', '2025-05-20 10:00:00');

Step 1: DBT View – Deduplicated Source

Model: source_transactions (sketched below)
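
The post shows this model as a screenshot, so the code itself did not survive. Below is a minimal sketch of what a deduplicating source view typically looks like for this setup, assuming the column list of RAW_TRANSACTIONS above; the direct table reference stands in for what would normally be a {{ source() }} call.

{{ config(materialized='view') }}

with ranked as (
    select
        *,
        -- rank versions of each transaction, newest first
        row_number() over (
            partition by transaction_id
            order by last_updated desc
        ) as rn
    from materialization_db.PUBLIC.RAW_TRANSACTIONS
)

-- keep only the latest version of each transaction_id
select
    transaction_id,
    user_id,
    payment_date,
    transaction_amount,
    status,
    last_updated
from ranked
where rn = 1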

Step 2: Incremental DBT Model with Merge + Predicate

Model: incremental_transactions (sketched below)
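
This model was also shown as an image. A sketch consistent with the post’s description (a 3-day predicate, a last_updated filter inside is_incremental(), and a merge keyed on transaction_id) might look like the following; the exact predicate text and lookback window are assumptions:

{{ config(
    materialized='incremental',
    unique_key='transaction_id',
    incremental_strategy='merge',
    incremental_predicates=[
        "DBT_INTERNAL_DEST.payment_date >= dateadd(day, -3, current_date)"
    ]
) }}

select
    transaction_id,
    user_id,
    payment_date,
    transaction_amount,
    status,
    last_updated
from {{ ref('source_transactions') }}

{% if is_incremental() %}
-- only rows newer than anything already loaded, restricted to the
-- recent payment_date window (this is why T300 is skipped in Step 5)
where last_updated > (select max(last_updated) from {{ this }})
  and payment_date >= dateadd(day, -3, current_date)
{% endif %}

DBT_INTERNAL_DEST is the alias dbt-snowflake gives the target table inside the generated MERGE, which is why it appears in the predicate string.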

What This Does:

  • incremental_predicates limits the scan to recent records only.
  • is_incremental() + last_updated filters only new/changed transactions.
  • merge strategy ensures deduplication and updates by transaction_id.

Step 3: Create the Audit Table

CREATE OR REPLACE TABLE materialization_db.PUBLIC.INCREMENTAL_RUN_AUDIT (
    model_name       STRING,
    run_time         TIMESTAMP_NTZ,
    row_count        NUMBER,
    min_payment_date DATE,
    max_payment_date DATE
);
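
The post does not show how rows land in this table. One common approach, an assumption here rather than the post’s exact code, is a post_hook on the incremental model that logs a summary after every run. Added to the Step 2 config, it would look like this:

{{ config(
    materialized='incremental',
    unique_key='transaction_id',
    incremental_strategy='merge',
    incremental_predicates=[
        "DBT_INTERNAL_DEST.payment_date >= dateadd(day, -3, current_date)"
    ],
    post_hook="insert into materialization_db.PUBLIC.INCREMENTAL_RUN_AUDIT
        select 'incremental_transactions', current_timestamp,
               count(*), min(payment_date), max(payment_date)
        from {{ this }}"
) }}

Note that this logs the state of the whole table after each run; capturing only the rows touched by a given run would require a diff against the previous audit entry.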

Step 4: Run Initial DBT Build

dbt run --select source_transactions

dbt run --select incremental_transactions

Because this is the first time the model is built, DBT does a full table load — it processes all records, even if their payment_date is outside the 3-day window.

  • Note: incremental_predicates takes effect only after the initial run, once the table exists and DBT recognizes it as an incremental target.
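
If you ever need to repeat that full first-time load, for example after changing the model’s logic, dbt’s --full-refresh flag rebuilds the table from scratch:

dbt run --full-refresh --select incremental_transactions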

Step 5: Insert New + Updated Transactions

INSERT INTO materialization_db.PUBLIC.RAW_TRANSACTIONS VALUES
('T200', 'U010', '2025-05-29', 2000, 'PENDING', '2025-05-27 10:00:00'), -- new
('T101', 'U002', '2025-05-28', 1300, 'COMPLETED', '2025-05-29 11:00:00'), -- updated
('T300', 'U015', '2025-05-01', 1100, 'FAILED', '2025-05-28 12:00:00'); -- skipped (old payment_date)

Step 6: Incremental DBT Run

dbt run --select incremental_transactions
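
To see why this run is cheap, it helps to look at roughly what dbt-snowflake compiles it into. The statement below is a simplified approximation (dbt actually stages the model query in a temporary view, and the generated SQL differs in detail), but it shows how the incremental_predicates string is appended to the ON clause so Snowflake can prune old data in the target:

merge into materialization_db.PUBLIC.INCREMENTAL_TRANSACTIONS as DBT_INTERNAL_DEST
using (
    -- the model's SELECT with the is_incremental() filter applied
    select *
    from materialization_db.PUBLIC.SOURCE_TRANSACTIONS
    where last_updated > (select max(last_updated)
                          from materialization_db.PUBLIC.INCREMENTAL_TRANSACTIONS)
      and payment_date >= dateadd(day, -3, current_date)
) as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.transaction_id = DBT_INTERNAL_DEST.transaction_id
   and DBT_INTERNAL_DEST.payment_date >= dateadd(day, -3, current_date) -- from incremental_predicates
when matched then update set
    DBT_INTERNAL_DEST.user_id            = DBT_INTERNAL_SOURCE.user_id,
    DBT_INTERNAL_DEST.payment_date       = DBT_INTERNAL_SOURCE.payment_date,
    DBT_INTERNAL_DEST.transaction_amount = DBT_INTERNAL_SOURCE.transaction_amount,
    DBT_INTERNAL_DEST.status             = DBT_INTERNAL_SOURCE.status,
    DBT_INTERNAL_DEST.last_updated       = DBT_INTERNAL_SOURCE.last_updated
when not matched then insert
    (transaction_id, user_id, payment_date, transaction_amount, status, last_updated)
values
    (DBT_INTERNAL_SOURCE.transaction_id, DBT_INTERNAL_SOURCE.user_id,
     DBT_INTERNAL_SOURCE.payment_date, DBT_INTERNAL_SOURCE.transaction_amount,
     DBT_INTERNAL_SOURCE.status, DBT_INTERNAL_SOURCE.last_updated);

In this run, T200 falls through to the not-matched branch and is inserted, T101 matches and is updated to COMPLETED, and T300 is filtered out before the merge ever sees it. One caveat: because the predicate also restricts which target rows can match, an update to a row whose existing payment_date has already aged out of the window would be inserted as a duplicate rather than merged, so pick a lookback window at least as wide as the latest update you expect.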

Why This Pattern Works

  • Late-arriving records inside the 3-day window still pass the last_updated filter and get merged in.
  • Status changes (like T101 moving from PENDING to COMPLETED) overwrite the existing row instead of duplicating it.
  • The predicate stops Snowflake from scanning the entire target table on every merge, keeping compute costs down.
  • The audit table records every run, so each load is traceable after the fact.

Summary

Snowflake Output: the final incremental_transactions table and the INCREMENTAL_RUN_AUDIT table.
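
Since the output screenshots are not reproduced here, you can verify the end state yourself. Assuming the models build into materialization_db.PUBLIC, the demo data above should produce the following:

-- final state of the incremental model
select * from materialization_db.PUBLIC.INCREMENTAL_TRANSACTIONS order by transaction_id;
-- expected: T100 (PENDING), T101 (now COMPLETED), T102 (COMPLETED), T200 (PENDING);
-- T300 is absent because its payment_date falls outside the 3-day window

-- one audit row per dbt run
select * from materialization_db.PUBLIC.INCREMENTAL_RUN_AUDIT order by run_time;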

By combining incremental_predicates, the merge strategy, and audit logging, we’ve built a real-time DBT pipeline that’s both efficient and reliable. This approach ensures that only the right data is processed, minimizing cost and maximizing accuracy.
