
Most AI document demos stop at “Can AI summarize this PDF?” That is useful, but nowhere near enough for enterprise teams. Legal doesn’t just need a summary. Procurement doesn’t just need a chatbot answer. Compliance doesn’t just need confirmation that a document was read. They need structured, traceable, actionable intelligence:

  • Which contracts have auto-renewal clauses?
  • Which contain SLA commitments or penalty exposure?
  • Which involve cross-region data processing?
  • Which obligations need review by Legal, Procurement, Privacy, or Compliance?
  • Which clauses map to our internal control catalog?

This is where Snowflake Cortex AI becomes powerful.

In this blog, I walk through converting a SaaS contract PDF into a governed Contract Risk Control Tower using Snowflake Stage, Cortex AI_PARSE_DOCUMENT, Cortex AI_COMPLETE, Snowflake Notebook, a custom Python package (.whl), and a final risk intelligence table.

Use Case: Contract Risk Intelligence from a SaaS Agreement

The demo uses software_subscription_sla_betacloud.pdf, a fictional SaaS agreement with BetaCloud containing realistic enterprise clauses: service availability, service credits, auto-renewal, price escalation, data processing, subprocessors, audit evidence, and termination restrictions.

What’s Inside the PDF?

Before writing code, it helps to understand the document from a business lens. The PDF contains:

[Figure: PDF Details]

What Are We Building?

The end-to-end flow:

[Figure: End To End Flow]

Why Snowflake Cortex?

Cortex powers the intelligence layer in two ways:

  1. Document extraction — AI_PARSE_DOCUMENT reads the contract PDF from a Snowflake stage and extracts page-level text. No manual reading — structured page text lands directly in a Snowflake table.
  2. Obligation extraction — AI_COMPLETE analyzes that text and identifies risk-relevant clauses, returning for each: clause type, clause text, AI risk level, obligation owner, timeline/notice period, and reason for risk. The result is structured contract intelligence.
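To make the shape of that structured output concrete, here is a minimal Python sketch of one obligation record. The field names follow the JSON keys used later in this post; the class name `Obligation` and the example values are my own illustration, not output from the demo.

```python
from dataclasses import dataclass, asdict


@dataclass
class Obligation:
    """One risk-relevant clause, mirroring the six fields listed above."""
    clause_type: str
    clause_text: str
    ai_risk_level: str
    obligation_owner: str
    timeline_or_notice_period: str
    ai_reason: str


# Hypothetical values, written by hand for illustration only.
example = Obligation(
    clause_type="Auto-Renewal",
    clause_text="subscription will renew automatically unless sixty days non-renewal notice is given",
    ai_risk_level="High",
    obligation_owner="Procurement",
    timeline_or_notice_period="60 days",
    ai_reason="Missing the notice window commits the customer to another term.",
)

record = asdict(example)  # plain dict, ready to be serialized or loaded into a table
print(sorted(record))
```

Keeping the record flat like this is what later lets each field become its own table column.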

Why a Custom Python Package?

Cortex extracts meaning — but enterprise systems need standard mapping, not free-form AI output.

Consider: the PDF says “subscription will renew automatically unless sixty days non-renewal notice is given” while your internal control catalog describes it as “contract automatically renews unless cancellation notice is given before expiry.” Same risk, different wording. SQL equality fails. LIKE is too weak.

This is where rapidfuzz (installed via .whl in Snowflake Notebook) comes in — it compares extracted clauses against your control catalog and returns a similarity score.
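The gap between exact matching and similarity scoring is easy to see with the two sentences above. The sketch below uses Python's built-in `difflib` as a stand-in for rapidfuzz, so the number it prints is not what rapidfuzz would return; the helper name `similarity` is my own.

```python
from difflib import SequenceMatcher

pdf_clause = "subscription will renew automatically unless sixty days non-renewal notice is given"
control_pattern = "contract automatically renews unless cancellation notice is given before expiry"


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score (difflib stand-in for a rapidfuzz scorer)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


exact_match = pdf_clause == control_pattern       # what SQL '=' would test
substring_match = control_pattern in pdf_clause   # roughly what LIKE '%...%' would test
score = similarity(pdf_clause, control_pattern)

print(exact_match, substring_match, round(score, 1))
```

Both boolean checks come back false, while the score gives a graded signal that can be thresholded and reviewed.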

Step 1: Create Database, Schema, and Stage

First, create a database, schema, and stage to hold the contract PDF.

CREATE OR REPLACE DATABASE CONTRACT_AI_DB;
CREATE OR REPLACE SCHEMA CONTRACT_AI_DB.DOC_INTEL;
CREATE OR REPLACE STAGE CONTRACT_AI_DB.DOC_INTEL.CONTRACT_STAGE
    DIRECTORY = (ENABLE = TRUE)
    ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

Then upload the PDF: software_subscription_sla_betacloud.pdf

Verify the uploaded file:

SELECT
    RELATIVE_PATH,
    SIZE,
    LAST_MODIFIED
FROM DIRECTORY(@CONTRACT_AI_DB.DOC_INTEL.CONTRACT_STAGE);

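At this point, the stage's directory table lists the PDF, so Snowflake knows the file exists. If the query returns no rows after the upload, refresh the directory table with ALTER STAGE CONTRACT_AI_DB.DOC_INTEL.CONTRACT_STAGE REFRESH and run it again.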

Step 2: Parse the PDF Using Cortex

Next, we use Cortex AI_PARSE_DOCUMENT to extract content from the PDF.

CREATE OR REPLACE TABLE CONTRACT_AI_DB.DOC_INTEL.CONTRACT_PARSED_RAW AS
SELECT
    RELATIVE_PATH AS file_name,
    SIZE AS file_size,
    LAST_MODIFIED AS last_modified,
    AI_PARSE_DOCUMENT(
        TO_FILE('@CONTRACT_AI_DB.DOC_INTEL.CONTRACT_STAGE', RELATIVE_PATH),
        OBJECT_CONSTRUCT(
            'mode', 'LAYOUT',
            'page_split', TRUE
        )
    ) AS parsed_doc
FROM DIRECTORY(@CONTRACT_AI_DB.DOC_INTEL.CONTRACT_STAGE);

This step does three important things:

  1. Reads the PDF from the Snowflake stage.
  2. Extracts document content using Cortex.
  3. Stores the parsed output as JSON.

Step 3: Convert Parsed JSON into Page-Level Text

Now we flatten the parsed document into page-level rows.

CREATE OR REPLACE TABLE CONTRACT_AI_DB.DOC_INTEL.CONTRACT_PAGE_TEXT AS
SELECT
    FILE_NAME,
    PAGE.VALUE:index::NUMBER + 1 AS PAGE_NUMBER,
    PAGE.VALUE:content::STRING AS PAGE_TEXT
FROM CONTRACT_AI_DB.DOC_INTEL.CONTRACT_PARSED_RAW,
    LATERAL FLATTEN(INPUT => PARSED_DOC:pages) PAGE;
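If the flatten step feels abstract, the same logic can be sketched in plain Python. The sample document below is hypothetical and only mirrors the fields the SQL above reads (a `pages` array whose items carry `index` and `content`); the function name `flatten_pages` is my own.

```python
# Hypothetical parsed output, shaped like the fields the SQL above reads.
parsed_doc = {
    "pages": [
        {"index": 0, "content": "Service availability terms (sample text)"},
        {"index": 1, "content": "Renewal and pricing terms (sample text)"},
    ]
}


def flatten_pages(file_name, doc):
    """Mimic LATERAL FLATTEN: one output row per element of doc['pages']."""
    rows = []
    for page in doc.get("pages") or []:
        rows.append({
            "FILE_NAME": file_name,
            "PAGE_NUMBER": int(page["index"]) + 1,  # zero-based index -> one-based page number
            "PAGE_TEXT": str(page["content"]),
        })
    return rows


page_rows = flatten_pages("software_subscription_sla_betacloud.pdf", parsed_doc)
for row in page_rows:
    print(row["PAGE_NUMBER"], row["PAGE_TEXT"])
```

A document with no `pages` array yields no rows at all, which matches how the flatten behaves on missing input.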

Step 4: Use Cortex to Extract Contract Obligations

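Now we ask Cortex AI_COMPLETE to analyze each page and extract risk-relevant obligations. The raw response for each page is stored in the AI_OUTPUT column of CONTRACT_OBLIGATIONS_AI_RAW, which Step 5 reads.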

[Figure: Parse]
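The prompt is what makes the response machine-readable, so here is a rough Python sketch of how such a prompt could be assembled. This is not the exact prompt from the demo: the wording, the function name `build_obligation_prompt`, and the sample page text are assumptions. Only the six JSON keys come from this post.

```python
OBLIGATION_KEYS = [
    "clause_type", "clause_text", "ai_risk_level",
    "obligation_owner", "timeline_or_notice_period", "ai_reason",
]


def build_obligation_prompt(page_text: str) -> str:
    """Build a prompt that asks for a JSON array and nothing else."""
    keys = ", ".join(OBLIGATION_KEYS)
    return (
        "You are reviewing one page of a SaaS contract.\n"
        "Extract every risk-relevant obligation on the page.\n"
        "Return ONLY a JSON array. Each element must be an object with exactly "
        f"these keys: {keys}.\n"
        "If the page contains no such obligation, return [].\n\n"
        "PAGE TEXT:\n" + page_text
    )


# Hypothetical page text, for illustration only.
sample_page = "Either party may terminate with ninety days written notice."
prompt = build_obligation_prompt(sample_page)
print(prompt)
```

Asking for an empty array on pages without obligations keeps every response parseable, which matters for the next step.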

Step 5: Flatten Cortex Output into Rows

If the AI output is valid JSON array text, we flatten it into a proper table.

CREATE OR REPLACE TABLE CONTRACT_AI_DB.DOC_INTEL.CONTRACT_OBLIGATIONS_FLATTENED AS
SELECT
    FILE_NAME,
    PAGE_NUMBER,
    VALUE:clause_type::STRING AS CLAUSE_TYPE,
    VALUE:clause_text::STRING AS CLAUSE_TEXT,
    VALUE:ai_risk_level::STRING AS AI_RISK_LEVEL,
    VALUE:obligation_owner::STRING AS OBLIGATION_OWNER,
    VALUE:timeline_or_notice_period::STRING AS TIMELINE_OR_NOTICE_PERIOD,
    VALUE:ai_reason::STRING AS AI_REASON
FROM CONTRACT_AI_DB.DOC_INTEL.CONTRACT_OBLIGATIONS_AI_RAW,
    LATERAL FLATTEN(INPUT => TRY_PARSE_JSON(AI_OUTPUT));

[Figure: Flatten Output]
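The "if it is valid JSON" condition deserves attention: TRY_PARSE_JSON returns NULL for text it cannot parse, and flattening NULL produces no rows, so a malformed response drops out silently. The Python sketch below shows the same defensive idea. The code-fence stripping is an assumption about how a model might wrap its answer, and the function name `parse_obligations` is my own.

```python
import json

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker


def parse_obligations(ai_output):
    """Return a list of obligation dicts, or [] when the text is not a JSON array."""
    if ai_output is None:
        return []
    text = ai_output.strip()
    # Assumption: some models wrap JSON in a Markdown code fence; strip it if present.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:].strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []  # same spirit as TRY_PARSE_JSON returning NULL
    if not isinstance(parsed, list):
        return []
    return [item for item in parsed if isinstance(item, dict)]


good = '[{"clause_type": "Auto-Renewal", "ai_risk_level": "High"}]'
fenced = FENCE + "json\n" + good + "\n" + FENCE
bad = "Sorry, I could not find any obligations."

print(len(parse_obligations(good)), len(parse_obligations(fenced)), len(parse_obligations(bad)))
```

In practice it is worth counting how many pages produced unparseable output, so that dropped pages are reviewed rather than assumed to be risk-free.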

Step 6: Create Internal Risk Control Catalog

The control catalog represents the organization’s standard risk taxonomy.

This is not extracted from the PDF.
This is our enterprise reference table.

CREATE OR REPLACE TABLE CONTRACT_AI_DB.DOC_INTEL.RISK_CONTROL_CATALOG (
    CONTROL_ID STRING,
    RISK_DOMAIN STRING,
    CONTROL_PATTERN STRING,
    DEFAULT_SEVERITY STRING,
    CONTROL_OWNER STRING,
    RECOMMENDED_ACTION STRING
);

[Figure: Control Table]

Step 7: Install Custom Python Package Using .whl

In my environment, direct package installation from PyPI was not available in Snowflake Notebook.

So I downloaded the Linux-compatible wheel file for rapidfuzz and uploaded it into the Snowflake Notebook workspace.

[Figure: whl file upload]
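Picking the right wheel is the part that tends to go wrong, because the filename encodes which Python version and platform it supports. The sketch below parses a wheel filename using the standard dash-separated layout and does a rough compatibility check. The filename shown is a made-up placeholder (version 0.0.0), not the file I used, and both function names are my own.

```python
import sys


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its standard dash-separated parts."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-4].split("-")
    if len(parts) not in (5, 6):  # an optional build tag adds a sixth part
        raise ValueError("unexpected wheel filename: " + filename)
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def looks_installable_here(filename: str) -> bool:
    """Rough check: Linux platform tag and a CPython tag matching this interpreter."""
    info = parse_wheel_filename(filename)
    this_python = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return "linux" in info["platform_tag"] and info["python_tag"] == this_python


# Placeholder filename for illustration; substitute the wheel you actually downloaded.
example = "rapidfuzz-0.0.0-cp311-cp311-manylinux_2_17_x86_64.whl"
print(parse_wheel_filename(example))
print(looks_installable_here(example))
```

Run inside the notebook, a check like this tells you whether the uploaded file matches the notebook's interpreter before you try to install it with pip.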

Step 8: Match Extracted Clauses with Control Catalog

Now define the matching logic and apply it:

[Figure: Notebook Code]

[Figure: Final Table]
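Since the notebook code appears only as a screenshot, here is a minimal sketch of what matching a clause against the catalog can look like. It is not the exact notebook code. It uses `rapidfuzz.fuzz.token_set_ratio` when rapidfuzz is installed and falls back to `difflib` otherwise, so scores differ between the two. The catalog rows, the control IDs, the `min_score` parameter, and the function name `best_control_match` are all hypothetical; only the first control pattern comes from this post.

```python
from difflib import SequenceMatcher

try:
    # Real rapidfuzz scorer, used when the wheel is installed.
    from rapidfuzz import fuzz

    def score(a: str, b: str) -> float:
        return float(fuzz.token_set_ratio(a.lower(), b.lower()))
except ImportError:
    # Stand-in so the sketch still runs without rapidfuzz; scores will differ.
    def score(a: str, b: str) -> float:
        return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical catalog rows; only the first CONTROL_PATTERN comes from this post.
catalog = [
    {"CONTROL_ID": "CTRL-001",
     "CONTROL_PATTERN": "contract automatically renews unless cancellation notice is given before expiry"},
    {"CONTROL_ID": "CTRL-002",
     "CONTROL_PATTERN": "vendor may transfer customer data to subprocessors in other regions"},
]


def best_control_match(clause_text, controls, min_score=0.0):
    """Return (control_row, score) for the highest-scoring pattern, or (None, 0.0)."""
    best_row, best_score = None, 0.0
    for row in controls:
        s = score(clause_text, row["CONTROL_PATTERN"])
        if s > best_score:
            best_row, best_score = row, s
    if best_row is None or best_score < min_score:
        return None, 0.0
    return best_row, best_score


clause = "subscription will renew automatically unless sixty days non-renewal notice is given"
match, confidence = best_control_match(clause, catalog)
print(match["CONTROL_ID"], round(confidence, 1))
```

Keeping the score next to the matched control, rather than only the control ID, lets reviewers treat low-confidence matches differently from strong ones.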

Business Value

  1. Legal Teams

Legal teams can quickly identify clauses related to termination, penalties, service credits, and liability.

  2. Procurement Teams

Procurement teams can track auto-renewal clauses, renewal notice periods, and price increase obligations.

  3. Privacy Teams

Privacy teams can identify contracts involving customer data, subprocessors, and cross-border processing.

  4. Compliance Teams

Compliance teams can track audit evidence requirements and control attestation obligations.

  5. Risk Teams

Risk teams can prioritize review based on AI risk level, control severity, and match confidence.
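As one possible way to turn those three signals into a review queue, the sketch below sorts rows by AI risk level, then control severity, then match score. The Low/Medium/High weights, the `MATCH_SCORE` column name, and the sample rows are assumptions for illustration; this post does not define a scoring scheme.

```python
# Hypothetical ordinal weights; unknown labels sort last.
LEVEL_WEIGHT = {"Low": 1, "Medium": 2, "High": 3}


def review_priority(row):
    """Sort key: AI risk level first, then control severity, then match score."""
    return (
        LEVEL_WEIGHT.get(row["AI_RISK_LEVEL"], 0),
        LEVEL_WEIGHT.get(row["DEFAULT_SEVERITY"], 0),
        row["MATCH_SCORE"],
    )


# Hand-written sample rows, not results from the demo.
rows = [
    {"CLAUSE_TYPE": "Service Credits", "AI_RISK_LEVEL": "Medium", "DEFAULT_SEVERITY": "High", "MATCH_SCORE": 71.0},
    {"CLAUSE_TYPE": "Auto-Renewal", "AI_RISK_LEVEL": "High", "DEFAULT_SEVERITY": "High", "MATCH_SCORE": 64.0},
    {"CLAUSE_TYPE": "Audit Evidence", "AI_RISK_LEVEL": "High", "DEFAULT_SEVERITY": "Medium", "MATCH_SCORE": 88.0},
]

review_queue = sorted(rows, key=review_priority, reverse=True)
for row in review_queue:
    print(row["CLAUSE_TYPE"], row["AI_RISK_LEVEL"], row["DEFAULT_SEVERITY"], row["MATCH_SCORE"])
```

A tuple key keeps the ordering explainable: a reviewer can see exactly why one clause outranks another.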
