Read Time:7 Minute, 46 Second

PDF Resumes to AI-Powered Hiring Decisions: Recruitment teams often work with two very different types of data at the same time:

Unstructured data like resumes in PDF format
Structured data like job postings, roles, and required skills

Traditionally, this kind of use case would require multiple systems: a document parser, an NLP pipeline, a vector database, an LLM orchestration layer, and then a dashboard on top.But with Snowflake Cortex AI functions, we can design this solution entirely inside Snowflake.

In this blog, We will walk through a working implementation of Smart Hire, an AI-powered resume screening and job matching engine built with:

AI_PARSE_DOCUMENT
AI_EXTRACT
AI_CLASSIFY
AI_EMBED
VECTOR_COSINE_SIMILARITY
AI_COMPLETE
CORTEX SEARCH SERVICE
Streamlit in Snowflake

The Problem We Are Solving

A recruiter typically wants answers to questions like:

Which candidates are the best fit for a specific job?
Can I automatically extract name, email, experience, and skills from resumes?
Can I classify candidates by seniority?
Can I semantically match resumes with job descriptions instead of relying only on keyword matching?
Can recruiters consume this through a UI instead of writing SQL?

That means we need to solve four technical problems:

Read and parse resumes in PDF format
Convert resume text into structured candidate attributes
Compare candidate profiles against job requirements
Present results in a recruiter-friendly form

To make this maintainable, I split the design into three logical layers:

RAW
ENRICHED
MATCHING

Solution Architecture

The flow looks like this:

Step 0 — Database, Schemas, and Resume Stage

CREATE DATABASE IF NOT EXISTS SMART_HIRE;

CREATE SCHEMA IF NOT EXISTS SMART_HIRE.RAW; CREATE SCHEMA IF NOT EXISTS SMART_HIRE.ENRICHED; CREATE SCHEMA IF NOT EXISTS SMART_HIRE.MATCHING;

CREATE OR REPLACE STAGE SMART_HIRE.RAW.RESUMES DIRECTORY = (ENABLE = TRUE) ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

Step 1 — Upload Resume PDFs

Step 2 — Store Job Postings as Structured Input

CREATE OR REPLACE TABLE SMART_HIRE.RAW.JOB_POSTINGS (

JOB_ID VARCHAR PRIMARY KEY,

JOB_TITLE VARCHAR,

DEPARTMENT VARCHAR,

LOCATION VARCHAR,

JOB_TYPE VARCHAR,

EXPERIENCE_REQ VARCHAR,

SALARY_RANGE VARCHAR,

JOB_DESCRIPTION VARCHAR,

REQUIRED_SKILLS VARCHAR,

POSTED_DATE DATE,

STATUS VARCHAR DEFAULT 'OPEN'

);

Why we did this

A matching engine always needs two sides:

candidate information
job requirements

Resumes arrive as unstructured PDFs, but job postings are already structured enough to store directly in a relational table.

comparison.

Step 3 — Parse Resume PDFs into Text

CREATE OR REPLACE TABLE SMART_HIRE.RAW.PARSED_RESUMES AS

SELECT

RELATIVE_PATH AS FILE_NAME,

SIZE AS FILE_SIZE_BYTES,

LAST_MODIFIED AS UPLOAD_TIMESTAMP,

AI_PARSE_DOCUMENT(

TO_FILE('@SMART_HIRE.RAW.RESUMES', RELATIVE_PATH),

{'mode': 'LAYOUT'}

) AS PARSED_CONTENT,

PARSED_CONTENT:content::VARCHAR AS RESUME_TEXT

FROM DIRECTORY(@SMART_HIRE.RAW.RESUMES)

WHERE RELATIVE_PATH ILIKE '%.pdf';

Why we did this

This is the first major transformation in the pipeline.

Recruiters do not upload JSON or rows. They upload resumes, and resumes are usually PDFs. So the first challenge is: how do we turn a document into usable text without leaving Snowflake?

That is exactly why AI_PARSE_DOCUMENT is so valuable here.Because it lets us extract the textual content of a PDF inside Snowflake.

Step 4 — Build the Candidate Registry

CREATE OR REPLACE TABLE SMART_HIRE.RAW.CANDIDATES AS

WITH base AS (

SELECT

'CAND-' || LPAD(ROW_NUMBER() OVER (ORDER BY FILE_NAME)::VARCHAR, 4, '0') AS CANDIDATE_ID,

FILE_NAME,

UPLOAD_TIMESTAMP AS APPLIED_DATE,

AI_EXTRACT(

text => RESUME_TEXT,

responseFormat => {

'full_name': 'Full name of the candidate',

'email': 'Email address',

'phone': 'Phone number if available',

'linkedin': 'LinkedIn URL if available'

}

) AS CONTACT_INFO,

RESUME_TEXT

FROM SMART_HIRE.RAW.PARSED_RESUMES

)

SELECT

CANDIDATE_ID,

FILE_NAME,

APPLIED_DATE,

CONTACT_INFO,

CONTACT_INFO:response:full_name::VARCHAR AS FULL_NAME,

CONTACT_INFO:response:email::VARCHAR AS EMAIL,

CONTACT_INFO:response:phone::VARCHAR AS PHONE,

CONTACT_INFO:response:linkedin::VARCHAR AS LINKEDIN,

CONTACT_INFO:error::VARCHAR AS EXTRACT_ERROR,

RESUME_TEXT

FROM base;

Why we did this

At this stage, the resume is still just text. That is useful, but it is not recruiter-friendly yet.A recruiter wants immediately identifiable fields like:

name
email
phone
LinkedIn

This is why I created a separate CANDIDATES table.

Step 5 — Extract Deeper Candidate Attributes

CREATE OR REPLACE TABLE SMART_HIRE.ENRICHED.CANDIDATE_PROFILES AS

SELECT

c.CANDIDATE_ID,

c.FULL_NAME,

c.EMAIL,

AI_EXTRACT(

text => c.RESUME_TEXT,

responseFormat => {

'technical_skills': 'List all technical tools, languages, and frameworks mentioned',

'years_experience': 'Total years of professional experience as a number',

'highest_education': 'Highest degree and university name',

'most_recent_role': 'Most recent job title',

'most_recent_company': 'Most recent employer name',

'key_achievements': 'Top 3 quantifiable achievements with metrics'

}

)::VARIANT AS EXTRACTED_PROFILE,

c.RESUME_TEXT,

c.APPLIED_DATE

FROM SMART_HIRE.RAW.CANDIDATES c;

Why we did this

Contact information is not enough to rank candidates. We need deeper business attributes that a recruiter would actually care about.

This table enriches each candidate profile with hiring-relevant information.

Step 6 — Classify Candidate Seniority

CREATE OR REPLACE TABLE SMART_HIRE.ENRICHED.CANDIDATE_SENIORITY AS

SELECT

c.CANDIDATE_ID,

c.FULL_NAME,

AI_CLASSIFY(

c.RESUME_TEXT,

['Entry Level (0-1 years)', 'Junior (1-3 years)', 'Mid-Level (3-5 years)', 'Senior (5-8 years)', 'Staff/Principal (14+ years)']

) AS SENIORITY_CLASSIFICATION

FROM SMART_HIRE.RAW.CANDIDATES c;

Why we did this

Years of experience as a raw number is useful, but recruiters also think in business labels like:

Junior
Mid-Level
Senior
Staff/Principal

This step translates raw resume content into one of those categories.

Step 7 — Generate Embeddings for Semantic Matching

CREATE OR REPLACE TABLE SMART_HIRE.ENRICHED.RESUME_EMBEDDINGS AS SELECT CANDIDATE_ID, FULL_NAME, AI_EMBED('snowflake-arctic-embed-l-v2.0', RESUME_TEXT) AS RESUME_VECTOR FROM SMART_HIRE.RAW.CANDIDATES;

CREATE OR REPLACE TABLE SMART_HIRE.ENRICHED.JOB_EMBEDDINGS AS SELECT JOB_ID, JOB_TITLE, AI_EMBED( 'snowflake-arctic-embed-l-v2.0', JOB_TITLE || ' ' || JOB_DESCRIPTION || ' ' || REQUIRED_SKILLS ) AS JOB_VECTOR FROM SMART_HIRE.RAW.JOB_POSTINGS;

Why embed both sides?

Because matching is a comparison problem. We need both:

candidate representation
job representation

Step 8 — Calculate Candidate-to-Job Similarity

CREATE OR REPLACE TABLE SMART_HIRE.MATCHING.SIMILARITY_SCORES AS SELECT r.CANDIDATE_ID, r.FULL_NAME, j.JOB_ID, j.JOB_TITLE, ROUND(VECTOR_COSINE_SIMILARITY(r.RESUME_VECTOR, j.JOB_VECTOR)::NUMERIC, 4) AS MATCH_SCORE FROM SMART_HIRE.ENRICHED.RESUME_EMBEDDINGS r CROSS JOIN SMART_HIRE.ENRICHED.JOB_EMBEDDINGS j;

Why we did this

Now that we have semantic vectors, we need a measurable score between each candidate and each job.

That is what VECTOR_COSINE_SIMILARITY gives us.

Why `CROSS JOIN`?

Because at this stage, the goal is to compare every candidate against every job. This gives us a full candidate-job scoring matrix.

Step 9 — Generate Recruiter-Friendly Summaries

CREATE OR REPLACE TABLE SMART_HIRE.MATCHING.MATCH_SUMMARIES AS WITH TOP_MATCHES AS ( SELECT * FROM SMART_HIRE.MATCHING.SIMILARITY_SCORES QUALIFY ROW_NUMBER() OVER (PARTITION BY JOB_ID ORDER BY MATCH_SCORE DESC) <= 3 ) SELECT tm.CANDIDATE_ID, tm.FULL_NAME, tm.JOB_ID, tm.JOB_TITLE, tm.MATCH_SCORE, AI_COMPLETE( 'mistral-large2', 'You are an expert HR recruiter. Given the job description and candidate resume below, write a concise 3-sentence assessment explaining: 1) Why this candidate is or is not a good fit 2) Their strongest relevant qualification 3) Any gap or concern

JOB: ' || jp.JOB_TITLE || ' — ' || jp.JOB_DESCRIPTION || ' REQUIRED SKILLS: ' || jp.REQUIRED_SKILLS || '

CANDIDATE RESUME: ' || c.RESUME_TEXT )::VARCHAR AS RECRUITER_SUMMARY FROM TOP_MATCHES tm JOIN SMART_HIRE.RAW.JOB_POSTINGS jp ON tm.JOB_ID = jp.JOB_ID JOIN SMART_HIRE.RAW.CANDIDATES c ON tm.CANDIDATE_ID = c.CANDIDATE_ID;