Data Engineering Pipeline

Job Search Intelligence Pipeline

SerpAPI · dlt · Dagster · dbt · Snowflake · Medallion Architecture

Source
🔍 SerpAPI · Google Jobs Live API
Queries: Data Engineer · Analytics Engineer · Business Analyst
Date filter: after:YYYY-MM-DD appended dynamically · conserves API credits · no stale listings
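The dynamic date filter can be sketched as follows. This is a minimal illustration, not the pipeline's actual ingest code: the helper names and the one-day lookback are assumptions, and only `engine`, `q`, and `api_key` are taken from SerpAPI's documented request parameters.

```python
from datetime import date, timedelta

# Role queries listed in the Source block.
ROLE_QUERIES = ["Data Engineer", "Analytics Engineer", "Business Analyst"]


def build_query(role: str, today: date, lookback_days: int = 1) -> str:
    """Append an after:YYYY-MM-DD filter so only recent postings are requested.

    lookback_days is a hypothetical knob; the source does not state the window.
    """
    cutoff = today - timedelta(days=lookback_days)
    return f"{role} after:{cutoff.isoformat()}"


def build_params(role: str, today: date, api_key: str) -> dict:
    """Assemble the request parameters for one Google Jobs search."""
    return {
        "engine": "google_jobs",
        "q": build_query(role, today),
        "api_key": api_key,
    }
```

Because the cutoff is computed at run time, each daily run requests only new postings, which is how the filter saves credits and avoids stale listings.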
dlt · schema inference · Snowflake load
Bronze
🥉 Raw Landing Zone · Snowflake raw_jobs
JSON flattened by dlt · no transformation applied · orchestrated by Dagster daily
ingest.py asset · absolute imports · scheduled via Dagster definitions
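To show what "JSON flattened" means in practice, here is a small pure-Python sketch of flattening nested objects into column-like keys. It illustrates the idea only and is not dlt's implementation. The double-underscore separator follows dlt's documented naming for nested fields; the sample record and its field names are hypothetical.

```python
def flatten(record: dict, parent: str = "", sep: str = "__") -> dict:
    """Flatten nested dicts into one level of column-like keys.

    Lists are left untouched here; a real loader may handle them differently.
    """
    flat = {}
    for key, value in record.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse so that {"a": {"b": 1}} becomes {"a__b": 1}.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical raw job record, for illustration only.
raw = {"title": "Data Engineer", "detected_extensions": {"posted_at": "2 days ago"}}
flat = flatten(raw)
```

The values themselves are unchanged; only the shape is made tabular, which matches the "no transformation applied" intent of the Bronze layer.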
dbt run · stg_jobs.sql · transform.py asset
Silver
🥈 Staging View · Intelligence Layer stg_jobs
Four intelligence signals extracted from raw job data
🎯 Seniority flag: REGEXP_LIKE title · Sr · Lead · Principal · Staff · Director · VP
📅 Experience extract: POSIX ERE regex · Snowflake-safe · TRY_TO_NUMBER
🛂 Sponsorship flags: is_citizen_only · is_potential_lead · LIKE pattern match
🧹 Schema enforce: Type casting · Null handling · Timestamp added
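The first three signals can be approximated in Python as below. This is a hedged analogue of the SQL in stg_jobs.sql, not a copy of it. The function names, the phrase lists, the extra "senior" keyword, and the exact experience pattern are all assumptions made for illustration.

```python
import re

# Title keywords from the Seniority flag card; "senior" is added as an assumption.
SENIORITY_PATTERN = re.compile(
    r"\b(sr|senior|lead|principal|staff|director|vp)\b", re.IGNORECASE
)

# Restricted to constructs that POSIX ERE also has: bracket classes,
# plain groups and alternation, no \d and no lookarounds.
EXPERIENCE_PATTERN = re.compile(
    r"([0-9]+)[+]?( ?(-|to) ?[0-9]+)? ?(years?|yrs?)", re.IGNORECASE
)

# Hypothetical phrases; the real LIKE patterns are not given in the source.
CITIZEN_ONLY_PHRASES = ("us citizen", "u.s. citizen", "citizenship required")


def is_senior(title):
    """True when the title contains one of the seniority keywords."""
    return SENIORITY_PATTERN.search(title or "") is not None


def try_to_number(text):
    """Return an int, or None when text is not numeric (like TRY_TO_NUMBER giving NULL)."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def extract_min_experience(description):
    """Return the first stated number of years, or None when nothing matches."""
    match = EXPERIENCE_PATTERN.search(description or "")
    return try_to_number(match.group(1)) if match else None


def contains_any(text, phrases):
    """Case-insensitive substring check, the analogue of LIKE '%phrase%'."""
    lowered = (text or "").lower()
    return any(phrase in lowered for phrase in phrases)
```

For a range such as "3-5 years" the sketch keeps the lower bound, which fits a column named min_experience_years. A posting with no stated experience yields None, the counterpart of a NULL in Silver.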
dbt run · fct_jobs.sql · tiered filters applied
Gold
🥇 Curated Strike List · Fact Table fct_jobs
Tiered filters applied by role category · deduplicated · ranked by discovery time
Experience thresholds: DE / AE 1 to 3 years · BA / Ops up to 6 years · NULL experience included
Sponsorship filter: Exclude citizen-only roles · Exclude no-sponsorship · Exclude senior titles
Deduplication: ROW_NUMBER() QUALIFY · Partition by job_key · Latest discovery wins
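The Gold logic can be mirrored in plain Python as a rough sketch. It is not the contents of fct_jobs.sql. The category labels, the `is_senior` and `no_sponsorship` keys, the filter-then-deduplicate order, and the newest-first ranking are assumptions; the thresholds and the "latest discovery wins" rule come from the cards above.

```python
# (min, max) years per search category, from the Experience thresholds card.
# Category labels are hypothetical; the Ops category is omitted.
EXPERIENCE_RANGE = {
    "Data Engineer": (1, 3),
    "Analytics Engineer": (1, 3),
    "Business Analyst": (0, 6),
}


def passes_filters(job: dict) -> bool:
    """Apply the sponsorship, seniority, and tiered experience filters."""
    if job["is_citizen_only"] or job["no_sponsorship"] or job["is_senior"]:
        return False
    years = job["min_experience_years"]
    if years is None:
        # NULL experience is included rather than dropped.
        return True
    low, high = EXPERIENCE_RANGE[job["search_category"]]
    return low <= years <= high


def latest_per_key(jobs: list) -> list:
    """Keep one row per job_key, the one with the greatest discovery_time.

    Equivalent in spirit to ROW_NUMBER() OVER (PARTITION BY job_key
    ORDER BY discovery_time DESC) with QUALIFY ... = 1.
    """
    latest = {}
    for job in jobs:
        current = latest.get(job["job_key"])
        if current is None or job["discovery_time"] > current["discovery_time"]:
            latest[job["job_key"]] = job
    # Rank newest first (the ranking direction is an assumption).
    return sorted(latest.values(), key=lambda j: j["discovery_time"], reverse=True)


def build_strike_list(jobs: list) -> list:
    """Filter first, then deduplicate and rank."""
    return latest_per_key([job for job in jobs if passes_filters(job)])
```

Here discovery_time can be any sortable value, such as an ISO 8601 string or a datetime.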
Output
📋 Daily Strike List · GOLD.FCT_JOBS · Refreshed daily
Curated, deduplicated, sponsorship-friendly job leads ready for daily review
Columns delivered: job_title · company_name · job_location · search_category · min_experience_years · posted_at_relative · discovery_time · apply_link
Daily operations
Run: Dagster Materialize All
View: GOLD.FCT_JOBS
Update filters: edit .sql files
Cost: zero Snowflake credits for filter-only changes
SerpAPI · dlt · Dagster · dbt · Snowflake · Medallion Architecture · POSIX ERE · Python
💡 Built to solve a real problem: filter at the source, apply intelligence in Silver, deliver clean leads in Gold. Every engineering decision in this pipeline was driven by a real error, a real cost constraint, or a real limitation discovered during development.