We process data from 200,000+ employer career portals and major job boards, running large-scale scraping infrastructure, complex ETL pipelines, and ML enrichment models across billions of records. This is not a typical web app role. You will work directly with one of the largest job market datasets in the world: 900M+ unique postings, each enriched with 82 fields.
What You Will Do
- Build and maintain web scraping systems that collect job postings from thousands of sources using Scrapy, Playwright, and custom crawlers.
- Design and optimize data processing pipelines that clean, deduplicate, and transform raw job postings into structured, enriched records.
- Work across our database layer (PostgreSQL, MongoDB, Redis, Aerospike, and ClickHouse), where each store serves a specific role in our data architecture.
- Write Python scripts and services for data ingestion, validation, and quality assurance across the pipeline.
- Deploy and monitor you...