Data Acquisition & Infrastructure Engineer
Master Art Index
Job Description
About Master Art Index

Master Art Index (MAI) is building the definitive intelligence layer for the global art market. We develop AI-driven valuation models, structured databases, and financial-grade data infrastructure for blue-chip modern and contemporary artworks. Our platform sits at the intersection of finance, technology, and art, enabling institutional-quality analysis in one of the world's most opaque asset classes. We are a cross-functional team of AI engineers, computer vision specialists, product managers, and full-stack developers. We move fast, value ownership, and build things that don't yet exist.

Role Overview

As Data Acquisition & Infrastructure Engineer, you will be the foundation of MAI's data capabilities. Your primary focus will be the development and operation of automated data collection pipelines that aggregate publicly available information from across the art market ecosystem. You will also own the design and maintenance of the underlying database infrastructure, built on PostgreSQL and AWS, that stores and serves this data. This is a high-ownership role.
The infrastructure is partially built; you will take it to production scale. You will work closely with AI engineers, a computer vision engineer, a product manager, full-stack developers, and the Head of Art Research, reporting directly to the Head of AI Engineering.

Key Responsibilities

DATA PIPELINE & COLLECTION
• Architect and maintain automated pipelines that collect, normalize, and ingest publicly available art market data from web-based sources
• Build reliable, maintainable collection systems using Python (Scrapy, BeautifulSoup, Playwright, or equivalent), with a strong emphasis on resilience, scheduling, and data freshness
• Manage pipeline orchestration and scheduling using tools such as Apache Airflow, AWS EventBridge, or cron
• Navigate the practical challenges of large-scale public data collection, including access patterns, rate constraints, and source reliability
• Handle messy, inconsistent real-world datasets: clean, transform, and standardize data for downstream consumption

DATABASE ENGINEERING
• Design, build, and maintain relational database schemas in PostgreSQL (hosted on Amazon RDS) to support complex, multi-entity art market data: artists, works, transactions, provenance, and valuation history
• Develop and optimize queries, indexes, and data models to ensure performance at scale
• Establish and enforce data quality standards, validation rules, and integrity constraints across the database
• Collaborate with AI engineers and the computer vision team to ensure the data layer supports model training and inference requirements

INFRASTRUCTURE & OPERATIONS
• Deploy and manage pipeline workloads on AWS (Lambda, EC2, S3, RDS)
• Monitor pipeline health, data freshness, and system reliability; proactively address failures
• Contribute to infrastructure-as-code practices as the team scales

Core Requirements
• 3–5 years of professional experience in data engineering or a closely related discipline
• Proven experience building and maintaining automated data collection pipelines from web-based public sources using Python (Scrapy, BeautifulSoup, Playwright, or Selenium)
• Strong data cleaning and normalization skills, with demonstrated ability to handle heterogeneous, inconsistent real-world datasets
• Solid PostgreSQL experience: schema design, query optimization, and database maintenance
• Hands-on AWS experience: Lambda, EC2, S3, RDS
• Experience scheduling and orchestrating data pipelines (Apache Airflow, AWS EventBridge, or equivalent)
• Experience navigating the constraints of large-scale public data collection, including reliability, access patterns, and data freshness challenges

Nice to Have
• Knowledge of data quality frameworks and validation pipeline design
• Experience with containerization (Docker) and infrastructure-as-code (Terraform, AWS CDK)
• Familiarity with ETL/ELT tooling (dbt, AWS Glue, or equivalent)
• Exposure to art market platforms (Christie's, Sotheby's, Artsy, Artnet) or understanding of how auction and gallery data is structured
• Background or genuine interest in the art world, collectibles, or alternative asset markets
• Experience in a startup or early-stage environment where ownership and adaptability are essential

What We Offer
• A technically differentiated problem set: art market data is sparse, inconsistent, and largely unstructured; making it machine-readable is genuinely hard
• High ownership from day one on infrastructure that is already in use and growing
• A lean, senior cross-functional team: no bureaucracy, fast decisions, direct access to leadership
• Proximity to cutting-edge AI and computer vision work applied to a non-commoditized domain
• Flexible Friday remote work with four collaborative in-office days in Montreal
• Competitive base compensation: $80,000–$90,000 CAD, commensurate with experience

Team & Reporting

This role reports to the Head of AI Engineering and works in close collaboration with:
• AI Engineers
• Computer Vision Engineer
• Product Manager
• Full-Stack Developers
• Head of Art Research