Data Engineer
ALESAYI HOLDING | العيسائي القابضة
Job Description
Purpose of the role

Build and maintain all data pipelines feeding the Databricks lakehouse, ensuring clean, timely, and governed data flows from every source across the Investment Division’s five portfolio clusters and, in later phases, the Group’s operating divisions. Own the bronze-to-silver-to-gold transformation logic, data quality monitoring, security master management, and the Document Intelligence Engine’s ingestion and classification pipelines.

Key responsibilities

Build and maintain automated ingestion pipelines: Addepar API (daily positions, transactions, cash), Capital IQ API (prices, fundamentals, consensus), Canoe (alternative fund documents), eVestment/Mercer (manager database), and email (Outlook Graph API).
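A minimal sketch of what one such ingestion job might look like on Databricks, assuming a placeholder REST endpoint and payload shape rather than the real Addepar or Capital IQ API contract; `API_BASE`, the `/v1/positions` path, and the `bronze.positions_raw` table are illustrative names:

```python
# Sketch: pull one day's positions from a REST endpoint and land them,
# unmodified, in a bronze Delta table with lineage columns.
import datetime
import requests
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

API_BASE = "https://api.example-custodian.com"          # assumption: placeholder host
TOKEN = dbutils.secrets.get("ingestion", "api-token")   # dbutils is ambient in Databricks notebooks

def ingest_positions(as_of: datetime.date) -> None:
    resp = requests.get(
        f"{API_BASE}/v1/positions",                     # assumption: illustrative route
        params={"date": as_of.isoformat()},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=60,
    )
    resp.raise_for_status()
    rows = resp.json()["data"]                          # assumption: payload shape

    # Bronze keeps the raw records immutable; add ingestion metadata only.
    df = (
        spark.createDataFrame(rows)
        .withColumn("_ingested_at", F.current_timestamp())
        .withColumn("_source", F.lit("custodian_api"))
    )
    df.write.format("delta").mode("append").saveAsTable("bronze.positions_raw")

ingest_positions(datetime.date.today())
```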
Implement the medallion architecture: bronze layer (raw immutable data); silver layer (cleaned, conformed, security-master mapped, FX-normalised, GICS-classified, entity ownership resolved); gold layer (analytics-ready tables optimised for ML and SQL queries).

Build and maintain the security master: map ISIN, CUSIP, SEDOL, Bloomberg ticker, and custodian internal identifiers across all sources, handling corporate actions (mergers, ticker changes, share class conversions).

Build the data quality monitoring framework: automated checks for completeness, timeliness, and reconciliation, with exception alerts to the operations analyst.
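A minimal Delta Live Tables sketch of that bronze-to-silver-to-gold flow, folding in the security-master join and using DLT expectations as row-level quality checks; every table and column name here (`bronze.positions_raw`, `silver.security_master`, `custodian_id`, `usd_rate`, and so on) is an illustrative assumption:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw positions as landed by the ingestion jobs.")
def bronze_positions():
    return spark.read.table("bronze.positions_raw")

@dlt.table(comment="Cleaned positions: security-master mapped, FX-normalised.")
@dlt.expect_or_drop("valid_security", "security_id IS NOT NULL")
@dlt.expect_or_drop("valid_quantity", "quantity IS NOT NULL")
def silver_positions():
    positions = dlt.read("bronze_positions")
    master = spark.read.table("silver.security_master")  # ISIN/CUSIP/SEDOL/ticker map
    fx = spark.read.table("silver.fx_rates")
    return (
        positions
        .join(master, "custodian_id", "left")
        .join(fx, ["currency", "as_of_date"], "left")
        .withColumn("market_value_usd", F.col("market_value") * F.col("usd_rate"))
    )

@dlt.table(comment="Analytics-ready daily positions for ML features and SQL.")
def gold_positions_daily():
    return (
        dlt.read("silver_positions")
        .groupBy("as_of_date", "security_id", "gics_sector")
        .agg(F.sum("market_value_usd").alias("market_value_usd"))
    )
```

Running this as a DLT pipeline also surfaces expectation pass/fail counts in the pipeline event log, which can feed the completeness and timeliness alerts described above.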
Build the Document Intelligence Engine ingestion pipeline: receive documents from email, file drop, and Canoe; route them to the classification and extraction models (built by the ML engineer); store structured output in gold-layer tables and raw text in the vector store.

Build the reverse pipeline: push model outputs (factor scores, anomaly flags, risk metrics) from the lakehouse to Addepar via API as custom position attributes.

In later phases, build ingestion pipelines for the operating divisions’ ERP systems (Nama core banking, UMG DMS, hotel PMS, IFM CMMS, electronics POS).
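A minimal sketch of the reverse push, assuming a hypothetical PATCH endpoint and payload; the real Addepar API contract would dictate the actual route, auth, and attribute schema, and `gold.position_factor_scores` and its columns are illustrative:

```python
# Sketch: read model outputs from a gold table and push them to the
# portfolio system as custom position attributes, one position at a time.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

API_BASE = "https://api.example-portfolio-system.com"   # assumption: placeholder host
TOKEN = dbutils.secrets.get("egress", "api-token")      # Databricks secret scope

scores = (
    spark.read.table("gold.position_factor_scores")
    .select("position_id", "factor_score", "anomaly_flag", "as_of_date")
    .toPandas()  # per-position output is small enough to collect
)

for row in scores.itertuples(index=False):
    resp = requests.patch(
        f"{API_BASE}/v1/positions/{row.position_id}/attributes",  # assumption: illustrative route
        json={
            "factor_score": row.factor_score,
            "anomaly_flag": bool(row.anomaly_flag),
            "as_of": str(row.as_of_date),
        },
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
```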
Monitor pipeline health daily: investigate and resolve ingestion failures, data quality alerts, and schema changes from upstream sources.

Optimise pipeline performance and cost: right-size compute clusters, implement incremental processing, and manage the storage lifecycle.
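One common way to implement the incremental-processing point on Databricks is Auto Loader, which tracks already-ingested files in a checkpoint so each run processes only new arrivals; the paths and table name below are illustrative:

```python
# Sketch: incrementally ingest new files from a landing path into bronze.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/docs/_schema")
    .load("/mnt/landing/documents/")           # assumption: file-drop path
    .writeStream
    .option("checkpointLocation", "/mnt/checkpoints/docs")
    .trigger(availableNow=True)                # drain the backlog, then stop
    .toTable("bronze.documents_raw")
)
```

With `trigger(availableNow=True)` the job drains whatever has landed and shuts down, so it can run on a schedule without a permanently running cluster, which serves the cost-optimisation goal.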
Qualifications and experience

5–8 years of data engineering experience, with at least 2 years on Databricks or equivalent Spark-based platforms.

Strong proficiency in Python, SQL, PySpark, and Delta Live Tables.

Experience building API integrations and working with financial data feeds (custodian files, market data APIs, fund administrator reports).

Understanding of security master management and corporate action processing in financial services.
Experience with Azure Data Factory, Azure Blob Storage, and Azure Functions.

Familiarity with document processing pipelines (PDF parsing, OCR, text extraction).

Bachelor’s degree in computer science, engineering, or equivalent.
Detail-oriented with strong debugging and problem-solving skills.