Senior Data Engineer

Incedo Inc

Hayes Valley, CA · Permanent · IT

Job Description

Role Description

We are seeking an experienced Senior Data Engineer with deep expertise in Databricks, PySpark, and cloud data platform delivery. The role involves owning the design and build of production data pipelines on Databricks, leading technical decisions on lakehouse architecture, and working directly with client stakeholders to translate business requirements into working data products.

This is a hands-on, client-facing position based in South San Francisco (SSF), California. You will serve as the technical lead for a life sciences data platform build, guiding the architecture, mentoring offshore engineers, and owning delivery quality across the engagement.

Role and Responsibilities

Design and build production data pipelines on Databricks using PySpark, Python, and SQL across Bronze, Silver, and Gold medallion layers.
Define lakehouse architecture patterns including Delta Lake table design, partitioning strategy, and compute optimization using Spark/Photon.
Configure and manage Unity Catalog for data governance, access control, lineage tracking, and audit logging.
Build and maintain ingestion frameworks using Databricks Auto Loader, Lakeflow Connect, and batch/API connectors.
Implement data quality checks, validation rules, and monitoring as part of every pipeline deployment.
Set up CI/CD pipelines for Databricks notebooks and jobs using GitHub, Databricks Repos, and Lakeflow Jobs.
Own technical communication with client stakeholders: architecture walkthroughs, design reviews, sprint demos.
Mentor offshore data engineers on Databricks best practices, code quality, and pipeline design standards.

Requirements

7+ years of hands-on data engineering experience.
2+ years working on Databricks (notebooks, workflows, Delta Lake, Unity Catalog).
2+ years working with PySpark or Python in a pipeline development context.
Experience designing data models for analytical and BI workloads (dimensional models, SCD patterns, medallion architecture).
Solid understanding of CI/CD practices for data pipeline deployments (GitHub, Databricks Repos).
Experience operating in client-facing, consulting, or services delivery environments.
Strong communication skills, including comfort presenting architecture decisions to both technical and business stakeholders.
Hands-on experience with Jira and Confluence.
Strong understanding of Agile / Scrum methodologies.

Good to Have

Life sciences, pharma, or biotech domain experience.
Experience with Veeva systems, clinical data, or regulatory data environments.
Exposure to data observability tools (Monte Carlo, Databricks Lakehouse Monitoring).
Experience with Vector Search, ML Runtime, or MLflow on Databricks.
AWS or Azure cloud platform experience alongside Databricks.

Qualifications

A bachelor's degree in Computer Science, Information Systems, Engineering, or a related field. A master's degree may be preferred but is not required.

Important Notice

This listing was syndicated from Adzuna. We strive to keep information accurate, but do not assume responsibility for the content of this posting.