Lead Developer, Snowflake + dbt

Raritan, NJ
Contracted
Experienced

Job Title: Lead Developer, Snowflake + dbt
Location: New Jersey, US
Experience: 10–15 Years
Employment Type: Full-time

Job Summary

We are seeking a highly experienced Lead Data Engineer (10+ years) with deep expertise in Snowflake, dbt, Apache Airflow, and StreamSets, and strong hands-on experience designing enterprise-grade ETL/ELT pipelines, data migration programs, and multi-source ingestion frameworks within the Life Sciences domain.

This role will lead large-scale data platform modernization initiatives, including legacy-to-cloud migrations, cross-system integrations, and enterprise data harmonization in regulated environments.

Key Responsibilities

1. Snowflake Architecture & Enterprise Data Platform Design

  • Lead architecture and implementation of scalable Snowflake data platforms:
    • Multi-layered architecture (Landing → Raw → Staging → Curated → Data Marts)
    • Separation of compute and storage, with cost and performance optimization
    • Multi-cluster warehouses and workload isolation
  • Design secure cross-account data sharing strategies.
  • Implement:
    • Snowpipe for automated ingestion
    • Streams & Tasks for CDC-based incremental processing
    • Time Travel & Zero-copy cloning for environment management
  • Implement data masking, row-level security, and RBAC frameworks.
  • Optimize storage, clustering (micro-partition pruning), and query performance.

2. Data Migration & Modernization

  • Lead end-to-end data migration initiatives including:
    • Legacy data warehouse (Teradata, Oracle, SQL Server, Netezza) to Snowflake
    • On-prem to cloud modernization programs
  • Conduct:
    • Source system analysis and profiling
    • Data quality assessment and remediation planning
    • Schema conversion and transformation mapping
  • Design migration frameworks:
    • Bulk historical data loads
    • Incremental migration strategies
    • Parallel-run validation strategies
  • Perform reconciliation and data validation between legacy and target systems.
  • Develop automated validation scripts using SQL and dbt tests.
  • Support cutover planning and production readiness.

3. Data Ingestion & Multi-Source Integration

Design and implement ingestion frameworks for structured, semi-structured, and unstructured data from multiple enterprise systems:

Structured Sources

  • Oracle, SQL Server, SAP, PostgreSQL
  • Clinical systems (EDC, CDMS, CTMS)
  • Regulatory systems (RIM)
  • Commercial systems (CRM, ERP)

Semi-Structured Sources

  • JSON, XML, Avro files
  • API responses
  • External vendor feeds

Unstructured Sources (where applicable)

  • Document metadata ingestion
  • Log and audit trail ingestion

Ingestion Responsibilities

  • Build ingestion pipelines using:
    • StreamSets for batch and streaming ingestion
    • Snowpipe with cloud storage integration (S3/Azure Blob/GCS)
    • API-driven ingestion frameworks
  • Implement CDC mechanisms using:
    • Database log-based CDC
    • Timestamp-based incremental extraction
    • Snowflake Streams
  • Develop metadata-driven ingestion frameworks.
  • Design resilient pipelines with error handling, retry logic, and monitoring.
  • Ensure schema evolution handling and version control.

4. dbt – Enterprise Transformation Framework

  • Architect and govern dbt transformation layers:
    • Staging models
    • Intermediate models
    • Data marts
  • Implement:
    • Incremental models
    • Snapshot strategies for historical tracking
    • Surrogate key management
  • Develop custom macros and reusable transformation components.
  • Implement a comprehensive dbt testing framework:
    • Source freshness tests
    • Schema validation tests
    • Business rule validation
  • Generate lineage documentation for audit and regulatory needs.
  • Optimize dbt models specifically for Snowflake compute efficiency.

5. ETL / ELT Orchestration & Automation

  • Design ELT-first architecture leveraging Snowflake processing power.
  • Orchestrate complex workflows using Apache Airflow:
    • DAG dependency management
    • SLA monitoring
    • Automated recovery workflows
  • Implement CI/CD for:
    • dbt deployments
    • Airflow pipelines
    • Snowflake objects
  • Build data observability frameworks (pipeline monitoring, anomaly detection).

6. Enterprise Data Modeling

  • Design scalable data models:
    • Dimensional (Star/Snowflake schemas)
    • Data Vault 2.0 (for auditability and traceability)
    • Canonical data models
  • Align models with Life Sciences business domains:
    • Clinical trial lifecycle
    • Regulatory submissions
    • Pharmacovigilance
    • Commercial analytics
  • Support cross-domain data harmonization.

7. Life Sciences Domain Expertise

Experience delivering data platforms supporting:

  • Clinical trial data (EDC, CDMS, CTMS)
  • Regulatory and submission systems
  • Pharmacovigilance & safety systems
  • Commercial & sales analytics
  • Real-World Evidence (RWE)

Ensure compliance with:

  • GxP validation standards
  • 21 CFR Part 11
  • HIPAA / GDPR
  • ALCOA+ principles

Support audit readiness and regulatory traceability.

Required Qualifications

  • 10+ years of experience in Data Engineering and Enterprise Data Platforms.
  • 4–6+ years hands-on Snowflake implementation experience.
  • Strong experience in:
    • Large-scale data migration programs
    • Multi-source data ingestion frameworks
    • Advanced dbt transformation design
    • Apache Airflow orchestration
    • StreamSets ingestion pipelines
  • Advanced SQL expertise.
  • Experience in Life Sciences domain projects.
  • Cloud platform experience (AWS/Azure/GCP).