Lead Developer - Snowflake + dbt
Job Title: Lead Developer - Snowflake + dbt
Location: New Jersey, US
Experience: 10–15 Years
Employment Type: Full-time
Job Summary
We are seeking a highly experienced Lead Data Engineer (10+ years) with deep expertise in Snowflake, dbt, Apache Airflow, and StreamSets, and strong hands-on experience designing enterprise-grade ETL/ELT, data migration, and multi-source ingestion frameworks in the Life Sciences domain.
This role will lead large-scale data platform modernization initiatives, including legacy-to-cloud migrations, cross-system integrations, and enterprise data harmonization in regulated environments.
Key Responsibilities
1. Snowflake Architecture & Enterprise Data Platform Design
- Lead architecture and implementation of scalable Snowflake data platforms:
  - Multi-layered architecture (Landing → Raw → Staging → Curated → Data Marts)
  - Compute-storage separation for independent scaling and cost optimization
  - Multi-cluster warehouses and workload isolation
- Design secure cross-account data sharing strategies.
- Implement:
  - Snowpipe for automated ingestion
  - Streams & Tasks for CDC-based incremental processing (see the sketch after this section)
  - Time Travel & zero-copy cloning for environment management
- Implement data masking, row-level security, and RBAC frameworks.
- Optimize storage, clustering and micro-partition pruning, and query performance.
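To illustrate the Streams & Tasks CDC pattern above, here is a minimal sketch using snowflake-connector-python; the credentials and object names (RAW.ORDERS, CURATED.ORDERS, TRANSFORM_WH) are hypothetical placeholders, not a prescribed design.

```python
# Minimal Streams & Tasks setup for CDC-based incremental processing.
# All connection details and object names below are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",       # hypothetical credentials
    user="etl_user",
    password="...",
    warehouse="TRANSFORM_WH",
    database="ANALYTICS",
)

statements = [
    # Capture inserts, updates, and deletes on the raw table.
    "CREATE OR REPLACE STREAM RAW.ORDERS_STREAM ON TABLE RAW.ORDERS",
    # Merge captured changes on a schedule, but only when the stream
    # actually has data to process.
    """
    CREATE OR REPLACE TASK RAW.MERGE_ORDERS
      WAREHOUSE = TRANSFORM_WH
      SCHEDULE = '15 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('RAW.ORDERS_STREAM')
    AS
      MERGE INTO CURATED.ORDERS t
      USING RAW.ORDERS_STREAM s ON t.ORDER_ID = s.ORDER_ID
      WHEN MATCHED THEN UPDATE SET t.STATUS = s.STATUS
      WHEN NOT MATCHED THEN INSERT (ORDER_ID, STATUS)
        VALUES (s.ORDER_ID, s.STATUS)
    """,
    # Tasks are created suspended; resume to activate.
    "ALTER TASK RAW.MERGE_ORDERS RESUME",
]

with conn.cursor() as cur:
    for stmt in statements:
        cur.execute(stmt)
conn.close()
```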
2. Data Migration & Modernization
- Lead end-to-end data migration initiatives, including:
  - Legacy data warehouses (Teradata, Oracle, SQL Server, Netezza) to Snowflake
  - On-prem to cloud modernization programs
- Conduct:
  - Source system analysis and profiling
  - Data quality assessment and remediation planning
  - Schema conversion and transformation mapping
- Design migration frameworks:
  - Bulk historical data loads
  - Incremental migration strategies
  - Parallel-run validation strategies
- Perform reconciliation and data validation between legacy and target systems.
- Develop automated validation scripts using SQL and dbt tests (see the sketch after this list).
- Support cutover planning and production readiness.
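To show the kind of automated validation scripting this covers, here is a minimal reconciliation sketch, assuming snowflake-connector-python for the target and pyodbc for the legacy source; the table list and the checksum column (ROW_AMOUNT) are hypothetical.

```python
# Row-count and aggregate-checksum reconciliation between a legacy
# warehouse and Snowflake. Connection details, schemas, and column
# names are illustrative placeholders.
import pyodbc
import snowflake.connector

TABLES = ["ORDERS", "PATIENTS", "SITES"]  # hypothetical migrated tables

legacy = pyodbc.connect("DSN=legacy_dw")  # e.g. Teradata/Oracle via ODBC
target = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",
    warehouse="VALIDATE_WH", database="MIGRATED",
)

def scalar(conn, sql):
    """Run a query and return the single scalar value it produces."""
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchone()[0]

for table in TABLES:
    legacy_count = scalar(legacy, f"SELECT COUNT(*) FROM DW.{table}")
    target_count = scalar(target, f"SELECT COUNT(*) FROM PUBLIC.{table}")
    # A cheap aggregate checksum on a stable numeric column catches
    # silent truncation that row counts alone would miss.
    legacy_sum = scalar(legacy, f"SELECT SUM(ROW_AMOUNT) FROM DW.{table}")
    target_sum = scalar(target, f"SELECT SUM(ROW_AMOUNT) FROM PUBLIC.{table}")
    ok = (legacy_count, legacy_sum) == (target_count, target_sum)
    print(f"{table}: counts {legacy_count}/{target_count}, "
          f"sums {legacy_sum}/{target_sum} -> {'OK' if ok else 'MISMATCH'}")
```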
3. Data Ingestion & Multi-Source Integration
Design and implement ingestion frameworks for structured, semi-structured, and unstructured data from multiple enterprise systems:
Structured Sources
- Oracle, SQL Server, SAP, PostgreSQL
- Clinical systems (EDC, CDMS, CTMS)
- Regulatory systems (RIM)
- Commercial systems (CRM, ERP)
Semi-Structured Sources
- JSON, XML, Avro files
- API responses
- External vendor feeds
Unstructured Sources (where applicable)
- Document metadata ingestion
- Log and audit trail ingestion
Ingestion Responsibilities
- Build ingestion pipelines using:
  - StreamSets for batch and streaming ingestion
  - Snowpipe with cloud storage integration (S3/Azure Blob/GCS)
  - API-driven ingestion frameworks
- Implement CDC mechanisms using:
  - Database log-based CDC
  - Timestamp-based incremental extraction
  - Snowflake Streams
- Develop metadata-driven ingestion frameworks (see the sketch after this list).
- Design resilient pipelines with error handling, retry logic, and monitoring.
- Ensure schema evolution handling and version control.
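A minimal sketch of a metadata-driven ingestion loop with retry logic, assuming snowflake-connector-python; in practice the source list would live in a metadata table, and the stage, table, and format names here are illustrative.

```python
# Metadata-driven ingestion: each source is described by a config
# record rather than hand-written pipeline code, and failed loads are
# retried with linear backoff. All names are placeholders.
import time
import snowflake.connector

SOURCES = [  # in practice, read from a metadata table
    {"name": "sap_orders",  "stage": "@LANDING.SAP_STAGE",    "table": "RAW.SAP_ORDERS",  "format": "CSV"},
    {"name": "vendor_feed", "stage": "@LANDING.VENDOR_STAGE", "table": "RAW.VENDOR_FEED", "format": "JSON"},
]

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",
    warehouse="INGEST_WH", database="ANALYTICS",
)

def load_source(src, retries=3, backoff_seconds=30):
    """COPY one source's staged files into its raw table, with retries."""
    sql = (f"COPY INTO {src['table']} FROM {src['stage']} "
           f"FILE_FORMAT = (TYPE = {src['format']})")
    for attempt in range(1, retries + 1):
        try:
            with conn.cursor() as cur:
                cur.execute(sql)
            print(f"{src['name']}: loaded")
            return
        except snowflake.connector.Error as exc:
            print(f"{src['name']}: attempt {attempt} failed: {exc}")
            time.sleep(backoff_seconds * attempt)
    raise RuntimeError(f"{src['name']}: all {retries} attempts failed")

for src in SOURCES:
    load_source(src)
```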
4. dbt – Enterprise Transformation Framework
- Architect and govern dbt transformation layers:
  - Staging models
  - Intermediate models
  - Data marts
- Implement:
  - Incremental models (see the sketch after this section)
  - Snapshot strategies for historical tracking
  - Surrogate key management
- Develop custom macros and reusable transformation components.
- Implement a comprehensive dbt testing framework:
  - Source freshness tests
  - Schema validation tests
  - Business rule validation
- Generate lineage documentation for audit and regulatory needs.
- Optimize dbt models for Snowflake compute efficiency.
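For illustration, a minimal incremental model written with dbt's Python model API, which dbt supports on Snowflake via Snowpark; the upstream model stg_orders and the order_id/updated_at columns are hypothetical.

```python
# models/marts/fct_orders.py -- a dbt Python model. On incremental runs
# it only processes rows newer than the latest timestamp already loaded
# into the target table. Model and column names are illustrative.
import snowflake.snowpark.functions as F

def model(dbt, session):
    dbt.config(
        materialized="incremental",
        unique_key="order_id",
    )

    orders = dbt.ref("stg_orders")  # hypothetical staging model

    if dbt.is_incremental:
        # dbt.this refers to the existing target relation.
        max_ts = session.sql(
            f"select max(updated_at) from {dbt.this}"
        ).collect()[0][0]
        if max_ts is not None:
            orders = orders.filter(F.col("updated_at") > F.lit(max_ts))

    return orders
```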
5. ETL / ELT Orchestration & Automation
- Design ELT-first architecture leveraging Snowflake processing power.
- Orchestrate complex workflows using Apache Airflow (see the DAG sketch after this section):
  - DAG dependency management
  - SLA monitoring
  - Automated recovery workflows
- Implement CI/CD for:
  - dbt deployments
  - Airflow pipelines
  - Snowflake objects
- Build data observability frameworks (pipeline monitoring, anomaly detection).
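A minimal Airflow DAG sketch showing the dependency management, retries, and SLA monitoring described above; the schedule, commands, and task names are placeholders.

```python
# Daily ELT orchestration: ingest sources, run dbt transformations,
# then run dbt tests, with retries and an SLA on every task.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-platform",
    "retries": 2,                         # automated recovery on failure
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=2),            # flags runs that miss the SLA
}

with DAG(
    dag_id="daily_elt",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = BashOperator(
        task_id="ingest_sources",
        bash_command="python ingest.py",  # hypothetical entry point
    )
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --target prod",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --target prod",
    )

    # Explicit dependency management: ingest, transform, then validate.
    ingest >> dbt_run >> dbt_test
```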
6. Enterprise Data Modelling
- Design scalable data models:
  - Dimensional (Star/Snowflake schemas)
  - Data Vault 2.0 for auditability and traceability (see the hash-key sketch after this section)
  - Canonical data models
- Align models with Life Sciences business domains:
  - Clinical trial lifecycle
  - Regulatory submissions
  - Pharmacovigilance
  - Commercial analytics
- Support cross-domain data harmonization.
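As a small illustration of Data Vault 2.0 key handling, here is a sketch of deterministic hash-key generation for hub loading; MD5 over normalized, delimited business keys is a common Data Vault convention, and the example keys are hypothetical.

```python
# Data Vault 2.0-style hash keys: business keys are normalized, joined
# with a delimiter, and hashed so the same key always yields the same
# surrogate, keeping loads auditable and repeatable across systems.
import hashlib

def hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Deterministic surrogate hash key from one or more business keys."""
    normalized = delimiter.join(
        (k or "").strip().upper() for k in business_keys
    )
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# Example: a hub key for a clinical trial site, combining trial and
# site identifiers (hypothetical business keys).
print(hash_key("NCT01234567", "SITE-042"))
```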
7. Life Sciences Domain Expertise
Experience delivering data platforms supporting:
- Clinical trial data (EDC, CDMS, CTMS)
- Regulatory and submission systems
- Pharmacovigilance & safety systems
- Commercial & sales analytics
- Real-World Evidence (RWE)
Ensure compliance with:
- GxP validation standards
- 21 CFR Part 11
- HIPAA / GDPR
- ALCOA+ principles
Support audit readiness and regulatory traceability.
Required Qualifications
- 10+ years of experience in Data Engineering and Enterprise Data Platforms.
- 4–6+ years of hands-on Snowflake implementation experience.
- Strong experience in:
  - Large-scale data migration programs
  - Multi-source data ingestion frameworks
  - Advanced dbt transformation design
  - Apache Airflow orchestration
  - StreamSets ingestion pipelines
- Advanced SQL expertise.
- Experience in Life Sciences domain projects.
- Cloud platform experience (AWS/Azure/GCP).