13-08-2025 13:29:14
Job_302878
5 - 8 years
We are looking for a Senior Data Engineer to design, build, and optimize large-scale data processing systems supporting healthcare analytics and operational reporting. The role involves close collaboration with DataOps, DevOps, and QA teams to deliver scalable, reliable data pipelines.
Key Responsibilities
Design and implement ETL/ELT pipelines using Python and PySpark
Develop scalable data workflows using Apache Spark and AWS Glue
Collaborate with QA and DevOps to integrate CI/CD and testing automation
Manage data lake structures and ensure data quality, lineage, and auditability
Optimize and monitor performance of batch and streaming pipelines
Build infrastructure as code (IaC) using tools such as Terraform and GitHub Actions
Work across structured, semi-structured, and unstructured healthcare datasets
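To illustrate the ETL/ELT pipeline shape the responsibilities above describe, here is a minimal extract/transform/load sketch in plain Python. The dataset, field names, and filtering rules are hypothetical; in the actual role this shape would map onto PySpark DataFrames and AWS Glue jobs rather than the stdlib csv/json modules used here.

```python
import csv
import io
import json

# Hypothetical raw input, standing in for a file landed in S3.
RAW_CSV = """patient_id,age,region
p1,34,north
p2,17,south
p3,52,north
"""

def extract(text):
    # Extract: parse CSV rows into dicts.
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    # Transform: keep adults only and derive an age_band field.
    out = []
    for r in records:
        age = int(r["age"])
        if age >= 18:
            out.append({**r, "age_band": "senior" if age >= 50 else "adult"})
    return out

def load(records):
    # Load: serialize to JSON lines (a stand-in for a data-lake write).
    return "\n".join(json.dumps(r) for r in records)

result = load(transform(extract(RAW_CSV)))
print(result)
```

The same extract/transform/load boundaries are what make a pipeline testable and easy to monitor stage by stage.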
Required Technical Skills
Core & Deep Knowledge Assessment:
Python
PySpark
SQL (including window functions and CASE expressions)
AWS Glue, S3, Lambda
Apache Spark
Apache Airflow
Delta Lake / data lakehouse architecture
CI/CD (Terraform, GitHub Actions)
ETL/ELT pipeline design and optimization
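As an example of the SQL depth listed above, the query below combines a window function (ROW_NUMBER partitioned per patient) with a CASE expression. It runs against an in-memory SQLite table with hypothetical patient-visit data; the same SQL would apply in Spark SQL or a warehouse engine.

```python
import sqlite3

# Small hypothetical visits table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (patient_id TEXT, visit_date TEXT, charge REAL);
INSERT INTO visits VALUES
  ('p1', '2025-01-05', 120.0),
  ('p1', '2025-02-10', 300.0),
  ('p2', '2025-01-20', 80.0);
""")

# ROW_NUMBER ranks each patient's visits newest-first; CASE buckets
# the charge amount into a band.
rows = conn.execute("""
SELECT patient_id,
       visit_date,
       ROW_NUMBER() OVER (PARTITION BY patient_id
                          ORDER BY visit_date DESC) AS visit_rank,
       CASE WHEN charge >= 200 THEN 'high' ELSE 'standard' END AS charge_band
FROM visits
ORDER BY patient_id, visit_rank
""").fetchall()

for row in rows:
    print(row)
```

Note that SQLite supports window functions from version 3.25 onward, which ships with any recent Python.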
Basic Overall Knowledge Assessment:
Kafka
Data modeling and normalization
Unix/Linux
Infrastructure as Code (IaC)
Cloud storage, IAM, and networking fundamentals (AWS)
Git version control
Healthcare data domain knowledge
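For the data modeling and normalization item above, a toy sketch: a denormalized table that repeats patient attributes on every visit row is split into separate patients and visits tables (third normal form), removing the redundancy. All table and column names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized source: patient_name repeats for every visit.
CREATE TABLE raw (patient_id TEXT, patient_name TEXT, visit_date TEXT);
INSERT INTO raw VALUES
  ('p1', 'Ada',   '2025-01-05'),
  ('p1', 'Ada',   '2025-02-10'),
  ('p2', 'Grace', '2025-01-20');

-- Normalized targets: one row per patient, visits reference it.
CREATE TABLE patients (patient_id TEXT PRIMARY KEY, patient_name TEXT);
CREATE TABLE visits   (patient_id TEXT REFERENCES patients(patient_id),
                       visit_date TEXT);

INSERT INTO patients SELECT DISTINCT patient_id, patient_name FROM raw;
INSERT INTO visits   SELECT patient_id, visit_date FROM raw;
""")

n_patients = conn.execute("SELECT COUNT(*) FROM patients").fetchone()[0]
n_visits = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(n_patients, n_visits)
```

The same decomposition reasoning applies whether the target is a warehouse star schema or a lakehouse table layout.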