29-08-2025 13:37:51
Job_302910
Experience: 8–12 years
We are seeking a proactive and highly skilled Data Engineer with deep expertise in AWS cloud services and Python programming. The ideal candidate will design, build, and maintain scalable, efficient data pipelines and ETL workflows for large datasets using AWS. This role entails working on both batch and real-time processing pipelines, ensuring high performance, durability, and scalability.
Key Responsibilities
Design, create, and maintain robust, scalable data and ETL pipelines in AWS for batch and real-time processing.
Develop event-driven data workflows using AWS EventBridge to orchestrate and automate data pipelines.
Write clean, efficient Python code using libraries like Boto3 and Pandas for data processing and analysis.
Implement containerized deployments using Docker and orchestrate them with EKS.
Build and manage streaming and messaging pipelines using Kafka and AWS Kinesis, and orchestrate multi-step workflows with AWS Step Functions.
Integrate, optimize, and maintain relational databases like MySQL and PostgreSQL for high-performance data operations.
Utilize big data technologies such as Apache Spark, AWS Glue, and AWS Athena to build and maintain data lakes and lakehouse architectures.
Refactor Python code to improve readability, testability, and performance.
Write automated tests with the unittest or pytest frameworks to keep pipelines robust.
Monitor and troubleshoot pipeline performance and data workflows using AWS CloudWatch and other monitoring tools.
Maintain secure access and permissions through AWS IAM by managing roles and policies for resources.
Develop and manage APIs and data endpoints using AWS API Gateway.
Collaborate with DevOps teams to implement and maintain CI/CD pipelines for data projects.
Use Git and GitHub to manage the codebase, conduct code reviews, and enable collaboration.
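Several of the responsibilities above (clean Pandas code, refactoring, testability) come together when each transform is written as a pure function. A minimal sketch, assuming pandas only; the function and column names are illustrative, and in a real pipeline the frame would be read from and written to S3 (e.g. via Boto3) rather than built in memory:

```python
import pandas as pd

def add_order_totals(orders: pd.DataFrame) -> pd.DataFrame:
    """Drop rows without an order_id and compute a line total per row."""
    # Keep only rows with a valid order id; copy() avoids mutating the input.
    clean = orders.dropna(subset=["order_id"]).copy()
    clean["total"] = clean["quantity"] * clean["unit_price"]
    return clean
```

Keeping each step a pure DataFrame-in, DataFrame-out function is what makes the refactoring and unittest/pytest requirements practical: every transform can be exercised against a small in-memory frame.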
Required Skills & Experience
Proficient in AWS core services: EventBridge, IAM, API Gateway, Glue, Athena, Step Functions, S3, EC2, etc.
Strong programming skills in Python, including Boto3 and Pandas.
Experience with Kafka, EKS, Docker, and container orchestration.
Solid knowledge of MySQL and PostgreSQL database management and optimization.
Expertise in building and maintaining batch and real-time data pipelines.
Proficient with Apache Spark, AWS Glue, and Athena for ETL and big data processing.
Experience with Python code refactoring and writing automated tests (unittest, pytest).
Strong skills in monitoring, troubleshooting, and performance tuning of data workflows using CloudWatch.
Familiarity with Git and GitHub for version control and collaborative development.
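As a concrete illustration of the automated-testing expectation, a pytest-style test for a small, hypothetical pipeline helper (the function and test names are illustrative, not from any specific codebase):

```python
def dedupe_events(events: list[dict]) -> list[dict]:
    """Drop duplicate events by id, keeping the first occurrence in order."""
    seen: set = set()
    unique = []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            unique.append(event)
    return unique

# pytest discovers and runs plain functions named test_*; no runner
# boilerplate or test class is required.
def test_dedupe_keeps_first_occurrence():
    events = [
        {"event_id": "a", "value": 1},
        {"event_id": "a", "value": 2},
        {"event_id": "b", "value": 3},
    ]
    assert dedupe_events(events) == [
        {"event_id": "a", "value": 1},
        {"event_id": "b", "value": 3},
    ]
```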
Good to Have Skills
Experience developing Flask APIs in Python.
Familiarity with AWS Timestream, SNS, SQS, and Apache Airflow.
Exposure to AWS Kinesis data streaming.
Experience with CI/CD automation tools and pipelines.
Knowledge of AWS Redshift, ElastiCache for Redis, and AWS DMS for data warehousing, caching, and migration.
Soft Skills
Strong problem-solving skills with a focus on performance, scalability, and durability.
Ability to work effectively in fast-paced, collaborative Agile teams.
Good communication skills to work with technical and business stakeholders.