Large-Scale Data Migration Engineer

Client: AI | Published: 13.04.2026

Responsible for designing and implementing large-scale data migration and ingestion pipelines that move high-volume data from diverse sources into cloud platforms. Sources include HDFS, relational databases such as MySQL and PostgreSQL, and real-time streaming systems such as Kafka.

Key responsibilities:

- Develop and maintain robust data pipelines in PySpark, ensuring efficient processing of both batch and streaming data.
- Implement automated scheduling mechanisms to orchestrate data workflows on daily and monthly intervals, ensuring reliability and timely data availability.
- Optimize data ingestion and storage through advanced performance tuning, partitioning, and compaction strategies to handle large-scale datasets efficiently.
- Ensure data quality, consistency, and fault tolerance across all pipelines.
- Deploy and manage data processing workloads on Kubernetes, leveraging containerization for scalability, resilience, and resource optimization.
- Monitor job performance and implement improvements based on system metrics and evolving requirements.
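To make the daily/monthly scheduling and partitioning duties concrete, here is a minimal sketch of how a scheduled run might derive its Hive-style partition path before a partitioned write. The function name, bucket path, and partition-key names (`dt`, `month`) are illustrative assumptions, not part of the role description:

```python
from datetime import date

def partition_path(base: str, run_date: date, schedule: str) -> str:
    """Return a Hive-style partition path for a scheduled ingestion run.

    schedule: "daily"   -> one partition per day   (dt=YYYY-MM-DD)
              "monthly" -> one partition per month (month=YYYY-MM)
    Base path and key names are assumptions for illustration only.
    """
    if schedule == "daily":
        return f"{base}/dt={run_date:%Y-%m-%d}"
    if schedule == "monthly":
        return f"{base}/month={run_date:%Y-%m}"
    raise ValueError(f"unknown schedule: {schedule}")

# A daily and a monthly run for the same calendar date resolve to
# different partitions, so the monthly job can compact the month's
# daily partitions without colliding with the daily writer.
print(partition_path("s3://lake/events", date(2026, 4, 13), "daily"))
# s3://lake/events/dt=2026-04-13
print(partition_path("s3://lake/events", date(2026, 4, 13), "monthly"))
# s3://lake/events/month=2026-04
```

In a real PySpark pipeline this path would typically feed a partitioned write (e.g. `DataFrameWriter.partitionBy` on the same keys), with the monthly schedule used for compaction of the smaller daily outputs.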