Senior Data Engineer - Generative AI
New York - hybrid
$160-180k
We are working with a leading financial services client who is looking for a dynamic and enthusiastic Senior Data Engineer to spearhead the design and execution of the company's modern data management strategy. As a pivotal member of the Enterprise Data team, you will collaborate with a diverse group of engineers, architects, data scientists, and end users to innovate, design, and build the next generation of cloud solutions. Using both time-tested and cutting-edge technologies and platforms, including AWS and Azure cloud services, APIs, and AI, you will play a central role in shaping the future of the company's data infrastructure.
Key Responsibilities:
- Maintain and grow collaborative relationships across architecture, product management, data, and delivery teams to design and implement solutions addressing defined business challenges.
- Participate in agile scrum teams, ensuring on-time delivery with exceptional quality.
- Develop and maintain data engineering solutions on cloud platforms, leveraging AWS or Azure services.
- Design, implement, and optimize scalable data pipelines and ETL/ELT processes using Python, PySpark, and API integrations.
- Implement robust data quality and validation checks to ensure the accuracy and consistency of data.
- Collaborate closely with data scientists and analysts to understand data requirements and translate them into scalable, high-performing data pipeline solutions.
- Monitor and troubleshoot data pipeline performance, identifying and resolving bottlenecks and issues.
- Stay abreast of the latest technologies and trends in big data, data engineering, data science, and REST API development, providing recommendations for process improvements.
- Utilize AWS Cloud technologies to support Machine Learning/Data Science capabilities, applications, BI/analytics, and cross-functional teams as they expand.
- Develop methods and routines to migrate data from on-premises systems to the AWS Cloud.
- Build infrastructure for optimal ETL/ELT of data from various sources using SQL and AWS big data technologies.
- Contribute to the development and debugging of code in PySpark/Spark/Python, with a strong emphasis on SQL and query optimization techniques.
- Maintain a high level of expertise in cloud-based technologies, contributing to best practices, standards, and patterns.
Required Skills:
- Proficiency with database technologies for structured and unstructured data storage and retrieval.
- Hands-on experience with AWS services such as Amazon RDS, Amazon Redshift, AWS Glue, Amazon S3, and AWS Lambda.
- Strong expertise in data engineering and data pipeline development using Python, PySpark, Java, etc.
- Experience with Cloud-native applications and a solid understanding of Cloud architecture and best practices.
- Familiarity with relevant code repository and project tools like GitHub, JIRA, Confluence.
- Working knowledge of Continuous Integration/Continuous Deployment (CI/CD) practices.
- Experience with highly scalable data stores, data lakes, data warehouses, and unstructured datasets.
- Proficiency in data integration, data processing, data streaming, and message queuing techniques.
- Detail-oriented with excellent analytical, problem-solving, and organizational skills.
- Ability to communicate effectively with both technical and business teams.
Desired Skills:
- Understanding of the insurance, wealth management, or financial services industry.
- Proficiency in scripting tools on multiple platforms.
- Extensive development experience with cloud platforms (AWS, Azure) and storage services.
- Experience in developing solutions leveraging Large Language Models (LLMs) and machine learning pipelines.
- Strong hands-on experience with API integrations.