Data Engineer

Posted at: 06/27/2025

Cupertino, CA

Full Remote | IT - Development / Other Technologies | Contract | Job ID: 25-14750

ABOUT THIS FEATURED OPPORTUNITY

Join our Data Operations Team as a Python Engineer, supporting machine learning and AI teams that depend on high-quality datasets to train their models. You'll work at the intersection of data engineering, automation, and operational excellence, delivering datasets across approximately 200 projects per year. These include use cases such as image generation, animation, and other generative AI applications. Many projects are highly confidential, so engineers must be able to assess data quality and relevance even without full visibility into the end use case.

We're looking for someone who can design and manage data pipelines, debug issues efficiently, and operate independently across multiple fast-paced projects. Strong communication and attention to detail are essential: you'll need to respond quickly, handle issues proactively, and deliver accurate work the first time. Mistakes or rework can pose serious risks to project timelines, so precision and accountability are critical. The ideal candidate is highly responsive, reliable, and thorough in communication, and must be available to work 9am–4pm PST, even if located in another state.

THE OPPORTUNITY FOR YOU

  • Work on 3–4 projects to start, scaling up to 6–10 during peak season
  • Contribute to data collection, annotation, and generation pipelines using Python and distributed systems (Spark)
  • Collaborate with a tight-knit, highly responsive team, engaging in biweekly check-ins with team leads
  • Gain experience with confidential, multimodal, and LLM-related datasets across a high volume of AI/ML projects
  • Influence how large-scale datasets are prepared for training models across an enterprise AI org

KEY SUCCESS FACTORS

  • 2+ years of experience in data engineering or Python development, with a strong foundation in Computer Science or Data Science
  • Proficiency in distributed systems (e.g., Spark) and a solid understanding of multithreading vs. multiprocessing
  • Demonstrated ability to design scalable pipelines, handle diverse data structures, and manage large-scale workflows
  • Comfortable operating under pressure, context-switching across multiple projects, and working with ambiguity

NICE TO HAVES

  • Familiarity with Airflow or Spark, or with Flask for scalable API/UI development
  • Experience with Docker, containerization, and CI/CD tools (e.g., Jenkins)
  • Exposure to LLMs, multimodal data, or generative AI workflows
  • Prior involvement in designing tools to automate or scale ML data pipelines
  • Ability to collaborate in a high-volume, high-trust environment; your work will power some of the most impactful ML use cases in the organization
