
Research Engineer Intern - Perception, Vision Language Models

9 days ago
Internship
In-person · Santa Clara, CA, US
As a Research Engineer Intern – Vision-Language Models for E2E Autonomous Driving, you'll explore the potential of vision-language models to enhance reasoning, scene understanding, and interpretability in end-to-end autonomous driving. You'll have the opportunity to work toward a publication at a top-tier venue by contributing to key areas of model development, including curating both real-world and synthetic training data, fine-tuning foundational vision-language models, and designing robust evaluation frameworks.


Responsibilities:

  • Lead model development efforts using vision-language models for end-to-end autonomous driving systems
  • Curate high-quality training datasets from both real-world trips and synthetic sources
  • Optimize model architectures and fine-tune pre-trained foundational models to enhance performance and adapt to specific challenges
  • Design and implement evaluation frameworks to rigorously assess model performance in real-world driving environments

Required Skills:

  • Pursuing an MS or PhD in CS, EE, mathematics, statistics, or a related field
  • Thorough understanding of deep learning principles and familiarity with vision-language models
  • 2–3 years of experience implementing and training deep learning models in at least one deep learning framework (PyTorch, TensorFlow, JAX)

Preferred Skills:

  • Experience in projects involving the design, training, or fine-tuning of vision-language models, and familiarity with knowledge distillation, quantization, and vLLM
  • Experience with deep learning projects related to autonomous driving
  • Publication record in relevant venues (CVPR, ICLR, ICCV, ECCV, NeurIPS, AAAI, SIGGRAPH)