Research Engineer Intern - Perception, Vision Language Models
In-person · Santa Clara, CA, US
Job Description
As a Research Engineer Intern – Vision-Language Models for E2E Autonomous Driving, you’ll explore the potential of vision-language models to enhance reasoning, scene understanding, and interpretability in end-to-end autonomous driving. You’ll have the opportunity to work toward a publication at a top-tier venue by contributing to key areas of model development, including curating real-world and synthetic training data, fine-tuning foundational vision-language models, and designing robust evaluation frameworks.
Responsibilities:
- Lead model development efforts using vision-language models for end-to-end autonomous driving systems
- Curate high-quality training datasets from both real-world trips and synthetic sources
- Optimize model architectures and fine-tune pre-trained foundational models to enhance performance and adapt to specific challenges
- Design and implement evaluation frameworks to rigorously assess model performance in real-world driving environments
Required Skills:
- Pursuing an MS or PhD in CS, EE, mathematics, statistics, or a related field
- Thorough understanding of deep learning principles and familiarity with vision-language models
- 2–3 years of experience implementing and training deep learning models in at least one deep learning framework (PyTorch, TensorFlow, JAX)
Preferred Skills:
- Past experience in projects involving the design, training, or fine-tuning of vision-language models, and familiarity with knowledge distillation, quantization, and vLLM
- Past experience in deep learning projects related to autonomous driving
- Publication record in relevant venues (CVPR, ICLR, ICCV, ECCV, NeurIPS, AAAI, SIGGRAPH)