Staff Engineer, Cloud Platform

2 months ago
Full time role
Hybrid · Remote

Who we are:

At Glydways, we believe that mobility is a basic human right. Low-cost and ubiquitous access to affordable housing, employment, education, commerce, and care leads to economic and social prosperity. As such, our goal is to provide:

Public transit with the highest capacity, the best user experience, the lowest cost, and the lowest carbon footprint.

Our solution is a system of interconnected, profitable, and carbon-neutral transportation networks that uses standardized autonomous vehicles and a closed roadway. Together, they provide a 24/7 on-demand private mobility service without burdening the public with heavy upfront costs or annual system subsidies.

Meet the team:

The Developer Platform team, as part of Software Platform, is responsible for building and maintaining the infrastructure that enables all developers to track, build, deploy, and operate our software. From the build system everyone uses, to the tools used to submit and deploy changes, to the cloud infrastructure they use to operate it, we support the entire development experience. The Developer Platform team forges the tools that empower the entire software engineering organization while working with engineers from almost every team.

Roles & Responsibilities:

  • Cloud Infrastructure: Lead the design, implementation, & maintenance of scalable, reliable, and secure cloud infrastructure solutions.
  • Infrastructure as Code: Automate deployment, configuration, & management processes to improve efficiency & reliability.
  • Cloud Operations, Optimization, & Scaling: Oversee day-to-day cloud operations, monitor system performance, & optimize resource utilization. Develop scaling strategies that accommodate growing workloads while keeping costs low.
  • Security & Compliance: Ensure cloud environments follow security best practices and compliance requirements. Implement and maintain security controls, conduct regular audits, & respond to security incidents.
  • High Availability & Disaster Recovery: Design & implement high availability & disaster recovery solutions to ensure business continuity. Conduct regular testing of disaster recovery plans.
  • Collaboration: Collaborate closely with software engineering teams to identify and implement cloud infrastructure improvements that align with business needs.
  • Observability: Design and implement observability and alerting systems to track down outages & monitor trends.

Knowledge, Skills and Abilities:

  • Strong working experience with AWS environments.
  • Deep expertise in Kubernetes, including on-premises deployment and management using core OSS applications such as CoreDNS, ExternalDNS, Karpenter, Kyverno, Tetragon, and Traefik.
  • Proficiency in managing infrastructure with IaC tooling, preferably Terraform.
  • Solid understanding of networking, security, and identity management in cloud environments.
  • Strong skills in scripting and programming languages such as Make, Bash, Python, & Golang.
  • Knowledge of on-call support processes, incident management and monitoring tooling.
  • Familiarity with developing infrastructure provisioning pipelines using CI/CD tooling such as Buildkite & ArgoCD.
  • Experience building out observability & alerting systems with OpenTelemetry and vendors like Honeycomb.

Glydways provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.