Site Reliability Engineer

about 2 years ago
Full time role
Seattle, WA, US... more

Job Description

Convoy is transforming the $800 billion trucking industry, where even 10 years ago, most truck drivers did not have smartphones and paper driver logs were common. In 2015, Convoy launched the digital freight movement with an open and fully connected marketplace for brands and manufacturers to work directly with freight carriers who move truckload shipments throughout the country. This ‘Digital Freight Network’ is powered by machine learning and automation, and supported by a broad, ever-growing set of features that benefit both sides of the marketplace.
We’ve been honored to work with some great companies, such as Unilever, The Home Depot, and Procter & Gamble. We’ve also been backed by world-class investors, including Google, Y Combinator, Fidelity, Greylock, Generation, Lone Pine, T. Rowe Price, Baillie Gifford and the founders and CEOs of Amazon, Salesforce, eBay, LinkedIn, Expedia, Dropbox, Starbucks, and others. We are proud to have been named a CNBC Disruptor 50 Winner (3x), a Fast Company World Changing Idea, a LinkedIn Top Startup, a Forbes Best Startup Employer, a member of Fortune Magazine's 'Impact 20' list, a best place to work in Washington State, a BloombergNEF Pioneer winner, and more.
This is your opportunity to collaborate with an incredible group of people and help transform the freight industry. Join Convoy and help us transport the world with endless capacity and zero waste.

The Foundation Platforms team builds the infrastructure that enables teams to build, test, deploy, host, debug and monitor their systems. We ensure that engineering teams are able to build. We own the compute platform, online datastores, monitoring and metrics systems, infrastructure orchestration, test infrastructure, developer tooling, and the general reliability program for Convoy. The members of our team are highly skilled at delivery, designing for resilience, software development, and operational best practices. We help teams understand stability and scalability in order to maintain high feature development velocity in robust and reliable services. We guide the rest of engineering and provide them with platforms that offer infrastructure best practices. 
About you: Foundation Platforms Engineers are eager to build wide-ranging platforms that impact all teams at Convoy and have a passion for providing systems that are resilient, safe, and capable of scaling to meet the demands of an always-on internet-based product. You have experience scaling distributed systems and an understanding of common failure cases and the strategies used to avoid them. You have the software engineering expertise to modify open source solutions or build new platforms when there is a need. You have experience leveraging open source work to assemble new platforms that meet the needs of quickly growing products. You have good communication skills that you’ll use to educate product teams on the platforms and systems we create and how they can be leveraged to make their jobs easier. You are willing to dive deeply into problems and participate in the operational excellence program, including performing root cause analysis and working directly with service teams and orgs to remediate and prevent incidents.
These are some of the technologies that we work with. Don’t worry if you are not familiar with all of them. This is a broad list and not all members of the team interact with all systems. This is simply intended to give you an understanding of the spaces in which we operate and technologies that you’ll be able to work with:

  • Operating Systems: Linux
  • Datastores: PostgreSQL (RDS), DynamoDB, Redis, Elasticsearch
  • Compute: Kubernetes, Docker, Containers, Elastic Container Service (legacy)
  • Monitoring: Datadog, Datadog APM, CloudWatch
  • Logging: Elasticsearch, Kibana
  • Primary Programming Languages (what we write): TypeScript/Node.js and Python
  • Secondary Programming Languages (what we sometimes use to augment open source systems and support teams): Rust, Java, Go

You Will:

  • Help drive the technical direction and set team goals to improve the infrastructure at Convoy
  • Be familiar with industry-wide trends to help assess and develop new technologies
  • Build production hosting systems at Convoy including extending open source solutions and designing new systems
  • Be the company stakeholder for reliability and advisor for other engineers throughout Convoy
  • Mentor and develop engineers
  • Write software that ranges from hosting platforms and metrics collection to log analysis and developer tooling

We’re looking for someone who has:

  • Experience leading projects and delivering impact
  • Experience building infrastructure systems
  • Passion for raising the quality bar and working across teams to ensure they have and are using the tools necessary to meet that bar
  • Experience performing root cause analysis in a distributed environment
  • Experience with some of the following: Kubernetes, Spinnaker, Containers, Monitoring and Metrics systems, Postgres, Redis and Terraform

Benefits: Employees' wellbeing is top of mind for the Convoy team. Outside of offering excellent medical, dental, and vision benefits, we also offer the following: 

  • On demand mental and emotional health benefits through Lyra
  • On demand primary care through 98.6
  • Generous paid time off
  • Paid parental leave program
  • Fertility benefit solutions via Progyny
  • Child-care and adult/elder-care options through Bright Horizons
  • Opportunity to join and contribute to one of our Employee Resource Groups
  • Ability to make a real world impact!

Convoy is an equal-opportunity employer and we welcome applicants from all backgrounds. If you’re a passionate team player who wants to have an outsized impact on a diverse and dynamic team, we’d love to hear from you!
