Site Reliability Engineer

over 2 years ago
Full time role
Remote

Company

Voltus is a fully remote clean energy technology platform. Our mission is to be the Distributed Energy Platform that fulfills ...

Job Description

Are you interested in building the technical foundation of the worldwide transition to clean energy? Do you enjoy working with a highly motivated and talented team to deliver mission-critical software? Voltus is growing its Site Reliability Engineering (or “Platform”) team to help deploy, manage, troubleshoot, and enhance our Platform and tools for internal and external customers.

As a Site Reliability Engineer, you will be responsible for deploying and maintaining our core Platform, which consists of HashiCorp’s Nomad, Consul, and Vault systems in AWS. In addition, you will help manage and maintain our monitoring systems, which currently include Prometheus and Datadog.

You will build innovative automated solutions and tools to help debug and resolve problems in production and prevent them from recurring. Further, you will use monitoring data, logs, and trend analysis to proactively seek out system weaknesses and fix them before they cause production issues.

Responsibilities

  • Keeping our core Platform (Nomad, Consul, Vault) up and running and performing optimally.
  • Working closely with internal partners and teams to ensure that we ship software that meets security, SLA, and performance requirements.
  • Writing, updating, and using documentation, including runbooks/playbooks.
  • Automating work, including infrastructure needs, testing, failover solutions, failure mitigation, and much more.
  • Debugging complex problems across the entire stack and creating solid solutions.
  • Developing CI/CD processes to improve release cadence.

Key Skills and Attributes

Required

  • 5 years of experience with software engineering, software development, or system operations.
  • Excellent communication skills, both verbal and written.
  • Knows their way around a Unix/Linux shell, can write shell scripts, and understands Linux internals.
  • Experience debugging complex problems.
  • Experience designing, building, and operating large-scale production systems.
  • Knows Python, Java, Go, Rust, or similar.
  • Understands networking and messaging, especially between services.
  • Has hands-on experience using source control (Git, GitHub) and feature branching strategies.
  • Has experience with a variety of open-source databases (MySQL, Postgres, Redis, Cassandra, etc.).
  • Intellectually curious and always wants to learn more.

Preferred

  • Experience with DevOps engineering or SRE.
  • Experience with the Hashistack (Vault, Consul, Nomad).
  • Experience with containers, such as Docker.
  • Experience with monitoring and observability tools such as Datadog, Prometheus, or similar.
  • Experience with build systems such as Make, Bazel, or similar.
  • Experience automating infrastructure, testing, and deployments using tools like Ansible, Chef, or Terraform, and the ability to explain the Infrastructure as Code paradigm.
  • Understands the difference between provisioning and configuration management.
