Palo Alto, CA

Senior Site Reliability Engineer

We are seeking a Senior Site Reliability Engineer to join an SRE team in creating best practices and solutions that keep the Rivian Digital Commerce sites and applications highly available and reliable. This is an exciting role working with software engineering teams from the ground up to build cloud-based solutions using the latest technologies, tools, and practices. The right candidate will be passionate about site reliability and about serving millions of customers with full automation and limited downtime.

This is what you’ll do:

  • Work with engineering teams to deliver high-quality products and solutions that delight Rivian customers.
  • Work with engineering teams to design robust cloud-based architectures and redundant, fault-tolerant solutions utilizing practices around CI/CD, blue-green deployments, canary testing, and traffic management.
  • Define non-functional requirements (NFRs) for engineering teams around security, logging, monitoring, alerting, configuration, and testing, and work with those teams as they implement apps and services.
  • Develop runbooks and standard operating procedures (SOPs) for each service and application to ensure DevOps and SRE teams can detect incidents or issues before customers are impacted and act quickly to restore impacted services.
  • Define practices and procedures around postmortems and root cause analysis to ensure service quality and maintainability KPIs are improving and downtime and service interruption are negligible.
  • Work collaboratively with various stakeholders to provide team-based solutions, creating a culture of inclusion and diversity of skillsets.
  • Participate in a 24×7 on-call rotation and define and implement on-call practices and procedures.

This is what you’ll need:

  • 5+ years in a technical role in Site Reliability, Operations, Systems Administration, or Cloud Infrastructure.
  • 5+ years of experience being responsible for the uptime and reliability of customer-facing web or mobile applications and critical services.
  • 5+ years of experience maintaining and administering large-scale Linux-based environments with best practices for security and automation.
  • 5+ years of experience providing and maintaining cloud-based infrastructure such as AWS, GCP, Azure, or internal data center solutions based on vSphere, OpenStack, etc.
  • 3+ years implementing and maintaining monitoring and alerting systems, creating service level indicators (SLIs), service level objectives (SLOs), and focusing on systems that self-heal or alert teams to take action before system downtime.
  • 3+ years designing and operating fault-tolerant systems with little to no downtime.
  • Expert knowledge of monitoring systems such as AppDynamics, New Relic, Prometheus, Grafana, Graphite, Nagios, AWS CloudWatch, etc.
  • Knowledge of network architectures, security, and troubleshooting of connectivity or latency issues.
  • Comfortable managing deployments of several thousand nodes and the automation it takes to ensure system uptime and redundancy.
  • Experience with Docker, Kubernetes (K8s), and AWS Lambda is a plus.
  • Proficiency in writing automation scripts and tools using Bash, Python, awk, etc.
  • Bachelor’s degree in computer science, electrical engineering, information systems or equivalent work experience.

About Rivian

Rivian is on a mission to keep the world adventurous forever. This goes for the emissions-free Electric Adventure Vehicles we build, and the curious, courageous souls we seek to attract. Starting from a clean sheet has allowed us to learn from and leverage the past while giving us the freedom to reimagine mobility from the ground up. Our vehicles are designed and engineered to inspire the exploration of our world in new ways through sustainable technology. Forever starts with us.

There’s a magic that occurs when people from different worlds, with different lived experiences and perspectives, surround a challenge from all sides. If it were up to us, every person on the planet would be joining forces to mitigate climate change. The carbonization of our atmosphere presents a great threat to humanity, but with some imagination, and a shared commitment to drastically reducing emissions, a better way forward is possible.

To get there, we’re putting our adventurous side to work: inventing technology, building products and designing services and experiences that enable and inspire a shift to renewable energy. Since our earliest days, Rivian has been a magnet for optimistic thinkers and doers who will stop at nothing to leave this planet better than we found it.

It takes more than great ideas to make a difference. Bringing them to life, the right way, requires relentless curiosity and a whole lot of heart, not to mention difficult decisions, unexpected turns, redrawn plans and gnawed pencils. Every time we greet the unknown with open arms, what once seemed impossible becomes fun.

Digital Commerce

Rivian’s Digital Commerce team is responsible for the end-to-end digital experiences outside our vehicles across the web, mobile app and in-store. We build customer-centric features, like the vehicle configurator, interactive maps and a world-class commerce platform that will make learning about and purchasing Electric Adventure Vehicles intuitive, seamless and fun.
We are seeking adventurous developers to lead the execution of customer-facing programs and projects from conception to launch and drive the development of core technologies needed to enable our commerce platform and mobile app.