Westwing Home & Living

Munich, DE

Site Reliability Engineer - Observability (m/f/d)

The Westwing mission is to inspire and make every home a beautiful home. We are on a quest to bring the wonders of eCommerce for home & living to our more than 1.5 million active customers in 11 countries. In only 10 years on the market, we have grown to more than 433 million Euros in annual revenue. More importantly, the opportunity ahead of us is massive.

The Westwing team is the secret to our success. Our more than 1,700 team members balance creativity and style with innovative technology and strong business fundamentals. We are proud to be working with inspiring colleagues who are smart, fun, ambitious, and looking for the challenge to transform an industry and take it into the future.

Do you think this could be you? To pursue our ambitious expansion strategy, we are looking for a Site Reliability Engineer (m/f/d) to join our tech department.

What we are looking for:

Your everyday passion is to design, build, perfect and evolve highly resilient and scalable hybrid solutions. You have experience as a Dev, DevOps, SRE or SysOps engineer in data centers, cloud, and clustered environments, and in continuously assessing risks and innovating to raise the 99.99% availability score of any platform. You are looking for an opportunity to make a global impact in IT operations. If this describes you, you are who we are looking for!

What you will be doing:

  • Clustered queue-messaging system config and administration (RabbitMQ)
  • Metrics and log collection (Prometheus, Elasticsearch, New Relic, AWS CloudWatch)
  • Grown-up monitoring setup (Kibana, Grafana, Alertmanager, New Relic, AWS CloudWatch)
  • Automated deployment and configuration (GitLab, Terraform, Consul, Puppet)
  • Kubernetes application deployment, tuning, monitoring and low-level profiling
  • Load balancing and caching (HAProxy, Memcached/Redis, AWS ELB/ElastiCache)
  • Ad-hoc Go, Node.js or Python scripting for systems data collection
  • Ad-hoc Go, Node.js or Python scripting for micro-services API load testing
  • On-premise and cloud (AWS) architecture and service operation


You come with:

  • Familiarity with most of the technology described above
  • Proficiency in statistical event correlation for root-cause analysis
  • Experience in IT infrastructure 24x7 operation and on-call support
  • Experience in AWS cloud-based web applications and micro-services
  • Background in e-Commerce infrastructure and solutions (a plus)
  • Experience in tuning high-traffic-spike compute environments (a plus)
  • Certification in AWS, Kubernetes (CKA) or any other technology in the stack (a plus)
  • A love of learning, and of sharing and documenting knowledge with others
  • A passionate, highly organised, self-driven and goal-oriented mindset, and an appetite for a challenging environment
  • Fluency in English

We offer:

  • A great opportunity to grow as a technical leader and people manager
  • A tech position with real influence and a high degree of responsibility and autonomy
  • Plenty of room for personal growth, professional development and high impact
  • A highly talented, dynamic, and international team
  • Entrepreneurial experience in a well-financed, high-growth eCommerce company

Location: Munich or Remote in Germany, Poland or Spain
Contact Person: Jennifer Stennei
Requisition Number: 2284

Interested? We look forward to receiving your full application, including your earliest possible starting date and salary expectations.