Senior Site Reliability Engineer

Remote (Central / South America) | Full-time | Allows remote

Couchsurfing is the world's premier social travel platform, with over 18 million members in nearly every country in the world. Hearing the stories of Couchsurfers whose lives have been transformed by travel through our platform is the greatest reward for what we do. Join us, and every day thousands of people from around the world will meet in person and embark on new adventures as a direct result of your work.

Our platforms include Web, iOS, and Android, and our mobile audience is growing rapidly. If you're interested in making a big impact at a small company with a large, passionate user base and growing revenue streams, you'll like it here.

Core values

At Couchsurfing, we expect a lot of each other. Our mission and culture are what get us up in the morning and make our work meaningful and fun. You should be excited about the following:

  1. Couchsurfing serves a socially positive mission: Share your life. Couchsurfing is a community of friends you haven’t met yet. Open the door to new people, places, and perspectives.
  2. As a team, we have all agreed to 7 core commitments:
    1. I agree to do my personal best
    2. I agree to have proactive, transparent communication
    3. I agree to acknowledge impact, success and small wins
    4. I agree to have a courageous vision and to play as a team
    5. I agree to be my word
    6. I agree to ask for support when I need it
    7. I agree to be treated like a leader

Technologies we work with:

  • Backend: Ruby 2.4, Rails 5.0
  • Frontend: React
  • Infrastructure: AWS, Terraform, Jenkins, Kubernetes
  • Client side: Java on Android, with a few initial components written in Kotlin; Objective-C and Swift on iOS

Who are we looking for?

We are looking for a Senior Site Reliability Engineer who can help us automate and streamline our operations so that we can keep delivering web and mobile experiences that connect travelers all around the world.

You will:

  • Design and deliver solutions to improve the availability, scalability, latency, and efficiency of our services
  • Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks
  • Diagnose and resolve production issues in conjunction with software engineering teams
  • Support and advise software engineering teams in the design of scalable services
  • Build and maintain tools for deployment, monitoring, and debugging
  • Maintain our security posture with tooling and process
  • Plan and execute disaster recovery drills
  • Participate in rotating on-call duties, including incident management

Experience and skills we're interested in:

  • Experience managing a container-based microservice architecture, including orchestration, service-discovery, monitoring, and debugging
  • Understanding of standard networking protocols and concepts, such as TCP/IP, HTTP, DNS, ICMP, the OSI model, subnetting, and load balancing
  • In-depth knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.)
  • A systematic problem-solving approach, coupled with a strong sense of ownership and drive
  • A track record of working cooperatively with software engineering teams
  • A focus on security at every level of the systems you deliver
  • Passion for modern software development and operation, including agile, CI/CD, and infrastructure-as-code
  • Desire to learn and grow
  • 6+ years of experience