Senior Site Reliability Engineer (Remote Build)
Remote · EMEA
Job Description
About Remote
Remote is dedicated to simplifying global employment for modern organizations. We empower businesses of all sizes to seamlessly recruit, pay, and manage international teams. Our culture is driven by innovation, ambitious problem-solving, and a commitment to asynchronous, globally distributed work. We encourage every team member to bring their unique talents and experiences to help us build a best-in-class HR platform. If you are energetic, curious, motivated, and ambitious, join us and help define the future of work!
About the Role: Senior Site Reliability Engineer (Remote Build)
The Remote Build team is at the forefront of an agentic shift, ensuring that AI agents can effectively navigate the complexities of global employment infrastructure: labor-law compliance across numerous countries, payroll, entity structures, and the broader compliance framework. The platform powering these integrations must be exceptionally reliable, secure, and built for scale.
As a Senior Site Reliability Engineer for Remote Build, you will be instrumental in driving the operational excellence and infrastructure strategy that underpin the Build platform's reliability, performance, and security. You will report to the Engineering Manager and collaborate closely with Remote Build leadership, product managers, engineers, and customer success teams to ensure that every engagement scales reliably from its inception.
Key Responsibilities
- Infrastructure as Code at Scale: Design, implement, and maintain infrastructure-as-code patterns using Terraform and Kubernetes to support both standard connectors and custom builds, enabling engineers to deploy and operate with confidence.
- Observability and Incident Response: Build and manage comprehensive monitoring, logging, and alerting systems. Lead incident response efforts, conduct thorough post-mortems, and drive continuous improvements in system reliability.
- Security and Compliance: Collaborate with the Security team to embed security practices across all layers of the Build infrastructure, ensuring compliance with regulations in over 100 jurisdictions without hindering developer or customer workflows.
- Performance and Cost Optimization: Continuously optimize system performance, resource utilization, and cloud costs, providing recommendations to enhance both reliability and unit economics.
- Automation and Operational Leverage: Identify and systematically eliminate manual operational tasks, developing tools and processes that allow teams to operate efficiently without a proportional increase in headcount.
- Platform Reliability and Developer Experience: Partner with platform teams to ensure the resilience and observability of APIs, MCP, and CLI. Provide feedback on infrastructure to influence the platform's evolution.
Requirements
- Senior-Level SRE Experience: Demonstrated experience in a Site Reliability Engineering, DevOps Engineering, or SysOps role, with a proven track record of building and operating production systems at scale.
- Kubernetes and AWS Expertise: Deep, hands-on experience running Kubernetes in production environments and solid foundational knowledge of AWS services, including compute, networking, storage, and managed services.
- Infrastructure as Code Proficiency: Proficiency with Terraform or similar Infrastructure as Code (IaC) tools, with a strong preference for defining infrastructure through code rather than manual console operations.
- CI/CD and Deployment Automation: Practical experience setting up and operating CI/CD pipelines using tools like GitLab, GitHub Actions, or Jenkins, with a solid understanding of deployment strategies, rollback mechanisms, and safety nets.
- Scripting and Systems Knowledge: Strong Bash scripting skills, comfort debugging system-level issues and analyzing logs, and an understanding of fundamental Linux kernel concepts.
- Excellent Communication Skills: Ability to clearly articulate complex infrastructure decisions to both technical and non-technical stakeholders, and a talent for writing clear runbooks and documentation.
Nice to Have
- Experience with at least one backend programming language (e.g., Elixir, Python, Go, Java, Node.js).
- Experience in consultancy settings.
- Familiarity with container registry and artifact management (e.g., ECR, Docker Hub).
- In-depth knowledge of observability stacks (e.g., Datadog, Prometheus, ELK, Grafana).
- Experience working with or scaling multi-tenant platforms.
What We Offer
- Location: Fully Remote (Worldwide)
- Contract: Permanent
- Reporting to: Engineering Manager
- Team: Engineering
- Salary Range: $54,000 - $150,000 USD annually (based on location, experience, and other factors)
- Benefits & Perks:
  - Work from anywhere
  - Flexible paid time off
  - Flexible working hours (asynchronous work culture)
  - 16 weeks paid parental leave
  - Mental health support services
  - Stock options
  - Learning budget
  - Home office budget & IT equipment
  - Budget for local social events or co-working spaces
Application Process
- Complete the application form and upload your CV in PDF format (English