Staff Software Engineer – Platform, SysEng | Canada | Remote
Grafana Labs · Canada
Job Description
Grafana Labs, the company behind the open observability cloud, is seeking a Staff Software Engineer to join our Platform SysEng squad. We are a 100% remote company with a global team and are looking for talented individuals to help us scale our fully managed observability platform.
About the Role
The Internal Engineering Platform (IEP) team provides engineers with the tools, systems, and Kubernetes clusters needed to build, deploy, and run their workloads. As part of the Platform department, you will focus on improving the performance, reliability, and efficiency of our platform as we scale to handle hundreds of millions of metrics, log lines, and traces per second.
The Platform SysEng squad focuses on the maturity and scalability of the platform, working to shorten new-region build timelines and keep pace with customer demand. You will be part of a team managing infrastructure for core Grafana Labs products like Grafana, Mimir, Loki, and Tempo. The role includes an on-call rotation, which keeps our systems healthy and gives you direct insight into how our products are used.
Key Responsibilities
- Contribute to the maturity and scalability of the internal engineering platform.
- Work across engineering teams to optimize infrastructure and deployment processes.
- Develop and operate systems that support Grafana Labs' core products.
- Collaborate with engineers and management to achieve team and company goals.
- Participate in the full lifecycle of code, from design to production.
- Contribute to on-call rotations to ensure system reliability.
Requirements
- Proven experience delivering and operating large-scale distributed systems in a technical leadership capacity.
- Demonstrable experience in system design, with a deep understanding of tradeoffs in latency, consistency, availability, scaling, and cost.
- Hands-on experience with cloud-native architectures (microservices, containers/Kubernetes, Infrastructure as Code) and operational practices.
- Experience defining SLOs/SLIs, capacity planning, performance tuning, and driving reliability initiatives.
- Excellent coding and design skills, preferably in Go, Python, or similar languages.
- Comfort and curiosity in using AI-powered developer tools for tasks like prototyping, testing, and documentation.
- Ability to influence without authority and align cross-functional stakeholders in a remote-first environment.
- Strong written and verbal communication skills.
- Comfort working in a remote-first company, emphasizing collaboration, kindness, and respect.
- Eagerness to learn and grow within a supportive team environment.
Bonus Points For
- Experience with open-source projects.
- Familiarity with Kubernetes scheduling and projects like Karpenter.
- Experience with Terraform and/or Crossplane.
- Experience with Tanka and/or Jsonnet.
What We Offer
- Compensation: Base compensation range in Canada is CAD 186,368 - CAD 223,642. Actual compensation will vary based on level, experience, and skillset.
- Benefits: Includes equity (Restricted Stock Units - RSUs), bonus (if applicable), and other comprehensive benefits.
- 100% Remote: Work from anywhere within Canada; candidates in the Eastern (EST) and Central (CST) time zones are highly preferred.
- Global Culture: Join a diverse, collaborative, and supportive international team.
- Growth Opportunities: Ample room for professional development and career advancement.
- Innovation: An environment that encourages autonomy, transparency, and trying new things.
- Work-Life Balance: A global annual leave policy of 30 days, including Grafana Shutdown Days.
- Onboarding: In-person onboarding to help you thrive from day one.
- AI-Assisted Development: Access to modern AI coding assistants and a funded usage budget.