Senior Customer Reliability Engineer – Infrastructure

Astronomer · Ireland

🏠 Remote · 📅 2 Jun 2026

Job Description

Senior Customer Reliability Engineer – Infrastructure (Remote, Ireland)

Astronomer empowers data teams to bring mission-critical software, analytics, and AI to life with Astro, the industry-leading unified DataOps platform powered by Apache Airflow®. Trusted by over 800 leading enterprises, Astronomer enables businesses to unlock insights, unleash AI value, and power data-driven applications.

About the Role

The Astronomer Customer Reliability Engineering (CRE) team is dedicated to ensuring the success of our customers using our managed Airflow service. As a Senior Infrastructure Specialist within this team, you will focus on the reliability of the underlying cloud infrastructure and Kubernetes clusters. This involves responding to incidents, resolving them, and implementing measures to prevent recurrence. You will be instrumental in owning and improving our observability platform, directly impacting the reliability of our product and delivering exceptional customer outcomes.

This is a customer-facing role offering exposure to a diverse range of technical challenges and customer requirements across various industries and cloud providers. Your contributions will significantly influence customer success with Astronomer products and enhance the overall customer experience.

What You Will Do

  • Provide solutions to customers to ensure their success with our products.
  • Troubleshoot customer environments and actively triage issues together with customers.
  • Participate in an on-call rotation for weekend coverage.
  • Provide valuable feedback to product development teams regarding customer needs and pain points.
  • Develop and enhance our monitoring and alerting systems.
  • Build and maintain automation to streamline daily operational tasks.
  • Contribute to the architecture of our products.
  • Own the customer experience by working directly with customers to prioritize and solve issues, meet Service Level Agreements (SLAs), and provide expert guidance.
  • Collaborate remotely within a fully distributed team.
  • Enhance and enrich customer documentation.
  • Work with cutting-edge technology and multi-cloud implementations.

What You Bring to the Role

  • 6 years of experience, ideally with large, complex cloud infrastructures operating at scale.
  • 4 years of experience with Kubernetes.
  • Experience managing a production distributed system on at least one major cloud provider (AWS, GCP, or Azure).
  • Strong Linux experience.
  • Knowledge of operating and monitoring distributed systems.
  • Previous experience handling customer issues (internal or external).
  • Strong communication skills.
  • DevOps or CI/CD experience.
  • Python scripting skills.
  • Excellent troubleshooting abilities.

Bonus Points If You Have

  • Experience as a Site Reliability Engineer.
  • Experience with Kubernetes Custom Resources.
  • In-depth knowledge of Azure.
  • Experience with Airflow or Big Data Orchestration.
  • Infrastructure as Code (IaC) experience.

Astronomer is an equal opportunity employer committed to diversity and inclusion. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
