Principal DevOps Engineer
Zartis · Europe
Job Description
Principal DevOps Engineer (Remote, Europe)
Zartis is a global AI transformation and technology consulting partner. We collaborate with ambitious organizations to design, build, and scale technology solutions that deliver real impact. Our teams possess deep expertise in AI-driven platforms, secure API architectures, and cloud-native engineering. You will work on meaningful projects that accelerate the adoption of advanced technologies, from strategy and discovery through to full product delivery, helping turn complex challenges into measurable outcomes. With engineering hubs across EMEA and LATAM, and long-term partnerships in financial services, healthcare and life sciences, and energy and climate, we offer opportunities to work on projects that truly matter.
We are seeking a talented Principal DevOps Engineer to join a founding-stage initiative within one of Europe's largest digital-native groups. This role will focus on building the internal agentic platform that enables safe, observable, and reusable AI-driven workflows at marketplace scale. You will be part of a founding platform cell at the heart of an AI House, central to making agentic engineering the default across all marketplace teams.
About the Project
The platform you will help define and build does not yet exist. You will work across secure agent runtimes, tool gateways, evaluation pipelines, observability systems, and reusable scaffolding. These components will be built for a heterogeneous, multi-cloud, multi-stack estate spanning products like Leboncoin, Kleinanzeigen, Marktplaats, Subito, Mobile.de, and more. Our team values a diverse range of backgrounds and is committed to fostering an inclusive culture based on trust and innovation.
Key Responsibilities
- Build and operate secure agent runtimes with sandboxing, runtime isolation, network policies, secrets management, RBAC, and approval flows.
- Design and maintain integration surfaces with MCP-style adapters and gateways for agents to interact with source control, CI/CD, ticketing, documentation, cloud, and observability systems.
- Build and own the evaluation pipeline, including golden tasks, graders, regression tests, and release gates, so that the correctness of agentic workflows can be measured cost-effectively.
- Implement observability and cost control measures, including traces, telemetry, token usage, cost-per-workflow, rate limits, and failure handling.
- Create reusable scaffolding such as templates, starter kits, and wrapper scripts to facilitate the adoption of proven patterns.
- Partner with the AI Architect to translate the reference architecture into production-ready code.
Requirements
- Proven track record of shipping internal developer platforms or developer tooling that real teams depend on, with real users, SLAs, and on-call rotations.
- Hands-on experience designing and operating secure execution environments (sandboxing, runtime isolation, RBAC, secrets, audit logging) at production scale.
- Experience shipping integrations across source control, CI/CD, ticketing, documentation, cloud, and observability systems in polyglot environments.
- Experience building or operating tool gateways, adapters, or MCP-style integrations for agents, with an understanding of the new failure modes they introduce.
- An instrumentation-first mindset with a focus on traces, telemetry, cost-per-workflow, and evaluation pipelines.
- An SRE mindset prioritizing reliability, rollback, rate limiting, graceful degradation, and cost control.
- A record of shipping reusable templates and starter kits that achieve widespread team adoption.
- Comfort with modern infrastructure including containers, cloud (AWS or equivalent), Infrastructure as Code (IaC), CI/CD, APIs, and scripting languages.
- Experience supporting multiple engineers or teams as internal customers, balancing opinionated defaults with special cases.
- Ability to collaborate effectively with security, legal, and risk stakeholders.
Nice to Have
- Prior experience building agentic platforms or AI infrastructure at multi-team scale.
- Familiarity with MCP (Model Context Protocol) integrations or equivalent agent tool gateway patterns.
- Experience with multi-cloud or multi-stack platform engineering in distributed marketplace or e-commerce environments.
- Exposure to cost governance tooling, token budgeting, or LLM observability platforms.
What We Offer
- 100% Remote Work
- Work From Home Allowance: Monthly financial support for remote working.
- Career Growth: Access to a career development program with 360° feedback to support your progression.
- Training: Dedicated time for tech training, including online courses, English classes, books, conferences, and events.
- Mentoring Program: Opportunities to mentor or be mentored.
- Zartis Wellbeing Hub (Kara Connect): Access to specialists for mental health, nutrition, physiotherapy, fitness, and webinars.
- Multicultural Working Environment: Regular tech events, webinars, parties, and online team-building activities.