AI Engineer
Ruby Labs · Europe
Job Description
Senior AI Engineer
Company: Ruby Labs
Location: Remote (Europe, within approx. ± 4 hours of CET)
Work Model: Remote
Contract: Permanent
Industry: Tech
About Us
Ruby Labs is a leading tech company dedicated to creating and operating innovative consumer products across the health, education, and entertainment industries. Our forward-thinking teams are shaping the future of consumer-led products, and we are always seeking passionate individuals to join our mission. Learn more about our journey at https://rubylabs.com/about-us/.
About the Role
We are looking for a Senior AI Engineer to take ownership of the quality, reliability, and evolution of our AI systems in production. This is a high-ownership role in which you will be responsible for the end-to-end delivery of major AI features, the production stability of AI systems, and data-driven experimentation using tools like Langfuse, Mixpanel, and OpenRouter.
You will work within a modern tech stack built on Next.js, TypeScript, and Node.js, collaborating closely with product, growth, data, and billing teams. A significant part of this role involves building agentic, tool-using AI systems, defining clear tool contracts (including MCP-based tools), and orchestrating AI interactions with internal services and business systems. You will operate within an AI engineering squad, contributing as a senior technical voice and driving engineering quality within your product area.
Key Responsibilities
- Take complete ownership of major AI engineering features and deliver them within agreed timelines.
- Own AI output quality, structure, and predictability across all user-facing AI interactions.
- Design, implement, and maintain output-type-based AI systems, including segmentation, routing, and enforcement.
- Ensure consistent output structure and formatting across different Large Language Models (LLMs) for the same request type.
- Integrate and orchestrate multiple LLM providers via OpenRouter, managing model selection, fallback strategies, and cost optimization.
- Design and orchestrate tool-using and agentic AI workflows, defining clean tool contracts, function-calling interfaces, and reliable AI-to-system integrations.
- Build and maintain complex, multi-step LLM workflows for advanced reasoning, context reuse, and retrieval, including workflows built with orchestration frameworks such as LangChain or LlamaIndex.
- Design and manage production prompt systems with dynamic prompting, context injection, and conditional logic.
- Own the deployment and release of LLM experiments, prompt management, and Langfuse-based evaluation pipelines.
- Run A/B tests across models, analyze results, and present data-driven impact assessments of AI features and experiments.
- Monitor AI system metrics, quality signals, latency, and release health using Langfuse and other observability tools.
- Deep-debug complex LLM chains using Langfuse traces, identifying bottlenecks and optimizing for cost, latency, and context-window usage.
- Build output-scoring systems to root-cause hallucinations and logic errors.
- Write clean, scalable, and maintainable TypeScript code across the Next.js and Node.js stack.
- Build reliable backend logic for AI systems with strong error handling, request validation, fallback flows, and predictable production behavior.
- Ensure high code quality through testing, code reviews, and clear engineering standards.
- Monitor, troubleshoot, and improve production performance, reliability, and system health.
- Drive maintainability and technical quality through solid architecture, refactoring, and disciplined release practices.
Requirements
- 6+ years of backend/full-stack software engineering experience, including production-grade TypeScript/Node.js.
- 2+ years of experience building AI/LLM systems in production.
- Deep hands-on experience working with LLM APIs (OpenAI, Anthropic, or similar) in production environments.
- Experience with agentic AI, multi-agent orchestration, tool-based workflows (function calling/tool execution), and/or Retrieval-Augmented Generation (RAG) pipelines.
- Experience with LLM observability tools such as Langfuse, LangSmith, or similar platforms.
- Experience with AI gateways and model routing solutions, such as OpenRouter or equivalent technologies.
- Solid understanding of Redis and relational databases, such as PostgreSQL.
- Exceptional ownership mindset and personal responsibility for engineering quality and delivery.
Nice to Have
- Experience with AI-centered development tools (e.g., Cursor, Claude Code).
- Familiarity with evaluation frameworks (e.g., LLM-as-a-judge, RAGAS).
- Experience working in high-pressure startup environments with rapid product iteration cycles.
- Experience with MCP (Model Context Protocol).
- Experience