Senior AI Software Engineer
Ruby Labs · Europe
Job Description
Ruby Labs is a leading tech company building innovative consumer products across health, education, and entertainment. We are seeking a passionate Senior AI Software Engineer to join our remote team and drive the quality, reliability, and evolution of our AI systems in production.
About the Role
This is a high-ownership role where you will be responsible for the end-to-end delivery of major AI features, ensuring the production stability of our AI systems, and conducting data-driven experimentation. You will work with a modern tech stack including Next.js, TypeScript, and Node.js, collaborating closely with product, growth, data, and billing teams. A key focus will be on building agentic, tool-using AI systems, defining clean tool contracts, and orchestrating AI interactions with internal services. You will operate within an AI engineering squad, providing senior technical leadership and driving engineering quality.
Key Responsibilities
- Take complete ownership of major AI engineering features and deliver them within agreed timelines.
- Own AI output quality, structure, and predictability across all user-facing AI interactions.
- Design, implement, and maintain output-type-based AI systems, including segmentation, routing, and enforcement.
- Ensure consistent output structure and formatting across different Large Language Models (LLMs) for the same request type.
- Integrate and orchestrate multiple LLM providers via OpenRouter, managing model selection, fallback strategies, and cost optimization.
- Design and orchestrate tool-using and agentic AI workflows, defining clean tool contracts, function-calling interfaces, and reliable AI-to-system integrations.
- Build and maintain complex, multi-step LLM workflows using orchestration frameworks like LangChain or LlamaIndex for advanced reasoning, context reuse, and retrieval.
- Design and manage production prompt systems with dynamic prompting, context injection, and conditional logic.
- Own the deployment and release of LLM experiments, prompt management, and Langfuse-based evaluation pipelines.
- Run A/B tests across models, analyze results, and present data-driven impact assessments of AI features and experiments.
- Monitor AI system metrics, quality signals, latency, and release health using Langfuse and other observability tools.
- Deep-debug complex LLM chains using Langfuse traces, identifying bottlenecks and optimizing for cost, latency, and context-window usage.
- Build output-scoring systems to root-cause hallucinations and logic errors.
- Write clean, scalable, and maintainable TypeScript code across the Next.js and Node.js stack.
- Build reliable backend logic for AI systems with strong error handling, request validation, fallback flows, and predictable production behavior.
- Ensure high code quality through testing, code reviews, and clear engineering standards.
- Monitor, troubleshoot, and improve production performance, reliability, and system health.
- Drive maintainability and technical quality through solid architecture, refactoring, and disciplined release practices.
Requirements
- 6+ years of backend/full-stack software engineering experience, including production-grade TypeScript/Node.js.
- 2+ years of experience building AI/LLM systems in production.
- Deep hands-on experience working with LLM APIs (OpenAI, Anthropic, or similar) in production environments.
- Experience with agentic AI, multi-agent orchestration, tool-based workflows (function calling/tool execution), and/or Retrieval-Augmented Generation (RAG) pipelines.
- Experience with LLM observability tools such as Langfuse, LangSmith, or similar platforms.
- Experience with AI gateways and model routing solutions, such as OpenRouter or equivalent technologies.
- Solid understanding of Redis and relational databases, such as PostgreSQL.
- Exceptional ownership mindset and personal responsibility for engineering quality and delivery.
Nice to Have
- Experience with AI-centered development tools such as Cursor, Claude Code, or similar platforms.
- Familiarity with evaluation frameworks, including LLM-as-a-judge, RAGAS, or similar approaches.
- Experience working in high-pressure startup environments with rapid product iteration cycles.
- Experience with Model Context Protocol (MCP), including building MCP servers/clients or designing tool contracts for AI agents.
- Experience with edge and serverless runtimes, such as Cloudflare Workers.
- Experience with payments, billing, and checkout flows, or orchestration platforms.
- Practical experience fine-tuning models for domain-specific tasks or achieving strict JSON/schema compliance.
- Working proficiency in Python for data science, evaluation scripts, or AI tooling.
Location
Ruby Labs operates in the CET (Central European Time) zone. Applicants from any country are welcome, provided they are located within approximately ±4 hours of CET so that working hours overlap with the team's.
What We Offer