Principal Operations Engineer, Hardware - Data Center Operations
Fluidstack · Remote — Worldwide
Job Description
About Fluidstack
Fluidstack is building civilization-scale infrastructure for AI, aiming to deliver massive amounts of compute faster than anyone else. We are rethinking every layer of the stack, from acquiring power and designing data centers to operating them with integrated hardware and software teams. Our mission is to ensure that AI expands human freedom, and we are looking for passionate individuals to join us in this critical endeavor.
About the Role
We are seeking a Principal Operations Engineer, Hardware to be the foremost technical authority for our operational hardware fleet across our hyperscale AI data center portfolio. This role is crucial for ensuring the reliability and performance of our GPU systems, servers, and supporting hardware at scale. You will act as the technical lead in the field, driving operational excellence, conducting site assessments, leading technical readiness for new sites, and providing operational insights to hardware engineering and supply chain teams. You will be a key force multiplier, bridging the gap between hardware operations, engineering, network, facilities, and customer-facing teams.
Key Responsibilities
- Serve as the senior technical authority for operational hardware across the fleet.
- Lead site assessments and operational audits.
- Drive technical readiness of teams before site activation.
- Review hardware platforms and integration designs from an operational perspective.
- Provide operational learnings back into hardware engineering, deployment, and supply chain.
- Act as a force multiplier for site hardware leads, deployment teams, and reliability engineers.
- Facilitate communication between hardware operations, hardware engineering, network, facilities, and customer-facing teams.
- Diagnose hardware issues and lead fleet-wide root cause investigations.
- Manage and improve hardware operational processes and vendor relationships.
Requirements
- 10+ years of hands-on experience operating mission-critical hardware infrastructure, with at least 5 years as the senior technical voice on a site, campus, or fleet.
- Data center operations experience is strongly preferred; hyperscale, large HPC, cloud, or other mission-critical compute infrastructure experience will be considered.
- Deep understanding of GPU systems, server platforms, storage infrastructure, firmware lifecycle management, and hardware diagnostics, gained through practical field experience.
- Demonstrated ability to author, approve, and execute high-risk Methods of Procedure (MOPs) and change records in live production environments.
- Proven track record of leading root cause analysis for significant hardware events and driving corrective actions.
- Experience holding OEMs, ODMs, service vendors, and deployment partners accountable while maintaining positive relationships.
- Strong written communication skills for operational health assessments, RCAs, procedure reviews, and design feedback.
- Comfort operating as the senior technical voice across multiple departments.
- Willingness to travel extensively across the fleet (50-75% of the time).
Preferred Qualifications
- Bachelor's degree in Computer Engineering, Electrical Engineering, Computer Science, or a related field.
- Hyperscale or large-scale compute operational experience supporting thousands of servers and accelerator systems.
- Direct experience operating modern GPU platforms at production scale.
- Strong working knowledge of Linux administration, hardware management tooling, and production troubleshooting workflows.
- Experience supporting liquid-cooled compute infrastructure.
- Experience operating across multiple sites or as part of a global fleet operations function.
- Experience standing up new sites, from deployment handover through steady-state operations.
- Experience contributing operational requirements into hardware platform decisions and data center builds.
- Scripting and automation experience for fleet-scale hardware operations.
What We Offer
- Competitive total compensation package, including salary and equity.
- Retirement or pension plan, in line with local norms.
- Health, dental, and vision insurance.
- Generous Paid Time Off (PTO) policy, in line with local norms.
- The base salary range for this position is $150,000 - $250,000 per year, depending on experience, skills, qualifications, and location. Total compensation may also include equity in the form of stock options.
Fluidstack is committed to pay equity and transparency and is an Equal Employment Opportunity Employer.
✨ This description was enhanced by AI based on the original listing.