We are seeking a senior infrastructure leader to oversee the deployment, performance, and reliability of large-scale bare-metal GPU clusters at the core of next-generation AI factory environments. This individual will play a pivotal role in bringing new compute capacity into production, ensuring infrastructure is commissioned effectively, operated reliably, and continuously optimised to support mission-critical AI workloads.
This is a high-impact leadership role at the intersection of infrastructure engineering, systems performance, production reliability, and operational scale-up.
Role
This role is responsible for leading the end-to-end deployment and operational management of bare-metal GPU clusters, from initial setup through to ongoing optimisation. You will drive infrastructure performance across compute, networking, and storage layers while establishing operational frameworks for monitoring, incident response, and system reliability. The position also involves bu...