Position Overview
Job Summary
We are looking for a GPU / AI Infrastructure Engineer with 5–7 years of experience to build, optimize, and support scalable AI/ML and HPC environments. The ideal candidate will have strong expertise in GPU acceleration, containerized workloads, and MLOps pipelines, along with hands-on experience managing AI infrastructure across on-prem or cloud platforms.

Key Responsibilities
Design, deploy, and manage GPU-enabled infrastructure for AI/ML and HPC workloads.
Install, configure, and optimize GPU software stacks including NVIDIA AI Enterprise, CUDA, ROCm, OpenCL, and NIMS.
Support GPU acceleration for machine learning frameworks and scientific applications.
Build and manage containerized environments using Docker, Kubernetes (K8s), and Singularity.
Deploy and manage Kubernetes GPU workloads using GPU Operator and related ecosystem tools.
Support ML frameworks such as TensorFlow, PyTorch, Scikit-learn, and MXNet.
Develop and maintain MLOps pipelines using MLflow and Kubeflow.
...