Senior Software Engineer, DGX Cloud Production Engineering

Company: NVIDIA

Location: Santa Clara, CA

Posted: May 31, 2026

Position Overview

NVIDIA DGX Cloud is building and operating large-scale GPU infrastructure for AI research and production workloads. We are looking for Senior Software Engineers to help build the automation, tooling, and operational systems that make GPU clusters reliable, scalable, and safe to run. This role is part of a production engineering team focused on Kubernetes-based infrastructure, GPU cluster operations, reliability, automation, GitOps, and Day 2 operability across DGX Cloud environments.


What you’ll be doing:
+ Build and operate automation for large-scale GPU clusters across NVIDIA Cloud Partners (NCP) and on-prem environments.
+ Develop tools and services for provisioning, validation, upgrades, monitoring, repair, and cluster lifecycle operations.
+ Improve Day 0 / Day 1 / Day 2 workflows for cluster bring-up, handoff, and production operations.
+ Reduce manual production touches through APIs, GitOps, automation, and agent-assisted workflows.
+ Participate i...
