About the Role
Our client is building its own foundation model for video generation, based on the Diffusion Transformer (DiT) architecture and Flow Matching. They are looking for a Training Infrastructure Engineer who can turn cutting-edge research code into a stable, scalable, high-throughput training system running on large-scale GPU clusters.
This role is ideal for an engineer who enjoys solving deep systems problems at the intersection of distributed training, CUDA performance, video data pipelines, training stability, and large-scale ML infrastructure. You will work closely with researchers and platform engineers to ensure that the client's video generation training stack runs reliably at thousand-GPU scale.
Key Responsibilities