🇺🇸 USAJobs.work

America's Job Portal

Senior LLM Inference Engineer: Performance & GPU Optimization

Company

Confidential

Location

Singapore, Singapore

Posted

June 25, 2026

Position Overview

Own the performance of large language models in production: latency, throughput, and cost per token. This is deep inference-optimization work: profiling and tuning at the GPU and serving-engine level to make models run faster and cheaper at scale. You'll join a small, senior team at an established enterprise software company building LLM-powered capabilities into its products.

What you'll do:
  • Optimize LLM inference for latency, throughput, and cost at the kernel and serving-engine level
  • Profile and tune GPU performance (CUDA, TensorRT-LLM); apply quantization, speculative decoding, and batching strategies
  • Get the most out of serving frameworks like vLLM, SGLang, and Triton, and extend them where they fall short
  • Optimize across hardware targets where relevant (NVIDIA and other accelerators)
  • Partner with model and platform teams to take new architectures from "works" to "fast"
What you'll bri...
