Own the architecture of production AI systems — inference stacks, fine-tuning pipelines, retrieval and evaluation infrastructure, monitoring
Build on frontier models (Claude, GPT, and peers) with real rigor: tool use, structured outputs, context and cost management, evals, and guardrails, not just prompt-and-pray
Deploy and operate open-source models (Llama, Qwen, Mistral, DeepSeek, and whatever comes next) in our cloud environment, including quantization, serving frameworks (vLLM, TGI, SGLang, TensorRT-LLM), and multi-GPU inference
Make the frontier-vs-open-source call deliberately, weighing cost, latency, control, and data sensitivity, and be able to defend it
Design the cloud infrastructure underneath it all: GPU orchestration, autoscaling, cost controls, VPC/networking, IAM, observability. This is not a “hand it to DevOps” role
Fine-tune, distill, and evaluate models agains...