We're looking for a Senior/Staff AI Engineer (Inference & Agent Systems) for a rapidly growing fintech startup that is setting up its operations in India.
Why Join?
Get the opportunity to be one of the founding members of the team and build the product from scratch.
About the Role and the Work:
Inference Optimization
- Drive time to first token (TTFT) below 400 ms for multi-step agent pipelines
- Streaming optimization: deliver the first token to the user while sub-agents are still running
- KV cache strategy, prompt compression, dynamic context window management
- Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models
Infrastructure
- Model serving and cold start optimization
- Async worker architecture for parallel sub-agent execution
- Observability: trace every token, every tool call, every synthesis s...