You bring 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering, with a specific focus on operating large-scale distributed systems in production.
You possess expert-level knowledge of Kubernetes control plane internals, including the API server, controller manager, scheduler, and etcd.
You demonstrate proficiency in Go and write production-grade code to build automation tools, Kubernetes Operators, or glue code that integrates disparate systems.
You have deep experience with Infrastructure as Code and container infrastructure, alongside proficiency in Linux system internals (kernel tuning, memory management) and networking (TCP/IP, CNI, load balancers, eBPF).
You bring experience operating datastores (e.g., PostgreSQL, Redis) and messaging systems (e.g., Kafka, NATS) at scale.
You run towards fires to learn from them, you automate yourself out of a job, and you ...