Site Reliability Engineer

Company

RCS TECH

Location

Mexico, Mexico

Posted

June 04, 2026

Position Overview

What You’ll Do
Reliability & Operations
 - Own availability, latency, and scalability across SaaS and AI systems
 - Define and enforce SLOs, SLIs, and error budgets
 - Participate in a global on-call rotation (~1 week every 4 weeks)
 - Lead incident response and drive blameless postmortems with systemic fixes
Platform & Infrastructure
 - Architect and operate on-premises and multi-region, multi-cloud environments
 - Manage large-scale Kubernetes workloads
 - Build and evolve infrastructure using Terraform and Ansible
 - Improve system resilience, fault isolation, and capacity planning
AI/ML & Automation
 - Build and scale agentic AI systems for triage, anomaly detection, and self-healing
 - Ensure reliability of model-serving infrastructure
 - Operate, optimize, and scale distributed systems
What You Bring ...