🇺🇸 USAJobs.work

America's Job Portal

Site Reliability Engineer

Company

Applexus Technologies

Location

Bengaluru, Karnataka

Posted

June 04, 2026

Position Overview

Job Description

Key Responsibilities:

- System Reliability and Monitoring: Design and implement monitoring, alerting, and automation for S3 storage clusters to achieve 99.99%+ uptime. Use tools like Prometheus, Grafana, or Catchpoint to track performance metrics and capacity utilization and to detect anomalies.
- Capacity Planning and Scaling: Forecast storage needs based on data growth trends (e.g., fleet expansion exceeding 80 PB) and proactively scale S3 buckets, lifecycle policies, and multi-region replication to support capacities of 150 PB or more.
- Incident Management: Lead on-call rotations, troubleshoot storage-related incidents (e.g., data access latency, replication failures), and perform root cause analysis using methodologies like blameless post-mortems.
- Automation and Infrastructure as Code: Develop and maintain automation scripts (e.g., using Terraform, Ansible, or Python) for provisioning, configuring, and managing S3 resources, including security policies, encryption, and access...
