About the Role
We are looking for a Site Reliability Engineer (SRE) to ensure the availability, reliability, and performance of our production systems. The ideal candidate will be responsible for monitoring infrastructure and applications, managing incidents, analyzing logs, performing system health checks, and supporting operational excellence across the organization.
Roles & Responsibilities
- Monitor servers, applications, databases, cloud infrastructure, and third-party integrations
- Configure and manage alerts, dashboards, and observability tools
- Respond to incidents, perform initial diagnosis, and coordinate escalations within defined SLAs
- Analyze system and application logs to identify issues, anomalies, and recurring patterns
- Perform daily health checks for production systems, databases, APIs, dashboards, and infrastructure components
- Maintain SOPs, runbooks, incide...