Job Overview
The Sr. Site Reliability Engineer (SRE) is responsible for the availability, performance, security, and scalability of Broadridge’s infrastructure and applications. The role works closely with development and operations teams to streamline the software development lifecycle, automate processes, and maintain reliable, scalable systems.
Key Responsibilities
- Monitor systems and lead incident response for production outages; develop and enhance monitoring using tools such as Datadog.
- Design and maintain scalable infrastructure using IaC tools (Chef, Terraform, Ansible, CloudFormation).
- Ensure stability, performance, and scalability of Linux‑based infrastructure and services while applying SRE practices to meet reliability targets defined through SLIs, SLOs, and SLAs.
- Build, manage, and maintain CI/CD pipelines for rapid and safe release cycles.
- Develop and implement scripts and tooling to automate repetitive operational task...