Monitor and Maintain Systems: Ensure the availability, performance, and reliability of our production environment by monitoring system health and responding to incidents.
Automation: Develop and implement automation tools to reduce manual intervention and improve system efficiency.
Collaboration: Work closely with development teams to design and implement scalable and reliable systems.
Performance Tuning: Analyze system metrics to identify and resolve performance bottlenecks.
Incident Management: Lead incident response efforts, conduct root cause analysis, and implement preventive measures.
Documentation: Create and maintain comprehensive documentation for system architecture, processes, and procedures.
Capacity Planning: Conduct capacity planning and en...