Position Overview
Responsibilities
Design, deploy, and support large-scale distributed clusters and platform environments.
Manage and automate provisioning of compute resources in both on-premises and cloud platforms.
Design, implement, and manage CI/CD pipelines for applications and platform services.
Monitor cluster usage, health, performance, and availability.
Improve infrastructure provisioning, management, and monitoring through automation.
Troubleshoot system-level issues related to Linux platforms, Kubernetes, networking, and distributed systems.
Optimize system parameters (e.g., OS, drivers, networking, libraries) for platform performance and reliability.
Conduct system benchmarking and keep up with the latest advancements in infrastructure and cloud technologies.
Set up monitoring and logging using Zabbix, Prometheus, and other tools.
Implement security best practices for multi-tenant platform environments.
Collaborate with software teams and administrators to streamline workflows and improv...