America's Job Portal
Your tasks
• You design and implement observability standards for data pipelines, data products and platform services.
• You support the progressive adoption of OpenTelemetry for logs, metrics and traces.
• You work with existing tools such as ELK / OpenSearch, Fluent Bit, Grafana and Prometheus.
• You define reusable patterns for instrumentation, dashboards, alerting and runbooks.
• You improve detection and diagnosis of data pipeline incidents, delays, failures and SLA breaches.
• You correlate telemetry with meaningful context such as pipeline, run ID, dataset, data product, owner and environment.
• You build actionable alerts and reduce noise from irrelevant or redundant alerts.
• You support incident response, root cause analysis and post-incident improvements.
• You automate observability onboarding using CI/CD, Infrastructure as Code and reusable templates where relevant.
• You enable self-service observ...