System Observability Engineer (m/f/d)
Contract
Abu Dhabi, United Arab Emirates
09.03.2026
A technology-driven enterprise is seeking a System Observability Engineer to implement and manage end-to-end observability across its platforms. The successful candidate will build and maintain monitoring, logging, and alerting frameworks that give full visibility into system health, enabling proactive issue detection and well-informed operational decisions.
Responsibilities:
- Design, implement, and manage observability platforms covering metrics, logs, and distributed tracing.
- Deploy and configure monitoring and alerting tools such as Grafana, Prometheus, Datadog, ELK Stack, or Dynatrace.
- Define and implement SLIs, SLOs, and error budgets aligned to service reliability requirements.
- Build dashboards and visualizations for operational, performance, and business-level metrics.
- Tune alerting thresholds to reduce noise and ensure all alerts are actionable and meaningful.
- Collaborate with DevOps, cloud, and application teams to instrument services and workloads for observability.
- Support root-cause analysis and performance investigations using observability data and tooling.
- Maintain and evolve the observability strategy as infrastructure and application landscapes grow.
- Develop runbooks and documentation for observability tooling, monitoring standards, and on-call procedures.
Qualifications and Skills:
- 4+ years of experience in systems monitoring, observability, or Site Reliability Engineering (SRE) roles.
- Hands-on experience with observability tools such as Grafana, Prometheus, Datadog, Dynatrace, or ELK Stack.
- Understanding of distributed tracing concepts and tools such as Jaeger, Zipkin, or OpenTelemetry.
- Experience instrumenting applications and infrastructure components for monitoring and alerting.
- Scripting skills in Python, Bash, or a similar language for automation and custom alerting.
- Knowledge of cloud-native monitoring services such as CloudWatch, Azure Monitor, or GCP Operations Suite.
- Datadog or Dynatrace certification, or familiarity with SRE practices and reliability engineering principles, is an advantage.