Title: DevOps Engineer (SRE)

The DevOps Engineer (SRE) will enable fast, safe change while keeping systems reliable and performant for customers. The role defines and manages service level indicators and objectives, treats reliability as a product feature, and uses software engineering to eliminate toil, improve incident and problem management, and strengthen change and release practices.

Key Responsibilities:

Reliability and Service Level Objective (SLO) Management

  • Define, track, and report Service Level Indicators (SLIs), SLOs, and error budgets for critical services. Partner with product and platform owners to align reliability goals with customer impact.
    SLO attainment meets or exceeds targets. Error budget consumption is tracked and used to guide release pace. Weekly reliability status and trends are shared with stakeholders.

Incident Response and Problem Management

  • Participate in the on-call rotation. Lead or support incident triage, mitigation, and blameless post-incident reviews. Drive corrective actions to prevent recurrence.
    Mean time to recovery (MTTR) and mean time to detect (MTTD) improve quarter over quarter. Postmortems are completed for high-priority incidents, with corrective actions closed within agreed SLAs.

Observability, Performance, and Capacity

  • Build actionable monitoring, alerting, and dashboards. Establish performance baselines, run load and chaos tests in preproduction, and plan capacity and disaster recovery (DR).
    High signal-to-noise alerting with a low false-positive rate. Performance and DR tests pass before releases. Capacity stays within targets with no resource-related outages.

Agile Collaboration and Communication

  • Participate in agile ceremonies. Share reliability insights, codify runbooks, and mentor peers to spread SRE practices across teams.
    Consistent participation in stand-ups and planning. Runbooks updated each sprint. Measurable uplift in team adoption of SRE practices and tooling.


Job Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Relevant cloud and SRE certifications are a plus, especially AWS certifications.
  • Minimum of 5 years of experience in SRE, DevOps, platform engineering, or production operations. Proven experience with SLOs and SLIs, incident response, observability, and automation. Experience operating services in cloud and containerized environments, specifically on Amazon Web Services (AWS).
  • Must be willing to work in Ortigas, Pasig (Hybrid Work Setup).