JOB RESPONSIBILITIES
- Build and evolve an Internal Developer Platform as a product, including self-service infrastructure, standardized CI/CD pipelines, and well-defined “golden paths” for engineering teams.
- Drive end-to-end automation of the software development lifecycle, aiming for near-zero manual intervention and elimination of operational toil.
- Lead and develop a team of DevOps/SRE engineers, fostering an automation-first, engineering-led culture with strong ownership and accountability.
- Scale best practices across engineering teams through enablement, training, and platform adoption rather than centralized control.
- Integrate AI tools (e.g. Cursor, Claude Code, Copilot) into engineering workflows to:
– accelerate development and infrastructure code generation
– automate incident analysis and root cause identification
– optimize CI/CD pipelines and testing processes
- Proactively identify and implement opportunities to improve SDLC efficiency, reduce lead time, lower change failure rate, and minimize operational overhead.
- Design, build, and maintain highly scalable, reliable CI/CD systems with fast feedback loops.
- Implement and enforce automated quality gates, including unit, integration, and end-to-end testing, security testing (SAST/DAST), and dependency and container scanning.
- Establish Infrastructure as Code (IaC) and Configuration as Code (CaC) as standard practices.
- Ensure consistency, reproducibility, and auditability across all environments.
- Own and evolve the observability stack (metrics, logs, traces), and drive adoption of SLO/SLI-based reliability practices.
- Implement proactive monitoring, alerting, and, where appropriate, automated remediation (self-healing systems).
- Drive reliability engineering practices, including incident management, blameless postmortems, and error budgets.
- Troubleshoot complex issues in distributed systems and resolve performance bottlenecks.
- Embed security into the platform (security by design), including RBAC, secrets management, network policies, and automated compliance controls.
- Optimize infrastructure and cloud costs and promote efficient resource utilization.
- Collaborate closely with engineering, QA, and product teams to continuously improve developer experience.
- Train and enable teams to effectively leverage platform capabilities and AI-assisted development tools.
JOB REQUIREMENTS
- 5+ years’ experience in DevOps, SRE, or Platform Engineering, including leadership responsibilities.
- Strong hands-on experience building and operating scalable platforms and distributed systems.
- Proven track record in developing Internal Developer Platforms or similar self-service engineering ecosystems.
- Deep expertise in CI/CD, automation, and infrastructure at scale.
- Practical experience integrating AI tools (e.g. Cursor, Claude Code, Copilot) into engineering workflows.
- Demonstrated ability to improve SDLC metrics such as lead time, deployment frequency, and MTTR.
- Strong expertise in Kubernetes and container ecosystems.
- Solid experience with major cloud platforms (AWS, GCP, Azure).
- Proficiency in Infrastructure as Code (e.g. Terraform) and configuration management tools (e.g. Ansible).
- Strong scripting or programming skills (e.g. Python or Go).
- Experience with observability tooling (e.g. Prometheus, Grafana, OpenTelemetry).
- Good understanding of reliability engineering principles (SLOs, SLIs, error budgets).
- Experience with DevSecOps practices and modern security approaches.
- Familiarity with SQL and NoSQL databases.
- Strong systems thinking and ability to simplify complex technical environments.
- Excellent communication skills and the ability to influence engineering culture and practices.
- Experience working in fast-paced, high-scale environments.
- Cost optimization experience would be advantageous.