Job requirements and application form

Senior DevOps Engineer Team Lead
(552803)

Job description

We are seeking a Senior DevOps Engineer Team Lead to lead a small, hands-on DevOps team operating our Azure and Hetzner estates for both Predict⁺ and EI. You will own reliability, security (ISO27001), and cost efficiency; drive incident response and an on-call rotation; and stay deeply hands-on across Kubernetes, IaC, CI/CD, networking, and monitoring.

Key responsibilities
• Team leadership: roadmap ownership, mentoring, SLAs/SLOs, and postmortems.
• Kubernetes (self-hosted & AKS): upgrades, scaling, backup/restore/DR, and security hardening.
• Azure & colo integration: subscriptions, identity, networking, cost; connectivity with on‑prem colo (VPN/storage).
• IaC & automation: Terraform or Pulumi (plus Ansible) to standardize infrastructure.
• CI/CD at scale: Jenkins / Azure DevOps / GitLab (builds, tests, security scans, rollouts/rollbacks).
• Monitoring & incident response: Prometheus, Grafana, Zabbix; actionable alerts, SLO dashboards, runbooks, on‑call.
• Security & compliance: ISO27001 controls, secrets management, least privilege, image scanning, logging/audit.

Job requirements

Must‑have qualifications (Team Lead, 7+ years)
• 7+ years in DevOps/SRE with recent team‑lead responsibilities.
• Self‑hosted Kubernetes and Azure AKS (hands‑on).
• Terraform or Pulumi (Ansible for configuration management).
• CI/CD pipelines: Jenkins and Azure DevOps / GitLab.
• IT background with on‑prem colocation (servers/storage/networking, VPNs).
• Managing Azure environments (subscriptions, identity, networking, security, cost).
• Networking: TCP/IP, DNS, VPNs, load balancers, firewalls; strong troubleshooting.
• Monitoring with Prometheus, Grafana, Zabbix.
• Strong Linux fundamentals and scripting (Bash/Python).

Nice to have / Advantages
• Experience supporting globally distributed, customer‑facing production systems.
• TimescaleDB & ClickHouse operations (backup/restore, performance, retention).
• Keycloak/OIDC/SAML; SIEM/ELK/Wazuh; SAST/DAST.
• Container registries (Harbor/ACR) and SBOM practices.
• Cost optimization, multi‑cloud exposure, GPU workloads for AI/ML.

Submit application

*Full name:
*Phone:
*Email:
*CV:
