
Site Reliability Engineer

29647
  • Negotiable
  • Asia Pacific

About Our Client:

  • A pioneering AI innovation leader


Key Responsibilities:

  1. Manage production-grade container ecosystems (Kubernetes/Docker) and open-source component clusters across multiple business units
  2. Develop infrastructure operation platforms encompassing CI/CD pipelines, monitoring/alerting systems, and centralized logging solutions
  3. Execute rapid incident response protocols to maintain service continuity and minimize downtime
  4. Optimize system architecture and deployment strategies to ensure 99.9%+ service availability
  5. Spearhead automation programs to streamline operations and eliminate manual processes
  6. Partner with engineering teams to implement infrastructure-as-code (IaC) principles and reliability patterns
  7. Provide 24/7 operational coverage through a rotating on-call schedule


Qualifications:

  • 5+ years in SRE/DevOps roles managing large-scale distributed systems
  • Expert-level proficiency with AWS/Azure/GCP cloud ecosystems
  • Advanced Linux administration skills with hands-on maintenance experience
  • Scripting mastery in Python/Shell for operational automation
  • Deep technical expertise in optimizing Nginx, JVM, Redis, Kafka, and SQL/NoSQL datastores
  • Production experience managing Kubernetes clusters and containerized workloads
  • CI/CD implementation experience using GitLab CI/ArgoCD or comparable tools
  • Proven ability to diagnose complex system failures under time constraints
  • Effective remote collaboration skills across technical teams
  • Self-driven, with a strong sense of technical ownership
  • Full professional fluency in English and Chinese


Rachel Mou, Divisional Director | China
