
Site Reliability Engineer

29647 Posted: 23/05/2025

About Our Client:

Key Responsibilities:

Manage production-grade container ecosystems (Kubernetes/Docker) and clusters of open-source components across multiple business units
Develop infrastructure operations platforms encompassing CI/CD pipelines, monitoring and alerting systems, and centralized logging solutions
Execute rapid incident response protocols to maintain service continuity and minimize downtime
Optimize system architecture and deployment strategies to ensure 99.9%+ service availability
Spearhead automation programs to streamline operations and eliminate manual processes
Partner with engineering teams to implement infrastructure-as-code (IaC) principles and reliability patterns
Maintain 24/7 operational readiness through rotational on-call support

Qualifications:

5+ years in SRE/DevOps roles managing large-scale distributed systems
Expert-level proficiency with AWS/Azure/GCP cloud ecosystems
Advanced Linux administration skills with hands-on maintenance experience
Scripting mastery in Python/Shell for operational automation
Deep technical expertise in optimizing Nginx, JVM, Redis, Kafka, and SQL/NoSQL datastores
Production experience managing Kubernetes clusters and containerized workloads
CI/CD implementation experience using GitLab CI, Argo CD, or comparable tools
Proven ability to diagnose complex system failures under time constraints
Effective remote collaboration skills across technical teams
Self-driven work ethic with strong technical ownership mentality
Full professional fluency in English and Chinese

Rachel Mou, Divisional Director | China
