Key Responsibilities

Cluster Operations & Management
- Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units
- Ensure optimal performance, scalability, and reliability of distributed systems

Infrastructure Platform Development
- Design, build, and enhance infrastructure operation platforms
- Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging
- Drive platform standardization and automation initiatives

High Availability & Reliability
- Ensure maximum uptime for production services through proactive monitoring and incident response
- Continuously optimize service architecture, deployment strategies, and operational processes
- Implement and maintain SLA/SLO frameworks and reliability engineering practices

Automation & Process Improvement
- Lead the development of automated operations and maintenance systems
- Create self-service tools and workflows to improve team productivity
- Establish best practices for infrastructure as code and configuration management

Required Qualifications

Experience & Education
- 2+ years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering (SRE)
- Bachelor's degree in Computer Science, Engineering, or a related technical field preferred

Cloud & Infrastructure
- Experience with public cloud platforms (AWS, Azure, or GCP) is highly valued
- Strong understanding of large-scale internet architecture and distributed systems
- Proven experience with infrastructure monitoring, logging, and observability tools

Technical Skills
- Proficiency in scripting and automation using Shell, Python, or similar languages
- Strong knowledge of containerization technologies (Kubernetes, Docker)
- Hands-on experience operating production-grade container clusters and managing CI/CD pipelines
- Strong familiarity with common infrastructure components: Nginx, MySQL, Redis, Kafka, Elasticsearch

Advanced Networking (Preferred)
- Experience with Service Mesh architectures, Cilium CNI, and eBPF technologies
- Understanding of network security, load balancing, and traffic management
- Knowledge of cloud-native networking patterns and best practices

About Manus AI

Manus is a general AI agent that bridges minds and actions: it doesn't just think, it delivers results. Manus excels at various tasks in work and life, getting everything done while you rest. At Manus AI, we offer a highly collaborative and innovative environment where experts across engineering, research, and business come together to push the boundaries of AI applications. If you're passionate about cutting-edge technology and making a real impact, we’d love to hear from you!

Contact us: recruiting@manus.im