
Manu Balakrishnan Sreekumari
Site Reliability Engineer & Cloud Architect
Professional Summary
A highly accomplished Site Reliability Engineer with 18 years of progressive experience in designing, evaluating, implementing, and maintaining highly available and scalable systems. Proven track record of ensuring system reliability and uptime, managing incidents, and implementing long-term solutions. A collaborative professional who offers well-thought-out solutions, excels at identifying and automating toil, and is adept at critical problem-solving and task coordination. Possesses strong expertise in cloud infrastructure (AWS, VMware, Nutanix), CI/CD, Kubernetes, and observability tools, and is a valuable asset to any organization that requires reliable and scalable systems.
Key Achievements
- 18 years of progressive experience in Site Reliability Engineering
- $200K+ annual savings through AWS SSM patching tool automation
- 30% cost reduction in development accounts through resource optimization
- 20% cost savings through EKS/ECS consolidation
- Managed infrastructure with 2000+ VMs across multiple cloud providers
- Reduced QA turnaround time from a week to hours through containerization
- Successfully managed teams across multiple time zones (North America, Asia Pacific)
Certifications
AWS Certified Solutions Architect Associate
Validation Number: 4X7D1B1LEME4QN3P
Technical Skills
Infrastructure & Cloud
- AWS (EC2, ECS, EKS, Lambda, S3, RDS, CloudFront, Route53, SSM, Transfer Family)
- Multi-cloud environments (AWS, Azure, GCP)
- VMware vSphere, ESXi, vCloud, Lab Manager
- Nutanix HCI, AHV
- Infrastructure as Code (Terraform, CloudFormation)
- Managed hybrid environment with 2000+ Linux and Windows VMs
Container & Orchestration
- Kubernetes (K8s, EKS, AKS)
- Docker, containerd, Amazon ECS
- Service Mesh (Istio)
- GitOps (Argo CD)
- Helm Charts
Automation & DevOps
- CI/CD (GitLab, GitHub, Azure DevOps)
- Configuration management and image building (Ansible, Packer)
- Scripting (Python, Bash, basic Golang)
- Infrastructure provisioning (Terraform)
Observability & Monitoring
- Prometheus, Grafana
- Splunk, Honeycomb
- PagerDuty incident management
- Log analysis and root cause investigation
Databases
- Oracle, DB2, MySQL, MSSQL, PostgreSQL
- Database deployment and management
- Performance tuning
Systems
- Linux (extensive experience)
- Unix systems (HP-UX, AIX)
- Windows Server
- NetApp Storage Support
SRE Practices
- Incident response and management
- Toil reduction and automation
- Service level objectives (SLOs)
- Post-incident reviews and continuous improvement
Professional Experience
Site Reliability Engineer
GE Healthcare (Contract via TCS)
Toronto, ON | 05/2024 to Current
- Developed tools to automate and streamline operations
- Collaborated with teams to validate and verify releases, providing feedback to improve deployment processes
- Documented validation processes for improved consistency
- Worked effectively with operations teams across different time zones
- Identified and automated processes to reduce toil
- Created an AWS Transfer Family-based tool for zero-downtime transfer of terabytes of data
- Developed an aws-nuke-based tool to identify and remove unused cloud resources, reducing costs in development accounts by approximately 30%
Site Reliability Engineer
KAR Global
Toronto, ON | 05/2022 to 05/2024
- Developed and maintained a CI/CD pipeline in Azure DevOps to deploy applications and their resources across environments
- Deployed and used observability tools (Splunk, Honeycomb, Grafana, and Prometheus) to evaluate and improve the reliability of designs on AWS
- Consolidated multiple applications onto EKS/ECS, reducing costs by about 20% compared with the previous year
- Automated infrastructure and improved system scalability using tools such as Terraform
- Created an AWS SSM-based patching tool for cloud and on-premises systems, saving the company $200K annually
- Responded to PagerDuty alerts and troubleshot incidents; reviewed logs and alerts to investigate root causes and suggest improvements
- Maintained an internally developed tool that helps teams speed up incident response
- Maintained and promoted adoption of Kubernetes using Helm Charts, EKS, Argo CD and Terraform
- Responded to security alerts from Orca and took corrective measures
- Collaborated with cross-functional teams to develop, test, and deploy scalable software solutions
- Mentored co-op engineers, sharing site reliability engineering best practices
- Established a culture of continuous improvement within the team by encouraging feedback loops and iterative development processes
IT Software Engineer Principal 3
Progress Software Development Private Limited
Hyderabad, Telangana | 07/2009 to 04/2022
- Worked with the development team to design, evaluate, automate, and maintain on-premises and cloud development environments
- Developed tools and processes for automating CI/CD, focused on improving software delivery throughout the SDLC
- Maintained the development infrastructure in North America and Asia Pacific
- Planned, designed, and coordinated the evaluation, installation, upgrade and integration of various software and systems
- Developed applications to provision Virtual Machines in VMware and Nutanix, contributing to datacenter consolidation from Hyderabad to Boston
- Migrated QA database and applications to Kubernetes-based containers, reducing turnaround time from a week to hours
- Maintained the Kubernetes-based container infrastructure
- Monitored system performance, reviewed logs to perform root cause analysis and made recommendations to optimize performance
- Participated in the Agile Scrum process
- Worked across teams and engaged with stakeholders such as product owners and quality analysts
- Created end-user and administrator guides for the vCloud Director self-service provisioning portal
- Worked with software vendors to evaluate new technologies and participated in all phases from PoC to production
Education
Bachelor of Engineering - Electronics and Communication Engineering
Manonmaniam Sundaranar University, Tamil Nadu, India
Diploma - Electronics and Communication Engineering
State Board of Technical Education and Training, Tamil Nadu, India