Experience
DevOps and Site Reliability Engineer
CraftSchoolship · San Jose, Remote · Oct 2023 - Present
Key Highlights
- Reduced AWS costs by 45% through FinOps practices
- Automated 70% of operational tasks
- Reduced MTTR by 50% with monitoring solutions
Responsibilities
- Designed and built cloud architecture on AWS, including ALBs and multi-region EKS clusters, by developing Terraform and CloudFormation templates, ensuring infrastructure consistency and reducing cluster provisioning time to 10 minutes.
- Applied FinOps practices using Kubecost and load testing to analyze cloud spend and optimize resources, reducing AWS costs by 45%.
- Automated 70% of operational tasks using Python and Bash scripts with cron scheduling, reducing manual effort and operational overhead.
- Managed and troubleshot production Kubernetes clusters, ensuring reliability, availability, and security.
- Built a deployment automation platform using GitHub Actions, Helm, and Argo CD, providing a centralized way to manage application deployments and configurations.
- Implemented CI/CD pipelines with GitHub Actions covering linting, testing, build, and deployment, improving code quality and reducing deployment time to under 5 minutes.
- Managed and optimized PostgreSQL clusters, improving performance and reducing resource usage by 35%.
- Automated backup and restore workflows for Kubernetes resources and PVs (EBS, EFS), improving disaster recovery readiness.
- Created Grafana dashboards for analyzing Loki logs, improving debugging and reducing troubleshooting time.
- Built monitoring and alerting using Prometheus and Grafana, helping reduce MTTR by 50%.
- Participated in on-call rotations, handling incidents and contributing to postmortems, documentation, and SOPs.
- Implemented centralized authentication using Keycloak and managed the company workspace, reducing employee onboarding time by 70%.
- Developed backend systems for payments, user management, and analytics, including AI-driven features, by building and maintaining REST APIs using Python and Go.
- Implemented a service mesh using Istio and configured Jaeger distributed tracing to improve service observability and debug latency issues.
Software Developer Intern
Box2Home · Sousse, Hybrid · Feb 2023 - Jun 2023
Key Highlights
- Built a custom internal platform to replace an expensive third-party system
- Managed 100+ MySQL tables with NestJS, React, and Prisma
Responsibilities
- Developed a custom internal platform using NestJS, React, and Prisma to replace an expensive third-party system, allowing developers to easily view and manage 100+ MySQL tables.
- Implemented operations logging and role-based user management to ensure data integrity and security.
- Deployed the platform with Docker and AWS ECS, enabling faster and more reliable environment provisioning.