Their monorepo released every three weeks, gated by a flaky integration suite that could only run on a single Jenkins box. Hotfixes meant cherry-picks and a manual paging tree. The platform team spent half its week shepherding releases instead of building the platform.
We rebuilt CI on GitHub Actions with sharded test runners, moved deploys to ArgoCD, and put release health behind progressive delivery on EKS. We left behind runbooks the on-call rotation actually uses, not a 60-page Confluence doc.
Two-day release cycle, with hotfixes shipping in under an hour. The platform team got its week back. The change-failure rate dropped because every deploy is now small.
3wk → 2d
Release cycle
1h
Hotfix lead time
−62%
Change-failure rate
"The first release cycle that finished without anyone being paged on a weekend, we sent the team flowers. They earned it."