Benny migrated to a world-class DevEx on Kubernetes in under a day with Kapstan

Shyam Kumar

What you'll learn

  • How a brownfield application and its infrastructure migrated onto Kapstan in under a day (ECS + App Runner → EKS)
  • Secrets management and deployments, simplified
  • Kapstan: the luxurious DevEx that developers at enterprises depend on, finally brought to start-up teams
“If there was a Vercel for back-end equivalent that doesn’t lock me into a 3rd party infrastructure, we would use it. That’s what we saw in Kapstan.”
— Steve, CTO, Benny

Benny is a Fintech application designed to help individuals who use SNAP/EBT cards earn cash back on groceries and other purchases. Users of Benny typically link their EBT card, find available offers, and submit receipts to receive cash back. The app also allows users to check their EBT balance and transaction history instantly. Benny is free to use and provides added security through bank-level encryption to protect user data.

Background

Steve (Co-founder & CTO) never considered platforms like Vercel when getting started. While Vercel offered a clear path to relieve infrastructure management headaches with a world-class DevEx, Steve didn’t want to be tied down by vendor lock-in. Besides, he was familiar with AWS from his previous experience at Block, and made the strategic decision to own his infrastructure. Steve and his two founding engineers were also comfortable with Terraform. As the CTO, Steve was responsible for all infrastructure decisions. He and his two founding engineers made a few key early decisions:

  1. Leverage ECS Fargate and AWS App Runner to deploy their 3 main services
  2. Use IaC (Terraform) to manage their resources
  3. Pay for Datadog despite the premium price, because of the team’s familiarity with the product

Benny’s technical stack was the following:
  • Services: AWS ECS to run core business, customer-facing services, AWS App Runner for secondary services and log collection deployment tasks, AWS EC2 Bastion Hosts to allow external tools to connect to databases in a private VPC, and AWS Lambda for certain scraping functionalities
  • Database: Postgres RDS for data storage
  • Tools: Metabase and Retool
  • CI/CD: Images were built via GH Actions and pushed to ECR, and the ECS task revision was updated to use the new image. After that, a new GH tag was released. CD was set up in Staging, with manual setups in Production.
  • AI infrastructure: Azure AI services including computer vision and document intelligence.
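
The "update the ECS task revision to use the new image" step in the CI/CD flow above can be sketched roughly as follows. This is a minimal illustration, not Benny's actual script: the function name `with_new_image`, the container name `api`, and the image URIs are hypothetical, and the dictionary only mirrors the `containerDefinitions` shape of an ECS task definition. In a real pipeline the result would then be registered as a new revision through the AWS API.

```python
import copy


def with_new_image(task_definition: dict, container_name: str, image_uri: str) -> dict:
    """Return a copy of the task definition with one container's image replaced."""
    updated = copy.deepcopy(task_definition)
    matched = False
    for container in updated["containerDefinitions"]:
        if container["name"] == container_name:
            container["image"] = image_uri
            matched = True
    if not matched:
        # Fail loudly instead of silently deploying the old image
        raise ValueError(f"no container named {container_name!r}")
    return updated


# Hypothetical values, for illustration only
current = {
    "family": "api",
    "containerDefinitions": [
        {"name": "api", "image": "example.invalid/api:v1"},
    ],
}
new_revision = with_new_image(current, "api", "example.invalid/api:v2")
print(new_revision["containerDefinitions"][0]["image"])
```

Working on a copy keeps the currently deployed definition untouched, which makes it easier to fall back to the previous revision if the new one misbehaves.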

The Problem

As the engineering team grew to five people, so did the architectural challenges and workload. Benny faced multiple pain points:

  • Failed Migrations. If a migration failed after a deployment, the servers entered a crash-and-restart loop until the deployment was rolled back.
  • Deployments. While Benny had already set up GH Actions, their deployment method was very error-prone. The team had a repository of GH Actions scripts that were used as needed and constantly modified. Scripts would fail for various reasons, and it was not easy to promote a staging build to production. Lack of visibility reduced the speed of code promotion.
  • Secrets Mis-Management. The process for adding a new secret was to: a) define the Terraform variable, b) add a Terraform Secrets Manager secret, c) add the secret in Terraform Cloud as a workspace variable, d) run Terraform apply, which forced a new deployment, and e) add the secret to 1Password. Leveraging EKS would have simplified this, but the overhead of managing Kubernetes clusters was not in the cards.
  • Traces. The team was unable to connect logs via trace ID and faced other general log configuration challenges.

“When we began, it was just me and one engineer, and we were both comfortable with Terraform. I didn’t think I needed a tool like Kapstan. My base understanding changed, as we onboarded more engineers. It wasn’t worth folks learning it. We began seeing more silly issues pop up - once, someone forgot to update an environment variable and production got taken down. I realized we can solve for this by relying on a platform with built-in automation.” — Steve, CTO

Migrating onto Kapstan

Benny’s main use for Kapstan was to take advantage of a world-class DevEx running on Kubernetes, an experience typically reserved for the largest companies with large, expensive, and difficult-to-build platform engineering teams.

“At Cash App, we had great Developer Experience because the team really invested years into it. I wanted a nice UI with deploys in one place, full DevOps visibility in one pane of glass. Large companies have a luxurious DevExp set up that triples the velocity of engineering teams; why can’t we?” — Steve (CTO)

Migrating existing applications and infrastructure onto Kapstan took less than a day:
  1. Connect Kapstan to Benny’s AWS (1 min)
  2. Set up an EKS cluster in staging that replicates Benny’s current infrastructure (1 hr)
  3. Modify GH Actions to create container images and push the images to GHCR (1 hr)
  4. Configure all applications via Kapstan (2 hrs)
  5. Benny team to test staging and learn to use Kapstan (3 days)
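
As a quick check on the timeline above, the hands-on setup in steps 1 to 4 can be added up. This sketch assumes the "less than a day" figure refers to those setup steps and excludes the three days the Benny team spent testing in step 5; that reading is an interpretation, since the source does not break the figure down. The step labels are paraphrased for illustration.

```python
from datetime import timedelta

# Durations of the hands-on setup steps as listed above (steps 1-4 only)
setup_steps = {
    "connect Kapstan to AWS": timedelta(minutes=1),
    "set up staging EKS cluster": timedelta(hours=1),
    "modify GH Actions to push images to GHCR": timedelta(hours=1),
    "configure applications via Kapstan": timedelta(hours=2),
}

hands_on = sum(setup_steps.values(), timedelta())
print(hands_on)  # 4:01:00
```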

Production cutover happened 12 days later, once the Benny team was comfortable in Staging.

“I appreciated the level of care and communication. Staging was spun up incredibly quickly in a way that wasn’t disruptive to our own workflow. This was an incredibly smooth transition, especially for something as delicate and difficult as navigating someone else’s rats nest of infrastructure.”

Kapstan Recommendations

Along the migration journey, the Benny engineering team had a number of questions, which the Kapstan team answered through a dedicated 24/7 Slack channel and with hands-on engineering effort. The recommendations the Kapstan team provided and implemented included, but were not limited to, the following:

  • Kubernetes: The Benny team knew they needed to migrate to Kubernetes eventually. With the Kapstan workflows, they got a Vercel-like developer experience while harnessing the power of Kubernetes under the hood.
  • Tailscale: The Benny team wanted uniform access to their infrastructure. Kapstan recommended and implemented Tailscale instead of OpenVPN for simplicity.
  • PgBouncer: The Benny team wanted to give external tooling (Metabase, Retool) access to their RDS database, and the Kapstan team recommended PgBouncer for this exact use case instead of other, more complicated and less battle-tested options.
  • Database Cost Control: Kapstan gave Benny the ability to provision the databases of multiple applications on a single RDS server to control cost. This saved roughly $1000/mo, or 60% of the original RDS cost.
  • Spot Instances: Benny now runs Spot instances in staging. This can be toggled with a single click via Kapstan.
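
The database savings figure above implies an approximate before-and-after RDS bill. The short calculation below is a back-of-the-envelope sketch using only the two rounded numbers from the text, so the results are estimates rather than Benny's actual invoices.

```python
# Figures from the text: roughly $1000/month saved, about 60% of the original RDS cost
monthly_savings = 1000.0
savings_fraction = 0.60

original_cost = monthly_savings / savings_fraction   # implied bill before consolidation
remaining_cost = original_cost - monthly_savings     # implied bill after consolidation

print(f"original ≈ ${original_cost:,.0f}/mo, after consolidation ≈ ${remaining_cost:,.0f}/mo")
```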

Most Used Features

Secrets Management

Kapstan Secrets Management makes it simple to copy configurations across environments. This process is typically painful unless the team is leveraging k8s. Small companies that aren’t ready for k8s yet don’t have good options for managing secrets across environments out of the box. The amount of tedious work required to properly and securely store a secret was a big problem.

“AWS Secrets Manager is ok, but we don’t want to interact with it. Managing that ourselves was hard. We didn’t want to throw all our secrets in a flat, encrypted text file, but we weren’t ready to manage EKS. So we ended up with a bunch of hacky scripts that would break half the time. Until Kapstan.” — Steve (CTO)

Kapstan’s environment variable system allows you to define variables at the environment or application level. Need to import a file? Mount it securely inside your container. Your secrets are securely stored in our Secret Manager.
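
To make the two levels concrete, here is a small sketch of how environment-level and application-level variables might combine. It does not use or describe Kapstan's actual API: the variable names and values are hypothetical, and the precedence rule (application-level values override environment-level ones) is an assumption made for illustration, not something the source states.

```python
from collections import ChainMap

# Hypothetical variables; names and values are illustrative only
environment_level = {"LOG_LEVEL": "info", "REGION": "us-east-1"}
application_level = {"LOG_LEVEL": "debug"}

# ASSUMPTION: application-level values take precedence over environment-level ones.
# ChainMap looks keys up in the first mapping before falling back to the second.
effective = dict(ChainMap(application_level, environment_level))

print(effective["LOG_LEVEL"], effective["REGION"])
```

Shared values are defined once at the environment level, and an application only declares the settings it needs to change.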

Deploy

Benny did have spot checks in their GH Actions custom runners, which deployed to ECS and Lambda. This was done with a collection of GH Actions scripts, which became cumbersome to maintain.

“There were tons of action scripts. If we had a new repository, the team had to modify/refactor the script. Other times, individual parts of the script would fail (e.g. ECR would be a success, but ECS would time out). Having a single Deploy button automated with GH Actions that handled everything - that was a huge productivity boost for us.” — Steve (CTO)

Deploys are seamless. In Staging they are fully automated, and in Production they take a single click. No more hacky scripts.

Slack Notifications

Kapstan also provides real-time updates in Slack about service upgrades, deployment or cron job failures, and runtime issues. Think of Kapstan as an always-on, 24/7 DevOps support assistant.

Kapstan Value

After migrating to Kapstan, Benny’s DevOps toil was reduced to zero. The automation the Kapstan platform gives Benny includes the following:

  • Secrets management - simplified
  • Zero downtime
  • Automatic cluster version upgrades
  • Spot instance control with a single click
  • Datadog agent management via the Kapstan UI, with no more complicated .yaml files to manage
  • Resource right-sizing via out-of-band system metrics

We asked Steve what he thought about Kapstan after a few months on the platform. Here’s what he had to say:

“Large companies have a great Developer Experience because they can invest in it. With Kapstan, I get it right now. I get the best DevEx tools, running k8s on my infrastructure, with none of the usual overhead, with the expertise and fractional support of the best DevEx engineers I know.” -Steve (CTO)
