Infrastructure Headaches – Where’s the Tylenol?

For the last few years, I’ve been deep in the weeds of infrastructure provisioning as Head of Infrastructure at Ketch, now a Series A startup. We’ve come so far on the cloud journey since the early days of AWS, and yet I still see the same pain points repeated again and again.

Cloud adoption has progressed like The Borg across the galaxy, slowly assimilating everything in its path. But the thing about The Borg is that while it is relentless, it isn’t pretty. Their spaceship looks like a giant cube, but really it’s a giant amalgamation of robot parts. “Individual” borgs (technically there are no individuals in a collective organism) have a bunch of android paraphernalia hanging out of their hairless skulls and faces only a cold, artificial intelligence mother could love.

Cloud infrastructure is the same way. In 2023, infra has exploded with different tools like Terraform, Kubernetes, Helm, Prometheus, etc., crowding the market as cloud adoption has grown. At the same time, the current trend in microservices architecture has also complicated everything it has touched (again, like The Borg!). 

With microservices, instead of building a beehive where every drone is the same, we’re building a beehive where each drone is a different species of insect. Each loosely coupled component is independent, autonomous, and different. Managing a plethora of microservices that each need their own deployment, environment settings, and communication paths can be a nightmare.

Infrastructure today is a headache that has me reaching for my Costco-sized bottle of Tylenol first thing in the morning and a liter of lager in the evening. Here’s my analysis of why things have gotten to be this way.

Infra Best Practices Are Mission Critical

From previous startups (such as Krux, which exited to Salesforce in 2016), I knew that best practices for setting up your initial infrastructure would pay dividends down the line. Ironically, this is precisely why infrastructure is such a pain in the butt today: it’s really freaking important, mate! There are a few reasons for this:

Scaling – this is the most obvious; as your startup grows, things get more complicated and more annoying, and you need to do them more efficiently at your larger scale.
My infrastructure teams make heavy use of auto-scaling groups. We used to use a resource management tool like Mesos, but now we use container orchestration systems like Kubernetes and its capabilities such as HorizontalPodAutoscaler for our highly elastic workloads.

Cost Management – also obvious; with more scale comes more costs, and your leadership hates costs. So let’s help the CTO along a bit by tagging our infrastructure, using spot instances, and adding lifecycle policies to storage objects so that they move between storage tiers based on access patterns and eventually expire.

Security and Compliance – that CTO is always concerned about the “customers” and making sure that they can “trust us.” So using secure credentials, multifactor authentication, and access reviews, and following the principle of least privilege, are top-of-mind security best practices that will make your CTO happy.
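
For the curious, here is roughly what the HorizontalPodAutoscaler mentioned under Scaling is doing behind the scenes. This is a simplified Python sketch of the ratio-based rule described in the Kubernetes documentation, not our actual configuration: the function name, parameters, and replica bounds are made up for illustration, and the real controller also deals with pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified sketch of the HorizontalPodAutoscaler scaling rule.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within the tolerance band around 1.0
    (0.1 is the documented Kubernetes default), then clamped to the
    hypothetical min/max bounds passed in.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Illustrative numbers: 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, current_metric=90, target_metric=60))
```

The rule is simple; the headache is choosing sensible targets and bounds for every single microservice and keeping them that way.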

You’ve heard this litany before. Nothing here is groundbreaking. It is simple in theory, but the devil is in the details: actually keeping up to date with all these best practices across all your microservices, all the time. Keep in mind that not every startup has an infrastructure specialist from the get-go to focus on the minutiae.

Infra Burns Time

I’m someone who lives and breathes the latest in infra tools. But that’s my full-time job, and even then it’s hard to stay current. There are three major time-sucks facing developers in a startup when it comes to infrastructure:

Infra is time-intensive to learn: With more and more tools being launched into the market that solve more and more specific problems, it’s just plain hard to stay up-to-date with them all. The cloud providers themselves are launching new services every year. Just going through the changelogs for Terraform modules and Kubernetes API changes is a slog.

Infra is time-intensive to implement: After learning the tools in the first place, you have to test whether they meet your requirements and work out the best practices for implementing infra for your product’s stack.

Infra is time-intensive to maintain: Once infrastructure is up and running in production, upgrades and security patches need to be planned and rehearsed before actually scheduling downtime and deploying the changes. Blue-green deployments add a cost of their own, since you spin up new infrastructure while the current infra is still running.

So much time! That’s time that would be better spent watching Formula One or capturing nebulas with your SkyWatcher Esprit 100ED or, you know, building net new product to solve customer problems. Engineers should be focused on building their product, not the product’s infrastructure.

The Cloud is Complex

Navigating hundreds of cloud products is hard. It’s only getting harder as the cloud providers jam the market with more and more tools. This isn’t just about the time investment of searching through a competitive field of offerings, reading up on them, and getting knowledgeable enough to make a decision. It is about the cost of doing all that while trying to find an impartial view of the best fit for your startup and the peculiarities of your product. There are too many marketers out there muddying the conversation, too many power users with far too nuanced points of view.

The cloud infra market today is like Amazon’s review system. Every product is 4 ½ stars, and there are hundreds of fake reviews. We need a ‘Wirecutter’ just to provide one simple, clear-cut and reliable recommendation. We need an opinionated architecture developed and recommended by infrastructure experts.

Introducing Kapstan

This is all just a long-winded way of saying: I am so excited for Kapstan and very happy to be an advisor for the company. Kapstan is seeking to solve these problems with its three core beliefs:

Infrastructure should be well-architected. Outsource and automate for secure, reliable infra.

Infrastructure should be instant. Quickly bring your product to market without worrying about infra provisioning. 

Infrastructure should be simple. It’s one simple no-code tool. Can’t get simpler than that.

If you have questions about Kapstan, don’t bother me – I like Kapstan because it lets me spend time on things other than infrastructure for once. (Just kidding – feel free to reach out to the Kapstan team at hi@kapstan.io or even get in touch with me for some third-party validation at anton@ketch.com.)