The way we design and deploy web applications is changing, in ways that don’t seem to make sense.
A while back, I gave a talk at a local DevOps event and then a later version of that talk at “All Day DevOps”. It was essentially about questioning our foundations. While I called out agile and microservices as examples, I didn’t dig in completely.
Today I want to focus a bit on microservices, but also use this as an example to say “step back and consider who we are listening to and why”.
Microservices almost feel like one example of an existential battle between engineering principles and trendiness. I've only been in software a bit of time (a couple of decades if you count internships), so I imagine those who have been around longer have seen tons of examples of apparent backsteps, of technology and approaches cast off for no reason.
We've probably fought battles like this many times. We could perhaps allude to Neo repeating his mission in the confusing 2nd or 3rd Matrix sequels, if those movies had in fact ever existed. But I digress.
Let’s start with an example as we start dissecting ideas and where they came from.
Modern Containers Come From PaaS
Docker Containers as a concept were not new things, LXC existed before. Lots of technology existed before that. Jails. Zones. Older stuff. No need for a history lesson here. Google if you want it.
But with the current wave of containers specifically: as a company, Docker evolved from dotCloud, a platform-as-a-service company. As we all know, most PaaS platforms today have evolved away from PaaS and now market themselves as platforms to run containers, and a few other platforms have pivoted in similar ways. My point here is just that containers make sense in the internal enterprise PaaS use case. They were made for that case. Understanding this case explains what they are good for: utilization in cases where groups of like-minded employees work together, and security needs are not maximal.
Most significantly, the enterprise PaaS use case is one where, with a very high degree of certainty, the code in question will never need more than one VM. An example would be a timesheet application. It's not going to need to scale much.
Web Application Scaling Invalidates The Utilization Argument
Web apps/services are usually pretty simple, or should be.
In the classic web application model, an application runs on many VMs for both fault tolerance and horizontal scaling, situated behind a load balancer, eventually speaking to a data store.
Even if composed of many services, as load increases, each application will eventually grow to need more than one VM. In a fault-tolerant use case, EVERY application will need at least two VMs. This will happen if your application is even minimally successful.
Say the auto-scaling point is when an application hits (arbitrarily) 80% average CPU. In this case, the application has already consumed the resources of the VM and there is no room for more work within it. So is utilization still a valid concern, where we need something smaller running inside that VM? Not really.
In this web application use case, the “utilization” argument of containers falls flat.
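To put rough numbers on the argument above (the VM size and threshold here are illustrative assumptions, not measurements):

```python
# Back-of-envelope check on the "utilization" argument.
# Assumed numbers, purely illustrative: a VM with 4 vCPUs and an
# auto-scaling policy that adds a VM at 80% average CPU.
VCPUS = 4
SCALE_OUT_THRESHOLD = 0.80

# CPU already consumed by the app at the moment we scale out:
consumed = VCPUS * SCALE_OUT_THRESHOLD   # 3.2 vCPUs
headroom = VCPUS - consumed              # 0.8 vCPUs

print(f"headroom at scale-out: {headroom:.1f} vCPUs")
# By design, a successfully scaling app leaves almost no slack on the
# VM for co-located containers to soak up, so the bin-packing benefit
# evaporates.
```

The exact threshold doesn't matter much; any sane scale-out policy means the VM is nearly full by the time you'd want to pack something else into it.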
Developers like Docker because it means they don't have to write much automation to get their app out there. They describe apps in Dockerfiles.
The thing most resembling a Dockerfile today is a Packer template, because the Dockerfile is basically a direct copy of the concept. Packer is a great, easy-to-use application that takes a shell script (more or less) and produces images suitable for use in clouds, VMware, or what have you.
We must give Docker containers some credit for making this approach to immutable infrastructure better known, but it wasn't invented for containers; it was already there.
In truth, you don’t even really need Packer, because Packer can be replicated (for basic use cases) in about 250 lines of Python. Before I wrote that simple hack, there were also many other very simple ways to construct images.
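To show how thin the core idea is, here is a toy sketch (not that original script, and nothing like production-ready): treat an "image" as whatever filesystem state a provisioning script leaves behind, then archive it. Real tools target AMIs or VM images; a tarball stands in for that here.

```python
# Toy sketch of the Packer idea: run a provisioning script against a
# root directory, then archive the result as an "image".
import pathlib
import subprocess
import tarfile
import tempfile

def build_image(provision_script: str, image_path: str) -> str:
    root = pathlib.Path(tempfile.mkdtemp(prefix="imgroot-"))
    # "Provision": run the shell script with the image root as $IMAGE_ROOT.
    subprocess.run(
        ["sh", "-c", provision_script],
        check=True,
        env={"IMAGE_ROOT": str(root), "PATH": "/usr/bin:/bin"},
    )
    # "Snapshot": archive the provisioned filesystem.
    with tarfile.open(image_path, "w:gz") as tar:
        tar.add(root, arcname=".")
    return image_path

build_image(
    "mkdir -p $IMAGE_ROOT/etc && echo myapp > $IMAGE_ROOT/etc/role",
    "/tmp/demo-image.tar.gz",
)
```

Launch instance, run script, snapshot, done. Everything else in a real image builder is plumbing around that loop.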
So, with the utilization argument out the window, and the “build file” argument reduced to “this isn’t new — we already had that”, we are left with the idea that what is novel about containers is that it is a way to “ship” applications between a build system and a cloud.
The problems these blobs introduce are obfuscated dependencies and potentially non-reproducible environments.
One of the microservices darlings at the moment is the language Go, which typically pulls software dependencies from the tip of various GitHub branches. A build of a given project today (to fix a bug, etc.) may not even build tomorrow, and may require substantial re-engineering day to day just to keep the software working. This is because the Go community didn't care about dependency management at all until something like last month; I suspect they'll come around.
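For reference, the tooling the Go community did eventually converge on (dep) lets you pin dependencies explicitly rather than chasing branch tips. A constraint in its Gopkg.toml looks roughly like this (the package names here are just illustrative):

```toml
# Pin to a released version rather than the tip of master.
[[constraint]]
  name = "github.com/pkg/errors"
  version = "0.8.0"

# Or pin to an exact commit when a package has no releases
# (the revision hash here is a hypothetical placeholder).
[[constraint]]
  name = "github.com/example/somelib"
  revision = "0123456789abcdef0123456789abcdef01234567"
```

With pins like these recorded, today's build and tomorrow's build at least fetch the same source.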
Even without Go, container content should be built from source every single time, so that security configurations can be applied. You should never run opaque blobs that you do not know how to recreate, and are not guaranteed to be able to recreate. Yet Docker is pitched and sold as a way to distribute binary blobs, the one thing that ops should never trust. Images should come out of a private build system every single time.
Security And Reliability
In addition to the opaqueness, we also lose security (access between containers) and reliability (containers don't stay up). I regularly hear from folks who have to restart Docker containers with scripts and such, or reports of container processes randomly killing other processes. I don't really need to dig into this, but compared with VMs, things are not as bulletproof yet. As I came from a place where turning on SELinux was a good idea, and because I'm a strong believer in reliability and correctness, I think these speak for themselves: this tech reduces security and reliability, which in turn increases operations and security team workload.
Efficiency In Calls Between Services
In adopting an architecture built around a lot of highly "gossipy" network communication, the efficiency of the application is going to be a lot lower than one that can make local function calls.
This will result in consuming MORE resources for a given operation, which in turn could mean increased costs (even environmental impacts through electricity consumption!) as well as a slower experience.
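Rough arithmetic makes the gap concrete. The per-call figures below are illustrative assumptions, not benchmarks: an in-process function call costs on the order of tens of nanoseconds, while a service-to-service call inside a datacenter costs something like half a millisecond once serialization and the network stack get involved.

```python
# Illustrative cost comparison; the per-call figures are assumptions,
# not benchmarks.
LOCAL_CALL_S = 50e-9   # ~50 ns for an in-process function call
RPC_CALL_S = 500e-6    # ~0.5 ms for a service-to-service call
CALLS = 1_000_000      # calls needed to serve some batch of work

local_total = CALLS * LOCAL_CALL_S   # 0.05 s of compute
rpc_total = CALLS * RPC_CALL_S       # 500 s of compute + waiting
ratio = rpc_total / local_total

print(f"local: {local_total:.2f}s  rpc: {rpc_total:.0f}s  ratio: {ratio:,.0f}x")
```

Even if you quibble with the exact numbers by an order of magnitude, the ratio stays enormous, and that cost is paid on every hop between services.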
The reason for decoupling the code? Questionable. It’s fine to have a few tiers where large components of the stack may not be as widely used (or there is a programming language boundary), but a ton of different services in a tangled web is not a good engineering design.
It is a lack of design.
Managing the Management
Why is cloud attractive?
It’s attractive because application developers can focus on just writing the pieces of software that are unique to the particular problem they are trying to solve. Similarly, operations professionals can focus on just running the infrastructure that is unique to the problems as well.
In a scenario where we have a strong public cloud (say AWS), the need to run additional scheduling infrastructure on top of another scheduler seems kind of cross-purposes.
When that software itself requires additional software (a coordination service like ZooKeeper or etcd), we introduce additional parts that the cloud itself does not run.
We complicate the voyage for ops by making MORE things to maintain, by virtue of introducing something that was allegedly supposed to reduce the number of things we maintain.
Where before we only had to worry about the application falling over, now we have 5 or 6 underpinning applications that we also have to worry about scaling, upgrading, and how THOSE are going to fall over. Not only are the containers themselves less reliable, our whole “cloud” itself is less reliable, and the ops team needs to know how to upgrade it, and follow the development decisions of many infrastructure projects behind it.
Debugging Gets Complicated
There’s a quote I read somewhere on twitter that said (roughly) “The great thing about Microservices is that every outage can be treated like a murder mystery”.
This is because when you have 145 different applications talking to one another, you can't really be sure where the logic error that fed invalid data from one to the next actually lives. If one crashed, who caused the crash? How were all the calls made in sequence between them?
In the get-off-my-lawn olden days, we ran a minimal tiered architecture, with maybe a background jobs layer and a web application layer. When something failed, exceptions could log a full traceback throughout the stack.
We were not reliant on something like Splunk or SumoLogic to attempt to figure out what happened in the great sea of projects.
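A quick illustration of what that buys you. Inside one process, a single exception carries the whole call chain; at a service boundary, the traceback simply stops. (The layer names below are made up for the example.)

```python
# In a monolith, one exception records the entire call chain.
import traceback

def validate(record):        # innermost layer: raises
    raise ValueError(f"bad record: {record!r}")

def service_layer(record):   # business logic
    return validate(record)

def web_handler():           # outermost layer: the "request"
    return service_layer({"id": None})

try:
    web_handler()
except ValueError:
    tb = traceback.format_exc()

# Every frame is present in one place; no log correlation needed.
assert "web_handler" in tb and "service_layer" in tb and "validate" in tb
print(tb)
```

If `service_layer` instead lived behind an HTTP boundary, the traceback would end at the client call, and the rest of the story would live in some other service's logs.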
Worse still, we reduce compile-time checking across services and push more into runtime space.
Communication Solved By Not Communicating
The often misunderstood aphorism of “Conway’s Law” says that the software organizations produce mirrors the structure of those organizations. (Unlike some who get it wrong, it does not say that it SHOULD mirror the structure of those organizations).
While design-by-contract is a noble aim, the idea that the only part of a piece of software that matters is its contract is questionable.
Microservices are often adopted by teams that grew too large (for whatever reason) to handle code review, or decided to not collaborate on software architecture. They facilitate less communication.
Granted, I understand software often feels much like an art, and being able to segment your work and be a free spirit is liberating, but it brings software closer to "art" and further from "engineering".
In social practice, in microservices environments, you also have fewer people to talk to and fewer people to collaborate with, with each team building up walls around itself that are isolating and potentially damaging to culture.
I’ve summed this up before by saying microservices are a technological solution to a people problem. Except tech problems don’t solve people problems most of the time — you have to fix the people problem. So do that, and go back to having working stacktraces, code sharing, and compile time checking.
While I don't want to reduce employment, you can probably have fewer developer teams this way too, because your developers will be more productive.
Is Continuous Deployment Really Important?
First off, I’m absolutely in favor of zero-downtime and fully automated deployments. This has been a huge part of my life.
This should be no surprise given some of the things I've worked on. I'm completely for automated builds and automated tests. But should deployment beyond testing and staging automatically occur?
Often the example used for pushing for “agile” and “continuous deployment” is a hosted web application company, where a marketing executive wishes to push out a new ad campaign. How many orgs does this fit?
Most organizations do not have these needs. If a team maintains a utility service, should that service automatically hop into prod whenever the team desires?
I'd say usually no: there should be some engineering process sign-off, as well as potentially some review of what goes into that service. Small behaviors, even ones that do not violate the contract, can seep in and add unneeded instability.
Even if you have a relatively monolithic app, you can still do A/B deployments, you just deploy the whole app instead of part of it. You need bulletproof tests to do microservices, so we’d hope you still had those bulletproof tests in a more classic deployment model.
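What a whole-app blue/green flip amounts to can be sketched in a few lines (the pool names and health check are hypothetical stand-ins for your load balancer and deploy tooling): deploy the new version to the idle pool, health-check it, then swap which pool the load balancer points at.

```python
# Minimal blue/green flip, modeled in-process. Pools and the health
# check are hypothetical stand-ins for real LB and deploy tooling.
pools = {"blue": "v1.0", "green": None}   # green is the idle pool
live = "blue"

def deploy_and_flip(new_version, healthy):
    global live
    idle = "green" if live == "blue" else "blue"
    pools[idle] = new_version      # deploy the whole app to the idle pool
    if not healthy(pools[idle]):   # gate the flip on health checks
        raise RuntimeError("health check failed; live pool untouched")
    live = idle                    # cutover: point the LB at the idle pool
    return live

deploy_and_flip("v1.1", healthy=lambda v: True)
print(live, pools[live])   # green v1.1
```

Nothing about this requires the app to be split into services; the unit being swapped is just bigger.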
Disposability And MVP Design
One of the recent articles I've read, unfortunately endorsed by Martin Fowler no less, argued that microservices enable architectures that can be thrown away in 5 months. That's a huge loss of planning effort. As a developer who enjoys designing for efficiency and quality, that feels like a bad idea. For someone who knows how easily codified behavior can be broken, that is a bad idea. Software design (by developers) is incredibly important, and our application stacks are not just a sea of functions that desire to call each other.
Whether The Emperor's Clothes Are Nice Enough
We've been told that these software systems (microservice architectures, continuous deployment, hyper-agile disposable architectures) and the things that go with them are things we should be adopting.
These topics are sold by repeat conference speakers, popular authors, and internet personalities as being better than the alternatives. Yet some of these speakers haven't run an infrastructure in an incredibly long time. They haven't had to argue for cohesive software architectures recently in large software teams. They have seen some failures, but the solutions they advocate are not solutions they have actually had to use.
Diversity of experience, and being exposed to a large number of competing viewpoints in tech, is how opinions are formed. While I wouldn't necessarily say changing jobs is good for your resume, the more you see, and the more diverse the places you work for, the better your instinct becomes for where organizations break down. It was good for me to build that experience, even though my reasons at the time were only about wanting to be happy.
So yeah, I question the religion we're being sold. If we find it hard to communicate, splitting our application into chunks isn't going to make it better. Running more software to juggle our applications, on top of the software we already had to run, isn't going to make it better.
Tech is changing very fast, but I don't know if it is evolving per se. It's not a natural-selection kind of change. A lot of the changes we are seeing feel like the result of collective groupthink and a desire for newness; they are driven by an incomplete picture, and I think we might be revisiting them soon.
Unfortunately many of the things we finally eliminate or push back on do lasting damage for decades before we figure it out. (I left three jobs because of Scrum, including at one company who later bought one of my companies — so Scrum was expensive!)
So yeah, if people see me skipping DevOps conference talks, that's why. I don't believe in it; I have evidence and experience to back that up, but my evidence isn't trendy.
I'm not really into ops fashions. I was a fan of software design well before I was a fan of ops. I worked on writing software for operations because of an affinity for distributed applications and a minor dislike for UI development.
I'm a fan of minimal moving parts. Of efficiency, of security, of simplicity, and I'm a really, really big fan of inter-team communication. I love whiteboards and getting people together, and yep, objects. Microservices and container infrastructure often seem to run counter to these things, and I do not think they have proved out the benefits of the pitch. It's a hypothesis without a test.
But for me at least, it makes software much less appealing as a developer and someone who has an interest in making software simple to run in production.
And This Wasn’t About Containers?
That's a really long-winded take-down, right? You'll either like it or maybe think I'm really misguided. Well, I didn't really even have an agenda about microservices up front.
My point is about software development broadly: we have people driving changes right now, and many are changes of fashion that I don't agree with. This is not because I'm contrary or want to see them fail, but because I believe in fewer moving parts, security, reliability, reproducibility, and keeping things simple for ops teams.
So while to me it feels there is no real engineering behind many of these choices, they are just pushed through with a lot of excitement. When people decide they don't like one framework or system or way of doing things, they advocate for a new one. Where are we headed and why? We don't know. Can we prove it is better? We can't.
My question is, will it ever stabilize?
I find most of this churn in the things I see — web development, ops. I don’t follow the embedded world and the desktop world — or the IT or networking world — as much. Maybe it’s a little more organized there, where the application problems are harder and more rigor is required to play in the sandbox.
I would like to see software development more closely approach a formality of process. To be closer to bridge building, and less in fashion. I would like software to not be driven by developer-advocates and vendors, but by the people on the ground that build and run tools.
Now, perhaps a civil engineer will come along and document a lot of huge churn in the way we design intersections and bust me up, or an EE will say the circuit design community is fractured too. That’ll probably happen. I kind of like the SPUI. But the thing I want is … ENGINEERING.
Ok, back to regularly scheduled programming! Happy thoughts!