
Five Koans of Software Architecture

Random advice I find myself repeating a lot…

“Those who should decide on the architecture are those that will be on call for it”

Software architecture is fun. So much so that there’s never any shortage of smart people eager to jump in with their opinions. In my various engineering leadership roles I have received many unsolicited architecture diagrams from ambitious, well-meaning software engineers absolutely convinced that they had found a brilliant solution to a really hard problem I owned.

My first question is always the same: “…are you going to be on call for this?”

Nothing is more effective at defusing and dispersing rubberneckers and backseat drivers than that question. I mean … the people who say yes get brought into the inner circle — obviously — but those people are few and far between.

At the end of the day the people who need to make the decisions around what the technology looks like are the people who are going to be woken up at 3am when it goes wrong. Accountability is important at work, but it’s essential in engineering.

A lot of engineers fall into the trap of thinking there’s one right way to build a thing and allow themselves to be drawn into arguments over technical correctness. Those fights are difficult enough to handle between colleagues within the on call team. You don’t need more opinions from people who want the glory of a win but won’t put any skin in the game.

The most important feature of an architecture is that the team running it feel comfortable with running it. Solving every other problem comes second.

“Build it the stupid way first”

Most attempts to make systems “scalable” and “resilient” from the beginning rest on unrealistic assumptions about what operating conditions will be. You end up making the system complex in places where complexity doesn’t serve any of your interests. All because you’re afraid of design flaws making you look dumb.

It’s much wiser in my experience to start with a simple and naive design, figure out how it could go wrong in theory, and then monitor for or verify those failures before fixing anything. That way, instead of having to unwind unnecessary complexity when the system scales differently than you assumed it would, you get to pocket the benefits of simplicity.
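As a rough illustration of "stupid first, then measure", here is a minimal sketch in Python. Everything in it is hypothetical: the `NaiveUserStore` class, the linear scan, and the 50 ms threshold are stand-ins for whatever your simple design and its theoretical failure mode happen to be. The point is only that the naive version ships with a counter that tells you when the predicted problem has actually arrived.

```python
import time


class NaiveUserStore:
    """Deliberately simple: a list that is scanned linearly on every lookup.

    The theoretical failure mode is obvious (lookups slow down as the list
    grows), so instead of fixing it up front we count how often it happens.
    """

    def __init__(self, slow_lookup_seconds=0.05):
        self.users = []  # list of (user_id, name) tuples
        self.slow_lookup_seconds = slow_lookup_seconds  # hypothetical threshold
        self.lookups = 0
        self.slow_lookups = 0

    def add(self, user_id, name):
        self.users.append((user_id, name))

    def find(self, user_id):
        start = time.perf_counter()
        result = None
        for uid, name in self.users:  # the "stupid" part: O(n) scan
            if uid == user_id:
                result = name
                break
        elapsed = time.perf_counter() - start

        # Record evidence instead of guessing about scale.
        self.lookups += 1
        if elapsed > self.slow_lookup_seconds:
            self.slow_lookups += 1
        return result

    def slow_fraction(self):
        """Fraction of lookups that exceeded the threshold so far."""
        return self.slow_lookups / self.lookups if self.lookups else 0.0


store = NaiveUserStore()
store.add(1, "ada")
store.add(2, "grace")
print(store.find(2), store.slow_fraction())
```

If `slow_fraction()` stays near zero in production, the index you were tempted to build was complexity you didn't need. If it climbs, you now know which problem you actually have.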

Those stories you keep hearing about the 10x engineer who built a system that scaled perfectly for ten years? Let’s just say it out loud: they’re urban legends. No one does that and it’s not a realistic standard to hold yourself to.

You know no one does it because the 10x engineer in question is always a colleague of a colleague. Or someone who used to work at your company but left just before you got there. Or some traveling gray beard who only does consulting work, whom an early stage VC brought in as an advisor ages ago and who has since Bagger Vanced out of the company. It’s never someone you know, and the system in these stories never actually went unchanged for ten years. In all the versions of this urban legend I’ve heard, the system is never just left in a basement to run for 10 years without any updates.

And yet, the magical scaling factor is never attributed to the years of diligent maintenance and operations work. The fact that whole teams of people have been iterating on the system this whole time doesn’t count. If it scales it’s obviously because a wizard/architect foresaw all the problems it would ever have and fixed them before launch.

Keep things simple and solve only the problems you know you have.

“Optimization is death”

Similar to the last koan although almost always about automation. Look, it’s a great idea to automate things, but for every task you automate you make the system a little less resilient. That’s because automation and other forms of optimization rely on steps or interactions being generalizable. Every time you optimize something to make it faster, more memory efficient or involve fewer humans, you’re crossing your fingers and hoping the generalization holds. The more you do this in a system, the more likely edge cases are to create surprises.

Eventually there are so many edge cases that your system just stops working. Either you’ve optimized for such a specific and narrow happy path that the customers the technology doesn’t really fit outnumber the customers it does, or it literally stops working: it becomes so brittle that accidents are a normal experience.

People tend to just assume that any and all optimizations are good, but no change to system boundaries or capacity is free. Write some SLOs and figure out whether further generalizations make sense given your system goals.
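One common way to make an SLO concrete is an error budget: the number of failures the target allows over a window, and how much of that allowance is already spent. The sketch below is an assumption about how you might frame it, not a prescription. The function name, the request-based accounting, and the numbers in the example are all hypothetical.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the unspent fraction of the error budget for one window.

    slo_target      -- e.g. 0.999 for "99.9% of requests succeed"
    total_requests  -- requests served in the window
    failed_requests -- requests that violated the SLO in the window

    1.0 means nothing is spent, 0.0 means the budget is exactly used up,
    and a negative value means the SLO was missed.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% target.
# The target allows roughly 1,000 failures, so about three quarters of
# the budget is still unspent.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.2f}")
```

A number like this gives the optimization debate something to push against. If an automation eats the budget through edge-case failures, the generalization behind it isn't holding.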

“Graph databases are lies”

I keep waiting for this one to go out of fashion and it never does. Don’t get me wrong, I love a good graph. But most of the time when someone brings up graph databases it is either preceded or followed by some kind of assertion that there’s no possible way to store the data in a table, only a (dramatic music) graph database will do.

Except the data storage layer under many graph databases is just a table … a relational store such as Postgres, or a wide-column store such as Cassandra. So obviously if we can use a graph database we actually can just store the data in a table.
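To make that concrete, here is a small sketch of a graph stored in a plain table and traversed with a recursive query. It uses Python's built-in `sqlite3` module purely for convenience. The `edges` table, its columns, and the `reachable_from` helper are hypothetical names made up for the example, and nothing here says anything about how a relational engine would perform on your workload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A directed graph is just rows of (source, destination).
conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO edges (src, dst) VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")],
)


def reachable_from(conn, start):
    """Return every node reachable from `start`, excluding `start` itself."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT edges.dst
            FROM edges JOIN reach ON edges.src = reach.node
        )
        SELECT node FROM reach WHERE node <> ? ORDER BY node
        """,
        (start, start),
    ).fetchall()
    return [row[0] for row in rows]


print(reachable_from(conn, "a"))  # ['b', 'c', 'd']
```

Whether this is the right choice for a given workload is exactly the pros-and-cons conversation worth having. The sketch only shows that "it can't be stored in a table" isn't the argument.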

The issue is not whether graph databases are good or bad. It’s that engineers tend to idealize technology they don’t understand and criticize technology they do. People who insist graph databases are more performant than relational databases don’t know enough about graph databases to run them in production unsupervised. Graph databases have their uses and can be really valuable in some cases, but you should only choose a graph database if you can explain what those uses are (and what they aren’t).

In general be skeptical of anyone who can’t explain a full set of pros and cons along with their recommendation.

“Don’t let dependencies own the logic”

This is a new one for me. Like most of the engineers reading this column, I have generally worked on projects where I own and am responsible for the environment. Until recently, that is. My deploy story today involves deploying to some environments I control, some I don’t and some that are … uh … complicated: very old, very slow, resource limited or all of the above.

This has taught me one very important lesson: know what the value of the technology you’re building is and know where the logic that creates that value lives.

And then make damn sure you own that logic.

To put it a different way: if the core functionality of your software is dependent on an AWS hosted service … then you can’t easily move your software off AWS. Managed services and SaaS are great, but the first draft of your architecture should focus on making sure those kinds of dependencies don’t lock you in. Once you know where the logic that defines your value is, then you can go to town with the managed services.
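One way to keep that ownership visible in code is to put the valuable logic behind a small interface you define, and let any managed service plug in as an adapter. This is a sketch of that general pattern, not something the approach above requires. `BlobStore`, `InMemoryBlobStore`, and `ReportService` are hypothetical names, and the in-memory adapter stands in for whatever hosted storage you would really use.

```python
from typing import Dict, List, Protocol


class BlobStore(Protocol):
    """The only thing the core logic is allowed to know about storage."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in adapter. A hosted object store would get its own adapter
    implementing the same two methods."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class ReportService:
    """The logic that creates the value: naming, cleaning, encoding.
    It depends on BlobStore, never on a vendor SDK."""

    def __init__(self, store: BlobStore) -> None:
        self._store = store

    def save_report(self, name: str, lines: List[str]) -> str:
        key = f"reports/{name}.txt"
        # Drop blank lines and surrounding whitespace before storing.
        body = "\n".join(line.strip() for line in lines if line.strip())
        self._store.put(key, body.encode("utf-8"))
        return key

    def load_report(self, name: str) -> List[str]:
        return self._store.get(f"reports/{name}.txt").decode("utf-8").split("\n")


service = ReportService(InMemoryBlobStore())
service.save_report("q1", ["  revenue up ", "", "churn down"])
print(service.load_report("q1"))  # ['revenue up', 'churn down']
```

Moving environments then means writing one new adapter, not untangling business rules from someone else's API.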

Owning the logic doesn’t necessarily mean writing the code yourself. You can use open source, no need to build your own database solution here. But pay attention to your licenses! So many software engineers (myself included) don’t even look at licenses. If it’s on GitHub we assume it’s free to use however we want, and that’s not true.

Architecture Is Easier When It’s Harder

Building software is hard and software people put themselves under a lot of pressure to do it right. Unfortunately most of our instincts on how to do that point us in the wrong direction. You shouldn’t jump in with brilliant ideas if you’re not going to own them. You should keep things simple for as long as possible until you know what your usage goals are. You shouldn’t assume technology you don’t understand has some magic or secret sauce under the hood. You shouldn’t default to products and tools owned by other people until you know what value your software brings to the table.

Full Article: Marianne Bellotti @ Medium
Apr 15, 2022