Fundamentals of Monolithic Codebases
February 20, 2019
I’ll start by saying that I believe service-oriented architecture is the right move for most web applications. I also believe that monolithic codebases are not inherently bad, even large ones; they just get progressively easier to get wrong over time.
Domain Driven Design
The Domain Driven Design book is very good. Like any book on software, it requires practice, critical thought, and reflection when implementing ideas out of it or it can get you into trouble. But there are a number of concepts in it which you will not find elsewhere.
The core premise of the book is that software built for automation of real-world activity can be viewed as a model of the domain in which it operates. If you build a model which is correct, maintenance and enhancement of the system over time will be refinements to the model, and you won’t ever code yourself into a corner. This shared understanding and communication between domain experts and developers is viewed as critical to the long-term health of a software project.
Evans touches on many ideas, both strategic and tactical in nature, on how to build a domain model into software. When working under a monolithic architecture, I believe the single most important of these is the “Bounded Context”.
The Bounded Context
It is important to understand that in any complex domain, you will find a number of contexts.
For example, in e-commerce you have a public website which displays a catalog of products which people can browse, select, and order.
Once an order is placed, it must be fulfilled. That means somebody must find the product in a warehouse somewhere, package it, and ship it.
These are two adjacent contexts. A product catalog model will be quite different from a warehousing and fulfillment model. In fact, chances are they won’t really have anything to do with each other, except for a shared knowledge of a product.
But here is the interesting thing: is that knowledge really “shared”? A product in a catalog context is all about price, description, types, variants, etc. A product in a fulfillment context is all about weight, location, packaging strategy, harmonized codes, etc. Sure, they share an ID. Maybe they share a title as a reference for the shipper, but frequently in a warehouse, the product is identified by a SKU code.
This is the idea of a Bounded Context. Having a single canonical model in a codebase which has multiple contexts is incorrect, since aspects of that model will share nothing other than an identifier in those different contexts.
Going back to the Product, if we have a single object to perform all the responsibilities of a Catalog::Product and a Fulfillment::Product, we end up with an object that has multiple responsibilities. But more importantly, we have an object which becomes a magnet for cross-boundary coupling. This means a change in Product has the potential to break multiple, completely different parts of a codebase.
It’s actually much worse than that, because only quite simple applications have only two contexts. Completely anecdotally, with a small sample size of myself, it seems like most applications have somewhere between 4 and 6 bounded contexts. Central concepts (like Product) may exist in all of them.
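To make that concrete, here is a minimal sketch of the same product modeled once per context. The attribute names (price_cents, sku, bin_location and so on) and the heavy? rule are my own assumptions for illustration; the only thing the two classes are meant to have in common is the ID.

```ruby
# Hypothetical sketch: one model per context, sharing only an identifier.
module Catalog
  # What the storefront cares about: presentation and pricing.
  Product = Struct.new(:id, :title, :description, :price_cents, keyword_init: true)
end

module Fulfillment
  # What the warehouse cares about: physical handling.
  Product = Struct.new(:id, :sku, :weight_grams, :bin_location, keyword_init: true) do
    # Assumed rule, purely for illustration.
    def heavy?
      weight_grams > 20_000
    end
  end
end

listing = Catalog::Product.new(
  id: 42, title: "Desk lamp", description: "Brass, adjustable", price_cents: 4_900
)
stock_item = Fulfillment::Product.new(
  id: 42, sku: "LMP-0042", weight_grams: 1_800, bin_location: "A-14-3"
)

# The two objects describe the same real-world thing, linked only by id.
puts listing.id == stock_item.id
```

A change to how the catalog describes a product now has no way to break the warehouse code, because the warehouse never sees a Catalog::Product.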
The Myth of the Canonical Model
This comes back to the “Find the Nouns” method of teaching OOP. A Product is far too broad a concept to be completely described by a single class, and stored in a single row in a single database. An application is really a system of systems (of systems of systems…). If we want to be able to modify an application at a low cost of work over time, we must keep separate things separate. That is impossible to do when you have a single canonical model for your entire application.
I realize I am repeating myself, but it is such an important thing to understand. I have worked at multiple places where, even after reading the DDD book and creating a context map, we created a “Common” package which held all the shared entities. This is missing the point. A context is an encapsulated system, and an encapsulated system cannot share entities.
The idea of a “Bounded Context” is that a domain model is not a single thing. It is a group of things which are connected.
A Monolith is a System of Systems
With an understanding of bounded contexts, we can now look at monolithic architecture in a slightly different way. We are not choosing to build a single thing in place of building multiple things. That is a false dichotomy, and will result in a disaster. We are instead choosing to build multiple things in the same place. This has a number of implications. Here are a few that stand out to me:
- If you are building a monolith, and do not have multiple levels of namespacing, that is a warning sign. Having proper module decomposition and boundaries within your code becomes both more difficult and more important. How are you going to express that without namespacing?
- Discipline must grow as the monolith grows. This is because compromised choices can be hard to see, and are very easy to do when you have no “hard” boundaries.
- Operationally, a monolith will be easier to deploy and maintain at first. But watch out down the road: monoliths can be much harder to scale horizontally. That being said, vertical scaling will take you pretty far. If you go down this road, embrace it.
However, the most important one is boundaries. You will live or die on the strength of the boundaries enforced between contexts.
How to build boundaries
The cost of coupling grows exponentially at the context level, so it is something which should be agonized over. Three things stand out as valuable design heuristics to help with this analysis.
Only share data, not code
Do not pass your Catalog::Product to Fulfillment. Definitely do not reach into Catalog to create a Fulfillment::Product. Instead, develop a data schema of the information required by fulfillment to define a new product. Do this completely from the perspective (or in the context) of Fulfillment.
Once you have that schema defined, it is not something that should ever change. That means respect the Interface Segregation Principle, and keep these schemas as tightly focused as possible for the task at hand.
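As a rough sketch of what such a schema could look like, here is one way to do it with a keyword Struct. The names NewProduct, Inventory and register_product, and the three fields, are assumptions of mine rather than anything prescribed; the point is that Fulfillment accepts plain data in a shape it defined for itself and rejects anything else.

```ruby
# Hypothetical sketch: Fulfillment owns the schema and accepts only plain data.
module Fulfillment
  # The only shape Fulfillment accepts when learning about a new product.
  # Deliberately narrow: nothing about price, description or variants.
  NewProduct = Struct.new(:product_id, :sku, :weight_grams, keyword_init: true)

  class Inventory
    def initialize
      @products = {}
    end

    # Takes a Hash with symbol keys, never another context's object.
    def register_product(attrs)
      data = NewProduct.new(**attrs) # unknown keys raise ArgumentError
      missing = data.to_h.select { |_, value| value.nil? }.keys
      raise ArgumentError, "missing: #{missing.join(', ')}" unless missing.empty?

      @products[data.product_id] = data.freeze
    end

    def sku_for(product_id)
      @products.fetch(product_id).sku
    end
  end
end

inventory = Fulfillment::Inventory.new
inventory.register_product(product_id: 42, sku: "LMP-0042", weight_grams: 1_800)
puts inventory.sku_for(42)
```

Whoever calls register_product is responsible for translating its own model into this shape, so nothing from the caller's side leaks into Fulfillment.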
Avoid CRUD Semantics at Boundaries
If you are renaming a user, do not expose something like update_user(name: new_name) across a boundary. Instead, have rename_user(new_name). This means that if rename has to change, it can change in rename, and every other type of update to user will not be impacted. It also increases the likelihood of getting rename right in analysis.
The more focused the interface, the easier it is to get right.
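Here is a small sketch of that shape. The Accounts module, the Directory class, the extra id argument and the blank-name rule are all assumptions I made up for the example; what matters is that the boundary exposes one intention-revealing operation and no generic update.

```ruby
# Hypothetical sketch: a focused operation at the boundary instead of update_user.
module Accounts
  User = Struct.new(:id, :name)

  class Directory
    def initialize
      @users = {}
    end

    def register_user(id, name)
      @users[id] = User.new(id, name)
    end

    # Does exactly one thing, so rename-specific rules live in one place.
    # The blank-name check is an assumed rule for illustration.
    def rename_user(id, new_name)
      name = new_name.to_s.strip
      raise ArgumentError, "name cannot be blank" if name.empty?

      @users.fetch(id).name = name
    end

    def name_of(id)
      @users.fetch(id).name
    end
  end
end

directory = Accounts::Directory.new
directory.register_user(1, "Ann")
directory.rename_user(1, "  Anne ")
puts directory.name_of(1)
```

If renaming later needs an audit trail or a uniqueness check, that change stays inside rename_user and no other caller has to care.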
Each context should stand on its own. It may not do anything useful without inputs from another context, but conceptually it is discrete. If it requires complex interaction with another context, you have likely drawn a boundary wrong. Autonomy happens when Tell-Don’t-Ask is in action.
A pressing question at this point would be, how can you even limit querying across context boundaries?
Do not tolerate bi-directional dependencies
If the context you require information from is the same context that requested work be performed, consider including all the data required to perform the work in the request for work. So if the Shopping context is placing an order with Fulfillment, and you include all the external data required to perform the fulfillment, you won’t need to turn around and ask Shopping more questions. This is a strategy for eliminating bi-directional dependencies, which can be seen as an indicator that something has gone wrong.
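A minimal sketch of that idea, assuming a made-up ShipmentRequest shape and a pick_list operation (neither comes from a real system): Shopping packs everything Fulfillment needs into the request, so the dependency only ever points one way.

```ruby
# Hypothetical sketch: the request for work carries all the data needed to do it.
module Fulfillment
  Line = Struct.new(:sku, :quantity, keyword_init: true)
  ShipmentRequest = Struct.new(:order_id, :address, :lines, keyword_init: true)

  # Works from the request alone; it never calls back into Shopping.
  def self.pick_list(request)
    request.lines.map { |line| "#{line.quantity} x #{line.sku}" }
  end
end

module Shopping
  # Shopping depends on Fulfillment, never the reverse.
  # items is assumed to be a Hash of SKU => quantity.
  def self.place_order(order_id:, address:, items:)
    lines = items.map { |sku, quantity| Fulfillment::Line.new(sku: sku, quantity: quantity) }
    request = Fulfillment::ShipmentRequest.new(
      order_id: order_id, address: address, lines: lines
    )
    Fulfillment.pick_list(request)
  end
end

puts Shopping.place_order(
  order_id: 7,
  address: "1 Main St",
  items: { "LMP-0042" => 2, "BLB-0001" => 1 }
)
```

Notice that nothing inside the Fulfillment module mentions Shopping at all, which is an easy thing to check for in review.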
Another thing to consider is a purpose-built, read-only data aggregation based on the operational data. We do this frequently when building a search index in something like Elasticsearch based off an RDBMS. Why not do it in other cases? The key thing is stability: the aggregation you build should have a stable schema, and again, stability is something that is helped by focus. If you have a pricing_list, that is a data set which could potentially be used across boundaries, and be stable enough to be depended on.
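As a sketch of what I mean, assuming the operational data arrives as plain rows with product_id, price_cents and draft keys (all invented for this example), a pricing list can be built as a frozen, narrowly shaped lookup that other contexts read but never write.

```ruby
# Hypothetical sketch: a read-only aggregation with a small, stable schema.
module ReadModels
  # Rebuilds the list from operational rows (plain hashes here).
  # The exposed schema is just product_id => price_cents, nothing more.
  def self.build_pricing_list(rows)
    rows
      .reject { |row| row[:draft] }
      .to_h { |row| [row[:product_id], row[:price_cents]] }
      .freeze
  end
end

rows = [
  { product_id: 1, price_cents: 4_900, draft: false, title: "Desk lamp" },
  { product_id: 2, price_cents: 1_200, draft: true,  title: "Unreleased bulb" }
]

pricing_list = ReadModels.build_pricing_list(rows)
puts pricing_list.fetch(1)
```

Because consumers only ever see the frozen lookup, the catalog is free to change how it stores prices as long as the rebuild step keeps producing the same shape.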
Boundaries are Fundamental
I have touched on some more advanced strategies, but the idea of boundaries is fundamental to all software. In a monolith, the core benefit is reducing your need to deal with networks (which is a huge deal), but you are trading that off against the illusion that you are building a single thing. That illusion is insidious, and will result in weak boundaries. Weak boundaries are probably the greatest root cause of long-term failure in monolithic architectures.
Matt Briggs thinks about programming.