Three Years of LFI

In which we ramble toward a vision for sustainable software development. What if you could organize engineering around learning and around incidents? Status of this page: scattered seeds, mostly incoherent.

Everything humans create becomes tangled and complex. Complexity is our natural state.

Complexity is necessary for sustainability. Embrace it instead of resisting it.

The timeless advice to "keep it simple" is brilliant. Prefer simplicity everywhere. Do the hard work to make things simpler everywhere you can. Yet we must also accept that even comparatively tiny combinations of otherwise simple components quickly become complex.

Keeping things simple is necessary but insufficient for everything we create as a group.

Insight: code, all code, is tightly coupled to the human brains that created it. It takes exceptional skill and a significant amount of time to edit our code into something that another programmer can get their heads around easily.

Tangent (or is this actually the same point made more deeply?): this coupling between the code and the humans also explains why pair-programming yields better code. Every line of code written by a pair of programmers becomes the product of a joint activity. As the programmers explain themselves to each other, their shared code comes to reflect their shared understanding. The conversation about the business problem becomes encapsulated in the code. Each line is nurtured right at birth into something that at least two people already understand, which makes it far more likely to be readable by other programmers. When pair-programming is combined with frequent rotation of pairs, the central abstractions that anchor the architecture get the most attention from the greatest combination of brains. The code itself becomes clearer, and the baseline of common knowledge about the architecture grows with the team.

New definition for tech debt and legacy code: any code that has been separated from its original authors.

How Complex Systems Fail

Complex systems cannot be understood completely; they exhibit emergent behavior and contain feedback loops with non-linear effects. Efforts to manage that complexity with simplicity alone fail.

Your business is in production. Everything else is a simulation.

All software engineering has become operations. We do not ship software. We run the software. CS curricula and bootcamps focus on teaching how to write software. But we have to run it, and the only place to learn to run software is in production.

In a dev environment, we can reset the universe and start with an empty database. In production, we have to migrate the data with every change to the schema.

Software development has never been sustainable. Businesses have demonstrated for decades their manifest preference for unmaintainable code.

Everyone builds a chicken bus [link]

Role of LFI

Role of LFI for individual teams:

Continuous apprenticeship, with the experts helping bring novices up to a level of comfort working in production. An incident is anywhere reality surprises us. We invest extra effort around those moments of surprise to keep our design and our plan calibrated with production. We know our monitoring is incomplete, so we have to get good at adjusting to production and keep improving how we do it.

Role of LFI for CTO:

A signal for the CTO about where to allocate resources, and a way to monitor the balance of investment between reliability and sustainable engineering practice on one hand and new features on the other.

Role of LFI for vendors:

Your company runs your software on someone else's hardware. This means that in addition to understanding loops, conditionals, variable assignment, functional and object-oriented abstractions, you have to also understand how to navigate the customer support of your vendors.

At some point you will have a problem affecting your customers where you can't tell whether it is your code or something wrong with the vendor. You will need to know how to reach them and how to run an incident with them. Coordinating across vendors' support portals will slow down your incident response. Plan for this, or you will be at the whim of the vendor.

Outsourcing a technology to a vendor does not really get you out of needing local expertise. It changes the expertise you need, but you do not escape the need for it.

Role of LFI for platforms:

Your internal platforms are underfunded. You don't have a cost model in place to justify the creation of effective docs or internal customer support.

As for vendors, so for internal platforms.

Self-serve means you're on your own when it breaks.

Self-serve also means a tragedy of the commons.