Three Years of LFI

In which we ramble towards a vision for sustainable software development. What if you could organize engineering around learning and around incidents? Status of this page: scattered seeds, a list of insights, no narrative structure, but maybe sorted into useful buckets.

# Complexity is what humans do

Everything humans create becomes tangled and complex. Complexity is our natural state.

Complexity is necessary for sustainability. Embrace it instead of resisting it.

Complex systems cannot be understood completely, have emergent behavior, and feature feedback loops with non-linear behavior. Efforts to manage the complexity with simplicity fail.

How Complex Systems Fail

Software development has never been sustainable. Businesses have demonstrated for decades their manifest preference for unmaintainable code.

Everyone builds a chicken bus [link]

# Simplicity is necessary but insufficient

The timeless advice to "keep it simple" is brilliant. Prefer simplicity everywhere. Do the hard work to make things simpler everywhere you can. Yet we must also accept that even comparatively tiny combinations of otherwise simple components quickly become complex.

Keeping things simple is necessary, but insufficient for everything we create as a group.

# Code and coders

Insight: code, all code, is tightly coupled to the human brains that created it. It takes exceptional skill and a significant amount of time to edit our code into something that other programmers can get their heads around easily.

New definition for tech debt and legacy code: any code you have that has been separated from its original authors.

Tangent (or is this actually the same point made more deeply?): this coupling between the code and the humans also explains why pair-programming yields better code. Every line of code written between a pair of programmers becomes the product of a joint activity. As the programmers explain themselves to each other their shared code comes to reflect the shared understanding. The conversation over the business problem becomes encapsulated in the code. Each line is effectively nurtured right at birth into something more readable and more than twice as likely to be readable by other programmers. When the practice of pair-programming is combined with frequent rotation of pairs, the central abstractions that anchor the architecture get the most attention from the greatest combination of brains. The code itself becomes clearer and the baseline of common knowledge about the architecture similarly grows with the team.

# Software is operations

Your business is in production. Everything else is a simulation.

All software engineering has become operations. We do not ship software. We run the software. CS curricula and bootcamps focus on learning how to write software. But we have to run it. The only place to learn to run software is in production.

In a dev environment, we can reset the universe and start with an empty database. In production we have to migrate the data with every change to the schema.
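A minimal sketch of that contrast, using Python's built-in `sqlite3` module. The `users` table, its columns, and the function names are hypothetical, chosen only for illustration. The dev path throws the data away; the production path has to carry every existing row into the new schema.

```python
import sqlite3


def create_v1(conn):
    """Hypothetical v1 schema: one full_name column, with some existing rows."""
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO users (full_name) VALUES (?)",
        [("Ada Lovelace",), ("Grace Hopper",)],
    )
    conn.commit()


def dev_reset(conn):
    """Dev: reset the universe. Drop the table and start empty on the v2 schema."""
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, first_name TEXT NOT NULL, last_name TEXT NOT NULL)"
    )
    conn.commit()


def prod_migrate(conn):
    """Production: move to the v2 schema while keeping every existing row."""
    conn.execute(
        "CREATE TABLE users_v2 ("
        "id INTEGER PRIMARY KEY, first_name TEXT NOT NULL, last_name TEXT NOT NULL)"
    )
    rows = conn.execute("SELECT id, full_name FROM users").fetchall()
    for user_id, full_name in rows:
        # Naive split on the first space; real names need more care than this.
        first, _, last = full_name.partition(" ")
        conn.execute(
            "INSERT INTO users_v2 (id, first_name, last_name) VALUES (?, ?, ?)",
            (user_id, first, last),
        )
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_v2 RENAME TO users")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_v1(conn)
    prod_migrate(conn)
    print(conn.execute("SELECT id, first_name, last_name FROM users").fetchall())
```

Even in this toy, the production path is several times longer than the dev path, and it is the only one that has to make a decision about the data that already exists.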

In a dev environment we can throw away the code we have and try out a new programming language or framework. In production, we have to migrate customers from the old and busted to the new hotness.

# LFI for individual teams

Continuous apprenticeship: the experts help bring novices up to a level of comfort working in production. An incident is anywhere reality surprises us. We invest extra effort around those moments of surprise to keep our design and our plan calibrated with production. We know the monitoring is incomplete, so we have to get good at continually improving how we adjust to production.

# LFI for CTO

LFI gives the CTO a signal about where to allocate resources and how to monitor the balance of investment among reliability, sustainable engineering practice, and new features.

- [ ] developing teams and leaders
- [ ] back channel signals about previous decisions
- [ ] governance and compliance
- [ ] what about LFI for VPs & Directors?

The engineering leader's contribution to resilience in the face of surprising events is to create the conditions for learning and adaptation to occur. Designing for Resilience

# LFI with vendors

Your company runs your software on someone else's hardware. This means that in addition to understanding loops, conditionals, variable assignment, functional and object-oriented abstractions, you also have to understand how to navigate the customer support of your vendors.

At some point you will have a problem affecting your customers where you can't tell if it is your code or something wrong with the vendor. You will need to know how to reach them and how to run an incident with them. Coordination across vendors' support portals will slow down your incident response. Plan for this. Otherwise you are at the whim of the vendor.

Outsourcing a technology to vendors does not really get you out of requiring local expertise. You change the expertise, but do not escape the need for it.

# LFI for platform teams

Your internal platforms are underfunded. You don't have a cost model in place to justify the creation of effective docs or internal customer support.

As for vendors, so for internal platforms.

Self-serve means you're on your own when it breaks.

Self-serve also means tragedy of the commons.

A platform with 10 similarly sized customers carries roughly an order of magnitude more traffic than any one of them. As the platform grows, a failure affecting only one or two customers shrinks to a small fraction of the aggregate, and it becomes effectively impossible for the platform to detect it.

If you're the customer of a platform you have to monitor your own stuff on that platform. The platform cannot monitor it for you.
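The arithmetic behind the two paragraphs above can be sketched in a few lines of Python. All numbers, customer names, and the 5% alert threshold are made-up assumptions for illustration: ten equally sized customers, one of which is failing 30% of its requests.

```python
def error_rate(errors, requests):
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def aggregate_error_rate(traffic):
    """What a platform-wide dashboard sees: all customers summed together."""
    total_requests = sum(requests for requests, _ in traffic.values())
    total_errors = sum(errors for _, errors in traffic.values())
    return error_rate(total_errors, total_requests)


def customers_over_threshold(traffic, threshold):
    """What each customer sees when monitoring only its own traffic."""
    return sorted(
        name
        for name, (requests, errors) in traffic.items()
        if error_rate(errors, requests) > threshold
    )


def example_traffic():
    """Hypothetical data: {customer: (requests, errors)} for ten customers."""
    traffic = {f"customer-{i:02d}": (1000, 1) for i in range(1, 11)}
    traffic["customer-07"] = (1000, 300)  # this one is badly broken
    return traffic


if __name__ == "__main__":
    traffic = example_traffic()
    threshold = 0.05  # assumed alert threshold of 5%
    print(f"aggregate error rate: {aggregate_error_rate(traffic):.2%}")
    print(f"platform alert fires: {aggregate_error_rate(traffic) > threshold}")
    print(f"customers over threshold: {customers_over_threshold(traffic, threshold)}")
```

With these assumed numbers the aggregate error rate stays under the threshold, so the platform's alert stays quiet, while the per-customer view flags `customer-07` immediately. Adding more healthy customers only dilutes the signal further.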