Blog Barista: Tim Hollosy | Nov 20, 2018 | Development Practices | Brew time: 6 min
What word is the opposite of fragile? When I ask people this, the first answer is often robust or resilient. But robustness is not the opposite of fragility. When something is fragile, stress from the environment makes it weaker: every time I drop my phone, it becomes more likely that the next fall will crack the screen. When something is robust, stress doesn't hurt it, but it doesn't make it stronger either. It's agnostic toward stress.

So, what becomes stronger under environmental stress? If you're an athlete, you know that the only way to get stronger is by lifting heavy things, putting stress on your body. Your body is an antifragile system: something that gets stronger from stress. Nassim Taleb coined the term in his 2012 book Antifragile. It's a fascinating read, the sort of book you keep thinking about long after you've finished it. Naturally, the question that stuck with me was: how do I apply this concept to developing software? Antifragility isn't something we usually think about when building software.
But, what about Agile?
Well, for the past 10 years or so, the largest innovation in software development has been the Agile methodology and the Scrum framework. This is the default approach most teams use to create software. By breaking features into smaller parts and emphasizing quality, we certainly have the pieces in place to build higher-quality software. But robustness is not antifragility. Let's look at one aspect of antifragility, getting stronger from stress, and use errors as a proxy for stress.
Think about how errors are handled in a typical web application: they are trapped and logged. Trapping an error is more about hiding it than learning from it and making corrections. As developers, we don't usually expect a given error, but an engineer may spend some extra time anticipating which errors might occur, trapping each one specifically, breaking out of the flow, and displaying a message that lets the user know what's going on. This ensures the process doesn't continue on and cause damage, but that's usually the end of the reaction. Today, we would probably also log the error, or automatically email the operations staff so they can look at the details and apply a manual fix later. We may even run some statistical analysis on aggregated errors and defects from time to time. This certainly alerts us to fragility, but it doesn't make the software able to handle that issue in the future. A feedback loop between user interaction (the environment) and the system exists; however, the software does not improve itself based on that interaction. It just fails gracefully.
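In code, that typical trap-and-log reaction might look like the minimal Python sketch below. Everything here is hypothetical for illustration: the `save_order` function, the `db` object, and the messages are assumptions, not part of any real system. Note that nothing about the system changes after the failure; the next request fails the same way.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orders")

def save_order(order, db):
    """Trap, log, and alert: the typical 'graceful failure' pattern."""
    try:
        db.insert("orders", order)
    except ConnectionError as exc:
        # Break out of the flow so no further damage is done...
        log.error("Could not save order %s: %s", order.get("id"), exc)
        # ...and tell the user what happened. The software itself has
        # learned nothing; it will fail identically next time.
        return {"ok": False,
                "message": "We couldn't save your order. Please try again later."}
    return {"ok": True}
```

The point of the sketch is what's missing: the error is contained and reported, but no part of the program adapts as a result.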
The ideal solution would be for the software to automatically adjust to its environment the way our bodies do. Physiologists have been aware of these feedback cycles in the body since the 1800s, but the feedback loops in your body are far more complex than anything that exists in computer programs. Consider how your heart beats. The cells in your heart are arranged according to the laws of physics and your genes. Proteins in the membranes of these cells create electrical potentials through ion channels. Ionic concentrations are controlled by many higher-order functions that you can trace all the way back to the food we put in our bodies. When the heart pumps blood, a whole other cascade of causal relationships occurs and feeds all the way back around, even modifying the genome through epigenetic mechanisms, effectively rewriting the framework for the sorts of things our body can build.
Nearly no application built today adjusts itself in this way. It's dizzying to think about, but perhaps we could start with something simple, like routing traffic away from errors. Consider a case where a database has become slow to respond because too many poorly optimized searches are running at the same time. A sort of unintended denial-of-service attack emerges: when search results are slow, users cancel the slow query and re-run it, thinking it was just a one-time glitch. What if our program was aware that searches were not returning and warned the user so the problem was not exacerbated? That solves one problem, but the program has not actually improved. Perhaps the next step would be to make slow queries faster by adding database indexes based on aggregated query data. Now we've used stress from the environment (a flood of unexpected queries) to change how our program functions. It's a small step, and it has issues of its own: maybe these database changes degrade performance elsewhere. But those are solvable problems through automated integration and performance tests, and something we could do today if the resources were available.
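That first step, warning users rather than letting retries pile up, could be sketched as a guard that watches recent query timings. This is a hypothetical illustration: the `SlowQueryGuard` class, its thresholds, and its messages are all assumptions, not a production circuit breaker, and the "improve the program" half (adding indexes) is deliberately out of scope here.

```python
import time
from collections import deque

class SlowQueryGuard:
    """Track recent query durations; once too many are slow,
    warn callers instead of letting retries make things worse."""

    def __init__(self, threshold_s=2.0, window=20, trip_ratio=0.5):
        self.threshold_s = threshold_s          # what counts as "slow"
        self.durations = deque(maxlen=window)   # rolling window of timings
        self.trip_ratio = trip_ratio            # fraction of slow queries that trips the guard

    def record(self, duration_s):
        self.durations.append(duration_s)

    def overloaded(self):
        if not self.durations:
            return False
        slow = sum(1 for d in self.durations if d > self.threshold_s)
        return slow / len(self.durations) >= self.trip_ratio

    def run(self, query_fn):
        if self.overloaded():
            # Don't exacerbate the problem: tell the user instead of re-running.
            return {"ok": False,
                    "message": "Search is under heavy load; please wait a moment."}
        start = time.monotonic()
        result = query_fn()
        self.record(time.monotonic() - start)
        return {"ok": True, "result": result}
```

This only prevents the stampede; a system moving toward antifragility would also feed the aggregated timings into something that changes the program, such as automated index suggestions.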
Being adaptive is not the same as being antifragile. To be antifragile, we must adapt not only to environments we can imagine, but to environments we can't. I think there's a tendency to excessively control inputs when designing software. Human beings wouldn't be as successful a species as we are if the solution to our bodies getting hot was to make sure the environment was never hot. We want to control the environment when we build computer applications, and it's not hard to see why: we want to minimize unknowns. But when we do this, we limit the errors the system can experience and doom it to be inflexible. Take date inputs. Everyone has a favorite way to enter dates, maybe 2018-10-01, 10/1/2018, or 1/10/2018. How you input dates varies with your cultural background, enterprise standards, and probably all sorts of other factors. Yet we want to make sure people enter correct dates, so systems allow only one date format. It's important to know whether you'd like that letter to go out on the 10th of January or the 1st of October, after all. What if we let the system adapt instead? There's no good reason we can't accept multiple date formats and ask the user to clear up any ambiguity. Yet I don't think I've ever seen it done outside of something we expect to be very flexible, like a Google search.
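A permissive date parser along those lines could try every format it knows and report ambiguity back to the user instead of rejecting input outright. A minimal sketch, assuming a hypothetical list of accepted formats:

```python
from datetime import datetime

# Formats the system is willing to accept (an illustrative, extendable list).
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y"]

def parse_date(text):
    """Return (dates, ambiguous): every distinct date the input could mean.
    If more than one reading exists, the caller should ask the user to
    resolve it rather than guess or reject."""
    matches = set()
    for fmt in FORMATS:
        try:
            matches.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # this format simply doesn't apply
    dates = sorted(matches)
    return dates, len(dates) > 1
```

With this sketch, "2018-10-01" parses unambiguously, while "1/10/2018" yields both January 10 and October 1, prompting the system to ask the user which they meant.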
Antifragility as a concept is complex and deep; I've just scratched the surface of how it might be applied to software design. What about using performance testing as feedback for virtual machine sizing? What about how we organize teams or plan features? There have been some attempts to codify antifragile software; in fact, an Antifragile Manifesto was published in 2014 that touches on many facets I did not. Those of us on the front line of developing end-user software have a responsibility to push the field forward. We can't count on academics to improve it for us; we must examine how we build software and experiment in order to grow.