Slowly over the last few years I’ve started to form what I’m calling a continuous automation developer’s manifesto. At a high level I think everyone agrees on how helpful a continuous automation system can be, and yet many developers either don’t take the time to set one up, or they only use it for one small automation. I do understand some of the reasons, especially in mobile application development. As I was working on my own continuous automation system, I had a few foundational principles in mind, and these have turned into this manifesto. Now I use them to guide my decisions, and maybe you can use them to evaluate your own automation systems.
Integration with the tools I already use
As developers we have many tools at our disposal for doing our daily jobs. As the SaaS model has proliferated, we have ended up with many siloed tools that we are required to use. We have systems for tickets, code quality scanning, security scanning, app distribution, app review monitoring, app download counting, app building, app UI evaluation, app UI design, and on and on. All of these advancements are great, but we get tool fatigue. It is at the point where we need a dedicated app just to hold all the links to the websites and apps that we need to use. And with many different tools we run into the dreaded context switching energy vampire. Each time we have to switch into a different tool we lose some energy. This was one of the reasons the IDE was originally created: to integrate our development tools.
When we start to automate things, we need to have these automation tools integrated into our existing development environment as much as possible. Not all tools are very customizable (Xcode!), but there is still much we can do to bring the automation into tools we already use. For example, checks in GitHub pull requests are great at bringing the automation status into a tool that we already use frequently.
Tools can do even more here with integration into our chatrooms, OS notifications, and more.
Manifesto rule #1: Continuous automation systems should integrate with existing tools as much as possible and avoid having a siloed tool/portal as the only method to interact with the automation.
Make automations as small and independent as possible, and run them in parallel
Imagine this scenario. You have a full automation script that lints your code, runs unit tests, compiles the app, signs the app, uploads to an app distribution platform, starts security scans of the code, and then emails your mom. And it all happens every time you merge into master. And one day the system you use to scan your code for security defects goes down. Guess what, your automation will fail. Mom will not be emailed. You are stuck until you either 1) remove the security system from the automation script (then deploy it), or 2) email mom yourself. This is how automation systems fall into a state of disrepair. Usually when the above happens, we just ignore it. Or a worse scenario is that something will fail only occasionally. Now whenever the system fails you have to check to see what failed. Oh, it’s that ding danged security system again. Then you ignore it. Then another failure happens in the future and you think, oh it’s failing due to that reason we already know about. So you don’t even look. Eventually you just don’t even care about any of the automations and you manually upload your build to the app distribution system.
As developers we are familiar with the term highly coupled system. We know that each coupling increases the complexity of the system by a factor depending on the nature of the coupling. This same rule applies in your automations in that the more automations you string together, the more possible outcomes there might be. We will get to outcomes in the next rule, but needless to say the more possible outcomes there are, the more work we have to do to figure out which one of them was achieved.
Another subtle issue with these kinds of automation pipelines is that they are all or nothing. If something fails in the middle of the pipeline, the entire pipeline stops. So in our example above, even if the security system was down, that shouldn’t impact any of the rest of the automation. Mom should still be emailed! So what we must do is break our automations up into independent tasks.
At this point some may balk and say, but how can I take the output of one automation to feed another one if they are independent? That isn’t efficient! Well first, please take a deep breath because what I’m about to tell you might come as a shock: Duplication is NOT the root of all evil. It is amazing to me the lengths that developers will go to in order to prevent duplicating code. And the same goes for automations. It is fine to run something twice just to make the overall system simpler. A simple, yet non-optimal system can beat a complex and supremely optimal system over the long run. In our automations we should be trying to keep each one simple and not dependent on other automations. Then mom can be emailed on every push to master without worrying about whether any of the other automations fail.
One nice result of having simple and independent tasks is that they can be run in parallel. Now mom will get emailed immediately when we push to master, along with all the other tasks that will execute. Again, a highly parallel simple system can beat a complex non-parallel system in many situations.
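To make the idea concrete, here is a minimal sketch in Python of independent tasks running in parallel. The task functions (`lint`, `unit_tests`, `security_scan`, `email_mom`) and the `run_independently` helper are hypothetical stand-ins for the steps in the scenario above, not part of any real automation system. The security scan is simulated as being down.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical tasks standing in for real automations.
def lint():
    return "lint ok"


def unit_tests():
    return "tests ok"


def security_scan():
    # Simulates the external security service being down.
    raise ConnectionError("security service unreachable")


def email_mom():
    return "mom emailed"


def run_independently(tasks):
    """Run every task in parallel and record each outcome on its own."""
    outcomes = {}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(task) for name, task in tasks.items()}
        for name, future in futures.items():
            try:
                outcomes[name] = ("succeeded", future.result())
            except Exception as error:
                # A failure is recorded for this task only.
                outcomes[name] = ("failed", str(error))
    return outcomes


outcomes = run_independently({
    "lint": lint,
    "unit_tests": unit_tests,
    "security_scan": security_scan,
    "email_mom": email_mom,
})
for name, (status, detail) in outcomes.items():
    print(f"{name}: {status} ({detail})")
```

In this sketch the security scan fails, but its failure is recorded only against that one task. Mom still gets emailed.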
Manifesto rule #2: Keep automations simple and independent. Run them in parallel.
Never fail! But when you do, tell me why without having to look at a log file
We have a lofty goal when we talk about continuous automation across the entire development process. We want to automate the things that are automatable and we want developers to have to worry less and think less about them. We want to give developers a rich set of feedback on their code, all the way from writing it to when it is part of an app that users are using. With this goal in mind, one of the huge impediments to achieving it is automation failures. In order to achieve our lofty goals, we must ensure that our automations never fail.
Before we discuss how to never fail, we should examine the cost of these failures. What is interesting is that the cost of failures in an automation system starts to grow exponentially as the failure rate grows. This can become such a huge burden that you have to burn it all down and start over because just troubleshooting one thing has become too cumbersome. We must treat every unknown failure as mission critical and address it as soon as possible. Another factor is the perception issue. If the automation system fails consistently, we perceive it as faulty and unreliable. Now even if the failure rate declines, this perception sticks around for a while. The worst case scenario here is that developers just ignore failures. That is the stake through the heart of an automation system. You might as well get rid of it when this happens.
So how do you build an automation system that never fails?
Well, that is a trick question. You can’t! The universe is filled with entropy that is out to destroy everything good and pure. That is essential to how the universe works. So I’m sorry to say but we can’t go against the universe here. But isn’t there something we can do? Yes! Here are a few things:
- Categorize every single failure and make it clear WHY it failed and WHAT you can do about it. Sometimes the failure is 100% on purpose. For example, your GitHub check might fail if the unit tests don’t all pass. That is good! What is bad is to just fail and make the developer figure it out. At the least you need to make it clearly visible that the check failed because of a unit test failure, not a build system failure. And even better, tell them which tests failed!
- Value your developers’ time and treat looking at log files as the equivalent of torturing them (note: it is!). Use regular expressions, simple pattern matching or complex machine learning, but whatever you do, find the reason for the failure and report it in a way that is quick to see and makes the resolution clear.
- Treat every unknown failure as important and critical to find the root cause. Set the batsignal. Turn on the fire alarm. Whatever it takes, do it! Classifying unknown failures into known failures is what makes the difference here. And eventually you will get to a place where unknown failures really are weird edge cases and uncommon. While we can’t prevent all failures, we can keep the trend moving towards never failing.
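As a rough illustration of the points above, here is a minimal Python sketch that classifies a failure from log text using regular expressions. The patterns, category names, suggested actions and the sample log are all made up for illustration. Real patterns depend on what your own tools actually print.

```python
import re

# Hypothetical patterns: (regex, category, what the developer can do).
KNOWN_FAILURES = [
    (re.compile(r"Test Case .* failed"),
     "unit test failure",
     "Fix the failing tests listed as evidence."),
    (re.compile(r"timed out|connection refused", re.IGNORECASE),
     "service unavailable",
     "Retry later; no code change is needed."),
]


def classify_failure(log_text):
    """Return the category, suggested action and matching log lines."""
    lines = log_text.splitlines()
    for pattern, category, action in KNOWN_FAILURES:
        evidence = [line for line in lines if pattern.search(line)]
        if evidence:
            return {"category": category, "action": action, "evidence": evidence}
    # Nothing matched: this is an unknown failure that needs a root cause.
    return {
        "category": "unknown",
        "action": "Escalate and find the root cause, then add a pattern.",
        "evidence": [],
    }


# Invented sample log for illustration only.
sample_log = (
    "Compiling App.swift\n"
    "Test Case '-[AppTests testLogin]' failed (0.012 seconds).\n"
    "Build finished\n"
)
report = classify_failure(sample_log)
print(report["category"], "->", report["action"])
for line in report["evidence"]:
    print("  ", line)
```

Each unknown failure that gets investigated becomes one more entry in the pattern list. That is how unknown failures turn into known ones over time.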
Manifesto rule #3: Failure happens but knowing what failed and what to do about it is key.
Abstract what developers need to know, but give them ways to go to bare metal if needed
Creating an automation system, especially one for mobile applications, is a specialized skill. Not everyone is interested in learning this! Don’t make your developers learn how to be system admins unless they want to. In fact, we already have a pattern for dealing with this kind of thing in software development: it is called abstraction.
Instead of making developers learn all the command line arguments to xcodebuild, create a higher level task like “Build my app”. Then include the parameters that are required to complete the task, such as the project file, the scheme and what version of Xcode to use. Make sure that non-required parameters have sane defaults. Now the developers who want to focus simply on getting their app built can just use this high level task and not worry about the details. And for those who do want to deal with the low level implementation, give them the ability to create their own abstracted tasks! This is the best of both worlds.
When abstracting tasks, be sure to follow the rules above as well. We want tasks that are as simple and independent as they can be. We also want them to never fail, but if they do, they should provide a clear reason why. This is exactly like providing a Result as a return type to our abstraction. On success give me the results of the task, but on failure give me the error for why.
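Here is one way such a task could look, sketched in Python rather than Swift. `TaskResult` is a hypothetical stand-in for a Result type, and `build_my_app`, its parameter names and its default values are assumptions made for illustration. The `-project`, `-scheme` and `-configuration` options and the `DEVELOPER_DIR` environment variable are real xcodebuild conventions, but the sketch only assembles the command and does not run it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskResult:
    """Hypothetical Result-like type: a value on success, an error on failure."""
    value: Optional[dict] = None
    error: Optional[str] = None

    @property
    def succeeded(self):
        return self.error is None


def build_my_app(project, scheme, configuration="Debug",
                 developer_dir="/Applications/Xcode.app/Contents/Developer"):
    """High level 'Build my app' task with defaults for optional parameters."""
    if not project.endswith(".xcodeproj"):
        return TaskResult(
            error=f"'{project}' is not an .xcodeproj file; pass your project file.")
    if not scheme:
        return TaskResult(error="No scheme given; pass the scheme to build.")
    command = [
        "xcodebuild",
        "-project", project,
        "-scheme", scheme,
        "-configuration", configuration,
        "build",
    ]
    # DEVELOPER_DIR selects which Xcode installation the command would use.
    return TaskResult(value={"command": command,
                             "environment": {"DEVELOPER_DIR": developer_dir}})


# Hypothetical project and scheme names.
ok = build_my_app("MyApp.xcodeproj", "MyApp")
bad = build_my_app("MyApp.txt", "MyApp")
print(" ".join(ok.value["command"]))
print(bad.error)
```

A developer who only wants a build passes two arguments and gets either the result or a readable reason for the failure. Someone who wants the low level details can still override every default.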
Manifesto rule #4: Use abstraction to create a library of tasks that are simple, independent and have meaningful failure reasons.
I feel the need… The need for speed
How many times have we linked to the above when talking about our automation systems? We push to master and then go make a sandwich and wait until it is posted to our app distribution system so testers will be able to test our latest changes. While we can’t make builds instantaneous, we can certainly improve the speed by following a few simple rules:
- Use bare metal hardware whenever possible. Obvious I know, but those shared cloud boxes you might be using now aren’t the fastest things around.
- Run tasks in parallel. See the rule above, but this really does help in the overall performance of the system.
- Make longer running tasks asynchronous. If something is going to take a long time, be sure to give developers an indicator of that. Use words like queued so developers know that it can take a while. If possible, give progress indicators or a time estimate.
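The last point could be sketched like this in Python. The `Status` values, the `TaskProgress` type and the wording of the messages are hypothetical. The sketch only shows how a long-running task might describe its state to a developer, not how the task itself is scheduled.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"


@dataclass
class TaskProgress:
    """Hypothetical status record for one long-running task."""
    name: str
    status: Status = Status.QUEUED
    steps_done: int = 0
    steps_total: int = 1

    def describe(self):
        if self.status is Status.QUEUED:
            return f"{self.name}: queued, this can take a while"
        if self.status is Status.RUNNING:
            percent = 100 * self.steps_done // self.steps_total
            return f"{self.name}: running, {percent}% done"
        return f"{self.name}: finished"


upload = TaskProgress("upload build", steps_total=4)
print(upload.describe())

upload.status = Status.RUNNING
upload.steps_done = 1
print(upload.describe())
```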
Manifesto rule #5: Make it fast
Mobile automation systems are still evolving and we are learning new things each day. So I might add more things to the manifesto in the future, but the above five principles will provide you with a foundation for an excellent automation system for both developers and admins.