Living in a digital world, we take a lot for granted. For a lot of people, it’s hard to think of a time when software didn’t impact our everyday lives in one way or another.
With anything so complex and detailed – and most importantly, human-made – there will always be errors, big and small. Nowadays, we might see a glitch on the Facebook app on our phone, but we barely notice when the phone carries out an automatic upgrade and the glitch is magically fixed. Of course, this is on a very small scale, but there have been many circumstances where a software bug has been, or could have been, catastrophic. This nerve-racking tale of a bug in the Soviet early-warning system that could have caused the world to descend into nuclear oblivion should serve as a terrifying warning of the implications a software bug could have on such a large scale.
“Sometimes it pays to stay in bed on Monday, rather than spending the rest of the week debugging Monday's code.” ~Dan Salomon
Luckily, our Engineering team don't have to worry about accidentally causing nuclear warfare. However, with the ticketing, marketing and fundraising operations of 220+ arts organisations in our hands, we take bugs in our system just as seriously. Even something minute could have a big impact on the system. So we wanted to explain a bit more about why bugs exist and how we in the Engineering team go about finding and fixing them.
What is a software bug?
A software defect (or bug) is any behaviour in a system that doesn’t conform to the requirements of the system, whether explicitly or implicitly stated. Building a large software system is complex, and it’s impossible to get rid of bugs completely. The aim for any engineering team is to find and fix all the important bugs in a release of their software as quickly as possible. However, sometimes a couple of nasty bugs manage to slip through.
Why do bugs get through into releases?
Despite the tireless efforts of engineers to make sure their code is perfect, some bugs manage to get into releases of software. There are a number of reasons why this can happen:
- Complexity. A large software system can contain millions of lines of code, so there are lots of places for pesky bugs to hide.
- Edge cases. The main journeys a user can take through the system are always tested. But there are thousands of other journeys (edge cases), including handling all the possible errors that could occur, whether through user action or some external failure. Minor releases of software, especially public-facing software, happen so often that it would be impossible to test all the edge cases before each release, due to the time it would take.
- Regressions. Because a large software system is complex, sometimes a new feature or a fix for another defect can have effects on other parts of the system that are difficult to anticipate and may cause a defect themselves. These defects are called regressions because they break operations that previously worked.
- Ambiguous or incorrect requirements. Requirements for a software system are written in English, which is almost impossible to make completely precise. The requirements may have been interpreted incorrectly when they were turned into code, meaning the system doesn’t do what it’s supposed to do.
- The human factor. Software runs on computers, but it’s written by humans, and we all make mistakes. Even engineers aren’t perfect!
- Infrastructure problems. We deploy the Spektrix software on an infrastructure that is a complex combination of hardware and third party system software designed to cater for scalability and stability. An incorrect configuration or a failure in the infrastructure can also appear to the end user as a defect in the overall system.
What do we do at Spektrix to minimise software bugs?
Many software design and development processes have been created to maximise the efficiency of developing software and to minimise the number of bugs in a release. Any good engineering team will have their own processes to detect bugs, and here at Spektrix we have ours:
- Requirements review. For each new feature, we write a formal statement of the requirements and this is reviewed by representatives of all functions within Spektrix to ensure that they are correct, complete and implementable.
- System design. The development team will design the feature implementation, first at a high level (software architecture) and then at a detailed technical level. Other people in the team then review each of these.
- Developer test. As the developers write their code, they test it with automated tests, both as small components and as larger subsystems composed of the small components. Automation is important because it allows the tests to be re-run frequently and consistently to make sure that regressions aren’t being introduced.
- Code review. The code is reviewed by other developers before delivery. They’re looking for bugs and also checking that the code has been written in a way that makes it easy to modify and enhance in the future, without introducing bugs.
- Quality Assurance Testing. The final stage is to test the whole system. A separate team carry out manual and automated tests of the system based on the requirements specification. The team aren’t given any knowledge of the internal implementation, so that their testing isn’t biased by that knowledge. A full test of all the new and existing requirements of the system can take a long time, so for minor releases we run a selection of tests covering the new functionality and the existing functionality most at risk of breaking. We also perform ad hoc testing to check the most-used functionality. Our Support team help with that to get more eyes on the problem.
- Beta testing. We release first to a small number of clients who want to get the latest features early. They can alert us to any problems found in a real-life setting before the entire user base gets the release, so that we can fix any remaining bugs before they impact a lot of people.
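To make the developer test step a little more concrete, here is a minimal sketch of automated tests in Python. It is purely illustrative and is not Spektrix code: the `booking_fee` and `order_total` functions, the 5% fee and the 250 pence cap are all invented for the example. It shows a small component tested on its own (including a couple of edge cases), and then a larger unit built from that component.

```python
import unittest

# Hypothetical example: none of these names or rules come from Spektrix.


def booking_fee(ticket_price_pence: int) -> int:
    """Small component: a 5% booking fee, rounded down, capped at 250 pence."""
    if ticket_price_pence < 0:
        raise ValueError("ticket price cannot be negative")
    return min(ticket_price_pence * 5 // 100, 250)


def order_total(ticket_prices_pence: list[int]) -> int:
    """Larger unit: the total for an order, built from the small component."""
    return sum(price + booking_fee(price) for price in ticket_prices_pence)


class BookingFeeTests(unittest.TestCase):
    """Tests for the small component on its own."""

    def test_typical_price(self):
        self.assertEqual(booking_fee(2000), 100)

    def test_fee_is_capped(self):
        self.assertEqual(booking_fee(10000), 250)

    def test_free_ticket_edge_case(self):
        self.assertEqual(booking_fee(0), 0)

    def test_negative_price_is_rejected(self):
        with self.assertRaises(ValueError):
            booking_fee(-1)


class OrderTotalTests(unittest.TestCase):
    """Tests for the larger unit composed of the small component."""

    def test_two_tickets(self):
        self.assertEqual(order_total([2000, 10000]), 2100 + 10250)

    def test_empty_order_edge_case(self):
        self.assertEqual(order_total([]), 0)
```

Saved to a file, tests like these can be run with `python -m unittest` after every change. If a later change to `booking_fee` accidentally broke the cap, `test_fee_is_capped` would fail straight away, which is how automated tests help catch regressions.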
What do we do when we find bugs?
If a bug does slip through, it often becomes apparent during deployment or when a user reports it to our Support team. For a minor release, we can roll back to the previous version, but we would only do that if the new bug had a bigger impact than the bugs that were fixed by the release.
If we don’t roll back to the previous version, the Engineering team get straight to work debugging and fixing the issue. For a bug with a big impact, we’ll deploy a new release as soon as the fix is available, often within hours. However, we have to balance this against the risk of causing a regression elsewhere in the system. If we decide that the risk of causing a regression is higher than the severity of the bug being fixed, then we will spend longer reviewing and testing the fix before we release it. Less serious bugs are added to our workload in order of priority and get fixed as part of one of our regular planned releases.
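The trade-offs described above can be sketched as a small decision function. This is only an illustration under some loud assumptions: in practice these are judgment calls made by people, not numbers fed into a formula. The 1 to 10 scores, the `HIGH_IMPACT` threshold and every name in the code are hypothetical and invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ROLL_BACK = "roll back to the previous version"
    HOTFIX_NOW = "release the fix as soon as it is ready"
    EXTENDED_REVIEW = "review and test the fix for longer before release"
    PLANNED_RELEASE = "fix in a regular planned release"


@dataclass
class BugReport:
    # All scores are assumed to share one scale: 1 (minor) to 10 (critical).
    impact: int             # impact of the newly found bug
    fixed_bugs_impact: int  # combined impact of the bugs the release fixed
    regression_risk: int    # risk that a fix breaks something else
    can_roll_back: bool     # e.g. True for a minor release


HIGH_IMPACT = 7  # assumed threshold for "a bug with a big impact"


def triage(bug: BugReport) -> Action:
    # Roll back only if the new bug is worse than the bugs the release fixed.
    if bug.can_roll_back and bug.impact > bug.fixed_bugs_impact:
        return Action.ROLL_BACK
    # Less serious bugs wait for a regular planned release.
    if bug.impact < HIGH_IMPACT:
        return Action.PLANNED_RELEASE
    # A risky fix for a serious bug gets extra review and testing first.
    if bug.regression_risk > bug.impact:
        return Action.EXTENDED_REVIEW
    return Action.HOTFIX_NOW


if __name__ == "__main__":
    report = BugReport(impact=8, fixed_bugs_impact=10,
                       regression_risk=3, can_roll_back=True)
    print(triage(report).value)
```

The order of the checks mirrors the order of the decisions in the text: rollback is considered first, then whether the bug is serious enough for an immediate fix, and finally whether the fix itself is too risky to rush.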
Building new software and, in the case of Spektrix, great new features for our users will always come with risks. But we believe it’s worth it to continue making Spektrix the best it can be.
So even though it’s easy for us to take for granted all the wonderful software at our fingertips, next time you see an issue on your Facebook app, spare a thought for the testers and the engineers who work tirelessly to give us what we need.