Repeatable Failures

Repeatable failures (over repeatable success). Write-up of a lunch-time session at CITCON Europe 2015, Helsinki (TL;DR at the bottom).

The premise

When a test fails, we want to be able to repeat it, exactly. And so we run our automated tests on each commit. But can't we do better than running ever more checks in CI? And what about all those hours of the day that the CI machines are idle? Couldn't they be used to explore something?

An emerging theme of this CITCON seems to be tossing away large numbers of automated tests: ones that apparently became disconnected from delivering value to the team. We are ever more aware of the cost of maintaining them and of the lengthening feedback loop. So when do automated checks deliver value to us? When they specify and validate functional or technical aspects of the system: key examples of how the system is meant to work, implemented as checks that it indeed does.


Therefore the first thing we focused on was challenging the automated tests by mutating the code, to weed the crap tests out from the valuable ones and to highlight coverage gaps. With mutation testing tools like PIT (for Java), it should be possible to get a much better impression of what is truly covered by tests, providing valuable feedback for tests and code alike. (Ideally each mutation of the code will be 'killed' by exactly one test specifying that specific behaviour.) Some pointed out that this is also a teaching and design aid, and thus running a good number of mutation tests should probably become a regular thing.
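To make the idea of 'killing' a mutant concrete, here is a minimal hand-rolled sketch in Python. It is not how PIT works internally (PIT generates its mutants automatically for Java); the names add, add_mutant, weak_test, strong_test and is_killed are all made up for this illustration.

```python
def add(a, b):
    """The original code under test."""
    return a + b


def add_mutant(a, b):
    """Hand-written mutant: the '+' operator has been replaced by '-'."""
    return a - b


def weak_test(fn):
    """Looks like it covers addition, but adding 0 cannot tell '+' from '-'."""
    return fn(2, 0) == 2


def strong_test(fn):
    """Specifies the behaviour with a key example that distinguishes '+' from '-'."""
    return fn(2, 3) == 5


def is_killed(test, mutant):
    """A mutant is 'killed' when the test fails against it."""
    return not test(mutant)


if __name__ == "__main__":
    for test in (weak_test, strong_test):
        status = "killed" if is_killed(test, add_mutant) else "survived"
        print(f"{test.__name__}: mutant {status}")
```

Both tests pass against the original code, yet only strong_test notices the mutation. A surviving mutant points at either a test that specifies too little or a gap in coverage.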

Input / Data

Next we wondered about the implications of testing with random (valid) values for input (or data). When the spec says, for example, that we can add any two integers between 1 and 10, how could we test with just any two values in that range? Well, isn't that simple? You add the two in your test and check that the answer you get is correct! Fortunately for us, Jeffrey (@Jtf) has a lot of experience in this area and quickly caught the problem with this line of thinking.

Are we rebuilding the system in our tests? Seems unwieldy and prone to the same errors as the application code. Can we rely on an oracle, like we probably did for our key examples? If an accurate oracle is that easily available, why are we building the system? No, we'll have to let go of asserting the specific values and work with weak assertions instead. What are the invariants that no answer should violate? In our example: the answer should always be an integer between 2 and 20.
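A minimal sketch of such a weak assertion in Python, assuming the system under test is a plain function add(a, b). The helper check_sum_invariant and its parameters are hypothetical. The random generator is seeded, so that any failure it finds can be repeated exactly by re-running with the same seed.

```python
import random


def add(a, b):
    """Stand-in for the system under test."""
    return a + b


def check_sum_invariant(add_fn, seed, runs=1000):
    """Feed random valid inputs to add_fn and check only the invariant.

    Returns None if the invariant always held, otherwise a tuple
    (seed, a, b, result) describing the first violation.
    """
    rng = random.Random(seed)  # seeded: the same seed gives the same inputs
    for _ in range(runs):
        a = rng.randint(1, 10)  # randint includes both end points
        b = rng.randint(1, 10)
        result = add_fn(a, b)
        # Weak assertion: we do not know the exact answer,
        # only that it must be an integer between 2 and 20.
        if not (isinstance(result, int) and 2 <= result <= 20):
            return (seed, a, b, result)
    return None


if __name__ == "__main__":
    print(check_sum_invariant(add, seed=42))
    # A deliberately broken implementation, to show a reported violation.
    print(check_sum_invariant(lambda a, b: a * b, seed=42))
```

Note that the check never recomputes the expected sum, so it does not rebuild the system in the test. The price is that it will miss errors that stay within the invariant, such as an answer that is off by one.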

Jessica Kerr has done a lot of work in this field, which also ended up in JUnit as support for property-based testing.

As we're talking about invariants now, you no longer need to know the specific situation, but you can (and probably should) monitor them constantly, in production as well. Unfortunately, what your production monitoring picks up may well be hard to repeat. This is why injecting the failures yourself and seeing how they play out is so practical.
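As a rough illustration of monitoring an invariant at run time, the sketch below wraps a function in a Python decorator that records every call whose result violates the invariant. The decorator name monitored and the in-memory violations list are assumptions made for this example; a real system would more likely report to its logging or telemetry.

```python
import functools

# Hypothetical in-memory store; stands in for real logging or telemetry.
violations = []


def monitored(invariant):
    """Decorator: check `invariant(result)` on every call, record violations."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if not invariant(result):
                # Record the inputs too: without them the failure is hard to repeat.
                violations.append({
                    "function": fn.__name__,
                    "args": args,
                    "kwargs": kwargs,
                    "result": result,
                })
            return result
        return wrapper
    return decorate


@monitored(lambda r: isinstance(r, int) and 2 <= r <= 20)
def add(a, b):
    return a + b
```

Recording the arguments alongside the violation is what gives some chance of repeating the failure later. In production the relevant state is often much larger than the arguments, which is where the repeatability problem comes from.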

Connections / Environment

Thinking of it as failure injection brought us to two other examples. The first is one that Nat Pryce (@natpryce) gave at CukeUp! 2015: he brutally vandalized JSON messages to make sure the software would never crash due to a poor connection, and also ran the CI environment with live data streams. The second is Netflix's Chaos Monkey. Have you ever wondered how important it must be for the teams that it reports exactly what it disrupted and when, so they can quickly locate and fix issues with handling its disruptions? [Can only hope it indeed does. Does anyone know?]
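The sketch below shows one way vandalizing JSON messages could look in Python. It is not Nat Pryce's actual implementation: the message format, the handler handle_message and the helpers vandalize and run_vandalism are all invented for this example. The damage is driven by a seeded random generator, and the report names the seed, the run and the damaged message, so a crash can be repeated exactly.

```python
import json
import random


def vandalize(message, rng):
    """Return a damaged copy of message: one character deleted, replaced
    or duplicated, or the message truncated at a random position."""
    if not message:
        return message
    pos = rng.randrange(len(message))
    action = rng.choice(["delete", "replace", "duplicate", "truncate"])
    if action == "delete":
        return message[:pos] + message[pos + 1:]
    if action == "replace":
        return message[:pos] + rng.choice('{}[]":,x0 ') + message[pos + 1:]
    if action == "duplicate":
        return message[:pos] + message[pos] + message[pos:]
    return message[:pos]  # truncate


def handle_message(raw):
    """Hypothetical handler: return the order quantity, or None if unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    quantity = data.get("quantity")
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        return None
    return quantity


def run_vandalism(handler, seed, runs=500):
    """Throw damaged messages at handler; the weak assertion is 'never crashes'.

    Returns None if the handler survived, else a report of the first crash.
    """
    rng = random.Random(seed)
    original = json.dumps({"item": "widget", "quantity": 3})
    for run in range(runs):
        damaged = vandalize(original, rng)
        try:
            handler(damaged)
        except Exception as error:
            return {"seed": seed, "run": run,
                    "message": damaged, "error": repr(error)}
    return None


if __name__ == "__main__":
    print(run_vandalism(handle_message, seed=7))
    # A fragile handler that trusts its input, for comparison.
    print(run_vandalism(lambda raw: json.loads(raw)["quantity"], seed=7))
```

The report plays the role wished for from Chaos Monkey above: it states exactly what was disrupted and when, so the failure can be located and replayed.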


This filled out our list of failure injections, things to (semi-randomly) manipulate in creative ways: program code, input, data, connections, environment. And our key tricks for repeatable failures: inject the failures yourself. If you've manipulated the code, use your automated regression test set. Otherwise, use weak assertions to detect the effect of the injected failure on the system.


I, session host Wim Heemskerk, picked up this premise from a presentation by Nat Pryce at CukeUp!, where he gave the example mentioned above as one way of applying it. My purpose for this session: to explore the various options for it. Coming at it from a testing perspective, I focused on failure injection: throwing artificial challenges at the system (generally done before the production environment). Working from pure monitoring / telemetry was placed out of scope for this particular discussion.