Root Cause Analysis

From CitconWiki
Revision as of 16:29, 20 November 2011 by Squirrel (talk | contribs)

09:00 on Saturday, Nov 12, 2011, in Space Invaders (the big room)

Squirrel has slides on how to go about doing a root cause analysis (PJ reminder: get the slides from Squirrel to attach to the wiki) (Squirrel: I can't figure out how to attach the slides. Help! In the meantime, you can watch a video that contains the slides.)

Target a specific event

If you want to complete the analysis in a short meeting (30-60 minutes) it's best to focus on a specific recent event, such as a production bug or outage. Good to understand what the level of pain is. One person said he had seen a root cause analysis on a "big" event or series of events completed over time (it took a month and was part of a master's thesis).

Everyone affected attends

The "feature" team attends (typically developers) as well as senior managers and representatives from other areas of the business, e.g. client support or operations. Not always feasible to get "everyone" in the room. One technique is to give them results and tasks from an RCA they did not show up for. You can't assign actions to someone not in the room though, so you have to do something like "visit X daily for a week to ensure she does Y".

No blame

Ops folks tend toward blame. Need to set it up ahead of time to avoid blame: "inoculate" people against blame. Anti-pattern: "as long as it isn't MY discipline, then I have gotten what I want out of this session".

Poll to identify problems

Go around the entire room and ask, "Hey PJ, please list all the problems". Then go around the room and ask for add-ons. Private ballots on post-its. Email solicitation. Try to avoid proxies. Get the right people in the room.

Write a lot

Move down then across

If it doesn't hurt, then you aren't doing it right

Proportionate tasks

If you are rewriting your entire app because of 3 minutes of downtime, then you are not doing the right thing.

All tasks done in a week

Every task agreed to:

1) Has to be do-able in one week
2) Has to actually be done in one week

How does this compare to retrospectives? Retros are related to teams; the pain is more direct.

Other techniques for NOT losing focus? Keep it short term. The next root cause analysis might highlight the "next" step, but for now, "all we have to do now is take this first step".

Vote every day on actions from retrospectives to determine whether or not they are being actioned. Smiley faces or sad faces depending on votes.

Bickering can be a problem. Having a senior person present helps defuse these types of arguments.

Wallace & Gromit Video

Building snowmen. Squirrel divided the group into two: Wallace & Gromit.

Bad things that happened

Snowman destroyed - lost good snowman
Wallace covered in snowman
Got a cold
Wasted Gromit's time and unhappy

Lost good snowman

In wrong place (Wallace's garden)
Wallace inconsiderate?
Couldn't see - didn't look - hard to look - van too big - wanted impressive snowman -

30 to 60 second pause is "good" (it has to hurt a little)

Worked down to "Competition" and "Dog can't talk" at the end of 7 whys

Actions

Video, mirrors, reverse warning
Lightning talk on snowman
Board agenda: profit sharing
Daily meetings (standups), sign language classes for Gromit

(volunteers for each action)