Saturday, 17 December 2011

Hunting down problems

Our dishwasher had been malfunctioning for about two months. The water didn't get hot, and sometimes the machine stopped in the middle of a program, beeping, with a red tap icon blinking (meaning: no water supply). So we called in a repairman. He started the machine, measured temperatures, ran tests, and concluded: there's nothing wrong with this machine. Cost: 138 euros.

My girlfriend called me about this after the repairman had left. I got angry, because not being able to reproduce a problem doesn't mean there *is* no problem. There are several things to say about the repairman's behavior.

First of all, the fact is that there were problems, and often. You can't get away with just 'I cannot reproduce it'. You need to figure out why it happened and have a model for the future. Suppose you take your car to a garage and tell them it has trouble starting. They try it, and it starts like a dream. Problem solved, right? No. We need to know why the defect could appear in the first place, because that tells us how likely it is to recur, and gives us a plan for handling it if it does.

The second issue already shines through: it's almost impossible to recreate exactly the same circumstances. So not being able to reproduce the problem may actually be a useful diagnostic in itself, telling you that your tests are simply missing something. For example, you may be testing on one or a few processors, while the problem only appears with many.

This leads to a related point. I often hear things like: 'I ran it through valgrind [an advanced diagnostics tool] and it found no problems'. My response is: "So?" A clean run doesn't even tell you whether you were testing the right things. If valgrind does find something, then something is certainly wrong. But the reverse is not true: finding nothing does not prove there is nothing to find. Sounds almost like real science.

When handling problems, it's very important to acknowledge: 'this has happened'. Being unable to reproduce or diagnose a problem is not an excuse. You can ask for understanding, but only if you acknowledge that the customer's problem was real and that you simply don't have the resources to solve it.
