The Trouble with Troubleshooting

It all starts when we’re told that an application is “broken.” There are many ways for something to be broken, so that word alone doesn’t help us much. We have to ask the client what they mean. The more details they provide, the easier it is for us to replicate the error, and the easier it is to fix.

Our applications have been thoroughly tested before they get to our customers, so any bugs left over are not going to be easy to find. Oftentimes, we’ll find programs that work perfectly unless we follow the exact footsteps of the client.

Once we have replicated the error, we begin our search in earnest. Simple-looking websites can contain thousands of lines of code, usually because many functions have been written on the same page. If we can replicate the exact error, large swaths of that code can be ignored because they do unrelated things.

After the relevant section is isolated, our job can still be far from done. We first have to analyze the section and check for typos or conditions that may not be met. Even small pieces of code can be dense with information that must be sifted through. For complicated blocks, sometimes the only thing to do is remove sections of code and see what breaks. If we’re lucky, nothing breaks, and we might be able to remove the code entirely. If we’re very lucky, deleting it will fix the error completely, because not only was it unnecessary, it also introduced the error we’re trying to solve.

Today, however, all the code here is correct and removing sections only caused more things to break. This problem lies deeper, buried on a page that is referenced by the page the client sees.

Investigating this new page, we may find the error, or we may find yet another reference to be explored. Some pages reference multiple files, each of which must be investigated. These branching paths are all interconnected, and a single error in any one of them could have a cascading effect on the rest. Only by navigating to each new page and beginning the hunt all over again can we find the bad code. And that says nothing of how long it will take to fix once we find it.
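
For the curious, the hunt described above can be sketched as a walk over a graph of references. This is only an illustration under stated assumptions: the file names, the `references` map, and the `is_faulty` check are all hypothetical stand-ins, and in real life "is this page faulty?" is the hours-long manual inspection, not a one-line function.

```python
from collections import deque


def find_faulty_pages(start, references, is_faulty):
    """Visit every page reachable from `start` and return the faulty ones.

    `references` maps a page name to the pages it references (hypothetical data).
    `is_faulty` stands in for the manual inspection of a single page.
    """
    seen = {start}          # pages already queued, so shared files are checked once
    queue = deque([start])  # pages still waiting to be inspected
    faulty = []
    while queue:
        page = queue.popleft()
        if is_faulty(page):
            faulty.append(page)
        # Every reference is another path that has to be explored.
        for ref in references.get(page, []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return faulty


# Hypothetical site: the page the client sees is index.html.
references = {
    "index.html": ["cart.js", "layout.html"],
    "cart.js": ["pricing.js", "utils.js"],
    "layout.html": ["utils.js"],
    "pricing.js": ["utils.js"],
}
known_bad = {"pricing.js"}  # assumed for the example; normally this is what we are trying to learn

print(find_faulty_pages("index.html", references, lambda page: page in known_bad))
```

In this made-up site, the bad file sits two references away from the page the client actually sees, and four other files have to be ruled out along the way.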

And that is why you should be patient with your IT guy.