Part of solving the problem is making sure that the solution is available to other people. This is especially important if it took you a long time to find the answer or if you are the expert and you realize that it is not that likely that someone else will know this answer. These problems and their solutions need to be documented and accessible to everyone.
Experience has taught me that neither well-documented solutions nor accessibility alone can make the solution useful. I’ve worked in places where half of the documentation was on-line, but much of it was incomplete, and the other half existed in people’s heads or was buried in their personal directories and therefore not accessible to the rest of the help desk staff.
Although it may seem painfully obvious to say that the first step in solving a problem is finding the cause, many help desk analysts tend to forget this. Instead they try to cure specific symptoms of the problem without looking for the underlying cause. This manifests itself quite often when users ask for help in implementing a specific solution. They have decided that a specific solution is best for their problem, and it is up to the help desk to give them the necessary assistance in implementing that solution. The issue is that in many cases, the solution they have chosen is not the best one for the problem.
I have received many calls from users wanting to implement a particular function within their word processing program (such as MS Word) and have had them get annoyed when I tell them it isn’t possible. It turns out that what they are trying to accomplish is much better done in MS Excel. One of the reasons this happens is that the help desk analyst often looks for and tries to solve the symptoms rather than the underlying cause. For example, the difficulty implementing something in MS Word is the symptom, not the problem itself. It is therefore extremely important that the help desk analyst understand what the goal of the user is.
One technique I find extremely useful is making sure you know what the user’s problem really is. This might seem like an obvious statement, but I have worked on problems myself where I discovered that the problem I was trying to solve was not the same as the one the user had. As in the case of the user who tried to solve a problem with MS Word that was best done in MS Excel, not knowing what the problem really is leads you to the wrong answer. Here, the simplest approach is just to ask the user what they are trying to accomplish.
I have often found that if the description of their problem also includes the solution, you may not have reached the core problem. Try to get the user to describe the problem without specifying the program or any other part of the solution. For example, in the case of using MS Word to create a complicated table, the user cannot use the phrase "MS Word." Instead, the user would have to describe what they are trying to accomplish with the table. In other words, you need to press the user for more specifics.
Part of this is having the user describe what the goal is, what the behavior is, and what the expected behavior should be. This helps you to quickly get to those problems that are a result of either misconceptions or misunderstandings on the part of users. This also helps to clarify those issues where the user claims the software has a bug or just isn’t working right.
When the user is done describing what is happening, repeat it back to them and ask them to confirm that this is what is actually happening. It may also be useful to modify the statements slightly (such as changing the order of phrases) so that you’re not simply repeating word-for-word what the user said. This helps to ensure that you do not end up solving the wrong problem.
The goal here is to understand what the user needs. This means understanding what the problem is. I have worked on help desks myself where the first thing users do is give you a detailed list of their hardware and configuration or go into nauseating detail about a similar problem they had on another system. Although this is useful in many cases, it may become a burden when it is not relevant to the actual problem. It is up to you to guide the user to ensure they do not get too far off track.
The analyst needs to take control of the conversation to ensure that the user provides enough information without providing too much. Yes, as in the previous example, you can get too much information. First, too much information means you are getting things that are unrelated to the problem, and it is a waste of time to have the user provide it. Second, there is sometimes information that may be related, but you don’t yet need it. Getting it too early in the conversation may mean you either start thinking about it prematurely or you lose track of things because of "information overload."
When you are sure you have a clearer understanding of what the problem is, you begin to see if it fits any known pattern. For example, if this problem has the exact same symptoms as something that you have encountered before, the logical approach is to try the solution that worked with the other problem. If the problems are not identical, can you find a problem that is similar? In essence, that is exactly what mechanics, doctors, physicists, or anyone else does who is trying to solve a problem or prove a theory that has a number of pieces and may have multiple solutions. Based on the evidence, you devise a theory and then investigate the system to see how well it matches your theory. If it doesn’t, you revise your theory.
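As a rough illustration, looking for a similar known problem can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the knowledge base entries, the function names, and the 0.3 cut-off are all invented for the example, and scoring by the share of overlapping symptoms is just one simple way to rank candidates.

```python
def symptom_overlap(a, b):
    """Share of symptoms two problems have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_known_problems(symptoms, known, minimum=0.3):
    """Return (score, title) pairs for known problems, best match first.

    The minimum score of 0.3 is an arbitrary, assumed cut-off.
    """
    scored = [
        (symptom_overlap(set(symptoms), set(entry_symptoms)), title)
        for title, entry_symptoms in known.items()
    ]
    return sorted((pair for pair in scored if pair[0] >= minimum), reverse=True)


# Hypothetical knowledge base: problem title -> symptoms seen before.
KNOWN_PROBLEMS = {
    "Printer cable unplugged": {"cannot print", "printer offline"},
    "Print spooler stalled": {"cannot print", "jobs stuck in queue"},
    "Network cable loose": {"cannot reach server", "cannot print"},
}

matches = closest_known_problems({"cannot print", "printer offline"}, KNOWN_PROBLEMS)
for score, title in matches:
    print(f"{score:.2f}  {title}")
```

An exact match scores 1.0 and is the obvious first thing to try. The partial matches are the "similar" problems whose solutions are worth considering next.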
For those problems to which there is no easy solution, you’ll have to apply a number of rules of thumb (heuristics) as well as step-by-step instructions (algorithms). For example, the rule of thumb may be to start with a simpler task and build yourself up to a more-complicated task in order to see where the behavior begins to change. Another rule of thumb may be to start with the default configuration and make changes one by one.
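The second rule of thumb, starting from the default configuration and making changes one by one, is mechanical enough to sketch in Python. The settings, the list of changes, and the works() check below are all hypothetical stand-ins. In practice the "check" is you or the user trying the failing task again after each change.

```python
def first_breaking_change(default_config, changes, works):
    """Apply changes one at a time to a copy of the default configuration.

    Returns the first (setting, value) pair after which works() fails,
    or None if the system still works with every change applied.
    """
    config = dict(default_config)
    if not works(config):
        raise ValueError("the default configuration itself does not work")
    for setting, value in changes:
        config[setting] = value
        if not works(config):
            return setting, value
    return None


# Hypothetical default configuration and the user's changes, in order.
DEFAULTS = {"irq": 5, "duplex": "auto", "protocol": "TCP/IP"}
USER_CHANGES = [("protocol", "NetBEUI"), ("duplex", "full"), ("irq", 7)]


def works(config):
    """Stand-in for a real test; here the card fails at full duplex."""
    return config["duplex"] != "full"


culprit = first_breaking_change(DEFAULTS, USER_CHANGES, works)
print(culprit)
```

Because only one setting changes between checks, the change that breaks the system is identified directly rather than guessed at.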
In dealing with problems without obvious causes or for which there is no step-by-step procedure to follow, you may need to dig for more information. The lead-off question can be something as simple as, "Just exactly what is happening?" This helps to clarify what you know about the user’s understanding of the problem and at the same time gives you an idea of the technical skills of the user. This is part of ensuring that both you and the user know exactly what the problem is.
In my experience, the more accurate this description, the more technically competent the user is. Knowing the technical level of the user serves as a guide for all subsequent questions. For example, you can ask the more-detailed technical questions earlier in the conversation if the user has a stronger background in the product. It is a great time-saver, for instance, to be able to say, "Start the network applet in the Control Panel" versus "Click on the Start button. You now see a menu. Click on the entry labeled Settings. Now find the icon labeled Networking,…" and so forth.
The next step is to determine whether or not the system ever behaved as intended. I must admit that I have worked on a number of calls where I assumed that the product had been working correctly and then stopped for some reason. The emphasis of my approach was on determining what would cause this aspect of the system to stop working. This led me down the wrong path, because it had never worked correctly at all. Needless to say, this wasted a lot of my time.
If the product was working before and stopped for some reason, the most likely cause is that something in the system was changed. Despite what many users may try to convince you, it is unlikely that something "just stops working." Therefore, you need to find out what changes have been made in the system.
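If you happen to have a record of the configuration from when the system last worked, finding out what changed can be partly automated. The following Python sketch compares two such records. It assumes each snapshot is a simple mapping of setting names to values, and the sample settings are made up for the example.

```python
def config_changes(before, after):
    """Compare two configuration snapshots and report what differs."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots taken before and after the problem appeared.
last_week = {"printer_port": "LPT1", "driver": "LaserJet", "screen_saver": "none"}
today = {"printer_port": "LPT2", "driver": "LaserJet", "network_card": "installed"}

report = config_changes(last_week, today)
for kind, entries in report.items():
    print(kind, entries)
```

Settings that are identical in both snapshots drop out of the report, which leaves a short list of candidates to ask the user about.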
One change that is often overlooked is the physical location of the computer. How could the physical location have anything to do with the behavior of a piece of software? Well, when you moved the computer you simply forgot to plug the printer back in!
This may seem a rather mundane issue, but you would be amazed at how many hotline calls are generated because of such problems. To avoid these kinds of problems, you should run through a number of "quick fixes" before you begin looking at the more-complicated solutions. Despite the simplicity of the problem, loose cables generate more than their share of headaches. Loose cables seem to be the last thing that anyone checks. However, it is extremely annoying, if not embarrassing, when that is actually the cause.
If you are running an internal help desk, checking cables may be something that you can require of the users prior to them calling the help desk. This may require some additional training to ensure the users know how to check the cables properly, but it is definitely worth the time in the long run. When you ask the user whether or not they have checked the cables and they say "yes," you have already saved two or three minutes on the call. (Assuming they actually did check the cables.)
Despite the claims of Microsoft on the stability of Windows NT, I have found that sometimes the best solution is to simply reboot the machine. My success rate with using this in "solving" the problem is high enough that it has become part of my standard palette of solutions. However, keep in mind that rebooting the system like this may solve the symptoms, but not the problem.
Often just knowing what has changed is sufficient to determine the solution, particularly if what was changed was done improperly. However, it is sometimes necessary to repeat the steps on an existing, working system to see if you can recreate the problem. In order to do this, you have to be able to recreate the user’s environment as closely as possible. It is therefore extremely useful to have a test environment where you can try to recreate problems.
How many different kinds of machines and different hardware are available in your test laboratory will depend on your company. The more standardized your hardware and software, the fewer different kinds of both you’ll need to maintain. However, you need to remember that the purpose of the lab is to help you solve users’ problems in order that they work more efficiently. If users cannot work efficiently because you cannot solve their problem due to lack of resources, then you may end up losing more money than the cost of the equipment.
If you are dealing with mostly hardware problems, you may find it almost unavoidable to have spare copies of all the different types of hardware you use. These not only can be used for test purposes but also for emergencies should the hardware break down on production machines.
One thing you may want to consider is using some kind of removable media like the SyQuest drives we discussed in the chapter on sharing resources (Chapter xx). This allows you to create an extremely large number of hardware and software combinations with substantially less real investment in hardware. In addition, this saves you on licensing fees for the software that must be licensed for each installed copy regardless of whether it is in productive use or not (such as from Microsoft).
The cost of a single SyQuest hard disk is much higher than that of the equivalent hard disk. However, you can install the system on the SyQuest drive, configure it to one of your standards, replace it with a different SyQuest disk, install a different standard system, and so forth. Whenever you need to test a specific system, all you need to do is switch the SyQuest disk.
You might want to look into two products from KeyLabs (www.keylabs.com): RapiDeploy and LabExpert. RapiDeploy is used to automatically install and configure multiple machines. In addition, it creates "libraries" of standard configurations that you can use at any time. These libraries can be used to switch back and forth between configurations or for easy disaster recovery. LabExpert is intended for use in a lab or classroom environment, where changes are made to a lot of machines, and you want to return them to their original state.
Sometimes you’ll find that you can actually think too hard about the problem. That is, you spend too much time analyzing the problem and the possible solutions before eventually deciding on a possible solution. I have found in many cases that the best course of action is to simply start with a number of possible solutions and see if they solve the problem. Even if the solution you try does not solve the problem, you have eliminated it from consideration, and how it fails may give you valuable insight into the cause of the problem.
One of the most common quick fixes is user error. In a way, this is similar to the user having a misconception about how the program should function. However, identifying user error is usually not as simple, as you may have to go step-by-step and examine everything the user is doing. In such cases, it is often useful to have the documentation in front of you so you can follow along as the user performs each step. The reason I say you should have the documentation in front of you is that you may know a faster way of doing something than is described in the documentation. Therefore, having the documentation in front of you helps you to ensure that you do the same steps the user is doing and that the user is doing the steps correctly.
To some extent, going through the procedure step-by-step could be considered "hand-holding." You need to know where hand-holding stops and troubleshooting begins. I know many users who will call the help desk and claim they followed everything exactly as it is in the manual just so they can get someone to walk them through the procedure. It is impossible to completely avoid people like this, but with a little practice, it’s easy to detect them, particularly if they keep calling with the same kinds of problems.
As I mentioned above, one of the heuristics that you can apply is getting back to basics. That is, getting the system back to a state where you know what should work. In some cases, I’ve had to completely remove every card and every controller in the system and add them back one by one. Software and hardware conflicts are common causes of problems. In many cases, the simplest solution is actually to start from scratch and work your way back up. This is particularly important if you have mixed environments of Plug-and-Play, PCI, or anything else that sets the configuration automatically, plus ISA cards. Sometimes setting the cards manually is the easiest way to avoid conflicts.
Keep in mind that pulling every card out of the system may not be necessary to identify conflicts. Often you can easily identify conflicts by examining the machine’s configuration.
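To give an idea of what examining the configuration for conflicts amounts to, here is a minimal Python sketch. The device list and its settings are invented for the example. The sketch also simplifies by treating any value claimed by two devices as a potential conflict, which is not always true (some buses allow devices to share an interrupt), so the result is a list of things to look at, not a verdict.

```python
from collections import defaultdict


def find_conflicts(devices, resource="irq"):
    """Return resource values that more than one device claims."""
    claims = defaultdict(list)
    for name, settings in devices.items():
        if resource in settings:
            claims[settings[resource]].append(name)
    return {value: sorted(names) for value, names in claims.items() if len(names) > 1}


# Hypothetical configuration as you might read it from the machine.
DEVICES = {
    "sound card": {"irq": 5, "io": 0x220},
    "network card": {"irq": 5, "io": 0x300},
    "modem": {"irq": 3, "io": 0x2F8},
}

irq_conflicts = find_conflicts(DEVICES, "irq")
io_conflicts = find_conflicts(DEVICES, "io")
print(irq_conflicts)
print(io_conflicts)
```

Here the sound card and the network card both claim interrupt 5, so those are the two cards to reconfigure or pull first, rather than every card in the machine.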
Most users are not proficient enough to be able to provide you with the details of their system. This is where hardware inventory products such as NetSense come in extremely handy. Such products gather configuration information and store it in a central database, where it can then easily be accessed by your help desk.
One stumbling block in determining the problem can often be the users themselves. If users knew everything there was to know about computers, there would probably be no need for them to be calling the help desk in the first place. This often results in users describing the problem or their system in ambiguous and often inappropriate terms. For example, I regularly get calls from users who say they cannot "get into the screen" when they mean to say the computer will not boot.
In many cases, what the user means is obvious. However, there will definitely be a number of calls where you cannot be sure. For example, I’ve had calls from users who tell me how big their hard disk is when asked how much memory they have. Others may confusingly call the taskbar the toolbar.
One tool that I have found extremely useful in solving problems (not just troubleshooting help desk calls) is MindManager from MindJet, LLC (www.mindjet.com). MindManager is a tool that helps you "map your mind." In essence, you start with a central idea and map out all of the related issues. Any number of branches can lead off from that central idea, and each of these branches in turn can have any number of branches. I talked in detail about MindManager in the chapter on System Administration (Chapter xx).
Another tool that I find extremely useful is flow chart software, such as Visio Standard. Although it is less useful than MindManager during the actual problem resolution, I find flow charts that have been included in existing problem/solutions to be extremely useful. By following the flow chart as the computer or software goes about its business, you know what it should be doing at each step and can quickly identify those places where the system is misbehaving.
Another important aspect is what I refer to as the "Tao of Tech Support." It cannot be taught. It is not learned. There are no words to describe it. It just is. This is the ability to go beyond the description of the physical manifestations of the problem and seek the "motivation" behind the problem. What "forces" are coming into play to make the problem manifest itself in this fashion?
You could run through a checklist of all related issues and examine each one by one, which would take an incredible amount of time. Or you could "feel" the answer and solve the problem very quickly. This ability is obviously hard to describe and it is something that not every analyst has or even will have. However, in my experience, this ability is far more useful than an encyclopedic knowledge of the system. There are places to look for the knowledge, but the Tao of Tech Support is something you just "have."