Let’s talk monitoring! Monitoring solutions too often end up as install-and-forget technologies that leave us wanting more.
Unified Communications monitoring systems can be so much more than break/fix solutions. They should be part of everyday troubleshooting, reducing the effort your team spends on a wide variety of problems.
Here are a few examples of challenges faced daily by organizations and how monitoring can help alleviate them.
Let’s start at the base level of troubleshooting: phone registration. Some of these cases can be stubborn, and end users don’t always give you the most accurate information.
Example 1: My phone flashes often and I get no dial tone
From this description, it’s hard to tell what the symptoms are, much less what the problem is. What if the problem is intermittent? Is it a firmware bug? It’s tough to diagnose this problem successfully without additional proof of what is happening.
This is where we want to lean on our log monitoring solution for answers; we can run a quick search and see:
- If the phone is disconnecting
- Why the phone is disconnecting
By doing this, you can confirm within seconds whether this is a physical problem, a network problem, or a case where the phone never lost registration at all. If the phone didn’t lose registration, the cause would be either a firmware bug or misuse of the device.
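The search described above can be sketched in a few lines of Python. This is only an illustration: the log line layout, the device names and the reason strings are invented for the example and will not match what your UC platform or log monitoring product actually writes.

```python
import re
from collections import Counter

# Hypothetical line layout, invented for this sketch; real UC logs differ.
LINE = re.compile(
    r"^(?P<ts>\S+) device=(?P<device>\S+) event=(?P<event>\S+)"
    r"(?: reason=(?P<reason>\S+))?$"
)

# Made-up sample data in the hypothetical layout above.
SAMPLE = [
    "2024-05-24T09:15:02 device=SEP001122334455 event=Unregistered reason=KeepAliveTimeout",
    "2024-05-24T09:15:40 device=SEP001122334455 event=Registered",
    "2024-05-24T11:02:10 device=SEP001122334455 event=Unregistered reason=KeepAliveTimeout",
    "2024-05-24T11:05:00 device=SEP00AABBCCDDEE event=Unregistered reason=DeviceInitiatedReset",
]


def disconnects_for(lines, device):
    """Return (timestamp, reason) for every unregistration of one device."""
    hits = []
    for line in lines:
        match = LINE.match(line.strip())
        if match and match["device"] == device and match["event"] == "Unregistered":
            hits.append((match["ts"], match["reason"] or "unknown"))
    return hits


def summarize(lines, device):
    """Answer both questions: is the phone disconnecting, and why?"""
    hits = disconnects_for(lines, device)
    if not hits:
        return "never lost registration"
    counts = Counter(reason for _, reason in hits)
    return ", ".join(f"{reason} x{n}" for reason, n in counts.most_common())


if __name__ == "__main__":
    print(summarize(SAMPLE, "SEP001122334455"))
```

An empty result is just as useful as a list of disconnects: it tells you to stop looking at the network and start looking at the firmware or at how the device is being used.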
Example 2: My phone is stuck at a blank screen
Again, there is not much to take from this description other than that the phone is probably not registered to the Call Manager. With a quick search in our log monitoring solution, we can determine whether the phone is reaching the Call Manager and, if so, why it is failing to register. If it’s not connecting, then we know right away that there is either a network issue or a physical problem that needs to be resolved before any troubleshooting can begin from the UC perspective.
Now, these files are also accessible through RTMT, and in a pinch they can be fished out of there. However, collecting the log files this way can be difficult, and it could give you a false positive if your approach isn’t meticulous, because the logs are kept per server, with each server storing up to 5 files. You can search one file at a time from the interface, or you can download all of the log files onto your PC, check the time range that’s available and do a “search in folders” from Notepad++ to find your data. Either way, the process is so long and technical that most admins will resort to trial-and-error testing rather than getting the exact system information.
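For a sense of what the manual route involves, here is a rough Python equivalent of the “search in folders” step. It assumes the logs have already been downloaded as .txt files into one folder per server; the folder layout and the `search_logs` name are made up for this sketch.

```python
from pathlib import Path


def search_logs(folder, needle):
    """Yield (relative file path, line number, line) for each matching line.

    Assumes the downloaded logs are plain .txt files somewhere under
    `folder`, for example one subfolder per server (an assumption of
    this sketch, not a fixed layout).
    """
    root = Path(folder)
    for path in sorted(root.rglob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield path.relative_to(root).as_posix(), number, line.rstrip("\n")
```

Calling `list(search_logs("downloaded_logs", "SEP001122334455"))` would list every hit across all servers at once. Keeping the relative path in each result matters, because a hit on one server and silence on another is exactly the kind of detail that gets lost when you search one file at a time.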
Did I mention that your UC server only stores five text files’ worth of data? This could amount to 2 days or less if you have a busy Call Manager system. That is hardly a valid sample for troubleshooting an intermittent issue.
Trending / Problem Management
In the first examples, we saw how a log monitoring solution can speed up troubleshooting and give better results. But what about the bigger, persistent issues: the elusive call quality or intermittent audio problems? As a UC engineer, it’s easy to say that these are network issues. It’s much harder to prove it.
Example 3: Reported Issue - Major Incident, Contact Center is down, agents are dropping calls
The standard response: the UC team gets a sample call from the end user, finds it in the Call Detail Records, then collects and reviews traces. By the time you’re ready for the first round of additional testing and have managed to coordinate with the end user, the issue has resolved itself. What’s next?
You take this to your network team, they check their monitoring system, and there’s nothing out of the ordinary. Days’, maybe weeks’ worth of manpower will be spent on this problem, and it may well be filed under “self-resolved cases” until it is reported again.
With a proper log monitoring solution, not only are log collection and diagnosis quicker, but you also get a historical view of all calls that have had the same problem. From here, you can easily see when the problem occurs and what the error is. There is no longer any debate about what the issue is or who owns it.
The above screen capture is a chart of all dropped calls over a seven-day segment at a single contact center location. The issue was reported on the 24th and the 25th, and the network team cleared it because the network had been stable over the past seven days. With this information, you can go back to your network team with confidence: the problem is consistently a signaling timeout, and it has been occurring throughout the past seven days.
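The kind of roll-up behind a chart like this can be approximated as follows. It is a sketch only: the record shape (an ISO timestamp plus a cause label), the dates and the cause names are invented for the example and are not taken from any real Call Detail Record format.

```python
from collections import Counter
from datetime import datetime


def drops_per_day(records):
    """Count dropped calls per (date, cause) from (ISO timestamp, cause) pairs."""
    counts = Counter()
    for stamp, cause in records:
        day = datetime.fromisoformat(stamp).date().isoformat()
        counts[(day, cause)] += 1
    return counts


def dominant_cause(records):
    """Return (cause, share of all drops) for the most frequent cause, or None."""
    if not records:
        return None
    totals = Counter(cause for _, cause in records)
    cause, hits = totals.most_common(1)[0]
    return cause, hits / len(records)


# Made-up records for illustration only.
RECORDS = [
    ("2024-06-19T10:00:00", "signaling_timeout"),
    ("2024-06-19T14:30:00", "signaling_timeout"),
    ("2024-06-24T09:05:00", "signaling_timeout"),
    ("2024-06-25T16:45:00", "normal_clearing"),
]

if __name__ == "__main__":
    for (day, cause), count in sorted(drops_per_day(RECORDS).items()):
        print(day, cause, count)
    print(dominant_cause(RECORDS))
```

Grouping by day shows whether the problem predates the days it was reported, and the dominant cause gives you a single, specific error to bring to the other team instead of a general complaint.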
We have seen how consistent monitoring of your system can take the guesswork and time out of issues that organizations face every day, and help ensure that everything is working as it should. The true benefit of log monitoring is that you can solve the majority of big system problems just by solving the small recurring issues that you wouldn’t otherwise know about.