If you ask any IT person about their servers, they will in most cases tell you that a server will go down in the middle of the night, on a weekend, or during the Super Bowl (you know, during the fourth quarter, when your team is making its final frantic rush).
For this article, we will look at some of the tools the Stack8 Managed Services team has developed to help ensure our clients' networks run at peak performance.
The company in question, a large multinational manufacturing organization, was looking to update its enterprise storage. Its primary goal was to eliminate costly downtime by implementing a RAID configuration that would provide complete redundancy.
What can go wrong will go wrong
Problems began to arise several months after implementation. Initially, the organization paid little heed to them. In fact, its monitoring tool indicated no readily apparent issues. Luckily, even though the monitoring data did not reflect it, the IT team was convinced that something was wrong.
"When things need to be going a certain way, we're waiting for the other shoe to drop."
The Stack8 Managed Services team was brought in to evaluate the RAID drives. The analysis revealed that although the array had been implemented correctly, a high number of disks had somehow been randomly reassigned as JBODs or taken offline, and the client's monitoring tool had not picked up either change. The array would then bypass these disks, which in turn overworked the remaining disks and left the whole virtual machine host prone to failure.
To address this challenge, the team from Stack8 developed a script to automatically alert the IT team whenever this type of error occurred. The team built a VM Server RAID Monitoring tool that works in parallel with the client's existing monitoring tool to continually monitor the disk drives and detect these types of errors.
Instead of just logging data, the VM Server RAID Monitoring tool automatically notifies the IT department by email and text whenever a drive's status changes, regardless of the cause. A detailed summary report of the RAID health check is also generated and sent automatically to key stakeholders on a daily basis.
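To make the idea concrete, here is a minimal Python sketch of the status-change check described above. It is not the Stack8 tool itself, whose implementation is not covered in this article. It assumes drive states have already been collected into a dictionary keyed by slot, and the names `DriveChange`, `detect_changes`, `format_alert`, and `daily_summary`, along with the state labels `online`, `jbod`, and `offline`, are hypothetical. Actually sending the email or text message is left out.

```python
from dataclasses import dataclass

# Hypothetical state labels; a real RAID controller reports its own vocabulary.
HEALTHY_STATES = {"online"}


@dataclass(frozen=True)
class DriveChange:
    slot: str
    old_state: str
    new_state: str


def detect_changes(previous, current):
    """Return a DriveChange for every slot whose state differs between snapshots.

    A slot that is absent from one of the snapshots is reported as "missing".
    """
    changes = []
    for slot in sorted(set(previous) | set(current)):
        old = previous.get(slot, "missing")
        new = current.get(slot, "missing")
        if old != new:
            changes.append(DriveChange(slot, old, new))
    return changes


def format_alert(changes):
    """Build the alert text; returns an empty string when nothing changed."""
    if not changes:
        return ""
    lines = [f"{len(changes)} drive status change(s) detected:"]
    for change in changes:
        lines.append(f"  slot {change.slot}: {change.old_state} -> {change.new_state}")
    return "\n".join(lines)


def daily_summary(current):
    """Summarize a single snapshot for the daily report."""
    unhealthy = {
        slot: state
        for slot, state in sorted(current.items())
        if state not in HEALTHY_STATES
    }
    lines = [
        f"Drives checked: {len(current)}",
        f"Drives not online: {len(unhealthy)}",
    ]
    lines += [f"  slot {slot}: {state}" for slot, state in unhealthy.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up snapshots for illustration only.
    before = {"0": "online", "1": "online", "2": "online"}
    after = {"0": "online", "1": "jbod", "2": "offline"}
    print(format_alert(detect_changes(before, after)))
    print(daily_summary(after))
```

Comparing each snapshot with the previous one, rather than checking only for a known failure state, is what lets a check like this flag any change in status, including a disk that silently becomes a JBOD.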
Time to watch the Super Bowl
Since the VM Server RAID Monitoring tool was implemented, it has prevented any system downtime. The IT department no longer has to inspect each drive manually. Best of all, they were able to watch the Super Bowl in peace this year.