The Myths Of Network Utilization & Automated Metrics
By Barry Koplowitz
The Interpath Technologies Networking Myths Series™

This article is also available as a podcast of "The Sniffer Guy" through iTunes.

Mythology is not fantasy or lies. It describes a basic truth--but as metaphor. If you understand that it is describing a fundamental reality--by telling a story--you know to look for the reality behind the story. If you believe the story, you will chase your tail until you decide to ignore everything. So, when I say that automated testing and metrics like utilization are myths, I mean that they are metaphors--stories that can direct you to the truth. They are not the truth in themselves.

What is the IT person's mission? The mission is to keep a set of instructions (apps) interfacing with humans (users), over an enormously complex environment (enterprise networks and the Internet itself). Many senior and seasoned IT professionals have developed a sense over the years that something isn't quite right. There is too much "hoping" going on and not enough "knowing."

An IT support person is attempting to visualize the transport of discrete units of data flowing across an "Interpath" between all the applicable components--for hundreds of applications and thousands of users. How is that done? Black magic mostly--and a bag of black box tools. Often these tools are purchased by less technical management, and the support personnel receive little training. Even when they do receive training, it is focused on how to manipulate the console and install the product. Seldom is much raw technology exchanged. Let's face it: most of the time we don't really know what the tool is doing--only how to run it--yet we take its results as law. Tools poll and push and query and ask a selection of the components involved to report in about themselves.
Then, by algorithms known to only a few people at the vendor's office (if they are still employed there), the data is SUMMARIZED and presented to you, the IT professional who has received that week of training. Once you realize that the information is just a summarization, the problem--and the mythology--begins to become clear.

AUTOMATED TESTING: Picture a business user sitting at a PC workstation in a remote office of your company. Many times a day, a network transaction that should take 15 seconds instead takes 90 seconds--maybe longer. You have an automated tool that "simulates" that transaction every hour. In 24 polls we see the following:

* 4 samples show 10 seconds (much faster than normal)
* 16 samples show 15 seconds (expected)
* 4 samples show 90 seconds (long enough to drive a user crazy)
* The average is 26.7 seconds.

Would an alarm sound in your monitoring system? With an expectation of 15 seconds, would you be concerned enough to escalate this issue? Sure, it's nearly double what is expected--but there are times that are quite excellent, and anyway, it's still not even 30 seconds! Maybe it's not so critical...

It IS critical--to that user experiencing 90-second delays from time to time, or more frequently! What is your polling interval? Hourly? Does that include night-time hours? If so, your metrics are already mostly useless (unless you have as much activity at night as during the day). If it is hourly, and daytime hours only, it still means that 4 out of 24 business-hour polls took 90 seconds. That does NOT mean it happened 4 times--it means it was POLLED 4 times. What about in between polls? It is happening many more times than the 4 that were able to be seen! It could be happening more than enough to significantly damage production and goodwill. Is the logic in your automated tool really taking this into account? If so, what does it really tell you anyway? It can't say why--and it can't say how.
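The arithmetic of that scenario is easy to check for yourself. Here is a minimal sketch (the sample values are the hypothetical ones from the scenario above, and the 30-second alarm threshold is an assumption for illustration) showing how the average hides exactly the polls the user suffers through:

```python
# Hypothetical hourly "simulated transaction" samples from the scenario:
# 4 fast polls (10 s), 16 normal polls (15 s), 4 slow polls (90 s).
samples = [10] * 4 + [15] * 16 + [90] * 4

average = sum(samples) / len(samples)
worst = max(samples)
# Assume an alarm threshold of 30 seconds for this sketch.
over_threshold = sum(1 for s in samples if s > 30)

print(f"average: {average:.1f} s")    # 26.7 s -- looks almost tolerable
print(f"worst poll: {worst} s")       # 90 s  -- what the user actually feels
print(f"polls over 30 s: {over_threshold} of {len(samples)}")
```

The average of 26.7 seconds sits comfortably under a 30-second alarm threshold even though one poll in six took six times the expected duration--and the polls are only samples of what is happening in between them.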
It's a fire alarm that only rings if you pull it at the right time and frequently enough. That discussion opens the door to wondering about metrics in general.

UTILIZATION METRICS: What does it mean when a tool tells you that your network/segment/WAN is 42% utilized? Does it mean that the wire is only used 42%? No. If you are measuring a transport medium, that medium is utilized 100%--whenever it is used. If you put an electrical current on a wire, can you use only 42% of that wire? No. It is either hot or cold. On or off.

Does it mean that you only used it 42% of the time? No, but that is getting closer. To say that the segment is 100% utilized--but only 42% of the time--is ALMOST correct. Time is the key word here. How is the tool measuring time? Is it always monitoring, or is it polling? It is polling, probably. (Sorry--polling and metrics always go hand in hand.) What is your polling interval? Every 10 minutes, every hour, once per day? How long does it monitor between polls? Is it a continuous value, or is it only checked once per hour? Is it an interval that the manufacturer has never even disclosed or acknowledged?

For the sake of discussion, let's answer these questions. Imagine we have a tool that monitors a wire for 1 minute every hour. That tool reports that your "utilization" is 42%. That means that for the one minute every hour that your wire is monitored--for that one minute ONLY--it is 100% utilized. In other words, 42% of those polls showed 100% (rather than 0%) utilization, for 1 minute per hour. Does that mean that it has 58% more room and all is well? No.

So, what do you know about the utilization of your wire that can help you? Not as much as the monitoring tool's manufacturer would like you to think. Are you attempting to reconcile the results of two different tools? That is extremely difficult, impractical, and possibly ruinous.
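The gap between what the wire actually does and what a sparse poller reports can be sketched in a few lines. This is a toy simulation under stated assumptions--bursty traffic modeled as random saturated intervals, and a poller that looks at the wire only at the top of each hour--not a model of any real monitoring product:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical per-minute busy/idle trace for one 24-hour day
# (1 = line saturated that minute, 0 = idle).
minutes = 24 * 60
trace = [0] * minutes

# Assume traffic arrives in 40 random bursts of ~10 minutes each,
# so bursts can easily fall between polls.
for _ in range(40):
    start = random.randrange(minutes - 10)
    for m in range(start, start + 10):
        trace[m] = 1

true_util = sum(trace) / minutes            # what the wire really did all day
polled = [trace[h * 60] for h in range(24)] # poller samples 1 minute per hour
polled_util = sum(polled) / len(polled)     # what the tool would report

print(f"true utilization:     {true_util:.0%}")
print(f"polled 'utilization': {polled_util:.0%}")
```

Re-run it with different seeds and the polled figure swings widely around the true figure, because each poll can only ever report "saturated" or "idle" for the single minute it happens to observe--exactly the 42%-of-polls-showed-100% situation described above.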
How much money are you spending--not only on the tool itself, but on decisions made as a result of that tool's reports? However, this doesn't mean that there is no point to those metrics. They are signposts pointing towards reality--fire alarms. Nevertheless, you should certainly not quote them when reality seems to imply different things. Reality is going to be more accurate than such metrics--every time. This is the reason for the Interpath Transactional Analysis approach to monitoring and reality-based testing.
Barry Koplowitz founded Interpath Technologies Corporation in 1999. He was an instructor for Network General and NAI traveling around the USA teaching for Sniffer University ® and is a consultant to large enterprise environments in the area of Network & Application Analysis and Troubleshooting. He is the writer and host of The Sniffer Guy podcast. http://www.interpathtech.com