How to understand the normal rate of occurrence

Imagine that you're troubleshooting a problem by looking at a particular log file. You see a line in the log that looks like the following:

18/05/2017 15:16:00 DB Not Updated [Master] Table

Unless you have intimate knowledge of the inner workings of the application that created this log, you may not know whether the message is important. A database that is Not Updated sounds like a negative situation. However, if you knew that the application routinely writes this message several hundred times per hour, day in and day out, you would naturally realize that the message is benign and can probably be ignored, because the application clearly works fine every day despite this message being written to the log file.

The problem, obviously, is one of human interpretation. Reading a negative phrase (Not Updated) in the text of the message biases a person toward thinking that the message is noteworthy because it hints at a possible problem. However, the frequency of the message (it happens routinely) should tell the person that the message is probably not that important, because the application is working (that is, there are no reported outages) despite these messages being written to the log.
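To make the frequency idea concrete, here is a minimal Python sketch that tallies how often each message text appears per hour. It assumes every log line follows the same layout as the example above (date, time, then free-text message). The function names and every sample line other than the first are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_message_counts(lines):
    """Count occurrences of each message text per hour bucket.

    Assumes lines look like 'DD/MM/YYYY HH:MM:SS <message text>'.
    Lines that do not match this layout are skipped.
    """
    counts = defaultdict(Counter)
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 3:
            continue  # not enough fields for date, time, and message
        date_str, time_str, message = parts
        try:
            ts = datetime.strptime(f"{date_str} {time_str}", "%d/%m/%Y %H:%M:%S")
        except ValueError:
            continue  # timestamp does not match the assumed format
        hour = ts.replace(minute=0, second=0)  # truncate to the hour
        counts[message][hour] += 1
    return counts


def mean_hourly_rate(counts):
    """Average count per hour, over the hours in which the message appeared."""
    return {
        message: sum(per_hour.values()) / len(per_hour)
        for message, per_hour in counts.items()
    }


# Hypothetical sample data; only the first line comes from the text above.
sample = [
    "18/05/2017 15:16:00 DB Not Updated [Master] Table",
    "18/05/2017 15:40:12 DB Not Updated [Master] Table",
    "18/05/2017 16:02:45 DB Not Updated [Master] Table",
    "18/05/2017 16:03:10 Connection lost",
]

counts = hourly_message_counts(sample)
rates = mean_hourly_rate(counts)
for message, rate in rates.items():
    print(f"{rate:6.1f} per hour  {message}")
```

A message with a steady, high hourly rate is a candidate for "routine"; one that suddenly departs from its usual rate is the one worth a closer look. Note that this average only covers hours in which the message appeared at least once, which is a simplification.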

It can be hard for a human to process that information (assessing the message content and relevance as well as the frequency over time) for just a few types of messages in a log file. Imagine if there were thousands of unique message types occurring at a total rate of millions of log lines per day. Even the most seasoned expert in both the application and in search and visualization tools would find this impractical, if not impossible, to wrangle.

ML comes to the rescue with capabilities that allow empirical assessment of both the uniqueness of the content of the messages and the relative frequency of occurrence. Let's first focus on the frequency aspect of things with an introduction to counting functions.
