Log analysis challenges

Logs are records of incidents or observations. They are generated by a wide variety of sources, such as systems, applications, devices, and humans. A log typically consists of two parts: a timestamp (the time the event was generated) and data (the information related to the event):

Log = Timestamp + Data
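
To make this definition concrete, here is a minimal Python sketch that splits a single log line into its timestamp and data parts. The line format (an ISO 8601 timestamp, a space, then free text), the sample message, and the names LogRecord and parse_line are assumptions made for illustration; real logs rarely look this tidy.

```python
from datetime import datetime
from typing import NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical container mirroring Log = Timestamp + Data."""
    timestamp: datetime
    data: str


def parse_line(line: str) -> LogRecord:
    """Split '<ISO 8601 timestamp> <data>' at the first space."""
    raw_timestamp, _, data = line.partition(" ")
    return LogRecord(datetime.fromisoformat(raw_timestamp), data)


# Made-up sample line, not taken from any real system
record = parse_line("2017-10-24T08:03:31 User alice logged in")
print(record.timestamp)  # 2017-10-24 08:03:31
print(record.data)       # User alice logged in
```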

Logs are typically used for the following reasons:

  • Troubleshooting: When a bug or issue is reported, the first place to look for what might have caused the issue is the logs. For example, when looking at an exception stack trace in the logs, you may easily find the root cause of the issue.
  • To understand system/application behavior: A running application or system is like a black box, and to investigate or understand what is happening inside it, you have to rely on logs. For example, you might log the time taken by various code blocks within the application and use these timings to identify bottlenecks and fine-tune your code for better performance.
  • Auditing: Many organizations have to adhere to some compliance procedures and are compelled to maintain the logs. For example, login activity or transaction activities carried out by a user are commonly captured and maintained in logs for a certain duration of time for the purpose of auditing, or for the analysis of malicious activity by users/hackers.
  • Predictive analytics: With advancements in machine learning, data mining, and artificial intelligence, a recent trend in analytics is predictive analytics. This is a branch of advanced analytics that is used to predict unknown events that may occur in the future. The patterns found in historical and transactional data can be used to identify future opportunities as well as risks. Predictive analytics also lets organizations become proactive and forward thinking, anticipating outcomes and behaviors based on observed results rather than on assumptions alone. Some example use cases of predictive analytics are suggesting movies or items for users to purchase, detecting fraud, optimizing marketing campaigns, and so on.
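
As an illustration of the second use case (logging the time taken by code blocks), the following is a small Python sketch built on the standard logging module. The logger name, the label, the log_duration helper, and the time.sleep call standing in for real work are all hypothetical.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("timing_example")  # hypothetical logger name
logger.setLevel(logging.INFO)


@contextmanager
def log_duration(label):
    """Log how long the wrapped block took, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s took %.1f ms", label, elapsed_ms)


with log_duration("load_orders"):  # hypothetical label
    time.sleep(0.05)  # stand-in for real work
```

Each timed block produces one log line with a timestamp and a duration, which can later be searched or aggregated to find the slowest parts of the application.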

The preceding typical uses show that logs are data rich and can serve a wide variety of use cases. However, logs come with their own set of challenges. Some of the challenges are as follows:

  • No common/consistent format: Every system generates logs in its own format, so an administrator or end user needs expertise in the log formats of each system/application. Since the formats are different, searching across different types of logs is difficult. For example, the following screenshot shows the typical format of SQL Server logs, Elasticsearch exceptions/logs, and NGINX logs:
  • Logs are decentralized: Since logs are generated by a wide variety of resources, such as systems, applications, devices, and so on, logs are typically spread across multiple servers. With the advent of cloud computing and distributed computing, it is now much more challenging to search across the logs, as typical tools like SSH and grep do not scale in these cases. Hence, there is a need for centralized log management, which helps analysts and administrators search for the required information easily.
  • No consistent time format: Every log entry carries a timestamp, but each system/application logs the time in its own format, making it difficult to identify the exact time an event occurred (some formats are more machine-friendly than human-friendly). It also makes it hard to correlate events that occur across multiple systems at the same time. Some example time formats that can be seen in the logs are as follows:
Nov 14 22:20:10
[10/Oct/2000:13:55:36 -0700]
172720538
053005 05:45:21
1508832211657
  • Data is unstructured: Log data is unstructured, which makes it difficult to analyze directly. Before analysis can be performed, the data has to be transformed into the right structure so that searching and analysis become easier. Most analysis tools depend on structured/semi-structured data.
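
To show what dealing with inconsistent time formats involves, here is a rough Python sketch that converts three of the example timestamps above into a single UTC ISO 8601 representation. The function names are hypothetical, and several points are assumptions: the first format carries no year or time zone, so the year is passed in and UTC is assumed; the 13-digit number is assumed to be milliseconds since the Unix epoch. The remaining two examples (172720538 and 053005 05:45:21) are left out because they cannot be interpreted reliably without knowing which system produced them.

```python
from datetime import datetime, timezone


def from_syslog(text, year):
    """Parse 'Nov 14 22:20:10'; the year is supplied and UTC is assumed."""
    parsed = datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)  # assumption: source logs in UTC


def from_access_log(text):
    """Parse '[10/Oct/2000:13:55:36 -0700]' and convert it to UTC."""
    parsed = datetime.strptime(text.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")
    return parsed.astimezone(timezone.utc)


def from_epoch_millis(text):
    """Parse '1508832211657' as milliseconds since the Unix epoch (assumed)."""
    seconds, millis = divmod(int(text), 1000)
    parsed = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return parsed.replace(microsecond=millis * 1000)


print(from_syslog("Nov 14 22:20:10", year=2017).isoformat())
# 2017-11-14T22:20:10+00:00
print(from_access_log("[10/Oct/2000:13:55:36 -0700]").isoformat())
# 2000-10-10T20:55:36+00:00
print(from_epoch_millis("1508832211657").isoformat())
# 2017-10-24T08:03:31.657000+00:00
```

Once every timestamp is in one comparable form, events from different systems can be ordered and correlated. Note that each source needed its own parser, which is the kind of per-format work that a log-processing pipeline has to take on.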

In the next section, we will explore how Logstash can help us in addressing the preceding challenges and thus ease the log analysis process.
