6 Filtering and extrapolation

This chapter covers

  • Applying filters to control log events
  • Implementing the record_transformer filter
  • Extrapolating information from log events
  • Injecting environmental information into a log event
  • Masking elements of log events to maintain data security

In chapter 5, we touched upon using the filter directive to send log events to standard out. Using filters this way, while helpful, is almost a sideshow to the full capability of the directive. The filter directive can help us to

  • Filter out specific log events, so only particular log events go to particular consuming systems

  • Filter out specific pieces of a log event message and allow us to record them as unique attributes of the log event (ultimately making it easier to apply logic with that data)

  • Enrich the log events by amending the tag and timestamp to reflect the dynamic content of the log event record itself (e.g., adjusting for upstream caching of events)

  • Further enrich log events; for example

    • Using plugins that can add geographical location information based on the public IP (known as GeoIP)
    • Attaching error guidance by identifying information in the log event(s) (e.g., if an error code is generic but the path to the code that generated it matched something, then annotate the log with a qualification, such as “root cause is DB connection error”)
    • Adding contextual information, which can help you perform further analysis later (e.g., the Fluentd worker_id and server_id)
  • Apply changes to address security considerations (e.g., anonymization, masking and redaction of any sensitive data that finds its way into log events)

  • Calculate or extrapolate new data from the log event and its context (e.g., take two timestamps and calculate the elapsed time)

  • Filter out log events that confirm everything is running as expected

This chapter will explore why we may want to filter log events in or out and how the filters are configured to do this. As filters can be used to manipulate event logs, we’ll look at how this can be done, whether we should, and why we might want to do this.

6.1 Application of filters

We have just seen a brief summary of the breadth of possibilities for using filters; let’s dig into some of these applications to better understand why we might want to use them.

6.1.1 All-is-well events do not need to be distributed

A lot of log information will actually indicate to us that things are running as they should. Getting these events is important; as noted management consultant and author Peter Drucker said, “You can’t manage what you don’t measure.” Perhaps even more pertinent, Dag Hammarskjöld (economist and secretary-general of the United Nations) said, “Constant attention by a good nurse may be just as important as a major operation by a surgeon.” In other words, we need to actively observe, quantify, and qualify the state to know everything is well. This steady, constant observation will allow us to make minor adjustments to keep things well, rather than needing skilled but major change.

But we do not need to share with everyone every log event that confirms things are fine. Like a heart monitor, when things are not right, the alarms and signals all go off to ensure everyone is aware help is needed. If everything is within expected parameters, the data doesn’t go further than the monitor’s display. For example, Elastic Beats can generate heartbeat log events, such as 2017-12-17T19:17:42.667-0500 INFO [metrics] log/log.go:110 Non-zero metrics in the last 30s: beat.info.uptime.ms=30004 beat.memstats.gc_next=5046416. This is probably a log message that doesn’t need to be retained, or if retained, does not need to be distributed and is instead logged locally for a short period.

If we are getting log events indicating all is well and unlikely to yield any more insight, do we need to propagate those events to downstream systems? Holding them back makes the significant events easier to see. By filtering out mundane information, we also control costs. Physical infrastructure has a maximum amount of data it can transmit before we need more hardware, and public network capacity is typically charged by the volume of data transferred, so distributing every "heartbeat" log event accumulates cost for little gain: more networking hardware, more bandwidth, and so on. Over the last couple of years, the cost of data egress from cloud platforms has been seen to influence commercial decisions. In other words, pushing data out of one cloud to another location costs money, and that cost can become significant. At the same time, we don't want to cut off the one event in a hundred that is important and worth transmitting.
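
As a concrete illustration, a grep filter with an exclude rule can drop such heartbeat events before they leave the node. This is only a minimal sketch; the tag (beats.**), the field name (message), and the pattern are assumptions for illustration rather than values taken from this chapter's examples.

<filter beats.**>
  @type grep
  <exclude>
    # drop the periodic "Non-zero metrics" heartbeat lines; everything else passes through
    key message
    pattern /Non-zero metrics in the last 30s/
  </exclude>
</filter>

Where regexp sections keep only the events that match, exclude sections drop the matches and let everything else through.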

6.1.2 Spotting the needle in a haystack

Filtering can be used to isolate those innocuous-looking events that are a warning of more significant problems to come. These occur when someone has wrongly classified what should be a warning log event as informational or even debug. The ability to identify and flag these kinds of events is important when you can’t get the logging to generate more helpful events (e.g., off-the-shelf software, legacy solutions that no one wants to touch).

6.1.3 False urgency

Sooner or later, we will encounter a situation where a warning or error log occurs, and the issue escalates up the management chain. Lots of "shouting" starts about a problem that must trump all other priorities to be fixed. But ultimately, the consequences of the issue and its impact don't require everything to be dropped; yes, an error occurred, but it isn't the end of the world. What has been detected is a problem that could have been handled by routine day-to-day operations tasks. With filters, we can define rules that help us separate the "world will end" events from the "please fix me when you log in" events and direct the information accordingly.

Even better, if there are known operational steps to address an issue, we can add a reference to the remediation process to the log event. When the alerts are triggered, the remediation information is already linked to them: unnecessary escalation is avoided, actions that can compound a problem aren't taken, and so on.
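
As a sketch of what this can look like, a record_transformer filter can stamp a remediation reference onto matching events. The tag (app.db.**) and the field name (runbook_ref) are hypothetical; the point is simply that the event carries its own pointer to the known fix.

<filter app.db.**>
  @type record_transformer
  <record>
    # attach a pointer to the known remediation steps to every matching event
    runbook_ref "DB-CONN-01: restart the connection pool, see the ops runbook"
  </record>
</filter>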

6.1.4 Releveling

The previous scenario arises when a log event is generated and tagged with a log level higher than it should be (for example, Error instead of Warning or Info). As before, if people can't or won't fix the issue at the source, we can modify the log event as it gets passed on, changing the record's log level to a less alarming and more accurate classification. Alternatively, we can tag the log event with additional attributes whose commentary shows this is a known incorrect log level.
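
A minimal sketch of releveling with record_transformer might look like the following; the tag (legacy.app), the field names (level, level_note), and the specific levels are assumptions for illustration.

<filter legacy.app>
  @type record_transformer
  enable_ruby true
  <record>
    # downgrade the known over-reported ERROR to WARN; leave other levels untouched
    level ${record["level"] == "ERROR" ? "WARN" : record["level"]}
    # record that the relevel was deliberate so the original intent is not lost
    level_note relabeled known benign ERROR from a legacy component
  </record>
</filter>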

6.1.5 Unimplemented housekeeping

For as long as software development exists, business drivers will prioritize functional capabilities over nonfunctional ones such as housekeeping (archiving or deleting folders of processed files, etc.). When this is a characteristic of a legacy application, it is not unusual for people to fear changing anything in the system, even to make it clean up after itself. The typical result is that routine support processes are done manually; we may then automate them with scripts that only need to be run in certain circumstances. Filtering out the log events that indicate housekeeping is needed (e.g., Fluentd capturing events relating to disk space) is a small step toward triggering the execution of those housekeeping tasks.

6.2 Why change log events?

Some filters allow us to modify log events. Why should we consider this, and how can this capability help us? Some might argue that modifying log events is also tampering with the “original truth,” so should we even allow it?

6.2.1 Easier to process meaning downstream

When we process log events, we often need to extract more meaning from the logs provided. The log event is unstructured, semi-structured, or even structured but needs to be reparsed to a suitable data structure (e.g., reading JSON text files). The structure can help filter, route, create new reporting metrics, and measure using the log event data. Once we have invested the effort to extract meaning from a log event, why not make that easy to reuse downstream? In other words, apply the principle of DRY (don’t repeat yourself). So, if you have extracted meaning and structure, don’t make people do it again later. Simply pass the derived information with the log event.

6.2.2 Add context

To process an event correctly, we may need additional context. When trying to diagnose why an application is performing poorly, it isn’t unusual to look at what else was happening around the events—for example, did the server have a large number of threads running? Sometimes it is easy to link this contextual data to the log event. The easiest way to associate additional context is to add it to the log event.

6.2.3 Record when we have reacted to a log event

We have already referred to the possibility that we initiate some sort of action due to a log event. In retrospect, it can be helpful to understand which event(s) triggered an action. Adding information to the triggering log event can be a more straightforward, acceptable action rather than correlating separate log events later to show cause and effect.

6.2.4 Data redaction/masking

When we are developing software, it is often helpful to log an entire data object being processed during the development phase. This isn’t a problem during development and testing, as it’s just test data. But if the data includes sensitive information, such as data that can be used to identify individuals (PII, personally identifiable information), as in health care or credit card use, for example, it can become a challenge. Any part of an IT system that handles such data becomes subject to a lot of legal, legislative, and contractual technical requirements. Such requirements come from international, national, and regional data laws such as

  • GDPR (General Data Protection Regulation)

  • HIPAA (Health Insurance Portability and Accountability Act) and other health care legislation

  • PCI DSS (Payment Card Industry Data Security Standard)

You can add to this list the fact that many companies may also wish to treat some financial accounting data with the same sensitivity. The obvious solution would be to fix the software so it doesn't log the data, or at least to limit the impact of such logging, reducing the "blast radius" that needs the extra stringent controls, security mechanisms, and reporting. Fluentd provides an excellent means to address this:

  • Remove or redact/mask the data from the logs. Masking is typically done by replacing sensitive values with meaningless ones; redaction removes information from sight, either by deleting it from the communication or by never making it visible in logs in the first place. We see data being masked on payment card receipts, where asterisks or hash characters replace most of the card number. Any approach to masking can be used as long as it can't be reversed to recover the original data (a sketch of masking with Fluentd follows this list).

  • Co-locate Fluentd with the log source so that the amount of infrastructure subject to the elevated data security requirements is limited. The smaller the scope of elevated security, the smaller the “attack surface” (i.e., the smaller the number of servers and software components that may be subject to malicious attacks attempting to get the data, the better).

  • Connect the main application’s logging directly to Fluentd using RPC (remote procedure call) techniques rather than log files, so the log events are transient. We will see more on directly connecting applications to Fluentd in chapter 11.
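
Picking up the first of these options, here is a sketch of masking under the assumption that the log event carries a cardNumber field (a hypothetical name); a record_transformer filter with enable_ruby can overwrite all but the last four digits:

<filter payments.**>
  @type record_transformer
  enable_ruby true
  <record>
    # replace every digit that is followed by at least four more digits with an asterisk,
    # leaving only the last four digits visible
    cardNumber ${record["cardNumber"].to_s.gsub(/\d(?=\d{4})/, "*")}
  </record>
</filter>

Note that this is masking rather than redaction; if the requirement is to drop the value entirely, remove_keys (covered in section 6.3.3) is the simpler tool.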

Security not as a cost

It would be easy to read what has been said here as concluding that security is an undesirable cost and that avoiding security is good. The reality is that today security should be treated as an asset, and the application of security is a positive selling point; SaaS solution providers like Oracle present their security as a virtue. The cost impact of data loss, particularly when the level of impact is not limited or understood, can easily outweigh the perceived savings of not investing in securing against the risks. But the smaller the potential blast radius, the better. These days, a breach (malicious or accidental) is a matter of when, not if. The adage "assume the worst, hope for the best" is very appropriate.

6.3 Applying filters and parsers

In this section, we’ll look at the practical configuration and use of filters and parsers to

  • Manage the routing of log events

  • Manipulate log events

To manipulate log events, we may need to impose or extract some meaning from them. To extract that meaning, we need to parse unstructured log event content, so we will need to touch upon the use of parsers.

6.3.1 Filter plugins

The filter directive is like a match, insofar as the declaration can include tags (e.g., <filter myApp> or <filter *>). The difference is that a log event matching the filter expression is not consumed; it passes on to the next part of the configuration without resorting to a copy action, as illustrated with the match directive in chapter 3.

Within the Fluentd core are the following filter plugins:

  • record_transformer—The most sophisticated of the built-in filters; also provides a diverse set of options for manipulating the log event.

  • grep—Provides the means to define rules about log event attributes to filter them out of the stream of events. Multiple expressions can be provided to define cumulative rules.

  • filter_parser—Combines the capability of parser plugins with the filter.

  • stdout—We have seen this plugin at work. Every event is allowed to pass through the filter but is also written to stdout.

Fluentd comes with a core set of filter plugins; in addition to this, there are community-provided filter plugins. Appendix C contains details of additional plugins that we believe can be particularly helpful.

6.3.2 Applying grep filters

The grep filter allows us to define a search expression and apply it to a named attribute in the log event. For example, we could extend our routing so that only events whose log entries explicitly refer to computers in the text are passed on. This is the basis of the following scenario; while a computer reference is relatively meaningless, we could easily replace or extend it with a reference to a cataloged error code (for example, WebLogic notifications start with the prefix BEA-000).

While we are demonstrating the use of the filter, let’s use a different output plugin. Chapter 1 introduced EFK (Elasticsearch, Fluentd, Kibana), so we’ll bring Elasticsearch into the mix to show more of this stack (appendix A provides instructions on how to install Elasticsearch). The Fluentd configuration we’re going to use is shown in figure 6.1.

Figure 6.1 Application of filter and Elasticsearch as an output

We can apply a filter using the grep plugin, which executes a regular expression whose result is treated in a binary manner; the result determines whether the log event is kept. This is all done with a regexp directive inside the filter. We need to define the key, which is the element of the log event to examine; in this case, we want to look at the core log event field called msg. Once we've identified where to look, we need to provide a pattern for the regex parser to look for. Bringing this together with the attribute name gives us

<regexp>
  key msg
  pattern /computer/
</regexp>

With the filter defined, we need to send any matching log events to our installation of Elasticsearch. We do this using the match directive and a @type value of elasticsearch. The Elasticsearch plugin is incredibly configurable, with over 30 attributes covering behaviors ranging from caching control to determining how the log events are populated and indexed in Elasticsearch, and so on. We’re not going to cover all of these, as we’d end up with a book explaining Elasticsearch, and for that, you’d be better off with Elasticsearch in Action by Radu Gheorghe, et al. (www.manning.com/books/elasticsearch-in-action); however, we should touch upon the most common attributes that you’re likely to encounter.

As with the MongoDB connection, details must be provided to address the server (attributes host and port). Access credentials are likely to be required (user and password). As we haven’t set up any such restrictions using the out-of-the-box deployment, we don’t need to provide them. The scheme or type of communication, such as http or https, will dictate whether additional details will be needed (e.g., where the certificates to be used can be found); end-to-end SSL/TLS is always good security practice.

Once the means to connect to Elasticsearch has been defined, we need to declare where inside Elasticsearch the data should go (index_name) and what data to provide, such as whether to include the tag value in the core log event record (include_tag_key, tag_key). Remember, we also set how the data being passed is represented. Given the relationship between Elasticsearch and Logstash, it should come as no surprise that the plugin allows us to tell Fluentd to present the log events as Logstash would, by setting the attribute logstash_format to true.

The Elasticsearch plugin also leverages helper buffering (caching) plugins; thus, we need to consider how this can impact the behavior. For ease and speed, let's use a memory buffer set to flush every 5 seconds by using the attribute flush_interval 5s. This configuration can be seen in the following listing.

Listing 6.1 Chapter6/Fluentd/file-source-elastic-search-out.conf

  <filter *>                 
    @type grep               
    <regexp>                 
 
      key msg
      pattern /computer/
    </regexp>
  </filter>
 
<match *>                    
 
  @type elasticsearch
  host localhost
  port 9200
  scheme http                
 
  reload_on_failure true
  index_name fluentd-book    
 
  logstash_format false      
 
  include_tag_key true       
 
  tag_key key
  <buffer>
    flush_interval 5s
  </buffer>
</match>

Allows the filter to process any tag

Defines the type of filter

Defines the field to apply the regular expression to and then the expression, which needs to yield a binary result

All log events that have passed through the filter will now be processed by this match, configured to write to Elasticsearch.

While we do not explicitly need to set the scheme, as it defaults to http rather than https, it is worth including it to remind ourselves of the low security threshold in use. You could also include the user and password attributes, commented out, as placeholders.

Defines the index to be used; if unspecified, it will default to fluentd

Tells Elasticsearch to add to the named index, rather than it creating new ones using a timestamp name, as is the case for Logstash connectivity

Shows we’re telling the Elasticsearch plugin to include the log event tag in the data to be stored, giving it the name key, as shown in the next attribute

Let’s see the result of the configuration. As this uses a file source, we need to run the LogSimulator as well. Assuming Elasticsearch is also running and ready, the following commands are needed to run the example:

  • fluentd -c ./Chapter6/Fluentd/file-source-elastic-search-out.conf

  • groovy logSimulator.groovy ./Chapter6/SimulatorConfig/log-source-1.properties

We can verify the records in Elasticsearch with the UI tool by reviewing the index contents, which we configured as fluentd-book. (Appendix A also covers the setting up of Elasticvue for this purpose.) You should find that the index contains the same log events that we sent to stdout.

Should filters modify log events?

The idea that we can change log events can be a contentious subject. If you change the original log event, are you modifying the original truth? To use a TV detective analogy, messing with the original log event is like tampering with a crime scene. Shouldn’t Fluentd handle log events like the chain of custody for evidence? Generally, I would agree that the original log event should be retained unmodified. However, we often need to associate additional information to a piece of evidence (extending our analogy, a ballistics report would be attached to the relevant weapon). Rather than trying to keep the details separate, careful attachment of the details can be more helpful.

In the real world, the guidance we use is to keep a copy of the log event unaltered, with one exception: information security. If you need to mask or remove data, consider keeping an unadulterated copy somewhere safe that can be referred back to if necessary. Then any manipulated or extracted values can be kept along with the original. You might consider adopting a naming convention so that when elements of a log event are manipulated, constructed, or enriched, their origin is clear.

6.3.3 Changing log events with the record_transformer plugin

Using a filter to control which log events are processed or not based on the log event’s contents addresses many previously described scenarios. Modifying log events to add additional contextual information and derived values or to extract and record log event values in a more meaningful and usable manner helps with some of the other scenarios mentioned.

To illustrate how this can work, we’re going to add to our log events new fields in addition to the standard ones, specifically the following:

  • A field called computer containing the name of the host running Fluentd.

  • Apply a prefix of processed to the standard msg value to illustrate the modification of existing values.

  • The example log messages contain a JSON structure that includes a name attribute comprising firstname and surname. This combination could be considered as making the log data subject to PII rules, as it references an identifiable individual. To address this, we will tease out the firstname into a new log event attribute called from and delete the surname. There is no practical need for the new from attribute, but it does allow us to see how to copy elements.

Our log event message is structured and, when received, will look like

{"msg": "something about computers",
 "name":
    {
       "firstname": "Computer",
       "surname": "AI"
    },
 "age": 404
}

Record directive

The essential part of the filter definition is the record directive. Each line within the directive represents a field name and a field value. For example, if we wanted to add a new field called myNewField with a literal value of aValue, then we would configure the directive as follows:

<record>
  myNewField aValue
</record>

Just incorporating a design-time literal value isn't going to provide much value. To tell Fluentd that it needs to process a derived value, we wrap the expression within ${}. We can reference the other fields within the log event by placing the name within the brackets (e.g., ${msg}). To access the log event message, we use the notation record["<field name>"] (e.g., record["msg"]); record is a reference to the log event record that Fluentd makes available to us.

Within the ${} characters, we are allowed to use a small subset of Ruby objects, functions, and operators, including some provided by Fluentd. To go further, we can include the attribute enable_ruby within the filter; when this is set to true, it allows the full power of the Ruby language to be used. It defaults to false, as enabling Ruby creates more work for the parser (such as ensuring that it can resolve dependencies); to keep things efficient, it's best not set to true unless necessary.

Accessing nested JSON elements

To obtain the firstname element, we need to navigate the JSON structure within the message part of the log event. This can be done with the standard record method, for example record["name"]["firstname"], which traverses to firstname as a child attribute but requires the attribute to be present. This can be a problem if part of the structure is optional, as any part of the path that is missing will trigger a run-time error. The alternative approach is to use a function called dig provided by the record operator. The syntax is remarkably similar; however, a nil result is returned rather than an error if the path does not exist. For our structure, the dig call is record.dig("name", "firstname"). This does require enable_ruby to be set to true to work.
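
The difference is easiest to see side by side. In this sketch (the field names from_strict and from_safe are made up for illustration), the first expression fails at run time if name is missing, while the second quietly yields an empty value:

<filter *>
  @type record_transformer
  enable_ruby true
  <record>
    # errors if the name element is absent from the record
    from_strict ${record["name"]["firstname"]}
    # returns nil (an empty value) if any part of the path is missing
    from_safe ${record.dig("name", "firstname")}
  </record>
</filter>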

JSON element deletion

The record_transformer includes several attributes that allow the control of the composition of the log event elements. This can be done by using optional attributes in the configuration for listing elements to delete (remove_keys) or defining which elements (other than mandatory ones like tag) should remain (keep_keys). This includes the notation to traverse the JSON structure (which also works in other parts of the plugin). The order of attributes in the configuration is important. In our example, the remove_keys attribute needs to appear after the record directive; otherwise, we will find ourselves without an element to copy. To delete specific elements within a structure, we use the attribute remove_keys with the path through the object, such as $.name.surname. In the notation, $ (dollar sign) effectively represents the root of the log event. This is then followed by the attribute name using dot notation to traverse the structure. This does go back to the previous point of whether we can trust the path to exist. A single remove_keys attribute can be extended to more elements by making it a comma-separated list; for example, remove_keys $.name.surname, $.name.anotherName, $.somethingElse.
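
The inverse, keep_keys, is worth a quick sketch too. It only takes effect when renew_record is set to true, which rebuilds the record from scratch using just the listed keys (the key names here come from our example data, so treat the snippet as illustrative):

<filter *>
  @type record_transformer
  # rebuild the record so that only the listed keys survive
  renew_record true
  keep_keys msg,age
</filter>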

Value replacement

The record operator includes a function that allows us to replace values in JSON elements. This is necessary for masking data and correcting values, such as the error message level, as described in section 6.1.4. This is done by referencing the element name and then invoking the function gsub with parameters containing the value to replace and its replacement. For example, in our data set, the msg contains some occurrences of 'I'. Using the expression ${record["msg"].gsub('I', 'We')}, the use of 'I' can be replaced with 'We'. In the following listing, we have included this expression. Rather than replacing the msg with the substituted string, a new attribute has been added to make the comparison easy.

Listing 6.2 Chapter6/Fluentd/file-source-transformed-elastic-search-out.conf

<filter *>
  @type record_transformer
  enable_ruby true                                  
  <record>
    computer ${hostname}                            
    from ${record.dig("name", "firstname")}         
    msg processed ${record["msg"]}                  
    msg_gsub ${record["msg"].gsub('I ', 'We ')}     
 
  </record>
  remove_keys $.name.surname                        
 
</filter>
 
<filter *>
  @type stdout
  <inject>
     worker_id_key                                  
 
  </inject>
</filter>

Enables Ruby to support the record.dig approach of locating values

Adds an attribute using the known contextual values

Creates a new value by finding a sub-element and retrieving its value

Modifies the msg element by adding textual content

Performs the string substitution of 'I' with 'We'. A whitespace character is included in the pattern to avoid accidentally matching an I that is part of another word.

Deletes the surname element to ensure we’re not at risk with PII considerations

The inject directive shown here allows the worker_id for the process to be added to the log event. The inject directive allows several other useful contextual values to be added as well.

Let’s see the result of the configuration. As this makes use of a file source, we need to run the LogSimulator. So, to run the example, the following commands are needed:

  • fluentd -c ./Chapter6/Fluentd/file-source-transformed-elastic-search-out.conf

  • groovy logSimulator.groovy ./Chapter6/SimulatorConfig/log-source-1.properties

Even before starting the UI to review the log events stored in the fluentd-book-transformed index in Elasticsearch, the changes made by the record_transformer and the inject directive should be visible on the console, thanks to the stdout filter.

Predefined values

The record_transformer also helps by providing some predefined values, including

  • hostname—The name of the computer host

  • time—The current time

  • tag—The current log event tag

With the Ruby flag enabled, we can extend the ability to get and set values to access any public class methods using the ${} notation. For example, calling a class method would look like ${<class>.<method>}, where <class> is the name of the Ruby class and <method> is the corresponding public class method; ${Dir.getwd} will retrieve the current working directory.
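
Putting the predefined values and a Ruby call together, a record directive might look like the following sketch (the field names are arbitrary, and the Dir.getwd call needs enable_ruby true):

<filter *>
  @type record_transformer
  enable_ruby true
  <record>
    # predefined placeholders provided by the plugin
    computer ${hostname}
    captured_tag ${tag}
    # an arbitrary Ruby public class method, available because enable_ruby is true
    working_dir ${Dir.getwd}
  </record>
</filter>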

6.3.4 Filter parser vs. record transformer

The record_transformer plugin provides us with the means to work with the log event as a JSON payload. If the log event is simply a single block of text, we will likely need to parse it to obtain meaningful values. In chapter 3, we introduced the use of parsers to extract meaning out of log events. The parsers we saw, like regexp, also work with the filter directive. As a result, where a regexp expression defines the parts of the string to capture as named values, the parser’s behavior is extended such that the named elements will be made top-level log event attributes.

Let’s take the same expression (included in the following listing) and put it into the context of the filter directive. The essential difference here is that we need to tell Fluentd which log event attribute to process before the parser definition. This means we can target a specific part of a log event. We can also use consecutive filters to break down nested structures if we want. In listing 6.3, we expect the output to result in additional attributes called time, level, class, line, iteration, and msg. Like the record_transformer plugin, we can determine whether the log event attribute that is processed is retained or not, using the reserve_data configuration element. We can make the control a bit more nuanced by adding remove_key_name_field and setting it to true; Fluentd will then remove the original attribute only if the parsing process was successful.

Listing 6.3 Chapter6/Fluentd/rotating-file-read-regex.conf—parse extract

<filter>
  @type parser
  key_name log                               
  reserve_data true                          
 
  <parse>
    @type regexp
    expression /(?<time>\S+)\s(?<level>[A-Z]*)\s*(?<class>\S+)[^\d]*(?<line>[\d]*)-(?<iteration>[\d]*)\)[\s]+{"log":"(?<msg>.*(?="}))/
    time_format %Y-%m-%d--%T
    time_key time
    types line:integer,iteration:integer     
 
    keep_time_key true
  </parse>
</filter>

Identifies the log event attribute to parse

Tells Fluentd to retain the existing value so if there are more attributes, we can retrieve them downstream

Tells Fluentd what data type the extracted values should be, making further transformation easier

6.4 Demonstrating change impact with stdout in action

As the application generating the logs already needs to be securely locked down to limit the impact of recording this information, the Fluentd installation will be co-located with the source. As manipulating log events is discouraged, the decision has been made to

  • Extend the Fluentd configuration so that stdout output shows the unmodified log events, so they can be observed in a contained but transient manner

  • Allow the modified log events that are desensitized to go into Elasticsearch

Chapter6/Fluentd/file-source-transformed-elastic-search-out.conf is the starting point for making the required changes.

6.4.1 A solution demonstrating change impact with stdout in action

You can compare your configuration modifications to our implementation of the solution shown in Chapter6/ExerciseResults/file-source-transformed-elastic-search-out-Answer.conf. The fundamental changes are the positioning of the filters with the type set to stdout.

6.5 Extract to set key values

Sometimes we need to be clever and set the primary attributes of the log event (time and tag) more dynamically. This may be because we don’t want a static value as part of the tag configuration (which we typically set to reflect the source), but rather want it set dynamically to reflect an attribute of the log event. In doing so, we set ourselves up to filter more effectively with match expressions. For example, we may want the log event tag to reflect the name of a microservice, but we’re collecting the log events from Kubernetes-level stdout, so a single feed will contain events from multiple services. As a result, we need to extract the value to use as the tag from the log event record.

When it comes to the timestamp, we may wish to adjust it to a time value in the log event, reflecting when the event occurred rather than when Fluentd picked the log event up. This may be necessary, as there is some latency between the event generation and Fluentd getting it.

The extract feature allows us to perform such tasks. Unlike our filters and parsers, the extract directive can be incorporated into source, filter, and match (output) plugins such as exec. The extract mechanism is very flexible in its use, but it is limited to manipulating only the tag and time attributes of the log event.

The extract parameters allow us to declare how to interpret the time value. The value to be used could be the time represented as seconds from the epoch (midnight, 1 Jan 1970 [UTC]); for example, 1601495341 is Wednesday, 30 September 2020 19:49:01. Another possible format is the ISO 8601 standard (more at www.w3.org/TR/NOTE-datetime).
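
Before moving on to the tag example, here is a minimal sketch of the time side of extract, sitting inside whichever plugin supports the extract helper. The field name logged_at is hypothetical; time_type unixtime tells Fluentd to read the value as seconds from the epoch, and switching to time_type string with a time_format would cover formats such as ISO 8601:

<extract>
  # hypothetical log event field holding when the event actually occurred
  time_key logged_at
  # interpret the value as seconds since the epoch
  time_type unixtime
</extract>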

Let’s consider a simple example in which we set the tag using a value obtained by the exec source plugin. As with our previous use of exec, we’ve chosen a simple command that is easy to use or adapt to different OSes. We also get a structured object, so there are no distractions from needing to parse the payload before retrieving a value. The source directive sets the tag attribute to a placeholder value that won’t come from our exec plugin, so we can see when the extract has replaced it.

To ensure we can see the impact of the data changing, let’s set the run_interval to 10 seconds. This means the same file will be captured repeatedly as the input, but it gives us time to save changes to the file between executions; try changing the file while the configuration is running. We’ve told the parser to treat the output of the exec command as a JSON object.

Finally, we have included in the extract the tag_key attribute; this tells Fluentd which log event element should be retrieved and used to set the tag. To preserve the original log event record, we also keep that element in the record rather than removing it; this is controlled by the attribute keep_tag_key, which we set to true so the captured payload is left unmodified. This is demonstrated in the following listing.

Listing 6.4 Chapter6/Fluentd/exec-source-extract-stdout.conf

<source>
  @type exec
  command more TestData\exe-src.json
  run_interval 10s                     
 
  tag exec
  <parse>
    @type json
  
 </parse>
  <extract>                            
    tag_key msg                        
    keep_tag_key true                  
  </extract>
</source>
 
<match *>
  @type stdout
</match>

Keeps the source capture iterating so we can modify the payload and see the consequences

The extract directive

Identifies the name of the element to use as the tag going forward

Indicates we want to leave the retrieved log event content unmodified

To run this configuration, we need only run Fluentd with the command fluentd -c ./Chapter6/Fluentd/exec-source-extract-stdout.conf. Once Fluentd is running, change the value in the TestData\exe-src.json file and see how the change impacts the tag.

6.6 Deriving new data values with the record_transformer

With the ability to exclude log events from subsequent actions and extract specific values from log events, we can now consider the possibility of generating derived values and metrics. For example, we may want to understand how often errors occur, or which components or even which parts of the codebase are the source of most errors. While generating such measures using Elasticsearch or Splunk is possible, those measures are produced sometime after the event, as part of later analysis. If we want to be more proactive, we need to calculate the metrics more dynamically as the events occur.

In chapter 1, we introduced the idea that monitoring covers both textual-based contents and numeric metrics. Both log events and metrics are often, but not always, used within a time-based context (log events are seen in time order, and metrics often measure details such as a value over a period, such as CPU usage per second). Fluentd’s core doesn’t have the capabilities to generate time series–based metrics. However, some plugins written by the community, including the contributors to the core of Fluentd, can provide some basic numeric and time-series measures. As we’ll see later, there are ways to address time-series data.

Time-series data points are not the only valuable numeric data that could be helpful. For example, how an alert is signaled may be a function of the transaction value (unit value × quantity) or how long or how many times a system has been retrying and failing with a database connection (current time – original error timestamp). The record_transformer can generate numeric metric values by taking data values and performing mathematical operations.

Using our example data set, we could consider replacing the age with the birth year; the expression would be the current year minus the age. For example:

  <record>
    birthYr ${Date.today.year - record['age']}
  </record>

While this may not be a very real-world example, it does show the art of the possible.

NOTE Quoting attributes needs to be done with care. If wrongly used, you will experience odd behaviors, where some log event attributes are found and others are not. When using the record['xxxx'] approach, you need to use single quotes. Double quotes are necessary when using the dig method—that is, record.dig("xxxx").

6.6.1 Putting the incorporation of calculations into a log event transformation into action

Some people hold the view that recording a birth year is less personal than an age. On that basis, you’ve been asked to amend the logged data that is stored downstream: the birth year is to be added, and the age attribute removed. The Chapter6/Fluentd/file-source-transformed-elastic-search-out.conf configuration file has been identified as the starting point to incorporate the necessary changes. The same test source (./Chapter6/SimulatorConfig/log-source-1.properties) can be used to exercise the configuration. To make it easy to identify output from this scenario, change the index_name in your match configuration.

Answer

Our implementation of the configuration can be seen in Chapter6/ExerciseResults/file-source-transformed-elastic-search-out-Answer2.conf. The essential changes are the inclusion of birthYr ${Date.today.year - record['age']} in the record directive and the removal of the age attribute with remove_keys $.age after the record directive. The results can be examined in the contents of Elasticsearch using the UI, as previously described.

6.7 Generating simple Fluentd metrics

Fluentd has an excellent partner project under the control of the CNCF in Prometheus (https://prometheus.io/), whose role is to handle and create metrics-based data. Prometheus is typically associated with Grafana for the visualization of such data, and both are often associated with microservices. But as with Fluentd, there aren’t any real constraints or reasons not to use such tools outside of a microservice ecosystem.

The partnership of Fluentd and Prometheus

Given the mention of Prometheus, it is worth seeing how Fluentd can fit in with Prometheus’s architecture and broader metrics and monitoring ecosystem. As the following figure shows, Fluentd can relate to Prometheus at several points.

As the figure shows, Fluentd has several possible relationships with Prometheus, covering

  • A data feed into the Push Gateway as a source from which Prometheus can calculate metrics.

  • A feed of Fluentd internal metrics in a Prometheus format ready to be processed by the server (no preparation step needed from the Push Gateway). This is achieved using the monitor_agent plugin.

  • A channel for recording alerts for metrics via the Alert Mgr.

Prometheus architecture and how Fluentd can relate to it

The Prometheus plugin for Fluentd (installed with fluent-gem install fluent-plugin-prometheus) provides several measurement options and allows us to create metric values in the filter and match directives.
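
As a rough sketch of what that looks like (the tag, metric name, and desc text here are made up, and the exact attribute set should be checked against the plugin’s documentation), a filter-based counter that increments once per matching log event might be declared as:

<filter app.**>
  @type prometheus
  <metric>
    # with no key attribute, the counter increments by 1 for every log event seen
    name app_log_events_total
    type counter
    desc Total number of log events passing through this filter
  </metric>
</filter>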

More about the Prometheus plugin can be found at https://github.com/fluent/fluent-plugin-prometheus, and information about Prometheus can be found at https://prometheus.io. There are also several books on the subject, such as Manning’s Microservices in Action by Morgan Bruce and Paulo A. Pereira (2018) (www.manning.com/books/microservices-in-action), which can also help.

Prometheus’s value lies in processing event series data and extracting and providing metrics data. If we can easily avoid sending every log event to Prometheus (or any other tool) just to calculate basic metrics, there is an obvious case for not doing so; after all, why pass all this data around? As previously mentioned, there are community plugins to support time-series measures. The currently available plugins we think are worth considering for these kinds of requirements are

  • fluent-plugin-datacounter

  • fluent-plugin-numeric-counter

Both plugins work in a similar fashion. The data counter counts log events whose attributes match regular expressions. The numeric counter instead applies numeric tests to the values; for example, we could use it to count log events whose attribute value falls in the range of one to ten. Both count over a defined period and emit log events based on the occurrences (a sketch of the numeric counter follows).
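
Here is a minimal sketch of the numeric counter, assuming the plugin is installed with fluent-gem install fluent-plugin-numeric-counter and that the events carry a numeric attribute (here called value, a made-up name); the exact pattern syntax should be checked against the plugin’s README:

<match scores.**>
  @type numeric_counter
  tag numcount
  # the log event attribute containing the number to test
  count_key value
  # emit counts once a minute
  unit minute
  # patternN takes a name, a lower bound, and an upper bound
  pattern1 low 1 10
  pattern2 high 10 100
</match>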

For instance, in our previous illustration of filtering, we isolated log events that referred to the word computer in the msg attribute of the event. We could change this to record how many log events every minute include a reference to computer, rather than filter these log events in or out.

In listing 6.5, we have amended the configuration so that the element of the log event to examine is msg, as specified by the count_key attribute. We’ve defined only a single expression, using pattern1, and we’ve set count_interval with the standard Fluentd duration notation to define the period over which to count, in this case 1 minute.

Listing 6.5 Chapter6/Fluentd/file-source-counted-elastic-search-out.conf

<match *>
  @type datacounter        
  @id counted
  tag counted
  count_key msg            
  count_interval 1m        
  aggregate all
  output_messages yes
  pattern1 p1 computer     
 
</match>
 
<match *>
  @type elasticsearch
  host localhost
  port 9200
  index_name fluentd-book-counted
  scheme http
  logstash_format true
  reload_on_failure true
  include_tag_key true
  tag_key tag
  <buffer>
    flush_interval 5s
  </buffer>
</match>

Defines the match directive to use the datacounter plugin

Tells the plugin which element of the log event to examine

Defines the period over which we are counting events

By providing a numbered sequence of patterns (pattern1, pattern2, and so on), we can count occurrences of several different patterns individually.

Typically, we would not expect a match directive to allow any events onward without using a copy plugin. However, as the plugin utilizes the underlying emitter helper plugin, it can consume the matched log events and emit new events to be consumed downstream. To run this configuration, we need to install the plugin by executing the command fluent-gem install fluent-plugin-datacounter.

The way threads and timing are handled within the plugin means there can be a lag between inbound log events arriving and the calculated values being written to Elasticsearch. As a result, depending on the timing, you might not see the metrics written immediately.

As with the previous Elasticsearch scenarios, it is easier to see what is stored by changing the index name; for example, fluentd-book-counted. Assuming Elasticsearch is ready and running, we can run the scenario with the following commands:

  • fluentd -c ./Chapter6/Fluentd/file-source-counted-elastic-search-out.conf

  • groovy logSimulator.groovy ./Chapter6/SimulatorConfig/log-source-1.properties

6.7.1 Putting log event counting into action

The LogSimulator provides the means to set and change the rate at which log events are played through. Try changing the count_interval in the Fluentd configuration file and altering the LogSimulator configuration to send the log events through at different speeds (SimulatorConfig/log-source-1.properties). Add a pattern to the datacounter to locate occurrences of Unix in the message.

Answer

Changing the LogSimulator speed will result in different numbers of log events being counted in each period. The count period itself is varied by modifying the count_interval attribute in the configuration file. The second pattern defined in the match directive should look like pattern2 p2 Unix.

Summary

  • Fluentd filters can isolate specific log events that need actions to be triggered, such as executing a housekeeping script.

  • The application of the record_transformer plugin in a filter creates the possibility of modifying events to add, remove, and mask content; we also looked at cases where it helps to modify log events.

  • Applying Fluentd transformation plugins to remove and mask sensitive data in log events enables us to limit the impact of requirements to satisfy regulations (from additional auditing to additional work to establish a higher level of security configuration).

  • Fluentd provides the means to navigate JSON data structures, such as a transformed log event payload. As a result, we can apply more intelligence to event handling. For example, if a log event’s customer attributes identify a high-value customer, we could also signal the CRM system in addition to signaling Ops.

  • The tag, time, and record values of an event can be manipulated. The extract and inject features can ensure they reflect meaningful values—for example, changing the tag to reflect the log event record so routing and filtering using tags can be more dynamic.

  • There are pros and cons to manipulating log events, from compromising an accurate record of what happened to producing more meaningful log data for downstream use. Understanding the potential applications of the logs can help us determine the best course (e.g., use in possible legal actions needs unaltered logs).

  • Fluentd can play several roles in a CNCF’s Prometheus deployment, from feeding specific events and their event attributes to Prometheus to capturing the Prometheus output data and assisting in monitoring Prometheus.
