5 Routing log events

This chapter covers

  • Copying log events to send to multiple outputs
  • Routing log events using tags and labels
  • Observing filters and handling errors in Fluentd
  • Applying inclusions to enable reuse of configurations
  • Injecting extra context information into log events

So far in this book, we have seen how to capture and store log events. But in all the examples, routing was simply all events going to the same output. However, this can be far from ideal. As described in chapter 1, we may want log events to go to different tools, depending on the type of log event. It may be desirable to send a log event to multiple locations or none. In this chapter, we will, therefore, examine the different ways we can route events. In addition, we will look at some smaller features that can contribute to solving the challenges of routing, such as adding information into the log event to ensure the origin of the log event is not lost along the way.

Routing often aligns with how work is split among individuals or teams. As we will see, the use of inclusions allows multiple teams to each work on their part of a Fluentd configuration without disrupting the others, and to inject specific configuration values where needed. For example, we have seen that the security team needs to apply routing and filtering of log events to their tool (and exclude events they’re not interested in). In contrast, the Ops team needs the log events in a different tool. With the routing and inclusion features, we can quickly achieve this.

The one aspect of routing we will not address in this chapter is the idea of forwarding log events to other Fluentd nodes, as that is best addressed when we look at scaling later in the book.

5.1 Reaching multiple outputs by copying

One way to get log events to all the correct output(s) is to ensure that all outputs receive the event, and each output includes one or more filters to stop unwanted content from being output. We’ll focus on copying in this section and will address filtering later, as before we filter things, we need to get the log events to the right place.

As described in chapter 2, log events are, by default, consumed by the first appropriate match directive containing the output plugin. To allow a log event to reach more than one output plugin within a match directive, we need to use the copy plugin (declared with @type copy).

Each destination is held within a store declaration defined with XML style tags <store> and </store> within the match directive. While store may not always seem intuitive as a plugin name (many outputs are for solutions we wouldn’t associate with storage, like Grafana), it is worth remembering that more of the Fluentd plugins address the retrieval and storage of log events than anything else. The diagram in figure 5.1 illustrates how the directive and plugins relate to each other both logically and in the way the configuration file is written.

Figure 5.1 Visualization of the hierarchy of elements for a match directive using @copy and Store. Reading from left to right, we see the blocks of configuration with increasing detail and focus (i.e., Buffer or Formatter for a specific plugin type). The store configuration block can occur one or more times within the copy plugin.

Within each store configuration block, we can configure the use of a plugin. Typically, this is going to be an output plugin but could easily be a filter plugin. The store plugin’s attributes can be configured just as they would if used directly within a match directive, as we have done previously. This includes using helper plugins, such as buffers.

To illustrate this, we’re going to take a file input, and rather than send the log events from one file to another file, as we did in chapter 3, we will extend the configuration to send the output to both a file and a stdout (console). We can see a representation of this in figure 5.2.

Figure 5.2 Visualization of a configuration file using store and copy to send log events to multiple destinations

To implement this, we need to edit the match directive. The easiest way to do this is to first wrap the existing output plugin attributes within store tags and then add the next store start and end tags. With the store start and end tags in place, each of the output plugins can be configured. Finally, introduce the @type copy declaration at the start of the match directive. The modified configuration is shown in the following listing, which contains the two store blocks, each holding an output plugin (file and stdout). You’ll also see a third store block with the output plugin type of null, followed by an @include directive. We will explain these shortly.

Listing 5.1 Chapter5/Fluentd/file-source-multi-out.conf—copy to multiple outputs

<match *>
  @type copy                
    <store>
      @type null
    </store>
    <store>
      @type stdout
    </store>
    <store>                 
      @type file
      @id bufferedFileOut
      tag bufferedFileOut
      path ./Chapter5/fluentd-file-output
      <buffer>
        delayed_commit_timeout 10
        flush_at_shutdown true
        chunk_limit_records 500
        flush_interval 30
        flush_mode interval
      </buffer>
      <format>
        @type out_file
        delimiter comma
        output_tag true
      </format>
    </store>
    @include additionalStore.conf
</match>

Declaring the plugin to be used

Start of the store block—each store reflects the action to take. This is often done to store a log event using a plugin or forward to another Fluentd node. In this case, we’re simply writing to the console.

Third store routes to a file

Let’s see the result of the configuration. As this uses a file source, we need to run the LogSimulator as well. So, to run the example, the following commands are needed:

  • fluentd -c ./Chapter5/Fluentd/file-source-multi-out.conf

  • groovy LogSimulator.groovy ./Chapter5/SimulatorConfig/log-source-1.properties

After running these commands, log events will appear very quickly on the console. Once the buffer reaches the point of writing, files will appear with the name fluentd-file-output.<date>_<number>.log. It is worth comparing the content in the file to the console, as we have included additional attributes into the payload.

5.1.1 Copy by reference or by value

In most, perhaps even all, programming languages, there is the idea of shallow and deep copying, sometimes called copy by reference (illustrated in figure 5.3) and copy by value (illustrated in figure 5.4). Whichever terminology you are used to, copy by reference means that the copy of the log event is achieved by each copy referring to the same piece of memory holding the log event. If the log event is modified, that change impacts all subsequent uses of all copies. Copying by value means grabbing a new piece of memory and making a wholesale copy of the content. This means if one copy is modified, the other will not be, because it is an outright clone. While we have not yet seen a reason to do anything other than use the default behavior, in the next chapter, we’ll see that it is possible to manipulate the contents of a log event.

Figure 5.3 How objects reside in memory when copied by reference

As shown in figure 5.3, when Object B is created as a shallow copy of Object A, then they both refer to the same memory holding the inner object (Object 1). So if we change Object 1 when updating through Object B, we will impact Object A as well.

Figure 5.4 How objects reside in memory when copied by value

Within the copy configuration, we can control this behavior with the copy_mode attribute. Copy mode has several settings that range in behavior from a copy by reference to a copy by value:

  • no_copy—The default state, and effectively copy by reference.

  • shallow—This deep-copies only the first layer of values. If those values, in turn, reference other objects, they still reference the same memory as the original. Under the hood, this uses Ruby’s dup method. While faster than a deep copy, dup needs to be used with care; for nested objects, it is comparable to no_copy.

  • deep—This is a proper copy by value, leveraging Ruby’s msgpack gem. If in doubt, this is the approach we recommend.

  • marshal—When Ruby’s msgpack cannot be used, native Ruby object marshaling can be used instead. The object is marshaled (serialized) into a byte stream representation; the byte stream is then unmarshaled (deserialized) to produce a new object.
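As a minimal sketch of where this sits in a configuration (the output choices and path are purely illustrative), copy_mode is simply an attribute of the copy plugin itself:

<match *>
  @type copy
  # deep performs a full copy by value, so each store works with its own clone
  copy_mode deep
  <store>
    @type stdout
  </store>
  <store>
    @type file
    path ./Chapter5/deep-copy-output
  </store>
</match>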

How copy operations work

The following will help you better understand how copy behaviors work:

Ideally, we shouldn’t need to worry about copying by value, as log events are received in a well-structured manner with all the necessary state information, so content manipulation becomes unnecessary. Sadly, the world is not ideal. When using the copy feature, consider whether the default option is appropriate; for example, do we need to manipulate the log event for one destination and not another? The use of labels to create “pipelines” of log event processing will also increase the possibility of needing to consider how we copy, as we will see later in this chapter.

Another consideration to be aware of is that when copying log events for different stores, a log event may carry sensitive data that we wish to redact or mask in most cases, but not in the copies sent to the security department. If security does not wish to be impacted by any data masking or redaction, they will need a deep copy.

5.1.2 Handling errors when copying

In the example configuration we provided, both outputs are to the same local hardware, and it would take a unique set of circumstances to impact one file and not the other. However, suppose the output is being sent to a remote service, such as a database or Elasticsearch. In that case, the chance of an issue impacting one output and not another is significantly higher. For example, if one of the destination services has been shut down or network issues prevent communication, what happens to our outputs? Does Fluentd send the log events to just the available stores, or to none of them unless they are all available?

Fluentd does not try to apply XA transactions (also known as two-phase commit), which would allow all-or-nothing behavior, because coordinating such transactions is resource-intensive and takes time. However, by default, it does apply the next best thing; in the event of one output failing, subsequent outputs will be abandoned. For example, if we copy to three stores called Store A, Store B, and Store C, which are defined in the configuration in that order, and we fail to send a log event to Store A, then none of the stores will get the event (see the first part of figure 5.5). If the problem occurred with Store B, then Store A would keep the log event, but Store C would be abandoned (see the second part of figure 5.5).

Figure 5.5 How a store error impacts a containing copy. The bar across the bottom of each diagram indicates which store would get the data value, which store failed, and which store was not actioned. In the middle diagram, for example, if Store B failed, then Store A will have received the log event, Store B wouldn’t have it, and Store C would not be communicated with.

But if you have a buffer as part of the output configuration, this may mask an issue, as the buffer may operate asynchronously and include options such as fallback and retry. As a result, an error, such as giving up retrying, may not impact the copy process, as described. Given this approach, there is the option to sequence the copy blocks to reflect the priority of the output.

The downside is that if you use asynchronous buffering with retries, the buffer will allow the execution to continue to the next store. But if it subsequently hits the maximum retries, it will fail that store, but subsequent store actions may have been successful.

How priority/order is applied should be a function of the value of the log event and the capability of the output. For example, output plugins allow a secondary helper plugin, such as secondary_file, to be used. If the log events are so critical that they cannot be lost, it is best to prioritize the local I/O options first. If the priority is to get the log event to a remote central service quickly (e.g., Kafka or Splunk), and a failure to do so means the event is of little further help elsewhere (e.g., Prometheus for contributing to metrics calculations), then it is best to lead off with the highest-priority destination.

Fluentd does offer another option to tailor this behavior. Within the <store> declaration, it is possible to add the argument ignore_error (e.g., <store ignore_error>). Then, if the output in that store block does cause an error, the error is prevented from cascading, so subsequent store blocks are not abandoned. Using our example three stores again, setting ignore_error on Store A would mean that regardless of whether sending the event to Store A succeeds, we would continue to try with Store B. But if Store B failed, then Store C would not receive the event.
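As a hedged sketch of the syntax (the remote host is invented, and any outputs could occupy the stores), the argument sits on the store tag itself:

<match *>
  @type copy
  # a failure here will not stop the later stores from being attempted
  <store ignore_error>
    @type forward
    <server>
      host log-aggregator.example.com
      port 24224
    </server>
  </store>
  <store>
    @type stdout
  </store>
</match>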

5.2 Configuration reuse and extension through inclusion

As Fluentd configurations develop, mature, and potentially introduce multiple processing routes for log events, our Fluentd configuration files grow in size and complexity. Along with this growth, we’re also likely to discover that some configurations could be reused (e.g., different match definitions want to reuse the same filter or formatter), particularly when trying to achieve the ideal of DRY (Don’t Repeat Yourself). So in this section, let’s explore how to address the challenges of larger configuration files and maximize reuse.

We could try to solve this by ensuring that the Fluentd configuration actions appear in the correct order; the use of tags to filter events may also work. However, this can get rather messy as an approach, and small changes could disrupt the flow of log events in unexpected ways.

The alternative is trying to massage configuration files to allow bits to be reused in different contexts. The first step is to isolate the Fluentd configuration that needs to be reused into its own file and then use the @include directive, with a file name and path, wherever that configuration is needed. With the include statement, the referenced file is merged into the parent configuration file. This means we can reuse configurations, and because the included content is incorporated in place, the rest of the Fluentd configuration doesn’t need to be manipulated and the sequencing of directives is not a problem.

During Fluentd’s startup, the configuration file is parsed, and each inclusion directive is replaced with a copy of the included file’s contents. This way, we can include the same configuration file wherever it is needed. For example, if we have an enterprise-wide setup for Elasticsearch, then all the different Fluentd configurations can reference a single file for using the enterprise Elasticsearch, and changes (for instance, optimizing connection settings) can then be applied to one file. Everyone inherits the change when the configuration file is deployed.

An inclusion does not have to contain a complete configuration; it can easily contain a single attribute. An excellent example of this is where you want to reuse some common Ruby logic (e.g., retrieving security credentials) in the Fluentd configuration, as we’ll discuss later. Equally, an inclusion file may be used to inject a block of configuration, such as a store block, or even an entire additional configuration file that could also be used independently. In listing 5.2, we have added inclusions, introducing @include additionalStore.conf after the last store tag to bring in additional store configurations from a separate file. This means we could define a common destination for all our log events and repeat that configuration across this and other configuration files to log all events in a common place, leaving each configuration file to focus on its own destinations.

Listing 5.2 Chapter5/Fluentd/file-source-multi-out2.conf—illustration of inclusion

<match *>
  @type copy  
    <store>
      @type null
    </store>
    <store>  
      @type stdout  
    </store>
    <store>  
      @type file
      @id bufferedFileOut
      tag bufferedFileOut
      path ./Chapter5/fluentd-file-output
      <buffer>
        delayed_commit_timeout 10
        flush_at_shutdown true
        chunk_limit_records 500
        flush_interval 30
        flush_mode interval
      </buffer>
      <format>
        @type out_file
        delimiter comma
        output_tag true
      </format>
    </store>
  @include additionalStore.conf    
 
</match>
 
@include record-origin.conf        

Incorporates the external file into the configuration providing an additional store declaration

Brings a complete configuration set that could be separately run if desired or could be reused

We have also added an inclusion directive referencing the file record-origin.conf. This illustrates the possibility that when multiple teams contribute functionality into a single run-time environment (e.g., a J2EE server), rather than all the teams trying to maintain a single configuration file and handling change collisions, each team has its own configuration file. But come execution time, a single configuration file uses inclusions to bring everything together. As a result, the Fluentd node needs to merge all the configurations together during startup. If you review the content of record-origin.conf, you will see we have introduced some new plugins, which we will cover later in the chapter.

Let’s see the result of the configuration. As this uses a file source, we need to run the LogSimulator as well. So, to run the example, the following commands are needed:

  • fluentd -c ./Chapter5/Fluentd/file-source-multi-out2.conf

  • groovy LogSimulator.groovy ./Chapter5/SimulatorConfig/log-source-1.properties

NOTE It is important to remember that the content of an inclusion can have an impact on the configuration that contains the include declaration. So the placement and use of inclusions must be done with care, as the finalized order of directives and their associated plugins still applies, as highlighted in chapter 2.

If the path to the included file is relative, then the point of reference is the location of the file containing the include directive. The include directive can use a comma-separated list, in which case the list order relates to the insertion sequence—for example, @include file.conf, file2.conf means file.conf is included before file2.conf. If the include directive uses wildcards (e.g., @include *.conf), then the order of insertion is alphabetical.
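To make the variants concrete, the following fragment sketches the three forms just described (all file names other than additionalStore.conf are hypothetical):

# single file; a relative path is resolved against the file containing this line
@include additionalStore.conf

# comma-separated list; files are inserted in the order listed
@include common-outputs.conf, security-outputs.conf

# wildcard; matching files are inserted in alphabetical order
@include conf.d/*.conf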

Figure 5.6 shows the dry-run output and highlights where include declarations have been replaced with the included configuration or configuration fragment contents.

Figure 5.6 Configuration file with the include resolved (highlighted in the box) as Fluentd starts up

NOTE As the process is a purely textual substitution, it does mean that the inclusion can easily be an empty placeholder file or a configuration fragment. If the inclusion is injected in the wrong place within a file, it can invalidate the entire configuration.

5.2.1 Place holding with null output

In listing 5.2, the additional inclusion fragment (@include additionalStore.conf) provided the configuration fragment shown in listing 5.3. This store definition uses the null output plugin; it simply discards the log events it receives.

When working in an environment where different teams may wish to send log events to different tools, placing null plugins allows the developers building a service to put a placeholder in the Fluentd configuration, ready for the other team(s) to replace. In many respects, the use of null is the nearest thing to adding a TODO code comment.

NOTE TODO is a common tag used in code to flag when something still needs to be done.

Listing 5.3 Chapter5/Fluentd/additionalStore.conf—include configuration fragment

<store>
  @type null
  @id inclusion
</store>

5.2.2 Putting inclusions with a MongoDB output into action

Let’s put some of these insights into action with a scenario. Knowing where best to apply effort is best driven by analytical insight. Directing error events into a database makes it easy to gather statistics over time showing which errors occur and how frequently. When combined with an appreciation of the impact of each error, effort can be targeted for maximum value.

We need to apply this to Chapter5/Fluentd/file-source-multi-out.conf. To help with this, we can leverage the work from chapter 4, where we used Fluentd with a MongoDB plugin, and capitalize on it to see the impact of copy errors and the use of the ignore_error option. To do this, create a copy of Chapter5/Fluentd/file-source-multi-out.conf that can be safely modified. For simplicity, let’s call this copy Chapter5/Fluentd/file-source-multi-out-exercise.conf. We need to replace the @type null with the configuration for the MongoDB output. The commands you will need to run the scenario are

  • fluentd -c ./Chapter5/Fluentd/file-source-multi-out-exercise.conf

  • groovy LogSimulator.groovy ./Chapter5/SimulatorConfig/log-source-1.properties
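As a reminder of the chapter 4 setup, the store block that replaces @type null might look something like the following sketch (the host, port, database, and collection values are illustrative and should match your local MongoDB):

<store>
  @type mongo
  host localhost
  port 27017
  database Fluentd
  collection demo
</store>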

With the changes applied, we should be able to complete the following steps:

  1. Check the configuration using the dry-run capability. This should yield a valid result.

  2. Confirm that the modified configuration produces the desired result by starting MongoDB and rerunning the LogSimulator and Fluentd.

  3. Verify the behavior is as expected if we cannot connect to MongoDB, and repeat the same actions for running the LogSimulator and Fluentd.

  4. The previous step should have highlighted the absence of the ignore_error option. Modify the Fluentd configuration adding the ignore_error option to the console output configuration. Rerun the configuration and LogSimulator. Confirm that the desired behavior is now correct.

Answers

  1. The modified Fluentd configuration file should now look like Chapter5/ExerciseResults/file-source-multi-out-Answer1.conf and yield a successful dry run.

  2. With MongoDB running, the database should continue to fill with events that reflect the content sent to the file, and the console will still display content.

  3. With MongoDB stopped, the output plugin will start to experience errors, and as there is no configuration to ensure the issue does not cascade to impact other plugins, none of the output streams will produce log events. This is because of the default position that subsequent output plugins should not be executed once an error occurs.

  4. With the ignore_error added to the configuration, the configuration should now resemble Chapter5/ExerciseResults/file-source-multi-out-Answer2.conf. With the MongoDB still stopped, the MongoDB output will fail, but the failure will not stop output to the console but will inhibit output to the file.

5.3 Injecting context into log events

Providing more information and context can help us work with log events. To do this, we may need to manipulate the predefined log event attributes and capture additional Fluentd values. This section looks at this in more detail.

By injecting this information into the log event as identifiable log event attributes, we can then reference the values explicitly when using a filter directive to exclude log events, which prevents them from being processed any further in a sequence of directives. For example, suppose log events associated with a specific host are deemed unnecessary to forward; by comparing the injected attribute with the hostname, we can apply a filter with an exclude rule to stop the information from going anywhere.

The inject operation can only be used with match and filter directives, which is unfortunate, as we might want to apply it at the source. That said, it is not a significant challenge to overcome if desired, as we will see shortly. Using our example configuration Chapter5/Fluentd/record-origin.conf, we can see the injection at work in listing 5.4.

When configuring the injection of time data, it is possible to configure different representations of the time. This is covered by the time_type attribute, which accepts the following values:

  • string—Allows a textual representation to be used and defers to the time_format attribute for the representation. The time_format attribute uses the standard notation, as described in appendix A.

  • float—Seconds and nanoseconds from the epoch (e.g., 1510544836.154709804).

  • unixtime—This is the traditional seconds-from-epoch representation.

In listing 5.4, we have gone for the most readable format of the string. In addition to describing the time data format, it is possible to specify the time as localtime or as UTC time by including the attributes localtime and utc, which take Boolean values. Trying to set both attributes could be the source of a lot of problems.

Listing 5.4 Chapter5/Fluentd/record-origin.conf—Inject declaration

<match **>
  <inject>
    hostname_key hostName
    worker_id_key workerId
    tag_key tag
    time_key fluentdTime
    time_type string
    localtime true
  </inject>
  @type stdout
</match>

The inject declaration within the generic match

Adds the name of the host of Fluentd and calls the value the name provided (e.g., hostName)

Adds the worker_id and calls it by the name provided. This helps when Fluentd has support processes to share the work across.

Puts the tag into the record output and uses the name provided

Provides the name for the time to be included with

Defines how the date-time should be represented. Here we are saying to provide a textual representation; as we’ve omitted the time_format attribute to define the format, the standard format is used.

The properties for the inject configuration relate to the mapping of known values like hostname, tag, and so on, to attributes in the log event record.

To see this configuration in action, we have used the monitor_agent source and stdout output, so all we need to do is run Fluentd with the command fluentd -c ./Chapter5/Fluentd/record-origin.conf. The outcome will appear in the console, looking something like

2020-05-18 17:42:41.021702900 +0100 self: {"plugin_id":"object:34e82cc", 
    "plugin_category":"output","type":"stdout","output_plugin":true,"retry_count":0,
    "emit_records":4,"emit_count":3,"write_count":0,"rollback_count":0,"slow_flush_count":0,
    "flush_time_count":0,"hostName":"Cohen","workerId":0,"tag":"self",
    "fluentdTime":"2020-05-18T17:42:41+01:00"}

Within this output, you will see that the injected values appear at the end of the JSON structure using the names defined by the attributes; for example, "hostName": "Cohen", where Cohen is the PC used to write this book.

5.3.1 Extraction of values

If we can inject certain values into the log event’s record, then it seems obvious that there should be a counterpart capability for extracting values from the record to set the tag and timestamp of the log event. This ability can be exploited by plugins that work with source, filter, and match directives. This gives us a helpful means to set tags dynamically based on the log event record content, which makes tag-based routing very flexible. If the log event had an attribute called source, and we wanted to use that as a means to perform routing, we could use the extract operation. For example:

<extract>
  tag_key nameOfLogRecordAttribute
</extract>

Unfortunately, only a subset of the available plugins takes advantage of the extract helper. One of the core plugins that does incorporate it is exec, which we have not covered yet. So as we explore tag-based routing in the next section, we’ll use exec and explore the interesting opportunities it offers.
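To give a flavor of what that looks like, here is a sketch of an exec source using the extract helper to take the tag from the record (the command, file, and attribute name are invented for illustration):

<source>
  @type exec
  # hypothetical command emitting JSON that includes a "source" attribute
  command more ./TestData/health-sample.json
  run_interval 10s
  <parse>
    @type json
  </parse>
  <extract>
    # the log event's tag is taken from the record's "source" attribute
    tag_key source
  </extract>
</source>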

5.4 Tag-based routing

In all the chapters so far, we have always had wildcards in the match declarations (e.g., <match *>), but we have had the opportunity to define and change the tag values at different stages. We have seen the tag being manipulated in contexts ranging from taking the tag value from the URI to setting the tag within the configuration and even extracting the tag from the log event record, as just discussed. We can use the tags to control which directives are actioned, which is the subject of this section.

We can control which directives will process and consume log events by defining the match values more explicitly. For example, a configuration for two inputs called AppA and AppB includes the tag attribute setting the respective tags to be AppA and AppB. Now, rather than match *, we set the directives to be <match AppA> and <match AppB>. With this change, the match directives will only process log events from the associated source.

In our example, to keep the sources simple, we have configured two occurrences of the dummy source plugin to generate log events. We have added additional attributes to control the behavior, so each source generates log events at a different frequency (controlled by the rate attribute) and with a different message (the dummy attribute).

In the following listing, we show the key elements of the configuration (we have removed some configuration elements for clarity; this can be seen with the use of an ellipsis [. . .]).

Listing 5.5 Chapter5/Fluentd/monitor-file-out-tag-match.conf—tag matching

<source>
  @type dummy
  dummy {"hello from":"App A"}
  auto_increment_key AppACounter
  tag AppA
  rate 5
</source>

<source>
  @type dummy
  dummy {"Goodbye from":"App B"}
  auto_increment_key AppBIncrement
  tag AppB
  rate 3
</source>

<match AppB>
    @type file
    path ./Chapter5/AppB-file-output
    @id AppBOut
    <buffer> . . . </buffer>
    <format> . . . </format>
</match>

<match AppA>
    @type file
    path ./Chapter5/AppA-file-output
    @id AppAOut
    <buffer> . . . </buffer>
    <format> . . . </format>
</match>

The first of two source definitions in this configuration file. Note that the dummy messages and tags are different, along with several other configuration attributes, so the sources are easy to distinguish.

The second dummy source configuration. Most crucially, note the tag name differences between the sources.

The first of two match declarations. Note how we can use wildcard characters so partial name matching can be defined.

File output configuration mapping to different output files for each match (compare the path attributes)

The second match, this time without any wildcarding

This setup can be run with the command

fluentd -c ./Chapter5/Fluentd/monitor-file-out-tag-match.conf

The output files should reflect the different dummy messages, as the routing will have directed the log events from each source to the relevant output.

Despite the explicit tag names, it is still possible to use selective wildcarding with the tags. If we extend this example by adding an additional source and tagging it AppAPart2, we could catch both AppA and AppAPart2. This is done by modifying <match AppA> to become <match AppA*>. The log events captured from the new source would be incorporated into the AppA output.

This is illustrated in listing 5.6. If we do not want to reintroduce wildcard use, we can also utilize a comma-separated tag list in the match declaration; for example, <match AppA, AppAPart2>. To illustrate the wildcard behavior, this time we have introduced another source plugin called exec. The exec plugin allows us to call OS scripts and capture the result. We are simply using the more command (as it behaves the same way for Linux and Windows) within the exec statement.

Listing 5.6 Chapter5/Fluentd/monitor-file-out-tag-match2.conf—tag matching

<source>                                 
  @type dummy
  dummy {"hello from":"App A"}
  auto_increment_key AppACounter
  tag AppA
  rate 5
</source>
 
<source>
  @type exec                             
  command more ./TestData/valuePair.txt
  run_interval 7s
  tag AppAPart2
</source>
 
<source>                                 
  @type dummy
  dummy {"Goodbye from":"App B"}
  auto_increment_key AppBIncrement
  tag AppB
  rate 3
</source>
<match AppB> . . . </match>              
 
<match AppA*>
    @type file
    path ./Chapter5/AppA-file-output
    @id AppAOut
    <buffer> . . . </buffer>
    <format> . . . </format>   
</match>

Original source, which remains unaltered

The additional source, using the exec source plugin

The original AppB source, which remains unchanged

The match for AppB remains unmodified.

The original match for AppA has now been modified to include the wildcard, which means both AppA and AppAPart2 will be matched. This could also be expressed as <match AppA, AppAPart2>.

This setup can be run with the command

fluentd -c ./Chapter5/Fluentd/monitor-file-out-tag-match2.conf

The output files should reflect the different dummy messages, but the AppA output should now include the outcome of executing the OS command on a predefined test data file.

Tag naming convention

Despite the availability of wildcard characters to help select tags for different directives regardless of position, there is a convention normally applied to tag naming. Tag names typically follow a namespace-like hierarchy, using the dot to separate the hierarchy tiers (e.g., AppA.ComponentB.SubComponentC). The wildcard can then filter the different namespaces (e.g., AppA.* or AppA.ComponentB.*). For example, if we had a web server hosting a domain with several different services, with each service potentially having one or more log outputs, we might see a convention of webserver.service.outputName for the tags.
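As a sketch of how the convention pays off (the tag names and outputs are hypothetical), match patterns can then select whole namespaces:

# matches webserver.payments.access, webserver.catalog.access, and so on
<match webserver.*.access>
  @type stdout
</match>

# matches every tag in the AppA namespace, however many levels deep
<match AppA.**>
  @type stdout
</match>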

5.4.1 Using the exec plugin

The exec plugin illustrated in listing 5.6 creates some interesting opportunities. When plugins cannot help us get the information required, we have several options:

  • Build a custom plugin (which will be explored later in the book).

  • Create an independent utility that can feed data to Fluentd directly via the HTTP, UDP, or forward plugins.

  • Produce a small script that can be invoked by the exec plugin.

Using the exec plugin makes it easy to retrieve environment-specific information or perform things like grabbing web page output using utilities like Wget and cURL—a modern version of screen scraping. The latter is particularly interesting, as it is possible to extract information from web interfaces or web endpoints—for example, if a third party provided a microservice (which therefore has to be treated as a black box)—and could still be effectively monitored. If the third party has followed the best practice of providing a /health endpoint (see http://mng.bz/5KQz for more information), we could run a script to extract the necessary values from the response to a Wget or cURL call to /health.
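For illustration only, a source along these lines could poll such an endpoint (the URL, interval, and tag are invented; it assumes curl is available and that /health returns JSON):

<source>
  @type exec
  # call the hypothetical health endpoint and capture the JSON response
  command curl -s http://localhost:8080/health
  run_interval 30s
  tag AppC.health
  <parse>
    @type json
  </parse>
</source>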

The exec plugin does need to be used with some care. Each exec process is executed in its own thread so that it does not adversely impact the consumption of other logging events whenever triggered. However, if the process is too slow, then we could experience the following:

  • The exec plugin will likely be triggered again before the last one has completed, which risks creating out-of-sequence events (due to how resources get shared across threads).

  • Thread death could occur because there are too many threads demanding too many resources (this kind of issue could come about if the buffer ends up with too many threads).

  • Events start being backed up, as the logic will wait for threads to complete before allocating another exec run.

The takeaway is to think about what exec is doing; if it is slow or computationally demanding, then it’s probably unwise to run it within Fluentd. Instead, we could consider running the process independently so that it writes its results to a file for Fluentd to pick up; after all, log management should be relatively lightweight compared to the core business process.
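If the heavy lifting is pushed outside Fluentd, the configuration side can then be as light as a tail source picking up whatever the external job writes (the file names and tag here are hypothetical):

<source>
  @type tail
  # file produced by a scheduled script running independently of Fluentd
  path ./health-check-results.log
  pos_file ./health-check-results.pos_file
  tag AppC.health
  read_from_head true
  <parse>
    @type json
  </parse>
</source>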

5.4.2 Putting tag naming conventions into action

A decision has been made by the team that the logging configuration should reflect a naming convention of domain.service.source. The current configuration does not reflect this; the domain is called Demo, and the services are called AppA and AppB, with AppA having two components, Part1 and Part2. You have been asked to update the configuration file monitor-file-out-tag-match2.conf to align with this convention. Change the match directive for AppA so that only Part1 is captured in the AppA file. Note that the additional input from the exec source is not yet needed in the output.

Answer

The outcome should result in a modified configuration that should look something like Chapter5/ExerciseResults/monitor-file-out-tag-match-Answer.conf. Note how the match condition has changed.

5.4.3 Putting dynamic tagging with extract into action

In section 5.3.1, we saw an explanation of how tags can be set dynamically. We should improve and rerun monitor-file-out-tag-match2.conf so that the exec source sets the tag based on a value retrieved from the file.

Answer

We should end up with a configuration that looks something like Chapter5/ExerciseResults/monitor-file-out-tag-match-Answer2.conf. Note that when we run this, the contents of the log events using the exec source will no longer reach the output because we’ve changed the tag, so it fails the match clause.

5.5 Tag plugins

There are plugins available to further help with routing using tags; let’s look at some certified plugins outside the core Fluentd (table 5.1).

When plugins are described as “certified,” it means they come from recognized and trusted contributors to the Fluentd community. As these plugins are not part of the core Fluentd, it does mean that to use these plugins, you will need to install them, just as we did for MongoDB in chapter 4.

Table 5.1 Additional tag-based routing plugins that can help with routing

rewrite-tag-filter
https://github.com/fluent/fluent-plugin-rewrite-tag-filter
With one or more rules in the match directive, the plugin applies a regular expression to the log event. Depending on the result, the tag is changed to a specified value; a rule can be set so that the rewrite applies to either a true or a false outcome from the regex. On a successful outcome, the log event is re-emitted with the new tag so that it continues beyond the match directive.

route
https://github.com/tagomoris/fluent-plugin-route
The route plugin allows tags to direct the log events to one or more operations, such as manipulating the log event and copying it so it can be intercepted by another directive.

rewrite
https://github.com/kentaro/fluent-plugin-rewrite
This enables tags to be modified using one or more rules, such as when an attribute of the log event record matches a regular expression. As a result, performing specific tasks based on the log event content becomes very easy.
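As a hedged sketch of the first of these (remember that it needs installing before use), a rewrite-tag-filter rule looks something like the following; the attribute name, pattern, and tags are illustrative:

<match AppA.**>
  @type rewrite_tag_filter
  <rule>
    # if the record's "level" attribute matches the pattern, the event is
    # re-emitted with an error. prefix added to the original tag
    key level
    pattern /ERROR/
    tag error.${tag}
  </rule>
</match>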

5.6 Labels: Taking tags to a new level

As we will see in this section, the label directive uses the basic idea of routing with tags and takes it to a whole new level. Ideally, we should be able to group a set of directives together clearly and distinctly for a particular group of log events, but this can become challenging. Labels allow us to overcome that. They have two aspects: first, an additional attribute using @label can be linked to a log event, in much the same way that tags are linked (although, unlike a tag, a label is not part of the log event data structure). Second, labels offer a directive (<label labelName> . . . </label>) that we use to group other directives (e.g., match and filter) that are executed in sequence. In effect, we are defining a pipeline of actions. To differentiate the two for the rest of the book, we will talk about labels as attributions to log events and directives as linking one or more directives together as a pipeline or a label pipeline.

There is one constraint for labels when compared to tags. It is possible to provide a comma-separated list of tags (e.g., <match basicFile,basicFILE2>), but only a single label can be associated with a pipeline (e.g., <label myLabel>). You will find that trying to match multiple labels in the same way will result in an error—for example, 'find_label': common label not found (ArgumentError). This comes about because Fluentd checks during startup that each label declaration can be executed.

NOTE Unlike tags, the naming convention is usually more functional in meaning.

5.6.1 Using a stdout filter to see what is happening

To help illustrate the point, we will introduce a special filter configuration. The important thing about filter directives, unlike match directives, is that even when an event satisfies the filter rule, it is re-emitted by the plugin to be consumed by whatever follows. This setup for a filter is a bit like a developer’s println for helping to see what is happening during code development. We will look more closely at filters in the next chapter, but for now, let’s see how the stdout plugin behaves in a filter.

The stdout plugin effectively accepts all events; thus, the following filter will let everything pass through and send the details to the console:

<filter *>
    @type stdout
</filter>

This configuration is typically referred to as filter_stdout. Using this as an additional step will help us illustrate the label pipeline behavior. This is another handy way of peeking at what is happening within a Fluentd configuration.

5.6.2 Illustrating label and tag routing

To illustrate a label-based pipeline, we have created a configuration that tails two separate files (from two different log simulators). The configuration of the simulator output results in two differing message structures (although both are derived from the same source data). To observe the differences, compare basic-file.txt and basic-file2.txt once the simulators are running.

The configuration will illustrate the use of a label being applied to one source and not another. Then, within the label “pipeline,” one source (file) will be subject to both the stdout filter (as explained in section 5.6.1) and a file output that is separate from the output of the other file. This is illustrated in the following listing. As with other larger configurations, we have replaced sections with ellipses, so relevant aspects of the configuration are easier to read.

Listing 5.7 Chapter5/Fluentd/file-source-file-out-label-pipeline.conf label pipeline

<source>
  @type tail
  path ./Chapter5/basic-file.txt
  read_lines_limit 5
  tag basicFile
  pos_file ./Chapter5/basic-file-read.pos_file
  read_from_head true
  <parse> @type none </parse>
  @label labelPipeline
</source>

<source>
  @type tail
  path ./Chapter5/basic-file2.txt
  read_lines_limit 5
  tag basicFILE2
  pos_file ./Chapter5/basic-file-read2.pos_file
  read_from_head true
  <parse> @type json </parse>
</source>
#### end - tail basic-file2

<label labelPipeline>
  <filter *>
    @type stdout
  </filter>

  <match *>
    @type file
    path ./Chapter5/label-pipeline-file-output
    @id otherSelfOut
    <buffer> . . . </buffer>
    <format> . . . </format>
  </match>

  <match *>
    @type stdout
  </match>
</label>

<match basicFILE2>
    @type file
    path ./Chapter5/alt-file-output
    @id basicFILE2Out
    <buffer> . . . </buffer>
    <format> . . . </format>
</match>

Our source attaches a label to the events it creates; in this case, labelPipeline. This will mean the step operation performed on these events will be in the <label labelPipeline> block.

This source is unlabeled. As a result, its log events will be intercepted by the next match block that can consume the tag basicFILE2.

The start of the label block. Any log events carrying a matching label will pass through this sequence of directives, assuming each plugin allows the event to be emitted onward.

Use the stdout filter to push the log events to stdout and output to the next plugin.

Use the match to direct content to a file.

We will never see any result from this stdout match, as the preceding match will have consumed the log event. To send log events to both stdout and the file would require the use of the copy plugin.

Defines the end of the label series of events

Outside of a label block, this match will be applied to all unlabeled events.

To run this configuration, we need to run the commands

  • fluentd -c Chapter5/Fluentd/file-source-file-out-label-pipeline.conf

  • groovy LogSimulator.groovy Chapter5/SimulatorConfig/basic-log-file.properties

  • groovy LogSimulator.groovy Chapter5/SimulatorConfig/basic-log-file2.properties

When running this setup, the log events can be seen in basic-file.txt and on the console. Additionally, there will be two more files, as the log content is output to label-pipeline-file-output.*_*.log and alt-file-output.*_*.log (the wildcards represent the date and file increment number). Neither file should contain a mix of tags.

While the match expression within the label pipeline continues to use a wildcard, it is still possible to apply tag controls on the directives within the pipeline. If you edit the configuration to change that match clause to <match basicFILE2>, you will see the logs displayed on the console but not in the file.

5.6.3 Connecting pipelines

As configurations become more sophisticated, you will likely need to create pipelines and link them together. This can be done using the relabel plugin. Relabel does what it says: it changes the label associated with the log event. As relabel is an output plugin, it can change the label and re-emit the log event rather than consume it. For example, you might have a label with several directives that can manipulate a log event into a human-friendly representation and send it to a social platform such as Slack. But before your label does that, you may wish to take the log events through a labeled pipeline of filters that excludes all log events representing business-as-usual activity.

As our Fluentd configuration structures become more complex with pipelines, it helps to visualize what is happening, as shown in figure 5.7. As you can see, we have now made the match that feeds the alt-file-output a labeled pipeline called common. To illustrate the use of relabel, the match in our original labelPipeline (as we saw in listing 5.7) has been modified. We have introduced a copy plugin to ensure that the log event goes to both the output and relabel (highlighting that the store declaration can be used for more than just storage plugins). When we run this configuration, the alt-file-output files will now contain events from both sources.

Figure 5.7 Label routing example, with two label pipelines connected (labelPipeline and common)

We can run the configuration using the commands

  • fluentd -c Chapter5/Fluentd/file-source-multi-out-label-pipelines.conf

  • groovy LogSimulator.groovy Chapter5/SimulatorConfig/basic-log-file.properties

  • groovy LogSimulator.groovy Chapter5/SimulatorConfig/basic-log-file2.properties

The following listing shows the configuration with the application of relabel. Note the use of the ellipsis again, so you can focus on the key elements.

Listing 5.8 Chapter5/Fluentd/file-source-multi-out-label-pipelines.conf use of relabel

<source>
  @type tail
  path ./Chapter5/basic-file.txt
  read_lines_limit 5
  tag basicFile
  pos_file ./Chapter5/basic-file-read.pos_file
  read_from_head true
  <parse> . . . </parse>
  @label labelPipeline
</source>

<source>
  @type tail
  path ./Chapter5/basic-file2.txt
  read_lines_limit 5
  tag basicFILE2
  pos_file ./Chapter5/basic-file-read2.pos_file
  read_from_head true
  <parse> . . . </parse>
  @label common
</source>

<label labelPipeline>
  <filter *>
    @type stdout
  </filter>

  <match *>
    @type copy
    <store>
      @type file
      path ./Chapter5/label-pipeline-file-output
      <buffer> . . . </buffer>
      <format> . . . </format>
    </store>
    <store>
      @type relabel
      @label common
    </store>
  </match>
</label>

<label common>
  <match *>
    @type file
    path ./Chapter5/alt-file-output
    <buffer> . . . </buffer>
    <format> . . . </format>
  </match>
</label>

This links this source’s log events to a label, as we did in the previous example.

Sets the second source to use the common label, rather than relying on tag-matching directives to catch its log events

Using the copy plugin within the match, we can cause the log event to be consumed by more than one output plugin.

The log event will get pushed to a file output plugin first.

As the copy directive is being used, we can force the log event to be processed by further operations.

Uses the relabel plugin to change the log event's label; up to this operation, the label has been labelPipeline.

The label is now set to be common; as we leave this match and label block, the event will be consumed by the label directive called common.

The label common starts, which will now receive log events from both sources. One source’s event comes directly, and another passes through the labelPipeline before reaching the common pipeline.

5.6.4 Label sequencing

Unlike tags, the relative positioning of label directives does not matter. Take our current configuration, shown in figure 5.7, as an example: while labelPipeline triggers the use of the common label using relabel, the common label could be declared before labelPipeline. The steps within a label pipeline are still sequential in execution. You can see and try this with the provided configuration file Chapter5/Fluentd/file-source-multi-out-label-pipelines2.conf. In the configuration, you can see

  1. A relabel declaration that used to target common but has been changed to use the label outOfSequence. With it, we have moved the filter out to the new label section.

  2. The outOfSequence label pipeline then redirects to the common, as we had previously and as illustrated in figure 5.8. The configuration file order actually reflects the appearance in the figure when reading the diagram left to right (ignoring the flow shown in the diagram).

Figure 5.8 This configuration illustrates that labels are not order-sensitive.

The scenario can be executed using the commands

  • fluentd -c Chapter5/Fluentd/file-source-multi-out-label-pipelines2.conf

  • groovy LogSimulator.groovy Chapter5/SimulatorConfig/basic-log-file-small.properties

  • groovy LogSimulator.groovy Chapter5/SimulatorConfig/basic-log-file2-small.properties

The simulator properties have been configured with a small data set to make it easier to confirm the behavior and to ensure we do not get any accidental looping.

NOTE While we have illustrated that labeling sections out of order is possible, we would not necessarily advocate it as good practice. As labels could be compared to goto statements in some respects, it is preferable to try to structure the configuration files to be as linear as practical.

5.6.5 Special labels

Fluentd’s label feature includes a predefined @Error label. This means we can use the label directive to define a pipeline to process errors; those errors can be raised within our configuration or within a plugin executing the configuration. This does rely on the plugin implementation to use a specific API (emit_error_event). We can be confident that core Fluentd plugin implementations will use this API, but it might be worth checking to see if a third-party plugin uses the feature rather than simply writing to stdout. We will see this later in the book when we look at building our own plugin.

We could therefore build upon our existing Fluentd configuration steps to capture these errors. With this, we could do things like relabel the log event so that it gets picked up by a common pipeline or simply direct it to its own destination. In the next example, we’ve added a new label pipeline to our previous configuration that writes to its own file, as illustrated in the following listing.

Listing 5.9 Fluentd/file-source-multi-out-label-pipelines-error.conf using @Error

<source> . . . </source>
 
<source> . . . </source>
 
<label labelPipeline> . . . </label>
 
<label common> . . . </label>
 
<label @Error>
  <match *>
      @type file
      path ./Chapter5/error-file-output
      <buffer> . . . </buffer>
      <format>
        @type out_file
        delimiter comma
        output_tag true
      </format>   
  </match>
</label>

The start of the pipeline using the predefined @Error label. As a result, any plugin that sets this label will have its error(s) handled in this pipeline.

Matches all tags to write errors to file once log events are labeled with Error. But we could get clever and handle errors associated with different tags differently. For example, specific tags that experience an error that needs more urgent attention could be sent to a notification or collaboration tool.

Tip It is always useful to create a simple, generic error-handling pipeline in its own Fluentd configuration file, one that raises some sort of problem ticket, sends a social notification (e.g., Slack, Teams), or simply emails someone to report the event. Then use @include to bring it in as standard practice in all of your configurations. That way, if you don’t have any specific error-handling configuration, the generic answer will kick in for you.

5.6.6 Putting a common pipeline into action

You have been asked to refactor the configuration referenced in listing 5.8 to have a single pipeline. This will allow the service to be incorporated into the main configuration through inclusion. This change will allow all of the different log event routes being developed by other teams to be added more safely. To help those teams, you should create an additional pipeline with a null output as a template.

Answer

The result actually involves three files. The core is Chapter5/ExerciseResults/file-source-multi-out-label-pipelines-Answer.conf. This uses @include to bring in the configurations Chapter5/ExerciseResults/label-pipeline-Answer.conf, which contains the refactored logic, and a template containing the null output defined by Chapter5/ExerciseResults/label-pipeline-template-Answer.conf.

Summary

  • Fluentd provides a null output plugin that can be used as a placeholder for output plugins. The plugin will simply delete the received log events.

  • Fluentd provides an exec plugin that allows us to incorporate the triggering of external processes, such as scripts, into the configuration. The external process can be invoked with arguments from the log event.

  • Fluentd provides several mechanisms to route log events through a configuration. The simplest of these is using the tag associated with each log event.

  • For more sophisticated routing where we want a “pipeline” of actions, Fluentd provides a label construct. The use of labels can also help us simplify the complexity in a configuration file when it comes to ordering configuration values.

  • Fluentd tags typically follow naming conventions; by using these conventions, we can group and namespace tags, and we can use wildcards in the configuration to match groups of tags.

  • A Fluentd configuration can be assembled from multiple files through the use of the @include directive.

  • If Fluentd experiences an error (e.g., can’t connect to an instance of Elasticsearch), we can catch the error and perform actions using the predefined @Error label.
