4 Using Fluentd to output log events

This chapter covers

  • Using output plugins for files, MongoDB, and Slack
  • Applying different buffering options with Fluentd
  • Reviewing the benefits of buffering
  • Handling buffer overloads and other risks of buffering
  • Adding formatters to structure log events

Chapter 3 demonstrated how log events can be captured and how helper plugins such as parsers come into play. But capturing data is only of value if we can do something meaningful with it, such as delivering it to an endpoint formatted so the log events can be used—for example, storing the events in a log analytics engine or sending a message to an operations (Ops) team to investigate. This chapter is about showing how Fluentd enables us to do that. We look at how Fluentd output plugins can be used for files, as well as how Fluentd works with MongoDB and with collaboration/social tools such as Slack for rapid notifications.

This chapter will continue to use the LogSimulator, and we will also use a couple of other tools, such as MongoDB and Slack. As before, complete configurations are available in the download pack from Manning or via the GitHub repository, allowing us to focus on the configuration of the relevant plugin(s). Installation steps for MongoDB and Slack are covered in appendix A.

4.1 File output plugin

Compared to the tail (file input) plugin, we are less likely to use the file output plugin, as typically we will want to output to a tool that allows us to query, analyze, and visualize the events. There will, of course, be genuine cases where file output is needed for production. However, it is one of the best options as a stepping-stone to something more advanced, as it is easy to see outcomes and the impact of various plugins, such as parsers and filters. Logging important events to file also lends itself to easily archiving the log events for future reference if necessary (e.g., audit log events to support legal requirements). To that end, we will look at the file output before moving on to more sophisticated outputs.

With the file output (and, by extension, any output that involves directly or indirectly writing physical storage), we need to consider several factors:

  • Where can we write to in the file system, as dictated by storage capacity and permissions?

  • Does that location have enough capacity (both allocated and physical capacity)?

  • How much I/O throughput can the physical hardware deliver?

  • Is there latency on data access (NAS and SAN devices are accessed through networks)?

While infrastructure performance isn’t likely to impact development work, it is extremely important in preproduction (e.g., performance testing environments) and production environments. It is worth noting that device performance is essential for the file plugin. Other output plugins are likely to be using services that will include logic to optimize I/O (e.g., database caching, optimization of allocated file space). With output plugins, we have likely consolidated multiple sources of log events. Therefore, we could end up with a configuration that has Fluentd writing all the inputs to one file or location. The physical performance considerations can be mitigated using buffers (as we will soon see) and caching.

4.1.1 Basic file output

Let's start with a relatively basic configuration for Fluentd. In all the previous chapters' examples, we have just seen the content written to the console. Now, rather than the console, we will simply push everything to a file. To do this, we need a new match directive in the configuration, but we'll carry on using the file source for log events.

To illustrate that an output plugin can handle multiple inputs within the configuration, we have included the self-monitoring source configuration illustrated in the previous chapter, in addition to a log file source. To control the frequency of log events generated by Fluentd's self-monitoring (the monitor_agent plugin), we can define another attribute called emit_interval, which takes a duration value—for example, 10s (10 seconds). The value provided by emit_interval is the time between log events being generated by Fluentd. Self-monitoring can include details like how many events have been processed, how many worker processes are managed, and so on.

At a minimum, the file output plugin simply requires the type attribute to be defined and a path pointing to a location for the output using the path attribute. In the following listing, we can see the relevant parts of our Chapter4/Fluentd/rotating-file-read-file-out.conf file. The outcome of this configuration may surprise you, but let’s see what happens.

Listing 4.1 Chapter4/Fluentd/rotating-file-read-file-out.conf—match extract

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
  @id in_monitor_agent
  include_config true 
  tag self
  emit_interval 10s
</source>
 
<match *>
    @type file                             
    path ./Chapter4/fluentd-file-output    
</match>

Changes the plugin type to file

The location of the file to be written

The outcome of using this new match directive can be seen if the LogSimulator and Fluentd are run with the following commands:

  • fluentd -c ./Chapter4/Fluentd/rotating-file-read-file-out.conf

  • groovy LogSimulator.groovy ./Chapter4/SimulatorConfig/jul-log-output2.properties ./TestData/medium-source.txt

The easy presumption would be that all the content gets written to a file called fluentd-file-output. However, what has happened is that a folder is created using the last part of the path as its name (i.e., fluentd-file-output), and you will see two files in that folder: a content file with a semi-random name (to differentiate the different buffer files) and a metadata file with the same base name. What Fluentd has done is to implicitly make use of a buffering mechanism. The adoption of a buffer with a default option is not unusual in output plugins; some do forgo the use of buffering—for example, the stdout plugin.

4.1.2 Basics of buffering

Buffering, as you may recall from chapter 1, is a Fluentd helper plugin. Output plugins need to be aware of the impact they can have on I/O performance. As a result, most output plugins use a buffer plugin that can behave synchronously or asynchronously.

The synchronous approach means that log events are collected into chunks, and as soon as a chunk is full, it is written to storage. The asynchronous approach utilizes an additional queue stage. The queue stage interacting with the output channel is executed in a separate thread, so the filling of chunks shouldn't be impacted by any I/O performance factors.

The previous example did not explicitly define a buffer; we saw the output plugin apply a default of a file buffer. This makes more sense when you realize that the file output plugin can compress the output file using gzip, which becomes more effective the more content is compressed at once.

In figure 4.1, we have numbered the steps. As the arrows' varying paths indicate, steps in the life cycle can be bypassed. All log events start in step 1, but if no buffering is in use, the process immediately moves to step 5, where the physical I/O operation occurs, and then we progress to step 6. If there is an error, step 6 can send the logic back to the preceding step to try again. This is very much dependent upon the plugin implementation but is common in plugins such as those for databases. If the retries fail or the plugin doesn't support the concept, some plugins allow a secondary plugin to be specified.

Figure 4.1 Log event passing through the output life cycle (the italic steps are only used when things go wrong)

A secondary plugin is another output plugin that can be called (step 7). Typically, a secondary plugin would be as simple as possible to minimize the chance of a problem so that the log events could be recovered later. For example, suppose the output plugin called a remote service from the Fluentd node (e.g., on a different network, in a separate server cluster, or even a data center). In that case, the secondary plugin could be a simple file output to a local storage device.

NOTE We would always recommend that a secondary output be implemented with the fewest dependencies on software and infrastructure. Needing to fall back to a secondary plugin strongly suggests broader potential problems. So the more straightforward the secondary is and the fewer dependencies it has on other factors, the less likely its output will be disrupted. Files support this approach very well.
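
For example, here is a minimal sketch of pairing a remote output with a local file fallback. The endpoint URL and directory are placeholders; the <secondary> section and the secondary_file plugin ship with core Fluentd, but check the documentation for the exact attributes your version supports.

<match app.**>
    @type http
    # placeholder remote service that could become unreachable
    endpoint http://log-collector.example.com/ingest
    <buffer>
      flush_interval 10s
      retry_max_times 5
    </buffer>
    <secondary>
      # if all retries fail, write the chunks to local storage instead
      @type secondary_file
      directory ./Chapter4/fallback
      basename undelivered
    </secondary>
</match>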

If buffering has been configured, then steps 1 through 3 would be performed. But then the following action would depend upon whether the buffering was asynchronous. If it was synchronous, then the process would jump to step 5, and we would follow the same steps described. For asynchronous buffering, the chunk of logs goes into a separate process managing a queue of chunks to be written. Step 4 represents the buffer operating asynchronously. As the chunks fill, they are put into a queue structure, waiting for the output mechanism to take each chunk to output the content. This means that the following log event to be processed is not held up by the I/O activity of steps 5 onward.

Understanding gzip compression

Gzip is the GNU implementation of the ZLIB compression format, defined by IETF RFCs 1950, 1951, and 6713. Zip files use an algorithm known as Lempel-Ziv coding (LZ77) to compress the contents. In simple terms, the algorithm works by looking for reoccurring patterns of characters; when a reoccurrence is found, that string is replaced with a reference to the previous occurrence. So the larger the string occurrences identified as reoccurring, the more effective the reference becomes, giving more compression—the bigger the file, the more likely it is to contain reoccurrences.

Out of the box, Fluentd provides the following buffer types:

  • Memory

  • File

As you may have realized, the path is used as the folder location to hold its buffered content and contains both the content and a metadata file. Using file I/O for the buffer doesn’t give much of a performance boost in terms of the storage device unless you establish a RAM disk (aka, RAM drive). A file-based buffer still provides some benefits; the way the file is used is optimized (keeping files open, etc.). It also acts as a staging area to accumulate content before applying compression (as noted earlier, the more data involved in a zip compression, the greater the compression possible). In addition, the logged content won’t be lost as a result of some form of process or hardware failure, and the buffer can be picked back up when the server and/or Fluentd restart.

NOTE RAM drives work by allocating a chunk of memory for storage and then telling the OS’s file system that it is an additional storage device. Applications that use this storage believe they are writing to a physical device like a disk, but the content is actually written to memory. More information can be found at www.techopedia.com/definition/2801/ram-disk.
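
To make the buffer type explicit rather than relying on the plugin's default, the buffer type can be named with @type inside the buffer section. The following is a sketch only; the paths are illustrative, and compress here is the buffer-level attribute covered later in table 4.1.

<match *>
    @type file
    path ./Chapter4/fluentd-file-output
    <buffer>
      # "file" stages chunks on disk; use "memory" for an in-memory buffer
      @type file
      # where the chunk content and metadata files are staged
      path ./Chapter4/buffer-staging
      # apply gzip compression as chunks are written
      compress gzip
    </buffer>
</match>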

With buffers involved in many plugins, either by default or explicitly, we should look at how to configure buffer behavior: when the events move from the buffer to the output destination, how frequently such actions occur, and how these configurations impact performance. The life cycle illustrated in figure 4.1 provides clues as to the configuration possibilities.

4.1.3 Chunks and controlling buffering

As figure 4.1 shows, the buffer's core construct is the idea of a chunk. The way we configure chunks, aside from choosing synchronous or asynchronous behavior, will influence performance. Chunks can be controlled through the allocation of storage space (allowing a reserved piece of contiguous memory or disk to be used) or through a time period. For example, all events during a period go into a single chunk; alternatively, a chunk will continue to fill with events until it holds a specific number of log events or reaches a certain size. Separating the I/O from the chunk filling is beneficial if there is a need to provide connection retries for the I/O, as might be the case with shared services or network remote services, such as databases.

In both approaches, it is possible through the configuration to set attributes so that log events don’t end up lingering in the buffer because a threshold is never fully met. Which approach to adopt will be influenced by the behavior of your log event source and the tradeoff of performance against resources available (e.g., memory), as well as the amount of acceptable latency in the log events moving downstream.

Personally, I tend to use size constraints, which provide predictable system behavior; this may reflect my Java background and preference not to start unduly tuning the virtual machine.

Table 4.1 shows the majority of the possible controls on a buffer. Where the control can have some subtlety in how it can behave, we’ve included more elaboration.

Table 4.1 Buffer configuration controls

Attribute

Description

timekey

This is the number of seconds' worth of log events each chunk will be responsible for holding (by default, 1 day). The time attribute of a log event then determines which chunk to add the event to. For example, if our timekey was set to 300 (seconds) and the chunk started on the hour, then when an event timestamped 10:00:01 arrived and further events arrived every 30 seconds, an additional 9 events would be held in the first chunk. The next chunk would hold events timestamped after 10:05:00, so the next event would be 10:05:01.

If we had additional out-of-order events timestamped before 10:05 (e.g., 10:03:15 and 10:03:55) that didn't arrive until 10:04:31, they would still be added to the first chunk.

This behavior can be further modified by the timekey_wait attribute.

timekey_wait

This is the number of seconds after the end of a chunk’s storage period before the chunk is written. This defaults to 60 seconds.

Extending our timekey example, if this value was set to 60s (60 seconds), then that chunk would be held in memory until 10:06 before being flushed. If another event was received at 10:05:21 with a timestamp of 10:04:49, this would go in our first chunk, not the chunk covering the received time.

chunk_limit_size

This defines the maximum size of a chunk, which defaults to 8 MB for a memory buffer and 256 MB for a file buffer. You are unlikely to need to increase this threshold, but you may consider reducing it to limit the footprint within a container or to fit the constraints of an IoT device. Remember that you can operate multiple chunks.

chunk_limit_records

This defines the maximum number of log events in a single chunk. If log events can fluctuate wildly in size, this will need to be considered: a number of large log events could create a very large chunk, creating risks around memory exhaustion and varying durations for writing chunks.

total_limit_size

This is the limit of storage allowed across all chunks; once it is reached, newly received events are dropped, with an error raised and those events lost accordingly.

This defaults to 512 MB for memory and 64 GB for file.

chunk_full_threshold

Once the proportion of a chunk's capacity used exceeds this value, the chunk is treated as full and moved to the I/O stage. This defaults to 0.95. If log events are very large relative to the allocated chunk size, you may consider lowering this threshold to ensure more predictable performance, particularly if you limit the queue size.

queued_chunks_limit_size

This defines the number of chunks in the queue waiting to be persisted as required. Ideally, this should never be larger than the flush_thread_count.

The default value is 1.

compress

This accepts only the values of either text (default) or gzip. When gzip is set, then compression will be applied. If other compression mechanisms are introduced, the options available will expand.

flush_at_shutdown

This tells the buffer whether it should write everything to the output before allowing Fluentd to shut down gracefully. For the memory buffer, this defaults to true but is false for a file, as the contents can be recovered on startup. We recommend setting it to true in most cases, given that you may not know when Fluentd will restart and process the cached events (if it can).

flush_interval

This is a duration defining how frequently the buffered content should be written to the output storage mechanism. This means we can configure behavior centered on either volumes or time intervals. This defaults to 60 seconds (60s).

flush_mode

Accepted values are

  • default—Uses lazy if chunk keys are defined, otherwise interval

  • lazy—Flush/write chunks once per timekey.

  • interval—Flush/write chunks per specified time via flush_interval.

  • immediate—Flush/write chunks immediately after events are appended into chunks.

flush_thread_count

The number of threads to be used to write chunks. A number above 1 will create parallel threads—this may not be desirable depending on the output type. For example, if a connection pool to a database can handle multiple connections, more than 1 is worth considering. But more than 1 on a file could create contention or write collisions. This defaults to 1.

flush_thread_interval

The length of time the flush thread should sleep before looking to see if a flush is required. Expressed as a number of seconds in floating-point format and defaults to 1.

delayed_commit_timeout

When using the asynchronous I/O, we need to set a maximum time to allow the thread to run before we assume it must be experiencing an error. If this time is exceeded (default 60s), then the thread is stopped. This needs to be tuned to take into account how responsive the target system is. For example, writing large chunks to a remote database will take longer than writing small chunks to a local file system.

overflow_action

If the input into the buffer is faster than we can write content out of the buffer, we will experience an overflow condition. This attribute allows us to define how to address that problem. Options are

  • throw_exception—Throw an exception that will appear in the Fluentd log as a BufferOverflowError; this is the default.

  • block—Block input processing to allow the buffered events to be written, freeing up space.

  • drop_oldest_chunk—Drop the oldest chunk of data to free a chunk up for use.

Throwing exceptions may be okay when an overflow scenario is never expected, and the potential loss of log events is a risk worth taking. But in more critical areas, we would suggest consciously choosing an alternative, such as block.

With this understanding of buffer behavior, we can amend the configuration (or use the prepared one). As we recommend flushing on shutdown, we should set this to true (flush_at_shutdown true). As we want to quickly see the impact of the changes, let's set the maximum number of records to 10 (chunk_limit_records 10) and the maximum time before flushing a chunk to 30 seconds (flush_interval 30). Otherwise, if we have between 1 and 9 log events in the buffer, they'll never get flushed if the sources stop creating log events. Finally, we have added an extra layer of protection in our configuration by imposing a time-out on the buffer write process. We can see this in the following listing.

Listing 4.2 Chapter4/Fluentd/rotating-file-read-file-out2.conf—match extract

<match *>
    @type file
    path ./Chapter4/fluentd-file-output
    <buffer>
      flush_at_shutdown true
      delayed_commit_timeout 10
      chunk_limit_records 10
      flush_interval 30
    </buffer>
</match>

By default, the file buffer doesn’t flush on shutdown, as events won’t be lost by stopping the Fluentd instance. However, it is desirable to see all events completed at shutdown. There is the risk that a configuration change will mean the file buffer isn’t picked up on restart, resulting in log events effectively being in limbo.

As we understand our log content and want to see things happen very quickly, we will use the number of log events rather than storage capacity to control the chunk size, which indirectly influences how soon events are moved from the buffer to the output destination.

As the buildup of self-monitoring events will be a lot slower than our file source, forcing a flush on time as well will ensure we can see these events come through to the output at a reasonable frequency.

To run this scenario, let’s reset (delete the structured-rolling-log.* and rotating-file-read.pos_file files and the fluentd-file-output folder) and run again, using each of these commands in a separate shell:

  • fluentd -c Chapter4/Fluentd/rotating-file-read-file-out2.conf

  • groovy LogSimulator.groovy Chapter4/SimulatorConfig/jul-log-file2.properties ./TestData/medium-source.txt

Do not shut down Fluentd once the log simulator has completed. We will see that the folder fluentd-file-output is still created with both buffer files as before. But at the same time, we will see files with the naming of fluentd-file-output.<date>_<incrementing number>.log (e.g., fluentd-file-output.20200505_12.log). Open any one of these files, and you will see 10 lines of log data. You will notice that the log data is formatted as a date timestamp, tag name, and then the payload body, reflecting the standard composition of a log event. If you scan through the files, you will find incidents where the tag is not simpleFile but self. This reflects that we have kept the self-monitoring source alongside our simpleFile source, which is tracking the rotating log files.

Finally, close down Fluentd gracefully. In Windows, the easiest way to do this is in the shell: press CTRL-c once (and only once), and respond yes to the shutdown prompt (in Linux, the interrupt events can be used). Once we can see that Fluentd has shut down in the console, look for the last log file and examine it. There is an element of timing involved, but if you inspect the last file, chances are it will have less than 10 records in it, as the buffer will have flushed to the output file whatever log events it had at shutdown.

Buffer sizing error

If you set a buffer to be smaller than a single log event, then handling that log event will fail, with an error like

emit transaction failed: error_class=Fluent::Plugin::Buffer
::BufferChunkOverflowError error="a 250bytes record is larger than
buffer chunk limit size" location="C:/Ruby26-x64/lib/ruby/gems/2.6.0/
gems/fluentd-1.9.3-x64-mingw32/lib/fluent/plugin/buffer.rb:711:in `block
in write_step_by_step'"

4.1.4 Retry and backoff

The use of buffers also allows Fluentd to provide a retry mechanism. In the event of issues like transient network drops, we can tell the buffer to perform a retry when it recognizes an issue, rather than just losing the log events. For retries to work without creating new problems, we need to define controls that tell the buffer how long or how many times to retry before abandoning the data. In addition to this, we can define how long to wait before retrying. We can stipulate retrying forever (attribute retry_forever set to true), but we recommend using such an option with great care.

There are two ways for retry to work using the retry_type attribute—through either a fixed interval (periodic) or through an exponential backoff (exponential_backoff). The exponential backoff is the default model, and each retry attempt that fails results in the retry delay doubling. For example, if the retry interval was 1 second, the second retry would be 2 seconds, the third retry 4 seconds, and so on. We can control the initial or repeated wait period between retries by defining retry_wait with a numeric value representing seconds (e.g., 1 for 1 second and 60 for 1 minute).

Unless we want to retry forever, we need to provide a means to determine whether or not to keep retrying. For the periodic retry model, we can control this by number or time. This is done by either setting a maximum period to retry writing each chunk (retry_timeout) or a maximum number of retry attempts (retry_max_times).

For the backoff approach, we can control how quickly the delay grows (retry_exponential_backoff_base, which defaults to 2 and gives the doubling behavior just described) and cap how large the interval between retries can grow (retry_max_interval). The total number of attempts can still be bounded with retry_max_times.

Suppose we wanted to configure the buffer retry to be an exponential backoff starting at 3 seconds. With a maximum of 10 attempts and the default doubling behavior, we could end up with a peak retry interval of nearly 26 minutes (3 x 2^9 = 1,536 seconds). The configuration attributes we would need to set are

retry_wait 3
retry_max_times 10

The important thing with the exponential backoff is to ensure you're aware of the possible total time. Once that exponential curve gets going, the time extends very quickly. In this example, the first five retries would happen within a couple of minutes, but the intervals really start to stretch out after that.
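
The retry attributes sit inside the buffer section alongside the flush controls. As a sketch only, based on the file output used earlier (the values are illustrative):

<match *>
    @type file
    path ./Chapter4/fluentd-file-output
    <buffer>
      flush_interval 30
      # retry with an exponentially growing delay, starting at 3 seconds
      retry_type exponential_backoff
      retry_wait 3
      # give up on a chunk after 10 attempts
      retry_max_times 10
      # never wait more than 5 minutes between attempts
      retry_max_interval 300
    </buffer>
</match>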

4.1.5 Putting configuring buffering size settings into action

You’ve been asked to help the team better understand buffering. There is an agreement that an existing configuration should be altered to help this. Copy the configuration file /Chapter4/Fluentd/rotating-file-read-file-out2.conf and modify it so that the configuration of the buffer chunks is based on the size of 500 bytes (see appendix B for how to express storage sizes).

Run the modified configuration to show the impact on the output files.

As part of the discussion, one of the questions that came up is if the output plugin is subject to intermittent network problems, what options do we have to prevent the loss of any log information?

Answer

The following listing shows the buffer configuration that would be included in the result.

Listing 4.3 Chapter4/ExerciseResults/rotating-file-read-file-out2-Answer.conf

<match *>
    @type file
    @id bufferedFileOut
    path ./Chapter4/fluentd-file-output
    <buffer>
      delayed_commit_timeout 10
      flush_at_shutdown true
      chunk_limit_size 500               
      flush_interval 30
    </buffer>
</match>

Note that we’ve retained the delay and flush time, so if the buffer stops filling, it will get forced out anyway.

Size-based constraint rather than time-based

A complete configuration file is provided at Chapter4/ExerciseResults/rotating-file-read-file-out2-Answer.conf. The rate of change to the log files is likely to appear different, but the same content will be there. To address the question of mitigating the risk of log loss, several options could be applied:

  • Configure the retry and backoff parameters on the buffer to retry storing the events rather than losing the information.

  • Use the capability of defining a secondary log mechanism, such as a local file, so the events are not lost. Provide a means for the logs to be injected into the downstream system at a later date. This could even be an additional source.

4.2 Output formatting options

How we structure the output of the log events is as important as how we apply structure to the input. Not surprisingly, a formatter plugin can be included in the output plugin, and it's reasonable to expect several prebuilt formatters to be available. Let's look at the out-of-the-box formatters typically encountered; the complete set of formatters is detailed in appendix C.

4.2.1 out_file

This is probably the simplest formatter available and has already been implicitly used. The formatter works using a delimiter between the values time, tag, and record. By default, the delimiter is a tab character. This can only be changed to a comma or space character using the values comma or space for the attribute delimiter; for example, delimiter comma. Which fields are output can also be controlled with Boolean-based attributes:

  • output_tag—This takes a true or false value to determine whether the tag(s) are included in the line (by default, the second value in the line).

  • output_time—This takes true or false to define whether the time is included (by default, the first value in the line). You may wish to omit the time if you have it incorporated into the core event record.

  • time_format—This can be used to define the way the date and time are used in the path. If undefined and the timekey is set, then the timekey will inform the structure of the format. Specifically:

    • 0...60 seconds then '%Y%m%d%H%M%S'
    • 60...3600 seconds then '%Y%m%d%H%M'
    • 3600...86400 seconds then '%Y%m%d%H'

NOTE If the formatter does not recognize the attribute value set, it will ignore the value provided and use the default value.
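
Pulling those attributes together, a sketch of an out_file formatter block might look like the following (the time format string shown is purely illustrative):

<format>
  @type out_file
  delimiter space
  output_tag false
  output_time true
  time_format %Y-%m-%dT%H:%M:%S%z
</format>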

4.2.2 json

This formatter treats the log event record as a JSON payload on a single line. The timestamp and tags associated with the event are discarded.

4.2.3 ltsv

As with the out_file formatter and the ltsv parser, the delimiters can be changed using delimiter (separating each labeled value) and label_delimiter (separating the label from its value). For example, if delimiter was set to ; and label_delimiter to =, then a record expressed as {"my1stValue":"blah", "secondValue":"more blah", "thirdValue":"you guessed – blah"} would become my1stValue=blah;secondValue=more blah;thirdValue=you guessed – blah.

As the values are label values, the need for separating records by lines is reduced, and therefore it is possible to switch off the use of new lines to separate each record with add_newline false (by default, the value is set to true).
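
A sketch of that ltsv formatter configuration as a format block inside a match directive, using the delimiters from the example above:

<format>
  @type ltsv
  delimiter ;
  label_delimiter =
  # don't append a newline after each record
  add_newline false
</format>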

4.2.4 csv

Just like ltsv and out_file, the delimiter can be defined by setting the attribute delimiter, which defaults to a comma. Additionally, the csv output allows us to define which values are included in the output using the fields attribute. Using the same record again, if the event is {"my1stValue":"blah", "secondValue":"more blah", "thirdValue":"you guessed – blah"} and we set the fields attribute to secondValue,thirdValue, then the output would be "more blah","you guessed – blah". If desired, the quoting of each value can be disabled by setting force_quotes to false.
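
As a sketch, the corresponding csv format block might look like this (the field names follow the example record above):

<format>
  @type csv
  # only emit these two fields from the record
  fields secondValue,thirdValue
  delimiter ,
  # set to false to drop the quotes around each value
  force_quotes true
</format>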

4.2.5 msgpack

As with the msgpack parser, the formatter works with the MessagePack framework, which takes the core log event record and uses the MessagePack library to compress the content. Typically, we would only expect to see this used with HTTP forward output plugins where the recipient expects MessagePack content. To get similar compression performance gains, we can use gzip for file and block storage.

NOTE You may have noticed that most formatters (out_file being an exception) omit adding the time and keys from the log event. As a result, if you want those to be carried through, it will be necessary to ensure that they are incorporated into the log event record or that the output plugin utilizes the values explicitly. Adding additional data into the log event payload can be done using the inject plugin, which can be used within a match or filter directive. We will pick up the inject feature when we address filtering in chapter 6.

4.2.6 Applying formatters

We can extend our existing configuration to move from the current implicit configuration to being explicit. Let's start with the simplest output using the default out_file formatter, using a comma as a delimiter (delimiter comma) and excluding the log event tag (output_tag false). We will continue to use the same sources as before to demonstrate the effect of formatters. This out_file formatter configuration is illustrated in the following listing.

Listing 4.4 Chapter4/Fluentd/rotating-file-read-file-out3.conf—formatter configuration

<match *>
    @type file
    @id bufferedFileOut
    path ./Chapter4/fluentd-file-output
    <buffer>
      delayed_commit_timeout 10
      flush_at_shutdown true
      chunk_limit_records 50
      flush_interval 30
      flush_mode interval
    </buffer>
    <format>
      @type out_file
      delimiter comma
      output_tag false
    </format>
</match>

To reduce the number of files being generated, we have made the number of records a lot larger per file.

We explicitly define the output formatter to be out_file, as we want to override the default formatter behavior.

Replace the tab delimiter with a comma.

Exclude the tag information from the output.

Assuming the existing log files and outputs have been removed, we can start the example using the following commands:

  • fluentd -c Chapter4/Fluentd/rotating-file-read-file-out3.conf

  • groovy LogSimulator.groovy Chapter4/SimulatorConfig/jul-log-file2.properties ./TestData/medium-source.txt

4.2.7 Putting JSON formatter configuration into action

Your organization has decided that as standard practice, all output should be done using JSON structures. This approach ensures that any existing or applied structural meaning to the log events is not lost. To support this goal, the configuration file /Chapter4/Fluentd/rotating-file-read-file-out3.conf will need modifying.

Answer

The format declaration in the configuration file should be reduced to look like the fragment in the following listing.

Listing 4.5 Chapter4/ExerciseResults/rotating-file-file-out3-Answer.conf

<format>
@type json
</format>

A complete example configuration to this answer can be found in /Chapter4/ExerciseResults/rotating-file-read-file-out3-Answer.conf.

4.3 Sending log events to MongoDB

While outputting to some form of file-based storage is a simple and easy way of storing log events, it doesn’t lend itself well to performing any analytics or data processing. To make log analysis a practical possibility, Fluentd needs to converse with systems capable of performing analysis, such as SQL or NoSQL database engines, search tools such as Elasticsearch, and even SaaS services such as Splunk and Datadog.

Chapter 1 highlighted that Fluentd has no allegiance to any particular log analytics engine or vendor, differentiating Fluentd from many other tools. As a result, many vendors have found Fluentd to be an attractive solution to feed log events to their product or service. To make adoption very easy, vendors have developed their adaptors.

We have opted to use MongoDB to help illustrate the approach for getting events into a tool capable of analytical processing logs. While MongoDB is not dedicated to textual search like Elasticsearch, its capabilities are well suited to log events with a good JSON structure. MongoDB is very flexible and undemanding in getting started, so don’t worry if you’ve not used MongoDB. The guidance for installing MongoDB can be found in appendix A.

Summary of MongoDB

We do not want to get too sidetracked by the mechanics of MongoDB, but it’s worth summarizing some basic ideas. Most readers will be familiar with relational databases and their concepts. The role of the database structure in both MongoDB and a relational DB is analogous. Within a database schema is a set of tables. In MongoDB, the nearest to this is a collection. Unlike a relational database, a collection can contain pretty much anything, which is why it is sometimes described as using a document model. You could say that each entry in a collection is roughly comparable to a record in a table containing a DB-assigned ID and a BLOB (binary large object) or text data type. Commonly the row or document of MongoDB is a structured text object, usually JSON. MongoDB can then search and index parts of these structures, making for a flexible solution. For Fluentd, it means we can store log events that may have different record structures.

More recent versions of the MongoDB engine provide the means to validate the content structure going into a collection. This provides some predictability in the content. If this is used, then we can exploit Fluentd to structure the necessary payload.

In addition to controlling how strictly the contents must adhere to a structure for each document, Mongo also incorporates the idea of capped size. This feature allows us to set limits on the amount of storage used by the collection, and the collection operates as a FIFO (first in, first out) list.

When you want to empty a table in a relational database, it is often easier to simply drop and re-create the table. The MongoDB equivalent is to delete the collection; however, if this is the only collection within the database, the database will be removed by MongoDB. There are two options: only delete the collection’s contents and keep the collection, or create a second empty collection so the database is not empty.

You can discover a lot more with the book MongoDB in Action by Kyle Banker, et al. (www.manning.com/books/mongodb-in-action-second-edition).

4.3.1 Deploying MongoDB Fluentd plugin

The MongoDB plugin is provided in the Treasure Data Agent build of Fluentd but not in the vanilla deployment. If we want Fluentd to work with MongoDB, we need to install the RubyGem if it isn't already in place.

To determine whether the installation is needed and to perform installations of gems, we can use fluent-gem, a wrapper utility that leverages the RubyGems tool. To see if the gem is already installed, run the command fluent-gem list from the command line. The command will show the locally installed gems, which include Fluentd and its plugins. At this stage, there should be no indication of a fluent-plugin-mongo gem. We can, therefore, perform the installation using the command fluent-gem install fluent-plugin-mongo. This will retrieve and install the latest stable instance of the gem, including the documentation and dependencies, such as the MongoDB driver.

4.3.2 Configuring the Mongo output plugin for Fluentd

Within a match, we need to reference the mongo plugin and set the relevant attributes. Like connecting to any database, we will need to provide an address (which could be achieved via a host [name] and port [number on the network]) and user and password, or via a connection string (e.g., mongodb://127.0.0.1:27017/Fluentd). In our example configuration, we have adopted the former approach and avoided the issue of credentials. The database (schema) and collection (table comparable to a relational table) are needed to know where to put the log events.

Within a MongoDB output, there is no use of a formatter, as the MongoDB plugin assumes all content is already structured. As we have seen, without configuration, a default buffer will be adopted. As before, we’ll keep the buffer settings to support only low volumes so we can see things changing quickly. We can see the outcome in the following listing.

Listing 4.6 Chapter4/Fluentd/rotating-file-read-mongo-out.conf—match configuration

<match *>
    @type mongo
    @id mongo-output
    host localhost
    port 27017
    database Fluentd
    collection Fluentd
    <buffer>
      delayed_commit_timeout 10
      flush_at_shutdown true
      chunk_limit_records 50
      flush_interval 30
      flush_mode interval
    </buffer>
</match>

Mongo is the plugin name.

Identifies the MongoDB host server. In our dev setup, that’s simply the local machine; this could alternatively be defined by using a connection attribute.

The port on the target server to communicate with. We can also express the URI by combining host, port, and database into a single string.

As a MongoDB installation can support multiple databases, we need to name the database.

The collection within the database we want to add log events to—a rough analogy to an SQL-based table

Buffer configuration to ensure the log events are quickly incorporated into the MongoDB

Mongo vs Mongo replica set plugins

If you’ve looked at the Fluentd online documentation, you may have observed two output plugins, out_mongo and out_mongo_replset. The key difference is that the replset (replica set) can support MongoDB’s approach to enable scaling, where additional instances of Mongo can be defined as replicas of a master. When this happens, the ideal model is to directly write activity to the master, but read from the replicas. In terms of configuration differences, a comma-separated list of nodes is needed rather than naming a single host. Each node represents a node in the replica group (e.g., nodes 192.168.0.10:27017, 192.168.0.20:27017, 192.168.0.30:27017). The replica set name is also needed (e.g., replica_set myFluentReps). More information on the Mongo replica set mechanism is documented at https://docs.mongodb.com/manual/replication/.

Ensure that the MongoDB instance is running (this can be done with the command mongod --version in a shell window or by attempting to connect to the server using the Compass UI). If the MongoDB server isn’t running, then it will need to be started. The simplest way to do that is to run the command mongod.

With Mongo running, we can then run our simulated logs and Fluentd configuration with the commands in each shell:

  • groovy LogSimulator.groovy Chapter4/SimulatorConfig/jul-log-file2-exercise.properties ./TestData/medium-source.txt

  • fluentd -c Chapter4/fluentd/rotating-file-read-mongo-out.conf

Mongo plugin startup warning

When Fluentd is started up with the MongoDB plugin, it will log the following warning: [mongo-output]. Since v0.8, invalid record detection will be removed because mongo driver v2.x and API spec don't provide it. You may lose invalid records, so you should not send such records to the Mongo plugin.

This essentially means that the MongoDB driver being used does not impose any structural checks on the payload (which the collection may be configured to require). As a result, if strong checking is being applied by the MongoDB engine, it may drop an update, but the information will not get passed back through the driver, so Fluentd will be none the wiser, resulting in data loss. For the more common application of Fluentd, it is better not to impose strict checks onto the payload. If this isn’t an option, then a filter directive could identify log events that will fail MongoDB’s checks.

Figure 4.2 MongoDB is viewed through the Compass UI tool, with logged content.

Viewing log events in MongoDB

To see how effective MongoDB can be for querying the JSON log events, if you add the expression {"msg" : {$regex : ".*software.*"}} into the FILTER field and click FIND, we’ll get some results. These results will show the log events where the msg contains the word software. The query expression tells MongoDB to look in the documents, and if they have a top-level element called msg, then evaluate this value using the regular expression.

If you examine the content in MongoDB, you will see that the content stored is only the core log event record, not the associated time or tags.

To empty the collection for further executions of the scenario in a command shell, run the following command:

mongo Fluentd --eval "db.Fluentd.remove({})"

The mongo plugin has a couple of other tricks up its sleeve. When the attribute tag_mapped is included in the configuration, the tag name is used as the collection name, and MongoDB will create the collection if it does not exist. This makes it extremely easy to separate the log events into different collections. If the tag names have been used hierarchically, then the tag prefix can be removed to simplify the tag_mapped feature. This can be defined by the remove_tag_prefix attribute, which takes the name of the prefix to remove.

As collections can be established dynamically, it is possible to define characteristics within the configuration; for example, whether the collection should be capped in size.
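
A sketch of these options together follows. The tag pattern and sizes are illustrative, and the capped and capped_size attributes are taken from the mongo plugin's documentation, so check them against the plugin version you have installed.

<match app.**>
    @type mongo
    host localhost
    port 27017
    database Fluentd
    # use the (remaining) tag as the collection name
    tag_mapped
    # strip the leading "app." from the tag before mapping it
    remove_tag_prefix app.
    # fallback collection for events whose tag can't be mapped
    collection misc
    # create collections capped at roughly 100 MB
    capped
    capped_size 100m
</match>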

In this configuration, we have not formally defined any username or password. This is because in the configuration of MongoDB, we have not imposed the credentials restrictions. And incorporating credentials in the Fluentd configuration file is not the best practice. Techniques for safely handling credentials within Fluentd configuration, not just for MongoDB but also for other systems needing authentication, are addressed in chapter 7.

4.3.3 Putting MongoDB connection configuration strings into action

Revise the configuration to define the connection by a connection attribute, not the host, port, and database used. This is best started by copying the configuration file Chapter4/fluentd/rotating-file-read-mongo-out.conf. Adjust the run commands to use the new configuration. The commands should look something like this:

  • groovy LogSimulator.groovy Chapter4/SimulatorConfig/jul-log-file2-exercise.properties ./TestData/medium-source.txt

  • fluentd -c Chapter4/fluentd/my-rotating-file-read-mongo-out.conf

Answer

The configuration of the MongoDB connection should look like the configuration shown in the following listing.

Listing 4.7 Chapter4/ExerciseResults/rotating-file-read-mongo-out-Answer.conf

<match *>
    @type mongo
    @id mongo-output
    connection_string mongodb://localhost:27017/Fluentd
    collection Fluentd
    <buffer>
      delayed_commit_timeout 10
      flush_at_shutdown true
      chunk_limit_records 50
      flush_interval 30
      flush_mode interval
    </buffer>
</match>

Note the absence of the host, port, and database properties in favor of this. You could have used your host’s specific IP or 127.0.0.1 instead of localhost.

The example configuration can be seen at Chapter4/ExerciseResults/rotating-file-read-mongo-out-Answer.conf

Before we move on from MongoDB, note that we have focused on using MongoDB as an output so that it can be used to query log events. But MongoDB can also act as an input into Fluentd, allowing new records added to MongoDB to be retrieved as log events.
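
As a sketch only, the same fluent-plugin-mongo gem provides a mongo_tail input that can follow a collection; the attribute names here should be confirmed against the plugin documentation for your version:

<source>
  @type mongo_tail
  host localhost
  port 27017
  database Fluentd
  collection Fluentd
  # tag to apply to the log events created from new documents
  tag mongo.events
</source>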

4.4 Actionable log events

In chapter 1, we introduced the idea of making log events actionable. So far, we have seen ways to unify the log events. To make log events actionable, we need a couple of elements:

  • The ability to send events to an external system that can trigger an action, such as a collaboration/notification platform where Ops people can see and react to log events, or even a script or tool that can be invoked to perform an action

  • Separating the log events that need to be made actionable from those that provide information but require no specific action

The process of separating or filtering out events that need to be made actionable is addressed later in the book. But here, we can look at how to make events actionable without waiting until all the logs have arrived in an analytics platform and running its analytics processes.

4.4.1 Actionable log events through service invocation

One way to make a log event actionable is to invoke a separate application that can perform the necessary remediation as a result of receiving an API call. This could be as clever as invoking an Ansible Tower REST API (http://mng.bz/Nxm1) to initiate a template job that performs some housekeeping (e.g., moving logs to archive storage or telling Kubernetes about the internal state of a pod). We would need to control how frequently an action is performed; plugins such as flow_counter and notifier can help. To invoke generic web services, we can use the HTTP output plugin, which is part of the core of Fluentd. To give a sense of the art of the possible, here is a summary of the general capabilities supported by this plugin (a minimal configuration sketch follows the list):

  • Supports HTTP post and put operations

  • Allows proxy in the routing

  • Configures content type, setting the formatter automatically (a formatter can also be defined explicitly)

  • Defines headers so extra header values required can be defined (e.g., API keys)

  • Configures the connection to use SSL and TLS certificates, including defining the location of the certificates to be used, versions, ciphers to be used, etc.

  • Creates log errors when nonsuccessful HTTP code responses are received

  • Supports basic authentication (no support for OAuth at the time of writing)

  • Sets time-outs

  • Uses buffer plugins
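
As a sketch only, a match directive using the core http output might look like the following. The endpoint URL and API key header are placeholders, and the attribute names should be checked against the out_http documentation for your Fluentd version.

<match alerts.**>
    @type http
    # placeholder endpoint for an automation/remediation service
    endpoint http://ops-automation.example.com/api/v1/remediate
    http_method post
    # placeholder header carrying an API key
    headers {"x-api-key":"an-example-key"}
    <format>
      @type json
    </format>
    <buffer>
      flush_interval 10s
    </buffer>
</match>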

4.4.2 Actionable through user interaction tools

While automating problem resolution in the manner described takes us toward self-healing systems, not many organizations are prepared for or necessarily want to be this advanced. They would rather trust a quick human intervention to determine cause and effect than rely on an automated diagnosis where a high degree of certainty can be difficult. People with sufficient knowledge and the appropriate information can quickly determine and resolve such issues. As a result, Fluentd has a rich set of plugins for social collaboration mechanisms, such as Slack, Microsoft Teams, Jabber, and many other messaging and notification services (we will work with Slack shortly).

Obviously, we do need to include the relevant information in the social channel communications. A range of things can be done to help that, from good log events that are clear and can be linked to guidance for resolution, to Fluentd being configured to extract relevant information from log events to share.

4.5 Slack to demonstrate the social output

Slack has become a leading team messaging collaboration tool with a strong API layer and a free version that is simply limited by the size of conversation archives. As a cloud service, it is a great tool to illustrate the intersection of Fluentd and actionable log events through social platforms. While the following steps are specific to Slack, the principles involved here are the same for Microsoft Teams, Jabber, and many other collaboration services.

If you are already using Slack, it is tempting to use an existing group to run the example. To avoid irritating the other users of your Slack workspace as a result of them receiving notifications and messages as you fill channels up with test log events, we suggest setting up your own test workspace. If you aren’t using Slack, that’s not a problem; in appendix A, we explain how to get a Slack account and configure it to be ready for use. Ensure you keep a note of the API token during the Slack configuration, recognizable by the prefix xoxb.

Slack provides a rich set of configuration options for the interaction. Unlike MongoDB and many IaaS- and PaaS-level plugins, as Slack is a SaaS service, the resolution of the Slack instance is both simplified and hidden from us by using a single token (no need for server addresses, etc.). The username isn’t about credentials, but how to represent the bot that the Fluentd plugin behaves as; therefore, using a meaningful name is worthwhile. The channel relates to the Slack channel in which the messages sent will be displayed. The general channel exists by default, but you may wish to create a custom channel in Slack and restrict access to that channel if you want to control who sees the messages. After all, do you want everyone within an enterprise Slack setup to see every operational message?

The message and message_keys attributes work together with the message using %s to indicate where the values of the identified payload elements are inserted. The references in the message relate to the JSON payload elements listed in the message_keys in order sequence.

The title and title_keys work in a similar way to message and message_keys, but for the title displayed in the Slack UI with that message. In our case, we’re just going to use the tag. The final part is the flush attribute; this tells the plugin how quickly to push the Slack messages to the user. Multiple messages can be grouped up if the period is too long. To keep things moving quickly, let’s flush every second.

Edit the existing configuration provided (Chapter4/Fluentd/rotating-file-read-slack-out.conf) to incorporate the details captured in the Slack setup. This is illustrated in the following listing.

Listing 4.8 Chapter4/Fluentd/rotating-file-read-slack-out.conf—match configuration

<match *>
   @type slack
   token xoxb-9999999999999-999999999999-XXXXXXXXXXXXXXXXXXXXXXXX
   username UnifiedFluent
   icon_emoji :ghost:
   channel general
   message Tell me if you've heard this before - %s
   message_keys msg
   title %s
   title_keys tag
   flush_interval 1s
</match>

This is where the token obtained from Slack is put to correctly identify the workspace and legitimize the connection.

Username that will be shown in the Slack conversation

Defines an individual emoji (icon) to associate the bot

Which channel to place the messages into, based on the name. In our demo, we’re just using the default channel.

Message to be displayed, referencing values in the order provided in the configuration. The message attribute works in tandem with the message_keys attribute.

Names the log event's JSON elements to be included in the message. The named elements are substituted into the message text's %s placeholders in order.

Title for the message takes the values from title_keys. Using order to map between the config values.

Definition of the message’s title

Defines the frequency at which Fluentd will get Slack to publish the log events

Before running the solution, we also need to install the Slack plugin with the command fluent-gem install fluent-plugin-slack. Once that is installed, we can start up the log simulator and Fluentd with the following commands:

  • fluentd -c Chapter4/Fluentd/rotating-file-read-slack-out.conf

  • groovy LogSimulator.groovy Chapter4/SimulatorConfig/social-logs.properties ./TestData/small-source.txt

Once this is started, if you open the #general channel in the web client or app, you will see messages from Fluentd flowing through.

All the details for the Slack plugin can be obtained from https://github.com/sowawa/fluent-plugin-slack. Our illustration of Slack use is relatively straightforward (figure 4.3). By using several plugins, we can quickly go from the source tag to routing the Slack messages to the most relevant individual or group directly. Alternatively, we have different channels for each application and can direct the messages to those channels.

Figure 4.3 Our Fluentd log events displayed in Slack

4.5.1 Handling tokens and credentials more carefully

For a long time, good security practice has told us we should not hardwire code and configuration files with credentials, as anyone can look at the file and obtain sensitive credentials. In fact, if you commit such a file to GitHub, it will flag to the service provider any configuration file or code that contains a string that looks like a security token for a service GitHub knows about. When Slack is notified and decides it is a valid token, then be assured that it will revoke the token.

So how do we address such a problem? There are a range of strategies, depending upon your circumstances; a couple of options include the following:

  • Set the sensitive credentials up in environment variables with a limited session scope. The environment variable can be configured in several ways, such as with tools like Chef and Puppet setting up the values from a keystore.

  • Embed a means to access a keystore or a secrets management solution, such as HashiCorp’s Vault, into the application or configuration.

Based on what we have seen of configuration files so far in the book, it may not look like it is possible to secure credentials. However, we can achieve both of these approaches for securely managing credentials within a Fluentd configuration file, as Fluentd allows us to embed Ruby fragments into the configuration. This doesn't mean we need to immediately learn Ruby. The approach of embedding calls to Vault is more challenging but can be done. For the first approach, we just need to understand a couple of basic patterns; for example, the Slack token can be read from an environment variable like this:

   token "#{ENV['slack-token']}"

4.5.2 Externalizing Slack configuration attributes in action

The challenge is to set up your environment so that you have an environment variable called slack-token, which is set to hold the token you have previously obtained. Then customize Chapter4/Fluentd/rotating-file-read-slack-out.conf to use the environment variable, and rerun the example setup with the commands

  • fluentd -c Chapter4/Fluentd/rotating-file-read-slack-out.conf

  • groovy LogSimulator.groovy Chapter4/SimulatorConfig/social-logs.properties ./TestData/small-source.txt

Confirm that log events are arriving in Slack.

Answer

By setting up the environment variable, you'll have created a command that looks like

set slack-token=xoxb-9999999999999-999999999999-XXXXXXXXXXXXXXXXXXXXXXXX

for Windows, or

export slack-token=xoxb-9999999999999-999999999999-XXXXXXXXXXXXXXXXXXXXXXXX

for Linux.

The configuration will now have changed to look like the example in the following listing.

Listing 4.9 Chapter4/rotating-file-read-slack-out-Answer.conf—match configuration

<match *>
   @type slack
   token "#{ENV['slack-token']}"
   username UnifiedFluent  
   icon_emoji :ghost:  
   channel general  
   message Tell me if you've heard this before - %s
   message_keys msg  
   title %s 
   title_keys tag 
   flush_interval 1s 
</match>

4.6 The right tool for the right job

In chapter 1, we highlighted the issue of different people wanting different tools for a range of reasons, such as the following:

  • Performing log analytics with different tools, as different tools have different strengths and weaknesses

  • Working across multiple clouds, where specialist teams (and the cost considerations of network traffic) mean using different cloud vendors' tools

  • Individual preferences and politics (previous experience, etc.) influencing decisions

As we have illustrated, Fluentd can support many social platforms and protocols. Of course, this wouldn’t be the only place for log events to be placed. One of the core types of destination is a log analytics tool or platform. Fluentd has a large number of plugins to feed log analytics platforms; in addition to the two we previously mentioned, other major solutions that can be easily plugged in include

  • Azure Monitor

  • Graphite

  • Elasticsearch

  • CloudWatch

  • Google Stackdriver

  • Sumo Logic

  • Logz.io

  • Oracle Log Analytics

Then, of course, we can send logs to a variety of data storage solutions to hold for later use or perform data analytics with; for example:

  • Postgres, InfluxDB, MySQL, Couchbase, DynamoDB, Aerospike, SQL Server, Cassandra

  • Kafka, AWS Kinesis (time series store/event streaming)

  • Storage areas such as AWS S3, Google Cloud Storage, Google BigQuery, WebHDFS

So, the question becomes, what are my needs and which tool(s) fit best? If our requirements change over time, then we add or remove our targets as needed. Changing the technology will probably raise more challenging questions about what to do with our current log events, not how to get the data into the solution.

Summary

  • Fluentd has an extensive range of output plugins covering files, other Fluentd nodes, relational and document databases such as MongoDB, Elasticsearch, and so on.

  • Plugin support extends beyond analytics and storage solutions to collaboration and notification tools, such as Slack. This allows Fluentd to drive more rapid reactions to significant log events.

  • Fluentd provides some powerful helper plugins, including formatters and buffers, making log event output configuration very efficient and easy to use.

  • Log events can be made easy to consume by tools such as analytics and visualization tools. Fluentd provides the means to format log events using formatter plugins, such as out_file and json.

  • Buffer helper plugins can support varying life cycles depending on the need, from the simple synchronous cache to the fully asynchronous. With this, the buffer storage can be organized by size or number of log events.

  • Buffers can be configured to flush their contents not just on shutdown, but also on other conditions, such as events having been held in the buffer for a defined period.
