DSL for generating reports from logfiles

In this recipe, we will give another DSL example by constructing a simple configuration language for analyzing logfiles and generating reports based on their content. The technique used in this recipe is similar to the one used in the DSL for executing commands over SSH recipe.

Getting ready

Let's consider having the following performance log data:

execution of getCustomerName took 244ms
execution of getCustomerName took 144ms
execution of getAccountNumber took 44ms
execution of getCustomerName took 244ms
execution of getCustomerName took 24ms
execution of getAccountNumber took 112ms
execution of getCustomerName took 200ms
execution of getCustomerName took 22ms
...

The goal is to calculate the average and total time spent in each method. Of course, we could have written a very simple script to reach the same result, but our purpose is to create a DSL that allows us to parse arbitrary logfile formats and extract both grouped and aggregated numeric information from them. A reasonable DSL may look like the following code snippet:

format '^execution of (\\w+) took (\\d+)ms$'
column 1, 'methodName'
column 2, 'duration'

source('PerformanceData2012') {
  localFile 'log1.log'
  localFile 'log2.log'
}
report('Duration') {
  avg 'duration'
  sum 'duration'
  groupBy 'methodName'
}

We will try to define the language so that it looks exactly like this example.

The first expression defines the log line format as a regular expression; the column expressions then map the regular expression groups to column names, which are used later to refer to the log data inside the report definition. The report definition contains a list of calculated values (the average and the sum of duration) and the column to group the report data by. Another important building block of the DSL is the definition of the data source.
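
Before building the DSL, it helps to see how the format expression and the column mappings relate to a single log line. The following stand-alone snippet (our own, not part of the recipe) shows the extraction:

// The regular expression groups become the DSL columns:
// group 1 -> methodName, group 2 -> duration.
def format = '^execution of (\\w+) took (\\d+)ms$'
def line = 'execution of getCustomerName took 244ms'
def fields = (line =~ format)[0]

assert fields[1] == 'getCustomerName'
assert fields[2] == '244'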

How to do it...

To define our internal DSL, we first need to define its building blocks, that is, the data structures that compose our mini language:

  1. The first step is to define the report data structure:
    class Report {
    
      def name
    
      def sumColumns = [] as Set
      def avgColumns = [] as Set
      def groupByColumns = [] as Set
    
      Report(String name) {
        this.name = name
      }
    
      void sum(String columnName) {
        sumColumns << columnName
      }
    
      void avg(String columnName) {
        avgColumns << columnName
      }
    
      void groupBy(String columnName) {
        groupByColumns << columnName
      }
    }
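     As a quick check (this snippet is our own and not part of the recipe), note that the Set-backed fields simply ignore duplicate column names:
    def report = new Report('Duration')
    report.avg('duration')
    report.sum('duration')
    report.sum('duration')        // duplicate, ignored by the Set
    report.groupBy('methodName')

    assert report.sumColumns == ['duration'] as Set
    assert report.groupByColumns == ['methodName'] as Set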
  2. We also define the data source structure:
    class Source {
    
      def name
      def files = [] as Set
    
      Source(String name) {
        this.name = name
      }
    
      void localFile(File file) {
        if (file) {
          files << file.absoluteFile.canonicalFile
        }
      }
    
      void localFile(String file) {
        localFile(new File(file))
      }
    }
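     As another quick, hedged check (again our own snippet), adding the same path twice results in a single entry, because paths are canonicalized and stored in a Set:
    def source = new Source('PerformanceData2012')
    source.localFile('log1.log')
    source.localFile('log1.log')           // same canonical file, ignored
    source.localFile(new File('log2.log'))

    assert source.files.size() == 2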
  3. Then we compose a common configuration object, which holds the sources, reports, format, and column mapping data:
    class Configuration {
    
      def format
    
      private final columnNames = [:]
      private final columnIndexes = [:]
      private final sources = [:]
      private final reports = [:]
    
      private static int sourceCounter = 0
      private static int reportCounter = 0
    
      void format(String format) {
        this.format = format
      }
    
      void column(int group, String name) {
        columnNames[group] = name
        columnIndexes[name] = group
      }
      void source(Closure cl) {
        def generatedName = "source${sourceCounter++}"
        source(generatedName, cl)
      }
    
      void source(String name, Closure cl) {
        Source source = new Source(name)
        cl.delegate = source
        cl.resolveStrategy = Closure.DELEGATE_FIRST
        cl()
        sources[name] = source
      }
    
      void report(Closure cl) {
        def generatedName = "report${reportCounter++}"
        report(generatedName, cl)
      }
    
      void report(String name, Closure cl) {
        Report report = new Report(name)
        cl.delegate = report
        cl.resolveStrategy = Closure.DELEGATE_FIRST
        cl()
        reports[name] = report
      }
    }
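     To see the Configuration class in action on its own (a snippet of ours, not part of the recipe), we can drive it directly, without the engine:
    def config = new Configuration()
    config.format('^execution of (\\w+) took (\\d+)ms$')
    config.column(1, 'methodName')
    config.column(2, 'duration')
    config.source('Example') {
      localFile 'log1.log'       // resolved against the Source delegate
    }
    config.report('Duration') {
      avg 'duration'             // resolved against the Report delegate
      groupBy 'methodName'
    }

    assert config.format == '^execution of (\\w+) took (\\d+)ms$'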
  4. The final step is to define the engine class that will glue together the configuration creation and actual report generation:
    class LogReportDslEngine {
    
      void process(Closure cl) {
    
        Configuration config = new Configuration()
        cl.delegate = config
        cl.resolveStrategy = Closure.DELEGATE_FIRST
        cl()
    
        config.sources.values().each { Source source ->
          config.reports.values().each { Report report ->
    
            // Collect report data.
            def reportData = [:]
            source.files.each { File sourceFile ->
              sourceFile.eachLine { String line ->
    
                // Match the data line and extract
                // the captured regex groups.
                def matcher = line =~ config.format
                if (matcher) {
                  def fields = matcher[0]
    
                  // Generate group key, for which
                  // to aggregate the data.
                  def group = report.groupByColumns
                      .collect {
                        fields[config.columnIndexes[it]]
                      }.join(', ')
    
                  // Create empty group record
                  // if it does not exist.
                  reportData[group] =
                    reportData[group] ?: emptyRecord
    
                  // Calculate report values for given key.
                  def g = reportData[group]
                  report.avgColumns.each { String column ->
                    def fieldIndex =
                      config.columnIndexes[column]
                    g['avg'][column] = g['avg'][column] ?: 0
                    g['avg'][column] +=
                      fields[fieldIndex].toDouble()
                  }
                  report.sumColumns.each { String column ->
                    def fieldIndex =
                      config.columnIndexes[column]
                    g['sum'][column] = g['sum'][column] ?: 0
                    g['sum'][column] +=
                      fields[fieldIndex].toDouble()
                  }
                  g['count'] += 1
    
                }
              }
            }
            // Produce report output.
            def reportName = "${source.name}_${report.name}"
            def reportFile = new File("${reportName}.report")
            reportFile.text = ''
            reportData.each { key, data ->
              reportFile << "Report for $key\n"
              reportFile << "  Total records: ${data['count']}\n"
              data['avg'].each { column, value ->
                reportFile <<
                  "  Average of ${column} is " +
                  "${value / data['count']}\n"
              }
              data['sum'].each { column, value ->
                reportFile << "  Sum of ${column} is ${value}\n"
              }
            }
    
          }
        }
      }
    
      def getEmptyRecord() {
        [count: 0, avg: [:], sum: [:]]
      }
    }
  5. At this point, you are ready to use the DSL internally from your Groovy code:
    def engine = new LogReportDslEngine()
    
    engine.process {
    
      format '^execution of (\\w+) took (\\d+)ms$'
    
      column 1, 'methodName'
      column 2, 'duration'
    
      source('PerformanceData2012') {
        localFile 'log1.log'
        localFile 'log2.log'
      }
      source('PerformanceData2013') {
        localFile 'log3.log'
        localFile 'log4.log'
      }
    
      report('Duration') {
        avg 'duration'
        sum 'duration'
        groupBy 'methodName'
      }
    
    }
  6. The previous script will produce two report files (one for each data source), named PerformanceData2012_Duration.report and PerformanceData2013_Duration.report. The report will look approximately like the following example:
    Report for getCustomerName
      Total records: 12
      Average of duration is 179.0
      Sum of duration is 2148.0
    Report for getAccountNumber
      Total records: 4
      Average of duration is 64.0
      Sum of duration is 256.0
    

How it works...

The Source and Report classes defined previously are simple structures holding information needed to build the reports; therefore, we will not spend any time on them.

The Configuration class is a bit more involved because it makes use of closure delegates (similar to the DSL for executing commands over SSH recipe).

The Configuration object is also constructed through a closure delegate inside the process method of the LogReportDslEngine class. After the configuration closure is executed, we get back a fully constructed data structure, which we are ready to use for further processing.
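
The following minimal, stand-alone sketch (the Greeter class and all names are ours, purely for illustration) shows the closure-delegate mechanism that both Configuration and LogReportDslEngine rely on:

class Greeter {
  def greeted = []
  void hello(String who) { greeted << who }
}

def cl = { hello 'world' }           // 'hello' is undefined in this scope...
def greeter = new Greeter()
cl.delegate = greeter                // ...so unqualified calls are routed
cl.resolveStrategy = Closure.DELEGATE_FIRST   // to the delegate first
cl()

assert greeter.greeted == ['world']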

The code executed after we have a configuration object does the following:

  • Loops through all data sources, and for each of them:
  • Loops through all report definitions, and for each of them:
  • Goes through all the source files and reads every line
  • For each line, tries to match it against the configured format expression and, on a match, updates the in-memory report data map (a simplified sketch of this aggregation step follows the list)
  • When the file processing is done, writes the collected data to a report file
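
The aggregation step for a single matched line boils down to the following simplified sketch (column names and values are hard-coded here; the engine derives them from the configuration, and the average is only computed at output time by dividing the accumulated sum by the record count):

def reportData = [:]
def fields = ['execution of getCustomerName took 244ms',
              'getCustomerName', '244']      // whole match, group 1, group 2

// Group key built from the groupBy column(s), here just methodName.
def group = fields[1]

// Create an empty record on first sight of the group, then accumulate.
reportData[group] = reportData[group] ?: [count: 0, avg: [:], sum: [:]]
def g = reportData[group]
g['avg']['duration'] = (g['avg']['duration'] ?: 0) + fields[2].toDouble()
g['sum']['duration'] = (g['sum']['duration'] ?: 0) + fields[2].toDouble()
g['count'] += 1

assert g.count == 1
assert g.sum.duration == 244.0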

There's more...

Obviously, this DSL implementation is rather primitive and can be extended with many more features such as:

  • DSL validation rules (for example, groupBy columns cannot appear in an aggregate function)
  • More aggregate functions (for example, min and max; a minimal sketch follows this list)
  • More data source types (for example, URL, FTP, and JDBC)
  • More column types (for example, date, time, Boolean, and IP address)
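
For instance, a min/max aggregate would follow the same pattern as avg and sum: two more Set fields and DSL methods on Report, two extra maps in the empty record, and an accumulation block in the engine. The following is a minimal, hedged sketch of just the accumulation logic (names and values are ours):

// Record layout extended with hypothetical 'min' and 'max' maps.
def record = [count: 0, avg: [:], sum: [:], min: [:], max: [:]]

['244', '144', '24'].each { String raw ->
  def value = raw.toDouble()
  def currentMin = record['min']['duration']
  record['min']['duration'] = (currentMin == null) ? value : Math.min(currentMin, value)
  def currentMax = record['max']['duration']
  record['max']['duration'] = (currentMax == null) ? value : Math.max(currentMax, value)
  record['count'] += 1
}

assert record.min.duration == 24.0
assert record.max.duration == 244.0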