© Rahul Sharma, Akshay Mathur 2021
R. Sharma and A. Mathur, Traefik API Gateway for Microservices, https://doi.org/10.1007/978-1-4842-6376-1_5

5. Logs, Request Tracing, and Metrics

Rahul Sharma1   and Akshay Mathur2
(1)
Patpargunj, Delhi, India
(2)
Gurgaon, Haryana, India
 

Application monitoring is a standard business operation. The process aims to discover and fix application outages before they impact regular business operations. Traditionally, teams performed simple checks like process up/down or port open/closed, but these checks were not good enough. Over time, many tools have been built to improve application monitoring. The process involves capturing usage metrics and performing analysis. But relying only on application monitoring is a weak practice: monitoring can only notify you of ongoing application issues. The next step is to determine the root cause.

The root cause is mostly contextual: a new feature is malfunctioning, some controls were missed in the specification, or a user is executing a valid request that results in “out of memory,” and so forth. We cannot reach a conclusion by looking only at notifications. We need more information to determine the root cause. This is known as the context of the failure.

Context is created by first looking at application logs, if available. A stack trace provides a lead into a possible bug, but the bug is triggered by a particular edge scenario. These edge scenarios are defined by user data and the application state. User data is determined from request access logs, if they have been captured. All of this is easier said than done.

Over the years, the enterprise application landscape has become more and more complex, and the previous practices proved insufficient for dealing with outages. Google pioneered the practice of request tracing, which captures the flow of a user request across different distributed systems. This complementary process helps pinpoint failing scenarios and the systems involved.

In summary, logs, metrics, and traces are complementary practices (see Figure 5-1) for different purposes. None of these practices is individually sufficient during an outage. Thus, the simple practice of application monitoring has moved from the individual application state to a holistic view of the entire ecosystem. This is also known as observability. Observability encompasses gathering, visualization, and analysis of metrics, logs, and traces to gain a holistic understanding of a system’s operation.
Figure 5-1. Observability data

Companies like Twitter, Google, and Uber, which pioneered observability, defined a complete practice built on the following pillars.
  • Application and business metrics

  • Logs

  • Distributed traces

  • Alerts and notifications

  • Visualizations

Note

Observability explains why something is wrong, whereas monitoring simply tells you when something is wrong.

Traefik, being the API gateway, is the single point of entry for all externally originated user requests. It must integrate with existing enterprise solutions to capture all request flows and metrics. To capture end-to-end request flows, Traefik needs to generate request spans and send them to the tracing backend. Traefik also needs to generate access logs and request-based metrics to build visibility into the distributed system’s behavior. This chapter discusses these features with a sample HTTP application.

Prerequisites

In this chapter, we use an example HTTP service. We deploy and configure the httpbin service (https://github.com/postmanlabs/httpbin) to serve our purposes. It is an open source application written in Python, so we require a Python runtime to run it. The deployed service is then configured in Traefik.

Note

This is an optional step. It is an example service for validating configuration changes. If you have a running service, you can skip this step.

First, check for the required python, pip, and virtualenv commands.
~/Projects$ python3 --version
Python 3.8.0
~/Projects$ pip3 --version
pip 19.2.3 from /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip (python 3.8)
~/Projects$ virtualenv --version
16.7.8

Make sure that you have the 3.x versions of Python and pip. If a command fails, you need to install the required software. Installation instructions for Python, pip, and virtualenv are beyond the scope of the book.

For the next step, we download a version of the httpbin service from the releases page at https://github.com/postmanlabs/httpbin/releases (see Figure 5-2). At the time of writing, 0.6.1 is the latest release.
Figure 5-2. httpbin releases

Download the released artifacts and extract them to a directory. The directory contains the source files, application license, build files, and so forth. The aim is to compile the code and get a binary artifact from it.
~/Projects/httpbin-0.6.1$ ls -1
AUTHORS
Dockerfile
LICENSE
MANIFEST.in
Pipfile
Pipfile.lock
Procfile
README.md
app.json
build
dist
httpbin
httpbin.egg-info
setup.cfg
setup.py
test_httpbin.py
tox.ini
The service is built using setuptools. You can deploy and run the service as explained next.
  1. Create a virtual environment and then activate it.
~/Projects/httpbin-0.6.1$ virtualenv venv
Using base prefix '/Library/Frameworks/Python.framework/Versions/3.8'
New python executable in /Users/rahulsharma/Projects/httpbin-0.6.1/venv/bin/python3.8
Also creating executable in /Users/rahulsharma/Projects/httpbin-0.6.1/venv/bin/python
Installing setuptools, pip, wheel...
done.
~/Projects/httpbin-0.6.1$ source venv/bin/activate
(venv) ~/Projects/httpbin-0.6.1$
  2. Build the service in develop mode.
    (venv) ~/Projects/httpbin-0.6.1$ python setup.py develop
    running develop
    running egg_info
    writing httpbin.egg-info/PKG-INFO
    ####                     ####
    #### removed for brevity ####
    ####                     ####
    /Users/rahulsharma/Projects/httpbin-0.6.1/venv/bin
    Using /Users/rahulsharma/Projects/httpbin-0.6.1/venv/lib/python3.8/site-packages
    Finished processing dependencies for httpbin==0.6.1
    (venv) ~/Projects/httpbin-0.6.1$
     
  3. Deploy the application in Gunicorn.
(venv) ~/Projects/httpbin-0.6.1$ gunicorn -b 0.0.0.0 httpbin:app
[2020-06-12 14:35:04 +0530] [67528] [INFO] Starting gunicorn 20.0.4
[2020-06-12 14:35:04 +0530] [67528] [INFO] Listening at: http://0.0.0.0:8000 (67528)
[2020-06-12 14:35:04 +0530] [67528] [INFO] Using worker: sync
[2020-06-12 14:35:04 +0530] [67530] [INFO] Booting worker with pid: 67530
The httpbin service is now running on our system. You can access it at http://localhost:8000 (see Figure 5-3). You can also test a few of the available endpoints.
Figure 5-3. httpbin service
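For a quick check from the command line, you can call one of the endpoints with curl. A minimal usage example follows; the /get endpoint simply echoes the request back as JSON.
$ curl -s http://localhost:8000/get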

Traefik Configuration

In the previous section, we added an HTTP service. Let’s now configure Traefik to send user requests to it. We create the following traefik.yml with an entrypoint for web applications.
entryPoints:
  web:
    address: ":80"
providers:
  file:
    directory: /Users/rahulsharma/Projects/traefik-book/ch05/services
    watch: true
    filename: config
    debugLogGeneratedTemplate: true
api:
  insecure: true
  dashboard: true
In the prior configuration, Traefik listens on port 80. Next, let’s define the routing and service for the deployed httpbin application.
http:
  routers:
    httpbin-router:
      entryPoints:
      - web
      rule: Host(`localhost`)
      service: httpbin-service
  services:
    httpbin-service:
      loadBalancer:
        servers:
        - url: http://192.168.1.4:8000/
This configuration sends requests to httpbin running on the 192.168.1.4 instance. The configuration needs to be copied to the services folder as config.yml. After this, you can browse to http://localhost. The browser should load the application. The deployed configuration is visible on the Traefik dashboard (see Figure 5-4).
Figure 5-4. Dashboard for httpbin entrypoint

Traefik Logs

Traefik reports information about encountered issues. By default, Traefik writes these to standard output. The reported issues correspond to events in the Traefik application and are reported at different severity levels. You can configure Traefik logging by adding a log configuration. The configuration can send logs to a particular file and can specify the minimum severity level of messages.
entryPoints:
  web:
    address: ":80"
providers:
  # Removed for Brevity
log:
  level: INFO
  filePath: traefik.json.log
  format: json
This code does the following.
  • Directs logs to the traefik.json.log file in the current working directory

  • Changes the default log level to INFO, which writes messages at the fatal, error, warn, and info levels

  • Writes log messages in JSON format

By default, Traefik writes all messages in common log format. Alternatively, you can change it to JSON format, as shown. Traefik can report log messages at the debug, info, warn, error, and fatal levels. Configuring a lower level enables reporting for all severity levels above the configured level.

The defined code is part of the static configuration used to start Traefik. Traefik does not autoload changes to it, so restart the server after making the changes. You can then tail the log file as shown next.
ch05 $ tail -f traefik.json.log
{"level":"info","msg":"Traefik version 2.2.0 built on 2020-03-25T17:17:27Z","time":"2020-06-13T20:27:08+05:30"}
{"level":"info","msg":" Stats collection is disabled. Help us improve Traefik by turning this feature on :) More details on: https://docs.traefik.io/contributing/data-collection/ ","time":"2020-06-13T20:27:08+05:30"}
{"level":"error","msg":"unsupported access log format: "foobar", defaulting to common format instead.","time":"2020-06-13T20:27:08+05:30"}
{"level":"error","msg":"Failed to create new HTTP code ranges: strconv.Atoi: parsing "foobar": invalid syntax","time":"2020-06-13T20:27:08+05:30"}
{"level":"info","msg":"Starting provider aggregator.ProviderAggregator {}","time":"2020-06-13T20:27:08+05:30"}
{"level":"info","msg":"Starting provider *file.Provider {"directory":"/Users/rahulsharma/Projects/traefik-book/ch05/code","watch":true,"filename":"config","debugLogGeneratedTemplate":true}","time":"2020-06-13T20:27:08+05:30"}
{"level":"info","msg":"Starting provider *traefik.Provider {}","time":"2020-06-13T20:27:08+05:30"}

Access Logs

Traefik can report information about client requests. This information is written to the access log after the request is processed, but the access log is not created by default. The access log configuration sets up logging to a particular file. By default, the access log is written in common log format, but it can be configured to report in JSON format.
# Removed for Brevity
log:
  level: INFO
  filePath: traefik.json.log
  format: json
accessLog:
  filePath: access.json.log
  format: json
This code does the following.
  • Directs access logs to the access.json.log file in the current working directory

  • Logs messages in JSON format

After adding the preceding configuration, restart the Traefik server. The following access logs are generated when we access http://localhost/.
logs $ tail -f access.json.log
{"ClientAddr":"[::1]:63226","ClientHost":"::1","ClientPort":"63226","ClientUsername":"-","DownstreamContentSize":12026,"DownstreamStatus":200,"Duration":28245000,"OriginContentSize":12026,"OriginDuration":28187000,"OriginStatus":200,"Overhead":58000,"RequestAddr":"localhost","RequestContentSize":0,"RequestCount":1,"RequestHost":"localhost","RequestMethod":"GET","RequestPath":"/","RequestPort":"-","RequestProtocol":"HTTP/1.1","RequestScheme":"http","RetryAttempts":0,"RouterName":"httpbin-router@file","ServiceAddr":"192.168.1.4:8000","ServiceName":"httpbin-service@file","ServiceURL":{"Scheme":"http","Opaque":"","User":null,"Host":"192.168.1.4:8000","Path":"/","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":""}
#### TRUNCATED }
The access logs contain diverse information. They can help determine outages and slow response times by using the following reported attributes.
  • Duration: The total time spent processing a request

  • OriginDuration: The time spent between establishing a connection and receiving the last byte of the response body from the upstream server

  • Overhead: The time difference between the response received from the upstream server and the response sent back to the client

  • OriginStatus: The response code sent by the upstream server

   "Duration":28245000,
   "OriginContentSize":12026,
   "OriginDuration":28187000,
   "OriginStatus":200,
   "Overhead":58000,

Since the access log is written after request processing, it adds overhead to request handling. This overhead can be reduced by configuring a buffer for the log messages. The buffer enables asynchronous writes, instead of post-request writes, of the log messages. It specifies the number of log lines Traefik keeps in memory before writing them to the selected output. To enable the buffer, configure the bufferingSize attribute, as in the following minimal sketch (the value of 100 lines is an arbitrary choice).
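accessLog:
  filePath: access.json.log
  format: json
  # Keep up to 100 log lines in memory before flushing them to the file.
  bufferingSize: 100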

Note

The access log is a global configuration that applies only to HTTP services. It is not an entrypoint- or route-specific configuration. Once enabled, Traefik generates logs for all entrypoints and user requests.

Log Filters

Traefik access logs describe every request handled by the server. The information is detailed, and the access log can grow very quickly if the server handles many user requests. The large volume of information soon becomes unmanageable. Alternatively, you can log selective requests based on preconfigured criteria. This ensures that we only look at the relevant user requests and excludes trivial entries from the access log. Selective logging is enabled by using the filters attribute, which provides the following three options.
  • statusCodes: Logs only the specified list of response codes.

  • retryAttempts: Logs when there are retry attempts

  • minDuration: Logs when the request takes more than the specified time

# Removed for Brevity
accessLog:
  filePath: logs/access.json.log
  format: json
  bufferingSize: 50
  filters:
    statusCodes:
      - 200
      - 300-302
    retryAttempts: true
    minDuration: 5s
This code writes to the access log when any of the following conditions is true.
  • The response code is 200/300/301/302

  • The request is retried by the retry middleware

  • The request takes more than 5 seconds

Accessing http://localhost/ should generate a log message as the status code is 200. Now access http://localhost/status/418. There should not be any log statement.
logs $ tail -f access.json.log
{"ClientAddr":"[::1]:64020","ClientHost":"::1","ClientPort":"64020","ClientUsername":"-","DownstreamContentSize":12026,"DownstreamStatus":200,"Duration":27516000,"OriginContentSize":12026,"OriginDuration":27467000,"OriginStatus":200,"Overhead":49000,"RequestAddr":"localhost","RequestContentSize":0,"RequestCount":1,"RequestHost":"localhost","RequestMethod":"GET","RequestPath":"/","RequestPort":"-","RequestProtocol":"HTTP/1.1","RequestScheme":"http","RetryAttempts":0,"RouterName":"httpbin-router@file","ServiceAddr":"192.168.1.4:8000","ServiceName":"httpbin-service@file"...... TRUNCATED }

Log Fields

Previously, we discussed how to log based on response criteria. Traefik can also be configured to report selective information in the log statements. You may need to hide user identities, remove sensitive information, or trim the log. Traefik log information consists of the following two types.
  • Request headers: The headers passed by the user in the request

  • Fields: Additional information added by Traefik

Both information types have attributes that can be controlled by the following options.
  • keep reports as-is information in a log.

  • drop removes the information from a log.

  • redact replaces and masks information in a log.

accessLog:
  filePath: logs/access.json.log
  format: json
  bufferingSize: 50
  fields:
    defaultMode: keep
    names:
      ClientUsername: drop
    headers:
      defaultMode: keep
      names:
        User-Agent: redact
        Authorization: drop
        Content-Type: keep
In this code, we configured the following.
  • The keep value for defaultMode under fields enables the reporting of all fields.

  • The keep value for defaultMode under headers enables the reporting of all headers.

  • The drop value for ClientUsername removes it from the log.

  • The drop value for the Authorization header removes it from the log, while the keep value retains Content-Type.

  • The redact value for User-Agent reports its value as REDACTED.

After adding the preceding configuration, restart the Traefik server. The following access logs are generated when you access http://localhost/.
logs $ tail -f access.json.log
{"ClientAddr":"[::1]:49537","ClientHost":"::1","ClientPort":"49537",
 <!-- REMOVED for Brevity -->
,"origin_X-Processed-Time":"0","request_Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","request_Accept-Encoding":"gzip, deflate, br","request_Accept-Language":"en-US,en;q=0.9","request_Cache-Control":"max-age=0","request_Connection":"keep-alive","request_Sec-Fetch-Dest":"document","request_Sec-Fetch-Mode":"navigate","request_Sec-Fetch-Site":"none","request_Sec-Fetch-User":"?1","request_Upgrade-Insecure-Requests":"1","request_User-Agent":"REDACTED","request_X-Forwarded-Host":"localhost","request_X-Forwarded-Port":"80","request_X-Forwarded-Proto":"http","request_X-Forwarded-Server":"XE-GGN-IT-02498.local","request_X-Real-Ip":"::1","time":"2020-06-14T16:35:18+05:30"}
Note

Traefik reports about 25 additional fields. The list of fields is available in the Traefik documentation.

Log Rotation

Production deployments typically enforce a log rotation policy. This keeps disk usage optimal, as historical logs are purged. But Traefik does not rotate its logs by default, so we need system programs to perform log management, which involves archiving and purging activities. Depending on the operating system, various programs can do this: on FreeBSD systems you can use newsyslog, while on Linux you can use logrotate. They rely on sending the USR1 signal to make Traefik rotate its logs. In the following discussion, we work with newsyslog; the outlined steps remain the same for any other program.
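For reference, Traefik closes and reopens its log files when it receives the USR1 signal. A minimal manual check, assuming a single Traefik process that can be found by name, is sketched below.
# Ask the running Traefik process to close and reopen its log files.
$ pkill -USR1 traefik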

The newsyslog utility included in FreeBSD rotates and archives log files as necessary. The program takes a configuration file as input. The file identifies the log files that need to be handled and provides a diverse set of attributes that describe file permissions, copy behavior, archive count, and so forth. The program is usually run at periodic intervals through a scheduler like crontab. Let's create the following configuration in a file named syslog.conf.
/Users/rahulsharma/Projects/traefik-book/ch05/logs/access.json.log    rahulsharma:staff    640  5    500    *     Z
In this configuration, we configured log rotation for access.json.log.
  • Set the file owner and group to rahulsharma:staff. This applies to the zipped file and the new log file.

  • Set the file permissions to 640.

  • Keep at most five rotated files.

  • Rotate when the file size grows beyond 500 KB.

  • The Z flag compresses the rotated files.

You can run newsyslog with the described configuration using the following command.
code $ sudo newsyslog -vf  syslog.conf
/Users/rahulsharma/Projects/traefik-book/ch05/logs/access.json.log <5Z>: size (Kb): 532 [500] --> trimming log....
Signal all daemon process(es)...
Notified daemon pid 91 = /var/run/syslog.pid
Pause 10 seconds to allow daemon(s) to close log file(s)
Compress all rotated log file(s)...
Note

The preceding process does not apply to Windows, which has no comparable log rotation program due to its lack of USR signals.
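On Linux, a comparable logrotate configuration is sketched next. The log path and the pkill-based postrotate command are illustrative assumptions; adjust them for your deployment.
# Hypothetical /etc/logrotate.d/traefik
/var/log/traefik/access.json.log {
    # Rotate once the log grows beyond 500 KB; keep five compressed copies.
    size 500k
    rotate 5
    compress
    postrotate
        pkill -USR1 traefik
    endscript
}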

Blacklisting

Traefik provides support for blacklisting by using middleware. We discussed middleware in Chapter 2. Middleware is configured as part of a router and executes after the rule matching but before forwarding the request to the service. Traefik supports IP blacklisting by configuring the ipWhiteList middleware, which allows only the listed IPs. It can be configured using the following options.
  • sourceRange: Describes the set of allowed IPs in CIDR format

  • ipStrategy: Describes how to identify the client IP from the X-Forwarded-For header

http:
  routers:
    httpbin-router:
      entryPoints:
      - web
      rule: HostRegexp(`{name:.*}`)
      middlewares:
      - allowed-sources
      service: httpbin-service
  middlewares:
    allowed-sources:
      ipWhiteList:
        sourceRange:
          - "127.0.0.1/32"
  services:
  # Removed for Brevity
In the preceding code, we did the following.
  • We modified the router rule to allow all hostnames using a regular expression. This is done using the HostRegexp function instead of the Host operator.

  • We added the Middlewares section with the name of the configured ipWhiteList middleware.

  • We configured the Middlewares section with the configuration for ipWhiteList.

  • We added the list of allowed IPs using the sourceRange option.

Now let’s run the configuration. Load http://localhost/ to access the httpbin service.
$ curl -v http://localhost/
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Date: Sat, 20 Jun 2020 17:41:11 GMT
< Content-Length: 9
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host localhost left intact
Forbidden* Closing connection 0
We get back a forbidden response because the localhost domain resolved to the IPv6 loopback address (::1), which is not in the whitelist. Alternatively, you can use the IPv4 loopback address (127.0.0.1). This should load the page as expected. The forbidden access is reported in the access logs. Make sure that you remove status code–based log filters from the static configuration.
{"ClientAddr":"[::1]:64616","ClientHost":"::1","ClientPort":"64616","ClientUsername":"-","DownstreamContentSize":9,"DownstreamStatus":403,"Duration":128000,"OriginContentSize":9,"OriginDuration":79000,"OriginStatus":403,"Overhead":49000,"RequestAddr":"localhost","RequestContentSize":0,"RequestCount":63,"RequestHost":"localhost","RequestMethod":"GET","RequestPath":"/","RequestPort":"-","RequestProtocol":"HTTP/1.1","RequestScheme":"http","RetryAttempts":0,"RouterName":"httpbin-router@file","StartLocal":"2020-06-20T23:11:01.21434+05:30","StartUTC":"2020-06-20T17:41:01.21434Z","entryPointName":"web","level":"info","msg":"","time":"2020-06-20T23:11:01+05:30"}

Request Tracing

You learned that observability is a diverse practice. Request tracing, or distributed tracing, is an important pillar for profiling application behavior. It is commonly applied to distributed systems to show how request processing happened across the different systems involved. It can point out applications that caused performance issues or failed to process a request.

In a nutshell, distributed tracing maps the flow of a request as it is processed through a system. The processing flow is built from a basic building block known as a request span. A request span represents the time spent in processing by one service. All services that process the request generate their individual spans, and these spans are then combined into a single distributed trace for the entire request.

As an API gateway, Traefik receives incoming requests for different applications. It is the single point of entry for all external requests, so it must support the generation of request spans. The generated spans are propagated as request headers to the application. In turn, the application must propagate these headers further. Traefik generates the following B3 trace headers.
  • x-b3-traceid

  • x-b3-spanid

  • x-b3-parentspanid

  • x-b3-sampled

These spans are sent to a tracing backend service, which is responsible for storing and processing this information. Traefik supports several OpenTracing backends like Zipkin, Datadog, and Jaeger. In this section, we work with Zipkin. Similar configurations are required for other backends.
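For instance, a Jaeger setup would only need a similar static-configuration stanza. The following sketch assumes a local Jaeger agent running on its default ports.
tracing:
  jaeger:
    # Default Jaeger sampling server and agent endpoints (assumed local install).
    samplingServerURL: http://localhost:5778/sampling
    localAgentHostPort: localhost:6831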

Install Zipkin

Zipkin is an open source trace collection engine built in Java. It not only supports trace collection but also provides a dashboard to visualize traces. Other features allow you to analyze request flows. Since Zipkin is open source, you can compile the code for a target platform. Alternatively, we can directly run a released binary. Zipkin's latest release can be downloaded using the following command.
code $ curl -sSL https://zipkin.io/quickstart.sh | bash -s
Thank you for trying Zipkin!
This installer is provided as a quick-start helper, so you can try Zipkin out
without a lengthy installation process.
Fetching version number of latest io.zipkin:zipkin-server release...
Latest release of io.zipkin:zipkin-server seems to be 2.21.4
Downloading io.zipkin:zipkin-server:2.21.4:exec to zipkin.jar...
Once zipkin.jar is downloaded, run it using the following command.
code $ java -jar zipkin.jar
2020-06-20 21:57:31.012  INFO 47685 --- [           main] z.s.ZipkinServer                         : Starting ZipkinServer on XE-GGN-IT-02498.local with PID 47685 (/Users/rahulsharma/Projects/trafik/code/zipkin.jar started by rahulsharma in /Users/rahulsharma/Projects/trafik/code)
2020-06-20 21:57:31.016  INFO 47685 --- [           main] z.s.ZipkinServer                         : The following profiles are active: shared
2020-06-20 21:57:32.040  INFO 47685 --- [           main] c.l.a.c.u.SystemInfo                     : hostname: xe-ggn-it-02498.local (from 'hostname' command)
2020-06-20 21:57:32.537  INFO 47685 --- [oss-http-*:9411] c.l.a.s.Server                           : Serving HTTP at /0:0:0:0:0:0:0:0:9411 - http://127.0.0.1:9411/
2020-06-20 21:57:32.538  INFO 47685 --- [           main] c.l.a.s.ArmeriaAutoConfiguration         : Armeria server started at ports: {/0:0:0:0:0:0:0:0:9411=ServerPort(/0:0:0:0:0:0:0:0:9411, [http])}
The server is up and running on port 9411. You can access its dashboard at http://localhost:9411/ (see Figure 5-5).
Figure 5-5. Zipkin dashboard

Integrate Zipkin

Traefik integration with Zipkin is simple. We only need to provide the Zipkin API location. The parameters are part of Traefik’s static configuration. Traefik also provides the following attributes to customize the tracing behavior.
  • sameSpan: Configures one span for RPC invocations

  • id128Bit: Generates 128-bit trace IDs

  • sampleRate: The percentage of requests traced

# Removed for Brevity
tracing:
  zipkin:
    httpEndpoint: http://localhost:9411/api/v2/spans
    id128Bit: true
    sameSpan: true
In this configuration, we provided the location of the Zipkin API. We also configured 128-bit trace IDs with the same span for the RPC client and server. Now restart the server.
ch05 $ traefik --configFile traefik.yml
INFO[0000] Configuration loaded from file: /Users/rahulsharma/Projects/traefik-book/ch05/traefik.yml
You can validate the configuration in the Traefik dashboard (see Figure 5-6). It should report which tracing backend is configured in the application.
Figure 5-6. Tracing dashboard status

Note

Tracing is enabled at a global level. Once enabled, it generates traces for all requests, including the dashboard API.
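If tracing every request is too noisy, the sampling rate can be lowered in the same stanza. A minimal sketch follows; the value 0.2 is an arbitrary choice, and the default of 1.0 traces everything.
tracing:
  zipkin:
    httpEndpoint: http://localhost:9411/api/v2/spans
    # Trace roughly 20% of the requests instead of all of them.
    sampleRate: 0.2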

Now, let’s make a couple of requests. The httpbin application (see Figure 5-7) provides several request types. Try loading the IP, status code, and redirect requests. Traefik generates the request traces and sends them to the deployed Zipkin.
Figure 5-7. The httpbin application

You can tail the access logs. Traefik logs all forwarded request headers, including the generated B3 headers.
  {
   # Removed for Brevity
   "request_User-Agent":"REDACTED",
   "request_X-B3-Parentspanid":"12f1ca6cf7671169",
   "request_X-B3-Sampled":"1",
   "request_X-B3-Spanid":"1704e2a62f95fa8b",
   "request_X-B3-Traceid":"12f1ca6cf7671169",
  }
Traefik integration consists of the following steps.
  • Generate TraceId and Span for a request based on the configured sampling rate

  • Forward the trace headers to the service application

  • Update the spans based on the response code

  • Send the generated trace spans to the tracing backend

Now you can load the Zipkin dashboard, which provides a UI to visualize request traces. You can search for requests during the last 15 minutes. The Zipkin dashboard (see Figure 5-8) marks all traces with a 2XX or 3XX return code in blue, while a return code of 4XX/5XX is shown in red.
Figure 5-8. Request traces

Traefik Metrics

Traefik can generate application-level metrics. These metrics must be captured in a backend service for monitoring and alert notifications. Traefik supports the most widely used metrics solutions like StatsD, Datadog, Prometheus, and so forth. In the current section, we work with Prometheus as the metrics backend. Prometheus is an open source solution built using Golang. It can scrape metrics from endpoints exposed in Traefik, and it also provides a dashboard to visualize metrics. Prometheus details are beyond the scope of the book.

Let’s first enable Traefik metrics by adding a relevant configuration. The configuration needs to be added to the static-configuration file. Traefik provides the following options.
  • buckets: Defines buckets for response latencies

  • addEntryPointsLabels: Adds entrypoint names to request metrics

  • addServicesLabels: Adds service names to request metrics

  • entryPoint: Names the entrypoint configured to publish metrics

  • manualRouting: Enables a custom router for the prometheus@internal service

entryPoints:
  web:
    address: ":80"
# Removed for Brevity
metrics:
  prometheus:
    addEntryPointsLabels: true
    addServicesLabels: true
This configuration enables metrics on the default endpoint. The metrics are generated at http://localhost:8080/metrics. Restart the server and verify the configuration on the Traefik dashboard (see Figure 5-9).
Figure 5-9. Enable metrics
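The metrics endpoint can also be separated from the admin API by publishing it on a dedicated entrypoint. The following sketch assumes port 8082 is free; Prometheus would then scrape http://localhost:8082/metrics instead.
entryPoints:
  web:
    address: ":80"
  # Hypothetical entrypoint dedicated to metrics.
  metrics:
    address: ":8082"
metrics:
  prometheus:
    addEntryPointsLabels: true
    addServicesLabels: true
    entryPoint: metrics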

Configure Prometheus

Now we need to capture the generated metrics in Prometheus. Let’s start by downloading the latest version from the releases page (https://prometheus.io/download/) and unzipping it. Before starting the Prometheus server, we need to configure the endpoint to be scraped. This is done by updating the bundled prometheus.yml.
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
### REMOVED for BREVITY
    static_configs:
    - targets: ['localhost:8080']
In this configuration, the Traefik endpoint (localhost:8080) is added to the list of targets. Prometheus looks up the metrics using http://localhost:8080/metrics. Now start Prometheus using the following command.
prometheus-2.19.1.darwin-amd64 $ ./prometheus
level=info ts=2020-06-21T06:14:37.958Z caller=main.go:302 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-06-21T06:14:37.959Z caller=main.go:337 msg="Starting Prometheus" version="(version=2.19.1, branch=HEAD, revision=eba3fdcbf0d378b66600281903e3aab515732b39)"
level=info ts=2020-06-21T06:14:37.959Z caller=main.go:338 build_context="(go=go1.14.4, user=root@62700b3d0ef9, date=20200618-16:45:01)"
level=info ts=2020-06-21T06:14:37.959Z caller=main.go:339 host_details=(darwin)
level=info ts=2020-06-21T06:14:37.959Z caller=main.go:340 fd_limits="(soft=2560, hard=unlimited)"
level=info ts=2020-06-21T06:14:37.959Z caller=main.go:341 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-06-21T06:14:37.960Z caller=main.go:678 msg="Starting TSDB ..."
level=info ts=2020-06-21T06:14:37.960Z caller=web.go:524 component=web msg="Start listening for connections" address=0.0.0.0:9090
We can load the Prometheus dashboard at http://localhost:9090/. The metric dropdown has different options with the traefik_ prefix. We load the traefik_entrypoint_requests_total metric, which describes the total number of requests handled by Traefik. Additionally, you can send several requests to Traefik using the following bash script.
$ for ((i=1;i<=10000;i++)); do   curl -v --header "Connection: keep-alive" "localhost"; done
This script sends about 10,000 requests to the Traefik server. Lastly, you can check the Prometheus dashboard (see Figure 5-10), which captures the growth in traffic.
Figure 5-10. Request traffic metric
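The raw counter only ever grows; to see the request rate instead, you can graph a PromQL expression like the following sketch in the Prometheus expression browser.
# Per-second request rate over the last five minutes, per entrypoint.
rate(traefik_entrypoint_requests_total[5m])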

Summary

This chapter discussed observability. We talked about its three pillars: error logs, request traces, and application metrics. First, we configured Traefik logs, which capture information about any errors occurring in Traefik. Next, we configured access logs, which capture the incoming requests handled by Traefik. As incoming requests increase, the access logs can bloat quickly.

We discussed ways to manage them by using filters, rotation, and header masking. We also configured the ipWhiteList middleware and captured the forbidden logs generated by it. After this, we enabled request tracing using Zipkin. Traefik generates B3 headers for tracing, and these headers can be seen in the access logs.

You looked at the process flow and the generated traces in the Zipkin dashboard. Finally, we enabled Traefik metrics and captured them in Prometheus. Traefik supports many backend stores for tracing and metrics; Zipkin and Prometheus were taken as examples to demonstrate the integration. These tools are helpful in distributed architectures like microservices.

In the next chapter, you work with Traefik support for microservices.
