Build Prometheus ecosystems with metric-centric visualization, alerting, and querying

Key Features

  • Integrate Prometheus with Alertmanager and Grafana for building a complete monitoring system
  • Explore PromQL, Prometheus' functional query language, with easy-to-follow examples
  • Learn how to deploy Prometheus components using Kubernetes and traditional instances

Book Description

Prometheus is an open source monitoring system. It provides a modern time series database, a robust query language, several metric visualization possibilities, and a reliable alerting solution for traditional and cloud-native infrastructure.

This book covers the fundamental concepts of monitoring and explores the Prometheus architecture, its data model, and how metric aggregation works. Multiple test environments are included to help you explore different configuration scenarios, such as the use of various exporters and integrations. You'll delve into PromQL, supported by several examples, and then apply that knowledge to writing and testing alerting and recording rules. After that, alert routing with Alertmanager and creating visualizations with Grafana are thoroughly covered. In addition, this book covers several service discovery mechanisms and even provides an example of how to create your own. Finally, you'll learn about Prometheus federation, cross-shard aggregation, and long-term storage with the help of Thanos.

By the end of this book, you'll be able to implement and scale Prometheus as a full monitoring system on-premises, in cloud environments, in standalone instances, or using container orchestration with Kubernetes.

What you will learn

  • Grasp monitoring fundamentals and implement them using Prometheus
  • Discover how to extract metrics from common infrastructure services
  • Find out how to take full advantage of PromQL
  • Design a highly available, resilient, and scalable Prometheus stack
  • Explore the power of Kubernetes Prometheus Operator
  • Understand concepts such as federation and cross-shard aggregation
  • Unlock seamless global views and long-term retention in cloud-native apps with Thanos

Who this book is for

If you're a software developer, cloud administrator, site reliability engineer, DevOps enthusiast, or system admin looking to set up a fail-safe monitoring and alerting system for sustaining infrastructure security and performance, this book is for you. Basic networking and infrastructure monitoring knowledge will help you understand the concepts covered in this book.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Infrastructure Monitoring with Prometheus
  3. About Packt
    1. Why subscribe?
  4. Contributors
    1. About the authors
    2. About the reviewer
    3. Packt is searching for authors like you
  5. Preface
    1. Introduction to the book and the technology
    2. Who this book is for
    3. What this book covers
    4. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    5. Get in touch
      1. Reviews
  6. Section 1: Introduction
  7. Monitoring Fundamentals
    1. Definition of monitoring
      1. The value of monitoring
      2. Organizational contexts
      3. Monitoring components
    2. Whitebox versus blackbox monitoring
    3. Understanding metrics collection
      1. An overview of the two collection approaches
      2. Push versus pull
      3. What to measure
        1. Google's four golden signals
        2. Brendan Gregg's USE method
        3. Tom Wilkie's RED method
    4. Summary
    5. Questions
    6. Further reading
  8. An Overview of the Prometheus Ecosystem
    1. Metrics collection with Prometheus
      1. High-level overview of the Prometheus architecture
    2. Exposing internal state with exporters
      1. Exporter fundamentals
    3. Alert routing and management with Alertmanager
      1. Alerting routes
    4. Visualizing your data
    5. Summary
    6. Questions
    7. Further reading
  9. Setting Up a Test Environment
    1. Code organization
    2. Machine requirements
      1. Hardware requirements
      2. Recommended software
        1. VirtualBox
        2. Vagrant
        3. Minikube
        4. kubectl
    3. Spinning up a new environment
      1. Automated deployment walkthrough
        1. Prometheus
        2. Grafana
        3. Alertmanager
        4. Cleanup
      2. Advanced deployment walkthrough
        1. Prometheus
        2. Grafana
        3. Alertmanager
        4. Node Exporter
        5. Validating your test environment
    4. Summary
    5. Questions
    6. Further reading
  10. Section 2: Getting Started with Prometheus
  11. Prometheus Metrics Fundamentals
    1. Understanding the Prometheus data model
      1. Time series data
      2. Time series databases
      3. Prometheus local storage
        1. Data flow
          1. Memory
          2. Write ahead log
          3. Disk
        2. Layout
      4. Prometheus data model
        1. Notation
        2. Metric names
        3. Metric labels
        4. Samples
        5. Cardinality
    2. A tour of the four core metric types
      1. Counter
      2. Gauge
      3. Histogram
      4. Summaries
    3. Longitudinal and cross-sectional aggregations
      1. Cross-sectional aggregation
      2. Longitudinal aggregation
    4. Summary
    5. Questions
    6. Further reading
  12. Running a Prometheus Server
    1. Deep dive into the Prometheus configuration
      1. Prometheus startup configuration
        1. The config section
        2. The storage section
        3. The web section
        4. The query section
      2. Prometheus configuration file walkthrough
        1. Global configuration
        2. Scrape configuration
    2. Managing Prometheus in a standalone server
      1. Server deploy
      2. Configuration inspection
      3. Cleanup
    3. Managing Prometheus in Kubernetes
      1. Static configuration
        1. Kubernetes environment
        2. Prometheus server deployment
        3. Adding targets to Prometheus
      2. Dynamic configuration – the Prometheus Operator
        1. Kubernetes environment
        2. Prometheus Operator deployment
        3. Prometheus server deployment
        4. Adding targets to Prometheus
    4. Summary
    5. Questions
    6. Further reading
  13. Exporters and Integrations
    1. Test environments for this chapter
      1. Static infrastructure test environment
      2. Kubernetes test environment
    2. Operating system exporter
      1. The Node Exporter
        1. Configuration
        2. Deployment
    3. Container exporter
      1. cAdvisor
        1. Configuration
        2. Deployment
      2. kube-state-metrics
        1. Configuration
        2. Deployment
    4. From logs to metrics
      1. mtail
        1. Configuration
        2. Deployment
      2. Grok exporter
        1. Configuration
        2. Deployment
    5. Blackbox monitoring
      1. Blackbox exporter
        1. Configuration
        2. Deployment
    6. Pushing metrics
      1. Pushgateway
        1. Configuration
        2. Deployment
    7. More exporters
      1. JMX exporter
      2. HAProxy exporter
    8. Summary
    9. Questions
    10. Further reading
  14. Prometheus Query Language - PromQL
    1. The test environment for this chapter
    2. Getting to know the basics of PromQL
      1. Selectors
        1. Label matchers
        2. Instant vectors
        3. Range vectors
        4. The offset modifier
        5. Subqueries
      2. Operators
        1. Binary operators
          1. Arithmetic
          2. Comparison
        2. Vector matching
          1. One-to-one
          2. Many-to-one and one-to-many
          3. Logical operators
        3. Aggregation operators
        4. Binary operator precedence
      3. Functions
        1. absent()
        2. label_join() and label_replace()
        3. predict_linear()
        4. rate() and irate()
        5. histogram_quantile()
        6. sort() and sort_desc()
        7. vector()
        8. Aggregation operations over time
        9. Time functions
        10. Info and enum
    3. Common patterns and pitfalls
      1. Patterns
        1. Service-level indicators
        2. Percentiles
        3. The health of scrape jobs
      2. Pitfalls
        1. Choosing the right functions for the data type
        2. Sum-of-rates versus rate-of-sums
        3. Having enough data to work with
        4. Unexpected results when using increase
        5. Not using enough matchers to select a time series
        6. Losing statistical significance
        7. Knowing what to expect when constructing complex queries
        8. The query of death
    4. Moving on to more complex queries
      1. In which node is Node Exporter running?
        1. Scenario rationale
        2. PromQL approach
      2. Comparing CPU usage across different versions
        1. Scenario rationale
        2. PromQL approach
    5. Summary
    6. Questions
    7. Further reading
  15. Troubleshooting and Validation
    1. The test environment for this chapter
      1. Deployment
      2. Cleanup
    2. Exploring promtool
      1. Checks
        1. check config
        2. check rules
        3. check metrics
      2. Queries
        1. query instant
        2. query range
        3. query series
        4. query labels
      3. Debug
        1. debug pprof
        2. debug metrics
        3. debug all
      4. Tests
    3. Logs and endpoint validation
      1. Endpoints
      2. Logs
    4. Analyzing the time series database
      1. Using the tsdb tool
    5. Summary
    6. Questions
    7. Further reading
  16. Section 3: Dashboards and Alerts
  17. Defining Alerting and Recording Rules
    1. Creating the test environment
      1. Deployment
      2. Cleanup
    2. Understanding how rule evaluation works
      1. Using recording rules
      2. Naming convention for recording rules
    3. Setting up alerting in Prometheus
      1. What is an alerting rule?
      2. Configuring alerting rules
        1. Prometheus server configuration file
        2. Rule file configuration
      3. Labels and annotations
      4. Delays on alerting
    4. Testing your rules
      1. Recording rules tests
      2. Alerting rules tests
    5. Summary
    6. Questions
    7. Further reading
  18. Discovering and Creating Grafana Dashboards
    1. Test environment for this chapter
      1. Deployment
      2. Cleanup
    2. How to use Grafana with Prometheus
      1. Login screen
      2. Data source
      3. Explore
      4. Dashboards
      5. Grafana running on Kubernetes
    3. Building your own dashboards
      1. Dashboard fundamentals
        1. Panels
        2. Variables
        3. Time picker
      2. Creating a basic dashboard
      3. Exporting dashboards
    4. Discovering ready-made dashboards
      1. Grafana dashboards gallery
      2. Publishing your dashboards
    5. Default Prometheus visualizations
      1. Out-of-the-box console templates
      2. Console template basics
    6. Summary
    7. Questions
    8. Further reading
  19. Understanding and Extending Alertmanager
    1. Setting up the test environment
      1. Deployment
      2. Cleanup
    2. Alertmanager fundamentals
      1. The notification pipeline
        1. Dispatching alert groups to the notification pipeline
        2. Inhibition
        3. Silencing
        4. Routing
      2. Alertmanager clustering
    3. Alertmanager configuration
      1. Prometheus configuration
      2. Configuration file overview
        1. global
        2. route
        3. inhibit_rules
        4. receiver
        5. templates
      3. The amtool command-line tool
        1. alert
        2. silence
        3. check-config
        4. config
      4. Kubernetes Prometheus Operator and Alertmanager
    4. Common Alertmanager notification integrations
      1. Email
      2. Chat
      3. Pager
      4. Webhook
      5. null
    5. Customizing your alert notifications
      1. Default message format
      2. Creating a new template
    6. Who watches the Watchmen?
      1. Meta-monitoring and cross-monitoring
      2. Dead man's switch alerts
    7. Summary
    8. Questions
    9. Further reading
  20. Section 4: Scalability, Resilience, and Maintainability
  21. Choosing the Right Service Discovery
    1. Test environment for this chapter
      1. Deployment
      2. Cleanup
    2. Running through the service discovery options
      1. Cloud providers
      2. Container orchestrators
      3. Service discovery systems
      4. DNS-based service discovery
      5. File-based service discovery
    3. Using a built-in service discovery
      1. Using Consul service discovery
      2. Using Kubernetes service discovery
    4. Building a custom service discovery
      1. Custom service discovery fundamentals
      2. Recommended approach
        1. The service discovery adapter
        2. Custom service discovery example
      3. Using the custom service discovery
    5. Summary
    6. Questions
    7. Further reading
  22. Scaling and Federating Prometheus
    1. Test environment for this chapter
      1. Deployment
      2. Cleanup
    2. Scaling with the help of sharding
      1. Logical grouping of jobs
      2. The single job scale problem
      3. What to consider when sharding
      4. Alternatives to sharding
    3. Having a global view using federation
      1. Federation configuration
      2. Federation patterns
        1. Hierarchical
        2. Cross-service
    4. Using Thanos to mitigate Prometheus shortcomings at scale
      1. Thanos' global view components
        1. Sidecar
        2. Query
    5. Summary
    6. Questions
    7. Further reading
  23. Integrating Long-Term Storage with Prometheus
    1. Test environment for this chapter
      1. Deployment
      2. Cleanup
    2. Remote write and remote read
      1. Remote write
      2. Remote read
    3. Options for metrics storage
      1. Local storage
      2. Remote storage integrations
    4. Thanos remote storage and ecosystem
      1. Thanos ecosystem
      2. Thanos components
        1. Test environment specifics
        2. Thanos query
        3. Thanos sidecar
        4. Thanos store gateway
        5. Thanos compact
        6. Thanos bucket
        7. Thanos receive
        8. Thanos rule
    5. Summary
    6. Questions
    7. Further reading
  24. Assessments
    1. Chapter 1, Monitoring Fundamentals
    2. Chapter 2, An Overview of the Prometheus Ecosystem
    3. Chapter 3, Setting Up a Test Environment
    4. Chapter 4, Prometheus Metrics Fundamentals
    5. Chapter 5, Running a Prometheus Server
    6. Chapter 6, Exporters and Integrations
    7. Chapter 7, Prometheus Query Language - PromQL
    8. Chapter 8, Troubleshooting and Validation
    9. Chapter 9, Defining Alerting and Recording Rules
    10. Chapter 10, Discovering and Creating Grafana Dashboards
    11. Chapter 11, Understanding and Extending Alertmanager
    12. Chapter 12, Choosing the Right Service Discovery
    13. Chapter 13, Scaling and Federating Prometheus
    14. Chapter 14, Integrating Long-Term Storage with Prometheus
  25. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think