Software Telemetry shows you how to efficiently collect, store, and analyze system and application log data so you can monitor and improve your systems. Manage the pillars of observability—logs, metrics, and traces—in an end-to-end telemetry system that integrates with your existing infrastructure. You’ll discover how software telemetry benefits both small startups and legacy enterprises. And at a time when data audits are increasingly common, you’ll appreciate the thorough coverage of legal compliance processes, so there’s no reason to panic when a discovery request arrives.

Table of Contents

  1. inside front cover
  2. Software Telemetry
  3. Copyright
  4. dedication
  5. brief contents
  6. contents
  7. front matter
    1. preface
    2. acknowledgments
    3. about this book
    4. Who should read this book
    5. How this book is organized: A road map
    6. About the code
    7. liveBook discussion forum
    8. Other online resources
    9. about the author
    10. about the cover illustration
  8. 1 Introduction
    1. 1.1 Defining the styles of telemetry
    2. 1.1.1 Defining centralized logging
    3. 1.1.2 Defining metrics
    4. 1.1.3 Defining distributed tracing
    5. 1.1.4 Defining SIEM
    6. 1.2 How telemetry is consumed by different teams
    7. 1.2.1 Telemetry use by Operations, DevOps, and SRE teams
    8. 1.2.2 Telemetry use by Security and Compliance teams
    9. 1.2.3 Telemetry use by Software Engineering and SRE teams
    10. 1.2.4 Telemetry use by Customer Support teams
    11. 1.2.5 Telemetry use by business intelligence
    12. 1.3 Challenges facing telemetry systems
    13. 1.3.1 Chronic underinvestment harms decision-making
    14. 1.3.2 Diverse needs resist standardization
    15. 1.3.3 Information spills and cleaning them up to avoid legal problems
    16. 1.3.4 Court orders break your assumptions
    17. 1.4 What you will learn
    18. Summary
  9. Part 1. Telemetry system architecture
  10. 2 The Emitting stage: Creating and submitting telemetry
    1. 2.1 Emitting from production code
    2. 2.1.1 Emitting telemetry into a log file
    3. 2.1.2 Emitting telemetry into the system log
    4. 2.1.3 Emitting telemetry into standard output
    5. 2.1.4 Formatting telemetry for emissions
    6. 2.2 Emitting from hardware
    7. 2.2.1 Explaining SNMP
    8. 2.2.2 Ingesting telemetry from a Cisco ASA firewall
    9. 2.3 Emitting from as-a-Service systems
    10. 2.3.1 Emitting events from SaaS systems
    11. 2.3.2 Emitting events from IaaS systems
    12. Summary
  11. 3 The Shipping stage: Moving and storing telemetry
    1. 3.1 Emitter/shipper functions, telemetry from production code
    2. 3.1.1 Shipping directly into storage
    3. 3.1.2 Shipping through queues and streams
    4. 3.1.3 Shipping to SaaS systems
    5. 3.2 Shipping between SaaS systems
    6. 3.3 Tipping points in Shipping-stage architecture
    7. Summary
  12. 4 The Shipping stage: Unifying diverse telemetry formats
    1. 4.1 Shipping locally-emitted telemetry
    2. 4.1.1 Shipping telemetry from a log file
    3. 4.1.2 Shipping telemetry from the system logger
    4. 4.1.3 Shipping telemetry from standard output
    5. 4.2 Unifying diverse emitting formats
    6. 4.2.1 Encoding telemetry into strings
    7. 4.2.2 Picking a shipping format
    8. 4.2.3 Converting Syslog to JSON or other object-encoding formats
    9. 4.2.4 Designing with cardinality in mind
    10. Summary
  13. 5 The Presentation stage: Displaying telemetry
    1. 5.1 Displaying telemetry in metrics systems
    2. 5.1.1 Making pretty pictures with telemetry
    3. 5.1.2 Feeding the graphs with aggregation functions
    4. 5.1.3 Using aggregations with pdf_pages
    5. 5.2 Displaying telemetry in centralized logging systems
    6. 5.2.1 Selecting needed features in a display system for centralized logging
    7. 5.2.2 Demonstrating centralized logging display
    8. 5.3 Displaying telemetry in security systems
    9. 5.4 Displaying telemetry in distributed tracing systems
    10. 5.5 Displaying telemetry in large organizations
    11. Summary
  14. 6 Marking up and enriching telemetry
    1. 6.1 Markup in the Emitting stage
    2. 6.2 Markup and enrichment in the Shipping stage
    3. 6.2.1 Applying context-related telemetry in the Shipping stage
    4. 6.2.2 Extracting and enriching telemetry in-flight
    5. 6.2.3 Converting field types during the Shipping stage
    6. 6.3 Enrichment in the Presentation stage
    7. 6.4 How telemetry style affects markup and enrichment
    8. 6.4.1 Markup and enrichment with centralized logging
    9. 6.4.2 Markup and enrichment with SIEM systems
    10. 6.4.3 Markup and enrichment with metrics
    11. 6.4.4 Markup and enrichment with distributed tracing systems
    12. Summary
  15. 7 Handling multitenancy
    1. 7.1 How multitenant architectures come about
    2. 7.1.1 Evolving multitenancy in an early-stage startup
    3. 7.1.2 Evolving multitenancy in a culture of free sharing
    4. 7.1.3 Evolving multitenancy in a culture of strong separation
    5. 7.2 Designing multitenant telemetry systems
    6. 7.2.1 Multitenancy in the Shipping stage
    7. 7.2.2 Multitenancy in the Presentation stage
    8. Summary
  16. Part 2. Use cases revisited: Applying architecture concepts
  17. 8 Growing cloud-based startup
    1. 8.1 Telemetry at the small-company stage
    2. 8.1.1 Describing the small company’s telemetry system
    3. 8.1.2 Analyzing the small company’s telemetry system
    4. 8.2 Telemetry at the medium-size company stage
    5. 8.2.1 Describing the medium-size company’s telemetry system
    6. 8.2.2 Analyzing the medium-size company’s telemetry system
    7. 8.3 Telemetry at the large-company stage
    8. 8.3.1 Describing the large company’s telemetry system
    9. 8.3.2 Analyzing the large company’s telemetry system
    10. 8.4 Telemetry at the enterprise stage
    11. 8.5 Looking back at all this growth
    12. Summary
  18. 9 Nonsoftware business
    1. 9.1 Telemetry use in small organizations
    2. 9.2 Telemetry use in medium-size organizations
    3. 9.3 Telemetry use in large organizations
    4. 9.4 Telemetry use in enterprise organizations
    5. Summary
  19. 10 Long-established business IT
    1. 10.1 Telemetry use in medium-size organizations
    2. 10.1.1 Telemetry use in office IT
    3. 10.1.2 Telemetry use in production systems
    4. 10.2 Telemetry use in large organizations
    5. 10.3 Telemetry use in global organizations
    6. 10.3.1 Telemetry use in the Booking and Passenger Manifest department
    7. 10.3.2 Telemetry use in the Loyalty Programs department
    8. Summary
  20. Part 3. Techniques for handling telemetry
  21. 11 Optimizing for regular expressions at scale
    1. 11.1 Anchoring expressions for speed
    2. 11.2 Building expressions to fail fast
    3. 11.3 Digging into the Cisco ASA firewall telemetry
    4. 11.4 Refining emissions to speed regular-expression performance
    5. 11.5 Additional regular-expression resources
    6. Summary
  22. 12 Standardized logging and event formats
    1. 12.1 Implementing structured logging in your code
    2. 12.2 Implementing standards in your code
    3. 12.3 Implementing standards in the Shipping stage
    4. Summary
  23. 13 Using more nonfile emitting techniques
    1. 13.1 Designing for socket- and datagram-based emitters
    2. 13.2 Emitting and shipping for container- and serverless-based code
    3. 13.2.1 Emitting and shipping from containerd-based code
    4. 13.2.2 Emitting and shipping from serverless-based code
    5. 13.3 Encrypting UDP-based telemetry
    6. Summary
  24. 14 Managing cardinality in telemetry
    1. 14.1 Identifying cardinality problems
    2. 14.1.1 Cardinality in time-series databases
    3. 14.1.2 Cardinality in logging databases
    4. 14.2 Lowering the cost of cardinality
    5. 14.2.1 Use logging standards to contain cardinality
    6. 14.2.2 Using storage-side methods to tame cardinality
    7. 14.2.3 Make cardinality someone else’s problem
    8. Summary
  25. 15 Ensuring telemetry integrity
    1. 15.1 Getting telemetry out of reach of an attacker
    2. 15.1.1 Move telemetry too fast to catch
    3. 15.1.2 Use ACLs to enforce write-only telemetry
    4. 15.1.3 Durable telemetry when using SaaS providers
    5. 15.2 Making telemetry harder to mess with
    6. 15.2.1 Using access control requirements to defend against attacks
    7. 15.2.2 Ensuring configuration integrity in your telemetry systems
    8. 15.2.3 Making changes obvious
    9. Summary
  26. 16 Redacting and reprocessing telemetry
    1. 16.1 Identifying toxic data and where it comes from
    2. 16.2 Redacting toxic information spills
    3. 16.3 Reprocessing telemetry to support upgrades
    4. 16.4 Isolating toxic data to reduce cleanup costs
    5. Summary
  27. 17 Building policies for telemetry retention and aggregation
    1. 17.1 Creating a retention policy
    2. 17.1.1 Building a policy for centralized logging
    3. 17.1.2 Building a policy for metrics
    4. 17.1.3 Building a policy for distributed tracing
    5. 17.1.4 Building a policy for SIEM systems
    6. 17.2 Creating an aggregation policy
    7. 17.3 Using sampling to reduce costs and increase retention
    8. Summary
  28. 18 Surviving legal processes
    1. 18.1 Defining the eDiscovery process
    2. 18.2 Dealing with records-retention requests
    3. 18.2.1 Examining an ELK-based centralized logging system
    4. 18.2.2 Examining a Sumo Logic-based centralized logging system
    5. 18.3 Dealing with document-production requests
    6. 18.3.1 Telemetry in the collection phase
    7. 18.3.2 Telemetry in the review phase
    8. 18.3.3 Telemetry in the production phase
    9. 18.4 Working with lawyers
    10. Summary
  29. Appendix A. Telemetry storage systems
    1. A.1 Analyzing Elasticsearch
    2. A.1.1 What Elasticsearch is good at
    3. A.1.2 What is challenging for Elasticsearch
    4. A.2 Analyzing Apache Cassandra
    5. A.2.1 What Cassandra is good at
    6. A.2.2 What is challenging for Cassandra
    7. A.3 Analyzing Grafana Labs’ Loki
    8. A.3.1 What Loki is good at
    9. A.3.2 What is challenging for Loki
    10. A.4 Analyzing MongoDB
    11. A.4.1 What MongoDB is good at
    12. A.4.2 What is challenging for MongoDB
    13. A.5 Analyzing Prometheus
    14. A.5.1 What Prometheus is good at
    15. A.5.2 What is challenging for Prometheus
    16. A.6 Analyzing InfluxDB
    17. A.6.1 What InfluxDB is good at
    18. A.6.2 What is challenging for InfluxDB
    19. A.7 Analyzing Jaeger
    20. A.7.1 What Jaeger is good at
    21. A.7.2 What is challenging for Jaeger
  30. Appendix B. Recommendation checklist reference
    1. B.1 Telemetry standards, structure, and setting policies
    2. Section 4.2.2: Setting standardized telemetry formats
    3. Section 4.2.4: Designing telemetry formats with cardinality in mind
    4. Section 6.4.1: When and where to mark up or enrich telemetry in centralized logging systems
    5. Section 6.4.3: When and where to mark up or enrich telemetry in metrics systems
    6. Section 7.2.1: How parasitic is that parasitic load?
    7. Chapter 11: Making regular expressions fast
    8. Section 11.4: The project phases for optimizing your logging statements for regular expressions
    9. Chapter 12: The benefits of using a structured logger
    10. Section 13.1: In-memory networking and how it eases telemetry
    11. Section 14.2.1: Enforcing logging standards through development process
    12. Section 17.1.3: Recommendations on setting a tracing retention policy
    13. Section 17.1.4: Recommendations on setting a SIEM retention policy
    14. Section 17.3: Considerations when picking a sampling rate
    15. B.2 Presentation-stage recommendations
    16. Section 5.1.1: The features of a good metrics system
    17. Section 5.1.1: Considerations for building dashboards
    18. Section 5.2.1: The features of a good centralized logging system
    19. Section 5.3: Extending centralized logging to SIEM work
    20. Section 7.2.2: Adding multitenancy
    21. B.3 Cardinality management
    22. Section 4.2.4: Designing telemetry formats with cardinality in mind
    23. Section 14.1: The symptoms of high cardinality
    24. Section 14.2.1: Healthy low-cardinality context-related telemetry
    25. Section 14.2.2: How sharding affects cardinality management
    26. Section 14.2.3: When to make cardinality someone else’s problem
    27. B.4 Telemetry safety and effects
    28. Chapter 15: The two principles of secure telemetry
    29. Section 15.1.1: Moving telemetry too fast to catch
    30. Section 15.2.1: The three Linux Mandatory Access Control systems
    31. Section 15.2.1: Places to use ACLs in a telemetry pipeline
    32. Section 15.2.3: How encryption and digital signatures support telemetry
    33. Section 15.2.3: How encryption and digital signatures make telemetry more fragile
    34. Section 16.1: The three types of toxic data
    35. Section 16.1: The penalties for mishandling toxic data
    36. Section 16.3: What drives periodic reprocessing
    37. Section 16.4: Why isolating telemetry helps you
    38. Section 16.4: Tips to avoid false-positive toxic-data detections
    39. B.5 Legal topics
    40. Section 18.2: Questions to ask when assessing a telemetry system to handle legal hold orders
    41. Section 18.4: How to work with lawyers
  31. Appendix C. Exercise answers
  32. index
  33. inside back cover