Chapter 10. A Standard Approach to Troubleshooting

In this chapter, the goal is to sum up some of the approaches you have seen in this book and combine them into one overall approach for typical everyday scenarios. Customers may come up with any or all of the following statements:

  • "My database is slow"
  • "CPU usage and I/O are going through the roof"
  • "It has not always been so slow"
  • "Sometimes, replication fails"
  • "I'm adding new hardware"

The goal here is to provide you with a systematic approach to the problem and point to solutions. Note that the example outlined in this chapter is a typical case faced by clients around the globe.

Getting an overview of the problem

Before getting started, the most important thing is to get an overview of the problem. Assuming that there is no corruption around, the first thing to do is to take a detailed look at the content of pg_stat_activity. As already outlined in this book, pg_stat_activity contains a list of all open database connections. There are a couple of things to look out for here:

  • Is the number of open database connections reasonable?
  • Are there many idling transactions?
  • Is a certain query showing up again and again?
  • Are there queries that are obviously running for too long?
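All of these checks can be run against pg_stat_activity directly. The following is a sketch, assuming a reasonably recent version of PostgreSQL (the state column has been available since 9.2):

```sql
-- Overview: how many connections exist, per database and per state
SELECT datname, state, count(*)
FROM   pg_stat_activity
GROUP  BY datname, state
ORDER  BY count(*) DESC;
```

The result immediately shows whether the bulk of the connections are doing real work (active) or merely sitting around (idle or idle in transaction).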

Many middleware components open an insane number of database connections. Remember that each useless connection takes away a bit of memory, which could have been used for something more productive, such as caching. However, the number of database connections alone does not usually impose dramatic overhead. The number of active snapshots is far more dangerous, because each active snapshot causes internal overhead and therefore translates directly into a decrease in speed. The benchmark at http://www.cybertec.at/max_connections-performance-impacts/ proves this point.
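To put the number of open connections into perspective, it can be compared to the configured limit. The following sketch uses only standard views and settings:

```sql
-- Compare the current connection count to the configured maximum
SELECT count(*) AS open_connections,
       current_setting('max_connections') AS max_connections
FROM   pg_stat_activity;
```

If the connection count is permanently close to max_connections, it is usually a sign that a connection pooler should sit between the application and the database.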

Once the number of connections has been checked, it makes sense to see how many of those are idling. Open transactions that have been idling for quite some time can be a danger. Why? The VACUUM command can only clean up dead rows if there is no transaction around anymore that is capable of seeing the data. Old, idle transactions can delay the cleanup of rows, and bad performance is likely to be the consequence. Even if the VACUUM command is executed on a regular basis, things are likely to stay slow because VACUUM is simply not able to do its job. Therefore, it can make sense to keep an eye on unreasonably long transactions.
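Long-idling transactions can be spotted in pg_stat_activity as well. The following sketch lists sessions that have been in the idle in transaction state for a while; the 5-minute threshold is an arbitrary example value and should be adjusted to the workload:

```sql
-- Find transactions that have been idle for more than 5 minutes
SELECT pid, usename, xact_start, state_change, query
FROM   pg_stat_activity
WHERE  state = 'idle in transaction'
       AND state_change < now() - interval '5 minutes'
ORDER  BY xact_start;
```

The xact_start column shows when the transaction began, so the oldest offenders, which block VACUUM the longest, appear first.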

But there is more; if the same query shows up again and again in pg_stat_activity, it is already worth taking a look at this query and ensuring that it performs nicely. The same applies to queries that are already showing high execution times and show up in the listing.
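If the pg_stat_statements extension is installed, recurring and expensive queries can be ranked instead of being caught by chance in pg_stat_activity. This is a sketch using the column names of PostgreSQL 13 and later; on older versions, total_exec_time and mean_exec_time are called total_time and mean_time:

```sql
-- Requires the pg_stat_statements extension
-- Top 10 queries by total execution time
SELECT query, calls, mean_exec_time, total_exec_time
FROM   pg_stat_statements
ORDER  BY total_exec_time DESC
LIMIT  10;
```

Sorting by total_exec_time rather than mean_exec_time catches both the single slow query and the cheap query that is executed millions of times.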
