Monitoring Read vs. Write Activity in Postgres

If you spend anytime either giving or receiving database tuning advice, perhaps the most common phrase you will hear is “it depends”. Of course that is made in reference to a whole lot of things, but one of the things it depends on is read vs write load. While that’s easy to point out, it’s not always that obvious as to how best to measure read/write load for a system.

The current methods are mostly tied around two things; one I find pretty good, the other I’ve always felt was lacking. The first method is to monitor read/write activity at the disk level; this one I like. Trending total read vs. writes, whether by iops or bytes, can give you a good idea of how heavy your app leans at the physical layer (here we are trying read and write bps across several sets of spindles);

However, that’s only half the picture; the other thing I’ve always wanted a good picture of is read vs. writes at the database level. In Postgres, the closest thing we’ve had for that is looking at the various statistics tables, and measuring tuple activity. To get our read/write ratio, we can start by looking at pg_stat_tables and grabbing the number of tuples inserted, updated, or deleted. A typical graph for that might look like this:

That’s all well and good, but comparing it to reads becomes problematic. You have two statistics to look at; seq tuple reads and index tuple reads. If you graph that you might get something like this (green line is commits, brown area is xmax velocity, and red line is rollbacks)

The problem here is that the way these numbers are measured is not a direct correlation to the insert/update/delete numbers from the above. While it is handy for doing index vs seq read comparisons, if you want read vs. write activity, you need something different.

Enter txid_snapshot_xmax. One of several utility functions available to look at transaction information, this function, when passed the results of txid_current_snapshot() can help us track the current xmax of the system. And why does that matter you ask? Well, whenever a write transaction occurs, the xmax is advanced, so if we track xmax over time, this gives us an idea of the write velocity of a given system.

Now, I fully expect someone to explain in the comments where this breaks down; I’ll be the first to admit it’s not a perfect system, but it does let me produce some pretty graphs. Taking our xmax velocity, and graphing that along side commit and rollback activity, we can see something like this:

That database has a generally high read/write ratio. On another system, with a closer read/write mix, we get this:

So, this turns out to be a pretty good way to get a feeling for read vs. writes within the system. There can be some unexpected twists though; here’s one last graph that shows a system that apparently does more writes than total transactions! Bonus points if you can explain why that happens :-)