Controlled Failover for PITR Warm Standby

One thing I haven't really seen discussed is how best to handle controlled failover scenarios when using PITR warm standby. Typically when failover is discussed, it's with the assumption that your primary database has actually failed, and you now need to bring up the secondary database and let it "take over". Traditionally this is controlled using some kind of trigger file; pg_standby waits for the creation of the file, and I've also seen scripts that look for the removal of a file; but you could have any number of mechanisms for deciding when to "flip the switch" from warm standby machine to active database. When your primary has gone down, the choice is simple... make sure you don't have any unprocessed xlog files, and then flip the switch.

But what if your primary hasn't gone down? There are two scenarios where this seems likely (though I would guess there may be more). The first is testing your failover mechanism, where you would like to bring up the slave to verify things are working OK on the test machine (we like to do this using zone promotion via zfs, with our db clone script, but it's not always an option). The second is when you have a problem on the primary that spells pending doom, but not immediate doom (failed cpu, failed disk, need for a security patch, whatever). Yes, you need to bring up the secondary, but you have some amount of time to prepare for a smooth transition. In both of these cases, your goal is not simply to get the slave up and running, but also to minimize any potential data loss that could occur from the asynchronous shipping process.

If you have full control, the best thing to do is to shut off all of the applications accessing your database, so that you can stop any further transactions from taking place. Once you have done that, I typically run a checkpoint to make sure everything has hit disk, and then follow that with a call to pg_switch_xlog(), so that I can verify which is the last xlog I should expect to see transferred.
pagila=# checkpoint;
CHECKPOINT
Time: 106.134 ms
pagila=# select pg_switch_xlog();
 pg_switch_xlog
----------------
 118D/7B8206A8
(1 row)
Time: 25.399 ms
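If it helps to watch for an actual file name rather than an xlog location, you can run that location through pg_xlogfile_name(). To be clear, this bit isn't from the session above, and the timeline portion of the result (00000001) is just illustrative; the point is that the function maps the location to the segment file you expect to see shipped:

pagila=# select pg_xlogfile_name('118D/7B8206A8');
     pg_xlogfile_name
--------------------------
 000000010000118D0000007B
(1 row)

That's the file you want to see arrive and get replayed on the standby before you pull the trigger.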
Assuming no other transactions are going on, that gives you a clear delineation point. If you want to take it a step further, you could shut down the old database at this point, just to make sure no further transactions occur. (It's worth noting that when you shut down, you'll get one more xlog file in $PGDATA/pg_xlog.)

Once you've determined the last xlog you need for recovery, you need to make sure it gets shipped over. This step will differ depending on how you're shipping xlog files between servers; if you run a script of some type that sends files periodically, ideally run it manually and watch the xlog file (or files) get pushed over. Once the file is moved, you should see it processed on the standby (I usually keep a terminal window open tailing the postgres logfile for this), and once that happens, you're ready for failover.

That's pretty much all there is to it. Of course, if you can't stop your applications or shut down your database, YMMV, but in the end this should still help you reduce the possibility of missing transactions when you fail over a system. Sorry, you'll still have to go through the pain of setting up the slave again afterwards, but that's good practice too, eh?
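In case it's useful, here's a rough sketch of what that last ship-and-trigger step can look like when pg_standby is doing the restores. The paths, hostname, and rsync-based shipping here are made-up placeholders, not my actual setup; adjust them to match your own restore_command and archive layout:

# recovery.conf on the standby -- pg_standby's -t option names the trigger file
#   restore_command = 'pg_standby -t /tmp/pgsql.trigger /var/lib/pgsql/archive %f %p %r'

# 1. On the primary, push any remaining xlog files over, using whatever
#    shipping mechanism you normally use (rsync shown purely as an example):
rsync -av /var/lib/pgsql/archive/ standby:/var/lib/pgsql/archive/

# 2. On the standby, tail the postgres logfile until the last segment
#    (the one pg_switch_xlog / pg_xlogfile_name pointed at) has been restored.

# 3. Then flip the switch: create the trigger file so pg_standby stops
#    waiting and recovery finishes.
touch /tmp/pgsql.trigger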