Re: Avoiding adjacent checkpoint records - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: Avoiding adjacent checkpoint records |
| Date | |
| Msg-id | CA+TgmoZNqSbuJwYB8ZGtSf0qQFcDeXU+LKvLqxLczcM-OnZoFQ@mail.gmail.com |
| In response to | Re: Avoiding adjacent checkpoint records (Simon Riggs <simon@2ndQuadrant.com>) |
| Responses | Re: Avoiding adjacent checkpoint records |
| List | pgsql-hackers |
On Thu, Jun 7, 2012 at 9:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The only risk of data loss is in the case where someone deletes their
> pg_xlog and who didn't take a backup in all that time, which is hardly
> recommended behaviour. We're at exactly the same risk of data loss if
> someone deletes their pg_clog. Too frequent checkpoints actually makes
> the data loss risk from deleted pg_clog greater, so the balance of
> data loss risk doesn't seem to have altered.

This doesn't match my experience. pg_xlog is often located on a separate disk, which significantly increases the chances of something bad happening to it, either through user error or because, uh, disks sometimes fail. Now, granted, you can also lose your data directory (including pg_clog) this way, but just because we lose data in that situation doesn't mean we should be happy about also losing data when pg_xlog goes down the toilet, especially when we can easily prevent it by going back to the behavior we've had in every previous release.

Now, I have had customers lose pg_clog data, and it does suck, but it's usually a safe bet that most of the missing transactions committed, so you can pad out the missing files with 0x55, and probably get your data back. On the other hand, it's impossible to guess what any missing pg_xlog data might have been. Perhaps if the data pages are on disk and only CLOG didn't get written you could somehow figure out which bits you need to flip in CLOG to get your data back, but that's pretty heavy brain surgery, and if autovacuum or even just a HOT prune runs before you realize that you need to do it then you're toast. OTOH, if the database has checkpointed, pg_resetxlog is remarkably successful in letting you pick up the pieces and go on with your life.

All that having been said, it wouldn't be a stupid idea to have a little more redundancy in our CLOG mechanism than we do right now. Hint bits help, as does the predictability of the data, but it's still awfully scary to have that much critical data packed into that small a space. I'd love to see us checksum those pages, or store the data in some redundant location that makes it unlikely we'll lose both copies, or ship a utility that will scan all your heap pages and try to find hint bits that reveal which transactions committed and which ones aborted, or all of the above. But until then, I'd like to make sure that we at least have the data on the disk instead of sitting dirty in memory forever.

As a general thought about disaster recovery, my experience is that if you can tell a customer to run a command (like pg_resetxlog), or - not quite as good - if you can tell them to run some script that you email them (like my pad-out-the-CLOG-with-0x55 script), then they're willing to do that, and it usually works, and they're as happy as they're going to be. But if you tell them that they have to send you all their data files or let you log into the machine and poke around for $X/hour * many hours, then they typically don't want to do that. Sometimes it's legally or procedurally impossible for them; even if not, it's cheaper to find some other way to cope with the situation, so they do, but now - the way they view it - the database lost their data. Even if the problem was entirely self-inflicted, like an intentional deletion of pg_xlog, and even if they therefore understand that it was entirely their own stupid fault that the data got eaten, it's a bad experience.
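To make the 0x55 trick above concrete: pg_clog stores two status bits per transaction, with 01 meaning "committed", so a byte of 0x55 (binary 01010101) marks four consecutive transactions as committed, and a missing segment can be replaced by a correctly sized file filled with that byte. The sketch below is only an illustration of that idea, not the script referred to in the message; it assumes the default 8 kB block size, under which each pg_clog segment is 32 SLRU pages of 8192 bytes (256 kB), and the `pad_segment` helper and command-line usage are hypothetical.

```python
#!/usr/bin/env python3
"""Illustrative sketch: recreate missing pg_clog segments filled with 0x55,
i.e. every transaction marked "committed". Assumes default 8 kB blocks."""
import os
import sys

SEGMENT_SIZE = 32 * 8192     # 32 SLRU pages of 8192 bytes each (default BLCKSZ)
ALL_COMMITTED = b"\x55"      # two status bits per xact; 01 = committed

def pad_segment(clog_dir: str, segment_name: str) -> None:
    path = os.path.join(clog_dir, segment_name)
    if os.path.exists(path):
        # Never overwrite real data; only fill in segments that are missing.
        print(f"skipping {path}: already exists")
        return
    with open(path, "wb") as f:
        f.write(ALL_COMMITTED * SEGMENT_SIZE)
    print(f"created {path} ({SEGMENT_SIZE} bytes of 0x55)")

if __name__ == "__main__":
    # Hypothetical usage: pad_clog.py /path/to/pg_clog 0000 0001 ...
    clog_dir, *segments = sys.argv[1:]
    for seg in segments:
        pad_segment(clog_dir, seg)
```

As the message notes, this is a last-resort heuristic: it assumes most of the missing transactions committed, so rows from transactions that actually aborted will reappear.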
For that reason, I think we should be looking for opportunities to increase the recoverability of the database in every area. I'm sure that everyone on this list who works with customers on a regular basis has had customers who lost pg_xlog, who lost pg_clog (or portions thereof), who dropped their main table, who lost the backing files for pg_class and/or pg_attribute, whose database ended up in lost+found, who had a break in WAL, who had individual blocks corrupted or unreadable within some important table, who were missing TOAST chunks, who took a pg_basebackup and failed to create recovery.conf, who had a corrupted index on a critical system table, or who had inconsistent system catalog contents. Some of these problems are caused by bad hardware or bugs, but the most common cause is user error. Regardless of the cause, the user wants to get as much of their data back as possible, as quickly, easily, and reliably as possible. To the extent that we can transform situations that would have required consulting hours into situations from which a semi-automated recovery is possible, or situations that would have required many consulting hours into ones that require only a few, that's a huge win. Of course, we shouldn't place that goal above all else; and of course, this is only one small piece of that. But it is a piece, and it has a tangible benefit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company