Re: Unlogged tables cleanup - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Unlogged tables cleanup |
Date | |
Msg-id | CA+TgmoZ4TWaPCKhF-szV-nPxDXL40zCwm9pNFJZURvRgm2oJzQ@mail.gmail.com Whole thread Raw |
In response to | Re: Unlogged tables cleanup (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: Unlogged tables cleanup
|
List | pgsql-hackers |
On Wed, Nov 16, 2016 at 11:55 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Nov 16, 2016 at 7:09 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Nov 16, 2016 at 3:54 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> Indeed I missed this comment block. Please let me suggest the following instead: >>> /* >>> * Set up an init fork for an unlogged table so that it can be correctly >>> - * reinitialized on restart. Since we're going to do an immediate sync, we >>> - * only need to xlog this if archiving or streaming is enabled. And the >>> - * immediate sync is required, because otherwise there's no guarantee that >>> - * this will hit the disk before the next checkpoint moves the redo pointer. >>> + * reinitialized on restart. An immediate sync is required even if the >>> + * page has been logged, because the write did not go through >>> + * shared_buffers and therefore a concurrent checkpoint may have moved >>> + * the redo pointer past our xlog record. >>> */ >> >> Hmm. Well, that deletes the comment that's no longer true, but it >> doesn't replace it with any explanation of why we also need to WAL-log >> it unconditionally, and I think that explanation is not entirely >> trivial? > > OK, the original code does not give any special reason either > regarding why doing so is safe for archiving or streaming :) Yeah, but surely it's obvious that if you don't WAL log it, it's not going to work for archiving or streaming. It's a lot less obvious why you have to WAL log it when you're not doing either of those things if you've already written it and fsync'd it locally. How is it going to disappear if it's been fsync'd, one wonders? > More seriously, if there could be more details regarding that, I would > think that we could say something like "logging the init fork is > mandatory in any case to ensure its on-disk presence when at recovery > replay, even on non-default tablespaces whose base location are > deleted and re-created from scratch if the WAL record in charge of > creating this tablespace gets replayed". The problem shows up because > of tablespaces being deleted at replay at the end... So perhaps this > makes sense. What do you think? Yes, that's about what I think we need to explain. Actually, I'm wondering if we ought to just switch this over to using the delayChkpt mechanism instead of going through this complicated song-and-dance. Incurring an immediate sync to avoid having to WAL-log this is a bit tenuous, but having to WAL-log it AND do the immediate sync just seems silly, and I'm actually a bit worried that even with your fix there might still be a latent bug somewhere. We couldn't switch mechanisms cleanly in the 9.2 branch (cf. f21bb9cfb5646e1793dcc9c0ea697bab99afa523) so perhaps we should back-patch it in the form you have it in already, but it's probably worth switching over at least in master. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: