Thread: Shared pg_xlog directory/partition and warm standby server
Hello,

Is there anything that would prevent two PostgreSQL servers from sharing the same pg_xlog directory, while one mounts it read-only and the other uses the same partition for read and write?

The problem is: if we share the same pg_xlog between the production server and the warm standby server, can you see any possibility of data/xlog corruption? Of course, the warm standby server will mount that partition read-only.

I have thought about this a bit and could not find any problems. Can you think of one?

Regards,
--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Devrim GUNDUZ wrote:
> Hello,
>
> Is there anything that would prevent two PostgreSQL servers from sharing
> the same pg_xlog directory, while one mounts it read-only and the other
> uses the same partition for read and write? The problem is: if we share
> the same pg_xlog between the production server and the warm standby
> server, can you see any possibility of data/xlog corruption? Of course,
> the warm standby server will mount that partition read-only.

What happens if the standby server falls so far behind the master that the xlogs it wants to read are already being overwritten? AFAIK the files in pg_xlog form a circular buffer, and are reused after a while...

greetings, Florian Pflug
On Mon, 2006-11-27 at 14:17 +0100, Florian G. Pflug wrote:
> Devrim GUNDUZ wrote:
> > Is there anything that would prevent two PostgreSQL servers from sharing
> > the same pg_xlog directory, while one mounts it read-only and the other
> > uses the same partition for read and write? The problem is: if we share
> > the same pg_xlog between the production server and the warm standby
> > server, can you see any possibility of data/xlog corruption? Of course,
> > the warm standby server will mount that partition read-only.
>
> What happens if the standby server falls so far behind the master that
> the xlogs it wants to read are already being overwritten?
>
> AFAIK the files in pg_xlog form a circular buffer, and are reused after
> a while...

If the archive_command doesn't actually do anything and just leaves the files there, they will automatically be moved to the .done state and will then be removed within two checkpoints. So it will work as long as your standby keeps up with the primary. If it falls behind, you'll lose the file and you'll be out of luck (no file, start from a base backup again). A large checkpoint_segments would help, but there is no way to avoid that situation entirely.

The archiver assumes that you want to archive things oldest first, so if the archive_command fails it will retry that file repeatedly. Put another way, the archiving is synchronous: when an archive is requested, we wait for the answer before attempting the next one.

I suppose we might want to have multiple archive operations occurring simultaneously by overlapping their start and stop times. That might be useful for situations where we have a bank of slow-response tape drives/autoloaders?

You'd need a second archive command to poll for completion. Currently archive_status has two states: .ready and .done. We could have three states: .ready, .inprogress and .done. The first command, archive_command_start, if successful, would move the state from .ready to .inprogress, while the second, archive_command_confirm, would move the state from .inprogress to .done. (Better names please...)

With an asynchronous API, it would then be possible to fire off requests to archive lots of files, then return later to confirm their completion. Or, in Devrim's case, do nothing apart from waiting for them to be applied by the standby server.

Anybody else see the need for this?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
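To make the proposal above a bit more concrete, here is a minimal sketch of what an archiver cycle driving that three-state protocol could look like. Only the .ready/.done markers under pg_xlog/archive_status exist in PostgreSQL today; the .inprogress state and the archive_command_start / archive_command_confirm settings are purely the hypothetical names from Simon's message, not anything the server actually implements.

import os
import subprocess

# Hypothetical sketch of the proposed asynchronous archiver cycle.
# Only the .ready/.done markers exist today; .inprogress and the two
# commands are the proposal being discussed in this thread.

STATUS_DIR = "pg_xlog/archive_status"
ARCHIVE_START = "archive_command_start %s"      # assumed user-supplied command
ARCHIVE_CONFIRM = "archive_command_confirm %s"  # assumed user-supplied command

def run(template, segment):
    """Run a user-supplied command for one WAL segment; True on exit code 0."""
    return subprocess.call(template % segment, shell=True) == 0

def archiver_cycle():
    for marker in sorted(os.listdir(STATUS_DIR)):
        segment, state = os.path.splitext(marker)
        path = os.path.join(STATUS_DIR, marker)
        if state == ".ready":
            # Fire off the archival without waiting for the device to finish.
            if run(ARCHIVE_START, segment):
                os.rename(path, os.path.join(STATUS_DIR, segment + ".inprogress"))
        elif state == ".inprogress":
            # On a later pass, poll whether the slow device has completed.
            if run(ARCHIVE_CONFIRM, segment):
                os.rename(path, os.path.join(STATUS_DIR, segment + ".done"))
        # .done markers are left for checkpoints to clean up, as today.

Each cycle only launches the start command and, on later passes, polls for confirmation, so many segments can be in flight at once, which is the whole point of the asynchronous API.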
On Mon, Nov 27, 2006 at 04:35:30PM +0000, Simon Riggs wrote:
> On Mon, 2006-11-27 at 14:17 +0100, Florian G. Pflug wrote:
> > Devrim GUNDUZ wrote:
> > > Is there anything that would prevent two PostgreSQL servers from sharing
> > > the same pg_xlog directory, while one mounts it read-only and the other
> > > uses the same partition for read and write? The problem is: if we share
> > > the same pg_xlog between the production server and the warm standby
> > > server, can you see any possibility of data/xlog corruption? Of course,
> > > the warm standby server will mount that partition read-only.

<snip>

> I suppose we might want to have multiple archive operations occurring
> simultaneously by overlapping their start and stop times. That might be
> useful for situations where we have a bank of slow-response tape
> drives/autoloaders?
>
> You'd need a second archive command to poll for completion. Currently
> archive_status has two states: .ready and .done. We could have three
> states: .ready, .inprogress and .done. The first command,
> archive_command_start, if successful, would move the state from .ready
> to .inprogress, while the second, archive_command_confirm, would move
> the state from .inprogress to .done. (Better names please...)
>
> With an asynchronous API, it would then be possible to fire off requests
> to archive lots of files, then return later to confirm their completion.
> Or, in Devrim's case, do nothing apart from waiting for them to be
> applied by the standby server.
>
> Anybody else see the need for this?

There might be a desire for async archiving in some circumstances, but I don't really see what Devrim is after that couldn't just be done with our current PITR. The only difference I can think of is not having to copy logfiles around, but presumably that could be addressed by using hardlinks instead of actually copying (at least on Unix...). Maybe Devrim has something else in mind?

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
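For illustration, Jim's hardlink idea could be wired up as an archive_command along these lines. This is a rough sketch only: the script name and archive directory are invented, while the %p (path) and %f (file name) placeholders are the standard archive_command substitutions. As Florian points out further down in the thread, the primary later recycles and overwrites the segment the link points to, so this would not actually keep a stable copy.

import os
import sys

# Rough sketch of a hardlink-based archive script, invoked roughly as
#   archive_command = 'hardlink_archive.py %p %f'   (hypothetical name)

ARCHIVE_DIR = "/mnt/archive/wal"   # assumed shared location

def archive(segment_path, segment_name):
    target = os.path.join(ARCHIVE_DIR, segment_name)
    if os.path.exists(target):
        return 1                   # refuse to overwrite an archived segment
    os.link(segment_path, target)  # creates a directory entry, copies no data
    return 0

if __name__ == "__main__":
    sys.exit(archive(sys.argv[1], sys.argv[2]))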
"Florian G. Pflug" <fgp@phlo.org> writes: > Devrim GUNDUZ wrote: >> Is there anything that may prevent two PostgreSQL servers to share the >> same pg_xlog directory; while one is using read-only and the other one >> is using the same partition for read and write? > What happens in the standby server falls so far behind the master that > the xlogs it wants to read are already being overwritten? Worse than that: what happens when the standby comes alive, and needs to start writing pg_xlog entries? Sounds like a disaster in the making to me. regards, tom lane
Hi,

On Mon, 2006-11-27 at 12:14 -0600, Jim C. Nasby wrote:
> The only difference I can think of is not having to copy logfiles
> around, but presumably that could be addressed by using hardlinks
> instead of actually copying (at least on Unix...). Maybe Devrim
> has something else in mind?

What I was thinking of is a way to reduce network traffic in high-volume environments. If archive_timeout is set to a really low value, such as 1 or 2 seconds, it may result in high traffic. I thought that if both servers are on the same network, or better, directly connected to each other, they could share the same partition so that no network activity occurs.

Anyway, I haven't tried this feature yet on my test server; I am just trying to understand what's going on and what can be done with it.

Regards,
--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
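Some rough numbers behind that worry, assuming every forced segment switch ships a full 16 MB file uncompressed (the segment file is pre-allocated, so even a nearly idle interval costs the whole file). The figures are illustrative arithmetic only, not measurements.

# Worst-case archive traffic when archive_timeout forces a segment switch
# every interval and the full 16 MB file is shipped uncompressed each time.

SEGMENT_MB = 16

for timeout_s in (1, 2, 30, 60):
    mb_per_hour = SEGMENT_MB * 3600.0 / timeout_s
    print("archive_timeout=%2ds -> up to %8.0f MB/hour over the wire"
          % (timeout_s, mb_per_hour))

With archive_timeout at 1 second that is on the order of 56 GB per hour of copying in the worst case, which is why avoiding the copy by sharing the partition looks attractive.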
> I suppose we might want to have multiple archive operations occurring
> simultaneously by overlapping their start and stop times. That might be
> useful for situations where we have a bank of slow-response tape
> drives/autoloaders?

I have never seen a setup where it would have helped to archive DB logs in parallel. 16 MB is not enough to get tapes going, so in setups where you have lots of WAL I would increase XLOG_SEG_SIZE. In my experience it is less a DB performance issue than an administrative and storage-system overhead issue (starting a backup session every few seconds, or even sub-second). Backup systems like TSM, for example, perform better when you don't have so many tiny files, each saved separately.

> Anybody else see the need for this?

No :-)

Andreas
Devrim GUNDUZ wrote:
> Hi,
>
> On Mon, 2006-11-27 at 12:14 -0600, Jim C. Nasby wrote:
>> The only difference I can think of is not having to copy logfiles
>> around, but presumably that could be addressed by using hardlinks
>> instead of actually copying (at least on Unix...). Maybe Devrim
>> has something else in mind?
>
> What I was thinking of is a way to reduce network traffic in high-volume
> environments. If archive_timeout is set to a really low value, such as
> 1 or 2 seconds, it may result in high traffic.

Using hardlinks sounds like a viable alternative - but since AFAIK postgres reuses old WAL segments instead of deleting and recreating them, I guess hardlinks wouldn't work...

> I thought that if both servers are on the same network, or better,
> directly connected to each other, they could share the same partition so
> that no network activity occurs.

But if they're connected over a fast network anyway, then copying WALs even every few seconds should be no problem, no?

greetings, Florian Pflug
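Florian's objection is easy to demonstrate: a hardlink is just a second name for the same inode, so the "archived" copy follows whatever the primary later writes into the recycled segment. The sketch below fakes the recycling step with made-up file names and contents; it is a toy illustration, not how the server itself manipulates its files.

import os
import tempfile

# Toy demonstration of why a hardlinked "archive" copy is not stable:
# both names share one inode, so when the primary recycles the segment
# and overwrites it with new WAL, the archived copy changes too.

workdir = tempfile.mkdtemp()
segment = os.path.join(workdir, "000000010000000000000007")
archived = os.path.join(workdir, "archived_000000010000000000000007")

with open(segment, "wb") as f:
    f.write(b"old WAL records")

os.link(segment, archived)        # the "archive" step: no data copied

# The primary recycles the segment: rename it to a future name, then reuse it.
recycled = os.path.join(workdir, "00000001000000000000000F")
os.rename(segment, recycled)
with open(recycled, "wb") as f:
    f.write(b"completely new WAL records")

with open(archived, "rb") as f:
    print(f.read())               # prints b'completely new WAL records'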