Thread: Shared pg_xlog directory/partition and warm standby server
Hello,

Is there anything that would prevent two PostgreSQL servers from sharing the same pg_xlog directory, while one mounts it read-only and the other uses the same partition for read and write?

The problem is: if we share the same pg_xlog between the production server and the warm standby server, can you see any possibility of data/xlog corruption? Of course, the warm standby server will mount that partition read-only.

I have thought about this a bit and could not find any problems. Can you think of one?

Regards,
--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Devrim GUNDUZ wrote:
> Hello,
>
> Is there anything that would prevent two PostgreSQL servers from sharing
> the same pg_xlog directory, while one mounts it read-only and the other
> uses the same partition for read and write? The problem is: if we share
> the same pg_xlog between the production server and the warm standby
> server, can you see any possibility of data/xlog corruption? Of course,
> the warm standby server will mount that partition read-only.

What happens if the standby server falls so far behind the master that the xlogs it wants to read are already being overwritten? AFAIK the files in pg_xlog form a circular buffer, and are reused after a while...

greetings, Florian Pflug
On Mon, 2006-11-27 at 14:17 +0100, Florian G. Pflug wrote:
> Devrim GUNDUZ wrote:
> > Is there anything that would prevent two PostgreSQL servers from sharing
> > the same pg_xlog directory, while one mounts it read-only and the other
> > uses the same partition for read and write? The problem is: if we share
> > the same pg_xlog between the production server and the warm standby
> > server, can you see any possibility of data/xlog corruption? Of course,
> > the warm standby server will mount that partition read-only.
>
> What happens if the standby server falls so far behind the master that
> the xlogs it wants to read are already being overwritten?
>
> AFAIK the files in pg_xlog form a circular buffer, and are reused after
> a while...

If the archive_command doesn't actually do anything and just leaves the files there, they will automatically be moved to the .done state and will then be removed within two checkpoints. So it will work as long as your standby keeps up with the primary. If it falls behind, you'll lose the file and you'll be out of luck (no file, start from a base backup again). A large checkpoint_segments would help, but there is no way to avoid that situation entirely.

The archiver assumes that you want to archive things oldest first, so if the archive_command fails it will retry that file repeatedly. Put another way, the archiving is synchronous: when an archive is requested, we wait for the answer before attempting the next one.

I suppose we might want to have multiple archive operations occurring simultaneously by overlapping their start and stop times. That might be useful for situations where we have a bank of slow-response tape drives/autoloaders?

You'd need a second archive command to poll for completion. Currently archive_status has two states: .ready and .done. We could have three states: .ready, .inprogress and .done. The first command, archive_command_start, if successful, would move the state from .ready to .inprogress, while the second, archive_command_confirm, would move the state from .inprogress to .done. (Better names please...)

With an asynchronous API, it would then be possible to fire off requests to archive lots of files, then return later to confirm their completion. Or, in Devrim's case, do nothing apart from waiting for them to be applied by the standby server.

Anybody else see the need for this?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
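To make the proposal above a bit more concrete, here is a minimal sketch of what an archiver cycle driving that three-state protocol could look like. Only the .ready/.done markers under pg_xlog/archive_status exist in PostgreSQL today; the .inprogress state and the archive_command_start / archive_command_confirm settings are purely the hypothetical names from Simon's message, not anything the server actually implements.

import os
import subprocess

# Hypothetical sketch of the proposed asynchronous archiver cycle.
# Only the .ready/.done markers exist today; .inprogress and the two
# commands are the proposal being discussed in this thread.

STATUS_DIR = "pg_xlog/archive_status"
ARCHIVE_START = "archive_command_start %s"      # assumed user-supplied command
ARCHIVE_CONFIRM = "archive_command_confirm %s"  # assumed user-supplied command

def run(template, segment):
    """Run a user-supplied command for one WAL segment; True on exit code 0."""
    return subprocess.call(template % segment, shell=True) == 0

def archiver_cycle():
    for marker in sorted(os.listdir(STATUS_DIR)):
        segment, state = os.path.splitext(marker)
        path = os.path.join(STATUS_DIR, marker)
        if state == ".ready":
            # Fire off the archival without waiting for the device to finish.
            if run(ARCHIVE_START, segment):
                os.rename(path, os.path.join(STATUS_DIR, segment + ".inprogress"))
        elif state == ".inprogress":
            # On a later pass, poll whether the slow device has completed.
            if run(ARCHIVE_CONFIRM, segment):
                os.rename(path, os.path.join(STATUS_DIR, segment + ".done"))
        # .done markers are left for checkpoints to clean up, as today.

Each cycle only launches the start command and, on later passes, polls for confirmation, so many segments can be in flight at once, which is the whole point of the asynchronous API.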
On Mon, Nov 27, 2006 at 04:35:30PM +0000, Simon Riggs wrote:
> On Mon, 2006-11-27 at 14:17 +0100, Florian G. Pflug wrote:
> > Devrim GUNDUZ wrote:
> > > Is there anything that would prevent two PostgreSQL servers from sharing
> > > the same pg_xlog directory, while one mounts it read-only and the other
> > > uses the same partition for read and write? The problem is: if we share
> > > the same pg_xlog between the production server and the warm standby
> > > server, can you see any possibility of data/xlog corruption? Of course,
> > > the warm standby server will mount that partition read-only.

<snip>

> I suppose we might want to have multiple archive operations occurring
> simultaneously by overlapping their start and stop times. That might be
> useful for situations where we have a bank of slow-response tape
> drives/autoloaders?
>
> You'd need a second archive command to poll for completion. Currently
> archive_status has two states: .ready and .done. We could have three
> states: .ready, .inprogress and .done. The first command,
> archive_command_start, if successful, would move the state from .ready
> to .inprogress, while the second, archive_command_confirm, would move
> the state from .inprogress to .done. (Better names please...)
>
> With an asynchronous API, it would then be possible to fire off requests
> to archive lots of files, then return later to confirm their completion.
> Or, in Devrim's case, do nothing apart from waiting for them to be
> applied by the standby server.
>
> Anybody else see the need for this?

There might be a desire for async archiving in some circumstances, but I don't really see what Devrim is after that couldn't just be done with our current PITR. The only difference I can think of is not having to copy logfiles around, but presumably that could be addressed by using hardlinks instead of actually copying (at least on Unix...). Maybe Devrim has something else in mind?

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
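For illustration, Jim's hardlink idea could be wired up as an archive_command along these lines. This is a rough sketch only: the script name and archive directory are invented, while the %p (path) and %f (file name) placeholders are the standard archive_command substitutions. As Florian points out further down in the thread, the primary later recycles and overwrites the segment the link points to, so this would not actually keep a stable copy.

import os
import sys

# Rough sketch of a hardlink-based archive script, invoked roughly as
#   archive_command = 'hardlink_archive.py %p %f'   (hypothetical name)

ARCHIVE_DIR = "/mnt/archive/wal"   # assumed shared location

def archive(segment_path, segment_name):
    target = os.path.join(ARCHIVE_DIR, segment_name)
    if os.path.exists(target):
        return 1                   # refuse to overwrite an archived segment
    os.link(segment_path, target)  # creates a directory entry, copies no data
    return 0

if __name__ == "__main__":
    sys.exit(archive(sys.argv[1], sys.argv[2]))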
"Florian G. Pflug" <fgp@phlo.org> writes: > Devrim GUNDUZ wrote: >> Is there anything that may prevent two PostgreSQL servers to share the >> same pg_xlog directory; while one is using read-only and the other one >> is using the same partition for read and write? > What happens in the standby server falls so far behind the master that > the xlogs it wants to read are already being overwritten? Worse than that: what happens when the standby comes alive, and needs to start writing pg_xlog entries? Sounds like a disaster in the making to me. regards, tom lane
Hi,

On Mon, 2006-11-27 at 12:14 -0600, Jim C. Nasby wrote:
> The only difference I can think of is not having to copy logfiles
> around, but presumably that could be addressed by using hardlinks
> instead of actually copying (at least on Unix...). Maybe Devrim
> has something else in mind?

What I was thinking of is a way to reduce network traffic in high-volume environments. If archive_timeout is set to a really low value, such as 1 or 2 seconds, it may result in high traffic. I thought that if both servers are on the same network, or better, directly connected to each other, they could share the same partition so that no network activity occurs.

Anyway, I haven't tried this feature yet on my test server; I am just trying to understand what's going on and what can be done with it.

Regards,
--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
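Some rough numbers behind that worry, assuming every forced segment switch ships a full 16 MB file uncompressed (the segment file is pre-allocated, so even a nearly idle interval costs the whole file). The figures are illustrative arithmetic only, not measurements.

# Worst-case archive traffic when archive_timeout forces a segment switch
# every interval and the full 16 MB file is shipped uncompressed each time.

SEGMENT_MB = 16

for timeout_s in (1, 2, 30, 60):
    mb_per_hour = SEGMENT_MB * 3600.0 / timeout_s
    print("archive_timeout=%2ds -> up to %8.0f MB/hour over the wire"
          % (timeout_s, mb_per_hour))

With archive_timeout at 1 second that is on the order of 56 GB per hour of copying in the worst case, which is why avoiding the copy by sharing the partition looks attractive.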
> I suppose we might want to have multiple archive operations occurring
> simultaneously by overlapping their start and stop times. That might be
> useful for situations where we have a bank of slow-response tape
> drives/autoloaders?

I have never seen a setup where it would have helped to archive DB logs in parallel. 16 MB is not enough to get tapes going, so in setups where you have lots of WAL I would increase XLOG_SEG_SIZE. In my experience it is less a DB performance issue than an administrative and storage-system overhead issue (starting a backup session every few seconds, or even sub-second). Backup systems like TSM, for example, perform better when you don't have so many tiny files, each saved separately.

> Anybody else see the need for this?

No :-)

Andreas
Devrim GUNDUZ wrote:
> Hi,
>
> On Mon, 2006-11-27 at 12:14 -0600, Jim C. Nasby wrote:
>> The only difference I can think of is not having to copy logfiles
>> around, but presumably that could be addressed by using hardlinks
>> instead of actually copying (at least on Unix...). Maybe Devrim
>> has something else in mind?
>
> What I was thinking of is a way to reduce network traffic in high-volume
> environments. If archive_timeout is set to a really low value, such as
> 1 or 2 seconds, it may result in high traffic.

Using hardlinks sounds like a viable alternative - but since AFAIK postgres reuses old WAL segments instead of deleting and recreating them, I guess hardlinks wouldn't work...

> I thought that if both servers are on the same network, or better,
> directly connected to each other, they could share the same partition so
> that no network activity occurs.

But if they're connected over a fast network anyway, then copying WALs even every few seconds should be no problem, no?

greetings, Florian Pflug
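Florian's objection is easy to demonstrate: a hardlink is just a second name for the same inode, so the "archived" copy follows whatever the primary later writes into the recycled segment. The sketch below fakes the recycling step with made-up file names and contents; it is a toy illustration, not how the server itself manipulates its files.

import os
import tempfile

# Toy demonstration of why a hardlinked "archive" copy is not stable:
# both names share one inode, so when the primary recycles the segment
# and overwrites it with new WAL, the archived copy changes too.

workdir = tempfile.mkdtemp()
segment = os.path.join(workdir, "000000010000000000000007")
archived = os.path.join(workdir, "archived_000000010000000000000007")

with open(segment, "wb") as f:
    f.write(b"old WAL records")

os.link(segment, archived)        # the "archive" step: no data copied

# The primary recycles the segment: rename it to a future name, then reuse it.
recycled = os.path.join(workdir, "00000001000000000000000F")
os.rename(segment, recycled)
with open(recycled, "wb") as f:
    f.write(b"completely new WAL records")

with open(archived, "rb") as f:
    print(f.read())               # prints b'completely new WAL records'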