Re: pgsql: Add contrib/pg_walinspect. - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: pgsql: Add contrib/pg_walinspect. |
Date | |
Msg-id | CA+hUKG+VXatd-_ykFdnQoGrBb9NuNz8-YMGiWMMtNEPp9rU3Vw@mail.gmail.com Whole thread Raw |
In response to | Re: pgsql: Add contrib/pg_walinspect. (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: pgsql: Add contrib/pg_walinspect.
|
List | pgsql-hackers |
On Wed, Apr 27, 2022 at 10:22 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > Here's v2 patch (up on Thomas's v1 at [1]) using private_data to set > the end of the WAL flag. Please have a look at it. I don't have a strong view on whether it's better to use a NULL error for this private communication between pg_walinspect.c and the read_page callback it installs, or install a custom private_data to signal this condition, or to give up on all this stuff completely and just wait (see below for thoughts). I'd feel better about both no-wait options if the read_page callback in question were actually in the contrib module, and not in the core code. On the other hand, I'm not too hung up about it because I'm really hoping to see the get-rid-of-all-these-callbacks-and-make-client-code-do-the-waiting scheme proposed by Horiguchi-san and Heikki developed further, to rip much of this stuff out in a future release. If you go with the private_data approach, a couple of small comments: if (record == NULL) { + ReadLocalXLogPageNoWaitPrivate *private_data; + + /* return NULL, if end of WAL is reached */ + private_data = (ReadLocalXLogPageNoWaitPrivate *) + xlogreader->private_data; + + if (private_data->end_of_wal) + return NULL; + if (errormsg) ereport(ERROR, (errcode_for_file_access(), errmsg("could not read WAL at %X/%X: %s", LSN_FORMAT_ARGS(first_record), errormsg))); - else - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not read WAL at %X/%X", - LSN_FORMAT_ARGS(first_record)))); } I think you should leave that second ereport() call in, in this variant of the patch, no? I don't know if anything else raises errors with no message, but if we're still going to treat them as silent end-of-data conditions then you might as well go with the v1 patch. Another option might be to abandon this whole no-wait concept and revert 2258e76f's changes to xlogutils.c. pg_walinspect already does preliminary checks that LSNs are in range, so you can't enter a value that will wait indefinitely, and it's interruptible (it's a 1ms sleep/check loop, not my favourite programming pattern but that's pre-existing code). If you're unlucky enough to hit the case where the LSN is judged to be valid but is in the middle of a record that hasn't been totally flushed yet, it'll just be a bit slower to return as we wait for the inevitable later flush(es) to happen. The rest of your record will *surely* be flushed pretty soon (or the flushing backend panics the whole system and time ends). I don't imagine this is performance critical work, so maybe that'd be acceptable?
pgsql-hackers by date: