Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From: Kyotaro Horiguchi
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Msg-id: 20201022.141637.2217958886309431797.horikyota.ntt@gmail.com
In response to: Re: [Patch] Optimize dropping of relation buffers using dlist (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: [Patch] Optimize dropping of relation buffers using dlist
List: pgsql-hackers

At Thu, 22 Oct 2020 16:35:27 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in
> On Thu, Oct 22, 2020 at 3:07 PM k.jamison@fujitsu.com
> <k.jamison@fujitsu.com> wrote:
> +	/*
> +	 * Get the total number of to-be-invalidated blocks of a relation as well
> +	 * as the total blocks for a given fork. The cached value returned by
> +	 * smgrnblocks could be smaller than the actual number of existing buffers
> +	 * of the file. This is caused by buggy Linux kernels that might not have
> +	 * accounted for the recent write. Give up the optimization if the block
> +	 * count of any fork cannot be trusted.
> +	 */
> +	for (i = 0; i < nforks; i++)
> +	{
> +		/* Get the number of blocks for a relation's fork */
> +		nForkBlocks[i] = smgrnblocks(smgr_reln, forkNum[i], &accurate);
> +
> +		if (!accurate)
> +			break;
>
> Hmmm.  The Linux comment led me to commit ffae5cc and a 2006 thread[1]
> showing a buggy sequence of system calls.  AFAICS it was not even an
> SMP/race problem of the type you might half expect, it was a single
> process not seeing its own write.  I didn't find details on the
> version, filesystem etc.

Anyway, that comment is irrelevant to the added code. The point here is
that the returned value may not be reliable, due not only to kernel bugs
but also to the file being extended or truncated by other processes. But
I suppose we may have a synchronized file-size cache in the future?

> Searching for our message "This has been seen to occur with buggy
> kernels; consider updating your system" turns up recent-ish results
> too.  The reports I read involved GlusterFS, which I don't personally
> know anything about, but it claims full POSIX compliance, and POSIX is
> strict about that sort of thing, so I'd guess that is/was a fairly
> serious bug or misconfiguration.  Surely there must be other symptoms
> for PostgreSQL on such systems too, like sequential scans that don't
> see recently added pages.
>
> But... does the proposed caching behaviour and "accurate" flag really
> help with any of that?  Cached values come from lseek() anyway.  If we

That "accurate" (a better name is wanted) flag suggests that it is
guaranteed that we don't have a buffer for any block after that block
number.

> just trusted unmodified smgrnblocks(), someone running on such a
> forgetful file system might eventually see nasty errors because we
> left buffers in the buffer pool that prevent a checkpoint from
> completing (and panic?), but they might also see other really strange
> errors, and that applies with or without that "accurate" flag, no?
>
> [1] https://www.postgresql.org/message-id/flat/26202.1159032931%40sss.pgh.pa.us

smgrtruncate and smgrextend modify that cache from their parameters,
not from lseek(). At the very first the value in the cache comes from
lseek(), but if nothing other than postgres has changed the file size,
I believe we can rely on the cache even on such buggy kernels, if any
still exist. If there is no longer such a buggy kernel, we can rely on
lseek(), though only when InRecovery. If we had a synchronized file-size
cache, we could rely on the cache even while !InRecovery. (I'm not sure
how vacuum affects this, though.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
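
[Editor's note] To make the scheme under discussion concrete, below is a
minimal sketch, not the actual patch, of how smgrnblocks() could return
the cached size with such an "accurate" flag, and how smgrextend() keeps
the cache current from its own parameters rather than lseek(). The field
name smgr_cached_nblocks and the exact flag handling are assumptions for
illustration; the committed design may differ.

	/*
	 * Sketch: return the number of blocks in a fork, preferring the
	 * cached value when it can be trusted.  *accurate is set to true
	 * only when the cache guarantees no buffer exists past the
	 * returned block number.
	 */
	BlockNumber
	smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool *accurate)
	{
		BlockNumber result;

		/*
		 * During recovery only the startup process extends or truncates
		 * relations, and smgrextend()/smgrtruncate() update the cache
		 * from their own parameters (not lseek()), so the cached value
		 * cannot be stale even on a kernel that misreports file sizes.
		 */
		if (InRecovery &&
			reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
		{
			if (accurate)
				*accurate = true;
			return reln->smgr_cached_nblocks[forknum];
		}

		/* Otherwise ask the kernel (lseek(SEEK_END)) and remember it. */
		result = smgrsw[reln->smgr_which].smgr_nblocks(reln, forknum);
		reln->smgr_cached_nblocks[forknum] = result;

		if (accurate)
			*accurate = false;
		return result;
	}

	/*
	 * Sketch of the cache maintenance: the new size comes from the
	 * caller's parameters, so it stays correct regardless of what
	 * lseek() would report.
	 */
	void
	smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
			   char *buffer, bool skipFsync)
	{
		smgrsw[reln->smgr_which].smgr_extend(reln, forknum, blocknum,
											 buffer, skipFsync);
		reln->smgr_cached_nblocks[forknum] = blocknum + 1;
	}

Under this reading, the flag is true only when the cache is
authoritative, which is why the quoted DropRelFileNodeBuffers hunk gives
up the optimization as soon as any fork's count is not "accurate".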