Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: BitmapHeapScan streaming read user and prelim refactoring
Date	March 13, 2024 22:38:38
Msg-id	CA+hUKG+a1NSHa-=7znx1EhmGXo+BFJH3mk3xJJLY3SPgJ0L2Bw@mail.gmail.com
In response to	Re: BitmapHeapScan streaming read user and prelim refactoring (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: BitmapHeapScan streaming read user and prelim refactoring
List	pgsql-hackers

On Sun, Mar 3, 2024 at 11:41 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 3/2/24 23:28, Melanie Plageman wrote:
> > On Sat, Mar 2, 2024 at 10:05 AM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >> With the current "master" code, eic=1 means we'll issue a prefetch for B
> >> and then read+process A. And then issue prefetch for C and read+process
> >> B, and so on. It's always one page ahead.
> >
> > Yes, that is what I mean for eic = 1

I spent quite a few days thinking about the meaning of eic=0 and eic=1
for streaming_read.c v7[1], to make it agree with the above and with
master.  Here's why I was confused:

Both eic=0 and eic=1 are expected to generate at most 1 physical I/O
at a time, or I/O queue depth 1 if you want to put it that way.  But
this isn't just about concurrency of I/O, it's also about computation.
Duh.

eic=0 means that the I/O is not concurrent with executor computation.
So, to annotate an excerpt from [1]'s random.txt, we have:

effective_io_concurrency = 0, range size = 1
unpatched                              patched
==============================================================================
pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
             *** executor now has page at 0x58000 to work on ***
pread(43,...,8192,0xb0000) = 8192      pread(82,...,8192,0xb0000) = 8192
             *** executor now has page at 0xb0000 to work on ***

eic=1 means that a single I/O is started and then control is returned
to the executor code to do useful work concurrently with the
background read that we assume is happening:

effective_io_concurrency = 1, range size = 1
unpatched                              patched
==============================================================================
pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
posix_fadvise(43,0xb0000,0x2000,...)   posix_fadvise(82,0xb0000,0x2000,...)
             *** executor now has page at 0x58000 to work on ***
pread(43,...,8192,0xb0000) = 8192      pread(82,...,8192,0xb0000) = 8192
posix_fadvise(43,0x108000,0x2000,...)  posix_fadvise(82,0x108000,0x2000,...)
             *** executor now has page at 0xb0000 to work on ***
pread(43,...,8192,0x108000) = 8192     pread(82,...,8192,0x108000) = 8192
posix_fadvise(43,0x160000,0x2000,...)  posix_fadvise(82,0x160000,0x2000,...)

In other words, 'concurrency' doesn't mean 'number of I/Os running
concurrently with each other', it means 'number of I/Os running
concurrently with computation', and when you put it that way, 0 and 1
are different.
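
To make the ordering concrete, here is a toy model of the two traces above for range size = 1. It is only a sketch: trace() and max_io_during_compute() are hypothetical names invented for illustration, not anything in streaming_read.c, and the model covers only eic=0 and eic=1. It treats an advised-but-not-yet-read block as "an I/O running concurrently with computation", which is the assumption the eic=1 trace relies on.

```python
def trace(offsets, eic):
    """Return the ordered events for eic=0 or eic=1, range size 1.

    Hypothetical model of the annotated strace excerpts; not real
    PostgreSQL code.
    """
    if eic not in (0, 1):
        raise ValueError("this sketch only models eic=0 and eic=1")
    events = []
    for i, off in enumerate(offsets):
        # The block the executor wants right now is read synchronously.
        events.append(f"pread({off:#x})")
        # With eic=1, advise the kernel about the next block before
        # handing control back to the executor.
        if eic == 1 and i + 1 < len(offsets):
            events.append(f"fadvise({offsets[i + 1]:#x})")
        events.append(f"executor({off:#x})")
    return events


def max_io_during_compute(events):
    """Peak number of advised-but-unread blocks while the executor runs."""
    pending = set()
    peak = 0
    for ev in events:
        kind, arg = ev.rstrip(")").split("(")
        if kind == "fadvise":
            pending.add(arg)
        elif kind == "pread":
            pending.discard(arg)
        else:  # executor step
            peak = max(peak, len(pending))
    return peak


offsets = [0x58000, 0xB0000, 0x108000]
for eic in (0, 1):
    events = trace(offsets, eic)
    print(eic, max_io_during_compute(events), events)
```

In this model both settings issue one pread at a time, but only eic=1 ever has an I/O outstanding while the executor is working, which is the distinction drawn above.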

Note that the first read is a bit special: if the consumer is ready to
pull a buffer out of the stream and we don't have one ready yet, it is
too late to issue useful advice, so we don't bother.
FWIW I think even in the AIO future we would have a synchronous read
in that specific place, at least when using io_method=worker, because
it would be stupid to ask another process to read a block for us that
we want right now and then wait for it to wake us up when it's done.

Note that even when we aren't issuing any advice because eic=0 or
because we detected sequential access and we believe the kernel can do
a better job than us, we still 'look ahead' (= call the callback to
see which block numbers are coming down the pipe), but only as far as
we need to coalesce neighbouring blocks.  (I deliberately avoid using
the word "prefetch" except in very general discussions because it
means different things to different layers of the code, hence talk of
"look ahead" and "advice".)  That's how we get this change:

effective_io_concurrency = 0, range size = 4
unpatched                              patched
==============================================================================
pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
pread(43,...,8192,0x5a000) = 8192      preadv(82,...,2,0x5a000) = 16384
pread(43,...,8192,0x5c000) = 8192      pread(82,...,8192,0x5e000) = 8192
pread(43,...,8192,0x5e000) = 8192      preadv(82,...,4,0xb0000) = 32768
pread(43,...,8192,0xb0000) = 8192      preadv(82,...,4,0x108000) = 32768
pread(43,...,8192,0xb2000) = 8192      preadv(82,...,4,0x160000) = 32768
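
The coalescing step itself can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's algorithm: coalesce() and as_syscalls() are hypothetical helpers, BLCKSZ is taken to be the default 8192, and the cap of 4 blocks is chosen only to match the range size in the trace. It also does not model the way the first few patched reads in the trace grow gradually (1 block, then 2) before reaching full width.

```python
BLCKSZ = 8192  # assumed default PostgreSQL block size


def coalesce(block_numbers, max_blocks):
    """Group consecutive block numbers into (start_block, nblocks) runs.

    Hypothetical helper: a run is extended only while the next block is
    physically adjacent and the run is still below max_blocks.
    """
    runs = []
    for blk in block_numbers:
        if runs and runs[-1][0] + runs[-1][1] == blk and runs[-1][1] < max_blocks:
            start, n = runs[-1]
            runs[-1] = (start, n + 1)
        else:
            runs.append((blk, 1))
    return runs


def as_syscalls(runs):
    """Describe each run as (syscall name, byte offset, byte length)."""
    out = []
    for start, n in runs:
        name = "pread" if n == 1 else "preadv"
        out.append((name, start * BLCKSZ, n * BLCKSZ))
    return out


# Blocks 88..91 start at offset 0xb0000, blocks 132..135 at 0x108000.
blocks = [88, 89, 90, 91, 132, 133, 134, 135]
for name, offset, length in as_syscalls(coalesce(blocks, max_blocks=4)):
    print(f"{name}(..., {offset:#x}) = {length}")
```

For those two ranges the sketch yields one 32768-byte preadv per range, which lines up with the later rows of the "patched" column above.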

And then once we introduce eic > 0 to the picture with neighbouring
blocks that can be coalesced, "patched" starts to diverge even more
from "unpatched" because it tracks the number of wide I/Os in
progress, not the number of single blocks.

[1] https://www.postgresql.org/message-id/CA+hUKGLJi+c5jB3j6UvkgMYHky-qu+LPCsiNahUGSa5Z4DvyVA@mail.gmail.com
