Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date | |
Msg-id | CA+hUKG+a1NSHa-=7znx1EhmGXo+BFJH3mk3xJJLY3SPgJ0L2Bw@mail.gmail.com |
In response to | Re: BitmapHeapScan streaming read user and prelim refactoring (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses | Re: BitmapHeapScan streaming read user and prelim refactoring |
List | pgsql-hackers |
On Sun, Mar 3, 2024 at 11:41 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 3/2/24 23:28, Melanie Plageman wrote:
> > On Sat, Mar 2, 2024 at 10:05 AM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >> With the current "master" code, eic=1 means we'll issue a prefetch for B
> >> and then read+process A. And then issue prefetch for C and read+process
> >> B, and so on. It's always one page ahead.
> >
> > Yes, that is what I mean for eic = 1

I spent quite a few days thinking about the meaning of eic=0 and eic=1
for streaming_read.c v7[1], to make it agree with the above and with
master. Here's why I was confused:

Both eic=0 and eic=1 are expected to generate at most 1 physical I/O at
a time, or I/O queue depth 1 if you want to put it that way. But this
isn't just about concurrency of I/O, it's also about computation. Duh.

eic=0 means that the I/O is not concurrent with executor computation.
So, to annotate an excerpt from [1]'s random.txt, we have:

    effective_io_concurrency = 0, range size = 1
    unpatched                               patched
    ==============================================================================
    pread(43,...,8192,0x58000) = 8192       pread(82,...,8192,0x58000) = 8192
    *** executor now has page at 0x58000 to work on ***
    pread(43,...,8192,0xb0000) = 8192       pread(82,...,8192,0xb0000) = 8192
    *** executor now has page at 0xb0000 to work on ***

eic=1 means that a single I/O is started and then control is returned
to the executor code to do useful work concurrently with the background
read that we assume is happening:

    effective_io_concurrency = 1, range size = 1
    unpatched                               patched
    ==============================================================================
    pread(43,...,8192,0x58000) = 8192       pread(82,...,8192,0x58000) = 8192
    posix_fadvise(43,0xb0000,0x2000,...)    posix_fadvise(82,0xb0000,0x2000,...)
    *** executor now has page at 0x58000 to work on ***
    pread(43,...,8192,0xb0000) = 8192       pread(82,...,8192,0xb0000) = 8192
    posix_fadvise(43,0x108000,0x2000,...)   posix_fadvise(82,0x108000,0x2000,...)
    *** executor now has page at 0xb0000 to work on ***
    pread(43,...,8192,0x108000) = 8192      pread(82,...,8192,0x108000) = 8192
    posix_fadvise(43,0x160000,0x2000,...)   posix_fadvise(82,0x160000,0x2000,...)

In other words, 'concurrency' doesn't mean 'number of I/Os running
concurrently with each other', it means 'number of I/Os running
concurrently with computation', and when you put it that way, 0 and 1
are different.

Note that the first read is a bit special: by the time the consumer is
ready to pull a buffer out of the stream when we don't have a buffer
ready yet, it is too late to issue useful advice, so we don't bother.
FWIW I think even in the AIO future we would have a synchronous read in
that specific place, at least when using io_method=worker, because it
would be stupid to ask another process to read a block for us that we
want right now and then wait for it to wake us up when it's done.

Note that even when we aren't issuing any advice because eic=0 or
because we detected sequential access and we believe the kernel can do
a better job than us, we still 'look ahead' (= call the callback to see
which block numbers are coming down the pipe), but only as far as we
need to coalesce neighbouring blocks. (I deliberately avoid using the
word "prefetch" except in very general discussions because it means
different things to different layers of the code, hence talk of "look
ahead" and "advice".) That's how we get this change:

    effective_io_concurrency = 0, range size = 4
    unpatched                               patched
    ==============================================================================
    pread(43,...,8192,0x58000) = 8192       pread(82,...,8192,0x58000) = 8192
    pread(43,...,8192,0x5a000) = 8192       preadv(82,...,2,0x5a000) = 16384
    pread(43,...,8192,0x5c000) = 8192       pread(82,...,8192,0x5e000) = 8192
    pread(43,...,8192,0x5e000) = 8192       preadv(82,...,4,0xb0000) = 32768
    pread(43,...,8192,0xb0000) = 8192       preadv(82,...,4,0x108000) = 32768
    pread(43,...,8192,0xb2000) = 8192       preadv(82,...,4,0x160000) = 32768

And then once we introduce eic > 0 to the picture with neighbouring
blocks that can be coalesced, "patched" starts to diverge even more
from "unpatched" because it tracks the number of wide I/Os in progress,
not the number of single blocks.

[1] https://www.postgresql.org/message-id/CA+hUKGLJi+c5jB3j6UvkgMYHky-qu+LPCsiNahUGSa5Z4DvyVA@mail.gmail.com
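To make the two ideas in the message concrete, here is a minimal Python sketch of the event ordering for eic=0 versus eic=1, and of coalescing neighbouring blocks into wider reads. It is a toy model only, not the code in streaming_read.c: the function names `trace_events` and `coalesce` are hypothetical, the 8 kB block size is taken from the 8192-byte reads in the traces, and only eic values 0 and 1 with single-block ranges are modelled in `trace_events`.

```python
BLCKSZ = 8192  # assumed block size, matching the 8192-byte reads in the traces


def trace_events(blocks, eic):
    """Toy model: order of events when a consumer pulls single blocks.

    eic=0: each read completes before the executor works; no advice issued.
    eic=1: after reading a block, advice for the next block is issued before
           the executor starts working, so one I/O overlaps with computation.
    """
    if eic not in (0, 1):
        raise ValueError("this sketch only models eic=0 and eic=1")
    events = []
    for i, blk in enumerate(blocks):
        events.append(f"pread(...,{BLCKSZ},{blk * BLCKSZ:#x})")
        if eic == 1 and i + 1 < len(blocks):
            nxt = blocks[i + 1]
            events.append(f"posix_fadvise(...,{nxt * BLCKSZ:#x},{BLCKSZ:#x},...)")
        events.append(f"executor works on {blk * BLCKSZ:#x}")
    return events


def coalesce(blocks, max_blocks):
    """Toy model: merge consecutive block numbers into (first, nblocks) reads.

    A block joins the previous read only if it is the immediate neighbour
    and the read has not yet reached max_blocks.
    """
    reads = []
    for blk in blocks:
        if reads:
            first, n = reads[-1]
            if blk == first + n and n < max_blocks:
                reads[-1] = (first, n + 1)
                continue
        reads.append((blk, 1))
    return reads
```

With blocks 44, 88 and 132 (byte offsets 0x58000, 0xb0000 and 0x108000), `trace_events(..., 1)` interleaves each read with advice for the following block, as in the eic=1 trace, while `trace_events(..., 0)` yields only read and work events. `coalesce([44, 45, 46, 47, 88, 89, 90, 91], 4)` returns `[(44, 4), (88, 4)]`. The patched trace splits the first range into reads of 1, 2 and 1 blocks before settling on 4-block reads; this sketch makes no attempt to reproduce that.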