Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) |
Date | |
Msg-id | CAEepm=3hkZt5UuqMvYEZJ5rNDTJy0wKVq=JnQ57OiKZQKeM8Og@mail.gmail.com |
In response to | Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) |
List | pgsql-hackers |
On Sat, May 6, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, May 4, 2017 at 10:20 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> Now I'm not going to pretend that this patch is ready for the
>> prime-time. I've not yet worked out how to properly report sync-scan
>> locations without risking reporting later pages after reporting the
>> end of the scan. What I have at the moment could cause a report to be
>> missed if SYNC_SCAN_REPORT_INTERVAL is not divisible by the batch
>> size. I'm also not sure how batching like this affect read-aheads, but
>> at least the numbers above speak for something. Although none of the
>> pages in this case came from disk.
>
> This kind of approach has also been advocated within EnterpriseDB, and
> I immediately thought of the read-ahead problem. I think we need more
> research into how Parallel Seq Scan interacts with OS readahead
> behavior on various operating systems. It seem possible that Parallel
> Seq Scan frustrates operating system read-ahead even without this
> change on at least some systems (because maybe they can only detect
> ascending block number requests within a single process) and even more
> possible that you run into problems with the block number requests are
> no longer precisely in order (which, at present, they should be, or
> very close). If it turns out to be a problem, either currently or
> with this patch, we might need to add explicit prefetching logic to
> Parallel Seq Scan.

I don't know much about this stuff, but I was curious to go looking at
source code. I hope someone will correct me if I'm wrong, but here's
what I could glean:

In Linux, each process that opens a file gets its own 'file'
object[1][5]. Each of those has its own 'file_ra_state' object[2][3],
used by ondemand_readahead[4] for sequential read detection. So I
speculate that page-at-a-time parallel seq scan must look like random
access to Linux.

In FreeBSD the situation looks similar. Each process that opens a file
gets a 'file' object[8] which has members 'f_seqcount' and
'f_nextoff'[6]. These are used by the 'sequential_heuristics'
function[7], which affects the ioflag that UFS/FFS uses to control
read-ahead (see ffs_read). So I speculate that page-at-a-time parallel
seq scan must look like random access to FreeBSD too.

In both cases I suspect that if you'd inherited the file descriptor (or
sent it to the other process via obscure tricks), read-ahead would
actually work, because both processes would have the same 'file' entry,
but that's clearly not workable for md.c.

Experimentation required...

[1] https://github.com/torvalds/linux/blob/a3719f34fdb664ffcfaec2160ef20fca7becf2ee/include/linux/fs.h#L837
[2] https://github.com/torvalds/linux/blob/a3719f34fdb664ffcfaec2160ef20fca7becf2ee/include/linux/fs.h#L858
[3] https://github.com/torvalds/linux/blob/a3719f34fdb664ffcfaec2160ef20fca7becf2ee/include/linux/fs.h#L817
[4] https://github.com/torvalds/linux/blob/a3719f34fdb664ffcfaec2160ef20fca7becf2ee/mm/readahead.c#L376
[5] http://www.makelinux.net/ldd3/chp-3-sect-3 "There can be numerous
file structures representing multiple open descriptors on a single
file, but they all point to a single inode structure."
[6] https://github.com/freebsd/freebsd/blob/7e6cabd06e6caa6a02eeb86308dc0cb3f27e10da/sys/sys/file.h#L180
[7] https://github.com/freebsd/freebsd/blob/7e6cabd06e6caa6a02eeb86308dc0cb3f27e10da/sys/kern/vfs_vnops.c#L477
[8] Page 319 of 'Design and Implementation of the FreeBSD Operating
System' 2nd Edition

--
Thomas Munro
http://www.enterprisedb.com
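The distinction drawn above between a separately opened file and an inherited or passed descriptor can be observed from user space. The following is only a minimal sketch, assuming a POSIX system: it uses the file offset as a visible stand-in for per-open kernel state, since a second open() gets its own offset while a dup()'d descriptor shares it. It does not measure read-ahead itself; whether the read-ahead heuristics follow the same sharing rules is the speculation in the message, not something this code confirms. The helper name offsets_after_read and the 8192-byte read size are made up for illustration.

```python
import os
import tempfile


def offsets_after_read(share_description):
    """Read 8192 bytes through one descriptor, then return the current
    file offsets of that descriptor and of a second descriptor.

    share_description=True  -> second descriptor comes from dup()
    share_description=False -> second descriptor comes from a new open()
    """
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * 65536)
        tmp.flush()

        fd_a = os.open(tmp.name, os.O_RDONLY)
        if share_description:
            # dup() refers to the same open file description as fd_a.
            fd_b = os.dup(fd_a)
        else:
            # A second open() creates an independent open file description.
            fd_b = os.open(tmp.name, os.O_RDONLY)
        try:
            os.read(fd_a, 8192)
            return (
                os.lseek(fd_a, 0, os.SEEK_CUR),
                os.lseek(fd_b, 0, os.SEEK_CUR),
            )
        finally:
            os.close(fd_a)
            os.close(fd_b)


if __name__ == "__main__":
    print("separate open():", offsets_after_read(False))
    print("dup():          ", offsets_after_read(True))
```

With separate open() calls the second descriptor's offset stays at 0 after the read; with dup() both descriptors report 8192. Descriptors inherited across fork() behave like the dup() case.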