Re: Cache relation sizes? - Mailing list pgsql-hackers
| From | Thomas Munro |
|---|---|
| Subject | Re: Cache relation sizes? |
| Date | |
| Msg-id | CAEepm=3f9Ho1jKohAUF=ueDqN5LUfdLv5k8FK9DNYaCP=si1Cg@mail.gmail.com |
| In response to | RE: Cache relation sizes? ("Jamison, Kirk" <k.jamison@jp.fujitsu.com>) |
| Responses | RE: Cache relation sizes? |
| List | pgsql-hackers |

On Thu, Dec 27, 2018 at 8:00 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

> I also find this proposed feature to be beneficial for performance, especially when we want to extend or truncate large tables.
> As mentioned by David, currently there is a query latency spike when we make a generic plan for a partitioned table with many partitions.
> I tried to apply Thomas' patch for that use case. Aside from measuring the planning and execution time,
> I also monitored the lseek calls using simple strace, with and without the patch.

Thanks for looking into this and testing!

> Setup: 8192 table partitions.
>
> (1) set plan_cache_mode = 'force_generic_plan';
>     Planning Time: 1678.680 ms
>     Planning Time: 1596.566 ms
>
> (2) plan_cache_mode = 'auto'
>     Planning Time: 768.669 ms
>     Planning Time: 181.690 ms
>
> (3) set plan_cache_mode = 'force_generic_plan';
>     Planning Time: 14.294 ms
>     Planning Time: 13.976 ms
>
> If I did the test correctly, I am not sure though as to why the patch did not affect the generic planning performance of a table with many partitions.
> However, the number of lseek calls was greatly reduced with Thomas' patch.
> I also did not get a considerable speed-up in terms of latency average using pgbench -S (read-only, unprepared).
> I am assuming this might be applicable to other use cases as well.
> (I just tested the patch, but haven't dug into the patch details yet.)

The result for (2) is nice, even though you had to use 8192 partitions to see it.

> Would you like to submit this to the commitfest to get more reviews for possible idea/patch improvement?

For now I think this is still in the experiment/hack phase, and I have a ton of other stuff percolating in this commitfest already (and a week of family holiday in the middle of January). But if you have ideas about the validity of the assumptions, the reason it breaks initdb, or any other aspect of this approach (or alternatives), please don't let me stop you, and of course please feel free to submit this, an improved version, or an alternative proposal yourself! Unfortunately I wouldn't have time to nurture it this time around, beyond some drive-by comments.

Assorted armchair speculation: I wonder how much this is affected by the OS and KPTI, virtualisation technology, PCID support, etc. Back in the good old days, Linux's lseek(SEEK_END) stopped acquiring the inode mutex when reading the size, at least in the generic implementation used by most filesystems (I wonder if our workloads were indirectly responsible for that optimisation?), so maybe it became about as fast as a syscall could possibly be. But now the baseline for how fast a syscall can be has moved, it depends on your hardware, and there are external costs that depend on what memory you touch in between syscalls. Also, other operating systems might still acquire a per-underlying-file/vnode/whatever lock (<checks source code> ... yes), and contention for that might depend on what else is happening, so a single standalone test wouldn't capture it; a super busy database with a rapidly expanding and contracting table that many other sessions are trying to observe with lseek(SEEK_END) could slow down more. A minimal sketch of the lseek(SEEK_END) pattern in question appears below.

--
Thomas Munro
http://www.enterprisedb.com
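In case it helps anyone reading along, here is a minimal standalone sketch of that pattern. It is not the real PostgreSQL code (that lives around mdnblocks() in src/backend/storage/smgr/md.c); the segment file path and the local BLCKSZ definition below are stand-ins for illustration.

```c
/*
 * Sketch only: read a relation segment's size in blocks the way the
 * discussion above describes, with a single lseek(SEEK_END) call.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

int
main(void)
{
    /* hypothetical segment file path, for illustration only */
    int         fd = open("base/12345/67890", O_RDONLY);
    off_t       len;

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /*
     * lseek(SEEK_END) returns the end-of-file offset without reading
     * any data; dividing by the block size gives the size in blocks.
     * This is the syscall that shows up repeatedly under strace while
     * planning, once per relation whose size is needed.
     */
    len = lseek(fd, 0, SEEK_END);
    if (len < 0)
    {
        perror("lseek");
        close(fd);
        return 1;
    }

    printf("segment has %lld blocks\n", (long long) (len / BLCKSZ));
    close(fd);
    return 0;
}
```

Each such call is one of the lseeks Kirk counted with strace above; with 8192 partitions they add up quickly during planning, which is what the proposed size cache avoids.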