Thread: reduce size of fmgr_builtins array
Hi all, Currently, we include the function name string in each FmgrBuiltin struct, whose size is 24 bytes on 64 bit platforms. As far as I can tell, the name is usually unused, so the attached (WIP, untested) patch stores it separately, reducing this struct to 16 bytes. We can go one step further and allocate the names as a single character string, reducing the binary size. It doesn't help much to store offsets, since there are ~40k characters, requiring 32-bit offsets. If we instead compute the offset on the fly from stored name lengths, we can use 8-bit values, saving 19kB of space in the binary over using string pointers. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
I wrote: > Currently, we include the function name string in each FmgrBuiltin > struct, whose size is 24 bytes on 64 bit platforms. As far as I can > tell, the name is usually unused, so the attached (WIP, untested) > patch stores it separately, reducing this struct to 16 bytes. > > We can go one step further and allocate the names as a single > character string, reducing the binary size. It doesn't help much to > store offsets, since there are ~40k characters, requiring 32-bit > offsets. If we instead compute the offset on the fly from stored name > lengths, we can use 8-bit values, saving 19kB of space in the binary > over using string pointers. I tested with the attached C function to make sure fmgr_internal_function() still returned the correct answer. I assume this is not a performance critical function, but I still wanted to see if there was a visible performance regression. I get this when calling fmgr_internal_function() 100k times: master: 833ms patch: 886ms The point of the patch is to increase the likelihood of fmgr_isbuiltin() finding the fmgr_builtins[] element in L1 cache. It seems harder to put a number on that for a realistic workload, but reducing the array size by 1/3 couldn't hurt. I'll go ahead and add this to the commitfest. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 02/01/2020 01:15, John Naylor wrote: > I wrote: > >> Currently, we include the function name string in each FmgrBuiltin >> struct, whose size is 24 bytes on 64 bit platforms. As far as I can >> tell, the name is usually unused, so the attached (WIP, untested) >> patch stores it separately, reducing this struct to 16 bytes. >> >> We can go one step further and allocate the names as a single >> character string, reducing the binary size. It doesn't help much to >> store offsets, since there are ~40k characters, requiring 32-bit >> offsets. If we instead compute the offset on the fly from stored name >> lengths, we can use 8-bit values, saving 19kB of space in the binary >> over using string pointers. > > I tested with the attached C function to make sure > fmgr_internal_function() still returned the correct answer. I assume > this is not a performance critical function, but I still wanted to see > if there was a visible performance regression. I get this when calling > fmgr_internal_function() 100k times: > > master: 833ms > patch: 886ms Hmm. I was actually expecting this to slightly speed up fmgr_internal_function(), now that all the names fit in a smaller amount of cache. I guess there are more branches or a data dependency or something now. I'm not too worried about that though. If it mattered, we should switch to binary searching the array. > The point of the patch is to increase the likelihood of > fmgr_isbuiltin() finding the fmgr_builtins[] element in L1 cache. It > seems harder to put a number on that for a realistic workload, but > reducing the array size by 1/3 couldn't hurt. Yeah. Nevertheless, it would be nice to be able to demonstrate the benefit in some test, at least. It feels hard to justify committing a performance patch if we can't show the benefit. Otherwise, we should just try to keep it as simple as possible, to optimize for readability. A similar approach was actually discussed a couple of years back: https://www.postgresql.org/message-id/bd13812c-c4ae-3788-5b28-5633beed2929%40iki.fi. The conclusion then was that it's not worth the trouble or the code complication. So I think this patch is Rejected, unless you can come up with a test case that concretely shows the benefit. - Heikki
On Tue, Jan 7, 2020 at 9:08 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Yeah. Nevertheless, it would be nice to be able to demonstrate the > benefit in some test, at least. It feels hard to justify committing a > performance patch if we can't show the benefit. Otherwise, we should > just try to keep it as simple as possible, to optimize for readability. > > A similar approach was actually discussed a couple of years back: > https://www.postgresql.org/message-id/bd13812c-c4ae-3788-5b28-5633beed2929%40iki.fi. > The conclusion then was that it's not worth the trouble or the code > complication. So I think this patch is Rejected, unless you can come up > with a test case that concretely shows the benefit. Thanks for reviewing! As expected, a microbenchmark didn't show a difference. I could try profiling in some workload, but I don't think the benefit would be worth the effort involved. I've marked it rejected. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services