Thread: Re: Some ExecSeqScan optimizations
Hi, On 2025-01-22 10:07:51 +0900, Amit Langote wrote: > On Fri, Jan 17, 2025 at 2:05 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Here's v5 with a few commit message tweaks. > > > > Barring objections, I would like to push this early next week. > > Pushed yesterday. Thank you all. While looking at a profile I recently noticed that ExecSeqScanWithQual() had a runtime branch to test whether qual is NULL, which seemed a bit silly. I think we should use pg_assume(), which I just added to avoid a compiler warning, to improve the code generation here. The performance gain unsurprisingly isn't significant (but seems repeatably measureable), but it does cut out a fair bit of unnecessary code. andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o text data bss dec hex filename 3330 0 0 3330 d02 executor_nodeSeqscan.c.assume.o 3834 0 0 3834 efa executor_nodeSeqscan.c.o A 13% reduction in actual code size isn't bad for such a small change, imo. I have a separate question as well - do we need to call ResetExprContext() if we neither qual, projection nor epq? I see a small gain by avoiding that. Greetings, Andres Freund
Attachment
Hi Andres, On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote: > On 2025-01-22 10:07:51 +0900, Amit Langote wrote: > > On Fri, Jan 17, 2025 at 2:05 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > Here's v5 with a few commit message tweaks. > > > > > > Barring objections, I would like to push this early next week. > > > > Pushed yesterday. Thank you all. > > While looking at a profile I recently noticed that ExecSeqScanWithQual() had a > runtime branch to test whether qual is NULL, which seemed a bit silly. I think > we should use pg_assume(), which I just added to avoid a compiler warning, to > improve the code generation here. +1. I think this might be what David was getting at in his first message in this thread. > The performance gain unsurprisingly isn't significant (but seems repeatably > measureable), but it does cut out a fair bit of unnecessary code. > > andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o > text data bss dec hex filename > 3330 0 0 3330 d02 executor_nodeSeqscan.c.assume.o > 3834 0 0 3834 efa executor_nodeSeqscan.c.o > > A 13% reduction in actual code size isn't bad for such a small change, imo. Yeah, that seems worthwhile. I had been a bit concerned about code size growth from having four variant functions with at least some duplication, so this is a nice offset. Thanks for the patch. + /* + * Use pg_assume() for != NULL tests to make the compiler realize no + * runtime check for the field is needed in ExecScanExtended(). + */ I propose changing "to make the compiler realize no runtime check" to "so the compiler can optimize away the runtime check", assuming that is what it means. Also, I assume you intentionally avoided repeating the comment in all the variant functions. > I have a separate question as well - do we need to call ResetExprContext() if > we neither qual, projection nor epq? I see a small gain by avoiding that. You're referring to this block, I assume: /* * If we have neither a qual to check nor a projection to do, just skip * all the overhead and return the raw scan tuple. */ if (!qual && !projInfo) { ResetExprContext(econtext); return ExecScanFetch(node, epqstate, accessMtd, recheckMtd); } Yeah, I think it's fine to remove ResetExprContext() here. When I looked at it before, I left it in because I was unsure whether accessMtd() might leak memory into the per-tuple context. But on second thought, that seems unlikely? Would you like me to do it or do you have a patch in your tree already? -- Thanks, Amit Langote
Hi, On 2025-07-10 17:28:50 +0900, Amit Langote wrote: > On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote: > > On 2025-01-22 10:07:51 +0900, Amit Langote wrote: > > > On Fri, Jan 17, 2025 at 2:05 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > Here's v5 with a few commit message tweaks. > > > > > > > > Barring objections, I would like to push this early next week. > > > > > > Pushed yesterday. Thank you all. > > > > While looking at a profile I recently noticed that ExecSeqScanWithQual() had a > > runtime branch to test whether qual is NULL, which seemed a bit silly. I think > > we should use pg_assume(), which I just added to avoid a compiler warning, to > > improve the code generation here. > > +1. I think this might be what David was getting at in his first > message in this thread. Indeed. > > The performance gain unsurprisingly isn't significant (but seems repeatably > > measureable), but it does cut out a fair bit of unnecessary code. > > > > andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o > > text data bss dec hex filename > > 3330 0 0 3330 d02 executor_nodeSeqscan.c.assume.o > > 3834 0 0 3834 efa executor_nodeSeqscan.c.o > > > > A 13% reduction in actual code size isn't bad for such a small change, imo. > > Yeah, that seems worthwhile. I had been a bit concerned about code > size growth from having four variant functions with at least some > duplication, so this is a nice offset. I'm rather surprised by just how much the size reduces... I built nodeSeqscan.c with -ffunction-sections and looked at the size with size --format=sysv: Before: .text.SeqRecheck 6 0 .rodata.str1.8 135 0 .text.unlikely.SeqNext 53 0 .text.SeqNext 178 0 .text.ExecSeqScanEPQ 20 0 .text.ExecSeqScanWithProject 289 0 .text.unlikely.ExecSeqScanWithQual 53 0 .text.ExecSeqScanWithQual 441 0 .text.unlikely.ExecSeqScanWithQualProject 53 0 .text.ExecSeqScanWithQualProject 811 0 .text.unlikely.ExecSeqScan 53 0 .text.ExecSeqScan 245 0 .text.ExecInitSeqScan 287 0 .text.ExecEndSeqScan 33 0 .text.ExecReScanSeqScan 63 0 .text.ExecSeqScanEstimate 88 0 .text.ExecSeqScanInitializeDSM 114 0 .text.ExecSeqScanReInitializeDSM 34 0 .text.ExecSeqScanInitializeWorker 64 0 After: .text.SeqRecheck 6 0 .rodata.str1.8 135 0 .text.unlikely.SeqNext 53 0 .text.SeqNext 178 0 .text.ExecSeqScanEPQ 20 0 .text.ExecSeqScanWithProject 209 0 .text.unlikely.ExecSeqScanWithQual 53 0 .text.ExecSeqScanWithQual 373 0 .text.unlikely.ExecSeqScanWithQualProject 53 0 .text.ExecSeqScanWithQualProject 474 0 .text.unlikely.ExecSeqScan 53 0 .text.ExecSeqScan 245 0 .text.ExecInitSeqScan 287 0 .text.ExecEndSeqScan 33 0 .text.ExecReScanSeqScan 63 0 .text.ExecSeqScanEstimate 88 0 .text.ExecSeqScanInitializeDSM 114 0 .text.ExecSeqScanReInitializeDSM 34 0 .text.ExecSeqScanInitializeWorker 64 0 I'm rather baffled that the size of ExecSeqScanWithQualProject goes from 811 to 474, just due to those null checks being removed... But I'll take it. > Thanks for the patch. > > + /* > + * Use pg_assume() for != NULL tests to make the compiler realize no > + * runtime check for the field is needed in ExecScanExtended(). > + */ > > I propose changing "to make the compiler realize no runtime check" to > "so the compiler can optimize away the runtime check", assuming that > is what it means. It does. I don't really see a meaningful difference between the comments? > Also, I assume you intentionally avoided repeating the comment in all > the variant functions. Yea, it didn't seem helpful to do so. > > I have a separate question as well - do we need to call ResetExprContext() if > > we neither qual, projection nor epq? I see a small gain by avoiding that. > > You're referring to this block, I assume: > > /* > * If we have neither a qual to check nor a projection to do, just skip > * all the overhead and return the raw scan tuple. > */ > if (!qual && !projInfo) > { > ResetExprContext(econtext); > return ExecScanFetch(node, epqstate, accessMtd, recheckMtd); > } Yep. > Yeah, I think it's fine to remove ResetExprContext() here. When I > looked at it before, I left it in because I was unsure whether > accessMtd() might leak memory into the per-tuple context. It's a good question. I think I unfortunately found a problematic case, ForeignNext(). I wonder if we instead can MemoryContextReset cheaper, by avoiding a function call for the common case that no reset is needed. Right now we can't just check ->isReset in an inline function, because we also delete children. I wonder if we could define isReset so that creating a child context unsets isReset? Greetings, Andres Freund
On Fri, Jul 11, 2025 at 5:55 AM Andres Freund <andres@anarazel.de> wrote: > On 2025-07-10 17:28:50 +0900, Amit Langote wrote: > > On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote: > > > The performance gain unsurprisingly isn't significant (but seems repeatably > > > measureable), but it does cut out a fair bit of unnecessary code. > > > > > > andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o > > > text data bss dec hex filename > > > 3330 0 0 3330 d02 executor_nodeSeqscan.c.assume.o > > > 3834 0 0 3834 efa executor_nodeSeqscan.c.o > > > > > > A 13% reduction in actual code size isn't bad for such a small change, imo. > > > > Yeah, that seems worthwhile. I had been a bit concerned about code > > size growth from having four variant functions with at least some > > duplication, so this is a nice offset. > > I'm rather surprised by just how much the size reduces... > > I built nodeSeqscan.c with -ffunction-sections and looked at the size with > size --format=sysv: > > Before: > .text.SeqRecheck 6 0 > .rodata.str1.8 135 0 > .text.unlikely.SeqNext 53 0 > .text.SeqNext 178 0 > .text.ExecSeqScanEPQ 20 0 > .text.ExecSeqScanWithProject 289 0 > .text.unlikely.ExecSeqScanWithQual 53 0 > .text.ExecSeqScanWithQual 441 0 > .text.unlikely.ExecSeqScanWithQualProject 53 0 > .text.ExecSeqScanWithQualProject 811 0 > .text.unlikely.ExecSeqScan 53 0 > .text.ExecSeqScan 245 0 > .text.ExecInitSeqScan 287 0 > .text.ExecEndSeqScan 33 0 > .text.ExecReScanSeqScan 63 0 > .text.ExecSeqScanEstimate 88 0 > .text.ExecSeqScanInitializeDSM 114 0 > .text.ExecSeqScanReInitializeDSM 34 0 > .text.ExecSeqScanInitializeWorker 64 0 > > After: > .text.SeqRecheck 6 0 > .rodata.str1.8 135 0 > .text.unlikely.SeqNext 53 0 > .text.SeqNext 178 0 > .text.ExecSeqScanEPQ 20 0 > .text.ExecSeqScanWithProject 209 0 > .text.unlikely.ExecSeqScanWithQual 53 0 > .text.ExecSeqScanWithQual 373 0 > .text.unlikely.ExecSeqScanWithQualProject 53 0 > .text.ExecSeqScanWithQualProject 474 0 > .text.unlikely.ExecSeqScan 53 0 > .text.ExecSeqScan 245 0 > .text.ExecInitSeqScan 287 0 > .text.ExecEndSeqScan 33 0 > .text.ExecReScanSeqScan 63 0 > .text.ExecSeqScanEstimate 88 0 > .text.ExecSeqScanInitializeDSM 114 0 > .text.ExecSeqScanReInitializeDSM 34 0 > .text.ExecSeqScanInitializeWorker 64 0 > > > I'm rather baffled that the size of ExecSeqScanWithQualProject goes from 811 > to 474, just due to those null checks being removed... But I'll take it. Wow, indeed. > > Thanks for the patch. > > > > + /* > > + * Use pg_assume() for != NULL tests to make the compiler realize no > > + * runtime check for the field is needed in ExecScanExtended(). > > + */ > > > > I propose changing "to make the compiler realize no runtime check" to > > "so the compiler can optimize away the runtime check", assuming that > > is what it means. > > It does. I don't really see a meaningful difference between the comments? Maybe not. I just had to pause for a moment to be sure that was what it actually meant when I first read it. I'm fine leaving it as is if you prefer. > > > I have a separate question as well - do we need to call ResetExprContext() if > > > we neither qual, projection nor epq? I see a small gain by avoiding that. > > > > You're referring to this block, I assume: > > > > /* > > * If we have neither a qual to check nor a projection to do, just skip > > * all the overhead and return the raw scan tuple. > > */ > > if (!qual && !projInfo) > > { > > ResetExprContext(econtext); > > return ExecScanFetch(node, epqstate, accessMtd, recheckMtd); > > } > > Yep. > > > > Yeah, I think it's fine to remove ResetExprContext() here. When I > > looked at it before, I left it in because I was unsure whether > > accessMtd() might leak memory into the per-tuple context. > > It's a good question. I think I unfortunately found a problematic case, > ForeignNext(). Ah, so we do have a culprit in the tree. > I wonder if we instead can MemoryContextReset cheaper, by avoiding a function > call for the common case that no reset is needed. Right now we can't just > check ->isReset in an inline function, because we also delete children. I > wonder if we could define isReset so that creating a child context unsets > isReset? Were you thinking ResetExprContext() could become something like: #define ResetExprContext(econtext) \ do { \ if (!((econtext)->ecxt_per_tuple_memory)->isReset) \ MemoryContextReset((econtext)->ecxt_per_tuple_memory); \ } while (0) that is, once isReset also accounts for whether any child context exists? -- Thanks, Amit Langote