Thread: computing completion tag is expensive for pgbench -S -M prepared
Hi, While looking at a profile I randomly noticed that we spend a surprising amount of time in snprintf() and its subsidiary functions. That turns out to be if (strcmp(portal->commandTag, "SELECT") == 0) snprintf(completionTag, COMPLETION_TAG_BUFSIZE, "SELECT " UINT64_FORMAT, nprocessed); in PortalRun(). That's actually fairly trivial to optimize - we don't need the full blown snprintf machinery here. A quick benchmark replacing it with: memcpy(completionTag, "SELECT ", sizeof("SELECT ")); pg_lltoa(nprocessed, completionTag + 7); yields nearly a ~2% increase in TPS. Larger than I expected. The code is obviously less pretty, but it's also not actually that bad. Attached is the patch I used for benchmarking. I wonder if I just hit some specific version of glibc that regressed snprintf performance, or whether others can reproduce this. If it actually reproducible, I think we should go for it. But update the rest of the completionTag writes in the same file too. Greetings, Andres Freund
Attachment
On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote: > in PortalRun(). That's actually fairly trivial to optimize - we don't > need the full blown snprintf machinery here. A quick benchmark > replacing it with: > > memcpy(completionTag, "SELECT ", sizeof("SELECT ")); > pg_lltoa(nprocessed, completionTag + 7); I'd also noticed something similar with some recent benchmarks I was doing for INSERTs into partitioned tables. In my case I saw as high as 0.7% of the time spent building the INSERT tag. So I think it's worth fixing this. I think it would be better to invent a function that accepts a CmdType, int64 and Oid that copies the tag into the supplied buffer, then make a more generic change that also replaces the code in ProcessQuery() which builds the tag. I'm sure there must be some way to get the CmdType down to the place you've patched so we can get rid of the if (strcmp(portal->commandTag, "SELECT") == 0) line too. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 7 June 2018 at 06:01, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote: >> in PortalRun(). That's actually fairly trivial to optimize - we don't >> need the full blown snprintf machinery here. A quick benchmark >> replacing it with: >> >> memcpy(completionTag, "SELECT ", sizeof("SELECT ")); >> pg_lltoa(nprocessed, completionTag + 7); > > I'd also noticed something similar with some recent benchmarks I was > doing for INSERTs into partitioned tables. In my case I saw as high as > 0.7% of the time spent building the INSERT tag. So I think it's worth > fixing this. > > I think it would be better to invent a function that accepts a > CmdType, int64 and Oid that copies the tag into the supplied buffer, > then make a more generic change that also replaces the code in > ProcessQuery() which builds the tag. I'm sure there must be some way > to get the CmdType down to the place you've patched so we can get rid > of the if (strcmp(portal->commandTag, "SELECT") == 0) line too. Sounds better Do we actually need the completion tag at all? In most cases?? Perhaps we should add a parameter to make it optional and turn it off by default, except for psql. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2018-06-07 12:01 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:
On 7 June 2018 at 06:01, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:
>> in PortalRun(). That's actually fairly trivial to optimize - we don't
>> need the full blown snprintf machinery here. A quick benchmark
>> replacing it with:
>>
>> memcpy(completionTag, "SELECT ", sizeof("SELECT "));
>> pg_lltoa(nprocessed, completionTag + 7);
>
> I'd also noticed something similar with some recent benchmarks I was
> doing for INSERTs into partitioned tables. In my case I saw as high as
> 0.7% of the time spent building the INSERT tag. So I think it's worth
> fixing this.
>
> I think it would be better to invent a function that accepts a
> CmdType, int64 and Oid that copies the tag into the supplied buffer,
> then make a more generic change that also replaces the code in
> ProcessQuery() which builds the tag. I'm sure there must be some way
> to get the CmdType down to the place you've patched so we can get rid
> of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.
Sounds better
Do we actually need the completion tag at all? In most cases??
affected rows is taken from this value on protocol level
Regards
Pavel
Perhaps we should add a parameter to make it optional and turn it off
by default, except for psql.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Do we actually need the completion tag at all? In most cases?? > > > affected rows is taken from this value on protocol level I didn't mean we should remove the number of rows. Many things rely on that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andres Freund <andres@anarazel.de> writes: > ... That's actually fairly trivial to optimize - we don't > need the full blown snprintf machinery here. A quick benchmark > replacing it with: > memcpy(completionTag, "SELECT ", sizeof("SELECT ")); > pg_lltoa(nprocessed, completionTag + 7); While I don't have any objection to this change if the speedup is reproducible, I do object to spelling the same constant as 'sizeof("SELECT ")' and '7' on adjacent lines ... regards, tom lane
On 2018-06-07 11:40:48 +0100, Simon Riggs wrote: > On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > >> Do we actually need the completion tag at all? In most cases?? > > > > > > affected rows is taken from this value on protocol level > > I didn't mean we should remove the number of rows. Many things rely on that. How do you mean it then? We can't really easily change how we return that value on the protocol level, and the command tag is basically just returned as a string in the protocol. If we were to design the protocol today I'm sure we just would declare the rowcount and affected rowcounts separate fields or something, but ... Greetings, Andres Freund
On 2018-06-07 10:30:14 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > ... That's actually fairly trivial to optimize - we don't > > need the full blown snprintf machinery here. A quick benchmark > > replacing it with: > > > memcpy(completionTag, "SELECT ", sizeof("SELECT ")); > > pg_lltoa(nprocessed, completionTag + 7); > > While I don't have any objection to this change if the speedup is > reproducible, I do object to spelling the same constant as > 'sizeof("SELECT ")' and '7' on adjacent lines ... Hah, yes. Nor would I want to keep the #if 0 around it ;). I mostly wanted to know whether others can reproduce the effect - the actual patch would need to be bigger and affect more places. Greetings, Andres Freund
On 2018-06-07 17:01:47 +1200, David Rowley wrote: > On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote: > > in PortalRun(). That's actually fairly trivial to optimize - we don't > > need the full blown snprintf machinery here. A quick benchmark > > replacing it with: > > > > memcpy(completionTag, "SELECT ", sizeof("SELECT ")); > > pg_lltoa(nprocessed, completionTag + 7); > > I'd also noticed something similar with some recent benchmarks I was > doing for INSERTs into partitioned tables. In my case I saw as high as > 0.7% of the time spent building the INSERT tag. So I think it's worth > fixing this. I'm kinda surprised that I never noticed this until recently. I wonder if there's a glibc or compiler regression causing this. There's some string optimization passes, it could be that it now does less. > I think it would be better to invent a function that accepts a > CmdType, int64 and Oid that copies the tag into the supplied buffer, > then make a more generic change that also replaces the code in > ProcessQuery() which builds the tag. I'm sure there must be some way > to get the CmdType down to the place you've patched so we can get rid > of the if (strcmp(portal->commandTag, "SELECT") == 0) line too. Generally sounds reasonable, I'm not sure it's worth to do the surgery to avoid the strcmp(). That's a larger change that's somewhat independent... Greetings, Andres Freund
On 7 June 2018 at 19:20, Andres Freund <andres@anarazel.de> wrote: > On 2018-06-07 11:40:48 +0100, Simon Riggs wrote: >> On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> >> >> Do we actually need the completion tag at all? In most cases?? >> > >> > >> > affected rows is taken from this value on protocol level >> >> I didn't mean we should remove the number of rows. Many things rely on that. > > How do you mean it then? We can't really easily change how we return > that value on the protocol level, and the command tag is basically just > returned as a string in the protocol. If we were to design the protocol > today I'm sure we just would declare the rowcount and affected rowcounts > separate fields or something, but ... I meant remove the pointless noise word at the start of the tag that few clients care about. I was thinking of just returning "SQL" instead, but that wasn't after much thought. But now I think about it more returning any fixed string, "SQL" or "SELECT", in the protocol seems like a waste of bandwidth and a potential route in to decrypt the stream. If we're going to compress the protocol, it seems sensible to remove extraneous information first. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes: > If we're going to compress the protocol, it seems sensible to remove > extraneous information first. Breaking the wire protocol was nowhere in this thread. regards, tom lane
On 7 June 2018 at 20:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndquadrant.com> writes: >> If we're going to compress the protocol, it seems sensible to remove >> extraneous information first. > > Breaking the wire protocol was nowhere in this thread. No, it wasn't. But there is another thread on the subject of compressing libpq, which is what I was referring to. Andres' patch is clearly very efficient at adding the SELECT tag. I am questioning if we can remove that need altogether. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-06-07 20:34:39 +0100, Simon Riggs wrote: > On 7 June 2018 at 20:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Simon Riggs <simon@2ndquadrant.com> writes: > >> If we're going to compress the protocol, it seems sensible to remove > >> extraneous information first. > > > > Breaking the wire protocol was nowhere in this thread. > > No, it wasn't. But there is another thread on the subject of > compressing libpq, which is what I was referring to. > > Andres' patch is clearly very efficient at adding the SELECT tag. I am > questioning if we can remove that need altogether. That'd be a wire protocol break. We'd have to add compatibilities for both things in the client, wait a couple years, and then change. Or we could make it an optional thing based on a client option passed at connect. Which'd also take a good while. Those seem extremely disproportionate complicated solutions for the problem. Nor can I believe that a "SELECT " in the resultset is a meaningful space issue, making it even worthwhile to break compat in the first place. Greetings, Andres Freund