Thread: computing completion tag is expensive for pgbench -S -M prepared

computing completion tag is expensive for pgbench -S -M prepared

From

Andres Freund

Date:

07 June 2018, 07:13:41

Hi,

While looking at a profile I randomly noticed that we spend a surprising
amount of time in snprintf() and its subsidiary functions. That turns
out to be
                    if (strcmp(portal->commandTag, "SELECT") == 0)
                        snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
                                 "SELECT " UINT64_FORMAT, nprocessed);

in PortalRun().  That's actually fairly trivial to optimize - we don't
need the full blown snprintf machinery here.  A quick benchmark
replacing it with:

                       memcpy(completionTag, "SELECT ", sizeof("SELECT "));
                       pg_lltoa(nprocessed, completionTag + 7);

yields nearly a ~2% increase in TPS. Larger than I expected.  The code
is obviously less pretty, but it's also not actually that bad.

Attached is the patch I used for benchmarking. I wonder if I just hit
some specific version of glibc that regressed snprintf performance, or
whether others can reproduce this.

If it actually reproducible, I think we should go for it. But update the
rest of the completionTag writes in the same file too.

Greetings,

Andres Freund

Attachment

completion-tag.patch

Re: computing completion tag is expensive for pgbench -S -M prepared

From

David Rowley

Date:

07 June 2018, 08:01:47

On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:
> in PortalRun().  That's actually fairly trivial to optimize - we don't
> need the full blown snprintf machinery here.  A quick benchmark
> replacing it with:
>
>                        memcpy(completionTag, "SELECT ", sizeof("SELECT "));
>                        pg_lltoa(nprocessed, completionTag + 7);

I'd also noticed something similar with some recent benchmarks I was
doing for INSERTs into partitioned tables. In my case I saw as high as
0.7% of the time spent building the INSERT tag. So I think it's worth
fixing this.

I think it would be better to invent a function that accepts a
CmdType, int64 and Oid that copies the tag into the supplied buffer,
then make a more generic change that also replaces the code in
ProcessQuery() which builds the tag. I'm sure there must be some way
to get the CmdType down to the place you've patched so we can get rid
of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Simon Riggs

Date:

07 June 2018, 13:01:26

On 7 June 2018 at 06:01, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:
>> in PortalRun().  That's actually fairly trivial to optimize - we don't
>> need the full blown snprintf machinery here.  A quick benchmark
>> replacing it with:
>>
>>                        memcpy(completionTag, "SELECT ", sizeof("SELECT "));
>>                        pg_lltoa(nprocessed, completionTag + 7);
>
> I'd also noticed something similar with some recent benchmarks I was
> doing for INSERTs into partitioned tables. In my case I saw as high as
> 0.7% of the time spent building the INSERT tag. So I think it's worth
> fixing this.
>
> I think it would be better to invent a function that accepts a
> CmdType, int64 and Oid that copies the tag into the supplied buffer,
> then make a more generic change that also replaces the code in
> ProcessQuery() which builds the tag. I'm sure there must be some way
> to get the CmdType down to the place you've patched so we can get rid
> of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Sounds better

Do we actually need the completion tag at all? In most cases??

Perhaps we should add a parameter to make it optional and turn it off
by default, except for psql.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Pavel Stehule

Date:

07 June 2018, 13:29:09

2018-06-07 12:01 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:

On 7 June 2018 at 06:01, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:
>> in PortalRun(). That's actually fairly trivial to optimize - we don't
>> need the full blown snprintf machinery here. A quick benchmark
>> replacing it with:
>>
>> memcpy(completionTag, "SELECT ", sizeof("SELECT "));
>> pg_lltoa(nprocessed, completionTag + 7);
>
> I'd also noticed something similar with some recent benchmarks I was
> doing for INSERTs into partitioned tables. In my case I saw as high as
> 0.7% of the time spent building the INSERT tag. So I think it's worth
> fixing this.
>
> I think it would be better to invent a function that accepts a
> CmdType, int64 and Oid that copies the tag into the supplied buffer,
> then make a more generic change that also replaces the code in
> ProcessQuery() which builds the tag. I'm sure there must be some way
> to get the CmdType down to the place you've patched so we can get rid
> of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Sounds better

Do we actually need the completion tag at all? In most cases??

affected rows is taken from this value on protocol level

Regards

Pavel

Perhaps we should add a parameter to make it optional and turn it off
by default, except for psql.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Simon Riggs

Date:

07 June 2018, 13:40:48

On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Do we actually need the completion tag at all? In most cases??
>
>
> affected rows is taken from this value on protocol level

I didn't mean we should remove the number of rows. Many things rely on that.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Tom Lane

Date:

07 June 2018, 17:30:14

Andres Freund <andres@anarazel.de> writes:
> ...  That's actually fairly trivial to optimize - we don't
> need the full blown snprintf machinery here.  A quick benchmark
> replacing it with:

>                        memcpy(completionTag, "SELECT ", sizeof("SELECT "));
>                        pg_lltoa(nprocessed, completionTag + 7);

While I don't have any objection to this change if the speedup is
reproducible, I do object to spelling the same constant as
'sizeof("SELECT ")' and '7' on adjacent lines ...

            regards, tom lane

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Andres Freund

Date:

07 June 2018, 21:20:10

On 2018-06-07 11:40:48 +0100, Simon Riggs wrote:
> On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 
> >> Do we actually need the completion tag at all? In most cases??
> >
> >
> > affected rows is taken from this value on protocol level
> 
> I didn't mean we should remove the number of rows. Many things rely on that.

How do you mean it then? We can't really easily change how we return
that value on the protocol level, and the command tag is basically just
returned as a string in the protocol.  If we were to design the protocol
today I'm sure we just would declare the rowcount and affected rowcounts
separate fields or something, but ...

Greetings,

Andres Freund

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Andres Freund

Date:

07 June 2018, 21:21:04

On 2018-06-07 10:30:14 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > ...  That's actually fairly trivial to optimize - we don't
> > need the full blown snprintf machinery here.  A quick benchmark
> > replacing it with:
> 
> >                        memcpy(completionTag, "SELECT ", sizeof("SELECT "));
> >                        pg_lltoa(nprocessed, completionTag + 7);
> 
> While I don't have any objection to this change if the speedup is
> reproducible, I do object to spelling the same constant as
> 'sizeof("SELECT ")' and '7' on adjacent lines ...

Hah, yes. Nor would I want to keep the #if 0 around it ;).  I mostly
wanted to know whether others can reproduce the effect - the actual
patch would need to be bigger and affect more places.

Greetings,

Andres Freund

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Andres Freund

Date:

07 June 2018, 21:31:35

On 2018-06-07 17:01:47 +1200, David Rowley wrote:
> On 7 June 2018 at 16:13, Andres Freund <andres@anarazel.de> wrote:
> > in PortalRun().  That's actually fairly trivial to optimize - we don't
> > need the full blown snprintf machinery here.  A quick benchmark
> > replacing it with:
> >
> >                        memcpy(completionTag, "SELECT ", sizeof("SELECT "));
> >                        pg_lltoa(nprocessed, completionTag + 7);
> 
> I'd also noticed something similar with some recent benchmarks I was
> doing for INSERTs into partitioned tables. In my case I saw as high as
> 0.7% of the time spent building the INSERT tag. So I think it's worth
> fixing this.

I'm kinda surprised that I never noticed this until recently. I wonder
if there's a glibc or compiler regression causing this. There's some
string optimization passes, it could be that it now does less.


> I think it would be better to invent a function that accepts a
> CmdType, int64 and Oid that copies the tag into the supplied buffer,
> then make a more generic change that also replaces the code in
> ProcessQuery() which builds the tag. I'm sure there must be some way
> to get the CmdType down to the place you've patched so we can get rid
> of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Generally sounds reasonable, I'm not sure it's worth to do the surgery
to avoid the strcmp(). That's a larger change that's somewhat
independent...


Greetings,

Andres Freund

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Simon Riggs

Date:

07 June 2018, 22:24:41

On 7 June 2018 at 19:20, Andres Freund <andres@anarazel.de> wrote:
> On 2018-06-07 11:40:48 +0100, Simon Riggs wrote:
>> On 7 June 2018 at 11:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> >> Do we actually need the completion tag at all? In most cases??
>> >
>> >
>> > affected rows is taken from this value on protocol level
>>
>> I didn't mean we should remove the number of rows. Many things rely on that.
>
> How do you mean it then? We can't really easily change how we return
> that value on the protocol level, and the command tag is basically just
> returned as a string in the protocol.  If we were to design the protocol
> today I'm sure we just would declare the rowcount and affected rowcounts
> separate fields or something, but ...

I meant remove the pointless noise word at the start of the tag that
few clients care about.

I was thinking of just returning "SQL" instead, but that wasn't after
much thought.

But now I think about it more returning any fixed string, "SQL" or
"SELECT", in the protocol seems like a waste of bandwidth and a
potential route in to decrypt the stream.

If we're going to compress the protocol, it seems sensible to remove
extraneous information first.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Tom Lane

Date:

07 June 2018, 22:27:08

Simon Riggs <simon@2ndquadrant.com> writes:
> If we're going to compress the protocol, it seems sensible to remove
> extraneous information first.

Breaking the wire protocol was nowhere in this thread.

            regards, tom lane

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Simon Riggs

Date:

07 June 2018, 22:34:39

On 7 June 2018 at 20:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> If we're going to compress the protocol, it seems sensible to remove
>> extraneous information first.
>
> Breaking the wire protocol was nowhere in this thread.

No, it wasn't. But there is another thread on the subject of
compressing libpq, which is what I was referring to.

Andres' patch is clearly very efficient at adding the SELECT tag. I am
questioning if we can remove that need altogether.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: computing completion tag is expensive for pgbench -S -M prepared

From

Andres Freund

Date:

07 June 2018, 22:48:36

On 2018-06-07 20:34:39 +0100, Simon Riggs wrote:
> On 7 June 2018 at 20:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> >> If we're going to compress the protocol, it seems sensible to remove
> >> extraneous information first.
> >
> > Breaking the wire protocol was nowhere in this thread.
> 
> No, it wasn't. But there is another thread on the subject of
> compressing libpq, which is what I was referring to.
> 
> Andres' patch is clearly very efficient at adding the SELECT tag. I am
> questioning if we can remove that need altogether.

That'd be a wire protocol break. We'd have to add compatibilities for
both things in the client, wait a couple years, and then change. Or we
could make it an optional thing based on a client option passed at
connect. Which'd also take a good while.  Those seem extremely
disproportionate complicated solutions for the problem.  Nor can I
believe that a "SELECT " in the resultset is a meaningful space issue,
making it even worthwhile to break compat in the first place.

Greetings,

Andres Freund