Thread: Profiling the backend (gprof output) [current devel]
Here is the top part of my gprof output from a simple session: creating
two tables, inserting some rows, creating an index, and doing a couple
of simple selects (one minute of typing):

----------
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
39.74     12.39     12.39                             mcount (profiler overhead)
 7.86     14.84      2.45   964885     0.00     0.00  fastgetattr
 2.79     15.71      0.87   906153     0.00     0.00  fastgetiattr
 2.44     16.47      0.76                             _psort_cmp
 2.08     17.12      0.65   400783     0.00     0.00  _bt_compare
 1.60     17.62      0.50   125987     0.00     0.01  hash_search
 1.48     18.08      0.46   128756     0.00     0.01  SearchSysCache
 1.28     18.48      0.40   120307     0.00     0.00  SpinAcquire
 1.25     18.87      0.39  1846682     0.00     0.00  fmgr_faddr
 1.06     19.20      0.33   253022     0.00     0.00  StrategyTermEvaluate
 1.03     19.52      0.32    31578     0.01     0.04  heapgettup
 0.99     19.83      0.31   128842     0.00     0.00  CatalogCacheComputeHashIndex
----------

fastgetattr() doesn't seem to be so fast after all... or perhaps it
would be best to try to reduce the number of calls to it?  Nearly a
million calls to read attributes out of tuples seems extreme to me when
we are talking about fewer than one hundred rows.

Perhaps it would be better to add a new function, 'fastgetattrlist', to
retrieve multiple attributes at once, instead of calling a macro wrapped
around another bunch of macros that calls fastgetattr() once per
attribute?

Or perhaps tuples could be fitted with a "lookup table" when they are
stored in the backend cache (see the sketch at the end of this message)?
It could take .000005 second or so to build the table and attach it to
the tuple, but it would definitely speed up retrieval of attributes from
that tuple.  If the same tuple is searched for its attributes many times
(as seems to be the case), this would be faster in the end.

Can we afford not to optimize this?  I just hate those MySQL people
showing their performance figures.  PostgreSQL should be the best...

How about this (seemingly) unnecessarily complex part of
access/common/heaptuple.c [fastgetattr]:

----------
switch (att[i]->attlen)
{
    case sizeof(char):
        off++;                      /* <-- why not 'sizeof(char)'? */
        break;
    case sizeof(int16):
        off += sizeof(int16);
        break;
    case sizeof(int32):
        off += sizeof(int32);
        break;
    case -1:
        usecache = false;
        off += VARSIZE(tp + off);
        break;
    default:
        off += att[i]->attlen;
        break;
}
----------

Would it not be faster *and* easier to read if written as:

----------
off += (att[i]->attlen == -1
        ? (usecache = false, VARSIZE(tp + off))
        : att[i]->attlen);
----------

...or is this some kind of magic which I should not worry about?  There
are almost no comments in this code, and most of it is totally
incomprehensible to me.

Would it be a good idea to try to optimize things like this, or will
these functions be replaced sometime anyway?
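To make the "lookup table" idea concrete, here is a minimal sketch of
what I have in mind.  All the names here are made up, the size of the
offsets array is arbitrary, and it deliberately ignores alignment
padding and null attributes, which the real fastgetattr has to handle:

----------
#include <stdint.h>

/* Hypothetical per-tuple cache of attribute start offsets. */
typedef struct AttrOffsetCache
{
    int      natts;        /* number of attributes in the tuple  */
    uint16_t offsets[64];  /* byte offset where each attr starts */
} AttrOffsetCache;

/*
 * Walk the tuple data once, recording where every attribute starts.
 * attlen[i] is the attribute's width in bytes, or -1 for a
 * variable-length attribute whose size is stored in its first four
 * bytes (which is what VARSIZE reads).
 */
static void
build_attr_offsets(const char *tp, const int *attlen, int natts,
                   AttrOffsetCache *cache)
{
    uint16_t off = 0;

    cache->natts = natts;
    for (int i = 0; i < natts; i++)
    {
        cache->offsets[i] = off;
        off += (attlen[i] == -1)
            ? *(const int32_t *) (tp + off)   /* varlena: stored size */
            : attlen[i];
    }
}

/* Every later fetch is one array lookup plus a pointer addition. */
static const char *
cached_getattr(const char *tp, const AttrOffsetCache *cache, int attnum)
{
    return tp + cache->offsets[attnum - 1];
}
----------

Building the cache costs one walk over the tuple; it pays for itself as
soon as the same tuple's attributes are fetched more than once, which
the call counts above suggest happens constantly.

/* m */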
> fastgetattr() doesn't seem to be so fast after all... or perhaps it
> would be best to try to reduce the number of calls to it?  Nearly a
> million calls to read attributes out of tuples seems extreme to me
> when we are talking about fewer than one hundred rows.
> [...]
> How about this (seemingly) unnecessarily complex part of
> access/common/heaptuple.c [fastgetattr]:
>
> switch (att[i]->attlen)
> {
>     case sizeof(char):
>         off++;                      /* <-- why not 'sizeof(char)'? */
>         break;
>     case sizeof(int16):
>         off += sizeof(int16);
>         break;
>     case sizeof(int32):
>         off += sizeof(int32);
>         break;
>     case -1:
>         usecache = false;
>         off += VARSIZE(tp + off);
>         break;
>     default:
>         off += att[i]->attlen;
>         break;
> }
>
> Would it not be faster *and* easier to read if written as:
>
> off += (att[i]->attlen == -1
>         ? (usecache = false, VARSIZE(tp + off))
>         : att[i]->attlen);
> [...]

OK, here is my statement on this.  GO FOR IT.  YOU HAVE THE GREEN
LIGHT.  RUN WITH THE BALL.

Yes, PostgreSQL is very modularized, but this modularization causes
severe function call overhead, as you have seen.  I did some tuning in
6.2 that improved some things, but much more needs to be done.

Anything that can be done without making the code harder to understand
or maintain should be done.  Your change to the above switch statement
is a good example of a good cleanup.  If things like this can be
improved or cached or made into macros, let's do it.

The fastgetattr function does quite a bit in terms of reading the
tuple, so you may need to re-code part of it to optimize it.
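One optimization is already in there: remembering an attribute's offset
once it is known to be the same in every tuple, i.e. once everything
before it is fixed width.  A simplified sketch of that pattern, with
made-up names (the real code also has to handle nulls and alignment):

----------
#include <stddef.h>

/*
 * Simplified sketch of offset caching, not the actual backend code.
 * attcacheoff[] starts out all -1; a slot is filled in the first time
 * the offset is computed and found to be tuple-independent.
 */
static const char *
getattr_with_cacheoff(const char *tp, int attnum,
                      const int *attlen, int *attcacheoff)
{
    int off = 0;
    int i;

    /* Fast path: offset was already computed on an earlier call. */
    if (attcacheoff[attnum - 1] != -1)
        return tp + attcacheoff[attnum - 1];

    /* Slow path: add up the widths of the preceding attributes. */
    for (i = 0; i < attnum - 1; i++)
    {
        if (attlen[i] == -1)    /* variable width: offsets past this */
            break;              /* point differ from tuple to tuple  */
        off += attlen[i];
    }

    if (i < attnum - 1)
        return NULL;   /* caller must fall back to a full per-tuple walk */

    /* Everything before attnum is fixed width: cache and reuse. */
    attcacheoff[attnum - 1] = off;
    return tp + off;
}
----------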
There is an attcacheoff value for exactly this (sketched above), but it
only removes part of the overhead.

There are three things that make us slower than other databases:
transactions, the user-defined type system, and a good optimizer, which
can slow down small queries but makes large queries much faster.

--
Bruce Momjian
maillist@candle.pha.pa.us
> Anything that can be done without making the code harder to understand
> or maintain should be done.

One thing.  If you find a better/faster/clearer way to code something,
and it appears in several areas/files of the code, it should probably
be done in all those parts, not just the part that gets used a lot.

--
Bruce Momjian
maillist@candle.pha.pa.us
I think I am on to something with fastgetattr.  I will send a patch in
a few hours.

> fastgetattr() doesn't seem to be so fast after all... or perhaps it
> would be best to try to reduce the number of calls to it?
> [...]
> Would it be a good idea to try to optimize things like this, or will
> these functions be replaced sometime anyway?
>
> /* m */

--
Bruce Momjian
maillist@candle.pha.pa.us