Re: Is it really such a good thing for newNode() to be a macro? - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Is it really such a good thing for newNode() to be a macro?
Msg-id: 26133.1220037409@sss.pgh.pa.us
In response to: Re: Is it really such a good thing for newNode() to be a macro? (Peter Eisentraut <peter_e@gmx.net>)
List: pgsql-hackers
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> I considered that one, but since part of my argument is that inlining
>> this is a waste of code space, it seems like a better inlining
>> technology isn't really the answer.

> The compiler presumably has the intelligence and the command-line options
> to control how much inlining one wants to do.  But without any size vs.
> performance measurements it is an idle discussion.  Getting rid of a global
> variable and macro ugliness is a worthwhile goal of its own.

I got around to doing some experiments.  The method suggested by Heikki
(out-of-line subroutine for everything except the MemSetTest) reduces the
size of the backend executable by about 0.5% (about 20K) in CVS HEAD on
Fedora 9 x86_64, in a non-assert-enabled build.  However, it also makes it
measurably slower.  I couldn't detect any difference in a regular pgbench
run, so instead I timed iterations of this:

explain select * from tenk1 a
  join tenk1 b using (unique1)
  join tenk1 c on a.unique1 = c.unique2
  join tenk1 d on a.unique1 = d.thousand
  join tenk1 e on a.unique1 = e.ten
  join tenk1 f on a.unique1 = f.tenthous
  join tenk1 g on a.unique1 = g.unique2
  where exists (select 1 from int4_tbl where f1 = b.unique2);

in the regression database.  Put the above (as a single line!) into
"explainjoin.sql" and try

	pgbench -c 1 -t 100 -n -f explainjoin.sql regression

This is mostly stressing the planner, which is pretty newNode-heavy.
I get consistently about 14.1 tps on straight CVS HEAD and about 13.8 tps
with the partially out-of-line implementation.

I also tried the "static inline" implementation, but that doesn't work
at all: gcc refuses to inline it, which makes the palloc0fast call a
dead loss.  So indeed what we need here is a better inlining technology.
I looked into using gcc's "a compound statement enclosed in parentheses
is an expression" extension, thus:

#define newNode(size, tag) \
({	Node   *newNodeMacroHolder; \
\
	AssertMacro((size) >= sizeof(Node));	/* need the tag, at least */ \
	newNodeMacroHolder = (Node *) palloc0fast(size); \
	newNodeMacroHolder->type = (tag); \
	newNodeMacroHolder; \
})

This gets rid of the global, but incredibly, it's even slower: 13.5 tps
on the explain test.  I do not understand that result.  I looked at the
generated machine code to verify that it was what I expected, and indeed
it's about the same as CVS HEAD except that there's no store-and-fetch
into a global.

Getting rid of the global variable accesses reduces the size of the
backend by about 12K on this architecture, and the only theory I can
think of is that that moves things around enough to make the instruction
cache less efficient on some code path that this test happens to
exercise heavily.  In theory the above implementation of newNode should
be a clear win, so I'm thinking this result must be an artifact of some
kind.  I'm going to go try it on PPC and HPPA machines next; does anyone
want to try it on something else?

			regards, tom lane