Thread: Massive memory use for star query
I've recently seen examples of star-like queries using vast amounts of
memory in one of our production systems. Here's a simplified example
using synthetic data (see attached to generate if desired):

SET geqo_threshold = 14;
SET from_collapse_limit = 14;
SET join_collapse_limit = 14;

EXPLAIN
SELECT 1
FROM node n
JOIN nodekeyword kwn0 ON (n.nodeid = kwn0.nodeid)
JOIN keyword kw0      ON (kwn0.keywordid = kw0.keywordid)
JOIN nodekeyword kwn1 ON (n.nodeid = kwn1.nodeid)
JOIN keyword kw1      ON (kwn1.keywordid = kw1.keywordid)
JOIN nodekeyword kwn2 ON (n.nodeid = kwn2.nodeid)
JOIN keyword kw2      ON (kwn2.keywordid = kw2.keywordid)
JOIN nodekeyword kwn3 ON (n.nodeid = kwn3.nodeid)
JOIN keyword kw3      ON (kwn3.keywordid = kw3.keywordid)
JOIN nodekeyword kwn4 ON (n.nodeid = kwn4.nodeid)
JOIN keyword kw4      ON (kwn4.keywordid = kw4.keywordid)
JOIN nodekeyword kwn5 ON (n.nodeid = kwn5.nodeid)
JOIN keyword kw5      ON (kwn5.keywordid = kw5.keywordid)
WHERE kw0.keyword = 'sscghryv'
AND kw1.keyword = 'sscghryv'
AND kw2.keyword = 'sscghryv'
AND kw3.keyword = 'sscghryv'
AND kw4.keyword = 'sscghryv'
AND kw5.keyword = 'sscghryv';

Here's what a ps listing looks like:

    VSZ     RSS     SZ      CMD
1849524 1793680 1791144 postgres: postgres test [local] EXPLAIN

So we are using 1.7G doing an *EXPLAIN* - presumably this is the join
search planning getting expensive for 13 tables. Is it expected that
this much memory could/would be used? Could this be evidence of a leak?

Note this is a default 9.1 (2011-04-07) build without asserts, with a
default postgresql.conf.

Clearly this particular query is a bit dumb - giving the keyword
predicates different values results in much better behaved planning
memory use, and letting geqo do the join search also avoids the high
memory use. However, geqo has its own problems: in the production
variant of this query, *one* of the plans it liked to pick needed >100G
of temp space to execute... and there are only 100G available... sigh.
And for these semi ad-hoc systems it is hard to prevent dumb queries
altogether!

regards

Mark
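For readers following along without the attachment, here is a minimal
sketch of the schema the query above implies. The column names are
inferred from the join and WHERE clauses; the real generator script may
of course differ in types and constraints:

    CREATE TABLE node (
        nodeid    integer PRIMARY KEY
    );

    CREATE TABLE keyword (
        keywordid integer PRIMARY KEY,
        keyword   text
    );

    -- many-to-many link between nodes and keywords
    CREATE TABLE nodekeyword (
        nodeid    integer REFERENCES node (nodeid),
        keywordid integer REFERENCES keyword (keywordid)
    );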
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:

> Here's a simplified example using synthetic data (see attached to
> generate if desired):

Doesn't work for me:

kgrittn@kgrittn-desktop:~/work/starjoin$ ./gendata.pl
generate cat
cannot open cat.dat: No such file or directory at ./gendata.pl line 17.

> Here's what a ps listing looks like:
>
>     VSZ     RSS     SZ      CMD
> 1849524 1793680 1791144 postgres: postgres test [local] EXPLAIN

If you run pmap -d on the pid, what does the last line look like?

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: > >> Here's a simplified example using synthetic data (see attached to >> generate if desired): > > Doesn't work for me: Edited scripts to change hard-coded directory. Issue confirmed. > If you run pmap -d on the pid, what does the last line look like? $ pmap -d 5869|grep '^mapped' ; ps aux|grep '^kgrittn 5869 ' mapped: 1132044K writeable/private: 1084480K shared: 38436K kgrittn 5869 101 35.4 1139664 1094784 ? Rs 10:01 0:20 postgres: kgrittn test [local] EXPLAIN This wasn't necessarily at the peak. Attached is `vmstat 1` output during the EXPLAIN. About 1.1 GB private writeable memory consumed on my HEAD build on kubuntu 32 bit. -Kevin
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> I've recently seen examples of star-like queries using vast amounts of
> memory in one of our production systems. Here's a simplified example
> using synthetic data (see attached to generate if desired):
> SET geqo_threshold = 14;
> SET from_collapse_limit = 14;
> SET join_collapse_limit = 14;

Well, if you're going to do the above, you should be expecting the
planner to eat a lot of memory. There is a reason why the default
values of those parameters are significantly lower than that ...

regards, tom lane
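For reference, the stock values Tom is referring to can be checked with
SHOW; in a default configuration of this era they are 12 for
geqo_threshold and 8 for the two collapse limits:

    SHOW geqo_threshold;       -- 12 by default
    SHOW from_collapse_limit;  -- 8 by default
    SHOW join_collapse_limit;  -- 8 by default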
On 16/04/11 01:59, Kevin Grittner wrote:
> Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
>
>> Here's a simplified example using synthetic data (see attached to
>> generate if desired):
>
> Doesn't work for me:
>
> kgrittn@kgrittn-desktop:~/work/starjoin$ ./gendata.pl
> generate cat
> cannot open cat.dat: No such file or directory at ./gendata.pl line 17.

Apologies Kevin - I stuffed up the edit that was supposed to make it
easier for you all to choose your own place to write the files (I left
one path hard coded, for the table 'cat').

Cheers

Mark
On 16/04/11 04:43, Tom Lane wrote:
> Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
>> I've recently seen examples of star-like queries using vast amounts of
>> memory in one of our production systems. Here's a simplified example
>> using synthetic data (see attached to generate if desired):
>> SET geqo_threshold = 14;
>> SET from_collapse_limit = 14;
>> SET join_collapse_limit = 14;
>
> Well, if you're going to do the above, you should be expecting the
> planner to eat a lot of memory. There is a reason why the default
> values of those parameters are significantly lower than that ...

Ok - so with the settings at their defaults geqo chooses a semi-random
plan, and at least one of those (for the production variant of this
query, anyway) eats massive (>100G) amounts of temp space - not really
a suitable outcome either.

I guess you have answered my first question - i.e. yes, this should eat
a massive amount of RAM as written. However, are you sure there is no
memory leaking going on here?

regards

Mark
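A side note on the geqo instability mentioned above: the genetic search
is randomized, so one way to at least make the plan it picks repeatable
while investigating (this does nothing for plan quality) may be to pin
the random seed via the stock geqo_seed setting:

    SET geqo = on;
    SET geqo_seed = 0.5;  -- any fixed value in [0, 1]; the same seed
                          -- should make the randomized search repeatable
    EXPLAIN SELECT ...;   -- the query under investigation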
On Sat, Apr 16, 2011 at 8:21 AM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:
>
> I guess you have answered my first question - i.e. yes, this should eat
> a massive amount of RAM as written. However, are you sure there is no
> memory leaking going on here?

The planner doesn't try to free up memory while it's working; it
generally assumes that producing a plan is a short process, and that
freeing the whole memory context when it's done is enough.

The basic problem is that the number of possible plans blows up
combinatorially. With 14 tables there are 14! possible join orderings,
and with roughly three join strategies available at each of the 13
joins, something like 3^13 * 14! possible plans - actually more if you
include things like whether to materialize, which keys to use, and so
on. The planner uses various heuristics to avoid combinatoric growth
wherever it can, but there's no way to completely avoid it.

-- 
greg
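A back-of-envelope illustration of that growth, runnable in psql (the 3
stands in for the roughly three join strategies - nested loop, merge,
hash - per join; the real search space is larger still):

    SELECT factorial(14);                   -- 87178291200 join orderings
    SELECT factorial(14) * power(3::numeric, 13);
                                            -- ~1.4e17 plans with ~3
                                            -- strategies per join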
Greg Stark <gsstark@mit.edu> writes:
> On Sat, Apr 16, 2011 at 8:21 AM, Mark Kirkwood
> <mark.kirkwood@catalyst.net.nz> wrote:
>> I guess you have answered my first question - i.e. yes, this should eat
>> a massive amount of RAM as written. However, are you sure there is no
>> memory leaking going on here?

> The planner uses various heuristics to avoid combinatoric growth
> wherever it can, but there's no way to completely avoid it.

Yeah. The collapse_limit variables can be seen as another heuristic to
deal with this type of problem: they artificially limit the number of
combinations considered by forcing the join search to be broken down
into subproblems. The trouble of course is that this breakdown is
pretty stupid and can easily prevent the best join order from ever
being considered.

If you've got a small number of such query types that you can afford to
spend some manual effort on, here's what I'd do:

1. With those three planner variables cranked up to more than the number
of relations in the query (if possible), run an EXPLAIN, or better
EXPLAIN ANALYZE so you can confirm you get a good plan.

2. Observe the join order selected in the good plan.

3. Rearrange your query so that the tables are explicitly JOINed in that
order. Don't use the FROM-comma-list style.

4. Now, in your production app, *reduce* join_collapse_limit to a small
value, maybe even 1, to force the syntactic JOIN order to be followed.
(Obviously, don't keep it there when running queries you haven't
hand-optimized this way.)

This will force the planner to consider only small subproblems, which
will make it both much faster and much less memory-hungry than when it's
trying to solve a large join problem from scratch.

regards, tom lane
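A minimal sketch of Tom's recipe applied to this thread's query (the
explicit join order below is invented for illustration - the real order
must be read off the good plan produced in step 1):

    -- Steps 1-2: plan once with the limits raised, note the join order:
    SET geqo_threshold = 14;
    SET from_collapse_limit = 14;
    SET join_collapse_limit = 14;
    EXPLAIN ANALYZE SELECT 1 FROM node n JOIN ... ;  -- the original query

    -- Steps 3-4: spell that order out with explicit JOINs, then pin it:
    SET join_collapse_limit = 1;
    EXPLAIN
    SELECT 1
    FROM keyword kw0
    JOIN nodekeyword kwn0 ON (kwn0.keywordid = kw0.keywordid)
    JOIN node n           ON (n.nodeid = kwn0.nodeid)
    JOIN nodekeyword kwn1 ON (n.nodeid = kwn1.nodeid)
    JOIN keyword kw1      ON (kwn1.keywordid = kw1.keywordid)
    -- ... remaining nodekeyword/keyword pairs, in the order the
    -- good plan used ...
    WHERE kw0.keyword = 'sscghryv'
      AND kw1.keyword = 'sscghryv';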
On 15/04/11 16:35, Mark Kirkwood wrote:
> Here's a simplified example using synthetic data (see attached to
> generate if desired):

For anyone else who might want to play with this: a patch with a
correction to make the directory reassignment work correctly, plus an
additional comment in the README mentioning the need to set this in the
generator and loading scripts.

Thanks

Mark
On 17/04/11 02:58, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> The planner uses various heuristics to avoid combinatoric growth
>> wherever it can, but there's no way to completely avoid it.
>
> Yeah. The collapse_limit variables can be seen as another heuristic to
> deal with this type of problem: they artificially limit the number of
> combinations considered by forcing the join search to be broken down
> into subproblems. The trouble of course is that this breakdown is
> pretty stupid and can easily prevent the best join order from ever
> being considered.
>
> If you've got a small number of such query types that you can afford to
> spend some manual effort on, here's what I'd do:
>
> 1. With those three planner variables cranked up to more than the
> number of relations in the query (if possible), run an EXPLAIN, or
> better EXPLAIN ANALYZE so you can confirm you get a good plan.
>
> 2. Observe the join order selected in the good plan.
>
> 3. Rearrange your query so that the tables are explicitly JOINed in
> that order. Don't use the FROM-comma-list style.
>
> 4. Now, in your production app, *reduce* join_collapse_limit to a small
> value, maybe even 1, to force the syntactic JOIN order to be followed.
> (Obviously, don't keep it there when running queries you haven't
> hand-optimized this way.)
>
> This will force the planner to consider only small subproblems, which
> will make it both much faster and much less memory-hungry than when
> it's trying to solve a large join problem from scratch.

We've done an equivalent thing as a temporary fix - restricted the page
generating these queries to one or two keywords, to tame the number of
tables joined in.

We are only seeing this type of query generated in one very specific
part of the application (keyword search), and I've been encouraging a
redesign in that area anyway, as I don't believe so many joins are
needed to achieve what they want to do - so this is really the clincher
for a redesign. I will get 'em to reduce the *collapse_limit settings
too.

Thanks to all of you for your help,

regards

Mark