Home > mailing lists

Need help - optimizer trouble - Mailing list pgsql-novice

From	Helge Bahmann
Subject	Need help - optimizer trouble
Date	April 4, 2001 18:03:05
Msg-id	Pine.LNX.4.21.0104042351270.18606-100000@lothlorien.stunet2.tu-freiberg.de Whole thread Raw
Responses	Re: Need help - optimizer trouble Re: Need help - optimizer trouble
List	pgsql-novice

Tree view

Hi!

I am desperate with 7.1beta4; I have a serious performance problem that
does not manifest in 7.0.2. This is currently just a feasability test, so
I am willing to try every hack.

The problematic part of my schema looks as follows:

document       written_by       author           author_keyword
-------- 1   n ----------       ----------       --------------
*doc_id  <---- *doc_id    n   1 *name      1   n *keyword
doc            *author_id ----> *author_id <---- *author_id

columns marked with an asterik have an index

document contains ~100 000 tuples; each is "written_by" 1-3 out of
~10 000 authors, and every author is associated with ~4 keywords

I am trying to retrieve documents written by authors associated with
a given keyword. I can get the doc_ids using:

SELECT doc_id FROM written_by AS wb
    JOIN author_keyword AS kw ON kw.author_id=wb.author_id
    WHERE kw.keyword='foo'

7.1beta4 and 7.0.2 both use the following query plan:

Nested Loop  (cost=0.00..18.37 rows=8 width=20)
  ->  Index Scan using author_keyword_pkey on author_keyword kw  (cost=0.00..2.02 rows=1 width=4)
  ->  Index Scan using written_by_idx on written_by  (cost=0.00..16.29 rows=4 width=16)

I can then take the values returned and retrieve the documents in a
separate query; the two queries take at most a second. However if
I add a join to get the documents directly:

SELECT doc FROM document
    JOIN written_by AS wb ON document.doc_id=wb.doc_id
    JOIN author_keyword AS kw ON kw.author_id=wb.author_id
    WHERE kw.keyword='foo'

7.0.2 will use the following query plan, query takes ~1 second:

Nested Loop  (cost=0.00..27.53 rows=2 width=32)
  ->  Nested Loop  (cost=0.00..10.00 rows=4 width=20)
        ->  Index Scan using author_keyword_pkey on author_keyword kw (cost=0.00..2.02 rows=1 width=4)
        ->  Index Scan using written_by_idx on ist_autor_von (cost=0.00..7.95 rows=2 width=16)
  ->  Index Scan using document_pkey on document  (cost=0.00..4.16 rows=1 width=12)

7.1beta4 uses the following query plan, query takes ~150 seconds:

Nested Loop  (cost=8562.09..39846.62 rows=4 width=32)
  ->  Index Scan using author_keyword_pkey on author_keyword kw (cost=0.00..2.02 rows=1 width=4)
  ->  Materialize  (cost=37969.52..37969.52 rows=100000 width=28)
        ->  Hash Join  (cost=8562.09..37969.52 rows=100000 width=28)
              ->  Seq Scan on document  (cost=0.00..11932.00 rows=100000 width=12)
              ->  Hash  (cost=5129.55..5129.55 rows=203555 width=16)
                    ->  Seq Scan on written_by  (cost=0.00..5129.55 rows=303555 width=16)

The two databases are not identical, but they were created using the
same script which fills the database with random data of the above
structure. I recreated the databases several times, so it is completely
repeatable.

I tried playing with the optimizer parameters, but it only got worse
because I do not really understand what they are all doing.

I fell uncomfortable with the idea of having to split up my query into
two separate ones. Can anyone give me any hints how to influence the 7.1
optimizer to behave like 7.0.2 in this case? Is there something wrong with
my query? Is there an alternative form which the optimizer can handle
better?

Thanks for your patience

Helge

pgsql-novice by date:

From: "ADBAAMD"
Date: 04 April 2001, 14:27:55
Subject: Re: Postgresql.7.0.3

From: Neil Conway
Date: 04 April 2001, 18:29:31
Subject: Re: Need help - optimizer trouble

Need help - optimizer trouble - Mailing list pgsql-novice

Previous

Next