Re: order by and index path - Mailing list pgsql-hackers
| From | jwieck@debis.com (Jan Wieck) |
|---|---|
| Subject | Re: order by and index path |
| Date | |
| Msg-id | m0zTlgT-000EBRC@orion.SAPserv.Hamburg.dsh.de |
| Responses | Re: [HACKERS] Re: order by and index path |
| List | pgsql-hackers |
Andreas wrote:

> > (who really selects nearly all rows from a 5M row
> > table?).
>
> Data Warehouse apps
>
> > This will hurt if someone really selects most of the rows and the index
> > scan jumps over the disc.
>
> I think this is a non-issue, since if a qual is not restrictive enough,
> the optimizer should choose a seq scan anyway. Doesn't it do this already?
> [...]

Right and wrong.

Right - it is the optimizer's job to decide whether an index should be used or not, and that decision has to be made based on cost.

Wrong - PostgreSQL's query optimizer doesn't do it already. It assumes that a qualification is always restrictive enough, and chooses an index scan any time the qualification can be thrown into the indexqual. In the following I only discuss the situation where qualifications can be used in the indexqual.

Calculating the cost of a query is easy. Have N tuples in P data pages, where the given qualification will match M of them. Assuming the tuples are not in ascending order in the data pages, the cost of fetching one tuple by its TID rises with P (more seeking necessary). Now you can calculate the cost of an index scan by

    C = M/N * P * F

where F is some constant factor to make C comparable to the current seqscan cost value (I know, it must be smarter, but for this description a simple calculation is enough).

The only problem is that the optimizer has absolutely no chance to estimate M (the mystic value, as Bruce called it). Given a qualification

    WHERE key > 0 AND key <= 100

it cannot know whether this would return 0% or 100% of the rows. To estimate that, it needs statistical information about the key ranges that are present. Assume 11 keys were remembered by the last vacuum run, breaking the whole present key range of 10000 tuples into 10 chunks, and they read

    5 40 70 90 500 600 1000 1100 1400 1500 2000

where 5 is the lowest key at all, 40 is the key of tuple 1000 (in key order), 70 is the key of tuple 2000, and so on.
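The simplified cost comparison described above can be sketched as follows. This is only an illustration of the idea, not PostgreSQL's actual cost functions; the fudge factor F = 3.0 and the seqscan cost model (one unit per page) are assumptions made up for the example:

```python
def index_scan_cost(m, n, p, f):
    # C = M/N * P * F: of N tuples, M match the qual; each fetched tuple
    # may land on any of the P pages, so cost grows with the matched fraction
    return m / n * p * f

def seq_scan_cost(p):
    # a sequential scan reads every one of the P data pages exactly once
    return float(p)

def choose_plan(m, n, p, f=3.0):
    # pick whichever plan the (crude) model says is cheaper
    if index_scan_cost(m, n, p, f) < seq_scan_cost(p):
        return "index scan"
    return "seq scan"

# restrictive qual: 100 of 10000 tuples on 500 pages
print(choose_plan(100, 10000, 500))    # -> index scan
# unrestrictive qual: 9000 of 10000 tuples on 500 pages
print(choose_plan(9000, 10000, 500))   # -> seq scan
```

The whole decision hinges on M, which is exactly the value the optimizer of the time could not estimate.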
Now looking at the qualification together with this key range information tells us that the absolute limit on the number of rows returned by an index scan would be 3999 (tuple 3999 could still have a key value of 100). But the qualification

    WHERE key >= 50

could return at most 8999 tuples, and

    WHERE key > 50 AND key < 70

has a maximum of 998 result tuples.

This is the information required to make the right decision for the case where all selected rows are wanted. We do not have this statistical information, so the whole thing is at this time academic.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
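The upper bounds above can be reproduced with a small sketch. The function and the counting rule are my own illustration, not PostgreSQL code: count a full chunk of tuples for every histogram chunk whose key interval could intersect the predicate, then subtract each boundary tuple whose exact key is known to fail the predicate.

```python
import math

def max_matching_rows(boundaries, n_tuples, lo, hi, lo_incl, hi_incl):
    """Upper bound on rows matching lo ... hi (bounds optionally inclusive),
    given equi-depth histogram boundary keys remembered by vacuum."""
    k = len(boundaries) - 1        # number of chunks
    c = n_tuples // k              # tuples per chunk

    def key_passes(key):
        above = key > lo or (lo_incl and key == lo)
        below = key < hi or (hi_incl and key == hi)
        return above and below

    def chunk_may_match(j):
        # chunk j holds keys somewhere in [boundaries[j], boundaries[j+1]]
        if boundaries[j + 1] < lo or (boundaries[j + 1] == lo and not lo_incl):
            return False
        if boundaries[j] > hi or (boundaries[j] == hi and not hi_incl):
            return False
        return True

    counted = [j for j in range(k) if chunk_may_match(j)]
    total = c * len(counted)
    failed = set()
    for j in counted:
        for b in (j, j + 1):       # the two boundary tuples of chunk j
            if b not in failed and not key_passes(boundaries[b]):
                failed.add(b)      # this tuple's key is known exactly, and fails
                total -= 1
    return total

# Jan's example: 11 boundary keys over 10000 tuples
bounds = [5, 40, 70, 90, 500, 600, 1000, 1100, 1400, 1500, 2000]
print(max_matching_rows(bounds, 10000, 0, 100, False, True))        # key > 0 AND key <= 100 -> 3999
print(max_matching_rows(bounds, 10000, 50, math.inf, True, False))  # key >= 50             -> 8999
print(max_matching_rows(bounds, 10000, 50, 70, False, False))       # key > 50 AND key < 70 -> 998
```

These are maxima, not estimates: the real row counts could be anywhere from zero up to these bounds, which is exactly why such statistics only bound, rather than determine, the optimizer's choice.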