Home > mailing lists

Re: possible patch to increase number of hash overflow pages? - Mailing list pgsql-patches

From	Stephen Ramsey
Subject	Re: possible patch to increase number of hash overflow pages?
Date	June 19, 2001 12:47:34
Msg-id	Pine.LNX.4.21.0106190932280.15997-100000@sramsey.ocp.internap.com Whole thread Raw
In response to	Re: possible patch to increase number of hash overflow pages? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: possible patch to increase number of hash overflow pages?
List	pgsql-patches

Tree view

Hello Tom,

Thanks for your reply.

> Just out of curiosity, what's the reason for using a hash index at all?
> The btree index type is much better supported and will do everything
> that a hash index could do (and more).

Good point.  I'm using a btree index currently, and it is working great,
even for tables with 10^8 rows.  I pursued the overflow pages with the
hash access method only because I wanted to compare it with btree to see
which performed better.

> > Looking into the source code a bit, it looked (to my untrained eye) as if
> > it might be possible to increase the number of overflow pages, with a
> > patch to src/include/access/hash.h to use a 32-bit "overflow page address"
> > data type rather than a 16-bit "overflow page address" data type.
>
> I haven't looked much at hash either, but am I right to guess that
> overflow pages are used when an individual hash bucket fills up?
> If so, overrunning a 16-bit field would suggest that you've got more
> than 64K index pages in a single hash bucket ... which does not bode
> well at all for performance.  Seems like the answer is to get the thing
> to use more hash buckets, not to make it possible to support linear
> searches over chains exceeding 64K pages...

I believe that the algorithm only uses 5 of the 16 bits in the
OverflowPageAddress type to identify the "split number" part of the
overflow page address.  These five bits in turn dictate the size (2^5 =
32) of the hashm_spares[] and hashm_mapp[] arrays in the HashMetaPageData
structure.  From the comments in hash.h:

 *
 * The reason that the size is restricted to NCACHED (32) is because
 * the bitmaps are 16 bits: upper 5 represent the splitpoint, lower 11
 * indicate the page number within the splitpoint. Since there are
 * only 5 bits to store the splitpoint, there can only be 32 splitpoints.
 * Both spares[] and bitmaps[] use splitpoints as there indices, so there
 * can only be 32 of them.
 */

It looks (from the hash algorithm code) as if the system is possibly
needing more splitpoints than can be accomodated by the HashMetaPageData
structure, rather than running out of overflow pages, because the error
message that I'm getting is when the "splitnum" variable is greater than
NCACHED, the latter being the array bound for the hashm_spares[] element
of the HashMetaPageData structure.  From src/backend/access/hashovfl.c:

#define OVMSG   "HASH: Out of overflow pages.  Out of luck.\n"

        if (offset > SPLITMASK)
        {
                if (++splitnum >= NCACHED)
                        elog(ERROR, OVMSG);
                metap->OVFL_POINT = splitnum;
                metap->SPARES[splitnum] = metap->SPARES[splitnum - 1];
                metap->SPARES[splitnum - 1]--;
                offset = 0;
        }

So that's why I bumped the number of bits (in the OverflowPageAddress
type) assigned to keep track of splitpoints to 8 bits.

Cheers,
Steve Ramsey

---------------------------------------------
Stephen Ramsey

Software Engineer
Core Software Development
Internap Network Services

e-mail           sramsey@internap.com
telephone     206.504.5361
facsimile     206.447.1870
---------------------------------------------

pgsql-patches by date:

From: Jason Tishler
Date: 19 June 2001, 12:38:04
Subject: Re: Cygwin contrib patch

From: Tom Lane
Date: 19 June 2001, 14:45:06
Subject: Re: Cygwin contrib patch

Re: possible patch to increase number of hash overflow pages? - Mailing list pgsql-patches

Previous

Next