Re: Minmax indexes - Mailing list pgsql-hackers
From:            Amit Kapila
Subject:         Re: Minmax indexes
Date:
Msg-id:          CAA4eK1KgCzOdcoH1ZoiQzS6rqzGp5Gbe9BDMSqXnvoxfDMe-Lg@mail.gmail.com
In response to:  Re: Minmax indexes (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses:       Re: Minmax indexes
List:            pgsql-hackers
On Thu, Sep 26, 2013 at 1:46 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Amit Kapila escribió:
>> On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>
>> On Windows, patch gives below compilation errors:
>> src\backend\access\minmax\mmtuple.c(96): error C2057: expected
>> constant expression
>
> I have fixed all these compile errors (fix attached).  Thanks for
> reporting them.  I'll post a new version shortly.

Thanks for fixing it.

In the last few days I have spent some time reading about minmax (or equivalent) indexes in other databases (Netezza and Oracle) and going through some parts of your proposal. It's a fairly big patch and needs much more time, but I would like to share the findings and thoughts I have so far.

First, about the interface and use case: as far as I could understand, other databases provide this index automatically rather than through a separate CREATE INDEX command. That may be because such an index is mainly useful when the data is ordered, or distributed in such a way that it benefits repeatedly executed queries. You have proposed it as a command, which means the user needs to take care of it; I find that okay for a first version, and later we could add optimisations so that it gets created automatically.

For the page range: if I read correctly, you currently use a #define. Do you want to expose it to the user in some way, like a GUC, or maintain it internally and assign the right value based on the performance of different queries?

Operations on this index seem to be very fast. Oracle has this as an in-memory structure, and I read that in Netezza write operations don't carry any significant overhead for zone maps compared to other indexes, so shouldn't we consider making it non-WAL-logged?
OTOH, because these structures get created automatically in those databases, it might be okay there; but if we provide it as a command, the user might be bothered if he doesn't find it after a server restart.

A few questions and observations:

1.
+ When a new heap tuple is inserted in a summarized page range, it is possible to
+ compare the existing index tuple with the new heap tuple. If the heap tuple is
+ outside the minimum/maximum boundaries given by the index tuple for any indexed
+ column (or if the new heap tuple contains null values but the index tuple
+ indicate there are no nulls), it is necessary to create a new index tuple with
+ the new values. To do this, a new index tuple is inserted, and the reverse range
+ map is updated to point to it. The old index tuple is left in place, for later
+ garbage collection.

Is there a reason why we can't directly update the value rather than inserting a new index tuple? As I understand it, for other indexes like btree we do this because we might need to roll back, but here, even if a rollback happens after updating the min or max value, it will not cause any harm (no tuple loss).

2.
+ If the reverse range map points to an invalid TID, the corresponding page range
+ is not summarized.

3. It might be better if you could mention when the range map will point to an invalid TID; it's not explained in your proposal, but you have used it to explain some other things.

4. "Reverse range map" is good terminology, but isn't "range translation map" better? I don't mind either way; it's just a thought that came to my mind while understanding the concept of the reverse range map.

5.
/*
 * As above, except that instead of scanning the complete heap, only the given
 * range is scanned.  Scan to end-of-rel can be signalled by passing
 * InvalidBlockNumber as end block number.
 */
double
IndexBuildHeapRangeScan(Relation heapRelation,
                        Relation indexRelation,
                        IndexInfo *indexInfo,
                        bool allow_sync,
                        BlockNumber start_blockno,
                        BlockNumber numblocks,
                        IndexBuildCallback callback,
                        void *callback_state)

In the comment you have used "end block number"; which parameter does it refer to? I can see only start_blockno and numblocks.

6. Currently you are passing 0 as the start block and InvalidBlockNumber as the number of blocks; what's the logic for that?

return IndexBuildHeapRangeScan(heapRelation, indexRelation, indexInfo,
                               allow_sync, 0, InvalidBlockNumber,
                               callback, callback_state);

7. In mmbuildCallback, a tuple is added to the minmax index only if it satisfies the page range; otherwise this can waste a big scan in case the page range is large (1280 pages, as you mentioned in one of your mails). Why can't we include it at the end of the scan?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com