Re: Minmax indexes - Mailing list pgsql-hackers
From:            Amit Kapila
Subject:         Re: Minmax indexes
Date:
Msg-id:          CAA4eK1KgCzOdcoH1ZoiQzS6rqzGp5Gbe9BDMSqXnvoxfDMe-Lg@mail.gmail.com
In response to:  Re: Minmax indexes (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses:       Re: Minmax indexes
List:            pgsql-hackers
On Thu, Sep 26, 2013 at 1:46 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Amit Kapila escribió:
>> On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>
>> On Windows, patch gives below compilation errors:
>> src\backend\access\minmax\mmtuple.c(96): error C2057: expected
>> constant expression
>
> I have fixed all these compile errors (fix attached).  Thanks for
> reporting them.  I'll post a new version shortly.

Thanks for fixing it.

In the last few days I have spent some time reading about minmax (or equivalent) indexes in other databases (Netezza and Oracle) and going through some parts of your proposal. It's a fairly big patch and needs much more time, but I would like to share the findings and thoughts I have so far.

First, about the interface and use case: as far as I could understand, other databases provide this index automatically rather than through a separate CREATE INDEX command. That may be because such an index is mainly useful when the data is ordered, or distributed in such a way that it benefits repeatedly executed queries. You have proposed it as a command, which means the user needs to take care of it; I find that okay for a first version, and later we could add optimisations so that it gets created automatically.

For the page range: if I read correctly, you currently use a #define. Do you want to expose it to the user in some way, like a GUC, or maintain it internally and assign the right value based on the performance of different queries?

Operations on this index seem to be very fast. Oracle has this as an in-memory structure, and I read that in Netezza write operations don't carry any significant overhead for zone maps compared to other indexes, so shouldn't we consider making it non-WAL-logged?
OTOH, because these structures get created automatically in those databases, it might be okay there; but if we provide it as a command, the user might be bothered if he doesn't find it after a server restart.

A few questions and observations:

1.
+ When a new heap tuple is inserted in a summarized page range, it is possible to
+ compare the existing index tuple with the new heap tuple. If the heap tuple is
+ outside the minimum/maximum boundaries given by the index tuple for any indexed
+ column (or if the new heap tuple contains null values but the index tuple
+ indicate there are no nulls), it is necessary to create a new index tuple with
+ the new values. To do this, a new index tuple is inserted, and the reverse range
+ map is updated to point to it. The old index tuple is left in place, for later
+ garbage collection.

Is there a reason why we can't directly update the value rather than inserting a new index tuple? As I understand it, for other indexes like btree we do this because we might need to roll back, but here, even if a rollback happens after updating the min or max value, it will not cause any harm (no tuple loss).

2.
+ If the reverse range map points to an invalid TID, the corresponding page range
+ is not summarized.

3. It might be better if you could mention when the range map will point to an invalid TID; it's not explained in your proposal, but you have used it to explain some other things.

4. "Reverse range map" is good terminology, but isn't "range translation map" better? I don't mind either way; it's just a thought that came to my mind while understanding the concept of the reverse range map.

5.
/*
 * As above, except that instead of scanning the complete heap, only the given
 * range is scanned.  Scan to end-of-rel can be signalled by passing
 * InvalidBlockNumber as end block number.
 */
double
IndexBuildHeapRangeScan(Relation heapRelation,
                        Relation indexRelation,
                        IndexInfo *indexInfo,
                        bool allow_sync,
                        BlockNumber start_blockno,
                        BlockNumber numblocks,
                        IndexBuildCallback callback,
                        void *callback_state)

In the comment you have used "end block number"; which parameter does it refer to? I can see only start_blockno and numblocks.

6. Currently you are passing 0 as the start block and InvalidBlockNumber as the number of blocks; what's the logic for that?

return IndexBuildHeapRangeScan(heapRelation, indexRelation, indexInfo,
                               allow_sync, 0, InvalidBlockNumber,
                               callback, callback_state);

7. In mmbuildCallback, a tuple is added to the minmax index only if it satisfies the page range; otherwise this can waste a big scan in case the page range is large (1280 pages, as you mentioned in one of your mails). Why can't we include it at the end of the scan?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com