Re: [PATCH] Add sortsupport for range types and btree_gist - Mailing list pgsql-hackers
From | Bernd Helmle |
---|---|
Subject | Re: [PATCH] Add sortsupport for range types and btree_gist |
Date | |
Msg-id | 2f2e88bf1d44d06c1baf27bbf3db880f42a5cb87.camel@oopsware.de Whole thread Raw |
In response to | Re: [PATCH] Add sortsupport for range types and btree_gist (jian he <jian.universality@gmail.com>) |
Responses |
Re: [PATCH] Add sortsupport for range types and btree_gist
|
List | pgsql-hackers |
Am Mittwoch, dem 10.01.2024 um 22:18 +0800 schrieb jian he: > > I split the original author's patch into 2. > 1. Add GiST sortsupport function for all the btree-gist module data > types except anyrange data type (which actually does not in this > module) > 2. Add GiST sortsupport function for anyrange data type. > Please find attached a new version of this patch set with the following changes/adjustments: - Rebased to current master - Heavily reworked *_cmp() functions to properly decode GPT_VARKEY and GBT_KEY input. For some datatypes the btree comparison functions were reused and the input arguments not properly handled. This patch adds dedicated btree_gist sortsupport comparison methods for all datatypes. There was another patch from Andrey Borodin (thanks again for the hint) and a deeper review done by Heikki in [1]. I've incorporated Heikkis findings in this patch, too. [...] I've also updated the btree_gist documentation to reflect the default sorted built strategy this patch introduces now. Additionally i did some benchmarks again on this new version on the patch. Still, index build speed improvement is quite impressive on the dataset originally provided by Christoph Heiss (since its not available anymore i've uploaded it here [2] again): HEAD (Index was built with default buffering setting) --------------------- REINDEX (s) 4809 CREATE INDEX (s) 4920 btree_gist sortsupport ---------------------- REINDEX (s) 573 CREATE INDEX (s) 578 I created another pgbench based custom script to measure the single core speed of the lookup query of the bench-gist.py script. This looks like this: init.sql -------- BEGIN; DROP TABLE IF EXISTS test_dataset; CREATE TABLE test_dataset(keyid integer not null, id text not null, block_range int4range); CREATE TEMP SEQUENCE testset_seq; INSERT INTO test_dataset SELECT nextval('testset_seq'), id, block_range FROM test ORDER BY random() LIMIT 10000; CREATE UNIQUE INDEX ON test_dataset(keyid); COMMIT; bench.pgbench ------------- \set keyid random(1, 10000) SELECT id, block_range FROM test_dataset WHERE keyid = :keyid \gset SELECT id, block_range FROM test WHERE id = ':id' AND block_range && ':block_range'; Run by for in in `seq 1 3`; do psql -qXf init.pgbench && pgbench -n -r -c 1 -T 60 -f bench.pgbench; done With this i get the following (on prewarmed index and table): HEAD ------------------------------------- pgbench single core tps=248,67 btree_gist sortsupport ---------------------------- pgbench single core tps=1830,33 This is an average over 3 runs each (complete results attached). So this looks really impressive and i hope i didn't do something entirely wrong (still learning about this GiST stuff). > what I am confused: > In fmgr.h > > /* > * Support for cleaning up detoasted copies of inputs. This must > only > * be used for pass-by-ref datatypes, and normally would only be used > * for toastable types. If the given pointer is different from the > * original argument, assume it's a palloc'd detoasted copy, and > pfree it. > * NOTE: most functions on toastable types do not have to worry about > this, > * but we currently require that support functions for indexes not > leak > * memory. > */ > #define PG_FREE_IF_COPY(ptr,n) \ > do { \ > if ((Pointer) (ptr) != PG_GETARG_POINTER(n)) \ > pfree(ptr); \ > } while (0) > > but the doc > (https://www.postgresql.org/docs/current/gist-extensibility.html) > says: > All the GiST support methods are normally called in short-lived > memory > contexts; that is, CurrentMemoryContext will get reset after each > tuple is processed. It is therefore not very important to worry about > pfree'ing everything you palloc. > ENDOF_QUOTE > > so I am not sure in range_gist_cmp, we need the following: > ` > if ((Datum) range_a != a) > pfree(range_a); > if ((Datum) range_b != b) > pfree(range_b); > ` Turns out this is not true for sortsupport: the comparison function is called for each tuple during sorting, which will leak the detoasted (and probably copied datum) in the sort memory context. See the same for e.g. numeric and text, which i needed to change to pass the key values correctly to the text_cmp() or numeric_cmp() function in these cases. I've adapted the PG_FREE_IF_COPY() macro for these functions and introduced GBT_FREE_IF_COPY() in btree_utils_var.h, since the former relies on fcinfo. I'll add the patch again to the upcoming CF for another review round. [1] https://www.postgresql.org/message-id/c0846e34-8b3a-e1bf-c88e-021eb241a481%40iki.fi [2] https://drive.google.com/file/d/1CPNFGR53-FUto1zjXPMM2Yrn0GaGfGFz/view?usp=drive_link
Attachment
pgsql-hackers by date: