Re: Hash Indexes - Mailing list pgsql-hackers
From | Jesper Pedersen |
---|---|
Subject | Re: Hash Indexes |
Date | |
Msg-id | a49b17a9-bf5c-822b-4a89-be7d9fdca35c@redhat.com Whole thread Raw |
In response to | Re: Hash Indexes (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Hash Indexes
|
List | pgsql-hackers |
Hi, On 09/14/2016 07:24 AM, Amit Kapila wrote: > On Wed, Sep 14, 2016 at 12:29 AM, Jesper Pedersen > <jesper.pedersen@redhat.com> wrote: >> On 09/13/2016 07:26 AM, Amit Kapila wrote: >>> >>> Attached, new version of patch which contains the fix for problem >>> reported on write-ahead-log of hash index thread [1]. >>> >> >> I have been testing patch in various scenarios, and it has a positive >> performance impact in some cases. >> >> This is especially seen in cases where the values of the indexed column are >> unique - SELECTs can see a 40-60% benefit over a similar query using b-tree. >> > > Here, I think it is better if we have the data comparing the situation > of hash index with respect to HEAD as well. What I mean to say is > that you are claiming that after the hash index improvements SELECT > workload is 40-60% better, but where do we stand as of HEAD? > The tests I have done are with a copy of a production database using the same queries sent with a b-tree index for the primary key, and the same with a hash index. Those are seeing a speed-up of the mentioned 40-60% in execution time - some involve JOINs. Largest of those tables is 390Mb with a CHAR() based primary key. >> UPDATE also sees an improvement. >> > > Can you explain this more? Is it more compare to HEAD or more as > compare to Btree? Isn't this contradictory to what the test in below > mail shows? > Same thing here - where the fields involving the hash index aren't updated. >> In cases where the indexed column value isn't unique, it takes a long time >> to build the index due to the overflow page creation. >> >> Also in cases where the index column is updated with a high number of >> clients, ala >> >> -- ddl.sql -- >> CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val; >> CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id); >> CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val); >> ANALYZE; >> >> -- test.sql -- >> \set id random(1,10) >> \set val random(0,10) >> BEGIN; >> UPDATE test SET val = :val WHERE id = :id; >> COMMIT; >> >> w/ 100 clients - it takes longer than the b-tree counterpart (2921 tps for >> hash, and 10062 tps for b-tree). >> > > Thanks for doing the tests. Have you applied both concurrent index > and cache the meta page patch for these tests? So from above tests, > we can say that after these set of patches read-only workloads will be > significantly improved even better than btree in quite-a-few useful > cases. Agreed. > However when the indexed column is updated, there is still a > large gap as compare to btree (what about the case when the indexed > column is not updated in read-write transaction as in our pgbench > read-write transactions, by any chance did you ran any such test?). I have done a run to look at the concurrency / TPS aspect of the implementation - to try something different than Mark's work on testing the pgbench setup. With definitions as above, with SELECT as -- select.sql -- \set id random(1,10) BEGIN; SELECT * FROM test WHERE id = :id; COMMIT; and UPDATE/Indexed with an index on 'val', and finally UPDATE/Nonindexed w/o one. [1] [2] [3] is new_hash - old_hash is the existing hash implementation on master. btree is master too. Machine is a 28C/56T with 256Gb RAM with 2 x RAID10 SSD for data + wal. Clients ran with -M prepared. [1] https://www.postgresql.org/message-id/CAA4eK1+ERbP+7mdKkAhJZWQ_dTdkocbpt7LSWFwCQvUHBXzkmA@mail.gmail.com [2] https://www.postgresql.org/message-id/CAD__OujvYghFX_XVkgRcJH4VcEbfJNSxySd9x=1Wp5VyLvkf8Q@mail.gmail.com [3] https://www.postgresql.org/message-id/CAA4eK1JUYr_aB7BxFnSg5+JQhiwgkLKgAcFK9bfD4MLfFK6Oqw@mail.gmail.com Don't know if you find this useful due to the small number of rows, but let me know if there are other tests I can run, f.ex. bump the number of rows. > I > think we need to focus on improving cases where index columns are > updated, but it is better to do that work as a separate patch. > Ok. Best regards, Jesper
Attachment
pgsql-hackers by date: