Re: Write Ahead Logging for Hash Indexes - Mailing list pgsql-hackers
From | Mark Kirkwood |
---|---|
Subject | Re: Write Ahead Logging for Hash Indexes |
Date | |
Msg-id | 47cbd716-1a43-9f55-28ba-26e6eeabd563@catalyst.net.nz Whole thread Raw |
In response to | Re: Write Ahead Logging for Hash Indexes (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: Write Ahead Logging for Hash Indexes
|
List | pgsql-hackers |
On 09/09/16 07:09, Jeff Janes wrote: > On Wed, Sep 7, 2016 at 3:29 AM, Ashutosh Sharma <ashu.coek88@gmail.com > <mailto:ashu.coek88@gmail.com>> wrote: > > > Thanks to Ashutosh Sharma for doing the testing of the patch and > > helping me in analyzing some of the above issues. > > Hi All, > > I would like to summarize the test-cases that i have executed for > validating WAL logging in hash index feature. > > 1) I have mainly ran the pgbench test with read-write workload at the > scale factor of 1000 and various client counts like 16, 64 and 128 for > time duration of 30 mins, 1 hr and 24 hrs. I have executed this test > on highly configured power2 machine with 128 cores and 512GB of RAM. I > ran the test-case both with and without the replication setup. > > Please note that i have changed the schema of pgbench tables created > during initialisation phase. > > The new schema of pgbench tables looks as shown below on both master > and standby: > > postgres=# \d pgbench_accounts > Table "public.pgbench_accounts" > Column | Type | Modifiers > ----------+---------------+----------- > aid | integer | not null > bid | integer | > abalance | integer | > filler | character(84) | > Indexes: > "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) > "pgbench_accounts_bid" hash (bid) > > postgres=# \d pgbench_history > Table "public.pgbench_history" > Column | Type | Modifiers > --------+-----------------------------+----------- > tid | integer | > bid | integer | > aid | integer | > delta | integer | > mtime | timestamp without time zone | > filler | character(22) | > Indexes: > "pgbench_history_bid" hash (bid) > > > Hi Ashutosh, > > This schema will test the maintenance of hash indexes, but it will > never use hash indexes for searching, so it limits the amount of test > coverage you will get. While searching shouldn't generate novel types > of WAL records (that I know of), it will generate locking and timing > issues that might uncover bugs (if there are any left to uncover, of > course). > > I would drop the primary key on pgbench_accounts and replace it with a > hash index and test it that way (except I don't have a 128 core > machine at my disposal, so really I am suggesting that you do this...) > > The lack of primary key and the non-uniqueness of the hash index > should not be an operational problem, because the built in pgbench > runs never attempt to violate the constraints anyway. > > In fact, I'd replace all of the indexes on the rest of the pgbench > tables with hash indexes, too, just for additional testing. > > I plan to do testing using my own testing harness after changing it to > insert a lot of dummy tuples (ones with negative values in the > pseudo-pk column, which are never queried by the core part of the > harness) and deleting them at random intervals. I think that none of > pgbench's built in tests are likely to give the bucket splitting and > squeezing code very much exercise. > > Is there a way to gather statistics on how many of each type of WAL > record are actually getting sent over the replication link? The only > way I can think of is to turn on wal archving as well as replication, > then using pg_xlogdump to gather the stats. > > I've run my original test for a while now and have not seen any > problems. But I realized I forgot to compile with enable-casserts, to > I will have to redo it to make sure the assertion failures have been > fixed. In my original testing I did very rarely get a deadlock (or > some kind of hang), and I haven't seen that again so far. It was > probably the same source as the one Mark observed, and so the same fix. > > Cheers, > > Jeff Yeah, good suggestion about replacing (essentially) all the indexes with hash ones and testing. I did some short runs with this type of schema yesterday (actually to get a feel for if hash performance vs btree was compareable - does seem tp be) - but probably longer ones with higher concurrency (as high as I can manage on a single socket i7 anyway) is a good plan. If Ashutosh has access to seriously large numbers of cores then that is even better :-) Cheers Mark
pgsql-hackers by date: