Re: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers

From:           David Rowley
Subject:        Re: Use simplehash.h instead of dynahash in SMgr
Msg-id:         CAApHDvqK3XF2fowu22UYOyuyiJFrEpRtwTaZBSy33j_vygqaew@mail.gmail.com
In response to: Re: Use simplehash.h instead of dynahash in SMgr (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses:      Re: Use simplehash.h instead of dynahash in SMgr
List:           pgsql-hackers
On Mon, 26 Apr 2021 at 05:03, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> If your test is so sensitive to hash function speed, then I'd suggest
> trying something even simpler:
>
> static inline uint32
> relfilenodebackend_hash(RelFileNodeBackend *rnode)
> {
>     uint32 h = 0;
> #define step(x) h ^= (uint32)(x) * 0x85ebca6b; h = pg_rotate_right(h, 11); h *= 9;
>     step(rnode->node.relNode);
>     step(rnode->node.spcNode); // spcNode could be different for the same relNode
>                                // only during table movement. Does it pay to hash it?
>     step(rnode->node.dbNode);
>     step(rnode->backend);      // does it matter to hash backend?
>                                // It equals InvalidBackendId for non-temporary
>                                // relations, and temporary relations in the same
>                                // database never have the same relNode (have they?).
>     return murmurhash32(h);
> }

I tried that and got a median result of 113.795 seconds over 14 runs
with this recovery benchmark test:

LOG: size: 4096, members: 2032, filled: 0.496094, total chain: 1014,
max chain: 6, avg chain: 0.499016, total_collisions: 428,
max_collisions: 3, avg_collisions: 0.210630

I also tried the following hash function, just to see how much
performance might be left to gain from speeding it up:

static inline uint32
relfilenodebackend_hash(RelFileNodeBackend *rnode)
{
	uint32 h;

	h = pg_rotate_right32((uint32) rnode->node.relNode, 16) ^
		((uint32) rnode->node.dbNode);
	return murmurhash32(h);
}

I got a median of 112.685 seconds over 14 runs with:

LOG: size: 4096, members: 2032, filled: 0.496094, total chain: 1044,
max chain: 7, avg chain: 0.513780, total_collisions: 438,
max_collisions: 3, avg_collisions: 0.215551

So it looks like there might not be too much left to gain, given that
v2 was 113.375 seconds (median over 10 runs).

> I'd like to see the benchmark code. It's quite interesting that this
> place became measurable at all.

Sure.

$ cat recoverybench_insert_hash.sh
#!/bin/bash
pg_ctl stop -D pgdata -m smart
pg_ctl start -D pgdata -l pg.log -w
psql -f setup1.sql postgres > /dev/null
psql -c "create table log_wal (lsn pg_lsn not null);" postgres > /dev/null
psql -c "insert into log_wal values(pg_current_wal_lsn());" postgres > /dev/null
psql -c "insert into hp select x,0 from generate_series(1,100000000) x;" postgres > /dev/null
psql -c "insert into log_wal values(pg_current_wal_lsn());" postgres > /dev/null
psql -c "select 'Used ' || pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), lsn)) || ' of WAL' from log_wal limit 1;" postgres
pg_ctl stop -D pgdata -m immediate -w
echo Starting Postgres...
pg_ctl start -D pgdata -l pg.log

$ cat setup1.sql
drop table if exists hp;
create table hp (a int primary key, b int not null) partition by hash(a);
select 'create table hp'||x||' partition of hp for values with (modulus 1000, remainder '||x||');' from generate_series(0,999) x;
\gexec

config:
shared_buffers = 10GB
checkpoint_timeout = 60min
max_wal_size = 20GB
min_wal_size = 20GB

For subsequent runs, if you apply the patch that does the PANIC at the
end of recovery, you'll just need to start the database up again to
perform recovery again. You can then tail -f your postgres logs and
watch for the "redo done" message, which will show you the time spent
doing recovery.

David.
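For anyone who wants to poke at that simplified hash without building
PostgreSQL, below is a minimal standalone sketch. To be clear about what
is assumed: the murmurhash32() and pg_rotate_right32() bodies are copied
from src/include/common/hashfn.h and src/include/port/pg_bitutils.h, the
struct layout mirrors src/include/storage/relfilenode.h, and the OID
values in main() are invented for the example. The bucket-array
collision count is only a rough analogue of the simplehash stats in the
LOG lines above, not the same measurement.

/*
 * hash_sketch.c: standalone illustration of the simplified
 * relfilenodebackend_hash() from the message above.  Helper bodies are
 * copied from PostgreSQL headers; all OID values are made up.
 */
#include <stdio.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef int BackendId;

typedef struct RelFileNode
{
	Oid			spcNode;		/* tablespace */
	Oid			dbNode;			/* database */
	Oid			relNode;		/* relation */
} RelFileNode;

typedef struct RelFileNodeBackend
{
	RelFileNode node;
	BackendId	backend;
} RelFileNodeBackend;

/* 32-bit murmur3 finalizer, as in common/hashfn.h */
static inline uint32_t
murmurhash32(uint32_t data)
{
	uint32_t	h = data;

	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}

/* as in port/pg_bitutils.h */
static inline uint32_t
pg_rotate_right32(uint32_t word, int n)
{
	return (word >> n) | (word << (32 - n));
}

/* the simplified hash from the message above */
static inline uint32_t
relfilenodebackend_hash(RelFileNodeBackend *rnode)
{
	uint32_t	h;

	h = pg_rotate_right32((uint32_t) rnode->node.relNode, 16) ^
		((uint32_t) rnode->node.dbNode);
	return murmurhash32(h);
}

int
main(void)
{
	/*
	 * Hash 2032 consecutive relNode values from one database into 4096
	 * buckets, loosely mimicking the table in the LOG lines above.  The
	 * tablespace/database OIDs (1663/13580) and the starting relNode
	 * (16384) are example values; -1 stands in for InvalidBackendId.
	 */
	int			buckets[4096] = {0};
	int			collisions = 0;
	int			max_bucket = 0;
	RelFileNodeBackend rnode = {{1663, 13580, 0}, -1};

	for (Oid rel = 16384; rel < 16384 + 2032; rel++)
	{
		rnode.node.relNode = rel;
		buckets[relfilenodebackend_hash(&rnode) & 4095]++;
	}

	for (int i = 0; i < 4096; i++)
	{
		if (buckets[i] > 1)
			collisions += buckets[i] - 1;
		if (buckets[i] > max_bucket)
			max_bucket = buckets[i];
	}

	printf("total_collisions: %d, max bucket size: %d\n",
		   collisions, max_bucket);
	return 0;
}

Compile with something like cc -O2 -o hash_sketch hash_sketch.c; varying
the rotate amount or the set of fields hashed makes it easy to compare
distributions between variants.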