Re: [PERFORM] DELETE vs TRUNCATE explanation - Mailing list pgsql-hackers
From | Jeff Janes |
---|---|
Subject | Re: [PERFORM] DELETE vs TRUNCATE explanation |
Date | |
Msg-id | CAMkU=1w7vLA63hf-+Uc_j61CvOdKixsOHph9d0H-aLNBYgfX=w@mail.gmail.com Whole thread Raw |
Responses |
Re: [PERFORM] DELETE vs TRUNCATE explanation
|
List | pgsql-hackers |
I've moved this thread from performance to hackers. The topic was poor performance when truncating lots of small tables repeatedly on test environments with fsync=off. On Thu, Jul 12, 2012 at 6:00 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > I think the problem is in the Fsync Absorption queue. Every truncate > adds a FORGET_RELATION_FSYNC to the queue, and processing each one of > those leads to sequential scanning the checkpointer's pending ops hash > table, which is quite large. It is almost entirely full of other > requests which have already been canceled, but it still has to dig > through them all. So this is essentially an N^2 operation. My attached Proof of Concept patch reduces the run time of the benchmark at the end of this message from 650sec to 84sec, demonstrating that this is in fact the problem. Which doesn't mean that my patch is the right answer to it, of course. (The delete option is still faster than truncate, coming in at around 55sec) > I'm not sure why we don't just delete the entry instead of marking it > as cancelled. It looks like the only problem is that you can't delete > an entry other than the one just returned by hash_seq_search. Which > would be fine, as that is the entry that we would want to delete; > except that mdsync might have a different hash_seq_search open, and so > it wouldn't be safe to delete. > > If the segno was taken out of the hash key and handled some other way, > then the forgetting could be done with a simple hash look up rather > than a full scan. The above two ideas might be the better solution, as they would work even when fsync=on. Since BBU are becoming so popular I think the fsync queue could be a problem even with fsync on if the fsync is fast enough. But I don't immediately know how to implement them. > Maybe we could just turn off the pending ops table altogether when > fsync=off, but since fsync is PGC_SIGHUP it is not clear how you could > safely turn it back on. Now that I think about it, I don't see how turning fsync from off to on can ever be known to be safe, until a system wide sync has intervened. After all a segment that was dirtied and added to the pending ops table while fsync=off might also be removed from the pending ops table the microsecond before fsync is turned on, so how is that different from never adding it in the first place? The attached Proof Of Concept patch implements this in two ways, one of which is commented out. The commented out way omits the overhead of sending the request to the checkpointer in the first place, but breaks modularity a bit. The benchmark used on 9.3devel head is: fsync=off, all other defaults. ## one time initialization perl -le 'print "create schema foo$_; create table foo$_.foo$_ (k integer, v integer);" $ARGV[0]..$ARGV[0]+$ARGV[1]-1' 0 10 |psql ## actual benchmark. perl -le 'print "set client_min_messages=warning;"; foreach (1..10000) { print "BEGIN;\n"; print "insert into foo$_.foo$_ select * from generate_series(1,10); " foreach $ARGV[0]..$ARGV[0]+$ARGV[1]-1; print "COMMIT;\nBEGIN;\n"; print "truncate table foo$_.foo$_; " foreach $ARGV[0]..$ARGV[0]+$ARGV[1]-1; #print "delete from foo$_.foo$_; " foreach $ARGV[0]..$ARGV[0]+$ARGV[1]-1; print "COMMIT;\n" } ' 0 10 | time psql > /dev/null Cheers, Jeff
Attachment
pgsql-hackers by date: