GIN fast insert - Mailing list pgsql-hackers

From:      Robert Haas
Subject:   GIN fast insert
Date:
Msg-id:    603c8f070902101859j91fb78eg7e0228afe8f2fd36@mail.gmail.com
Responses: Re: GIN fast insert
List:      pgsql-hackers
Jeff Davis asked me if I'd be willing to do a review of the GIN fast insert patch about two weeks ago, but I haven't actually had a chance to read through it in detail until tonight. I can't say I really know anything about GIN (though I did take this opportunity to RTM), so apologies in advance if my comments here are totally off base.

My basic impression of this code is that it's trying unnaturally hard to avoid creating a lossy TIDBitmap, which seems pretty odd, considering that the whole point of TIDBitmap is, AFAICS, to degrade to lossy mode when the alternative is eating too much memory. I'm particularly dismayed by this hunk:

+       if ( ntids == NULL && tbm && tbm_has_lossy(tbm) )
+               ereport(ERROR,
+                       (errcode(ERRCODE_OUT_OF_MEMORY),
+                        errmsg("not enough memory to store result of pending list or VACUUM table" ),
+                        errhint("Increase the \"work_mem\" parameter.")));

The definition of work_mem is the amount of memory that a hash table can use before spilling to disk, NOT the amount of memory a hash table can consume before arbitrarily failing. It's intended to be a soft limit which people can tune to get the best performance out of their system, not a hard limit that interrupts work. Using the limit this way will encourage people to set work_mem too high so that things don't crash, but that in turn will potentially lead to other bad behavior (like swapping). I see that there's already one other place in the GIN code that uses work_mem this way, but I don't think that's a good reason to add another one - I don't see any other place in the system that behaves this way, outside of GIN.

I think this is related to the problems with gincostestimate() that Tom Lane was complaining about here:

http://archives.postgresql.org/message-id/20441.1234209245@sss.pgh.pa.us

I am not 100% sure I'm understanding this correctly, but I think the reason why gincostestimate() is so desperate to avoid index scans when the pending list is long is that it knows scanFastInsert() will blow up if an index scan is actually attempted, because of the aforementioned TIDBitmap problem. This seems unacceptably fragile. Faster insert performance is nice, but not if it means that my indices suddenly start switching themselves off (which is bad) or, in the presence of cached plans, crashing (which is worse).

I think this code needs to be somehow rewritten to make things degrade gracefully when the pending list is long - I'm not sure what the best way to do that is. Inventing a new data structure to store TIDs that is never lossy seems like it might work, but you'd have to think about what to do if it got too big. I think it's probably best to just go ahead and let it get arbitrarily long (since you have no other option besides crashing) and leave work_mem out of it. Instead, put in a reloption that controls the maximum length of the pending list. This is easy to tune: if you make it bigger, insert performance improves, but queries eat more memory. If you make it smaller, insert performance gets worse, but you bound the memory that queries use more tightly. The other problem with using work_mem is that it can vary between sessions, so one session happily stuffs a lot of data into the pending list and then another session can't scan the index because it has a lower work_mem setting. Using a reloption avoids that problem by making the whole thing symmetric, plus it gives you fine-grained control over the behavior on an index-by-index basis.

...Robert
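P.S. To make the reloption idea concrete, here is a rough, self-contained sketch of the kind of per-index cap I'm imagining. The reloption name (pending_list_limit), the state struct, and the cleanup check are all hypothetical - nothing here is taken from the patch:

#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch only: bound the pending list with a per-index
 * setting instead of work_mem.  None of these names come from the patch.
 */
typedef struct GinPendingState
{
    size_t      bytes_used;             /* current size of the pending list */
    size_t      pending_list_limit;     /* per-index reloption, in bytes */
} GinPendingState;

/*
 * Before appending a new batch of entries, check whether that would push
 * the pending list past the index's own limit.  If so, the caller merges
 * the pending entries into the main index first, so a scan never has to
 * hold more than pending_list_limit worth of pending TIDs in memory,
 * regardless of any session's work_mem setting.
 */
static bool
gin_pending_needs_cleanup(const GinPendingState *state, size_t incoming_bytes)
{
    return state->bytes_used + incoming_bytes > state->pending_list_limit;
}

With something along those lines, the memory a scan has to set aside for the pending list is a property of the index itself, so it can't change out from under you just because another session inserted with a different work_mem.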