Fix up for BTP_CHAIN problems - Mailing list pgsql-hackers
From | Wayne Piekarski |
---|---|
Subject | Fix up for BTP_CHAIN problems |
Date | |
Msg-id | 199907170340.NAA10098@helpdesk.senet.com.au Whole thread Raw |
Responses |
Re: [HACKERS] Fix up for BTP_CHAIN problems
|
List | pgsql-hackers |
Hi, A few weeks ago I sent an email out about getting BTP_CHAIN faults when trying to perform operations with tables. My colleague Matt Altus was trawling the mailing lists looking for information about this, and he found some articles previously discussing problems with Btree indices and how they sometimes can have problems handling tables with massive duplicate entries in them, as the tree becomes unbalanced, and mentioned other things like leaf nodes and so on. The postings talked about how fixing up the problem was tricky and was still there, and Oracle solved it by including the tid in with the index to make it more unique. Well, we thought about this, and had a look at every table and index we'd ever had BTP_CHAIN problems with, and all had massive duplication of values in the particular columns. Ie, one table has 1.5 million rows, and one of the columns with an index on it (snum) has only 20000 unique values - this particular table was very troublesome, whereas others weren't so bad because they were a lot smaller. Each table we looked at were all the same problem, and we thought wow, this is really neat because all our problem tables were explained by these postings. None of our other indexes caused problems, because they were more unique. Each one of our tables has a column called id which is very similar to an oid except we generate it ourselves, and so we put in a reference to the id column after all the other columns in our indexes. ie, create index sessions_snum_index on sessions using btree (snum); became: create index sessions_snum_index on sessions using btree (snum, id); The indexes grew a little bit, but now we have not had *ANY* BTP_CHAIN faults at all, and to test it we really thrashed the machine to see if we could cause it to die. It worked perfectly and we were all really happy because BTP_CHAIN was very annoying to fix up. It was occuring a lot when the machine was under high load. So I can definitely recommend this to anyone who has problems like this, or tables with lots of rows but not many unique values. The problem does not occur under simple circumstances, only under cases where many backends are all running and the system is under a high load. Would a solution to the problem be to automatically include the row OID when creating an index? This would fix the problem for everyone automatically without having to do the hack manually. Is it ok to include the OID in an index? I wasn't sure about this which is why I included my own ID value instead so someone might want to comment on this. Just thought I'd share this with everyone so we can all benefit from it. This is a problem which really caused us to doubt the ability of Postgres to be used in a high load environment and so I think it should be mentioned somewhere. Maybe in the docs? BTW, since getting around BTP_CHAIN our only remaining problem is the backends waiting thing, and we are upgrading to 6.5 tomorrow which we hope will fix this up forever. We did some testing of 6.5 and it runs a *lot* faster, is more reliable, and the load of the machine is very much lower than it normally is with 6.4.2 with our thrash testing program. I assume that 6.4 style code will work unchanged in 6.5? Ie, we've used a lot of LOCK TABLE xxx; code everywhere, which we hope will work untouched in 6.5. We'll report back after our upgrade once we know that everything works really well. Regards, Wayne ------------------------------------------------------------------------------ Wayne Piekarski Tel: (08) 8221 5221 Research & Development Manager Fax: (08) 8221 5220 SE Network Access Pty Ltd Mob: 0407 395 889 222 Grote Street Email: wayne@senet.com.au Adelaide SA 5000 WWW: http://www.senet.com.au
pgsql-hackers by date: