Re: performance of insert/delete/update - Mailing list pgsql-performance
From | Rich Scott |
---|---|
Subject | Re: performance of insert/delete/update |
Date | |
Msg-id | 20021126004339.46BB12FD1F@server3.fastmail.fm Whole thread Raw |
In response to | performance of insert/delete/update (Wei Weng <wweng@kencast.com>) |
Responses |
Re: performance of insert/delete/update
|
List | pgsql-performance |
I have seen a number of real-world situations where bundling inserts into transactions made a considerable difference - sometimes as much as a 100x speed-up, and not just in Postgresql databases, but also commercial systems (my experience is in Oracle & Sybase). I've often used an idiom of building up rows until I hit some high-water mark, and then insert those rows in one fell swoop - it's almost always measurably faster than one-at-a-time. Sidebar: a number of years ago, while contracting at a regional telephone company, I became peripherally enmired in a gigantic billing-system-makeover fiasco. Upon initial deployment, the system was so slow at processing that it was taking about 30 hours for each day of billing data. After a week or so, when it became apparent that fundamental Cash Flow was threatened, there were multi-hour conference calls, in which various VPs called for massive h/w upgrades and/or lawsuits against Oracle. An astute cohort of mine asked to see some of the code, and found out that the original developers (at the telco) had created a bloated and slow control system in C++, using semaphores or somesuch, to *serialize* inserts/updates/deletes, and so they had gigantic home-built queues of insert jobs. Not only were they not bundling updates in transactions, they were only ever doing one transaction at a time. (Apparently, they never learned RDBMS fundamentals.) He told them to rip out all that code, and let Oracle (like any other decent RDBMS) handle the update ordering. The resultant speed-up factor was several hundred times. -R On Mon, 25 Nov 2002 15:59:16 -0700 (MST), "scott.marlowe" <scott.marlowe@ihs.com> said: > On Mon, 25 Nov 2002, Josh Berkus wrote: > > > Scott, > > > > > It's quite easy to test if you have a database with a large table to play > > > with, use pg_dump to dump a table with the -d switch (makes the dump use > > > insert statements.) Then, make two versions of the dump, one which has a > > > begin;end; pair around all the inserts and one that doesn't, then use psql > > > -e to restore both dumps. The difference is HUGE. Around 10 to 20 times > > > faster with the begin end pairs. > > > > > > I'd think that anyone who's used postgresql for more than a few months > > > could corroborate my experience. > > > > Ouch! > > > > No need to get testy about it. > > > > Your test works as you said; the way I tried testing it before was different. > > Good to know. However, this approach is only useful if you are doing > > rapidfire updates or inserts coming off a single connection. But then it is > > *very* useful. > > I didn't mean that in a testy way, it's just that after you've sat > through > a fifteen minute wait while a 1000 records are inserted, you pretty > quickly switch to the method of inserting them all in one big > transaction. That's all. > > Note that the opposite is what really gets people in trouble. I've seen > folks inserting rather large amounts of data, say into ten or 15 tables, > and their web servers were crawling under parallel load. Then, they put > them into a single transaction and they just flew. > > The funny thing it, they've often avoided transactions because they > figured they'd be slower than just inserting the rows, and you kinda have > to make them sit down first before you show them the performance increase > from putting all those inserts into a single transaction. > > No offense meant, really. It's just that you seemed to really doubt that > putting things into one transaction helped, and putting things into one > big transaction if like the very first postgresql lesson a lot of > newcomers learn. :-) > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
pgsql-performance by date: