Re: performance of insert/delete/update - Mailing list pgsql-performance
From: scott.marlowe
Subject: Re: performance of insert/delete/update
Date:
Msg-id: Pine.LNX.4.33.0211251652200.8723-100000@css120.ihs.com
In response to: Re: performance of insert/delete/update (Tim Gardner <tgardner@codeHorse.com>)
Responses: Re: performance of insert/delete/update (x3)
List: pgsql-performance
On Mon, 25 Nov 2002, Tim Gardner wrote:

> >The funny thing is, they've often avoided transactions because they
> >figured they'd be slower than just inserting the rows, and you kinda have
> >to make them sit down first before you show them the performance increase
> >from putting all those inserts into a single transaction.
> >
> >No offense meant, really. It's just that you seemed to really doubt that
> >putting things into one transaction helped, and putting things into one
> >big transaction is like the very first postgresql lesson a lot of
> >newcomers learn. :-)
>
> Scott,
>
> I'm new to postgresql, and as you suggested, this is
> counter-intuitive to me. I would have thought that having to store
> all the inserts to be able to roll them back would take longer. Is
> my thinking wrong or not relevant? Why is this not the case?

Your thinking on this is wrong, though it is indeed counter-intuitive that
a transaction would speed things up. Postgresql is very different from
other databases.

Postgresql was designed from day one as a transactional database, which is
why it was so bothersome that an Oracle marketroid was recently telling
the .org folks why they shouldn't use Postgresql because it didn't have
transactions. Postgresql may have a few warts here and there, but not
supporting transactions has NEVER been a problem for it.

There are two factors that make Postgresql so unusual with regard to
transactions. One is that everything happens in a transaction (we won't
mention truncate for now; it's the only exception I know of). The other
factor, the one that makes inserting large amounts of data inside a
transaction fast, is MVCC.

With Oracle and many other databases, transactions are written into a
separate log file, and when you commit, they are inserted into the
database as one big group. This means you write your data twice: once into
the transaction log, and once into the database.

With Postgresql's implementation of MVCC, all your data are inserted in
real time, with a transaction id that makes other clients ignore them
(mostly; other read-committed transactions may or may not see them). If
there are indexes to update, they are updated in the same "invisible until
committed" way. All this means that your inserts don't block anyone else's
reads, and that when you commit, all postgresql has to do is make the rows
visible. If you roll back a transaction, the tuples are simply marked as
dead and get ignored.

It's interesting when you work with folks who came from other databases.
My coworker, who's been using Postgresql for about 2 years now, had an
interesting experience when he first started here. He was inserting
something like 10,000 rows. He came over and told me there must be
something wrong with the database, as his inserts had been running for 10
minutes and he wasn't even halfway through. So I had him stop the inserts,
clean out the rows (it was a new table for a new project) and wrap all
10,000 inserts into a single transaction. What had been running for 10
minutes now ran in about 30 seconds. He was floored. (There are a couple
of rough sketches of both points at the end of this message.)

Well, good luck using postgresql, and definitely keep in touch with the
performance and general mailing lists. They're a wealth of useful info.
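
P.S. To make the batching point concrete, here is a minimal sketch of the
two approaches in plain SQL; the table and values are made up purely for
illustration:

    -- One transaction per INSERT (what my coworker was doing): each
    -- statement commits on its own, so you pay the per-commit cost
    -- 10,000 times.
    INSERT INTO test_data (id, val) VALUES (1, 'a');
    INSERT INTO test_data (id, val) VALUES (2, 'b');
    -- ... and so on, one commit per row ...

    -- The same INSERTs wrapped in a single transaction: the rows are
    -- still written as each statement runs, but there is only one
    -- commit at the very end.
    BEGIN;
    INSERT INTO test_data (id, val) VALUES (1, 'a');
    INSERT INTO test_data (id, val) VALUES (2, 'b');
    -- ... the rest of the rows ...
    COMMIT;

The speedup comes from paying the commit cost once instead of thousands of
times; the rows themselves are written the same way in both cases.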
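
And here is a rough sketch of the "invisible until committed" behavior
described above, again with made-up names; imagine two psql sessions
running side by side:

    -- Session 1:
    BEGIN;
    INSERT INTO test_data (id, val) VALUES (3, 'c');
    -- The row (and any index entries) are written immediately, but they
    -- carry this transaction's id, so other sessions ignore them.

    -- Session 2 (read committed, running at the same time):
    SELECT * FROM test_data WHERE id = 3;
    -- Returns no rows: session 1 hasn't committed yet, and its insert
    -- doesn't block this read.

    -- Session 1:
    COMMIT;
    -- Nothing is rewritten or copied; the row simply becomes visible.
    -- A ROLLBACK instead would leave it behind as a dead tuple that
    -- everyone ignores.

    -- Session 2, after the commit:
    SELECT * FROM test_data WHERE id = 3;
    -- Now returns the row.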