Re: Four issues why "old elephants" lack performance: Explanation sought Four issues why "old elephants" lack performance: Explanation sought - Mailing list pgsql-general
From | Andy Colson |
---|---|
Subject | Re: Four issues why "old elephants" lack performance: Explanation sought Four issues why "old elephants" lack performance: Explanation sought |
Date | |
Msg-id | 4F4A43C2.9050206@squeakycode.net Whole thread Raw |
In response to | Four issues why "old elephants" lack performance: Explanation sought Four issues why "old elephants" lack performance: Explanation sought (Stefan Keller <sfkeller@gmail.com>) |
Responses |
Re: Four issues why "old elephants" lack performance:
Explanation sought Four issues why "old elephants" lack performance:
Explanation sought
|
List | pgsql-general |
On 02/25/2012 06:54 PM, Stefan Keller wrote: > Hi, > > Recently Mike Stonebraker identified four areas where "old elephants" > lack performance [1]: > > 1. Buffering/paging > 2. Locking/Multithreading > 3. WAL logging > 4. Latches (aka memory locks for concurrent access of btree structures > in buffer pool?). > > He claims having solved these issues while retaining SQL and ACID. > But the only issue I understood is #1 by loading all tuples in-memory. > => Are there any ideas on how to tell Postgres to aggressively load > all data into memory (issue #1)? > All remaining issues make me wonder. > I actually doubt that there are alternatives even theoretically. > => Can anyone help explaining me issues 2,3 and 4, their solutions, > and why Postgres would be unable to resolve them? > > 1. Buffering/paging PG, and your operating system, already do this for reads. It also keeps things that are hit harder and lets things go thatare not hit as much. On the writing side, you can configure it PG from "write it! write it NOW!", to "running with scissors"depending on how safe you want to feel. > 2. Locking/Multithreading PG does have some internal structures that it needs to lock (and anytime you read lock, think single user access, or oneat a time, or slow). Any time you hear about lock contention, it's multiple processes waiting in a line for a lock. If you only had one client, then you really would not need locks. There is where multithreading comes from, but in PG weuse multi-process instead of multi-thread, but its the same thing. Two (or more) people are needing to lock somethingso they can really screw with it. PG does not need as many locks as other db's however. It uses an MVCC architectureso under normal operations (insert, update, select, delete) people dont block eacth other. (ie readers dont blockwriters and visa versa). I don't see locking going away, but there are not many loads that are lock bound. Most database loads are IO bound, andthen you'd probably be CPU bound before you are lock bound. (although it can be hard to tell if its a spin lock that'smaking you cpu bound). I'm sure there are loads that hit lock contention, but there are probably ways to mitigateit. Say you have a process that alters the table and adds a new column every two seconds, thing updates a singlerow to add data to the new column just added. I can see that being lock bound. And a really stupid implementation. > 3. WAL logging PG writes a transaction twice. Once to WAL and once to the DB. WAL is a simple and quick write, and is only ever used ifyour computer crashes and PG has to re-play transactions to get the db into a good/known state. Its a safety measure thatdoesn't really take much time, and I don't think I've heard of anyone being WAL bound. Although it does increase IOops, it's not the biggest usage of IO. This one falls under "lets be safe" which is something NoSQL did away with. Itsnot something I want to give up, personally. I like using a net. > 4. Latches I can only guess at this one. Its similar to locks I think. Data structures come in different types. In the old days weonly had single user access to data structures, then when we wanted two users to access it we just locked it to serializeaccess (one at a time mode), but that does not scale well at all, so we invented two new types: lock free and waitfree. An index is stored as a btree. To insert a new record into the index you have to reorganize it (rotate it, sort it, add/deletenodes, etc), and while one client is doing that it can make it hard for another to try and search it. Lock free(and wait free) let multiple people work on a btree at the same time with much less contention. Wikipedia does a betterjob of explaining them than I could: http://en.wikipedia.org/wiki/Non-blocking_algorithm I have no idea if PG uses single user locks or some kind of lock free structure for its internals. I can see different partsof the internals needing different levels. Maybe I'm old fashioned, but I don't see how you'll get rid of these. You have to insert a record. You have to have 12people hitting the db at the same time. You have to organize that read/write access somehow so they dont blow each otherup. -Andy
pgsql-general by date: