Re: Extent Locks - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Extent Locks
Date:
Msg-id: CA+TgmobDf=ijhoAZhrztPg7CNU=-emzx6K-iUv8hMSEtxFDy1A@mail.gmail.com
In response to: Re: Extent Locks (Stephen Frost <sfrost@snowman.net>)
Responses: Re: Extent Locks
List: pgsql-hackers
On Thu, May 16, 2013 at 11:55 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> I think it's pretty unrealistic to suppose that this can be made to
>> work. The most obvious problem is that a sequential scan is coded to
>> assume that every block between 0 and the last block in the relation
>> is worth reading,
>
> You don't change that. However, when a seq scan asks the storage layer
> for blocks that it knows don't actually exist, it can simply skip over
> them or return "empty" records or something equivalent... Yes, that's
> hand-wavy, but I also think it's doable.

And slow. And it will involve locking and shared memory data structures
of its own, to keep track of which blocks actually exist at the storage
layer. I suspect the results would be more kinds of locks than we have
at present, not fewer.

>> Also, I think that's really a red herring anyway. Relation extension
>> per se is not slow - we can grow a file by adding zero bytes at a
>> pretty good clip, and don't really gain anything at the database level
>> by spreading the growth across multiple files.
>
> That's true when the file is on a single filesystem and a single set of
> drives. Make them be split across multiple filesystems/volumes where
> you get more drives involved...

I'd be interested to hear how fast dd if=/dev/zero of=somefile is on
your machine compared to a single-threaded COPY into a relation.
Dividing those two numbers gives the level of concurrency at which the
speed of relation extension becomes the bottleneck. On the system I
tested, I think it was in the multiple tens until the kernel cache
filled up ... and then it dropped way off. But I don't have access to a
high-end storage system.
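To make that arithmetic concrete (the numbers here are invented purely
for illustration): if dd can append zeroes at 500 MB/s and a single
COPY writes at 25 MB/s, the ratio is 500 / 25 = 20, so you'd need on
the order of 20 concurrent writers before extending the relation became
the limiting factor.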
>> If I took 30 seconds to pre-extend the relation before writing any
>> data into it, then writing the data went pretty much exactly 10 times
>> faster with 10 writers than with 1.
>
> That's rather fantastic..

One sadly relevant detail is that the relation was unlogged. Even so,
yes, it's fantastic.

>> But small on-the-fly
>> pre-extensions during the write didn't work as well. I don't remember
>> exactly what formulas I tried, but I do remember that the few I tried
>> were not really any better than "always pre-extend by 1 extra block";
>> and that alone eliminated about half the contention, but then I
>> couldn't do better.
>
> That seems quite odd to me - I would have thought extending by more
> than 2 blocks would have helped with the contention. Still, it sounds
> like extending requires a fair bit of writing, and that sucks in its
> own right because we're just going to rewrite that - is that correct?
> If so, I like the proposal even more...
>
>> I wonder if I need to use LWLockAcquireOrWait().
>
> I'm not seeing how/why that might help?

Thinking about it more, my guess is that backend A grabs the relation
extension lock. Before it actually extends the relation, backends B, C,
D, and E all notice that no free pages are available and queue for the
lock. Backend A pre-extends the relation by some number of pages and
then extends it by one more page for its own use. It then releases the
relation extension lock. At this point, however, backends B, C, D, and
E are already committed to extending the relation, even though some or
all of them could now satisfy their need for free pages from the fsm.

If they used LWLockAcquireOrWait(), then they'd all wake up when A
released the lock. One of them would have the lock, and the rest could
go retry the fsm and requeue on the lock if that failed. But as it is,
what I bet is happening is that they each take the lock in turn and
each extend the relation in turn. Then, on the next block they write,
they all find free pages in the fsm, because they all pre-extended the
relation; but when those free pages are used up, they all queue up on
the lock again, practically at the same instant, because the fsm
becomes empty at the same time for all of them.
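In rough pseudo-C, the pattern I'm imagining looks something like the
sketch below. This is only a sketch: the relation extension lock is a
heavyweight lock today, so treating it as an LWLock is an assumption,
and extension_lock and ExtendRelationByOnePage() are made-up stand-ins
rather than real symbols. GetPageWithFreeSpace() and
LWLockAcquireOrWait() are the existing functions.

    #include "postgres.h"
    #include "storage/freespace.h"  /* GetPageWithFreeSpace() */
    #include "storage/lwlock.h"     /* LWLockAcquireOrWait() etc. */
    #include "utils/rel.h"          /* Relation */

    /*
     * Sketch only: find a block with enough free space, extending the
     * relation if necessary.  extension_lock and
     * ExtendRelationByOnePage() are hypothetical placeholders.
     */
    static BlockNumber
    GetBlockForInsert(Relation rel, Size spaceNeeded)
    {
        for (;;)
        {
            BlockNumber blkno = GetPageWithFreeSpace(rel, spaceNeeded);

            if (blkno != InvalidBlockNumber)
                return blkno;   /* the fsm had a usable page */

            /*
             * LWLockAcquireOrWait() returns true if we acquired the
             * lock, and false if we merely slept until the holder
             * released it.  In the false case the previous holder may
             * have pre-extended the relation on our behalf, so we go
             * back and retry the fsm instead of extending again.
             */
            if (LWLockAcquireOrWait(extension_lock, LW_EXCLUSIVE))
            {
                blkno = ExtendRelationByOnePage(rel);  /* placeholder */
                LWLockRelease(extension_lock);
                return blkno;
            }

            /* Woke without the lock: loop around and recheck the fsm. */
        }
    }

The key property is that a false return from LWLockAcquireOrWait()
sends a backend back to the fsm instead of committing it to an
extension it may no longer need.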
I should play around with this a bit more...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company