generalized conveyor belt storage - Mailing list pgsql-hackers

From: Robert Haas
Subject: generalized conveyor belt storage
Msg-id: CA+Tgmoa_VNzG4ZouZyQQ9h=oRiy=ZQV5+xHQXxMWmep4Ygg8Dg@mail.gmail.com
List: pgsql-hackers
Responses: Re: generalized conveyor belt storage (x2)
Hi! Back when we were working on zheap, we realized that we needed some way of storing undo records that would permit us to discard old undo efficiently when it was no longer needed. Thomas Munro dubbed this "conveyor belt storage," the idea being that items are added at one end and removed from the other. In the zheap patches, Thomas took an approach similar to what we've done elsewhere for CLOG and WAL: keep creating new files, put a relatively small amount of data in each one, and remove the old files in their entirety when you can prove that they are no longer needed. While that did and does seem reasonable, I came to dislike it, because it meant we needed a separate smgr for undo as compared with everything else, which was kind of complicated. Also, that approach was tightly integrated with, and thus only useful for, zheap, and as Thomas observed at the time, the problem seems to be fairly general.

I got interested in this problem again because of the idea discussed in https://www.postgresql.org/message-id/CA%2BTgmoZgapzekbTqdBrcH8O8Yifi10_nB7uWLB8ajAhGL21M6A%40mail.gmail.com of having a "dead TID" relation fork in which to accumulate TIDs that have been marked as dead in the table but not yet removed from the indexes, so as to permit a looser coupling between table vacuum and index vacuum. That's yet another case where you accumulate new data and then, at a certain point, the oldest data can be thrown away because its intended purpose has been served.

So here's a patch. Basically, it lets you initialize a relation fork as a "conveyor belt": you can then add pages of basically arbitrary data to it, throw away old ones, and, modulo bugs, it will take care of recycling space for you. There's a fairly detailed README in the patch if you want a more detailed description of how the whole thing works.
It's missing some features that I want it to have: for example, I'd like to have on-line compaction, where whatever logical pages of data currently exist can be relocated to lower physical page numbers, thus allowing you to return space to the operating system, hopefully without requiring a strong heavyweight lock. That's not implemented yet, and it's also missing a few other things, like test cases, performance results, more thorough debugging, better write-ahead logging integration, and some code to use it to do something useful. But there's enough here, I think, for you to form an opinion about whether this is a reasonable direction, and to give any design-level feedback that you'd like to give. My colleagues Dilip Kumar and Mark Dilger have contributed to this effort with some testing help, but all the code in this patch is mine.

When I was chatting with Andres about this, he jumped to the question of whether this could be used to replace SLRUs. To be honest, it's not really designed for applications that are quite that intense: I think we would get too much contention on the metapage, which you have to lock, and often modify, for just about every conveyor belt operation. Perhaps that problem can be dodged somehow, and it might even be a good idea, because (1) we'd then have that data in shared_buffers instead of a separate tiny buffer space and (2) the SLRU code is pretty crappy. But I'm more interested in using this for new things than in replacing existing core technology where any new bugs will break everything for everyone. Still, I'm happy to hear ideas around this kind of thing, or the results of any experimentation you may want to do.

Let me know what you think.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com