Re: Undo logs - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Undo logs |
Date | |
Msg-id | CAEepm=3O3FZOmKsXLthy5G0FnVoO_oQ=Ek1-8Sp63U6wAwx5oA@mail.gmail.com |
In response to | Re: Undo logs (Simon Riggs <simon@2ndquadrant.com>) |
Responses | Re: Undo logs |
List | pgsql-hackers |
Hi Simon,

On Mon, May 28, 2018 at 11:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 24 May 2018 at 23:22, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> The lowest level piece of this work is a physical undo log manager,
>
>> 1. Efficient appending of new undo data from many concurrent
>> backends. Like logs.
>> 2. Efficient discarding of old undo data that isn't needed anymore.
>> Like queues.
>> 3. Efficient buffered random reading of undo data. Like relations.
>
> Like an SLRU?

Yes, but with some differences:

1. There is a variable number of undo logs. Each one corresponds to
a range of the 64 bit address space, and has its own head and tail
pointers, so that concurrent writers don't contend for buffers when
appending data. (Unlike SLRUs which are statically defined, one for
clog.c, one for commit_ts.c, ...).

2. Undo logs use regular buffers instead of having their own mini
buffer pool, ad hoc search and reclamation algorithm etc.

3. Undo logs support temporary, unlogged and permanent storage (=
local buffers and reset-on-crash-restart, for undo data relating to
relations of those persistence levels).

4. Undo log storage files are preallocated (rather than being
extended block by block), and the oldest file is renamed to become the
newest file in common cases, like WAL.

>> [4] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/access/undo
>> [5] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/storage/smgr
>
> I think there are quite a few design decisions there that need to be
> discussed, so lets crack on and discuss them please.

What do you think about using the main buffer pool?
Best case: pgbench type workload, discard pointer following closely
behind insert pointer, we never write anything out to disk (except for
checkpoints when we write a few pages), never advance the buffer pool
clock hand, and we use and constantly recycle 1-2 pages per connection
via the free list (as can be seen by monitoring insert - discard in
the pg_stat_undo_logs view).

Worst case: someone opens a snapshot and goes out to lunch so we can't
discard old undo data, and then we start to compete with other stuff
for buffers, and we hope the buffer reclamation algorithm is good at
its job (or can be improved).

I just talked about this proposal at a pgcon unconference session.
Here's some of the feedback I got:

1. Jeff Davis pointed out that I'm probably wrong about not needing
FPI, and there must at least be checksum problems with torn pages. He
also gave me an idea on how to fix that very cheaply, and I'm still
processing that feedback.

2. Andres Freund thought it seemed OK if we have smgr.c routing to
md.c for relations and undofile.c for undo, but if we're going to
generalise this technique to put other things into shared buffers
eventually too (like the SLRUs, as proposed by Shawn Debnath in
another unconf session) then it might be worth investigating how to
get md.c to handle all of their needs. They'd all just use fd.c
files, after all, so it'd be weird if we had to maintain several
different similar things.

3. Andres also suggested that high frequency free page list access
might be quite contended in the "best case" described above. I'll
look into that.

4. Someone said that segment sizes probably shouldn't be hard coded
(cf WAL experience).

I also learned in other sessions that there are other access managers
in development that need undo logs. I'm hoping to find out more about
that.

-- 
Thomas Munro
http://www.enterprisedb.com
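
The per-log head and tail pointers, and the "insert - discard" quantity
mentioned in the best case, can be illustrated with a small Python
sketch. This is not code from the patch. The 24/40 bit split of the 64
bit address and all names used here (UndoLog, make_undo_ptr,
split_undo_ptr, live_bytes) are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Assumed layout of a 64 bit undo address: the high bits pick the undo
# log, the low bits are a byte offset inside that log. The 24/40 split
# is an illustrative assumption, not taken from the email.
LOG_NUMBER_BITS = 24
OFFSET_BITS = 64 - LOG_NUMBER_BITS
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_undo_ptr(log_number: int, offset: int) -> int:
    """Pack (log number, offset) into one 64 bit address."""
    if not 0 <= log_number < (1 << LOG_NUMBER_BITS):
        raise ValueError("log number out of range")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset out of range")
    return (log_number << OFFSET_BITS) | offset


def split_undo_ptr(ptr: int) -> tuple[int, int]:
    """Unpack a 64 bit address into (log number, offset)."""
    return ptr >> OFFSET_BITS, ptr & OFFSET_MASK


@dataclass
class UndoLog:
    """Hypothetical model of one undo log with its own head and tail."""
    number: int
    insert: int = 0   # head: offset where the next record is appended
    discard: int = 0  # tail: data before this offset is no longer needed

    def append(self, nbytes: int) -> int:
        """Reserve nbytes at the head and return the record's address."""
        ptr = make_undo_ptr(self.number, self.insert)
        self.insert += nbytes
        return ptr

    def discard_up_to(self, ptr: int) -> None:
        """Advance the tail to ptr, which must lie within this log."""
        log_number, offset = split_undo_ptr(ptr)
        if log_number != self.number:
            raise ValueError("pointer belongs to a different undo log")
        if not self.discard <= offset <= self.insert:
            raise ValueError("offset outside the live range")
        self.discard = offset

    def live_bytes(self) -> int:
        """The 'insert - discard' amount still held by this log."""
        return self.insert - self.discard
```

Because each backend appends to its own log in this model, two writers
never touch the same insert pointer. In the best case the discard
pointer trails the insert pointer closely, so live_bytes() stays small.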