Re: WIP: Access method extendability - Mailing list pgsql-hackers
From | Petr Jelinek |
---|---|
Subject | Re: WIP: Access method extendability |
Date | |
Msg-id | 56FAB125.4020401@2ndquadrant.com |
In response to | Re: WIP: Access method extendability (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses | Re: WIP: Access method extendability |
List | pgsql-hackers |
On 29/03/16 18:25, Alvaro Herrera wrote:
>> + /*-------------------------------------------------------------------------
>> + * API for construction of generic xlog records
>> + *
>> + * This API allows the user to construct generic xlog records which describe
>> + * the difference between pages in a generic way. This is useful for
>> + * extensions which provide custom access methods because they can't
>> + * register their own WAL redo routines.
>> + *
>> + * Each record must be constructed by following these steps:
>> + * 1) GenericXLogStart(relation) - start construction of a generic xlog
>> + *    record for the given relation.
>> + * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
>> + *    for the record. This function returns a copy of the page
>> + *    image where modifications can be performed. The second argument
>> + *    indicates if the block is new (i.e. a full page image should be taken).
>> + * 3) Apply modifications to the page images obtained in the previous step.
>> + * 4) GenericXLogFinish() - finish construction of the generic xlog record.
>> + *
>> + * The xlog record construction can be canceled at any step by calling
>> + * GenericXLogAbort(). All changes made to the page image copies will be
>> + * discarded.
>> + *
>> + * Please note the following points when constructing generic xlog records.
>> + * - No direct modifications of page images are allowed! All modifications
>> + *   must be done in the copies returned by GenericXLogRegister(). In other
>> + *   words, the code which makes generic xlog records must never call
>> + *   BufferGetPage().
>> + * - Registrations of buffers (step 2) and modifications of page images
>> + *   (step 3) can be mixed in any sequence. The only restriction is that
>> + *   you can only modify a page image after registration of the
>> + *   corresponding buffer.
>> + * - After registration, the buffer can also be unregistered by calling
>> + *   GenericXLogUnregister(buffer). In this case the changes made in
>> + *   that particular page image copy will be discarded.
>> + * - Generic xlog assumes that pages use the standard layout, i.e., all
>> + *   data between pd_lower and pd_upper will be discarded.
>> + * - The maximum number of buffers simultaneously registered for a generic
>> + *   xlog record is MAX_GENERIC_XLOG_PAGES. An error will be thrown if this
>> + *   limit is exceeded.
>> + * - Since you modify copies of page images, GenericXLogStart() doesn't
>> + *   start a critical section. Thus, you can do memory allocation, error
>> + *   throwing, etc. between GenericXLogStart() and GenericXLogFinish().
>> + *   The actual critical section is present inside GenericXLogFinish().
>> + * - GenericXLogFinish() takes care of marking buffers dirty and setting
>> + *   their LSNs. You don't need to do this explicitly.
>> + * - For unlogged relations, everything works the same except that no
>> + *   WAL record is produced. Thus, you typically don't need to do any
>> + *   explicit checks for unlogged relations.
>> + * - If a registered buffer isn't new, the generic xlog record contains the
>> + *   delta between the old and new page images. This delta is produced by
>> + *   byte-by-byte comparison. The current delta mechanism is not effective
>> + *   for data shifts inside the page and may be improved in the future.
>> + * - The generic xlog redo function will acquire exclusive locks on buffers
>> + *   in the same order they were registered. After redo of all changes,
>> + *   the locks will be released in the same order.
>> + *
>> + *
>> + * Internally, the delta between pages consists of a set of fragments. Each
>> + * fragment represents changes made in a given region of the page. A
>> + * fragment is described as follows:
>> + *
>> + * - offset of page region (OffsetNumber)
>> + * - length of page region (OffsetNumber)
>> + * - data - the data to place into described region ('length' number of bytes)
>> + *
>> + * Unchanged regions of the page are not represented in the delta. As a
>> + * result, the delta can be more compact than the full page image. But if
>> + * the unchanged region of the page is smaller than the fragment header
>> + * (offset and length), the delta would be bigger than the full page image.
>> + * For this reason we break into fragments only if the unchanged region is
>> + * bigger than MATCH_THRESHOLD.
>> + *
>> + * The worst case for delta size is when we didn't find any unchanged region
>> + * in the page. Then the size of the delta would be the size of the page plus
>> + * the size of the fragment header.
>> + */
>> + #define FRAGMENT_HEADER_SIZE (2 * sizeof(OffsetNumber))
>> + #define MATCH_THRESHOLD FRAGMENT_HEADER_SIZE
>> + #define MAX_DELTA_SIZE (BLCKSZ + FRAGMENT_HEADER_SIZE)
>

I incorporated your changes and made some additional refinements on top of them. Attached is a delta against v12, which should cause fewer issues when Teodor merges it.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment