Re: contrib/cache_scan (Re: What's needed for cache-only table scan?) - Mailing list pgsql-hackers

From: KaiGai Kohei
Subject: Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)
Msg-id: 52DDFD9B.8050400@ak.jp.nec.com
In response to: contrib/cache_scan (Re: What's needed for cache-only table scan?) (Kohei KaiGai <kaigai@kaigai.gr.jp>)
Responses: Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)
List: pgsql-hackers
Hello,

I revisited the patch for the contrib/cache_scan extension. The previous version had a problem when a T-tree node needed to be rebalanced: it crashed while merging nodes. Although the contrib/cache_scan portion is more than 2K lines of code, what I'd like to discuss first are the core enhancements: one to run an MVCC snapshot check on a cached tuple, and one to get a callback on vacuumed pages for cache synchronization.

Any comments please. Thanks,

(2014/01/15 0:06), Kohei KaiGai wrote:
> Hello,
>
> The attached patch is what we discussed just before the commit-fest: Nov.
>
> It implements an alternative way to scan a particular table using an on-memory
> cache instead of the usual heap access method. Unlike the buffer cache, this
> mechanism caches a limited number of columns in memory, so memory
> consumption per tuple is much smaller than with the regular heap access method,
> which allows a much larger number of tuples to be kept in memory.
>
> I'd like to extend this idea to implement a feature that caches data in
> a column-oriented data structure, to utilize parallel calculation processors like
> the CPU's SIMD operations or simple GPU cores. (Probably, it makes sense to
> evaluate multiple records with a single vector instruction if the contents of
> a particular column are put in a large array.)
> However, this patch still keeps all the tuples in row-oriented data format,
> because row <=> column translation would make this patch bigger than the
> current form (about 2K lines), and GPU integration needs to link a proprietary
> library (CUDA or OpenCL), which I thought is not preferable for the upstream
> code.
>
> Also note that this patch needs the part-1 ~ part-3 patches of the CustomScan
> APIs as prerequisites, because it is implemented on top of those APIs.
>
> One thing I have to apologize for is the lack of documentation and source code
> comments around the contrib/ code. Please give me a couple of days to
> clean up the code.
> Aside from the extension code, I put two enhancements on the core code,
> as follows. I'd like to have a discussion about the adequacy of these enhancements.
>
> The first enhancement is a hook on heap_page_prune() to synchronize the
> internal state of an extension with changes to the heap image on disk.
> It is unavoidable that the cache accumulates garbage over time, so it
> needs to be cleaned up, just as the vacuum process does for the heap.
> The best time to do this is when dead tuples are reclaimed, because at
> that point it is certain that nobody will reference those tuples any more.
>
> diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
> --- a/src/backend/access/heap/pruneheap.c
> +++ b/src/backend/access/heap/pruneheap.c
>      bool     marked[MaxHeapTuplesPerPage + 1];
>  } PruneState;
>
> +/* Callback for each page pruning */
> +heap_page_prune_hook_type heap_page_prune_hook = NULL;
> +
>  /* Local functions */
>  static int heap_prune_chain(Relation relation, Buffer buffer,
>                              OffsetNumber rootoffnum,
> @@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
>       * and update FSM with the remaining space.
>       */
>
> +    /*
> +     * This callback allows extensions to synchronize their own status with
> +     * the heap image on disk when this buffer page is vacuumed.
> +     */
> +    if (heap_page_prune_hook)
> +        (*heap_page_prune_hook)(relation,
> +                                buffer,
> +                                ndeleted,
> +                                OldestXmin,
> +                                prstate.latestRemovedXid);
>      return ndeleted;
>  }
>
>
> The second enhancement makes SetHintBits() accept InvalidBuffer and skip
> all of its work in that case. We need to check the visibility of cached
> tuples when a custom-scan node scans the cached table instead of the heap.
> Even though we can use an MVCC snapshot to check tuple visibility, the
> check may internally set hint bits on the tuple, so we always need to give
> a valid buffer pointer to HeapTupleSatisfiesVisibility(). Unfortunately,
> it would kill all the benefit of the table cache if we had to load the
> heap buffer associated with each cached tuple.
> So, I'd like to have special-case handling in SetHintBits() for a
> dry run when InvalidBuffer is given.
>
> diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
> index f626755..023f78e 100644
> --- a/src/backend/utils/time/tqual.c
> +++ b/src/backend/utils/time/tqual.c
> @@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
>   *
>   * The caller should pass xid as the XID of the transaction to check, or
>   * InvalidTransactionId if no check is needed.
> + *
> + * In the case where the supplied HeapTuple is not associated with a particular
> + * buffer, it just returns without doing anything. This may happen when an
> + * extension caches tuples in its own way.
>   */
>  static inline void
>  SetHintBits(HeapTupleHeader tuple, Buffer buffer,
>              uint16 infomask, TransactionId xid)
>  {
> +    if (BufferIsInvalid(buffer))
> +        return;
> +
>      if (TransactionIdIsValid(xid))
>      {
>          /* NB: xid must be known committed here! */
>
> Thanks,
>
> 2013/11/13 Kohei KaiGai <kaigai@kaigai.gr.jp>:
>> 2013/11/12 Tom Lane <tgl@sss.pgh.pa.us>:
>>> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
>>>> So, are you thinking it is a feasible approach to focus on custom-scan
>>>> APIs during the upcoming CF3, then the table-caching feature as a use-case
>>>> of these APIs on CF4?
>>>
>>> Sure. If you work on this extension after CF3, and it reveals that the
>>> custom scan stuff needs some adjustments, there would be time to do that
>>> in CF4. The policy about what can be submitted in CF4 is that we don't
>>> want new major features that no one has seen before, not that you can't
>>> make fixes to previously submitted stuff. Something like a new hook
>>> in vacuum wouldn't be a "major feature", anyway.
>>>
>> Thanks for this clarification.
>> 3 days are too short to write a patch; however, 2 months may be sufficient
>> to develop a feature on top of the scheme discussed in the previous
>> commitfest.
>>
>> Best regards,
>> --
>> KaiGai Kohei <kaigai@kaigai.gr.jp>

--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment