Re: contrib/cache_scan (Re: What's needed for cache-only table scan?) - Mailing list pgsql-hackers

From: KaiGai Kohei
Subject: Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)
Msg-id: 52DDFD9B.8050400@ak.jp.nec.com
In response to: contrib/cache_scan (Re: What's needed for cache-only table scan?) (Kohei KaiGai <kaigai@kaigai.gr.jp>)
Responses: Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)
List: pgsql-hackers
Hello,

I revisited the patch for the contrib/cache_scan extension. The previous version had a problem when a T-tree node needed to be rebalanced: it crashed while merging nodes. Although the contrib/cache_scan portion is more than 2K lines of code, what I'd like to discuss first are the core enhancements: one to run an MVCC snapshot check on a cached tuple, and one to get a callback on vacuumed pages for cache synchronization.

Any comments please. Thanks,

(2014/01/15 0:06), Kohei KaiGai wrote:
> Hello,
>
> The attached patch is what we discussed just before the commit-fest: Nov.
>
> It implements an alternative way to scan a particular table using an on-memory
> cache instead of the usual heap access method. Unlike the buffer cache, this
> mechanism caches a limited number of columns in memory, so memory
> consumption per tuple is much smaller than with the regular heap access method,
> which allows a much larger number of tuples to be kept in memory.
>
> I'd like to extend this idea to implement a feature that caches data in
> a column-oriented data structure, to utilize parallel calculation processors like
> the CPU's SIMD operations or simple GPU cores. (Probably, it makes sense to
> evaluate multiple records with a single vector instruction if the contents of
> a particular column are put in a large array.)
> However, this patch still keeps all the tuples in row-oriented data format,
> because row <=> column translation would make this patch bigger than the
> current form (about 2K lines), and GPU integration needs to link a proprietary
> library (CUDA or OpenCL), which I thought is not preferable for the upstream
> code.
>
> Also note that this patch needs the part-1 ~ part-3 patches of the CustomScan
> APIs as prerequisites, because it is implemented on top of those APIs.
>
> One thing I have to apologize for is the lack of documentation and source code
> comments around the contrib/ code. Please give me a couple of days to
> clean up the code.
> Aside from the extension code, I put two enhancements on the core code,
> as follows. I'd like to have a discussion about the adequacy of these enhancements.
>
> The first enhancement is a hook on heap_page_prune() to synchronize the
> internal state of an extension with changes to the heap image on disk.
> It is unavoidable that the cache accumulates garbage over time, so it
> needs to be cleaned up, just as the vacuum process does for the heap.
> The best time to do this is when dead tuples are reclaimed, because at
> that point it is certain that nobody will reference those tuples any more.
>
> diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
> --- a/src/backend/access/heap/pruneheap.c
> +++ b/src/backend/access/heap/pruneheap.c
>      bool     marked[MaxHeapTuplesPerPage + 1];
>  } PruneState;
>
> +/* Callback for each page pruning */
> +heap_page_prune_hook_type heap_page_prune_hook = NULL;
> +
>  /* Local functions */
>  static int heap_prune_chain(Relation relation, Buffer buffer,
>                              OffsetNumber rootoffnum,
> @@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
>       * and update FSM with the remaining space.
>       */
>
> +    /*
> +     * This callback allows extensions to synchronize their own status with
> +     * the heap image on disk when this buffer page is vacuumed.
> +     */
> +    if (heap_page_prune_hook)
> +        (*heap_page_prune_hook)(relation,
> +                                buffer,
> +                                ndeleted,
> +                                OldestXmin,
> +                                prstate.latestRemovedXid);
>      return ndeleted;
>  }
>
>
> The second enhancement makes SetHintBits() accept InvalidBuffer and skip
> all of its work in that case. We need to check the visibility of cached
> tuples when a custom-scan node scans the cached table instead of the heap.
> Even though we can use an MVCC snapshot to check tuple visibility, the
> check may internally set hint bits on the tuple, so we always need to give
> a valid buffer pointer to HeapTupleSatisfiesVisibility(). Unfortunately,
> it would kill all the benefit of the table cache if we had to load the
> heap buffer associated with each cached tuple.
> So, I'd like to have special-case handling in SetHintBits() for a
> dry run when InvalidBuffer is given.
>
> diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
> index f626755..023f78e 100644
> --- a/src/backend/utils/time/tqual.c
> +++ b/src/backend/utils/time/tqual.c
> @@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
>   *
>   * The caller should pass xid as the XID of the transaction to check, or
>   * InvalidTransactionId if no check is needed.
> + *
> + * In the case where the supplied HeapTuple is not associated with a particular
> + * buffer, it just returns without doing anything. This may happen when an
> + * extension caches tuples in its own way.
>   */
>  static inline void
>  SetHintBits(HeapTupleHeader tuple, Buffer buffer,
>              uint16 infomask, TransactionId xid)
>  {
> +    if (BufferIsInvalid(buffer))
> +        return;
> +
>      if (TransactionIdIsValid(xid))
>      {
>          /* NB: xid must be known committed here! */
>
> Thanks,
>
> 2013/11/13 Kohei KaiGai <kaigai@kaigai.gr.jp>:
>> 2013/11/12 Tom Lane <tgl@sss.pgh.pa.us>:
>>> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
>>>> So, are you thinking it is a feasible approach to focus on custom-scan
>>>> APIs during the upcoming CF3, then the table-caching feature as a use-case
>>>> of these APIs on CF4?
>>>
>>> Sure. If you work on this extension after CF3, and it reveals that the
>>> custom scan stuff needs some adjustments, there would be time to do that
>>> in CF4. The policy about what can be submitted in CF4 is that we don't
>>> want new major features that no one has seen before, not that you can't
>>> make fixes to previously submitted stuff. Something like a new hook
>>> in vacuum wouldn't be a "major feature", anyway.
>>>
>> Thanks for this clarification.
>> 3 days are too short to write a patch; however, 2 months may be sufficient
>> to develop a feature on top of the scheme discussed in the previous
>> commitfest.
>>
>> Best regards,
>> --
>> KaiGai Kohei <kaigai@kaigai.gr.jp>

--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment