Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Zedstore - compressed in-core columnar storage
Date	August 29, 2019 12:09:33
Msg-id	ed9dfcfb-871f-f6e6-6463-4ab47b4cb273@iki.fi
In response to	Re: Zedstore - compressed in-core columnar storage (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses	Re: Zedstore - compressed in-core columnar storage
List	pgsql-hackers


On 29/08/2019 14:30, Ashutosh Sharma wrote:
> 
> On Wed, Aug 28, 2019 at 5:30 AM Alexandra Wang <lewang@pivotal.io> wrote:
> 
>     You are correct that we currently go through each item in the leaf
>     page that contains the given tid. Specifically, the logic to retrieve
>     all the attribute items inside a ZSAttStream has now moved to
>     decode_attstream() in the latest code, and then in zsbt_attr_fetch()
>     we again loop through each item we previously retrieved from
>     decode_attstream() and look for the given tid.
> 
> 
> Okay. Any idea why this new way of storing attribute data as streams
> (lowerstream and upperstream) has been chosen just for the attributes
> but not for tids? Are only attribute blocks compressed, and not the tid
> blocks?

Right, only attribute blocks are currently compressed. TID blocks need
to be modified when there are UPDATEs or DELETEs, so I think having to
decompress and recompress them would be more costly. Also, there is no
user data in the TID tree, and the Simple-8b encoded codewords used to
represent the TIDs are already pretty compact. I'm not sure how much
gain you would get from passing them through a general-purpose compressor.
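To make "pretty compact" concrete, here is a minimal sketch of a Simple-8b-style encoder applied to delta-encoded TIDs: each 64-bit word holds a 4-bit selector plus a 60-bit payload, and the selector says how many equal-width values are packed into it. It assumes a simplified selector table (the run-length selectors of real Simple-8b are omitted), and all function names are hypothetical; this is not zedstore's actual codec.

```python
# Simplified Simple-8b-style table: (values per 64-bit word, bits per value).
# Assumption: real Simple-8b also has run-length selectors; they are omitted here.
SELECTORS = [(60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6), (8, 7),
             (7, 8), (6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 60)]


def simple8b_encode(values):
    """Greedily pack non-negative ints into 64-bit words: 4-bit selector + 60-bit payload."""
    words = []
    i = 0
    while i < len(values):
        for sel, (n, bits) in enumerate(SELECTORS):
            chunk = values[i:i + n]
            # A selector is usable only if exactly n values remain and all fit in `bits`.
            if len(chunk) == n and all(0 <= v < (1 << bits) for v in chunk):
                word = sel
                for j, v in enumerate(chunk):
                    word |= v << (4 + j * bits)
                words.append(word)
                i += n
                break
        else:
            raise ValueError(f"value {values[i]} does not fit in 60 bits")
    return words


def simple8b_decode(words):
    """Unpack words produced by simple8b_encode() back into the original ints."""
    out = []
    for word in words:
        n, bits = SELECTORS[word & 0xF]
        mask = (1 << bits) - 1
        out.extend((word >> (4 + j * bits)) & mask for j in range(n))
    return out


def encode_tids(tids):
    """Hypothetical helper: delta-encode a sorted TID list, then pack the deltas."""
    deltas = [t - p for p, t in zip([0] + tids[:-1], tids)]
    return simple8b_encode(deltas)


def decode_tids(words):
    """Hypothetical helper: unpack the deltas and turn them back into TIDs."""
    tids, cur = [], 0
    for d in simple8b_decode(words):
        cur += d
        tids.append(cur)
    return tids
```

For a run of 1,000 consecutive TIDs every delta is 1, so the run packs into 18 words (144 bytes), compared with 8,000 bytes for the same TIDs stored as raw 64-bit integers.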

I could be wrong though. We could certainly try it out, and see how it 
performs.

>     One optimization we can do is to tell decode_attstream() to stop
>     decoding at the tid we are interested in. We can also apply other
>     tricks to speed up the lookups in the page: for fixed-length
>     attributes, it is easy to do a binary search instead of a linear
>     search, and for variable-length attributes, we can probably try
>     something that we haven't thought of yet.
> 
> 
> I think we can probably ask decode_attstream() to stop once it has found
> the tid that we are searching for, but then we only need to do that for
> Index Scans.
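The two lookup optimizations discussed above can be sketched as follows, assuming items come out of the stream in TID order. `decode_items_lazily()` is a hypothetical stand-in for an incremental decode_attstream(), and the fixed-width variant assumes the TIDs are already decoded into a sorted array with the datums stored back to back. Neither function is zedstore code.

```python
from bisect import bisect_left
from typing import Iterable, Iterator, Optional, Sequence, Tuple

Item = Tuple[int, bytes]  # (tid, datum)


def decode_items_lazily(stream: Iterable[Item]) -> Iterator[Item]:
    """Hypothetical incremental decoder: yields items one at a time in TID order."""
    for item in stream:
        yield item


def fetch_with_early_stop(stream: Iterable[Item], target: int) -> Optional[bytes]:
    """Linear scan that stops as soon as the target TID is found or passed."""
    for tid, datum in decode_items_lazily(stream):
        if tid == target:
            return datum
        if tid > target:  # items are TID-ordered, so the target is not present
            return None
    return None


def fetch_fixed_width(tids: Sequence[int], data: bytes, width: int,
                      target: int) -> Optional[bytes]:
    """Binary search for fixed-length attributes: datum i lives at data[i*width:(i+1)*width]."""
    i = bisect_left(tids, target)
    if i < len(tids) and tids[i] == target:
        return data[i * width:(i + 1) * width]
    return None
```

The early stop only pays off for point lookups such as index scans. A sequential scan needs every item in the stream anyway.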

I've been thinking that we should add a few "bookmarks" on long streams, 
so that you could skip e.g. to the midpoint in a stream. It's a tradeoff 
though; when you add more information for random access, it makes the 
representation less compact.
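A minimal sketch of the bookmark idea for variable-length streams: remember the (TID, byte offset) of every `every`-th record, binary-search the bookmarks, and scan forward from the nearest one. The record layout (8-byte TID, 2-byte length, payload) and all names here are hypothetical, chosen only for illustration.

```python
import struct
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

# Hypothetical record layout: 8-byte TID, 2-byte payload length, then the payload.
HEADER = struct.Struct("<QH")


def build_stream(items: Sequence[Tuple[int, bytes]], every: int):
    """Serialize TID-ordered items and remember (tid, offset) for every `every`-th record."""
    buf = bytearray()
    bm_tids: List[int] = []
    bm_offsets: List[int] = []
    for n, (tid, payload) in enumerate(items):
        if n % every == 0:
            bm_tids.append(tid)
            bm_offsets.append(len(buf))
        buf += HEADER.pack(tid, len(payload))
        buf += payload
    return bytes(buf), bm_tids, bm_offsets


def lookup(stream: bytes, bm_tids: List[int], bm_offsets: List[int],
           target: int) -> Optional[bytes]:
    """Jump to the last bookmark at or before `target`, then scan forward."""
    k = bisect_right(bm_tids, target) - 1
    if k < 0:
        return None  # target precedes the first record
    off = bm_offsets[k]
    while off < len(stream):
        tid, length = HEADER.unpack_from(stream, off)
        if tid > target:
            return None  # passed the target without finding it
        off += HEADER.size
        if tid == target:
            return stream[off:off + length]
        off += length
    return None
```

Lowering `every` shortens the forward scan but grows the bookmark table. That is the compactness-versus-random-access tradeoff described above.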

>     Zedstore currently implements update as delete+insert, hence the old
>     tid is not reused. We don't store the tuple in our UNDO log; we only
>     store the transaction information in the UNDO log. Reusing the tid of
>     the old tuple means putting the old tuple in the UNDO log, which we
>     have not implemented yet.
> 
> Okay, so that means performing an update on a non-key attribute would also
> require changes in the index. In short, HOT updates are currently
> not possible with a zedstore table. Am I right?

That's right. There's a lot of potential gain in doing HOT updates. For
example, if you UPDATE one column in every row of a table, ideally you
would only modify the attribute tree containing that column. But that
hasn't been implemented.
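A toy model can make the cost difference visible. In it, update-as-delete+insert gives the row a fresh TID, so every attribute tree and the index receive a new entry even when only one non-key column changed, while an in-place update would touch a single attribute tree. The class and method names are hypothetical, it is not zedstore code, and the in-place path deliberately ignores the UNDO logging of the old tuple that the thread names as a prerequisite.

```python
from itertools import count


class ToyZedTable:
    """Toy model (hypothetical): a set of live TIDs plus one dict per column."""

    def __init__(self, ncols):
        self._tids = count(1)                       # TIDs are never reused
        self.live = set()                           # stands in for the TID tree
        self.attr_trees = [{} for _ in range(ncols)]
        self.index = {}                             # index on column 0: key -> set of TIDs
        self.attr_writes = 0                        # entries written to attribute trees

    def insert(self, row):
        tid = next(self._tids)
        self.live.add(tid)
        for tree, value in zip(self.attr_trees, row):
            tree[tid] = value
            self.attr_writes += 1
        self.index.setdefault(row[0], set()).add(tid)
        return tid

    def update_as_delete_insert(self, tid, colno, value):
        row = [tree[tid] for tree in self.attr_trees]
        row[colno] = value
        self.live.discard(tid)    # old version is only marked deleted
        return self.insert(row)   # new TID: every attribute tree and the index get a new entry

    def update_in_place(self, tid, colno, value):
        # Hypothetical HOT-like path: only one attribute tree changes and the
        # index is untouched. Saving the old value to UNDO is not modeled here.
        self.attr_trees[colno][tid] = value
        self.attr_writes += 1
        return tid
```

With three columns, the first update of a non-key column writes three new attribute entries and one new index entry, while the in-place path writes one attribute entry and no index entry.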

- Heikki
