Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Zedstore - compressed in-core columnar storage |
Date | |
Msg-id | ed9dfcfb-871f-f6e6-6463-4ab47b4cb273@iki.fi |
In response to | Re: Zedstore - compressed in-core columnar storage (Ashutosh Sharma <ashu.coek88@gmail.com>) |
Responses | Re: Zedstore - compressed in-core columnar storage |
List | pgsql-hackers |
On 29/08/2019 14:30, Ashutosh Sharma wrote:
> On Wed, Aug 28, 2019 at 5:30 AM Alexandra Wang <lewang@pivotal.io>
> wrote:
>
>> You are correct that we currently go through each item in the leaf
>> page that contains the given tid. Specifically, the logic to
>> retrieve all the attribute items inside a ZSAttStream is now moved
>> to decode_attstream() in the latest code, and then in
>> zsbt_attr_fetch() we again loop through each item we previously
>> retrieved from decode_attstream() and look for the given tid.
>
> Okay. Any idea why this new way of storing attribute data as streams
> (lowerstream and upperstream) has been chosen just for the attributes
> but not for tids? Are only attribute blocks compressed but not the
> tid blocks?

Right, only attribute blocks are currently compressed. Tid blocks need
to be modified when there are UPDATEs or DELETEs, so I think having to
decompress and recompress them would be more costly. Also, there is no
user data on the TID tree, and the Simple-8b encoded codewords used to
represent the TIDs are already pretty compact. I'm not sure how much
gain you would get from passing it through a general purpose
compressor. I could be wrong though. We could certainly try it out, and
see how it performs.

>> One optimization we can do is to tell decode_attstream() to stop
>> decoding at the tid we are interested in. We can also apply other
>> tricks to speed up the lookups in the page: for fixed length
>> attributes, it is easy to do binary search instead of linear search,
>> and for variable length attributes, we can probably try something
>> that we didn't think of yet.
>
> I think we can probably ask decode_attstream() to stop once it has
> found the tid that we are searching for but then we only need to do
> that for Index Scans.

I've been thinking that we should add a few "bookmarks" on long
streams, so that you could skip e.g. to the midpoint in a stream. It's
a tradeoff though; when you add more information for random access, it
makes the representation less compact.

>> Zedstore currently implements update as delete+insert, hence the old
>> tid is not reused. We don't store the tuple in our UNDO log, and we
>> only store the transaction information in the UNDO log. Reusing the
>> tid of the old tuple means putting the old tuple in the UNDO log,
>> which we have not implemented yet.
>
> Okay, so that means performing update on a non-key attribute would
> also require changes in the index table. In short, HOT update is
> currently not possible with zedstore table. Am I right?

That's right. There's a lot of potential gain from doing HOT updates.
For example, if you UPDATE one column on every row on a table, ideally
you would only modify the attribute tree containing that column. But
that hasn't been implemented.

- Heikki
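
The "bookmarks" idea and the "stop decoding at the tid we are interested
in" idea discussed above can be sketched together roughly as follows.
This is only an illustration in Python, not Zedstore code and not its
on-disk format: the delta-encoded item list, the bookmark_every
parameter, and the encode_stream() and fetch() names are all
assumptions made up for the example.

```python
from bisect import bisect_right

def encode_stream(tids, values, bookmark_every=4):
    """Delta-encode sorted TIDs and add a bookmark every N items.

    Hypothetical layout: items are (tid_delta, value) pairs, bookmarks
    are (absolute_tid, item_index) pairs.
    """
    items = []
    bookmarks = []
    prev = 0
    for i, (tid, val) in enumerate(zip(tids, values)):
        if i % bookmark_every == 0:
            # Extra random-access information: costs space per bookmark.
            bookmarks.append((tid, i))
        items.append((tid - prev, val))
        prev = tid
    return items, bookmarks

def fetch(items, bookmarks, target):
    """Return (value, items_decoded); value is None if target is absent."""
    keys = [tid for tid, _ in bookmarks]
    pos = bisect_right(keys, target) - 1  # last bookmark with tid <= target
    if pos < 0:
        return None, 0
    tid, start = bookmarks[pos]
    decoded = 0
    for i in range(start, len(items)):
        delta, val = items[i]
        if i > start:
            tid += delta  # the bookmark already holds the absolute tid at start
        decoded += 1
        if tid == target:
            return val, decoded  # stop decoding as soon as the tid is found
        if tid > target:
            break  # tids are sorted, so the target cannot appear later
    return None, decoded

tids = [1, 2, 5, 6, 9, 12, 13, 20, 21, 30]
values = list("abcdefghij")
items, bookmarks = encode_stream(tids, values)
print(fetch(items, bookmarks, 13))
```

With a smaller bookmark_every, a lookup decodes fewer items but the
stream carries more bookmarks, which is the compactness tradeoff
mentioned above.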