Home > mailing lists

Re: [HACKERS] Pluggable storage - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: [HACKERS] Pluggable storage
Date	August 15, 2017 09:53:48
Msg-id	20170815065348.afkj45dgmg22ecfh@alap3.anarazel.de Whole thread Raw
In response to	Re: [HACKERS] Pluggable storage (Haribabu Kommi <kommi.haribabu@gmail.com>)
Responses	Re: [HACKERS] Pluggable storage Re: [HACKERS] Pluggable storage
List	pgsql-hackers

Tree view

Hi,

On 2017-06-13 11:50:27 +1000, Haribabu Kommi wrote:
> Here I attached WIP patches to support pluggable storage. The patch series
> are may not work individually. Still so many things are under development.
> These patches are just to share the approach of the current development.

Making a pass through the patchset to get a feel where this at, and
where this is headed. I previously skimmed the thread to get a rough
sense on what's discused, but not in a very detailed manner.

General:

- I think one important discussion we need to have is what kind of performance impact we're going to accept introducing
this.It seems very likely that this'll cause some slowdown. We can kind of alleviate that by doing some optimizations
atthe same time, but nevertheless, this abstraction is going to cost.

- I don't think we should introduce this without a user besides heapam. The likelihood that API will be usable by
anythingelse without a testcase seems fairly remote. I think some src/test/modules type implementation of a
per-session,in-memory storage - relatively easy to implement - or such is necessary.

- I think, and detailed some of that, we're going to need some cleanups that go in before this, to decrease the size /
increasethe quality of the new APIs. It's going to get more painful to change APIs subsequently.

- We'll have to document clearly that these APIs are going to change for a while, even after the release introducing
them.

StorageAm - Scan stuff:

- I think API isn't quite right. There's a lot of granular callback functionality like scan_begin_function /
scan_begin_catalog/ scan_begin_bm - these largely are convenience wrappers around the same function, and I don't think
thatwould, or rather should, change in any abstracted storage layer. So I think there needs to be some unification
first(pretty close w/ beginscan_internal already, but perhaps we should get rid of a few of these wrappers).

- Some of the exposed functionality, e.g. scan_getpage, scan_update_snapshot, scan_rescan_set_params looks like it
shouldjust be excised, i.e. there's no reason for it to exist.

- Minor: don't think the _function suffix for Storis necessary, just makes things long, and every member has it.
Besidesthat, it's also easy to misunderstand - for a second I understood scan_getnext_function to be about getting the
nextfunction...

- Scans are still represented as HeapScanDesc - I don't think that's going to fly. Either this needs to be an opaque
type(i.e. a struct that's not defined, just forward declared), or it needs to be a base struct that individual AMs
embedin their own structs. Individual AMs definitely are going to need different pieces of data.

Storage AM - tuple stuff:

- tuple_get_{xmin, updated_xid, cmin, itempointer, ctid, heaponly} are each individual functions, that seems pretty
painfulto maintain, and v/ likely to just grow and grow. Not sure what the solution is, but this seems like a hard
sell.

- The three *speculative* functions don't quite seem right to me, nor do I understand:
+ *
+ * Setting a tuple's speculative token is a slot-only operation, so no need
+ * for a storage AM method, but after inserting a tuple containing a
+ * speculative token, the insertion must be completed by these routines:
+ */ I don't see anything related to slots, here?

Storage AM - slot stuff:

- I think there's a number of wrapper functions (slot_getattr, slot_getallattrs, getsomeattrs, attisnull) around the
samefunctionality - that bloats the API and causes slowdowns. Instead we need something like
slot_virtualize_tuple(int16upto), and the rest should just be wrappers.

- I think it's wrong to have the slot functionality defined on the StorageAm level. That'll cause even more indirect
functioncalls (=> slowness), and besides that the TupleTableSlot struct will need members for each and every Am.

I think the solution is to instead have an Am level "create slot" function, and the returned slot is allocated by the
Am,with a base member of TupleTableSlot with basically just tts_nvalid, tts_values, tts_isnull as members. Those are
theonly members that can be accessed without functions.

Access to the individual functions (say store_tuple) would then be direct members of the TupleTableSlot interface.
Whilethat costs a bit of memory, it removes one indirection from an already performance critical path.

- MinimalTuples should be one type of slot for the above, except it's not created by an StorageAm but by a function
returninga TupleTableSlot.

This should remove the need for the slot_copy_min_tuple, slot_is_physical_tuple functions.

- Right now TupleTableSlots are an executor datastructure, but these patches (rightly!) make it much more widely used.
SoI think it needs to be moved outside of executor/, and probably renamed to something like TupleHolder or something.

- The oid integration seems wrong - without an accessor oids won't be queryable with this unless you break through the
API. But from a higher level view I do wonder if it's not time to remove "special" oid columns and replace them with a
normalcolumn. We should be hesitant enshrining crusty old concepts in new APIs.

Executor integration:

- I'm quite fearful that this'll cause slowdowns in a few tight paths. The most likely cases here seem to be a) bitmap
indexscansb) indexscans c) highly selective sequential scans. I do wonder if that can be partially addressed by
switchingout the individual executor routines in the relevant scan nodes by something using or similar to the
infrastructurein cc9f08b6b8

Regards,

Andres

pgsql-hackers by date:

From: Chris Travers
Date: 15 August 2017, 09:24:37
Subject: Re: [HACKERS] Orphaned files in base/[oid]

From: Michael Paquier
Date: 15 August 2017, 09:58:57
Subject: [HACKERS] Generate wait event list and docs from text file

Re: [HACKERS] Pluggable storage - Mailing list pgsql-hackers

Previous

Next