Re: [HACKERS] Pluggable storage - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [HACKERS] Pluggable storage |
Date | |
Msg-id | 20170815065348.afkj45dgmg22ecfh@alap3.anarazel.de Whole thread Raw |
In response to | Re: [HACKERS] Pluggable storage (Haribabu Kommi <kommi.haribabu@gmail.com>) |
Responses |
Re: [HACKERS] Pluggable storage
Re: [HACKERS] Pluggable storage |
List | pgsql-hackers |
Hi, On 2017-06-13 11:50:27 +1000, Haribabu Kommi wrote: > Here I attached WIP patches to support pluggable storage. The patch series > are may not work individually. Still so many things are under development. > These patches are just to share the approach of the current development. Making a pass through the patchset to get a feel where this at, and where this is headed. I previously skimmed the thread to get a rough sense on what's discused, but not in a very detailed manner. General: - I think one important discussion we need to have is what kind of performance impact we're going to accept introducing this.It seems very likely that this'll cause some slowdown. We can kind of alleviate that by doing some optimizations atthe same time, but nevertheless, this abstraction is going to cost. - I don't think we should introduce this without a user besides heapam. The likelihood that API will be usable by anythingelse without a testcase seems fairly remote. I think some src/test/modules type implementation of a per-session,in-memory storage - relatively easy to implement - or such is necessary. - I think, and detailed some of that, we're going to need some cleanups that go in before this, to decrease the size / increasethe quality of the new APIs. It's going to get more painful to change APIs subsequently. - We'll have to document clearly that these APIs are going to change for a while, even after the release introducing them. StorageAm - Scan stuff: - I think API isn't quite right. There's a lot of granular callback functionality like scan_begin_function / scan_begin_catalog/ scan_begin_bm - these largely are convenience wrappers around the same function, and I don't think thatwould, or rather should, change in any abstracted storage layer. So I think there needs to be some unification first(pretty close w/ beginscan_internal already, but perhaps we should get rid of a few of these wrappers). - Some of the exposed functionality, e.g. scan_getpage, scan_update_snapshot, scan_rescan_set_params looks like it shouldjust be excised, i.e. there's no reason for it to exist. - Minor: don't think the _function suffix for Storis necessary, just makes things long, and every member has it. Besidesthat, it's also easy to misunderstand - for a second I understood scan_getnext_function to be about getting the nextfunction... - Scans are still represented as HeapScanDesc - I don't think that's going to fly. Either this needs to be an opaque type(i.e. a struct that's not defined, just forward declared), or it needs to be a base struct that individual AMs embedin their own structs. Individual AMs definitely are going to need different pieces of data. Storage AM - tuple stuff: - tuple_get_{xmin, updated_xid, cmin, itempointer, ctid, heaponly} are each individual functions, that seems pretty painfulto maintain, and v/ likely to just grow and grow. Not sure what the solution is, but this seems like a hard sell. - The three *speculative* functions don't quite seem right to me, nor do I understand: + * + * Setting a tuple's speculative token is a slot-only operation, so no need + * for a storage AM method, but after inserting a tuple containing a + * speculative token, the insertion must be completed by these routines: + */ I don't see anything related to slots, here? Storage AM - slot stuff: - I think there's a number of wrapper functions (slot_getattr, slot_getallattrs, getsomeattrs, attisnull) around the samefunctionality - that bloats the API and causes slowdowns. Instead we need something like slot_virtualize_tuple(int16upto), and the rest should just be wrappers. - I think it's wrong to have the slot functionality defined on the StorageAm level. That'll cause even more indirect functioncalls (=> slowness), and besides that the TupleTableSlot struct will need members for each and every Am. I think the solution is to instead have an Am level "create slot" function, and the returned slot is allocated by the Am,with a base member of TupleTableSlot with basically just tts_nvalid, tts_values, tts_isnull as members. Those are theonly members that can be accessed without functions. Access to the individual functions (say store_tuple) would then be direct members of the TupleTableSlot interface. Whilethat costs a bit of memory, it removes one indirection from an already performance critical path. - MinimalTuples should be one type of slot for the above, except it's not created by an StorageAm but by a function returninga TupleTableSlot. This should remove the need for the slot_copy_min_tuple, slot_is_physical_tuple functions. - Right now TupleTableSlots are an executor datastructure, but these patches (rightly!) make it much more widely used. SoI think it needs to be moved outside of executor/, and probably renamed to something like TupleHolder or something. - The oid integration seems wrong - without an accessor oids won't be queryable with this unless you break through the API. But from a higher level view I do wonder if it's not time to remove "special" oid columns and replace them with a normalcolumn. We should be hesitant enshrining crusty old concepts in new APIs. Executor integration: - I'm quite fearful that this'll cause slowdowns in a few tight paths. The most likely cases here seem to be a) bitmap indexscansb) indexscans c) highly selective sequential scans. I do wonder if that can be partially addressed by switchingout the individual executor routines in the relevant scan nodes by something using or similar to the infrastructurein cc9f08b6b8 Regards, Andres
pgsql-hackers by date: