Re: Custom Scans and private data - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Custom Scans and private data |
Date | |
Msg-id | CA+TgmoYhMb-DFwu-26p81hkRhrg-dsB0kP1z29+-2D4SoC_hFw@mail.gmail.com Whole thread Raw |
In response to | Re: Custom Scans and private data (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Custom Scans and private data
Re: Custom Scans and private data |
List | pgsql-hackers |
On Wed, Aug 26, 2015 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote: >> > Looking at >> > postgres_fdw and the pg-strom example linked upthread imo pretty clearly >> > shows how ugly this is. There's also the rather noticeable difference >> > that we already have callbacks in the node for custom scans (used by >> > outfuncs), making this rather trivial to add. >> >> I will manfully refrain from taking that bait. > > While that comment made me laugh, I'm not really sure why my examples > are bait. One is the probably most used fdw, the other the only user of > the custom scan interface I know of. I suspect what Tom is getting at is that he thinks the custom scan interface is a giant piece of dung which I ought not to have committed unless somebody was holding a gun to my head at time time, and that putting function pointers in the structure at all was a particularly poor design decision on par with the Roman decision to make their water pipes out of a metal which is toxic if ingested. He and I are going to have to agree to disagree on that. I think it's great that you are experimenting with the feature, and I think we ought to try to make it better, even if that requires incompatible changes to the API. When we originally discussed this topic at a previous PGCon developer meeting, it was suggested that we might want to have a facility for custom nodes, and I at least had the impression that this might be separate from particular facilities we might have for custom paths or custom plans or custom plan-states. For example, suppose you can do this: void InitCustomNodeAndres(CustomNodeTemplate *tmpl) { tmpl->outfunc = _outAndres; tmpl->copyfunc = _copyAndres; tmpl->equalfunc = _equalAndres; tmpl->readfunc = _readAndres; } void _PG_init() { RegisterCustomNodeType("Andres", "andres", "InitCustomNodeAndres"); } ... Andres *andres = makeCustomNode(Andres); If we had something like that, then you could address your use case by making the structures you want to pass around be nodes rather than having to invent a way to smash whatever you have into a binary blob. That would be pretty cool. Maybe, with a bit of work, we could even find a way to get rid of the idea of custom-path, custom-plan, and custom scan-state as node types in their own right and instead have some trick where they can just be general custom nodes that have one or two extra methods defined for the stuff that is specific to the planner and executor. Not sure exactly how to set that up off-hand, but maybe there's a good way. In general, I think we've made extensibility very hard in areas like this. There are a whole bunch of thorny problems here that are interconnected. For example, suppose you want to add a new kind of SQL-visible object to the system, something EnterpriseDB has needed to do repeatedly over the years. You have to modify like 40 core source files. You need new parser rules: but the parser is not extensible. You need new keywords: but the keyword table is not extensible. To represent the results of parsing, you need new parse nodes: but the node system is not extensible. You need to classify your statement as DML for logging purposes, you need to hook it into the event trigger system, you need to hook it into objectaddress.c, you need a new command tag, you need to add your new command to the appropriate place in ProcessUtility. You can sort of get control in ProcessUtility from a hook, but it's awkward, and the others can't be handled at all. So you just have to hack core. It would be nice to fully solve this problem so that some of the EDB stuff could exist as an extension rather than a modification of the core code, but given the magnitude of the problem I don't expect to get there any time soon. Still, I think making core facilities more accessible to extensions has a ton of merit. FDWs, background workers, dynamic shared memory, logical decoding plugins, DDL deparsing, and, yeah, even custom scans are all relatively recent examples of new facilities we've been careful to make usable outside core, and there is, I think, a good deal of evidence that every one of those facilities has gotten use by developers other than the ones who installed the core support, which I personally find really satisfying. I predict that if we take steps to make the node system more extensible, we will find that that capability gets plenty of use, including some uses we can't now predict. And even more broadly: I think the time has come to get really skeptical about every place in our code where there's a big giant switch. All of those represent places where non-core code need not apply. And all of them represent places where, instead of each module registering itself with all the correct subsystems, each subsystem knows about all of the things that plug into it. Whoever designed the shared-memory initialization mechanism was smart: it knows very little itself, and relies on its clients to tell it what they need. There are some things I don't like about that mechanism, but at least we got that part right, and so extensions can use it. A lot of other places, unfortunately, took the simpler approach of just hard-coding everything. There are some places where performance or other concerns justify that approach, but there are a lot of places where I think another approach would be better. EDB is far from the only company that would benefit from making more of these subsystems extensible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: