Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API) |
Date | |
Msg-id | 15333.1436911440@sss.pgh.pa.us |
In response to | Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API) (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API)
Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API) |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes:
> Both you and Andres have articulated the concern that CustomScan isn't
> actually useful, but I still don't really understand why not.

> I'm curious, for example, whether CustomScan would have been
> sufficient to build TABLESAMPLE, and if not, why not. Obviously the
> syntax has to be in core,

... so you just made the point ...

> but why couldn't the syntax just call an
> extension-provided callback that returns a custom scan, instead of
> having a node just for TABLESAMPLE?

Because that only works for small values of "work".

As far as TABLESAMPLE goes, I intend to hold Simon's feet to the fire until there's a less cheesy answer to the problem of scan reproducibility. Assuming we're going to allow sample methods that can't meet the reproducibility requirement, we need something to prevent them from producing visibly broken query results. Ideally, the planner would avoid putting such a scan on the inside of a nestloop. A CustomScan-based implementation could not possibly arrange such a thing; we'd have to teach the core planner about the concern.

Or, taking the example of a GpuScan node, it's essentially impossible to persuade the planner to delegate any expensive function calculations, aggregates, etc to such a node; much less teach it that that way is cheaper than doing such things the usual way. So yeah, KaiGai-san may have a module that does a few things with a GPU, but it's far from doing all or even very much of what one would want.

Now, as part of the upper-planner-rewrite business that I keep hoping to get to when I'm not riding herd on bad patches, it's likely that we might have enough new infrastructure soon that that particular problem could be solved. But there would just be another problem after that; a likely example is not having adequate statistics, or sufficiently fine-grained function cost estimates, to be able to make valid choices once there's more than one way to do such calculations. (I'm not really impressed by "the GPU is *always* faster" approaches.) Significant improvements of that sort are going to take core-code changes.

Even worse, if there do get to be any popular custom-scan extensions, we'll break them anytime we make any nontrivial planner changes, because there is no arms-length API there. A trivial example is that even adding or changing any fields in struct Path will necessarily break custom scan providers, because they build Paths for themselves with no interposed API.

In large part this is the same as my core concern about the TABLESAMPLE patch: exposing dubiously-designed APIs is soon going to force us to make choices between breaking those APIs or not being able to make changes we need to make. In the case of custom scans, I will not be particularly sad when (not if) we break custom scan providers; but in other cases such tradeoffs are going to be harder to make.

regards, tom lane
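
The remark about struct Path can be illustrated with a small C sketch. Everything here is hypothetical and heavily simplified: `DemoPath`, `demo_make_path`, `demo_build_directly`, and the `parallel_safe` field with its default are invented for illustration and are not PostgreSQL's actual struct Path or planner API. The sketch assumes a field is added to the struct in a later release and compares a provider that fills the struct itself with one that goes through a constructor owned by the core code.

```c
#include <assert.h>

/* Hypothetical stand-in for a planner path node; NOT PostgreSQL's struct Path. */
typedef struct DemoPath {
    double rows;           /* estimated number of rows produced */
    double startup_cost;   /* cost before the first row is returned */
    double total_cost;     /* cost to return all rows */
    int    parallel_safe;  /* imagine this field was added in a later release */
} DemoPath;

/*
 * Arms-length style: the core code owns construction, so when a field is
 * added it can give that field a sensible default in exactly one place.
 * The default of 1 is an assumption made for this sketch.
 */
DemoPath demo_make_path(double rows, double startup_cost, double total_cost)
{
    DemoPath p;

    assert(rows >= 0.0 && startup_cost <= total_cost);
    p.rows = rows;
    p.startup_cost = startup_cost;
    p.total_cost = total_cost;
    p.parallel_safe = 1;
    return p;
}

/*
 * Direct style: an extension written before parallel_safe existed fills in
 * only the fields it knew about. The new field is silently zero-initialized,
 * which may not be what the core code expects.
 */
DemoPath demo_build_directly(double rows, double startup_cost, double total_cost)
{
    DemoPath p = {
        .rows = rows,
        .startup_cost = startup_cost,
        .total_cost = total_cost
    };
    return p;
}
```

With the constructor, a caller picks up the new default simply by being recompiled; with direct construction, every provider has to be found and edited by hand. An interposed API of this kind reduces that one class of breakage; it does not address the planner-knowledge problems described earlier in the message.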