Re: Choosing parallel_degree - Mailing list pgsql-hackers
From | Paul Ramsey |
---|---|
Subject | Re: Choosing parallel_degree |
Date | |
Msg-id | CACowWR2Uz8xia83-TBK0Z0Mkkz02c_WfCWftWXyq3DCuO_4Q-w@mail.gmail.com |
In response to | Re: Choosing parallel_degree (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: Choosing parallel_degree
Re: Choosing parallel_degree |
List | pgsql-hackers |
On Fri, Apr 8, 2016 at 9:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 8 April 2016 at 17:00, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>>
>> On Fri, Apr 8, 2016 at 8:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila <amit.kapila16@gmail.com>
>> > wrote:
>> >> Other than that, patch looks good and I have marked it as Ready For
>> >> Committer. Hope, we get this for 9.6.
>> >
>> > Committed. I think this is likely to make parallel query
>> > significantly more usable in 9.6.
>>
>> I'm kind of worried that it will make it yet less usable for PostGIS,
>> since approaches that ignore costs in favour of relpages will
>> dramatically under-resource our queries. I can spin a query for
>> multiple seconds on a table with less than 100K records, not even
>> trying very hard.
>
> Doesn't sound good.

I admit, it's not a "usual" database thing, but it's right in the
meaty middle of use cases that parallelism can crushingly awesomely
defeat. It's also probably not too unusual for extension use cases,
where complex data are held in user defined types, whether they be
image fragments, music samples, genetic data, raster data or LIDAR
point clouds. PostGIS is just one voice of many in the Symphony of
Crazy Shit in the Database.

>> Functions have very unequal CPU costs, and we're talking here about
>> using CPUs more effectively, why are costs being given the see-no-evil
>> treatment? This is as true in core as it is in PostGIS, even if our
>> case is a couple orders of magnitude more extreme: a filter based on a
>> complex combination of regex queries will use an order of magnitude
>> more CPU than one that does a little math, why plan and execute them
>> like they are the same?
>
> Functions have user assignable costs.

We have done a relatively bad job of globally costing our functions
thus far, because it mostly didn't make any difference. In my testing
[1], I found that costing could push better plans for parallel
sequence scans and parallel aggregates, though at very extreme cost
values (1000 for sequence scans and 10000 for aggregates).

Obviously, if costs can make a difference for 9.6 and parallelism
we'll rigorously ensure we have good, useful costs. I've already
costed many functions in my parallel postgis test branch [2]. Perhaps
the avoidance of cost so far is based on the relatively nebulous
definition it has: about the only thing in the docs is "If the cost is
not specified, 1 unit is assumed for C-language and internal
functions, and 100 units for functions in all other languages. Larger
values cause the planner to try to avoid evaluating the function more
often than necessary."

So what about C functions then? Should a string comparison be 5 and a
multiplication 1? An image histogram 1000?

>> As it stands now, it seems like out of the box PostGIS users will
>> actually not see much benefit from parallelism unless they manhandle
>> their configuration settings to force it.
>
> Does this concern apply to this patch, or to the general situation for 9.6.

Insofar as the patch is throttling how many parallel workers you get
based solely on your relsize, it does concern this patch, but it's a
general issue in both the extreme and not obviously related costings
needed to trip parallel sequence and parallel aggregate plans. The
parallel join seems to not take function/operator costs into account
at all [3], at least I couldn't plump up a high enough cost to trip it
without also adjusting the global parallel tuple cost configuration.

I've seen a number of asides to the effect that "yes, costs are
important, but we probably can't do anything about that for 9.6" in
parallel patch threads, including this one, so I'm getting concerned
that the core improvement we've been hoping for for years won't
actually address our use cases when it is first released. That may
just be the way it is, c'est la vie, but it would be unfortunate.

P

[1] http://blog.cleverelephant.ca/2016/03/parallel-postgis.html
[2] https://github.com/pramsey/postgis/tree/parallel
[3] http://blog.cleverelephant.ca/2016/03/parallel-postgis-joins.html
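
To make the under-resourcing concern concrete, here is a toy model in Python. It is a sketch under stated assumptions, not the committed planner code: the function names, the 1000-page threshold, the tripling rule, the worker cap, and both example tables are hypothetical. The only value taken from PostgreSQL is the default cpu_operator_cost of 0.0025, the unit in which function COST is expressed. The model picks a worker count from relpages alone and compares it with a planner-style CPU estimate for the filter function.

```python
# Toy model only: thresholds and table figures below are assumptions
# chosen for illustration, not values read from the PostgreSQL source.

def workers_from_relpages(relpages, base_threshold=1000, max_workers=8):
    """Size-only heuristic: no workers below base_threshold pages,
    then one worker, plus one more each time the table size triples."""
    if relpages < base_threshold:
        return 0
    workers = 1
    threshold = base_threshold * 3
    while relpages >= threshold and workers < max_workers:
        workers += 1
        threshold *= 3
    return workers


def filter_cpu_cost(rows, function_cost, cpu_operator_cost=0.0025):
    """Estimated CPU cost of applying a filter function to every row.
    function_cost is in units of cpu_operator_cost, as with COST."""
    return rows * function_cost * cpu_operator_cost


# Hypothetical small table filtered by an expensive function (COST 1000).
small_pages, small_rows, small_fn_cost = 900, 100_000, 1000
# Hypothetical large table filtered by a cheap function (COST 1).
big_pages, big_rows, big_fn_cost = 50_000, 10_000_000, 1

small_workers = workers_from_relpages(small_pages)
big_workers = workers_from_relpages(big_pages)
small_cpu = filter_cpu_cost(small_rows, small_fn_cost)
big_cpu = filter_cpu_cost(big_rows, big_fn_cost)

if __name__ == "__main__":
    print("small table:", small_workers, "workers, CPU estimate", small_cpu)
    print("big table:  ", big_workers, "workers, CPU estimate", big_cpu)
```

In this model the small table carries the larger CPU estimate yet gets no workers, because the worker count never looks at function cost. A cost-aware rule would need some term like filter_cpu_cost in the decision; how to weight it is exactly the open question raised in the message.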