Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers
From: Amit Khandekar
Subject: Re: [HACKERS] Parallel Append implementation
Msg-id: CAJ3gD9e-_qFOBkoQCR7OV6PjJ2uMjftEPSCHMBNPT7ZohSzCow@mail.gmail.com
In response to: Re: [HACKERS] Parallel Append implementation (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] Parallel Append implementation
List: pgsql-hackers
Attached is the updated patch that handles the changes for all the
comments except the cost changes part. Details about the specific
changes come after the cost-related points discussed below.

>> I wanted to take into account per-subpath parallel_workers for the
>> total cost of Append. Suppose the partial subpaths have per-worker
>> total costs (3, 3, 3) and their parallel_workers are (2, 8, 4), with
>> 2 Append workers available. So according to what you say, the total
>> cost is 9. With per-subplan parallel_workers taken into account, the
>> total cost = (3*2 + 3*8 + 3*4)/2 = 21.

> But that case never happens, because the parallel workers for the
> append is always at least as large as the number of workers for any
> single child.

Yeah, that's right. I will use this approach for partial paths. For
non-partial paths, I was considering the following 3 options :

Option 1. Just take the sum of the total non-partial child costs and
divide it by the number of workers. It seems to get close to the
actual cost.

Option 2. Calculate the exact cost by the algorithm I mentioned
before, pasted below for reference :

Per-subpath costs : 20 16 10 8 3 1, with 3 workers.
After 10 time units (the minimum of the first 3, i.e. 20, 16, 10), the
times remaining are :
10 6 0 8 3 1
After 6 units (minimum of 10, 6, 8), the times remaining are :
4 0 0 2 3 1
After 2 units (minimum of 4, 2, 3), the times remaining are :
2 0 0 0 1 1
After 1 unit (minimum of 2, 1, 1), the times remaining are :
1 0 0 0 0 0
After 1 unit (minimum of 1, 0, 0), the times remaining are :
0 0 0 0 0 0
Now add up the above time chunks : 10 + 6 + 2 + 1 + 1 = 20

Option 3. Get some approximation formula like you suggested. I am also
looking for such a formula, just that some things are not clear to me.
The discussion of the same is below ...

>>> 2. Next, estimate the cost of the nonpartial paths. To do this, make
>>> an array of Cost of that length and initialize all the elements to
>>> zero, then add the total cost of each nonpartial plan in turn to the
>>> element of the array with the smallest cost, and then take the maximum
>>> of the array elements as the total cost of the nonpartial plans. Add
>>> this to the result from step 1 to get the total cost.
>>
>> So with costs (8, 5, 2), add 8 and 5 to 2 so that it becomes (8, 5,
>> 15), and so the max is 15 ? I surely am misinterpreting this.

> No. If you have costs 8, 5, and 2 and only one process, cost is 15.
> If you have two processes then for costing purposes you assume worker
> 1 will execute the first path (cost 8) and worker 2 will execute the
> other two (cost 5 + 2 = 7), so the total cost is 8. If you have three
> workers, the cost will still be 8, because there's no way to finish
> the cost-8 path in less than 8 units of work.

So the part you suggested about adding each total cost in turn to the
smallest array element : does that apply only to the 1-worker case ?
For more than one worker, are you suggesting some algorithm similar to
the one I suggested in option 2 above ? If yes, it would be great if
you could describe again how that works for multiple workers. Or were
you suggesting some simple approximate arithmetic that applies to
multiple workers ? Like I mentioned, I would be happy to have such
simple approximation arithmetic for the multiple-worker case. The
logic I suggested in option 2 is something we can keep as the last
option, and option 1 is also an approximation, but we would like a
better one. So I wanted to clear my queries regarding option 3.
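As a minimal standalone sketch of the quoted array-based rule (the
names here are illustrative, not code from the attached patch): each
non-partial subpath's cost is added to the currently least-loaded
worker slot, and the busiest slot decides the total.

#include <stdio.h>

typedef double Cost;

static Cost
append_nonpartial_cost(const Cost *costs, int ncosts, int num_workers)
{
	Cost	worker_cost[num_workers];	/* C99 VLA, for brevity */
	Cost	max_cost;
	int		i,
			j;

	for (i = 0; i < num_workers; i++)
		worker_cost[i] = 0;

	/* hand each cost to the currently least-loaded worker slot */
	for (i = 0; i < ncosts; i++)
	{
		int		min_idx = 0;

		for (j = 1; j < num_workers; j++)
			if (worker_cost[j] < worker_cost[min_idx])
				min_idx = j;
		worker_cost[min_idx] += costs[i];
	}

	/* the busiest worker determines the total elapsed cost */
	max_cost = worker_cost[0];
	for (i = 1; i < num_workers; i++)
		if (worker_cost[i] > max_cost)
			max_cost = worker_cost[i];
	return max_cost;
}

int
main(void)
{
	Cost	ex1[] = {8, 5, 2};
	Cost	ex2[] = {20, 16, 10, 8, 3, 1};

	printf("%g %g %g\n",
		   append_nonpartial_cost(ex1, 3, 1),	/* 15 */
		   append_nonpartial_cost(ex1, 3, 2),	/* 8 */
		   append_nonpartial_cost(ex1, 3, 3));	/* 8 */
	printf("%g\n", append_nonpartial_cost(ex2, 6, 3));	/* 20 */
	return 0;
}

With one worker this degenerates to the plain sum (15). With the costs
fed in descending order it also reproduces the Option 2 simulation (20
with 3 workers): adding the next cost to the smallest accumulator is
equivalent to the worker that finishes first picking up the next
subplan.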
----------

Details about all the remaining changes in the updated patch are
below ...

On 20 March 2017 at 17:29, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 17, 2017 at 1:12 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> - The substantive changes in add_paths_to_append_rel don't look right
>>> either. It's not clear why accumulate_partialappend_subpath is
>>> getting called even in the non-enable_parallelappend case. I don't
>>> think the logic for the case where we're not generating a parallel
>>> append path needs to change at all.
>>
>> When accumulate_partialappend_subpath() is called for a childrel with
>> a partial path, it works just like accumulate_append_subpath() when
>> enable_parallelappend is false. That's why, for a partial child path,
>> the same function is called irrespective of the parallel-append or
>> non-parallel-append case. Maybe mentioning this in the comments would
>> suffice here ?
>
> I don't get it. If you can get the same effect by changing something
> or not changing it, presumably it'd be better to not change it. We
> try not to change things just because we can; the change should be an
> improvement in some way.
>
>>> - When parallel append is enabled, I think add_paths_to_append_rel
>>> should still consider all the same paths that it does today, plus one
>>> extra. The new path is a parallel append path where each subpath is
>>> the cheapest subpath for that childrel, whether partial or
>>> non-partial. If !enable_parallelappend, or if all of the cheapest
>>> subpaths are partial, then skip this. (If all the cheapest subpaths
>>> are non-partial, it's still potentially useful.)
>>
>> In the case of all-partial childrels, the paths are *exactly* the
>> same as those that would have been created for
>> enable_parallelappend=off. The extra path is there for
>> enable_parallelappend=on only when one or more of the child rels do
>> not have partial paths. Does this make sense ?
>
> No, I don't think so. Imagine that we have three children, A, B, and
> C. The cheapest partial paths have costs of 10,000 each. A, however,
> has a non-partial path with a cost of 1,000. Even though A has a
> partial path, we still want to consider a parallel append using the
> non-partial path because it figures to be hugely faster.

Right. Now that we want to consider both the cheapest partial and the
cheapest non-partial path, I get what you were saying about having an
extra path for parallel append. I have done all of the above changes.
Now we have an extra path for enable_parallelappend=true, besides the
non-parallel partial append path.

> - You've added a GUC (which is good) but not documented it (which is
> bad) or added it to postgresql.conf.sample (also bad).

Done.

> - You've used a loop inside a spinlock-protected critical section,
> which is against project policy. Use an LWLock; define and document a
> new builtin tranche ID.

Done. Used an LWLock for the parallel append synchronization. But I am
not sure what "document the new builtin tranche ID" means; I didn't
find a README which documents tranche IDs.

For setting pa_finished=true when a partial plan finishes, the code
was earlier using a spinlock, but now it uses no synchronization at
all. The spinlock was only needed earlier because pa_finished was
updated together with another field, num_workers; since num_workers is
gone, nothing else needs to be kept consistent with the flag. I was
considering whether to use the atomic read and write APIs in atomics.c
for pa_finished, but from what I understand, a plain read/write is
already atomic; we require atomics only for compound operations like
increment, exchange, etc.
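As a rough sketch of the arrangement described above (hypothetical
names and layout, not the patch's actual declarations): picking the
next subplan is a read-modify-write of shared state, so it runs under
the LWLock, while marking a subplan finished is a single aligned store
of a bool and takes no lock.

#include "postgres.h"
#include "storage/lwlock.h"

/* Illustrative shared-memory state for Parallel Append. */
typedef struct ParallelAppendState
{
	LWLock		pa_lock;		/* serializes choice of the next subplan */
	int			pa_next_plan;	/* next subplan a worker should pick up */
	bool		pa_finished[FLEXIBLE_ARRAY_MEMBER]; /* per-subplan flags */
} ParallelAppendState;

/* Read-modify-write of shared state: done under the LWLock. */
static int
choose_next_subplan(ParallelAppendState *pastate, int nplans)
{
	int			whichplan = -1;
	int			i;

	LWLockAcquire(&pastate->pa_lock, LW_EXCLUSIVE);
	for (i = 0; i < nplans; i++)
	{
		int			candidate = (pastate->pa_next_plan + i) % nplans;

		if (!pastate->pa_finished[candidate])
		{
			whichplan = candidate;
			pastate->pa_next_plan = (candidate + 1) % nplans;
			break;
		}
	}
	LWLockRelease(&pastate->pa_lock);

	return whichplan;			/* -1 means all subplans are done */
}

/*
 * A plain aligned store of a bool is atomic on all platforms
 * PostgreSQL supports, so no lock (and no atomics.c API) is needed.
 */
static void
mark_plan_finished(ParallelAppendState *pastate, int whichplan)
{
	pastate->pa_finished[whichplan] = true;
}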
> - The comment for pa_finished claims that it is the number of workers
> executing the subplan, but it's a bool, not a count; I think this
> comment is just out of date.

Done.

> - paths_insert_sorted_by_cost() is a hand-coded insertion sort. Can't
> we find a way to use qsort() for this instead of hand-coding a slower
> algorithm? I think we could just create an array of the right length,
> stick each path into it from add_paths_to_append_rel, and then qsort()
> the array based on <is-partial, total-cost>. Then the result can be
> turned into a list.

Done. Added a new function list_qsort() in list.c so that it can also
be reused in the future (a comparator sketch is below).

> - Maybe the new helper functions in nodeAppend.c could get names
> starting with exec_append_, to match the style of
> exec_append_initialize_next().

Done.

> - There's a superfluous whitespace change in add_paths_to_append_rel.

I didn't find exactly which one, but I guess the attached latest patch
does not have it.

>>> - In get_append_num_workers, instead of the complicated formula with
>>> log() and 0.693, just add the list lengths and call fls() on the
>>> result. Integer arithmetic FTW!
>>
>> Yeah, fls() could be used. BTW I just found that costsize.c already
>> has this defined in the same way I did:
>> #define LOG2(x) (log(x) / 0.693147180559945)
>> Maybe we need to move this to some common header file.
>
> LOG2() would make sense if you're working with a value represented as
> a double, but if you have an integer input, I think fls() is better.

Used fls() now.
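Two of the review points above lend themselves to short sketches
(assumed names throughout, and the descending-cost order within each
group is an assumption about the intended sort; this is not the actual
patch code): the qsort() comparator over <is-partial, total-cost>, and
the fls()-based worker count.

#include <stdbool.h>
#include <strings.h>			/* fls(); PostgreSQL carries a
								 * src/port fallback for platforms
								 * that lack it */

/* Assumed flat entry; the real code would work with Path nodes. */
typedef struct PathSortEntry
{
	bool		is_partial;		/* partial paths sort after non-partial */
	double		total_cost;		/* higher cost first within each group */
} PathSortEntry;

static int
append_path_cmp(const void *a, const void *b)
{
	const PathSortEntry *pa = (const PathSortEntry *) a;
	const PathSortEntry *pb = (const PathSortEntry *) b;

	if (pa->is_partial != pb->is_partial)
		return pa->is_partial ? 1 : -1;
	if (pa->total_cost != pb->total_cost)
		return (pa->total_cost < pb->total_cost) ? 1 : -1;
	return 0;
}

/*
 * fls(n) is the 1-based position of the highest set bit, i.e.
 * floor(log2(n)) + 1, so it replaces log(x)/0.693 with integer
 * arithmetic. Clamping against a caller-supplied maximum is my
 * assumption, not necessarily what the patch does.
 */
static int
get_append_num_workers_sketch(int num_subpaths, int max_workers)
{
	int			workers = fls(num_subpaths);

	return (workers > max_workers) ? max_workers : workers;
}

A caller would then do qsort(entries, n, sizeof(PathSortEntry),
append_path_cmp) on the array before turning it back into a list,
which is presumably what list_qsort() wraps.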
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers