Re: Parallel Seq Scan - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Parallel Seq Scan |
Date | |
Msg-id | CA+TgmoazFeUhBNYYUE0K4rhtqQAQT4=RBVkKuvod+dOxHTgvDA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Parallel Seq Scan
|
List | pgsql-hackers |
On Fri, Oct 2, 2015 at 11:44 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Oct 1, 2015 at 7:41 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Does heap_parallelscan_nextpage really need a pscan_finished output >> parameter, or can it just return InvalidBlockNumber to indicate end of >> scan? I think the latter can be done and would be cleaner. > > I think we can do that way as well, however we have to keep the check > of page == InvalidBlockNumber at all the callers to indicate finish > of scan which made me write the function as it exists in patch. However, > I don't mind changing it the way you have suggested if you feel that would > be cleaner. I think it would. I mean, you just end up testing the other thing instead. >> I think that heap_parallelscan_initialize() should taken an >> additional Boolean argument, allow_sync. It should decide whether to >> actually perform a syncscan using the logic from initscan(), and then >> it should store a phs_syncscan flag into the ParallelHeapScanDesc. >> heap_beginscan_internal should set rs_syncscan based on phs_syncscan, >> regardless of anything else. > > I think this will ensure that any future change in this area won't break the > guarantee for sync scans for parallel workers, is that the reason you > prefer to implement this function in the way suggested by you? Basically. It seems pretty fragile the way you have it now. >> I think heap_parallel_rescan() is an unworkable API. When rescanning >> a synchronized scan, the existing logic keeps the original start-block >> so that the scan's results are reproducible, > > It seems from the code comments in initscan, the reason for keeping > previous startblock is to allow rewinding the cursor which doesn't hold for > parallel scan. We might or might not want to support such cases with > parallel query, but even if we want to there is a way we can do with > current logic (as mentioned in one of my replies below). You don't need to rewind a cursor; you just need to restart the scan. So for example if we were on the inner side of a NestLoop, this would be a real issue. >> but no longer reports the >> scan position since we're presumably out of step with other backends. > > Is it true for all form of rescans or are you talking about some > case (like SampleScan) in particular? As per my reading of code > (and verified by debugging that code path), it doesn't seem to be true > for rescan in case of seqscan. I think it is: if (keep_startblock) { /* * When rescanning, we want to keep the previous startblock setting, *so that rewinding a cursor doesn't generate surprising results. * Reset the active syncscan setting, though. */ scan->rs_syncscan = (allow_sync && synchronize_seqscans); } >> This isn't going to work at all with this API. I don't think you can >> swap out the ParallelHeapScanDesc for another one once you've >> installed it; >> > > Sure, but if we really need some such parameters like startblock position, > then we can preserve those in ScanDesc. Sure, we could transfer the information out of the ParallelHeapScanDesc and then transfer it back into the new one, but I have a hard time thinking that's a good design. > PARAM_EXEC parameters will be used to support initPlan in parallel query (as > it is done in the initial patch), so it seems to me this is the main > downside of > this approach. I think rather than trying to come up with various possible > solutions for this problem, lets prohibit sync scans with parallel query if > you > see some problem with the suggestions made by me above. Do you see any > main use case getting hit due to non support of sync scans with > parallel seq. scan? Yes. Synchronized scans are particularly important with large tables, and those are the kind you're likely to want to use a parallel sequential scan on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: