Re: Parallel copy - Mailing list pgsql-hackers
From | Amit Kapila
---|---
Subject | Re: Parallel copy
Msg-id | CAA4eK1JO7A5wywsp8v-F=7zhcnOAYUgQAjBHvt_ZM4recjz_Vw@mail.gmail.com
In response to | Re: Parallel copy (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: Parallel copy
List | pgsql-hackers
On Thu, Apr 9, 2020 at 1:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma <ants@cybertec.at> wrote:
> >
> > With option 1 it's not possible to read input data into shared memory
> > and there needs to be an extra memcpy in the time critical sequential
> > flow of the leader. With option 2 data could be read directly into the
> > shared memory buffer. With future async io support, reading and
> > looking for tuple boundaries could be performed concurrently.
>
> But option 2 still seems significantly worse than your proposal above, right?
>
> I really think we don't want a single worker in charge of finding
> tuple boundaries for everybody. That adds a lot of unnecessary
> inter-process communication and synchronization. Each process should
> just get the next tuple starting after where the last one ended, and
> then advance the end pointer so that the next process can do the same
> thing. Vignesh's proposal involves having a leader process that has to
> switch roles - he picks an arbitrary 25% threshold - and if it doesn't
> switch roles at the right time, performance will be impacted. If the
> leader doesn't get scheduled in time to refill the queue before it
> runs completely empty, workers will have to wait. Ants's scheme avoids
> that risk: whoever needs the next tuple reads the next line. There's
> no need to ever wait for the leader because there is no leader.
>

Hmm, I think in his scheme there is also a single reader process. See the email above [1] where he described how it should work. I think the difference is in the division of work. AFAIU, in Ants's scheme the workers pick work from a tuple_offset queue, whereas in Vignesh's scheme it is based on size (each worker will probably get 64KB of work). I think the main thing to figure out in his scheme is how many tuple offsets to assign to each worker in one go, so that we don't add unnecessary contention when finding the next work unit. We need to find the right balance between size and number of tuples. I am trying to consider size here because larger tuples will probably take more time: we need to allocate more space for them, and they probably also require more processing. One way to achieve that could be for each worker to try to claim 500 tuples (or some other threshold number), but if their combined size exceeds 64KB (or some other threshold size), the worker claims fewer tuples, such that the size of the chunk of tuples stays below the threshold size.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
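To make the trade-off concrete, here is a minimal sketch, in PostgreSQL-style C, of the claiming heuristic described in the last paragraph. It is not taken from any posted patch: the TupleOffsetQueue structure, the claim_tuple_chunk function, and both thresholds are hypothetical, and the sketch assumes the single reader process has already filled in the tuple start offsets.

/*
 * Hypothetical sketch of the chunk-claiming heuristic: claim up to
 * MAX_TUPLES_PER_CHUNK tuple offsets in one go, but stop early once
 * the claimed tuples exceed MAX_CHUNK_BYTES.  None of these names
 * come from the actual parallel-copy patch.
 */
#include "postgres.h"
#include "storage/spin.h"

#define MAX_TUPLES_PER_CHUNK    500         /* threshold number of tuples */
#define MAX_CHUNK_BYTES         (64 * 1024) /* threshold chunk size */

typedef struct TupleOffsetQueue
{
    slock_t mutex;      /* protects head */
    uint32  head;       /* index of the next unclaimed tuple */
    uint32  ntuples;    /* tuples whose boundaries are known so far */
    /* offsets[i]..offsets[i+1] delimit tuple i; ntuples + 1 entries */
    uint32  offsets[FLEXIBLE_ARRAY_MEMBER];
} TupleOffsetQueue;

/*
 * Claim the next chunk of tuples for this worker.  On success, sets
 * *first and *count to the claimed range of tuple indexes and returns
 * true; returns false if no tuple is currently available.
 */
static bool
claim_tuple_chunk(TupleOffsetQueue *q, uint32 *first, uint32 *count)
{
    uint32  nclaimed = 0;
    Size    bytes = 0;

    SpinLockAcquire(&q->mutex);
    *first = q->head;
    while (nclaimed < MAX_TUPLES_PER_CHUNK &&
           q->head + nclaimed < q->ntuples)
    {
        uint32  i = q->head + nclaimed;
        Size    tuplen = q->offsets[i + 1] - q->offsets[i];

        /* Always take at least one tuple, then respect the size cap. */
        if (nclaimed > 0 && bytes + tuplen > MAX_CHUNK_BYTES)
            break;
        bytes += tuplen;
        nclaimed++;
    }
    q->head += nclaimed;
    SpinLockRelease(&q->mutex);

    *count = nclaimed;
    return nclaimed > 0;
}

Under a scheme like this, each worker pays one short lock acquisition per roughly 500 tuples (or per roughly 64KB for wide tuples) rather than per tuple, which is the balance between size and tuple count the mail is after.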