Thread: [pgAgent][Patch] Fixing connection pool leak
Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
Rob
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
Rob

Phone: 0800 021 0888 Email: contactus@ codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Attachment
Hi Rob,
How about this?
On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,
Following on from https://www.postgresql.org/message-id/CA% 2BOCxoz4tONxSpd1rdU- 9SPKRzucz8Bar2CXkEDnCwV6H77ZyA %40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Attachment
Hi Ashesh,
I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.
if (job != NULL)
{
delete job;
job = NULL;
}
I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.
@Robert - Can you also test at your environment and keep us updated for any leak ?
Thanks,
Neel Patel
On Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
Hi Rob,How about this?On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU -9SPKRzucz8Bar2CXkEDnCwV6H77Zy A%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Attachment
Hiya,
That seems absolutely fine. I've tested the patch I had over the weekend, the problem is pretty easy to reproduce with a job that runs every minute and 4 instances of pgagent.
I'll give this version a test as well, but I'm confident it should be fine.
Thanks,That seems absolutely fine. I've tested the patch I had over the weekend, the problem is pretty easy to reproduce with a job that runs every minute and 4 instances of pgagent.
I'll give this version a test as well, but I'm confident it should be fine.
On 23 October 2017 at 06:00, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
Hi Rob,How about this?On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU -9SPKRzucz8Bar2CXkEDnCwV6H77Zy A%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Phone: 0800 021 0888 Email: contactus@ codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Hiya,
Actually, on reflection (I've not tested yet) but I don't think that'll actually work. Looking at pgAgent.cpp:119, Create and Run are only executed if it's Runnable. I've never used WXWidgets before so I'm not sure but I would guess that Create and Run execute the Entry() method on the new thread that's build. For that delete to be executed we'd need to move this outside of the if?On 23 October 2017 at 06:00, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
Hi Rob,How about this?On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU -9SPKRzucz8Bar2CXkEDnCwV6H77Zy A%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Phone: 0800 021 0888 Email: contactus@ codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Hiya,
I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes. To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"
This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:
Hi Ashesh,I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.if (job != NULL){delete job;job = NULL;}I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.@Robert - Can you also test at your environment and keep us updated for any leak ?Thanks,Neel PatelOn Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: Hi Rob,How about this?On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU -9SPKRzucz8Bar2CXkEDnCwV6H77Zy A%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Phone: 0800 021 0888 Email: contactus@ codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Hello again,
I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pg agent/blob/master/pgAgent.cpp# L139 or another method that only gives up the connection which is called in the same place as the delete jt.
Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.
Many Thanks,
Rob
I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pg
Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.
Many Thanks,
Rob
On 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:
RobThanks,I hope that makes senseHiya,I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:Hi Ashesh,I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.if (job != NULL){delete job;job = NULL;}I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.@Robert - Can you also test at your environment and keep us updated for any leak ?Thanks,Neel PatelOn Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: Hi Rob,How about this?On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU -9SPKRzucz8Bar2CXkEDnCwV6H77Zy A%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net
--
Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net
--
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net
--
Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Phone: 0800 021 0888 Email: contactus@ codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
On Tue, Oct 24, 2017 at 8:21 PM, Robert Emery <robertemery@codeweavers.net> wrote:
Hello again,
I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent. cpp#L139 or another method that only gives up the connection which is called in the same place as the delete jt.
Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.
Nothing wrong in it.
My intention was to release the object asap.
As per your findings, it won't go in that execution flow in all cases. I will commit your patch.
-- Thanks, Ashesh
Many Thanks,
RobOn 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:RobThanks,I hope that makes senseHiya,I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:Hi Ashesh,I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.if (job != NULL){delete job;job = NULL;}I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.@Robert - Can you also test at your environment and keep us updated for any leak ?Thanks,Neel PatelOn Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: Hi Rob,How about this?On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU -9SPKRzucz8Bar2CXkEDnCwV6H77Zy A%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.netPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
On Tue, Oct 24, 2017 at 8:30 PM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Tue, Oct 24, 2017 at 8:21 PM, Robert Emery <robertemery@codeweavers.net> wrote:Hello again,
I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent.cpp# L139 or another method that only gives up the connection which is called in the same place as the delete jt.
Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.Nothing wrong in it.My intention was to release the object asap.As per your findings, it won't go in that execution flow in all cases. I will commit your patch.
Committed your original patch.
-- Thanks, Ashesh
Many Thanks,
RobOn 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:RobThanks,I hope that makes senseHiya,I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:Hi Ashesh,I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.if (job != NULL){delete job;job = NULL;}I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.@Robert - Can you also test at your environment and keep us updated for any leak ?Thanks,Neel PatelOn Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: Hi Rob,How about this?On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU -9SPKRzucz8Bar2CXkEDnCwV6H77Zy A%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.netPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Awesome thanks.
What's the release schedule like for pgagent?On 24 October 2017 at 16:05, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Tue, Oct 24, 2017 at 8:30 PM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: On Tue, Oct 24, 2017 at 8:21 PM, Robert Emery <robertemery@codeweavers.net> wrote:Hello again,
I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent.cpp# L139 or another method that only gives up the connection which is called in the same place as the delete jt.
Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.Nothing wrong in it.My intention was to release the object asap.As per your findings, it won't go in that execution flow in all cases. I will commit your patch.Committed your original patch.-- Thanks, Ashesh
Many Thanks,
RobOn 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:RobThanks,I hope that makes senseHiya,I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:Hi Ashesh,I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.if (job != NULL){delete job;job = NULL;}I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.@Robert - Can you also test at your environment and keep us updated for any leak ?Thanks,Neel PatelOn Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: Hi Rob,How about this?On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU -9SPKRzucz8Bar2CXkEDnCwV6H77Zy A%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.netPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Phone: 0800 021 0888 Email: contactus@ codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
On Tue, Oct 24, 2017 at 8:59 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
I have no idea.Awesome thanks.What's the release schedule like for pgagent?
That can be answered by Dave only.
Dave?
RobThanks,On 24 October 2017 at 16:05, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: On Tue, Oct 24, 2017 at 8:30 PM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: On Tue, Oct 24, 2017 at 8:21 PM, Robert Emery <robertemery@codeweavers.net> wrote:Hello again,
I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent.cpp# L139 or another method that only gives up the connection which is called in the same place as the delete jt.
Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.Nothing wrong in it.My intention was to release the object asap.As per your findings, it won't go in that execution flow in all cases. I will commit your patch.Committed your original patch.-- Thanks, Ashesh
Many Thanks,
RobOn 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:RobThanks,I hope that makes senseHiya,I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:Hi Ashesh,I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.if (job != NULL){delete job;job = NULL;}I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.@Robert - Can you also test at your environment and keep us updated for any leak ?Thanks,Neel PatelOn Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: Hi Rob,How about this?On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU -9SPKRzucz8Bar2CXkEDnCwV6H77Zy A%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.netPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.netPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
On Wed, Oct 25, 2017 at 5:44 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Tue, Oct 24, 2017 at 8:59 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:I have no idea.Awesome thanks.What's the release schedule like for pgagent?That can be answered by Dave only.Dave?
Basically as soon as Neel's work is complete and we can update the packaging etc. and get it tested by Fahar.
RobThanks,On 24 October 2017 at 16:05, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: On Tue, Oct 24, 2017 at 8:30 PM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: On Tue, Oct 24, 2017 at 8:21 PM, Robert Emery <robertemery@codeweavers.net> wrote:Hello again,
I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent.cpp# L139 or another method that only gives up the connection which is called in the same place as the delete jt.
Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.Nothing wrong in it.My intention was to release the object asap.As per your findings, it won't go in that execution flow in all cases. I will commit your patch.Committed your original patch.-- Thanks, Ashesh
Many Thanks,
RobOn 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:RobThanks,I hope that makes senseHiya,I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:Hi Ashesh,I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.if (job != NULL){delete job;job = NULL;}I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.@Robert - Can you also test at your environment and keep us updated for any leak ?Thanks,Neel PatelOn Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote: Hi Rob,How about this?On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:Hi,
Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU -9SPKRzucz8Bar2CXkEDnCwV6H77Zy A%40mail.gmail.com
I think I've identified and fixed the issue, please see the patch attached.
As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.
The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.
Thanks,
RobPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.netPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
--Robert Emery
Infrastructure Director
E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.netPhone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company