Thread: [pgAgent][Patch] Fixing connection pool leak

[pgAgent][Patch] Fixing connection pool leak

From

Rob Emery

Date:

21 October 2017, 18:06:24

Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers

October

Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Attachment

connection-pool-patch1.diff

Re: [pgAgent][Patch] Fixing connection pool leak

From

Ashesh Vashi

Date:

23 October 2017, 08:00:51

Hi Rob,

How about this?

Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:

Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

Attachment

connection_pool_v2.patch

Re: [pgAgent][Patch] Fixing connection pool leak

From

Neel Patel

Date:

23 October 2017, 10:18:01

Hi Ashesh,

I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.

if (job != NULL)

{

delete job;

job = NULL;

}

I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.

@Robert - Can you also test at your environment and keep us updated for any leak ?

Thanks,

Neel Patel

On Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:

Hi Rob,

How about this?

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

Attachment

connection_pool_v3.patch

Re: [pgAgent][Patch] Fixing connection pool leak

From

Rob Emery

Date:

23 October 2017, 10:29:08

Hiya,

That seems absolutely fine. I've tested the patch I had over the weekend, the problem is pretty easy to reproduce with a job that runs every minute and 4 instances of pgagent.

I'll give this version a test as well, but I'm confident it should be fine.

Thanks,

Rob

On 23 October 2017 at 06:00, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:

Hi Rob,

How about this?

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers

October

Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Re: [pgAgent][Patch] Fixing connection pool leak

From

Rob Emery

Date:

23 October 2017, 15:10:54

Hiya,

Actually, on reflection (I've not tested yet) but I don't think that'll actually work. Looking at pgAgent.cpp:119, Create and Run are only executed if it's Runnable. I've never used WXWidgets before so I'm not sure but I would guess that Create and Run execute the Entry() method on the new thread that's build. For that delete to be executed we'd need to move this outside of the if?

I'll test and confirm this evening

Thanks,

Rob

On 23 October 2017 at 06:00, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:

Hi Rob,

How about this?

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers

October

Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Re: [pgAgent][Patch] Fixing connection pool leak

From

Rob Emery

Date:

23 October 2017, 22:09:04

Hiya,

I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.

To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"

This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.

I hope that makes sense

Thanks,

Rob

On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:

Hi Ashesh,

I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.

if (job != NULL)
{
delete job;
job = NULL;
}

I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.

@Robert - Can you also test at your environment and keep us updated for any leak ?

Thanks,
Neel Patel

On Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
Hi Rob,

How about this?

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers

October

Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Re: [pgAgent][Patch] Fixing connection pool leak

From

Rob Emery

Date:

24 October 2017, 17:52:53

Hello again,

I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent.cpp#L139 or another method that only gives up the connection which is called in the same place as the delete jt.

Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.

Many Thanks,
Rob

On 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:

Hiya,

I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.

To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"

This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.

I hope that makes sense

Thanks,
Rob

On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:
Hi Ashesh,

I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.

if (job != NULL)
{
delete job;
job = NULL;
}

I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.

@Robert - Can you also test at your environment and keep us updated for any leak ?

Thanks,
Neel Patel

On Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
Hi Rob,

How about this?

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

--

Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers

October

Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Re: [pgAgent][Patch] Fixing connection pool leak

From

Ashesh Vashi

Date:

24 October 2017, 18:00:37

On Tue, Oct 24, 2017 at 8:21 PM, Robert Emery <robertemery@codeweavers.net> wrote:

Hello again,

I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent.cpp#L139 or another method that only gives up the connection which is called in the same place as the delete jt.

Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.

Nothing wrong in it.

My intention was to release the object asap.

As per your findings, it won't go in that execution flow in all cases. I will commit your patch.

-- Thanks, Ashesh

Many Thanks,
Rob

On 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hiya,

I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.

To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"

This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.

I hope that makes sense

Thanks,
Rob

On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:
Hi Ashesh,

I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.

if (job != NULL)
{
delete job;
job = NULL;
}

I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.

@Robert - Can you also test at your environment and keep us updated for any leak ?

Thanks,
Neel Patel

On Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
Hi Rob,

How about this?

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888   Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888   Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

Re: [pgAgent][Patch] Fixing connection pool leak

From

Ashesh Vashi

Date:

24 October 2017, 18:05:16

On Tue, Oct 24, 2017 at 8:30 PM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:

On Tue, Oct 24, 2017 at 8:21 PM, Robert Emery <robertemery@codeweavers.net> wrote:
Hello again,

I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent.cpp#L139 or another method that only gives up the connection which is called in the same place as the delete jt.

Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.
Nothing wrong in it.
My intention was to release the object asap.

As per your findings, it won't go in that execution flow in all cases. I will commit your patch.

Committed your original patch.

Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

-- Thanks, Ashesh

Many Thanks,
Rob

On 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hiya,

I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.

To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"

This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.

I hope that makes sense

Thanks,
Rob

On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:
Hi Ashesh,

I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.

if (job != NULL)
{
delete job;
job = NULL;
}

I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.

@Robert - Can you also test at your environment and keep us updated for any leak ?

Thanks,
Neel Patel

On Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
Hi Rob,

How about this?

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888   Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888   Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

Re: [pgAgent][Patch] Fixing connection pool leak

From

Rob Emery

Date:

24 October 2017, 18:29:45

Awesome thanks.

What's the release schedule like for pgagent?

Thanks,

Rob

On 24 October 2017 at 16:05, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:

On Tue, Oct 24, 2017 at 8:30 PM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Tue, Oct 24, 2017 at 8:21 PM, Robert Emery <robertemery@codeweavers.net> wrote:
Hello again,

I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent.cpp#L139 or another method that only gives up the connection which is called in the same place as the delete jt.

Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.
Nothing wrong in it.
My intention was to release the object asap.

As per your findings, it won't go in that execution flow in all cases. I will commit your patch.
Committed your original patch.

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

-- Thanks, Ashesh

Many Thanks,
Rob

On 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hiya,

I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.

To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"

This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.

I hope that makes sense

Thanks,
Rob

On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:
Hi Ashesh,

I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.

if (job != NULL)
{
delete job;
job = NULL;
}

I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.

@Robert - Can you also test at your environment and keep us updated for any leak ?

Thanks,
Neel Patel

On Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
Hi Rob,

How about this?

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888 Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers

October

Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Re: [pgAgent][Patch] Fixing connection pool leak

From

Ashesh Vashi

Date:

25 October 2017, 06:44:34

On Tue, Oct 24, 2017 at 8:59 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:

Awesome thanks.

What's the release schedule like for pgagent?

I have no idea.

That can be answered by Dave only.

Dave?

Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

Thanks,
Rob

On 24 October 2017 at 16:05, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Tue, Oct 24, 2017 at 8:30 PM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Tue, Oct 24, 2017 at 8:21 PM, Robert Emery <robertemery@codeweavers.net> wrote:
Hello again,

I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent.cpp#L139 or another method that only gives up the connection which is called in the same place as the delete jt.

Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.
Nothing wrong in it.
My intention was to release the object asap.

As per your findings, it won't go in that execution flow in all cases. I will commit your patch.
Committed your original patch.

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

-- Thanks, Ashesh

Many Thanks,
Rob

On 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hiya,

I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.

To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"

This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.

I hope that makes sense

Thanks,
Rob

On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:
Hi Ashesh,

I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.

if (job != NULL)
{
delete job;
job = NULL;
}

I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.

@Robert - Can you also test at your environment and keep us updated for any leak ?

Thanks,
Neel Patel

On Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
Hi Rob,

How about this?

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888   Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888   Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888   Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

Re: [pgAgent][Patch] Fixing connection pool leak

From

Dave Page

Date:

25 October 2017, 14:52:29

On Wed, Oct 25, 2017 at 5:44 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:

On Tue, Oct 24, 2017 at 8:59 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Awesome thanks.

What's the release schedule like for pgagent?
I have no idea.
That can be answered by Dave only.

Dave?

Basically as soon as Neel's work is complete and we can update the packaging etc. and get it tested by Fahar.

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

Thanks,
Rob

On 24 October 2017 at 16:05, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Tue, Oct 24, 2017 at 8:30 PM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Tue, Oct 24, 2017 at 8:21 PM, Robert Emery <robertemery@codeweavers.net> wrote:
Hello again,

I can't see any other way of solving this problem without either exposing a method on JobThread to delete the Job owned by the thread which is called by https://github.com/postgres/pgagent/blob/master/pgAgent.cpp#L139 or another method that only gives up the connection which is called in the same place as the delete jt.

Is there a problem with chaining the ~JobThread() and ~Job() together as I suggested previously? This does solve the leak.
Nothing wrong in it.
My intention was to release the object asap.

As per your findings, it won't go in that execution flow in all cases. I will commit your patch.
Committed your original patch.

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

-- Thanks, Ashesh

Many Thanks,
Rob

On 23 October 2017 at 20:09, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hiya,

I've just confirmed neither of these patches resolve the issue, it appears in the error case I'm experiencing the JobThread::Entry() never executes.

To clarify I'm talking about the path entering the else in pgAgent.cpp:125 which will save into pga_joblog with jlgstatus = "i"

This path explicitly deletes jt however deleting jt doesn't clean up the connection, hence why I cascaded the delete in the original patch.

I hope that makes sense

Thanks,
Rob

On 23 October 2017 at 08:18, Neel Patel <neel.patel@enterprisedb.com> wrote:
Hi Ashesh,

I just added condition before delete the job otherwise it looks good. Correct me if I am wrong.

if (job != NULL)
{
delete job;
job = NULL;
}

I have created two instances of pgagent on database cluster. As of now not able to see any leak and keep you updated if found anything.

@Robert - Can you also test at your environment and keep us updated for any leak ?

Thanks,
Neel Patel

On Mon, Oct 23, 2017 at 10:30 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
Hi Rob,

How about this?

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company

http://www.linkedin.com/in/asheshvashi

On Sat, Oct 21, 2017 at 8:36 PM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi,

Following on from https://www.postgresql.org/message-id/CA%2BOCxoz4tONxSpd1rdU-9SPKRzucz8Bar2CXkEDnCwV6H77ZyA%40mail.gmail.com

I think I've identified and fixed the issue, please see the patch attached.

As I understand it when there are multiple pgagent instances and they clash executing a job (i.e rc != 1 on job.cpp:38), the loser of the conflict's thread will never be executed (i.e. job.cpp:418 JobThread::Entry), which is responsible for deleting the job owned by the thread, meaning that the connection is never returned to the pool. By moving the delete of the job into the destructor, we can assure that the connection is tidied up in both cases as the thread is deleted in the error case explicitly in pgAgent.cpp:185.

The only possibly unintended difference that I can see with doing this is that the log "Completed job: %s" is now output when before it wasn't, however I think this new behaviour is actually correct as the job object is completed at that time.

Thanks,
Rob

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888   Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888   Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net

Codeweavers
October
Newsletter l Auto Trader extends partnership with Codeweavers

What are Codeweavers doing to gear up for GDPR?

Phone: 0800 021 0888   Email: contactus@codeweavers.net
Codeweavers Ltd | Barn 4 | Dunston Business Village | Dunston | ST18 9AB
Registered in England and Wales No. 04092394 | VAT registration no. 974 9705 63

Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company