Thread: pgAgent STDERR and Time Zone Questions
Howdy,

I have a few questions about pgAgent, which I’ve just set up and am using at work.

• I’m using -s to send errors to a log file. However, it only seems to log STDOUT, not STDERR. Is that right? Do I need to modify my batch scripts to 2>&1 to get STDERR output?

• When scheduling execution times in pgAdmin, what time zone does it use? And what time zone does pgagent use to determine when it's time to run a job?

And on a side note: What are the chances the pgAgent daemon could be modified so that it does not use the wx library? Seems kind of silly to include a GUI library in a daemon, though I don’t doubt that it makes Windows compatibility easier.

Thanks!

David
On Dec 16, 2011, at 9:47 AM, David E. Wheeler wrote:

> I have a few questions about pgAgent, which I’ve just set up and am using at work.

Oh, one other question: If the exit code of an executed batch script is not 0, why is its status "Successful" in pgAdmin? Screen shot:

Thanks,
David
Attachment
On Dec 16, 2011, at 10:22 AM, David E. Wheeler wrote:

> Oh, one other question: If the exit code of an executed batch script is not 0, why is its status "Successful" in pgAdmin? Screen shot:

FYI, I have "On error" set to "Fail".

David
On Fri, Dec 16, 2011 at 5:47 PM, David E. Wheeler <david@justatheory.com> wrote:
> Howdy,
>
> I have a few questions about pgAgent, which I’ve just set up and am using at work.
>
> • I’m using -s to send errors to a log file. However, it only seems to log STDOUT, not STDERR. Is that right? Do I need to modify my batch scripts to 2>&1 to get STDERR output?

All of the internal logging goes to the file if specified, or stdout, with the exception of a couple of serious errors, like "failed to open the logfile". wxWidgets might be writing some of its own logging output to stderr though, I guess.

> • When scheduling execution times in pgAdmin, what time zone does it use?

It doesn't, as you don't typically specify absolute dates in a schedule (except sometimes for exceptions and start/end dates; then it'll default to whatever the connection will use).

> And what time zone does pgagent use to determine when it's time to run a job?

Whatever the default is for the client that causes the execution time to be recalculated.

In truth though, when this was originally written I don't think anyone really considered different time zones. Patches are welcome if you want to improve it.

> And on a side note: What are the chance the pgAgent daemon could be modified so that it does not use the wx library. Seems kind of silly to include a GUI library in a daemon, though I don’t doubt that it makes windows compatibility easier.

wxWidgets isn't just a GUI library. It consists of various individual libraries offering different features, some of which are GUI related, and some are not. pgAgent doesn't use any of the GUI ones, so there's no requirement to install any GUI components on your server (unless your chosen packaging for wxWidgets doesn't separate the GUI and non-GUI parts, of course).

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Dec 16, 2011 at 6:27 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 16, 2011, at 10:22 AM, David E. Wheeler wrote:
>
>> Oh, one other question: If the exit code of an executed batch script is not 0, why is its status "Successful" in pgAdmin? Screen shot:
>
> FYI, I have "On error" set to "Fail".

What's actually in pgagent.pga_jobsteplog?

--
Dave Page
On Dec 19, 2011, at 1:51 AM, Dave Page wrote:

> All of the internal logging goes to the file if specified, or stdout,
> with the exception of a couple of serious errors, like "failed to open
> the logfile". wxWidgets might be writing some of its own logging
> output to stderr though I guess.

No, I mean that STDERR output from my batch scripts does not appear to be logged anywhere. Is there any way for pgadmin to capture output from batch scripts and log it somewhere, or do they need to do their own logging?

Would be handy for such output to appear somewhere in the job history…

>> • When scheduling execution times in pgAdmin, what time zone does it use?
>
> It doesn't, as you don't typically specify absolute dates in schedule
> (except sometimes for exceptions and
> start/end dates - then it'll default to whatever the connection will use)
>
>> And what time zone does pgagent use to determine when it's time to run a job?
>
> Whatever the default is for the client that causes the execution time
> to be recalculated.
>
> In truth though, when this was originally written I don't think anyone
> really considered different timezones. Patches are welcome if you want
> to improve.

Yes, because consider this: if I schedule a job for 1:06 am in pgAdmin on my laptop, which is running under America/Los_Angeles, PostgreSQL is running with its time zone set to UTC, and the system where pgagent is running is on America/New_York, at which of these times will it execute?

a. 1:06 UTC
b. 1:06 America/New_York
c. 1:06 America/Los_Angeles

From what you’re telling me, it sounds like the answer is b. If so, I think it might be sufficient for the UI to say something like “Jobs will be run at the specified time(s) in the time zone under which the client (pgagent) runs.”

>> And on a side note: What are the chance the pgAgent daemon could be modified so that it does not use the wx library. Seems kind of silly to include a GUI library in a daemon, though I don’t doubt that it makes windows compatibility easier.
>
> wxWidgets isn't just a GUI library - it consists of various individual
> libraries offering different features, some of which are GUI related,
> and some are not. pgAgent doesn't use any of the GUI ones, so there's
> no requirement to install any GUI components on your server (unless
> your chosen packaging for wxWidgets doesn't separate the GUI and
> non-GUI parts of course).

Okay. Pity it’s not 64-bit yet, though.

Best,

David
On Dec 19, 2011, at 1:52 AM, Dave Page wrote:
>> FYI, I have "On error" set to "Fail".
>
> What's actually in pgagent.pga_jobsteplog?

postgres=# select * from pgagent.pga_jobsteplog where jslresult <> 0;
jslid | jsljlgid | jsljstid | jslstatus | jslresult | jslstart | jslduration | jsloutput
-------+----------+----------+-----------+-----------+-------------------------------+-----------------+-----------
1 | 1 | 1 | s | 127 | 2011-12-15 16:49:25.038705-08 | 00:00:00.024641 |
2 | 2 | 1 | s | 15 | 2011-12-15 17:23:03.637553-08 | 00:00:02.496322 |
3 | 3 | 1 | s | 15 | 2011-12-15 17:25:58.995506-08 | 00:00:00.281899 |
6 | 6 | 3 | s | 5 | 2011-12-16 10:46:46.047913-08 | 00:00:00.265507 |
10 | 10 | 7 | s | 5 | 2011-12-16 15:31:34.899431-08 | 00:00:00.071114 |
11 | 11 | 7 | s | 5 | 2011-12-16 15:38:30.729017-08 | 00:00:00.071268 |
12 | 12 | 7 | s | 5 | 2011-12-16 15:39:00.797969-08 | 00:00:00.073382 |
(7 rows)
Best,
David
On Mon, Dec 19, 2011 at 5:50 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 19, 2011, at 1:51 AM, Dave Page wrote:
>
>> All of the internal logging goes to the file if specified, or stdout,
>> with the exception of a couple of serious errors, like "failed to open
>> the logfile". wxWidgets might be writing some of its own logging
>> output to stderr though I guess.
>
> No, I mean that STDERR output from my batch scripts does not appear to be logged anywhere. Is there any way for pgadmin to capture output from batch scripts and log it somewhere, or do they need to do their own logging?
>
> Would be handy for such output to appear somewhere in the job history…

Oh, OK. We use popen() to execute the task on *nix, and that only reads STDOUT, so you should handle redirection in your script. I did spend some time looking at this a few years back as I recall, and didn't find a fix I was happy with.

>> wxWidgets isn't just a GUI library - it consists of various individual
>> libraries offering different features, some of which are GUI related,
>> and some are not. pgAgent doesn't use any of the GUI ones, so there's
>> no requirement to install any GUI components on your server (unless
>> your chosen packaging for wxWidgets doesn't separate the GUI and
>> non-GUI parts of course).
>
> Okay. Pity it’s not 64-bit yet, though.

It is on platforms other than Mac (which will add 64-bit support in wxWidgets 3.0).

--
Dave Page
On Mon, Dec 19, 2011 at 5:54 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 19, 2011, at 1:52 AM, Dave Page wrote:
>
>>> FYI, I have "On error" set to "Fail".
>>
>> What's actually in pgagent.pga_jobsteplog?
>
> postgres=# select * from pgagent.pga_jobsteplog where jslresult <> 0;

Can you show me that without the where clause please? If there's too much data, please filter by date to exclude lines that aren't on your original screenshot.

Thanks.

--
Dave Page
On Dec 20, 2011, at 12:12 AM, Dave Page wrote:

>> Would be handy for such output to appear somewhere in the job history…
>
> Oh, OK. We use popen() to execute the task on *nix, and that only
> reads STDOUT, so you should handle redirection in your script. I did
> spend some time look at this a few years back as I recall, and didn't
> find a fix I was happy with.

So it will pick up the output to STDOUT? That’s fine, I think I can tweak the scripts to send STDERR to STDOUT.

>> Okay. Pity it’s not 64-bit yet, though.
>
> It is on platforms other than Mac (which will add 64bit support in
> wxWidgets 3.0).

Oh, good news, other platforms can’t be too far behind, right?

Best,

David
On Dec 20, 2011, at 12:14 AM, Dave Page wrote:
>> postgres=# select * from pgagent.pga_jobsteplog where jslresult <> 0;
>
> Can you show me that without the where clause please? If there's too
> much data, please filter by date to exclude lines that aren't on your
> original screenshot.

postgres=# select * from pgagent.pga_jobsteplog;
jslid | jsljlgid | jsljstid | jslstatus | jslresult | jslstart | jslduration | jsloutput
-------+----------+----------+-----------+-----------+-------------------------------+-----------------+-----------
1 | 1 | 1 | s | 127 | 2011-12-15 16:49:25.038705-08 | 00:00:00.024641 |
2 | 2 | 1 | s | 15 | 2011-12-15 17:23:03.637553-08 | 00:00:02.496322 |
3 | 3 | 1 | s | 15 | 2011-12-15 17:25:58.995506-08 | 00:00:00.281899 |
4 | 4 | 1 | s | 0 | 2011-12-15 17:35:00.074975-08 | 00:00:48.599992 |
5 | 5 | 1 | s | 0 | 2011-12-16 01:06:00.771713-08 | 00:00:49.709092 |
6 | 6 | 3 | s | 5 | 2011-12-16 10:46:46.047913-08 | 00:00:00.265507 |
7 | 7 | 3 | s | 0 | 2011-12-16 10:50:31.504398-08 | 00:00:00.490332 |
9 | 9 | 7 | s | 0 | 2011-12-16 15:28:24.512434-08 | 00:00:00.839441 |
10 | 10 | 7 | s | 5 | 2011-12-16 15:31:34.899431-08 | 00:00:00.071114 |
11 | 11 | 7 | s | 5 | 2011-12-16 15:38:30.729017-08 | 00:00:00.071268 |
12 | 12 | 7 | s | 5 | 2011-12-16 15:39:00.797969-08 | 00:00:00.073382 |
13 | 13 | 7 | s | 0 | 2011-12-16 15:39:55.914821-08 | 00:00:00.265437 |
14 | 14 | 1 | s | 0 | 2011-12-17 01:06:03.390182-08 | 00:01:00.365723 |
15 | 15 | 1 | s | 0 | 2011-12-18 01:06:04.226371-08 | 00:01:01.670605 |
16 | 16 | 1 | s | 0 | 2011-12-19 01:06:04.86558-08 | 00:01:02.095472 |
17 | 17 | 1 | s | 0 | 2011-12-20 01:06:00.534551-08 | 00:12:44.113144 |
(16 rows)
Best,
David
On Wed, Dec 21, 2011 at 2:00 AM, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 20, 2011, at 12:12 AM, Dave Page wrote:
>
>>> Would be handy for such output to appear somewhere in the job history…
>>
>> Oh, OK. We use popen() to execute the task on *nix, and that only
>> reads STDOUT, so you should handle redirection in your script. I did
>> spend some time look at this a few years back as I recall, and didn't
>> find a fix I was happy with.
>
> So it will pick up the output to STDOUT? That’s fine, I think I can tweak the scripts to send STDERR to STDOUT.

Yup, that should work.

>>> Okay. Pity it’s not 64-bit yet, though.
>>
>> It is on platforms other than Mac (which will add 64bit support in
>> wxWidgets 3.0).
>
> Oh, good news, other platforms can’t be too far behind, right?

I think you misread. As far as I know, it's *only* Mac OS X which isn't supported by wxWidgets in 64-bit builds. Windows, Linux, Solaris and HP-UX at least are all supported.

--
Dave Page
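
Editor's note: a minimal sketch of the redirection discussed above, assuming the batch step is a bash script. The function name `run_step` and the echoed messages are hypothetical and only illustrate the idea: after `exec 2>&1`, anything written to STDERR travels over STDOUT, which, per this thread, is the only stream read through popen().

```shell
#!/bin/bash
# Hypothetical batch step body, wrapped in a function for illustration.
run_step() {
    (
        # From here on, send STDERR to wherever STDOUT currently points.
        exec 2>&1
        echo "normal output"
        echo "error output" >&2
    )
}

run_step
```

Placing `exec 2>&1` near the top of a script covers every later command, so individual commands do not each need their own `2>&1`.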
On Wed, Dec 21, 2011 at 2:02 AM, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 20, 2011, at 12:14 AM, Dave Page wrote:
>
>>> postgres=# select * from pgagent.pga_jobsteplog where jslresult <> 0;
>>
>> Can you show me that without the where clause please? If there's too
>> much data, please filter by date to exclude lines that aren't on your
>> original screenshot.
>
> postgres=# select * from pgagent.pga_jobsteplog;
> jslid | jsljlgid | jsljstid | jslstatus | jslresult | jslstart | jslduration | jsloutput
> -------+----------+----------+-----------+-----------+-------------------------------+-----------------+-----------
> 1 | 1 | 1 | s | 127 | 2011-12-15 16:49:25.038705-08 | 00:00:00.024641 |
> 2 | 2 | 1 | s | 15 | 2011-12-15 17:23:03.637553-08 | 00:00:02.496322 |
> 3 | 3 | 1 | s | 15 | 2011-12-15 17:25:58.995506-08 | 00:00:00.281899 |
> 4 | 4 | 1 | s | 0 | 2011-12-15 17:35:00.074975-08 | 00:00:48.599992 |
> 5 | 5 | 1 | s | 0 | 2011-12-16 01:06:00.771713-08 | 00:00:49.709092 |
> 6 | 6 | 3 | s | 5 | 2011-12-16 10:46:46.047913-08 | 00:00:00.265507 |
> 7 | 7 | 3 | s | 0 | 2011-12-16 10:50:31.504398-08 | 00:00:00.490332 |
> 9 | 9 | 7 | s | 0 | 2011-12-16 15:28:24.512434-08 | 00:00:00.839441 |
> 10 | 10 | 7 | s | 5 | 2011-12-16 15:31:34.899431-08 | 00:00:00.071114 |
> 11 | 11 | 7 | s | 5 | 2011-12-16 15:38:30.729017-08 | 00:00:00.071268 |
> 12 | 12 | 7 | s | 5 | 2011-12-16 15:39:00.797969-08 | 00:00:00.073382 |
> 13 | 13 | 7 | s | 0 | 2011-12-16 15:39:55.914821-08 | 00:00:00.265437 |
> 14 | 14 | 1 | s | 0 | 2011-12-17 01:06:03.390182-08 | 00:01:00.365723 |
> 15 | 15 | 1 | s | 0 | 2011-12-18 01:06:04.226371-08 | 00:01:01.670605 |
> 16 | 16 | 1 | s | 0 | 2011-12-19 01:06:04.86558-08 | 00:01:02.095472 |
> 17 | 17 | 1 | s | 0 | 2011-12-20 01:06:00.534551-08 | 00:12:44.113144 |
> (16 rows)

Thanks. The logic used to determine the success or failure of a step is pretty simple:

    if (rc == 0)
        stepstatus = wxT("s");
    else
        stepstatus = steps->GetString(wxT("jstonerror"));

Which is the reason why you see 0's show up as "success". The non-zero exit codes will show the same status as the pgagent.pga_jobstep.jstonerror column holds for the step in question.

--
Dave Page
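
Editor's note: the rule quoted above can be sketched in shell for readers who want to reason about the expected `jslstatus` of a given row. This is only an illustration of the stated rule, not pgAgent's code; the function name `step_status` and its argument order are hypothetical.

```shell
#!/bin/bash
# step_status RC ONERROR
# Sketch of the quoted rule: exit code 0 yields "s"; any other exit code
# yields whatever the step's jstonerror value is (for example f, s or i,
# the values that appear in the logs in this thread).
step_status() {
    local rc=$1 onerror=$2
    if [ "$rc" -eq 0 ]; then
        echo "s"
    else
        echo "$onerror"
    fi
}

step_status 0 f
step_status 127 f
step_status 127 i
```

Under this rule, a row with a non-zero `jslresult` and `jslstatus` of `s` is only expected when the step's `jstonerror` is itself `s`.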
On Dec 21, 2011, at 1:50 AM, Dave Page wrote:

>> Oh, good news, other platforms can’t be too far behind, right?
>
> I think you misread. As far as I know, it's *only* Mac OS X which
> isn't supported by wxWidgets in 64 bit builds. Windows, Linux, Solaris
> and HP-UX at least are all supported.

Oh. Devrim told me that Wx was 32-bit only. Devrim, is it possible to update the RPMs to use the 64-bit Wx?

Thanks,

David
On Dec 21, 2011, at 1:57 AM, Dave Page wrote:

> Thanks. The logic used to determine the success or failure of a step
> is pretty simple:
>
>     if (rc == 0)
>         stepstatus = wxT("s");
>     else
>         stepstatus = steps->GetString(wxT("jstonerror"));
>
> Which is the reason why you see 0's show up as "success". The non-zero
> exit codes will show the same status as the
> pgagent.pga_jobstep.jstonerror column holds for the step in question.

The problem is not that 0 shows up as "Success", which I expect. The problem is that the non-0 does, too. This is in the Step Statistics view:

Note that Run 1, with an exit code of 127, is listed as "Successful". That does not seem right to me. The record for that row is:
postgres=# select * from pgagent.pga_jobsteplog where jslid = 1;
-[ RECORD 1 ]------------------------------
jslid | 1
jsljlgid | 1
jsljstid | 1
jslstatus | s
jslresult | 127
jslstart | 2011-12-15 16:49:25.038705-08
jslduration | 00:00:00.024641
jsloutput |
Which makes me think that either `rc` is not what gets put into jslresult, or that `steps->GetString(wxT("jstonerror"))` is returning "s", too. Or that I’m completely misunderstanding something, of course. :-)
Best,
David
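
Editor's note: for context on the exit code in that record, 127 is the status a POSIX shell reports when it cannot find the command it was asked to run. A small sketch, assuming no command named `no_such_command_xyz` exists on the PATH (the command and function names are hypothetical):

```shell
#!/bin/bash
# Print the exit status the shell reports for a command it cannot find.
missing_command_status() {
    # Run in a subshell and discard the "command not found" message.
    ( no_such_command_xyz ) 2>/dev/null
    echo $?
}

missing_command_status
```

Whether that is what happened in Run 1 cannot be determined from the thread; the sketch only shows where a 127 typically comes from.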
Attachment
On Wed, Dec 21, 2011 at 10:08 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 21, 2011, at 1:57 AM, Dave Page wrote:
>
>> Thanks. The logic used to determine the success or failure of a step
>> is pretty simple:
>>
>>     if (rc == 0)
>>         stepstatus = wxT("s");
>>     else
>>         stepstatus = steps->GetString(wxT("jstonerror"));
>>
>> Which is the reason why you see 0's show up as "success". The non-zero
>> exit codes will show the same status as the
>> pgagent.pga_jobstep.jstonerror column holds for the step in question.
>
> The problem is not that 0 shows up as "Success", which I expect. The problem is that the non-0 does, too. This is in the Step Statistics view:
>
> Note that Run 1, with an exit code of 127, is listed as "Successful". That does not seem right to me. The record for that row is:
>
> postgres=# select * from pgagent.pga_jobsteplog where jslid = 1;
> -[ RECORD 1 ]------------------------------
> jslid | 1
> jsljlgid | 1
> jsljstid | 1
> jslstatus | s
> jslresult | 127
> jslstart | 2011-12-15 16:49:25.038705-08
> jslduration | 00:00:00.024641
> jsloutput |
>
> Which makes me think that either `rc` is not what gets put into jslresult, or that `steps->GetString(wxT("jstonerror"))` is returning "s", too. Or that I’m completely misunderstanding something, of course. :-)

What's in pgagent.pga_jobstep where jstid in (1, 3, 7)?
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
On Dec 22, 2011, at 1:44 AM, Dave Page wrote:

>> Which makes me think that either `rc` is not what gets put into jslresult, or that `steps->GetString(wxT("jstonerror"))` is returning "s", too. Or that I’m completely misunderstanding something, of course. :-)
>
> What's in pgagent.pga_jobstep where jstid in (1, 3, 7)?

postgres=# select * from pgagent.pga_jobstep where jstid IN (1, 3, 7);
-[ RECORD 1 ]--------------------------------------------------------------------------
jstid | 1
jstjobid | 1
jstname | Daily Liberation
jstdesc |
jstenabled | t
jstkind | b
jstcode | analytics-datamart/bin/daily_liberator
jstconnstr |
jstdbname |
jstonerror | f
jscnextrun | [null]
-[ RECORD 2 ]--------------------------------------------------------------------------
jstid | 3
jstjobid | 2
jstname | Liberate subscriber_evid_override
jstdesc |
jstenabled | t
jstkind | b
jstcode | analytics-datamart/bin/liberate_once proreporting subscriber_evid_override
jstconnstr |
jstdbname |
jstonerror | f
jscnextrun | [null]
-[ RECORD 3 ]--------------------------------------------------------------------------
jstid | 7
jstjobid | 4
jstname | Copy flat_evid_item
jstdesc |
jstenabled | t
jstkind | b
jstcode | analytics-datamart/bin/liberate_once proreporting flat_evid_type
jstconnstr |
jstdbname |
jstonerror | f
jscnextrun | [null]

Best,

David
On Thu, Dec 22, 2011 at 5:44 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 22, 2011, at 1:44 AM, Dave Page wrote:
>
>>> Which makes me think that either `rc` is not what gets put into jslresult, or that `steps->GetString(wxT("jstonerror"))` is returning "s", too. Or that I’m completely misunderstanding something, of course. :-)
>>
>> What's in pgagent.pga_jobstep where jstid in (1, 3, 7)?
>
> postgres=# select * from pgagent.pga_jobstep where jstid IN (1, 3, 7);
> -[ RECORD 1 ]--------------------------------------------------------------------------
> jstid | 1
> jstjobid | 1
> jstname | Daily Liberation
> jstdesc |
> jstenabled | t
> jstkind | b
> jstcode | analytics-datamart/bin/daily_liberator
> jstconnstr |
> jstdbname |
> jstonerror | f
> jscnextrun | [null]
> -[ RECORD 2 ]--------------------------------------------------------------------------
> jstid | 3
> jstjobid | 2
> jstname | Liberate subscriber_evid_override
> jstdesc |
> jstenabled | t
> jstkind | b
> jstcode | analytics-datamart/bin/liberate_once proreporting subscriber_evid_override
> jstconnstr |
> jstdbname |
> jstonerror | f
> jscnextrun | [null]
> -[ RECORD 3 ]--------------------------------------------------------------------------
> jstid | 7
> jstjobid | 4
> jstname | Copy flat_evid_item
> jstdesc |
> jstenabled | t
> jstkind | b
> jstcode | analytics-datamart/bin/liberate_once proreporting flat_evid_type
> jstconnstr |
> jstdbname |
> jstonerror | f
> jscnextrun | [null]

Hmm, that looks like it should work. I've just run some tests here, and can't find anything wrong. Successes are reported as such, and non-zero return values are reported as whatever the "on error" setting says they should be. Subsequent steps in a job are either processed or skipped correctly based on the return value and on error setting of the previous step(s).

postgres=# select * from pgagent.pga_jobsteplog;
jslid | jsljlgid | jsljstid | jslstatus | jslresult | jslstart | jslduration | jsloutput
-------+----------+----------+-----------+-----------+-------------------------------+-----------------+------------------------
1 | 1 | 1 | s | 127 | 2011-12-23 10:14:37.108296+00 | 00:00:00.182375 |
2 | 2 | 1 | s | 127 | 2011-12-23 10:15:02.155214+00 | 00:00:00.052011 |
3 | 3 | 1 | f | 127 | 2011-12-23 10:16:02.334791+00 | 00:00:00.0732 |
4 | 4 | 1 | f | 127 | 2011-12-23 10:17:02.514272+00 | 00:00:00.113905 |
5 | 5 | 1 | f | 127 | 2011-12-23 10:18:02.645482+00 | 00:00:00.087998 | This will be an error.
6 | 6 | 1 | f | 127 | 2011-12-23 10:19:02.780288+00 | 00:00:00.08348 | This will be an error.
7 | 7 | 1 | i | 127 | 2011-12-23 10:20:02.903824+00 | 00:00:00.201093 | This will be an error.
8 | 7 | 3 | s | 0 | 2011-12-23 10:20:03.106797+00 | 00:00:00.004515 | This will be a success
9 | 8 | 1 | i | 127 | 2011-12-23 10:21:03.088051+00 | 00:00:00.006627 | This will be an error.
10 | 8 | 3 | s | 0 | 2011-12-23 10:21:03.096869+00 | 00:00:00.006623 | This will be a success
(10 rows)

Is it possible you've got the config right now, but the logs you posted were from a different config?

--
Dave Page
On Dec 23, 2011, at 2:25 AM, Dave Page wrote:

> Hmm, that looks like it should work. I've just run some tests here,
> and can't find anything wrong. Successes are reported as such, and
> non-zero return values are reported as whatever the "on error" setting
> says they should be. Subsequent steps in a job are either processed or
> skipped correctly based on the return value and on error setting of
> the previous step(s).
>
> postgres=# select * from pgagent.pga_jobsteplog;
> jslid | jsljlgid | jsljstid | jslstatus | jslresult | jslstart | jslduration | jsloutput
> -------+----------+----------+-----------+-----------+-------------------------------+-----------------+------------------------
> 1 | 1 | 1 | s | 127 | 2011-12-23 10:14:37.108296+00 | 00:00:00.182375 |
> 2 | 2 | 1 | s | 127 | 2011-12-23 10:15:02.155214+00 | 00:00:00.052011 |

Should those 127s be "s"?

> 3 | 3 | 1 | f | 127 | 2011-12-23 10:16:02.334791+00 | 00:00:00.0732 |
> 4 | 4 | 1 | f | 127 | 2011-12-23 10:17:02.514272+00 | 00:00:00.113905 |
> 5 | 5 | 1 | f | 127 | 2011-12-23 10:18:02.645482+00 | 00:00:00.087998 | This will be an error.
> 6 | 6 | 1 | f | 127 | 2011-12-23 10:19:02.780288+00 | 00:00:00.08348 | This will be an error.
> 7 | 7 | 1 | i | 127 | 2011-12-23 10:20:02.903824+00 | 00:00:00.201093 | This will be an error.
> 8 | 7 | 3 | s | 0 | 2011-12-23 10:20:03.106797+00 | 00:00:00.004515 | This will be a success
> 9 | 8 | 1 | i | 127 | 2011-12-23 10:21:03.088051+00 | 00:00:00.006627 | This will be an error.
> 10 | 8 | 3 | s | 0 | 2011-12-23 10:21:03.096869+00 | 00:00:00.006623 | This will be a success
> (10 rows)
>
> Is it possible you've got the config right now, but the logs you
> posted were from a different config?

Haven’t changed anything. I believe it’s the default configuration, as installed by Devrim’s new RPMs. Is there something else I should show you?

Best,

David
On Friday, December 23, 2011, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 23, 2011, at 2:25 AM, Dave Page wrote:
>
>> Hmm, that looks like it should work. I've just run some tests here,
>> and can't find anything wrong. Successes are reported as such, and
>> non-zero return values are reported as whatever the "on error" setting
>> says they should be. Subsequent steps in a job are either processed or
>> skipped correctly based on the return value and on error setting of
>> the previous step(s).
>>
>> postgres=# select * from pgagent.pga_jobsteplog;
>> jslid | jsljlgid | jsljstid | jslstatus | jslresult | jslstart | jslduration | jsloutput
>> -------+----------+----------+-----------+-----------+-------------------------------+-----------------+------------------------
>> 1 | 1 | 1 | s | 127 | 2011-12-23 10:14:37.108296+00 | 00:00:00.182375 |
>> 2 | 2 | 1 | s | 127 | 2011-12-23 10:15:02.155214+00 | 00:00:00.052011 |
>
> Should those 127s be "s"?

No, I had it set to ignore failures then.

>> 3 | 3 | 1 | f | 127 | 2011-12-23 10:16:02.334791+00 | 00:00:00.0732 |
>> 4 | 4 | 1 | f | 127 | 2011-12-23 10:17:02.514272+00 | 00:00:00.113905 |
>> 5 | 5 | 1 | f | 127 | 2011-12-23 10:18:02.645482+00 | 00:00:00.087998 | This will be an error.
>> 6 | 6 | 1 | f | 127 | 2011-12-23 10:19:02.780288+00 | 00:00:00.08348 | This will be an error.
>> 7 | 7 | 1 | i | 127 | 2011-12-23 10:20:02.903824+00 | 00:00:00.201093 | This will be an error.
>> 8 | 7 | 3 | s | 0 | 2011-12-23 10:20:03.106797+00 | 00:00:00.004515 | This will be a success
>> 9 | 8 | 1 | i | 127 | 2011-12-23 10:21:03.088051+00 | 00:00:00.006627 | This will be an error.
>> 10 | 8 | 3 | s | 0 | 2011-12-23 10:21:03.096869+00 | 00:00:00.006623 | This will be a success
>> (10 rows)
>>
>> Is it possible you've got the config right now, but the logs you
>> posted were from a different config?
>
> Haven’t changed anything. I believe it’s the default configuration, as installed by Devrim’s new RPMs. Is there something else I should show you?

Everything looks fine. Care to try some debugging?

--
Dave Page
On Dec 23, 2011, at 9:19 AM, Dave Page wrote:

> Everything looks fine. Care to try some debugging?

Sure. Tell me what to do.

David
On Fri, Dec 23, 2011 at 5:24 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 23, 2011, at 9:19 AM, Dave Page wrote:
>
>> Everything looks fine. Care to try some debugging?
>
> Sure. Tell me what to do.

OK, well I won't try to get you debugging wxWidgets code in GDB as that's just painful, so please build from source having applied the attached patch and then test your jobs with the log level set to DEBUG and capture the output so we can compare it with what ends up in the pga_jobsteplog table. Don't throw away the build env when you're done; we may need to do some more debugging later.

Note that the patch was written against GIT master.

--
Dave Page
Attachment
On Dec 23, 2011, at 9:33 AM, Dave Page wrote:

> OK, well I won't try to get you debugging wxWidgets code in GDB as
> that's just painful, so please build from source having applied the
> attached patch and then test your jobs with the log level set to DEBUG
> and capture the output so we can compare it with what ends up in the
> pga_jobsteplog table. Don't throw away the build env when you're done
> - we may need to do some more debugging later.
>
> Note that the patch was written against GIT master.

Thanks. This will probably have to wait until next week, as I’m a bit stretched between family, travel, and $work.

Best,

David
On Dec 20, 2011, at 6:00 PM, David E. Wheeler wrote:

>> Oh, OK. We use popen() to execute the task on *nix, and that only
>> reads STDOUT, so you should handle redirection in your script. I did
>> spend some time look at this a few years back as I recall, and didn't
>> find a fix I was happy with.
>
> So it will pick up the output to STDOUT? That’s fine, I think I can tweak the scripts to send STDERR to STDOUT.

Unfortunately, it does not seem to work. I have pgagent started like so:

    pgagent -s /var/log/pgagent_91.log hostaddr=127.0.0.1 dbname=postgres user=postgres

/var/log/pgagent_91.log does exist, and has nothing in it at all.

So I created this shell script:

    #!/bin/bash

    echo Hi there
    perl -e 'die "WTF!"' || exit $? 2>&1

I set up a job to run it. The results were:

postgres=# select * from pgagent.pga_jobsteplog where jsljlgid = 46;
-[ RECORD 1 ]------------------------------
jslid | 45
jsljlgid | 46
jsljstid | 13
jslstatus | f
jslresult | -1
jslstart | 2012-01-02 10:59:32.606959-08
jslduration | 00:00:00.01009
jsloutput | Hi there
          | WTF! at -e line 1.
          |

Which is great. However, the /var/log/pgagent_91.log file is still empty. Should I take that to mean that job output is not logged there, but only errors from pgagent itself?

Thanks,

David
On Mon, Jan 2, 2012 at 7:04 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 20, 2011, at 6:00 PM, David E. Wheeler wrote:
>
>>> Oh, OK. We use popen() to execute the task on *nix, and that only
>>> reads STDOUT, so you should handle redirection in your script. I did
>>> spend some time look at this a few years back as I recall, and didn't
>>> find a fix I was happy with.
>>
>> So it will pick up the output to STDOUT? That’s fine, I think I can tweak the scripts to send STDERR to STDOUT.
>
> Unfortunately, it does not seem to work. I have pgagent started like so:
>
>     pgagent -s /var/log/pgagent_91.log hostaddr=127.0.0.1 dbname=postgres user=postgres

You need to increase the log level, e.g.:

    pgagent -l DEBUG2 -s /var/log/pgagent_91.log hostaddr=127.0.0.1 dbname=postgres user=postgres

> /var/log/pgagent_91.log does exist, and has nothing in it at all.
>
> So I created this shell script:
>
>     #!/bin/bash
>
>     echo Hi there
>     perl -e 'die "WTF!"' || exit $? 2>&1
>
> I set up a job to run it. The results were:
>
> postgres=# select * from pgagent.pga_jobsteplog where jsljlgid = 46;
> -[ RECORD 1 ]------------------------------
> jslid | 45
> jsljlgid | 46
> jsljstid | 13
> jslstatus | f
> jslresult | -1
> jslstart | 2012-01-02 10:59:32.606959-08
> jslduration | 00:00:00.01009
> jsloutput | Hi there
>           | WTF! at -e line 1.
>           |
>
> Which is great. However, the /var/log/pgagent_91.log file is still empty. Should I take that to mean that job output is not logged there, but only errors from pgagent itself?

Yes, it's really just a debug log.

--
Dave Page
On Jan 3, 2012, at 1:07 AM, Dave Page wrote:

>> Which is great. However, the /var/log/pgagent_91.log file is still empty. Should I take that to mean that job output is not logged there, but only errors from pgagent itself?
>
> Yes, it's really just a debug log.

Gotcha, thank you.

David
On Dec 28, 2011, at 2:08 PM, David E. Wheeler wrote:

>> OK, well I won't try to get you debugging wxWidgets code in GDB as
>> that's just painful, so please build from source having applied the
>> attached patch and then test your jobs with the log level set to DEBUG
>> and capture the output so we can compare it with what ends up in the
>> pga_jobsteplog table. Don't throw away the build env when you're done
>> - we may need to do some more debugging later.
>>
>> Note that the patch was written against GIT master.
>
> Thanks. This will probably have to wait until next week, as I’m a bit stretched between family, travel, and $work.

Finally got back to this. Naturally, the car works perfectly when it’s in the shop.

I installed the pgrpms version of pgAgent, which Devrim uploaded a few weeks ago.

    http://yum.pgrpms.org/9.1/redhat/rhel-5-i386/pgagent_91-3.0.1-1.rhel5.i386.rpm

With that, I can create a job with a single batch step. All it does is this:

    perl -e 'die "WTF"'

That results in:

    postgres=# select * from pgagent.pga_jobsteplog where jslid = 8;
    -[ RECORD 1 ]-----------------------------
    jslid       | 8
    jsljlgid    | 8
    jsljstid    | 1
    jslstatus   | f
    jslresult   | -1
    jslstart    | 2012-01-10 14:28:26.95867-08
    jslduration | 00:00:00.011809
    jsloutput   |

Which seems fine. Then, with the patched version from SVN, I fire it up and run the same job and get:

    postgres=# select * from pgagent.pga_jobsteplog where jslid = 9;
    -[ RECORD 1 ]------------------------------
    jslid       | 9
    jsljlgid    | 9
    jsljstid    | 1
    jslstatus   | f
    jslresult   | 255
    jslstart    | 2012-01-10 14:34:05.243314-08
    jslduration | 00:00:00.008179
    jsloutput   |

Which is different, but at least still a failure. (Nothing was logged; I ran it with `/usr/bin/pgagent -s /home/dwheeler/pgagent.log -l DEBUG hostaddr=127.0.0.1 dbname=postgres user=postgres`.)

Going back to the box that originally had this problem, which also uses Devrim’s RPM, I created a new test job with exactly the same step as above. There I get:

    postgres=# select * from pgagent.pga_jobsteplog where jsljlgid = 155;
    -[ RECORD 1 ]------------------------------
    jslid       | 177
    jsljlgid    | 155
    jsljstid    | 21
    jslstatus   | s
    jslresult   | 5
    jslstart    | 2012-01-10 22:18:42.995252+00
    jslduration | 00:00:00.013335

Which just seems wrong. Both boxes are running CentOS 5.5 (Linux 2.6.18-194.el5). Might there be a difference in the version of Wx installed or something to account for this?

Thanks,

David
2012/1/10 David E. Wheeler <david@justatheory.com>:
>
> Finally got back to this. Naturally, the car works perfectly when it’s in the shop.

Of course - that's all part of the game :-)

> I installed the pgrpms version of pgAgent, which Devrim uploaded a few weeks ago.
>
>     http://yum.pgrpms.org/9.1/redhat/rhel-5-i386/pgagent_91-3.0.1-1.rhel5.i386.rpm
>
> With that, I can create a job with a single batch step. All it does is this:
>
>     perl -e 'die "WTF"'
>
> That results in:
>
>     postgres=# select * from pgagent.pga_jobsteplog where jslid = 8;
>     -[ RECORD 1 ]-----------------------------
>     jslid       | 8
>     jsljlgid    | 8
>     jsljstid    | 1
>     jslstatus   | f
>     jslresult   | -1
>     jslstart    | 2012-01-10 14:28:26.95867-08
>     jslduration | 00:00:00.011809
>     jsloutput   |
>
> Which seems fine. Then, with the patched version from SVN, I fire it up and run the same job and get:

Yup.

>     postgres=# select * from pgagent.pga_jobsteplog where jslid = 9;
>     -[ RECORD 1 ]------------------------------
>     jslid       | 9
>     jsljlgid    | 9
>     jsljstid    | 1
>     jslstatus   | f
>     jslresult   | 255
>     jslstart    | 2012-01-10 14:34:05.243314-08
>     jslduration | 00:00:00.008179
>     jsloutput   |
>
> Which is different, but at least still a failure. (Nothing was logged; I ran it with `/usr/bin/pgagent -s /home/dwheeler/pgagent.log -l DEBUG hostaddr=127.0.0.1 dbname=postgres user=postgres`.)

Hmm, I suspect the difference in return values is as a result of this:

raptor:pgagent dpage$ git show 2aec3bc473e583f8fa07e133bc8def60ff1c09fe
commit 2aec3bc473e583f8fa07e133bc8def60ff1c09fe
Author: Dave Page <dpage@pgadmin.org>
Date:   Mon Mar 14 08:58:43 2011 +0000

    USe exit status macros to get the Unix exit code, and recognise all
    non-zero return values as failure codes. Per discussion with Korry
    Douglas and Robert Haas.

diff --git a/job.cpp b/job.cpp
index 0611c93..4cb3b8c 100644
--- a/job.cpp
+++ b/job.cpp
@@ -284,8 +284,11 @@ int Job::Execute()
                }

                rc=pclose(fp_script);
-               rc = (unsigned char)(rc >> 8); // The exit code is in the top 8 bits
-               rc = (signed char)rc;
+
+               if (WIFEXITED(rc))
+                       rc = WEXITSTATUS(rc);
+               else
+                       rc = -1;
 #endif

        // Delete the file/directory. If we fail, don't overwrite the script output in the log, just throw warnings.
@@ -313,7 +316,7 @@ int Job::Execute()
        }

        wxString stepstatus;
-       if (rc >= 0)
+       if (rc == 0)
                stepstatus = wxT("s");
        else
                stepstatus = steps->GetString(wxT("jstonerror"));

We haven't released a new version since then. I'm somewhat mystified
about the lack of log data though - that's just plain odd. I assume
it's not dumping it to stdout (which should only happen if a logfile
isn't specified - which you seem to have done)?

> Going back to the box that originally had this problem, which also uses Devrim’s RPM, I created a new test job with exactly the same step as above. There I get:
>
>     postgres=# select * from pgagent.pga_jobsteplog where jsljlgid = 155;
>     -[ RECORD 1 ]------------------------------
>     jslid       | 177
>     jsljlgid    | 155
>     jsljstid    | 21
>     jslstatus   | s
>     jslresult   | 5
>     jslstart    | 2012-01-10 22:18:42.995252+00
>     jslduration | 00:00:00.013335
>
> Which just seems wrong. Both boxes are running CentOS 5.5 (Linux 2.6.18-194.el5). Might there be a difference in the version of Wx installed or something to account for this?

Are both systems 32bit? I can't imagine a wxWidgets version would
cause such a difference - the code that handles the return values is
mostly pure C++, and the wxWidgets code that is there is so common
(things like wxT, wxString) that we'd see weirdness all over the place
in a bunch of apps if that was broken.

Devrim; any chance you could whip up a test RPM from GIT master for us please?

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
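
The effect of the two changes in the diff above can be illustrated with a small shell emulation. This is a sketch, not pgAgent code: `old_rule` and `new_rule` are hypothetical helper names, the step's "On error" setting is assumed to be "Fail" (status f), and the `(signed char)` cast is emulated with plain arithmetic on the assumption of two's-complement wraparound. Each function takes a script exit code in the range 0 to 255 and prints the resulting jslresult and jslstatus.

```shell
#!/bin/bash
# Emulates how a script's exit code (0-255) maps to jslresult and
# jslstatus before and after commit 2aec3bc. Helper names are
# hypothetical; "On error" is assumed to be "Fail" (status f).
# The new code's "rc = -1" branch for processes that did not exit
# normally (e.g. killed by a signal) is not emulated here.

# Old rule: cast the code to signed char, then treat rc >= 0 as success.
old_rule() {
    local rc=$1
    if [ "$rc" -ge 128 ]; then
        rc=$((rc - 256))    # emulate (signed char): 128..255 wrap negative
    fi
    if [ "$rc" -ge 0 ]; then
        echo "$rc s"
    else
        echo "$rc f"
    fi
}

# New rule: keep the exit status as-is; only rc == 0 is success.
new_rule() {
    local rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "$rc s"
    else
        echo "$rc f"
    fi
}

for code in 0 5 255; do
    printf 'exit %3d  old: %-6s new: %s\n' \
        "$code" "$(old_rule "$code")" "$(new_rule "$code")"
done
```

Under the old rule an exit code of 5 is recorded as successful and 255 is stored as -1; under the new rule both are failures. That is consistent with the jslresult and jslstatus values reported in this thread for the RPM build and the patched build.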
On Jan 11, 2012, at 12:17 AM, Dave Page wrote:

> We haven't released a new version since then. I'm somewhat mystified
> about the lack of log data though - that's just plain odd. I assume
> it's not dumping it to stdout (which should only happen if a logfile
> isn't specified - which you seem to have done)?

Yes, the "WTF!" did get emitted to the terminal where I ran pgagent, presumably STDERR, though.

>> Going back to the box that originally had this problem, which also uses Devrim’s RPM, I created a new test job with exactly the same step as above. There I get:
>>
>>     postgres=# select * from pgagent.pga_jobsteplog where jsljlgid = 155;
>>     -[ RECORD 1 ]------------------------------
>>     jslid       | 177
>>     jsljlgid    | 155
>>     jsljstid    | 21
>>     jslstatus   | s
>>     jslresult   | 5
>>     jslstart    | 2012-01-10 22:18:42.995252+00
>>     jslduration | 00:00:00.013335
>>
>> Which just seems wrong. Both boxes are running CentOS 5.5 (Linux 2.6.18-194.el5). Might there be a difference in the version of Wx installed or something to account for this?
>
> Are both systems 32bit?

Both are 64 bit, identical CentOS 5.5 builds:

    Linux foo.example.com 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
    Linux bar.example.com 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

One is a VM, the other big iron. Shouldn’t make a difference, I wouldn’t think.

> I can't imagine a wxWidgets version would
> cause such a difference - the code that handles the return values is
> mostly pure C++, and the wxWidgets code that is there is so common
> (things like wxT, wxString) that we'd see weirdness all over the place
> in a bunch of apps if that was broken.

Yeah, it was a wild stab. I sure wish it would have been the same behavior on both boxes. *sigh*

Maybe I can install the Git build on the production box that’s having the problems and squeeze in a few tests between hourly runs of production code…I’ll try to do that today.

Best,

David
On Jan 11, 2012, at 8:54 AM, David E. Wheeler wrote:

> Maybe I can install the Git build on the production box that’s having the problems and squeeze in a few tests between hourly runs of production code…I’ll try to do that today.

Done. Here's what I got with the RPM build running:

    postgres=# select * from pgagent.pga_jobsteplog where jsljlgid = 180;
    -[ RECORD 1 ]------------------------------
    jslid       | 225
    jsljlgid    | 180
    jsljstid    | 25
    jslstatus   | s
    jslresult   | 5
    jslstart    | 2012-01-11 20:09:23.525634+00
    jslduration | 00:00:00.010944
    jsloutput   |

Not right. Then with the patched Git build:

    postgres=# select * from pgagent.pga_jobsteplog where jsljlgid = 181;
    -[ RECORD 1 ]------------------------------
    jslid       | 226
    jsljlgid    | 181
    jsljstid    | 25
    jslstatus   | f
    jslresult   | 255
    jslstart    | 2012-01-11 20:17:50.985678+00
    jslduration | 00:00:00.007648
    jsloutput   |

Which looks just fine. The log file was empty again and the error sent to the terminal (STDERR, I'm sure).

So either the changes in 2aec3bc473e583f8fa07e133bc8def60ff1c09fe or some other commit fixed the underlying problem, or something is wonky with the RPM. I can’t imagine what, though, because I build from Git exactly the same way as Devrim has things specified in the pgagent.spec.

    http://svn.pgrpms.org/browser/rpm/redhat/9.1/pgagent/EL-5/pgagent.spec

I can leave the RPM build in place for the moment, but once I get some monitoring stuff written, I will either install a new RPM (assuming Devrim has created one) or I will build from Git and install it.

Thanks,

David
2012/1/11 David E. Wheeler <david@justatheory.com>:
>
> So either the changes in 2aec3bc473e583f8fa07e133bc8def60ff1c09fe or some other commit fixed the underlying problem, or something is wonky with the RPM. I can’t imagine what, though, because I build from Git exactly the same way as Devrim has things specified in the pgagent.spec.
>
>     http://svn.pgrpms.org/browser/rpm/redhat/9.1/pgagent/EL-5/pgagent.spec
>
> I can leave the RPM build in place for the moment, but once I get some monitoring stuff written, I will either install a new RPM (assuming Devrim has created one) or I will build from Git and install it.
>

OK, so to make things a little easier, I wrapped up and uploaded a
3.2.0 tarball. The only changes from what you've seen in GIT are
copyright notice updates.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jan 12, 2012, at 2:59 AM, Dave Page wrote:

> OK, so to make things a little easier, I wrapped up and uploaded a
> 3.2.0 tarball. The only changes from what you've seen in GIT are
> copyright notice updates.

Great, thank you. Now just have to set up a cron job to email Devrim once a day to ask for a new RPM. ;-P

Best,

David