Re: plperlu stored procedure seems to freeze for a minute - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: plperlu stored procedure seems to freeze for a minute |
Date | |
Msg-id | 565DE4D9.20604@aklaver.com Whole thread Raw |
In response to | Re: plperlu stored procedure seems to freeze for a minute ("Peter J. Holzer" <hjp-pgsql@hjp.at>) |
Responses |
Re: plperlu stored procedure seems to freeze for a minute
|
List | pgsql-general |
On 12/01/2015 09:58 AM, Peter J. Holzer wrote: > On 2015-12-01 07:16:04 -0800, Adrian Klaver wrote: >> On 12/01/2015 06:51 AM, Peter J. Holzer wrote: >>> A rather weird observation from the log files of our server (9.5 beta1): >>> >>> 2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 WARNING: Use of uninitialized value $success inconcatenation (.) or string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. >>> 2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 CONTEXT: PL/Perl function "mb_timeseriesdata_zmq" >>> [lots of other stuff from different connections] >>> 2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR: impossible result '' (payload=) at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pmline 36. >>> WDS::Macrobond::Utils::decode_result("") called at line 30 >>> main::__ANON__("gen_wqehur") called at -e line 0 >>> eval {...} called at -e line 0 >>> >>> Two messages from the same line of the same plperlu stored procedure, 68 >>> seconds apart. So what is this line 36? >>> >>> confess "impossible result '$success' (payload=$payload)"; >>> >>> What? The first message clearly comes from interpolating $success >>> (which is undef at that point) into the argument. The second from >>> confess itself. What could cause a plperlu procedure to freeze for 68 >>> seconds between the call to confess and its output? >>> >>> Is it possible that only the writing of the log entry is delayed? >>> >>> Another weird thing: $success is undef because a ØMQ rpc call[1] timed >> >> And the call is? > > The sequence is: > > my $req_sck = zmq_socket($context, ZMQ_REQ); > zmq_connect($req_sck, $url); > my $qry_msg = join(...); > zmq_send($req_sck, $qry_msg); > my $res_msg = zmq_msg_init(); > my $rv = zmq_msg_recv($res_msg, $req_sck); > # check rv here. > my $data = zmq_msg_data($res_msg); # $data is "" here > my $result = WDS::Macrobond::Utils::decode_result($data); # the error messages are from this function > > (Yeah, ØMQ is quite low-level. There is a higher level Perl Module, but > I'm not using it). > > I omitted that because I don't think it's terribly relevant here. > Details of the usage of ØMQ are better discussed on the ØMQ mailing > list. > > But there is something else which may be relevant: ØMQ uses threads > internally, and I don't actually know whether zmq_msg_recv returning Except I see this here: http://api.zeromq.org/4-0:zmq-socket Thread safety ØMQ sockets are not thread safe. Applications MUST NOT use a socket from multiple threads except after migrating a socket from one thread to another with a "full fence" memory barrier. > means that the read(2) call (or whatever) on the socket terminates. > It may actually continue in another thread. But I still don't see how > that could block the main thread (or wake it up again in a place which > has nothing to do with ØMQ (confess is a standard Perl function to print > a stack trace and die)). Or - just thinking aloud here - I fear I'm > abusing you guys as support teddy bears[1] - maybe it's the other way > round: confess dies, so maybe it frees some lock during cleanup which > allows the message which should have been sent by zmq_send to finally go > out on the wire. But that still doesn't explain the 68 seconds spent in > confess ... > > Postgres worker processes are single-threaded, are they? Is there > something else which could interact badly with a moderately complex > multithreaded I/O library used from a stored procedure? > > I suspect such an interaction because I cannot reproduce the problem > outside of a stored procedure. A standalone Perl script doing the same > requests doesn't get a timeout. > > I guess Alvaro is right: I should strace the postgres worker process > while it executes the stored procedure. The problem of course is that > it happens often enough be annoying, but rarely enough that it's not > easily reproducible. From here: http://api.zeromq.org/4-0:zmq-connect It seems something like(I am not a Perl programmer, so approach carefully): my $rc = zmq_connect($req_sck, $url); Then you will have an error code to examine. Have you looked at the Notes at the bottom of this page: http://www.postgresql.org/docs/9.4/interactive/plperl-trusted.html > >>> out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem >>> to have a default timeout of 60 seconds, and I don't set one). But at >> >> Network timeout? > > That was my first guess, but I don't see where it would come from. Or > why it only is there if I call the code from a stored procedure, not > from a standalone script. > >>> 09:24:45 (i.e. the time of the error message) the answer for that RPC >>> call arrived. So it kind of looks like confess waited for the message to >>> arrive (which makes no sense at all) or maybe that confess waited for >>> something which also blocked the sending of the request (because >>> according to the server logs, the RPC request only arrived there at >>> 09:24:45 and was answered within 1 second), but that doesn't make any >> >> So if the request timed out how did you get a reply, a second request? > > Nope. I don't really "get" the reply. I just see in the logs of the > other server that it sent a reply at that time. > > The time line is like this > > time postgres process mb_dal process > T zmq_send() > zmq_msg_recv() > > T+60 zmq_msg_recv returns > an empty message; > decode() is called > which notices that the > message is empty and > calls confess(); > > T+128 confess() prints a receives the message sent at T; > stacktrace; sends a reply; > > hp > > [1] For those who don't know the story: Someone at a help desk noticed > that people often started to explain their problem and somewhere in the > middle said "never mind, I think I know what the problem is" and went > away. So he put a big teddy bear on the desk and asked people to explain > their problems to the bear. So a support teddy bear is someone who you > talk to mostly to clear your own mind, not because you expect a > solution. > -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: