RE: FW: query pg_stat_ssl hang 100%cpu - Mailing list pgsql-bugs
From | James Pang (chaolpan)
---|---
Subject | RE: FW: query pg_stat_ssl hang 100%cpu
Date |
Msg-id | PH0PR11MB5191FAF4D1FF00F918018F5ED6EEA@PH0PR11MB5191.namprd11.prod.outlook.com
In response to | RE: FW: query pg_stat_ssl hang 100%cpu ("James Pang (chaolpan)" <chaolpan@cisco.com>)
Responses | Re: FW: query pg_stat_ssl hang 100%cpu
List | pgsql-bugs
Yes, this backend has been on the same call stack, at 100% CPU, for tens of hours. It's still hung there now, but I cannot reproduce it in another, similar environment. From pg_stat_activity I found this query started a transaction ("xact_start" of "2023-09-03 02:36:23"), no idea why.

-----Original Message-----
From: Michael Paquier <michael@paquier.xyz>
Sent: Thursday, September 7, 2023 12:05 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Subject: Re: FW: query pg_stat_ssl hang 100%cpu

On Thu, Sep 07, 2023 at 01:35:00AM +0000, James Pang (chaolpan) wrote:
> PG v14.8, OS RHEL8, no SSL enabled in this database. We have a lot of
> client sessions that check their SSL state with this query; all other
> sessions complete very quickly, but one session has hung at 100% CPU
> for tens of hours, and even pg_terminate_backend does not stop it.
> It looks abnormal.
>
> select ssl from pg_stat_ssl where pid=pg_backend_pid();

This is hard to act on without more details or even a reproducible and self-contained test case. Even a Java script based on the JDBC driver would be OK for me, for example, if it helps digging into what you are seeing.

> #0 ensure_record_cache_typmod_slot_exists (typmod=0) at typcache.c:1714
> #1 0x000000000091185b in assign_record_type_typmod (tupDesc=<optimized out>, tupDesc@entry=0x27bc738) at typcache.c:2001
> #2 0x000000000091df03 in internal_get_result_type (funcid=<optimized out>, call_expr=<optimized out>, rsinfo=<optimized out>, resultTypeId=<optimized out>, resultTupleDesc=0x7ffc9dff8cd0) at funcapi.c:393
> #3 0x000000000091e263 in get_expr_result_type (expr=expr@entry=0x2792798, resultTypeId=resultTypeId@entry=0x7ffc9dff8ccc, resultTupleDesc=resultTupleDesc@entry=0x7ffc9dff8cd0) at funcapi.c:230
> #4 0x00000000006a2fa5 in ExecInitFunctionScan (node=node@entry=0x273afa8, estate=estate@entry=0x269e948, eflags=eflags@entry=16) at nodeFunctionscan.c:370
> #5 0x000000000069084e in ExecInitNode (node=node@entry=0x273afa8, estate=estate@entry=0x269e948, eflags=eflags@entry=16) at execProcnode.c:255
> #6 0x000000000068a96d in InitPlan (eflags=16, queryDesc=0x273b2d8) at execMain.c:936
> #7 standard_ExecutorStart (queryDesc=0x273b2d8, eflags=16) at execMain.c:263
> #8 0x00007f67c2821d5d in pgss_ExecutorStart (queryDesc=0x273b2d8, eflags=<optimized out>) at pg_stat_statements.c:965
> #9 0x00000000007fc226 in PortalStart (portal=portal@entry=0x26848b8, params=params@entry=0x0, eflags=eflags@entry=0, snapshot=snapshot@entry=0x0) at pquery.c:514
> #10 0x00000000007fa27f in exec_bind_message (input_message=0x7ffc9dff90d0) at postgres.c:1995
> #11 PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffc9dff9370, dbname=<optimized out>, username=<optimized out>) at postgres.c:4552
> #12 0x000000000077a4ea in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4537
> #13 BackendStartup (port=<optimized out>) at postmaster.c:4259
> #14 ServerLoop () at postmaster.c:1745
> #15 0x000000000077b363 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x256abc0) at postmaster.c:1417
> #16 0x00000000004fec63 in main (argc=5, argv=0x256abc0) at main.c:209

This stack is referring to a code path where we are checking that some of the type-related data associated with a record is around, but it does not say exactly where the loop happens, so...
Are we looking at a loop in the execution of the function from which the information of pg_stat_ssl is retrieved (aka pg_stat_get_activity())? Or is the type cache somewhat broken because of the extended query protocol? It's not really possible to see any evidence based on the information provided, though it gives a few hints that can help.

FWIW, I've not heard about an issue like that in the field. The first thing I would do is update to 14.9, which is the latest version of Postgres available for this major version.
--
Michael
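[Editor's note: as a starting point for the self-contained test case Michael asks for, here is a minimal JDBC sketch of the workload James describes — several sessions repeatedly checking their own SSL state via PreparedStatement, which drives the extended-protocol bind path (exec_bind_message -> PortalStart) visible in the stack trace. The class name, connection URL, credentials, and loop counts are hypothetical placeholders, not from the thread.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PgStatSslRepro {
    // Hypothetical connection details; adjust for the environment under test.
    private static final String URL = "jdbc:postgresql://localhost:5432/postgres";
    private static final String USER = "postgres";
    private static final String PASSWORD = "secret";

    public static void main(String[] args) {
        // Spawn several client sessions, as in the reported workload.
        for (int i = 0; i < 8; i++) {
            new Thread(PgStatSslRepro::checkSslLoop).start();
        }
    }

    private static void checkSslLoop() {
        // PreparedStatement makes pgjdbc use the extended query protocol;
        // after a few executions it switches to a named server-side
        // prepared statement, exercising the bind path from the trace.
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             PreparedStatement ps = conn.prepareStatement(
                     "select ssl from pg_stat_ssl where pid = pg_backend_pid()")) {
            for (int i = 0; i < 1000; i++) {
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        System.out.println(Thread.currentThread().getName()
                                + ": ssl=" + rs.getBoolean(1));
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}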