Thread: BUG #18600: Getting wait_type_event as IPC:BTreePage for count queries
BUG #18600: Getting wait_type_event as IPC:BTreePage for count queries
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18600 Logged by: Aditya Singh Email address: aditya.singh@lji.io PostgreSQL version: 13.11 Operating system: macOS Description: Table1: Structure: CREATE TABLE "femsa"."activities_bit" ( "id" int4 NOT NULL DEFAULT nextval('femsa.activities_bit_copy_id_seq'::regclass), "created_ts" timestamptz, "member_id" varchar(40), "bit_reference" varchar(32), "header" jsonb, "payment_details" jsonb, PRIMARY_KEY("id") } Table 2: Structure: CREATE TABLE "femsa"."activities_bit" ( "id" int8 NOT NULL DEFAULT nextval('femsa.activities_bit_copy_id_seq'::regclass), "created_ts" timestamptz, "member_id" varchar(40), "bit_reference" varchar(32), "header" jsonb, "payment_details" jsonb, PRIMARY_KEY("id") } The only difference between the two is the data type of the id sequence (int, bigint). SELECT COUNT(*) FROM {table_name} WHERE id > 0; When I try to run the above query, I get the result for Table 1, but the query on Table 2 goes into wait_type_event IPC:BtreePage. Why this IPC:BtreePage is coming for Table2 when both tables have similar structure, data and Indexes?
On 9/4/24 11:21, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 18600 > Logged by: Aditya Singh > Email address: aditya.singh@lji.io > PostgreSQL version: 13.11 > Operating system: macOS > Description: > > Table1: > Structure: > CREATE TABLE "femsa"."activities_bit" ( > "id" int4 NOT NULL DEFAULT > nextval('femsa.activities_bit_copy_id_seq'::regclass), > "created_ts" timestamptz, > "member_id" varchar(40), > "bit_reference" varchar(32), > "header" jsonb, > "payment_details" jsonb, > PRIMARY_KEY("id") > } > > Table 2: > Structure: > CREATE TABLE "femsa"."activities_bit" ( > "id" int8 NOT NULL DEFAULT > nextval('femsa.activities_bit_copy_id_seq'::regclass), > "created_ts" timestamptz, > "member_id" varchar(40), > "bit_reference" varchar(32), > "header" jsonb, > "payment_details" jsonb, > PRIMARY_KEY("id") > } > > The only difference between the two is the data type of the id sequence > (int, bigint). > > SELECT COUNT(*) FROM {table_name} WHERE id > 0; > When I try to run the above query, I get the result for Table 1, but the > query on Table 2 goes into wait_type_event IPC:BtreePage. > > Why this IPC:BtreePage is coming for Table2 when both tables have similar > structure, data and Indexes? > Not sure I understood the question correctly, but chances are both queries hit that wait event, but it's intermittent and you only noticed that for one of them. Or are you saying it gets "stuck" in that wait even for the table with int8 column? Another option is that the first query does not use the index at all, in which case it of course won't see btree wait events. We don't even know how large the tables are, not to mention which access methods they use. You'd have to show us explain. I'd bet the first query uses seqscan, while the second one uses index-only scan. regards -- Tomas Vondra
The query on the first table didn't hit the wait event; it hit on the second table. I can show you the same on RDS Database Insights. Both tables have the same Indexes, Data, and Columns. There is only a difference in the ID datatype. Is there any possibility of scheduling a call to troubleshoot this issue since it is happening in the Production environment?
Regards
Aditya Narayan
Regards
Aditya Narayan
On Wed, Sep 4, 2024 at 9:43 PM Tomas Vondra <tomas@vondra.me> wrote:
On 9/4/24 11:21, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 18600
> Logged by: Aditya Singh
> Email address: aditya.singh@lji.io
> PostgreSQL version: 13.11
> Operating system: macOS
> Description:
>
> Table1:
> Structure:
> CREATE TABLE "femsa"."activities_bit" (
> "id" int4 NOT NULL DEFAULT
> nextval('femsa.activities_bit_copy_id_seq'::regclass),
> "created_ts" timestamptz,
> "member_id" varchar(40),
> "bit_reference" varchar(32),
> "header" jsonb,
> "payment_details" jsonb,
> PRIMARY_KEY("id")
> }
>
> Table 2:
> Structure:
> CREATE TABLE "femsa"."activities_bit" (
> "id" int8 NOT NULL DEFAULT
> nextval('femsa.activities_bit_copy_id_seq'::regclass),
> "created_ts" timestamptz,
> "member_id" varchar(40),
> "bit_reference" varchar(32),
> "header" jsonb,
> "payment_details" jsonb,
> PRIMARY_KEY("id")
> }
>
> The only difference between the two is the data type of the id sequence
> (int, bigint).
>
> SELECT COUNT(*) FROM {table_name} WHERE id > 0;
> When I try to run the above query, I get the result for Table 1, but the
> query on Table 2 goes into wait_type_event IPC:BtreePage.
>
> Why this IPC:BtreePage is coming for Table2 when both tables have similar
> structure, data and Indexes?
>
Not sure I understood the question correctly, but chances are both
queries hit that wait event, but it's intermittent and you only noticed
that for one of them. Or are you saying it gets "stuck" in that wait
even for the table with int8 column?
Another option is that the first query does not use the index at all, in
which case it of course won't see btree wait events. We don't even know
how large the tables are, not to mention which access methods they use.
You'd have to show us explain. I'd bet the first query uses seqscan,
while the second one uses index-only scan.
regards
--
Tomas Vondra
This message and any attachments are intended only for the use of the intended recipient(s), are confidential, and may be privileged. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or other use of this message and any attachments is strictly prohibited. If received in error, please notify the sender immediately and permanently delete it.
On 9/5/24 10:11, Aditya Singh wrote: > The query on the first table didn't hit the wait event; it hit on the > second table. I can show you the same on RDS Database Insights. Both > tables have the same *Indexes*, *Data*, and *Columns*. There is only a > difference in the ID datatype. Is there any possibility of scheduling a > call to troubleshoot this issue since it is happening in the Production > environment? > As this is a RDS database, you should probably talk to AWS support? Other than that, share the explain plans for the two queries (as I asked in my previous response). BTW please don't top post. It's customary to reply inline on this list. regards -- Tomas Vondra
QUERY 1: explain select count(*) from activities_bit_old where id > 0; (id column data-type is int4)
Finalize Aggregate (cost=75709053.22..75709053.23 rows=1 width=8)
-> Gather (cost=75709053.00..75709053.21 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=75708053.00..75708053.01 rows=1 width=8)
-> Parallel Index Only Scan using activities_bit_pkey on activities_bit_old (cost=0.58..73513797.33 rows=877702270 width=0)
Index Cond: (id > 0)
QUERY 2: explain select count(*) from activities_bit where id > 0; (id column data-type is int8)
Finalize Aggregate (cost=117811065.69..117811065.70 rows=1 width=8)
-> Gather (cost=117811065.48..117811065.69 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=117810065.48..117810065.49 rows=1 width=8)
-> Parallel Index Only Scan using activities_bit_copy_pkey on activities_bit (cost=0.58..115695086.65 rows=845991530 width=0)
Index Cond: (id > 0)
Above mentioned is the query plan for Table 1 and Table 2;
Finalize Aggregate (cost=75709053.22..75709053.23 rows=1 width=8)
-> Gather (cost=75709053.00..75709053.21 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=75708053.00..75708053.01 rows=1 width=8)
-> Parallel Index Only Scan using activities_bit_pkey on activities_bit_old (cost=0.58..73513797.33 rows=877702270 width=0)
Index Cond: (id > 0)
QUERY 2: explain select count(*) from activities_bit where id > 0; (id column data-type is int8)
Finalize Aggregate (cost=117811065.69..117811065.70 rows=1 width=8)
-> Gather (cost=117811065.48..117811065.69 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=117810065.48..117810065.49 rows=1 width=8)
-> Parallel Index Only Scan using activities_bit_copy_pkey on activities_bit (cost=0.58..115695086.65 rows=845991530 width=0)
Index Cond: (id > 0)
Above mentioned is the query plan for Table 1 and Table 2;
On Thu, Sep 5, 2024 at 7:11 PM Tomas Vondra <tomas@vondra.me> wrote:
On 9/5/24 10:11, Aditya Singh wrote:
> The query on the first table didn't hit the wait event; it hit on the
> second table. I can show you the same on RDS Database Insights. Both
> tables have the same *Indexes*, *Data*, and *Columns*. There is only a
> difference in the ID datatype. Is there any possibility of scheduling a
> call to troubleshoot this issue since it is happening in the Production
> environment?
>
As this is a RDS database, you should probably talk to AWS support?
Other than that, share the explain plans for the two queries (as I asked
in my previous response).
BTW please don't top post. It's customary to reply inline on this list.
regards
--
Tomas Vondra
This message and any attachments are intended only for the use of the intended recipient(s), are confidential, and may be privileged. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or other use of this message and any attachments is strictly prohibited. If received in error, please notify the sender immediately and permanently delete it.