RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers
From | wangw.fnst@fujitsu.com |
---|---|
Subject | RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Date | |
Msg-id | OS3PR01MB62754E945BD95A9B2A1164789E289@OS3PR01MB6275.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Önder Kalacı <onderkalaci@gmail.com>) |
Responses |
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
|
List | pgsql-hackers |
On Fri, Sep 23, 2022 at 0:14 AM Önder Kalacı <onderkalaci@gmail.com> wrote: > Hii Wang wei, Thanks for updating the patch and your reply. > > 1. In the function GetCheapestReplicaIdentityFullPath. > > + if (rel->pathlist == NIL) > > + { > > + /* > > + * A sequential scan could have been dominated by by an index > > scan > > + * during make_one_rel(). We should always have a sequential > > scan > > + * before set_cheapest(). > > + */ > > + Path *seqScanPath = create_seqscan_path(root, rel, NULL, > > 0); > > + > > + add_path(rel, seqScanPath); > > + } > > > > This is a question I'm not sure about: > > Do we need this part to add sequential scan? > > > > I think in our case, the sequential scan seems to have been added by the > > function make_one_rel (see function set_plain_rel_pathlist). > > Yes, the sequential scan is added during make_one_rel. > > > If I am missing something, please let me know. BTW, there is a typo in > > above comment: `by by`. > > As the comment mentions, the sequential scan could have been dominated & > removed by index scan, see add_path(): > > *We also remove from the rel's pathlist any old paths that are dominated > * by new_path --- that is, new_path is cheaper, at least as well ordered, > * generates no more rows, requires no outer rels not required by the old > * path, and is no less parallel-safe. > > Still, I agree that the comment could be improved, which I pushed. Oh, sorry I didn't realize this part of the logic. Thanks for sharing this. And I have another confusion about function GetCheapestReplicaIdentityFullPath: If rel->pathlist is NIL, could we return NULL directly from this function, and then set idxoid to InvalidOid in function FindUsableIndexForReplicaIdentityFull in that case? === Here are some comments for test file 032_subscribe_use_index.pl on v18 patch: 1. ``` +# Basic test where the subscriber uses index +# and only updates 1 row for and deletes +# 1 other row ``` There seems to be an extra "for" here. 2. Typos for subscription name in the error messages. tap_sub_rep_full_0 -> tap_sub_rep_full 3. Typo in comments ``` +# use the newly created index (provided that it fullfils the requirements). ``` fullfils -> fulfils 4. Some extra single quotes at the end of the error message ('"). For example: ``` # wait until the index is used on the subscriber $node_subscriber->poll_query_until( 'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';} ) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index'"; ``` 5. The column names in the error message appear to be a typo. ``` +) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on highcardinality column-1"; ... +) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on highcardinality column-3"; ... +) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on highcardinality column-4"; ``` It seems that we need to do the following change: 'column-3' -> 'column-1' and 'column-4' -> 'column-2'. Or we could use the column names directly like this: 'column-1' -> 'column a', 'column_3' -> 'column a' and 'column_4' -> 'column b'. 6. DELETE action is missing from the error message. ``` +# 2 rows from first command, another 2 from the second command +# overall index_on_child_1_a is used 4 times +$node_subscriber->poll_query_until( + 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where indexrelname = 'index_on_child_1_a';} +) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates child_1 table'"; ``` I think we execute both UPDATE and DELETE for child_1 here. Could we add DELETE action to this error message? 7. Table name in the error message. ``` # check if the index is used even when the index has NULL values $node_subscriber->poll_query_until( 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';} ) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates parent table'"; ``` It seems to be "test_replica_id_full" here instead of "parent'". Regards, Wang wei
pgsql-hackers by date: