Re: [HACKERS] Interval for launching the table sync worker - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: [HACKERS] Interval for launching the table sync worker
Msg-id: CAD21AoCHPfBqKRiFSMmZSXM5qbB5rD_HZMrniL4E19T2r8WpkA@mail.gmail.com
In response to: Re: [HACKERS] Interval for launching the table sync worker (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses: Re: [HACKERS] Interval for launching the table sync worker
           Re: [HACKERS] Interval for launching the table sync worker
List: pgsql-hackers

On Mon, Apr 24, 2017 at 4:41 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Sun, 23 Apr 2017 00:51:52 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoApBU+nz7FYaWX6gjyB9r6WWrTujbL4rrZiocHLc=pEbA@mail.gmail.com>
>> On Fri, Apr 21, 2017 at 11:19 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> > On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI
>> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> >> Hello,
>> >>
>> >> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDrw0OaHE=oVRRhQX248kjJ7W+1ViM3K76aP46HnHJsnQ@mail.gmail.com>
>> >>> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek
>> >>> <petr.jelinek@2ndquadrant.com> wrote:
>> >>> > On 19/04/17 15:57, Masahiko Sawada wrote:
>> >>> >> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek
>> >>> >> <petr.jelinek@2ndquadrant.com> wrote:
>> >>> >> Yeah, sorry, I meant that we don't want to wait but cannot launch the
>> >>> >> tablesync worker in such case.
>> >>> >>
>> >>> >> But after more thought, the latest patch Peter proposed has the same
>> >>> >> problem. Perhaps we need to scan always whole pg_subscription_rel and
>> >>> >> remove the entry if the corresponding table get synced.
>> >>> >>
>> >>> >
>> >>> > Yes that's what I mean by "Why can't we just update the hashtable based
>> >>> > on the catalog". And if we do that then I don't understand why do we
>> >>> > need both hastable and linked list if we need to update both based on
>> >>> > catalog reads anyway.
>> >>>
>> >>> Thanks, I've now understood correctly. Yes, I think you're right. If
>> >>> we update the hash table based on the catalog whenever table state is
>> >>> invalidated, we don't need to have both hash table and list.
>> >>
>> >> Ah, ok. The patch from Peter still generating and replacing the
>> >> content of the list. The attached patch stores everything into
>> >> SubscriptionRelState. Oppositte to my anticiation, the hash can
>> >> be efectively kept small and removed.
>> >>
>> >
>> > Thank you for the patch!
>> > Actually, I also bumped into the same the situation where we got the
>> > following error during hash_seq_search. I guess we cannot commit a
>> > transaction during hash_seq_scan but the sequential scan loop in
>> > process_syncing_tables_for_apply could attempt to do that.
>
> Ah. Thanks. I forgot that I saw the same error once. The hash has
> nothing to do with any transactions. The scan now runs after
> freezing of the hash.
>
> Petr:
>> I think we should document why this is done this way - I mean it's not
>> very obvious that it's okay to update just when not found and why and
>> even less obvious that it would be in fact wrong to update already
>> existing entry.
>
> It is not prudently designed. I changed there.. seemingly more
> reasonable, maybe.
>
> Petr:
>> I also think that in it's current form the
>> GetSubscriptionNotReadyRelations should be moved to tablesync.c
>
> Removed then moved to tablesync.c.
>
> Petr:
>> I also think we can't add the new fields to
>> SubscriptionRelState directly as that's used by catalog access
>> functions as well.
>
> Yeah, it was my lazyness. A new struct SubscriptionWorkerState is
> added in tablesync.c and the interval stuff runs on it.
>
>
> At Sun, 23 Apr 2017 00:51:52 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoApBU+nz7FYaWX6gjyB9r6WWrTujbL4rrZiocHLc=pEbA@mail.gmail.com>
>> So I guess we should commit the changing status to SUBREL_STATE_READY
>> after finished hash_seq_scan.
>
> We could so either, but I thought that a hash can explicitly
> "unfreeze" without a side effect. The attached first one adds the
> function in dynahash.c. I don't think that just allowing
> unregsiterd scan on unfrozen hash is worse that this.
>

Thank you for updating the patch.

+   elog(ERROR, "this hash hashtable \"%s\" is not frozen",
+        hashp->tabname);

Maybe the error message should be "this hashtable \"%s\" is not frozen".

+   /*
+    * Freeze hash table. This guarantees that the hash won't be broken for
+    * the scan below. Unfreezing on error leavs the hash freezed but any
+    * errors within this loop blows away the worker process involving the
+    * hash.
+    */

s/leavs/leaves/
s/freezed/frozen/

+       /* This will be true in the next rurn only for live entries */
+       wstate->alive = false;

s/rurn/run/

+       /*
+        * Remove entries no longer necessary. The flag signals nothing if
+        * subrel_local_state is not updated above. We can remove entries in
+        * frozen hash safely.
+        */
+       if (local_state_updated && !wstate->alive)
+       {
+           hash_search(subrel_local_state, &wstate->rs.relid,
+                       HASH_REMOVE, NULL);
+           continue;
+       }

IIUC, since the apply worker can change the status from
SUBREL_STATE_SYNCWAIT to SUBREL_STATE_READY within a hash_seq_search
loop, the entry for a table sync worker that the apply worker switched
to SUBREL_STATE_READY before subrel_local_state was updated could
remain in the hash table. I think we should scan pg_subscription_rel
using only the condition "subid".

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
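For reference, a minimal sketch of the frozen-scan pattern being reviewed
above, using only the existing dynahash calls (hash_freeze, hash_seq_init,
hash_seq_search, and hash_search with HASH_REMOVE). The hash
subrel_local_state, the struct SubscriptionWorkerState, and its alive flag
are names taken from the patch posted in this thread, so their exact layout
here is an assumption rather than committed code:

    /*
     * Sketch only: prune entries for already-synced tables from the local
     * hash.  hash_freeze() forbids further insertions, so a bucket split
     * cannot break the sequential scan; deletions remain allowed on a
     * frozen hash.  SubscriptionWorkerState and "alive" follow the patch
     * in this thread and may differ from its final form.
     */
    HASH_SEQ_STATUS scan;
    SubscriptionWorkerState *wstate;

    hash_freeze(subrel_local_state);    /* no insertions from here on */

    hash_seq_init(&scan, subrel_local_state);
    while ((wstate = (SubscriptionWorkerState *) hash_seq_search(&scan)) != NULL)
    {
        /* Deleting the returned entry during the scan is safe. */
        if (!wstate->alive)
            (void) hash_search(subrel_local_state, &wstate->rs.relid,
                               HASH_REMOVE, NULL);
    }

The race described at the end of the mail would then be closed by rebuilding
this local state from a pg_subscription_rel scan keyed only on the
subscription OID (srsubid), so that rows the apply worker has already moved
to SUBREL_STATE_READY are still visited and their entries can be dropped.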