Re: logical decoding : exceeded maxAllocatedDescs for .spill files - Mailing list pgsql-hackers
From: Amit Khandekar
Subject: Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Msg-id: CAJ3gD9c9s6m1KYfM9w6a52NE6kh=pWyMoEv9Dqnjt+DRTqD24A@mail.gmail.com
In response to: Re: logical decoding : exceeded maxAllocatedDescs for .spill files (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Sat, 7 Dec 2019 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 6, 2019 at 5:00 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> >
> > On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Few comments:
> > > ----------------------
> > >
> > > 1.
> > > + /* Now that the state fields are initialized, it is safe to return it. */
> > > + *iter_state = state;
> > > +
> > >   /* allocate heap */
> > >   state->heap =
> > >       binaryheap_allocate(state->nr_txns,
> > >                           ReorderBufferIterCompare,
> > >
> > > Is there a reason for not initializing iter_state after
> > > binaryheap_allocate? If we do so, then we don't need the additional
> > > check you have added in ReorderBufferIterTXNFinish.
> >
> > If iter_state is initialized *after* binaryheap_allocate, then we
> > won't be able to close the vfds if binaryheap_allocate() ereports().
>
> Is it possible to have vfds opened before binaryheap_allocate(); if so, how?

No, it does not look possible for the vfds to be opened before
binaryheap_allocate(). But the idea behind assigning iter_state where
I did is that we should return the iter_state at the *earliest* point
in the code where it is safe to return it.

> > I couldn't reproduce the original problem (on HEAD) reported with the
> > test case in the patch. So, I can't verify the fix. I think it is
> > because of recent commits cec2edfa7859279f36d2374770ca920c59c73dd8 and
> > 9290ad198b15d6b986b855d2a58d087a54777e87. It seems you need to either
> > change the value of logical_decoding_work_mem or change the test in
> > some way so that the original problem can be reproduced.

Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> Yeah, it does seem like the commit cec2edfa78592 must have caused the
> test to not reproduce on your env, although the test does fail for me
> still. Setting logical_decoding_work_mem to a low value does sound
> like a good idea. Will work on it.
I checked that setting logical_decoding_work_mem to its minimum value
(64KB) causes early serialization. So I think if you set this, you
should be able to reproduce the issue with the spill.sql test that has
the new testcase. I will anyway set this value in the test. Also see
below ...

> > 4. I think we should also check how much time increase will happen for
> > test_decoding regression test after the test added by this patch?

Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> Yeah, it currently takes noticeably longer compared to the others.
> Let's see if setting logical_decoding_work_mem to a min value allows
> us to reproduce the test with a much smaller number of inserts.

The test in the patch takes around 20 seconds, compared to the maximum
of 2 seconds that any other test in that suite takes. But if we set
max_files_per_process to a very low value (say 26), as against the
default 1000, we can reproduce the issue with as few as 20
subtransactions, as against the 600 I used in the spill.sql test. With
this, the test runs in around 4 seconds, which is good. But the
problem is that max_files_per_process needs a server restart to take
effect. So we either have to shift this test into one of the logical
decoding tests in src/test/recovery, or retain it in
contrib/test_decoding and let it run for 20 seconds. Let me know if
you figure out any other approach.

> > > 5. One naive question about the usage of PathNameOpenFile(). When it
> > > reaches the max limit, it will automatically close one of the files,
> > > but how will that be reflected in the data structure (TXNEntryFile)
> > > you are managing? Basically, when PathNameOpenFile closes some file,
> > > how will the corresponding vfd in TXNEntryFile be changed? Because if
> > > it is not changed, then won't it start pointing to some wrong
> > > filehandle?
> >
> > In PathNameOpenFile(), excess kernel fds could be closed
> > (ReleaseLruFiles). But with that, the vfds themselves don't get
> > invalidated.
> > Only the underlying kernel fd gets closed, and
> > vfd->fd is marked VFD_CLOSED. The vfd array element remains valid (a
> > non-null vfd->fileName means the vfd slot is valid; check
> > FileIsValid). So later, when FileRead(vfd1) is called and that vfd1
> > happens to be the one whose kernel fd was closed, the file gets
> > opened again through FileAccess().
>
> I was under the impression that once the fd is closed due to excess
> kernel fds being open, the slot in the VfdCache array could be reused
> by someone else, but on closer inspection that is not true. It will
> only be available for reuse after we explicitly call FileClose, right?

Yes, that's right.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company