The commit message looks good to me, except maybe using a "master" word, which I would suggest to replace with "primary".
And a small nitpick:
+/* + * Initialize a hash table to store WAL file names that must be kept. + */ +void +keepwal_init(void) +{ + /* + * This hash table is empty in the vast majority of cases, so set an + * initial size of 0. + */ + keepwal = keepwal_create(0, NULL); +}
I don't think that the hash table will be empty. It needs to hold all WAL filenames starting from the last checkpoint and up to divergent point.
On loaded clusters it could be hundreds and thousands of files.
After reading the whole thread a couple of times to make sure I understood the problem correctly, I think the approach in the v10 patch is a reasonable one. I agree that it's better for maintainability to keep a separate hash table. I made some cosmetic adjustments -- didn't find any fault in the code. I also verified that the test is hitting the problematic case.
So here's v11, which I'll try to get out tomorrow.
I also assessed back-patching this. There's some conflicts, but nothing serious, back to 14. Unfortunately, 13 lacks requisite infrastructure, so I think it'll have to stay as it is. (12 is dead.)
Can you please verify that my explanation in the commit message is sound?
Thanks
-- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)