Thread: Re: Increase of maintenance_work_mem limit in 64-bit Windows
On Fri, 20 Sept 2024 at 01:55, Пополитов Владлен <v.popolitov@postgrespro.ru> wrote:
> Currently PostgreSQL built on 64-bit Windows has a 2GB limit for
> GUC variables because sizeof(long)==4 with Windows compilers.
> Technically, 64-bit addressing for maintenance_work_mem is possible,
> but the code base historically uses variables and constants of type "long"
> when processing the maintenance_work_mem value.

I agree. Ideally, we shouldn't use longs for anything ever. We should probably adopt a policy of removing them wherever possible.

I'd like to suggest you go about this patch slightly differently, with the end goal of removing the limitation from maintenance_work_mem, work_mem, autovacuum_work_mem and logical_decoding_work_mem.

Patch 0001: Add a macro named something like WORK_MEM_KB_TO_BYTES() and adjust all places where we do <work_mem_var> * 1024L to use this new macro. Make the macro do the * 1024L as is done today so that this patch is a simple refactor.
Patch 0002: Convert all places that use long to use Size instead. Adjust WORK_MEM_KB_TO_BYTES to use a Size type rather than 1024L.

It might be wise to break 0002 down into individual GUCs as the patch might become large.

I suspect we might have quite a large number of subtle bugs in our code today due to using longs. 7340d9362 is an example of one that was fixed recently.

David
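To illustrate the two-step plan above, here is a minimal sketch in C. It is not taken from any posted patch: the macro name WORK_MEM_KB_TO_BYTES comes from the message, while the _V1 suffix, the local Size typedef and the helper fits_in_32bit_long() are assumptions made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Size type (assumed here to be size_t). */
typedef size_t Size;

/* Patch 0001 idea: a pure refactor, same arithmetic as "* 1024L" today. */
#define WORK_MEM_KB_TO_BYTES_V1(kb) ((kb) * 1024L)

/* Patch 0002 idea: multiply in Size, so the width of long no longer matters. */
#define WORK_MEM_KB_TO_BYTES(kb) ((Size) (kb) * (Size) 1024)

/*
 * Hypothetical check: would kb * 1024 still fit if "long" were 32 bits
 * wide, as it is with 64-bit Windows compilers?
 */
static int
fits_in_32bit_long(int kb)
{
    assert(kb >= 0);
    /* Do the multiplication in 64 bits so the check itself cannot overflow. */
    return (int64_t) kb * 1024 <= INT32_MAX;
}
```

With a 32-bit long, 2097151 kB is the largest setting whose byte count still fits, which matches the 2GB ceiling described in the quoted message; the Size-based macro has no such ceiling on a 64-bit build.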
David Rowley wrote on 2024-09-23 04:28:
> On Fri, 20 Sept 2024 at 01:55, Пополитов Владлен
> <v.popolitov@postgrespro.ru> wrote:
>> Currently PostgreSQL built on 64-bit Windows has a 2GB limit for
>> GUC variables because sizeof(long)==4 with Windows compilers.
>> Technically, 64-bit addressing for maintenance_work_mem is possible,
>> but the code base historically uses variables and constants of type
>> "long" when processing the maintenance_work_mem value.
>
> I agree. Ideally, we shouldn't use longs for anything ever. We should
> probably adopt a policy of removing them wherever possible.
>
> I'd like to suggest you go about this patch slightly differently, with
> the end goal of removing the limitation from maintenance_work_mem,
> work_mem, autovacuum_work_mem and logical_decoding_work_mem.
>
> Patch 0001: Add a macro named something like WORK_MEM_KB_TO_BYTES()
> and adjust all places where we do <work_mem_var> * 1024L to use this
> new macro. Make the macro do the * 1024L as is done today so that this
> patch is a simple refactor.
> Patch 0002: Convert all places that use long to use Size instead.
> Adjust WORK_MEM_KB_TO_BYTES to use a Size type rather than 1024L.
>
> It might be wise to break 0002 down into individual GUCs as the patch
> might become large.
>
> I suspect we might have quite a large number of subtle bugs in our
> code today due to using longs. 7340d9362 is an example of one that was
> fixed recently.
>
> David

Hi David,

Thank you for the proposal. I looked at the patch and the source code from this point of view. With this approach we need to change every <work_mem_var>. I counted the appearances of these variables in the code:

maintenance_work_mem appears 63 times in 20 files
work_mem appears 113 times in 48 files
logical_decoding_work_mem appears 10 times in 2 files
max_stack_depth appears 11 times in 3 files
wal_keep_size_mb appears 5 times in 3 files
min_wal_size_mb appears 5 times in 2 files
max_wal_size_mb appears 10 times in 2 files
wal_skip_threshold appears 5 times in 2 files
max_slot_wal_keep_size_mb appears 6 times in 3 files
wal_sender_timeout appears 23 times in 3 files
autovacuum_work_mem appears 11 times in 4 files
gin_pending_list_limit appears 8 times in 5 files
pendingListCleanupSize appears 2 times in 2 files
GinGetPendingListCleanupSize appears 2 times in 2 files

maintenance_work_mem appears 63 times, and only 4 of those cases use "long" (I fixed them in the patch). I also found that the patch fixes autovacuum_work_mem, which has only 1 such case, in the same place in the code as maintenance_work_mem.

Currently the <work_mem_vars> are processed according to context: they are assigned to Size, uint64, int64, double, long or int variables (the last two cases need fixing) or multiplied by (uint64)1024, (Size)1024 or 1024L (the last case needs fixing). A signed value is also used for max_stack_depth (-1 is used as an error value). I am not sure we can handle all of these cases with a single WORK_MEM_KB_TO_BYTES() macro. The code needs a case-by-case check.

If I check the rest of the variables, the patch will not need the MAX_SIZE_T_KILOBYTES constant (I introduced it for the variables that are already checked and fixed); it will contain only fixes to the types of variables and constants. Checking all appearances and the neighbouring code takes a lot of time, but the final patch will not be large; I do not expect many uses of "long" in the rest of the code (only 4 cases out of 63 needed fixing for maintenance_work_mem).

What do you think about this approach?

--
Best regards,

Vladlen Popolitov.
On Mon, 23 Sept 2024 at 21:01, Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote:
> Thank you for the proposal. I looked at the patch and the source code
> from this point of view. With this approach we need to change every
> <work_mem_var>. I counted the appearances of these variables in the code:
> maintenance_work_mem appears 63 times in 20 files
> work_mem appears 113 times in 48 files
> logical_decoding_work_mem appears 10 times in 2 files
> max_stack_depth appears 11 times in 3 files
> wal_keep_size_mb appears 5 times in 3 files
> min_wal_size_mb appears 5 times in 2 files
> max_wal_size_mb appears 10 times in 2 files
> wal_skip_threshold appears 5 times in 2 files
> max_slot_wal_keep_size_mb appears 6 times in 3 files
> wal_sender_timeout appears 23 times in 3 files
> autovacuum_work_mem appears 11 times in 4 files
> gin_pending_list_limit appears 8 times in 5 files
> pendingListCleanupSize appears 2 times in 2 files
> GinGetPendingListCleanupSize appears 2 times in 2 files

Why do you think all of these appearances matter? I imagined all you care about are the places where the values are multiplied by 1024.

> If I check the rest of the variables, the patch will not need the
> MAX_SIZE_T_KILOBYTES constant (I introduced it for the variables that
> are already checked and fixed); it will contain only fixes to the
> types of variables and constants.
> Checking all appearances and the neighbouring code takes a lot of
> time, but the final patch will not be large; I do not expect many
> uses of "long" in the rest of the code (only 4 cases out of 63 needed
> fixing for maintenance_work_mem).
> What do you think about this approach?

I don't think you can do maintenance_work_mem without fixing work_mem too. I don't think the hacks you've put into RI_Initial_Check() to ensure you don't try to set work_mem beyond its allowed range are very good. It effectively means that maintenance_work_mem does not do what it's meant to for the initial validation of referential integrity checks. If you're not planning on fixing work_mem too, would you just propose to leave those hacks in there forever?

David
David Rowley wrote on 2024-09-23 15:35:
> On Mon, 23 Sept 2024 at 21:01, Vladlen Popolitov
> <v.popolitov@postgrespro.ru> wrote:
>> Thank you for the proposal. I looked at the patch and the source code
>> from this point of view. With this approach we need to change every
>> <work_mem_var>. I counted the appearances of these variables in the code:
>> maintenance_work_mem appears 63 times in 20 files
>> work_mem appears 113 times in 48 files
>> logical_decoding_work_mem appears 10 times in 2 files
>> max_stack_depth appears 11 times in 3 files
>> wal_keep_size_mb appears 5 times in 3 files
>> min_wal_size_mb appears 5 times in 2 files
>> max_wal_size_mb appears 10 times in 2 files
>> wal_skip_threshold appears 5 times in 2 files
>> max_slot_wal_keep_size_mb appears 6 times in 3 files
>> wal_sender_timeout appears 23 times in 3 files
>> autovacuum_work_mem appears 11 times in 4 files
>> gin_pending_list_limit appears 8 times in 5 files
>> pendingListCleanupSize appears 2 times in 2 files
>> GinGetPendingListCleanupSize appears 2 times in 2 files
>
> Why do you think all of these appearances matter? I imagined all you
> care about are the places where the values are multiplied by 1024.

A common pattern in the code is to assign a <work_mem_var> to a local variable, pass that local variable as a parameter to a function and then to a nested function, and multiply the parameter by 1024 somewhere deep inside. That is why I needed to check all appearances; most of them are correct.

>> If I check the rest of the variables, the patch will not need the
>> MAX_SIZE_T_KILOBYTES constant (I introduced it for the variables that
>> are already checked and fixed); it will contain only fixes to the
>> types of variables and constants.
>> Checking all appearances and the neighbouring code takes a lot of
>> time, but the final patch will not be large; I do not expect many
>> uses of "long" in the rest of the code (only 4 cases out of 63 needed
>> fixing for maintenance_work_mem).
>> What do you think about this approach?
>
> I don't think you can do maintenance_work_mem without fixing work_mem
> too. I don't think the hacks you've put into RI_Initial_Check() to
> ensure you don't try to set work_mem beyond its allowed range are very
> good. It effectively means that maintenance_work_mem does not do what
> it's meant to for the initial validation of referential integrity
> checks. If you're not planning on fixing work_mem too, would you just
> propose to leave those hacks in there forever?

I agree, it is better to fix them all together. I do not like this hack either; it will be removed from the patch if I check and change all <work_mem_vars> at once.
I think it will take about 1 week to fix and test all the changes. I will estimate the total volume of the changes and think about how to group them in the patch (I hope it will be only one patch).

--
Best regards,

Vladlen Popolitov.
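The reply above explains why searching only for "<work_mem_var> * 1024" is not enough: the value may travel through locals and parameters before it is converted. A minimal sketch of that shape follows. Every function name here is hypothetical, and the static variable merely stands in for the real GUC.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's Size type (assumed here to be size_t). */
typedef size_t Size;

/* Stand-in for a GUC that is stored as an int number of kilobytes. */
static int maintenance_work_mem = 65536;

/* Innermost function: the only place where kilobytes become bytes. */
static Size
budget_in_bytes(int mem_kb)
{
    assert(mem_kb >= 0);
    /* Cast before multiplying so the arithmetic is carried out in Size. */
    return (Size) mem_kb * 1024;
}

/* Intermediate function: forwards the value under a different name. */
static Size
plan_build(int workmem)
{
    return budget_in_bytes(workmem);
}

/* Caller: copies the GUC into a local variable before passing it on. */
static Size
start_build(void)
{
    int local_mem = maintenance_work_mem;

    return plan_build(local_mem);
}
```

Neither the GUC name nor a "1024L" literal appears next to each other anywhere in this sketch, so each call chain has to be followed by hand, which is the case-by-case check the message describes.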
On Tue, 24 Sept 2024 at 02:47, Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote:
> I agree, it is better to fix them all together. I do not like this
> hack either; it will be removed from the patch if I check and change
> all <work_mem_vars> at once.
> I think it will take about 1 week to fix and test all the changes. I
> will estimate the total volume of the changes and think about how to
> group them in the patch (I hope it will be only one patch).

There are a few places that do this:

    Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;

    /* choose the maxBlockSize to be no larger than 1/16 of work_mem */
    while (16 * maxBlockSize > work_mem * 1024L)

I think since maxBlockSize is a Size variable, the above should probably be:

    while (16 * maxBlockSize > (Size) work_mem * 1024)

Maybe there can be a precursor patch to fix all of those to get rid of the 'L' and cast to the type we're comparing to or assigning to, rather than trying to keep the result of the multiplication as a long.

David
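The suggested comparison can be tried out in isolation with the sketch below. The loop condition is the one proposed in the message. The halving loop body, the 8MB starting value and the names SKETCH_DEFAULT_MAXSIZE and choose_max_block_size() are assumptions added to make the fragment complete, since the message shows only the condition.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's Size type (assumed here to be size_t). */
typedef size_t Size;

/* Assumed starting block size for this sketch: 8MB. */
#define SKETCH_DEFAULT_MAXSIZE ((Size) 8 * 1024 * 1024)

/*
 * Pick a block size no larger than 1/16 of work_mem, where work_mem is
 * given in kilobytes. The cast to Size keeps the multiplication out of
 * "long", so the result is the same whether long is 32 or 64 bits wide.
 */
static Size
choose_max_block_size(int work_mem_kb)
{
    Size maxBlockSize = SKETCH_DEFAULT_MAXSIZE;

    assert(work_mem_kb > 0);

    /* Assumed loop body: halve the block size until it fits the budget. */
    while (16 * maxBlockSize > (Size) work_mem_kb * 1024)
        maxBlockSize >>= 1;

    return maxBlockSize;
}
```

For example, a work_mem of 4096 kB yields a block size of 262144 bytes, and any work_mem of 128MB or more leaves the 8MB starting value unchanged.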
David Rowley wrote on 2024-09-24 01:07:
> On Tue, 24 Sept 2024 at 02:47, Vladlen Popolitov
> <v.popolitov@postgrespro.ru> wrote:
>> I agree, it is better to fix them all together. I do not like this
>> hack either; it will be removed from the patch if I check and change
>> all <work_mem_vars> at once.
>> I think it will take about 1 week to fix and test all the changes. I
>> will estimate the total volume of the changes and think about how to
>> group them in the patch (I hope it will be only one patch).
>
> There are a few places that do this:
>
>     Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;
>
>     /* choose the maxBlockSize to be no larger than 1/16 of work_mem */
>     while (16 * maxBlockSize > work_mem * 1024L)
>
> I think since maxBlockSize is a Size variable, the above should
> probably be:
>
>     while (16 * maxBlockSize > (Size) work_mem * 1024)
>
> Maybe there can be a precursor patch to fix all of those to get rid of
> the 'L' and cast to the type we're comparing to or assigning to, rather
> than trying to keep the result of the multiplication as a long.

Yes. That is what I meant when I wrote about the context: in this case the variable is used in a "Size" context, so the cast to the Size type should be used. That is why I need to check all places in the code. I am going to do it during this week.

--
Best regards,

Vladlen Popolitov.
v.popolitov@postgrespro.ru wrote on 2024-10-01 00:30:
> David Rowley wrote on 2024-09-24 01:07:
>> On Tue, 24 Sept 2024 at 02:47, Vladlen Popolitov
>> <v.popolitov@postgrespro.ru> wrote:
>>> I agree, it is better to fix them all together. I do not like this
>>> hack either; it will be removed from the patch if I check and change
>>> all <work_mem_vars> at once.
>>> I think it will take about 1 week to fix and test all the changes. I
>>> will estimate the total volume of the changes and think about how to
>>> group them in the patch (I hope it will be only one patch).
>>
>> There are a few places that do this:
>>
>>     Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;
>>
>>     /* choose the maxBlockSize to be no larger than 1/16 of work_mem */
>>     while (16 * maxBlockSize > work_mem * 1024L)
>>
>> I think since maxBlockSize is a Size variable, the above should
>> probably be:
>>
>>     while (16 * maxBlockSize > (Size) work_mem * 1024)
>>
>> Maybe there can be a precursor patch to fix all of those to get rid of
>> the 'L' and cast to the type we're comparing to or assigning to, rather
>> than trying to keep the result of the multiplication as a long.
>
> Hi
>
> I rechecked all <work_mem_vars> that depend on the MAX_KILOBYTES limit
> and fixed all casts that are affected by the 4-byte long type on 64-bit
> Windows. Now the following variables are limited to 2TB on all 64-bit
> systems:
> maintenance_work_mem
> work_mem
> logical_decoding_work_mem
> max_stack_depth
> autovacuum_work_mem
> gin_pending_list_limit
> wal_skip_threshold
> Also, wal_keep_size_mb, min_wal_size_mb, max_wal_size_mb and
> max_slot_wal_keep_size_mb are not affected by the "long" cast.

Hi everyone.

The patch has been added to the Commitfest:
https://commitfest.postgresql.org/50/5343/

--
Best regards,

Vladlen Popolitov.
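The 2GB and 2TB figures in this thread follow from simple arithmetic on a kilobyte count stored in an int. The sketch below works it through. It assumes that the upper bound is INT_MAX kilobytes where the wide limit applies and INT_MAX / 1024 kilobytes where long is 32 bits, which is how MAX_KILOBYTES is understood to be defined; the SKETCH_ names and kb_to_bytes() are local to this example, and the real definition should be checked in the source tree.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed upper bounds in kilobytes; names are local to this sketch. */
#define SKETCH_MAX_KB_WIDE   INT_MAX            /* 64-bit arithmetic available */
#define SKETCH_MAX_KB_NARROW (INT_MAX / 1024)   /* byte count must fit 32 bits */

/* Convert a kilobyte setting to bytes using 64-bit unsigned arithmetic. */
static uint64_t
kb_to_bytes(int kb)
{
    assert(kb >= 0);
    return (uint64_t) kb * 1024;
}
```

With a 32-bit int, the narrow bound converts to 2147482624 bytes, just under 2GB, and the wide bound converts to 2199023254528 bytes, just under 2TB.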