Re: [PATCH] Add tests for Bitmapset - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: [PATCH] Add tests for Bitmapset |
Date | |
Msg-id | aMz57nHgkLozddYu@paquier.xyz Whole thread Raw |
In response to | Re: [PATCH] Add tests for Bitmapset (Greg Burd <greg@burd.me>) |
List | pgsql-hackers |
On Thu, Sep 18, 2025 at 02:50:45PM -0400, Greg Burd wrote: > Thanks for all the feedback and time spent reviewing this patch. I > switched out the encode/decode functions to use the nodeToString() and > stringToNode() functions and change all the SQL testing function > signatures to TEXT from BYTEA. This exercises more code and that's good > for testing purposes. I took out the code that checks the hash values, > I can't think of a reason that should cause a future failure should that > change and output different values. I re-integrated the earlier random > testing function along with the newer code. I'm not sure this buys us > much if anything. > > coverage: HEAD v7 > lines......: 87.1 90.5 +3.4% > functions..: 100.0 100.0 +0.0% > branches...: 63.5 75.9 +12.4% Nathan has given me his blessing, so as I could look at this patch and put my hands on it. I have spotted one bug, and things that could be done a bit better. Please see below. As far as I am concerned there is a mistake with bms_overlap_list(). This API checks a Bitmapset with a list of integers, but you have coded things so as it checks for a list of Bitmapset. I think that test_bms_overlap_list() should take in input a int4 array, then the code in charge of the conversion from the array to the list needs to be tweaked a little bit. With the current code, we could get random failures with negative members included in a Bitmapset, depending on what's on the stack. I am not really convinced about the value brought by bms_values() now that the output is generated for all the SQL functions using the "decode" and "encode" routines (which may be actually less confusing if renamed as "bms_to_text" and "text_to_bms", but that's a personal take), because this results in doing an extra round-trip between text and Bitmapset. The tests don't rely much on NULL as input to translate that as "(b)". bmsToString() is just a direct call to what the decode and encode routines do. Same remark about test_bms_from_array(), that mimicks nodeRead(), in charge of doing a int4[] -> Bitmapset conversion. Wouldn't it be simpler to just use "(b 1 2 3)" in input and let the decode/encode routines do the job? Let's replace the test descriptions in the queries that we know have an individual result by comments. For example all the "Range operations". These descriptions are useful in the UNION queries, where we want to rely on a set of Bitmapsets for multiple operations, to be able to know the description of the result. For individual queries, this bloats the output. The numbering in the tests are not that useful to have. If the test suite is updated, that would mean more updates required if we add a new block in the middle. +SELECT 'overlapping ranges' as test, + bms_values(test_bms_union( + test_bms_add_range(NULL, 50, 150), + test_bms_add_range(NULL, 100, 200) The output generated by this query leads to a bitmapset with 200 members. It would be simpler and more readable to reduce these ranges, or is there a reason for these to be large? Same remark for "difference subtraction edge case" with its 50 members. +SELECT 'add_range large range' as test, bms_values(test_bms_add_range(NULL, 0, 1000)) as result; This one also is funky with its output of 1000 elements. Any reason this is justified in terms of coverage? If yes, we could use some substr() calls to get the end and the beginning of the output generated, perhaps to reduce the output of the whole, or the length, or something like that relevant enough to validate the output. There is a second test a bit down that uses a range of [1000,1100], as well. -- Michael
Attachment
pgsql-hackers by date: