Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET |
Date | |
Msg-id | Z9dyXt2cuQkcOabX@paquier.xyz Whole thread Raw |
In response to | Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET (David Rowley <dgrowleyml@gmail.com>) |
Responses |
RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET |
List | pgsql-hackers |
On Fri, Mar 14, 2025 at 04:14:25PM +1300, David Rowley wrote: > I don't think I'm discarding them... As far as I'm aware, your > remaining concern is with custom jumble functions and you showed an > example in [1] of a hypothetical jumble function that could cause the > same bug as this thread is discussing. My response to that was that > the custom jumble function is broken and should be fixed, which seems > to me to be analogous to Ivan's proposal to fix _jumbleNode() to do > something with NULL pointers. Okay, that's where our opinions diverge. So, well, please let me reformulate that with some different words and terms. This is not a problem of having a NULL node vs a non-NULL node in the jumbling, because by design is it possible for one to decide if a Node can be included in the jumbling depending on its internal values, particularly if it gets used across one than one type of parent node. You are claiming that such functions are broken, but what I am saying is that such function can be OK depending on how one wants this code to behave depending on their node structure and what they expect from them. So I'm making a claim about keeping this stuff more flexible, while also addressing the problem of the same node defined successively more than once in a parent structure. > I now can't tell which of the following is true: 1) I've missed one of > your concerns, or; 2) you want to find a way to make badly coded > custom jumble functions not suffer from this bug, or; 3) you think > that adding jumble bytes unconditionally regardless of the field's > value still does not fix the bug in question, or; 4) something else. > It would be good to know which of these it is. I hope that my opinion counts, as I've worked on this whole design in 3db72ebcbe20, 8eba3e3f0208 and other related commits. FWIW, I don't really see any disadvantage in supporting what I'm claiming is OK, giving more flexibility to folks to do what they want with this facility, if it also tackles this thread's problem with the Node handling. So, here is attached a counter-proposal, where we can simply added a counter tracking a node count in _jumbleNode() to add more entropy to the mix, incrementing it as well for NULL nodes. By the way, I was looking at the set of tests proposed upthread at this message, which is as far as I know the latest version proposed: https://www.postgresql.org/message-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain The tests do require neither a relation nor a stats reset, so let's make it cheaper as I've proposed upthread and in the attached, including checks with multiple queries and different constants to make sure that these are correctly grouped in the pgss reports. The null_sequence_number reset each time we run through a non-NULL node from variant B reduces a bit the entropy, btw.. The argument about adding a counter in AppendJumble() is the wrong level to use for such a change. Anyway, perhaps we should have your extra byte '\0' or something equivalent added to JUMBLE_STRING() if dealing with a NULL string rather than ignoring it. There's an argument about simplicity, IMO, still it is true that ignoring a NULL value reduces the entropy of the query ID. So I'd be OK with that if you feel strongly about this point, at it sound to me that you are? This could be better than a hardcoded '\0' byte as we could use a counter and JUMBLE_FIELD_SINGLE() in the NULL case of JUMBLE_STRING(). It's not proved to be a problem yet, and there's value in having a simpler solution, as well. Anyway, one line I'd like to draw is that appendJumble() remains as it is written currently. One extra thing that We could do is to update it so as the size of an item given is always more than zero, with an Assert(), to enforce a stricter policy. -- Michael
Attachment
pgsql-hackers by date: