Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table. - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table. |
Date | |
Msg-id | CA+Tgmoagq6dvUcs9fbzMeZjngscnW6s2WgYrwpcedFp2jnc_LA@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table. (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.
|
List | pgsql-hackers |
On Fri, Aug 29, 2025 at 11:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm surprised that this didn't break sooner, TBH, since it touches > so many test cases. I'd kind of like to either get it pushed or > write it off as a bad idea. Does anyone else care to comment? I'll take the risk of expressing an opinion: I think we should do something about this problem, but I'm not all that convinced we should do this particular thing. For example, consider this change from your patch: - idxpart1_a_idx | CREATE INDEX idxpart1_a_idx ON public.idxpart1 USING btree (a) - idxpart1_c_b_idx | CREATE INDEX idxpart1_c_b_idx ON public.idxpart1 USING btree (c, b) idxpart2_a_idx | CREATE INDEX idxpart2_a_idx ON public.idxpart2 USING btree (a) idxpart2_c_b_idx | CREATE INDEX idxpart2_c_b_idx ON public.idxpart2 USING btree (c, b) idxparti | CREATE INDEX idxparti ON ONLY public.idxpart USING btree (a) idxparti2 | CREATE INDEX idxparti2 ON ONLY public.idxpart USING btree (c, b) + idxparti2_1 | CREATE INDEX idxparti2_1 ON public.idxpart1 USING btree (c, b) + idxparti_1 | CREATE INDEX idxparti_1 ON public.idxpart1 USING btree (a) There are two things about this that don't seem great to me. First, the index names for idxpart1 are no longer consistent with the index names for idxpart2. Second, IMHO, the names are worse. Let's talk about each of those problems separately. The reason the names aren't consistent any more is because idxpart2 is created as a standalone table and then attached as a partition, whereas idxpart1 is a partition from the first moment of its existence. It is not essential that the index names not vary based on which way the user does it, but I think it is a nicer user experience if they don't. Now, what about the absolute quality of the names? The change makes the index names more consistent with the name of the index on the parent, which is nice, but we also lose something: the index names are now less consistent with the names of the child tables, and they don't mention the affected columns any more. I think those are pretty major drawbacks. In fact, these kinds of examples can understand the degree to which we've lost commonality with the child table names, because here the child tables and the indexes are named with increasing integer counters, but in general the partitions might have names like 2025_09, 2025_10, etc. and the indexes are just going to be counting up 1, 2, ... and I feel like that's worse than what we do now. The pg_overexplain output changes make this point somewhat dramatically, perhaps over-dramatically: - -> Index Scan using brassica_id_idx on brassica v1_1 (actual rows=N.NN loops=1) + -> Index Scan using vegetables_id_idx_1 on brassica v1_1 (actual rows=N.NN loops=1) - -> Index Scan using daucus_id_idx on daucus v1_2 (actual rows=N.NN loops=1) + -> Index Scan using vegetables_id_idx_2 on daucus v1_2 (actual rows=N.NN loops=1) Surely, it's not as nice for the indexes on the brassica and daucus tables to be named vegetables_id_idx_1 and vegetables_id_idx_2 rather than brasica_id_ix and daucus_id_idx. That's just gotta be worse. The fact that you could still get it the other way if you created the partitions standalone and then attached them makes it even worse. IMHO, the real problem here is that when an index is created on a column, we have this idea (with which I agree) that it would be nice to include the column name in the index, but when we have an expression we go "oh, rats, there's no column name, I guess we'll just use 'expr'", which doesn't scale very well beyond a single expression index. For example: - "pt12_expr_idx" btree ((mydouble(category) + 1)) - "pt12_sdata_idx" btree (sdata) - "pt12_tdata_idx" btree (tdata COLLATE mycollation) + "pti1_1" btree ((mydouble(category) + 1)) + "pti2_1" btree (sdata) + "pti3_1" btree (tdata COLLATE mycollation) Hypothetically, what if the second and third indexes ended up being named just as they are, but the first index ended up being called pt12_mydouble_idx? That would be similar to the way that a function name is used as a column alias if none is provided. Granted, that wouldn't help the OP, whose expressions look like this: jsondata -> 'a' -> 'b' jsondata -> 'x' -> 'y' It's asking a lot for an algorithm to produce distinct, human-readable names for these indexes, especially given that there could be other expression indexes on jsondata ---> 'a'----> 'b' i.e. differing only in the operator name. Nonetheless, my intuition here is that the root of the problem here is that we throw up our hands and just say "expr". One way forward could be to do some sort of hash for the plan tree and instead of saying "expr", say "exprXXXX" where each X is an integer or a hex digit or something. Then it's very likely that all of the indexes would get different autogenerated names regardless of how the expression differs. This is not without problems: it's complicated, and it might create buildfarm instability. But from a theoretical perspective I like it better, because it seems to me that it's getting at the root cause of the problem, which IMHO is that "expr" is not a great descriptor for an arbitrary expression. Now all that being said, I'm not volunteering to do the work here and I don't think that what you propose is the most horribly idea anyone's ever had, or anything remotely close to that. It could be that you don't agree (perhaps rightly) or it could be that you agree in a theoretical world but don't want to do the work to get there. So don't understand this as a vigorous attempt to block the patch, just as me giving my view of how I see it. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: