Re: SQL Property Graph Queries (SQL/PGQ) - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: SQL Property Graph Queries (SQL/PGQ) |
Date | |
Msg-id | CAExHW5t8UJgon2VW2HMndESyOfhpk2syDYaBSuw60ig9qXcHAA@mail.gmail.com Whole thread Raw |
In response to | Re: SQL Property Graph Queries (SQL/PGQ) (Amit Langote <amitlangote09@gmail.com>) |
List | pgsql-hackers |
Hi Amit, Thanks for your comments On Wed, Jul 23, 2025 at 2:08 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Hi Ashutosh, > > On Tue, Jul 22, 2025 at 8:40 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > Here's the patchset rebased on the latest master. > > Thanks for the update. I was going through the patch series today and > had a few comments on the structure that might help with reviewing and > eventually committing, whoever ends up doing that. :) > > I noticed that some functions, like graph_table_property_reference(), > are introduced in one patch and then renamed or removed in a later > one. It’s easier to review if such refactoring is avoided across > patches -- ideally, each patch should introduce code in its final > intended form. That also helps reduce the number of patches and makes > each one more self-contained. > > One of the later patches (0004) collects several follow-up changes. > While it fixes a real bug -- a crash when GRAPH_TABLE is used inside > plpgsql due to a conflict with the columnref hook -- it also includes > incidental cleanups like switching to foreach_node(), updating > comments, and adjusting function signatures. Those would be better > folded into the patches that introduced the relevant code, rather than > grouped into a catch-all at the end. That keeps each patch focused and > easier to review -- and easier to merge when committing. > > A general principle that might help: if you're refactoring existing > code, a standalone preliminary patch makes sense. But if you're > tweaking something just added in the same series, it’s usually better > to squash that into the original patch. The rename from > graph_table_property_reference() to transformGraphTablePropertyRef() > may be a fair exception since it reflects a shift prompted by the bug > fix -- but most other adjustments could be folded in without loss of > clarity. > > I understand the intent to spell out each change, but if the details > are incidental to the overall design, they don’t necessarily need to > be split out. Explaining the reasoning in the thread is always > helpful, but consolidating the patches once the design has settled > makes things easier for both reviewers and committers. 0001 is almost the same patch that Peter posted almost a year ago. Each of 0002 onwards patches contains logically grouped changes. I have changed 0001 minimally (so that it continues to apply, compile and pass regression). I think this arrangement will make it easy for Peter to review the changes. It will help him understand what all changed since 0001 and each change in its own context. That has led to a lot of overlap between 0002 and 0001 which I think should be squashed together even now. But I would like to know what Peter thinks - what would make his review easier. This arrangement might also help Junwang, who has reviewed some patches, to continue his incremental review. I have rearranged the patches 0002 to 0014 so that the same change isn't revised in multiple patches. Hope that helps. I am also attaching a single patch as well to this email, so that newer reviewers can review the changes as a whole. SQL_PGQ_20250807.zip is separate patches zipped together. SQL_PGQ_one_patch_20250807.zip is a single diff with changes from all the patches squashed together. It's still zipped to avoid the email being held for moderation. > > Finally, attaching the patches directly, with versioned names like > v8-000x, instead of zipping them helps. Many folks (myself included) > will casually skip a zip file because of the small hassle of unzipping > just to read a patch. I once postponed reading such a patch and didn’t > get back to it for quite a while. :) I used to attach those as separate patches till [1], but after that the total size of the patchset grew so much that my emails used to get held back for moderation. So I started attaching zip files. The increase in size is mostly due to the 0014 patch. If we think that it's not needed or can be trimmed down, we will be able to attach the patchset as separate attachments. Here's what changed between the last patchset and this 0001 has some TODOs, FIXMEs and XXXs. I and Junwang had fixed some of them earlier. This patchset fixes some more. Below is my analysis of each of them. + + <!-- TODO: multiple path patterns in a graph pattern (comma-separated) --> + + <!-- TODO: quantifiers --> I think we should remove these TODOs since we are not going to support these cases in this patchset. We will add the documentation when we implement these features. +/* + * 15.2 + * PG_DEFINED_LABEL_SETS view + */ + +-- TODO + + +/* + * 15.3 + * PG_DEFINED_LABEL_SET_LABELS view + */ + +-- TODO + + +/* + * 15.4 + * PG_EDGE_DEFINED_LABEL_SETS view + */ + +-- TODO + +/* + * 15.6 + * PG_EDGE_TRIPLETS view + */ + +-- TODO + +/* + * 15.15 + * PG_VERTEX_DEFINED_LABEL_SETS view + */ + +-- TODO All these views are related to the defined label set. IIUC, defined label set is a set of labels which is shared by multiple edges or vertices of a property graph. Please note it's the whole set shared; not just individual labels in the set. In that sense, I think, the labels associated with each edge or vertex table in a property graph form a defined label set. That way the property graph element's OID becomes the defined label set's identifier. I am not sure if this view has any practical use for tabular property graphs. It's not clear to me what these views will contain. Since they are not essential for SQL/PGQ functionality, I have not implemented them. + /* + * Now check number and names. + * + * XXX We could provide more detail in the error messages, but that would + * probably only be useful for some ALTER commands, because otherwise it's + * not really clear which label definition is the wrong one, and so you'd + * have to construct a rather verbose report to be of any use. Let's keep + * it simple for now. + */ Agree with XXX. I think the code is good enough as is. We can leave XXX there in case we expect someone to improve it in future. Otherwise we should remove XXX as well. + + rel = table_open(PropgraphLabelPropertyRelationId, RowShareLock); + ScanKeyInit(&key[0], + Anum_pg_propgraph_label_property_plppropid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(propoid)); + scan = systable_beginscan(rel, InvalidOid /* FIXME */ , + true, NULL, 1, key); I think the FIXME is to add the correct index OID. pg_propgraph_label_property_label_prop_index contains plppropid as a key but it's not the first column. So probably some other, yet not defined, index is expected here? +ALTER PROPERTY GRAPH g3 ALTER VERTEX TABLE t3 DROP LABEL t3l3; +ALTER PROPERTY GRAPH g3 DROP VERTEX TABLES (t2); -- fail (TODO: dubious error message) +ERROR: cannot drop vertex t2 of property graph g3 because other objects depend on it +DETAIL: edge e1 of property graph g3 depends on vertex t2 of property graph g3 +HINT: Use DROP ... CASCADE to drop the dependent objects too. I don't see what's dubious about this error message. edge e1 connects t2 and t1. Dropping t2 would leave e1 dangling; error is preventing it. Looks similar to how we prevent a referenced table being dropped. Or is it expected that e1 be dropped without specifying cascade when t2 is dropped since e1 has no existence without t2? Also the patchset fixes the CI failure from the previous patchset. The patches are reordered slightly. Each patch contains a commit message that explains the changes in that patch. [1] https://www.postgresql.org/message-id/CAExHW5sM+ZGVzR1428FsDHuWc-FU2-6zQr5j91KLh6vZaWY0ow@mail.gmail.com -- Best Wishes, Ashutosh Bapat
Attachment
pgsql-hackers by date: