Re: Commitfest problems - Mailing list pgsql-hackers
From | Mark Cave-Ayland |
---|---|
Subject | Re: Commitfest problems |
Date | |
Msg-id | 548DA025.3040706@ilande.co.uk Whole thread Raw |
In response to | Re: Commitfest problems (Craig Ringer <craig@2ndquadrant.com>) |
Responses |
Re: Commitfest problems
Re: Commitfest problems |
List | pgsql-hackers |
On 13/12/14 09:37, Craig Ringer wrote: >> Speaking as the originator of commitfests, they were *always* intended >> to be a temporary measure, a step on the way to something else like >> continuous integration. > > I'd really like to see the project revisit some of the underlying > assumptions that're being made in this discussion: > > - Patches must be email attachments to a mailing list > > - Changes must be committed by applying a diff > > ... and take a look at some of the options a git-based workflow might > offer, especially in combination with some of the tools out there that > help track working branches, run CI, etc. > > Having grown used to push/pull workflows with CI integration I find the > PostgreSQL patch workflow very frustrating, especially for larger > patches. It's particularly annoying to see a patch series squashed into > a monster patch whenever it changes hands or gets rebased, because it's > being handed around as a great honking diff not a git working branch. > > Is it time to stop using git like CVS? While not so active these days with PostgreSQL, I believe there is still great scope for improving the workflow of reviewing and applying patches. And while the commitfests were a good idea when they started, it's obvious that in their current form they do not scale to meet the current rate of development. If I could name just one thing that I think would improve things it would be submission of patches to the list in git format-patch format. Why? Because it enables two things: 1) by definition patches are "small-chunked" into individually reviewable pieces and 2) subject lines allow list members to filter things that aren't relevant to them. Here's an example of how the workflow works for the QEMU project which I'm involved in, which also has similar issues in terms of numbers of developers and supported platforms: 1) Developer submits a patch to the -devel list with git-format-patch - For subsystems with listed maintainers in the MAINTAINERS file, the maintainer must be CCd on the patchset (this is so that even if developers decide to filter PATCH emails to the list, they still get the CC copy) - Patchsets must be iterative and fully bisectable, with clear comments supplied in the commit message - Any patchset with > 1 patch MUST have a cover letter - All patches have a "Signed-off-by" indicating that the developer has accepted the terms of the project license - Each subject line should be in the format "[PATCH] subsystem: one line comment" to enabled people to determine its relevance 2) Other developers start reviewing patches There are several mini-workflows that tend to occur here so I'll try and give some popular examples. - If a patch is a build fix, one of the core committers will verify and apply directly to the master repository - If a maintainer is happy with the whole patchset, they reply to the cover letter with a "Reviewed-by" and a line stating which subtree the patchset has been applied to. Maintainers add a "Signed-off-by" to all patches accepted via their subtree. - A maintainer may indicate that the patch itself is fine, but the commit message is not clear/incorrect and should be changed before being resubmitted - Any member of the mailing list may reply/review an individual patch by hitting "reply" in their mail client. Patch comments are included inline. If a reviewer is happy with an individual patch, they reply to the patch and add a "Reviewed-by" line; anyone can provide a review, even if they are not a maintainer - For complex patches, a maintainer may request that the patchset may be split into further patchsets. In general patchsets of > 20-30 patches tend to be frowned upon, and will often immediately result in a reply saying "please break this down into smaller chunks" - A maintainer may reply and say that while the patchset in its final form needs more work, some clean-ups introduced by the patch are ready to be applied, e.g. please resubmit patches 1-4, 7 and 8 as a separate patchset which can then be applied directly to a subtree 3) Patchset tidy-up and submission After the initial review, new versions of the patchset are generally reposted to the list within a few days. - The patch version is included in the header with the same subject line, e.g. "[PATCHv2] Add new feature" - The cover letter contains a mini-changelog giving the changes between v1 and v2 e.g. v2: Fix spelling mistake pointed out by Peter Change lock ordering requested by Andreas - "Reviewed-by" tags for individual patches sent to the list are appended to the commit message for that patch before resubmission Once a maintainer has accepted a patch into their subtree, they send a pull request to the core maintainers to apply their trees to the main repository. I appreciate that currently merge commits aren't used in the PostgreSQL git repository so a pull request effectively becomes "rebase this patchset onto master and push". So why does this help patch review? From my experience I can definitely say the following: 1) Patch review is very cheap All I need to do is hit reply in my mail client and include an inline comment for it to get picked up by a maintainer and the author - no need to start messing around with attachments. Especially useful if I'm on a mobile connection out of the office and not sat at my main PC. 2) Developers only need to review parts of the patchset that are relevant to them As an example if someone were to post a single large patch to -hackers to change the way the stats collector worked then I'm extremely unlikely to download and start reviewing it as I don't have any particular expertise in this area. However if it's posted as part of a patchset with a subject of "[PATCH 4/12] index: change retrieval of stats for GiST indexes" then that immediately tells me that this patch is something I can review, and even better I can do this in isolation without having to verify the entire patch. 3) Patch status is easy to spot Searching on the subject line and looking for "Reviewed-by" tags means it is possible to determine the status of each patchset really easily. 4) Patch bisection becomes very easy A particular use case from QEMU: a patchset was applied changing the way memory accesses work which broke one of my test images. Using git bisect, not only could I determine it was this patchset that broke my image, it was the particular patch that changed the routine for handling unaligned memory accesses across pages. And even better, I knew exactly from the log message who to email about it because of the "Reviewed-by" tags for that one patch included in the commit message. Compare this to say, for example, huge patches such as RLS. With this being applied as a single monolithic patch then if a bug is found then locating exactly which change causes an issue can be much harder and time-consuming. 5) Individual patch comments are held in the project history Verbose commit messages for each patch of a patchset allow someone to understand the reasoning/history for each change with a simple bisection without having to start searching the archives. In general the level of documentation held within the repository for a patchset split into several patches is greater than if it were simply applied as one single commit with a single message. 6) Smaller patches are applied more often By breaking large patches down into smaller chunks, even if the main feature itself isn't ready to be applied to the main repository then a lot of the preceding patches laying down the groundwork can. The benefits here are that people with large out-of-tree patches find that they reduce in size over time, and also trees tend to be merged more often (every couple of weeks) since as large features get implemented as a series of smaller patches, maintainers are more confident about merging them. And if a groundwork patch does cause breakage, this gets picked up much earlier in the process and becomes very easy to isolate. Another benefit of merging smaller patchsets more often is that developers of larger patches end up spending less time rebasing patches which have rotted in the weeks and months between review and merge (I've seen this happen a few times with patches in the commitfest). I hope that this email gives a reasonable argument as to why I believe that patches submitted in this manner will get more review and merged more often. However it's only fair that I should also point out the negative parts I've seen from this approach: 1) Mailing list volume increases The day after freeze was over, I had nearly 300 emails appear in my inbox(!). Fortunately with the patch subject lines, it's easy to delete the one's that aren't personally relevant to me, but it does takes a few mins extra each day to filter my mails. 2) Unresponsive maintainers A couple of patches I've supplied to QEMU have been ignored for months because the maintainer for the subsystem just disappeared. Not so relevant for PostgreSQL, although there are already implicit owners for various parts of the code, e.g replication/WAL, indexes etc. But I'll leave discussion of this point to other people :) ATB, Mark.
pgsql-hackers by date: