Home > mailing lists
Re: Commitfest problems - Mailing list pgsql-hackers

From	Mark Cave-Ayland
Subject	Re: Commitfest problems
Date	December 14, 2014 14:36:04
Msg-id	548DA025.3040706@ilande.co.uk Whole thread Raw
In response to	Re: Commitfest problems (Craig Ringer <craig@2ndquadrant.com>)
Responses	Re: Commitfest problems Re: Commitfest problems
List	pgsql-hackers
Tree view
On 13/12/14 09:37, Craig Ringer wrote:

>> Speaking as the originator of commitfests, they were *always* intended
>> to be a temporary measure, a step on the way to something else like
>> continuous integration.
> 
> I'd really like to see the project revisit some of the underlying
> assumptions that're being made in this discussion:
> 
> - Patches must be email attachments to a mailing list
> 
> - Changes must be committed by applying a diff
> 
> ... and take a look at some of the options a git-based workflow might
> offer, especially in combination with some of the tools out there that
> help track working branches, run CI, etc.
> 
> Having grown used to push/pull workflows with CI integration I find the
> PostgreSQL patch workflow very frustrating, especially for larger
> patches. It's particularly annoying to see a patch series squashed into
> a monster patch whenever it changes hands or gets rebased, because it's
> being handed around as a great honking diff not a git working branch.
> 
> Is it time to stop using git like CVS?

While not so active these days with PostgreSQL, I believe there is still
great scope for improving the workflow of reviewing and applying
patches. And while the commitfests were a good idea when they started,
it's obvious that in their current form they do not scale to meet the
current rate of development.

If I could name just one thing that I think would improve things it
would be submission of patches to the list in git format-patch format.
Why? Because it enables two things: 1) by definition patches are
"small-chunked" into individually reviewable pieces and 2) subject lines
allow list members to filter things that aren't relevant to them.

Here's an example of how the workflow works for the QEMU project which
I'm involved in, which also has similar issues in terms of numbers of
developers and supported platforms:


1) Developer submits a patch to the -devel list with git-format-patch

- For subsystems with listed maintainers in the MAINTAINERS file, the
maintainer must be CCd on the patchset (this is so that even if
developers decide to filter PATCH emails to the list, they still get the
CC copy)

- Patchsets must be iterative and fully bisectable, with clear comments
supplied in the commit message

- Any patchset with > 1 patch MUST have a cover letter

- All patches have a "Signed-off-by" indicating that the developer has
accepted the terms of the project license

- Each subject line should be in the format "[PATCH] subsystem: one line
comment" to enabled people to determine its relevance


2) Other developers start reviewing patches

There are several mini-workflows that tend to occur here so I'll try and
give some popular examples.

- If a patch is a build fix, one of the core committers will verify and
apply directly to the master repository

- If a maintainer is happy with the whole patchset, they reply to the
cover letter with a "Reviewed-by" and a line stating which subtree the
patchset has been applied to. Maintainers add a "Signed-off-by" to all
patches accepted via their subtree.

- A maintainer may indicate that the patch itself is fine, but the
commit message is not clear/incorrect and should be changed before being
resubmitted

- Any member of the mailing list may reply/review an individual patch by
hitting "reply" in their mail client. Patch comments are included
inline. If a reviewer is happy with an individual patch, they reply to
the patch and add a "Reviewed-by" line; anyone can provide a review,
even if they are not a maintainer

- For complex patches, a maintainer may request that the patchset may be
split into further patchsets. In general patchsets of > 20-30 patches
tend to be frowned upon, and will often immediately result in a reply
saying "please break this down into smaller chunks"

- A maintainer may reply and say that while the patchset in its final
form needs more work, some clean-ups introduced by the patch are ready
to be applied, e.g. please resubmit patches 1-4, 7 and 8 as a separate
patchset which can then be applied directly to a subtree


3) Patchset tidy-up and submission

After the initial review, new versions of the patchset are generally
reposted to the list within a few days.

- The patch version is included in the header with the same subject
line, e.g. "[PATCHv2] Add new feature"

- The cover letter contains a mini-changelog giving the changes between
v1 and v2 e.g.

v2: Fix spelling mistake pointed out by Peter Change lock ordering requested by Andreas

- "Reviewed-by" tags for individual patches sent to the list are
appended to the commit message for that patch before resubmission


Once a maintainer has accepted a patch into their subtree, they send a
pull request to the core maintainers to apply their trees to the main
repository. I appreciate that currently merge commits aren't used in the
PostgreSQL git repository so a pull request effectively becomes "rebase
this patchset onto master and push".

So why does this help patch review? From my experience I can definitely
say the following:


1) Patch review is very cheap

All I need to do is hit reply in my mail client and include an inline
comment for it to get picked up by a maintainer and the author - no need
to start messing around with attachments. Especially useful if I'm on a
mobile connection out of the office and not sat at my main PC.


2) Developers only need to review parts of the patchset that are
relevant to them

As an example if someone were to post a single large patch to -hackers
to change the way the stats collector worked then I'm extremely unlikely
to download and start reviewing it as I don't have any particular
expertise in this area.

However if it's posted as part of a patchset with a subject of "[PATCH
4/12] index: change retrieval of stats for GiST indexes" then that
immediately tells me that this patch is something I can review, and even
better I can do this in isolation without having to verify the entire patch.


3) Patch status is easy to spot

Searching on the subject line and looking for "Reviewed-by" tags means
it is possible to determine the status of each patchset really easily.


4) Patch bisection becomes very easy

A particular use case from QEMU: a patchset was applied changing the way
memory accesses work which broke one of my test images.

Using git bisect, not only could I determine it was this patchset that
broke my image, it was the particular patch that changed the routine for
handling unaligned memory accesses across pages. And even better, I knew
exactly from the log message who to email about it because of the
"Reviewed-by" tags for that one patch included in the commit message.

Compare this to say, for example, huge patches such as RLS. With this
being applied as a single monolithic patch then if a bug is found then
locating exactly which change causes an issue can be much harder and
time-consuming.


5) Individual patch comments are held in the project history

Verbose commit messages for each patch of a patchset allow someone to
understand the reasoning/history for each change with a simple bisection
without having to start searching the archives. In general the level of
documentation held within the repository for a patchset split into
several patches is greater than if it were simply applied as one single
commit with a single message.


6) Smaller patches are applied more often

By breaking large patches down into smaller chunks, even if the main
feature itself isn't ready to be applied to the main repository then a
lot of the preceding patches laying down the groundwork can.

The benefits here are that people with large out-of-tree patches find
that they reduce in size over time, and also trees tend to be merged
more often (every couple of weeks) since as large features get
implemented as a series of smaller patches, maintainers are more
confident about merging them. And if a groundwork patch does cause
breakage, this gets picked up much earlier in the process and becomes
very easy to isolate.

Another benefit of merging smaller patchsets more often is that
developers of larger patches end up spending less time rebasing patches
which have rotted in the weeks and months between review and merge (I've
seen this happen a few times with patches in the commitfest).


I hope that this email gives a reasonable argument as to why I believe
that patches submitted in this manner will get more review and merged
more often. However it's only fair that I should also point out the
negative parts I've seen from this approach:


1) Mailing list volume increases

The day after freeze was over, I had nearly 300 emails appear in my
inbox(!). Fortunately with the patch subject lines, it's easy to delete
the one's that aren't personally relevant to me, but it does takes a few
mins extra each day to filter my mails.


2) Unresponsive maintainers

A couple of patches I've supplied to QEMU have been ignored for months
because the maintainer for the subsystem just disappeared. Not so
relevant for PostgreSQL, although there are already implicit owners for
various parts of the code, e.g replication/WAL, indexes etc. But I'll
leave discussion of this point to other people :)


ATB,

Mark.
pgsql-hackers by date:
From: David Rowley
Date: 14 December 2014, 11:58:16
Subject: Re: speedup tidbitmap patch: cache page
From: "Greg Sabino Mullane"
Date: 14 December 2014, 15:12:30
Subject: Re: Commitfest problems
Re: Commitfest problems - Mailing list pgsql-hackers

Previous

Next