Re: Commitfest 2023-03 starting tomorrow! - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Commitfest 2023-03 starting tomorrow! |
Date | |
Msg-id | CA+hUKGK=7mTwheXRfxz=bD47+m7WUa2xWmce0EfoycsfRN98wg@mail.gmail.com |
In response to | Re: Commitfest 2023-03 starting tomorrow! (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses |
Re: Commitfest 2023-03 starting tomorrow!
|
List | pgsql-hackers |
On Tue, Mar 21, 2023 at 10:59 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I gave a talk on Friday at a private EDB mini-conference about the
> PostgreSQL open source process; and while preparing for that one, I
> ran some 'git log' commands to obtain the number of code contributors
> for each release, going back to 9.4 (when we started using the
> 'Authors:' tag more prominently). What I saw is a decline in the number
> of unique contributors, from its maximum at version 12, down to the
> numbers we had in 9.5. We went back 4 years. That scared me a lot.

Can you share the subtotals?

One immediate thought about commit log-based data is that we're not
using git Author, and the Author footer convention is only used by
some committers. So I guess it must have been pretty laborious to
read the prose-form data? We do have machine-readable Discussion
footers though. By scanning those threads for SMTP From headers on
messages that had patches attached, we can find the set of (distinct)
addresses that contributed to each commit. (I understand that some
people are co-authors and may not send an email, but if you counted
those and I didn't then you counted more, not fewer, contributors I
guess? On the other hand if someone posted a patch that wasn't used
in the commit, or posted from two home/work/whatever accounts that's a
false positive for my technique.)
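A minimal sketch of the scanning idea described above, not the script behind the numbers in this message. It assumes commit messages carry footers of the form "Discussion: https://postgr.es/m/<message-id>" (or the www.postgresql.org/message-id/ form), that the thread's raw messages have already been fetched by some other means, and that a "patch" is any attachment whose filename ends in .patch or .diff. The function names, the suffix list and the example.org addresses are all hypothetical.

```python
import re
from email import message_from_string
from email.message import EmailMessage
from email.utils import parseaddr

# Assumed footer shape; real footers may vary.
DISCUSSION_RE = re.compile(
    r"^Discussion:\s*https?://[^/\s]+/(?:m|message-id)/(\S+)\s*$",
    re.MULTILINE,
)

# Assumption: patches are recognised purely by attachment filename.
PATCH_SUFFIXES = (".patch", ".diff")


def discussion_ids(commit_message):
    """Return the message-ids named in Discussion footers of one commit."""
    return DISCUSSION_RE.findall(commit_message)


def has_patch_attachment(msg):
    """True if any MIME part has a filename that looks like a patch."""
    for part in msg.walk():
        filename = part.get_filename()
        if filename and filename.lower().endswith(PATCH_SUFFIXES):
            return True
    return False


def patch_senders(raw_messages):
    """Distinct From addresses among messages that carry a patch."""
    senders = set()
    for raw in raw_messages:
        msg = message_from_string(raw)
        if has_patch_attachment(msg):
            _name, addr = parseaddr(msg.get("From", ""))
            if addr:
                senders.add(addr.lower())
    return senders


def build_demo_message(sender, filename=None):
    """Build a small raw message for the usage example (hypothetical data)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg.set_content("see attached")
    if filename:
        msg.add_attachment(
            b"diff --git a/x b/x\n",
            maintype="application",
            subtype="octet-stream",
            filename=filename,
        )
    return msg.as_string()


if __name__ == "__main__":
    commit = "Fix a thing.\n\nDiscussion: https://postgr.es/m/abc%40example.org\n"
    print(discussion_ids(commit))
    thread = [
        build_demo_message("Alice <alice@example.org>", "v1-0001-fix.patch"),
        build_demo_message("Bob <bob@example.org>"),  # review only, no patch
    ]
    print(patch_senders(thread))
```

As the message notes, this kind of scan misses co-authors who never posted, and counts people whose patch was not used or who posted from more than one account.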
In a quick and dirty attempt at this made from bits of Python I
already had lying around (which may of course later turn out to be
flawed and need refinement), I extracted, for example:

postgres=# select * from t where commit = '8d578b9b2e37a4d9d6f422ced5126acec62365a7';
                  commit                  |          time          |                   address
------------------------------------------+------------------------+----------------------------------------------
 8d578b9b2e37a4d9d6f422ced5126acec62365a7 | 2023-03-21 14:29:34+13 | Melanie Plageman <melanieplageman@gmail.com>
 8d578b9b2e37a4d9d6f422ced5126acec62365a7 | 2023-03-21 14:29:34+13 | Thomas Munro <thomas.munro@gmail.com>
(2 rows)

You can really only go back about 5-7 years before that technique runs
out of steam, as the links run out. For what they're worth, these
numbers seem to suggest that around 260 distinct email addresses per
year send patches to threads referenced by commits. Maybe we're in a
3-year long plateau, but I don't see a peak back in r12:

postgres=# select date_trunc('year', time), count(distinct address)
           from t group by 1 order by 1;
       date_trunc       | count
------------------------+-------
 2015-01-01 00:00:00+13 |    13
 2016-01-01 00:00:00+13 |    37
 2017-01-01 00:00:00+13 |   144
 2018-01-01 00:00:00+13 |   187
 2019-01-01 00:00:00+13 |   225
 2020-01-01 00:00:00+13 |   260
 2021-01-01 00:00:00+13 |   256
 2022-01-01 00:00:00+13 |   262
 2023-01-01 00:00:00+13 |   119
(9 rows)

Of course 2023 is only just getting started.
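For readers without the table t to hand, the yearly query above can be sketched in Python over (commit, time, address) rows. This is only an illustration under stated assumptions: the rows are made-up example data, and unlike the SQL, which counts distinct full "Name <address>" strings, this variant reduces each entry to its lower-cased bare address first, so it may merge entries the query would count separately.

```python
from collections import defaultdict
from datetime import datetime
from email.utils import parseaddr


def distinct_per_year(rows):
    """Count distinct sender addresses per calendar year.

    rows: iterable of (commit, time, address) tuples, mirroring the
    columns of the table t shown above.
    """
    seen = defaultdict(set)
    for _commit, when, address in rows:
        _name, addr = parseaddr(address)
        # Fall back to the raw string if no address could be parsed.
        seen[when.year].add(addr.lower() or address)
    return {year: len(addrs) for year, addrs in sorted(seen.items())}


# Hypothetical rows; commit ids and addresses are placeholders.
ROWS = [
    ("c1", datetime(2022, 3, 1), "Alice <alice@example.org>"),
    ("c1", datetime(2022, 3, 1), "Bob <bob@example.org>"),
    ("c2", datetime(2022, 9, 5), "alice <ALICE@example.org>"),
    ("c3", datetime(2023, 1, 10), "Bob <bob@example.org>"),
]

if __name__ == "__main__":
    for year, count in distinct_per_year(ROWS).items():
        print(year, count)
```

Normalising addresses does nothing about one person posting from two different accounts, which remains a source of overcounting.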
Zooming in closer, the peak period for this measurement is
March/April, as I guess a lot of little things make it into the final
push:

postgres=# select date_trunc('month', time), count(distinct address)
           from t where time > '2021-01-01' group by 1 order by 1;
       date_trunc       | count
------------------------+-------
 2021-01-01 00:00:00+13 |    83
 2021-02-01 00:00:00+13 |    70
 2021-03-01 00:00:00+13 |   100
 2021-04-01 00:00:00+13 |   109
 2021-05-01 00:00:00+12 |    54
 2021-06-01 00:00:00+12 |    82
 2021-07-01 00:00:00+12 |    86
 2021-08-01 00:00:00+12 |    83
 2021-09-01 00:00:00+12 |    73
 2021-10-01 00:00:00+13 |    68
 2021-11-01 00:00:00+13 |    66
 2021-12-01 00:00:00+13 |    48
 2022-01-01 00:00:00+13 |    68
 2022-02-01 00:00:00+13 |    73
 2022-03-01 00:00:00+13 |   110
 2022-04-01 00:00:00+13 |    90
 2022-05-01 00:00:00+12 |    47
 2022-06-01 00:00:00+12 |    50
 2022-07-01 00:00:00+12 |    72
 2022-08-01 00:00:00+12 |    81
 2022-09-01 00:00:00+12 |   105
 2022-10-01 00:00:00+13 |    68
 2022-11-01 00:00:00+13 |    74
 2022-12-01 00:00:00+13 |    58
 2023-01-01 00:00:00+13 |    65
 2023-02-01 00:00:00+13 |    61
 2023-03-01 00:00:00+13 |    64
(27 rows)

Perhaps the present March is looking a little light compared to the
usual 100+ number, but actually if you take just the 1st to the 21st
of previous Marches, they were similar sorts of numbers.

postgres=# select date_trunc('month', time), count(distinct address)
           from t
           where (time >= '2022-03-01' and time <= '2022-03-21')
              or (time >= '2021-03-01' and time <= '2021-03-21')
              or (time >= '2020-03-01' and time <= '2020-03-21')
              or (time >= '2019-03-01' and time <= '2019-03-21')
           group by 1 order by 1;
       date_trunc       | count
------------------------+-------
 2019-03-01 00:00:00+13 |    57
 2020-03-01 00:00:00+13 |    57
 2021-03-01 00:00:00+13 |    77
 2022-03-01 00:00:00+13 |    72
(4 rows)

Another thing we could count is distinct names in the Commitfest app.
I count 162 names in Commitfest 42 today. Unfortunately I don't have
the data to hand to look at earlier Commitfests. That'd be interesting.
I've plotted that before back in 2018 for some conference talk, and it
was at ~100 and climbing back then.

> So I started a conversation about that and some people told me that it's
> very easy to be discouraged by our process. I don't need to mention
> that it's antiquated -- this in itself turns off youngsters. But in
> addition to that, I think newbies might be discouraged because their
> contributions seem to go nowhere even after following the process.

I don't disagree with your sentiment, though.

> This led me to suggesting that perhaps we need to be more lenient when
> it comes to new contributors. As I said, for seasoned contributors,
> it's not a problem to keep up with our requirements, however silly they
> are. But people who spend their evenings a whole week or month trying
> to understand how to patch for one thing that they want, to be received
> by six months of silence followed by a constant influx of "please rebase
> please rebase please rebase", no useful feedback, and termination with
> "eh, you haven't rebased for the 1001th time, your patch has been WoA
> for X days, we're setting it RwF, feel free to return next year" ...
> they are most certainly off-put and will *not* try again next year.

Right, that is pretty discouraging.