Re: refactoring basebackup.c (zstd workers) - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: refactoring basebackup.c (zstd workers) |
Date | |
Msg-id | CA+TgmoZz-F1pA1Vv=-6FOPWuS1eFi2ifNyySV4u6pW0+FBeG-g@mail.gmail.com |
In response to | Re: refactoring basebackup.c (Dipesh Pandit <dipesh.pandit@gmail.com>) |
Responses | Re: refactoring basebackup.c (zstd workers) |
List | pgsql-hackers |
On Mon, Mar 21, 2022 at 9:18 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Sun, Mar 20, 2022 at 09:38:44PM -0400, Robert Haas wrote:
> > > This patch also needs to update the other user-facing docs.
> >
> > Which ones exactly?
>
> I mean pg_basebackup -Z
>
> -Z level
> -Z [{client|server}-]method[:level]
> --compress=level
> --compress=[{client|server}-]method[:level]

Ah, right. Thanks.

Here's v3. I have updated that section of the documentation. I also went and added a bunch more test cases for validation of compression detail strings, many inspired by your examples, and fixed all the bugs that I found in the process. I think the crashes you complained about are now fixed, but please let me know if I have missed any. I also added _() calls as you suggested. I searched for the "contain a an" typo that you mentioned but was not able to find it. Can you give me a more specific pointer?

I looked a little bit more at the compression method vs. compression algorithm thing. I agree that there is some inconsistency in terminology here, but I'm still not sure that we are well-served by trying to make it totally uniform, especially if we pick the word "method" as the standard rather than "algorithm". In my opinion, "method" is less specific than "algorithm". If someone asks me to choose a compression algorithm, I know that I should give an answer like "lz4" or "zstd". If they ask me to pick a compression method, I'm not quite sure whether they want that kind of answer or whether they want something more detailed, like "use lz4 with compression level 3 and a 1MB block size". After all, that is (at least according to my understanding of how English works) a perfectly valid answer to the question "what method should I use to compress this data?" -- but not to the question "what algorithm should I use to compress this data?". The latter can ONLY be properly answered by saying something like "lz4".
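To make the distinction concrete, here is a rough sketch of how the quoted `[{client|server}-]method[:level]` form might be split into its parts. This is only an illustration and not the code from the patch: `CompressionSpec`, `parse_compress_option`, and the `KNOWN_ALGORITHMS` set are hypothetical names chosen for the example, and the sketch ignores the bare `-Z level` form and any richer compression detail strings.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of algorithm names, for illustration only.
KNOWN_ALGORITHMS = {"none", "gzip", "lz4", "zstd"}


@dataclass
class CompressionSpec:
    """Hypothetical container; not a structure taken from the patch."""
    location: Optional[str]  # "client", "server", or None if not given
    algorithm: str           # the bare name, e.g. "lz4" or "zstd"
    level: Optional[int]     # None means "use the algorithm's default"


def parse_compress_option(value: str) -> CompressionSpec:
    """Split [{client|server}-]method[:level] into its three parts."""
    location = None
    for prefix in ("client", "server"):
        if value.startswith(prefix + "-"):
            location = prefix
            value = value[len(prefix) + 1:]
            break

    # Everything before the first colon names the algorithm.
    algorithm, sep, level_text = value.partition(":")
    if algorithm not in KNOWN_ALGORITHMS:
        raise ValueError(f"unrecognized compression algorithm: {algorithm!r}")

    level = None
    if sep:
        # A colon was present, so a plain non-negative integer must follow.
        if not (level_text.isascii() and level_text.isdigit()):
            raise ValueError(f"invalid compression level: {level_text!r}")
        level = int(level_text)

    return CompressionSpec(location, algorithm, level)
```

With this sketch, `parse_compress_option("server-lz4:3")` yields an object whose `algorithm` field is just `"lz4"`, while the object as a whole (location, algorithm, and level together) is closer to what "method" could mean in the broader sense described above.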
And I think that's really the root of my hesitation to make the kinds of changes you want here. If it's just a question of specifying a compression algorithm and a level, I don't think using the name "method" for the algorithm is going to be too bad. But as we enrich the system with multiple compression algorithms each of which may have multiple and different parameters, I think the whole thing becomes murkier and the need for precision in language goes up. Now that is of course an arguable position and you're welcome to disagree with it, but I think that's part of why I'm hesitating.

Another part of it, at least for me, is that complete uniformity is not always a positive. I suppose all of us have had the experience at some point of reading a manual that says something like "to activate the boil water function, press and release the 'boil water' button" and rolled our eyes at how useless it was. It's important to me that we don't fall into that trap. We clearly don't want to go ballistic and have random inconsistencies in language for no reason, but at the same time, it's not useful to tell people that METHOD should be replaced with a compression method and LEVEL with a compression level. I mean, if you end up saying something like that interspersed with non-obvious information, that is OK, and I don't want to overstate the point I'm trying to make. But it seems to me that if there's a little variation in phrasing and we end up saying that METHOD means the compression algorithm or that ALGORITHM means the compression method or whatever, that can actually make things more clear.

Here again it's debatable: how much variation in phraseology is helpful, and at what point does it just start to seem inconsistent? Well, everyone may have their own opinion. I'm not trying to pretend that this patch (or the existing code base) gets this all right.
But I do think that, to the extent that we have a considered position on what to do here, we can make that change later, perhaps even after getting some user feedback on what does and does not make sense to other people. And I also think that what we end up doing here may well end up being more nuanced than a blanket search-and-replace. I'm not saying we couldn't make a blanket search-and-replace. I just don't see it as necessarily creating value, or being all that closely connected to the goal of this patch, which is to quickly clean up a forward-compatibility risk before we hit feature freeze.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com