Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers
From | gkokolatos@pm.me |
---|---|
Subject | Re: Add LZ4 compression in pg_dump |
Date | |
Msg-id | XPMIIorJ5munrlLRZGquartiN5tuapdjiBhaPZ6iEmJccRiKWCIHSowSLPhaGj4gLpqvaS1pQn7PGkA_WNwvuUTVdl-lrU_TzHA81LqujJ4=@pm.me Whole thread Raw |
In response to | Re: Add LZ4 compression in pg_dump (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Add LZ4 compression in pg_dump
Re: Add LZ4 compression in pg_dump Re: Add LZ4 compression in pg_dump |
List | pgsql-hackers |
------- Original Message ------- On Thursday, March 16th, 2023 at 10:20 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > > On 3/16/23 18:04, gkokolatos@pm.me wrote: > > > ------- Original Message ------- > > On Tuesday, March 14th, 2023 at 4:32 PM, Tomas Vondra tomas.vondra@enterprisedb.com wrote: > > > > > On 3/14/23 16:18, gkokolatos@pm.me wrote: > > > > > > > ...> Would you mind me trying to come with a patch to address your points? > > > > > > That'd be great, thanks. Please keep it split into smaller patches - two > > > might work, with one patch for "cosmetic" changes and the other tweaking > > > the API error-handling stuff. > > > > Please find attached a set for it. I will admit that the splitting in the > > series might not be ideal and what you requested. It is split on what > > seemed as a logical units. Please advice how a better split can look like. > > > > 0001 is unifying types and return values on the API > > 0002 is addressing the constant definitions > > 0003 is your previous 0004 adding comments > > > Thanks. I think the split seems reasonable - the goal was to not mix > different changes, and from that POV it works. > > I'm not sure I understand the Gzip_read/Gzip_write changes in 0001. I > mean, gzread/gzwrite returns int, so how does renaming the size_t > variable solve the issue of negative values for errors? I mean, this > > - size_t ret; > + size_t gzret; > > - ret = gzread(gzfp, ptr, size); > + gzret = gzread(gzfp, ptr, size); > > means we still lost the information gzread() returned a negative value, > no? We'll still probably trigger an error, but it's a bit weird. You are obviously correct. My bad, I miss-read the return type of gzread(). Please find an amended version attached. > Unless I'm missing something, if gzread() ever returns -1 or some other > negative error value, we'll cast it to size_t, while condition will > evaluate to "true" and we'll happily chew on some random chunk of data. > > So the confusion is (at least partially) a preexisting issue ... > > For gzwrite() it seems to be fine, because that only returns 0 on error. > OTOH it's defined to take 'int size' but then we happily pass size_t > values to it. > > As I wrote earlier, this apparently assumes we never need to deal with > buffers larger than int, and I don't think we have the ambition to relax > that (I'm not sure it's even needed / possible). Agreed. > I see the read/write functions are now defined as int, but we only ever > return 0/1 from them, and then interpret that as bool. Why not to define > it like that? I don't think we need to adhere to the custom that > everything returns "int". This is an internal API. Or if we want to > stick to int, I'd define meaningful "nice" constants for 0/1. The return types are now booleans and the callers have been made aware. > 0002 seems fine to me. I see you've ditched the idea of having two > separate buffers, and replaced them with DEFAULT_IO_BUFFER_SIZE. Fine > with me, although I wonder if this might have negative impact on > performance or something (but I doubt that). > I doubt that too. Thank you. > 0003 seems fine too. Thank you. > > As far as the error handling is concerned, you had said upthread: > > > > > I think the right approach is to handle all library errors and not just > > > let them through. So Gzip_write() needs to check the return value, and > > > either call pg_fatal() or translate it to an error defined by the API. > > > > While working on it, I thought it would be clearer and more consistent > > for the pg_fatal() to be called by the caller of the individual functions. > > Each individual function can keep track of the specifics of the error > > internally. Then the caller upon detecting that there was an error by > > checking the return value, can call pg_fatal() with a uniform error > > message and then add the specifics by calling the get_error_func(). > > > I agree it's cleaner the way you did it. > > I was thinking that with each compression function handling error > internally, the callers would not need to do that. But I haven't > realized there's logic to detect ENOSPC and so on, and we'd need to > duplicate that in every compression func. > If you agree, I can prepare a patch to improve on the error handling aspect of the API as a separate thread, since here we are trying to focus on correctness. Cheers, //Georgios > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: