Re: refactoring basebackup.c - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: refactoring basebackup.c
Msg-id: CA+TgmoYT4FmncwD2t8p_=5E9hTqvANrek2avB-1tip9yJQ9c_Q@mail.gmail.com
In response to: Re: refactoring basebackup.c (Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>)
List: pgsql-hackers

On Tue, Oct 5, 2021 at 5:51 AM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
> I have fixed the autoFlush issue. Basically, I was wrongly initializing
> the lz4 preferences in bbsink_lz4_begin_archive() instead of
> bbsink_lz4_begin_backup(). I have fixed the issue in the attached
> patch, please have a look at it.

Thanks for the new patch. Seems like this is getting closer, but:

+/*
+ * Read the input buffer in CHUNK_SIZE length in each iteration and pass it to
+ * the lz4 compression. Defined as 8k, since the input buffer is multiple of
+ * BLCKSZ i.e. multiple of 8k.
+ */
+#define CHUNK_SIZE 8192

BLCKSZ does not have to be 8kB.

+	size_t		compressedSize;
+	int			nextChunkLen = CHUNK_SIZE;
+
+	/* Last chunk to be read from the input. */
+	if (avail_in < CHUNK_SIZE)
+		nextChunkLen = avail_in;

This is the only place where CHUNK_SIZE gets used, and I don't see any
point to it. The fifth argument to LZ4F_compressUpdate() could just be
avail_in, and as soon as you do that, bbsink_lz4_archive_contents() no
longer needs to be a loop.

For gzip, the output buffer isn't guaranteed to be big enough to hold all
of the compressed output, so the compression step can fail to compress
all the data. But LZ4 forces us to make the output buffer big enough that
no such failure can happen. Therefore, that can't happen here, unless you
artificially limit the amount of data that you pass to
LZ4F_compressUpdate() to something less than the size of the input
buffer -- and I don't see any reason to do that.

+	/* First of all write the frame header to destination buffer. */
+	headerSize = LZ4F_compressBegin(mysink->ctx,
+									mysink->base.bbs_next->bbs_buffer,
+									mysink->base.bbs_next->bbs_buffer_length,
+									&mysink->prefs);

+	compressedSize = LZ4F_compressEnd(mysink->ctx,
+									  mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+									  mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+									  NULL);

I think there's an issue with these two chunks of code. What happens if
one of these functions wants to write more data than will fit in the
output buffer? Either there needs to be code someplace that ensures
adequate space in the output buffer at the time of these calls, or else
there needs to be a retry loop that writes as much of the data as
possible, flushes the output buffer, and then loops to generate more
output data. But there's clearly no retry loop here, and I don't see any
code that guarantees that the output buffer has to be large enough (and,
in the case of LZ4F_compressEnd(), have enough remaining space) either.
In other words, all the same concerns that apply to LZ4F_compressUpdate()
also apply here ... but at the LZ4F_compressUpdate() call site you seem
to have BOTH a retry loop AND code to make sure the buffer is certain to
be large enough (which is more than you need -- you only need one of
those), while here you have NEITHER (which is not enough -- you need one
or the other).
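To make that concrete, here is an untested standalone sketch of the
approach I'm suggesting -- the function name and the malloc-based buffer
handling are just for illustration, not code from the patch. Because the
destination buffer is sized with LZ4F_compressBound() up front,
LZ4F_compressBegin(), LZ4F_compressUpdate(), and LZ4F_compressEnd() are
all guaranteed enough space, so none of the three call sites needs a
chunking or retry loop:

#include <stdlib.h>
#include <string.h>
#include <lz4frame.h>

/*
 * Compress one input buffer into a single LZ4 frame in one pass.
 * Returns the compressed size, or 0 on error.
 */
static size_t
lz4_frame_compress_once(const void *src, size_t src_len, void **dst_out)
{
	LZ4F_preferences_t prefs;
	LZ4F_cctx  *ctx;
	char	   *dst = NULL;
	size_t		capacity;
	size_t		pos = 0;
	size_t		rc;

	memset(&prefs, 0, sizeof(prefs));
	prefs.frameInfo.blockSizeID = LZ4F_max256KB;

	if (LZ4F_isError(LZ4F_createCompressionContext(&ctx, LZ4F_VERSION)))
		return 0;

	/*
	 * Conservative sizing: room for the frame header, plus a worst-case
	 * LZ4F_compressUpdate() over src_len bytes, plus a worst-case
	 * LZ4F_compressEnd() (which is what LZ4F_compressBound() reports
	 * when passed srcSize == 0).
	 */
	capacity = LZ4F_HEADER_SIZE_MAX +
		LZ4F_compressBound(src_len, &prefs) +
		LZ4F_compressBound(0, &prefs);

	if ((dst = malloc(capacity)) == NULL)
		goto fail;

	/* Frame header: capacity >= LZ4F_HEADER_SIZE_MAX, cannot fail. */
	rc = LZ4F_compressBegin(ctx, dst, capacity, &prefs);
	if (LZ4F_isError(rc))
		goto fail;
	pos += rc;

	/* The whole input in one call: no CHUNK_SIZE, no loop. */
	rc = LZ4F_compressUpdate(ctx, dst + pos, capacity - pos,
							 src, src_len, NULL);
	if (LZ4F_isError(rc))
		goto fail;
	pos += rc;

	/* Frame footer: the remaining space is guaranteed sufficient. */
	rc = LZ4F_compressEnd(ctx, dst + pos, capacity - pos, NULL);
	if (LZ4F_isError(rc))
		goto fail;

	LZ4F_freeCompressionContext(ctx);
	*dst_out = dst;
	return pos + rc;

fail:
	free(dst);
	LZ4F_freeCompressionContext(ctx);
	return 0;
}

(LZ4F_HEADER_SIZE_MAX is only exposed by reasonably recent lz4 releases;
older ones would need a hard-coded constant.) In the bbsink case, I
believe the equivalent move is to make sure at begin-backup time that the
next sink's bbs_buffer is at least
LZ4F_compressBound(bbs_buffer_length, &prefs) bytes.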
+	/* Initialize compressor object. */
+	prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+	prefs->frameInfo.blockMode = LZ4F_blockLinked;
+	prefs->frameInfo.contentChecksumFlag = LZ4F_noContentChecksum;
+	prefs->frameInfo.frameType = LZ4F_frame;
+	prefs->frameInfo.contentSize = 0;
+	prefs->frameInfo.dictID = 0;
+	prefs->frameInfo.blockChecksumFlag = LZ4F_noBlockChecksum;
+	prefs->compressionLevel = 0;
+	prefs->autoFlush = 0;
+	prefs->favorDecSpeed = 0;
+	prefs->reserved[0] = 0;
+	prefs->reserved[1] = 0;
+	prefs->reserved[2] = 0;

How about instead using memset() to zero the whole thing and then
omitting the zero initializations? That seems like it would be less
fragile if the upstream structure definition ever changes.
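Concretely, something like this (untested) -- all of the LZ4F "default"
enumerators used in the hunk above (LZ4F_blockLinked,
LZ4F_noContentChecksum, LZ4F_frame, LZ4F_noBlockChecksum) have the
value 0, so after the memset() only the one genuinely non-default field
needs an explicit assignment:

	LZ4F_preferences_t *prefs = &mysink->prefs;

	/*
	 * Zero everything first.  This also covers the reserved[] words
	 * and any fields a future lz4 release might add to the struct.
	 */
	memset(prefs, 0, sizeof(LZ4F_preferences_t));
	prefs->frameInfo.blockSizeID = LZ4F_max256KB;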
--
Robert Haas
EDB: http://www.enterprisedb.com