Re: [HACKERS] Proposal : For Auto-Prewarm. - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Proposal : For Auto-Prewarm. |
Date | |
Msg-id | CA+TgmoZ13ybkqM71bvi-wwpZ=3BoLaDhLOMJL=70Dy+r21TLVA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Proposal : For Auto-Prewarm. (Mithun Cy <mithun.cy@enterprisedb.com>) |
Responses |
Re: [HACKERS] Proposal : For Auto-Prewarm.
|
List | pgsql-hackers |
On Tue, May 30, 2017 at 8:15 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Tue, May 30, 2017 at 10:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> Thanks Robert, > > Sorry, there was one more mistake ( a typo) in dump_now() instead of > using pfree I used free corrected same in the new patch v10. + * contrib/autoprewarm.c Wrong. + Oid database; /* database */ + Oid spcnode; /* tablespace */ + Oid filenode; /* relation's filenode. */ + ForkNumber forknum; /* fork number */ + BlockNumber blocknum; /* block number */ Why spcnode rather than tablespace? Also, the third line has a period, but not the others. I think you could just drop these comments; they basically just duplicate the structure names, except for spcnode which doesn't but probably should. + bool can_do_prewarm; /* if set can't do prewarm task. */ The comment and the name of the variable imply opposite meanings. +/* + * detach_blkinfos - on_shm_exit detach the dsm allocated for blockinfos. + */ +static void +detach_blkinfos(int code, Datum arg) +{ + if (seg != NULL) + dsm_detach(seg); +} I assume you don't really need this. Presumably process exit is going to detach the DSM anyway. + if (seg == NULL) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("unable to map dynamic shared memory segment"))); Let's use the wording from the message in parallel.c rather than this wording. Actually, we should probably (as a separate patch) change test_shm_mq's worker.c to use the parallel.c wording also. + SetCurrentStatementStartTimestamp(); Why do we need this? + StartTransactionCommand(); Do we need a transaction? If so, how about starting a new transaction for each relation instead of using a single one for the entire prewarm? + if (status == BGWH_STOPPED) + return; + + if (status == BGWH_POSTMASTER_DIED) + { + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_RESOURCES), + errmsg("cannot start bgworker autoprewarm without postmaster"), + errhint("Kill all remaining database processes and restart" + " the database."))); + } + + Assert(0); Instead of writing it this way, remove the first "if" block and change the second one to Assert(status == BGWH_STOPPED). Also, I'd ditch the curly braces in this case. + file = fopen(AUTOPREWARM_FILE, PG_BINARY_R); Use AllocateFile() instead. See the comments for that function and at the top of fd.c. Then you don't need to worry about leaking the file handle on an error, so you can remove the fclose() before ereport() stuff -- which is incomplete anyway, because you could fail e.g. inside dsm_create(). + errmsg("Error reading num of elements in \"%s\" for" + " autoprewarm : %m", AUTOPREWARM_FILE))); As I said in a previous review, please do NOT split error messages across lines like this. Also, this error message is nothing close to PostgreSQL style. Please read https://www.postgresql.org/docs/devel/static/error-style-guide.html and learn to follow all those guidelines written therein. I see at least 3 separate problems with this message. + num_elements = i; I'd do something like if (i != num_elements) elog(ERROR, "autoprewarm block dump has %d entries but expected %d", i, num_elements); It seems OK for this to be elog() rather than ereport() because the file should never be corrupt unless the user has cheated by hand-editing it. I think you can get rid of next_db_pos altogether, and this prewarm_elem thing too. Design sketch: 1. Move all the stuff that's in prewarm_elem into AutoPrewarmSharedState. Rename start_pos to prewarm_start_idx and end_of_blockinfos to prewarm_stop_idx. 2. Instead of computing all of the database ranges first and then launching workers, do it all in one loop. So figure out where the records for the current database end and set prewarm_start_idx and prewarm_end_idx appropriately. Launch a worker. When the worker terminates, set prewarm_start_idx = prewarm_end_idx and advance prewarm_end_idx to the end of the next database's records. This saves having to allocate memory for the next_db_pos array, and it also avoids this crock: + memcpy(&pelem, MyBgworkerEntry->bgw_extra, sizeof(prewarm_elem)); The reason that's bad is because it only works so long as bgw_extra is large enough to hold prewarm_elem. If prewarm_elem grows or bgw_extra shrinks, this turns into a buffer overrun. I would use AUTOPREWARM_FILE ".tmp" rather than a name incorporating the PID for the temporary file. Otherwise, you might leave many temporary files behind under different names. If you use the same name every time, you'll never have more than one, and the next successful dump will end up getting rid of it along the way. + pfree(block_info_array); + CloseTransientFile(fd); + unlink(transient_dump_file_path); + ereport(ERROR, + (errcode_for_file_access(), + errmsg("error writing to \"%s\" : %m", + AUTOPREWARM_FILE))); Again, this is NOT a standard error message text. It occurs in zero other places in the source tree. You are not the first person to need an error message for a failed write to a file; please look at what the previous authors did. Also, the pfree() before report is not needed; isn't the whole process going to terminate? Also, you can't really use errcode_for_file_access() here, because you've meanwhile done CloseTransientFile() and unlink(), which will have clobbered errno. + ereport(LOG, (errmsg("saved metadata info of %d blocks", num_blocks))); Not project style for ereport(). Line break after the first comma. Similarly elsewhere. + * dump_now - the main routine which goes through each buffer header of buffer + * pool and dumps their meta data. We Sort these data and then dump them. + * Sorting is necessary as it facilitates sequential read during load. This is no longer true, because you moved the sort to the load side. It's also not capitalized properly. Discussions of the format of the autoprewarm dump file involve inexplicably varying number of < and > symbols: + * <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in + * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we shall call it + buflen = sprintf(buf, "%u,%u,%u,%u,%u\n", +#ifndef __AUTOPREWARM_H__ +#define __AUTOPREWARM_H__ We don't use double-underscored names for header files. Please consult existing header files for the appropriate style. Also, why does this file exist at all, instead of just including them in the .c file? The pointer of a header is for things that need to be included by multiple .c files, but there's no such need here. + * load. If there are no other block infos than the global objects + * we silently ignore them. Should I throw error? Silently ignoring them seems fine. Throwing an error doesn't seem like it would improve things. + /* + * Register a sub-worker to load new database's block. Wait until the + * sub-worker finish its job before launching next sub-worker. + */ + launch_prewarm_subworker(&pelem); The function name implies that it launches the worker, but the comment implies that it also waits for it to terminate. Functions should be named in a way that matches what they do. I feel like the get_autoprewarm_task() function is introducing fairly baroque logic for something that really ought to be more simple. All autoprewarm_main() really needs to do is: if (!state->autoprewarm_done) autoprewarm(); dump_block_info_periodically(); The locking in autoprewarm_dump_now() is a bit tricky. There are two trouble cases. One is that we try to rename() our new dump file on top of the existing one while a background worker is still using it to perform an autoprewarm. The other is that we try to write a new temporary dump file while some other process is trying to do the same thing. I think we should fix this by storing a PID in AutoPrewarmSharedState; a process which wants to perform a dump or an autoprewarm must change the PID from 0 to their own PID, and change it back to 0 on successful completion or error exit. If we go to perform an immediate dump process and finds a non-zero value already just does ereport(ERROR, ...), including the PID of the other process in the message (e.g. "unable to perform block dump because dump file is being used by PID %d"). In a background worker, if we go to dump and find the file in use, log a message (e.g. "skipping block dump because it is already being performed by PID %d", "skipping prewarm because block dump file is being rewritten by PID %d"). I also think we should change is_bgworker_running to a PID, so that if we try to launch a new worker we can report something like "autoprewarm worker is already running under PID %d". So putting that all together, I suppose AutoPrewarmSharedState should end up looking like this: LWLock lock; /* mutual exclusion */ pid_t bgworker_pid; /* for main bgworker */ pid_t pid_using_dumpfile; /* for autoprewarm or block dump */ /* following items are for communication with per-database worker */ dsm_handle block_info_handle; Oid database; int prewarm_start_idx; int prewarm_stop_idx; I suggest going through and changing "subworker" to "per-database worker" throughout. BTW, have you tested how long this takes to run with, say, shared_buffers = 8GB? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: