Re: Maintaining cluster order on insert - Mailing list pgsql-patches
From: Tom Lane
Subject: Re: Maintaining cluster order on insert
Msg-id: 457.1184023772@sss.pgh.pa.us
In response to: Re: Maintaining cluster order on insert (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses: Re: Maintaining cluster order on insert (three responses with the same subject)
List: pgsql-patches
[ back to the cluster-order patch ]

Awhile back, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> The performance characteristics of this patch hasn't been thoroughly
> discussed yet. The reason why you want to cluster your tables is to
> speed up SELECTs that return a bunch of tuples with similar values, for
> example range queries. The reason for keeping them clustered on inserts
> is to reduce the need to run CLUSTER as often.

> It doesn't come without a cost, however. In the worst case, there never
> is room for new inserts on pages, and each insert needs to do one extra
> I/O to fetch the optimal heap page where the insert should go, see that
> there's no room, and then insert somewhere else. Using a non-zero
> fillfactor helps, but even when there is room on the page, it's often
> cheaper to just append to the end of the table and running CLUSTER at
> night for example, than do random access to insert to the "right" pages
> in the heap.

I looked through the thread and could not find any clear evidence that
anyone had done any performance testing of this patch at all, so I
hacked together a crude test that alternates between inserting/deleting
a random subset of rows and SELECTing a range of rows. The results do
not look very good: while there is detectable improvement in the SELECT
performance, it's not much, and it comes at a *very* sizable penalty in
INSERT performance.

Attached is a test program, which I ran against today's CVS HEAD with
and without the v8 version of the patch. The test program sets up a
clustered million-row table with a row width chosen to fit about 20
rows per page. It then iterates a loop of:

* delete a random 2% of the table
* vacuum to recover space
* insert a random 2% of the table
* select (about) 1000 consecutively-numbered rows
* select all the rows (this is just a cross check that the number of
  rows isn't changing too much)

What you would hope to see as the benefit of the patch is that the time
for the range SELECT degrades more slowly as more of the table is
replaced. Ignoring the first SELECT as being a startup transient, it
looks like HEAD degrades from about 3 msec to 6 msec over 10 iterations
(20% replacement of the table), whereas with the patch it's about
3 msec to about 4 and a half. However, the INSERT steps went from
around 20 sec each to about twice that.

I used just the following nondefault parameter settings:

shared_buffers = 10000            # min 128kB or max_connections*16kB
maintenance_work_mem = 160MB      # min 1MB
max_fsm_pages = 2048000           # min max_fsm_relations*16, 6 bytes each
wal_buffers = 640kB               # min 32kB
checkpoint_segments = 30          # in logfile segments, min 1, 16MB each
enable_bitmapscan = off
enable_seqscan = off
autovacuum = off                  # enable autovacuum subprocess?

which for the most part are just 10x the factory defaults. The hardware
is just a Dell x86_64 workstation with crappy IDE disk, so maybe things
would look better elsewhere, but it's all I have to work with.

Considering that the patch is really severely ugly from a modularity
and layering standpoint, I'm now inclined to reject it. AFAICT this
test case is showing the patch at its best possible advantage; under
real-world conditions the benefit would be less. It doesn't look to me
like the gain is worth the loss of system understandability and
maintainability that the patch would impose.
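(Aside: the "non-zero fillfactor" Heikki refers to above is the per-relation
storage parameter that reserves free space in each heap or index page for
later inserts. As a minimal sketch only -- the 70/80 percentages are
illustrative and are not what the attached test uses, which sticks with the
defaults -- the test table could reserve such space like this:

    CREATE TABLE testtab (k int, d text) WITH (fillfactor = 70);  -- leave ~30% of each heap page free
    CREATE INDEX testtabi ON testtab (k) WITH (fillfactor = 80);  -- likewise leave slack in the btree pages
    CLUSTER testtab USING testtabi;
)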
			regards, tom lane

/*
 * derived from testlibpq3.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#include "libpq-fe.h"

#define TABLESIZE 1000000		/* rows in table */
#define ITERS 10

static void
exit_nicely(PGconn *conn)
{
	PQfinish(conn);
	exit(1);
}

static char *
elapsed_time(struct timeval start_t, struct timeval elapse_t)
{
	static char buf[64];

	if (elapse_t.tv_usec < start_t.tv_usec)
	{
		elapse_t.tv_sec--;
		elapse_t.tv_usec += 1000000;
	}

	sprintf(buf, "%3ld.%06ld",
			(long) (elapse_t.tv_sec - start_t.tv_sec),
			(long) (elapse_t.tv_usec - start_t.tv_usec));
	return buf;
}

static void
do_cmd(PGconn *conn, const char *cmd)
{
	PGresult   *res;
	struct timeval start_t;
	struct timeval elapse_t;

	gettimeofday(&start_t, NULL);
	res = PQexec(conn, cmd);
	gettimeofday(&elapse_t, NULL);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "%s\nCommand failed: %s", cmd, PQerrorMessage(conn));
		PQclear(res);
		exit_nicely(conn);
	}
	printf("executed %-.6s in %s sec\n",
		   cmd, elapsed_time(start_t, elapse_t));
	fflush(stdout);
	PQclear(res);
}

static void
do_select(PGconn *conn, const char *cmd)
{
	PGresult   *res;
	struct timeval start_t;
	struct timeval elapse_t;

	gettimeofday(&start_t, NULL);
	res = PQexec(conn, cmd);
	gettimeofday(&elapse_t, NULL);
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "%s\nCommand failed: %s", cmd, PQerrorMessage(conn));
		PQclear(res);
		exit_nicely(conn);
	}
	printf("retrieved %4s tuples in %s sec\n",
		   PQgetvalue(res, 0, 0), elapsed_time(start_t, elapse_t));
	fflush(stdout);
	PQclear(res);
}

int
main(int argc, char **argv)
{
	const char *conninfo;
	PGconn	   *conn;
	PGresult   *res;
	char		cmd[1024];
	int			i;

	/*
	 * If the user supplies a parameter on the command line, use it as the
	 * conninfo string; otherwise default to setting dbname=postgres and using
	 * environment variables or defaults for all other connection parameters.
	 */
	if (argc > 1)
		conninfo = argv[1];
	else
		conninfo = "dbname = postgres";

	/* Make a connection to the database */
	conn = PQconnectdb(conninfo);

	/* Check to see that the backend connection was successfully made */
	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "Connection to database failed: %s",
				PQerrorMessage(conn));
		exit_nicely(conn);
	}

	/* Ignore any error while dropping old table */
	res = PQexec(conn, "DROP TABLE testtab");
	PQclear(res);

	/* Set up test table */
	do_cmd(conn, "CREATE TABLE testtab(k int, d text)");
	sprintf(cmd, "INSERT INTO testtab "
			"SELECT x, repeat('x',350) FROM generate_series(1,%d) x",
			TABLESIZE);
	do_cmd(conn, cmd);
	do_cmd(conn, "CREATE INDEX testtabi ON testtab(k)");
	do_cmd(conn, "CLUSTER testtab USING testtabi");
	do_cmd(conn, "VACUUM ANALYZE testtab");

	for (i = 0; i < ITERS; i++)
	{
		int			st;

		sprintf(cmd, "DELETE FROM testtab WHERE random() < 1.0/50");
		do_cmd(conn, cmd);
		do_cmd(conn, "VACUUM testtab");
		sprintf(cmd, "INSERT INTO testtab "
				"SELECT (random() * %d)::int, repeat('x',350) "
				"FROM generate_series(1,%d)",
				TABLESIZE, TABLESIZE/50);
		do_cmd(conn, cmd);
		st = ((double) random() * (TABLESIZE - 1000)) / 0x7FFFFFFF;
		sprintf(cmd, "SELECT count(*) FROM testtab "
				"WHERE k BETWEEN %d AND %d",
				st, st + 1000);
		do_select(conn, cmd);
		do_select(conn, "SELECT count(*) FROM testtab");
	}

	/* close the connection to the database and cleanup */
	PQfinish(conn);

	return 0;
}

executed CREATE in 0.066563 sec
executed INSERT in 40.465653 sec
executed CREATE in 9.152698 sec
executed CLUSTE in 20.036375 sec
executed VACUUM in 1.440232 sec
executed DELETE in 14.770937 sec
executed VACUUM in 10.663301 sec
executed INSERT in 2.449248 sec
retrieved 995 tuples in 0.170685 sec
retrieved 999971 tuples in 0.648237 sec
executed DELETE in 33.760709 sec
executed VACUUM in 28.762174 sec
executed INSERT in 12.212027 sec
retrieved 999 tuples in 0.003139 sec
retrieved 999642 tuples in 0.646765 sec
executed DELETE in 30.541053 sec
executed VACUUM in 13.204475 sec
executed INSERT in 17.972502 sec
retrieved 985 tuples in 0.003621 sec
retrieved 999612 tuples in 0.623749 sec
executed DELETE in 14.911813 sec
executed VACUUM in 27.443921 sec
executed INSERT in 19.125950 sec
retrieved 1002 tuples in 0.004900 sec
retrieved 999784 tuples in 0.667716 sec
executed DELETE in 22.651369 sec
executed VACUUM in 10.743926 sec
executed INSERT in 21.631076 sec
retrieved 987 tuples in 0.004566 sec
retrieved 999711 tuples in 0.632185 sec
executed DELETE in 11.587629 sec
executed VACUUM in 15.278964 sec
executed INSERT in 23.725325 sec
retrieved 1011 tuples in 0.005960 sec
retrieved 999624 tuples in 0.680135 sec
executed DELETE in 3.905152 sec
executed VACUUM in 42.848288 sec
executed INSERT in 18.619609 sec
retrieved 1006 tuples in 0.007106 sec
retrieved 999479 tuples in 0.678316 sec
executed DELETE in 27.288273 sec
executed VACUUM in 9.329839 sec
executed INSERT in 23.361110 sec
retrieved 983 tuples in 0.005997 sec
retrieved 999354 tuples in 0.615004 sec
executed DELETE in 20.416503 sec
executed VACUUM in 17.537463 sec
executed INSERT in 19.781416 sec
retrieved 1021 tuples in 0.006894 sec
retrieved 999688 tuples in 0.670918 sec
executed DELETE in 14.063273 sec
executed VACUUM in 17.439971 sec
executed INSERT in 17.381126 sec
retrieved 987 tuples in 0.006664 sec
retrieved 999930 tuples in 0.632494 sec

executed CREATE in 0.086862 sec
executed INSERT in 50.746362 sec
executed CREATE in 12.115655 sec
executed CLUSTE in 33.656341 sec
executed VACUUM in 4.306563 sec
executed DELETE in 18.062664 sec
executed VACUUM in 28.487570 sec
executed INSERT in 25.638022 sec
retrieved 998 tuples in 1.498475 sec
retrieved 1000019 tuples in 0.624082 sec
executed DELETE in 30.211317 sec
executed VACUUM in 24.147135 sec
executed INSERT in 40.759404 sec
retrieved 1006 tuples in 0.002711 sec
retrieved 1000184 tuples in 0.984822 sec
executed DELETE in 31.616131 sec
executed VACUUM in 22.383567 sec
executed INSERT in 36.174291 sec
retrieved 990 tuples in 0.002840 sec
retrieved 1000275 tuples in 0.723260 sec
executed DELETE in 34.279871 sec
executed VACUUM in 34.855060 sec
executed INSERT in 36.652868 sec
retrieved 990 tuples in 0.003554 sec
retrieved 1000376 tuples in 0.715215 sec
executed DELETE in 37.396236 sec
executed VACUUM in 17.721296 sec
executed INSERT in 32.413756 sec
retrieved 974 tuples in 0.003498 sec
retrieved 1000383 tuples in 0.723111 sec
executed DELETE in 44.696337 sec
executed VACUUM in 13.850628 sec
executed INSERT in 47.649557 sec
retrieved 1002 tuples in 0.004165 sec
retrieved 1000554 tuples in 0.722704 sec
executed DELETE in 17.854653 sec
executed VACUUM in 52.446585 sec
executed INSERT in 41.293512 sec
retrieved 1038 tuples in 0.004169 sec
retrieved 1000422 tuples in 0.692400 sec
executed DELETE in 32.102991 sec
executed VACUUM in 22.018916 sec
executed INSERT in 44.296194 sec
retrieved 991 tuples in 0.004636 sec
retrieved 1000713 tuples in 0.739701 sec
executed DELETE in 31.678825 sec
executed VACUUM in 12.643271 sec
executed INSERT in 53.179401 sec
retrieved 1015 tuples in 0.004745 sec
retrieved 1000797 tuples in 0.708603 sec
executed DELETE in 35.682182 sec
executed VACUUM in 42.612243 sec
executed INSERT in 45.726663 sec
retrieved 1017 tuples in 0.004753 sec
retrieved 1000755 tuples in 0.696441 sec
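(Aside: besides the range-SELECT timings printed above, one way to quantify
how well the table stays in cluster order between iterations is the planner's
correlation statistic. A minimal sketch, assuming the testtab/testtabi names
from the attached program:

    ANALYZE testtab;
    SELECT attname, correlation
    FROM pg_stats
    WHERE tablename = 'testtab' AND attname = 'k';  -- ~1.0 means fully clustered on k, near 0 means random order
)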