Restoring large tables with COPY - Mailing list pgsql-hackers
From:     Marko Kreen
Subject:  Restoring large tables with COPY
Msg-id:   20011211151005.GA28141@l-t.ee
Responses: Re: Restoring large tables with COPY
List:     pgsql-hackers
Maybe I am missing something obvious, but I am unable to load larger
tables (~300k rows) with the COPY commands that pg_dump produces by
default.  Yes, dumping as INSERTs works, but it is slow.  "Unable" as
in "it does not work with the default setup I have running on my
development machine": 128M RAM, 128M swap, basically a default
postgresql.conf.  Two things go wrong:

1) Too few WAL files.  Well, increase wal_files (e.g. to 32).

2) The machine runs out of swap; PostgreSQL seems to keep the whole
   transaction in memory.  So I must add 1G of swap?  But what if I
   have 1G of rows?

In short: during pg_restore the resource requirements are an order of
magnitude higher than during pg_dump, which is non-obvious and may be
a bad surprise when you are in real trouble.  This is annoying,
especially as dumping with COPY should be the preferred way, being
both faster and smaller.  Of course dump-as-INSERTs has its positive
side too, e.g. doing ALTER TABLE DROP COLUMN with sed...

The patch below implements a '-m NUM' switch for pg_dump, which
splits each COPY command into chunks of at most NUM rows (see the
sketch of the resulting dump format below).

Comments?  What am I missing?

-- 
marko
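For illustration, with a hypothetical two-column table t dumped with
'-m 2', the data section of the dump would come out roughly like this
(each chunk is a complete COPY statement, so no single statement
carries more than NUM rows):

    COPY t FROM stdin;
    1	one
    2	two
    \.
    COPY t FROM stdin;
    3	three
    4	four
    \.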
Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /opt/cvs/pgsql/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.41
diff -u -c -r1.41 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml	8 Dec 2001 03:24:37 -0000	1.41
--- doc/src/sgml/ref/pg_dump.sgml	11 Dec 2001 03:58:30 -0000
***************
*** 35,40 ****
--- 35,41 ----
   <arg>-f <replaceable>file</replaceable></arg>
   <arg>-F <replaceable>format</replaceable></arg>
   <arg>-i</arg>
+  <arg>-m <replaceable>num_rows</replaceable></arg>
   <group> <arg>-n</arg> <arg>-N</arg> </group>
   <arg>-o</arg>
   <arg>-O</arg>
***************
*** 301,306 ****
--- 302,321 ----
        if you need to override the version check (and if
        <command>pg_dump</command> then fails, don't
        say you weren't warned).
+      </para>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry>
+     <term>-m <replaceable class="parameter">num_rows</replaceable></term>
+     <term>--maxrows=<replaceable class="parameter">num_rows</replaceable></term>
+     <listitem>
+      <para>
+       Set the maximum number of rows to put into one COPY statement.
+       A new COPY command is started after every
+       <replaceable class="parameter">num_rows</replaceable> rows.
+       This is useful on large tables, to avoid restoring a whole table
+       in one transaction, which may consume a lot of resources.
       </para>
      </listitem>
     </varlistentry>

Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /opt/cvs/pgsql/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.236
diff -u -c -r1.236 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	28 Oct 2001 06:25:58 -0000	1.236
--- src/bin/pg_dump/pg_dump.c	11 Dec 2001 04:48:42 -0000
***************
*** 116,121 ****
--- 116,123 ----
  bool		dataOnly;
  bool		aclsSkip;
  
+ int			g_max_copy_rows = 0;
+ 
  char		g_opaque_type[10];	/* name for the opaque type */
  
  /* placeholders for the delimiters for comments */
***************
*** 151,156 ****
--- 153,159 ----
  		   "  -h, --host=HOSTNAME      database server host name\n"
  		   "  -i, --ignore-version     proceed even when server version mismatches\n"
  		   "                           pg_dump version\n"
+ 		   "  -m, --maxrows=NUM        max rows in one COPY command\n"
  		   "  -n, --no-quotes          suppress most quotes around identifiers\n"
  		   "  -N, --quotes             enable most quotes around identifiers\n"
  		   "  -o, --oids               include oids in dump\n"
***************
*** 187,192 ****
--- 190,196 ----
  		   "                           pg_dump version\n"
  		   "  -n                       suppress most quotes around identifiers\n"
  		   "  -N                       enable most quotes around identifiers\n"
+ 		   "  -m NUM                   max rows in one COPY command\n"
  		   "  -o                       include oids in dump\n"
  		   "  -O                       do not output \\connect commands in plain\n"
  		   "                           text format\n"
***************
*** 244,249 ****
--- 248,255 ----
  	int			ret;
  	bool		copydone;
  	char		copybuf[COPYBUFSIZ];
+ 	int			cur_row;
+ 	int			linestart;
  
  	if (g_verbose)
  		write_msg(NULL, "dumping out the contents of table %s\n", classname);
***************
*** 297,302 ****
--- 303,310 ----
  	else
  	{
  		copydone = false;
+ 		linestart = 1;
+ 		cur_row = 0;
  
  		while (!copydone)
  		{
***************
*** 310,316 ****
--- 318,338 ----
  			}
  			else
  			{
+ 				/*
+ 				 * Avoid too large transactions by breaking them up.
+ 				 */
+ 				if (g_max_copy_rows > 0 && linestart
+ 					&& cur_row >= g_max_copy_rows)
+ 				{
+ 					cur_row = 0;
+ 					archputs("\\.\n", fout);
+ 					archprintf(fout, "COPY %s %sFROM stdin;\n",
+ 							   fmtId(classname, force_quotes),
+ 							   (oids && hasoids) ? "WITH OIDS " : "");
+ 				}
+ 
  				archputs(copybuf, fout);
+ 
  				switch (ret)
  				{
  					case EOF:
***************
*** 318,325 ****
--- 340,350 ----
  						/* FALLTHROUGH */
  					case 0:
  						archputc('\n', fout);
+ 						cur_row++;
+ 						linestart = 1;
  						break;
  					case 1:
+ 						linestart = 0;
  						break;
  				}
  			}
***************
*** 696,701 ****
--- 721,727 ----
  		{"compress", required_argument, NULL, 'Z'},
  		{"help", no_argument, NULL, '?'},
  		{"version", no_argument, NULL, 'V'},
+ 		{"maxrows", required_argument, NULL, 'm'},
  
  		/*
  		 * the following options don't have an equivalent short option
***************
*** 748,756 ****
  	}
  
  #ifdef HAVE_GETOPT_LONG
! 	while ((c = getopt_long(argc, argv, "abcCdDf:F:h:inNoOp:RsS:t:uU:vWxX:zZ:V?", long_options, &optindex)) != -1)
  #else
! 	while ((c = getopt(argc, argv, "abcCdDf:F:h:inNoOp:RsS:t:uU:vWxX:zZ:V?-")) != -1)
  #endif
  	{
--- 774,782 ----
  	}
  
  #ifdef HAVE_GETOPT_LONG
! 	while ((c = getopt_long(argc, argv, "abcCdDf:F:h:im:nNoOp:RsS:t:uU:vWxX:zZ:V?", long_options, &optindex)) != -1)
  #else
! 	while ((c = getopt(argc, argv, "abcCdDf:F:h:im:nNoOp:RsS:t:uU:vWxX:zZ:V?-")) != -1)
  #endif
  	{
***************
*** 798,803 ****
--- 824,833 ----
  			case 'i':			/* ignore database version mismatch */
  				ignore_version = true;
+ 				break;
+ 
+ 			case 'm':
+ 				g_max_copy_rows = atoi(optarg);
  				break;
  
  			case 'n':			/* Do not force double-quotes on
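For reference, a session using the switch might look like this (the
database names are made up):

    $ pg_dump -m 100000 bigdb > bigdb.sql
    $ psql newdb < bigdb.sql

Since psql commits each statement separately by default, every COPY
chunk should then go through as its own transaction, so WAL and
memory consumption during restore stay bounded by NUM rather than by
the table size.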