Restoring large tables with COPY - Mailing list pgsql-hackers
| From | Marko Kreen |
|---|---|
| Subject | Restoring large tables with COPY |
| Date | 2001-12-11 15:10:05 |
| Msg-id | 20011211151005.GA28141@l-t.ee |
| Responses | Re: Restoring large tables with COPY |
| List | pgsql-hackers |
Maybe I am missing something obvious, but I am unable to load
larger tables (~300k rows) with the COPY commands that pg_dump
produces by default. Yes, dumping as INSERTs works, but it is slow.
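For reference, the two dump modes in question look like this on
the command line (a sketch; the database name is made up):

    pg_dump mydb > dump.sql       # default: table data as COPY
    pg_dump -d mydb > dump.sql    # table data as INSERT statements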
"Cant" as in "it does not work with the default setup I have
running on devel machine" - 128M mem, 128M swap, basically
default postgresql.conf:
1) Too few WAL files. - well, increase the wal_files (eg to 32),
2) Machine runs out of swap, PostgreSQL seems to keep whole TX in memory. - So I must put 1G of swap? But what if I
have1G of rows?
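For 1), the change is one line in postgresql.conf; a minimal
sketch, assuming the 7.x-era wal_files setting (the value 32 is
just the example from above):

    # postgresql.conf: pre-create more WAL segment files so that a
    # large COPY restore does not run out of them mid-transaction
    wal_files = 32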
Or, in short: during pg_restore the resource requirements are an
order of magnitude higher than during pg_dump, which is
non-obvious and may be a bad surprise when you are in real trouble.
This is annoying, especially as dumping as COPYs should be the
preferred way, being both faster and smaller. Of course, the
dump-as-INSERTs approach has a positive side too - e.g. ALTER
TABLE DROP COLUMN with sed...
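To make the trade-off concrete, the two styles produce dump files
like these (a sketch - table, columns and rows are made up; -D is
the variant that adds the column names which make sed-editing
practical):

    -- default pg_dump: one COPY block per table
    COPY mytable FROM stdin;
    1	foo
    2	bar
    \.

    -- pg_dump -D: one INSERT per row, with column names
    INSERT INTO mytable (id, name) VALUES (1, 'foo');
    INSERT INTO mytable (id, name) VALUES (2, 'bar');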
The patch below implements a '-m NUM' switch for pg_dump, which
splits each COPY command into chunks of at most NUM rows each.
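For example, a five-row table dumped with '-m 2' would come out
roughly as (a sketch; names and data made up):

    COPY mytable FROM stdin;
    1	a
    2	b
    \.
    COPY mytable FROM stdin;
    3	c
    4	d
    \.
    COPY mytable FROM stdin;
    5	e
    \.

As each \. ends one COPY statement, a restore in autocommit mode
now holds at most NUM rows' worth of transaction state at a time,
instead of the whole table.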
Comments? What am I missing?
--
marko
Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /opt/cvs/pgsql/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.41
diff -u -c -r1.41 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml 8 Dec 2001 03:24:37 -0000 1.41
--- doc/src/sgml/ref/pg_dump.sgml 11 Dec 2001 03:58:30 -0000
***************
*** 35,40 ****
--- 35,41 ----
    <arg>-f <replaceable>file</replaceable></arg>
    <arg>-F <replaceable>format</replaceable></arg>
    <arg>-i</arg>
+   <arg>-m <replaceable>num_rows</replaceable></arg>
    <group> <arg>-n</arg> <arg>-N</arg> </group>
    <arg>-o</arg>
    <arg>-O</arg>
***************
*** 301,306 ****
--- 302,321 ----
      if you need to override the version check (and if
      <command>pg_dump</command> then fails, don't
      say you weren't warned).
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term>-m <replaceable class="parameter">num_rows</replaceable></term>
+   <term>--maxrows=<replaceable class="parameter">num_rows</replaceable></term>
+   <listitem>
+    <para>
+     Set the maximum number of rows to put into one COPY statement.
+     This starts a new COPY command after every
+     <replaceable class="parameter">num_rows</replaceable> rows.
+     This is useful on large tables, to avoid restoring the whole
+     table in one transaction, which may consume a lot of resources.
+    </para>
+   </listitem>
+  </varlistentry>
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /opt/cvs/pgsql/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.236
diff -u -c -r1.236 pg_dump.c
*** src/bin/pg_dump/pg_dump.c 28 Oct 2001 06:25:58 -0000 1.236
--- src/bin/pg_dump/pg_dump.c 11 Dec 2001 04:48:42 -0000
***************
*** 116,121 ****
--- 116,123 ----
  bool		dataOnly;
  bool		aclsSkip;
  
+ int			g_max_copy_rows = 0;
+ 
  char		g_opaque_type[10];	/* name for the opaque type */
  
  /* placeholders for the delimiters for comments */
***************
*** 151,156 ****
--- 153,159 ----
  	"  -h, --host=HOSTNAME      database server host name\n"
  	"  -i, --ignore-version     proceed even when server version mismatches\n"
  	"                           pg_dump version\n"
+ 	"  -m, --maxrows=NUM        max rows in one COPY command\n"
  	"  -n, --no-quotes          suppress most quotes around identifiers\n"
  	"  -N, --quotes             enable most quotes around identifiers\n"
  	"  -o, --oids               include oids in dump\n"
***************
*** 187,192 ****
--- 190,196 ----
  	"                           pg_dump version\n"
  	"  -n                       suppress most quotes around identifiers\n"
  	"  -N                       enable most quotes around identifiers\n"
+ 	"  -m NUM                   max rows in one COPY command\n"
  	"  -o                       include oids in dump\n"
  	"  -O                       do not output \\connect commands in plain\n"
  	"                           text format\n"
***************
*** 244,249 ****
--- 248,255 ----
  	int			ret;
  	bool		copydone;
  	char		copybuf[COPYBUFSIZ];
+ 	int			cur_row;
+ 	int			linestart;
  
  	if (g_verbose)
  		write_msg(NULL, "dumping out the contents of table %s\n", classname);
***************
*** 297,302 ****
--- 303,310 ----
  	else
  	{
  		copydone = false;
+ 		linestart = 1;
+ 		cur_row = 0;
  
  		while (!copydone)
  		{
***************
*** 310,316 ****
--- 318,338 ----
  			}
  			else
  			{
+ 				/*
+ 				 * Avoid too large transactions by breaking them up.
+ 				 */
+ 				if (g_max_copy_rows > 0 && linestart
+ 					&& cur_row >= g_max_copy_rows)
+ 				{
+ 					cur_row = 0;
+ 					archputs("\\.\n", fout);
+ 					archprintf(fout, "COPY %s %sFROM stdin;\n",
+ 							   fmtId(classname, force_quotes),
+ 							   (oids && hasoids) ? "WITH OIDS " : "");
+ 				}
+ 
  				archputs(copybuf, fout);
+ 
  				switch (ret)
  				{
  					case EOF:
***************
*** 318,325 ****
--- 340,350 ----
  						/* FALLTHROUGH */
  					case 0:
  						archputc('\n', fout);
+ 						cur_row++;
+ 						linestart = 1;
  						break;
  					case 1:
+ 						linestart = 0;
  						break;
  				}
  			}
***************
*** 696,701 ****
--- 721,727 ----
  		{"compress", required_argument, NULL, 'Z'},
  		{"help", no_argument, NULL, '?'},
  		{"version", no_argument, NULL, 'V'},
+ 		{"maxrows", required_argument, NULL, 'm'},
  
  		/*
  		 * the following options don't have an equivalent short option
***************
*** 748,756 ****
  	}
  
  #ifdef HAVE_GETOPT_LONG
! 	while ((c = getopt_long(argc, argv, "abcCdDf:F:h:inNoOp:RsS:t:uU:vWxX:zZ:V?", long_options, &optindex)) != -1)
  #else
! 	while ((c = getopt(argc, argv, "abcCdDf:F:h:inNoOp:RsS:t:uU:vWxX:zZ:V?-")) != -1)
  #endif
  	{
--- 774,782 ----
  	}
  
  #ifdef HAVE_GETOPT_LONG
! 	while ((c = getopt_long(argc, argv, "abcCdDf:F:h:im:nNoOp:RsS:t:uU:vWxX:zZ:V?", long_options, &optindex)) != -1)
  #else
! 	while ((c = getopt(argc, argv, "abcCdDf:F:h:im:nNoOp:RsS:t:uU:vWxX:zZ:V?-")) != -1)
  #endif
  	{
***************
*** 798,803 ****
--- 824,833 ----
  			case 'i':			/* ignore database version mismatch */
  				ignore_version = true;
+ 				break;
+ 
+ 			case 'm':
+ 				g_max_copy_rows = atoi(optarg);
  				break;
  
  			case 'n':			/* Do not force double-quotes on
  								 * identifiers */