Restoring large tables with COPY - Mailing list pgsql-hackers
| From | Marko Kreen |
|---|---|
| Subject | Restoring large tables with COPY |
| Date | 2001-12-11 15:10:05 |
| Msg-id | 20011211151005.GA28141@l-t.ee |
| Responses | Re: Restoring large tables with COPY |
| List | pgsql-hackers |
Maybe I am missing something obvious, but I am unable to load
larger tables (~300k rows) with the COPY commands that pg_dump
produces by default. Yes, dumping as INSERTs works, but it is slow.
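For reference, the two dump modes in question look like this on
the command line (a sketch; the database name is made up):

    pg_dump mydb > dump.sql       # default: table data as COPY
    pg_dump -d mydb > dump.sql    # table data as INSERT statements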
"Cant" as in "it does not work with the default setup I have
running on devel machine" - 128M mem, 128M swap, basically
default postgresql.conf:
1) Too few WAL files. - well, increase the wal_files (eg to 32),
2) Machine runs out of swap, PostgreSQL seems to keep whole TX in memory. - So I must put 1G of swap? But what if I
have1G of rows?
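For 1), the change is one line in postgresql.conf; a minimal
sketch, assuming the 7.x-era wal_files setting (the value 32 is
just the example from above):

    # postgresql.conf: pre-create more WAL segment files so that a
    # large COPY restore does not run out of them mid-transaction
    wal_files = 32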
Or, in short: during pg_restore the resource requirements are an
order of magnitude higher than during pg_dump, which is
non-obvious and may be a bad surprise when you are in real trouble.
This is annoying, especially as dumping as COPYs should be the
preferred way, being both faster and smaller. Of course, the
dump-as-INSERTs approach has a positive side too - e.g. ALTER
TABLE DROP COLUMN with sed...
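To make the trade-off concrete, the two styles produce dump files
like these (a sketch - table, columns and rows are made up; -D is
the variant that adds the column names which make sed-editing
practical):

    -- default pg_dump: one COPY block per table
    COPY mytable FROM stdin;
    1	foo
    2	bar
    \.

    -- pg_dump -D: one INSERT per row, with column names
    INSERT INTO mytable (id, name) VALUES (1, 'foo');
    INSERT INTO mytable (id, name) VALUES (2, 'bar');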
The patch below implements a '-m NUM' switch for pg_dump, which
splits each COPY command into chunks of at most NUM rows each.
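For example, a five-row table dumped with '-m 2' would come out
roughly as (a sketch; names and data made up):

    COPY mytable FROM stdin;
    1	a
    2	b
    \.
    COPY mytable FROM stdin;
    3	c
    4	d
    \.
    COPY mytable FROM stdin;
    5	e
    \.

As each \. ends one COPY statement, a restore in autocommit mode
now holds at most NUM rows' worth of transaction state at a time,
instead of the whole table.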
Comments? What am I missing?
--
marko
Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /opt/cvs/pgsql/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.41
diff -u -c -r1.41 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml 8 Dec 2001 03:24:37 -0000 1.41
--- doc/src/sgml/ref/pg_dump.sgml 11 Dec 2001 03:58:30 -0000
***************
*** 35,40 ****
--- 35,41 ----
    <arg>-f <replaceable>file</replaceable></arg>
    <arg>-F <replaceable>format</replaceable></arg>
    <arg>-i</arg>
+   <arg>-m <replaceable>num_rows</replaceable></arg>
    <group> <arg>-n</arg> <arg>-N</arg> </group>
    <arg>-o</arg>
    <arg>-O</arg>
***************
*** 301,306 ****
--- 302,321 ----
      if you need to override the version check (and if
      <command>pg_dump</command> then fails, don't
      say you weren't warned).
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term>-m <replaceable class="parameter">num_rows</replaceable></term>
+   <term>--maxrows=<replaceable class="parameter">num_rows</replaceable></term>
+   <listitem>
+    <para>
+     Set the maximum number of rows to put into one COPY statement.
+     This starts a new COPY command after every
+     <replaceable class="parameter">num_rows</replaceable> rows.
+     This is useful on large tables, to avoid restoring the whole
+     table in one transaction, which may consume a lot of resources.
+    </para>
+   </listitem>
+  </varlistentry>
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /opt/cvs/pgsql/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.236
diff -u -c -r1.236 pg_dump.c
*** src/bin/pg_dump/pg_dump.c 28 Oct 2001 06:25:58 -0000 1.236
--- src/bin/pg_dump/pg_dump.c 11 Dec 2001 04:48:42 -0000
***************
*** 116,121 ****
--- 116,123 ----
  bool		dataOnly;
  bool		aclsSkip;
  
+ int			g_max_copy_rows = 0;
+ 
  char		g_opaque_type[10];	/* name for the opaque type */
  
  /* placeholders for the delimiters for comments */
***************
*** 151,156 ****
--- 153,159 ----
  	"  -h, --host=HOSTNAME      database server host name\n"
  	"  -i, --ignore-version     proceed even when server version mismatches\n"
  	"                           pg_dump version\n"
+ 	"  -m, --maxrows=NUM        max rows in one COPY command\n"
  	"  -n, --no-quotes          suppress most quotes around identifiers\n"
  	"  -N, --quotes             enable most quotes around identifiers\n"
  	"  -o, --oids               include oids in dump\n"
***************
*** 187,192 ****
--- 190,196 ----
  	"                           pg_dump version\n"
  	"  -n                       suppress most quotes around identifiers\n"
  	"  -N                       enable most quotes around identifiers\n"
+ 	"  -m NUM                   max rows in one COPY command\n"
  	"  -o                       include oids in dump\n"
  	"  -O                       do not output \\connect commands in plain\n"
  	"                           text format\n"
***************
*** 244,249 ****
--- 248,255 ----
  	int			ret;
  	bool		copydone;
  	char		copybuf[COPYBUFSIZ];
+ 	int			cur_row;
+ 	int			linestart;
  
  	if (g_verbose)
  		write_msg(NULL, "dumping out the contents of table %s\n", classname);
***************
*** 297,302 ****
--- 303,310 ----
  	else
  	{
  		copydone = false;
+ 		linestart = 1;
+ 		cur_row = 0;
  
  		while (!copydone)
  		{
***************
*** 310,316 ****
--- 318,338 ----
  			}
  			else
  			{
+ 				/*
+ 				 * Avoid too large transactions by breaking them up.
+ 				 */
+ 				if (g_max_copy_rows > 0 && linestart
+ 					&& cur_row >= g_max_copy_rows)
+ 				{
+ 					cur_row = 0;
+ 					archputs("\\.\n", fout);
+ 					archprintf(fout, "COPY %s %sFROM stdin;\n",
+ 							   fmtId(classname, force_quotes),
+ 							   (oids && hasoids) ? "WITH OIDS " : "");
+ 				}
+ 
  				archputs(copybuf, fout);
+ 
  				switch (ret)
  				{
  					case EOF:
***************
*** 318,325 ****
--- 340,350 ----
  						/* FALLTHROUGH */
  					case 0:
  						archputc('\n', fout);
+ 						cur_row++;
+ 						linestart = 1;
  						break;
  					case 1:
+ 						linestart = 0;
  						break;
  				}
  			}
***************
*** 696,701 ****
--- 721,727 ----
  		{"compress", required_argument, NULL, 'Z'},
  		{"help", no_argument, NULL, '?'},
  		{"version", no_argument, NULL, 'V'},
+ 		{"maxrows", required_argument, NULL, 'm'},
  
  		/*
  		 * the following options don't have an equivalent short option
***************
*** 748,756 ****
  	}
  
  #ifdef HAVE_GETOPT_LONG
! 	while ((c = getopt_long(argc, argv, "abcCdDf:F:h:inNoOp:RsS:t:uU:vWxX:zZ:V?", long_options, &optindex)) != -1)
  #else
! 	while ((c = getopt(argc, argv, "abcCdDf:F:h:inNoOp:RsS:t:uU:vWxX:zZ:V?-")) != -1)
  #endif
  	{
--- 774,782 ----
  	}
  
  #ifdef HAVE_GETOPT_LONG
! 	while ((c = getopt_long(argc, argv, "abcCdDf:F:h:im:nNoOp:RsS:t:uU:vWxX:zZ:V?", long_options, &optindex)) != -1)
  #else
! 	while ((c = getopt(argc, argv, "abcCdDf:F:h:im:nNoOp:RsS:t:uU:vWxX:zZ:V?-")) != -1)
  #endif
  	{
***************
*** 798,803 ****
--- 824,833 ----
  			case 'i':			/* ignore database version mismatch */
  				ignore_version = true;
+ 				break;
+ 
+ 			case 'm':
+ 				g_max_copy_rows = atoi(optarg);
  				break;
  
  			case 'n':			/* Do not force double-quotes on
  								 * identifiers */