Re: random observations while testing with a 1,8B row table - Mailing list pgsql-hackers
From | Stefan Kaltenbrunner
---|---
Subject | Re: random observations while testing with a 1,8B row table
Date |
Msg-id | 4411D807.1040903@kaltenbrunner.cc
In response to | Re: random observations while testing with a 1,8B row ("Luke Lonergan" <llonergan@greenplum.com>)
Responses | Re: random observations while testing with a 1,8B row table; Re: random observations while testing with a 1,8B row
List | pgsql-hackers
Luke Lonergan wrote:
> Stefan,
>
> On 3/10/06 9:40 AM, "Stefan Kaltenbrunner" <stefan@kaltenbrunner.cc> wrote:
>
>> I will summarize some of it just in case somebody is interested:
>
> I am!

heh - not surprised :-)

>> -> table used has 5 integer columns, non-indexed during the loads
>> -> hardware is a Dual Opteron 280 with 4 cores @ 2.4 GHz and 16 GB RAM; data
>>    is on a multipathed (busy) SAN with different (RAID 10) arrays for WAL
>>    and data.
>
> How many connections out of the machine? How many disks behind each LUN?

2 HBAs in the server, 2x2 possible paths to each LUN. 6 disks for the WAL
and 12 disks for the data.

> So - about 20 bytes per row (5*4) unless those are int8, but on disk it's
> 108GB/1.8B = 60 bytes per row on disk. I wonder what all that overhead is?
>
>> 1. data loading - I'm using COPY with batches of 300M rows:
>>
>> *) with one COPY running it takes about 20 minutes/batch to load the data
>>    (~250k rows/sec) and virtually no context switches.
>>
>> *) with two COPYs running concurrently it takes a bit less than 30
>>    minutes/batch and a steady 40k context switches/sec (~340k rows/sec overall)
>>
>> *) with three COPYs it takes about 40 min/batch at 140k context
>>    switches/sec (380k rows/sec overall)
>
> So, from 15 MB/s up to about 20 MB/s.
>
>> while the amount of IO going on is quite a lot, it looks like we are
>> still mostly CPU-bound for COPY.
>
> It's what we see almost always. In this case if your I/O configuration is
> capable of performing at about 3x the 20MB/s max parsing rate, or 60MB/s,
> you will be CPU limited.

the IO system I use should be capable of doing that if pushed hard enough :-)

> The 3x is approximate, and based on observations; the reasoning underneath
> it is that Postgres is writing the data several times, once to the WAL, then
> from the WAL to the heap files.
>
>> 2. updating all of the rows in the table:
>>
>> I updated all of the rows in the table with a simple UPDATE testtable
>> set a=a+1;
>> this took about 2.5 hours (~200k rows/sec)
>
> Ugh. This is where Bizgres MPP shines, I'll try to recreate your test and
> post results. This scales linearly in Bizgres MPP with the number of disks
> and CPUs available, but I would hope for much more than that.

interesting to know, but still I'm testing/playing with PostgreSQL here, not
Bizgres MPP ...

>> 3. vacuuming this table - it turned out that VACUUM FULL is completely
>> unusable on a table of this size (which I actually expected beforehand),
>> not only due to the locking involved but rather due to a gigantic memory
>> requirement and unbelievable slowness.
>
> Simple vacuum should be enough IMO.

sure, that was mostly meant as an experiment. If I had to do this on a
production database I would most likely use CLUSTER to get the desired
effect (which in my case was purely getting back the disk space wasted by
dead tuples).


Stefan
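For anyone wanting to reproduce the load test in point 1, a minimal sketch of the setup described in the thread; the column names (only "a" appears in the thread, via the UPDATE) and the data file path are assumptions:

    -- Sketch of the test table: 5 integer columns, no indexes during the load.
    -- Column names other than "a" are assumptions.
    CREATE TABLE testtable (
        a integer,
        b integer,
        c integer,
        d integer,
        e integer
    );

    -- One 300M-row batch per COPY; the tab-separated file path is hypothetical.
    -- To reproduce the 2- and 3-way concurrent loads, run one COPY like this
    -- per session, each reading a different batch file.
    COPY testtable FROM '/data/testtable_batch_01.tsv';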
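The full-table UPDATE in point 2 writes a new version of every row (MVCC), so the heap roughly doubles until the dead tuples are reclaimed. A sketch for watching that happen, assuming the same testtable and using the size functions available from 8.1 on:

    -- Heap size before the update.
    SELECT pg_size_pretty(pg_relation_size('testtable'));

    -- The statement from point 2: every row gets a new version written.
    UPDATE testtable SET a = a + 1;

    -- Roughly double afterwards, until VACUUM makes the old versions reusable.
    SELECT pg_size_pretty(pg_relation_size('testtable'));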
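And for point 3, a sketch of the two space-reclamation routes discussed: plain VACUUM (keeps the file size but makes the dead space reusable) versus CLUSTER (rewrites the table compactly and returns the space to the OS). The index is an assumption, since the table was loaded without any:

    -- Plain VACUUM: no exclusive lock; dead tuples become reusable,
    -- but the heap file is not shrunk (apart from trailing empty pages).
    VACUUM VERBOSE testtable;

    -- CLUSTER rewrites the table and gives the freed space back to the OS,
    -- but needs an index; idx_testtable_a is hypothetical.
    CREATE INDEX idx_testtable_a ON testtable (a);
    CLUSTER idx_testtable_a ON testtable;   -- 8.1-era syntax
    -- On newer releases: CLUSTER testtable USING idx_testtable_a;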