Re: random observations while testing with a 1,8B row table - Mailing list pgsql-hackers
From | Stefan Kaltenbrunner
---|---
Subject | Re: random observations while testing with a 1,8B row table
Date |
Msg-id | 4411D807.1040903@kaltenbrunner.cc
In response to | Re: random observations while testing with a 1,8B row ("Luke Lonergan" <llonergan@greenplum.com>)
Responses | Re: random observations while testing with a 1,8B row table; Re: random observations while testing with a 1,8B row
List | pgsql-hackers
Luke Lonergan wrote:
> Stefan,
>
> On 3/10/06 9:40 AM, "Stefan Kaltenbrunner" <stefan@kaltenbrunner.cc> wrote:
>
>> I will summarize some of it just in case somebody is interested:
>
> I am!

heh - not surprised :-)

>> -> table used has 5 integer columns, non-indexed during the loads
>> -> hardware is a Dual Opteron 280 with 4 cores @ 2.4 GHz and 16 GB RAM; data
>>    is on a multipathed (busy) SAN with different (RAID 10) arrays for WAL
>>    and data.
>
> How many connections out of the machine? How many disks behind each LUN?

2 HBAs in the server, 2x2 possible paths to each LUN. 6 disks for the WAL
and 12 disks for the data.

> So - about 20 bytes per row (5*4) unless those are int8, but on disk it's
> 108GB/1.8B = 60 bytes per row on disk. I wonder what all that overhead is?
>
>> 1. data loading - I'm using COPY with batches of 300M rows:
>>
>> *) with one COPY running it takes about 20 minutes/batch to load the data
>>    (~250k rows/sec) and virtually no context switches.
>>
>> *) with two COPYs running concurrently it takes a bit less than 30
>>    minutes/batch and a steady 40k context switches/sec (~340k rows/sec overall)
>>
>> *) with three COPYs it takes about 40 min/batch at 140k context
>>    switches/sec (380k rows/sec overall)
>
> So, from 15 MB/s up to about 20 MB/s.
>
>> while the amount of IO going on is quite a lot, it looks like we are
>> still mostly CPU-bound for COPY.
>
> It's what we see almost always. In this case if your I/O configuration is
> capable of performing at about 3x the 20MB/s max parsing rate, or 60MB/s,
> you will be CPU limited.

the IO system I use should be capable of doing that if pushed hard enough :-)

> The 3x is approximate, and based on observations; the reasoning underneath
> it is that Postgres is writing the data several times, once to the WAL, then
> from the WAL to the heap files.
>
>> 2. updating all of the rows in the table:
>>
>> I updated all of the rows in the table with a simple UPDATE testtable
>> set a=a+1;
>> this took about 2.5 hours (~200k rows/sec)
>
> Ugh. This is where Bizgres MPP shines, I'll try to recreate your test and
> post results. This scales linearly in Bizgres MPP with the number of disks
> and CPUs available, but I would hope for much more than that.

interesting to know, but still I'm testing/playing with PostgreSQL here, not
Bizgres MPP ...

>> 3. vacuuming this table - it turned out that VACUUM FULL is completely
>> unusable on a table of this size (which I actually expected beforehand),
>> not only due to the locking involved but rather due to a gigantic memory
>> requirement and unbelievable slowness.
>
> Simple vacuum should be enough IMO.

sure, that was mostly meant as an experiment. If I had to do this on a
production database I would most likely use CLUSTER to get the desired
effect (which in my case was purely getting back the disk space wasted by
dead tuples).


Stefan
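For anyone wanting to reproduce the load test in point 1, a minimal sketch of the setup described in the thread; the column names (only "a" appears in the thread, via the UPDATE) and the data file path are assumptions:

    -- Sketch of the test table: 5 integer columns, no indexes during the load.
    -- Column names other than "a" are assumptions.
    CREATE TABLE testtable (
        a integer,
        b integer,
        c integer,
        d integer,
        e integer
    );

    -- One 300M-row batch per COPY; the tab-separated file path is hypothetical.
    -- To reproduce the 2- and 3-way concurrent loads, run one COPY like this
    -- per session, each reading a different batch file.
    COPY testtable FROM '/data/testtable_batch_01.tsv';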
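The full-table UPDATE in point 2 writes a new version of every row (MVCC), so the heap roughly doubles until the dead tuples are reclaimed. A sketch for watching that happen, assuming the same testtable and using the size functions available from 8.1 on:

    -- Heap size before the update.
    SELECT pg_size_pretty(pg_relation_size('testtable'));

    -- The statement from point 2: every row gets a new version written.
    UPDATE testtable SET a = a + 1;

    -- Roughly double afterwards, until VACUUM makes the old versions reusable.
    SELECT pg_size_pretty(pg_relation_size('testtable'));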
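And for point 3, a sketch of the two space-reclamation routes discussed: plain VACUUM (keeps the file size but makes the dead space reusable) versus CLUSTER (rewrites the table compactly and returns the space to the OS). The index is an assumption, since the table was loaded without any:

    -- Plain VACUUM: no exclusive lock; dead tuples become reusable,
    -- but the heap file is not shrunk (apart from trailing empty pages).
    VACUUM VERBOSE testtable;

    -- CLUSTER rewrites the table and gives the freed space back to the OS,
    -- but needs an index; idx_testtable_a is hypothetical.
    CREATE INDEX idx_testtable_a ON testtable (a);
    CLUSTER idx_testtable_a ON testtable;   -- 8.1-era syntax
    -- On newer releases: CLUSTER testtable USING idx_testtable_a;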