Strange Create Index behaviour - Mailing list pgsql-performance
From | Gary Doades |
---|---|
Subject | Strange Create Index behaviour |
Date | |
Msg-id | 43F38867.6010701@gpdnet.co.uk Whole thread Raw |
Responses |
Re: Strange Create Index behaviour
Re: Strange Create Index behaviour Re: Strange Create Index behaviour |
List | pgsql-performance |
Platform: FreeBSD 6.0, Postgresql 8.1.2 compiled from the ports collection. Not sure if this belongs in performance or bugs.. A pg_restore of my 2.5GB database was taking up to 2 hours to complete instead of the expected 10-15 minutes. Checking the server it was mostly CPU bound. Testing further I discovered that is was spending huge amounts of CPU time creating some indexes. It took a while to find out, but basically it boils down to this: If the column that is having the index created has a certain distribution of values then create index takes a very long time. If the data values (integer in this case) a fairly evenly distributed then create index is very quick, if the data values are all the same it is very quick. I discovered that in the slow cases the column had approximately half the values as zero and the rest fairly spread out. One column started off with around 400,000 zeros and the rest of the following rows spread between values of 1 to 500,000. I have put together a test case that demonstrates the problem (see below). I create a simple table, as close in structure to one of my problem tables and populate an integer column with 100,000 zeros follow by 100,000 random integers between 0 and 100,000. Then create an index on this column. I then drop the table and repeat. The create index should take around 1-2 seconds. A fair proportion of the time it takes 50 seconds!!! If I fill the same row with all random data the create index always takes a second or two. If I fill the column with all zeros everything is still OK. When my tables that I am trying to restore are over 2 million rows the creating one index can take an hour!! (almost all CPU time). All other areas of performance, once the dump is restored and analysed seem to be OK, even large hash/merge joins and sorts This is entirely repeatable in FreeBSD in that around half the time create index will be incredibly slow. All postgresql.conf settings are at the defaults for the test initially (fresh install) The final interesting thing is that as I increase shared buffers to 2000 or 3000 the problem gets *worse* The following text is output from the test script.. select version(); version ------------------------------------------------------------------------------------------------ PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518 (1 row) \timing Timing is on. ----- Many slow cases, note the 50+ seconds cases create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 81.859 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now(); INSERT 0 100000 Time: 1482.141 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1543.508 ms create index idx on atest(r); CREATE INDEX Time: 56685.230 ms drop table atest; DROP TABLE Time: 4.616 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 6.889 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now(); INSERT 0 100000 Time: 2009.787 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1828.663 ms create index idx on atest(r); CREATE INDEX Time: 3991.257 ms drop table atest; DROP TABLE Time: 3.796 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 19.965 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now(); INSERT 0 100000 Time: 1625.059 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 2622.827 ms create index idx on atest(r); CREATE INDEX Time: 1082.799 ms drop table atest; DROP TABLE Time: 4.627 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 2.953 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now(); INSERT 0 100000 Time: 2068.744 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 2671.420 ms create index idx on atest(r); CREATE INDEX Time: 8047.660 ms drop table atest; DROP TABLE Time: 3.675 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 2.582 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now(); INSERT 0 100000 Time: 1723.987 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 2263.131 ms create index idx on atest(r); CREATE INDEX Time: 50050.308 ms drop table atest; DROP TABLE Time: 52.744 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 25.370 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now(); INSERT 0 100000 Time: 2052.733 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 2631.317 ms create index idx on atest(r); CREATE INDEX Time: 61440.897 ms drop table atest; DROP TABLE Time: 26.137 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 24.794 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now(); INSERT 0 100000 Time: 2851.977 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1553.046 ms create index idx on atest(r); CREATE INDEX Time: 1774.920 ms ---- Fast (Normal?) cases drop table atest; DROP TABLE Time: 4.422 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 2.543 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1516.246 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1407.400 ms create index idx on atest(r); CREATE INDEX Time: 903.503 ms drop table atest; DROP TABLE Time: 3.820 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 22.861 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1455.556 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 2037.996 ms create index idx on atest(r); CREATE INDEX Time: 718.286 ms drop table atest; DROP TABLE Time: 4.503 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 3.448 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1523.540 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1261.473 ms create index idx on atest(r); CREATE INDEX Time: 727.707 ms drop table atest; DROP TABLE Time: 3.564 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 2.897 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1447.504 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1403.525 ms create index idx on atest(r); CREATE INDEX Time: 754.577 ms drop table atest; DROP TABLE Time: 4.633 ms create table atest(i int4, r int4,d1 timestamp, d2 timestamp); CREATE TABLE Time: 3.196 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1618.544 ms insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now(); INSERT 0 100000 Time: 1530.450 ms create index idx on atest(r); CREATE INDEX Time: 802.980 ms drop table atest; DROP TABLE Time: 4.707 ms mserver# Regards, Gary.
pgsql-performance by date: