Thread: tables with 300+ partitions
Hi all! I have a very large table. I tried to divide it into 300 partitions with 30M rows each. The problem appeared when I used the table to insert information: performance was LOW.

I did some testing. I created an empty table with 300 partitions, then inserted some rows into it, and performance was SLOW there too. SLOW = 1% of the performance of a non-partitioned table. That is too big a penalty.

Then I made a version with 10 partitions of 30M rows each and inserted rows there. Performance was the same as the non-partitioned version.

I suspect there is a locking problem. I think every SQL command takes a lock on ALL the partitions, so performance with concurrent inserts and updates is far worse than with the non-partitioned version. The performance degrades with the number of partitions, and it degrades fast: I already have trouble with 40 partitions.

Am I right? Is there a workaround? Can I replace the partitioned version with another schema? Any suggestions? I would prefer something transparent to the program, because it uses EJB3, which means deep changes and retesting for any change to the database layer.

Regards
Pablo Alcaraz
Pablo Alcaraz wrote:
> I had a big big big table. I tried to divide it in 300 partitions with
> 30M rows each one. The problem was when I used the table to insert
> information: the performance was LOW.

That's very vague. What exactly did you do? Just insert a few rows, or perhaps a large bulk load of millions of rows? What was the bottleneck, disk I/O or CPU usage? How long did the operation take, and how long did you expect it to take?

> I did some testing. I created a 300 partitioned empty table. Then, I
> inserted some rows on it and the performance was SLOW too.
>
> SLOW = 1% performance compared with a non partitioned table. That is too
> much.
>
> Then, I did a 10 partitioned table version with 30M rows each one and I
> inserted rows there. The performance was the same as the non
> partitioned table version.

That suggests that the CPU time is spent in planning the query, possibly in constraint exclusion. But that's a very different scenario from having millions of rows in each partition.

> I suspect there is a lock problem there. I think every SQL command does a
> lock on ALL the partitions so the performance with concurrent inserts and
> updates is far worse than the non partitioned version.

Every query takes an AccessShareLock on each partition, but that doesn't prevent concurrent inserts or updates, and acquiring the locks isn't very expensive. In other words: no, that's not it.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
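Heikki's planning hypothesis can be checked directly: constraint exclusion is controlled by a configuration parameter, and EXPLAIN shows whether the planner prunes the partitions. A minimal sketch, assuming the base table t partitioned on idt as posted later in this thread:

```sql
-- With constraint_exclusion enabled, the planner can use the CHECK
-- constraints on the child tables to skip partitions that cannot
-- match the WHERE clause.
SET constraint_exclusion = on;

-- If exclusion works, the plan scans only the one matching partition;
-- if it lists all 300 children, planning cost grows with partition count.
EXPLAIN SELECT * FROM t WHERE idt = 12345;
```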
On 10/30/07, Pablo Alcaraz <pabloa@laotraesquina.com.ar> wrote:
I did some testing. I created a 300 partitioned empty table. Then, I
inserted some rows on it and the perfomance was SLOW too.
Is the problem with inserting to the partitioned table or selecting from it? It sounds like inserting is the problem in which case I ask: how are you redirecting inserts to the appropriate partition? If you're using rules, then insert performance will quickly degrade with number of partitions as *every* rule needs to be evaluated for *every* row inserted to the base table. Using a trigger which you can modify according to some schedule is much faster, or better yet, use some application-level logic to insert directly to the desired partition.
Steve
Steven Flatt wrote:
The selects that need to evaluate all partitions were slow too, but I think I can wait for them. :D
I wonder if updates are slow too. I do not know.
Do I need a trigger for inserts only, or do I need triggers for updates and deletes too?
Modifying the application logic for this is not an option, because they would need to open the program, modify it and retest it, all because of an implementation problem. I prefer to solve it at the database level, because it is the database that needs this table partitioned.
Thanks for your help
Regards.
Pablo
On 10/31/07, Pablo Alcaraz <pabloa@laotraesquina.com.ar> wrote:
It was a program inserting into the base table. The program ran in 200+ threads and every thread inserted data into it. Every thread inserts a row every 3 seconds approx. (or they used to), but when I added more partitions the insert speed dropped to 1 insert every 2 minutes.
We still need to know how you're redirecting inserts to the appropriate partition. My guess is that you have rules on the base table which say "if an incoming row matches a certain criteria, then insert into partition x instead". There are several discussions on this newsgroup already about why using rules for partitioning hurts insert performance.
Next question is *how* are you partitioning the table? If you're partitioning based on some sort of log time field (i.e. today's data goes in this partition, tomorrow's data goes in that partition, etc.), then it is reasonably easy to use a single trigger (as opposed to many rules) on the base table, which you can modify on some schedule (using cron, for example). If you're partitioning based on some other id field, then using a trigger won't work as nicely.
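For the time-based case, the single schedule-modified trigger described here might look like the following sketch (the partition name is hypothetical); a cron job re-issues the CREATE OR REPLACE FUNCTION with the new partition name whenever the current partition changes:

```sql
-- Hypothetical daily partitioning: the function body names today's
-- partition statically, so every insert does one cheap redirect
-- instead of evaluating a long list of rules.
CREATE OR REPLACE FUNCTION t_insert_current() RETURNS trigger AS $$
BEGIN
    INSERT INTO t_2007_10_31 VALUES (NEW.*);  -- rewritten by cron daily
    RETURN NULL;  -- suppress the insert into the base table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_redirect_insert
    BEFORE INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE t_insert_current();
```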
Do I need to do a trigger for insert only or I need a trigger to update and delete too?
For delete, probably not. For update, depends on how you're partitioning. Could the update cause the row to belong to a different partition? Do you care about moving it at that point?
Steve
On 10/31/07, Pablo Alcaraz <pabloa@laotraesquina.com.ar> wrote:
> Steven Flatt wrote:
>> On 10/30/07, Pablo Alcaraz <pabloa@laotraesquina.com.ar> wrote:
>>> I did some testing. I created a 300 partitioned empty table. Then, I
>>> inserted some rows on it and the performance was SLOW too.
>>
>> Is the problem with inserting to the partitioned table or selecting from
>> it? It sounds like inserting is the problem, in which case I ask: how are
>> you redirecting inserts to the appropriate partition? If you're using
>> rules, then insert performance will quickly degrade with the number of
>> partitions, as *every* rule needs to be evaluated for *every* row
>> inserted to the base table. Using a trigger which you can modify
>> according to some schedule is much faster, or better yet, use some
>> application-level logic to insert directly to the desired partition.
>>
>> Steve
>
> It was a program inserting into the base table. The program ran in 200+
> threads and every thread inserted data into it. Every thread inserts a row
> every 3 seconds approx. (or they used to), but when I added more partitions
> the insert speed dropped to 1 insert every 2 minutes.
>
> The selects that need to evaluate all partitions were slow too, but I think
> I can wait for them. :D
>
> I wonder if updates are slow too. I do not know.
>
> Do I need a trigger for inserts only, or do I need triggers for updates
> and deletes too?

You need a trigger for any update / delete / insert you don't want to be really slow. Basically, if a rule is doing it now, you need a trigger to do it to speed it up. My experience has been that at 200 to 1000 partitions, the speed of the smaller tables still makes selects faster than with one big table for certain kinds of access. At some point, the cost of planning a lookup against thousands of tables will be more than the savings of looking in a really small table.
The nice thing about triggers is that you can use maths to figure out the name of the table you'll be writing to instead of a static table like most rules use. So, all you have to do is make sure the new tables get added under the parent and poof, you're ready to go, no need for a new trigger.
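Under the idt-range scheme posted later in the thread (30M-row partitions named t_00003, t_00006, ...), the "maths" mentioned here could look like the following sketch. Note that EXECUTE ... USING and format() appeared in PostgreSQL 8.4 and 9.1 respectively, so this illustrates the idea rather than the exact code available on the 8.x versions current at the time:

```sql
-- Compute the target partition name from NEW.idt instead of keeping
-- 300 static rules: idt 1..30M -> t_00003, 30M+1..60M -> t_00006, ...
CREATE OR REPLACE FUNCTION t_insert_router() RETURNS trigger AS $$
DECLARE
    suffix integer := 3 * ((NEW.idt - 1) / 30000000 + 1);
BEGIN
    EXECUTE format('INSERT INTO t_%s SELECT ($1).*',
                   to_char(suffix, 'FM00000'))
    USING NEW;
    RETURN NULL;  -- the row was routed to a child; skip the base table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_route_insert
    BEFORE INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE t_insert_router();
```

Because the partition name is computed, adding a new child table under the parent requires no change to the trigger at all, which is exactly the property described above.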
> Steven Flatt wrote:
>> On 10/30/07, Pablo Alcaraz <pabloa@laotraesquina.com.ar> wrote:
>>> I did some testing. I created a 300 partitioned empty table. Then, I
>>> inserted some rows on it and the performance was SLOW too.
>>
>> Is the problem with inserting to the partitioned table or selecting
>> from it? It sounds like inserting is the problem, in which case I
>> ask: how are you redirecting inserts to the appropriate partition?
>> If you're using rules, then insert performance will quickly degrade
>> with the number of partitions, as *every* rule needs to be evaluated for
>> *every* row inserted to the base table. Using a trigger which you
>> can modify according to some schedule is much faster, or better yet,
>> use some application-level logic to insert directly to the desired
>> partition.
>>
>> Steve
>
> It was a program inserting into the base table. The program ran in 200+
> threads and every thread inserted data into it. Every thread inserts a row
> every 3 seconds approx. (or they used to), but when I added more
> partitions the insert speed dropped to 1 insert every 2 minutes.

OK, that gives about 70 inserts per second - depending on the amount of data inserted this may or may not be manageable. What is the size of the data the threads are writing with each insert, or what is the size of the whole table (not the number of rows, but the size in MB / GB)? What is the table structure - what indices are defined on it, etc.? What kind of SELECT queries do you execute on the table / partitions? Aggregations or simple queries? Have you executed ANALYZE on all the partitions after loading the data? What are the EXPLAIN plans for the slow SELECT queries?

Anyway, 300 partitions for 200 threads seems a little too much to me. I'd use something like 10 partitions instead. What strategy have you chosen to redirect the inserts into the partitions, i.e. how do you determine which partition an insert should be written to?
Maybe I missed something, but what is the CPU and I/O load? In other words, is the system CPU bound or I/O bound?

> The selects that need to evaluate all partitions were slow too, but I
> think I can wait for them. :D
>
> I wonder if updates are slow too. I do not know.
>
> Do I need a trigger for inserts only, or do I need triggers for updates
> and deletes too?

If you have created the partitions using INHERITS, then all you need to do is redirect the inserts - either using a RULE, a BEFORE INSERT trigger, or a stored procedure. Each of these options has advantages / disadvantages:

Rules are quite easy to maintain (once you create a new partition you just need to create a new rule), but they may have serious overhead in the case of many partitions, as all the rules have to be evaluated.

Triggers are not as easy to maintain, as all the tables have to be handled in a single procedure, and adding / removing a partition means modifying that procedure. On the other side, the performance may be better in the case of many partitions.

Both of the solutions mentioned above have the advantage of transparency, i.e. the clients don't need to know about them.

Stored procedures have the advantages and disadvantages of a trigger, plus they have to be invoked explicitly by the client.

Tomas
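The stored-procedure option listed here might look like the following sketch (the function name and the use of dynamic SQL are illustrative assumptions, matching the schema posted later in this thread); it routes rows like a trigger would, but the client has to call it explicitly instead of issuing a plain INSERT, which is why it is not transparent:

```sql
-- Hypothetical insert procedure: the client runs
--   SELECT t_insert(idt, idtpadre, e, dmodi, p, m);
-- instead of INSERT INTO t ..., so no rule or trigger fires at all.
CREATE OR REPLACE FUNCTION t_insert(_idt bigint, _idtpadre bigint,
                                    _e integer, _dmodi timestamp,
                                    _p integer, _m text)
RETURNS void AS $$
BEGIN
    -- Same partition arithmetic a routing trigger would use:
    -- idt 1..30M -> t_00003, 30M+1..60M -> t_00006, ...
    EXECUTE 'INSERT INTO t_'
            || to_char(3 * ((_idt - 1) / 30000000 + 1), 'FM00000')
            || ' VALUES ($1, $2, $3, $4, $5, $6)'
    USING _idt, _idtpadre, _e, _dmodi, _p, _m;
END;
$$ LANGUAGE plpgsql;
```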
Scott Marlowe wrote:
Currently I have insert rules only, and updates work correctly. I think the UPDATEs use the constraints, because the program uses the base table everywhere.
This is the base table structure:
-- Table: t
-- DROP TABLE t;
CREATE TABLE t
(
idt bigint NOT NULL,
idtpadre bigint NOT NULL,
e integer NOT NULL,
dmodi timestamp without time zone NOT NULL DEFAULT now(),
p integer NOT NULL DEFAULT 0,
m text NOT NULL
)
WITHOUT OIDS;
ALTER TABLE t OWNER TO e;
-- Rule: "t_update_00003 ON t"
-- DROP RULE t_update_00003 ON t;
CREATE OR REPLACE RULE t_update_00003 AS
ON INSERT TO t
WHERE new.idt >= 1::bigint AND new.idt <= 30000000::bigint DO INSTEAD INSERT INTO t_00003 (idt, idtpadre, e, dmodi, p, m)
VALUES (new.idt, new.idtpadre, new.e, new.dmodi, new.p, new.m);
-- Rule: "t_update_00006 ON t"
-- DROP RULE t_update_00006 ON t;
CREATE OR REPLACE RULE t_update_00006 AS
ON INSERT TO t
WHERE new.idt >= 30000001::bigint AND new.idt <= 60000000::bigint DO INSTEAD INSERT INTO t_00006 (idt, idtpadre, e, dmodi, p, m)
VALUES (new.idt, new.idtpadre, new.e, new.dmodi, new.p, new.m);
-- Rule: "t_update_00009 ON t"
-- DROP RULE t_update_00009 ON t;
CREATE OR REPLACE RULE t_update_00009 AS
ON INSERT TO t
WHERE new.idt >= 60000001::bigint AND new.idt <= 90000000::bigint DO INSTEAD INSERT INTO t_00009 (idt, idtpadre, e, dmodi, p, m)
VALUES (new.idt, new.idtpadre, new.e, new.dmodi, new.p, new.m);
-- Rule: "t_update_00012 ON t"
-- DROP RULE t_update_00012 ON t;
CREATE OR REPLACE RULE t_update_00012 AS
ON INSERT TO t
WHERE new.idt >= 90000001::bigint AND new.idt <= 120000000::bigint DO INSTEAD INSERT INTO t_00012 (idt, idtpadre, e, dmodi, p, m)
VALUES (new.idt, new.idtpadre, new.e, new.dmodi, new.p, new.m);
etc. ... up to 300 partitions
The partitions are created like:
CREATE TABLE t_00003
(
CONSTRAINT t_00003_pkey PRIMARY KEY (idt),
CONSTRAINT t_00003_idt_check CHECK (idt >= 1::bigint AND idt <= 30000000::bigint)
) INHERITS (t)
WITHOUT OIDS;
ALTER TABLE t_00003 OWNER TO e;
CREATE INDEX t_00003_e
ON t_00003
USING btree
(e);
etc