Thread: Re: pgbench with partitioned tables

Re: pgbench with partitioned tables

From
Melanie Plageman
Date:
On Fri, Jan 31, 2025 at 6:32 AM Sergey Tatarintsev
<s.tatarintsev@postgrespro.ru> wrote:
>
> When using manually created partitioned tables for pgbench, an error
> occurred:
> ERROR:  cannot perform COPY FREEZE on a partitioned table.
>
> Currently only pgbench_accounts can be partitioned, all others must be a
> regular tables.
>
> I wrote a patch to disable WITH (FREEZE = ON) when generating data for
> non-regular tables.

Thanks for the patch!

I recommend including a repro when proposing such things. Here's one I made

# pgbench init so all the tables are there
pgbench -i -s 10

# drop pgbench_tellers and recreate it partitioned
psql -f- <<EOF
DROP TABLE IF EXISTS pgbench_tellers;
CREATE TABLE pgbench_tellers (
    tid INT NOT NULL,
    bid INT,
    tbalance INT,
    filler CHAR(84)
) PARTITION BY RANGE (tid);

CREATE TABLE pgbench_tellers_p1
PARTITION OF pgbench_tellers
FOR VALUES FROM (1) TO (50);

CREATE TABLE pgbench_tellers_p2
PARTITION OF pgbench_tellers
FOR VALUES FROM (50) TO (2000);
EOF

# run pgbench init with only the data gen step
pgbench -i -I g -s 10

Personally, I have needed to disable COPY FREEZE during pgbench -i
when benchmarking features related to vacuum's freezing behavior.

Maybe instead of just not using COPY FREEZE on a table if it is
partitioned, we could add new data generation init_steps. Perhaps one
that is client-side data generation (g) but with no freezing? I'm not
really sure what the letter would be (f? making it f, g, and G?).

It seems they did not consider making COPY FREEZE optional when adding
the feature to pgbench [1].

This differs from your patch's behavior but it might still solve your problem?

- Melanie

[1] https://www.postgresql.org/message-id/flat/20210308.143907.2014279678657453983.t-ishii%40gmail.com



Re: pgbench with partitioned tables

From
Álvaro Herrera
Date:
On 2025-Jan-31, Melanie Plageman wrote:

> Maybe instead of just not using COPY FREEZE on a table if it is
> partitioned, we could add new data generation init_steps. Perhaps one
> that is client-side data generation (g) but with no freezing? I'm not
> really sure what the letter would be (f? making it f, g, and G?).

I think it makes sense to do what you suggest, but on the other hand,
the original code that Sergey is patching looks like a hack in the sense
that it hardcodes which tables to use FREEZE with.  There's no point to
doing things that way actually, so accepting Sergey's patch to replace
that with a relkind check feels sensible to me.

I think the query should be
SELECT relkind FROM pg_catalog.pg_class WHERE oid='%s'::pg_catalog.regclass
if only for consistency with pgbench's other query on catalogs.


Your proposal to add different init_steps might be reasonable, at least
if we allowed partitionedness of tables to vary in other ways (eg. if we
made pgbench_history partitioned), but I don't think it conflicts with
Sergey's patch in spirit.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Here's a general engineering tip: if the non-fun part is too complex for you
to figure out, that might indicate the fun part is too ambitious." (John Naylor)
https://postgr.es/m/CAFBsxsG4OWHBbSDM%3DsSeXrQGOtkPiOEOuME4yD7Ce41NtaAD9g%40mail.gmail.com



Re: pgbench with partitioned tables

From
Sami Imseih
Date:
I was looking at the comments [1] for why COPY FREEZE is
not allowed on a parent table, and it was mainly due
to having to open up partitions to check if they are able
to take the optimization (table created or truncated in the
current transaction ). Obviously as the number of
partitions grow, it will take more to perform these
checks. My suspicious is that in the cases in which partitions
are used, the benefits of COPY FREEZE could outweigh
the overhead of these additional checks.

So while we could try to solve the COPY FREEZE issue
specifically for pgbench, I wonder if we should try to do better
and see if the limitation on a parent partition can be removed.

I can provide a patch and some benchmark numbers unless
there is something bigger I am missing about the reason this
limitation exists?


[1] https://github.com/postgres/postgres/blob/master/src/backend/commands/copyfrom.c#L727-L735

Regards,

Sami