Thread: Partitioning Tables
PostgreSQL 12
I have a challenge. I have two tables, group and group_member. A group table has a type indicator telling me which of three ways the table can be used. The group member table is made up of 50 million records that have a foreign key to the group table.
I really need the queries to be fast for one particular type of group. This type has less than a million members in it. So my first thought was to create a separate group_member table just for members of this type of group. But I have to change a lot of SQL.
The other idea I thought of is there a way to use table partitions? If the query goes against a group of type A then it would pull from the small partition but if it is a group of some other type it would query against the other partition.
So if I do a query like the below it would ideally pull from partition group_member_a rather than group_member_other. However, how would PostgreSQL know on insert a group_member into the proper partition?
Select group_member.* from group, group_member WHERE group.type=’A’ and group.id=group_member.user=’bob smith’;
Also, what happens if I did a query like this. Will it know to scan both group_member_a and group_member_other:
Select group_member.* from group_member where group_member.user=’bob smith’;
Thoughts?
Thanks,
Lance Campbell
> On Feb 5, 2021, at 11:27 AM, Campbell, Lance <lance@illinois.edu> wrote: > > So if I do a query like the below it would ideally pull from partition group_member_a rather than group_member_other. However, how would PostgreSQL know on insert a group_member into the proper partition? > > Select group_member.* from group, group_member WHERE group.type=’A’ and group.id=group_member.user=’bob smith’; When you define the partition, you declare what values go into it. With PG 12, you can insert into the parent and the rowwill get redirected into the right place. Some older versions (I don't remember how old) required you to insert into theright table, or some voodoo with rewrite rules or triggers. In other words, smarts about which partitions to scan in your example pre-dates smarts about inserting. I don't remember for sure, but I think maybe PG 13 adds the feature where if you update the group type, the row gets moved.Prior, you'd have to delete and re-insert. Someone correct me? > Also, what happens if I did a query like this. Will it know to scan both group_member_a and group_member_other: > > Select group_member.* from group_member where group_member.user=’bob smith’; Yes. Anyway, you are on the right track, in that what you're describing could work. There are less invasive things you could try, which wouldn't give as much benefit--like partial indexes for instance create index ... on group_member(user) where type = 'A' that gets you fast location of a single user, but if you looked for a range of users, reading the index would be fast, butyou'd still wind up with the matching rows scattered all over the single big tablee
On 2/5/21 12:42 PM, Scott Ribe wrote: >> On Feb 5, 2021, at 11:27 AM, Campbell, Lance <lance@illinois.edu> wrote: >> >> So if I do a query like the below it would ideally pull from partition group_member_a rather than group_member_other. However, how would PostgreSQL know on insert a group_member into the proper partition? >> >> Select group_member.* from group, group_member WHERE group.type=’A’ and group.id=group_member.user=’bob smith’; > When you define the partition, you declare what values go into it. With PG 12, you can insert into the parent and the rowwill get redirected into the right place. Some older versions (I don't remember how old) required you to insert into theright table, or some voodoo with rewrite rules or triggers. > > In other words, smarts about which partitions to scan in your example pre-dates smarts about inserting. > > I don't remember for sure, but I think maybe PG 13 adds the feature where if you update the group type, the row gets moved.Prior, you'd have to delete and re-insert. Someone correct me? I don't know about v13, but v12 (and below) error on updating the partition key. (That's a common limitation which helps the optimizer /know/ what's in a partition.) -- Angular momentum makes the world go 'round.
This is what I really want to do. But it does not look like PostgreSQL supports referencing the parent tables for decidingwhat partition to put the data in: CREATE TABLE group_manager.group (id INTEGER, type TEXT, name TEXT, CONSTRAINT group_pkey PRIMARY KEY (id) ); -- type would have values of A , B or C CREATE TABLE group_member (fk_group_id INTEGER, id INTEGER, user_id TEXT) PARTITION BY LIST(group.type); -- Above PARTITION BY LIST(group.type) does not work. CREATE TABLE group_member_A PARTITION OF group_member FOR VALUES IN ('A'); CREATE TABLE group_member_B PARTITION OF group_member FOR VALUES IN ('B'); CREATE TABLE group_member_C PARTITION OF group_member FOR VALUES IN ('C'); It does look like it would work if I added the "type" from the group table to the group_member table. But otherwise thePARITION BY LIST does not support referencing a parent table. Lance On 2/5/21, 12:42 PM, "Scott Ribe" <scott_ribe@elevated-dev.com> wrote: > On Feb 5, 2021, at 11:27 AM, Campbell, Lance <lance@illinois.edu> wrote: > > So if I do a query like the below it would ideally pull from partition group_member_a rather than group_member_other. However, how would PostgreSQL know on insert a group_member into the proper partition? > > Select group_member.* from group, group_member WHERE group.type=’A’ and group.id=group_member.user=’bob smith’; When you define the partition, you declare what values go into it. With PG 12, you can insert into the parent and therow will get redirected into the right place. Some older versions (I don't remember how old) required you to insert intothe right table, or some voodoo with rewrite rules or triggers. In other words, smarts about which partitions to scan in your example pre-dates smarts about inserting. I don't remember for sure, but I think maybe PG 13 adds the feature where if you update the group type, the row getsmoved. Prior, you'd have to delete and re-insert. Someone correct me? > Also, what happens if I did a query like this. Will it know to scan both group_member_a and group_member_other: > > Select group_member.* from group_member where group_member.user=’bob smith’; Yes. Anyway, you are on the right track, in that what you're describing could work. There are less invasive things you could try, which wouldn't give as much benefit--like partial indexes for instance create index ... on group_member(user) where type = 'A' that gets you fast location of a single user, but if you looked for a range of users, reading the index would be fast,but you'd still wind up with the matching rows scattered all over the single big tablee
I had a typo: This is what I really want to do. But it does not look like PostgreSQL supports referencing the parent tables for decidingwhat partition to put the data in: CREATE TABLE group (id INTEGER, type TEXT, name TEXT, CONSTRAINT group_pkey PRIMARY KEY (id) ); -- type would have values of A , B or C CREATE TABLE group_member (fk_group_id INTEGER, id INTEGER, user_id TEXT) PARTITION BY LIST(group.type); -- Above PARTITION BY LIST(group.type) does not work. CREATE TABLE group_member_A PARTITION OF group_member FOR VALUES IN ('A'); CREATE TABLE group_member_B PARTITION OF group_member FOR VALUES IN ('B'); CREATE TABLE group_member_C PARTITION OF group_member FOR VALUES IN ('C'); It does look like it would work if I added the "type" from the group table to the group_member table. But otherwise thePARITION BY LIST does not support referencing a parent table. Lance On 2/5/21, 1:18 PM, "Campbell, Lance" <lance@illinois.edu> wrote: This is what I really want to do. But it does not look like PostgreSQL supports referencing the parent tables for decidingwhat partition to put the data in: CREATE TABLE group_manager.group (id INTEGER, type TEXT, name TEXT, CONSTRAINT group_pkey PRIMARY KEY (id) ); -- type would have values of A , B or C CREATE TABLE group_member (fk_group_id INTEGER, id INTEGER, user_id TEXT) PARTITION BY LIST(group.type); -- Above PARTITION BY LIST(group.type) does not work. CREATE TABLE group_member_A PARTITION OF group_member FOR VALUES IN ('A'); CREATE TABLE group_member_B PARTITION OF group_member FOR VALUES IN ('B'); CREATE TABLE group_member_C PARTITION OF group_member FOR VALUES IN ('C'); It does look like it would work if I added the "type" from the group table to the group_member table. But otherwisethe PARITION BY LIST does not support referencing a parent table. Lance On 2/5/21, 12:42 PM, "Scott Ribe" <scott_ribe@elevated-dev.com> wrote: > On Feb 5, 2021, at 11:27 AM, Campbell, Lance <lance@illinois.edu> wrote: > > So if I do a query like the below it would ideally pull from partition group_member_a rather than group_member_other. However, how would PostgreSQL know on insert a group_member into the proper partition? > > Select group_member.* from group, group_member WHERE group.type=’A’ and group.id=group_member.user=’bob smith’; When you define the partition, you declare what values go into it. With PG 12, you can insert into the parent andthe row will get redirected into the right place. Some older versions (I don't remember how old) required you to insertinto the right table, or some voodoo with rewrite rules or triggers. In other words, smarts about which partitions to scan in your example pre-dates smarts about inserting. I don't remember for sure, but I think maybe PG 13 adds the feature where if you update the group type, the row getsmoved. Prior, you'd have to delete and re-insert. Someone correct me? > Also, what happens if I did a query like this. Will it know to scan both group_member_a and group_member_other: > > Select group_member.* from group_member where group_member.user=’bob smith’; Yes. Anyway, you are on the right track, in that what you're describing could work. There are less invasive things you could try, which wouldn't give as much benefit--like partial indexes for instance create index ... on group_member(user) where type = 'A' that gets you fast location of a single user, but if you looked for a range of users, reading the index would befast, but you'd still wind up with the matching rows scattered all over the single big tablee
> On Feb 5, 2021, at 12:19 PM, Campbell, Lance <lance@illinois.edu> wrote: > > But it does not look like PostgreSQL supports referencing the parent tables for deciding what partition to put the datain correct, I missed that in your first post
It does look like it would work if I added the "type" from the group table to the group_member table. But otherwise the PARITION BY LIST does not support referencing a parent table.
On Fri, 2021-02-05 at 18:27 +0000, Campbell, Lance wrote: > I have a challenge. I have two tables, group and group_member. A group table has a > type indicator telling me which of three ways the table can be used. The group member > table is made up of 50 million records that have a foreign key to the group table. > > I really need the queries to be fast for one particular type of group. This type has > less than a million members in it. So my first thought was to create a separate > group_member table just for members of this type of group. But I have to change a lot of SQL. > > The other idea I thought of is there a way to use table partitions? If the query goes > against a group of type A then it would pull from the small partition but if it is a group > of some other type it would query against the other partition. If you want to partition "group_member", the partitioning key cannot be in "group". So you would have to (redundantly) add "group.type" to "group_member". You could ensure consistency by including that column in the foreign key. If you want partition pruning to take effect, you will have to include a condition on "group_member.type" in the WHERE condition. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com