Thread: Use of partial index
I'm a little confused about partial indexes. I have a couple of tables, like this: CREATE TABLE events ( event_id INTEGER PRIMARY KEY, tag_type_fk INTEGER REFERENCES tag_types (tag_type_id), place_fk INTEGER REFERENCES places (place_id), event_date CHAR(18) NOT NULL DEFAULT '000000003000000001', sort_date DATE NOT NULL DEFAULT '40041024BC', event_text TEXT NOT NULL DEFAULT '', sentence TEXT NOT NULL DEFAULT'' ); To this table I have created a partial index: CREATE INDEX events_born ON events (tag_type_fk) WHERE tag_type_fk = 2; Another table: CREATE TABLE participants ( -- the TMG 'E' file participant_id INTEGER PRIMARY KEY, person_fk INTEGERREFERENCES persons (person_id), event_fk INTEGER REFERENCES events (event_id), role_type_fk INTEGER REFERENCES role_types (role_type_id), is_principal BOOLEAN NOT NULL DEFAULT 'f', is_primary_event BOOLEANNOT NULL DEFAULT 'f', participant_note TEXT NOT NULL DEFAULT '', participant_name TEXT NOT NULL DEFAULT'', age_mean INTEGER NOT NULL DEFAULT 0, age_devi INTEGER NOT NULL DEFAULT 0, CONSTRAINTperson_event UNIQUE (person_id, event_id) ); And a view: CREATE OR REPLACE VIEW principals AS SELECT participants.person_fk AS person, events.event_id AS event, events.place_fk AS place, events.event_date ASevent_date, events.sort_date AS sort_date, events.tag_type_fk AS tag_type FROM events, participants WHERE events.event_id = participants.event_fk AND participants.is_principal IS TRUE; Now, here's an "explain select": pgslekt=> explain select event_date, place from principals where person=2 and tag_type=2; QUERY PLAN -----------------------------------------------------------------------Nested Loop (cost=0.00..23.15 rows=2 width=26) -> Index Scan using person_event on participants (cost=0.00..13.63 rows=3 width=4) Index Cond:(person_fk = 2) Filter: (is_principal IS TRUE) -> Index Scan using events_pkey on events (cost=0.00..3.16 rows=1 width=30) Index Cond: (events.event_id = "outer".event_fk) Filter: (tag_type_fk= 2) (7 rader) Why doesn't this SELECT use the partial index "events_born" above? Is there any way to make this happen? -- Leif Biberg Kristensen http://solumslekt.org/
As I understand it, partial indices are generally useful when you only want to index a range of values, or if the select condition is on a different field from the one being indexed (eg: ON foo (a) WHERE b IS NOT NULL). I am just guessing here, but it sounds like 'person_fk = 2' is going to be a lot more selective (ie return less rows) than 'tag_type_fk = 2', so it's quicker to use the pkey and then filter the results. Depending on how many 'tag_type' values you have, indexing on it will not help at all. In other words, if more than a few percent of the rows have the value '2' for 'tag_type_fg', postgres will tend to favour more selective indices if you are doing a join, or a seqscan if you are doing a straight select on that value. Does that help? Dmitri PS Your query seems to be quite quick already, why don't you like this plan? > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Leif B. > Kristensen > Sent: Wednesday, October 05, 2005 11:17 AM > To: Pgsql-sql@postgresql.org > Subject: [SQL] Use of partial index > > > I'm a little confused about partial indexes. I have a couple > of tables, > like this: > > CREATE TABLE events ( > event_id INTEGER PRIMARY KEY, > tag_type_fk INTEGER REFERENCES tag_types (tag_type_id), > place_fk INTEGER REFERENCES places (place_id), > event_date CHAR(18) NOT NULL DEFAULT '000000003000000001', > sort_date DATE NOT NULL DEFAULT '40041024BC', > event_text TEXT NOT NULL DEFAULT '', > sentence TEXT NOT NULL DEFAULT '' > ); > > To this table I have created a partial index: > > CREATE INDEX events_born > ON events (tag_type_fk) > WHERE tag_type_fk = 2; > > Another table: > > CREATE TABLE participants ( -- the TMG 'E' file > participant_id INTEGER PRIMARY KEY, > person_fk INTEGER REFERENCES persons (person_id), > event_fk INTEGER REFERENCES events (event_id), > role_type_fk INTEGER REFERENCES role_types (role_type_id), > is_principal BOOLEAN NOT NULL DEFAULT 'f', > is_primary_event BOOLEAN NOT NULL DEFAULT 'f', > participant_note TEXT NOT NULL DEFAULT '', > participant_name TEXT NOT NULL DEFAULT '', > age_mean INTEGER NOT NULL DEFAULT 0, > age_devi INTEGER NOT NULL DEFAULT 0, > CONSTRAINT person_event UNIQUE (person_id, event_id) > ); > > And a view: > > CREATE OR REPLACE VIEW principals AS > SELECT > participants.person_fk AS person, > events.event_id AS event, > events.place_fk AS place, > events.event_date AS event_date, > events.sort_date AS sort_date, > events.tag_type_fk AS tag_type > FROM > events, participants > WHERE > events.event_id = participants.event_fk > AND > participants.is_principal IS TRUE; > > Now, here's an "explain select": > > pgslekt=> explain select event_date, place from principals where > person=2 and tag_type=2; > QUERY PLAN > -------------------------------------------------------------- > --------- > Nested Loop (cost=0.00..23.15 rows=2 width=26) > -> Index Scan using person_event on participants > (cost=0.00..13.63 rows=3 width=4) > Index Cond: (person_fk = 2) > Filter: (is_principal IS TRUE) > -> Index Scan using events_pkey on events > (cost=0.00..3.16 rows=1 width=30) > Index Cond: (events.event_id = "outer".event_fk) > Filter: (tag_type_fk = 2) > (7 rader) > > Why doesn't this SELECT use the partial index "events_born" above? Is > there any way to make this happen? > -- > Leif Biberg Kristensen > http://solumslekt.org/ > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: explain analyze is your friend > The information transmitted is intended only for the person or entity to which it is addressed and may contain confidentialand/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any actionin reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you receivedthis in error, please contact the sender and delete the material from any computer
"Leif B. Kristensen" <leif@solumslekt.org> writes: > I'm a little confused about partial indexes. I have a couple of tables, > like this: > CREATE TABLE events ( > event_id INTEGER PRIMARY KEY, > tag_type_fk INTEGER REFERENCES tag_types (tag_type_id), > place_fk INTEGER REFERENCES places (place_id), > event_date CHAR(18) NOT NULL DEFAULT '000000003000000001', > sort_date DATE NOT NULL DEFAULT '40041024BC', > event_text TEXT NOT NULL DEFAULT '', > sentence TEXT NOT NULL DEFAULT '' > ); > To this table I have created a partial index: > CREATE INDEX events_born > ON events (tag_type_fk) > WHERE tag_type_fk = 2; > ... > Now, here's an "explain select": > pgslekt=> explain select event_date, place from principals where > person=2 and tag_type=2; > QUERY PLAN > ----------------------------------------------------------------------- > Nested Loop (cost=0.00..23.15 rows=2 width=26) > -> Index Scan using person_event on participants > (cost=0.00..13.63 rows=3 width=4) > Index Cond: (person_fk = 2) > Filter: (is_principal IS TRUE) > -> Index Scan using events_pkey on events > (cost=0.00..3.16 rows=1 width=30) > Index Cond: (events.event_id = "outer".event_fk) > Filter: (tag_type_fk = 2) > (7 rader) > Why doesn't this SELECT use the partial index "events_born" above? Because the plan it did choose is better. events_born could only serve to select the rows with tag_type_fk = 2; assuming there's more than one of those rows, there would be multiple fetches needed to see if any of them have the desired event_id. With this plan it's getting at most one row, by definition (since event_id is the primary key). Had you created the partial index as CREATE INDEX events_born ON events (event_id) WHERE tag_type_fk = 2; then it would be competitive for this query, since the index could effectively handle both constraints not just one. (THe way you did define it, the actual content of the index keys is just dead weight, since they obviously must all be "2". It's often better to define the index column(s) of a partial index as some other column than the one involved in the index predicate...) regards, tom lane
On Wed, 5 Oct 2005, Leif B. Kristensen wrote: > I'm a little confused about partial indexes. I have a couple of tables, > like this: > > CREATE TABLE events ( > event_id INTEGER PRIMARY KEY, > tag_type_fk INTEGER REFERENCES tag_types (tag_type_id), > place_fk INTEGER REFERENCES places (place_id), > event_date CHAR(18) NOT NULL DEFAULT '000000003000000001', > sort_date DATE NOT NULL DEFAULT '40041024BC', > event_text TEXT NOT NULL DEFAULT '', > sentence TEXT NOT NULL DEFAULT '' > ); > > To this table I have created a partial index: > > CREATE INDEX events_born > ON events (tag_type_fk) > WHERE tag_type_fk = 2; > > Another table: > > CREATE TABLE participants ( -- the TMG 'E' file > participant_id INTEGER PRIMARY KEY, > person_fk INTEGER REFERENCES persons (person_id), > event_fk INTEGER REFERENCES events (event_id), > role_type_fk INTEGER REFERENCES role_types (role_type_id), > is_principal BOOLEAN NOT NULL DEFAULT 'f', > is_primary_event BOOLEAN NOT NULL DEFAULT 'f', > participant_note TEXT NOT NULL DEFAULT '', > participant_name TEXT NOT NULL DEFAULT '', > age_mean INTEGER NOT NULL DEFAULT 0, > age_devi INTEGER NOT NULL DEFAULT 0, > CONSTRAINT person_event UNIQUE (person_id, event_id) > ); > > And a view: > > CREATE OR REPLACE VIEW principals AS > SELECT > participants.person_fk AS person, > events.event_id AS event, > events.place_fk AS place, > events.event_date AS event_date, > events.sort_date AS sort_date, > events.tag_type_fk AS tag_type > FROM > events, participants > WHERE > events.event_id = participants.event_fk > AND > participants.is_principal IS TRUE; > > Now, here's an "explain select": > > pgslekt=> explain select event_date, place from principals where > person=2 and tag_type=2; > QUERY PLAN > ----------------------------------------------------------------------- > Nested Loop (cost=0.00..23.15 rows=2 width=26) > -> Index Scan using person_event on participants > (cost=0.00..13.63 rows=3 width=4) > Index Cond: (person_fk = 2) > Filter: (is_principal IS TRUE) > -> Index Scan using events_pkey on events > (cost=0.00..3.16 rows=1 width=30) > Index Cond: (events.event_id = "outer".event_fk) > Filter: (tag_type_fk = 2) > (7 rader) > > Why doesn't this SELECT use the partial index "events_born" above? Is > there any way to make this happen? I would think you'd want an index ON events(event_id) WHERE tag_type_fk=2 for the query given.
On Wednesday 05 October 2005 18:44, you wrote: > As I understand it, partial indices are generally useful when you > only want to index a range of values, or if the select condition is > on a different field from the one being indexed (eg: ON foo (a) WHERE > b IS NOT NULL). > > I am just guessing here, but it sounds like 'person_fk = 2' is going > to be a lot more selective (ie return less rows) than 'tag_type_fk = > 2', so it's quicker to use the pkey and then filter the results. That makes a lot of sense to me. As any person will usually participate in several events, I'll estimate that the ratio between person_fk=x and tag_type_fk=y is about 1:4. > Depending on how many 'tag_type' values you have, indexing on it will > not help at all. In other words, if more than a few percent of the > rows have the value '2' for 'tag_type_fg', postgres will tend to > favour more selective indices if you are doing a join, or a seqscan > if you are doing a straight select on that value. I have only 53 different tag types, but most of them are rather rare. The most heavilyly used are birth=2, death=3, and marriage=4, and I've created similar partial indexes for the other two. In a previous MySQL project that's roughly sharing the same data model, I created a redundant table called "marriages" that speeded up the generation of a family view by a factor of four. That's why I thought that a partial index would have a similar effect here. > Does that help? Yessir, thank you very much! > Dmitri > PS Your query seems to be quite quick already, why don't you like > this plan? I didn't say that I didn't like it, - I just was a little perplexed :-) -- Leif Biberg Kristensen | Registered Linux User #338009 http://solumslekt.org/ | Cruising with Gentoo-Linux/KDE
On Wednesday 05 October 2005 18:49, you wrote: [Leif] > > Now, here's an "explain select": > > > > pgslekt=> explain select event_date, place from principals where > > person=2 and tag_type=2; > > QUERY PLAN > > ----------------------------------------------------------------------- > > Nested Loop (cost=0.00..23.15 rows=2 width=26) > > -> Index Scan using person_event on participants > > (cost=0.00..13.63 rows=3 width=4) > > Index Cond: (person_fk = 2) > > Filter: (is_principal IS TRUE) > > -> Index Scan using events_pkey on events > > (cost=0.00..3.16 rows=1 width=30) > > Index Cond: (events.event_id = "outer".event_fk) > > Filter: (tag_type_fk = 2) > > (7 rader) > > Because the plan it did choose is better. events_born could only > serve to select the rows with tag_type_fk = 2; assuming there's more > than one of those rows, there would be multiple fetches needed to see > if any of them have the desired event_id. With this plan it's getting > at most one row, by definition (since event_id is the primary key). > > Had you created the partial index as > > CREATE INDEX events_born > ON events (event_id) > WHERE tag_type_fk = 2; > > then it would be competitive for this query, since the index could > effectively handle both constraints not just one. (THe way you did > define it, the actual content of the index keys is just dead weight, > since they obviously must all be "2". It's often better to define > the index column(s) of a partial index as some other column than the > one involved in the index predicate...) > pgslekt=> create index events_born2 on events (event_id) where tag_type_fk=2; CREATE INDEX pgslekt=> explain select event_date, place from principals where person=2 and tag_type=2; QUERY PLAN ----------------------------------------------------------------------------------------Nested Loop (cost=0.00..22.88 rows=2width=26) -> Index Scan using person_event on participants (cost=0.00..13.63 rows=3 width=4) Index Cond: (person_fk = 2) Filter: (is_principal IS TRUE) -> Index Scan using events_born2on events (cost=0.00..3.07 rows=1 width=30) Index Cond: (events.event_id = "outer".event_fk) Filter: (tag_type_fk = 2) (7 rader) From 23.15 to 22.88 ... but now at least it used my partial index, as it does a slightly better job. I'm starting to get it - I think. Thank you for your explanation. With regards to optimization, it seems that I'm still too hung up in MySQL issues. PostgreSQL seems to behave a lot more intelligently with queries. -- Leif Biberg Kristensen | Registered Linux User #338009 http://solumslekt.org/ | Cruising with Gentoo-Linux/KDE