Thread: Have: Seq Scan - Want: Index Scan - what am I doing wrong?
Hi guys, PG = 9.1.5 OS = winDOS 2008R8 I have a table that currently has 207 million rows. there is a timestamp field that contains data. more data gets copied from another database into this database. How do I make this do an index scan instead? I did an "analyze audittrailclinical" to no avail. I tested different indexes - no same behavior. The query does this: SELECT audittrailclinical.pgid, audittrailclinical.timestamp, mmuser.logon, audittrailclinical.entityname, audittrailclinical.clinicalactivity, audittrailclinical.audittraileventcode, account.accountnumber, patient.dnsortpersonnumber FROM public.account, public.audittrailclinical, public.encounter, public.entity, public.mmuser, public.patient, public.patientaccount WHERE audittrailclinical.encountersid = encounter.encountersid and audittrailclinical.timestamp >= '2008-01-01'::timestamp without time zone and audittrailclinical.timestamp <= '2012-10-05'::timestamp without time zone AND encounter.practiceid = patient.practiceid AND encounter.patientid = patient.patientid AND encounter.staffid = patient.staffid AND entity.entitysid = audittrailclinical.entitysid AND mmuser.mmusersid = audittrailclinical.mmusersid AND patient.practiceid = patientaccount.practiceid AND patient.patientid = patientaccount.patientid AND patientaccount.accountsid = account.accountsid AND patientaccount.defaultaccount = 'Y' AND patient.dnsortpersonnumber = '347450' ; The query plan says: " -> Seq Scan on audittrailclinical (cost=0.00..8637598.76 rows=203856829 width=62)" " Filter: (("timestamp" >= '2008-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2012-10-0500:00:00'::timestamp without time zone))" which takes forever. How do I make this do an index scan instead? I did an "analyze audittrailclinical" to no avail. the table definitions are (the createstamp field is empty - I know, bad data): CREATE TABLE audittrailclinical ( audittrailid text, audittraileventcode text, clinicalactivity text, eventsuccessful text, externalunique text, recordstamp timestamp without time zone, recorddescription text, encountersid integer, eventuserlogon text, computername text, applicationcode text, practiceid integer, mmusersid integer, entitysid integer, entityname text, "timestamp" timestamp without time zone, lastuser integer, createstamp timestamp without time zone, pgid bigint DEFAULT nextval(('"bravepoint_seq"'::text)::regclass) ) WITH ( OIDS=FALSE ); ALTER TABLE audittrailclinical OWNER TO intergy; GRANT ALL ON TABLE audittrailclinical TO intergy; GRANT SELECT ON TABLE audittrailclinical TO rb; -- Index: atc_en_time CREATE INDEX atc_en_time ON audittrailclinical USING btree (entitysid , "timestamp" ); -- Index: atc_id -- DROP INDEX atc_id; CREATE INDEX atc_id ON audittrailclinical USING btree (audittrailid COLLATE pg_catalog."default" );
On Oct 17, 2012, at 3:52 AM, Chris Ruprecht <chris@cdrbill.com> wrote: > Hi guys, > > PG = 9.1.5 > OS = winDOS 2008R8 > > I have a table that currently has 207 million rows. > there is a timestamp field that contains data. > more data gets copied from another database into this database. > How do I make this do an index scan instead? > I did an "analyze audittrailclinical" to no avail. > I tested different indexes - no same behavior. > > The query does this: > > SELECT > audittrailclinical.pgid, > audittrailclinical.timestamp, > mmuser.logon, > audittrailclinical.entityname, > audittrailclinical.clinicalactivity, > audittrailclinical.audittraileventcode, > account.accountnumber, > patient.dnsortpersonnumber > FROM > public.account, > public.audittrailclinical, > public.encounter, > public.entity, > public.mmuser, > public.patient, > public.patientaccount > WHERE > audittrailclinical.encountersid = encounter.encountersid > and audittrailclinical.timestamp >= '2008-01-01'::timestamp without time zone > and audittrailclinical.timestamp <= '2012-10-05'::timestamp without time zone > AND encounter.practiceid = patient.practiceid > AND encounter.patientid = patient.patientid > AND encounter.staffid = patient.staffid > AND entity.entitysid = audittrailclinical.entitysid > AND mmuser.mmusersid = audittrailclinical.mmusersid > AND patient.practiceid = patientaccount.practiceid > AND patient.patientid = patientaccount.patientid > AND patientaccount.accountsid = account.accountsid > AND patientaccount.defaultaccount = 'Y' > AND patient.dnsortpersonnumber = '347450' ; > > The query plan says: > > " -> Seq Scan on audittrailclinical (cost=0.00..8637598.76 rows=203856829 width=62)" > " Filter: (("timestamp" >= '2008-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2012-10-0500:00:00'::timestamp without time zone))" > > which takes forever. > Selecting 5 yours of data is not selective at all, so postgres decides it is cheaper to do seqscan. Do you have an index on patient.dnsortpersonnumber? Can you post a result from select count(*) from patient where dnsortpersonnumber = '347450'; ? > How do I make this do an index scan instead? > I did an "analyze audittrailclinical" to no avail. > > the table definitions are (the createstamp field is empty - I know, bad data): > > CREATE TABLE audittrailclinical > ( > audittrailid text, > audittraileventcode text, > clinicalactivity text, > eventsuccessful text, > externalunique text, > recordstamp timestamp without time zone, > recorddescription text, > encountersid integer, > eventuserlogon text, > computername text, > applicationcode text, > practiceid integer, > mmusersid integer, > entitysid integer, > entityname text, > "timestamp" timestamp without time zone, > lastuser integer, > createstamp timestamp without time zone, > pgid bigint DEFAULT nextval(('"bravepoint_seq"'::text)::regclass) > ) > WITH ( > OIDS=FALSE > ); > ALTER TABLE audittrailclinical > OWNER TO intergy; > GRANT ALL ON TABLE audittrailclinical TO intergy; > GRANT SELECT ON TABLE audittrailclinical TO rb; > > -- Index: atc_en_time > > CREATE INDEX atc_en_time > ON audittrailclinical > USING btree > (entitysid , "timestamp" ); > > -- Index: atc_id > > -- DROP INDEX atc_id; > > CREATE INDEX atc_id > ON audittrailclinical > USING btree > (audittrailid COLLATE pg_catalog."default" ); > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
On Oct 16, 2012, at 20:01 , Evgeny Shishkin <itparanoia@gmail.com> wrote: > Selecting 5 yours of data is not selective at all, so postgres decides it is cheaper to do seqscan. > > Do you have an index on patient.dnsortpersonnumber? Can you post a result from > select count(*) from patient where dnsortpersonnumber = '347450'; ? > Yes, there is an index: "Aggregate (cost=6427.06..6427.07 rows=1 width=0)" " -> Index Scan using patient_pracsortpatientnumber on patient (cost=0.00..6427.06 rows=1 width=0)" " Index Cond: (dnsortpersonnumber = '347450'::text)" In fact, all the other criteria is picked using an index. I fear that the >= and <= on the timestamp is causing the issue.If I do a "=" of just one of them, I get an index scan. But I need to scan the entire range. I get queries like "giveme everything that was entered into the system for this patient between these two dates". A single date wouldn't work.
On Tue, Oct 16, 2012 at 08:19:43PM -0400, Chris Ruprecht wrote: > > On Oct 16, 2012, at 20:01 , Evgeny Shishkin <itparanoia@gmail.com> wrote: > > > Selecting 5 yours of data is not selective at all, so postgres decides it is cheaper to do seqscan. > > > > Do you have an index on patient.dnsortpersonnumber? Can you post a result from > > select count(*) from patient where dnsortpersonnumber = '347450'; ? > > > > Yes, there is an index: > > "Aggregate (cost=6427.06..6427.07 rows=1 width=0)" > " -> Index Scan using patient_pracsortpatientnumber on patient (cost=0.00..6427.06 rows=1 width=0)" > " Index Cond: (dnsortpersonnumber = '347450'::text)" > > > In fact, all the other criteria is picked using an index. I fear that the >= and <= on the timestamp is causing the issue.If I do a "=" of just one of them, I get an index scan. But I need to scan the entire range. I get queries like "giveme everything that was entered into the system for this patient between these two dates". A single date wouldn't work. Have you read our FAQ on this matter? http://wiki.postgresql.org/wiki/FAQ#Why_are_my_queries_slow.3F_Why_don.27t_they_use_my_indexes.3F -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Thanks Bruce, I have, and I even thought, I understood it :). I just ran an explain analyze on another table - and ever since the query plan changed. It's now using the index as expected.I guess, I have some more reading to do. On Oct 16, 2012, at 20:31 , Bruce Momjian <bruce@momjian.us> wrote: > > Have you read our FAQ on this matter? >