Thread: BUG #9833: daterange is not utilizing index correctly
The following bug has been logged on the website: Bug reference: 9833 Logged by: Shahar Hadas Email address: shaharhd@gmail.com PostgreSQL version: 9.3.3 Operating system: Mac OSX - Postgres.app Description: Simple table which has a user_birthday field with a type of date (can be NULL value) there's an index (btree) defined on that field, with the rule of NOT user_birthday IS NULL. compare the following queries: explain analyze SELECT * FROM users WHERE user_birthday <@ daterange('[1978-07-15,1983-03-01)') explain analyze SELECT * FROM users WHERE user_birthday BETWEEN '1978-07-15'::date AND '1983-03-01'::date at first glance both should have the same execution plan, but for some reason, here are the results: "Seq Scan on users (cost=0.00..52314.25 rows=11101 width=241) (actual time=0.014..478.983 rows=208886 loops=1)" " Filter: (user_birthday <@ '[1978-07-15,1983-03-01)'::daterange)" " Rows Removed by Filter: 901214" "Total runtime: 489.584 ms" "Bitmap Heap Scan on users (cost=4468.01..46060.53 rows=210301 width=241) (actual time=57.104..489.785 rows=209019 loops=1)" " Recheck Cond: ((user_birthday >= '1978-07-15'::date) AND (user_birthday <= '1983-03-01'::date))" " Rows Removed by Index Recheck: 611375" " -> Bitmap Index Scan on ix_users_birthday (cost=0.00..4415.44 rows=210301 width=0) (actual time=54.621..54.621 rows=209019 loops=1)" " Index Cond: ((user_birthday >= '1978-07-15'::date) AND (user_birthday <= '1983-03-01'::date))" "Total runtime: 500.983 ms" as you can see, the <@ daterange is not utilizing the existing index, while the between does. (note that the actual use case for this rule is in a more complex query, which doesn't result in the Recheck Cond and Bitmap Heap scan) Is this a bug? or how the daterange was designed to function?
On Wed, Apr 2, 2014 at 2:10 PM, <shaharhd@gmail.com> wrote: > The following bug has been logged on the website: > > Bug reference: 9833 > Logged by: Shahar Hadas > Email address: shaharhd@gmail.com > PostgreSQL version: 9.3.3 > Operating system: Mac OSX - Postgres.app > Description: > > Simple table which has a user_birthday field with a type of date (can be > NULL value) > > there's an index (btree) defined on that field, with the rule of NOT > user_birthday IS NULL. > > compare the following queries: > > explain analyze SELECT * > FROM users > WHERE user_birthday <@ daterange('[1978-07-15,1983-03-01)') > > explain analyze SELECT * > FROM users > WHERE user_birthday BETWEEN '1978-07-15'::date AND '1983-03-01'::date > > at first glance both should have the same execution plan, but for some > reason, here are the results: > > "Seq Scan on users (cost=0.00..52314.25 rows=11101 width=241) (actual > time=0.014..478.983 rows=208886 loops=1)" > " Filter: (user_birthday <@ '[1978-07-15,1983-03-01)'::daterange)" > " Rows Removed by Filter: 901214" > "Total runtime: 489.584 ms" > > "Bitmap Heap Scan on users (cost=4468.01..46060.53 rows=210301 width=241) > (actual time=57.104..489.785 rows=209019 loops=1)" > " Recheck Cond: ((user_birthday >= '1978-07-15'::date) AND (user_birthday > <= '1983-03-01'::date))" > " Rows Removed by Index Recheck: 611375" > " -> Bitmap Index Scan on ix_users_birthday (cost=0.00..4415.44 > rows=210301 width=0) (actual time=54.621..54.621 rows=209019 loops=1)" > " Index Cond: ((user_birthday >= '1978-07-15'::date) AND > (user_birthday <= '1983-03-01'::date))" > "Total runtime: 500.983 ms" > > as you can see, the <@ daterange is not utilizing the existing index, while > the between does. > > (note that the actual use case for this rule is in a more complex query, > which doesn't result in the Recheck Cond and Bitmap Heap scan) > > Is this a bug? or how the daterange was designed to function? > Three things off the top of my head: 1). [1978-07-15,1983-03-01) means "include 1978-07-15 and everything up to - but not including - 1983-03-01", whereas BETWEEN is inclusive. 2). I haven't tested, but wouldn't you need to create a GiST index for <@ to use the index? 3). Related to #2, have you tried installing the btree_gist extension to allow the query planner to use the index?
On 3 Apr 2014 21:39, "bricklen" <bricklen@gmail.com> wrote: > > 1). [1978-07-15,1983-03-01) means "include 1978-07-15 and everything up to - but not including - 1983-03-01", whereas BETWEEN is inclusive. That means we can't simply rewrite the with as a BETWEEN clause. However conceptually btree could be extended to handle @> operators like this. This is a perfectly reasonable feature request but it's just not something btree can handle currently. Btree operator classes can handle a few specific operators <, <=, =, >=, and > and that's it I believe. Adding more is non trivial work and range data types are fairly new. > 2). I haven't tested, but wouldn't you need to create a GiST index for <@ to use the index? > 3). Related to #2, have you tried installing the btree_gist extension to allow the query planner to use the index? Gist is more flexible about adding new operators. I don't know if that means it has this one though. If it does you would have to build a new index using the btree_gist operator class which is may not be worth it's weight in your database.
I have a more elaborate documentation of the issue in : http://stackoverflow.com/questions/22824314/postgresql-daterange-not-using-index-correctly As per your questions: 1. Doesn't really make a difference - its just a one day range, different dates will give the same result 2. Yes, tried to create a gist index on user_birthday, didn't have any affect. 3. The btree_gist in loaded. It must be if you want to create a gist index on a date field (simple types) this time the email with Reply-All On Thu, Apr 3, 2014 at 9:39 PM, bricklen <bricklen@gmail.com> wrote: > > On Wed, Apr 2, 2014 at 2:10 PM, <shaharhd@gmail.com> wrote: > >> The following bug has been logged on the website: >> >> Bug reference: 9833 >> Logged by: Shahar Hadas >> Email address: shaharhd@gmail.com >> PostgreSQL version: 9.3.3 >> Operating system: Mac OSX - Postgres.app >> Description: >> >> Simple table which has a user_birthday field with a type of date (can be >> NULL value) >> >> there's an index (btree) defined on that field, with the rule of NOT >> user_birthday IS NULL. >> >> compare the following queries: >> >> explain analyze SELECT * >> FROM users >> WHERE user_birthday <@ daterange('[1978-07-15,1983-03-01)') >> >> explain analyze SELECT * >> FROM users >> WHERE user_birthday BETWEEN '1978-07-15'::date AND '1983-03-01'::date >> >> at first glance both should have the same execution plan, but for some >> reason, here are the results: >> >> "Seq Scan on users (cost=0.00..52314.25 rows=11101 width=241) (actual >> time=0.014..478.983 rows=208886 loops=1)" >> " Filter: (user_birthday <@ '[1978-07-15,1983-03-01)'::daterange)" >> " Rows Removed by Filter: 901214" >> "Total runtime: 489.584 ms" >> >> "Bitmap Heap Scan on users (cost=4468.01..46060.53 rows=210301 width=241) >> (actual time=57.104..489.785 rows=209019 loops=1)" >> " Recheck Cond: ((user_birthday >= '1978-07-15'::date) AND (user_birthday >> <= '1983-03-01'::date))" >> " Rows Removed by Index Recheck: 611375" >> " -> Bitmap Index Scan on ix_users_birthday (cost=0.00..4415.44 >> rows=210301 width=0) (actual time=54.621..54.621 rows=209019 loops=1)" >> " Index Cond: ((user_birthday >= '1978-07-15'::date) AND >> (user_birthday <= '1983-03-01'::date))" >> "Total runtime: 500.983 ms" >> >> as you can see, the <@ daterange is not utilizing the existing index, >> while >> the between does. >> >> (note that the actual use case for this rule is in a more complex query, >> which doesn't result in the Recheck Cond and Bitmap Heap scan) >> >> Is this a bug? or how the daterange was designed to function? >> > > Three things off the top of my head: > 1). [1978-07-15,1983-03-01) means "include 1978-07-15 and everything up to > - but not including - 1983-03-01", whereas BETWEEN is inclusive. > 2). I haven't tested, but wouldn't you need to create a GiST index for <@ > to use the index? > 3). Related to #2, have you tried installing the btree_gist extension to > allow the query planner to use the index? >
On Fri, Apr 4, 2014 at 7:36 AM, Greg Stark <stark@mit.edu> wrote: > > On 3 Apr 2014 21:39, "bricklen" <bricklen@gmail.com> wrote: > > > > > 1). [1978-07-15,1983-03-01) means "include 1978-07-15 and everything up > to - but not including - 1983-03-01", whereas BETWEEN is inclusive. > > That means we can't simply rewrite the with as a BETWEEN clause. However > conceptually btree could be extended to handle @> operators like this. > You might have over looked something. When using a '[<some date>, <some date>]' range, postgres automatically change it to '[<some date>, <some date> + 1)' The explain shows it: EXPLAIN ANALYZE SELECT * FROM users WHERE user_birthday <@ daterange('[1978-07-15,1983-03-02]') results in: "Seq Scan on users (cost=0.00..53248.25 rows=11101 width=241) (actual time=0.023..546.617 rows=209154 loops=1)" " Filter: (user_birthday <@ '[1978-07-15,1983-03-03)'::daterange)" " Rows Removed by Filter: 900946" "Total runtime: 558.557 ms" > > This is a perfectly reasonable feature request but it's just not something > btree can handle currently. Btree operator classes can handle a few > specific operators <, <=, =, >=, and > and that's it I believe. Adding more > is non trivial work and range data types are fairly new. > My assumption was, that due to the fact that a range has an upper and lower limit, it would be more efficient than between, as the planner can always assume that the upper limit is always equal or above the lower limit. Wouldn't that make the index search much faster, as by the time you found the index for the lower limit, you're search options for the upper limit are limited and then easier to plan on. > > 2). I haven't tested, but wouldn't you need to create a GiST index for > <@ to use the index? > How do you do that? create a GiST index for a data field with <@ operator? that's the relevant table: CREATE TABLE users ( user_id bigserial NOT NULL, user_email text NOT NULL, user_password text, user_first_name text NOT NULL, user_middle_name text, user_last_name text NOT NULL, user_birthday date, CONSTRAINT pk_users PRIMARY KEY (user_id) ) The users table has 1.1 million records with full randomness on the dates between the dates: 1974-04-01 and 1998-04-03. all I was able to do was create this, which wouldn't affect this query... CREATE INDEX ix_users_birthday_gist ON glances.users USING gist (user_birthday) WHERE NOT user_birthday IS NULL; > > 3). Related to #2, have you tried installing the btree_gist extension > to allow the query planner to use the index? > > Gist is more flexible about adding new operators. I don't know if that > means it has this one though. If it does you would have to build a new > index using the btree_gist operator class which is may not be worth it's > weight in your database. > btree_gist is enabled and loaded, otherwise I wouldn't be able to create the GiST index on user_birthday. but as I said - it didn't had any affect. Shahar