Thread: GIN vs BTREE - query planner picking the wrong one some times

GIN vs BTREE - query planner picking the wrong one some times

From

Max Kremer

Date:

28 June 2016, 21:17:53

Using postgres 9.5 and I'm trying to improve performance of searches using the LIKE operator on column containing a URL.

Consider a table like this with about 50 million rows

CREATE TABLE page_hit (

timestamp_ timestamp without time zone NOT NULL,

location_url character varying(2048)

)

and a query like this

SELECT count(*)

FROM page_hit

WHERE location_url LIKE 'http://mysite.foo.com/path/morepath%'

The above query makes use of the following index

CREATE INDEX location_idx

ON page_hit

USING btree

(location_url varchar_pattern_ops);

The works great..BUT... to support LIKE wild cards in other parts of the string (for example a leading %) I created a GIN index as follows

CREATE INDEX location_idx_gin

ON page_hit

USING gin

(location COLLATE pg_catalog."default" gin_trgm_ops);

The problem is when I run EXPLAIN ANALYZE on the above query now its only ever using location_idx_gin and ignoring location_idx which kinda sucks cause the location_idx index is faster at trailing % queries. The query planner seems to ignore my BTREE index in all cases and uses the GIN index.

The BTREE index outperforms the GIN index (in the case of a trailing %) but I want the planner to pick the GIN index in the case of leading %. Is there a way to do this?

Some metrics (queries trailing %):

- BTREE : <1 second explain: https://explain.depesz.com/s/7wgx

- GIN : 3.8 seconds explain: https://explain.depesz.com/s/wYhk

ᐧ

Re: GIN vs BTREE - query planner picking the wrong one some times

From

Tom Lane

Date:

29 June 2016, 17:20:41

Max Kremer <mkremer@trialfire.com> writes:
> The problem is when I run EXPLAIN ANALYZE on the above query now its only
> ever using location_idx_gin and ignoring location_idx which kinda sucks
> cause the location_idx index is faster at trailing % queries. The query
> planner seems to ignore my BTREE index in all cases and uses the GIN index.

I tried to reproduce that, but in my hands the planner just about always
prefers the btree if it has a choice.  Especially for longer strings such
as in your example.  (It looks to me like the cost estimate for GIN is
mostly proportional to the number of trigrams in the query string; I find
that GIN is estimated to cost about the same for LIKE strings as short as
'the%', but loses badly for anything longer.)

Can you show a concrete example (with sample data) where this doesn't
hold?

> Some metrics (queries trailing %):
>  - BTREE : <1 second explain: https://explain.depesz.com/s/7wgx
>  - GIN   :   3.8 seconds explain: https://explain.depesz.com/s/wYhk

The obfuscation of the index conditions makes these just about useless
for telling what's going on :-(

            regards, tom lane