Thread: GIN vs BTREE - query planner picking the wrong one some times
Using postgres 9.5 and I'm trying to improve performance of searches using the LIKE operator on column containing a URL.
Consider a table like this with about 50 million rows
CREATE TABLE page_hit (
timestamp_ timestamp without time zone NOT NULL,
location_url character varying(2048)
)
and a query like this
SELECT count(*)
FROM page_hit
WHERE location_url LIKE 'http://mysite.foo.com/path/morepath%'
The above query makes use of the following index
CREATE INDEX location_idx
ON page_hit
USING btree
(location_url varchar_pattern_ops);
The works great..BUT... to support LIKE wild cards in other parts of the string (for example a leading %) I created a GIN index as follows
CREATE INDEX location_idx_gin
ON page_hit
USING gin
(location COLLATE pg_catalog."default" gin_trgm_ops);
The problem is when I run EXPLAIN ANALYZE on the above query now its only ever using location_idx_gin and ignoring location_idx which kinda sucks cause the location_idx index is faster at trailing % queries. The query planner seems to ignore my BTREE index in all cases and uses the GIN index.
The BTREE index outperforms the GIN index (in the case of a trailing %) but I want the planner to pick the GIN index in the case of leading %. Is there a way to do this?
Some metrics (queries trailing %):
- BTREE : <1 second explain: https://explain.depesz.com/s/7wgx
- GIN : 3.8 seconds explain: https://explain.depesz.com/s/wYhk
Max Kremer <mkremer@trialfire.com> writes: > The problem is when I run EXPLAIN ANALYZE on the above query now its only > ever using location_idx_gin and ignoring location_idx which kinda sucks > cause the location_idx index is faster at trailing % queries. The query > planner seems to ignore my BTREE index in all cases and uses the GIN index. I tried to reproduce that, but in my hands the planner just about always prefers the btree if it has a choice. Especially for longer strings such as in your example. (It looks to me like the cost estimate for GIN is mostly proportional to the number of trigrams in the query string; I find that GIN is estimated to cost about the same for LIKE strings as short as 'the%', but loses badly for anything longer.) Can you show a concrete example (with sample data) where this doesn't hold? > Some metrics (queries trailing %): > - BTREE : <1 second explain: https://explain.depesz.com/s/7wgx > - GIN : 3.8 seconds explain: https://explain.depesz.com/s/wYhk The obfuscation of the index conditions makes these just about useless for telling what's going on :-( regards, tom lane