GIN, pg_trgm and large table - Mailing list pgsql-novice
From:      Max Fomichev
Subject:   GIN, pg_trgm and large table
Msg-id:    44c55f0b-a67e-94d3-07f3-6381204ef9e9@gmail.com
Responses: Re: GIN, pg_trgm and large table
           Re: GIN, pg_trgm and large table
List:      pgsql-novice
Hello! I have the following table and index with about 15 billion records.

CREATE TABLE us_en.ngrams (
    ngram text COLLATE pg_catalog."default" NOT NULL,
    year smallint NOT NULL,
    occurrence bigint NOT NULL,
    words smallint NOT NULL
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX trgm_idx_ngram_ngrams_us_en
    ON us_en.ngrams
    USING gin (ngram COLLATE pg_catalog."default" gin_trgm_ops)
    TABLESPACE pg_default;

Configuration:
PostgreSQL 9.5.12 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit
128GB RAM, 2TB SSD

Server settings:
shared_buffers = 32GB
effective_cache_size = 96GB
work_mem = 4GB
maintenance_work_mem = 16GB
min_wal_size = 4GB
max_wal_size = 8GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
max_worker_processes = 16
gin_fuzzy_search_limit = 10000

My queries against the us_en.ngrams.ngram field are too slow:

ngrams=# set enable_seqscan=false;
SET
ngrams=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM us_en.ngrams WHERE ngram LIKE '%computer%' LIMIT 20;
                                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=232884.91..232933.12 rows=20 width=36) (actual time=44962.926..44966.214 rows=14 loops=1)
   Buffers: shared hit=18177345 read=124224
   ->  Bitmap Heap Scan on ngrams  (cost=232884.91..61129746.57 rows=25261021 width=36) (actual time=44962.925..44966.202 rows=14 loops=1)
         Recheck Cond: (ngram ~~ '%computer%'::text)
         Rows Removed by Index Recheck: 10
         Heap Blocks: exact=8
         Buffers: shared hit=18177345 read=124224
         ->  Bitmap Index Scan on trgm_idx_ngram_ngrams_us_en  (cost=0.00..226569.66 rows=25261021 width=0) (actual time=44961.929..44961.929 rows=24 loops=1)
               Index Cond: (ngram ~~ '%computer%'::text)
               Buffers: shared hit=18177345 read=124216
 Planning time: 0.305 ms
 Execution time: 44966.271 ms
(12 rows)

ngrams=# set enable_seqscan=true;
SET
ngrams=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM us_en.ngrams WHERE ngram LIKE '%computer%' LIMIT 20;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..215.41 rows=20 width=36) (actual time=0.248..29.089 rows=20 loops=1)
   Buffers: shared hit=14 read=1509
   ->  Seq Scan on ngrams  (cost=0.00..272072628.00 rows=25261021 width=36) (actual time=0.247..29.074 rows=20 loops=1)
         Filter: (ngram ~~ '%computer%'::text)
         Rows Removed by Filter: 207598
         Buffers: shared hit=14 read=1509
 Planning time: 0.332 ms
 Execution time: 29.117 ms
(8 rows)

ngrams=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM us_en.ngrams WHERE ngram LIKE '%version%' ORDER BY (occurrence, year) DESC LIMIT 20;
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4661042.23..4661042.28 rows=20 width=36) (actual time=144417.365..144417.365 rows=0 loops=1)
   Buffers: shared hit=28982531 read=167634
   ->  Sort  (cost=4661042.23..4664172.58 rows=1252138 width=36) (actual time=144417.363..144417.363 rows=0 loops=1)
         Sort Key: (ROW(occurrence, year)) DESC
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=28982531 read=167634
         ->  Bitmap Heap Scan on ngrams  (cost=20372.07..4627723.29 rows=1252138 width=36) (actual time=144417.331..144417.331 rows=0 loops=1)
               Recheck Cond: (ngram ~~ '%version%'::text)
               Buffers: shared hit=28982527 read=167633
               ->  Bitmap Index Scan on trgm_idx_ngram_ngrams_us_en  (cost=0.00..20059.04 rows=1252138 width=0) (actual time=144417.328..144417.328 rows=0 loops=1)
                     Index Cond: (ngram ~~ '%version%'::text)
                     Buffers: shared hit=28982527 read=167633
 Planning time: 0.344 ms
 Execution time: 144417.522 ms
(14 rows)

Is there any way to improve GIN/pg_trgm performance on such a large table?

Thank you!

--
Best regards,
Max Fomichev
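For context on why the bitmap index scans above touch tens of millions of buffers: pg_trgm indexes each value by its three-character trigrams (each word lowercased and padded with two leading blanks and one trailing blank), and a LIKE '%computer%' search intersects the GIN posting lists of the pattern's trigrams. On 15 billion short ngrams those posting lists are enormous. A minimal Python sketch of the per-word extraction scheme (an illustration of the documented behavior, not the actual C implementation; you can compare it against SELECT show_trgm('computer') on a live server):

```python
def word_trigrams(word: str) -> set[str]:
    """Approximate pg_trgm's per-word trigram extraction:
    lowercase the word, pad it with two spaces in front and
    one behind, then take every 3-character sliding window."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

# Trigrams that a GIN gin_trgm_ops index would store for "computer"
print(sorted(word_trigrams("computer")))
```

Each trigram maps to a posting list of matching rows in the GIN index; the wider the pattern's trigram set and the bigger the table, the more list pages the bitmap index scan has to merge before the heap is ever touched.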