Thread: Use of index for 50% column restriction
As part of my research on the parsing/planning behavior of PREPARE, I found a surprising behavior --- a WHERE clause that is 50% restrictive is using an index. I thought only <10% restrictions used indexes. To setup the test: DROP TABLE IF EXISTS test;CREATE TABLE test (c1 INT, c2 INT, c3 INT);INSERT INTO test SELECT c1, 0, 0 FROM generate_series(1,10000) AS a(c1);INSERT INTO test SELECT c1, 1, 1 FROM generate_series(10001, 20000) AS a(c1);CREATE INDEXi_test_c2 ON test (c2);ANALYZE test;EXPLAIN SELECT * FROM test WHERE c2 = 0; The output is: QUERY PLAN ----------------------------------------------------------------------------- Index Scan using i_test_c2 on test (cost=0.29..349.29rows=10000 width=12) ---------- Index Cond: (c2 = 0) (2 rows) \timing does show the optimizer is making the right decision to use the index, and this behavior is the same back to at least 9.3. Setting effective_cache_size = '8kB' does not change this behavior. What am I missing? Is my 10% assumption wrong? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce Momjian <bruce@momjian.us> writes: > As part of my research on the parsing/planning behavior of PREPARE, I > found a surprising behavior --- a WHERE clause that is 50% restrictive > is using an index. I thought only <10% restrictions used indexes. There's no such hard-and-fast rule. The cost estimate break point depends greatly on the index order correlation (which is 100% in your example), as well as some other factors like the index size versus effective_cache_size. For randomly-ordered data I believe the cutover is actually well below 10%. regards, tom lane
On Wed, Jun 8, 2016 at 01:28:54PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > As part of my research on the parsing/planning behavior of PREPARE, I > > found a surprising behavior --- a WHERE clause that is 50% restrictive > > is using an index. I thought only <10% restrictions used indexes. > > There's no such hard-and-fast rule. The cost estimate break point depends > greatly on the index order correlation (which is 100% in your example), > as well as some other factors like the index size versus > effective_cache_size. > > For randomly-ordered data I believe the cutover is actually well below 10%. Ah, I had not considered the correlation order of the rows in the table. This test returns the sequential scan I expected by using floor(random() * 2): DROP TABLE IF EXISTS test;CREATE TABLE test (c1 INT, c2 INT, c3 INT);INSERT INTO test SELECT c1, floor(random() * 2), 0 FROMgenerate_series(1, 10000) AS a(c1);INSERT INTO test SELECT c1, floor(random() * 2), 1 FROM generate_series(10001, 20000)AS a(c1);CREATE INDEX i_test_c2 ON test (c2);ANALYZE test;EXPLAIN SELECT * FROM test WHERE c2 = 0; Thanks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Wed, Jun 8, 2016 at 05:07:34PM -0400, Bruce Momjian wrote: > > For randomly-ordered data I believe the cutover is actually well below 10%. > > Ah, I had not considered the correlation order of the rows in the table. > This test returns the sequential scan I expected by using floor(random() > * 2): > > DROP TABLE IF EXISTS test; > CREATE TABLE test (c1 INT, c2 INT, c3 INT); > INSERT INTO test SELECT c1, floor(random() * 2), 0 FROM generate_series(1, 10000) AS a(c1); > INSERT INTO test SELECT c1, floor(random() * 2), 1 FROM generate_series(10001, 20000) AS a(c1); > CREATE INDEX i_test_c2 ON test (c2); > ANALYZE test; > EXPLAIN SELECT * FROM test WHERE c2 = 0; > > Thanks. Just a follow-up, but even with a randomized correlation order, it seems 25% restrictivity generates a Bitmap Index Scan: DROP TABLE IF EXISTS test;CREATE TABLE test (c1 INT, c2 INT, c3 INT);INSERT INTO test SELECT c1, abs(floor(random() * 4)-1),abs(floor(random() * 4)-1) FROM generate_series(1, 10000) AS a(c1);INSERT INTO test SELECT c1, abs(floor(random() *4)-1), abs(floor(random() * 4)-1) FROM generate_series(10001, 15000) AS a(c1);INSERT INTO test SELECT c1, abs(floor(random()* 4)-1), abs(floor(random() * 4)-1) FROM generate_series(15001, 20000) AS a(c1);CREATE INDEX i_test_c2ON test (c2);ANALYZE test; SELECT c2, COUNT(*) FROM test GROUP BY c2 ORDER BY 1; c2 | count----+------- 0 | 5020 25% 1 | 10006 50% 2 | 4974 25% EXPLAIN SELECT * FROM TEST WHERE c2 = 1; QUERY PLAN-----------------------------------------------------------Seq Scan on test (cost=0.00..359.00 rows=10006 width=12) Filter: (c2 = 1) EXPLAIN SELECT * FROM TEST WHERE c2 = 0; QUERY PLAN----------------------------------------------------------------------------Bitmap Heap Scan on test (cost=99.19..270.94rows=5020 width=12) Recheck Cond: (c2 = 0) -> Bitmap Index Scan on i_test_c2 (cost=0.00..97.94 rows=5020width=0) Index Cond: (c2 = 0) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On 6/8/16 4:36 PM, Bruce Momjian wrote: > Just a follow-up, but even with a randomized correlation order, it seems > 25% restrictivity generates a Bitmap Index Scan: AFAIK we do the bitmap heap scan in heap order, thereby eliminating the effect of correlation? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
On Tue, Jun 14, 2016 at 03:24:12PM -0500, Jim Nasby wrote: > On 6/8/16 4:36 PM, Bruce Momjian wrote: > >Just a follow-up, but even with a randomized correlation order, it seems > >25% restrictivity generates a Bitmap Index Scan: > > AFAIK we do the bitmap heap scan in heap order, thereby eliminating the > effect of correlation? High correlation would still cause us to access fewer heap pages than random data. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +