Re: Get more from indices. - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Get more from indices. |
Date | |
Msg-id | 20140108.132257.201466486.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Get more from indices. (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Get more from indices.
|
List | pgsql-hackers |
Hello, tgl> I'm pretty sure that IsA test prevents the optimization from ever firing. Thank you. tgl> But aside from hasty typos, Oops! I've picked up wrong node. It always denies pathkeys extension. | !IsA(member, Var)) is a mistake of the following. | !IsA(member->em_expr, Var)) tgl> is there enough left of this optimization to tgl> be worth the complication? Certainly. This patch is intended to be with my another UNION-ALL optimization pathce at first. It does very much with UNION-ORDERBY-LIMIT and also with UNION_ALL(Partitioned tables)-DISTINCT-ORDERBY-LIMIT. ===== test tables create table pu1 (a int not null, b int not null, c int, d text); create unique index i_pu1_ab on pu1 (a, b); create table cu11 (like pu1 including all) inherits (pu1); create table cu12 (like pu1 including all) inherits (pu1); create table pu2 (like pu1 including all); create table cu21 (like pu2 including all) inherits (pu2); create table cu22 (like pu2 including all) inherits (pu2); insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(000000, 099999) a); insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(100000, 199999) a); insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(200000, 299999) a); insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(300000, 399999) a); ==== Following example is an implicit union derived from partitioned tables created as above. Limit is added to highlight the effect. 9.4dev with no patch, | =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100; | QUERY PLAN | ---------------------------------------------------------------------------- | Limit (actual time=246.653..246.730 rows=100 loops=1) | -> Unique (actual time=246.646..246.713 rows=100 loops=1) | -> Sort (actual time=246.645..246.666 rows=100 loops=1) | Sort Key: pu1.a, pu1.b, pu1.c, pu1.d | Sort Method: external sort Disk: 5280kB | -> Append (actual time=0.017..52.608 rows=200000 loops=1) | -> Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1) | -> Seq Scan on cu11 (actual time=0.015..17.047 rows=100000 loops=1) | -> Seq Scan on cu12 (actual time=0.007..15.474 rows=100000 loops=1) | Total runtime: 247.854 ms Fairly common query. It will get following plan with this patch. | =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100; | QUERY PLAN | ----------------------------------------------------------------------------- | Limit (actual time=0.108..0.350 rows=100 loops=1) | -> Unique (actual time=0.104..0.329 rows=100 loops=1) | -> Merge Append (actual time=0.103..0.214 rows=100 loops=1) | Sort Key: pu1.a, pu1.b, pu1.c, pu1.d | -> Index Scan .. on pu1 (actual time=0.003..0.003 rows=0 loops=1) | -> Index Scan .. on cu11 (actual time=0.049..0.110 rows=100 loops=1) | -> Index Scan .. on cu12 (actual time=0.044..0.044 rows=1 loops=1) | Total runtime: 0.666 ms The same could be said on union with my another union-pullup patch. We will get the following result with only the union-pullup patch. |=# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * fromcu22 order by a limit 10000; | QUERY PLAN |--------------------------------------------------------------------------- | Limit (actual time=474.825..482.326 rows=10000 loops=1) | -> Unique (actual time=474.824..481.357 rows=10000 loops=1) | -> Sort (actual time=474.823..477.101 rows=10000 loops=1) | Sort Key: cu11.a, cu11.b, cu11.c, cu11.d | Sort Method: external sort Disk: 10568kB | -> Append (actual time=0.018..99.310 rows=400000 loops=1) | -> Seq Scan on cu11 (actual time=0.017..16.263 rows=100000 loops=1) | -> Seq Scan on cu12 (actual time=0.006..14.831 rows=100000 loops=1) | -> Seq Scan on cu21 (actual time=0.006..14.766 rows=100000 loops=1) | -> Seq Scan on cu22 (actual time=0.006..14.721 rows=100000 loops=1) | Total runtime: 484.555 ms This is also a quite common operation but implicit DISTINCT prevents planner from selecting index scans. (ORDER BY is not essential but needed because planner does not consult distinct_pathkeys on planning scans and I've left it as it is.) The planner gives the following plan with this patch. | =# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select *from cu22 order by a limit 10000; | QUERY PLAN | ------------------------------------------------------------------------- | Limit (actual time=0.197..14.739 rows=10000 loops=1) | -> Unique (actual time=0.191..13.527 rows=10000 loops=1) | -> Merge Append (actual time=0.191..7.894 rows=10000 loops=1) | Sort Key: cu11.a, cu11.b, cu11.c, cu11.d | -> Index Scan .. on cu11 (actual time=0.054..4.676 rows=10000 loops=1) | -> Index Scan .. on cu12 (actual time=0.045..0.045 rows=1 loops=1) | -> Index Scan .. on cu21 (actual time=0.044..0.044 rows=1 loops=1) | -> Index Scan .. on cu22 (actual time=0.043..0.043 rows=1 loops=1) | Total runtime: 15.872 ms This seems for me worth enough to do. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: