Thread: index problems (again)
Hi all

Firstly, I appreciate that my index problems are fairly difficult to debug given that I can't upload the data anywhere (it's commercially sensitive); I tried creating an equivalent dataset for my last problem using a lot of random() inserts, but unfortunately, even though the sizes and index cardinality seemed similar, it didn't exhibit the same problem, which leaves me a bit stuck.

I now have (what seems to me to be) an utterly bizarre situation where postgres is using the "wrong" index, to the extent that I can't even begin to comprehend why it would do so.

http://explain.depesz.com/s/uF4L

# EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate BETWEEN 20160219 AND 20160221;
                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=25.54..25.55 rows=1 width=0) (actual time=25337.593..25337.594 rows=1 loops=1)
   Buffers: shared hit=2976790 read=152188
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..25.54 rows=1 width=4) (actual time=25337.579..25337.587 rows=1 loops=1)
           Buffers: shared hit=2976790 read=152188
           ->  Index Scan using legs_sc_id_idx on legs  (cost=0.43..361498.49 rows=14394 width=4) (actual time=25337.578..25337.578 rows=1 loops=1)
                 Index Cond: (sc_id IS NOT NULL)
                 Filter: ((scdate >= 20160219) AND (scdate <= 20160221))
                 Rows Removed by Filter: 4068865
                 Buffers: shared hit=2976790 read=152188
 Planning time: 0.235 ms
 Execution time: 25337.620 ms
(12 rows)

Time: 25338.375 ms

There is an index on (scdate,sc_id) that (I would have thought) should be ideal for this query, but it's ignored. sc_id has no null values - it's even _defined_ as NOT NULL - so I've no idea why the planner would think it needs to use the sc_id index for this query.

If I create an index on (sc_id,scdate), that one is used (index-only scan) and the query returns in 200ms or so.

http://explain.depesz.com/s/3qNC

=# EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate BETWEEN 20160219 AND 20160221;
                                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=7.32..7.33 rows=1 width=0) (actual time=207.194..207.194 rows=1 loops=1)
   Buffers: shared hit=1 read=11120
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..7.32 rows=1 width=4) (actual time=207.187..207.187 rows=1 loops=1)
           Buffers: shared hit=1 read=11120
           ->  Index Only Scan using legs_sc_id_scdate_idx on legs  (cost=0.43..99204.99 rows=14394 width=4) (actual time=207.185..207.185 rows=1 loops=1)
                 Index Cond: ((sc_id IS NOT NULL) AND (scdate >= 20160219) AND (scdate <= 20160221))
                 Heap Fetches: 0
                 Buffers: shared hit=1 read=11120
 Planning time: 0.236 ms
 Execution time: 207.223 ms

I'm utterly at a loss. There are only 427 distinct scdate values in this table but 4 million sc_id values (and the spread across scdate is reasonably even - between 6000 and 11000 rows for each), so using an index on (just) sc_id makes absolutely no sense to me (I would expect it to be slower than a table scan, no?). I also don't see how (sc_id,scdate) is more useful than (scdate,sc_id).

Have I completely misunderstood how this is all meant to work? I tried reading the documentation on understanding EXPLAIN and the slow-query questions in the FAQ/wiki, but what I read didn't really seem to suggest any investigative steps other than "run ANALYZE and VACUUM". Is there a good doc on how to go about debugging this kind of thing?
Or even on how the planner makes its decisions? I'm currently at the point where I'm just throwing random indexes at tables in the vain hope that it might help. I'm fairly sure that that's suboptimal :)

As before: pg 9.5.1, CentOS 6 x64, 4GB RAM, Xeon X3220. effective_cache_size is set to 3GB (but changing it wildly up or down doesn't change anything), shared_buffers is 1GB, work_mem is 5242kB (but changing to anything up to 1GB makes no difference).

Thanks

Geoff
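P.S. In case it helps to picture the setup, the relevant bits of the schema boil down to roughly the following. This is only my shorthand: the column types are guessed from the literal values above, and the name of the (scdate,sc_id) index is a guess; everything else is named as it appears in the plans.

CREATE TABLE legs (
    sc_id  integer NOT NULL,  -- defined NOT NULL; roughly 4 million values
    scdate integer            -- dates stored as yyyymmdd integers; only ~427 distinct values
    -- (plus the rest of the real columns, omitted here)
);

CREATE INDEX legs_sc_id_idx        ON legs (sc_id);
CREATE INDEX legs_scdate_idx       ON legs (scdate);
CREATE INDEX legs_scdate_sc_id_idx ON legs (scdate, sc_id);
CREATE INDEX legs_sc_id_scdate_idx ON legs (sc_id, scdate);  -- the experimental one that does get used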
2016-03-07 13:38 GMT+02:00 Geoff Winkless <pgsqladmin@geoff.dj>:
> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
> BETWEEN 20160219 AND 20160221;
Will it help if you'll add `count(*)` to your query like this:
SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND 20160221;
?
--
Victor Y. Yegorov
On 7 March 2016 at 11:48, Victor Yegorov <vyegorov@gmail.com> wrote:
> 2016-03-07 13:38 GMT+02:00 Geoff Winkless <pgsqladmin@geoff.dj>:
>>
>> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
>> BETWEEN 20160219 AND 20160221;
>
> Will it help if you'll add `count(*)` to your query like this:
>
> SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND
> 20160221;

Thanks for the reply.

Yes, that does work around the problem, sort-of (although it's only using the scdate-only index, since it needs all the data):

 Aggregate  (cost=1242.59..1242.60 rows=1 width=4)
   ->  Index Scan using legs_scdate_idx on legs  (cost=0.43..1170.62 rows=14394 width=4)
         Index Cond: ((scdate >= 20160219) AND (scdate <= 20160221))

Unfortunately the cost of changing all the code that uses MIN() in this way would be higher than just adding an extra index :(

I suppose the thought is that for selecting just the MIN() value, by traipsing through the index you immediately find the lowest match - so for a dataset where scdate cardinality is higher, this would make sense; indeed if I give this query a value with scdate in the low range of the table it returns quickly (although still slower than when it uses the scdate index).

It seems to me that the weighting the planner applied to this MIN() rule is too high, or perhaps it needs to pay more attention to the statistics of the indexes for the WHERE clauses?

Even given that, I still don't see why the (scdate,sc_id) index isn't perfect for this; it allows the planner to use sc_id for MIN() while using scdate to restrict the values. Three values to look up from the index-only.

If I manually change the query to do what I hoped the planner would do for me:

SELECT LEAST((SELECT MIN(sc_id) FROM legs WHERE scdate = 20160219),
             (SELECT MIN(sc_id) FROM legs WHERE scdate = 20160220),
             (SELECT MIN(sc_id) FROM legs WHERE scdate = 20160221));

it returns in 16ms - and uses the (scdate_sc_id_idx) index as expected. Again, though, I can't really justify changing all the code to do that instead.

Geoff
2016-03-07 15:01 GMT+02:00 Geoff Winkless <pgsqladmin@geoff.dj>:
> Unfortunately the cost of changing all the code that uses MIN() in
> this way would be higher than just adding an extra index :(
>
> I suppose the thought is that for selecting just the MIN() value, by
> traipsing through the index you immediately find the lowest match - so
> for a dataset where scdate cardinality is higher, this would make
> sense; indeed if I give this query a value with scdate in the low
> range of the table it returns quickly (although still slower than when
> it uses the scdate index).
>
> It seems to me that the weighting the planner applied to this MIN()
> rule is too high, or perhaps it needs to pay more attention to the
> statistics of the indexes for the WHERE clauses?
>
> Even given that, I still don't see why the (scdate,sc_id) index isn't
> perfect for this; it allows the planner to use sc_id for MIN() while
> using scdate to restrict the values. Three values to look up from the
> index-only.
Your `sc_id` and `scdate` columns are correlated.

The planner has no such knowledge and assumes the columns are independent. Your `scdate` predicate is estimated to return 14394 rows (based on the EXPLAIN in your first post). I think this corresponds to quite a small portion of your table, less than 1% (based on `Rows Removed by Filter: 4068865` in the same EXPLAIN). Under a uniform distribution, these 14394 rows could be anywhere in the table. Therefore, reading values in the order of your PK looks optimal, as you're expected to hit a row that matches the given conditions quite soon.

The problem is that your predicate actually matches a bunch of rows towards the end of the table, which causes Postgres to read a big portion of your index before it finds a row that fits.
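If you want a rough confirmation of that correlation, the built-in corr() aggregate gives a quick measure (just an illustrative check, usable here because both columns are numeric):

-- Pearson correlation between the two columns; a value close to 1 (or -1)
-- means the planner's independence assumption is badly off for this pair.
SELECT corr(sc_id, scdate) FROM legs;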
Right now (9.5 and earlier versions) I do not know of any option that would not require fixing your queries.

P.S. Maybe the `Upper pathification` patch that is being considered for 9.6 can deal with such cases.
Victor Y. Yegorov
On 7 March 2016 at 13:23, Victor Yegorov <vyegorov@gmail.com> wrote:
> Your `sc_id` and `scdate` columns are correlated.

Actually not necessarily, although in the majority case that's mostly true.

> The planner has no such knowledge and assumes the columns are independent.
> Your `scdate` predicate is estimated to return 14394 rows (based on the
> EXPLAIN in your first post). I think this corresponds to quite a small
> portion of your table, less than 1% (based on `Rows Removed by Filter:
> 4068865` in the same EXPLAIN). Under a uniform distribution, these 14394
> rows could be anywhere in the table.

Oh I see! To make sure I understand what you're saying: given a fully random distribution of scdate across the table, searching the whole table for the first instance of scdate=(20160219|20160220|20160221) will be fastest, because using the scdate index could end up doing lots of random accesses across the whole table to get sc_id values, whereas assuming a random distribution of scdate values I would only expect to have to scan (sequentially) 3% of the table before I find one of the scdate values that match, yes?

> Right now (9.5 and earlier versions) I do not know of any option that
> would not require fixing your queries.

Well, I can cope with using the (sc_id,scdate) index (which I guess wins purely by being index-only and smaller than the table), but it makes me unhappy.

I think I realised why the planner won't use (scdate,sc_id): the multicolumn index page (http://www.postgresql.org/docs/current/static/indexes-multicolumn.html) suggests that:

  The exact rule is that equality constraints on leading columns, plus any
  inequality constraints on the first column that does not have an equality
  constraint, will be used to limit the portion of the index that is scanned.

So since the first column is an inequality constraint, PG won't use the second column from index(scdate,sc_id) to retrieve the MIN() value. This explains why it happily uses the index for the LEAST(MIN(),MIN(),MIN()) version (since each one is an equality condition).

It seems like that rule is just too restrictive, because a range constraint on the first key, when used in conjunction with an index-only scan, is almost always going to win, even if the second constraint matches all of the rows and even if the range constraint returns all the values in the table (unless the index is larger than the table itself, I suppose). I might suggest that perhaps the rule should be relaxed so that an inequality constraint does not stop the subsequent columns being used in an index-only scan.

That assumes that I've not completely misunderstood, of course :)

Geoff
On 7 March 2016 at 14:18, I wrote:
> That assumes that I've not completely misunderstood, of course :)

Always a dangerous assumption, I'm rapidly learning. The very next section of that page says:

  Constraints on columns to the right of these columns are checked in the
  index, so they save visits to the table proper, but they do not reduce the
  portion of the index that has to be scanned.

So it seems that it should in fact be usable after all. So I'm still stumped as to why the (scdate,sc_id) index isn't used :(

Geoff
On 7 March 2016 at 14:27, I wrote:
> So it seems that it should in fact be usable after all. So I'm still
> stumped as to why the (scdate,sc_id) index isn't used :(

Also, while the index on sc_id will be sorted, there's no guarantee that sc_id values will be in order in the table itself, so you're still left with (30,000) potentially random accesses to the table, even assuming fully random distribution of scdate (with a worst case of 970000 random accesses). That average case is no better than the (30,000) random accesses that were required from using an scdate index, even ignoring the scdate/sc_id index.

So I'm afraid I'm fully back in the "I still don't get it" column.

Geoff
Geoff Winkless <pgsqladmin@geoff.dj> writes:
> So it seems that it should in fact be usable after all. So I'm still
> stumped as to why the (scdate,sc_id) index isn't used :(

Because the other way is estimated to be cheaper. The estimate is wrong, because it's based on a statistical assumption that's wrong (ie that sc_id and scdate are uncorrelated), but it's what we have to work with at the moment.

As you found upthread, that index could be used in the way you want if you had an equality condition on scdate. So the workaround I'd suggest is to whack the query into that shape. Something along the lines of (untested)

select min((select min(sc_id) from legs where scdate = gs))
  from generate_series(20160219, 20160221) gs

This would only work well for relatively small ranges of scdate, but if you had a large range then I think the original plan would've been fine.

			regards, tom lane
On 7 March 2016 at 14:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Geoff Winkless <pgsqladmin@geoff.dj> writes:
>> So it seems that it should in fact be usable after all. So I'm still
>> stumped as to why the (scdate,sc_id) index isn't used :(
>
> Because the other way is estimated to be cheaper. The estimate is
> wrong, because it's based on a statistical assumption that's wrong
> (ie that sc_id and scdate are uncorrelated), but it's what we have
> to work with at the moment.

Are you saying that the planner can't tell without scanning the index how much of the index the range constraint will retrieve?

That's reasonable, I suppose, but consider the relative sizes of the index (92MB) and the table (1.6GB), both of which pieces of information are available to the planner at query time: if I were to scan 3% of the table (which we assume the planner is estimating, because of the cardinality of the scdate field), I've read as much data from disk as I would for 50% of the index. That's ignoring the reads I'll have to do from the sc_id index too... so in the worst case, where I've had to read the entire index (because the range didn't actually restrict any records), I'm still only at 2x the average case of the other way. Whereas the worst case of the sc_id-only-index-plus-table-retrieve is about 1000x the worst case of the index-only scan.

> select min((select min(sc_id) from legs where scdate = gs))
>   from generate_series(20160219, 20160221) gs
>
> This would only work well for relatively small ranges of scdate,

As it happens it works for the full range of scdate and returns in 99ms.

# select min((select min(sc_id) from legs where scdate = gs)) from generate_series(20150101, 20160303) gs;
   min
----------
 12914746
(1 row)

Time: 99.210 ms

> but if you had a large range then I think the original plan
> would've been fine.

Well yes, obviously doing MIN() across the whole range is going to be able to just return as soon as it gets the first value from sc_id and references the table to check the date; however even in that _best_ case the value comes back in 25ms, ie the _best-case_ index-plus-table-scan is 1/3 the time of the worst-case index-only scan.

I accept that this is how the planner behaves, but I don't accept that it's optimal.

Geoff
Geoff Winkless <pgsqladmin@geoff.dj> writes:
> On 7 March 2016 at 14:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Because the other way is estimated to be cheaper. The estimate is
>> wrong, because it's based on a statistical assumption that's wrong
>> (ie that sc_id and scdate are uncorrelated), but it's what we have
>> to work with at the moment.

> Are you saying that the planner can't tell without scanning the index
> how much of the index the range constraint will retrieve?

The question isn't "how much", the question is "where is that data exactly?".

In English, what that plan is trying to do is scan the index in sc_id order until it hits a row with scdate in the target range. The first such row, by definition, has the correct min(sc_id) value. The problem is that we're guessing at how soon we'll hit such a row. If the columns are independent, then the planner can guess based on how many rows in the whole table have scdate in the target range, and it will probably be about right. But that estimate can fall down very badly if sc_id and scdate increase together, because then the target rows aren't randomly distributed in the index sequence but could all be at the far end of the index.

> I accept that this is how the planner behaves, but I don't accept that
> it's optimal.

If we had cross-column correlation stats we could detect this pitfall, but without that it's hard to do.

			regards, tom lane
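Spelling out that plan shape as SQL may help (purely illustrative; it's roughly what the MIN() optimisation rewrites the query into, as the InitPlan in the first EXPLAIN shows):

-- Roughly what the chosen plan is doing to compute MIN(sc_id):
SELECT sc_id
FROM legs
WHERE sc_id IS NOT NULL                       -- satisfied by walking the sc_id index in order
  AND scdate BETWEEN 20160219 AND 20160221    -- applied as a filter, row by row
ORDER BY sc_id                                -- index order, smallest first
LIMIT 1;                                      -- the first match is MIN(sc_id)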
On 7 March 2016 at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In English, what that plan is trying to do is scan the index
> in sc_id order until it hits a row with scdate in the target range.
> The first such row, by definition, has the correct min(sc_id) value.
> The problem is that we're guessing at how soon we'll hit such a row.
> If the columns are independent, then the planner can guess based on how
> many rows in the whole table have scdate in the target range, and it
> will probably be about right. But that estimate can fall down very
> badly if sc_id and scdate increase together, because then the target
> rows aren't randomly distributed in the index sequence but could all be
> at the far end of the index.

I'm sorry, I'm obviously not being clear. I already accepted this argument when Victor gave it, although I believe that in part it falls down because sc_id is also (potentially) randomly distributed so it's not like you're doing a sequential table scan (it might work better on a clustered table, but we don't have those :) )

So you still have an extra layer of indirection into a large table with lots of random accesses.

> If we had cross-column correlation stats we could detect this pitfall,
> but without that it's hard to do.

But as far as I can see, apart from the absolute extremes, the index-only scan is _always_ going to be quicker than the index+table scan. It doesn't matter whether or not the distribution is random or skewed, the index-only scan is going to be better (or approximately equally as good). We can see that by the massive speedup I get by using index(scid,scdate), which in all other respects is going to suffer from exactly the same problem that the scid-only index suffers from.

And the real advantage: at the extremes, the index-only worst case is minimally worse than the best case, whereas the worst case of the index-scan-plus-table-compare method is horrific.

I don't believe you need any further statistics than what is currently available to be able to make that judgement, and that's why I believe it's suboptimal.

Geoff
Geoff Winkless <pgsqladmin@geoff.dj> writes:
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.

Well, that is a different issue: what does the planner think of an index-only scan as compared to a regular index scan. I suspect that it's pricing the IOS very high because a lot of the table is dirty and therefore will have to be visited even in a nominally index-only scan. You might check whether the plan choice changes immediately after a VACUUM of the table.

			regards, tom lane
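One way to see how much of the table the planner currently believes is all-visible (an illustrative catalogue query; index-only-scan costing is driven by relallvisible relative to relpages):

-- Fraction of the table's pages marked all-visible right now
SELECT relpages, relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'legs';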
On 7 March 2016 at 16:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Geoff Winkless <pgsqladmin@geoff.dj> writes:
>> But as far as I can see, apart from the absolute extremes, the
>> index-only scan is _always_ going to be quicker than the index+table
>> scan.
>
> Well, that is a different issue: what does the planner think of an
> index-only scan as compared to a regular index scan. I suspect that
> it's pricing the IOS very high because a lot of the table is dirty
> and therefore will have to be visited even in a nominally index-only
> scan. You might check whether the plan choice changes immediately
> after a VACUUM of the table.

I ran VACUUM FULL and VACUUM ANALYZE. It made no difference. I would have thought that if it were the case then the equality-test queries would suffer from the same problem anyway, no?

Even being fairly kind and selecting an scdate range that's only 1% into the set, the query takes over 4 times as long as the indexed query - so the "best" range for the index+table method is utterly tiny. It would be reasonable only when the scdate field is uniformly distributed, which, even in a table without correlation between the fields, is likely to be almost never.

Geoff
On Mon, Mar 7, 2016 at 5:01 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> On 7 March 2016 at 11:48, Victor Yegorov <vyegorov@gmail.com> wrote:
>> 2016-03-07 13:38 GMT+02:00 Geoff Winkless <pgsqladmin@geoff.dj>:
>>>
>>> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
>>> BETWEEN 20160219 AND 20160221;
>>
>> Will it help if you'll add `count(*)` to your query like this:
>>
>> SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND
>> 20160221;
>
> Thanks for the reply.
>
> Yes, that does work around the problem, sort-of (although it's only
> using the scdate-only index, since it needs all the data):

You could also do "min(sc_id+0)" rather than adding a count(*) column, although that is not as future-proof, as someday the planner might recognize that '+0' is a no-op.

If your table is well-vacuumed such that pg_class.relallvisible is high, then it should use the (scdate,sc_id) index in an index-only scan. But if relallvisible is low, it has to check the table itself for visibility information, which destroys most of the benefit of an index-only scan, and thus it would prefer to use the smaller index instead.

> Even given that, I still don't see why the (scdate,sc_id) index isn't
> perfect for this; it allows the planner to use sc_id for MIN() while
> using scdate to restrict the values. Three values to look up from the
> index-only.
>
> If I manually change the query to do what I hoped the planner would do for me:
>
> SELECT LEAST((SELECT MIN(sc_id) FROM legs WHERE scdate = 20160219),
>              (SELECT MIN(sc_id) FROM legs WHERE scdate = 20160220),
>              (SELECT MIN(sc_id) FROM legs WHERE scdate = 20160221));

PostgreSQL does not (yet) implement "loose" index scans or "skip scans", which is what you are asking for. You can roll your own using the techniques described here: https://wiki.postgresql.org/wiki/Loose_indexscan, which has the benefit over your example code in that you don't need to enumerate all possible values, it effectively does it for you.

Cheers,

Jeff
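For this particular query, the wiki technique might look something like the following (an untested sketch, assuming the (scdate,sc_id) index exists; the recursive CTE walks the distinct scdate values in the range, then the outer query takes the minimum sc_id for each):

WITH RECURSIVE dates AS (
    -- first scdate in the range
    (SELECT scdate FROM legs
      WHERE scdate BETWEEN 20160219 AND 20160221
      ORDER BY scdate LIMIT 1)
  UNION ALL
    -- next distinct scdate after the previous one, still within the range
    SELECT (SELECT scdate FROM legs
             WHERE scdate > dates.scdate AND scdate <= 20160221
             ORDER BY scdate LIMIT 1)
      FROM dates
     WHERE dates.scdate IS NOT NULL
)
SELECT min((SELECT min(sc_id) FROM legs WHERE scdate = dates.scdate))
  FROM dates
 WHERE dates.scdate IS NOT NULL;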
On Mon, Mar 7, 2016 at 8:37 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
>
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.

If relallvisible is zero, it thinks it gets zero benefit from an index-only scan. It thinks that using a larger index has a small, but non-zero, cost over the smaller index.

> We can see that by the massive speedup I get by
> using index(scid,scdate), which in all other respects is going to
> suffer from exactly the same problem that the scid-only index
> suffers from.

What massive speedup? (scid,scdate) is the index it *does* use in your worse demonstrated case.
On Mon, Mar 7, 2016 at 9:35 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> On 7 March 2016 at 16:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Geoff Winkless <pgsqladmin@geoff.dj> writes:
>>> But as far as I can see, apart from the absolute extremes, the
>>> index-only scan is _always_ going to be quicker than the index+table
>>> scan.
>>
>> Well, that is a different issue: what does the planner think of an
>> index-only scan as compared to a regular index scan. I suspect that
>> it's pricing the IOS very high because a lot of the table is dirty
>> and therefore will have to be visited even in a nominally index-only
>> scan. You might check whether the plan choice changes immediately
>> after a VACUUM of the table.
>
> I ran VACUUM FULL and VACUUM ANALYZE. It made no difference. I would
> have thought that if it were the case then the equality-test queries
> would suffer from the same problem anyway, no?

No. The range case scans the entire date range, visits the table for each row in that range (to check visibility), and takes the min over the sc_ids which pass the visibility check.

The equality-test case jumps directly to the lowest sc_id for the given scdate, and then has to walk up the sc_ids only until it finds one which passes the visibility check. Once it finds one which is visible, it is done with that scdate.

Assuming most tuples are visible, that is a huge difference in the amount of table blocks being visited. (And maybe index blocks as well.)

Cheers,

Jeff
On 2016-03-07 16:37:37 +0000, Geoff Winkless wrote:
> On 7 March 2016 at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > In English, what that plan is trying to do is scan the index
> > in sc_id order until it hits a row with scdate in the target range.
> > The first such row, by definition, has the correct min(sc_id) value.
> > The problem is that we're guessing at how soon we'll hit such a row.
> > If the columns are independent, then the planner can guess based on how
> > many rows in the whole table have scdate in the target range, and it
> > will probably be about right. But that estimate can fall down very
> > badly if sc_id and scdate increase together, because then the target
> > rows aren't randomly distributed in the index sequence but could all be
> > at the far end of the index.
>
> I'm sorry, I'm obviously not being clear. I already accepted this
> argument when Victor gave it, although I believe that in part it falls
> down because sc_id is also (potentially) randomly distributed so it's
> not like you're doing a sequential table scan (it might work better on
> a clustered table, but we don't have those :) )
>
> So you still have an extra layer of indirection into a large table
> with lots of random accesses.
>
> > If we had cross-column correlation stats we could detect this pitfall,
> > but without that it's hard to do.
>
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.

We are talking about an "absolute extreme" here. You have about 420 date values and you are looking for 3 of them. Assuming for the moment that your distribution is uniform, that's a 140th of the whole table.

So if PostgreSQL were using the (sc_date,sc_id) index, it would have to scan 4E6/140 = 29000 index entries, extract the id value and get the minimum of those 29000 values.

OTOH, if it uses the sc_id index, it only expects to have to scan 140 entries until it finds a matching entry. And then it is finished. So it's 140 index entries plus row accesses against 29000 index entries. To choose the second plan, the planner would have to estimate that reading a random row is more than 200 times slower than reading an index entry, which apparently it doesn't.

As Tom wrote, the estimate of having to read only about 140 rows is only valid if sc_id and sc_date are uncorrelated. In reality your query has to read a lot more than 140 rows, so it is much slower.

> I don't believe you need any further statistics than what is currently
> available to be able to make that judgement, and that's why I believe
> it's suboptimal.

We all know it is suboptimal, but unfortunately, without additional statistics I don't think there is a better way. The other way around - assuming that the columns are correlated in the worst possible way - would remove viable plans in many cases.

This is, I think, one of the places where hints are a good idea. The programmer sometimes knows more about the characteristics of the data than the planner can possibly know, and it is a pity that there is no way for the programmer to pass that knowledge to the planner. (And yes, I know that quite often the programmer is wrong - but I do believe in giving people enough rope to hang themselves with.)

        hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
On 7 March 2016 at 20:23, Jeff Janes <jeff.janes@gmail.com> wrote:
> PostgreSQL does not (yet) implement "loose" index scans or "skip
> scans", which is what you are asking for. You can roll your own using
> the techniques described here:
> https://wiki.postgresql.org/wiki/Loose_indexscan, which has the
> benefit over your example code in that you don't need to enumerate all
> possible values, it effectively does it for you.

Uh huh. This is obviously where my expectation is wrong, thanks. It certainly makes it more obvious why (sc_id,scdate) is more attractive to the planner than (scdate,sc_id), and why the index that was transferred from the Other Database that we've migrated from isn't useful here :)

Geoff
On 7 March 2016 at 20:40, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> As Tom wrote, the estimate of having to read only about 140 rows is only
> valid if sc_id and sc_date are uncorrelated. In reality your query has
> to read a lot more than 140 rows, so it is much slower.

But as I've said previously, even if I do select from scdate values that I know to be in the first 1% of the data (supposedly the perfect condition) the scan method is insignificantly quicker than the index (scdate,scid) method.

Even with the absolute perfect storm (loading in the entire index for the full range) it's still not too bad (1.3 seconds or so).

The point is that to assume, knowing nothing about the data, that the data is in an even distribution is only a valid strategy if the worst case (when that assumption turns out to be wildly incorrect) is not catastrophic. That's not the case here.

Geoff
On 2016-03-08 10:16:57 +0000, Geoff Winkless wrote:
> On 7 March 2016 at 20:40, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> > As Tom wrote, the estimate of having to read only about 140 rows is only
> > valid if sc_id and sc_date are uncorrelated. In reality your query has
> > to read a lot more than 140 rows, so it is much slower.
>
> But as I've said previously, even if I do select from scdate values
> that I know to be in the first 1% of the data (supposedly the perfect
> condition) the scan method is insignificantly quicker than the index
> (scdate,scid) method.

Actually the planner expects to find a match within the first 0.0035 %, so to find out how fast that would be you would have to use a value from that range.

> Even with the absolute perfect storm (loading in the entire index for
> the full range) it's still not too bad (1.3 seconds or so).
>
> The point is that to assume, knowing nothing about the data, that the
> data is in an even distribution is only a valid strategy if the worst
> case (when that assumption turns out to be wildly incorrect) is not
> catastrophic. That's not the case here.

True. The fundamental problem here is that the planner doesn't have any notion of a worst case. It only knows "cost", and that is a single number for each operation. For many operations, both the best case and the worst case are unusable as cost - the first would almost always underestimate the time and choose a plan which is far from optimal, and the second would almost always overestimate it and reject an optimal plan.

The art of programming a planner (which I've dabbled with in a previous (not postgresql-related) project but certainly can't claim any expertise in) lies in choosing a cost function which is quite close most of the time and catastrophically wrong only very rarely. It is clear that PostgreSQL hasn't succeeded in the latter category: correlated columns do occur, and the current cost function, which assumes that all columns are uncorrelated, can catastrophically underestimate the cost in this case.

The question is what can be done to improve the situation.

Tom thinks that correlation statistics would help. That seems plausible to me.

You claim that no statistics are needed. That may or may not be true: You haven't proposed an alternate method yet. I feel fairly certain that using the worst case (the cost for scanning the whole table) would be just as bad and would cause inferior plans to be used in many instances.

Maybe computing the cost as a weighted average of the best, average and worst case (e.g. cost = cost_best*0.05 + cost_avg*0.90 + cost_worst*0.05) would penalize methods with a large spread between best and worst case enough - but that still leaves the problem of determining the weights and determining what the "average" is. So it's the same black magic as now, just a little more complicated (on the plus side, this would probably be a relatively simple patch).

If we assume that we could revamp the planner completely, other possibilities come to mind. For example, since I think that the core problem is having a single number for the cost, the planner could instead compute a distribution (in the most simple case just best and worst case, but ideally many values). Then the planner could say something like: I have two plans, A and B, and A is at most 20 % faster in almost all cases. But in the worst case, A is 1000 times slower. Being 20 % faster most of the time is nice but doesn't outweigh the risk of being 1000 times slower sometimes, so I'll use B anyway.
Another possibility I've been considering for some time is feeding back the real execution times into the planner, but that sounds like a major research project. (Actually I think Oracle does something like this since version 12.)

        hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
On 12 March 2016 at 18:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> On 2016-03-08 10:16:57 +0000, Geoff Winkless wrote:
>> On 7 March 2016 at 20:40, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>> > As Tom wrote, the estimate of having to read only about 140 rows is only
>> > valid if sc_id and sc_date are uncorrelated. In reality your query has
>> > to read a lot more than 140 rows, so it is much slower.
>>
>> But as I've said previously, even if I do select from scdate values
>> that I know to be in the first 1% of the data (supposedly the perfect
>> condition) the scan method is insignificantly quicker than the index
>> (scdate,scid) method.
>
> Actually the planner expects to find a match within the first 0.0035 %, so
> to find out how fast that would be you would have to use a value from
> that range.

Fair point, although given my data I can't create a query that does that, since the first 0.3% or so all have the same value (although I suppose I could modify the data to do so); however the thing I was getting at was more that it doesn't have to be very far off perfect distribution for the second method to be "better".

> The question is what can be done to improve the situation.
>
> Tom thinks that correlation statistics would help. That seems plausible
> to me.

I'm sure that correlation statistics would help in this one instance (if Tom thinks so, I certainly wouldn't argue!). I was more looking at the planner choices in general, because I assumed (perhaps incorrectly) that correlated columns aren't the only time this sort of best/worst scenario hits it.

> You claim that no statistics are needed.

Well that's a bit confrontational.

> That may or may not be true: You haven't proposed an alternate method
> yet.

You could make an assumption that perfect distribution isn't true: that actually the distribution is within a certain _deviation_ of that perfect distribution. It wouldn't have to have been very much to make the index-only scan win here and would still keep the planner from choosing less optimal queries most of the time (and where it did end up making the "wrong" choice it's not going to be far off anyway).

But I'm making assumptions here, I'm aware of that. Chances are that actually most people's data _does_ fit into this perfect distribution set. Is there any research that shows that real-world data usually does?

> I feel fairly certain that using the worst case (the cost for scanning
> the whole table) would be just as bad and would cause inferior plans
> to be used in many instances.

Oh I certainly agree with that. As Jeff points out I'd have a much larger win in this instance by someone spending the time implementing skip index scans rather than messing with the planner :)

In the end I bit the bullet and converted the code to use LEAST((SELECT MIN..),(SELECT MIN...),(SELECT MIN...)), which (being an equality test) happily uses the (scdate,scid) index. Since the query ends up producing an equivalent set I could (mostly) do a search-replace, which made it a much simpler change than (for example) the ,COUNT(*) version.

Geoff
On 2016-03-12 21:00:04 +0000, Geoff Winkless wrote:
> On 12 March 2016 at 18:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> > The question is what can be done to improve the situation.
> >
> > Tom thinks that correlation statistics would help. That seems plausible
> > to me.
[...]
> > You claim that no statistics are needed.
>
> Well that's a bit confrontational.

Sorry. Didn't want to sound confrontational. I was just repeating points made by Tom and you previously in this thread to establish a baseline.

> > That may or may not be true: You haven't proposed an alternate method
> > yet.
>
> You could make an assumption that perfect distribution isn't true:
> that actually the distribution is within a certain _deviation_ of that
> perfect distribution. It wouldn't have to have been very much to make
> the index-only scan win here and would still keep the planner from
> choosing less optimal queries most of the time (and where it did end
> up making the "wrong" choice it's not going to be far off anyway).
>
> But I'm making assumptions here, I'm aware of that. Chances are that
> actually most people's data _does_ fit into this perfect distribution
> set. Is there any research that shows that real-world data usually
> does?

I don't think most people's data is perfectly distributed. But as you say most data is probably within some deviation of being perfectly distributed and as long as that deviation isn't too big it doesn't matter.

But there are certainly some common examples of highly correlated columns. Having a serial id and a date as in your case is probably quite common. Another example might be a surrogate primary key which is computed from some other fields (e.g. a timeseries code starting with a country code, or a social security number starting with the birth date, ...). That's probably not that uncommon either.

So, I agree with you. This is a problem and it should be fixed. I'm just sceptical that it can be done with a simple cost adjustment.

> As Jeff points out I'd have a much larger win in this instance by
> someone spending the time implementing skip index scans rather than
> messing with the planner :)

Yeah. I think I have some code which could benefit from this, too. I'll have to try that trick from the wiki.

        hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
On Tue, Mar 8, 2016 at 2:16 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> On 7 March 2016 at 20:40, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>> As Tom wrote, the estimate of having to read only about 140 rows is only
>> valid if sc_id and sc_date are uncorrelated. In reality your query has
>> to read a lot more than 140 rows, so it is much slower.
>
> But as I've said previously, even if I do select from scdate values
> that I know to be in the first 1% of the data (supposedly the perfect
> condition) the scan method is insignificantly quicker than the index
> (scdate,scid) method.

That is sure not the case in my hands. If I select from the first 1%, I get the (scid) index being 10 times faster than (scdate,scid), and (scid,scdate) being 50 times faster.

> Even with the absolute perfect storm (loading in the entire index for
> the full range) it's still not too bad (1.3 seconds or so).
>
> The point is that to assume, knowing nothing about the data, that the
> data is in an even distribution is only a valid strategy if the worst
> case (when that assumption turns out to be wildly incorrect) is not
> catastrophic. That's not the case here.

That just makes someone else's catastrophic worst case come to the fore.

Cheers,

Jeff
On Sat, Mar 12, 2016 at 1:00 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
>
> As Jeff points out I'd have a much larger win in this instance by
> someone spending the time implementing skip index scans rather than
> messing with the planner :)

But I think the hardest part of implementing skip index scans would probably be getting the planner to understand the cost of using them! So one way or another, the planner has to get messed with.

Cheers,

Jeff
On 12 March 2016 at 22:00, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> I don't think most people's data is perfectly distributed. But as you
> say most data is probably within some deviation of being perfectly
> distributed and as long as that deviation isn't too big it doesn't
> matter.

Is that how what I wrote came across? I was trying to say exactly the opposite: that if, instead of assuming perfect distribution, you assume a certain deviation from that distribution, you will end up with plans that would win with perfect distribution losing out to plans that are almost as good on that perfect distribution but behave better on data that is not perfectly distributed.

Geoff