Thread: index problems (again)
Hi all

Firstly, I appreciate that my index problems are fairly difficult to debug given that I can't upload the data anywhere (it's commercially sensitive); I tried creating an equivalent dataset for my last problem using a lot of random() inserts, but unfortunately, even though the sizes and index cardinality seemed similar, it didn't exhibit the same problem, which leaves me a bit stuck.

I now have (what seems to me to be) an utterly bizarre situation where postgres is using the "wrong" index, to the extent that I can't even begin to comprehend why it would do so.

http://explain.depesz.com/s/uF4L

# EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate BETWEEN 20160219 AND 20160221;
                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=25.54..25.55 rows=1 width=0) (actual time=25337.593..25337.594 rows=1 loops=1)
   Buffers: shared hit=2976790 read=152188
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..25.54 rows=1 width=4) (actual time=25337.579..25337.587 rows=1 loops=1)
           Buffers: shared hit=2976790 read=152188
           ->  Index Scan using legs_sc_id_idx on legs  (cost=0.43..361498.49 rows=14394 width=4) (actual time=25337.578..25337.578 rows=1 loops=1)
                 Index Cond: (sc_id IS NOT NULL)
                 Filter: ((scdate >= 20160219) AND (scdate <= 20160221))
                 Rows Removed by Filter: 4068865
                 Buffers: shared hit=2976790 read=152188
 Planning time: 0.235 ms
 Execution time: 25337.620 ms
(12 rows)

Time: 25338.375 ms

There is an index on (scdate,sc_id) that (I would have thought) should be ideal for this query, but it's ignored. sc_id has no null values - it's even _defined_ as NOT NULL - so I've no idea why the planner would think it needs to use the sc_id index for this query.

If I create an index on (sc_id,scdate), that one is used (index-only scan) and the query returns in 200ms or so.

http://explain.depesz.com/s/3qNC

=# EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate BETWEEN 20160219 AND 20160221;
                                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=7.32..7.33 rows=1 width=0) (actual time=207.194..207.194 rows=1 loops=1)
   Buffers: shared hit=1 read=11120
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..7.32 rows=1 width=4) (actual time=207.187..207.187 rows=1 loops=1)
           Buffers: shared hit=1 read=11120
           ->  Index Only Scan using legs_sc_id_scdate_idx on legs  (cost=0.43..99204.99 rows=14394 width=4) (actual time=207.185..207.185 rows=1 loops=1)
                 Index Cond: ((sc_id IS NOT NULL) AND (scdate >= 20160219) AND (scdate <= 20160221))
                 Heap Fetches: 0
                 Buffers: shared hit=1 read=11120
 Planning time: 0.236 ms
 Execution time: 207.223 ms

I'm utterly at a loss. There are only 427 distinct scdate values in this table but 4 million sc_id values (and the spread across scdate is reasonably even - between 6000 and 11000 rows for each), so using an index on (just) sc_id makes absolutely no sense to me (I would expect it to be slower than a table scan, no?). I also don't see how (sc_id,scdate) is more useful than (scdate,sc_id).

Have I completely misunderstood how this is all meant to work? I tried reading the documentation on understanding EXPLAIN and the slow-query questions in the FAQ/wiki, but what I read didn't really seem to suggest any investigative steps other than "run ANALYZE and VACUUM". Is there a good doc on how to go about debugging this kind of thing?
Or even on how the planner makes its decisions? I'm currently at the point where I'm just throwing random indexes at tables in the vain hope that it might help. I'm fairly sure that that's suboptimal :)

As before: pg 9.5.1, CentOS 6 x64, 4GB RAM, Xeon X3220. effective_cache_size is set to 3GB (but changing it wildly up or down doesn't change anything), shared_buffers is 1GB, work_mem is 5242kB (but changing to anything up to 1GB makes no difference).

Thanks

Geoff
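P.S. In case it helps to picture the setup, the relevant bits of the schema boil down to roughly the following. This is only my shorthand: the column types are guessed from the literal values above, and the name of the (scdate,sc_id) index is a guess; everything else is named as it appears in the plans.

CREATE TABLE legs (
    sc_id  integer NOT NULL,  -- defined NOT NULL; roughly 4 million values
    scdate integer            -- dates stored as yyyymmdd integers; only ~427 distinct values
    -- (plus the rest of the real columns, omitted here)
);

CREATE INDEX legs_sc_id_idx        ON legs (sc_id);
CREATE INDEX legs_scdate_idx       ON legs (scdate);
CREATE INDEX legs_scdate_sc_id_idx ON legs (scdate, sc_id);
CREATE INDEX legs_sc_id_scdate_idx ON legs (sc_id, scdate);  -- the experimental one that does get used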
2016-03-07 13:38 GMT+02:00 Geoff Winkless <pgsqladmin@geoff.dj>:
> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
> BETWEEN 20160219 AND 20160221;
Will it help if you'll add `count(*)` to your query like this:
SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND 20160221;
?
--
Victor Y. Yegorov
On 7 March 2016 at 11:48, Victor Yegorov <vyegorov@gmail.com> wrote:
> 2016-03-07 13:38 GMT+02:00 Geoff Winkless <pgsqladmin@geoff.dj>:
>>
>> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
>> BETWEEN 20160219 AND 20160221;
>
> Will it help if you'll add `count(*)` to your query like this:
>
> SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND
> 20160221;

Thanks for the reply.

Yes, that does work around the problem, sort-of (although it's only using the scdate-only index, since it needs all the data):

 Aggregate  (cost=1242.59..1242.60 rows=1 width=4)
   ->  Index Scan using legs_scdate_idx on legs  (cost=0.43..1170.62 rows=14394 width=4)
         Index Cond: ((scdate >= 20160219) AND (scdate <= 20160221))

Unfortunately the cost of changing all the code that uses MIN() in this way would be higher than just adding an extra index :(

I suppose the thought is that for selecting just the MIN() value, by traipsing through the index you immediately find the lowest match - so for a dataset where scdate cardinality is higher, this would make sense; indeed if I give this query a value with scdate in the low range of the table it returns quickly (although still slower than when it uses the scdate index).

It seems to me that the weighting the planner applied to this MIN() rule is too high, or perhaps it needs to pay more attention to the statistics of the indexes for the WHERE clauses?

Even given that, I still don't see why the (scdate,sc_id) index isn't perfect for this; it allows the planner to use sc_id for MIN() while using scdate to restrict the values. Three values to look up from the index-only.

If I manually change the query to do what I hoped the planner would do for me:

SELECT LEAST((SELECT MIN(sc_id) FROM legs WHERE scdate = 20160219),
             (SELECT MIN(sc_id) FROM legs WHERE scdate = 20160220),
             (SELECT MIN(sc_id) FROM legs WHERE scdate = 20160221));

it returns in 16ms - and uses the (scdate_sc_id_idx) index as expected. Again, though, I can't really justify changing all the code to do that instead.

Geoff
2016-03-07 15:01 GMT+02:00 Geoff Winkless <pgsqladmin@geoff.dj>:
> Unfortunately the cost of changing all the code that uses MIN() in
> this way would be higher than just adding an extra index :(
>
> I suppose the thought is that for selecting just the MIN() value, by
> traipsing through the index you immediately find the lowest match - so
> for a dataset where scdate cardinality is higher, this would make
> sense; indeed if I give this query a value with scdate in the low
> range of the table it returns quickly (although still slower than when
> it uses the scdate index).
>
> It seems to me that the weighting the planner applied to this MIN()
> rule is too high, or perhaps it needs to pay more attention to the
> statistics of the indexes for the WHERE clauses?
>
> Even given that, I still don't see why the (scdate,sc_id) index isn't
> perfect for this; it allows the planner to use sc_id for MIN() while
> using scdate to restrict the values. Three values to look up from the
> index-only.
Your `sc_id` and `scdate` columns are correlated.

The planner has no such knowledge and assumes the columns are independent. Your `scdate` predicate is estimated to return 14394 rows (based on the EXPLAIN in your first post). I think this corresponds to quite a small portion of your table, less than 1% (based on `Rows Removed by Filter: 4068865` in the same EXPLAIN). Under a uniform distribution, these 14394 rows could be anywhere in the table. Therefore, reading values in the order of your PK looks optimal, as you're expected to hit a row that matches the given conditions quite soon.

The problem is that your predicate actually matches a bunch of rows towards the end of the table, which causes Postgres to read a big portion of your index before it finds a row that fits.
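If you want a rough confirmation of that correlation, the built-in corr() aggregate gives a quick measure (just an illustrative check, usable here because both columns are numeric):

-- Pearson correlation between the two columns; a value close to 1 (or -1)
-- means the planner's independence assumption is badly off for this pair.
SELECT corr(sc_id, scdate) FROM legs;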
Right now (9.5 and earlier versions) I do not know of any option that would not require fixing your queries.

P.S. Maybe the `Upper pathification` patch that is being considered for 9.6 can deal with such cases.
Victor Y. Yegorov
On 7 March 2016 at 13:23, Victor Yegorov <vyegorov@gmail.com> wrote:
> Your `sc_id` and `scdate` columns are correlated.

Actually not necessarily, although in the majority case that's mostly true.

> The planner has no such knowledge and assumes the columns are independent.
> Your `scdate` predicate is estimated to return 14394 rows (based on the
> EXPLAIN in your first post). I think this corresponds to quite a small
> portion of your table, less than 1% (based on `Rows Removed by Filter:
> 4068865` in the same EXPLAIN). Under a uniform distribution, these 14394
> rows could be anywhere in the table.

Oh I see! To make sure I understand what you're saying: given a fully random distribution of scdate across the table, searching the whole table for the first instance of scdate=(20160219|20160220|20160221) will be fastest, because using the scdate index could end up doing lots of random accesses across the whole table to get sc_id values, whereas assuming a random distribution of scdate values I would only expect to have to scan (sequentially) 3% of the table before I find one of the scdate values that match, yes?

> Right now (9.5 and earlier versions) I do not know of any option that
> would not require fixing your queries.

Well, I can cope with using the (sc_id,scdate) index (which I guess wins purely by being index-only and smaller than the table), but it makes me unhappy.

I think I realised why the planner won't use (scdate,sc_id): the multicolumn index page (http://www.postgresql.org/docs/current/static/indexes-multicolumn.html) suggests that:

  The exact rule is that equality constraints on leading columns, plus any
  inequality constraints on the first column that does not have an equality
  constraint, will be used to limit the portion of the index that is scanned.

So since the first column is an inequality constraint, PG won't use the second column from index(scdate,sc_id) to retrieve the MIN() value. This explains why it happily uses the index for the LEAST(MIN(),MIN(),MIN()) version (since each one is an equality condition).

It seems like that rule is just too restrictive, because a range constraint on the first key, when used in conjunction with an index-only scan, is almost always going to win, even if the second constraint matches all of the rows and even if the range constraint returns all the values in the table (unless the index is larger than the table itself, I suppose). I might suggest that perhaps the rule should be relaxed so that an inequality constraint does not stop the subsequent columns being used in an index-only scan.

That assumes that I've not completely misunderstood, of course :)

Geoff
On 7 March 2016 at 14:18, I wrote:
> That assumes that I've not completely misunderstood, of course :)

Always a dangerous assumption, I'm rapidly learning. The very next section of that page says:

  Constraints on columns to the right of these columns are checked in the
  index, so they save visits to the table proper, but they do not reduce the
  portion of the index that has to be scanned.

So it seems that it should in fact be usable after all. So I'm still stumped as to why the (scdate,sc_id) index isn't used :(

Geoff
On 7 March 2016 at 14:27, I wrote:
> So it seems that it should in fact be usable after all. So I'm still
> stumped as to why the (scdate,sc_id) index isn't used :(

Also, while the index on sc_id will be sorted, there's no guarantee that sc_id values will be in order in the table itself, so you're still left with (30,000) potentially random accesses to the table, even assuming fully random distribution of scdate (with a worst case of 970000 random accesses). That average case is no better than the (30,000) random accesses that were required from using an scdate index, even ignoring the scdate/sc_id index.

So I'm afraid I'm fully back in the "I still don't get it" column.

Geoff
Geoff Winkless <pgsqladmin@geoff.dj> writes:
> So it seems that it should in fact be usable after all. So I'm still
> stumped as to why the (scdate,sc_id) index isn't used :(

Because the other way is estimated to be cheaper. The estimate is wrong, because it's based on a statistical assumption that's wrong (ie that sc_id and scdate are uncorrelated), but it's what we have to work with at the moment.

As you found upthread, that index could be used in the way you want if you had an equality condition on scdate. So the workaround I'd suggest is to whack the query into that shape. Something along the lines of (untested)

select min((select min(sc_id) from legs where scdate = gs))
  from generate_series(20160219, 20160221) gs

This would only work well for relatively small ranges of scdate, but if you had a large range then I think the original plan would've been fine.

			regards, tom lane
On 7 March 2016 at 14:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Geoff Winkless <pgsqladmin@geoff.dj> writes:
>> So it seems that it should in fact be usable after all. So I'm still
>> stumped as to why the (scdate,sc_id) index isn't used :(
>
> Because the other way is estimated to be cheaper. The estimate is
> wrong, because it's based on a statistical assumption that's wrong
> (ie that sc_id and scdate are uncorrelated), but it's what we have
> to work with at the moment.

Are you saying that the planner can't tell without scanning the index how much of the index the range constraint will retrieve?

That's reasonable, I suppose, but consider the relative sizes of the index (92MB) and the table (1.6GB), both of which pieces of information are available to the planner at query time: if I were to scan 3% of the table (which we assume the planner is estimating, because of the cardinality of the scdate field), I've read as much data from disk as I would for 50% of the index. That's ignoring the reads I'll have to do from the sc_id index too... so in the worst case, where I've had to read the entire index (because the range didn't actually restrict any records), I'm still only at 2x the average case of the other way. Whereas the worst case of the sc_id-only-index-plus-table-retrieve is about 1000x the worst case of the index-only scan.

> select min((select min(sc_id) from legs where scdate = gs))
>   from generate_series(20160219, 20160221) gs
>
> This would only work well for relatively small ranges of scdate,

As it happens it works for the full range of scdate and returns in 99ms.

# select min((select min(sc_id) from legs where scdate = gs)) from generate_series(20150101, 20160303) gs;
   min
----------
 12914746
(1 row)

Time: 99.210 ms

> but if you had a large range then I think the original plan
> would've been fine.

Well yes, obviously doing MIN() across the whole range is going to be able to just return as soon as it gets the first value from sc_id and references the table to check the date; however even in that _best_ case the value comes back in 25ms, ie the _best-case_ index-plus-table-scan is 1/3 the time of the worst-case index-only scan.

I accept that this is how the planner behaves, but I don't accept that it's optimal.

Geoff
Geoff Winkless <pgsqladmin@geoff.dj> writes:
> On 7 March 2016 at 14:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Because the other way is estimated to be cheaper. The estimate is
>> wrong, because it's based on a statistical assumption that's wrong
>> (ie that sc_id and scdate are uncorrelated), but it's what we have
>> to work with at the moment.

> Are you saying that the planner can't tell without scanning the index
> how much of the index the range constraint will retrieve?

The question isn't "how much", the question is "where is that data exactly?".

In English, what that plan is trying to do is scan the index in sc_id order until it hits a row with scdate in the target range. The first such row, by definition, has the correct min(sc_id) value. The problem is that we're guessing at how soon we'll hit such a row. If the columns are independent, then the planner can guess based on how many rows in the whole table have scdate in the target range, and it will probably be about right. But that estimate can fall down very badly if sc_id and scdate increase together, because then the target rows aren't randomly distributed in the index sequence but could all be at the far end of the index.

> I accept that this is how the planner behaves, but I don't accept that
> it's optimal.

If we had cross-column correlation stats we could detect this pitfall, but without that it's hard to do.

			regards, tom lane
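Spelling out that plan shape as SQL may help (purely illustrative; it's roughly what the MIN() optimisation rewrites the query into, as the InitPlan in the first EXPLAIN shows):

-- Roughly what the chosen plan is doing to compute MIN(sc_id):
SELECT sc_id
FROM legs
WHERE sc_id IS NOT NULL                       -- satisfied by walking the sc_id index in order
  AND scdate BETWEEN 20160219 AND 20160221    -- applied as a filter, row by row
ORDER BY sc_id                                -- index order, smallest first
LIMIT 1;                                      -- the first match is MIN(sc_id)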
On 7 March 2016 at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In English, what that plan is trying to do is scan the index
> in sc_id order until it hits a row with scdate in the target range.
> The first such row, by definition, has the correct min(sc_id) value.
> The problem is that we're guessing at how soon we'll hit such a row.
> If the columns are independent, then the planner can guess based on how
> many rows in the whole table have scdate in the target range, and it
> will probably be about right. But that estimate can fall down very
> badly if sc_id and scdate increase together, because then the target
> rows aren't randomly distributed in the index sequence but could all be
> at the far end of the index.

I'm sorry, I'm obviously not being clear. I already accepted this argument when Victor gave it, although I believe that in part it falls down because sc_id is also (potentially) randomly distributed so it's not like you're doing a sequential table scan (it might work better on a clustered table, but we don't have those :) )

So you still have an extra layer of indirection into a large table with lots of random accesses.

> If we had cross-column correlation stats we could detect this pitfall,
> but without that it's hard to do.

But as far as I can see, apart from the absolute extremes, the index-only scan is _always_ going to be quicker than the index+table scan. It doesn't matter whether or not the distribution is random or skewed, the index-only scan is going to be better (or approximately equally as good). We can see that by the massive speedup I get by using index(scid,scdate), which in all other respects is going to suffer from exactly the same problem that the scid-only index suffers from.

And the real advantage: at the extremes, the index-only worst case is minimally worse than the best case, whereas the worst case of the index-scan-plus-table-compare method is horrific.

I don't believe you need any further statistics than what is currently available to be able to make that judgement, and that's why I believe it's suboptimal.

Geoff
Geoff Winkless <pgsqladmin@geoff.dj> writes:
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.

Well, that is a different issue: what does the planner think of an index-only scan as compared to a regular index scan. I suspect that it's pricing the IOS very high because a lot of the table is dirty and therefore will have to be visited even in a nominally index-only scan. You might check whether the plan choice changes immediately after a VACUUM of the table.

			regards, tom lane
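One way to see how much of the table the planner currently believes is all-visible (an illustrative catalogue query; index-only-scan costing is driven by relallvisible relative to relpages):

-- Fraction of the table's pages marked all-visible right now
SELECT relpages, relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'legs';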
On 7 March 2016 at 16:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Geoff Winkless <pgsqladmin@geoff.dj> writes:
>> But as far as I can see, apart from the absolute extremes, the
>> index-only scan is _always_ going to be quicker than the index+table
>> scan.
>
> Well, that is a different issue: what does the planner think of an
> index-only scan as compared to a regular index scan. I suspect that
> it's pricing the IOS very high because a lot of the table is dirty
> and therefore will have to be visited even in a nominally index-only
> scan. You might check whether the plan choice changes immediately
> after a VACUUM of the table.

I ran VACUUM FULL and VACUUM ANALYZE. It made no difference. I would have thought that if it were the case then the equality-test queries would suffer from the same problem anyway, no?

Even being fairly kind and selecting an scdate range that's only 1% into the set, the query takes over 4 times as long as the indexed query - so the "best" range for the index+table method is utterly tiny. It would be reasonable only when the scdate field is uniformly distributed, which, even in a table without correlation between the fields, is likely to be almost never.

Geoff
On Mon, Mar 7, 2016 at 5:01 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> On 7 March 2016 at 11:48, Victor Yegorov <vyegorov@gmail.com> wrote:
>> 2016-03-07 13:38 GMT+02:00 Geoff Winkless <pgsqladmin@geoff.dj>:
>>>
>>> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
>>> BETWEEN 20160219 AND 20160221;
>>
>> Will it help if you'll add `count(*)` to your query like this:
>>
>> SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND
>> 20160221;
>
> Thanks for the reply.
>
> Yes, that does work around the problem, sort-of (although it's only
> using the scdate-only index, since it needs all the data):

You could also do "min(sc_id+0)" rather than adding a count(*) column, although that is not as future-proof, as someday the planner might recognize that '+0' is a no-op.

If your table is well-vacuumed such that pg_class.relallvisible is high, then it should use the (scdate,sc_id) index in an index-only scan. But if relallvisible is low, it has to check the table itself for visibility information, which destroys most of the benefit of an index-only scan, and thus it would prefer to use the smaller index instead.

> Even given that, I still don't see why the (scdate,sc_id) index isn't
> perfect for this; it allows the planner to use sc_id for MIN() while
> using scdate to restrict the values. Three values to look up from the
> index-only.
>
> If I manually change the query to do what I hoped the planner would do for me:
>
> SELECT LEAST((SELECT MIN(sc_id) FROM legs WHERE scdate = 20160219),
>              (SELECT MIN(sc_id) FROM legs WHERE scdate = 20160220),
>              (SELECT MIN(sc_id) FROM legs WHERE scdate = 20160221));

PostgreSQL does not (yet) implement "loose" index scans or "skip scans", which is what you are asking for. You can roll your own using the techniques described here: https://wiki.postgresql.org/wiki/Loose_indexscan, which has the benefit over your example code in that you don't need to enumerate all possible values, it effectively does it for you.

Cheers,

Jeff
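For this particular query, the wiki technique might look something like the following (an untested sketch, assuming the (scdate,sc_id) index exists; the recursive CTE walks the distinct scdate values in the range, then the outer query takes the minimum sc_id for each):

WITH RECURSIVE dates AS (
    -- first scdate in the range
    (SELECT scdate FROM legs
      WHERE scdate BETWEEN 20160219 AND 20160221
      ORDER BY scdate LIMIT 1)
  UNION ALL
    -- next distinct scdate after the previous one, still within the range
    SELECT (SELECT scdate FROM legs
             WHERE scdate > dates.scdate AND scdate <= 20160221
             ORDER BY scdate LIMIT 1)
      FROM dates
     WHERE dates.scdate IS NOT NULL
)
SELECT min((SELECT min(sc_id) FROM legs WHERE scdate = dates.scdate))
  FROM dates
 WHERE dates.scdate IS NOT NULL;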
On Mon, Mar 7, 2016 at 8:37 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
>
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.

If relallvisible is zero, it thinks it gets zero benefit from an index-only scan. It thinks that using a larger index has a small, but non-zero, cost over the smaller index.

> We can see that by the massive speedup I get by
> using index(scid,scdate), which in all other respects is going to
> suffer from exactly the same problem that the scid-only index
> suffers from.

What massive speedup? (scid,scdate) is the index it *does* use in your worse demonstrated case.
On Mon, Mar 7, 2016 at 9:35 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> On 7 March 2016 at 16:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Geoff Winkless <pgsqladmin@geoff.dj> writes:
>>> But as far as I can see, apart from the absolute extremes, the
>>> index-only scan is _always_ going to be quicker than the index+table
>>> scan.
>>
>> Well, that is a different issue: what does the planner think of an
>> index-only scan as compared to a regular index scan. I suspect that
>> it's pricing the IOS very high because a lot of the table is dirty
>> and therefore will have to be visited even in a nominally index-only
>> scan. You might check whether the plan choice changes immediately
>> after a VACUUM of the table.
>
> I ran VACUUM FULL and VACUUM ANALYZE. It made no difference. I would
> have thought that if it were the case then the equality-test queries
> would suffer from the same problem anyway, no?

No. The range case scans the entire date range, visits the table for each row in that range (to check visibility), and takes the min over the sc_ids which pass the visibility check.

The equality-test case jumps directly to the lowest sc_id for the given scdate, and then has to walk up the sc_ids only until it finds one which passes the visibility check. Once it finds one which is visible, it is done with that scdate.

Assuming most tuples are visible, that is a huge difference in the amount of table blocks being visited. (And maybe index blocks as well.)

Cheers,

Jeff
On 2016-03-07 16:37:37 +0000, Geoff Winkless wrote:
> On 7 March 2016 at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > In English, what that plan is trying to do is scan the index
> > in sc_id order until it hits a row with scdate in the target range.
> > The first such row, by definition, has the correct min(sc_id) value.
> > The problem is that we're guessing at how soon we'll hit such a row.
> > If the columns are independent, then the planner can guess based on how
> > many rows in the whole table have scdate in the target range, and it
> > will probably be about right. But that estimate can fall down very
> > badly if sc_id and scdate increase together, because then the target
> > rows aren't randomly distributed in the index sequence but could all be
> > at the far end of the index.
>
> I'm sorry, I'm obviously not being clear. I already accepted this
> argument when Victor gave it, although I believe that in part it falls
> down because sc_id is also (potentially) randomly distributed so it's
> not like you're doing a sequential table scan (it might work better on
> a clustered table, but we don't have those :) )
>
> So you still have an extra layer of indirection into a large table
> with lots of random accesses.
>
> > If we had cross-column correlation stats we could detect this pitfall,
> > but without that it's hard to do.
>
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.

We are talking about an "absolute extreme" here. You have about 420 date values and you are looking for 3 of them. Assuming for the moment that your distribution is uniform, that's a 140th of the whole table.

So if PostgreSQL were using the (sc_date,sc_id) index, it would have to scan 4E6/140 = 29000 index entries, extract the id value and get the minimum of those 29000 values.

OTOH, if it uses the sc_id index, it only expects to have to scan 140 entries until it finds a matching entry. And then it is finished. So it's 140 index entries plus row accesses against 29000 index entries. To choose the second plan, the planner would have to estimate that reading a random row is more than 200 times slower than reading an index entry, which apparently it doesn't.

As Tom wrote, the estimate of having to read only about 140 rows is only valid if sc_id and sc_date are uncorrelated. In reality your query has to read a lot more than 140 rows, so it is much slower.

> I don't believe you need any further statistics than what is currently
> available to be able to make that judgement, and that's why I believe
> it's suboptimal.

We all know it is suboptimal, but unfortunately, without additional statistics I don't think there is a better way. The other way around - assuming that the columns are correlated in the worst possible way - would remove viable plans in many cases.

This is, I think, one of the places where hints are a good idea. The programmer sometimes knows more about the characteristics of the data than the planner can possibly know, and it is a pity that there is no way for the programmer to pass that knowledge to the planner. (And yes, I know that quite often the programmer is wrong - but I do believe in giving people enough rope to hang themselves with.)

        hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
On 7 March 2016 at 20:23, Jeff Janes <jeff.janes@gmail.com> wrote:
> PostgreSQL does not (yet) implement "loose" index scans or "skip
> scans", which is what you are asking for. You can roll your own using
> the techniques described here:
> https://wiki.postgresql.org/wiki/Loose_indexscan, which has the
> benefit over your example code in that you don't need to enumerate all
> possible values, it effectively does it for you.

Uh huh. This is obviously where my expectation is wrong, thanks. It certainly makes it more obvious why (sc_id,scdate) is more attractive to the planner than (scdate,sc_id), and why the index that was transferred from the Other Database that we've migrated from isn't useful here :)

Geoff
On 7 March 2016 at 20:40, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> As Tom wrote, the estimate of having to read only about 140 rows is only
> valid if sc_id and sc_date are uncorrelated. In reality your query has
> to read a lot more than 140 rows, so it is much slower.

But as I've said previously, even if I do select from scdate values that I know to be in the first 1% of the data (supposedly the perfect condition) the scan method is insignificantly quicker than the index (scdate,scid) method.

Even with the absolute perfect storm (loading in the entire index for the full range) it's still not too bad (1.3 seconds or so).

The point is that to assume, knowing nothing about the data, that the data is in an even distribution is only a valid strategy if the worst case (when that assumption turns out to be wildly incorrect) is not catastrophic. That's not the case here.

Geoff
On 2016-03-08 10:16:57 +0000, Geoff Winkless wrote:
> On 7 March 2016 at 20:40, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> > As Tom wrote, the estimate of having to read only about 140 rows is only
> > valid if sc_id and sc_date are uncorrelated. In reality your query has
> > to read a lot more than 140 rows, so it is much slower.
>
> But as I've said previously, even if I do select from scdate values
> that I know to be in the first 1% of the data (supposedly the perfect
> condition) the scan method is insignificantly quicker than the index
> (scdate,scid) method.

Actually the planner expects to find a match within the first 0.0035 %, so to find out how fast that would be you would have to use a value from that range.

> Even with the absolute perfect storm (loading in the entire index for
> the full range) it's still not too bad (1.3 seconds or so).
>
> The point is that to assume, knowing nothing about the data, that the
> data is in an even distribution is only a valid strategy if the worst
> case (when that assumption turns out to be wildly incorrect) is not
> catastrophic. That's not the case here.

True. The fundamental problem here is that the planner doesn't have any notion of a worst case. It only knows "cost", and that is a single number for each operation. For many operations, both the best case and the worst case are unusable as cost - the first would almost always underestimate the time and choose a plan which is far from optimal, and the second would almost always overestimate it and reject an optimal plan.

The art of programming a planner (which I've dabbled with in a previous (not postgresql-related) project but certainly can't claim any expertise in) lies in choosing a cost function which is quite close most of the time and catastrophically wrong only very rarely. It is clear that PostgreSQL hasn't succeeded in the latter category: correlated columns do occur, and the current cost function, which assumes that all columns are uncorrelated, can catastrophically underestimate the cost in this case.

The question is what can be done to improve the situation.

Tom thinks that correlation statistics would help. That seems plausible to me.

You claim that no statistics are needed. That may or may not be true: You haven't proposed an alternate method yet. I feel fairly certain that using the worst case (the cost for scanning the whole table) would be just as bad and would cause inferior plans to be used in many instances.

Maybe computing the cost as a weighted average of the best, average and worst case (e.g. cost = cost_best*0.05 + cost_avg*0.90 + cost_worst*0.05) would penalize methods with a large spread between best and worst case enough - but that still leaves the problem of determining the weights and determining what the "average" is. So it's the same black magic as now, just a little more complicated (on the plus side, this would probably be a relatively simple patch).

If we assume that we could revamp the planner completely, other possibilities come to mind. For example, since I think that the core problem is having a single number for the cost, the planner could instead compute a distribution (in the most simple case just best and worst case, but ideally many values). Then the planner could say something like: I have two plans, A and B, and A is at most 20 % faster in almost all cases. But in the worst case, A is 1000 times slower. Being 20 % faster most of the time is nice but doesn't outweigh the risk of being 1000 times slower sometimes, so I'll use B anyway.
Another possibility I've been considering for some time is feeding back the real execution times into the planner, but that sounds like a major research project. (Actually I think Oracle does something like this since version 12.)

        hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
On 12 March 2016 at 18:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> On 2016-03-08 10:16:57 +0000, Geoff Winkless wrote:
>> On 7 March 2016 at 20:40, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>> > As Tom wrote, the estimate of having to read only about 140 rows is only
>> > valid if sc_id and sc_date are uncorrelated. In reality your query has
>> > to read a lot more than 140 rows, so it is much slower.
>>
>> But as I've said previously, even if I do select from scdate values
>> that I know to be in the first 1% of the data (supposedly the perfect
>> condition) the scan method is insignificantly quicker than the index
>> (scdate,scid) method.
>
> Actually the planner expects to find a match within the first 0.0035 %, so
> to find out how fast that would be you would have to use a value from
> that range.

Fair point, although given my data I can't create a query that does that, since the first 0.3% or so all have the same value (although I suppose I could modify the data to do so); however the thing I was getting at was more that it doesn't have to be very far off perfect distribution for the second method to be "better".

> The question is what can be done to improve the situation.
>
> Tom thinks that correlation statistics would help. That seems plausible
> to me.

I'm sure that correlation statistics would help in this one instance (if Tom thinks so, I certainly wouldn't argue!). I was more looking at the planner choices in general, because I assumed (perhaps incorrectly) that correlated columns aren't the only time this sort of best/worst scenario hits it.

> You claim that no statistics are needed.

Well that's a bit confrontational.

> That may or may not be true: You haven't proposed an alternate method
> yet.

You could make an assumption that perfect distribution isn't true: that actually the distribution is within a certain _deviation_ of that perfect distribution. It wouldn't have to have been very much to make the index-only scan win here and would still keep the planner from choosing less optimal queries most of the time (and where it did end up making the "wrong" choice it's not going to be far off anyway).

But I'm making assumptions here, I'm aware of that. Chances are that actually most people's data _does_ fit into this perfect distribution set. Is there any research that shows that real-world data usually does?

> I feel fairly certain that using the worst case (the cost for scanning
> the whole table) would be just as bad and would cause inferior plans
> to be used in many instances.

Oh I certainly agree with that. As Jeff points out I'd have a much larger win in this instance by someone spending the time implementing skip index scans rather than messing with the planner :)

In the end I bit the bullet and converted the code to use LEAST((SELECT MIN..),(SELECT MIN...),(SELECT MIN...)), which (being an equality test) happily uses the (scdate,scid) index. Since the query ends up producing an equivalent set I could (mostly) do a search-replace, which made it a much simpler change than (for example) the ,COUNT(*) version.

Geoff
On 2016-03-12 21:00:04 +0000, Geoff Winkless wrote:
> On 12 March 2016 at 18:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> > The question is what can be done to improve the situation.
> >
> > Tom thinks that correlation statistics would help. That seems plausible
> > to me.
[...]
> > You claim that no statistics are needed.
>
> Well that's a bit confrontational.

Sorry. Didn't want to sound confrontational. I was just repeating points made by Tom and you previously in this thread to establish a baseline.

> > That may or may not be true: You haven't proposed an alternate method
> > yet.
>
> You could make an assumption that perfect distribution isn't true:
> that actually the distribution is within a certain _deviation_ of that
> perfect distribution. It wouldn't have to have been very much to make
> the index-only scan win here and would still keep the planner from
> choosing less optimal queries most of the time (and where it did end
> up making the "wrong" choice it's not going to be far off anyway).
>
> But I'm making assumptions here, I'm aware of that. Chances are that
> actually most people's data _does_ fit into this perfect distribution
> set. Is there any research that shows that real-world data usually
> does?

I don't think most people's data is perfectly distributed. But as you say most data is probably within some deviation of being perfectly distributed and as long as that deviation isn't too big it doesn't matter.

But there are certainly some common examples of highly correlated columns. Having a serial id and a date as in your case is probably quite common. Another example might be a surrogate primary key which is computed from some other fields (e.g. a timeseries code starting with a country code, or a social security number starting with the birth date, ...). That's probably not that uncommon either.

So, I agree with you. This is a problem and it should be fixed. I'm just sceptical that it can be done with a simple cost adjustment.

> As Jeff points out I'd have a much larger win in this instance by
> someone spending the time implementing skip index scans rather than
> messing with the planner :)

Yeah. I think I have some code which could benefit from this, too. I'll have to try that trick from the wiki.

        hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
On Tue, Mar 8, 2016 at 2:16 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> On 7 March 2016 at 20:40, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>> As Tom wrote, the estimate of having to read only about 140 rows is only
>> valid if sc_id and sc_date are uncorrelated. In reality your query has
>> to read a lot more than 140 rows, so it is much slower.
>
> But as I've said previously, even if I do select from scdate values
> that I know to be in the first 1% of the data (supposedly the perfect
> condition) the scan method is insignificantly quicker than the index
> (scdate,scid) method.

That is sure not the case in my hands. If I select from the first 1%, I get the (scid) index being 10 times faster than (scdate,scid), and (scid,scdate) being 50 times faster.

> Even with the absolute perfect storm (loading in the entire index for
> the full range) it's still not too bad (1.3 seconds or so).
>
> The point is that to assume, knowing nothing about the data, that the
> data is in an even distribution is only a valid strategy if the worst
> case (when that assumption turns out to be wildly incorrect) is not
> catastrophic. That's not the case here.

That just makes someone else's catastrophic worst case come to the fore.

Cheers,

Jeff
On Sat, Mar 12, 2016 at 1:00 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
>
> As Jeff points out I'd have a much larger win in this instance by
> someone spending the time implementing skip index scans rather than
> messing with the planner :)

But I think the hardest part of implementing skip index scans would probably be getting the planner to understand the cost of using them! So one way or another, the planner has to get messed with.

Cheers,

Jeff
On 12 March 2016 at 22:00, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> I don't think most people's data is perfectly distributed. But as you
> say most data is probably within some deviation of being perfectly
> distributed and as long as that deviation isn't too big it doesn't
> matter.

Is that how what I wrote came across? I was trying to say exactly the opposite: that if, instead of assuming perfect distribution, you assume a certain deviation from that distribution, you will end up with plans that would win with perfect distribution losing out to plans that are almost as good on that perfect distribution but behave better on data that is not perfectly distributed.

Geoff