
Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date	November 20, 2014 18:30:53
Msg-id	CAM3SWZStRMTxbow+j70DHPJ4VuVkf2=fvjY0WUH9zbd4GLOctA@mail.gmail.com
In response to	Re: Doing better at HINTing an appropriate column within errorMissingColumn() (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Doing better at HINTing an appropriate column within errorMissingColumn()
List	pgsql-hackers


On Thu, Nov 20, 2014 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> In general, I think the cost of a bad suggestion is much lower than
>> the benefit of a good one. You seem to be suggesting that they're
>> equal. Or that they're equally likely in an organic situation. In my
>> estimation, this is not the case at all.
>
> The way I see it, the main cost of a bad suggestion is that it annoys
> the user with clutter which they may brand as "stupid".  Think about
> how much vitriol has been spewed over the years against progress bars
> (or estimated completion times) that don't turn out to mirror reality.

Well, you can judge the quality of the suggestion immediately. I
imagined a mechanism that gives a little bit more than the minimum
amount of guidance for things like contractions/abbreviations.

> Microsoft has gotten more cumulative flack about their inaccurate
> progress bars over the years than they would have for dropping an
> elevator on a cute baby.

I haven't used a more recent version of Windows than Windows Vista,
but I'm pretty sure that they kept it up.

>> I'm curious about your thoughts on the compromise of a ramped up
>> distance threshold to apply a test for the absolute quality of a
>> match. I think that the fact that git gives bad suggestions with terse
>> strings tells us a lot, though. Note that unlike git, with terse
>> strings we may well have a good deal more equidistant matches, and as
>> soon as the number of would-be matches exceeds 2, we actually give no
>> matches at all. So that's an additional protection against poor
>> matches with terse strings.
>
> I don't know what you mean by a ramped-up distance threshold, exactly.
> I think it's good for the distance threshold to be lower for small
> strings and higher for large ones.  I think I'm somewhat open to
> negotiation on the details, but I think any system that's going to
> suggest "quantity" for "tit" is going too far.
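The protection against equidistant matches mentioned in the quoted text above can be illustrated with a minimal Python sketch. This is not the patch's C code: `levenshtein` is a plain unit-cost edit distance, and `suggest_columns` is a hypothetical helper that hints the closest column name(s) but gives up when more than two candidates tie for the best distance. No absolute distance threshold is applied here; that is a separate test.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def suggest_columns(typed: str, columns: list[str], max_matches: int = 2) -> list[str]:
    """HYPOTHETICAL helper: return the closest column name(s), or [] when the
    best distance is shared by more than max_matches candidates."""
    if not columns:
        return []
    scored = [(levenshtein(typed, c), c) for c in columns]
    best = min(d for d, _ in scored)
    ties = [c for d, c in scored if d == best]
    # Too many equidistant candidates: no single hint is trustworthy.
    return ties if len(ties) <= max_matches else []


print(suggest_columns("quanttiy", ["id", "quantity", "price"]))  # ['quantity']
print(suggest_columns("ab", ["aa", "bb", "cb"]))                 # []
```

With terse strings many column names tend to sit at the same small distance, so the tie cut-off suppresses hints exactly where they would be least reliable.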

I mean the suggestion of raising the cost threshold more gradually,
rather than as a step function of the number of characters in the
string [1], where a string of more than 6 characters must pass the 50%
test and a shorter one gets no absolute quality test at all. FWIW, the
exact modification I described will remove the "quantity" for "qty"
suggestion, as well as all the similar suggestions that you found
objectionable (like "tit" also offering a suggestion of "quantity").

If you look at the regression tests, none of the sensible suggestions
are lost (some would be by an across-the-board 50% absolute quality
threshold, as I previously pointed out [2]), but all the bad ones are.
I attach failed regression test output showing the difference between
the previous expected values and the actual values with that small
modification; it looks like most or all bad cases are now fixed.
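To make the step-function versus gradual-ramp distinction concrete, here is a minimal Python sketch. `passes_step` follows the step rule as described in this message, with one assumption: that the 50% test compares the edit distance against half the length of the typed string. `passes_ramp` is a hypothetical gradual threshold (one edit allowed, plus one more per four characters typed) chosen only for illustration; it is not the exact modification referred to above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def passes_step(typed: str, distance: int) -> bool:
    # Step function: strings over 6 characters must pass a 50% test,
    # shorter ones get no absolute quality test at all.
    # ASSUMPTION: "50%" is measured against the typed string's length.
    if len(typed) > 6:
        return distance <= len(typed) / 2
    return True


def passes_ramp(typed: str, distance: int) -> bool:
    # HYPOTHETICAL ramp: the allowed distance grows with every 4 characters.
    return distance <= 1 + len(typed) // 4


for typed in ("qty", "tit", "quanttiy"):
    d = levenshtein(typed, "quantity")
    print(typed, d, passes_step(typed, d), passes_ramp(typed, d))
```

Both "qty" and "tit" are 5 edits from "quantity"; the step rule lets them through because they are short, while the ramp rejects them. The transposition typo "quanttiy" (2 edits) passes under both rules.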

> If the user types
> "qty" when they meant "quantity", they probably don't really need the
> hint, because they're going to say to themselves "wait, I guess I
> didn't abbreviate that".  The time when they need the hint is when
> they typed "quanttiy", because it's quite possible to read a query
> with that sort of typo multiple times and not realize that you've made
> one.

I agree that that's a more important case.
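The typo versus abbreviation distinction shows up directly in the numbers. A minimal Python sketch using unit-cost Levenshtein distance (the same costs as the default form of `levenshtein()` in PostgreSQL's fuzzystrmatch module), normalized by the longer string's length; the normalization is an illustrative choice, not something specified in this thread:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def edit_fraction(typed: str, column: str) -> float:
    """Edits needed, as a fraction of the longer string's length."""
    return levenshtein(typed, column) / max(len(typed), len(column))


for typed in ("quanttiy", "qty"):
    print(typed, levenshtein(typed, "quantity"), edit_fraction(typed, "quantity"))
# quanttiy 2 0.25
# qty 5 0.625
```

The transposed typo costs 2 edits out of 8 characters, while the abbreviation costs 5, so a threshold on this fraction can separate these two cases.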

> In other words, I think there's value in trying to clue somebody in
> when they've made a typo, but not when they've made a think-o.  We
> won't be able to do the latter accurately enough to make it more
> useful than annoying.

That's certainly true; I think that we only disagree about the exact
point at which we enter the think-o correction business.

[1] http://www.postgresql.org/message-id/CAM3SWZT+7hH29Go6ZuY2OrCS40=6yPVM_nt9NjfovP3XwjixDw@mail.gmail.com
[2] http://www.postgresql.org/message-id/CAM3SWZTSGokNhT8rK+0Eed7spNJg4pAdMbqqYi0FH9bWcNvTGA@mail.gmail.com
--
Peter Geoghegan

Attachment: regression.diffs
