Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Doing better at HINTing an appropriate column within errorMissingColumn() |
Date | |
Msg-id | CAM3SWZStRMTxbow+j70DHPJ4VuVkf2=fvjY0WUH9zbd4GLOctA@mail.gmail.com |
In response to | Re: Doing better at HINTing an appropriate column within errorMissingColumn() (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Doing better at HINTing an appropriate column within errorMissingColumn()
|
List | pgsql-hackers |
On Thu, Nov 20, 2014 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> In general, I think the cost of a bad suggestion is much lower than
>> the benefit of a good one. You seem to be suggesting that they're
>> equal. Or that they're equally likely in an organic situation. In my
>> estimation, this is not the case at all.
>
> The way I see it, the main cost of a bad suggestion is that it annoys
> the user with clutter which they may brand as "stupid". Think about
> how much vitriol has been spewed over the years against progress bars
> (or estimated completion) times that don't turn out to mirror reality.

Well, you can judge the quality of the suggestion immediately. I
imagined a mechanism that gives a little bit more than the minimum
amount of guidance for things like contractions/abbreviations.

> Microsoft has gotten more cumulative flack about their inaccurate
> progress bars over the years than they would have for dropping an
> elevator on a cute baby.

I haven't used a more recent version of Windows than Windows Vista,
but I'm pretty sure that they kept it up.

>> I'm curious about your thoughts on the compromise of a ramped up
>> distance threshold to apply a test for the absolute quality of a
>> match. I think that the fact that git gives bad suggestions with terse
>> strings tells us a lot, though. Note that unlike git, with terse
>> strings we may well have a good deal more equidistant matches, and as
>> soon as the number of would-be matches exceeds 2, we actually give no
>> matches at all. So that's an additional protection against poor
>> matches with terse strings.
>
> I don't know what you mean by a ramped-up distance threshold, exactly.
> I think it's good for the distance threshold to be lower for small
> strings and higher for large ones. I think I'm somewhat open to
> negotiation on the details, but I think any system that's going to
> suggest "quantity" for "tit" is going too far.

I mean the suggestion of raising the cost threshold more gradually,
not as a step function of the number of characters in the string [1]
where it's either over 6 characters and must pass the 50% test, or
isn't and has no absolute quality test. The exact modification I
described will FWIW remove the "quantity" for "qty" suggestion, as
well as all the similar suggestions that you found objectionable (like
"tit" also offering a suggestion of "quantity"). If you look at the
regression tests, none of the sensible suggestions are lost (some
would be by an across the board 50% absolute quality threshold, as I
previously pointed out [2]), but all the bad ones are. I attach failed
regression test output showing the difference between the previous
expected values, and actual values with that small modification - it
looks like most or all bad cases are now fixed.

> If the user types
> "qty" when they meant "quantity", they probably don't really need the
> hint, because they're going to say to themselves "wait, I guess I
> didn't abbreviate that". The time when they need the hint is when
> they typed "quanttiy", because it's quite possible to read a query
> with that sort of typo multiple times and not realize that you've made
> one.

I agree that that's a more important case.

> In other words, I think there's value in trying to clue somebody in
> when they've made a typo, but not when they've made a think-o. We
> won't be able to do the latter accurately enough to make it more
> useful than annoying.

That's certainly true; I think that we only disagree about the exact
point at which we enter the think-o correction business.

[1] http://www.postgresql.org/message-id/CAM3SWZT+7hH29Go6ZuY2OrCS40=6yPVM_nt9NjfovP3XwjixDw@mail.gmail.com
[2] http://www.postgresql.org/message-id/CAM3SWZTSGokNhT8rK+0Eed7spNJg4pAdMbqqYi0FH9bWcNvTGA@mail.gmail.com

--
Peter Geoghegan
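
For illustration only, the difference between a step-function threshold
and a more gradual one can be sketched in Python. This is not the patch's
code. Several things here are assumptions made for the sketch: the
unit-cost Levenshtein distance, applying the test to the length of the
typed name, and in particular the shape of ramped_threshold(), which is a
hypothetical ramp and not the modification described in [1]. All function
names are invented for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs (assumed; the patch's weights may differ)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def step_threshold(n: int) -> float:
    """Step function: over 6 characters must pass a 50% test, else no test."""
    return n / 2 if n > 6 else float("inf")


def ramped_threshold(n: int) -> float:
    """Hypothetical gradual ramp: short names get a small absolute cap
    (at most 3 edits, never more than their own length), and the cap
    grows to 50% of the length for longer names."""
    return max(n / 2, min(n, 3))


def acceptable(typed: str, candidate: str, threshold) -> bool:
    """True if candidate is close enough to typed under the given threshold."""
    return levenshtein(typed, candidate) <= threshold(len(typed))


if __name__ == "__main__":
    for typed in ("qty", "tit", "quanttiy"):
        print(typed,
              levenshtein(typed, "quantity"),
              acceptable(typed, "quantity", step_threshold),
              acceptable(typed, "quantity", ramped_threshold))
```

Under these assumptions, "qty" and "tit" are both 5 edits away from
"quantity": the step function lets them through because short names get
no absolute test, while the ramp rejects them. The typo "quanttiy" is 2
edits away and passes under both.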