Re: [HACKERS] What about LIMIT in SELECT ? - Mailing list pgsql-hackers
From | jwieck@debis.com (Jan Wieck) |
---|---|
Subject | Re: [HACKERS] What about LIMIT in SELECT ? |
Date | |
Msg-id | m0zTRrE-000EBRC@orion.SAPserv.Hamburg.dsh.de Whole thread Raw |
In response to | Re: [HACKERS] What about LIMIT in SELECT ? (Oleg Bartunov <oleg@sai.msu.su>) |
Responses |
Re: [HACKERS] What about LIMIT in SELECT ?
|
List | pgsql-hackers |
Oleg Bartunov wrote: > This is not a problem for CGI-script to know which key to start. Never meant that would be a problem. A FORM variable will of course do this. > Without LIMIT every CGI call backend will do *FULL* selection > and cursor helps just in fetching a definite number of rows, > in principle I can do this with CGI-script. Also, cursor > returns data back in ASCII format (man l declare) and this requires > additional job for backend to convert data from intrinsic (binary) > format. Right implementation of LIMIT offset,number_of_rows could be > a great win and make postgres superior free database engine for > Web applications. Many colleagues of mine used mysql instead of That's the point I was missing. The offset! > postgres just because of lacking LIMIT. Tatsuo posted a patch > for set query_limit to 'num', I just tested it and seems it > works fine. Now, we need only possibility to specify offset, > say > set query_limit to 'offset,num' > ( Tatsuo, How difficult to do this ?) > and LIMIT problem will ne gone. Think you haven't read my posting completely. Even with the executor limit, the complete scan into the sort is done by the backend. You need to specify ORDER BY to get the same list again (without the offset doesn't make sense). But currently, ORDER BY forces a sort node into the query plan. What the executor limit tells is how many rows will be returned from the sorted data. Not what goes into the sort. Filling the sort and sorting the data consumes the most time of the queries execution. I haven't looked at Tatsuo's patch very well. But if it limits the amount of data going into the sort (on ORDER BY), it will break it! The requested ordering could be different from what the choosen index might return. The used index is choosen by the planner upon the qualifications given, not the ordering wanted. So if you select WHERE b = 1 ORDER BY a, then it will use an index on attribute b to match the qualification. The complete result of that index scan goes into the sort to get ordered by a. If now the executor limit stops sort filling after the limit is exceeded, only the same tuples will go into the sort every time. But they have nothing to do with the requested order by a. What LIMIT first needs is a planner enhancement. In file backend/optimizer/plan/planner.c line 284 it must be checked if the actual plan is an indexscan, if the indexed attributes are all the same as those in the given sort clause and that the requested sort order (operator) is that what the index will return. If that all matches, it can ignore the sort clause and return the index scan itself. Second enhancement must be the handling of the offset. In the executor, the index scan must skip offset index tuples before returning the first. But NOT if the plan isn't a 1-table-index-scan. In that case the result tuples (from the topmost unique/join/whatever node) have to be skipped. With these enhancements, the index tuples to be skipped (offset) will still be scanned, but not the data tuples they point to. Index scanning might be somewhat faster. This all will only speedup simple 1-table-queries, no joins or if the requested order isn't that what the index exactly returns. Anyway, I'll take a look if I can change the planner to omit the sort if the tests described above are true. I think it would be good anyway. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
pgsql-hackers by date: