Re: not exactly a bug report, but surprising behaviour - Mailing list pgsql-general
| From | Greg Stark |
|---|---|
| Subject | Re: not exactly a bug report, but surprising behaviour |
| Date | |
| Msg-id | 87vfzz5lv3.fsf@stark.dyndns.tv |
| In response to | Re: not exactly a bug report, but surprising behaviour (Stephan Szabo <sszabo@megazone23.bigpanda.com>) |
| Responses | Re: not exactly a bug report, but surprising behaviour; Re: not exactly a bug report, but surprising behaviour |
| List | pgsql-general |
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

> > This would have a couple of big advantages:
> >
> > 1) If rows were excluded from the results by limit/offset clauses, then
> > possibly expensive functions in the select list wouldn't have to be
> > calculated. It also means that if the sort is quick but the functions are
> > slow, the first rows would be returned quickly to the application even if
> > the total time was the same.
>
> That's a useful advantage. You can probably get this effect from pushing
> clauses into from-list subselects as a current optimization, but it'd be
> nice since that's not an obvious conversion.

Not if the clause depends on the sort results, such as limit/offset or
having clauses.

> > 2) The sort could be done on pointers to the tuples rather than pushing
> > the data in the tuples around in the sort. Obviously the transaction
> > integrity issues are tricky with this, though. But it could save a lot of
> > memory pressure from sorts.
>
> This'd be an advantage in the case of a single table. Unless you can
> guarantee the join keeps the ordering properties, I don't think this'd
> help with anything that's doing joins or anything more complicated.

Joins shouldn't be a problem: just sort a list of pointers to records and
keep a structure on the side saying which table each pointer corresponds to
and which columns and expressions are requested from each. I'm not sure how
to retain transactional integrity with this, though.

The tricky part would be evaluating when it would help. When sorting many
records with a narrow result set it wouldn't pay to generate pointers that
are just as large as the original data and might push the original data out
of cache. But for sorting a limited number of records with many columns
each, it would be far more efficient.

I'm watching a query I'm working on right now drive the CPU to 100% for 6+
hours with virtually no I/O, and I think it's the best I can do. All the
time is being spent moving bits around from one place to another; I'm
entirely blocked on memory throughput. The less data actually gets pushed
around, the faster sorts will be.

It occurs to me that postgres may be doing this already at a lower level of
abstraction: the sort engine could load all the records to be sorted into an
array, generate a list of indexes into that array, and sort the indexes. But
I get the impression from the previous conversation that that's not what's
happening.

I'm constantly surprised by how much work there is to build a good database
engine. But I'm also constantly amazed by how much postgres is already
doing. It's really quite remarkable how complete it is.

--
greg
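A minimal sketch of the pointer-sort idea described above, assuming made-up
names (Row, cmp_row_ptr, NROWS); this is not PostgreSQL source, just an
illustration of sorting a side array of pointers so that only pointer-sized
items move during the sort, while the wide payloads stay put until the
result is read out:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int  key;           /* sort key */
    char payload[256];  /* wide columns that never need to move during the sort */
} Row;

/* Comparator sees only pointers; each swap moves sizeof(Row *) bytes
 * instead of sizeof(Row) bytes. */
static int cmp_row_ptr(const void *a, const void *b)
{
    const Row *ra = *(const Row * const *)a;
    const Row *rb = *(const Row * const *)b;
    return (ra->key > rb->key) - (ra->key < rb->key);
}

int main(void)
{
    enum { NROWS = 5 };
    Row rows[NROWS] = {
        {42, "e"}, {7, "a"}, {19, "c"}, {3, "b"}, {28, "d"}
    };

    /* Build the side array of pointers, then sort only that. */
    Row *order[NROWS];
    for (int i = 0; i < NROWS; i++)
        order[i] = &rows[i];

    qsort(order, NROWS, sizeof(Row *), cmp_row_ptr);

    /* The wide payloads are only touched here, when the sorted result is
     * read out -- e.g. after LIMIT/OFFSET has discarded most rows. */
    for (int i = 0; i < NROWS; i++)
        printf("%d %s\n", order[i]->key, order[i]->payload);

    return 0;
}
```

The trade-off discussed in the post shows up directly here: if Row were only
a few bytes wide, the pointer array would roughly double the memory touched,
but with wide payloads the sort itself moves far less data.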