Re: Cursor-based results: bafflingly slow - Mailing list pgsql-jdbc
From | Oliver Jowett |
---|---|
Subject | Re: Cursor-based results: bafflingly slow |
Date | |
Msg-id | 4A4E11EB.4040809@opencloud.com Whole thread Raw |
In response to | Cursor-based results: bafflingly slow (Robin Houston <robin.houston@gmail.com>) |
Responses |
Re: Cursor-based results: bafflingly slow
|
List | pgsql-jdbc |
Robin Houston wrote: > 2009/7/3 Oliver Jowett <oliver@opencloud.com <mailto:oliver@opencloud.com>>: >> They're not really equivalent. A portal with a limited fetchsize is more >> like DECLARE CURSOR, not LIMIT. >> >> What does the plan look like if you plan it without the LIMIT? > > Hmm, well... If I do the same as before, but without the limit, then the > plan is: [.. different plan with an outermost Sort step ..] > Surely the server is not so stupid as to use an unnecessary server-side > sort, if I've explicitly asked to fetch the rows 100 at a time? I'm not too familiar with the planner any more, but as I understand it this is what's happening: Think about the difference between (1) SELECT * FROM hugetable ORDER BY some_key; (2) SELECT * FROM hugetable ORDER BY some_key LIMIT 100; where there is a suitable index on some_key. For (1) a seqscan + local sort is likely to be faster (because a sequential scan is faster than doing a random-access index scan over the entire table) For (2) an index scan is likely to be faster, because you'll stop after at most 100 random-access lookups, which is cheaper than scanning the entire table. If you take case (1) and set the fetchsize to 100, that doesn't actually change things at all - it doesn't turn into case (2)! Portals let you grab results a bit at a time (and let you give up on the query halfway through easily), but they don't change the underlying plan at all; the server plans on the assumption that the query will run to completion. In fact, it doesn't even know the fetchsize that will eventually be used during execution when it is doing the planning step. (If you look back at your connection tracing, you'll see that the fetchsize is only provided in the Execute message; query planning happens back when the Parse message is sent) > Is there any easy way to find out what execution plan the server is > actually using? I suppose I could ask the DBAs to enable auto_explain on > the dev database. You should be able to issue an EXPLAIN ANALYZE via JDBC; I can't remember exactly how the results get back to you though (might be SQLWarnings, or a separate resultset) > If this *is* the problem, do you know any way round it. Obviously I want > the plan that *doesn't* involve sorting everything before returning > anything. If you want to stream results promptly, you may be stuck with removing that ORDER BY. I'm not sure if there's any way to say to the planner "give me a plan with a low startup cost, even if it has a higher total cost" which seems to be what you want here. Or if you really do only care about the first N rows, then put a LIMIT in your query. > If I explicitly create a cursor, with no limit, then it does return the > first 100 rows very quickly. On the other hand, I have to hard-code the > parameter this way (because I don't know any way, in psql, to use a bind > variable with an explicit cursor.) I came across some (very old) posts in the archives that suggested that DECLARE CURSOR has a fudge where it ranks plan costs assuming that only some rows (10%?) will actually be fetched. I'm not sure if that's true of the current planner. -O
pgsql-jdbc by date: