Re: Increasing GROUP BY CHAR columns speed - Mailing list pgsql-performance
From | Scott Carey |
---|---|
Subject | Re: Increasing GROUP BY CHAR columns speed |
Date | |
Msg-id | BDFBB77C9E07BE4A984DAAE981D19F961ACA17D9CE@EXVMBX018-1.exch018.msoutlookonline.net |
In response to | Re: Increasing GROUP BY CHAR columns speed ("Scott Marlowe" <scott.marlowe@gmail.com>) |
Responses | Re: Increasing GROUP BY CHAR columns speed |
List | pgsql-performance |
I'm not sure that postgres allocates the whole work_mem each time, and in any event, the OS won't actually commit a page to RAM until it is used. My experience is that oversized work_mem doesn't hurt until it is actually used. Otherwise, my configuration with 1000 concurrent connections and work_mem = 1GB would have blown up; I don't have that much RAM + SWAP * overcommit. Of the 1000 connections, only 8 run queries that would ever need more than 2 or 3 MB of space to execute. Of course, one has to be certain what the client connections do before setting work_mem that oversized, so I would not recommend the above in general.

----------

Back to this query:

In the query case shown, the explain analyze shows:

"GroupAggregate (cost=43403.38..52521.63 rows=41923 width=838) (actual time=8083.171..8620.908 rows=577 loops=1)"

Thus, the planner thought that it needed ~40K-ish rows for results of ~800 bytes in size, hence an approximation of the required hash space is 80M. However, the query returns only 577 rows, so the actual needed hash space is much smaller. This is a common problem I've seen -- the query planner has very conservative estimates for result row counts from any non-trivial filter condition / aggregate, which leads to poor query plans.

I'd be surprised if this query used more than 1MB of total work_mem in reality for that last step if it used a hash. As it stands, sorting will actually use much more.

I'm still not sure why the planner chose to sort rather than hash with oversized work_mem (is there an implied order in the query results I missed?). My guess is that this query can still get much faster if a hash is possible on the last part. It looks like the gain so far has more to do with sorting purely in memory, which reduced the number of compares required. But that is just a guess.

________________________________________
From: Scott Marlowe [scott.marlowe@gmail.com]
Sent: Friday, November 28, 2008 10:24 AM
To: Andrus
Cc: Scott Carey; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Increasing GROUP BY CHAR columns speed

On Fri, Nov 28, 2008 at 10:58 AM, Andrus <kobruleht2@hot.ee> wrote:
>
> SET work_mem = 2097151 (this is max allowed value) or SET work_mem = 97151
> decreases query time from 12 seconds to 9 seconds.

Setting work_mem so high that it allocates memory that isn't there WILL slow down your queries, because the OS will then wind up swapping one part of memory out to make room for another. There are values between 100M and 2G. Run the query with work_mem increasing from 100 to 500 or so Meg and see if that helps. Keep an eye on vmstat 1 or something similar to see if your machine starts swapping while the query runs. If it does, you've overallocated memory and things will start to slow down a LOT.
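As a rough sketch of that experiment -- the table and column names below are made-up stand-ins for the real query, and note that work_mem's bare integer is interpreted as KB:

    -- In a second terminal, watch for swapping while the query runs:
    --   vmstat 1

    -- Raise work_mem for this session only; the integer is in KB, so 512MB is:
    SET work_mem = 524288;

    -- Re-check the plan: a HashAggregate node on top means the hash strategy
    -- was chosen; a Sort feeding a GroupAggregate means it is still sorting.
    EXPLAIN ANALYZE
    SELECT char_col1, char_col2, count(*)
    FROM some_table
    GROUP BY char_col1, char_col2;

    -- Undo the session override when done:
    RESET work_mem;

Since the planner only picks HashAggregate when the estimated hash table fits in work_mem, the smallest setting at which the plan flips from Sort + GroupAggregate to HashAggregate gives a rough read on how much space the planner thinks the hash needs.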