Re: Help with slow table update - Mailing list pgsql-general
From:           Jim Nasby
Subject:        Re: Help with slow table update
Msg-id:         552D94C6.4030004@BlueTreble.com
In response to: Re: Help with slow table update (Pawel Veselov <pawel.veselov@gmail.com>)
Responses:      Re: Help with slow table update
List:           pgsql-general
On 4/14/15 4:44 PM, Pawel Veselov wrote:
> On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim.Nasby@bluetreble.com
> <mailto:Jim.Nasby@bluetreble.com>> wrote:
>
> > On 4/14/15 1:28 PM, Pawel Veselov wrote:
> >
> > > I wonder if what I need to do, considering that I update a lot of "the
> > > same" rows as I process this queue, is to create a temp table, update
> > > the rows there, and then update the actual tables once at the end...
> >
> > That's what I'd do.
>
> Well, in short, I changed (repeating the body of the loop for however
> many tables there are)
>
>     LOOP (item)
>       UPDATE table with item
>       IF not found INSERT item INTO table; END IF;
>     END LOOP;
>
> to:
>
>     CREATE TEMP TABLE xq_table (like table) on commit drop;
>     LOOP (item)
>       LOOP
>         UPDATE xq_table with item;
>         exit when found;
>         INSERT INTO xq_table select * from table for update;
>         continue when found;
>         INSERT item INTO xq_table;
>         exit;
>       END LOOP;
>     END LOOP;
>     UPDATE table a set (rows) = (xq.rows)
>       FROM xq_table xq
>       WHERE (a.keys) = (xq.keys)
>
> That works significantly faster. The final update statement is very
> fast. The process is somewhat slow in the beginning as it sucks in
> records from "total" into "xq_total", but once all of that is moved into
> the temp table, it rushes through the rest.

Databases like to think in sets. It will generally be more efficient to do
set operations instead of a bunch of row-by-row stuff.

Since you're pulling all of this from some other table, your best bet is
probably something like:

    CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;

    CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY ...;
    UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
    INSERT INTO ar_hourly SELECT ... FROM hourly_v LEFT JOIN ar_hourly ON ...;

    -- Same thing for daily
    -- Same thing for total

(A slightly more fleshed-out version of this sketch is further down in
this message.)

> > The other option would be to use a constraint trigger paired with a
> > per-row trigger on the hourly table to drive the daily table, and on
> > the daily table to drive the total table. The way that would work is
> > the per-row trigger would simply keep track of all the unique records
> > that were changed in a statement (presumably by putting them in a
> > temp table). Once the statement is "done", the constraint trigger
> > would fire; it would summarize all the changed data and do a much
> > smaller number of updates to the table being summarized into.
>
> I'm not sure how I would be able to avoid the same number of changes on
> the total table; the trigger would fire on each update, wouldn't it? So,
> same problem with a lot of changes on a table...

The difference is that you'd be doing plain INSERTs into a temp table and
then summarizing that. That's going to be a LOT more efficient than a slew
of updates on an existing table.

> > BTW, you also made a comment about not having to hit the table if
> > you look at something in an index. You can only do that if all the
> > data you need is in the index, AND the page with the record is
> > marked as being all-visible (google for Postgres Visibility Map). If
> > that's not the case then you still have to pull the row from the table
> > in order to determine visibility. The only case where you can still
> > avoid hitting the table is something like a NOT EXISTS; if you can't
> > find any entries in the index for something then they definitely
> > won't be in the table.
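To make that sketch a bit more concrete, here is roughly how it could look
for just the hourly table. Treat it as a rough sketch only: queue and
ar_hourly are names from this thread, but every column in it (device_id,
event_time, hour_start, hits) is made up, the summary column is assumed to
be a simple additive counter, and the DELETE ... RETURNING goes through a
CTE feeding an INSERT because CREATE TABLE AS only accepts a SELECT:

    BEGIN;

    -- Pull the whole batch out of the queue into a temp table in one shot,
    -- so a failure later in the transaction doesn't lose any queue rows.
    CREATE TEMP TABLE raw (LIKE queue);

    WITH moved AS (
        DELETE FROM queue
         WHERE device_id = 123          -- whatever limits the batch
        RETURNING *
    )
    INSERT INTO raw SELECT * FROM moved;

    -- Summarize the whole batch once, instead of once per queue row.
    CREATE TEMP VIEW hourly_v AS
    SELECT device_id,
           date_trunc('hour', event_time) AS hour_start,
           count(*) AS hits
      FROM raw
     GROUP BY 1, 2;

    -- Update the summary rows that already exist...
    UPDATE ar_hourly a
       SET hits = a.hits + v.hits
      FROM hourly_v v
     WHERE a.device_id  = v.device_id
       AND a.hour_start = v.hour_start;

    -- ...and insert the ones that don't.
    INSERT INTO ar_hourly (device_id, hour_start, hits)
    SELECT v.device_id, v.hour_start, v.hits
      FROM hourly_v v
      LEFT JOIN ar_hourly a ON a.device_id  = v.device_id
                           AND a.hour_start = v.hour_start
     WHERE a.device_id IS NULL;

    -- Same pattern for daily and total, reusing raw.

    COMMIT;

That's one UPDATE and one INSERT per summary table, no matter how many rows
were sitting in the queue; the temp objects simply go away when the session
ends.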
> What I was saying is that if a table has a unique index, and there is a
> cached fact that a particular index value points to a particular row,
> there shouldn't be a need to re-scan the index again to search for any
> more matching values (which would be necessary if the index was not
> unique). Again, all depending on the size of the index, the number of
> different index values being queried, etc.

It still has to rescan because of visibility concerns.

> > But remember that if you update or delete a row, removing it from an
> > index, the data will stay in that index until vacuum comes along.
> >
> > Also, there's no point in doing a REINDEX after a VACUUM FULL;
> > vacuum full rebuilds all the indexes for you.
>
> I was being desperate :)
>
> I still think there is something very wrong with this particular table.
> First, I have production systems that employ this function on much larger
> data sets, and there is no problem (so far, at least). This machine is
> part of a test deployment; there is no constant load, and the only data
> being written now is from these tests. Vacuuming should prune all that
> dead stuff, and if it's absent, it's unclear where the time is spent
> navigating/updating a table with 24 rows :)

I think you definitely have a problem with dead rows, as evidenced by the
huge improvement VACUUM FULL made.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
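P.S. If you want to see how bad the dead-row situation is without running
another VACUUM FULL, the statistics views already track it. Something like
this (ar_hourly is from this thread; ar_daily and ar_total are only guesses
at what the other tables are called):

    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
      FROM pg_stat_user_tables
     WHERE relname IN ('ar_hourly', 'ar_daily', 'ar_total')
     ORDER BY n_dead_tup DESC;

n_live_tup and n_dead_tup are only estimates, but a dead-tuple count that
dwarfs 24 live rows would back up the bloat theory, and last_autovacuum
shows whether autovacuum is reaching the table at all.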