Re: Help with slow table update - Mailing list pgsql-general
From:           Jim Nasby
Subject:        Re: Help with slow table update
Msg-id:         552D94C6.4030004@BlueTreble.com
In response to: Re: Help with slow table update (Pawel Veselov <pawel.veselov@gmail.com>)
Responses:      Re: Help with slow table update
List:           pgsql-general
On 4/14/15 4:44 PM, Pawel Veselov wrote:
> On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim.Nasby@bluetreble.com
> <mailto:Jim.Nasby@bluetreble.com>> wrote:
>
> > On 4/14/15 1:28 PM, Pawel Veselov wrote:
> >
> > > I wonder if what I need to do, considering that I update a lot of "the
> > > same" rows as I process this queue, is to create a temp table, update
> > > the rows there, and then update the actual tables once at the end...
> >
> > That's what I'd do.
>
> Well, in short, I changed (repeating the body of the loop for however
> many tables there are)
>
>     LOOP (item)
>       UPDATE table with item
>       IF not found INSERT item INTO table; END IF;
>     END LOOP;
>
> to:
>
>     CREATE TEMP TABLE xq_table (like table) on commit drop;
>     LOOP (item)
>       LOOP
>         UPDATE xq_table with item;
>         exit when found;
>         INSERT INTO xq_table select * from table for update;
>         continue when found;
>         INSERT item INTO xq_table;
>         exit;
>       END LOOP;
>     END LOOP;
>     UPDATE table a set (rows) = (xq.rows)
>       FROM xq_table xq
>       WHERE (a.keys) = (xq.keys)
>
> That works significantly faster. The final update statement is very
> fast. The process is somewhat slow in the beginning as it sucks in
> records from "total" into "xq_total", but once all of that is moved into
> the temp table, it rushes through the rest.

Databases like to think in sets. It will generally be more efficient to do
set operations instead of a bunch of row-by-row stuff.

Since you're pulling all of this from some other table, your best bet is
probably something like:

    CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;

    CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY ...;
    UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
    INSERT INTO ar_hourly SELECT ... FROM hourly_v LEFT JOIN ar_hourly ON ...;

    -- Same thing for daily
    -- Same thing for total

(A slightly more fleshed-out version of this sketch is further down in
this message.)

> > The other option would be to use a constraint trigger paired with a
> > per-row trigger on the hourly table to drive the daily table, and on
> > the daily table to drive the total table. The way that would work is
> > the per-row trigger would simply keep track of all the unique records
> > that were changed in a statement (presumably by putting them in a
> > temp table). Once the statement is "done", the constraint trigger
> > would fire; it would summarize all the changed data and do a much
> > smaller number of updates to the table being summarized into.
>
> I'm not sure how I would be able to avoid the same number of changes on
> the total table; the trigger would fire on each update, wouldn't it? So,
> same problem with a lot of changes on a table...

The difference is that you'd be doing plain INSERTs into a temp table and
then summarizing that. That's going to be a LOT more efficient than a slew
of updates on an existing table.

> > BTW, you also made a comment about not having to hit the table if
> > you look at something in an index. You can only do that if all the
> > data you need is in the index, AND the page with the record is
> > marked as being all-visible (google for Postgres Visibility Map). If
> > that's not the case then you still have to pull the row from the table
> > in order to determine visibility. The only case where you can still
> > avoid hitting the table is something like a NOT EXISTS; if you can't
> > find any entries in the index for something then they definitely
> > won't be in the table.
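To make that sketch a bit more concrete, here is roughly how it could look
for just the hourly table. Treat it as a rough sketch only: queue and
ar_hourly are names from this thread, but every column in it (device_id,
event_time, hour_start, hits) is made up, the summary column is assumed to
be a simple additive counter, and the DELETE ... RETURNING goes through a
CTE feeding an INSERT because CREATE TABLE AS only accepts a SELECT:

    BEGIN;

    -- Pull the whole batch out of the queue into a temp table in one shot,
    -- so a failure later in the transaction doesn't lose any queue rows.
    CREATE TEMP TABLE raw (LIKE queue);

    WITH moved AS (
        DELETE FROM queue
         WHERE device_id = 123          -- whatever limits the batch
        RETURNING *
    )
    INSERT INTO raw SELECT * FROM moved;

    -- Summarize the whole batch once, instead of once per queue row.
    CREATE TEMP VIEW hourly_v AS
    SELECT device_id,
           date_trunc('hour', event_time) AS hour_start,
           count(*) AS hits
      FROM raw
     GROUP BY 1, 2;

    -- Update the summary rows that already exist...
    UPDATE ar_hourly a
       SET hits = a.hits + v.hits
      FROM hourly_v v
     WHERE a.device_id  = v.device_id
       AND a.hour_start = v.hour_start;

    -- ...and insert the ones that don't.
    INSERT INTO ar_hourly (device_id, hour_start, hits)
    SELECT v.device_id, v.hour_start, v.hits
      FROM hourly_v v
      LEFT JOIN ar_hourly a ON a.device_id  = v.device_id
                           AND a.hour_start = v.hour_start
     WHERE a.device_id IS NULL;

    -- Same pattern for daily and total, reusing raw.

    COMMIT;

That's one UPDATE and one INSERT per summary table, no matter how many rows
were sitting in the queue; the temp objects simply go away when the session
ends.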
> What I was saying is that if a table has a unique index, and there is a
> cached fact that a particular index value points to a particular row,
> there shouldn't be a need to re-scan the index again to search for any
> more matching values (which would be necessary if the index was not
> unique). Again, all depending on the size of the index, the number of
> different index values being queried, etc.

It still has to rescan because of visibility concerns.

> > But remember that if you update or delete a row, removing it from an
> > index, the data will stay in that index until vacuum comes along.
> >
> > Also, there's no point in doing a REINDEX after a VACUUM FULL;
> > vacuum full rebuilds all the indexes for you.
>
> I was being desperate :)
>
> I still think there is something very wrong with this particular table.
> First, I have production systems that employ this function on much larger
> data sets, and there is no problem (so far, at least). This machine is
> part of a test deployment; there is no constant load, and the only data
> being written now is from these tests. Vacuuming should prune all that
> dead stuff, and if it's absent, it's unclear where the time is spent
> navigating/updating a table with 24 rows :)

I think you definitely have a problem with dead rows, as evidenced by the
huge improvement VACUUM FULL made.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
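P.S. If you want to see how bad the dead-row situation is without running
another VACUUM FULL, the statistics views already track it. Something like
this (ar_hourly is from this thread; ar_daily and ar_total are only guesses
at what the other tables are called):

    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
      FROM pg_stat_user_tables
     WHERE relname IN ('ar_hourly', 'ar_daily', 'ar_total')
     ORDER BY n_dead_tup DESC;

n_live_tup and n_dead_tup are only estimates, but a dead-tuple count that
dwarfs 24 live rows would back up the bloat theory, and last_autovacuum
shows whether autovacuum is reaching the table at all.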