Home > mailing lists
Re: Performance problem with query - Mailing list pgsql-general

From	Christian Rengstl
Subject	Re: Performance problem with query
Date	July 19, 2006 09:03:58
Msg-id	44BE3BB1.0AD0.0080.0@klinik.uni-regensburg.de Whole thread Raw
In response to	Re: Performance problem with query (Q <qdolan@gmail.com>)
Responses	Re: Performance problem with query
List	pgsql-general
Tree view
So here's the master table including the rules:

entry_no int8 NOT NULL DEFAULT nextval('public.master_seq'::regclass),
  pid varchar(15) NOT NULL,
  val_1 varchar(1),
  val_2 varchar(1),
  chr int2 NOT NULL,
  aendat timestamp DEFAULT now(),
  aennam varchar(8),
  CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no),
  CONSTRAINT "UNIQUE_MASTER" UNIQUE (pid, entry_no)

CREATE OR REPLACE RULE "INSERT_INTO_1" AS
    ON INSERT TO public.master
   WHERE new.chr = 1 DO INSTEAD  INSERT INTO public.table_1 (entry_no, pid, val_1, val_2, chr, aendat, aennam)
  VALUES (new.entry_no, new.pid, new.val_1, new.val_2, new.chr, new.aendat, new.aennam);

Like this i have around 20 rules so far, but there might be more later on. The children tables are so far exactly as
themaster table. 

Q <qdolan@gmail.com> wrote on 07/19/06 1:52 pm:
> On 19/07/2006, at 8:49 PM, Christian Rengstl wrote:
>
>> Obviously it had something to do with the rule, because now
>> everything finished within 20 minutes. the problem is just that i
>> don't really want to give up the inheritance design. is there a way
>> to maintain the inheritance that doesn't cause this huge
>> performance problem?
>
> That is hard to say unless you post the rule and table schema you are
> currently using.
>
>> Q <qdolan@gmail.com> wrote on 07/19/06 11:54 am:
>>> On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:
>>>
>>>> The analyze is from the exact query and i dropped the indexes
>>>> before the insert as well without imrpvement. The target table is
>>>> as well completely empty and the insert is supposed to write, in
>>>> this case, more or less 8 million lines in the table. There is a
>>>> rule though, because i have inheritance table structure with one
>>>> master table and around 20 child tables.
>>>
>>> I would say the problem is in the rule.  Try doing the insert into a
>>> duplicate table with no rules or inheritance and see how long it
>>> takes.
>>>
>>> Perhaps you should provide the actual schema of tables and rules that
>>> are involved in the query in question.
>>>
>>>> Q <qdolan@gmail.com> wrote on 07/19/06 4:37 am:
>>>>>
>>>>> On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:
>>>>>
>>>>>> now finally after a long time i have the query plan for the whole
>>>>>> filled table. I hope somebody can explain me why it takes so much
>>>>>> longer...
>>>>>
>>>>>
>>>>> These explain analyze results don't appear to be from the
>>>>> queries you
>>>>> posted previously. For these results to mean anything you need to
>>>>> include the EXACT queries you used to generate them.
>>>>>
>>>>>> Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual
>>>>>> time=0.056..655772
>>>>>> .273 rows=8044000 loops=1)
>>>>>>   One-Time Filter: ((split_part(($1)::text, '_'::text,
>>>>>> 2))::smallint = 1)
>>>>>>   InitPlan
>>>>>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual
>>>>>> time=0.003..0.003 rows
>>>>>> =1 loops=1)
>>>>>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00
>>>>>> rows=8044000 width=
>>>>>> 10) (actual time=0.002..0.002 rows=1 loops=1)
>>>>>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual
>>>>>> time=0.006..0.007 rows
>>>>>> =1 loops=1)
>>>>>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00
>>>>>> rows=8044000 width=
>>>>>> 10) (actual time=0.004..0.004 rows=1 loops=1)
>>>>>>   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000
>>>>>> width=39) (act
>>>>>> ual time=0.002..191672.344 rows=8044000 loops=1)
>>>>>> Total runtime: 62259544.896 ms
>>>>>
>>>>>
>>>>> This is the query you want to be interested in, the others took no
>>>>> time at all.
>>>>>
>>>>> As a guess I would say the query is an INSERT INTO ... FROM
>>>>> SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
>>>>> The majority of the time appears to be taken up on the actual
>>>>> INSERT
>>>>> and not the SELECT part.
>>>>>
>>>>> How many rows are in the target table and what indexes does it
>>>>> have?
>>>>> Does it have any triggers, check constraints, or rules applied
>>>>> to it?
>>>>> All these things can make the insert take longer as the number of
>>>>> rows you have already in the table increases.
>>>>>
>>>>> More than likely you have a target table with a LOT of rows and a
>>>>> bunch of indexes on it and your disks are being thrashed because
>>>>> the
>>>>> indexes are not able to stay cached in RAM. At this point you
>>>>> should
>>>>> ensure your machine is not swapping do disk, and at the very least
>>>>> you should go through one of the many tuning guidelines
>>>>> available and
>>>>> ensure you have allocated the appropriate amount of memory to
>>>>> postgresql for your needs.
>>>>>
>>>>> You may also want to consider dropping the indexes before you do
>>>>> the
>>>>> INSERT and recreate them afterwards.
>>>>>
>>>>>
>>>>>
>>>>>>>>> "Christian Rengstl" <Christian.Rengstl@klinik.uni-
>>>>>>>>> regensburg.de>
>>>>>>>>> 13.07.06 8.37 Uhr >>>
>>>>>> Good morning list,
>>>>>>
>>>>>> the following query takes about 15 to 20 minutes for around 2
>>>>>> million lines in the file myfile.txt, but with 8 million lines it
>>>>>> takes around 5 hours and i just don't understand why there is such
>>>>>> a huge discrepancy in performance.
>>>>>>
>>>>>> COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';
>>>>>>
>>>>>> INSERT INTO public.master(pid,smid, val1, val2, chr)
>>>>>>             SELECT pid, smid, val1, val12, CAST(split_part((SELECT
>>>>>> chr from public.temp_table LIMIT 1), '_', 2) as int2)
>>>>>>            FROM public.temp_table;
>>>>>>
>>>>>> INSERT INTO public.values(smid, pos, chr)
>>>>>>         SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr
>>>>>> from
>>>>>> public.temp_table LIMIT 1), '_', 2) as int2)
>>>>>>         FROM public.temp_table;
>>>>>>
>>>>>> I came up with this query, because i wanted to use the COPY
>>>>>> command
>>>>>> to load huge files into the db, but i don't want to load all the
>>>>>> columns contained in the file in only one table but copy some of
>>>>>> them into one table and some in a second table. As i found out
>>>>>> with
>>>>>> logging, the data is loaded into temp_table within 15 minutes, but
>>>>>> to transfer it from the temp_table toagain only something like 10
>>>>>> minutes. Can it be that the cast takes up so much more time than
>>>>>> when reading and transferring 2 million lines?
>>>>>
>>>>>
>>>>> --
>>>>> Seeya...Q
>>>>>
>>>>>                 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>
>>>>>                            _____  /  Quinton Dolan -
>>>>> qdolan@gmail.com
>>>>>    __  __/  /   /   __/   /      /
>>>>>       /    __  /   _/    /      /        Gold Coast, QLD, Australia
>>>>>    __/  __/ __/ ____/   /   -  /            Ph: +61 419 729 806
>>>>>                      _______  /
>>>>>                              _\
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------(end of
>>>>> broadcast)---------------------------
>>>>> TIP 2: Don't 'kill -9' the postmaster
>>>>
>>>>
>>>> --
>>>> Christian Rengstl M.A.
>>>> Klinik und Poliklinik für Innere Medizin II
>>>> Kardiologie - Forschung
>>>> Universitätsklinikum Regensburg
>>>> B3 1.388
>>>> Franz-Josef-Strauss-Allee 11
>>>> 93053 Regensburg
>>>> Tel.: +49-941-944-7230
>>>
>>>
>>> --
>>> Seeya...Q
>>>
>>>                 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>
>>>                            _____  /  Quinton Dolan - qdolan@gmail.com
>>>    __  __/  /   /   __/   /      /
>>>       /    __  /   _/    /      /        Gold Coast, QLD, Australia
>>>    __/  __/ __/ ____/   /   -  /            Ph: +61 419 729 806
>>>                      _______  /
>>>                              _\
>>>
>>>
>>
>>
>> --
>> Christian Rengstl M.A.
>> Klinik und Poliklinik für Innere Medizin II
>> Kardiologie - Forschung
>> Universitätsklinikum Regensburg
>> B3 1.388
>> Franz-Josef-Strauss-Allee 11
>> 93053 Regensburg
>> Tel.: +49-941-944-7230
>
>
> --
> Seeya...Q
>
>                 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
>                            _____  /  Quinton Dolan - qdolan@gmail.com
>    __  __/  /   /   __/   /      /
>       /    __  /   _/    /      /        Gold Coast, QLD, Australia
>    __/  __/ __/ ____/   /   -  /            Ph: +61 419 729 806
>                      _______  /
>                              _\
>
>


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230
pgsql-general by date:
From: Timothy Smith
Date: 19 July 2006, 09:00:28
Subject: problem creating users via pythons script
From: Volkan YAZICI
Date: 19 July 2006, 09:15:34
Subject: Re: problem creating users via pythons script
Re: Performance problem with query - Mailing list pgsql-general

Previous

Next