Home > mailing lists

postgres_fdw batching vs. (re)creating the tuple slots - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	postgres_fdw batching vs. (re)creating the tuple slots
Date	May 30, 2021 20:22:10
Msg-id	ebbbcc7d-4286-8c28-0272-61b4753af761@enterprisedb.com Whole thread Raw
Responses	Re: postgres_fdw batching vs. (re)creating the tuple slots
List	pgsql-hackers

Tree view

Hi,

while looking at the other thread related to postgres_fdw batching [1]
and testing with very large batches, I noticed this disappointing
behavior when inserting 1M rows (just integers, nothing fancy):

no batching: 64782 ms
100 rows:     2118 ms
32767 rows:  41115 ms

Pretty nice improvement when batching 100 rows, but then it all goes
wrong for some reason.

The problem is pretty obvious from a perf profile:


  --100.00%--ExecModifyTable
   |
    --99.70%--ExecInsert
     |
     |--50.87%--MakeSingleTupleTableSlot
     |  |
     |   --50.85%--MakeTupleTableSlot
     |             |
     |              --50.70%--IncrTupleDescRefCount
     |                        |
     |                         --50.69%--ResourceOwnerRememberTupleDesc
     |                                   |
     |                                    --50.69%--ResourceArrayAdd
     |
     |--48.18%--ExecBatchInsert
     |  |
     |   --47.92%--ExecDropSingleTupleTableSlot
     |             |
     |             |--47.17%--DecrTupleDescRefCount
     |             |          |
     |             |           --47.15%--ResourceOwnerForgetTupleDesc
     |             |                     |
     |             |                      --47.14%--ResourceArrayRemove
     |             |
     |              --0.53%--ExecClearTuple
     |
      --0.60%--ExecCopySlot


There are two problems at play, here. Firstly, the way it's coded now
the slots are pretty much re-created for each batch. So with 1M rows and
batches of 32k rows, that's ~30x drop/create. That seems a bit wasteful,
and it shouldn't be too difficult to keep the slots across batches. (We
can't initialize all the slots in advance, because we don't know how
many will be needed, but we don't have to release them between batches.)

The other problem is that ResourceArrayAdd/Remove seem to behave a bit
poorly with very many elements - I'm not sure if it's O(N^2) or worse,
but growing the array and linear searches seem to be a bit expensive.

I'll take a look at fixing the first point, but I'm not entirely sure
how much will that improve the situation.


regards


[1]
https://postgr.es/m/OS0PR01MB571603973C0AC2874AD6BF2594299%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Andres Freund
Date: 30 May 2021, 20:12:44
Subject: Re: O_DIRECT on macOS

From: Andres Freund
Date: 30 May 2021, 20:34:39
Subject: Re: GISTSTATE is too large

postgres_fdw batching vs. (re)creating the tuple slots - Mailing list pgsql-hackers

Previous

Next