Thread: Variable-length FunctionCallInfoData

Variable-length FunctionCallInfoData

From

Andres Freund

Date:

05 June 2018, 20:29:52

Hi,

While prototyping codegen improvements for JITed expression evaluation,
I once more hit the issue that the FunctionCallInfoData structs are
really large (936 bytes), despite arguments beyond the fourth barely
ever being used.  I think we should fix that.

What I think we should do is convert
FunctionCallInfoData->{arg,argisnull} into an array of NullableDatum
(new type, a struct of Datum and bool), and then use a variable length
array for the arguments.  In the super common case of 2 arguments that
reduces the size of the struct from 936 to 64 bytes.  Besides the size
reduction this also noticeably reduces the number of cachelines accessed:
before, it's absolutely guaranteed that the arg and argnull entries for
the same argument aren't on the same cacheline; after, it's almost
guaranteed that they are.
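
To make the size and cacheline argument concrete, here is a minimal sketch of the two layouts.  The typedefs at the top are stand-ins so that it compiles on its own, and OldFunctionCallInfoData, VarFunctionCallInfoData and size_for_fcinfo() are hypothetical names for illustration, not the names used in the attached patch.  On a typical 64-bit platform the old struct should come out at 936 bytes and the two-argument case at 64 bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL types, so the sketch compiles on its own. */
typedef uintptr_t Datum;
typedef unsigned int Oid;
typedef struct FmgrInfo FmgrInfo;	/* opaque here */
typedef struct Node *fmNodePtr;		/* opaque here */

#define FUNC_MAX_ARGS 100			/* PostgreSQL's default */

/* Current layout: two fixed-size parallel arrays. */
typedef struct OldFunctionCallInfoData
{
	FmgrInfo   *flinfo;
	fmNodePtr	context;
	fmNodePtr	resultinfo;
	Oid			fncollation;
	bool		isnull;
	short		nargs;
	Datum		arg[FUNC_MAX_ARGS];		/* arg[i] lives here ... */
	bool		argnull[FUNC_MAX_ARGS];	/* ... its null flag far away */
} OldFunctionCallInfoData;

/* Proposed element type: value and null flag side by side. */
typedef struct NullableDatum
{
	Datum		value;
	bool		isnull;
} NullableDatum;

/* Sketch of the proposed layout: same header, flexible array member. */
typedef struct VarFunctionCallInfoData
{
	FmgrInfo   *flinfo;
	fmNodePtr	context;
	fmNodePtr	resultinfo;
	Oid			fncollation;
	bool		isnull;
	short		nargs;
	NullableDatum args[];
} VarFunctionCallInfoData;

/* Bytes needed for a call with nargs arguments (hypothetical helper). */
static inline size_t
size_for_fcinfo(int nargs)
{
	return offsetof(VarFunctionCallInfoData, args) +
		sizeof(NullableDatum) * (size_t) nargs;
}
```

In the old layout arg[i] and argnull[i] are always at least FUNC_MAX_ARGS * sizeof(Datum) bytes apart, which is far more than a cacheline; in the sketched layout both live inside one NullableDatum.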

Attached is a *PROTOTYPE* patch doing so.  Note I was too lazy to fully
fix up the jit code, I didn't want to do the legwork before we've some
agreement on this.  We also can get rid of FUNC_MAX_ARGS after this, but
there's surrounding code that still relies on it.

There's some added ugliness, which I hope we can polish a bit
further. Right now we allocate a good number of FunctionCallInfoData
structs on the stack - which doesn't quite work anymore afterwards.  So
the stack allocations, for the majority of cases where the number of
arguments is known, currently look like:

    union {
        FunctionCallInfoData fcinfo;
        char fcinfo_data[SizeForFunctionCallInfoData(0)];
    } fcinfodata;
    FunctionCallInfo fcinfo = &fcinfodata.fcinfo;

that's not pretty, but also not that bad.
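
One way the union idiom could be hidden is behind a macro.  The following is only a sketch under assumptions: SKETCH_LOCAL_FCINFO, SKETCH_FCINFO_SIZE and the trimmed-down SketchFcinfoData struct are hypothetical names, not part of the attached patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Datum;			/* stand-in for PostgreSQL's Datum */

typedef struct NullableDatum
{
	Datum		value;
	bool		isnull;
} NullableDatum;

/* Trimmed-down stand-in for the variable-length struct. */
typedef struct SketchFcinfoData
{
	void	   *flinfo;
	bool		isnull;
	short		nargs;
	NullableDatum args[];			/* flexible array member */
} SketchFcinfoData;

typedef SketchFcinfoData *SketchFcinfo;

/* Bytes needed for a struct carrying nargs arguments. */
#define SKETCH_FCINFO_SIZE(nargs) \
	(offsetof(SketchFcinfoData, args) + sizeof(NullableDatum) * (nargs))

/*
 * Hypothetical macro: declares stack storage big enough for nargs
 * arguments, plus a pointer called `name` to it.  The union gives the
 * char buffer the alignment of the struct.
 */
#define SKETCH_LOCAL_FCINFO(name, nargs) \
	union \
	{ \
		SketchFcinfoData fcinfo; \
		char		fcinfo_data[SKETCH_FCINFO_SIZE(nargs)]; \
	}			name##data; \
	SketchFcinfo name = &name##data.fcinfo

/* Example call site: set up two arguments and read them back. */
static inline Datum
sum_two_args(Datum a, Datum b)
{
	SKETCH_LOCAL_FCINFO(fcinfo, 2);

	fcinfo->flinfo = NULL;
	fcinfo->isnull = false;
	fcinfo->nargs = 2;
	fcinfo->args[0].value = a;
	fcinfo->args[0].isnull = false;
	fcinfo->args[1].value = b;
	fcinfo->args[1].isnull = false;

	return fcinfo->args[0].value + fcinfo->args[1].value;
}
```

With something along these lines a call site needs a single declaration line, and the argument count is stated once, next to the allocation.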

It's a bit unfortunate that this'll break some extensions, but I don't
really see a way around that.  The current approach, to me, clearly
doesn't have a future.  I wonder if we should add a bunch of accessor
macros / inline functions that we (or extension authors) can backpatch
to reduce the pain of maintaining different code paths.
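
As a rough illustration of what such accessors might look like, here is a sketch.  FC_ARG, FC_ARGNULL, SketchFcinfoData and the SKETCH_HAVE_NULLABLE_DATUM switch are all hypothetical; a real extension would more likely test PG_VERSION_NUM, and the structs are cut down to the fields that matter here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Datum;			/* stand-in for PostgreSQL's Datum */

/* Hypothetical switch selecting which layout we compile against. */
#ifndef SKETCH_HAVE_NULLABLE_DATUM
#define SKETCH_HAVE_NULLABLE_DATUM 1
#endif

#if SKETCH_HAVE_NULLABLE_DATUM

/* New-style layout: one array of (value, isnull) pairs. */
typedef struct NullableDatum
{
	Datum		value;
	bool		isnull;
} NullableDatum;

typedef struct SketchFcinfoData
{
	short		nargs;
	NullableDatum args[4];
} SketchFcinfoData;

#define FC_ARG(fc, n)		((fc)->args[n].value)
#define FC_ARGNULL(fc, n)	((fc)->args[n].isnull)

#else

/* Old-style layout: two parallel arrays. */
typedef struct SketchFcinfoData
{
	short		nargs;
	Datum		arg[4];
	bool		argnull[4];
} SketchFcinfoData;

#define FC_ARG(fc, n)		((fc)->arg[n])
#define FC_ARGNULL(fc, n)	((fc)->argnull[n])

#endif

/* Extension code written once, against the accessors only. */
static inline bool
any_arg_is_null(SketchFcinfoData *fc)
{
	for (int i = 0; i < fc->nargs; i++)
	{
		if (FC_ARGNULL(fc, i))
			return true;
	}
	return false;
}
```

Because both macro variants expand to lvalues, the same source can also fill in arguments when setting up a call, whichever layout is in effect.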

Besides the change here, I think we should also go much further with the
conversion to NullableDatum.  There are two main areas of change.  First, I
want to move the execExpr.c related code so steps return data into
NullableDatums - that removes a good chunk of pointer dereferences and
allocations.  Second, I think we should move TupleTableSlot to this
format - the issue with nulls / datums being on separate cachelines is
noticeable in profiles, but more importantly the code looks more
consistent with it.


As an example for the difference in memory usage, here's the memory
consumption at ExecutorRun time, for TPCH's Q01:

master:
TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
  PortalContext: 1024 total in 1 blocks; 576 free (0 chunks); 448 used:
    ExecutorState: 90744 total in 5 blocks; 31568 free (2 chunks); 59176 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      ExprContext: 8192 total in 1 blocks; 3488 free (0 chunks); 4704 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
Grand total: 149112 bytes in 13 blocks; 82976 free (2 chunks); 66136 used

patch:
TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
  PortalContext: 1024 total in 1 blocks; 576 free (0 chunks); 448 used:
    ExecutorState: 65536 total in 4 blocks; 33536 free (6 chunks); 32000 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      ExprContext: 8192 total in 1 blocks; 5408 free (0 chunks); 2784 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
Grand total: 123904 bytes in 12 blocks; 86864 free (6 chunks); 37040 used

As you can see, the ExecutorState context uses only a bit more than half
as much memory as before (32000 vs. 59176 bytes used). In a lot of cases a
good chunk of the benefit is going to be hidden due to memory context
sizing, but I'd expect that to matter much less for more complex
statements and plpgsql functions etc.

Comments?

Greetings,

Andres Freund

Attachment

v1-0001-Variable-length-FunctionCallInfoData.patch

RE: Variable-length FunctionCallInfoData

From

serge@rielau.com

Date:

05 June 2018, 20:40:22

Big +1 on this one.

Here is what we did. It's very crude, but minimized the amount of pain:

It helps that C lets you index arrays and pointers the same way.

I can dig for the complete patch if you want...

Cheers
Serge

/*
* This struct is the data actually passed to an fmgr-called function.
* There are three flavors:
* FunctionCallInfoData:
*     Used when the number of arguments is both known and fixed small.
*     This structure is used for direct function calls involving
*     builtin functions.
*     This structure must be initialized with: InitFunctionCallInfoData()
* FunctionCallInfoDataVariable:
*     Used when the number of arguments is unknown and possibly large.
*     This structure must be allocated with allocFCInfoVar() and initialized
*     with InitFunctionCallInfoData().
* FunctionCallInfoDataLarge:
*     Used when the number of arguments is unknown, possibly large and
*     the struct is embedded somewhere where a variable length is not
*     acceptable.
*     This structure must be initialized with: InitFunctionCallInfoData()
*
* All structures have the same header and the arg/argnull fields should not be
* accessed directly but via the below accessor macros.
*/

typedef struct FunctionCallInfoData
{
  FmgrInfo *flinfo; /* ptr to lookup info used for this call */
  fmNodePtr context; /* pass info about context of call */
  fmNodePtr resultinfo; /* pass or return extra info about result */
  Oid fncollation; /* collation for function to use */
  bool isnull; /* function must set true if result is NULL */
  bool isFixed; /* Must be true */
  short nargs; /* # arguments actually passed */
  Datum *arg; /* pointer to function arg array */
  bool *argnull; /* pointer to null indicator array */
  Datum __arg[FUNC_MAX_ARGS_FIX]; /* Arguments passed to function */
  bool __argnull[FUNC_MAX_ARGS_FIX]; /* T if arg[i] is actually NULL */
} FunctionCallInfoData;

typedef struct FunctionCallInfoDataVariable
{

  FmgrInfo *flinfo; /* ptr to lookup info used for this call */
  fmNodePtr context; /* pass info about context of call */
  fmNodePtr resultinfo; /* pass or return extra info about result */
  Oid fncollation; /* collation for function to use */
  bool isnull; /* function must set true if result is NULL */
  bool isFixed; /* Must be false */
  short nargs; /* # arguments actually passed */
  Datum *arg; /* pointer to function arg array */
  bool *argnull; /* pointer to null indicator array */
} FunctionCallInfoDataVariable;

typedef struct FunctionCallInfoDataLarge
{
  FmgrInfo *flinfo; /* ptr to lookup info used for this call */
  fmNodePtr context; /* pass info about context of call */
  fmNodePtr resultinfo; /* pass or return extra info about result */
  Oid fncollation; /* collation for function to use */
  bool isnull; /* function must set true if result is NULL */
  bool isFixed; /* Must be false */
  short nargs; /* # arguments actually passed */
  Datum *arg; /* pointer to function arg array */
  bool *argnull; /* pointer to null indicator array */
  Datum __arg[FUNC_MAX_ARGS]; /* Arguments passed to function */
  bool __argnull[FUNC_MAX_ARGS]; /* T if arg[i] is actually NULL */
} FunctionCallInfoDataLarge;