Thread: Use static inline functions for Float <-> Datum conversions
Hi,

Now that we are OK with static inline functions, we can save some cycles from floating-point functions, by turning Float4GetDatum, Float8GetDatum, and DatumGetFloat8 into static inlines. They are only a few instructions, but couldn't be implemented as macros before, because they need a local union-variable for the conversion.

That can add up to significant speedups with float-heavy queries. For example:

    create table floats as select g::float8 as a, g::float8 as b, g::float8 as c from generate_series(1, 1000000) g;

    select sum(a+b+c+1) from floats;

The sum query is about 4% faster on my laptop with this patch.

- Heikki
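The union-based conversion described in this message can be sketched roughly as follows. This is a minimal illustration, not the patch itself: `SketchDatum`, `sketch_float8_get_datum`, and `sketch_datum_get_float8` are hypothetical names, and the sketch assumes a 64-bit datum type that holds a double by value. It shows why a plain macro is awkward here: the conversion needs a local union variable, which a function body provides naturally.

```c
#include <assert.h>				/* static_assert */
#include <stdint.h>

/* Assumption: the datum type is a 64-bit unsigned integer (hypothetical name). */
typedef uint64_t SketchDatum;

static_assert(sizeof(double) == sizeof(int64_t),
			  "this sketch assumes a 64-bit double");

/* Reinterpret the bits of a double as an integer-typed datum. */
static inline SketchDatum
sketch_float8_get_datum(double x)
{
	union
	{
		double		value;
		int64_t		retval;
	}			myunion;

	myunion.value = x;			/* store as double ... */
	return (SketchDatum) myunion.retval;	/* ... read back as integer */
}

/* The reverse direction: reinterpret the datum's bits as a double. */
static inline double
sketch_datum_get_float8(SketchDatum d)
{
	union
	{
		int64_t		value;
		double		retval;
	}			myunion;

	myunion.value = (int64_t) d;
	return myunion.retval;
}
```

A caller would round-trip a value with `sketch_datum_get_float8(sketch_float8_get_datum(x))`. Because both functions are `static inline`, the compiler can fold the conversion into the caller instead of emitting a function call.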
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Now that we are OK with static inline functions, we can save some cycles
> from floating-point functions, by turning Float4GetDatum,
> Float8GetDatum, and DatumGetFloat8 into static inlines.

Looks good to me.

I wonder whether there is a compiler-dependent way of avoiding the union trick ... or maybe gcc is already smart enough that it doesn't matter?

regards, tom lane
On 08/31/2016 02:38 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> Now that we are OK with static inline functions, we can save some cycles
>> from floating-point functions, by turning Float4GetDatum,
>> Float8GetDatum, and DatumGetFloat8 into static inlines.
>
> Looks good to me.

Ok, will push.

> I wonder whether there is a compiler-dependent way of avoiding the union
> trick ... or maybe gcc is already smart enough that it doesn't matter?

It seems to compile into a single instruction, so it can't get any better from a performance point of view.

    float8pl:
    .LFB79:
        .loc 1 871 0
        .cfi_startproc
    .LVL297:
    .LBB959:
    .LBB960:
        .loc 2 733 0
        movsd   40(%rdi), %xmm2
    .LBE960:
    .LBE959:
    .LBB961:
    .LBB962:
        movsd   32(%rdi), %xmm1
    ...

A union is probably what language pedants would prefer anyway, and anything else would be more of a trick.

- Heikki
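For comparison with the union approach discussed in this message, another commonly used way to reinterpret bits in C is a fixed-size `memcpy`. The sketch below is only an illustration of that alternative, not something proposed in the thread: `SketchDatum` and the `_memcpy` function names are hypothetical, and a 64-bit datum is assumed. Optimizing compilers commonly reduce a small constant-size `memcpy` like this to a plain register move, but that has not been measured here.

```c
#include <assert.h>				/* static_assert */
#include <stdint.h>
#include <string.h>				/* memcpy */

/* Assumption: the datum type is a 64-bit unsigned integer (hypothetical name). */
typedef uint64_t SketchDatum;

static_assert(sizeof(double) == sizeof(SketchDatum),
			  "this sketch assumes a 64-bit double");

/* Copy the object representation of a double into a datum. */
static inline SketchDatum
sketch_float8_get_datum_memcpy(double x)
{
	SketchDatum d;

	memcpy(&d, &x, sizeof(d));
	return d;
}

/* Copy the datum's bytes back into a double. */
static inline double
sketch_datum_get_float8_memcpy(SketchDatum d)
{
	double		x;

	memcpy(&x, &d, sizeof(x));
	return x;
}
```

Both variants copy the same 8 bytes, so they should produce identical bit patterns; the difference is purely in how the reinterpretation is spelled in the source.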
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 08/31/2016 02:38 PM, Tom Lane wrote:
>> I wonder whether there is a compiler-dependent way of avoiding the union
>> trick ... or maybe gcc is already smart enough that it doesn't matter?

> It seems to compile into a single instruction, so it can't get any
> better from a performance point of view.

Yeah, confirmed here. On my not-real-new gcc (version 4.4.7, which ships with RHEL6), these test functions:

    Datum
    compare_int8(PG_FUNCTION_ARGS)
    {
        int64 x = PG_GETARG_INT64(0);
        int64 y = PG_GETARG_INT64(1);

        PG_RETURN_BOOL(x < y);
    }

    Datum
    compare_float8(PG_FUNCTION_ARGS)
    {
        double x = PG_GETARG_FLOAT8(0);
        double y = PG_GETARG_FLOAT8(1);

        PG_RETURN_BOOL(x < y);
    }

compile into this (at -O2):

    compare_int8:
        .cfi_startproc
        movq    40(%rdi), %rax
        cmpq    %rax, 32(%rdi)
        setl    %al
        movzbl  %al, %eax
        ret
        .cfi_endproc

    compare_float8:
        .cfi_startproc
        movsd   40(%rdi), %xmm0
        xorl    %eax, %eax
        ucomisd 32(%rdi), %xmm0
        seta    %al
        ret
        .cfi_endproc

(Not sure why the compiler does the widening of the comparison result differently, but it doesn't look like it matters.) Before this patch, that looked like:

    compare_float8:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movq    %rdi, %rbx
        subq    $16, %rsp
        .cfi_def_cfa_offset 32
        movq    32(%rdi), %rdi
        call    DatumGetFloat8
        movq    40(%rbx), %rdi
        movsd   %xmm0, 8(%rsp)
        call    DatumGetFloat8
        xorl    %eax, %eax
        ucomisd 8(%rsp), %xmm0
        seta    %al
        addq    $16, %rsp
        .cfi_def_cfa_offset 16
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

Nice.

regards, tom lane