Re: JIT compiling with LLVM v9.0 - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: JIT compiling with LLVM v9.0 |
Date | |
Msg-id | b84ae071-0931-e2eb-7207-f8d217246f98@postgrespro.ru |
In response to | JIT compiling with LLVM v9.0 (Andres Freund <andres@anarazel.de>) |
Responses |
Re: JIT compiling with LLVM v9.0
Re: JIT compiling with LLVM v9.0 |
List | pgsql-hackers |
On 24.01.2018 10:20, Andres Freund wrote:
> Hi,
>
> I've spent the last weeks working on my LLVM compilation patchset. In
> the course of that I *heavily* revised it. While still a good bit away
> from committable, it's IMO definitely not a prototype anymore.
>
> There's too many small changes, so I'm only going to list the major
> things. A good bit of that is new. The actual LLVM IR emissions itself
> hasn't changed that drastically. Since I've not described them in
> detail before I'll describe from scratch in a few cases, even if things
> haven't fully changed.
>
>
> == JIT Interface ==
>
> To avoid emitting code in very small increments (increases mmap/mremap
> rw vs exec remapping, compile/optimization time), code generation
> doesn't happen for every single expression individually, but in batches.
>
> The basic object to emit code via is a jit context created with:
>    extern LLVMJitContext *llvm_create_context(bool optimize);
> which in case of expression is stored on-demand in the EState. For other
> usecases that might not be the right location.
>
> To emit LLVM IR (ie. the portable code that LLVM then optimizes and
> generates native code for), one gets a module from that with:
>    extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context);
>
> to which "arbitrary" numbers of functions can be added. In case of
> expression evaluation, we get the module once for every expression, and
> emit one function for the expression itself, and one for every
> applicable/referenced deform function.
>
> As explained above, we do not want to emit code immediately from within
> ExecInitExpr()/ExecReadyExpr(). To facilitate that readying a JITed
> expression sets the function to callback, which gets the actual native
> function on the first actual call. That allows to batch together the
> generation of all native functions that are defined before the first
> expression is evaluated - in a lot of queries that'll be all.
>
> Said callback then calls
>    extern void *llvm_get_function(LLVMJitContext *context, const char *funcname);
> which'll emit code for the "in progress" mutable module if necessary,
> and then searches all generated functions for the name. The names are
> created via
>    extern void *llvm_get_function(LLVMJitContext *context, const char *funcname);
> currently "evalexpr" and "deform" with a generation and counter suffix.
>
> Currently expressions which do not have access to an EState, basically
> all "parent" less expressions, aren't JIT compiled. That could be
> changed, but I so far do not see a huge need.

Hi,

As far as I understand, native code is now always generated for all supported expressions, and individually by each backend. I wonder whether it would be useful to put more effort into deciding when compilation to native code should be done and when interpretation is better. For example, many JIT-capable languages such as Lua use traces, i.e. the query is first interpreted and a trace is generated. If the same trace is followed more than N times, native code is generated for it.

In the context of a DBMS executor, only frequently executed or expensive queries are worth compiling. So we can use the estimated plan cost and the number of query executions as simple criteria for JIT-ing a query. Maybe simple queries (with a small cost) should be compiled only for prepared statements...

Another question is whether it is sensible to redundantly do expensive work (LLVM compilation) in all backends. This question is related to a shared prepared statement cache. But even without such a cache, it seems possible to use some signature of the compiled expression as the library name and to share these libraries between backends. So before starting code generation, ExecReadyCompiledExpr can first build the signature and check whether the corresponding library is already present. It would also be easier to control the space used by compiled libraries in this case.
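To make the two suggestions above a bit more concrete, here is a rough sketch in C. It is only an illustration under stated assumptions: every name in it (jit_should_compile, expr_signature, expr_library_name, the two thresholds and the "jit_<hex>.so" naming scheme) is hypothetical and not part of the patch set or of PostgreSQL, and FNV-1a merely stands in for whatever signature function would actually be chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tuning knobs; neither exists in the patch set. */
#define JIT_COST_THRESHOLD 100000.0 /* compile at once above this plan cost */
#define JIT_EXEC_THRESHOLD 5        /* compile cheap queries once this hot */

/*
 * Decide whether a query is worth compiling: expensive plans are compiled
 * immediately, cheap ones only after they have been executed often enough
 * (e.g. as a prepared statement).
 */
static bool
jit_should_compile(double plan_cost, uint64_t n_executions)
{
    if (plan_cost >= JIT_COST_THRESHOLD)
        return true;
    return n_executions >= JIT_EXEC_THRESHOLD;
}

/*
 * Signature of a serialized expression. FNV-1a (64 bit) is used here only
 * as a simple stand-in; a real implementation would want a stronger hash.
 */
static uint64_t
expr_signature(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
    {
        h ^= data[i];
        h *= 0x100000001b3ULL;          /* FNV-1a prime */
    }
    return h;
}

/*
 * Derive a library file name from the signature, so that a backend can
 * check whether another backend has already compiled the same expression.
 * Returns the snprintf() result.
 */
static int
expr_library_name(char *buf, size_t buflen, uint64_t sig)
{
    return snprintf(buf, buflen, "jit_%016llx.so", (unsigned long long) sig);
}
```

A backend would call jit_should_compile() before generating code and, if it returns true, look for the file named by expr_library_name() before compiling anything itself. How the expression is serialized and where such libraries would live are left open here.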
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company