Optimizing Erlualib Calls
The fewer port calls, the faster the code.
This article is about optimizing calls from Erlang to Lua when using the embedded Lua driver library that we host on GitHub as Erlualib. For optimal speed, you want to program directly against the Lua C API. If you know that your calls from Erlang to Lua will be very frequent, you should consider optimizing them in this way.
Erlualib can be used as a starting point to do just that. To implement one new function at the deepest level, with the minimal number of port calls, you have to touch a handful of files. As we will see, there is no real in-between: you implement either on the Erlang side or in C. Neither is really difficult.
It is easier, faster to program, and safer to use the low-level functions exposed by Erlualib, which mimic the Lua C API functions on the Erlang side. But this comes at the price of one port call for each function call, even if what you intend to do clearly implies only one crossing over to Lua and back.
This is because every stack manipulation using those functions amounts to a message sent and received between two Erlang processes, plus some additional overhead due to the port call. (Note that ‘port’ in this context has nothing to do with sockets.)
So let’s look at implementing more on the C side, writing most of the desired functionality in pure C directly against the Lua C API, and at how that is tied into the Erlang side for convenient use. It is more work, but if you can foresee that the number of different functions you will want to call is small, it should be worth the effort.
To illustrate what you want to happen and how you get there, the following examples all demonstrate different ways of calling Lua print(). They progress from port-intensive step-by-step execution to an implementation entirely in C that requires only one port call.
Triggering a Lua print takes just three steps: put the ‘print function pointer’ on the stack, put the value to be printed on the stack, and execute a call that uses the top of the stack as its arguments.
Variant 1: Step By Step From the Erlang Shell
$ erl -pa ./ebin
1> {ok, L} = lua:new_state().
2> lua:getfield(L, global, "print").
3> lua:pushstring(L, "Hello from Lua!").
4> lua:call(L, 1, 0).
This uses the low-level functions of this package, which mimic the corresponding functions of the Lua C API.
The last call prints this into the shell:
Hello from Lua!
Variant 2: Step By Step From Code
The example in samples/hello/lua_sample.erl is easier to call than the above, but under the hood it amounts to the same: three port calls shuttle back and forth between Erlang and the Lua state engine. Three times, a message is dispatched and received between the main process and the port process. Those are, of course, Erlang processes, not system processes, and they do not switch system context. But even though Erlang processes switch very fast and Erlang messages are sent and received fast, they are not as fast as Erlang or C function calls.
The functions process() and receive_return(), in c_src/commands.c and src/lua.erl, are likewise each called three times. Obviously, once should be enough.
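The cost difference can be made concrete with a toy model. The sketch below is not Erlualib code: every name in it (toy_port_t, toy_port_call and so on) is made up for illustration, and the "stack" is only a depth counter. It assumes that each call across the Erlang/port boundary costs exactly one round trip, and it counts how many round trips a print needs when done step by step versus as a single combined command.

```c
#include <assert.h>

/* Toy model only: none of these names exist in Erlualib. */
enum toy_cmd { TOY_GETFIELD, TOY_PUSHSTRING, TOY_CALL, TOY_PRINT };

typedef struct {
    int round_trips;  /* times the Erlang/port boundary was crossed */
    int stack_depth;  /* items on the simulated Lua stack */
    int prints;       /* completed print calls */
} toy_port_t;

/* Work done entirely on the C side; crosses no boundary. */
static void toy_step(toy_port_t *p, enum toy_cmd cmd)
{
    switch (cmd) {
    case TOY_GETFIELD:      /* pushes the function to call */
    case TOY_PUSHSTRING:    /* pushes the argument */
        p->stack_depth++;
        break;
    case TOY_CALL:          /* pops function and argument, "prints" */
        assert(p->stack_depth >= 2);
        p->stack_depth -= 2;
        p->prints++;
        break;
    case TOY_PRINT:         /* combined command: all three steps in C */
        toy_step(p, TOY_GETFIELD);
        toy_step(p, TOY_PUSHSTRING);
        toy_step(p, TOY_CALL);
        break;
    }
}

/* One message to the port and one reply back: one round trip. */
static void toy_port_call(toy_port_t *p, enum toy_cmd cmd)
{
    p->round_trips++;
    toy_step(p, cmd);
}

/* Step-by-step style: every stack operation is its own port call. */
static int print_step_by_step(toy_port_t *p)
{
    toy_port_call(p, TOY_GETFIELD);
    toy_port_call(p, TOY_PUSHSTRING);
    toy_port_call(p, TOY_CALL);
    return p->round_trips;
}

/* Combined style: a single port call does all the work on the C side. */
static int print_combined(toy_port_t *p)
{
    toy_port_call(p, TOY_PRINT);
    return p->round_trips;
}
```

Starting from a zeroed toy_port_t, print_step_by_step() ends with three round trips and print_combined() with one, while both leave the simulated stack empty after exactly one print. The model says nothing about how expensive a round trip actually is; that would have to be measured.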
The sample should have compiled when you built Erlualib. You could execute it like so:
cd samples/hello
erl -pa ../../ebin -noinput -noshell -run lua_sample hello -s init stop
The source executed is the same as in the shell:
hello() ->
    {ok, L} = lua:new_state(),             % get the Lua engine
    lua:getfield(L, global, "print"),      % put "print" global on top of stack
    lua:pushstring(L, "Hello from Lua!"),  % put hello on top
    lua:call(L, 1, 0).                     % call using the top 2 values on the stack
It should also give you:
Hello from Lua!
Variant 3: Yet More Hidden – Still the Same
We can also use lua:port_print(), like so:
$ erl -pa ./ebin
1> {ok, L} = lua:new_state().
2> lua:port_print(L, "Hello Moon!").
This makes things easier and safer, as it hides the lower-level calls inside the function port_print().
But again, the way this is implemented results in three port calls.
See src/lua.erl:
-export([port_print/2]).
...
port_print(L, String) ->
    lua:getglobal(L, "print"),   % put "print" global on stack
    lua:pushstring(L, String),   % put text on top
    lua:call(L, 1, 0).           % execute using stack top 2
Variant 4: Optimized In C Against the Lua C API
The only true optimization is an implementation as seen with lua:c_print(), which in src/lua.erl looks like this:
c_print(L, String) ->
    command(L, {?ERL_LUAC_PRINT, String}),
    receive_return(L).
The call itself shows the difference we wanted: there is only one port call, resulting in only one send and one receive.
The actual print is implemented on the C side, directly against the Lua C API, and not, as in all previous samples, against the Erlang functions that mimic it.
The implementation of this function now lives in c_src/commands.c. It is written in C, which is the price we knew we’d have to pay. The slower versions above, of course, all got away without touching any C. But then again, have a look: it turns out to be not much different from the Erlang-side implementation, as the names of the functions are the same.
c_src/commands.c:
void
erl_luac_print(lua_drv_t *driver_data, char *buf, int index)
{
    lua_State *L = driver_data->L;
    char *str = decode_string(buf, &index);
    lua_getfield(L, LUA_GLOBALSINDEX, "print"); /* function to call */
    lua_pushstring(L, str);                     /* push text to print on stack */
    lua_call(L, 1, 0);                          /* call 'print' with 1 argument, 0 results */
    reply_ok(driver_data);                      /* send 'ok' back to Erlang */
    free(str);
}
But the C source shown above, which is not entirely transparent and brings the additional burden of possible C errors, is not all that is needed. You also have to extend five other files to make the C function accessible from Erlang:
1. src/lua.erl
    -export([c_print/2]).
    ...
    c_print(L, String) ->
        command(L, {?ERL_LUAC_PRINT, String}),
        receive_return(L).
2. include/lua_api.hrl
    -define(ERL_LUAC_PRINT, 200).
3. c_src/commands.h
    #define ERL_LUAC_PRINT 200
4. c_src/erlua.h
    void erl_luac_print(lua_drv_t *driver_data, char *buf, int index);
5. c_src/erlua.c
    case ERL_LUAC_PRINT:
        erl_luac_print(driver_data, buf, index);
        break;
6. c_src/commands.c
    as shown above.
These are, in order:
- The Erlang-side exposed function that maps the functionality to a constant
- The Erlang constant definition that identifies this functionality across languages
- The C constant definition that identifies this functionality across languages
- The C header declaration for that function
- The switch that maps the C constant to the right C function
- The implementation of the C function
It’s not a lot that has to be added to each file, and it makes perfect sense once you get the principle.
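The principle behind the list, one numeric constant shared by both languages and a switch that maps it to a C function, can be sketched in isolation. The code below is a simplified stand-in and not the Erlualib source: demo_drv_t, demo_print, demo_dispatch and DEMO_CMD_PRINT are hypothetical names, and it assumes the command arrives as the first byte of the buffer with the text following as a NUL-terminated string. The real driver decodes its buffer differently and runs the text through Lua instead of copying it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins, not the Erlualib definitions. */
#define DEMO_CMD_PRINT 200  /* the Erlang side must define the same value */

typedef struct {
    char last_output[64];   /* records what the handler "printed" */
} demo_drv_t;

/* The handler: the counterpart of the C function that does the work. */
static void demo_print(demo_drv_t *drv, const char *buf, int index)
{
    /* The payload starts at buf[index]; copy it, truncating if needed. */
    snprintf(drv->last_output, sizeof drv->last_output, "%s", buf + index);
}

/* The dispatcher: maps the command constant to the right C function.
 * Returns 0 if the command was handled, -1 if it is unknown. */
static int demo_dispatch(demo_drv_t *drv, const char *buf)
{
    int index = 0;
    int command;

    assert(drv != NULL && buf != NULL);
    command = (unsigned char)buf[index++];  /* assumed: first byte is the command */

    switch (command) {
    case DEMO_CMD_PRINT:
        demo_print(drv, buf, index);
        return 0;
    default:
        return -1;
    }
}
```

Adding another function in this scheme means adding one more constant, one more case and one more handler, which mirrors the per-file additions listed above.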
Conclusion
It’s a bit of an effort to implement a Lua call efficiently, but it is also quite straightforward. You will change half a dozen Erlualib files in the process, but consider the library made for this. Once implemented, your result should be more stable and faster than a ‘step-by-step’ call. Still, it will be worth your time only if you know that your calls to Lua will be quite frequent. Otherwise, use the low-level functions that mimic the Lua C API on the Erlang side and that come ready-made with Erlualib. After all, neither Erlang nor Lua is about brute-force performance.
Credit: Ray Morgan for the original erl-lua.
Correct me if I’m wrong, but at the lowest level, don’t port calls actually involve a socket? I was under the impression that that is how they are implemented inside the Erlang VM – a fork() of some sort and a bunch of socket i/o or a pipe. I am probably totally confused though.