Lua的函数调用和协程中,栈的变化情况
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
|
#include <stdio.h> #include <lua.h> #include <lauxlib.h> int luv_dumpstack(lua_State *L) { int i, top; printf ( "-------------------
" ); printf ( "dumpstack: L=%p
" , L); top = lua_gettop(L); printf ( "top: %d
" , top); for (i = 1; i <= top; ++i) { printf ( "[%d][%s]: %s
" , i, luaL_typename(L, i), luaL_tolstring(L, i, NULL)); lua_pop(L, 1); } printf ( "-------------------
" ); return top; } static int cont(lua_State *L) { printf ( "stack after yield
" ); luv_dumpstack(L); lua_pushinteger(L, 21); lua_pushinteger(L, 22); lua_pushinteger(L, 23); return lua_gettop(L); } static int test_yield(lua_State *L) { printf ( "stack before yield
" ); luv_dumpstack(L); lua_pushinteger(L, 1); lua_pushinteger(L, 2); lua_pushinteger(L, 3); lua_pushinteger(L, 4); lua_pushinteger(L, 5); return lua_yieldk(L, 2, 0, cont); } static int test_resume(lua_State *L) { lua_State *L1 = lua_newthread(L); lua_pushinteger(L1, 11); lua_pushinteger(L1, 12); lua_pushcfunction(L1, test_yield); lua_pushinteger(L1, 13); lua_pushinteger(L1, 14); lua_pushinteger(L1, 15); printf ( "stack before resume
" ); luv_dumpstack(L1); printf ( "resume: %d
" , lua_resume(L1, L, 3)); printf ( "stack after resume
" ); luv_dumpstack(L1); lua_pushinteger(L1, 24); lua_pushinteger(L1, 25); printf ( "stack before second resume
" ); luv_dumpstack(L1); /* XXX notice even we pass 2, all values in stack (4,5,24,25) * will passed to coroutine */ printf ( "resume: %d
" , lua_resume(L1, L, 2)); printf ( "stack after second resume
" ); luv_dumpstack(L1); return 0; } int main( void ) { lua_State *L = luaL_newstate(); lua_pushcfunction(L, test_resume); lua_call(L, 0, 0); lua_close(L); return 0; } /* cc: libs+='-llua52' */ |
在Lua里面声明小数组的最好方法是什么?
在很早以前,就看到“闭包比表要快”的这么一个言论。一直没有验证过,只是心里就这么觉得了。所以自己第一次写的Lua网游,大量利用了闭包,最后估计还是有很严重的内存问题……在我的对象模型里面,对象构造函数通常是这样的:
1
2
3
4
5
6
7
8
9
10
11
12
|
function Object() local t = {} local object_state1 local object_state2 -- init ... -- methods function t:foo(...) return ... end function t:bar(...) return ... end -- return object return t end |
这个设计可以保证,在访问对象的函数的时候,速度可以达到最快——因为没有元表查询。但是,这样做恰好就违背了这样的原则:“在设计的初期,不要过早地考虑优化”。是的,因为任何优化都是有代价的,这里的代价理所当然就是内存了。
另外,在Lua-5.1中,我后来自己测试的结果是,表貌似还是比闭包要快一点,关键点是,这样貌似占用内存还小很多,自从这么摆了一道以后,反正至少对于“看见大括号就有点恐惧内存分配”的心理上是好过多了。不过至少对于闭包的大小和速度什么的心里头反而就没底了。
今天突然想到了一个叫做lua-vararg的lua库。这个库可以对vararg进行包装,提供对用户来说“比较自然”的vararg的体验——说白了,这就是用vararg去实现了元组(tuple)嘛。这个库很早以前就知道了,我还自己亲自改过,但是性能到底怎么样呢?心里没底,所以决定今天写个伪测试来看看效果。
首先,我们看一下Lua源代码里面表和闭包的描述:
560
561
562
563
564
565
566
567
568
569
570
|
typedef struct Table { CommonHeader; lu_byte flags; /* 1<<p means tagmethod(p) is not present */ lu_byte lsizenode; /* log2 of size of `node' array */ struct Table *metatable; TValue *array; /* array part */ Node *node; Node *lastfree; /* any free position is before this position */ GCObject *gclist; int sizearray; /* size of `array' array */ } Table; |
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
|
#define ClosureHeader CommonHeader; lu_byte nupvalues; GCObject *gclist typedef struct CClosure { ClosureHeader; lua_CFunction f; TValue upvalue[1]; /* list of upvalues */ } CClosure; typedef struct LClosure { ClosureHeader; struct Proto *p; UpVal *upvals[1]; /* list of upvalues */ } LClosure; typedef union Closure { CClosure c; LClosure l; } Closure; |
OK,代码有了,那么第一个问题是:表和闭包分别占多大的大小呢?
额= =这个问题不好搞啊,看起来好复杂的样子,还得考虑对齐……额,不好搞的话,直接写个程序不就可以了吗?
1
2
3
4
5
6
7
8
9
|
#include "src/lobject.h" #include <stdio.h> int main( void ) { printf ( "size of Table: %d
" , sizeof (Table)); printf ( "size of Closure: %d
" , sizeof (Closure)); return 0; } |
恩,输出结果是32和24……(别打偶……)
好吧,闭包居然比表要小!恩,这是很正常的。而且,这里的表可是“裸表”哦,一点数据都没有的,而这个被声明的闭包是默认带上了一个upvalue的!
那么,对于有十个元素的表和闭包,大小又是怎么样的呢?注意到表保存数组元素用的是TValue指针,而闭包里面的upvalue(主要指C闭包)也同样是TValue,我们只需要分别给表和闭包的大小增加10个TValue的大小即可——额,闭包只需要加9,因为前面已经有个元素了,最后得到的大小是:112字节和96字节……恩~
好了,那么在大小方面,的确闭包(特指C闭包)是保存小数组的最好途径了,那么,除开大小,在速度上面,它们之间有什么变化吗?我们继续测试一下!首先声明一下,这里的测试对于真实的环境是没有意义的,通常来说,我们并不需要考虑像分配小数组之间的效率问题,这里的测试,只是在于检验“闭包比表快”这个说法罢了,这样的检验本身,可以当作是使用vararg模块的依据,对做出某些设计决策是有好处的。
我们准备测试四种情况下的速度情况
- 用C API创建一个表,向其中压入10个数字
- 用C API创建一个闭包,其中有10个upvalue
- 用Lua代码创建一个表,其中有10个数字
- 用Lua代码创建一个闭包,其中有10个upvalue
下面是测试结果:
1
2
3
4
5
6
7
8
9
10
|
test c table time: 2.35s c table memory: 5724 c closure time: 0.20s c closure memory: 6020 lua table time: 0.40s lua table memory: 5836 lua closure time: 1.51s lua closure memory: 4556 Hit any key to close this window... |
我们来分析一下这个数据,最简单的分析方法是排序,对于时间的顺序是c closure < lua table < lua closure < c table,对于内存剩余的顺序是lua closure < c table == lua table < c closure。
好奇怪的结果!!我们试着来分析一下吧!看起来内存里面有一项是相同的:即用c和用lua创建表,占用内存的大小是近似相同的。而用C创建闭包,占用内存最大,但是也大不了多少。这里面是什么情况呢?我们关闭一下垃圾回收试试?对了!原来是这样……关掉垃圾回收以后,内存的占用非常非常夸张,原来,这是垃圾回收以后的结果(顺便说一下,关闭垃圾回收,所有的测试都变慢了,不过相对结果不变)。
那么,内存的意义上就不大了。只能说是垃圾回收的策略不同罢了。我们来分析一下时间问题好了。
首先,这个数据实在是太奇怪了了:如果说C的API比Lua代码快,那么为什么c table是最慢的呢?如果说闭包比表快,那么为什么lua闭包不如lua的表呢?这是怎么回事呢?
很容易发现问题的是lua闭包,这是倒数第二慢的……因为Lua5.2出了一个新功能:在创建lua闭包的时候,会和之前缓存的进行比较,看看是新的闭包还是以前的,因为我们创建的闭包都是新的函数调用,显然和以前闭包的upvalue不可能一样,因此这个比较始终是会失败的……看来也只能是这个地方会占用时间了。当然了,关联upvalue、关闭upvalue等等都需要占据时间。lua闭包比较慢几乎是可以预见到的了。这一点我们不奇怪。
奇怪的是,为什么用C API写的新建表比lua代码还要慢呢?注意到,Lua代码是不需要再创建新的“整数”的,它们是直接从常量表(K表)载入的,所以pushnumber会拖慢一点速度,其次,lua_rawseti会比luaH_setint多做一些检查,除了这两点以外,还有一个很容易忽视的地方是API调用开销:因为Lua是在DLL里面的,因此API调用相对较慢,如果静态链接的话,对其他测试来说,结果影响不大,但是c table测试瞬间就快了20%——也许是因为每次推入新的包,c table测试比别的方法多了10个C API的原因吧……Lua里面没有一次给表设置多个值的方法,必须一个一个地设置,每次设置都必须压入新值,所以即使是相对较快的lua_rawseti,也架不住每次设置的两次API啊……当然,从这一点上也可以看出来,我们的测试是很不合理的,因为即使是API效率,都可以极大地影响测试结果。当然这里需要说明的是,即使是将API效率的原因去除,c table仍然是最慢的。
然后,c cloure和lua table就比较符合我们的期望了。看来的确是闭包会快一点啊……
这里只比较了创建的效率,对于取值的效率,大家可以自己去比较。但是从这里可以看出来,如果希望交换大量小数据块,那么closure的确是一个合理的选择:它体积较小,没有多余的功能(比如表支持的哈希查找或者增加大小或者独立的元表等等)。作为需要快速保存少量数据,lua-vararg的确是一个很好的选择——另外,尽量选择C实现的版本,如果无法使用C库,那就还是用表来实现吧……这样比较快……
最后,附上我自己写的vararg的C实现:https://gist.github.com/starwing/5893607
#define LUA_LIB #include <lua.h> #include <lauxlib.h> static lua_Integer posrelat(lua_Integer pos, size_t len) { if (pos >= 0) return pos; else if (0u - (size_t)pos > len) return 0; else return (lua_Integer)len + pos + 1; } static int tuple(lua_State *L) { int top, n = (int)lua_tointeger(L, lua_upvalueindex(1)); lua_Integer i, j; switch (lua_type(L, 1)) { case LUA_TNIL: /* as iterator */ i = lua_tointeger(L, 2) + 1; if (i <= 0 || i > n) return 0; lua_pushinteger(L, i); lua_pushvalue(L, lua_upvalueindex(i + 1)); return 2; case LUA_TSTRING: /* as length operator */ if (*lua_tostring(L, 1) == '#') { lua_pushinteger(L, n); return 1; } break; case LUA_TNONE: /* get all varargs */ luaL_checkstack(L, n, "too many values"); for (i = 1; i <= n; ++i) lua_pushvalue(L, lua_upvalueindex(i+1)); return n; case LUA_TNUMBER: /* get/set a range */ i = posrelat(luaL_checkinteger(L, 1), n); j = posrelat(luaL_optinteger(L, 2, i), n); if (i > j) return 0; n = (int)(j-i+1); luaL_checkstack(L, n, "too many values"); if ((top = lua_gettop(L)) <= 2) { /* get */ for (; i <= j; ++i) lua_pushvalue(L, lua_upvalueindex(i+1)); } else { int idx; lua_settop(L, top = n + 2); for (idx = 3; idx <= top; ++idx) { lua_pushvalue(L, idx); lua_replace(L, lua_upvalueindex(i+idx-2)); } } return n; } return luaL_argerror(L, 1, "invalid argument"); } static int Lpack(lua_State *L) { int n = lua_gettop(L); if (n >= 255) luaL_error(L, "too many values to pack"); lua_pushinteger(L, n); lua_insert(L, 1); lua_pushcclosure(L, tuple, n+1); return 1; } static int Lrange(lua_State *L) { int n = lua_gettop(L) - 2; lua_Integer i, j; if (n < 0) return 0; i = posrelat(luaL_checkinteger(L, 1), n); j = posrelat(luaL_checkinteger(L, 2), n); if (i > j || j == 0) return 0; if (j > n) luaL_checkstack(L, j-n, "range is too big"); lua_settop(L, j + 2); return j-i+1; } static int Linsert(lua_State *L) { int n = lua_gettop(L) - 2; lua_Integer i = posrelat(luaL_checkinteger(L, 2), n); if (i > n) { luaL_checkstack(L, i-n, "index is too big"); lua_settop(L, i + 1); lua_pushvalue(L, 1); return i; } lua_pushvalue(L, 1); lua_insert(L, i + 2); return n + 1; } static int Lremove(lua_State *L) { int n = lua_gettop(L) - 1; lua_Integer i = posrelat(luaL_checkinteger(L, 1), n); if (i <= n) { lua_remove(L, i + 1); --n; } return n; } static int Lreplace(lua_State *L) { int n = lua_gettop(L) - 2; lua_Integer i = posrelat(luaL_checkinteger(L, 2), n); if (i > n) { luaL_checkstack(L, i-n, "index is too big"); lua_settop(L, i + 1); lua_pushvalue(L, 1); return i; } lua_pushvalue(L, 1); lua_replace(L, i + 2); return n; } static int Lpush(lua_State *L) { lua_pushvalue(L, 1); return lua_gettop(L) - 1; } static int Lpop(lua_State *L) { lua_pop(L, 1); return lua_gettop(L); } static int Ltake(lua_State *L) { int n = lua_gettop(L) - 1; lua_Integer i = posrelat(luaL_checkinteger(L, 1), n); if (i > n) return 0; lua_pop(L, n-i); return i; } static int Ltail(lua_State *L) { int n = lua_gettop(L) - 1; lua_Integer i = posrelat(luaL_checkinteger(L, 1), n); if (i > n) return 0; return n-i+1; } static int Lshift(lua_State *L) { return lua_gettop(L) - 1; } static int Lmap(lua_State *L) { int i, n = lua_gettop(L); luaL_checkany(L, 1); for (i = 2; i <= n; ++i) { lua_pushvalue(L, 1); lua_pushvalue(L, i); lua_call(L, 1, 1); lua_replace(L, i); } return n-1; } static int Lfilter(lua_State *L) { int i, n = lua_gettop(L); luaL_checkany(L, 1); for (i = 2; i <= n; ++i) { lua_pushvalue(L, 1); lua_pushvalue(L, i); lua_call(L, 1, 1); if (!lua_toboolean(L, -1)) { lua_remove(L, i); --i, --n; } lua_pop(L, 1); } return n-1; } static int Lreduce(lua_State *L) { int i, n = lua_gettop(L); luaL_checkany(L, 1); if (n <= 3) { lua_call(L, n-1, 1); return 1; } lua_pushvalue(L, 1); lua_pushvalue(L, 2); lua_pushvalue(L, 3); lua_call(L, 2, 1); for (i = 4; i <= n; ++i) { lua_pushvalue(L, 1); lua_insert(L, -2); lua_pushvalue(L, i); lua_call(L, 2, 1); } return 1; } static int Lunpack(lua_State *L) { int i, n = lua_gettop(L); for (i = 1; i <= n; ++i) { lua_pushvalue(L, i); lua_call(L, 0, LUA_MULTRET); } return lua_gettop(L)-n; } static int Lrotate(lua_State *L) { int n = lua_gettop(L) - 1; lua_Integer i, c = luaL_checkinteger(L, 1) % n; #if LUA_VERSION_NUM >= 503 (void)i; /* unused */ lua_rotate(L, 2, (int)c); #else c = -c + ((c > 0) ? n+1 : 1); if (c > 1) luaL_checkstack(L, c-1, "too many values"); for (i = 2; i <= c; ++i) lua_pushvalue(L, i); #endif return n; } static int Lreverse(lua_State *L) { int i, j, n = lua_gettop(L); for (i = 1, j = n; i < j; ++i, --j) { lua_pushvalue(L, i); lua_pushvalue(L, j); lua_replace(L, i); lua_replace(L, j); } return n; } static int Lrep(lua_State *L) { int top = lua_gettop(L), n = top - 1; lua_Integer i, j, count = luaL_checkinteger(L, 1); if (count <= 0) return 0; if (n == 0) { luaL_checkstack(L, count, "too many values"); lua_settop(L, count+1); return count; } luaL_checkstack(L, n*(count-1), "too many values"); for (i = 1; i < count; ++i) for (j = 2; j <= top; ++j) lua_pushvalue(L, j); return n*count; } LUALIB_API int luaopen_vararg(lua_State *L) { luaL_Reg libs[] = { { "concat", Lunpack }, { "unpack", Lunpack }, { "filter", Lfilter }, { "insert", Linsert }, { "map", Lmap }, { "pack", Lpack }, { "pop", Lpop }, { "push", Lpush }, { "range", Lrange }, { "reduce", Lreduce }, { "remove", Lremove }, { "rep", Lrep }, { "replace", Lreplace }, { "reverse", Lreverse }, { "rotate", Lrotate }, { "shift", Lshift }, { "tail", Ltail }, { "take", Ltake }, { NULL, NULL } }; #if LUA_VERSION_NUM >= 502 luaL_newlib(L, libs); #else luaL_register(L, "vararg", libs); #endif return 1; } /* cc: flags+='-s -O3 -mdll -DLUA_BUILD_AS_DLL' libs+='-llua53' * cc: output='vararg.dll' */
local _G = require "_G" local assert = _G.assert local pcall = _G.pcall local print = _G.print local select = _G.select local type = _G.type local math = require "math" local ceil = math.ceil local huge = math.huge local min = math.min local table = require "table" local unpack = table.unpack or _G.unpack local vararg = require "vararg" local pack = vararg.pack local range = vararg.range local insert = vararg.insert local remove = vararg.remove local replace = vararg.replace local push = vararg.push local concat = vararg.concat local map = vararg.map local rotate = vararg.rotate -- auxiliary functions---------------------------------------------------------- local values = {} local maxstack for i = 1, huge do if not pcall(unpack, values, 1, 2^i) then local min, max = 2^(i-1), 2^i while min < max do local mid = ceil((min+max)/2) if pcall(unpack, values, 1, mid) then min = mid else max = mid-1 end end maxstack = max break end end for i = 1, maxstack, 2 do values[i] = i end local function tpack(...) return {..., n=select("#", ...)} end local function assertsame(v, i, j, ...) local count = select("#", ...) assert(count == j-i+1, count..","..i..","..j) for pos = 1, count do assert(v[i+pos-1] == select(pos, ...)) end end local function asserterror(expected, f, ...) local ok, actual = pcall(f, ...) assert(ok == false, "error was expected") assert(actual:find(expected, 1), "wrong error, got "..actual) end -- test 'pack' function -------------------------------------------------------- local function testpack(...) local v = {...} local n = select("#", ...) local p = pack(...) assertsame(v, 1, n, p()) assert(n == p("#")) for i,pv in p do assert(v[i] == pv) end for i = 1, n do assert(v[i] == p(i)) if n > 0 then assert(v[i] == p(i-n-1)) end end for i = 1, n, 10 do local j = i+9 assertsame(v, i, j, p(i, j)) if n > j then assertsame(v, i, j, p(i-n-1, j-n-1)) end end end testpack() testpack({},{},{}) testpack(nil) testpack(nil, nil) testpack(nil, 1, nil) testpack(unpack(values, 1, 254)) local ok, err = pcall(pack, unpack(values, 1, 255)) if ok then -- Lua version assert(type(err) == "function") else -- C version assert(ok == false and err == "too many values to pack") end -- test 'range' function ------------------------------------------------------- local function testrange(n, ...) local v = {...} for c = 1, 3 do for i = 1, n, c do local j = min(i+c-1, n) assertsame(v, i, j, range(i, j, ...)) local n = select("#", ...) if n > 0 then assertsame(v, i, j, range(i-n-1, j-n-1, ...)) end end end end local ok, err = pcall(range, 0, 0, ...) if ok then -- Lua version assert(err == nil) else -- C version assert(ok == false and err == "bad argument #1 to '?' (index out of bounds (0))") end testrange(10) testrange(10, 1,2,3,4,5,6,7,8,9,0) maxstack = 10000 -- use a smaller value testrange(maxstack, unpack(values, 1, maxstack)) -- test other functions -------------------------------------------------------- assertsame({1,2,3,4,5}, 1, 5, insert(3, 3, 1,2,4,5)) assertsame({1,2,3,4,5}, 1, 5, insert(4,-1, 1,2,3,5)) assertsame({1,2,nil,4}, 1, 4, insert(4, 4, 1,2)) assertsame({nil,nil,3}, 1, 3, insert(3, 3)) assertsame({1,2,3,4,5}, 1, 5, replace(3, 3, 1,2,0,4,5)) assertsame({1,2,3,4,5}, 1, 5, replace(5,-1, 1,2,3,4,0)) assertsame({1,2,nil,4}, 1, 4, replace(4, 4, 1,2)) assertsame({nil,nil,3}, 1, 3, replace(3, 3)) assertsame({1,2,3,4,5}, 1, 5, remove( 3, 1,2,0,3,4,5)) assertsame({1,2,3,4,5}, 1, 5, remove(-1, 1,2,3,4,5,0)) assertsame({1,2,nil,4}, 1, 4, remove( 4, 1,2,nil,0,4)) assertsame({nil,nil,3}, 1, 3, remove( 3, nil,nil,0,3)) assertsame({1,2,3,4,5}, 1, 5, remove(10, 1,2,3,4,5)) assertsame({1,2,3,4,5}, 1, 5, push(5, 1,2,3,4)) assertsame({1,2,nil,4}, 1, 4, push(4, 1,2,nil)) assertsame({nil,nil,3}, 1, 3, push(3, nil,nil)) assertsame({5,1,2,3,4}, 1, 5, rotate(1, 1,2,3,4,5)) assertsame({2,3,4,5,1}, 1, 5, rotate(-1, 1,2,3,4,5)) assertsame({5,1,2,3,4}, 1, 5, rotate(11, 1,2,3,4,5)) assertsame({2,3,4,5,1}, 1, 5, rotate(-11, 1,2,3,4,5)) assertsame({1,2,3,4,5,6,7,8,9}, 1, 9, concat(pack(1,2,3), pack(4,5,6), pack(7,8,9))) -- test function errors and expectional conditions --------------------------- assertsame({"1","2","3","4","5"}, 1, 5, map(tostring, 1,2,3,4,5)) assertsame({"1","2","nil","4" }, 1, 4, map(tostring, 1,2,nil,4)) assertsame({"nil","nil","3" }, 1, 3, map(tostring, nil,nil,3)) assertsame({"1","nil","nil" }, 1, 3, map(tostring, 1,nil,nil)) asserterror("bad argument #2 to '[^']*' %(number expected, got no value%)", insert) asserterror("bad argument #2 to '[^']*' %(number expected, got no value%)", insert, nil) asserterror("bad argument #2 to '[^']*' %(number expected, got nil%)", insert, nil, nil) --asserterror("bad argument #2 to '[^']*' %(index out of bounds %(0%)%)", insert, nil, 0) assert(select('#', insert(nil, 0)) == 1) asserterror("bad argument #2 to '[^']*' %(number expected, got no value%)", replace) asserterror("bad argument #2 to '[^']*' %(number expected, got no value%)", replace, nil) asserterror("bad argument #2 to '[^']*' %(number expected, got nil%)", replace, nil, nil) --asserterror("bad argument #2 to '[^']*' %(index out of bounds %(0%)%)", replace, nil, 0) assert(select('#', replace(nil, 0)) == 0) asserterror("bad argument #1 to '[^']*' %(number expected, got no value%)", remove) asserterror("bad argument #1 to '[^']*' %(number expected, got nil%)", remove, nil) --asserterror("bad argument #1 to '[^']*' %(index out of bounds %(0%)%)", remove, 0) assert(select('#', remove(0)) == 0) assertsame({}, 1, 0, push()) assertsame({nil}, 1, 1, push(nil)) assertsame({}, 1, 0, concat()) asserterror("attempt to call a nil value", concat, nil) asserterror("bad argument #1 to '[^']*' %(value expected%)", map) assertsame({}, 1, 0, map(nil)) asserterror("attempt to call a nil value", map, nil, nil)