An in-depth look at xLua's GC optimizations for value types under Unity

Foreword

GC allocation in Unity's C# (abbreviated as GC below) is a big problem. Once a dynamically typed language such as Lua is embedded, the interaction between the two easily produces GC, and the various Lua solutions all treat this as a focus of performance optimization. The optimizations themselves are not complicated.

The culprit is here

Take a look at these two functions:
int inc1(int i)
{
    return i + 1;
}
 
object inc2(object o)
{
    return (int)o + 1;
}

Measured on Windows, inc1 is 20 times faster than inc2!
Why is the difference so big? The main reason lies in the parameter and return types. inc2 takes and returns object, which means a value type (such as an int) has to be boxed: a block of memory is allocated on the heap, and the type information and the value are copied into it. To use the value it has to be unboxed, that is, copied from the heap back onto the stack. Once the function has finished, the GC detects that the heap memory is no longer referenced and releases it.
That 20-fold difference is for just one parameter and one return value. As the number of such parameters grows, the difference gets even bigger. What's worse, the GC is hard to control: in Unity mobile game projects, GC is often the culprit behind stutter.
All the GC optimizations that the current Lua solutions make for Lua/C# interop, whether for simple or complex value types, really do just one thing: avoid the inc2 situation.
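The boxing cost described above can be made visible with a small sketch. The class name BoxingDemo and the helper BoxesAreDistinct are hypothetical names introduced here for illustration; the point is only that each boxing conversion produces a fresh heap object, which is what the GC later has to clean up.

```csharp
using System;

public static class BoxingDemo
{
    // Typed version: the argument and the result stay plain ints, no heap allocation.
    public static int Inc1(int i)
    {
        return i + 1;
    }

    // Untyped version: the caller has to box the int into an object,
    // and the int result is boxed again on return.
    public static object Inc2(object o)
    {
        return (int)o + 1;
    }

    // Each boxing conversion creates a distinct heap object, so two boxes
    // of the same value are two different references.
    public static bool BoxesAreDistinct(int value)
    {
        object first = value;   // boxing: heap allocation #1
        object second = value;  // boxing: heap allocation #2
        return !ReferenceEquals(first, second);
    }
}
```

Calling BoxingDemo.Inc2(41) therefore allocates twice (once for the argument, once for the result), while BoxingDemo.Inc1(41) allocates nothing.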

Avoiding inc2 when C# calls Lua

Lua is a dynamically typed language: a function can accept any number of arguments of any type and return any number of values of any type. If you want to access Lua functions through one generic interface, the situation is worse than inc2. To support any number of arguments of any type, we may need variable arguments (params); to support multiple return values of any type, the interface may need to return an object array rather than a single object. So there are two more arrays to allocate and release. The function prototype is roughly as follows:
object[] Call(params object[] args)
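To see where the allocations in such a prototype come from, here is a minimal stand-in, assuming a hypothetical GenericCallDemo.Call that simply adds one to each int argument instead of calling into Lua. It is not any library's real implementation; it only has the same signature shape.

```csharp
using System;

public static class GenericCallDemo
{
    // Hypothetical stand-in for a generic "call Lua" entry point.
    // The compiler allocates the args array at every call site, and
    // every int argument is boxed before it is stored in that array.
    public static object[] Call(params object[] args)
    {
        object[] results = new object[args.Length]; // second array, allocated per call
        for (int i = 0; i < args.Length; i++)
        {
            results[i] = (int)args[i] + 1; // unbox, add, box the result again
        }
        return results;
    }
}
```

A call such as GenericCallDemo.Call(1, 2) thus costs two arrays plus four boxed ints, all of which become garbage as soon as the caller has read the results.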
For these reasons, although most solutions provide this method (because it is convenient), they do not recommend it. Some solutions offer a GC-free alternative. For example, to avoid GC in ulua you have to write this:
var func = lua.GetFunction("inc");  
func.BeginPCall();
func.Push(123456);
func.PCall();
int num = (int)func.CheckNumber();
func.EndPCall();
The idea is to expose Lua's stack API: push the arguments one by one, make the call, then fetch the return values one by one. The push and fetch interfaces are all strongly typed; in other words, they are inc1-style interfaces.
The above covers only a single parameter and a single return value; in most cases the code gets more tedious.
As for slua, I did not find an equivalent mechanism.

The core idea of xLua's solution is: as long as you tell me the signature you want to call with, I will do the optimizing for you.
[CSharpCallLua]
public delegate int Inc(int i);
Inc func = luaenv.Global.Get<Inc>("inc");
int num = func(123456);
1. Declare a delegate that matches your needs and tag it with CSharpCallLua;
2. Run code generation;
3. Use the Table's Get interface to map the inc function to the func delegate;
4. From then on, simply use the delegate.
More complex signatures work the same way: declare, get, use. Compared with the GC-producing Call interface there is only one extra step, the declaration. Using it is as simple as using Call, handling return values is even simpler, and you also get the benefit of strong type checking.
What if the Lua function has multiple return values?
Multiple return values are mapped to the C# return value and the output parameters, from left to right.
In addition, xLua supports mapping a Lua table to a C# interface. Accessing a property of the interface accesses the corresponding field of the Lua table, and calling a member method calls the corresponding function in the Lua table. Again, no GC.
How is this done? It is not complicated. A Lua function is mapped to a C# delegate, and xLua generates code for each delegate declared with CSharpCallLua. For example, the code generated for Inc looks similar to this:
public int SystemInt32(int x)
{
    //...init
    LuaAPI.lua_getref(L, _Reference);
              
    LuaAPI.xlua_pushinteger(L, x);
    int __gen_error = LuaAPI.lua_pcall(L, 1, 1, err_func);
 
    //...error handle
    int __gen_ret = LuaAPI.xlua_tointeger(L, err_func + 1);
    LuaAPI.lua_settop(L, err_func - 1);
    return  __gen_ret;
}

The delegate returned by the Get method points to this method. The code is similar to ulua's GC-free code; the difference is that theirs has to be written by hand, and because xLua has one less layer of wrapping and calls the Lua API directly, it should also be more efficient.

Complex value type optimization

Starting with how complex objects are passed from C# to Lua
To .NET, the Lua virtual machine is unmanaged code. Passing an object across means solving several problems:
1. While Lua is using the object, it must not be garbage collected;
2. When unmanaged code (Lua) calls back into managed code (C#) and hands the object's reference back, the corresponding object must be found correctly;
3. If the same object is passed repeatedly, the reference seen by unmanaged code must stay the same.
The official answer to problems 1 and 2 is to pin the object. Measurements show that pinning and releasing an object costs roughly as much as a Dictionary Set/Get, whereas problems 1 and 2 can be reduced to array operations, which perform 4 to 5 times better than pinning: when an object arrives, find a free slot in an array, store the object there, and return the array index as the object's reference. By chaining the free slots into a linked list, finding a free slot becomes an O(1) operation, and looking up an object by its reference is of course O(1) too.
There is no better solution to problem 3: a Dictionary is used to build an index from object to reference.
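The pool described above can be sketched as follows. This is a minimal illustration under assumptions, not xLua's actual source: the names ObjectPool, Slot and ReferenceComparer are made up here, and growth policy and error handling are kept to the bare minimum.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares objects by identity, so the reverse index answers "is this the same object?".
sealed class ReferenceComparer : IEqualityComparer<object>
{
    public static readonly ReferenceComparer Instance = new ReferenceComparer();
    bool IEqualityComparer<object>.Equals(object a, object b) { return ReferenceEquals(a, b); }
    int IEqualityComparer<object>.GetHashCode(object o) { return RuntimeHelpers.GetHashCode(o); }
}

public sealed class ObjectPool
{
    private struct Slot
    {
        public object Value;  // the stored object, or null when the slot is free
        public int NextFree;  // next free slot index, -1 at the end of the free list
    }

    private Slot[] slots = new Slot[16];
    private int count = 0;      // number of slots handed out so far
    private int freeHead = -1;  // head of the free-slot linked list
    private readonly Dictionary<object, int> reverse =
        new Dictionary<object, int>(ReferenceComparer.Instance);

    // Problems 1 and 3: keep the object alive and return a stable index for it.
    public int Add(object obj)
    {
        int index;
        if (reverse.TryGetValue(obj, out index)) return index; // same object, same reference
        if (freeHead != -1)
        {
            index = freeHead;                 // O(1): reuse a freed slot
            freeHead = slots[index].NextFree;
        }
        else
        {
            if (count == slots.Length) Array.Resize(ref slots, slots.Length * 2);
            index = count++;
        }
        slots[index].Value = obj;
        slots[index].NextFree = -1;
        reverse[obj] = index;
        return index;
    }

    // Problem 2: O(1) lookup of the object from the index held by unmanaged code.
    public object Get(int index)
    {
        return slots[index].Value;
    }

    // Called when the unmanaged side drops its reference; the slot joins the free list.
    public void Remove(int index)
    {
        object obj = slots[index].Value;
        if (obj == null) return;
        reverse.Remove(obj);
        slots[index].Value = null;
        slots[index].NextFree = freeHead;
        freeHead = index;
    }
}
```

Note that Add takes an object, so handing it a struct boxes the struct first; that is exactly the cost the next section is about.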

Complex value type dilemma
In C# everything is an object, value types included, so they can follow the scheme above as well. Functionally this works, but performance meets its Waterloo:
Every time a value type is put into the object pool (the set of mechanisms from the previous section that solve the 3 problems), it hits the inc2 situation: it is boxed into a new object and then goes through the whole series of pool operations. Some will ask whether the pinned approach avoids this problem. It does not: the value type lives on the stack, and pinning it means moving it from the stack to the heap, which goes through a similar process: allocate heap memory, copy, release after use.
This problem has a wider impact than the previous one: it arises whenever C# passes a complex value type to Lua. For example, ordinary Vector3 arithmetic will generate a large amount of GC.
ulua and slua take the same approach: hard-coded optimizations for a few specific Unity value types (Vector2, Vector3, Vector4, Quaternion). Taking Vector3 as an example:
1. Reimplement all Vector3 methods in Lua;
2. Passing a C# Vector3 to Lua: build a table on the Lua side, set the x, y, z of the Vector3 into the corresponding fields, and set the table's metatable to the implementation from step 1;
3. Returning a Vector3 from Lua to C#: construct a Vector3 on the C# side and assign the table's x, y, z fields to it.

xLua Complex Value Type Optimization
The above optimization has some problems. Adding a new value type is very hard, so the value types supported this way can still be counted on one's fingers, and user-defined structs are even less likely to be supported. Coupling the core code so deeply to these types is also unreasonable. And there is a more serious problem: xLua's author strongly dislikes this kind of hard coding.
Let's think about it. What is the essence of how ulua and slua avoid GC? And why does passing a simple value type from C# to Lua produce no GC?
The answer is: value copy!
The complex value type optimization in ulua and slua, when passing from C# to Lua, essentially copies the values of the Vector3 into a Lua table, which avoids the pool and therefore avoids inc2. A simple value type works the same way: a C# int passed to Lua is copied by value directly onto the Lua stack.
With this understood, things open up. xLua designed a new scheme for value types. It applies to any struct that contains only value types; structs may be nested, provided the nested structs also contain only value types.
The principle is not complicated:
1. Generate value-copying code for the struct: copy the struct's fields into a block of unmanaged memory (the Pack function), and copy the field values back out of unmanaged memory (the UnPack function);
2. C# passes a struct to Lua: call the Lua API to allocate a userdata (unmanaged memory from C#'s point of view) and call Pack to pack the struct into it;
3. Lua passes it back to C#: call UnPack to unpack the struct;
4. The struct's methods still use the original C# implementation.
Put plainly, this is similar to pb: serialize the C# data structure into a block of memory, and deserialize it from that memory.
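The Pack/UnPack idea can be sketched in plain C#. This is an illustration under assumptions, not xLua's generated code: Vec3 and Vec3Packer are hypothetical names, and Marshal.AllocHGlobal stands in for the userdata that would really be allocated through the Lua API. The struct is copied field by field, so it is never boxed.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a struct that contains only value types.
public struct Vec3
{
    public float X, Y, Z;
}

public static class Vec3Packer
{
    public const int Size = 12; // three 4-byte floats

    // Reinterprets a float as an int without allocating and without unsafe code.
    [StructLayout(LayoutKind.Explicit)]
    private struct FloatBits
    {
        [FieldOffset(0)] public float F;
        [FieldOffset(0)] public int I;
    }

    private static void WriteFloat(IntPtr buffer, int offset, float value)
    {
        FloatBits bits = new FloatBits();
        bits.F = value;
        Marshal.WriteInt32(buffer, offset, bits.I);
    }

    private static float ReadFloat(IntPtr buffer, int offset)
    {
        FloatBits bits = new FloatBits();
        bits.I = Marshal.ReadInt32(buffer, offset);
        return bits.F;
    }

    // Pack: copy the struct's fields into unmanaged memory.
    public static void Pack(IntPtr buffer, Vec3 v)
    {
        WriteFloat(buffer, 0, v.X);
        WriteFloat(buffer, 4, v.Y);
        WriteFloat(buffer, 8, v.Z);
    }

    // UnPack: rebuild the struct from unmanaged memory.
    public static Vec3 UnPack(IntPtr buffer)
    {
        Vec3 v;
        v.X = ReadFloat(buffer, 0);
        v.Y = ReadFloat(buffer, 4);
        v.Z = ReadFloat(buffer, 8);
        return v;
    }

    // Demonstration only: here the buffer comes from AllocHGlobal,
    // in the scheme described above it would be a Lua userdata.
    public static Vec3 RoundTrip(Vec3 v)
    {
        IntPtr buffer = Marshal.AllocHGlobal(Size);
        try
        {
            Pack(buffer, v);
            return UnPack(buffer);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

Nothing in Pack or UnPack touches the managed heap, which is why this route avoids the inc2 situation. A nested struct would simply be packed field by field at the appropriate offsets.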
Let me first talk about the shortcomings of this scheme:
The disadvantage comes from the fact that calls to the struct's methods still go to the original C# implementation. Going from Lua through C and then through P/Invoke to C#, the cost of this bridging is far greater than the cost of executing a simple method. Of course, calling the C# implementation is only xLua's default, not a requirement: xLua provides an API for reading and modifying struct fields directly in C, without going through C#. More diligent developers can use this API to reimplement performance-critical parts in Lua, which avoids the bridging cost between Lua and C#.
PS: A very popular performance benchmark for Lua solutions on the Internet uses Vector3.Normalize to test the performance of Lua calling C# static functions; even Unity's official evaluation uses this case. From the analysis above it is clear that this is not right: in the solutions tested, Vector3.Normalize runs entirely in Lua, so "Lua calling a C# static function" is never actually tested.
The advantages of this scheme:
1. It supports a much wider range of struct types, and what the user has to do is simple: declare the type for code generation (GCOptimize). The declaration is required mainly to avoid generating too much code;
2. It uses less memory than the table approach: only the struct size plus a header, whereas an empty table on 64-bit takes 80+ bytes. In actual tests, the userdata approach for Vector3 uses one third of the memory of the table approach.

GC optimizations for other value types

Most of the optimizations below are available only in xLua. You can see how to use them in its 05_NoGc example. After generating the code, run it in the profiler to see the effect.
1. Enum types are passed without GC;
2. decimal is passed without loss of precision and without GC;
3. Arrays of any GC-free type are accessed without GC (most solutions seem to do this);
4. For a struct optimized with GCOptimize, a table with the corresponding structure can be passed directly from Lua without GC;
5. LuaTable provides a series of generic Get/Set interfaces that pass value types without GC;
6. When an interface is tagged with CSharpCallLua, a table can implement that interface, and accessing the table through the interface incurs no GC.
These optimizations all follow the two main ideas introduced above. They can be understood by reading the source code, so they are not analyzed here.

Coming next

Next time, we will introduce the implementation of xLua's most difficult feature.
