Chapter 10. Tuning Practice - Script (C#)

This chapter introduces performance tuning practices for C# code, with examples. Basic C# syntax is not covered here; instead, the focus is on the design and implementation points you should be aware of when developing games that require performance.

10.1. GC.Alloc cases and how to deal with them

Following on from "2.5.2 Garbage Collection", in this section let's first understand what kinds of specific processing cause GC.Alloc.

10.1.1. New of reference type

First of all, this is a very simple case in which GC.Alloc occurs.

List 10.1: Code that causes GC.Alloc every frame

 1: private void Update()
 2: {
 3:     const int listCapacity = 100;
 4:     //  GC.Alloc in new of List<int>.
 5:     var list = new List<int>(listCapacity);
 6:     for (var index = 0; index < listCapacity; index++)
 7:     {
 8:         //  Pack index into list, though it doesn't make any sense in particular
 9:         list.Add(index);
10:     }
11:     //  Randomly take a value from the list
12:     var random = UnityEngine.Random.Range(0, listCapacity);
13:     var randomValue = list[random];
14:     //  ... Do something with the random value ...
15: }

The major problem with this code is that a new List<int> is allocated in the Update method, which runs every frame.

To fix this, GC.Alloc every frame can be avoided by generating the List<int> in advance and reusing it.

List 10.2: Code that eliminates GC.Alloc in every frame

 1: private static readonly int listCapacity = 100;
 2: //  Generate a List in advance
 3: private readonly List<int> _list = new List<int>(listCapacity);
 4: 
 5: private void Update()
 6: {
 7:     _list.Clear();
 8:     for (var index = 0; index < listCapacity; index++)
 9:     {
10:         //  Pack indexes into the list, though it doesn't make sense to do so
11:         _list.Add(index);
12:     }
13:     //  Randomly take a value from the list
14:     var random = UnityEngine.Random.Range(0, listCapacity);
15:     var randomValue = _list[random];
16:     //  ... Do something with the random values ...
17: }

I don't think you will ever write meaningless code like this sample, but similar examples can be found in more cases than you might imagine.

Once you have eliminated GC.Alloc

As you may have noticed, the processing in the List 10.2 sample above is itself meaningless; the following is all that is actually needed.

List 10.3: The processing that is actually needed

 1: var randomValue = UnityEngine.Random.Range(0, listCapacity);
 2: //  ... Do something from a random value ...

While eliminating GC.Alloc is important in performance tuning, always being on the lookout for pointless computation is another step toward faster code.

10.1.2. Lambda Expressions

Lambda expressions are also a useful feature, but their use is limited in games because they too can cause GC.Alloc depending on how they are used. Here we assume that the following code is defined.

List 10.4: Assumed code for the lambda expression sample

 1: //  Member Variables
 2: private int _memberCount = 0;
 3: 
 4: //  static variables
 5: private static int _staticCount = 0;
 6: 
 7: //  member method
 8: private void IncrementMemberCount()
 9: {
10:     _memberCount++;
11: }
12: 
13: //  static method
14: private static void IncrementStaticCount()
15: {
16:     _staticCount++;
17: }
18: 
19: //  Member method that only invokes the received Action
20: private void InvokeActionMethod(System.Action action)
21: {
22:     action.Invoke();
23: }

At this time, if a variable is referenced in a lambda expression as follows, GC.Alloc will occur.

List 10.5: Case of GC.Alloc by referencing a variable in a lambda expression

 1: //  When a member variable is referenced, Delegate Allocation occurs
 2: InvokeActionMethod(() => { _memberCount++; });
 3: 
 4: //  When a local variable is referenced, Closure Allocation occurs
 5: int count = 0;
 6: //  The same Delegate Allocation as above also occurs
 7: InvokeActionMethod(() => { count++; });

However, these allocations can be avoided by referencing static variables instead, as follows.

List 10.6: Case where referencing a static variable in a lambda expression does not cause GC.Alloc

 1: //  When a static variable is referenced, GC.Alloc does not occur
 2: InvokeActionMethod(() => { _staticCount++; });

Whether GC.Alloc occurs for method references in lambda expressions also depends on how they are written.

List 10.7: Cases of GC.Alloc when a method is referenced in a lambda expression

 1: //  When a member method is referenced, Delegate Allocation occurs.
 2: InvokeActionMethod(() => { IncrementMemberCount(); });
 3: 
 4: //  If a member method is directly specified, Delegate Allocation occurs.
 5: InvokeActionMethod(IncrementMemberCount);
 6: 
 7: //  When a static method is directly specified, Delegate Allocation occurs
 8: InvokeActionMethod(IncrementStaticCount);

To avoid these cases, the static method must be called inside a lambda expression in statement form, as follows.

List 10.8: Case where referencing a method in a lambda expression does not cause GC.Alloc

 1: //  Non Alloc when a static method is referenced in a lambda expression
 2: InvokeActionMethod(() => { IncrementStaticCount(); });

Written this way, the Action is newed only the first time and cached internally, so GC.Alloc is avoided from the second time onward.

However, making all variables and methods static is not very adoptable in terms of code safety and readability. In code that needs to be fast, it is safer to design without using lambda expressions for events that fire every frame or at unpredictable times, rather than to rely heavily on static to eliminate GC.Alloc.
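When a lambda that touches instance state must be passed repeatedly, one compromise, sketched below with hypothetical names, is to allocate the delegate once and cache it in a field so that the per-frame call site no longer allocates:

```csharp
using System;

public class CachedDelegateExample
{
    private int _memberCount;

    // The delegate is allocated once here; reusing it avoids
    // a Delegate Allocation on every call.
    private readonly Action _cachedIncrement;

    public CachedDelegateExample()
    {
        _cachedIncrement = () => _memberCount++;
    }

    // Called every frame: passes the cached delegate, so no GC.Alloc occurs here.
    public void Tick() => InvokeActionMethod(_cachedIncrement);

    public int MemberCount => _memberCount;

    private void InvokeActionMethod(Action action) => action.Invoke();
}
```

The one-time allocation in the constructor is paid during initialization rather than during gameplay, which fits the pre-generation approach discussed throughout this chapter.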

10.1.3. Cases where generics are used and boxed

In the following case that uses generics, where could boxing occur?

List 10.9: Example of possible boxed cases using generics

 1: public readonly struct GenericStruct<T> : IEquatable<T>
 2: {
 3:     private readonly T _value;
 4: 
 5:     public GenericStruct(T value)
 6:     {
 7:         _value = value;
 8:     }
 9: 
10:     public bool Equals(T other)
11:     {
12:         var result = _value.Equals(other);
13:         return result;
14:     }
15: }

In this case, the programmer implemented the IEquatable<T> interface on GenericStruct but forgot to place a constraint on T. As a result, a type that does not implement IEquatable<T> can be specified for T, and in that case the following Equals is used after an implicit cast to the object type.

List 10.10: Object.cs

 1: public virtual bool Equals(object obj);

For example, if a struct that does not implement the IEquatable<T> interface is specified for T, it will be cast to object at the Equals argument, resulting in boxing. To prevent this in advance, change the code as follows.

List 10.11: Example with restrictions to prevent boxing

 1: public readonly struct GenericOnlyStruct<T> : IEquatable<T>
 2:     where T : IEquatable<T>
 3: {
 4:     private readonly T _value;
 5: 
 6:     public GenericOnlyStruct(T value)
 7:     {
 8:         _value = value;
 9:     }
10: 
11:     public bool Equals(T other)
12:     {
13:         var result = _value.Equals(other);
14:         return result;
15:     }
16: }

By using the where clause (generic type constraint) to restrict the types T can accept to those implementing IEquatable<T>, such unexpected boxing can be prevented.

Never lose sight of the original purpose

As introduced in "2.5.2 Garbage Collection", structs are often chosen with the intention of avoiding GC.Alloc at runtime in games. However, making everything a struct to reduce GC.Alloc does not always speed things up.

One of the most common failures is that when structs are used to avoid GC.Alloc, the GC-related cost drops as expected, but the data size is so large that copying the value type becomes expensive, resulting in inefficient processing overall.

To avoid this, copying costs can also be reduced by passing method arguments by reference. Although this may yield a speed-up, in such cases you should also consider choosing a class from the beginning and implementing it so that instances are pre-generated and reused. Remember that the ultimate goal is not to eradicate GC.Alloc but to reduce the processing time per frame.
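As a sketch of the pass-by-reference approach (the struct and method names here are hypothetical), the in modifier available since C# 7.2 lets a method receive a large struct by read-only reference instead of copying all of its fields on every call:

```csharp
// A deliberately large struct (64 bytes) to make the copy cost visible.
public readonly struct BigData
{
    public readonly long A, B, C, D, E, F, G, H;

    public BigData(long v)
    {
        A = B = C = D = E = F = G = H = v;
    }
}

public static class BigDataOps
{
    // Pass by value: the whole 64-byte struct is copied on each call.
    public static long SumByValue(BigData data) => data.A + data.H;

    // Pass by read-only reference (C# 7.2 'in'): no copy is made.
    public static long SumByRef(in BigData data) => data.A + data.H;
}
```

Note that in works best with readonly structs: for non-readonly structs, member access through an in parameter can trigger hidden defensive copies that cancel out the gain.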

10.2. About for/foreach

As introduced in "2.6 Algorithms and computational complexity", loops become time-consuming as the amount of data grows. Moreover, loops that at first glance appear to do the same work can vary in efficiency depending on how the code is written.

Let's look at the results of decompiling the code from IL back to C# with SharpLab*1, for loops that simply read the contents of a List and an array one element at a time with foreach and for.

First, let's look at the foreach loop over a List. Adding values to the List is omitted here.

List 10.12: Example of looping through a List with foreach

 1: var list = new List<int>(128);
 2: foreach (var val in list)
 3: {
 4: }

List 10.13: Decompilation result of the example of looping through a List with foreach

 1: List<int>.Enumerator enumerator = new List<int>(128).GetEnumerator();
 2: try
 3: {
 4:     while (enumerator.MoveNext())
 5:     {
 6:         int current = enumerator.Current;
 7:     }
 8: }
 9: finally
10: {
11:     ((IDisposable)enumerator).Dispose();
12: }

With foreach, you can see that the implementation gets an enumerator, advances it with MoveNext(), and reads the value from Current. Furthermore, looking at the implementation of MoveNext() in list.cs*2, it performs extra work such as size checks and various property accesses, so it does more processing than direct access through the indexer.

[*2] https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs

Next, let's look at the for loop.

List 10.14: Example of looping through a List with for

 1: var list = new List<int>(128);
 2: for (var i = 0; i < list.Count; i++)
 3: {
 4:     var val = list[i];
 5: }

List 10.15: Decompilation result of the example of looping through a List with for

 1: List<int> list = new List<int>(128);
 2: int num = 0;
 3: while (num < list.Count)
 4: {
 5:     int num2 = list[num];
 6:     num++;
 7: }

In C#, the for statement is syntactic sugar for a while statement, and elements are obtained by reference through the indexer (public T this[int index]). Also, if you look closely at this while statement, you will see that the conditional expression contains list.Count, which means the Count property is accessed on every iteration of the loop. The more iterations there are, the more property accesses occur in proportion, and depending on the number, the load becomes non-negligible. If Count does not change within the loop, the cost of these property accesses can be reduced by caching the value before the loop.

List 10.16: Example of looping through a List with for: improved version

 1: var count = list.Count;
 2: for (var i = 0; i < count; i++)
 3: {
 4:     var val = list[i];
 5: }

List 10.17: Decompilation result of the improved version

 1: List<int> list = new List<int>(128);
 2: int count = list.Count;
 3: int num = 0;
 4: while (num < count)
 5: {
 6:     int num2 = list[num];
 7:     num++;
 8: }

Caching Count reduced the number of property accesses and made the loop faster. Neither of the loops compared here incurs GC.Alloc; the difference comes purely from the difference in implementation.

In the case of arrays, foreach is also optimized and compiles to almost the same code as the for version.

List 10.18: Example of looping through an array with foreach

 1: var array = new int[128];
 2: foreach (var val in array)
 3: {
 4: }

List 10.19: Decompilation result of the example of looping through an array with foreach

 1: int[] array = new int[128];
 2: int num = 0;
 3: while (num < array.Length)
 4: {
 5:     int num2 = array[num];
 6:     num++;
 7: }

For verification, we summed the contents of a List<int> with 10,000,000 elements filled with random numbers in advance. The verification environment was a Pixel 3a with Unity 2021.3.1f1.

Table 10.1: Measurement results for each way of writing the List<int> loop

Type                      Time (ms)
List: foreach             66.43
List: for                 62.49
List: for (Count cache)   55.11
Array: for                30.53
Array: foreach            23.75

In the case of List<int>, comparing under these conditions shows that for, and for with the Count cached, are faster than foreach. Rewriting a foreach over a List into a for loop with the Count cached eliminates the overhead of MoveNext() and the Current property in the foreach processing, making it faster.

In addition, comparing the fastest case for each, arrays are approximately 2.3 times faster than List. Even though foreach and for over an array compile to almost the same IL, foreach was the faster result, and foreach over arrays can be considered sufficiently optimized.

Based on the above results, arrays should be considered instead of List<T> in situations where the amount of data is large and processing speed matters.

However, when rewriting foreach to for, if the rewrite is incomplete, such as when a List held in a field is accessed without caching it in a local variable, the expected speed-up may not materialize.
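A complete rewrite can be sketched as follows (field and method names are hypothetical): both the field reference and Count are cached in locals before the loop, so neither the field load nor the property access is repeated per iteration:

```csharp
using System.Collections.Generic;

public class ScoreAggregator
{
    private readonly List<int> _scores = new List<int> { 10, 20, 30 };

    public int Sum()
    {
        // Cache the field in a local so the field is not re-read each iteration,
        // and cache Count so the property getter is not called every time.
        var scores = _scores;
        var count = scores.Count;
        var sum = 0;
        for (var i = 0; i < count; i++)
        {
            sum += scores[i];
        }
        return sum;
    }
}
```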

10.3. Object Pooling

As we have mentioned in many places, it is important in game development to pre-generate objects and reuse them instead of generating them dynamically. This is called object pooling. For example, by pooling the objects needed for the game phase during the load phase, and only assigning and referencing the pooled objects while they are in use, GC.Alloc during the game phase can be avoided.

Beyond reducing allocations, object pooling is used in a variety of other situations: enabling screen transitions without recreating the objects that make up each screen, shortening load times, and avoiding repeated heavy computation by retaining the results of very expensive calculations.

Although the term "object" is used broadly here, it applies not only to the smallest units of data but also to things like Coroutine and Action. For example, consider generating more Coroutine instances than the expected number of executions in advance and drawing on them as needed: if a game that takes 2 minutes per play will be played at most 20 times, you can reduce generation costs by creating the IEnumerator instances in advance and only calling StartCoroutine when they are needed.
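The idea can be sketched with a minimal generic pool (the class below is a hypothetical illustration; Unity 2021 and later also ship UnityEngine.Pool.ObjectPool&lt;T&gt; with similar functionality, which should be preferred when available):

```csharp
using System.Collections.Generic;

public sealed class SimplePool<T> where T : new()
{
    private readonly Stack<T> _pool = new Stack<T>();

    // Pre-generate instances during the load phase so the game phase
    // does not have to allocate.
    public SimplePool(int capacity)
    {
        for (var i = 0; i < capacity; i++)
        {
            _pool.Push(new T());
        }
    }

    // Hand out a pooled instance; allocate only if the pool has run dry.
    public T Rent() => _pool.Count > 0 ? _pool.Pop() : new T();

    // Put the instance back so it can be reused.
    public void Return(T item) => _pool.Push(item);

    public int Count => _pool.Count;
}
```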

10.4. string

The string object is a sequential collection of System.Char objects representing a string, and a single use of string can easily cause GC.Alloc. For example, concatenating two strings with the + operator creates a new string object. Because the value of a string cannot be changed after creation (it is immutable), operations that appear to change the value actually create and return new string objects.

List 10.20: Creating a string with string concatenation

 1: private string CreatePath()
 2: {
 3:     var path = "root";
 4:     path += "/";
 5:     path += "Hoge";
 6:     path += "/";
 7:     path += "Fuga";
 8:     return path;
 9: }

In the above example, a new string is created by each concatenation, resulting in 164 bytes of allocation in total.

When strings are changed frequently, using StringBuilder, whose value can be changed, prevents the mass generation of string objects. By performing operations such as concatenation and deletion on the StringBuilder object and extracting the value with ToString() only at the end, allocation can be limited to the moment the final string is obtained. Also, when using StringBuilder, be sure to set Capacity. When unspecified, the default is 16, and appending more characters than the buffer holds (with Append and the like) triggers memory allocation and value copying to expand it. Set an appropriate Capacity that avoids inadvertent expansion.

List 10.21: When creating a string with StringBuilder

 1: private readonly StringBuilder _stringBuilder = new StringBuilder(16);
 2: private string CreatePathFromStringBuilder()
 3: {
 4:     _stringBuilder.Clear();
 5:     _stringBuilder.Append("root");
 6:     _stringBuilder.Append("/");
 7:     _stringBuilder.Append("Hoge");
 8:     _stringBuilder.Append("/");
 9:     _stringBuilder.Append("Fuga");
10:     return _stringBuilder.ToString();
11: }

In the StringBuilder example, if the StringBuilder is generated in advance (112 bytes are allocated at generation time in the example above), then from that point on only the 50-byte allocation taken by ToString() when the resulting string is retrieved is needed.

However, StringBuilder is not a cure-all when you want to avoid GC.Alloc: it only makes allocation less likely during value manipulation, and as mentioned above, a string object is still generated when ToString() is executed. Also, since the $"" syntax is converted into string.Format, whose internal implementation uses StringBuilder, the ToString() cost is ultimately unavoidable there as well. The pooling idea from the previous section applies here too: strings that may be needed should be pre-generated as string objects and reused.

However, there are times during gameplay when string manipulation and the creation of string objects are unavoidable. In such cases, it is necessary to hold a pre-generated buffer for strings and manipulate it in place. Consider writing your own implementation with unsafe code, or introducing a library with Unity-oriented extensions such as ZString*3 (for example, its NonAlloc support for TextMeshPro).

10.5. LINQ and Lazy Evaluation

This section describes how to reduce GC.Alloc when using LINQ, and the key points of lazy evaluation.

Reducing GC.Alloc when using LINQ

The use of LINQ causes GC.Alloc in cases like List 10.22.

List 10.22: Example of GC.Alloc occurring

 1: var oneToTen = Enumerable.Range(1, 10).ToArray();
 2: var query = oneToTen.Where(i => i % 2 == 0).Select(i => i * i);

The reason GC.Alloc occurs in List 10.22 lies in the internal implementation of LINQ. In addition, some LINQ methods are optimized for the caller's type, so the size of GC.Alloc changes depending on that type.

List 10.23: Execution speed verification for each type

 1: private int[] array;
 2: private List<int> list;
 3: private IEnumerable<int> ienumerable;
 4: 
 5: public void GlobalSetup()
 6: {
 7:     array = Enumerable.Range(0, 1000).ToArray();
 8:     list = Enumerable.Range(0, 1000).ToList();
 9:     ienumerable = Enumerable.Range(0, 1000);
10: }
11: 
12: public void RunAsArray()
13: {
14:     var query = array.Where(i => i % 2 == 0);
15:     foreach (var i in query){}
16: }
17: 
18: public void RunAsList()
19: {
20:     var query = list.Where(i => i % 2 == 0);
21:     foreach (var i in query){}
22: }
23: 
24: public void RunAsIEnumerable()
25: {
26:     var query = ienumerable.Where(i => i % 2 == 0);
27:     foreach (var i in query){}
28: }

We benchmarked each method defined in List 10.23; the results are shown in Figure 10.1. They show that the size of the heap allocation increases in the order T[] < List<T> < IEnumerable<T>.

Thus, when using LINQ, the size of GC.Alloc can be reduced by being aware of the runtime type.

Figure 10.1: Comparison of Execution Speed by Type

Causes of GC.Alloc in LINQ

Part of the cause of GC.Alloc when using LINQ is its internal implementation. Many LINQ methods take an IEnumerable<T> and return an IEnumerable<T>, an API design that enables intuitive method chaining. The IEnumerable<T> returned by each method is actually an instance of a class implemented for that particular operation: LINQ instantiates classes implementing IEnumerable<T> internally, and further GC.Alloc occurs inside because GetEnumerator() is called to drive the loop processing.

LINQ Lazy Evaluation

LINQ methods such as Where and Select are lazily evaluated: evaluation is deferred until the result is actually needed. On the other hand, methods such as ToArray are defined as immediate evaluation.

Now consider the code in List 10.24 below.

List 10.24: Methods with immediate evaluation in between

 1: private static void LazyExpression()
 2: {
 3:     var array = Enumerable.Range(0, 5).ToArray();
 4:     var sw = Stopwatch.StartNew();
 5:     var query = array.Where(i => i % 2 == 0).Select(HeavyProcess).ToArray();
 6:     Console.WriteLine($"Query: {sw.ElapsedMilliseconds}");
 7: 
 8:     foreach (var i in query)
 9:     {
10:         Console.WriteLine($"diff: {sw.ElapsedMilliseconds}");
11:     }
12: }
13: 
14: private static int HeavyProcess(int x)
15: {
16:     Thread.Sleep(1000);
17:     return x;
18: }

The result of executing List 10.24 is shown in List 10.25. Because ToArray, an immediate-evaluation method, is appended at the end, Where and Select are executed and their values evaluated at the moment the assignment to query is made. HeavyProcess is therefore called at that point, and you can see that the processing time is spent when query is generated.

List 10.25: Result of Adding a Method for Immediate Evaluation

 1: Query: 3013
 2: diff: 3032
 3: diff: 3032
 4: diff: 3032

As you can see, unintentional calls to LINQ's immediate-evaluation methods can create bottlenecks at those points. Methods that must scan the entire sequence once, such as ToArray, OrderBy, and Count, are immediate evaluation, so be aware of the cost when calling them.

The Choice to "Avoid Using LINQ"

So far, this section has explained the causes of GC.Alloc when using LINQ, how to reduce it, and the key points of lazy evaluation. Here we discuss criteria for deciding whether to use LINQ at all. The premise is that LINQ is a useful language feature, but its use worsens heap allocation and execution speed compared to not using it. In fact, Microsoft's Unity performance recommendations*4 clearly state "Avoid use of LINQ." List 10.26 is a benchmark comparing the same logic implemented with and without LINQ.

List 10.26: Performance comparison with and without LINQ

 1: private int[] array;
 2: 
 3: public void GlobalSetup()
 4: {
 5:     array = Enumerable.Range(0, 100_000_000).ToArray();
 6: }
 7: 
 8: public void Pure()
 9: {
10:     foreach (var i in array)
11:     {
12:         if (i % 2 == 0)
13:         {
14:             var _ = i * i;
15:         }
16:     }
17: }
18: 
19: public void UseLinq()
20: {
21:     var query = array.Where(i => i % 2 == 0).Select(i => i * i);
22:     foreach (var i in query)
23:     {
24:     }
25: }

The results are shown in Figure 10.2. Comparing execution times, the LINQ version takes about 19 times longer than the version without LINQ.

Figure 10.2: Performance Comparison Results with and without LINQ

While the above results clearly show that using LINQ degrades performance, there are also cases where the intent of the code is conveyed more clearly with LINQ. With these behaviors understood, whether to use LINQ, and if so under what rules, may be worth discussing within each project.

10.6. How to avoid async/await overhead

Async/await is a language feature added in C# 5.0 that allows asynchronous processing to be written like a single piece of synchronous code, without callbacks.

Avoid async where it is not needed

Methods declared async have code generated by the compiler to implement the asynchronous processing, and this code generation always happens when the async keyword is present. Therefore, even methods that may complete synchronously, such as List 10.27, end up with compiler-generated code.

List 10.27: Asynchronous processing that may complete synchronously

 1: using System;
 2: using System.Threading.Tasks;
 3: 
 4: namespace A {
 5:     public class B {
 6:         public async Task HogeAsync(int i) {
 7:             if (i == 0) {
 8:                 Console.WriteLine("i is 0");
 9:                 return;
10:             }
11:             await Task.Delay(TimeSpan.FromSeconds(1));
12:         }
13: 
14:         public void Main() {
15:             int i = int.Parse(Console.ReadLine());
16:             Task.Run(() => HogeAsync(i));
17:         }
18:     }
19: }

In cases such as List 10.27, by splitting out HogeAsync, which may complete synchronously, and implementing it as in List 10.28, the cost of generating the state machine structure for the IAsyncStateMachine implementation, which is unnecessary when the method completes synchronously, can be omitted.

List 10.28: Split implementation of synchronous and asynchronous processing

 1: using System;
 2: using System.Threading.Tasks;
 3: 
 4: namespace A {
 5:     public class B {
 6:         public async Task HogeAsync(int i) {
 7:             await Task.Delay(TimeSpan.FromSeconds(1));
 8:         }
 9: 
10:         public void Main() {
11:             int i = int.Parse(Console.ReadLine());
12:             if (i == 0) {
13:                 Console.WriteLine("i is 0");
14:             } else {
15:                 Task.Run(() => HogeAsync(i));
16:             }
17:         }
18:     }
19: }

How async/await works

The async/await syntax is realized through compiler code generation at compile time. A method with the async keyword gets a compile-time-generated structure implementing IAsyncStateMachine, and async/await works by managing a state machine that advances its state when the awaited operation completes. This IAsyncStateMachine is an interface defined in the System.Runtime.CompilerServices namespace and is intended for use only by the compiler.

Avoid capturing synchronous context

The mechanism that returns processing offloaded to another thread back to the calling thread is the synchronization context, and await captures the current context beforehand. Since this synchronization context is captured every time await is executed, each await carries overhead. For this reason, UniTask*5, which is widely used in Unity development, is implemented without ExecutionContext or SynchronizationContext to avoid the capture overhead. As far as Unity is concerned, introducing such a library may improve performance.

10.7. Optimization with stackalloc

Allocating arrays as local variables causes GC.Alloc each time, which can lead to spikes. In addition, reading from and writing to the heap is slightly less efficient than the stack.

For this reason, C# provides a syntax for allocating arrays on the stack, originally available only in unsafe code.

As shown in List 10.29, an array can be allocated on the stack with the stackalloc keyword instead of new.

List 10.29: Allocating an array on the stack using stackalloc

 1: //  stackalloc is limited to unsafe
 2: unsafe
 3: {
 4:     //  Allocating an array of bytes on the stack
 5:     byte* buffer = stackalloc byte[BufferSize];
 6: }

Since C# 7.2, by receiving the result in the Span<T> structure as shown in List 10.30, stackalloc can be used without unsafe.

List 10.30: Allocating an array on the stack using the Span<T> struct

 1: Span<byte> buffer = stackalloc byte[BufferSize];

In Unity, Span<T> is standard from 2021.2. In earlier versions it does not exist, so System.Memory.dll must be installed.

Arrays allocated with stackalloc are stack-only and cannot be held in class or structure fields. They must be used as local variables.

Even though the array is allocated on the stack, allocating an array with a large number of elements still takes a certain amount of processing time. If you want to use large arrays in places where allocation spikes must be avoided, such as an update loop, it is better to allocate them in advance during initialization, or to prepare a data structure like an object pool and rent them out when used.

Also, note that the stack area allocated by stackalloc is not released until the function exits. For example, the code shown in List 10.31 may cause a stack overflow while looping, since every array allocated in the loop is retained and only released when the Hoge method exits.

List 10.31: Code that repeatedly allocates arrays with stackalloc in a loop

 1: unsafe void Hoge()
 2: {
 3:     for (int i = 0; i < 10000; i++)
 4:     {
 5:         //  Arrays are accumulated for the number of loops
 6:         byte* buffer = stackalloc byte[10000];
 7:     }
 8: }

10.8. Optimizing method invocation under IL2CPP backend with sealed

When building in Unity with the IL2CPP backend, method invocation is performed using a mechanism similar to a C++ vtable in order to realize virtual method calls on classes*6.

Specifically, for each method call definition of a class, the code shown at List 10.32 is automatically generated.

List 10.32: C++ code for method calls generated by IL2CPP

 1: struct VirtActionInvoker0
 2: {
 3:     typedef void (*Action)(void*, const RuntimeMethod*);
 4: 
 5:     static inline void Invoke (
 6:         Il2CppMethodSlot slot, RuntimeObject* obj)
 7:     {
 8:         const VirtualInvokeData& invokeData =
 9:             il2cpp_codegen_get_virtual_invoke_data(slot, obj);
10:         ((Action)invokeData.methodPtr)(obj, invokeData.method);
11:     }
12: };

Similar C++ code is generated not only for virtual methods but also for non-virtual methods of classes that, at compile time, may still be inherited. This auto-generated code leads to bloated code size and increased processing time for method calls.

This problem can be avoided by adding the sealed modifier to the class definition *7.

If you define a class and call a method as in List 10.33, the C++ code generated by IL2CPP contains a method call like List 10.34.

List 10.33: Class definition and method invocation without sealed

 1: public abstract class Animal
 2: {
 3:     public abstract string Speak();
 4: }
 5: 
 6: public class Cow : Animal
 7: {
 8:     public override string Speak() {
 9:         return "Moo";
10:     }
11: }
12: 
13: var cow = new Cow();
14: //  Calling the Speak method
15: Debug.LogFormat("The cow says '{0}'", cow.Speak());

List 10.34: The C++ code corresponding to the method call in List 10.33

 1: //  var cow = new Cow();
 2: Cow_t1312235562 * L_14 =
 3:     (Cow_t1312235562 *)il2cpp_codegen_object_new(
 4:         Cow_t1312235562_il2cpp_TypeInfo_var);
 5: Cow__ctor_m2285919473(L_14, /* hidden argument*/NULL);
 6: V_4 = L_14;
 7: Cow_t1312235562 * L_16 = V_4;
 8: 
 9: //  cow.Speak()
10: String_t* L_17 = VirtFuncInvoker0< String_t* >::Invoke(
11:     4 /* String AssemblyCSharp.Cow::Speak() */, L_16);

List 10.34 shows that VirtFuncInvoker0< String_t* >::Invoke is used, that is, the call is made like a virtual method call even though the call site does not require one.

On the other hand, defining the Cow class of List 10.33 with the sealed modifier as shown in List 10.35 generates C++ code like List 10.36.

List 10.35: Class definition and method call using sealed

 1: public sealed class Cow : Animal
 2: {
 3:     public override string Speak() {
 4:         return "Moo";
 5:     }
 6: }
 7: 
 8: var cow = new Cow();
 9: //  Calling the Speak method
10: Debug.LogFormat("The cow says '{0}'", cow.Speak());

List 10.36: The C++ code corresponding to the method call in List 10.35

 1: //  var cow = new Cow();
 2: Cow_t1312235562 * L_14 =
 3:     (Cow_t1312235562 *)il2cpp_codegen_object_new(
 4:         Cow_t1312235562_il2cpp_TypeInfo_var);
 5: Cow__ctor_m2285919473(L_14, /* hidden argument*/NULL);
 6: V_4 = L_14;
 7: Cow_t1312235562 * L_16 = V_4;
 8: 
 9: //  cow.Speak()
10: String_t* L_17 = Cow_Speak_m1607867742(L_16, /* hidden argument*/NULL);

Thus, we can see that the call site now invokes Cow_Speak_m1607867742 directly, without going through the virtual-call mechanism.

However, in relatively recent versions of Unity, Unity has officially stated that this kind of optimization is partially applied automatically *8.

In other words, even if you do not explicitly specify sealed, it is possible that such optimization is done automatically.

However, as mentioned in the forum thread "[il2cpp] Is `sealed` Not Worked As Said Anymore In Unity 2018.3?" *8, this implementation was not complete as of April 2019.

Because of this current state of affairs, it would be a good idea to check the code generated by IL2CPP and decide on the setting of the sealed modifier for each project.

For more reliable direct method calls, and in anticipation of future IL2CPP optimizations, it may be a good idea to add the sealed modifier as a mark that the class can be optimized.

10.9. Optimization through inlining

Method calls have some cost. Therefore, as a general optimization, not only in C# but in other languages as well, compilers optimize relatively small method calls through inlining.

Specifically, for code such as List 10.37, inlining generates code such as List 10.38.

List 10.37: Code before inlining

 1: int F(int a, int b, int c)
 2: {
 3:     var d = Add(a, b);
 4:     var e = Add(b, c);
 5:     var f = Add(d, e);
 6: 
 7:     return f;
 8: }
 9: 
10: int Add(int a, int b) => a + b;

List 10.38: Code after inlining is applied to List 10.37

 1: int F(int a, int b, int c)
 2: {
 3:     var d = a + b;
 4:     var e = b + c;
 5:     var f = d + e;
 6: 
 7:     return f;
 8: }

Inlining copies and expands the body of the called method at the call site: in List 10.38, the calls to the Add method within the F method of List 10.37 have been expanded in place.

In IL2CPP, no particular inlining optimization is performed during code generation.

However, starting with Unity 2020.2, if you specify the MethodImpl attribute on a method with MethodImplOptions.AggressiveInlining as its parameter, the corresponding function in the generated C++ code is given the inline specifier. In other words, inlining at the C++ code level is now possible.
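As a minimal sketch of how the attribute is specified (the class and method names here are illustrative, not from the text), a small arithmetic method might be annotated as follows:

```csharp
using System.Runtime.CompilerServices;
using UnityEngine;

public static class VectorUtil  // hypothetical utility class
{
    //  MethodImplOptions.AggressiveInlining hints that this small
    //  method should be inlined; with IL2CPP on Unity 2020.2 or later,
    //  the generated C++ function is also given the inline specifier.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static float SqrDistance(Vector3 a, Vector3 b)
    {
        var d = a - b;
        return d.x * d.x + d.y * d.y + d.z * d.z;
    }
}
```

Note that the attribute is only a hint: as described below, it does not guarantee that inlining will actually occur.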

The advantage of inlining is that it not only reduces the cost of method calls, but also saves copying of arguments specified at the time of method invocation.

For example, arithmetic methods often take multiple relatively large structs, such as Vector3 and Matrix4x4, as arguments. Since structs are passed by value, they are copied in their entirety for each call. If the number of arguments and the size of the structs are large, the combined cost of the method call and the argument copies can be considerable. Moreover, such methods are often used in periodic processing, such as physics and animation, so their call overhead can become a non-negligible processing burden.

In such cases, optimization through inlining can be effective. In fact, Unity's new mathematics library, Mathematics, specifies MethodImplOptions.AggressiveInlining on method definitions throughout *9.

On the other hand, inlining has the disadvantage that the code size increases with the expansion of the process within the method.

Therefore, it is recommended to consider inlining especially for methods that are called frequently within a single frame, that is, methods on hot paths. It should also be noted that specifying the attribute does not always result in inlining.

Inlining is limited to methods that are small in content, so methods that you want to inline must be kept small.

Also, in versions earlier than Unity 2020.2, the attribute does not cause the inline specifier to be attached, and even when the C++ inline specifier is present, there is no guarantee that inlining will actually be performed.

Therefore, if you want to ensure inlining, you may want to consider manually inlining methods on hot paths, even though this reduces readability.
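As a hedged sketch of manual inlining (the class, method, and field names here are hypothetical), a per-frame distance check might expand the computation in place instead of calling a small helper method:

```csharp
using UnityEngine;

public class EnemySearch : MonoBehaviour  // hypothetical example class
{
    //  Instead of calling a small helper such as SqrDistance(a, b)
    //  inside this loop, the squared-distance calculation is expanded
    //  in place so that the call cost does not depend on whether the
    //  compiler decides to inline the helper.
    public int CountInRange(Vector3 origin, Vector3[] targets, float range)
    {
        var sqrRange = range * range;
        var count = 0;
        for (var i = 0; i < targets.Length; i++)
        {
            //  Manually inlined squared-distance calculation
            var dx = targets[i].x - origin.x;
            var dy = targets[i].y - origin.y;
            var dz = targets[i].z - origin.z;
            if (dx * dx + dy * dy + dz * dz <= sqrRange)
            {
                count++;
            }
        }
        return count;
    }
}
```

The trade-off is exactly the one noted above: the logic is duplicated at each call site, so readability and maintainability suffer in exchange for a guaranteed absence of call overhead.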