Garbage Collection


Introduction


The garbage collector on Xbox 360 performs only full garbage collections (Edit: the XNA 4 Refresh adds generational garbage collection). That means that every time the garbage collector runs, it rearranges all of memory. Not only will the frames per second drop, but frame stability will suffer as well.

The main problem is that you won’t feel any performance drop while developing on PC, because the PC’s garbage collector works with generations, i.e. it normally rearranges only a small fraction of the heap. Consequently, when you port your game to the Xbox you will notice a significant frame drop simply because garbage collection is working inefficiently there. On Xbox, the .NET Compact Framework performs a garbage collection after every megabyte of allocations. That may sound like a lot, but in practice it can mean several collections per second.
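
If you want to spot the problem while still on PC, a rough way is to track how much managed memory is allocated each frame. This is only a sketch (the AllocationMonitor class is made up for illustration, it is not part of XNA or the engine); GC.GetTotalMemory gives an approximate figure, but it is enough to find frames that generate garbage:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: call Update() once per frame to get a rough estimate
// of how many bytes of garbage each frame produces.
class AllocationMonitor
{
    private long previousMemory;

    public void Update()
    {
        // Bytes currently believed to be allocated, without forcing a collection.
        long currentMemory = GC.GetTotalMemory(false);
        long delta = currentMemory - previousMemory;

        // A negative delta means a collection happened since the last frame.
        if (delta > 0)
            Debug.WriteLine("Allocated since last frame: " + delta + " bytes");

        previousMemory = currentMemory;
    }
}
```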

There are two practical solutions to this problem: eliminate all garbage, or reduce the garbage collection latency. Be aware that they are not complementary; you should follow one or the other, not a mix of both. The engine follows the first alternative and consequently does not produce any garbage, so you should follow this rule as well.

I will first explain how garbage collectors work before discussing the two approaches for improving overall garbage collector performance. Feel free to skip to the Improving Garbage Collector Performance section.

Garbage Collector Basics


In order to understand how to make good use of the garbage collector and what performance problems you might run into when running in a garbage-collected environment, it's important to understand the basics of how garbage collectors work and how those inner workings affect running programs.
Consider the following simplified model of the managed heap:

[Image: GarbageCollectorSimplifiedModel.gif (simplified model of the managed heap)]

The rules for this simplified model are as follows:
  • All garbage-collectable objects are allocated from one contiguous range of address space.
  • The heap is divided into generations so that it is possible to eliminate most of the garbage by looking at only a small fraction of the heap.
  • Objects within a generation are all roughly the same age.
  • Higher-numbered generations indicate areas of the heap with older objects—those objects are much more likely to be stable.
  • The oldest objects are at the lowest addresses, while new objects are created at increasing addresses.
  • The allocation pointer for new objects marks the boundary between the used (allocated) and unused (free) areas of memory.
  • Periodically the heap is compacted by removing dead objects and sliding the live objects up toward the low-address end of the heap. This expands the unused area at the bottom of the diagram in which new objects are created.
  • The order of objects in memory remains the order in which they were created, for good locality.
  • There are never any gaps between objects in the heap.
  • Only some of the free space is committed. When necessary, more memory is acquired from the operating system in the reserved address range.
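
On PC you can actually watch objects move through the generations. The following is purely a desktop experiment (GC.GetGeneration and explicit GC.Collect calls are available in the full .NET Framework; do not do this in a real game, and the Compact Framework on Xbox behaves differently):

```csharp
using System;

class GenerationDemo
{
    static void Main()
    {
        object o = new object();
        Console.WriteLine(GC.GetGeneration(o)); // 0: freshly allocated

        GC.Collect();                           // the object survives one collection...
        Console.WriteLine(GC.GetGeneration(o)); // 1: promoted to the next generation

        GC.Collect();                           // ...and another one
        Console.WriteLine(GC.GetGeneration(o)); // 2: the oldest generation on the desktop CLR
    }
}
```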

Collecting the Garbage


Full Collections

In a full collection we must stop the program execution and find all of the roots into the GC heap. These roots come in a variety of forms, but are most notably stack and global variables that point into the heap. Starting from the roots, we visit every object and follow every object pointer contained in every visited object marking the objects as we go along. In this way the collector will have found every reachable or live object. The other objects, the unreachable ones, are now condemned.

[Image: GarbageCollectorFullCollection.gif (a full garbage collection)]

Once the unreachable objects have been identified we want to reclaim that space for later use; the goal of the collector at this point is to slide the live objects up and eliminate the wasted space. With execution stopped, it's safe for the collector to move all those objects, and to fix all the pointers so that everything is properly linked in its new location. The surviving objects are promoted to the next generation number (which is to say the boundaries for the generations are updated) and execution can resume.
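
To make the mark-and-compact idea concrete, here is a toy model in C#. It is only an illustration of the algorithm described above, not how the CLR actually implements it (the real collector works on raw memory and fixes up pointers):

```csharp
using System.Collections.Generic;

// Toy model: each "object" knows which other objects it references.
class HeapObject
{
    public bool Marked;
    public List<HeapObject> References = new List<HeapObject>();
}

static class ToyCollector
{
    // Mark phase: visit everything reachable from the roots.
    static void Mark(HeapObject obj)
    {
        if (obj == null || obj.Marked)
            return;
        obj.Marked = true;
        foreach (HeapObject child in obj.References)
            Mark(child);
    }

    // Compact phase: keep the survivors in their original order
    // ("slide them up"), drop everything that was never marked.
    public static List<HeapObject> Collect(List<HeapObject> heap, List<HeapObject> roots)
    {
        foreach (HeapObject root in roots)
            Mark(root);

        List<HeapObject> compacted = new List<HeapObject>();
        foreach (HeapObject obj in heap)
        {
            if (obj.Marked)
            {
                obj.Marked = false; // reset for the next collection
                compacted.Add(obj);
            }
        }
        return compacted;
    }
}
```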

Partial Collections

The full garbage collection is simply too expensive to do every time, so now it's appropriate to discuss how having generations in the collection helps us out.

Suppose that in all of the time we were running since the last collection we didn't write on any of the older objects at all, only newly allocated, generation zero (gen0), objects have been written to. Instead of our usual full collect we can just assume that all of the older objects (gen1, gen2) are still live—or at least enough of them are alive that it isn't worth looking at those objects. Furthermore, since none of them were written there are no pointers from the older objects to the newer objects. So what we can do is look at all the roots like usual, and if any roots point to old objects just ignore those ones. For other roots (those pointing into gen0) we proceed as usual, following all the pointers. Whenever we find an internal pointer that goes back into the older objects, we ignore it.

When that process is done we will have visited every live object in gen0 without having visited any objects from the older generations. The gen0 objects can then be condemned as usual and we slide up just that region of memory, leaving the older objects undisturbed.

Normally most of the dead space is likely to be in younger objects where there is a great deal of churn. Many classes create temporary objects for their return values, temporary strings, and assorted other utility classes like enumerators and whatnot. Looking at just gen0 gives us an easy way to get back most of the dead space by looking at only very few of the objects.
Unfortunately, we are rarely that lucky: at least some older objects are bound to change so that they point to new objects, and when that happens it is not sufficient to simply ignore them.

Making Generations Work with Write Barriers

To make the algorithm above actually work, we must know which older objects have been modified. To remember the location of the dirty objects, we use a data structure called the card table, and to maintain it the managed code compiler generates so-called write barriers. These two notions are central to the success of generation-based garbage collection.

The card table can be implemented in a variety of ways, but the easiest way to think of it is as an array of bits. Each bit in the card table represents a range of memory on the heap—let's say 128 bytes. Every time a program writes an object into some address, the write barrier code must compute which 128-byte chunk was written and then set the corresponding bit in the card table.
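
As a rough sketch of the arithmetic involved (the names and sizes here are illustrative; the real write barrier is code emitted by the JIT compiler that works on raw addresses), the card index is simply the written address divided by 128, and the barrier sets the matching bit:

```csharp
// Toy card table: mark the 128-byte chunk containing 'address' as dirty.
static class CardTable
{
    const int ChunkSize = 128;               // bytes covered by one card bit
    static uint[] cards = new uint[1 << 16]; // one bit per 128-byte chunk

    public static void WriteBarrier(long address)
    {
        long card = address / ChunkSize;            // which chunk was written
        cards[card >> 5] |= 1u << (int)(card & 31); // set the corresponding bit
    }

    public static bool IsDirty(long address)
    {
        long card = address / ChunkSize;
        return (cards[card >> 5] & (1u << (int)(card & 31))) != 0;
    }
}
```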

With this mechanism in place, we can now revisit the collection algorithm. If we are doing a gen0 garbage collection, we can use the algorithm as discussed above, ignoring any pointers to older generations, but once we have done that we must then also find every object pointer in every object that lies on a chunk that was marked as modified in the card table. We must treat those just like roots. If we consider those pointers as well, then we will correctly collect just the gen0 objects.

This approach wouldn't help at all if the card table was always full, but in practice comparatively few of the pointers from the older generations actually get modified, so there is a substantial savings from this approach.

Improving Garbage Collector Performance


There are two radically different approaches for improving overall garbage collector performance. You can either reduce the number of collections, or you can reduce the collection latency. As long as you make one of these values nice and small, it doesn't matter how big the other is: either one alone is enough to achieve good performance.

Reducing Garbage Collection Frequency

How often the garbage collector runs is directly proportional to how often you allocate memory. If you never allocate anything, it will never run. To reduce the frequency of garbage collections, you should allocate everything you are ever going to need up front while loading your levels, and avoid allocations during gameplay.

There are several aspects to avoiding memory allocation:
  • Don't allocate memory: Do not call “new” on reference types. It is ok to new value types such as Matrix, Vector3, and Color, however. Any time you find yourself wanting to new a reference type, use an object pool to reuse existing instances instead (there is a pool sketch after this list).
  • Don't use classes that allocate on your behalf: When you add data to a collection class such as List<T> or Dictionary<K,V>, it may have to allocate memory to expand the collection. You can avoid that by using the collection constructor overloads that take an explicit capacity parameter; specify a capacity large enough for the maximum number of objects you intend to store. Also beware of string formatting: it is hard to manipulate strings in .NET without causing allocations, so use StringBuilder instead.
  • Don't make the CLR runtime allocate: The CLR runtime allocates memory when boxing occurs. Avoid this like the plague! Boxing can happen for many reasons, some obvious, others less so:
    • If you assign a value type to an object variable, it gets boxed.
    • If you store value types in one of the old non-generic collection classes, they will be boxed.
    • Accessing value types via an interface causes them to be boxed.
    • If you use an enum type as a dictionary key, internal dictionary operations will cause boxing. You can avoid this by using integer keys, and casting your enum values to ints before adding them to the dictionary.
  • Don't make the C# compiler allocate: It can be tricky to use delegates (especially delegates which form lexical closures) without causing the compiler to allocate memory. This is a whole topic in itself, but if in doubt you should avoid using delegates or events. Generator methods will always require the compiler to allocate memory. The foreach statement can allocate if used carelessly. But note, this does not mean you should avoid using foreach! It is often the fastest way to iterate over a collection. Learn the rules and use it appropriately.
  • Everything that does not allocate is ok: Disciples of the frequency path are allowed to have large and complex data structures. They can happily allocate hundreds of thousands of objects while their game is loading, filling the heap with a crazy maze of cross-linked object references. As long as they allocate nothing after loading has completed, the garbage collector will never run, so it makes absolutely no difference how large or complex their heap may be.
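
Here is the pool sketch mentioned above. The Particle type is just a placeholder (it is not engine API): every instance is allocated once while loading, and during gameplay you hand out and recycle instances instead of calling new. The same idea applies to collections and strings: new List<Particle>(capacity) preallocates the backing array up front, and a single reused StringBuilder avoids per-frame string garbage.

```csharp
// Minimal object pool sketch: allocate up front, reuse during gameplay.
class Particle
{
    public float X, Y, Life;
    public bool Active;
}

class ParticlePool
{
    private readonly Particle[] particles;

    public ParticlePool(int capacity)
    {
        // All allocations happen here, while the level is loading.
        particles = new Particle[capacity];
        for (int i = 0; i < capacity; i++)
            particles[i] = new Particle();
    }

    // During gameplay: hand out an inactive instance instead of calling new.
    public Particle Spawn()
    {
        for (int i = 0; i < particles.Length; i++)
        {
            if (!particles[i].Active)
            {
                particles[i].Active = true;
                particles[i].Life = 1.0f;
                return particles[i];
            }
        }
        return null; // pool exhausted; size it for the worst case instead
    }

    public void Free(Particle particle)
    {
        particle.Active = false;
    }
}
```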

Reducing Garbage Collection Latency

How long the garbage collector takes to run is directly proportional to how complex your heap is. If the heap is empty, it will have no work to do. Note that I said "how complex your heap is" and not "how large". Heap complexity is a combination of the number of live objects and the number of object references. It makes no difference how many bytes each object happens to take up: it is the total number of objects (because the garbage collector must examine each one) and references to objects (because the garbage collector must follow each reference to see what it is pointing to) that matter.

When measuring heap complexity, an array of 100,000 integers is very cheap. Even though that is a lot of memory, the garbage collector only has to examine the object once, and does not need to bother looking inside it. 100,000 arrays each containing a single integer would be far more expensive, because the garbage collector would have more objects to examine.

A single array of 100,000 object references is also more expensive, because although the array itself is only a single object, now the garbage collector must also scan the contents of the array to see if any of the objects inside it need to be kept alive. Even if the entire array contains nothing but null values, the garbage collector must still examine each element to make sure.
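
In code, the three cases look like this (purely to illustrate how objects and references are counted; the variable names mean nothing):

```csharp
class HeapComplexityExamples
{
    static void Examples()
    {
        // One object: the garbage collector examines the array once
        // and never looks inside it.
        int[] cheap = new int[100000];

        // 100,001 objects: the outer array plus one tiny array per element.
        int[][] expensive = new int[100000][];
        for (int i = 0; i < expensive.Length; i++)
            expensive[i] = new int[1];

        // One object, but 100,000 references the collector must scan,
        // even when every element is null.
        object[] alsoExpensive = new object[100000];
    }
}
```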

Here are some things that can reduce heap complexity:
  • A few large objects are better than many small ones.
  • Value types are better than reference types (within reason, anyway).
  • The fewer object references, the better.
  • Arrays of value types are great!
  • Consider replacing object references with integer handles. Instead of storing a direct reference to the ship that created each bullet, you can store "I was created by ship #23" as an integer index (see the sketch after this list).
  • Prefer T[] or List<T> to LinkedList<T> or Dictionary<K,V>.
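
Here is the integer handle sketch mentioned above (Ship, Bullet, and World are made-up types, not engine classes): bullets are plain value types stored in an array, and each one remembers only the index of the ship that fired it, so it adds no object references for the collector to follow.

```csharp
// Integer handles instead of object references.
struct Bullet
{
    public float X, Y;
    public int OwnerShipIndex; // "I was created by ship #23"
}

class Ship
{
    public float X, Y;
}

class World
{
    public Ship[] Ships = new Ship[32];
    public Bullet[] Bullets = new Bullet[1024]; // an array of value types: great!

    public Ship GetOwner(ref Bullet bullet)
    {
        return Ships[bullet.OwnerShipIndex];
    }
}
```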

Disciples of the latency path are free to allocate memory while their game is running. They are allowed to call new, to cause boxing, and to use anonymous delegates and generator methods, because as long as their heap stays simple, the resulting collections will finish quickly.

Compromise Will Get You Nowhere

You can achieve good performance by avoiding allocations, so the garbage collector will never run. This works regardless of how complex your heap may be. You can also achieve good performance by keeping your heap nice and simple, so the garbage collector will finish quickly. This works regardless of whether you allocate memory during gameplay. But you cannot achieve good performance by mixing these two approaches! There is little to be gained by allocating only a medium amount of memory and having only a medium complex heap. That will produce a game which has a medium size glitch every medium number of seconds.

Also, don’t think that calling GC.Collect at carefully chosen times will do any good; such calls are almost always wrong. Explicitly forcing garbage collection is a recipe for confusing the collector and hurting your performance.

Events and Garbage


If your game logic needs events, you have to avoid creating EventArgs instances. To do that you should define custom events that use value types. This goes against the accepted coding guidelines for events, but the garbage problem is more important, believe me.

In the Event Naming Guidelines article the Microsoft team gives us some good rules to use with events. One rule says:

“Specify two parameters named sender and e. The sender parameter represents the object that raised the event. The sender parameter is always of type object, even if it is possible to use a more specific type. The state associated with the event is encapsulated in an instance of an event class named e. Use an appropriate and specific event class for the e parameter type.”

The problem is that creating an event argument produces a reference type and thus garbage. This could be avoided by creating a single instance of the event argument and reusing it; however, if an event handler stores that instance and comes back to look at it later, it may already have been reset for another event. Using value types avoids this and also improves access time and storage space. It is not the best solution in terms of readability and reuse, but it is good for performance.
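
A possible shape for this, as a sketch only (the Player class, HealthChangedData struct, and HealthChangedHandler delegate are invented names, not engine API): the event data lives in a struct and a custom delegate replaces EventHandler<T>, so raising the event allocates nothing.

```csharp
// Event data as a value type: nothing is put on the heap when the event is raised.
struct HealthChangedData
{
    public int OldHealth;
    public int NewHealth;
}

// Custom delegate so we are not forced into the (object sender, EventArgs e) pattern.
delegate void HealthChangedHandler(object sender, HealthChangedData data);

class Player
{
    public event HealthChangedHandler HealthChanged;

    private int health = 100;

    public void TakeDamage(int amount)
    {
        HealthChangedData data;
        data.OldHealth = health;
        health -= amount;
        data.NewHealth = health;

        // The struct is copied to each handler; no garbage is created.
        if (HealthChanged != null)
            HealthChanged(this, data);
    }
}
```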

Another consideration about events: In order to prevent resource leaks, you should unsubscribe from events before you dispose of a subscriber object. Until you unsubscribe from an event, the multicast delegate that underlies the event in the publishing object has a reference to the delegate that encapsulates the subscriber's event handler. As long as the publishing object holds that reference, garbage collection will not delete your subscriber object.
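
A minimal sketch of that pairing, reusing the hypothetical Player type from the previous sketch: subscribe when the object is created and unsubscribe when it is disposed, otherwise the publisher keeps the subscriber alive.

```csharp
using System;

class HudElement : IDisposable
{
    private readonly Player player;

    public HudElement(Player player)
    {
        this.player = player;
        player.HealthChanged += OnHealthChanged; // the publisher now references us
    }

    private void OnHealthChanged(object sender, HealthChangedData data)
    {
        // Update the HUD here.
    }

    public void Dispose()
    {
        // Without this, the Player keeps the HudElement reachable forever.
        player.HealthChanged -= OnHealthChanged;
    }
}
```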

Conclusions


XNA Final Engine follows the Reducing Garbage Collection Frequency approach, and you should too. The engine provides statistics that show how many garbage collections occur, and if you are unsure which part of your code is causing the collections, use the Visual Studio Performance Wizard (under the Analyze menu).

Be aware that on PC the application window code generates garbage associated with mouse movement. This cannot be avoided, but it does not happen on Xbox, so you can safely ignore it.
