Sgt. Conker We are "absolutely fine"

15 May 2011

High End Performance Optimizations on the Xbox 360 and Windows Phone 7

Ian Nicolades

Technical Director, UberGeekGames

For the Xbox or Windows Phone 7 programmer, performance is something that should always be kept in mind. For any moderately complex game, it can be very easy for framerate issues to crop up, and seeing as those pesky gamers insist on having a smooth playing experience, it can quickly become problematic.

Having had quite a bit of experience in this area with our last few games, I’ve written this article as a “missing manual” of sorts: the kind of cheat sheet that would have saved me more than one headache! :)

The target audience

I’m going to make a couple of reasonable assumptions to guide us. First, you either have a game that currently has performance issues, or you’re planning a game that will most likely run into performance issues later on in development. Second, you’ve already read through Shawn’s blog posts on performance. If you haven’t, go read them now: http://msdn.com/blogs/shawnhar I’ll wait.

Back? Excellent. You should already be familiar with the basic principles of profiling, how to use a profiler, rig up an FPS counter, and how to determine if you’re CPU or GPU bound. Without further ado, we will now dive into the deep end of the performance pool, so to speak. Each of the following sections will cover a technique that can help improve your game’s performance.

Quadtrees, Grid Registration, and spatial partitioning

Quadtrees are an oft-recommended improvement, and with good reason – they can vastly improve the performance of many different systems, the most popular being collision detection and view space culling.

For example, let’s say you’re building a shooter and will have a large number of bullets colliding with an even larger number of enemies. A naive approach would be to run your collision detection algorithm on every bullet against every enemy. The number of checks is the product of the two counts, so the cost climbs steeply as the number of enemies and bullets increases: 200 bullets against 500 enemies is already 100,000 checks per frame.

By implementing a quadtree, you break the game world up into smaller chunks. Enemies, bullets, and anything else that needs to be collided with will be added to whichever chunk contains them. This way, you only need to check collisions against the bullets and enemies in the same chunk.

Quadtrees partition a 2D world. The 3D equivalent, which splits each region into eight children instead of four, is typically known as an “octree”, and it can also be used for efficient view space culling.
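
As a rough illustration of the idea, here is a minimal sketch of grid registration, the simplest form of spatial partitioning. It uses a fixed grid of cells rather than a true quadtree. The names (SpatialGrid, Register, Query) are hypothetical, and the sketch assumes point-like objects identified by an int id.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical uniform grid: each cell holds the ids of the objects inside it.
class SpatialGrid
{
    readonly float cellSize;
    readonly int columns, rows;
    readonly List<int>[] cells;

    public SpatialGrid(float worldWidth, float worldHeight, float cellSize)
    {
        this.cellSize = cellSize;
        columns = (int)Math.Ceiling(worldWidth / cellSize);
        rows = (int)Math.Ceiling(worldHeight / cellSize);
        cells = new List<int>[columns * rows];
        for (int i = 0; i < cells.Length; i++)
            cells[i] = new List<int>();   // allocated once, reused every frame
    }

    // Map a position to a cell index, clamping to the grid edges.
    int CellIndex(float x, float y)
    {
        int cx = Math.Min(Math.Max((int)(x / cellSize), 0), columns - 1);
        int cy = Math.Min(Math.Max((int)(y / cellSize), 0), rows - 1);
        return cy * columns + cx;
    }

    // Empty every cell without releasing its backing storage.
    public void Clear()
    {
        for (int i = 0; i < cells.Length; i++)
            cells[i].Clear();
    }

    public void Register(int id, float x, float y)
    {
        cells[CellIndex(x, y)].Add(id);
    }

    // Ids registered in the same cell as the given position.
    public List<int> Query(float x, float y)
    {
        return cells[CellIndex(x, y)];
    }
}
```

Each frame you would call Clear(), register every enemy, then query once per bullet and run the real collision test only against the ids that come back. Objects that straddle a cell edge are not handled here; a fuller version would register them in every cell they overlap or also check neighbouring cells.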

Garbage collection

The CLRs on the Xbox and WP7 have much slower garbage collectors than the one on the PC (soon this won’t be the case for WP7: http://blogs.msdn.com/b/abhinaba/archive/2011/04/13/generational-gc-in-windows-phone-mango.aspx – huzzah!). If your game has frequent stuttering or pauses, then you most likely have a garbage collection issue. This has been covered in Shawn’s blog, but here are a few additional tips I’ve picked up:

-          Beware of boxing. This can come from odd places, such as maintaining a list of interfaces; the following code will generate garbage:

interface ISomethingOrOther
{
    void DoSomething();
}

struct Something : ISomethingOrOther
{
    public void DoSomething()
    {
    }
}

List<ISomethingOrOther> ListOfInterfaces = new List<ISomethingOrOther>();

ListOfInterfaces.Add(new Something()); // boxing! the struct is copied onto the heap

-          Boxing can also occur when you use an enum as a key in a Dictionary<Enum, something>. I usually use a Dictionary<int, something> and cast to int when indexing into it like so:

enum YourEnum
{
    A, B, C
}

//adding elements will cause boxing.
Dictionary<YourEnum, string> dictionaryWithBoxing = new Dictionary<YourEnum, string>()
{
    { YourEnum.A, "a" },
    { YourEnum.B, "b" },
    { YourEnum.C, "c" },
};

//no boxing!
Dictionary<int, string> yourDictionaryWithoutBoxing = new Dictionary<int, string>()
{
    { (int)YourEnum.A, "a" },
    { (int)YourEnum.B, "b" },
    { (int)YourEnum.C, "c" },
};

Nick has more info in this blog post: http://blog.nickgravelyn.com/2009/04/net-misconceptions-part-1/
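
If you would rather keep the enum as the key type, a commonly cited alternative is to hand the dictionary your own comparer, so it never falls back on the default comparison that boxes. The sketch below is an assumption-laden illustration rather than anything from Nick’s post: YourEnumComparer and EnumDictionaryExample are hypothetical names, and YourEnum is repeated only so the snippet stands alone.

```csharp
using System.Collections.Generic;

enum YourEnum { A, B, C }

// Hypothetical comparer: works on the enum values directly, so no boxing is needed.
class YourEnumComparer : IEqualityComparer<YourEnum>
{
    public bool Equals(YourEnum x, YourEnum y) { return x == y; }
    public int GetHashCode(YourEnum value) { return (int)value; }
}

static class EnumDictionaryExample
{
    public static Dictionary<YourEnum, string> Create()
    {
        // The comparer is passed to the constructor; lookups then go through it.
        return new Dictionary<YourEnum, string>(new YourEnumComparer())
        {
            { YourEnum.A, "a" },
            { YourEnum.B, "b" },
            { YourEnum.C, "c" },
        };
    }
}
```

Whether this beats the cast-to-int approach on a given device is something to measure rather than assume; the main benefit is that call sites keep their enum types.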

-          Make liberal use of pooling to avoid creating new objects at runtime. A simple template that I use:


class SomeObject
{
    //stuff
    public void Spawn(Vector2 position)
    {
    }
}

class SomeObjectPool
{
    public static bool[] isAlive;
    public static SomeObject[] pool;

    static SomeObjectPool()
    {
        //allocate everything up front so nothing is created during gameplay
        pool = new SomeObject[100];
        isAlive = new bool[100];
        for(int i=0;i<pool.Length;i++)
        {
            isAlive[i] = false;
            pool[i] = new SomeObject();
        }
    }

    public static void Spawn(Vector2 position)
    {
        //reuse the first free slot; if the pool is full, nothing is spawned
        for(int i=0;i<pool.Length;i++)
        {
            if(!isAlive[i])
            {
                isAlive[i] = true;
                pool[i].Spawn(position);
                break;
            }
        }
    }
}

Nick also has a template for this: http://forums.create.msdn.com/forums/t/3375.aspx

-          Pooling classes is almost always a better solution than switching to structs, unless you know exactly why you need them. Classes behave very differently from structs, sometimes in unintuitive ways if you are not familiar with the difference between value and reference types. This page contains a good primer on the differences: http://www.albahari.com/valuevsreftypes.aspx
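
To make the value/reference distinction concrete, here is a small sketch with hypothetical types (PointStruct, PointClass). It only demonstrates assignment semantics and is not tied to any XNA type.

```csharp
struct PointStruct { public int X; }
class PointClass { public int X; }

static class ValueVsReference
{
    // Returns the X of each original after its "copy" has been modified.
    public static int[] Demo()
    {
        PointStruct s1 = new PointStruct();
        PointStruct s2 = s1;   // copies the whole value
        s2.X = 10;             // s1.X is unaffected

        PointClass c1 = new PointClass();
        PointClass c2 = c1;    // copies only the reference
        c2.X = 10;             // c1.X changes too: both names refer to one object

        return new int[] { s1.X, c1.X };
    }
}
```

The same copying happens whenever a struct is passed to a method, returned from a property, or read out of a List<T>, which is where most of the surprises come from.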

Draw that 2D content most efficiently with Spritesheets

If you draw sprites individually, loading them one at a time and drawing them, you will prevent the GPU from batching its draw calls in the most efficient manner. By using spritesheets, the GPU will not need to switch textures in between draw calls and can batch your drawing more efficiently. This post by Shawn succinctly explains the general idea: http://forums.create.msdn.com/forums/p/24254/131437.aspx#131437

The spritesheet sample is here: http://create.msdn.com/en-US/education/catalog/sample/sprite_sheet

And I’ve personally been using Nick’s SpriteSheetPacker tool to automate the packaging process, with great success: http://spritesheetpacker.codeplex.com/

Batch your SpriteBatches

Efficient rendering is all about batching. Remember the elf in a box model: http://blogs.msdn.com/b/shawnhar/archive/2008/03/31/an-elf-in-a-box.aspx

If you want to maximize GPU performance, use as few SpriteBatch batches as possible. There are, of course, many cases where you just have to end and begin a new SpriteBatch in order to switch renderstates or blend states, but for all other cases – use a single spritebatch!

Multithreading

On the Xbox 360, you have three separate processor cores and six hardware threads, four of which are available for you to use (one on core one, one on core two, and the two on core three; see this MSDN article http://msdn.microsoft.com/en-us/library/system.threading.thread.setprocessoraffinity.aspx).

If you are CPU bound, you might gain some performance by threading. Common tasks that are likely candidates for threading:

-          Particles; if your particles are updated on the CPU and take a significant amount of processing time, they will likely be one of the easiest things to multithread.

-          Collision detection; while this can be tricky to synchronize depending on what you’re doing, it can work well.

-          The sky is the limit when it comes to what you can offload. With enough work you could theoretically offload most any system to another thread. The key is making sure that the performance gains will be high enough to warrant the overhead of managing the thread, and how easy it will be to synchronize data between threads.
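
As one possible shape for the particle case, here is a minimal sketch of a worker thread that advances particle positions while the main thread does other work. Everything in it (ParticleWorker, BeginUpdate, WaitForUpdate, the flat float arrays) is a hypothetical design, and it uses only standard System.Threading types so it also runs on a PC.

```csharp
using System.Threading;

// Hypothetical worker that updates particle positions on a second thread.
class ParticleWorker
{
    readonly float[] positions;
    readonly float[] velocities;
    readonly AutoResetEvent startFrame = new AutoResetEvent(false);
    readonly AutoResetEvent frameDone = new AutoResetEvent(false);
    readonly Thread thread;
    volatile bool running = true;
    float elapsed;

    public ParticleWorker(float[] positions, float[] velocities)
    {
        this.positions = positions;
        this.velocities = velocities;
        thread = new Thread(Run);
        thread.IsBackground = true;
        thread.Start();
    }

    void Run()
    {
        // On the Xbox 360 the thread would first choose a hardware thread via
        // SetProcessorAffinity (see the MSDN link above); that call is
        // Xbox-only, so it is left out of this sketch.
        while (true)
        {
            startFrame.WaitOne();          // sleep until the main thread asks for a frame
            if (!running) break;
            for (int i = 0; i < positions.Length; i++)
                positions[i] += velocities[i] * elapsed;
            frameDone.Set();               // tell the main thread the data is ready
        }
    }

    // Called by the main thread at the start of its update.
    public void BeginUpdate(float elapsedSeconds)
    {
        elapsed = elapsedSeconds;
        startFrame.Set();
    }

    // Called by the main thread before it reads the positions.
    public void WaitForUpdate()
    {
        frameDone.WaitOne();
    }

    public void Stop()
    {
        running = false;
        startFrame.Set();
        thread.Join();
    }
}
```

The main thread must not touch the arrays between BeginUpdate and WaitForUpdate; that window is where the rest of your update logic would run. If the worker finishes long before the main thread, or the other way round, the gain shrinks accordingly, so profile both sides.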

On WP7, you are limited to a single-core 1GHz processor. Multithreading is unlikely to yield any wins here, as it will just add overhead.

Manually inlining high frequency code

You may get to the point where you still have performance trouble but there are no more big wins left, or all the low hanging fruit has been picked, so to speak. At this point, manually inlining methods that are called at a high frequency could be the next best step. Manually inlining is nothing more than avoiding excess method calls when possible. Take the following piece of code:


foreach(Particle particle in ParticleList)
{
    particle.Update();
}

class Particle
{
    public void Update()
    {
        Position += Velocity;
    }
}

There is a small amount of overhead when calling a method. This alternate version will be faster:


void Update()
{
    foreach(Particle particle in ParticleList)
    {
        particle.Position += particle.Velocity; //slightly faster
    }
}

The gains here are relatively small, but can make a difference in systems that process lots of values, such as particle engines.

Cache values wherever possible

Optimization, at its core, is just figuring out how to get the same or similar results while doing less work. Caching values is a perfect example of this. When drawing a texture, for example, how many times do you do this?


void Draw()
{
    Rectangle drawingRect = new Rectangle(foo, bar, foobar, barfoo);
    spriteBatch.Draw(Texture, drawingRect, Color);
}

What’s the point of recreating the rectangle each frame if the value doesn’t change? Define it once, outside of the method, like this:

Rectangle drawingRect = new Rectangle(foo, bar, foobar, barfoo);
void Draw()
{
    spriteBatch.Draw(Texture, drawingRect, Color);
}

Properties

Accessing a property is roughly equivalent to a JIT-compiled method call, and we’ve already established that method calls carry a small but real overhead. Properties are therefore best avoided in performance-critical code unless necessary.

“But, kind sir, all the C# style guides tell me to use properties! Why would they recommend such a thing if it’s so bad?”

Simply put, most guides I’ve seen that recommend this assume that you are going for maximum readability and maintainability, that performance isn’t of much concern (e.g., a WinForms tool), or both. And on the PC, you’re almost certainly not going to run into issues stemming from properties, as the latest processors can chew through them with ease. On hardware such as the WP7 or the Xbox 360, every cycle can count.

In other words, if you want to look at pretty code all day, you should stay far away from game development. :)
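
For the sake of a concrete picture, here is a hypothetical pair of particle classes that differ only in how X is exposed. The names are made up for illustration, and how much the difference costs depends on whether the JIT on your target inlines the accessor, so treat it as something to profile.

```csharp
// Hypothetical particle types used only to contrast the two access styles.
class ParticleWithProperty
{
    float x;
    public float X
    {
        get { return x; }
        set { x = value; }
    }
}

class ParticleWithField
{
    public float X;
}

static class AccessExample
{
    public static void Move(ParticleWithProperty[] particles, float dx)
    {
        for (int i = 0; i < particles.Length; i++)
        {
            ParticleWithProperty p = particles[i];
            p.X += dx;   // compiles to a get_X call followed by a set_X call
        }
    }

    public static void Move(ParticleWithField[] particles, float dx)
    {
        for (int i = 0; i < particles.Length; i++)
        {
            ParticleWithField p = particles[i];
            p.X += dx;   // a plain load, add, and store
        }
    }
}
```

Both versions produce the same result; the second simply gives the runtime less to do per particle.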

Array indexing

Another subtle improvement in tight loops is how you index into an array’s elements. An array index costs about the same as a JIT method call, plus some overhead for bounds checking. Take this code for example:


SomeObject[] objectArray;

void DoSomeStuff()
{
    for(int i=0;i<objectArray.Length;i++)
    {
        objectArray[i].Position += something;
        objectArray[i].Velocity *= whatever;
        objectArray[i].Counter++;
    }
}

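You can cache the element in a local variable to avoid this overhead. This works as written because SomeObject is a class, so obj refers to the same object as the array slot; with a struct you would be modifying a copy: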


SomeObject[] objectArray;

void DoSomeStuff()
{
    for(int i=0;i<objectArray.Length;i++)
    {
        SomeObject obj = objectArray[i];
        obj.Position += something;
        obj.Velocity *= whatever;
        obj.Counter++;
    }
}

Dictionaries vs Arrays

Dictionaries are great. They are an invaluable coding tool that makes efficiently organizing data easy. However, it is quite possible for them to become a performance bottleneck, as they are about twice as slow as an array index, and can be even slower if you’re unnecessarily indexing into them more than once. Here’s some example code to explain this:


Dictionary<int, Texture2D> textureDictionary;
…

//this is the slowest way to use a Dictionary, as we are effectively doing two lookups for the same object. Slowness!

if(textureDictionary.ContainsKey(textureKey))
{
    Texture2D texture = textureDictionary[textureKey];
    //do something with texture
}

//this is faster as we are only doing one lookup.
Texture2D texture = null;
if(textureDictionary.TryGetValue(textureKey, out texture))
{
    //do something with texture here
}
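
When the keys are a small, fixed set, you can sometimes drop the dictionary altogether and index an array directly. The following is a minimal sketch under that assumption: TextureId and TextureTable are hypothetical names, and string stands in for Texture2D so the snippet compiles without XNA.

```csharp
// Hypothetical ids; Count is a sentinel giving the number of real entries.
enum TextureId { Player, Enemy, Bullet, Count }

static class TextureTable
{
    // One slot per id, allocated once.
    static readonly string[] textures = new string[(int)TextureId.Count];

    public static void Load()
    {
        textures[(int)TextureId.Player] = "player.png";
        textures[(int)TextureId.Enemy] = "enemy.png";
        textures[(int)TextureId.Bullet] = "bullet.png";
    }

    public static string Get(TextureId id)
    {
        return textures[(int)id];   // a single array index, no hashing
    }
}
```

This only works when the ids are dense and known up front; for sparse or dynamic keys the dictionary with TryGetValue remains the better fit.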

Vector operations

When possible, use the Vector2.Whatever() methods that take their arguments by reference and return their result through an out parameter. This will be faster than using the normal C# addition/subtraction/multiplication/division operators. Take this example code:


//Easy to read, but slow!
someVector += anotherVector * someFloat;

//More verbose, yet faster
Vector2 tmp = new Vector2();
Vector2.Multiply(ref anotherVector, someFloat, out tmp);
Vector2.Add(ref someVector, ref tmp, out someVector);

Soon, this will be even faster on WP7: http://blogs.msdn.com/b/abhinaba/archive/2011/04/10/simd-support-in-netcf.aspx

About Captain ZSquare

Microsoft XNA MVP