Article : Multi-threading your XNA
3. Multi-threading Update/Draw
The most popular task that we hear about related to multi-threading in games is the separation of updating and rendering of the game world on two different threads. The reason for this is obvious. When using a single thread, after the state of the world is updated, we have to wait until all drawing is done in order to update again. This seems like a waste, because the update computations don't need any information from the rendering computations. With multi-threading, while the system is busy drawing the current scene, we can use the other available processor cores to compute the next state of the world. This way, the update computations no longer have to wait, and by executing them in parallel with the rendering code, we reduce the overall time of a frame. But if this sounds so simple in theory, why are there so few examples of how to actually do this?
3.1. Double Buffering
The main problem with separating Update/Draw is the fact that the rendering process needs the data computed by the update process. Moreover, we need to make sure that the data is correct and consistent for all entities that are drawn. To do this, we have to use synchronization primitives. But we also want to avoid waiting inside locks until each piece of data is processed, or else all the waiting will reduce our performance instead of increasing it.
The solution to this is the concept of double buffering. The update thread computes the world state in an area of memory (a buffer) containing information about the game world. When it is done, the rendering thread begins rendering the world using information from this buffer, and at the same time, the update thread begins computing the next state of the world in another buffer. When both threads are done, they simply switch the buffers they use. The illustration below should help clarify this.

At frame 1, the rendering thread draws the world using the state stored in Buffer 0. Meanwhile, the update thread computes the next state of the world, and writes it in Buffer 1. When frame 2 starts, the rendering thread starts drawing the world using the new state stored in Buffer 1, and the update thread can start computing the next state, in Buffer 0. At the start of the 3rd frame, they switch again, and so on. So at each step, the rendering thread uses the state of the world computed by the update thread in the previous frame, while the update thread computes a new state and stores it in the buffer which is not currently used for rendering.
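The frame-by-frame ownership described above can be sketched as a small piece of index bookkeeping. This is only an illustration: the class name BufferIndices and its members are hypothetical, and no threading is involved yet.

```csharp
// Illustration only: tracks which of the two buffers each thread owns in a frame.
// BufferIndices, RenderIndex, UpdateIndex and Swap are hypothetical names.
public sealed class BufferIndices
{
    public int RenderIndex { get; private set; }
    public int UpdateIndex { get; private set; }

    public BufferIndices()
    {
        // Frame 1 in the text: the render thread reads Buffer 0,
        // the update thread writes Buffer 1.
        RenderIndex = 0;
        UpdateIndex = 1;
    }

    // Called between frames, while neither thread is touching the buffers.
    public void Swap()
    {
        int previousRender = RenderIndex;
        RenderIndex = UpdateIndex;
        UpdateIndex = previousRender;
    }
}
```

Calling Swap() once per frame makes the render index alternate between 0 and 1, which is the alternation the paragraph above walks through for frames 1, 2 and 3.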
While the main idea seems simple, this is where things start to get complicated. Let's analyze this. What is the data contained in the buffers, and how do we organize it? The most natural way of thinking, especially if you've done Object Oriented Programming before, is to say: "So this is an entity in our game. It needs some data for physics, like acceleration, velocity, collision primitives, positions, rotations. It also needs some data for animation, like bones, movement constraints, etc. To draw the object, we use the same positions and rotations which also describe the physics of the object. And not to forget the game-specific data, like health points, A.I. scripts, etc.". And then you go on and create a GameEntity class which describes your object.
Now where do we keep all these entities? The worst thing you can do is consider the buffers as the main structure for holding game data. When the update thread needs to compute the next state of the world, it needs to know the current state. If the state of the game is only held in these buffers, then the current state is, at that moment, in the buffer that the rendering thread uses. The buffer we currently hold is outdated by two frames.
So the next decision we can make is take all the game data, and store it some place in memory. Then, during the update function, a new state is computed for this data, and is then written to the buffer we have access to. So the update thread always has the most current state of the world.
Some of you may have noticed one thing we can do to improve performance. Not all objects change their state each frame, so why write them all in the buffer? Why not simply update the buffers only at those positions corresponding to the objects that suffered a change? To understand why this is not a good solution, take a look at the following sequence of events. We assume the data is currently identical in both buffers, and represents the current state of the game. We are ready to start frame number k.

In frame k, the update thread sees that some of the objects (1 and 3) have changed their states. The changes are written to Buffer 0 (currently owned by the update thread), and then the buffers are swapped. In frame k+1, the update thread detects changes in the states of some objects (2 and 3), and writes the updates to Buffer 1, while the render thread draws the contents of Buffer 0. So far so good. Now, after the buffer pointers are swapped and frame k+2 starts, we have some problems. When the rendering thread draws object 1, the state of that object is not the most recent one. Its state is actually two frames old, and does not contain the changes made during frame k. The state of objects 2 and 3 is good, because the most recent changes to them were made while the update thread was controlling this buffer. You can easily see how, even if this frame we render the correct states for objects 2 and 3, the next frame, when the rendering thread reads from the other buffer, it will be reading some older states. So by trying to save time doing in-place updates, we introduced some complex bugs which might give us headaches. There's got to be a better solution, right?
Fortunately, there is. Next we will look at the solution proposed in a Gamefest presentation held by Ian Lewis, and then implement it using the XNA Framework.
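Before moving on, the failure of partial in-place updates can be replayed in a small single-threaded simulation of frames k and k+1, followed by the read in frame k+2. This is a sketch under simplifying assumptions: object states are reduced to integer version numbers, and the names PartialUpdateDemo and Run are hypothetical.

```csharp
// Replays the partial-update scenario: only changed slots are written
// to the buffer currently owned by the update thread.
public static class PartialUpdateDemo
{
    // world:    the authoritative state held by the update thread
    // rendered: the buffer the render thread would read in frame k+2
    public static void Run(out int[] world, out int[] rendered)
    {
        world = new int[4];                              // objects 0..3, all at version 0
        int[][] buffers = { new int[4], new int[4] };    // identical at the start
        int updateIndex = 0;

        // Frame k: objects 1 and 3 change; only those slots are written.
        world[1] = 1;
        world[3] = 1;
        buffers[updateIndex][1] = world[1];
        buffers[updateIndex][3] = world[3];
        updateIndex = 1 - updateIndex;                   // swap

        // Frame k+1: objects 2 and 3 change; the other buffer is written.
        world[2] = 2;
        world[3] = 2;
        buffers[updateIndex][2] = world[2];
        buffers[updateIndex][3] = world[3];
        updateIndex = 1 - updateIndex;                   // swap

        // Frame k+2: the render thread reads the buffer written in frame k+1.
        rendered = buffers[1 - updateIndex];
    }
}
```

After the call, world[1] is 1 but rendered[1] is still 0: the change made to object 1 in frame k never reached the buffer being drawn, while objects 2 and 3 look correct.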
3.2. Change Buffers
The basic idea is the following. Use different data structures for the update thread and the render thread. Usually, during rendering, you only need a subset of the data contained by a game entity: the model and textures, the World matrix, the animation bones, etc. So for each object, besides the main entity data, we could use a smaller structure which holds only the render data. This render data is a duplication of some fields in the normal entity data. The update thread only works with the entity data, while the render thread only works with the render data.
All there is left to do is keep these structures synchronized. Sounds familiar? This is what the double buffer was for. But now, instead of using the buffers to simply put all entity data in it, we will use it only to notify the render thread of changes in the object's state. The buffers will be used as a sort of "message" buffers, where each "message" describes what has changed about an object.

Take a look at the following sequence of images.

As you can see, in frame k, the update thread works on the objects, and changes the states of some objects (1 and 3). Because these objects have changed, the update thread enters notifications into the message buffer. These notifications have to contain enough information to reflect the new state of these objects. Then, the buffers are swapped. In frame k+1, the update thread clears its buffer, and then proceeds to update the game state. During this, it observes that the states of objects 2 and 3 have changed. It writes notifications about these changes in the buffer. Meanwhile, the rendering thread reads its buffer, where it sees the notifications about objects 1 and 3. It uses these notifications to update the render data of these objects, and then starts rendering the scene. When both are finished, we swap the buffers. This was the step where the in-place update method failed. But now, as the update thread performs its next update, the render thread looks at its buffer. It sees that the states of objects 2 and 3 have changed, and makes the appropriate changes in the render data of these objects. The data held by the rendering thread is consistent and correct, and the objects are drawn in their correct states. And this will continue to be the case in the next frame as well. So because each thread has its own copy of the data, and the buffers are only used to pass messages about changes to this data, everything stays consistent.
But what about memory? Doesn't this approach use much more memory than the previous ones? Actually, it does not. The game data used by the update thread is about the same size as one of the buffers used in the previous methods. The render data is much smaller than the game data, so in total, the game data plus the render data is significantly smaller than two buffers. There are also the buffers storing the changes, but these should be fairly small. So all in all, we use less memory. And because the amount of data written to and read from memory each frame is smaller, the overall process should be faster. The downside is the increased complexity of the operations used to transmit the updates.
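The same sequence of frames can be replayed with change messages instead of in-place writes. The sketch below is single-threaded and simplified: states are integer version numbers, and the Change structure, ChangeMessageDemo and Run are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;

// Replays frames k, k+1 and k+2 with message buffers: the update side records
// what changed, the render side applies the messages recorded one frame earlier.
public static class ChangeMessageDemo
{
    private struct Change
    {
        public int Id;      // which object changed
        public int State;   // its new state
    }

    public static void Run(out int[] gameData, out int[] renderData)
    {
        gameData = new int[4];      // owned by the update thread
        renderData = new int[4];    // owned by the render thread
        List<Change>[] buffers = { new List<Change>(), new List<Change>() };
        int updateIndex = 0;

        // Objects changed in frame k, k+1 and k+2 respectively.
        int[][] changedIds = { new[] { 1, 3 }, new[] { 2, 3 }, new int[0] };

        for (int frame = 0; frame < changedIds.Length; frame++)
        {
            // Update side: clear the owned buffer, then record this frame's changes.
            List<Change> updateBuffer = buffers[updateIndex];
            updateBuffer.Clear();
            foreach (int id in changedIds[frame])
            {
                gameData[id] = frame + 1;
                Change message;
                message.Id = id;
                message.State = gameData[id];
                updateBuffer.Add(message);
            }

            // Render side: apply the messages written during the previous frame.
            foreach (Change applied in buffers[1 - updateIndex])
            {
                renderData[applied.Id] = applied.State;
            }

            updateIndex = 1 - updateIndex;   // swap between frames
        }
    }
}
```

In a real game the two sides run concurrently, but since they never touch the same buffer within a frame, running them one after the other here gives the same result: after the third frame, renderData matches gameData for every object.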
3.3. Implementation
And now the fun part begins. It's going to take a while, and more theoretical explanations will be given, but we'll get there. If you are too eager to run or see the code, check the end of this chapter, where an example is provided. For those of you interested in bearing with me through my explanations, continue reading. First, here's some things that we need to take into consideration:
- The Game class does some GraphicsDevice handling behind the scenes for us, so if we want to avoid issues, or prefer not to rewrite those things ourselves, we will keep the rendering on the main Game thread (hardware thread 1 on the Xbox).
- This means that we will move the Update processing to another thread. For this we could use hardware thread 3, 4, or 5.
- Some data is needed by both threads, such as the current GameTime. We will get and store this value in a shared location before the threads start executing each frame, and provide it to them as needed. The same mechanic could be used if we find other data that needs to be identical for both threads.
3.3.1. Classes
Before we start coding, let's think about the classes we need and put them down as a class diagram. Obviously we will need a class for the game data of each entity. As discussed earlier, we will also need a class to hold the rendering data. The actual fields of these classes depend highly on the type of game you're making. To keep things organized, we will have two classes called UpdateManager and RenderManager, which will contain arrays of the two classes mentioned earlier. For this article, we will consider that an entity is identified by its position in these arrays. So the object at position 4 in the UpdateManager's list of game data corresponds to position 4 in the RenderManager's render data. This could be replaced with some sort of identification scheme, using globally unique identifiers for objects and hash tables or dictionaries for finding an object with a certain ID, but we use this method for simplicity and ease of indexing.
The change messages will be stored in structures which we will call ChangeMessage. We will discuss in detail the implementation of these structures a little bit later. Until then, we decide to store these messages in collections called ChangeBuffers. And because we want to use double buffering, we will define a class called DoubleBuffer, which will contain two ChangeBuffers, and give each of them to the Update or Render thread, as requested. Right now, a general diagram of this system looks like this:

Now we will start implementing these classes.
3.3.2. Threading and Synchronization
As you will see in the next few paragraphs, there's really not much threading code to write. The most important parts of a multi-threaded game are the planning and the data structures, plus a few carefully picked lines of code for synchronization. So we only need to: synchronize the update and render threads at the start and end of each frame, make sure that the correct buffer is accessed by each thread, and lock objects that may be accessed by both threads at the same time. Fortunately, because of how we set up the data structures, and how we time the access to these structures, we will only need a few synchronization instructions.
For good reasons (like debugging), I prefer to keep all threading and synchronization code in a single class, and this class will be the DoubleBuffer class.
There is a sequence of steps that we will do each frame:
- A new frame starts
- If there are any operations that we want to do while a single thread is active, we do them now
- We swap the buffers, to prepare the previous update buffer as the new render buffer
- We signal the update and render threads that we are ready to start processing a new frame
- The update and render threads request their current active buffer, which they will use this frame
- The update and render threads do their computations
- We synchronize the threads by waiting for both of them to finish
- The current frame is done
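The steps above can be sketched with two worker threads and a coordinating thread that signals them through AutoResetEvents. This is a minimal stand-alone illustration, not the DoubleBuffer class itself: FrameSequenceDemo and Run are hypothetical names, and the actual update and render work is replaced by simple counters.

```csharp
using System.Threading;

// Coordinates an "update" and a "render" worker for a fixed number of frames.
public static class FrameSequenceDemo
{
    public static void Run(int frameCount, out int updates, out int renders)
    {
        AutoResetEvent updateStart = new AutoResetEvent(false);
        AutoResetEvent renderStart = new AutoResetEvent(false);
        AutoResetEvent updateEnd = new AutoResetEvent(false);
        AutoResetEvent renderEnd = new AutoResetEvent(false);
        int updateCount = 0;
        int renderCount = 0;

        Thread updateThread = new Thread(() =>
        {
            for (int i = 0; i < frameCount; i++)
            {
                updateStart.WaitOne();   // sleep until the frame starts
                updateCount++;           // stand-in for computing the next state
                updateEnd.Set();         // report that the update is done
            }
        });
        Thread renderThread = new Thread(() =>
        {
            for (int i = 0; i < frameCount; i++)
            {
                renderStart.WaitOne();   // sleep until the frame starts
                renderCount++;           // stand-in for drawing the scene
                renderEnd.Set();         // report that rendering is done
            }
        });
        updateThread.Start();
        renderThread.Start();

        for (int frame = 0; frame < frameCount; frame++)
        {
            // Single-threaded section: buffer swapping would happen here.
            updateStart.Set();           // signal both workers to start
            renderStart.Set();
            updateEnd.WaitOne();         // wait for both workers to finish
            renderEnd.WaitOne();
        }

        updateThread.Join();
        renderThread.Join();
        updateStart.Close();
        renderStart.Close();
        updateEnd.Close();
        renderEnd.Close();

        updates = updateCount;
        renders = renderCount;
    }
}
```

Because the coordinator waits for both end events before signaling the next frame, each worker completes exactly one pass per frame, so Run(3, ...) reports three updates and three renders.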
The DoubleBuffer class needs to contain several fields. We need an array of two ChangeBuffers. We will use two integers that hold the index of the current buffer used for rendering, and the current buffer used for updating. We declare these as volatile, to make sure we always get the correct values, and caching doesn't play tricks on us. You read earlier about AutoResetEvents. We will use four AutoResetEvents to signal the beginning and wait for the end of the render thread and update thread. Lastly, we will add a field to store this frame's GameTime, which we might need in both threads. This field might be read at the same time by both threads, but it will always be written before the threads start their execution, so we don't need a lock, even for this field. But we do need to make it volatile, to make sure it is not cached.
    using System.Threading;
    using Microsoft.Xna.Framework;

    class DoubleBuffer
    {
        private ChangeBuffer[] buffers;
        private volatile int currentUpdateBuffer;
        private volatile int currentRenderBuffer;

        private AutoResetEvent renderFrameStart;
        private AutoResetEvent renderFrameEnd;
        private AutoResetEvent updateFrameStart;
        private AutoResetEvent updateFrameEnd;

        private volatile GameTime gameTime;

        //the constructor and methods of this class follow in the next listings
Next, we initialize these values in the constructor, write a function that resets all fields to their starting values, and a function that cleans up when we're done and releases system resources.
    public DoubleBuffer()
    {
        //create the buffers
        buffers = new ChangeBuffer[2];
        buffers[0] = new ChangeBuffer();
        buffers[1] = new ChangeBuffer();

        //create the WaitHandles
        renderFrameStart = new AutoResetEvent(false);
        renderFrameEnd = new AutoResetEvent(false);
        updateFrameStart = new AutoResetEvent(false);
        updateFrameEnd = new AutoResetEvent(false);

        //reset the values
        Reset();
    }

    public void Reset()
    {
        //reset the buffer indices
        currentUpdateBuffer = 0;
        currentRenderBuffer = 1;

        //set all events to non-signaled
        renderFrameStart.Reset();
        renderFrameEnd.Reset();
        updateFrameStart.Reset();
        updateFrameEnd.Reset();
    }

    public void CleanUp()
    {
        //release system resources
        renderFrameStart.Close();
        renderFrameEnd.Close();
        updateFrameStart.Close();
        updateFrameEnd.Close();
    }
The function that swaps the buffers only needs to switch the values stored in the currentUpdateBuffer and currentRenderBuffer variables. Since we will only call this function at times when execution is done on a single thread, before the signal that starts the threads is sent, we don't need to lock anything. And the fact that these fields are declared as volatile ensures we won't have any problems due to processor caching.
    private void SwapBuffers()
    {
        currentRenderBuffer = currentUpdateBuffer;
        currentUpdateBuffer = (currentUpdateBuffer + 1) % 2;
    }
Next, we write a function that represents the start of multi-threaded processing, and call it GlobalStartFrame(). This function receives as a parameter the current GameTime, so it can store its value and make it available for the other threads. We also need a function that waits for both render and update threads to finish, and only then returns to normal execution.
    public void GlobalStartFrame(GameTime gameTime)
    {
        this.gameTime = gameTime;
        SwapBuffers();

        //signal the render and update threads to start processing
        renderFrameStart.Set();
        updateFrameStart.Set();
    }

    public void GlobalSynchronize()
    {
        //wait until both threads signal that they are finished
        renderFrameEnd.WaitOne();
        updateFrameEnd.WaitOne();
    }
The last functions we need to add to this class are the functions that will be called by the render and update threads to get references to their current buffers, and the functions called by these threads when they are done with processing. When one of the threads calls one of the start methods, it begins by waiting on the corresponding WaitHandle until it is signaled by the GlobalStartFrame() function. After this signal is received, we know that the required values are initialized correctly, so we can pass them to the calling thread through the out parameters and then return, allowing the calling thread to resume execution. The ending functions simply set the updateFrameEnd and renderFrameEnd events, so the GlobalSynchronize() function can continue.
    public void StartUpdateProcessing(out ChangeBuffer updateBuffer, out GameTime gameTime)
    {
        //wait for start signal
        updateFrameStart.WaitOne();
        //get the update buffer
        updateBuffer = buffers[currentUpdateBuffer];
        //get the game time
        gameTime = this.gameTime;
    }

    public void StartRenderProcessing(out ChangeBuffer renderBuffer, out GameTime gameTime)
    {
        //wait for start signal
        renderFrameStart.WaitOne();
        //get the render buffer
        renderBuffer = buffers[currentRenderBuffer];
        //get the game time
        gameTime = this.gameTime;
    }

    public void SubmitUpdate()
    {
        //update is done
        updateFrameEnd.Set();
    }

    public void SubmitRender()
    {
        //render is done
        renderFrameEnd.Set();
    }
Now all threading synchronization and primitives that we use are encapsulated inside this class. The sequence we mentioned above now becomes something similar to the following:
- At the beginning of the game, or after finishing the last frame, the update thread and render thread call StartUpdateProcessing() and StartRenderProcessing() respectively, declaring that they are ready to start and are waiting for their data. Because the updateFrameStart and renderFrameStart events are not set, the threads go to sleep waiting for these events.
- Somewhere in our Game class's code, we call the GlobalStartFrame() function, which swaps the buffers and stores the gameTime, preparing all the data that will be given to the render and update threads. After this, it sets the events renderFrameStart and updateFrameStart.
- At this moment, the render and update threads, which are waiting in their start functions, wake up and return from those functions. The update thread starts computing a new game state, while the render thread starts drawing the scene.
- Back in our Game class, after we have called the GlobalStartFrame() function, we call the GlobalSynchronize() function, which waits for the render and update threads to finish by watching the renderFrameEnd and updateFrameEnd events.
- When the update and render threads are done, they call SubmitUpdate() and SubmitRender(), respectively.
- At this moment, the GlobalSynchronize() function wakes up and returns, so the Game class can leave the XNA Framework to do whatever it needs to do before starting a new frame.
At first look, this seems fine, since the buffer swapping is done when a single thread is running, and the update and render threads never work on the same buffer. It almost seems too nice to be true. And indeed it is. While the threads never work on the same buffers, some problems can still appear, because of caching. When the update thread finishes its work, the data does not always go directly into the main memory. The processor caches data, and delays writing it to the main memory in order to improve performance. But this means that some data may not reach the main memory before the render thread tries to read it, so the render thread will get old data. The same thing happens on the render thread also.
The processor also caches the data it reads, so when the thread tries to read from the buffer, the contents it sees might not be the same as the contents of the main memory. So, again, we could get old data. The solution is to force the cache of each core to be flushed to the main memory. This is a necessary operation to ensure that our data is always correct. To do this, we use the function Thread.MemoryBarrier(). In theory, it should be sufficient to add this on the update thread, right after it is done computing the new state, and on the render thread, just before it begins consuming the message buffer. To be extra safe, we will add it on both the update and render threads, at the beginning and end of the computations. As a side note, using a lock over the whole update and render code would automatically do this for us, because in its implementation, lock takes care of cache coherency. However, it seems a little odd to use a lock when the buffer data will never be accessed at the same time. So Thread.MemoryBarrier() will suffice. The last four functions we talked about are modified as shown below:
    public void StartUpdateProcessing(out ChangeBuffer updateBuffer, out GameTime gameTime)
    {
        //wait for start signal
        updateFrameStart.WaitOne();
        //ensure cache coherency
        Thread.MemoryBarrier();

        //get the update buffer
        updateBuffer = buffers[currentUpdateBuffer];
        //get the game time
        gameTime = this.gameTime;
    }

    public void StartRenderProcessing(out ChangeBuffer renderBuffer, out GameTime gameTime)
    {
        //wait for start signal
        renderFrameStart.WaitOne();
        //ensure cache coherency
        Thread.MemoryBarrier();

        //get the render buffer
        renderBuffer = buffers[currentRenderBuffer];
        //get the game time
        gameTime = this.gameTime;
    }

    public void SubmitUpdate()
    {
        //ensure cache coherency
        Thread.MemoryBarrier();
        //update is done
        updateFrameEnd.Set();
    }

    public void SubmitRender()
    {
        //ensure cache coherency
        Thread.MemoryBarrier();
        //render is done
        renderFrameEnd.Set();
    }
I hope this helps you form a good idea about what actually happens with these synchronization primitives.
At this point, the watchful reader might have noticed that as currently explained, this system actually uses three threads. One thread for rendering, one for updating, and one that only calls the GlobalStartFrame() and GlobalSynchronize() functions to synchronize everything. I made the explanations in this way on purpose, because it is easier to understand it like that. In the example presented later in this tutorial, the render thread and the synchronization thread will be a single thread (the main Game thread), while the updating code will run on a separate thread. The code we will actually use in the Game class will be something like this:
- Call GlobalStartFrame()
- Execute rendering code
- Call GlobalSynchronize()
3.3.3. Change Buffers and Change Messages
This is another place where things get interesting, especially if you target the Xbox. As you probably know by now, when using XNA Game Studio, it is vital to keep the garbage generated each frame to a minimum. But our architecture requires the creation of ChangeMessages each frame. And sometimes we will create lots of change messages, depending on what happens in the scene. It should be obvious that the ChangeMessage data type cannot be a class and has to be a structure: structures are value types, so creating one does not allocate a new object on the heap, and we don't have garbage problems with them.
However, another problem arises. We will usually have more than one message type. For example, some messages will deal with updating the world matrix of an entity, others with updating some other data needed for rendering, such as highlight colors, states of an entity, or animation bones. If we were to use classes, we could have a base class for a message, and lots of subclasses for each type of message. If we didn't have to take care of garbage, this solution might be preferred, but as it is, we have to work with structures.
One solution is to create a structure containing all the possible variables that we might need to pass from the update thread to the render thread. You can clearly see that this is not a good solution, because such a structure would be large, and most of it would usually be wasted space. The solution we will use was inspired by Frank Savage's presentation at Gamefest 2008, regarding performance in XNA Game Studio. In his presentation, he shows us how unions are possible in C#. I know this sounds crazy (I thought the same thing), but it is actually possible. Some of you may not know what a union is. A union is a data structure that stores one of several types of data in a single memory location. For example, if we declare a union to contain an int and a float, both these fields would reside in the same place in memory. The size of an int is 4 bytes, the size of a float is 4 bytes, but the size of the union is still 4 bytes (unlike a struct, where the size would have been 8 bytes). So when assigning a value to the int of the union, or to the float of the union, both operations write to the same memory location. While this may not seem useful at first, it actually is for our scenario. While the structure of the message remains the same, we can interpret the data contained in it as whatever type of message we need. I hope this is not too confusing, but if it is, you'll probably understand better after we have some code, a little later.
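The int/float example can be written down directly. The sketch below uses the StructLayout and FieldOffset attributes that the following paragraphs explain in detail; the names IntOrFloat and UnionDemo are hypothetical and exist only for this illustration.

```csharp
using System.Runtime.InteropServices;

// Both fields start at offset 0, so they occupy the same 4 bytes.
[StructLayout(LayoutKind.Explicit)]
public struct IntOrFloat
{
    [FieldOffset(0)] public int AsInt;
    [FieldOffset(0)] public float AsFloat;
}

public static class UnionDemo
{
    // Size of the union-like structure in bytes.
    public static int SizeInBytes()
    {
        return Marshal.SizeOf(typeof(IntOrFloat));
    }

    // Writes through the float field and reads the same bytes back as an int.
    public static int BitsOfOne()
    {
        IntOrFloat value = new IntOrFloat();
        value.AsFloat = 1.0f;
        return value.AsInt;
    }
}
```

SizeInBytes() returns 4 rather than 8, and BitsOfOne() returns 0x3F800000, the bit pattern of the float 1.0, which shows that both fields really are views of the same memory.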
Before moving forward, let's make some decisions. For this tutorial, we will consider we could have the following message types:
- UpdateCameraView, which we use to give the render thread a new View matrix to be used with the camera
- UpdateWorldMatrix, which we use to update the World matrix of an object
- UpdateHighlightColor, which we use to update the highlight color of an object
- CreateNewRenderData, which we use to signal the render thread that a new object has been created, and pass the new RenderData to it
- DeleteRenderData, which we use to signal the render thread that a certain object has been destroyed, and doesn't need to be rendered any more
I'm sure you can think of other types of messages, depending on your game and scene, but these will suffice to illustrate our method. (Not all of these will be used in the example, but having them in the explanation helps.) We create an enumeration to hold all these types of messages.
    public enum ChangeMessageType
    {
        UpdateCameraView,
        UpdateWorldMatrix,
        UpdateHighlightColor,
        CreateNewRenderData,
        DeleteRenderData,
    }
Next we will define the structure of a message. To make a structure behave like a union, there are some steps we have to follow. First, we need to add the [StructLayout(LayoutKind.Explicit)] attribute to the structure's declaration. This allows us to specify, for each field, the offset in memory where that field is written to and read from. To specify this, we add the attribute [FieldOffset(x)] when declaring each field of the structure, where x is the memory offset in bytes. In our structure, the first field will be a ChangeMessageType, which indicates how an instance of this structure should be interpreted. This field has the field offset 0, because it is the first field in the structure. We have to ensure that no other field uses this memory location. The size of a ChangeMessageType variable is 4 bytes, so all the following fields must use offsets of 4 or more. Now we'll continue to add fields to this structure, based on the possible types of messages.
- The UpdateCameraView message needs to pass a Matrix from update to render. So, we add a field of type Matrix on offset 4.
- We notice that all other messages refer to a certain object. We said earlier that we identify these objects by an integer index, which we will call ID. We add a field of type int at offset 4. The individual fields of each of the remaining message types will need to start at offset 8.
- The UpdateWorldMatrix message needs to send the new World matrix to the render thread. We will add a field of type Matrix at offset 8
- The UpdateHighlightColor message needs to send a new Vector4 containing the new color. We add a field of type Vector4 at offset 8
- The CreateNewRenderData will send a position and a color that will be used to create a new RenderData object by the rendering thread. We add the position at offset 8, and the color at offset 20 (a Vector3 is stored on 12 bytes)
- Finally, the DeleteRenderData message doesn't need anything besides the ID of the RenderData to be deleted, so we don't need to add anything else.
The final structure declaration can be seen below. Also an image illustrates how the structure resides in memory, and how fields are accessed depending on the message type.
    using System.Runtime.InteropServices;
    using Microsoft.Xna.Framework;

    [StructLayout(LayoutKind.Explicit)]
    public struct ChangeMessage
    {
        //this appears in all messages
        //identifies how this message should be interpreted
        [FieldOffset(0)]
        public ChangeMessageType MessageType;

        //this is the field required when this message is of type UpdateCameraView
        [FieldOffset(4)]
        public Matrix CameraViewMatrix;

        //this field is used for all messages dealing with entities
        [FieldOffset(4)]
        public int ID;

        //this is the field required when this message is of type UpdateWorldMatrix
        [FieldOffset(8)]
        public Matrix WorldMatrix;

        //this is the field required when this message is of type UpdateHighlightColor
        [FieldOffset(8)]
        public Vector4 HighlightColor;

        //these are the fields required when this message is of type CreateNewRenderData
        [FieldOffset(8)]
        public Vector3 Position;
        [FieldOffset(20)]
        public Vector3 Color;

        //nothing else is required when this message is of type DeleteRenderData
    }
Below you can see how the fields are placed in memory.

As you can see, the total size in bytes of this structure is only 72 bytes, but it can be used as five different types of messages. For example, assume the update thread creates the following two messages:
    //create a message to update the camera view matrix
    ChangeMessage updateCamera = new ChangeMessage();
    updateCamera.MessageType = ChangeMessageType.UpdateCameraView;
    updateCamera.CameraViewMatrix = Matrix.CreateLookAt(...);

    //create a message to update the world matrix of the object with index 5
    ChangeMessage updateWorld = new ChangeMessage();
    updateWorld.MessageType = ChangeMessageType.UpdateWorldMatrix;
    updateWorld.ID = 5;
    updateWorld.WorldMatrix = Matrix.CreateTranslation(...);
As you can see, the structure ChangeMessage is used once as an UpdateCameraView message, and once as an UpdateWorldMatrix message. In each case, we are only interested in setting the fields relevant to that message type. Now assume that these messages are entered into a buffer, and later, the rendering thread takes each message in the buffer and analyzes it. The code would look something like:
<br />
//processing a ChangeMessage with the name msg<br />
switch (msg.MessageType)<br />
{<br />
case ChangeMessageType.UpdateCameraView:<br />
camera.View = msg.CameraViewMatrix;<br />
break;<br />
case ChangeMessageType.UpdateWorldMatrix:<br />
renderObjects[msg.ID].World = msg.WorldMatrix;<br />
break;<br />
[...]<br />
}<br />
So based on msg.MessageType, we can treat the message in the intended way, and use only the relevant fields.
Having defined the ChangeMessage structure, a change buffer will simply contain a list of such change messages.
<br />
public class ChangeBuffer<br />
{<br />
public List<ChangeMessage> Messages { get; set; }</p>
<p> public ChangeBuffer()<br />
{<br />
Messages = new List<ChangeMessage>();<br />
}<br />
public void Add(ChangeMessage msg)<br />
{<br />
Messages.Add(msg);<br />
}<br />
public void Clear()<br />
{<br />
Messages.Clear();<br />
}<br />
}<br />
Again, based on your game, you could make this class more complex, by adding other functionality. But for educational purposes, it is fine as it is.
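To make the round trip concrete, here is a small sketch of how one frame's messages might travel through a ChangeBuffer. It rests on several assumptions: the message is a simplified struct with plain fields instead of the explicit layout, Matrix4x4 from System.Numerics stands in for the XNA Matrix, and ChangeBufferDemo with its Produce and Consume functions is a hypothetical name that does not appear in the sample. Clearing the buffer at the end of Consume is also just a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics; // Matrix4x4 stands in for XNA's Matrix

public enum ChangeMessageType { UpdateCameraView, UpdateWorldMatrix }

// Simplified message: plain fields instead of the explicit layout, to keep the sketch short.
public struct ChangeMessage
{
    public ChangeMessageType MessageType;
    public int ID;
    public Matrix4x4 CameraViewMatrix;
    public Matrix4x4 WorldMatrix;
}

public class ChangeBuffer
{
    public List<ChangeMessage> Messages { get; } = new List<ChangeMessage>();
    public void Add(ChangeMessage msg) { Messages.Add(msg); }
    public void Clear() { Messages.Clear(); }
}

// Hypothetical demo class, not part of the article's sample.
public static class ChangeBufferDemo
{
    // Update side: record what changed during this frame.
    public static void Produce(ChangeBuffer buffer)
    {
        buffer.Add(new ChangeMessage
        {
            MessageType = ChangeMessageType.UpdateCameraView,
            CameraViewMatrix = Matrix4x4.CreateLookAt(new Vector3(0, 0, 10), Vector3.Zero, Vector3.UnitY)
        });
        buffer.Add(new ChangeMessage
        {
            MessageType = ChangeMessageType.UpdateWorldMatrix,
            ID = 1,
            WorldMatrix = Matrix4x4.CreateTranslation(1, 2, 3)
        });
    }

    // Render side: apply each message, then empty the buffer so it can be reused.
    public static void Consume(ChangeBuffer buffer, ref Matrix4x4 cameraView, Matrix4x4[] worlds)
    {
        foreach (ChangeMessage msg in buffer.Messages)
        {
            switch (msg.MessageType)
            {
                case ChangeMessageType.UpdateCameraView:
                    cameraView = msg.CameraViewMatrix;
                    break;
                case ChangeMessageType.UpdateWorldMatrix:
                    worlds[msg.ID] = msg.WorldMatrix;
                    break;
            }
        }
        buffer.Clear();
    }
}
```

In the real system the two functions would run on different threads and never touch the same buffer at the same time; here they are simply called one after the other.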
3.3.4. Using the Buffers
We are nearing the finish line. At the moment, we have the DoubleBuffer and ChangeBuffer classes and the ChangeMessage structure. The next step is the GameData and RenderData classes. I won't go into much detail about them here because, as I've previously said, the fields of these classes depend on the game you make. The code sample contains examples of this. A simple example might look like the one below.
<br />
class GameData<br />
{<br />
public Vector3 Acceleration;<br />
public Vector3 Velocity;<br />
public Vector3 Position;<br />
public Matrix Rotation;<br />
public bool IsAlive;<br />
}</p>
<p> class RenderData<br />
{<br />
public Vector3 HighlightColor;<br />
public Matrix WorldMatrix;<br />
public Model Model;<br />
public bool IsAlive;<br />
}<br />
In a simple form, the update manager would contain a reference to the double buffer, and a list of GameData objects. Other fields can be added as needed.
<br />
class UpdateManager<br />
{<br />
public List<GameData> GameDataOjects { get; set; }<br />
private DoubleBuffer doubleBuffer;<br />
private GameTime gameTime;</p>
<p> protected ChangeBuffer messageBuffer;<br />
protected Game game;</p>
<p> public UpdateManager(DoubleBuffer doubleBuffer, Game game)<br />
{<br />
this.doubleBuffer = doubleBuffer;<br />
this.game = game;<br />
this.GameDataOjects = new List<GameData>();<br />
}<br />
We need to add a function that will be called each frame and will contain the update code. We will separate this function in two: one function takes care of synchronizing with the doubleBuffer, and the other is where the update code would normally be placed. We do this separation to provide an easy point at which to extend the class. A specialized class that inherits from UpdateManager simply has to override the Update function, and everything else is taken care of. We actually do this in the sample.
<br />
public void DoFrame()<br />
{</p>
<p> doubleBuffer.StartUpdateProcessing(out messageBuffer, out gameTime);<br />
this.Update(gameTime);<br />
doubleBuffer.SubmitUpdate();<br />
}</p>
<p> public virtual void Update(GameTime gameTime)<br />
{<br />
}<br />
The final step is to make this function execute on a separate thread. We add a field to keep track of that thread (for example, if the main thread wants to exit, it can use this field to shut this thread down), and a function that, when called, starts execution on a new thread. The function that is started on the new thread simply calls the DoFrame function in a loop. If we are running this on the Xbox, we also need to manually set the processor affinity, in order for the function to be executed on a different hardware thread. To do this, we call the Thread.SetProcessorAffinity function, specifying as the parameter the hardware thread we wish to run on.
<br />
public Thread RunningThread { get; set; }</p>
<p> private void run()<br />
{<br />
#if XBOX</p>
<p> Thread.CurrentThread.SetProcessorAffinity(5);<br />
#endif</p>
<p> while (true)<br />
{<br />
DoFrame();<br />
}<br />
}</p>
<p> public void StartOnNewThread()<br />
{<br />
ThreadStart ts = new ThreadStart(run);<br />
RunningThread = new Thread(ts);<br />
RunningThread.Start();<br />
}<br />
When we need to start the update thread, we simply call StartOnNewThread().
The RenderManager is similar to the UpdateManager. We don't include here the mechanism to start the RenderManager on a separate thread, because it would be identical to the one used in UpdateManager. Besides, we won't use it in the sample code; we will simply call DoFrame from the main thread, because we keep the rendering operations on the main thread, as previously discussed.
<br />
class RenderManager<br />
{<br />
public List<RenderData> RenderDataOjects { get; set; }<br />
private DoubleBuffer doubleBuffer;<br />
private GameTime gameTime;</p>
<p> protected ChangeBuffer messageBuffer;<br />
protected Game game;</p>
<p> public RenderManager(DoubleBuffer doubleBuffer, Game game)<br />
{<br />
this.doubleBuffer = doubleBuffer;<br />
this.game = game;<br />
this.RenderDataOjects = new List<RenderData>();<br />
}</p>
<p> public virtual void LoadContent()<br />
{<br />
}</p>
<p> public void DoFrame()<br />
{</p>
<p> doubleBuffer.StartRenderProcessing(out messageBuffer, out gameTime);<br />
this.Draw(gameTime);<br />
doubleBuffer.SubmitRender();<br />
}</p>
<p> public virtual void Draw(GameTime gameTime)<br />
{<br />
}<br />
}<br />
And now to put everything together. To use these classes, you would first add code to the Update and Draw functions of the managers, where you would make sure to write updates to buffers in the update manager, and read them in the render manager. Next, let's see how you would add all this into the Game class. Before anything, you need some fields for the double buffer, update manager and render manager.
<br />
public class Game1 : Microsoft.Xna.Framework.Game<br />
{<br />
[...]</p>
<p> DoubleBuffer doubleBuffer;<br />
RenderManager renderManager;<br />
UpdateManager updateManager;<br />
Then, during LoadContent, you can initialize them. Here, before they start running in parallel, you can load data for objects, and add them to the UpdateManager's GameData list and to the RenderManager's RenderData list. But you have to be careful to add them correctly, such that the GameData present in position x in the GameDataObjects list corresponds to the RenderData in the same position in the RenderDataObjects list. As I said before, you can use a more complex identification policy if you want, but we keep it simple for educational purposes. At the end, after you've loaded the data, you can tell the UpdateManager to start executing on another thread.
<br />
protected override void LoadContent()<br />
{<br />
[...]</p>
<p> doubleBuffer = new DoubleBuffer();<br />
renderManager = new RenderManager(doubleBuffer, this);<br />
renderManager.LoadContent();<br />
updateManager = new UpdateManager(doubleBuffer, this);</p>
<p> //here, you can load data and add it to the RenderDataObjects list and GameDataObjects list</p>
<p> renderManager.RenderDataOjects.Add(...);</p>
<p> updateManager.GameDataOjects.Add(...);<br />
[...]</p>
<p> updateManager.StartOnNewThread();<br />
}<br />
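One way to keep the two lists in step during loading is to always add to both of them through a single function. The sketch below is only an illustration: EntityRegistry and AddEntity are hypothetical names that are not part of the sample, and GameData and RenderData are reduced to empty stand-ins.

```csharp
using System;
using System.Collections.Generic;

// Reduced stand-ins for the article's data classes.
public class GameData { public bool IsAlive = true; }
public class RenderData { public bool IsAlive = true; }

// Hypothetical helper, not part of the article's sample.
public static class EntityRegistry
{
    // Appends one entity to both lists and returns the index they now share.
    public static int AddEntity(List<GameData> gameObjects, List<RenderData> renderObjects,
                                GameData game, RenderData render)
    {
        // If the lists already differ in length, the indices can no longer match.
        if (gameObjects.Count != renderObjects.Count)
            throw new InvalidOperationException("The two lists are out of step.");

        gameObjects.Add(game);
        renderObjects.Add(render);
        return gameObjects.Count - 1;
    }
}
```

Because the lists are plain List objects, a helper like this is only safe to call while a single thread touches them, for example in LoadContent before StartOnNewThread() is called.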
We said before that we keep the drawing code on the main thread. So now, in the Draw function of the Game class, we put the synchronization code. We signal the start of the new frame. Once we do this, the thread of the UpdateManager, which was waiting for this signal, starts to execute. Now we can also tell the RenderManager to draw its frame. After the render manager is done, we wait for the UpdateManager to finish its work, by calling doubleBuffer.GlobalSynchronize(). When we exit this function, we know that the update thread is waiting for us to signal a new frame.
<br />
protected override void Draw(GameTime gameTime)<br />
{</p>
<p> doubleBuffer.GlobalStartFrame(gameTime);</p>
<p> graphics.GraphicsDevice.Clear(Color.Black);<br />
renderManager.DoFrame();<br />
base.Draw(gameTime);</p>
<p> doubleBuffer.GlobalSynchronize();<br />
}<br />
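The signalling described above can be tried out in isolation. The following is a simplified, standalone sketch and not the DoubleBuffer from this article: FrameHandshake and HandshakeDemo are hypothetical names, the functions only borrow the names used above, and there are no buffers or GameTime involved, just two AutoResetEvent objects.

```csharp
using System.Threading;

// Hypothetical, simplified stand-in for the signalling part of DoubleBuffer.
public class FrameHandshake
{
    private readonly AutoResetEvent frameStarted = new AutoResetEvent(false);
    private readonly AutoResetEvent updateDone = new AutoResetEvent(false);

    // Main thread: release the update thread for one frame.
    public void GlobalStartFrame() { frameStarted.Set(); }

    // Main thread: block until the update thread has submitted its frame.
    public void GlobalSynchronize() { updateDone.WaitOne(); }

    // Update thread: block until the main thread starts a frame.
    public void StartUpdateProcessing() { frameStarted.WaitOne(); }

    // Update thread: report that this frame's update is finished.
    public void SubmitUpdate() { updateDone.Set(); }
}

public static class HandshakeDemo
{
    // Runs the given number of frames and returns how many updates were executed.
    public static int RunFrames(int frameCount)
    {
        var sync = new FrameHandshake();
        int updates = 0;

        var updateThread = new Thread(() =>
        {
            for (int i = 0; i < frameCount; i++)
            {
                sync.StartUpdateProcessing();
                updates++;                 // stands in for Update(gameTime)
                sync.SubmitUpdate();
            }
        });
        updateThread.Start();

        for (int i = 0; i < frameCount; i++)
        {
            sync.GlobalStartFrame();
            // drawing would happen here, in parallel with the update
            sync.GlobalSynchronize();
        }

        updateThread.Join();
        return updates;
    }
}
```

The point of the sketch is the ordering: the update thread never starts a frame before the main thread signals it, and the main thread never starts the next frame before the update has been submitted.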
You probably noticed that we put no code in the Update() function of the Game class. Here, you can put any code that you want to be executed serially, because at the time when the Update() function is called, no other thread is executing. So here is a good place for some code that you don't want to be parallelized, such as transitions between screen states, code that deals with the Guide, and others.
One more important thing that needs to be done is shutting down the UpdateManager's thread when we don't need it anymore. You can do this by calling the Abort() function on its thread. For example, before exiting the game, we can make sure we don't leave that thread alive.
<br />
protected override void OnExiting(object sender, EventArgs args)<br />
{<br />
doubleBuffer.CleanUp();<br />
if (updateManager.RunningThread != null)</p>
<p> updateManager.RunningThread.Abort();<br />
}<br />
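If you would rather not rely on Abort() (on .NET Core and .NET 5 or later, Thread.Abort throws a PlatformNotSupportedException), a common alternative is to ask the loop to stop and then wait for the thread to finish. The sketch below shows this idea under stated assumptions: StoppableWorker is a hypothetical stand-in for the thread handling in UpdateManager, and the loop body only counts iterations instead of calling DoFrame().

```csharp
using System.Threading;

// Hypothetical stand-in for UpdateManager's thread handling.
public class StoppableWorker
{
    private volatile bool stopRequested;
    private int framesDone;

    public Thread RunningThread { get; private set; }
    public int FramesDone { get { return Volatile.Read(ref framesDone); } }

    private void Run()
    {
        // Loop until someone asks us to stop, instead of while (true).
        while (!stopRequested)
        {
            Interlocked.Increment(ref framesDone);   // stands in for DoFrame()
            Thread.Sleep(1);
        }
    }

    public void StartOnNewThread()
    {
        RunningThread = new Thread(Run);
        RunningThread.IsBackground = true;   // a background thread does not keep the process alive
        RunningThread.Start();
    }

    // Asks the loop to finish; returns true if the thread exited within the timeout.
    public bool Stop(int timeoutMilliseconds)
    {
        stopRequested = true;
        return RunningThread == null || RunningThread.Join(timeoutMilliseconds);
    }
}
```

Keep in mind that a thread that is blocked waiting on the double buffer cannot see the flag until it is woken up, so in the real UpdateManager the buffer would also have to release it first. How that is done depends on the DoubleBuffer and is not shown here.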
Normally, when your game has multiple states and screens, you will probably only want to use multi-threading during your gameplay screen, so when exiting that screen, you should make sure you shut down the update thread and clean up the doubleBuffer. The code from this part can be found at the end of the article, and can be used as a starting point for your multithreaded games. In the next part, we will look at the code sample accompanying this tutorial, and talk a little about it.
November 28th, 2009 - 06:06
Hi Catalin, I originally went down a very similar research path as what you are describing in your threaded-rendering system.
However, I did some prototyping and came to the realization that the GPU is already running asynchronously, and only forces a CPU-side block if the previous frame hasn’t finished when the next .Begin() is called. This result, plus the added complexity of caching game-state, led me to abandon this system.
What we are planning to do on our engine is to use a separate CPU thread to process vertex data (batch instancing), but that’s really no different than multithreading other CPU-side modules.
November 28th, 2009 - 07:41
Hi Jason.
Yes, the GPU does indeed run asynchronously from the CPU. However, the speed of the GPU is rarely the main performance bottleneck. Most of the time, the problem is with the communication between the CPU and the GPU, which happens at each DrawXXX() call. Having a high number of draw calls eats up lots of CPU time, and that is what I am trying to reduce in this article. So what I am threading is not actually the GPU drawing, but the CPU-side invoking of draw calls (which happens at driver level).
In most multi-threading approaches, you either try to distribute the non-graphics CPU-work on multiple threads, or try to distribute the GPU-communication work done by the CPU.
But yes, the approach is not what I’d recommend for a truly advanced system. It is good for learning purposes, and it is definitely a robust way to add threading to your game and obtain a good performance boost, but at the cost of extra memory needed for the two buffers.
For a more complex threading system, there are some other ways to do it, as presented in some of the papers I linked to at the bottom of the article. Each method has its advantages, and you can’t always decide that one way is better than the others. It usually depends on the structure of the rest of the engine.
November 28th, 2009 - 07:46
Nice one. It will help me a lot.
Thanks,
Timo
November 30th, 2009 - 03:12
yah, I think your system is a great intro to the very complex world of multithreading. Like I said, I originally went down a very similar path to what you are describing, so I do feel that there are benefits, but in my situation the drawbacks outweighed them.
October 9th, 2010 - 20:25
a simpler approach might be:
a) each GameComponent maintains two DrawState objects
b) Draw() method ONLY uses information in current DrawState object to draw
c) Update() keeps only ONE version of “UpdateState”
d) single point of control for flip flop using Game.CurrentDrawStateIndex = 0 or 1
e) GameComponent.Draw() { CurrentState = DrawState[Game.CurrentDrawStateIndex]; }
this is easy to design – anything that Draw() requires goes into DrawState
November 22nd, 2010 - 17:37
When I try using Multiple Threads, the controls don’t seem to work, however they do work if I use a single thread (uncomment the line in the draw method and comment out the one in load). Any idea why this is?
December 13th, 2010 - 05:41
I should note something: because of the way XNA handles input, keyboard input can be received only from the main thread. Because of this, you have to create your own input handler to fix it.