Optimizing

Optimizing

Naturally, if you are programming games, you are interested in making your code run as fast as possible, and that means speeding up the code wherever you can.

In a graphics engine, this is generally at the lowest level where triangles are being submitted to the rendering pipeline. I won't be talking about that here, though. The possibilities are too varied.

No, this page will just talk about some simpler methods dealing with data structures, and some simpler concepts about optimizing rendering.

Arrays

If you want to use arrays as a list of objects, try to keep all active objects at the front of the array, and stop your parse when you strike an inactive object. Keeping them all at the front is easy. When you are about to set an object to inactive, just move the LAST active object to that cell.

Since all active objects are at the front, parsing to an inactive objects means you've hit the end of objects that are relevent to operate with.

Array parsing

Here's a typical way to parse an array:

for (i=0; i<150; i++)
{
DoSomething(&(array[i]));
}

Here's another way, a little faster, because we don't have to calculate a memory location off an index:

lpData = &(array[0]);
for (i=0; i<150; i++)
{
DoSomething (lpData);
lpData++;
}

Even faster, it avoids the for loop operations:

lpData = &(array[0]);
lpLastData = &(array[150]); // Don't reference *lpLastdata, you'll get an error!!!
while (lpData != lpLastData)
{
DoSomething (lpData);
lpData++;
}

State changes in 3D

Avoid state changes as much as possible.

That means sorting your objects that you are rendering into groups of common state need (fog, fog attributes, lighting, lighting attributes, texture mapping, bound texture, blending, blending function, etc.)

In addition, it wouldn't hurt if your engine tried to detect redundent state function requests (setting lighting enabled when it already is, etc.) so that the request isn't even sent to the renderer. Since there is no guarantee that OpenGL and Direct3D will optiminze that redundancy for you.

Even if your engine does this redundancy check, it is still important to properly sort your objects for rendering, because if there are a lot of state changes necessary to render them, checking for redundant changes won't help much if the states are changing because your objects are badly organized.

Data and model design is important too. BSP's help with rendering back to front, but if the gain in speed on being able to ignore depth buffer reading is lost in needing to change textures all the time, that will play an important part. Objects that are skinned and only use one texture are optimal as well. You'll just need to experiment and find a balance.

Elements of a BSP

Have a flag that says if all the necessary components are already loaded. Whent he animation itself is about to render, check all components and load them all, then set its flag. This avoids pausing while the animation plays, and only causes a pause int he beginning. What about scheduling loading (load one component at a time, one or two frames at a time, so that everything else is loaded whent he animation needs it?)

This talks about the shadows generated by the terrain itself, not the shadows of objects on the terrain.

The shadows could be placed as color values on the vertices of the terrain itself. These values could also be used as a base for the lighting to be used on objects that are on those vertices.

If the sun moves across the sky, recalculate one row at a time to reduce calculations. That way the user is unlikely to notice the shift in color as the sun slowly moves across.

Objects should always behave based on the state at the same time at which that object exists. To accomplish this, there should actually be TWO states on each object, a new and an old. In addition, each object should store pointers to the old and new, and possibly a third that other objects should use for their comparisons.

That way, no object has an advantage aiming at a moving object based on its position in the list. And it doesn't matter which order their behaviors are iterated.

We could just store one pointer and derive the rest from its current state, but why bother? Pre-determine them and store tham at the object level to avoid a check every time we want to access the object.

Avoid letting the program derive its values when they can be stored for easy reference.

For example, if a collision with a wall will cause an object to slide across its surface, why not provide a vector as part of that surface structure that will most quickly determine the new travel velocity, instead of forcing the application to do a dot product off the surface normal and perform a subtraction?

This is especially true for pre-derived values that will never change. If you were in an entire level where every wall was changing its position, you may be better off using the surface normal instead of calculating the slide vector for every surface every time it moves. But, there are ways of handling that too, with transformational data on the surface that can adjust the collision and slide values appropriately.

Sorting routines are important to assist in quick searches and parsing where ordering is necessary, like alpha objects sorted back to front. Sorting doesn't always have to occur every frame, because from one frame to the next, a sort routine may set up objects to be close to in-order on the next frame.

Bubble sort is the worst when it comes to badly sorted lists. O(n^2) It quickly resolves itself on a nearly sorted list, though.

Quicksort performs well with a badly sorted list, but not when the list is close to sorted, O(n log (n)). It WAY overchecks a sorted list.

Radix sort is one of the sort algoithms that is argued to be the best, on arbitrary data and close to sorted. O(kn) Radix sorts are interesting in that they sort on pieces of the data, placing each element in a separate bin depending on its value. Then another pass is made, placing each value in a new bin based on the next value, etc. For floats, which can be expressed as integers if you cast an int pointer to a float, you would have 256 bins, and make 4 passes to sort them all. (How does this automatically detect a sorted list??? I haven't yet found a way that radix sort can be optimized for nearly sorted lists. I would recommend counting how many units are out of place and choose between bubble and radix sort. If you use radix sort all the time, it may take too long. Selectively removing out of place items and just putting them where they belong might be good enough, too, as long as there aren't too many.)