How to make a Beach

Tuesday, May 5, 2009 | 5:30 PM

With O3D we knew we wanted to support relatively high-end graphics. One of the problems with working on an engineer-only team is that we only have access to "programmer art." I'm sure there are exceptions, but "programmer art" is pretty legendary as a synonym for bad stick figures or some simple geometric primitives. That left us with a problem. How would we show people what's possible if all we had was some primitives and a teapot?
We ended up hiring Crazy Pixel Productions to create a professional-quality "beach scene" for us. We told them that we wanted the graphics to look "next-gen" but also not be too big. We weren't trying to max out the system; we just wanted to have something that shows off what can be done apart from the cubes, spheres and teapots seen in the other samples. It took Crazy Pixel approximately two person-months of full-time work to make the assets. They had to make models for each bush, rock, tree and coral, draw texture maps, and compute or draw normal maps. This was not a small task.

When we received the assets from Crazy Pixel, our first step was to design a process in which we could take the source assets and produce data for the demo without any manual handling. To do this, we wrote a Python script that would launch 3D Studio Max, load their file, export the file to COLLADA, exit 3D Studio Max, read the exported file, and process all the textures from their source formats (.TGA, .PSD) to formats that were more suited to our demo (.PNG, .JPG, .DDS). The script would then write out a new COLLADA file with the texture paths updated to point to the processed textures and finally call our sample o3dConverter utility to convert the modified COLLADA file into something our sample libraries could load. With that process firmly in place, it was easy to take any new changes from the artists and test them out. It was also easy to quickly adjust how the textures were processed, such as trying them all at half or quarter resolution or trying different compression settings.

Loading was the easy part, just a few lines of code. The next step was making the shaders to render some of the effects. The easiest was the sky, which uses a cube map. The artists had created a sky dome in 3ds Max, so we went into the application and, using the Ogre Max tools, were quickly able to generate the six faces of the cube. We then loaded those into Photoshop and used the NVIDIA Photoshop tools to generate a cube map. The sky shader just takes a normalized vector from the eye to the sky and uses that to get a color from the sky cube map.
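To make the cube map lookup concrete, here is a minimal sketch in plain JavaScript of what the GPU does with that normalized vector: the axis with the largest absolute component picks one of the six faces. The function names are made up for illustration and are not part of O3D or the demo; the real lookup happens inside the shader's cube map sampler.

```javascript
// Hypothetical helper: scales a 3-component vector to unit length.
function normalize(v) {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
}

// Direction from the eye toward a point on the sky, normalized.
function skyDirection(eye, skyPoint) {
  return normalize([
    skyPoint[0] - eye[0],
    skyPoint[1] - eye[1],
    skyPoint[2] - eye[2],
  ]);
}

// Picks the cube map face a direction points at. The axis with the
// largest absolute component wins and its sign selects the + or - face.
function cubeFaceForDirection(dir) {
  const [x, y, z] = normalize(dir);
  const ax = Math.abs(x);
  const ay = Math.abs(y);
  const az = Math.abs(z);
  if (ax >= ay && ax >= az) return x >= 0 ? '+x' : '-x';
  if (ay >= az) return y >= 0 ? '+y' : '-y';
  return z >= 0 ? '+z' : '-z';
}
```

For example, `cubeFaceForDirection(skyDirection([0, 0, 0], [1, 2, 10]))` selects the `'+z'` face, because the direction is dominated by its positive z component.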

A cubemap made from 6 faces

The biggest effect is of course the water. It's done in three stages. First, the entire world is flipped upside down and rendered into a render target. This basically renders what you'd see if the water were a perfect mirror, with the exception that anything below the water should not be rendered. That required making shaders that check for objects below sea level and skip rendering those pixels.


Next, the entire scene is rendered again to another render target for refraction. The shader there renders everything tinted sea color and attempts to add a little fog as the water gets deeper. Whereas the reflection shaders don't render anything below the water, the refraction shader doesn't render anything above the water.
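The two tests described above can be sketched in plain JavaScript. This is only an illustration of the idea, not the demo's shader code: it assumes a Z-up world with the water plane at a constant height, and the names `SEA_LEVEL`, `mirrorAcrossWater` and `keepInPass` are hypothetical.

```javascript
// Assumption: Z is up and the water surface sits at this height.
const SEA_LEVEL = 0;

// Flips a world-space point upside down about the water plane, which is
// what the reflection pass does to the whole world before rendering it.
function mirrorAcrossWater(point, seaLevel) {
  return [point[0], point[1], 2 * seaLevel - point[2]];
}

// Decides whether a pixel at the given world height is kept in a pass.
// The reflection pass drops everything below the water, and the
// refraction pass drops everything above it.
function keepInPass(pass, worldZ, seaLevel) {
  switch (pass) {
    case 'reflection':
      return worldZ >= seaLevel;
    case 'refraction':
      return worldZ <= seaLevel;
    default:
      throw new Error('unknown pass: ' + pass);
  }
}
```

In a shader the "drop" would be a per-pixel discard based on the original (unflipped) world height; here it is just a boolean so the rule is easy to see.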


When it comes time to render the water, the water shader takes the reflection render target, the refraction render target, the sky cube map, and the water color, and mixes those four components together depending on the angle of view and the normal of the water's surface.
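One common way to do this kind of view-dependent mix is a Fresnel-style blend: looking straight down you mostly see into the water, and at grazing angles you mostly see the reflection. The sketch below uses Schlick's approximation for the blend factor. That choice, the constants `WATER_F0` and `WATER_TINT`, the `coverage` value on the reflection sample, and all function names are assumptions for illustration, not the formula the beach demo's shader uses.

```javascript
// Assumed constants, not values from the demo.
const WATER_F0 = 0.02;   // reflectance when looking straight down
const WATER_TINT = 0.3;  // how strongly the water color tints refraction

function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Component-wise linear interpolation between two colors.
function lerp(a, b, t) {
  return a.map((v, i) => v + (b[i] - v) * t);
}

// Schlick's approximation: f0 + (1 - f0) * (1 - cos(theta))^5.
function schlickFresnel(cosTheta, f0) {
  const c = Math.min(Math.max(cosTheta, 0), 1);
  return f0 + (1 - f0) * Math.pow(1 - c, 5);
}

// Mixes the four inputs for one water pixel. `normal` and `toEye` are
// unit vectors; `reflection.coverage` (hypothetical) is 1 where the
// reflection target holds scene geometry and 0 where only sky is visible.
function shadeWater(inputs) {
  const {reflection, refraction, sky, waterColor, normal, toEye} = inputs;
  const above = lerp(sky, reflection.rgb, reflection.coverage);
  const below = lerp(refraction, waterColor, WATER_TINT);
  const fresnel = schlickFresnel(dot(normal, toEye), WATER_F0);
  return lerp(below, above, fresnel);
}
```

With this blend, a camera directly above the water gets about 98% of the refracted, tinted color, while a camera near the horizon gets almost pure reflection or sky.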

The water surface mixed from 3 inputs.

To make the final image the water surface is combined with the main view and the sky.

The final result.

One thing that's really nice about working in a browser with JavaScript is that it is very quick to edit something and hit refresh in the browser to see your changes. For a small sample that's great, but as the data gets bigger it takes longer and longer to see those changes. On my desktop machine, loading the beach demo takes under 10 seconds from my local hard drive, but when you are iterating on shaders even 10 seconds seems like too long. That's when we added the ability to edit the shaders in real time by adding a small text area and some buttons in the demo. You can press 'e' to bring these buttons up. With just a few minutes of coding I was able to edit the shaders instantly, which was a huge time saver and also made experimenting a lot more fun.

We could have shipped that, but there's a vast range of machines out there. Some have the latest, greatest, fastest graphics cards and others are ... let's just say less than fast. With that in mind it seemed best to try to make the demo render as fast as possible without sacrificing the effects.

The first step was to take a look at how much stuff was being drawn. GPUs are fast but the less you give them to draw the faster they go. The original scene had over 520 model instances (70 of "palm tree", 50 of "plant", etc.) and our demo was running pretty slow on machines with a slow GPU. Of course, nothing is going to make it run fast everywhere, as there is a huge difference in machine capabilities--from systems running with an Intel GMA 3100 at the low end, to an ATI Radeon 4870 or NVidia GeForce GTX 280 at the high end. Nevertheless, we still wanted to see what we could optimize.

The first thing we tried was collapsing the models. The simple way to think of this is that it's a lot faster to draw one model with 10,000 triangles than it is to draw 10,000 models with one triangle each. We were drawing the entire scene three times: once for reflection, once for refraction and once for the main view. That's around 1,560 models being drawn in each frame. So, we collapsed all similar models: all 70 instances of "palm tree" became one model of 70 palm trees, and all 50 instances of "plant" became one model of 50 plants. Doing that for all the instances changed the scene from 520 model instances to about 70 model instances and our total number of models drawn to about 210. That helped, but not as much as we had hoped, because we were still asking the GPU to draw the same number of pixels.
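Collapsing can be sketched as baking each instance's placement into a copy of the vertices and appending the copies into one big mesh. The sketch below is a simplified illustration, not the demo's tooling: it assumes instances differ only by a translation (real instances would carry a full transform matrix), and the mesh layout and function names are hypothetical.

```javascript
// mesh: {positions: [x0, y0, z0, x1, ...], indices: [i0, i1, i2, ...]}
// offsets: one [x, y, z] translation per instance (simplifying assumption).
// Returns a single mesh containing every instance's geometry.
function collapseInstances(mesh, offsets) {
  const positions = [];
  const indices = [];
  const vertsPerInstance = mesh.positions.length / 3;
  offsets.forEach((offset, n) => {
    // Bake this instance's translation into a copy of the vertices.
    for (let i = 0; i < mesh.positions.length; i += 3) {
      positions.push(
        mesh.positions[i] + offset[0],
        mesh.positions[i + 1] + offset[1],
        mesh.positions[i + 2] + offset[2]);
    }
    // Re-base the indices so they point at this instance's copy.
    for (const index of mesh.indices) {
      indices.push(index + n * vertsPerInstance);
    }
  });
  return {positions, indices};
}

// Models drawn per frame when every model is drawn once per pass.
function drawsPerFrame(modelCount, passes) {
  return modelCount * passes;
}
```

With the numbers from the post, `drawsPerFrame(520, 3)` is 1,560 before collapsing and `drawsPerFrame(70, 3)` is 210 after. The merged mesh holds every copy of the vertices, so the triangle count is unchanged; only the number of draws goes down.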

The next step was to realize that for the reflection render target, nothing below the water needed to be drawn. Similarly, for the refraction render target, nothing above the water needed to be rendered. And, even for the main view, nothing below the water needed to be rendered because what appears to be below the water is actually coming from the refraction we already rendered. The solution was to organize the transforms in the transform graph so that just four transforms became the parents of those combinations. This process made it easy to set or clear the visibility on just those four transforms when rendering each render target so that only what needs to be rendered in each pass is actually rendered. That brought the number of models rendered per frame to around 60, and we were asking the GPU to draw far fewer pixels. With that, the demo ran at 30 Hz on most of our laptops at work, which we decided was good enough.
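The per-pass visibility switch can be sketched with a small lookup table. The post does not say how the four parent transforms were divided, so the three group names below (`aboveWater`, `belowWater`, `crossesWater`) are purely hypothetical, and the groups are plain objects with a `visible` flag rather than real O3D transforms.

```javascript
// Hypothetical grouping: which parent groups are shown in each pass.
const PASS_VISIBILITY = {
  reflection: {aboveWater: true, belowWater: false, crossesWater: true},
  refraction: {aboveWater: false, belowWater: true, crossesWater: true},
  main: {aboveWater: true, belowWater: false, crossesWater: true},
};

// Sets or clears `visible` on each parent group before rendering a pass.
// Groups not listed for the pass are hidden.
function setVisibilityForPass(groups, pass) {
  const table = PASS_VISIBILITY[pass];
  if (!table) {
    throw new Error('unknown pass: ' + pass);
  }
  for (const name of Object.keys(groups)) {
    groups[name].visible = table[name] === true;
  }
}
```

Because everything hangs off a handful of parents, switching passes touches only those few flags instead of walking every model in the scene.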

The gap between a top GPU and a bottom GPU is large. On a fast card like those mentioned above, the beach demo runs at a solid 60 Hz stretched full screen on a 30-inch monitor, and at a solid 60 Hz stretched across two 24-inch monitors. On an Intel GMA 3100 it might run as slowly as 10 Hz.

There was one more big optimization added at the last minute. If you look at the demo, except for the water, almost nothing is moving. Just the flames and a few particles on the waterfall. Taking that into account, if the camera does not move then the refraction and reflection actually do not need to be re-rendered since nothing in them will change. Changing the demo to only render the reflection and refraction when the camera moves shaved off another 10-15% of rendering time. If we had something more dynamic in the scene we could not have added this optimization.
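The idea boils down to remembering the camera from the last time the water passes were rendered and skipping them when it has not changed. Here is a minimal sketch, assuming the camera is available as a flat array of matrix values; `makeWaterPassUpdater` and the two callbacks are hypothetical names, not functions from the demo.

```javascript
// Returns an update function that re-renders the reflection and
// refraction passes only when the view matrix differs from the one used
// the last time they were rendered.
function makeWaterPassUpdater(renderReflection, renderRefraction) {
  let lastView = null;
  return function update(viewMatrix) {
    const moved = lastView === null ||
        lastView.length !== viewMatrix.length ||
        viewMatrix.some((value, i) => value !== lastView[i]);
    if (moved) {
      renderReflection();
      renderRefraction();
      lastView = viewMatrix.slice();  // Copy so later mutation is detected.
    }
    return moved;
  };
}
```

Calling `update(view)` once per frame before drawing the main view keeps the two render targets fresh whenever the camera moves and reuses them otherwise. As the post notes, this only works because nothing else visible in those passes is animating.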

If you decide to play with the source for the beach demo be aware the first thing you'll want to do is edit the beachdemo.js file, search for V8,
// Comment out the line below to run the sample in the browser JavaScript
// engine. This may be helpful for debugging.
and comment out the line that immediately follows those two comment lines. This will enable you to use your browser's debugger to inspect and manipulate variables. Now run the demo. You can now enter your browser's debugger and, from the debugger's console, start looking at and setting the various elements. Set "g_timeMult" to 0.1 to see it run at 1/10th speed or inspect "g_re" to see some render stats. You can also copy and paste this code:
numParticles: 40,
lifeTime: 2,
timeRange: 2,
startSize: 50,
endSize: 90,
positionRange: [10, 10, 10],
velocity: [0, 0, 60], velocityRange: [15, 15, 15],
acceleration: [0, 0, -20],
spinSpeedRange: 4}
You may want to try changing some of the numbers to see the torches change in real time. Change the 60 in velocity to 600 and the start size to 500 to see some giant torches.

As you can see, working on a graphics application in such an environment definitely has some advantages. Take the beach demo and hack it up! Make better water effects or maybe even turn it into a game. If you have any more questions, be sure to check our session at Google I/O or our discussion groups.

Posted by Gregg Tavares, software engineer, O3D Team.

3 comments: said...

Hi Gregg, could you elaborate about the models vs. instances optimization? When you say 70 instances of palm tree became one model of 70 palm trees, are they not instances within the model? I.e. is it not more efficient all round to have instances than duplicate geometry, and in which case what difference does the model "wrapper" make?

Patapom said...

I suppose he meant that they collapsed all the 70 instances into a single mesh...

Gregg Tavares said...

Wow, sorry I didn't notice this comment.

There are 2 kinds of efficiency. Efficiency of Speed and Efficiency of Memory.

Having 1 tree model displayed 70 times is an efficiency of memory. Only 1 tree and 70 positions need to be downloaded, but it is slow because the GPU has to be told 70 times: draw a tree, draw a tree, draw a tree. Each "draw" call is slow.

Having 1 model of 70 trees is an efficiency of speed. There is 1 model of 70 trees. This takes more memory, but it means only asking the GPU to draw 1 thing, "draw model of trees," which is much faster.