Designing a 3D API for the browser

Friday, April 24, 2009 | 6:10 PM

When we started out with O3D we knew what we wanted to achieve: a browser API that gives developers direct access to the powerful 3D graphics hardware inside most modern PCs. We wanted it to be fast: fast enough to handle the complex rendering needs of a high-end video game, but without making any assumptions about what it would be used for. In essence, we wanted to provide the flexibility and speed of a low-level graphics API like OpenGL or Direct3D, while addressing the constraints of running inside the browser.

JavaScript was the obvious choice of language since we really wanted the API to feel at home in the browser. However, JavaScript, the programming language of the web, isn't exactly synonymous with high performance. While modern browsers have improved the speed of JS by leaps and bounds, we are still faced with a large performance gap between JS and native code. This led us to our first realization: if we were to provide an immediate mode API like OpenGL or Direct3D, our ability to render larger scenes would be limited by the speed of JavaScript. Just consider that when using an immediate mode API, the following calls need to be made for every object to be drawn:

  1. Bind a shader.
  2. Provide values for all the shader parameters. Even simple shaders require a handful of parameters like the view projection matrix, light positions and texture sampler settings.
  3. Assign vertex streams such as positions, normals, tangents and texture coordinates.
  4. Set the necessary render states such as blending modes and culling.
  5. Issue the draw call.

Note that all these calls need to be issued for every object, in every single frame. This means that for a large scene we would have about 16 ms (1/60th of a second) not only to issue the thousands of calls from JavaScript into the browser to render the objects but also to handle the rest of the application's logic. This simply didn't seem to scale well, not by today's performance standards anyway.
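
To make the per-frame cost concrete, here is a minimal sketch that simply counts the calls the five steps above would generate. Everything in it is hypothetical: MockImmediateDevice, its method names and the sample object are illustrative stand-ins, not OpenGL, Direct3D or O3D calls, and the parameter counts are assumed for the sake of the arithmetic.

```javascript
// Hypothetical stand-in for an immediate mode device. It does no rendering;
// it only counts how many calls cross from script into the "API".
class MockImmediateDevice {
  constructor() { this.callCount = 0; }
  bindShader(shader) { this.callCount++; }
  setParam(name, value) { this.callCount++; }
  setVertexStream(semantic, buffer) { this.callCount++; }
  setRenderState(name, value) { this.callCount++; }
  draw() { this.callCount++; }
}

// The five steps from the list above, issued for a single object.
function drawObject(device, obj) {
  device.bindShader(obj.shader);                                  // step 1
  for (const [name, value] of Object.entries(obj.params)) {
    device.setParam(name, value);                                 // step 2
  }
  for (const [semantic, buffer] of Object.entries(obj.streams)) {
    device.setVertexStream(semantic, buffer);                     // step 3
  }
  for (const [name, value] of Object.entries(obj.states)) {
    device.setRenderState(name, value);                           // step 4
  }
  device.draw();                                                  // step 5
}

// An assumed "simple" object: 3 shader parameters, 4 vertex streams, 2 render states.
function makeObject() {
  return {
    shader: 'simpleLit',
    params: { viewProjection: null, lightPosition: null, diffuseSampler: null },
    streams: { position: null, normal: null, tangent: null, texCoord: null },
    states: { blending: 'off', culling: 'back' },
  };
}

// Count the calls needed to draw a scene of `objectCount` objects once.
function callsPerFrame(objectCount) {
  const device = new MockImmediateDevice();
  const scene = Array.from({ length: objectCount }, makeObject);
  for (const obj of scene) drawObject(device, obj);
  return device.callCount;
}

console.log(callsPerFrame(1000)); // 11000
```

Under these assumed numbers each object costs 11 calls, so a scene of 1,000 objects needs 11,000 calls per frame. At 60 frames per second that leaves roughly 1.5 microseconds per call before any application logic has run.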

Given our concerns about JavaScript performance, we decided to make O3D a retained mode API. As a retained mode API, O3D requires the app to define the objects to render and the exact sequence in which to render them. Once this is set, O3D handles issuing the draw calls, once every frame. Practically speaking, if nothing moves in the scene, no JavaScript code needs to be executed. If an object moves, all the app needs to do is update its transformation matrix. To achieve that, O3D internally uses two types of graphs: transform graphs and render graphs. A transform graph stores tree hierarchies of transforms and shapes that are used by the render calls, and preserves the logical grouping of objects typically found in most asset pipelines. A render graph describes exactly what order the render operations take place in and handles things like render state setting, z-sorting of shapes for transparency, culling and setting of render targets. This combination of the two graph types not only minimizes the number of calls that need to be made from JavaScript into the API per frame but also moves some of the more computationally intensive operations, such as updating the transform matrix hierarchies, sorting and culling, from JavaScript into native code.
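
The retained mode idea can be sketched in a few lines. The code below is an illustration only: TransformNode, buildDrawList and the matrix helpers are hypothetical names rather than the O3D API, and the traversal is written in JavaScript purely for readability, standing in for work that a retained mode runtime would do in native code.

```javascript
// Hypothetical names throughout; this is not the O3D API.

// 4x4 matrices stored row-major in flat arrays, with the translation in the last column.
const identity = () => [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1];
const translation = (x, y, z) => [1, 0, 0, x,  0, 1, 0, y,  0, 0, 1, z,  0, 0, 0, 1];

// Standard 4x4 matrix product a * b.
function multiply(a, b) {
  const out = new Array(16).fill(0);
  for (let r = 0; r < 4; r++) {
    for (let c = 0; c < 4; c++) {
      for (let k = 0; k < 4; k++) {
        out[r * 4 + c] += a[r * 4 + k] * b[k * 4 + c];
      }
    }
  }
  return out;
}

// A node in the transform tree: a local matrix, child nodes and attached shapes.
class TransformNode {
  constructor(name) {
    this.name = name;
    this.localMatrix = identity();
    this.children = [];
    this.shapes = [];
  }
  addChild(child) {
    this.children.push(child);
    return child;
  }
}

// Stand-in for the per-frame work of the runtime: walk the tree, concatenate
// parent and local matrices, and emit one draw entry per shape.
function buildDrawList(node, parentWorld = identity(), out = []) {
  const world = multiply(parentWorld, node.localMatrix);
  for (const shape of node.shapes) out.push({ shape, world });
  for (const child of node.children) buildDrawList(child, world, out);
  return out;
}

// One-time setup by the application: a car body with a wheel attached to it.
const root = new TransformNode('root');
const car = root.addChild(new TransformNode('car'));
car.shapes.push('body');
const wheel = car.addChild(new TransformNode('wheel'));
wheel.shapes.push('tire');
wheel.localMatrix = translation(1, 0, 0);

// Per frame, the application touches a single matrix to move the whole car.
car.localMatrix = translation(5, 0, 0);

const drawList = buildDrawList(root);
console.log(drawList.map((entry) => entry.shape)); // [ 'body', 'tire' ]
```

The point of the design is visible in the last few lines: moving the car is one assignment from script, and the wheel follows because the traversal concatenates the parent's matrix with the child's.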

Even though we use a transform graph, O3D is not a high-level scenegraph API. We like to think of O3D as a low-level retained mode graphics API. We tried to keep as little high-level functionality in the API as possible while remaining pragmatic about the performance limitations of running inside the browser. We also tried to leave behind the traditional baggage of scenegraphs that gets a lot of people up in arms. We don't have cameras, lights, script nodes, predefined behaviors, etc. However, we had to make some exceptions: most notably, we added a very basic animation system to O3D, once again because we didn't think that JavaScript would currently be capable of providing the required level of performance for evaluating hundreds of animation curves per frame.

A few other things are worth noting about O3D:

  • We decided to make the O3D API completely shader based. We believe that the fixed function pipeline is on its way to extinction and we're not the only ones: so do OpenGL and Direct3D. Leaving the fixed function pipeline behind makes the API smaller and easier to implement. It also means that shaders need to be provided for every draw call, which raises the barrier to entry a bit, although this problem can be solved by providing higher level JavaScript abstractions (as we have in some of our JavaScript utility libraries).
  • We decided early on that for O3D to be useful we needed to take all the steps necessary to guarantee that the images we render look the same regardless of the GPU or the OS it runs on, so that developers can trust that what they see on their computer is exactly what all their users will see as well. To achieve that, we settled on a GPU spec that we feel offers a good balance between being inclusive and providing access to modern GPU features. GPUs that run O3D must be capable of at least DirectX 9.0 graphics and we only accept Shader Model 2.0 vertex and pixel shaders. We believe this covers over 60% of the GPU installed base today. For the remaining ones we are looking into providing a software rasterizer. If you feel that we haven't quite hit the right sweet spot, either that we're limiting GPU features too much or that we're not inclusive enough, please let us know here.
  • We believe that one of the big advantages of implementing an API that isn't directly based on an existing API such as Direct3D or OpenGL is that it gives the implementers the flexibility to choose the appropriate low level graphics API to build on top of, depending on the platform. In our case, we chose to use Direct3D on Windows (DirectX enjoys much better GPU driver support on that platform than OpenGL) and naturally OpenGL on the Mac and Linux.

In closing, I'd like to say that what you see here is an API that was designed by a group of engineers coming from all sorts of different backgrounds: some graphics people, some games people, some web folks. We're proud of what we've created but at the same time realize it's not set in stone in any way. It's really our contribution to the discussion on finding a standard for 3D graphics in the browser, a cause that we're really committed to. As such, we always welcome your feedback, comments and even criticism! Please let us know what you think of it, and how we can improve it. You'll find links to our discussion group and moderator app on our front page (http://code.google.com/apis/o3d).

Posted By Vangelis Kokkevis, Software Engineer, O3D Team.

8 comments:

Charlie said...

Very cool to see a proposed standard introduced as open, usable working code rather than a boring (possibly unworkable) specification document!

I think the simple fact that people can see it, play with it and use it right away throws considerable weight behind it!

Treed Box said...

Correct me if I'm wrong, but the samples I saw for O3D are more a demonstration of the environment than of features. Features are what Silverlight is delivering: it makes it possible to communicate with databases and to send and receive data, which would allow something like MSN or Gtalk to be built with all the features being introduced in Silverlight.

If something similar or better is possible in O3d please show us.

Games, and clicking an object to animate it in 3D, are already well-trodden ground. We want something that can really turn the 3D environment into more than just something to admire.
Please show us :D

Treed Box said...

And one more thing:
Can applications like these (Flash, Silverlight, and now O3D along the same lines) be made to cooperate with the browser?
They act as separate applications that are activated by a click. Once you click on something inside O3D, contact with the browser is lost,
and shortcuts such as Control + W / Control + F4 can no longer be used to close the tab.

I was pleasantly surprised that in Firefox 3.1b3 (it does not work in Chrome or Internet Explorer) zooming with the mouse wheel is not treated the same as a click: if I move the mouse over the O3D area without clicking and use the wheel, the O3D application's zoom works without interrupting the browser's commands, so I can still use the shortcut for closing the tab (for example).
Why don't these applications respect that once they are activated?
If they live in a browser they should respect it. If not, just provide a way to run these applications outside the browser, right?

This is just a small thing next to the high-tech side of O3D, but it is something that gets more and more annoying over time.

Can that be fixed?

Gregg Tavares said...

Treed:

Unfortunately the NPAPI standard and the various browsers' ways of handling OS level events appear to make it impossible to pass the keys back to the browser so that the browser can handle them. :-(

Believe me, we would love to find a way to solve that.

quasar said...

Retained mode doesn't free you from the need to set properties for dynamic objects every frame. Nor do display lists.
IMNSHO you are just wasting your time/money/youth on one more 3D API. Just exposing OpenGL in JavaScript would be a far better idea.

Cheers,
Creig

Julian Ewers-Peters said...

When do you release a package with just the basic functionalities of O3D so that developers can actually start off working with O3D?

Checking out the repository works fine, but there is too much overhead that is not needed which is why I would appreciate a downloadable bundle with only the basic stuff that is needed and that does not count about 1.5 gigabytes.

It is nice to see all the features of O3D, the API and the documentation, but how does it serve me without having to download the full package in the repository?

I only find this:

http://code.google.com/p/o3d/wiki/HowToBuild

With O3D being all new I am having a hard time finding a starting point.

joel. said...

@quasar: If you're going to have a not-so-humble opinion, please do everyone the favor of trying to understand the documents that explain the motivation and justification before asserting that O3D is a "waste of time" and that "exposing OpenGL in Javascript would be a far better idea".

If you consider the project's explicit long-term goal of making high-end games feasible in a web browser, and driven by a Javascript engine, simply copying the OpenGL API would make this damned near impossible. Existing Javascript engines have wildly-varying performance, and even the fastest among them are incapable of driving a fully-immediate-mode API quickly enough to be effective.

While it is true that "Retained mode doesn't free you from the need to set properties for dynamic objects every frame", it does help drastically reduce the number of properties that have to be directly manipulated per-frame from the script engine. Also, you might want to read the FunctionEval and related objects' documentation to see one strategy for getting the script engine out of the loop for certain calculations.

Marco Mugnatto said...

I hope Google integrates that natively into Chrome, so we don't have to rely on plug-ins in the future...