Yet another toon shader

Maybe it's true, as I wrote in the README file, that I coded this demo because I felt like the only one who hadn't yet implemented a toon shader. 🙂
Actually, that is not the only reason: the inspiration came to me while I was presenting the first part of my updated Modern GPUs slides at the university; this time the event was organized by some students and advertised with leaflets. 😉
So, for the second part, which will be held next Wednesday, I'm planning to integrate explanations of the internals of this demo.

From untextured to textured with outlines

It was easy and fast to have a basic toon shader working, thanks to the Lighthouse 3D tutorial.
This version uses a cascade of if-then-else instead of a more usual 1D texture lookup but, judging from the tests I have run, it’s not a performance issue, at least on GeForce 8 and newer cards.
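
For reference, the quantization boils down to something like this minimal fragment shader sketch; the varying and sampler names are mine, and the thresholds are illustrative, close to the Lighthouse 3D ones:

varying vec3 Normal;        // eye-space normal
varying vec3 LightDir;      // direction towards the light
uniform sampler2D ColorMap;

void main()
{
	float intensity = max(dot(normalize(Normal), normalize(LightDir)), 0.0);

	// The cascade of if-then-else replacing the usual 1D texture lookup.
	vec4 shade;
	if (intensity > 0.95)
		shade = vec4(1.0);
	else if (intensity > 0.5)
		shade = vec4(vec3(0.6), 1.0);
	else if (intensity > 0.25)
		shade = vec4(vec3(0.4), 1.0);
	else
		shade = vec4(vec3(0.2), 1.0);

	gl_FragColor = texture2D(ColorMap, gl_TexCoord[0].st) * shade;
}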

For edge detection I wanted to exploit the fragment shader's capabilities, working in screen space with the Sobel operator and thus staying independent of geometric complexity.
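
The filtering pass itself is straightforward; here is a minimal sketch of what I mean, assuming a Tex sampler holding the grey buffer to filter and a TexelSize uniform set to the reciprocal of the resolution:

uniform sampler2D Tex;    // grey buffer to filter (values in the red channel)
uniform vec2 TexelSize;   // 1.0 / resolution

float grey(vec2 offset)
{
	return texture2D(Tex, gl_TexCoord[0].st + offset * TexelSize).r;
}

void main()
{
	// 3x3 Sobel kernels, X and Y.
	float gx = -grey(vec2(-1.0, -1.0)) - 2.0 * grey(vec2(-1.0, 0.0)) - grey(vec2(-1.0, 1.0))
	           + grey(vec2( 1.0, -1.0)) + 2.0 * grey(vec2( 1.0, 0.0)) + grey(vec2( 1.0, 1.0));
	float gy = -grey(vec2(-1.0, -1.0)) - 2.0 * grey(vec2(0.0, -1.0)) - grey(vec2(1.0, -1.0))
	           + grey(vec2(-1.0,  1.0)) + 2.0 * grey(vec2(0.0,  1.0)) + grey(vec2(1.0,  1.0));
	float edge = length(vec2(gx, gy));
	gl_FragColor = vec4(vec3(1.0 - edge), 1.0); // dark outlines on white
}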

The only problem was *what* to filter.

  1. The first test was straightforward: I filtered the rendered image, a grey version of the textured and lit MrFixit head, but the results were poor: edge detection outlined the toon lighting shades too.
  2. In the second one I decided to filter the depth buffer; I could get rid of the colour-to-grey conversion but, again, the results were not satisfactory: there were no outlines inside the model, just a contour all around it.
    Maybe it could have been corrected by tuning the clip planes per model, but I gave up.
  3. With the third test I filtered the unilluminated colour texture and the results were better. Unfortunately it relied on the presence of a texture and outlined too many details.
  4. I think the fourth approach, as seen in this demo, is the best one.
    I used MRTs to save the eye-space normal buffer during the toon shader pass, then I filtered a grey version of it, outlining the contour plus some other geometric details (see the sketch after this list).
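
The MRT side of the toon pass is tiny; here is a sketch of the idea, with the understanding that the demo's actual grey conversion may differ:

varying vec3 Normal;   // interpolated eye-space normal

void main()
{
	vec4 toon = vec4(1.0);   // the quantized toon colour, computed as shown earlier

	// First colour attachment: the regular toon-shaded image.
	gl_FragData[0] = toon;

	// Second colour attachment: a grey encoding of the eye-space
	// normal, ready to be Sobel-filtered in the next pass.
	vec3 n = normalize(Normal) * 0.5 + 0.5;          // remap to [0, 1]
	float grey = dot(n, vec3(0.299, 0.587, 0.114));  // one plausible grey conversion
	gl_FragData[1] = vec4(vec3(grey), 1.0);
}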

A small note: saving an already grey-converted buffer in the toon shader pass speeds up the demo a bit, but storing the normal in a single 8-bit component of the texture causes a loss of precision that leads to some visible artefacts.
Using a floating point texture helps with the precision issue but makes the demo too slow.
Maybe I should try using a single component texture or some kind of RGBA packing algorithm…
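
Should I go down the packing road, the classic trick of spreading a [0,1) value across the four 8-bit channels would look like this (a sketch of a well-known technique, not something the demo currently does):

// Pack a [0,1) float into four 8-bit channels and recover it later;
// each channel stores successively finer fractional digits.
vec4 pack(float v)
{
	vec4 enc = fract(v * vec4(1.0, 255.0, 65025.0, 16581375.0));
	enc -= enc.yzww * vec4(1.0/255.0, 1.0/255.0, 1.0/255.0, 0.0);
	return enc;
}

float unpack(vec4 enc)
{
	return dot(enc, vec4(1.0, 1.0/255.0, 1.0/65025.0, 1.0/16581375.0));
}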

As usual you can have a look at YouTube or Vimeo videos and download the sources.

Converting weights to vertex colors

If you have read my LinkedIn profile lately you already know that, by now, some months have passed since I started my pre-graduation internship at Raylight (and since I signed my first NDA 😉).
The real-time graphics R&D work that I'm doing there for my thesis is enjoyable and stimulating, but this is not the topic of the post…

Raylight logo

Some days ago a 3D artist on the team, Alessandro, asked me for a script that would help him use Blender for one more task in the company's asset creation pipeline: weight painting.
He needed a simple script to convert vertex weights to per-bone vertex colour layers, in order to bake them to per-bone UV maps and later import them into 3D Studio Max.

Weight painting

Vertex painting

At first I didn't even know where to start, i.e. how to extract per-vertex weight data and match it with per-face vertex colour data, but my second try went as smooth as honey.
The core algorithm is, indeed, very simple:

# For every vertex of every face, look up its weights and write each
# one as a grey value into the colour layer named after its group.
for f in me.faces:
    for i, v in enumerate(f.verts):
        infs = me.getVertexInfluences(v.index)
        for vgroup, w in infs:
            # One vertex colour layer per bone/vertex group.
            me.activeColorLayer = vgroup
            col = f.col[i]
            col.r = col.g = col.b = int(255 * w)

Well, he has not taken advantage of it yet, nor do I know whether he ever will; nevertheless the script works and I have shared it on my site, as usual. 😉

Habemus OpenGL 3.0!

We had been waiting for OpenGL 3.0 for ages, we were all very excited about the wonderful features the ARB was promising, then, on the 11th of August 2008, the specs were released…
I, like everyone else, was really disappointed: the Architecture Review Board was not only really late on schedule, it also didn't keep its word about many key features that should have been introduced with this release.
Nevertheless I'm still confident about the future, when older API functions will actually be removed instead of simply being tagged as deprecated, as in the current version.

OpenGL3 Logo

Now, after the long but needed introduction, let's talk about things that matter: today the ArchLinux team moved the new stable 180.22 driver release from the [testing] to the [extra] repository.
Well, apart from the equally important CUDA 2.1 and VDPAU support, this release bundles OpenGL 3.0 and GLSL 1.30 support, so let's have a look at how to create an OpenGL 3.0 context.

It seems there is no way to create the new context without getting your hands dirty, that is, talking directly to GLX.
What follows is a snippet to create an OpenGL 3.0 context integrated in SDL 1.2, which still doesn't support it natively.

First of all you need some new defines.

#define GLX_CONTEXT_DEBUG_BIT_ARB              0x00000001
#define GLX_CONTEXT_FORWARD_COMPATIBLE_BIT_ARB 0x00000002
#define GLX_CONTEXT_MAJOR_VERSION_ARB          0x2091
#define GLX_CONTEXT_MINOR_VERSION_ARB          0x2092
#define GLX_CONTEXT_FLAGS_ARB                  0x2094

You also need to retrieve the address of the following new GLX function.

typedef GLXContext (*PFNGLXCREATECONTEXTATTRIBSARBPROC)
	(Display *dpy, GLXFBConfig config, GLXContext share_context, Bool direct, const int *attrib_list);
PFNGLXCREATECONTEXTATTRIBSARBPROC glXCreateContextAttribsARB =
	(PFNGLXCREATECONTEXTATTRIBSARBPROC)glXGetProcAddress((GLubyte*)"glXCreateContextAttribsARB");

Then you have to define a bunch of GLX related variables.

Display *dpy;
GLXDrawable draw, read;
GLXContext ctx, ctx3;
GLXFBConfig *cfg;
int nelements;
int attribs[] = {
	GLX_CONTEXT_MAJOR_VERSION_ARB, 3,
	GLX_CONTEXT_MINOR_VERSION_ARB, 0,
	GLX_CONTEXT_FLAGS_ARB, GLX_CONTEXT_FORWARD_COMPATIBLE_BIT_ARB,
	0 /* terminator */
};

At last, after having called SDL_SetVideoMode(), create a new context, make it current and destroy the old one.

ctx  = glXGetCurrentContext();   /* the old 2.x context created by SDL */
dpy  = glXGetCurrentDisplay();
draw = glXGetCurrentDrawable();
read = glXGetCurrentReadDrawable();
cfg  = glXGetFBConfigs(dpy, 0, &nelements);
ctx3 = glXCreateContextAttribsARB(dpy, *cfg, 0, 1, attribs);
glXMakeContextCurrent(dpy, draw, read, ctx3);
glXDestroyContext(dpy, ctx);     /* the old context is no longer needed */

Don't forget to add some query code, just to be sure the whole process worked. 😀

const GLubyte* string;

string = glGetString(GL_VENDOR);
printf("Vendor: %s\n", string);
string = glGetString(GL_RENDERER);
printf("Renderer: %s\n", string);
string = glGetString(GL_VERSION);
printf("OpenGL Version: %s\n", string);
string = glGetString(GL_SHADING_LANGUAGE_VERSION);
printf("GLSL Version: %s\n\n", string);

On my workstation I get this:
Vendor: NVIDIA Corporation
Renderer: GeForce 8600 GT/PCI/SSE2
OpenGL Version: 3.0 NVIDIA 180.22
GLSL Version: 1.30 NVIDIA via Cg compiler

If you also want to retrieve the extension list, a new function helps simplify the process.

typedef const GLubyte * (APIENTRYP PFNGLGETSTRINGIPROC) (GLenum name, GLuint index);
PFNGLGETSTRINGIPROC glGetStringi = (PFNGLGETSTRINGIPROC)glXGetProcAddress((GLubyte*)"glGetStringi");

You can then use it like this:

GLint numExtensions = 0;
glGetIntegerv(GL_NUM_EXTENSIONS, &numExtensions);

printf("Extension list:\n");
for (int i = 0; i < numExtensions; ++i) {
	printf("%s ", glGetStringi(GL_EXTENSIONS, i));
}
printf("\n");

This new OpenGL version seems to perform just like the old one at the moment: drivers do not honour the GLX_CONTEXT_FORWARD_COMPATIBLE_BIT_ARB attribute, which means everything is still in place, backward compatible and unoptimized. 🙁

Composing renders in a strip

First of all, happy new year to everyone; now let's talk about this post's topic… 🙂

These days I was relaxing and practicing subdivision modeling: after a long time away from Blender I was back to the dream of creating a convincing human head model, but my programming side won the day. 😀
While I was studying in detail the topology of some key parts of the face from here, I noticed the PiP-like composed images attached to the first post…

Showing camera keyframes

Last night I was thinking of a way to automate the process, and today it became reality in the form of a Blender Python script: it produces an image composed of multiple rendered frames; think of a daily comic strip and you'll understand the name ;).

The user can select which frames to render by specifying a string like “1-3, 5, 7, 9-11” (a parsing sketch follows below).
Moreover it is possible, of course, to choose the size of a single frame and the dimensions of the composed image table, i.e. how many rows and columns it should have.
Have a look at how well my topology study renders fit the script's purpose. 😉
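
Parsing such a frame list takes only a handful of lines; this is a minimal sketch of the idea, not necessarily the exact code in the script:

def parse_frames(spec):
    """Expand a string like "1-3, 5, 7, 9-11" into [1, 2, 3, 5, 7, 9, 10, 11]."""
    frames = []
    for token in spec.split(','):
        token = token.strip()
        if '-' in token:
            start, end = token.split('-')
            frames.extend(range(int(start), int(end) + 1))
        else:
            frames.append(int(token))
    return frames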

The resulting composed image

This second script is a bit more complex than my first one, making use of the Registry module to load and save options and of the Draw.PupBlock() method to display a bigger GUI.
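
A rough sketch of how those two parts of the Blender API fit together (the key and option names here are illustrative, not the script's actual ones):

from Blender import Draw, Registry

# Load previously saved options, if any (True also checks the
# on-disk cache).
opts = Registry.GetKey('composer', True) or {}
rows = Draw.Create(opts.get('rows', 2))
cols = Draw.Create(opts.get('cols', 3))

block = []
block.append(("Rows: ", rows, 1, 16, "Number of rows in the strip"))
block.append(("Cols: ", cols, 1, 16, "Number of columns in the strip"))

if Draw.PupBlock("Strip options", block):
    # Save the options back for the next run.
    Registry.SetKey('composer', {'rows': rows.val, 'cols': cols.val}, True)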

Of course it is released under the GNU GPL and available online; download it from here.

Blurring the parallax

Today I have published the first demo making use of my new C++ class library, which I designed to be easily ported to a strict GL3 profile or to ES 2.0.

From plain rendering to depth of field

As a matter of fact, it doesn't use the fixed pipeline or deprecated functions at all:

  • No immediate mode, only VBOs
  • No OpenGL matrix stacks: my classes handle transformations and pass matrices to shaders directly (see the sketch after this list)
  • No OpenGL lighting, only per-fragment lighting
  • No quads or polygons, just triangles
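
In practice this means every vertex shader receives its matrices as plain uniforms instead of reading the deprecated built-ins; a minimal sketch, where the uniform and attribute names are mine, not necessarily the library's:

// ES 2.0-style vertex shader: no built-in matrices or matrix stacks,
// the application computes and uploads them itself.
uniform mat4 ModelViewProjectionMatrix;
attribute vec3 Position;   // generic attribute fed from a VBO

void main()
{
	gl_Position = ModelViewProjectionMatrix * vec4(Position, 1.0);
}
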
Normal versus parallax mapping

I couldn't release something that only shows changes “under the hood”, I had to make something cool, so I decided to mix together parallax mapping (which, as you can see in the screenshot, is a lot more pronounced now) and depth of field, with the little addition of Stanford PLY mesh loading. 😀

The MrFixit model and maps (the character players portray in Sauerbraten) are courtesy of John Siar; thank you, John. 😉

As usual, you can have a look at the Vimeo videos (640×480, 1280×720) and download the sources.

Automatic parallax map generation with Blender

It has been a long time since I last wrote something here; during these months two things happened that are worth mentioning: first of all, I'm really close to graduation!
Well, actually I still need to pass the last exam and spend at least four months as an intern; nothing is certain yet, but I'm in close contact with a game development company… 😉

The second thing is closely related to this post: a couple of months ago, following M3xican's advice, I began converting my OpenGL demos to object-oriented C++.
What I have now is really not much; nevertheless my class library can load a Stanford PLY model, it is ES 2.0 compliant (which means it will be easily converted to “pure” OpenGL 3.x), and it can already display both parallax mapping and depth of field!

I'm not going to publish any screenshots for now because I think it's not time yet; what I'm showing you is an easy script, my first one, which I wrote last night using the Blender Python API.
What it does is really simple yet time-saving: you select a high-poly and a low-poly model, run the script from the Object->Scripts menu and watch Blender bake your normal and height maps and then save them.

Blender parallax maps

I have also set up a simple compositing node configuration to mix the two images into a single parallax map, with the height data encoded in the alpha channel.

Blender parallax maps nodes
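
The same merge could also be scripted with the Blender Image API; a rough sketch with hypothetical file names (the script itself relies on the node setup above):

import Blender
from Blender import Image

# Merge a baked normal map (RGB) and a baked height map into a
# single RGBA parallax map; the file names here are hypothetical.
normal = Image.Load('/tmp/normal.png')
height = Image.Load('/tmp/height.png')

w, h = normal.getSize()
out = Image.New('parallax', w, h, 32)   # 32 bits per pixel = RGBA

for y in range(h):
    for x in range(w):
        r, g, b, a = normal.getPixelF(x, y)
        # The height goes into the alpha channel.
        out.setPixelF(x, y, [r, g, b, height.getPixelF(x, y)[0]])

out.filename = '/tmp/parallax.png'
out.save()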

You can download the script from here.
Everything is very simple (and fun!) with the astonishing power of Blender! 🙂

A long presentation…

The professor of the computer graphics course at my university was continuously annoyed by my protests and comments during her lectures…

I'm sorry, but I just couldn't stand claims like: “Phong shading is never used in interactive applications because it is computationally too heavy”… 🙂
Fortunately she gave me the opportunity to give everyone a small technological update. 😀
After about a month, my presentation was born.
Made entirely with LaTeX Beamer, VIM, Dia and GIMP, it deals with what modern GPUs are capable of, showing some GPGPU applications along with traditional ones (videogames 😀), and some shader examples together with commented code.

I presented it yesterday over a couple of hours; I was all shook up at first, but then I proceeded smoothly. 😉
It is in Italian, of course, but I published it anyway: Le Moderne GPU.

Let there be light!

I started exploring deferred shading to display multiple light sources and ended up writing a demo featuring eight different lighting techniques and a PyOpenGL class library. 🙂

glsl_multilight

The whole story is more than a month old: just after releasing the first depth of field demo I began studying deferred shading, but I extended my scope to include other lighting methods, like single and multi-pass fixed-pipeline lighting, per-vertex and per-pixel single and multi-pass shader lighting and, of course, deferred shading.

While writing the C code, I thought it would be fun to also port it to Python; this way I could also have a look at the “new” (ArchLinux adopted it quite late 🙂) ctypes-based PyOpenGL, aka PyOpenGL 3.

Unfortunately, many little but annoying issues delayed me until today:

  • not setting glDepthFunc(GL_LEQUAL) explicitly (or, alternatively, not clearing the depth buffer at each pass) for multi-pass scene rendering, which caused every pass after the first to be discarded (see the snippet after this list).
  • trying to make a buggy Python glDrawBuffers() wrapper work.
    Actually I had no luck with this and gave up on MRT support in PyOpenGL.
  • trying to figure out why VBOs didn't work in PyOpenGL; I gave up on this too. 🙂
  • using a uniform variable to index the gl_LightSource structure array, which prevented the shader from running on Shader Model 3.0 cards.
  • exploring all the possibilities that could lead to the “brick room is very dark in fixed-pipeline mode” issue, only to discover today that it was a mere scaled-normals problem.
    It was easily solved by enabling GL_RESCALE_NORMAL.
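
For the record, the two state fixes are one-liners; in PyOpenGL they look like this (a sketch of the state setup, not the demo's literal code):

from OpenGL.GL import *

# Let fragments at equal depth pass the test, so each additive
# lighting pass can redraw the same geometry over the previous one.
glDepthFunc(GL_LEQUAL)

# Renormalize normals after uniform scaling, otherwise fixed-pipeline
# lighting comes out too dark.
glEnable(GL_RESCALE_NORMAL)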

At last I made it: a multi-light demo that includes deferred lighting (although very rough and not optimized at all) and shows coherent lighting in all rendering modes.
The PyOpenGL class library almost works (no MRTs or VBOs), but it is functional enough to sport complete conversions of the DoF2 and multilight demos (without deferred mode, which relies on MRTs, of course).

It's not news anymore that you can view it in action on my YouTube Channel, or in a high definition 720p version hosted on my Vimeo page.

All's well that ends well. 🙂

Depth of field reloaded

Lately I've been really disappointed by the poor performance of my first depth of field implementation, so I decided to do something about it…

glsl_dof2

The most natural step was to have a look at the second Direct3D example from the same paper I used for the first one, as I was sure it would lead to more satisfactory results.
I spent the last two nights converting, correcting and fine-tuning it, and I was right: even though it is a five-pass algorithm using four different Frame Buffer Objects, it is about 2.5 times faster than my previous implementation!

I think the speed boost comes from the following two factors:

  1. image blurring is achieved by a Gaussian filter computed separately along the X and Y axes; the 2D Gaussian kernel is separable, so the two 1D passes give the same result while the per-pixel cost drops from quadratic to linear in the kernel width (see the sketch after this list).
  2. this filter operates only on a downsampled FBO (1/4th of the screen resolution, actually)
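
A sketch of one of the two blur passes, the horizontal one; the tap count and weights here are illustrative binomial ones, and TexelSize is assumed to hold the reciprocal of the downsampled resolution:

uniform sampler2D Tex;    // the downsampled scene
uniform vec2 TexelSize;   // 1.0 / downsampled resolution

void main()
{
	// 5-tap Gaussian along X; the Y pass is identical with the
	// offsets applied to the t coordinate instead of s.
	float weight[3];
	weight[0] = 0.375;    // 6/16
	weight[1] = 0.25;     // 4/16
	weight[2] = 0.0625;   // 1/16

	vec2 st = gl_TexCoord[0].st;
	vec4 color = texture2D(Tex, st) * weight[0];
	for (int i = 1; i < 3; ++i) {
		vec2 off = vec2(TexelSize.x * float(i), 0.0);
		color += texture2D(Tex, st + off) * weight[i];
		color += texture2D(Tex, st - off) * weight[i];
	}
	gl_FragColor = color;
}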

Another nice thing about this new implementation is that there are only two focal parameters, focus depth and focus range, which really help to set up the scene correctly.

Now let’s review the five passes in detail:

  1. Render the scene normally while calculating a blur amount per vertex, then store the interpolated value per pixel inside the alpha component of the fragment.
    The calculation in the vertex shader is just:

    Blur = clamp(abs(-PosWV.z - focalDistance) / focalRange, 0.0, 1.0);
    
  2. Downsample the scene rendered at the previous pass storing it in a smaller FBO
  3. Apply the gaussian filter along the X axis on the downsampled scene and store it in a new FBO
  4. Apply the gaussian filter along the Y axis on the already X blurred scene and store it in a new FBO
  5. Calculate a linear interpolation between the first full resolution FBO and the XY downsampled blurred one
    This is performed in the fragment shader as:

    gl_FragColor = Fullres + Fullres.a * (Blurred - Fullres); // i.e. mix(Fullres, Blurred, Fullres.a)
    

Again, you can view it in action on my YouTube Channel, or in a high definition 720p version hosted on my Vimeo page. 😉

I love depth of field

I consider depth of field one of the most beautiful post-processing effects in "next-gen" games.
It was natural for me to choose it as the first shader demo to implement after months of inactivity; as a matter of fact, GLSL_impgro was really just a testbed for basic post-processing techniques, like Frame Buffer Objects.

GLSL_DoF

I studied the theory from an ATI paper included in the ShaderX2 book, titled Real-Time Depth of Field Simulation; I chose the first of the two implementations and converted it from Direct3D and HLSL to OpenGL and GLSL.

Of course, being a post-processing effect, the rendering is actually divided into two passes:

  1. Rendering the scene storing the depth of every vertex and calculating the amount of blur per fragment
  2. Applying the blur per fragment based on the value from the previous step

The second pass fragment shader, the one which actually applies the blur, is slow even on my 8600GT, because it performs several calculations for each of the twelve samples that contribute to the blur of the center fragment.

Another interesting aspect is that, in order to approximate the circular blur needed to simulate circles of confusion, these twelve pixels are sampled around the center according to a Poisson disc distribution, which creates far fewer artifacts than a small convolution kernel stretched to sample far away from the center (a sketch follows below).
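
As an illustration, the sampling loop looks roughly like this, leaving out the paper's depth-compare refinements; the offsets are a generic twelve-tap Poisson disc set, and MaxCoC, a hypothetical uniform, would be the maximum circle-of-confusion radius in pixels:

#version 120
uniform sampler2D Scene;   // first pass output: colour plus blur amount in alpha
uniform vec2 TexelSize;    // 1.0 / screen resolution
uniform float MaxCoC;      // max circle-of-confusion radius, in pixels

// Twelve offsets roughly Poisson-distributed on the unit disc.
const vec2 poisson[12] = vec2[12](
	vec2(-0.326, -0.406), vec2(-0.840, -0.074), vec2(-0.696,  0.457),
	vec2(-0.203,  0.621), vec2( 0.962, -0.195), vec2( 0.473, -0.480),
	vec2( 0.519,  0.767), vec2( 0.185, -0.893), vec2( 0.507,  0.064),
	vec2( 0.896,  0.412), vec2(-0.322, -0.933), vec2(-0.792, -0.598));

void main()
{
	vec2 st = gl_TexCoord[0].st;
	vec4 center = texture2D(Scene, st);
	vec4 sum = center;

	// Scale the sampling disc by this fragment's blur amount.
	float radius = center.a * MaxCoC;
	for (int i = 0; i < 12; ++i)
		sum += texture2D(Scene, st + poisson[i] * radius * TexelSize);

	gl_FragColor = sum / 13.0;
}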

Just like the previous demo, you can view it in action on my YouTube Channel, but I really suggest you have a look at the high definition 720p version instead, hosted together with the others on my Vimeo page. 😉