Tag Archives: snippets

High Dynamic Range galore

In January, during my internship activity, I was researching in the field of HDR imaging, today I had the time, at last, to polish a bit and release the two demos I made at the time.

They both load an RGBE image (the two you see here are courtesy of the Paul Devebec’s Light Probe Image Gallery) through the library of Bruce Walter.

Light probe at different exposure levels (hdr_load1)

Light probe at different exposure levels (hdr_load1)

The first demo implements the technique described in the article High Dynamic Range Rendering published on GameDev.net and is based on five passes and four FBOs:

  1. Rendering of the floating-point texture in an FBO
  2. Down-sampling in a 1/4 FBO and high-pass filter
  3. Gaussian filter along the X axis
  4. Gaussian filter along the Y axis
  5. Tone-mapping and composition

The algorithm is very simple, it first renders the original scene then it extracts bright parts at the second pass, which merely discards fragments which are below a specified threshold:

// excrpt from hipass.frag
if (colorMap.r > 1.0 || colorMap.g > 1.0 || colorMap.b > 1.0)
	gl_FragColor = colorMap;
	gl_FragColor = vec4(0.0);

While the third and fourth passes blurs the bright mask, the last one mixes it with the first FBO and sets exposure and gamma to achieve a bloom effect.

// excerpt from tonemap.frag
gl_FragColor = colorMap + Factor * (bloomMap - colorMap);
gl_FragColor *= Exposure;
gl_FragColor = pow(gl_FragColor, vec4(Gamma));
Light probe at different exposure levels (hdr_load2)

Light probe at different exposure levels (hdr_load2)

The second demo implements the technique described in the article High Dynamic Range Rendering in XNA published on Ziggyware and is based on seven passed and more than five FBOs:

  1. Rendering of the floating-point texture in an FBO
  2. Calculating maximum and mean luminance for the entire scene
  3. Bright-pass filter
  4. Gaussian filter along the X axis
  5. Gaussian filter along the Y axis
  6. Tone-mapping
  7. Bloom layer addition

This approach is far more complex than the previous one and is based on converting the scene to its luminance (defined as Y = 0.299*R + 0.587*G + 0.114*B) version, the mean and maximum value can be calculated using a particular downsampling shader and working in more passes, at each one rendering on an FBO with a smaller resolution than the previous until the last pass, when you render the luminance of the entire scene on a 1×1 FBO.

As usual you can have a look at YouTube (GLSL_hdrload1, GLSL_hdrload2) or Vimeo (GLSL_hdrload1, GLSL_hdrload2) videos and download the sources.

A flexible PLY loader for Evolution War r71

I’ve cited Evolution War for the first time on this blog in my previous post, today I want to celebrate my return to SVN committing after a very long time. ๐Ÿ˜€

PLY Export

Revision 71 adds the support for a real Stanford PLY parser and loader, while the one I coded for my graphic class library is very primitive, expecting a hard-coded order for data, this one shouldn’t have any kind of problem with every well-formed PLY file.

For example, while the hard-coded loader can only accept a file like this:

format ascii 1.0
comment Created by Blender3D 249 - www.blender.org
element vertex 4
property float x
property float y
property float z
property float nx
property float ny
property float nz
element face 3
property list uchar uint vertex_indices
1.000000 2.000000 3.000000 -4.000000 -5.000000 6.000000 
-1.000000 -2.000000 -3.000000 4.000000 5.000000 6.000000 
1.000000 2.000000 3.000000 -4.000000 -5.000000 6.000000 
-1.000000 -2.000000 -3.000000 4.000000 5.000000 6.000000  
3 0 1 2 
3 1 3 2 
3 4 2 1 

the parser loader can load even something like this:

format ascii 1.0
comment Created and shuffled by hand
element face 3
property list uchar uint vertex_indices
element skipme 3
property float skipfirst
property float skipsecond
element vertex 4
property float z
property float nz
property float y
property float ny
property float nx
property float x
3 0 1 2 
3 1 3 2 
3 4 2 1 
0.0 0.0 
0.0 0.0
0.0 0.0 
3.000000 -6.000000 2.000000 -5.000000 -4.000000 1.000000 
-3.000000 6.000000 -2.000000 5.000000 4.000000 -1.000000 
3.000000 -6.000000 2.000000 -5.000000 -4.000000 1.000000 
-3.000000 6.000000 -2.000000 5.000000 4.000000 -1.000000 

But one of its most important feature resides in the ability to correctly load binary PLY files! ๐Ÿ™‚

Related to it there’s a bug I would like to share with you together with the fix:

istream& istream::read (char* s, streamsize n);
ifstream ifs;
unsigned int *uIndices;
ifs.read((char *) uIndices+(j*3), sizeof(unsigned int) * 3);

The read() method only accepts char pointers, so uIndices is casted, but the precedence goes to casting and not to native unsigned int pointer arithmetics, leading to catastrophic effects! ๐Ÿ˜ฎ

The fix was as simple as the bug was subtle:

-ifs.read((char *) uIndices+(j*3), sizeof(unsigned int) * 3);
+ifs.read((char *) (uIndices+(j*3)), sizeof(unsigned int) * 3)

Converting weights to vertex colors

If you have read my LinkedIn profile lately you should already know that, by now, some months have passed since I started my pre-graduation internship activity at Raylight (and since I signed my first NDA ๐Ÿ˜‰ ).
The real-time graphic R&D work that I’m doing there for my thesis is enjoying and stimulating, but this is not the topic of the post…

Raylight logo

Some days ago a 3d artist of the team, Alessandro, asked me a script that would have helped him using Blender for one more task along the company asset creation pipeline, weight painting.
He needed a simple script to actually convert vertex weights to per-bone vertex colors layers, in order to bake them to per-bone UV maps and later import them inside 3d Studio Max.

Weight painting

Weight painting

Vertex painting

Vertex painting

At first I didn’t even know where to start, how to extract and match per-vertex weight data with per-face vertex color one, but my second try with it went as smoothly as honey.
The core algorithm is, indeed, very simple:

for f in faces:
  for i,v in enumerate(f):
      infs = me.getVertexInfluences(v.index)
      for vgroup, w in infs:
        me.activeColorLayer = vgroup
        col = f.col[i]
        col.r = col.g = col.b = int(255*w)

Well, he has not yet taken advantage of it nor I know if he will ever use it, nevertheless the script is working and I have shared it on my site, as usual. ๐Ÿ˜‰

Habemus OpenGL 3.0!

We have been waiting OpenGL 3.0 for ages, we were all very excited about the wonderful features ARB was promising, then, on the 11th of August 2008, the specs were released…
I, like everyone else, was really disappointed, the Architecture Review Board was not only really late on schedule but it didn’t keep its word about many key features that should have been introduced with this release.
Nevertheless I’m still confident in the future, when older API functions will be removed and not simply, as in the current version, tagged as deprecated.

OpenGL3 Logo

Now, after the long but needed introduction, let’s talk about things that matter: today the ArchLinux team moved the new stable 180.22 driver release from the [testing] to the [extra] repository.
Well, apart from the equally important CUDA 2.1 and VDPAU support, this release has been bundled with OpenGL 3 and GLSL 1.30 support, so have a look at how to create an OpenGL 3.0 context.

First of all, it seems like there’s no other way to open the new context without getting your hands dirty, that is talking directly with GLX.
What follows is a snippet to create an OpenGL 3.0 context integrated in SDL 1.2, which still doesn’t support it natively.

First of all you need some new defines.

#define GLX_CONTEXT_DEBUG_BIT_ARB                         0x00000001
#define GLX_CONTEXT_MAJOR_VERSION_ARB                  0x2091
#define GLX_CONTEXT_MINOR_VERSION_ARB                   0x2092
#define GLX_CONTEXT_FLAGS_ARB                                0x2094

You need also to retrieve the address of the following new GLX function.

	(Display *dpy, GLXFBConfig config, GLXContext share_context, Bool direct, const int *attrib_list);

Then you have to define a bunch of GLX related variables.

Display *dpy;
GLXDrawable draw, read;
GLXContext ctx, ctx3;
GLXFBConfig *cfg;
int nelements;
int attribs[]= {

At last, after having called SDL_SetVideoMode(), create a new context, make it current and destroy the old one.

ctx = glXGetCurrentContext();
dpy = glXGetCurrentDisplay();
draw = glXGetCurrentDrawable();
read = glXGetCurrentReadDrawable();
cfg = glXGetFBConfigs(dpy, 0, &nelements);
ctx3 = glXCreateContextAttribsARB(dpy, *cfg, 0, 1, attribs);
glXMakeContextCurrent(dpy, draw, read, ctx3);
glXDestroyContext(dpy, ctx);

Don’t forget to put some querying code, just to be sure the whole process worked. ๐Ÿ˜€

const GLubyte* string;

string = glGetString(GL_VENDOR);
printf("Vendor: %s\n", string);
string = glGetString(GL_RENDERER);
printf("Renderer: %s\n", string);
string = glGetString(GL_VERSION);
printf("OpenGL Version: %s\n", string);
printf("GLSL Version: %s\n\n", string);

On my workstation I get this:
Vendor: NVIDIA Corporation
Renderer: GeForce 8600 GT/PCI/SSE2
OpenGL Version: 3.0 NVIDIA 180.22
GLSL Version: 1.30 NVIDIA via Cg compiler

If you want to retrieve also the extension list, a new function can help you simplify the process.

typedef const GLubyte * (APIENTRYP PFNGLGETSTRINGIPROC) (GLenum name, GLuint index);
PFNGLGETSTRINGIPROC glGetStringi = (PFNGLGETSTRINGIPROC)glXGetProcAddress((GLubyte*)"glGetStringi");

You can then use it like this:

GLint numExtensions = 0;
glGetIntegerv(GL_NUM_EXTENSIONS, &numExtensions);

printf(“Extension list: \n”);
for (int i = 0; i < numExtensions; ++i) { printf("%s ", glGetStringi(GL_EXTENSIONS, i)); } printf("\n"); [/sourcecode] This new OpenGL version seems to perform just like the old one at the moment, drivers do not honour the GLX_CONTEXT_FORWARD_COMPATIBLE_BIT_ARB attribute, this means everything is still in place, backward compatible and unoptimized. ๐Ÿ™

Depth of field reloaded

Lately I’ve been really disappointed by the poor performances of my first depth of field implementation, thus I decided to do something about it…


The most natural step to do was to give a look to the second Direct3D example from the same paper I used for the first one, as I was sure it would have led to more satisfactory results.
I spent the two last nights converting, correcting and fine tuning it, but I was rewarded by the fact that I was right: even if it is a five passes algorithm which is using four different Frame Buffer Objects, it is about 2.5 times faster than my previous implementation!

I think the speed boost depends on the two following:

  1. image blurring is achieved by a gaussian filter which is calculated separating the X from the Y axis, it is an approximation of a standard 2D kernel but it also means that the convolution matrix calculation complexity decreases from a quadratic to a linear factor.
  2. this filter operates only on a downsampled (1/4th of the screen resolution actually) FBO

Another nice note about this new implementation is that there are only two focal parameters, focus depth and focus range, which really help to setup a correct scene.

Now let’s review the five passes in detail:

  1. Render the scene normally while calculating a blur amount per-vertex, then store the interpolated value per-pixel inside the alpha component of the fragment.
    The calculation at the vertex shader is just:

    Blur = clamp(abs(-PosWV.z - focalDistance) / focalRange, 0.0, 1.0);
  2. Downsample the scene rendered at the previous pass storing it in a smaller FBO
  3. Apply the gaussian filter along the X axis on the downsampled scene and store it in a new FBO
  4. Apply the gaussian filter along the Y axis on the already X blurred scene and store it in a new FBO
  5. Calculate a linear interpolation between the first full resolution FBO and the XY downsampled blurred one
    This is performed in the fragment shader as:

    gl_FragColor = Fullres + Fullres.a * (Blurred - Fullres);

Again, you can view it in action on my YouTube Channel, or in a high definition 720p version hosted on my Vimeo page. ๐Ÿ˜‰

Image post-processing with shaders

I’m back to work after many months, university exams take really a lot of time…
For I am a bit rusty on GLSL programming, but willing to learn new things anyway, I have decided to begin with a simple yet interesting topic, image processing.


The whole thing, actually, needs two rendering passes and relies heavily on Frame Buffer Objects because:

  1. You render the scene to an off-screen texture.
  2. You render a quad covering the entire screen and binded to the previously written texture.
  3. You make a shader process the fragments resulted from rendering this textured quad, i.e. post-processing the original scene.

In this program post-processing is demanded to convolution matrices calculated with these kernels:

GLfloat kernels[7][9] = {
    { 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f}, /* Identity */
    { 0.0f,-1.0f, 0.0f,-1.0f, 5.0f,-1.0f, 0.0f,-1.0f, 0.0f}, /* Sharpen */
    { 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f}, /* Blur */
    { 1.0f, 2.0f, 1.0f, 2.0f, 4.0f, 2.0f, 1.0f, 2.0f, 1.0f}, /* Gaussian blur */
    { 0.0f, 0.0f, 0.0f,-1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f}, /* Edge enhance */
    { 1.0f, 1.0f, 1.0f, 1.0f, 8.0f, 1.0f, 1.0f, 1.0f, 1.0f}, /* Edge detect */
    { 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f,-1.0f}  /* Emboss */

The final fragment color is calculated by a simple shader which, at the core, just performs the following:

for(i = -1; i <= 1; i++) for(j = -1; j <= 1; j++) { coord = gl_TexCoord[0].st + vec2(float(i) * (1.0/float(Width)) * float(Dist), float(j) * (1.0/float(Height)) * float(Dist)); sum += Kernel[i+1][j+1] * texture2D(Tex0, coord.xy); contrib += Kernel[i+1][j+1]; } gl_FragColor = sum/contrib; [/sourcecode] When the user chooses a filter, the application updates the kernel currently in use with a call to: [sourcecode language='cpp'] loc = glGetUniformLocation(sh.p2, "Dist"); glUniform1i(loc, dist); loc = glGetUniformLocation(sh.p2, "Kernel"); glUniformMatrix3fv(loc, 1, GL_FALSE, &kernels[curker]); [/sourcecode] Dist is a user defined parameter (you can change it using arrows) that defines the distance in pixels from the center to the contributing sample. Since a month I have created a YouTube Channel, now you can have an idea of how this demo works without downloading and compiling the source code: have a look at this link! ๐Ÿ˜‰

A slighty smarter setup.py for PySoy r64

While working on PySoy I was really disappointed by the policy adopted by the setup.py script, anytime I launched it the result was a recompilation of *all* the sources, this was really annoying and slow.

Python transparent logo

What came after was just that I decided to hack it a bit and make it behave more like a standard build tool, that is to control the modification time of a source file in order to choose whether it is updated or need a fresh compilation.

The new policy for the script is very simple, but useful enough to save plenty of time.
It is aware of the following cases (thank you Arc for tips on how to deal with the last one):

  • a .c file is missing, pyrex should compile the .pyx file
  • a .pyx file is newer than the corresponding .c file, an update is needed
  • a .pxd file is newer than any .pyx file, a global recompile is needed

The last point is not optimal, of course, but it’s a lot simpler than specifying all the .pxd dependecies for any .pyx file, and, anyway, quite close to being optimal, because of the thick web of cross dependencies which actually exists in PySoy.

Here is the piece of code which performs the magic:

# Convert Pyrex sources to C if using Trunk
if version == 'Trunk' :
  import os
  from stat import *
  from Pyrex.Compiler import Main

  options = Main.CompilationOptions(defaults=Main.default_options, include_path=include_path)

  newestpxd = 0
  for dir in include_path:
    for pxdsource in os.listdir(dir):
      pxdsource_path = (os.path.join(dir, pxdsource))

      if os.path.isfile(pxdsource_path) and pxdsource.rsplit('.', 1)[1] == 'pxd':
        if os.stat(pxdsource_path)[ST_MTIME] > newestpxd:
          newestpxd = os.stat(pxdsource_path)[ST_MTIME]

  for pyxsource in pyrex_sources:
    compilation_needed = False

    if os.path.isfile(pyxsource.rsplit('.', 1)[0] + '.c'):
      ctime = os.stat(pyxsource.rsplit('.', 1)[0] + '.c')[ST_MTIME]
      ctime = 0

    if newestpxd > ctime:
      compilation_needed = True
    elif os.stat(pyxsource)[ST_MTIME] > ctime:
      compilation_needed = True

    if compilation_needed:
      Main.compile(pyxsource, options)

Well, actually Arc commited r65 too, simplifying the conditionals of the script even more, but this is another story. ๐Ÿ™‚
Anyway, I hope to have made another little step into making the life of our team a bit simpler. ๐Ÿ˜‰

My Summer of Code begins with PySoy r44

Summer of Code has started just today (even if currently it is only a “Spring of Code” ๐Ÿ˜€ ) but a little contribution of mine has already made his way inside the SVN repository of PySoy.

But let’s start from the beginning…
After having shown to Arc an early draft of an UML class diagram for the current code I decided to come back to work on some test code I had written in the afternoon.
It was just a classical spinning cube demo to actually compare PyOpenGL versus Pyrex speed, I don’t report the results here because they are quite identical, if I haven’t done any mistake it should be the absolute minimum complexity of the code which actually determined this result.
Anyway, even without this proof I firmly believe in the power and speed of Pyrex. ๐Ÿ˜‰

Going back to my commit, my changes affected a small yet important area of the code, I think we will remember it in the future. ๐Ÿ™‚

In src/windows-glx/soy.windows.pyx:


In include/gl.pxd:

# Constants
  ctypedef enum:
    # glPush/PopAttrib bits

In src/windows-glx/soy.windows.pyx:


Was it better before or now? ๐Ÿ˜‰

Python 2.5 support in globs r46

Yes, I should have written about it when I actually committed the revision, but I forgot completely. ๐Ÿ˜€
About two months after Python 2.5 has been moved to the current repository of Arch I begun to think that maybe it was time to support the new release of our beloved language/interpreter.

Python transparent logo

I don’t know if it can be called “support”, but at least GL O.B.S. is now aware of it. ๐Ÿ™‚
The changes are very simple yet of some importance.
First of all, pysqlite is not needed anymore if you have the integrated sqlite3 module:

if sys.version_info[:2] >= (2, 5):
  from sqlite3 import dbapi2 as sqlite
  from pysqlite2 import dbapi2 as sqlite

Moreover I make use of the updated API of the webbrowser module:

if sys.version_info[:2] >= (2, 5):

Another addition, not related with the support of Python 2.5, is the check_ver function, which checks if a particular version of OpenGL is available on the machine running GL O.B.S., this have opened the possibility to add an OpenGL 2 only test like GLSL_Parallax into benchmarks r47.

Python and GL O.B.S. are getting better and better. ๐Ÿ˜‰

Parallax mapping for the masses

I have spent the last ten days studying hard, reading the first half of the Orange Book (it’s the last book in the list, of course ๐Ÿ˜€ ), a plethora of papers, many demos code, tons of tutorials and guides, but at last I achieved what I would have never imagined just two weeks ago. ๐Ÿ˜‰

Fixed Pipeline

Per-pixel Lighting

Normal Mapping Parallax Mapping

The GLSL_parallax demo shows per pixel Blinn-Phong shading, specular mapping and tangent space parallax mapping with offset limiting! ๐Ÿ˜€

Actually I’m not really sure about the correctness of my implementation (especially regarding tangent space lighting) but screenshots demonstrate that I’m close to it.
In the first one the usual and boring OpenGL fixed functionality per-vertex lighting (ambient, diffuse and specular components of a point light with attenuation), in the second one shaders are enabled, but only to calculate lighting on a per-pixel basis. At last, the third and the fourth image show normal and parallax mapping.

Talking in more detail, the code is written for OpenGL 2 only, it makes use of Vertex Buffer Objects and GLSL shaders using core functions.

Here is the magic:

if (withParallax == true) { // alpha channel encodes the height map
  height = scale * texture2D(Tex1, gl_TexCoord[1].st).a - bias;
  TexCoord = gl_TexCoord[0].st + height * ecPos.xy;
if (withNormal == true)
  nor = 2.0 * normalMap.rgb - 1.0; // decoding normal map

Some statistics:

  • 6 varying variables
  • 7 uniform variables (texture samples and enable/disable booleans)
  • 3 texture fetches every fragment processed
  • (24×3)x3 + 24×2 = 432 floats (1728 bytes) stored in VBOs

Enjoy the shaders! ๐Ÿ™‚