Posts Tagged ‘snippets’

Depth of field reloaded

Tuesday, April 15th, 2008

Lately I’ve been really disappointed by the poor performances of my first depth of field implementation, thus I decided to do something about it…

glsl_dof2

The most natural step to do was to give a look to the second Direct3D example from the same paper I used for the first one, as I was sure it would have led to more satisfactory results.
I spent the two last nights converting, correcting and fine tuning it, but I was rewarded by the fact that I was right: even if it is a five passes algorithm which is using four different Frame Buffer Objects, it is about 2.5 times faster than my previous implementation!

I think the speed boost depends on the two following:

  1. image blurring is achieved by a gaussian filter which is calculated separating the X from the Y axis, it is an approximation of a standard 2D kernel but it also means that the convolution matrix calculation complexity decreases from a quadratic to a linear factor.
  2. this filter operates only on a downsampled (1/4th of the screen resolution actually) FBO

Another nice note about this new implementation is that there are only two focal parameters, focus depth and focus range, which really help to setup a correct scene.

Now let’s review the five passes in detail:

  1. Render the scene normally while calculating a blur amount per-vertex, then store the interpolated value per-pixel inside the alpha component of the fragment.
    The calculation at the vertex shader is just:

    
    Blur = clamp(abs(-PosWV.z - focalDistance) / focalRange, 0.0, 1.0);
    
  2. Downsample the scene rendered at the previous pass storing it in a smaller FBO
  3. Apply the gaussian filter along the X axis on the downsampled scene and store it in a new FBO
  4. Apply the gaussian filter along the Y axis on the already X blurred scene and store it in a new FBO
  5. Calculate a linear interpolation between the first full resolution FBO and the XY downsampled blurred one
    This is performed in the fragment shader as:

    
    gl_FragColor = Fullres + Fullres.a * (Blurred - Fullres);
    

Again, you can view it in action on my YouTube Channel, or in a high definition 720p version hosted on my Vimeo page. ;)

Image post-processing with shaders

Thursday, March 13th, 2008

I’m back to work after many months, university exams take really a lot of time…
For I am a bit rusty on GLSL programming, but willing to learn new things anyway, I have decided to begin with a simple yet interesting topic, image processing.

GLSL_imgpro

The whole thing, actually, needs two rendering passes and relies heavily on Frame Buffer Objects because:

  1. You render the scene to an off-screen texture.
  2. You render a quad covering the entire screen and binded to the previously written texture.
  3. You make a shader process the fragments resulted from rendering this textured quad, i.e. post-processing the original scene.

In this program post-processing is demanded to convolution matrices calculated with these kernels:


GLfloat kernels[7][9] = {
    { 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f}, /* Identity */
    { 0.0f,-1.0f, 0.0f,-1.0f, 5.0f,-1.0f, 0.0f,-1.0f, 0.0f}, /* Sharpen */
    { 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f}, /* Blur */
    { 1.0f, 2.0f, 1.0f, 2.0f, 4.0f, 2.0f, 1.0f, 2.0f, 1.0f}, /* Gaussian blur */
    { 0.0f, 0.0f, 0.0f,-1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f}, /* Edge enhance */
    { 1.0f, 1.0f, 1.0f, 1.0f, 8.0f, 1.0f, 1.0f, 1.0f, 1.0f}, /* Edge detect */
    { 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f,-1.0f}  /* Emboss */
};

The final fragment color is calculated by a simple shader which, at the core, just performs the following:


for(i = -1; i <= 1; i++)
    for(j = -1; j <= 1; j++) {
        coord = gl_TexCoord[0].st + vec2(float(i) * (1.0/float(Width)) * float(Dist), float(j) * (1.0/float(Height)) * float(Dist));
        sum += Kernel[i+1][j+1] * texture2D(Tex0, coord.xy);
        contrib += Kernel[i+1][j+1];
    }

    gl_FragColor = sum/contrib;

When the user chooses a filter, the application updates the kernel currently in use with a call to:


loc = glGetUniformLocation(sh.p2, "Dist");
glUniform1i(loc, dist);
loc = glGetUniformLocation(sh.p2, "Kernel");
glUniformMatrix3fv(loc, 1, GL_FALSE, &kernels[curker]);

Dist is a user defined parameter (you can change it using arrows) that defines the distance in pixels from the center to the contributing sample.

Since a month I have created a YouTube Channel, now you can have an idea of how this demo works without downloading and compiling the source code: have a look at this link! ;)

A slighty smarter setup.py for PySoy r64

Friday, June 1st, 2007

While working on PySoy I was really disappointed by the policy adopted by the setup.py script, anytime I launched it the result was a recompilation of *all* the sources, this was really annoying and slow.

Python transparent logo

What came after was just that I decided to hack it a bit and make it behave more like a standard build tool, that is to control the modification time of a source file in order to choose whether it is updated or need a fresh compilation.

The new policy for the script is very simple, but useful enough to save plenty of time.
It is aware of the following cases (thank you Arc for tips on how to deal with the last one):

  • a .c file is missing, pyrex should compile the .pyx file
  • a .pyx file is newer than the corresponding .c file, an update is needed
  • a .pxd file is newer than any .pyx file, a global recompile is needed

The last point is not optimal, of course, but it’s a lot simpler than specifying all the .pxd dependecies for any .pyx file, and, anyway, quite close to being optimal, because of the thick web of cross dependencies which actually exists in PySoy.

Here is the piece of code which performs the magic:


# Convert Pyrex sources to C if using Trunk
if version == 'Trunk' :
  import os
  from stat import *
  from Pyrex.Compiler import Main

  options = Main.CompilationOptions(defaults=Main.default_options, include_path=include_path)

  newestpxd = 0
  for dir in include_path:
    for pxdsource in os.listdir(dir):
      pxdsource_path = (os.path.join(dir, pxdsource))

      if os.path.isfile(pxdsource_path) and pxdsource.rsplit('.', 1)[1] == 'pxd':
        if os.stat(pxdsource_path)[ST_MTIME] > newestpxd:
          newestpxd = os.stat(pxdsource_path)[ST_MTIME]

  for pyxsource in pyrex_sources:
    compilation_needed = False

    if os.path.isfile(pyxsource.rsplit('.', 1)[0] + '.c'):
      ctime = os.stat(pyxsource.rsplit('.', 1)[0] + '.c')[ST_MTIME]
    else:
      ctime = 0

    if newestpxd > ctime:
      compilation_needed = True
    elif os.stat(pyxsource)[ST_MTIME] > ctime:
      compilation_needed = True

    if compilation_needed:
      Main.compile(pyxsource, options)

Well, actually Arc commited r65 too, simplifying the conditionals of the script even more, but this is another story. :)
Anyway, I hope to have made another little step into making the life of our team a bit simpler. ;)

My Summer of Code begins with PySoy r44

Tuesday, May 29th, 2007

Summer of Code has started just today (even if currently it is only a “Spring of Code” :D ) but a little contribution of mine has already made his way inside the SVN repository of PySoy.

Code Obfuscation

But let’s start from the beginning…
After having shown to Arc an early draft of an UML class diagram for the current code I decided to come back to work on some test code I had written in the afternoon.
It was just a classical spinning cube demo to actually compare PyOpenGL versus Pyrex speed, I don’t report the results here because they are quite identical, if I haven’t done any mistake it should be the absolute minimum complexity of the code which actually determined this result.
Anyway, even without this proof I firmly believe in the power and speed of Pyrex. ;)

Going back to my commit, my changes affected a small yet important area of the code, I think we will remember it in the future. :)

BEFORE
In src/windows-glx/soy.windows.pyx:


gl.glClear(0x4100)

AFTER
In include/gl.pxd:


[...]
# Constants
  ctypedef enum:
    [...]
    # glPush/PopAttrib bits
    GL_DEPTH_BUFFER_BIT
    GL_COLOR_BUFFER_BIT
[...]

In src/windows-glx/soy.windows.pyx:


gl.glClear(gl.GL_COLOR_BUFFER_BIT | gl.GL_DEPTH_BUFFER_BIT)

Was it better before or now? ;)

Python 2.5 support in globs r46

Wednesday, May 23rd, 2007

Yes, I should have written about it when I actually committed the revision, but I forgot completely. :D
About two months after Python 2.5 has been moved to the current repository of Arch I begun to think that maybe it was time to support the new release of our beloved language/interpreter.

Python transparent logo

I don’t know if it can be called “support”, but at least GL O.B.S. is now aware of it. :)
The changes are very simple yet of some importance.
First of all, pysqlite is not needed anymore if you have the integrated sqlite3 module:


if sys.version_info[:2] >= (2, 5):
  from sqlite3 import dbapi2 as sqlite
else:
  from pysqlite2 import dbapi2 as sqlite

Moreover I make use of the updated API of the webbrowser module:


if sys.version_info[:2] >= (2, 5):
  webbrowser.open_new_tab(Globs.BROWSE_URL)
else:
  webbrowser.Netscape('firefox').open(Globs.BROWSE_URL)

Another addition, not related with the support of Python 2.5, is the check_ver function, which checks if a particular version of OpenGL is available on the machine running GL O.B.S., this have opened the possibility to add an OpenGL 2 only test like GLSL_Parallax into benchmarks r47.

Python and GL O.B.S. are getting better and better. ;)

Parallax mapping for the masses

Tuesday, May 1st, 2007

I have spent the last ten days studying hard, reading the first half of the Orange Book (it’s the last book in the list, of course :D ), a plethora of papers, many demos code, tons of tutorials and guides, but at last I achieved what I would have never imagined just two weeks ago. ;)

Fixed Pipeline Per-pixel Lighting Normal Mapping Parallax Mapping

The GLSL_parallax demo shows per pixel Blinn-Phong shading, specular mapping and tangent space parallax mapping with offset limiting! :D

Actually I’m not really sure about the correctness of my implementation (especially regarding tangent space lighting) but screenshots demonstrate that I’m close to it.
In the first one the usual and boring OpenGL fixed functionality per-vertex lighting (ambient, diffuse and specular components of a point light with attenuation), in the second one shaders are enabled, but only to calculate lighting on a per-pixel basis. At last, the third and the fourth image show normal and parallax mapping.

Talking in more detail, the code is written for OpenGL 2 only, it makes use of Vertex Buffer Objects and GLSL shaders using core functions.

Here is the magic:


[...]
if (withParallax == true) { // alpha channel encodes the height map
  height = scale * texture2D(Tex1, gl_TexCoord[1].st).a - bias;
  TexCoord = gl_TexCoord[0].st + height * ecPos.xy;
}
[...]
if (withNormal == true)
  nor = 2.0 * normalMap.rgb - 1.0; // decoding normal map
[...]

Some statistics:

  • 6 varying variables
  • 7 uniform variables (texture samples and enable/disable booleans)
  • 3 texture fetches every fragment processed
  • (24×3)x3 + 24×2 = 432 floats (1728 bytes) stored in VBOs

Enjoy the shaders! :)

Back to work, Mars r622

Wednesday, February 28th, 2007

Yesterday my exams session finally ended, I’m really satisfied about the results achieved, but during the studying period I was eager to get back to coding…

Today I fixed bug #0000008, a fastidious one which caused program termination if the user tried to take a screenshot after having deleted the hidden directory inside his home where settings and images are saved by default. :)

Mars 0.1.1 2nd

The first thing I thought was that I was missing a fopen() return code check, but, fortunately for my reputation, it wasn’t the case. ;)


// Opening output file
if((fp = fopen(filename, "wb")) == NULL)
{
  throw Exception("Screen", "fopen error");
  return -1;
}

The problem, as the shell output was suggesting, was related to the exception system: the fopen() exception was never caught.

Just changing this:


case SDLK_F4:
  screen->TakeScreenshot();
  break;

into this:


case SDLK_F4:
  try
  {
    screen->TakeScreenshot();
  }
  catch(Exception e)
  {
    e.PrintError();
  }
  break;

fixed everything.

Have a look at r622 log and at mars.cpp changes and remember to catch all the exception you may throw. ;)

Mars r594 and the vflip hack

Saturday, December 30th, 2006

For my first entry on this blog let me tell you a tale, it’s about OpenGL framebuffer and vertical flipping…

Mars 0.1.1 1st

Once upon a time a little fool called Encelo used to perform, in a little testing program, a vertical flip of the entire OpenGL framebuffer this way:


if(_flags & SDL_OPENGL)
{

  GLvoid * pixels;

  pixels = (GLvoid *) malloc(_width * _height * 4);
  glPushAttrib(GL_COLOR_BUFFER_BIT | GL_CURRENT_BIT | GL_PIXEL_MODE_BIT);
  glReadBuffer(GL_FRONT);
  glReadPixels(0, 0, _width, _height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
  glDrawBuffer(GL_BACK);
  glRasterPos2f(-1.0f, 1.0f);
  glPixelZoom(1.0f, -1.0f);
  glDrawPixels(_width, _height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
  glReadBuffer(GL_BACK);
  glReadPixels(0, 0, _width, _height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
  glPopAttrib();
  output_surf = SDL_CreateRGBSurfaceFrom(pixels, _width, _height, 32, _surface->pitch, rmask, gmask, bmask, amask);
}

It wasn’t really bad, it performed some interesting tricks with buffers, and, as a matter of fact, Encelo was really proud of this implementation. :)
But… it didn’t work on Mars. Yes, no matter how much Encelo tested, changed, and tested again, it simply didn’t work on anything else than the original testing program.
A decision had to be taken soon, to persevere or not to persevere? That was the question.

Encelo chose not to persevere and to try a completely different approach… memcpy() flipping! :D
Yeah, something as simple, elegant, fast and smart as this:


if(_flags & SDL_OPENGL)
{
  int row, stride;
  GLubyte * swapline;
  GLubyte * pixels;

  stride = _width * 4; // length of a line in bytes
  pixels = (GLubyte *) malloc(stride * _height);
  swapline = (GLubyte *) malloc(stride);

  glReadPixels(0, 0, _width, _height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

  // vertical flip
  for(row = 0; row < _height/2; row++)
  {
    memcpy(swapline, pixels + row * stride, stride);
    memcpy(pixels + row * stride, pixels + (_height - row - 1) * stride, stride);
    memcpy(pixels + (_height - row -1) * stride, swapline, stride);
  }

  output_surf = SDL_CreateRGBSurfaceFrom(pixels, _width, _height, 32, _surface->pitch, rmask, gmask, bmask, amask);
}

This story is true, and happened exactly a month ago, on the 30 of November 2006.
As a proof have a look at r594 log and at Screen.cpp changes. ;)