Tag Archives: OpenGL

Image post-processing with shaders

I’m back to work after many months, university exams take really a lot of time…
For I am a bit rusty on GLSL programming, but willing to learn new things anyway, I have decided to begin with a simple yet interesting topic, image processing.


The whole thing, actually, needs two rendering passes and relies heavily on Frame Buffer Objects because:

  1. You render the scene to an off-screen texture.
  2. You render a quad covering the entire screen and binded to the previously written texture.
  3. You make a shader process the fragments resulted from rendering this textured quad, i.e. post-processing the original scene.

In this program post-processing is demanded to convolution matrices calculated with these kernels:

GLfloat kernels[7][9] = {
    { 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f}, /* Identity */
    { 0.0f,-1.0f, 0.0f,-1.0f, 5.0f,-1.0f, 0.0f,-1.0f, 0.0f}, /* Sharpen */
    { 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f}, /* Blur */
    { 1.0f, 2.0f, 1.0f, 2.0f, 4.0f, 2.0f, 1.0f, 2.0f, 1.0f}, /* Gaussian blur */
    { 0.0f, 0.0f, 0.0f,-1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f}, /* Edge enhance */
    { 1.0f, 1.0f, 1.0f, 1.0f, 8.0f, 1.0f, 1.0f, 1.0f, 1.0f}, /* Edge detect */
    { 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f,-1.0f}  /* Emboss */

The final fragment color is calculated by a simple shader which, at the core, just performs the following:

for(i = -1; i <= 1; i++) for(j = -1; j <= 1; j++) { coord = gl_TexCoord[0].st + vec2(float(i) * (1.0/float(Width)) * float(Dist), float(j) * (1.0/float(Height)) * float(Dist)); sum += Kernel[i+1][j+1] * texture2D(Tex0, coord.xy); contrib += Kernel[i+1][j+1]; } gl_FragColor = sum/contrib; [/sourcecode] When the user chooses a filter, the application updates the kernel currently in use with a call to: [sourcecode language='cpp'] loc = glGetUniformLocation(sh.p2, "Dist"); glUniform1i(loc, dist); loc = glGetUniformLocation(sh.p2, "Kernel"); glUniformMatrix3fv(loc, 1, GL_FALSE, &kernels[curker]); [/sourcecode] Dist is a user defined parameter (you can change it using arrows) that defines the distance in pixels from the center to the contributing sample. Since a month I have created a YouTube Channel, now you can have an idea of how this demo works without downloading and compiling the source code: have a look at this link! ๐Ÿ˜‰

Parallax mapping for the masses

I have spent the last ten days studying hard, reading the first half of the Orange Book (it’s the last book in the list, of course ๐Ÿ˜€ ), a plethora of papers, many demos code, tons of tutorials and guides, but at last I achieved what I would have never imagined just two weeks ago. ๐Ÿ˜‰

Fixed Pipeline

Per-pixel Lighting

Normal Mapping Parallax Mapping

The GLSL_parallax demo shows per pixel Blinn-Phong shading, specular mapping and tangent space parallax mapping with offset limiting! ๐Ÿ˜€

Actually I’m not really sure about the correctness of my implementation (especially regarding tangent space lighting) but screenshots demonstrate that I’m close to it.
In the first one the usual and boring OpenGL fixed functionality per-vertex lighting (ambient, diffuse and specular components of a point light with attenuation), in the second one shaders are enabled, but only to calculate lighting on a per-pixel basis. At last, the third and the fourth image show normal and parallax mapping.

Talking in more detail, the code is written for OpenGL 2 only, it makes use of Vertex Buffer Objects and GLSL shaders using core functions.

Here is the magic:

if (withParallax == true) { // alpha channel encodes the height map
  height = scale * texture2D(Tex1, gl_TexCoord[1].st).a - bias;
  TexCoord = gl_TexCoord[0].st + height * ecPos.xy;
if (withNormal == true)
  nor = 2.0 * normalMap.rgb - 1.0; // decoding normal map

Some statistics:

  • 6 varying variables
  • 7 uniform variables (texture samples and enable/disable booleans)
  • 3 texture fetches every fragment processed
  • (24×3)x3 + 24×2 = 432 floats (1728 bytes) stored in VBOs

Enjoy the shaders! ๐Ÿ™‚

The quest is over!

The quest for the lost fragment is over, at last!
Today I have returned from Athens and installed in Electron the additional ram module and the long awaited shader capable MSI FX5900XT-VTD128 card!

The first thing I’ve done was to update the Nvidia driver packages from ‘nvidia-96xx’ to ‘nvidia’, this currently means going from 96.31 to 97.55.

This is what’s new from GeForce4 Ti 4200 (NV25) to GeForce FX5900 XT (NV35):

  • The OpenGL version string is now 2.1.0
  • The CineFX 2.0 engine allows for two new anti-aliasing modes: 4x Bilinear Multisampling by 4x Supersampling and 4x Bilinear Multisampling by 2x Supersampling
  • There are eighteen new extensions available: GL_ARB_fragment_program, GL_ARB_fragment_program_shadow, GL_ARB_fragment_shader, GL_ARB_half_float_pixel, GL_EXT_blend_func_separate, GL_EXT_framebuffer_blit, GL_EXT_framebuffer_multisample, GL_EXT_framebuffer_object, GL_EXT_stencil_two_side, GL_EXT_texture_sRGB, GL_NV_float_buffer, GL_NV_fragment_program, GL_NV_fragment_program_option, GL_NV_framebuffer_multisample_coverage, GL_NV_half_float, GL_NV_primitive_restart, GL_NV_vertex_program2, GL_NV_vertex_program2_option

What follows is a series of test, actually they are exactly the same, and with the same settings, as the ones shown in the Easter gifts post:

Test NoAA, NoAF 2xAA, 4xAF
glxgears 4852.8 2678.0
Blender 9978 7927
GL_shadow 1189.4 793.0
GL_pointz 544.8 552.0
GL_blit 2006.0 1391.4
GL_smoke 449.4 402.8

Some tests perform better on Electron than on Thunder (which has a much faster graphic card and DDR RAM), this is very strange, maybe I’ve got to run these tests on Thunder again. ๐Ÿ˜‰

Accepted for Summer of Code

Dear Applicant,
Congratulations! This email is being sent to inform you that your
application was accepted to take part in the Summer of Code.

Yeah, one of my two proposals has been accepted!

My first proposal was about working on GL O.B.S. under the Python Software Foundation, unfortunately it was very likely going to be discarded.
I learned this from a mentor who contacted me, he wrote that my application was based on a personal program and that it would have been hard to find someone to mentor me, moreover I wouldn’t have contributed to the Python community. He also added that I could have been a good candidate for his project, he is, indeed, Arc Riley, Project Manager of PySoy.
And so I did, I wrote another application and, this time, it has been accepted. ๐Ÿ™‚

My work will be to integrate multi-texturing in the PySoy rendering loop and API, document API additions, test the whole under many different free software drivers and then implement some related techniques, like bump or normal mapping.

I’m really glad of this opportunity, I will learn many interesting OpenGL and Python topics and I will improve my design, teamwork and communication skills.
Thank you Google! ๐Ÿ˜‰

The quest for the lost fragment

Yesterday I was donated a new graphic card from a generous guy at the university, including cables, manual and bundled software, it is a nice MSI G4Ti4200-DT64 with red PCB.

MSI G4Ti4200-DT64

It’s a good card but unfortunately it has only a primitive version of pixel shaders, they are neither floating point (supporting at most the proprietary “HILO” format) nor GLSL compliant, and they cannot be used via GL_ARB_fragment_program (as a matter of fact it is not present in the extensions array ๐Ÿ™ ), but only through the family of GL_NV_register_combiners and GL_NV_texture_shader functions, which make use of the OpenGL state machine.
Nvidia Cg actually supports the fp20 profile, but its output is just a nvparse program, which has to be passed to a function that will setup GL texture states.

Anyway, let’s analyze what’s new going from a GeForce4 MX440-8X (NV18) to a GeForce4 Ti 4200 (NV25):

  • The OpenGL version string hasn’t changed (mainly because of the lack of Shader Model 2.0), it is still 1.5.8.
  • Thanks to the Accuview AA Engine there are three new anti-aliasing modes: 4x Bilinear Multisampling, 4x Gaussian Multisampling and 2x Bilinear Multisampling by 4x Supersampling.
  • There are twenty new extensions available, most of them are related to multisample, depth textures, occlusion queries, shadows and texture shaders:
    GL_ARB_depth_texture, GL_ARB_multisample, GL_ARB_occlusion_query, GL_ARB_shadow, GL_ARB_texture_border_clamp, GL_EXT_shadow_funcs, GL_EXT_texture3D, GL_EXT_timer_query, GL_HP_occlusion_test, GL_NV_copy_depth_to_color, GL_NV_depth_clamp, GL_NV_multisample_filter_hint, GL_NV_occlusion_query, GL_NV_register_combiners2, GL_NV_texture_compression_vtc, GL_NV_texture_shader, GL_NV_texture_shader2, GL_NV_texture_shader3, GL_SGIX_depth_texture, GL_SGIX_shadow.

Of course I performed some benchmarks too (have a look at Electron specs), all at 1024×768, except from glxgears and globs tests, they were run at the default 640×480 resolution.
Quake 3 was tested on the four.dm_68 demo with sound, Blender 2.42 was tested with the draw benchmark, while the GLSLvp_pointz test uses only vertex shader to move and color points.
Note that this last test is emulated in software on NV18 while is performed in hardware on the NV25, but shaders plus full scene anti-aliasing seem to be impossible to achieve on the latter.

Test NV18 NV25
NoAA, NoAF 2xAA, 4xAF NoAA, NoAF 2xAA, 4xAf
glxgears 1568.5 833.5 2976.7 1571.7
Blender 2580 1580 7000 4484
Quake 3 93.0 51.7 113.6 92.3
gl_shadow 296 148.4 674.6 360
gl_pointz 421.8 272.6 526.2 398.4
gl_blit 600.6 285.8 1197.4 612.4
gl_smoke 299 177.2 404.2 302
GLSLvp_point 118.6 103.8 317.6 X

The card is nice and fast, but the search for the fragment (extension) has not ended. ๐Ÿ˜‰

Mars r594 and the vflip hack

For my first entry on this blog let me tell you a tale, it’s about OpenGL framebuffer and vertical flipping…

Mars 0.1.1 1st

Once upon a time a little fool called Encelo used to perform, in a little testing program, a vertical flip of the entire OpenGL framebuffer this way:

if(_flags & SDL_OPENGL)

  GLvoid * pixels;

  pixels = (GLvoid *) malloc(_width * _height * 4);
  glReadPixels(0, 0, _width, _height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
  glRasterPos2f(-1.0f, 1.0f);
  glPixelZoom(1.0f, -1.0f);
  glDrawPixels(_width, _height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
  glReadPixels(0, 0, _width, _height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
  output_surf = SDL_CreateRGBSurfaceFrom(pixels, _width, _height, 32, _surface->pitch, rmask, gmask, bmask, amask);

It wasn’t really bad, it performed some interesting tricks with buffers, and, as a matter of fact, Encelo was really proud of this implementation. ๐Ÿ™‚
But… it didn’t work on Mars. Yes, no matter how much Encelo tested, changed, and tested again, it simply didn’t work on anything else than the original testing program.
A decision had to be taken soon, to persevere or not to persevere? That was the question.

Encelo chose not to persevere and to try a completely different approach… memcpy() flipping! ๐Ÿ˜€
Yeah, something as simple, elegant, fast and smart as this:

if(_flags & SDL_OPENGL)
  int row, stride;
  GLubyte * swapline;
  GLubyte * pixels;

  stride = _width * 4; // length of a line in bytes
  pixels = (GLubyte *) malloc(stride * _height);
  swapline = (GLubyte *) malloc(stride);

  glReadPixels(0, 0, _width, _height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

  // vertical flip
  for(row = 0; row < _height/2; row++)
    memcpy(swapline, pixels + row * stride, stride);
    memcpy(pixels + row * stride, pixels + (_height - row - 1) * stride, stride);
    memcpy(pixels + (_height - row -1) * stride, swapline, stride);

  output_surf = SDL_CreateRGBSurfaceFrom(pixels, _width, _height, 32, _surface->pitch, rmask, gmask, bmask, amask);

This story is true, and happened exactly a month ago, on the 30 of November 2006.
As a proof have a look at r594 log and at Screen.cpp changes. ๐Ÿ˜‰