Posts Tagged ‘OpenGL’

Blurring the parallax

Monday, November 10th, 2008

Today I published the first demo that uses my new C++ class library, which I designed to be easily ported to a strict GL3 profile or to ES 2.0.

From plain rendering to depth of field

As a matter of fact, it doesn’t make use of the fixed pipeline or deprecated functions at all:

  • No immediate mode, only VBOs
  • No use of OpenGL matrix stacks: my own classes handle transformations and pass matrices to shaders directly
  • No OpenGL lighting, only per-fragment shader lighting
  • No quads or polygons, just triangles
Normal versus parallax mapping

I couldn’t release something only to show changes “under the hood”. I had to make something cool, so I decided to mix together parallax mapping (which, as you can see in the screenshot, is a lot more pronounced now) and depth of field, with the little addition of Stanford PLY mesh loading. :D

The Mr.Fixit model and maps (the character players portray in Sauerbraten) are courtesy of John Siar. Thank you, John. ;)

As usual, you can have a look at the Vimeo videos (640×480, 1280×720) and download the sources.

Let there be light!

Monday, April 28th, 2008

I started exploring deferred shading to display multiple light sources and ended up writing a demo featuring eight different lighting techniques and a PyOpenGL class library. :)

glsl_multilight

The whole story is more than a month old. Just after releasing the first depth of field demo I began studying deferred shading, but I extended my goal to include other lighting methods: single and multi-pass fixed-pipeline lighting, per-vertex and per-pixel single and multi-pass shader lighting and, of course, deferred lighting.

While writing the C code, I thought it would be fun to also port it to Python. This way I could also have a look at the “new” (ArchLinux adopted it quite late :) ) ctypes-based PyOpenGL, aka PyOpenGL 3.

Unfortunately, many little but annoying issues delayed me until today:

  • not explicitly setting glDepthFunc(GL_LEQUAL) (or, alternatively, not clearing the depth buffer at each pass) for multi-pass scene rendering caused every pass except the first one to be discarded.
  • trying to make a buggy Python glDrawBuffers() wrapper work.
    Actually I had no luck with this and gave up on MRT support in PyOpenGL.
  • trying to figure out why VBOs didn’t work in PyOpenGL; I gave up on this too. :)
  • using a uniform variable to index the gl_LightSource structure array, which prevented the shader from running on Shader Model 3.0 cards.
  • exploring all the possibilities that could ever lead to the “the brick room is very dark in fixed-pipeline mode” issue, only to discover today that this was a mere scaled normals problem.
    It was easily solved by enabling GL_RESCALE_NORMAL.
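
The first issue in the list is easier to see with a toy model. The following is a minimal sketch in Python, not the demo's code: it assumes the same fragment is drawn once per light with additive blending and that the depth buffer is cleared only before the first pass. The names render_passes, GL_LESS and GL_LEQUAL are stand-ins defined here, not PyOpenGL symbols.

```python
import operator

# Stand-ins for the OpenGL depth comparison functions (assumption: plain Python operators)
GL_LESS = operator.lt
GL_LEQUAL = operator.le


def render_passes(light_contributions, fragment_depth, depth_func):
    """Draw the same fragment once per light with additive blending,
    clearing the depth buffer only before the first pass."""
    depth_buffer = 1.0  # cleared to the far plane
    colour = 0.0
    for contribution in light_contributions:
        # The fragment survives only if it passes the depth test
        if depth_func(fragment_depth, depth_buffer):
            depth_buffer = fragment_depth
            colour += contribution
    return colour


lights = [0.2, 0.3, 0.1]  # hypothetical per-light intensities
with_less = render_passes(lights, 0.5, GL_LESS)      # later passes fail the test
with_lequal = render_passes(lights, 0.5, GL_LEQUAL)  # every pass accumulates
print(with_less, with_lequal)
```

With the default "less than" comparison, the second pass finds its own depth already in the buffer and is rejected, so only the first light contributes.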

At last I made it: I have a multi-light demo that includes deferred lighting (although very rough and not optimized at all) and shows coherent lighting in all rendering modes.
The PyOpenGL class library almost works: it lacks MRTs and VBOs, but it is functional enough to support complete conversions of the DoF2 and multilight demos (without the deferred mode, which relies on MRTs, of course).

It’s no longer news that you can view it in action on my YouTube Channel, or in a high definition 720p version hosted on my Vimeo page.

All’s well that ends well. :)

Depth of field reloaded

Tuesday, April 15th, 2008

Lately I’ve been really disappointed by the poor performance of my first depth of field implementation, so I decided to do something about it…

glsl_dof2

The most natural step was to have a look at the second Direct3D example from the same paper I used for the first one, as I was sure it would lead to more satisfactory results.
I spent the last two nights converting, correcting and fine tuning it, and I was rewarded by being proved right: even though it is a five-pass algorithm that uses four different Frame Buffer Objects, it is about 2.5 times faster than my previous implementation!

I think the speed boost depends on the following two factors:

  1. image blurring is achieved by a Gaussian filter that is applied separately along the X and Y axes. A 2D Gaussian kernel is separable, so the two 1D passes give the same result as the full 2D convolution (apart from rounding in the intermediate buffer), while the number of samples per pixel drops from quadratic to linear in the kernel size.
  2. this filter operates only on a downsampled FBO (1/4th of the screen resolution, actually)
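
The first point can be checked numerically. What follows is a minimal sketch in Python rather than shader code; the radius and sigma values are hypothetical and not taken from the demo. It builds the 1D Gaussian weights, compares their outer product with a 2D Gaussian kernel computed directly, and counts the texture samples each approach needs per pixel.

```python
import math


def gaussian_1d(radius, sigma):
    """Normalised 1D Gaussian weights for offsets -radius..radius."""
    raw = [math.exp(-(x * x) / (2.0 * sigma * sigma))
           for x in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def gaussian_2d(radius, sigma):
    """Normalised 2D Gaussian kernel computed directly from x and y offsets."""
    offsets = range(-radius, radius + 1)
    raw = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in offsets]
           for y in offsets]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


radius, sigma = 3, 1.5  # hypothetical filter parameters
w = gaussian_1d(radius, sigma)
k2d = gaussian_2d(radius, sigma)

# Outer product of the 1D weights: what an X pass followed by a Y pass applies
outer = [[wy * wx for wx in w] for wy in w]
max_diff = max(abs(outer[y][x] - k2d[y][x])
               for y in range(len(w)) for x in range(len(w)))

taps_2d = len(w) ** 2        # samples per pixel for one 2D pass
taps_separable = 2 * len(w)  # samples per pixel for an X pass plus a Y pass
print(max_diff, taps_2d, taps_separable)
```

For a 7-tap filter this is 49 samples per pixel against 14, and the gap widens as the kernel grows.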

Another nice thing about this new implementation is that there are only two focal parameters, focus depth and focus range, which really help to set up a correct scene.

Now let’s review the five passes in detail:

  1. Render the scene normally while calculating a blur amount per-vertex, then store the interpolated value per-pixel inside the alpha component of the fragment.
    The calculation at the vertex shader is just:

    
    Blur = clamp(abs(-PosWV.z - focalDistance) / focalRange, 0.0, 1.0);
    
  2. Downsample the scene rendered at the previous pass storing it in a smaller FBO
  3. Apply the gaussian filter along the X axis on the downsampled scene and store it in a new FBO
  4. Apply the gaussian filter along the Y axis on the already X blurred scene and store it in a new FBO
  5. Calculate a linear interpolation between the first full resolution FBO and the XY downsampled blurred one
    This is performed in the fragment shader as:

    
    gl_FragColor = Fullres + Fullres.a * (Blurred - Fullres);
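
The two formulas from passes 1 and 5 can be tried on the CPU. This is a minimal sketch in Python under stated assumptions: view_z is a view-space depth that is negative in front of the camera, colours are RGBA tuples, and the focal values are hypothetical. The function names are illustrative and do not come from the demo.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def blur_amount(view_z, focal_distance, focal_range):
    """Pass 1: 0.0 at the focal plane, growing to 1.0 at focal_range away from it."""
    return clamp(abs(-view_z - focal_distance) / focal_range, 0.0, 1.0)


def composite(fullres, blurred):
    """Pass 5: linear interpolation driven by the alpha of the full resolution pixel."""
    a = fullres[3]
    return tuple(f + a * (b - f) for f, b in zip(fullres, blurred))


focal_distance, focal_range = 10.0, 5.0  # hypothetical focal parameters

in_focus = blur_amount(-10.0, focal_distance, focal_range)  # on the focal plane
halfway = blur_amount(-12.5, focal_distance, focal_range)   # half the range away
far_away = blur_amount(-30.0, focal_distance, focal_range)  # clamped

sharp = composite((1.0, 0.0, 0.0, in_focus), (0.0, 1.0, 0.0, 0.0))
soft = composite((1.0, 0.0, 0.0, far_away), (0.0, 1.0, 0.0, 1.0))
print(in_focus, halfway, far_away, sharp, soft)
```

A pixel with alpha 0 keeps the full resolution colour, while a pixel with alpha 1 takes the blurred colour entirely.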
    

Again, you can view it in action on my YouTube Channel, or in a high definition 720p version hosted on my Vimeo page. ;)

I love depth of field

Sunday, March 23rd, 2008

I consider depth of field one of the most beautiful post-processing effects of the “next-gen” games.
It was natural for me to choose it as the first shader demo to implement after months of inactivity. As a matter of fact, GLSL_imgpro was really just a testbed for basic post-processing techniques, like Frame Buffer Objects.

GLSL_DoF

I studied the theory from an ATI paper included in the ShaderX2 book, titled Real-Time Depth of Field Simulation. I chose the first of its two implementations and converted it from Direct3D and HLSL to OpenGL and GLSL.

Of course, being a post-processing effect, the rendering is actually divided into two passes:

  1. Rendering the scene storing the depth of every vertex and calculating the amount of blur per fragment
  2. Applying the blur per fragment based on the value from the previous step

The second pass fragment shader, the one which is really applying the blur effect, is slow even on my 8600GT, because it performs several calculations for every one of the twelve fragments that are contributing to the blur of the center one.

Another interesting aspect is that, in order to calculate a correct approximation of the circular blur needed to simulate circles of confusion, these twelve pixels are sampled around the center following a Poisson disc distribution. This creates far fewer artifacts than a small convolution matrix scaled up too much in order to sample far away from the center.
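
To give an idea of what such a sample pattern looks like, here is a minimal sketch in Python that generates twelve offsets inside the unit disc by simple dart throwing. It is an illustration under my own assumptions, not the method used by the paper or by the demo, which may well rely on a precomputed table. The function name, the minimum distance and the seed are all hypothetical.

```python
import math
import random


def poisson_disc_samples(count, min_dist, seed=0, max_tries=10000):
    """Dart throwing: accept random points in the unit disc that stay
    at least min_dist away from every point accepted so far."""
    rng = random.Random(seed)
    points = []
    tries = 0
    while len(points) < count:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all samples; lower min_dist")
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y > 1.0:
            continue  # outside the unit disc
        if all(math.hypot(x - px, y - py) >= min_dist for px, py in points):
            points.append((x, y))
    return points


samples = poisson_disc_samples(12, 0.3)
for sx, sy in samples:
    print(round(sx, 3), round(sy, 3))
```

In a shader, each offset would be scaled by the radius of the circle of confusion before being added to the texture coordinate of the center pixel.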

Just like the previous demo, you can view it in action on my YouTube Channel, but I really suggest you have a look at the high definition 720p version instead, hosted together with the other ones on my Vimeo page. ;)

glUniform1f() is working!

Monday, March 17th, 2008

I faced this problem for the first time a year ago, while working on my parallax mapping demo, and I have met it again these days, while fine tuning my depth of field demo to allow keyboard-driven parameter tweaking.

Bug

The issue I’m talking about is quite serious: on my machine it is impossible to pass a float uniform variable to a shader, and I’m not the only one reporting it:

The first link is a forum thread from GameDev written by a girl whose applications suffer from this annoying issue and who wrote a proof of concept which “works” perfectly on my box, i.e. float uniforms are NOT passed. :)
But it was the third one that made me think about how to fix the problem: it reports that, after calling glewInit(), the glUniform*f() functions work again.

The first thing I did, of course, was to download the GLEW sources and look at what was happening inside that magic function. What it actually does is redefine all the GL function pointers, calling glXGetProcAddress() for every one of them. I thought it would be a good idea to replicate this behaviour in my programs, and I was right! :D

This is what I added to my sources for the offending function to work:


PFNGLUNIFORM1FPROC glUniform1f = NULL;
glUniform1f = (PFNGLUNIFORM1FPROC)glXGetProcAddress((const GLubyte*)"glUniform1f");

This also seems to explain why my Python shader demo didn’t suffer from any of this: I think that PyOpenGL initializes itself by retrieving the addresses of all the GL functions it needs.

IMPORTANT UPDATE
M3xican, the shader master, came up with THE solution: just add -DGL_GLEXT_PROTOTYPES to CFLAGS.
Hail to the master! :D

Image post-processing with shaders

Thursday, March 13th, 2008

I’m back to work after many months; university exams really take a lot of time…
Since I am a bit rusty on GLSL programming, but willing to learn new things anyway, I have decided to begin with a simple yet interesting topic: image processing.

GLSL_imgpro

The whole thing actually needs two rendering passes and relies heavily on Frame Buffer Objects, because:

  1. You render the scene to an off-screen texture.
  2. You render a quad that covers the entire screen and is bound to the previously written texture.
  3. You make a shader process the fragments resulting from rendering this textured quad, i.e. post-process the original scene.

In this program post-processing is delegated to convolution matrices built from these kernels:


GLfloat kernels[7][9] = {
    { 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f}, /* Identity */
    { 0.0f,-1.0f, 0.0f,-1.0f, 5.0f,-1.0f, 0.0f,-1.0f, 0.0f}, /* Sharpen */
    { 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f}, /* Blur */
    { 1.0f, 2.0f, 1.0f, 2.0f, 4.0f, 2.0f, 1.0f, 2.0f, 1.0f}, /* Gaussian blur */
    { 0.0f, 0.0f, 0.0f,-1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f}, /* Edge enhance */
    { 1.0f, 1.0f, 1.0f, 1.0f, 8.0f, 1.0f, 1.0f, 1.0f, 1.0f}, /* Edge detect */
    { 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f,-1.0f}  /* Emboss */
};

The final fragment color is calculated by a simple shader which, at the core, just performs the following:


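// Accumulate the 3x3 neighbourhood, weighting each sample by its kernel entry
for(i = -1; i <= 1; i++) {
    for(j = -1; j <= 1; j++) {
        // Offset of Dist pixels, converted to texture coordinates
        coord = gl_TexCoord[0].st + vec2(float(i) * (1.0/float(Width)) * float(Dist), float(j) * (1.0/float(Height)) * float(Dist));
        sum += Kernel[i+1][j+1] * texture2D(Tex0, coord.xy);
        contrib += Kernel[i+1][j+1];
    }
}

// Normalise by the sum of the kernel weights
gl_FragColor = sum/contrib;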

When the user chooses a filter, the application updates the kernel currently in use with a call to:


loc = glGetUniformLocation(sh.p2, "Dist");
glUniform1i(loc, dist);
loc = glGetUniformLocation(sh.p2, "Kernel");
glUniformMatrix3fv(loc, 1, GL_FALSE, kernels[curker]); /* row decays to const GLfloat* */

Dist is a user-defined parameter (you can change it using the arrow keys) that defines the distance in pixels from the center to the contributing samples.
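
The same filter can be tried on the CPU. The following is a minimal sketch in Python that mirrors the shader loop on a small greyscale image. It rests on three assumptions of mine that are not in the original code: samples outside the image are clamped to the edge, the kernel is a flat list of nine floats laid out like one row of kernels[][], and a kernel whose weights sum to zero is left unnormalised to avoid dividing by zero. The function name is illustrative.

```python
def convolve3x3(image, kernel, dist=1):
    """Apply a 3x3 kernel to a greyscale image, sampling dist pixels apart
    and dividing by the sum of the weights, as the shader loop does."""
    height, width = len(image), len(image[0])
    contrib = sum(kernel)
    if contrib == 0.0:
        contrib = 1.0  # assumption: skip normalisation for zero-sum kernels
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for i in (-1, 0, 1):
                for j in (-1, 0, 1):
                    # assumption: clamp to edge, like GL_CLAMP_TO_EDGE
                    sx = min(max(x + i * dist, 0), width - 1)
                    sy = min(max(y + j * dist, 0), height - 1)
                    total += kernel[(i + 1) * 3 + (j + 1)] * image[sy][sx]
            row.append(total / contrib)
        out.append(row)
    return out


identity = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
box_blur = [1.0] * 9

image = [[0.0, 0.0, 0.0],
         [0.0, 9.0, 0.0],
         [0.0, 0.0, 0.0]]

same = convolve3x3(image, identity)
blurred = convolve3x3(image, box_blur)
print(same)
print(blurred)
```

The identity kernel returns the image unchanged, while the box blur spreads the single bright pixel evenly over its neighbours.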

A month ago I created a YouTube Channel, so now you can get an idea of how this demo works without downloading and compiling the source code: have a look at this link! ;)

Parallax mapping for the masses

Tuesday, May 1st, 2007

I have spent the last ten days studying hard, reading the first half of the Orange Book (it’s the last book in the list, of course :D ), a plethora of papers, the code of many demos, tons of tutorials and guides, but at last I achieved what I would never have imagined just two weeks ago. ;)

Fixed Pipeline | Per-pixel Lighting | Normal Mapping | Parallax Mapping

The GLSL_parallax demo shows per pixel Blinn-Phong shading, specular mapping and tangent space parallax mapping with offset limiting! :D

Actually I’m not really sure about the correctness of my implementation (especially regarding tangent space lighting), but the screenshots demonstrate that I’m close to it.
The first one shows the usual and boring OpenGL fixed functionality per-vertex lighting (ambient, diffuse and specular components of a point light with attenuation). In the second one shaders are enabled, but only to calculate lighting on a per-pixel basis. Finally, the third and the fourth images show normal and parallax mapping.

In more detail, the code is written for OpenGL 2 only: it makes use of Vertex Buffer Objects and GLSL shaders through core functions.

Here is the magic:


[...]
if (withParallax == true) { // alpha channel encodes the height map
  height = scale * texture2D(Tex1, gl_TexCoord[1].st).a - bias;
  TexCoord = gl_TexCoord[0].st + height * ecPos.xy;
}
[...]
if (withNormal == true)
  nor = 2.0 * normalMap.rgb - 1.0; // decoding normal map
[...]
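
The two steps above can be followed with plain numbers. This is a minimal sketch in Python, assuming that the view direction is a normalised tangent-space vector (the shader excerpt uses ecPos.xy, whose exact meaning is not shown) and using hypothetical scale and bias values. The function names are illustrative.

```python
def parallax_texcoord(texcoord, height_sample, view_dir, scale=0.04, bias=0.02):
    """Shift the texture coordinate along the view direction.
    Offset limiting: view_dir.xy is used as is, without dividing by view_dir.z."""
    height = scale * height_sample - bias
    return (texcoord[0] + height * view_dir[0],
            texcoord[1] + height * view_dir[1])


def decode_normal(rgb):
    """Map a normal map colour from [0, 1] back to a vector in [-1, 1]."""
    return tuple(2.0 * c - 1.0 for c in rgb)


view_dir = (0.6, 0.0, 0.8)  # hypothetical normalised tangent-space view vector

mid_height = parallax_texcoord((0.5, 0.5), 0.5, view_dir)  # height 0.5 cancels the bias
top_height = parallax_texcoord((0.5, 0.5), 1.0, view_dir)  # shifted along +x
flat_normal = decode_normal((0.5, 0.5, 1.0))               # the "flat" normal map colour
print(mid_height, top_height, flat_normal)
```

With these values a mid-grey height leaves the coordinate untouched, and the typical light blue normal map colour decodes to a normal pointing straight out of the surface.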

Some statistics:

  • 6 varying variables
  • 7 uniform variables (texture samples and enable/disable booleans)
  • 3 texture fetches every fragment processed
  • (24×3)×3 + 24×2 = 264 floats (1056 bytes) stored in VBOs

Enjoy the shaders! :)

The quest is over!

Sunday, April 15th, 2007

The quest for the lost fragment is over, at last!
Today I returned from Athens and installed the additional RAM module and the long-awaited, shader-capable MSI FX5900XT-VTD128 card in Electron!

MSI FX5900XT-VTD128

The first thing I did was to update the Nvidia driver packages from ‘nvidia-96xx’ to ‘nvidia’, which currently means going from 96.31 to 97.55.

This is what’s new from GeForce4 Ti 4200 (NV25) to GeForce FX5900 XT (NV35):

  • The OpenGL version string is now 2.1.0
  • The CineFX 2.0 engine allows for two new anti-aliasing modes: 4x Bilinear Multisampling by 4x Supersampling and 4x Bilinear Multisampling by 2x Supersampling
  • There are eighteen new extensions available: GL_ARB_fragment_program, GL_ARB_fragment_program_shadow, GL_ARB_fragment_shader, GL_ARB_half_float_pixel, GL_EXT_blend_func_separate, GL_EXT_framebuffer_blit, GL_EXT_framebuffer_multisample, GL_EXT_framebuffer_object, GL_EXT_stencil_two_side, GL_EXT_texture_sRGB, GL_NV_float_buffer, GL_NV_fragment_program, GL_NV_fragment_program_option, GL_NV_framebuffer_multisample_coverage, GL_NV_half_float, GL_NV_primitive_restart, GL_NV_vertex_program2, GL_NV_vertex_program2_option

What follows is a series of tests. They are exactly the same, and run with the same settings, as the ones shown in the Easter gifts post:

Test        NoAA, NoAF   2xAA, 4xAF
glxgears    4852.8       2678.0
Blender     9978         7927
GL_shadow   1189.4       793.0
GL_pointz   544.8        552.0
GL_blit     2006.0       1391.4
GL_smoke    449.4        402.8

Some tests perform better on Electron than on Thunder (which has a much faster graphics card and DDR RAM). This is very strange; maybe I have to run these tests on Thunder again. ;)

Accepted for Summer of Code

Thursday, April 12th, 2007

Dear Applicant,
Congratulations! This email is being sent to inform you that your
application was accepted to take part in the Summer of Code.

Yeah, one of my two proposals has been accepted!

Dilbert Doodle

My first proposal was about working on GL O.B.S. under the Python Software Foundation, but unfortunately it was very likely going to be discarded.
I learned this from a mentor who contacted me. He wrote that my application was based on a personal program and that it would be hard to find someone to mentor me; moreover, I wouldn’t have contributed to the Python community. He also added that I could be a good candidate for his project: he is, indeed, Arc Riley, Project Manager of PySoy.
And so I did: I wrote another application and, this time, it was accepted. :)

My work will be to integrate multi-texturing in the PySoy rendering loop and API, document API additions, test the whole under many different free software drivers and then implement some related techniques, like bump or normal mapping.

I’m really glad about this opportunity: I will learn many interesting OpenGL and Python topics and I will improve my design, teamwork and communication skills.
Thank you Google! ;)

The quest for the lost fragment

Tuesday, February 6th, 2007

Yesterday a generous guy at the university gave me a new graphics card, including cables, manual and bundled software. It is a nice MSI G4Ti4200-DT64 with a red PCB.

MSI G4Ti4200-DT64

It’s a good card, but unfortunately it has only a primitive version of pixel shaders. They are neither floating point (supporting at most the proprietary “HILO” format) nor GLSL compliant, and they cannot be used via GL_ARB_fragment_program (as a matter of fact it is not present in the extensions array :( ), but only through the GL_NV_register_combiners and GL_NV_texture_shader families of functions, which rely on the OpenGL state machine.
Nvidia Cg actually supports the fp20 profile, but its output is just an nvparse program, which has to be passed to a function that will set up the GL texture states.

Anyway, let’s analyze what’s new going from a GeForce4 MX440-8X (NV18) to a GeForce4 Ti 4200 (NV25):

  • The OpenGL version string hasn’t changed (mainly because of the lack of Shader Model 2.0), it is still 1.5.8.
  • Thanks to the Accuview AA Engine there are three new anti-aliasing modes: 4x Bilinear Multisampling, 4x Gaussian Multisampling and 2x Bilinear Multisampling by 4x Supersampling.
  • There are twenty new extensions available, most of them related to multisample, depth textures, occlusion queries, shadows and texture shaders:
    GL_ARB_depth_texture, GL_ARB_multisample, GL_ARB_occlusion_query, GL_ARB_shadow, GL_ARB_texture_border_clamp, GL_EXT_shadow_funcs, GL_EXT_texture3D, GL_EXT_timer_query, GL_HP_occlusion_test, GL_NV_copy_depth_to_color, GL_NV_depth_clamp, GL_NV_multisample_filter_hint, GL_NV_occlusion_query, GL_NV_register_combiners2, GL_NV_texture_compression_vtc, GL_NV_texture_shader, GL_NV_texture_shader2, GL_NV_texture_shader3, GL_SGIX_depth_texture, GL_SGIX_shadow.

Of course I performed some benchmarks too (have a look at the Electron specs), all at 1024×768, except for the glxgears and globs tests, which were run at the default 640×480 resolution.
Quake 3 was tested on the four.dm_68 demo with sound, Blender 2.42 was tested with the draw benchmark, while the GLSLvp_pointz test uses only a vertex shader to move and color points.
Note that this last test is emulated in software on the NV18 while it is performed in hardware on the NV25, but shaders plus full scene anti-aliasing seem to be impossible to achieve on the latter.

              NV18                      NV25
Test          NoAA, NoAF   2xAA, 4xAF   NoAA, NoAF   2xAA, 4xAF
glxgears      1568.5       833.5        2976.7       1571.7
Blender       2580         1580         7000         4484
Quake 3       93.0         51.7         113.6        92.3
gl_shadow     296          148.4        674.6        360
gl_pointz     421.8        272.6        526.2        398.4
gl_blit       600.6        285.8        1197.4       612.4
gl_smoke      299          177.2        404.2        302
GLSLvp_point  118.6        103.8        317.6        X

The card is nice and fast, but the search for the fragment (extension) has not ended. ;)