CImg and NVIDIA’s NPP Interop

Apparently, NPP relies on the pixel order of its input arrays (they need to be interleaved). If you are planning on using CImg with NPP, be sure to check this post out before attempting to do so. Failing to permute CImg image axes will result in wrong filtered values for color images.

CImg does not store pixels in the interleaved format

 

It took me hours before I found and read the documentation.

CImg stores pixels in a planar format (RRRR…GGGG…BBBB…). For most tasks in CUDA, it’s much better to store the pixel values in the interleaved format (RGBRGBRGB…).
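To make the difference between the two layouts concrete, here is a minimal sketch in plain C++ (no CImg involved) that copies a planar buffer into an interleaved one. The function name planar_to_interleaved is made up for this illustration, and it assumes 8-bit channels stored as one contiguous array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert a planar buffer (all R values, then all G, then all B) into an
// interleaved one (R G B R G B ...).
// `pixels` is width * height, `channels` is 3 for RGB.
std::vector<unsigned char> planar_to_interleaved(
        const std::vector<unsigned char>& planar,
        std::size_t pixels, std::size_t channels) {
    assert(planar.size() == pixels * channels);
    std::vector<unsigned char> interleaved(planar.size());
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t p = 0; p < pixels; ++p) {
            // Channel c of pixel p moves from plane c to slot c of pixel p.
            interleaved[p * channels + c] = planar[c * pixels + p];
        }
    }
    return interleaved;
}
```

For a two-pixel RGB image, the planar buffer R0 R1 G0 G1 B0 B1 becomes R0 G0 B0 R1 G1 B1.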

To do that in CImg, call the permute_axes method of the CImg object:

 

 IMPORTANT:
After permutation, the width, height, depth and spectrum values reported by the CImg object will all change. To permute back (for displaying or saving), do this:

 

Here the values are the ones saved before any permutation of the axes. This undoes the change, and you can then safely save or display the image.
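The bookkeeping of the reported dimensions can be sketched without CImg. The model below assumes the axis-order strings "cxyz" (planar to interleaved) and "yzcx" (back again), which are the orders usually quoted for this trick with permute_axes; check them against the CImg documentation for your version. Dims and permuted_dims are hypothetical helpers, not part of CImg.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for the four sizes a CImg object reports.
struct Dims {
    int width, height, depth, spectrum;
};

// Size of the named axis ('x', 'y', 'z' or 'c') in `d`.
int axis_size(const Dims& d, char axis) {
    switch (axis) {
        case 'x': return d.width;
        case 'y': return d.height;
        case 'z': return d.depth;
        case 'c': return d.spectrum;
        default: throw std::invalid_argument("axis must be x, y, z or c");
    }
}

// Sizes after a permutation: the i-th character of `order` names the old
// axis that becomes the new i-th axis (new x, new y, new z, new c).
Dims permuted_dims(const Dims& d, const std::string& order) {
    if (order.size() != 4) {
        throw std::invalid_argument("order must have 4 characters");
    }
    return Dims{axis_size(d, order[0]), axis_size(d, order[1]),
                axis_size(d, order[2]), axis_size(d, order[3])};
}
```

Under these assumptions a 640x480 RGB image (640, 480, 1, 3) reports (3, 640, 480, 1) after "cxyz", and "yzcx" brings it back to the original sizes.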

CImg instance from interleaved array (bitmap):

Now imagine you want to initialize a CImg object from an interleaved bitmap (say, an OpenGL texture or what have you). In this case, you need to know the width and height of the image as well as the number of components. Also assume that the spectrum is 1. To create a CImg object from this array (imageArray is the bitmap pointer), you can do:
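What this conversion has to achieve can be written out by hand. The following sketch is plain C++ rather than the CImg call itself: it takes an interleaved bitmap pointer plus the width, height and number of components, and produces the planar layout that CImg stores internally. The name interleaved_to_planar is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy an interleaved bitmap (R G B R G B ...) into a planar buffer
// (all R values, then all G, then all B).
std::vector<unsigned char> interleaved_to_planar(
        const unsigned char* imageArray,
        std::size_t width, std::size_t height, std::size_t components) {
    assert(imageArray != nullptr);
    const std::size_t pixels = width * height;
    std::vector<unsigned char> planar(pixels * components);
    for (std::size_t p = 0; p < pixels; ++p) {
        for (std::size_t c = 0; c < components; ++c) {
            // Slot c of pixel p goes to position p inside plane c.
            planar[c * pixels + p] = imageArray[p * components + c];
        }
    }
    return planar;
}
```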

 

Error: “incorrect inclusion of a cudart header file”

If you receive this error while compiling a CUDA program, it means that you have included a CUDA header file containing CUDA-specific qualifiers (such as __device__) in a *.cpp file.

CUDA header files with such qualifiers should ONLY be included in *.cu files.

This happened to me when I had #include <common_functions.h> in my *.cpp file. Note that having this include in a header file that is itself included (directly or indirectly) by a *.cpp file results in the same error.
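For your own headers that must be shared between *.cu and *.cpp files, a common idiom is to hide the qualifiers behind a macro that expands to nothing when the file is not being compiled by nvcc. This is only a sketch of that idiom and does not make CUDA's own headers (such as common_functions.h) safe to include from a *.cpp file. HOST_DEVICE and clamp01 are hypothetical names.

```cpp
#include <cassert>

// nvcc defines __CUDACC__ while compiling CUDA sources. A plain host
// compiler does not, so there the qualifiers expand to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Usable from both *.cu and *.cpp translation units.
HOST_DEVICE inline float clamp01(float v) {
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}
```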

Enable C++11 Support for CUDA Compiler (NVCC) – CUDA 6.5+

To enable support for C++11 in nvcc just add the switch -std=c++11 to nvcc.

If you are using Nsight Eclipse, right-click on your project, go to Properties > Build > Settings > Tool Settings > NVCC Compiler, and in the “Command line prompt” section add -std=c++11.

C++11 code should then compile successfully with nvcc. Nsight’s C++ indexer will also work fine.
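A quick way to check that the switch is picked up is to compile a small piece of host code that only builds in C++11 mode. The snippet below is such a sketch; it uses auto, a lambda and a range-based for loop, none of which exist in C++03. The function name sum_of_squares is made up for this example.

```cpp
#include <cassert>
#include <vector>

// Host-side function that relies on C++11 features only.
int sum_of_squares(const std::vector<int>& values) {
    auto square = [](int v) { return v * v; };  // lambda + auto
    int total = 0;
    for (auto v : values) {                     // range-based for
        total += square(v);
    }
    return total;
}
```

If the flag is missing on a toolkit that defaults to C++03, the compiler should reject the lambda and the range-based loop.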

Automount NTFS Partitions with All Permissions

Some things really need to be burned onto the inside of my skull, since I forget them ALL the time. This is especially true for Linux commands for trivial tasks. Automounting NTFS partitions with execution permission in Linux is one of those things for me. Here’s how to do it in Linux Mint (or probably any other Debian-based Linux distro).

1) Find the UUID of your partition by running:

 

2) Add the following line to the file /etc/fstab:

 

3) Run the following command to verify that everything is working fine:

 

You can verify the uid for your user by running:

 

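Note the option umask=000. A umask lists the permission bits to remove, so a value of 000 removes none: every file on the partition ends up readable, writable and executable by all users.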
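The arithmetic behind the umask option is easy to get backwards, so here is a tiny sketch of it. It assumes the usual rule for mount-level masks, where the resulting mode is full access (0777) with the mask bits cleared. The function name effective_mode is made up for this example.

```cpp
#include <cassert>

// Permission bits left over after applying a umask to full access (0777).
// Every bit set in the mask is removed from the result.
unsigned effective_mode(unsigned umask_bits) {
    return 0777u & ~umask_bits;
}
```

With umask=000 the result is 0777 (rwxrwxrwx), whereas a mask of 022 would give 0755 and a mask of 111 would strip the execute bits and give 0666.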

 

NPP’s Convolution with Border Control Only Partially Implemented

One thing I discovered yesterday is that the image convolution filters with border control in NPP (such as nppiFilterBorder_8u) are only partially implemented! This family of functions is supposed to provide border control for the convolution, thus serving as a robust alternative to the regular image convolution functions in NPP (such as nppiFilter_8u). The catch is that the border control only partially works.

The documentation on these functions is scarce. These functions expect an argument of type NppiBorderType to define their border treatment. Possible options are:

 

My experiments showed that the only working option is NPP_BORDER_REPLICATE. Any other option results in the NppStatus error code -9999 (equivalent to NPP_NOT_SUPPORTED_MODE_ERROR, for which I have, again, not found any documentation).

Since the border-controlled convolutions perform worse than the box filter function for large mask sizes, my assumption is that NPP_BORDER_REPLICATE uses the nppiCopyConstBorder_8u function to implement its border control.

If a behavior other than replication is needed, one option is to implement the border handling manually.
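Manual border handling mostly comes down to mapping an out-of-range coordinate back into the image. The sketch below shows one way to do that on the CPU for replicate, wrap and mirror behavior. It is not NPP code: BorderMode and border_index are hypothetical names, and the mirror variant shown (which does not repeat the edge sample) is just one of the common conventions. A constant border is left out because it returns a fixed value rather than an index.

```cpp
#include <cassert>

enum class BorderMode { Replicate, Wrap, Mirror };

// Map a possibly out-of-range coordinate i onto [0, n) for an axis of
// length n, according to the chosen border behavior.
int border_index(int i, int n, BorderMode mode) {
    assert(n > 0);
    switch (mode) {
        case BorderMode::Replicate:
            // Clamp to the nearest edge sample.
            return i < 0 ? 0 : (i >= n ? n - 1 : i);
        case BorderMode::Wrap: {
            // Periodic repetition of the axis.
            int r = i % n;
            return r < 0 ? r + n : r;
        }
        case BorderMode::Mirror: {
            // Reflect around the edge samples: 2 1 | 0 1 ... n-1 | n-2 ...
            if (n == 1) return 0;
            const int period = 2 * (n - 1);
            int r = i % period;
            if (r < 0) r += period;
            return r < n ? r : period - r;
        }
    }
    return 0;  // not reached; keeps compilers quiet
}
```

A convolution loop would call border_index once for the x coordinate and once for the y coordinate of every tap before reading the source pixel.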

 

 

NPP’s Box Filter (nppiFilterBox) is Broken

Surprisingly, the box filter function (nppiFilterBox_8u) that ships with CUDA as part of the NPP library is broken! It is the same function that is used in the “Box Filter with NPP” sample.

If you import this sample from the CUDA SDK and try it with masks of size 13 and above, the filter produces garbage output (tested with CUDA 6.5). At this point, I have no idea why this is happening or why such a simple filter may not work for larger mask sizes. An alternative would be to use the convolution filters (such as nppiFilter_8u).
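One way to tell garbage from a correct result is to compare the GPU output against a slow CPU reference. The sketch below is such a reference for a single-channel 8-bit image: a box filter written as a convolution with an all-ones kernel followed by a division by the kernel area. It is not NPP code, the name box_filter_reference is made up, and clamping at the image edges is an assumption that may differ from what NPP does near the border, so compare interior pixels only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference box filter on a single-channel 8-bit image stored row by row.
// The mask is mask x mask (odd size), centred on the output pixel; samples
// outside the image are clamped to the nearest edge pixel.
std::vector<unsigned char> box_filter_reference(
        const std::vector<unsigned char>& src,
        int width, int height, int mask) {
    assert(width > 0 && height > 0);
    assert(mask > 0 && mask % 2 == 1);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int radius = mask / 2;
    std::vector<unsigned char> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int sx = std::min(std::max(x + dx, 0), width - 1);
                    const int sy = std::min(std::max(y + dy, 0), height - 1);
                    sum += src[sy * width + sx];  // kernel weight is 1
                }
            }
            // Divide by the kernel area (integer division truncates).
            dst[y * width + x] = static_cast<unsigned char>(sum / (mask * mask));
        }
    }
    return dst;
}
```

A constant image is a handy first test: whatever the mask size, the filtered image should be the same constant.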

 

EDIT (12/5/2014): I reported this bug to NVIDIA, and today I received an email indicating that the bug has been fixed and that the fix will be available in the next version of the CUDA toolkit.

Blog Created

Seeing how often programmers struggle with the same issue twice, I decided to start this blog. I will try to note the problems that I encounter while coding here, so that when I, or other programmers, run into them again, the solution is already available somewhere.

I will note the issues that required more than a simple Google search to solve.

Never get stuck on the same issue twice! 🙂