Error: “incorrect inclusion of a cudart header file”

If you receive this error while compiling a CUDA program, it means that you have included a CUDA header file containing CUDA-specific qualifiers (such as __device__) in a *.cpp file.

CUDA header files with such qualifiers should ONLY be included in *.cu files.

This happened to me when I had #include <common_functions.h> in my *.cpp file. Note that having this in a header file that is in turn included by a *.cpp file will result in the same error.
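For your own headers that must be shared between *.cpp and *.cu files, one common workaround is to hide the qualifiers behind a macro that only expands under nvcc. The following is a minimal sketch, not a fix for the cudart headers themselves; the header name, the HOST_DEVICE macro and the clamp_to_byte helper are made up for illustration, while __CUDACC__ is the macro nvcc defines when compiling CUDA sources.

```cpp
// shared_math.h (hypothetical header name)
#ifndef SHARED_MATH_H
#define SHARED_MATH_H

// nvcc defines __CUDACC__ while compiling *.cu files; a plain host
// compiler does not, so the qualifiers vanish in *.cpp files.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper, callable from host code and from kernels.
HOST_DEVICE inline int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

#endif // SHARED_MATH_H
```

With this layout the *.cpp files never see __device__, and the *.cu files still get a function usable inside kernels.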

Enable C++11 Support for CUDA Compiler (NVCC) – CUDA 6.5+

To enable support for C++11 in nvcc just add the switch -std=c++11 to nvcc.

If you are using Nsight Eclipse, right-click on your project, go to Properties > Build > Settings > Tool Settings > NVCC Compiler and add -std=c++11 in the “Command line prompt” section.

Your C++11 code should now compile successfully with nvcc. Nsight’s C++ indexer will also work fine.
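A quick way to confirm that the switch took effect is to compile a small file that refuses to build without C++11. This is only a sketch: the file name and the sum_of_squares function are hypothetical, and the __cplusplus check assumes a GCC or Clang host compiler, which report 201103L or higher in C++11 mode.

```cpp
// cxx11_check.cu (hypothetical file name); this host code also builds as plain C++.
#include <vector>

// Fails at compile time if the compiler is not in C++11 (or later) mode.
static_assert(__cplusplus >= 201103L, "C++11 mode is not enabled");

// Uses auto, a lambda and a range-based for loop, all C++11 features.
inline int sum_of_squares(const std::vector<int>& values) {
    auto square = [](int x) { return x * x; };
    int total = 0;
    for (auto v : values) {
        total += square(v);
    }
    return total;
}
```

If the file compiles, the flag is reaching the compiler.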

Automount NTFS Partitions with All Permissions

Some things really need to be burned onto the inside of my skull, since I forget them ALL the time. This is especially true of Linux commands for trivial tasks. Automounting NTFS partitions with execution permission in Linux is one of those things for me. Here’s how to do it in Linux Mint (or probably any other Debian-based Linux distro).

1) Find the UUID of your partition by


2) Add the following line in the file /etc/fstab


3) Run the following command to verify everything is working fine


You can verify the uid for your user by running


Note the option umask=000. No permission bits are masked out, so every file on the partition gets read, write, and execute permission for all users.
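The arithmetic behind the umask option can be sketched in a few lines. This assumes the NTFS driver derives permissions as 0777 with the umask bits removed, which is how ntfs-3g describes the option; the effective_mode function is a hypothetical name used only for this illustration.

```cpp
// Permission bits granted for a given umask: every bit set in the mask is
// removed from the full rwxrwxrwx set (octal 0777).
inline unsigned effective_mode(unsigned umask_bits) {
    return 0777u & ~umask_bits;
}
```

So umask=000 leaves 0777 (everything allowed), while a mask such as 022 would leave 0755 and take write permission away from group and others.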


NPP’s Convolution with Border Control Only Partially Implemented

One thing I discovered yesterday is that the border-controlled image convolution filters in NPP (such as nppiFilterBorder_8u) are only partially implemented! This family of functions is supposed to provide border control for the convolution, thus serving as a robust alternative to the regular image convolution functions in NPP (such as nppiFilter_8u). The catch is that the border control only partially works.

The documentation on these functions is scarce. These functions expect an argument of type NppiBorderType to define their border treatment. Possible options are:


My experiments showed that the only working option is NPP_BORDER_REPLICATE. Any other option results in the NppStatus error code -9999 (equivalent to NPP_NOT_SUPPORTED_MODE_ERROR, for which I have, again, not found any documentation).

Since the border-controlled convolutions perform worse than the box filter function when using large mask sizes, my assumption is that NPP_BORDER_REPLICATE uses the nppiCopyConstBorder_8u function to implement its border control.

If behaviors other than replication are desired, one option is to implement the border control manually.
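A manual border treatment usually boils down to padding the image yourself and then running a regular convolution on the padded buffer. The following is a host-side sketch of that padding step, not NPP code: the Border enum, border_index and pad_image are hypothetical names, and the image is assumed to be a tightly packed single-channel 8-bit buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical border modes for this sketch; this is not the NPP enum.
enum class Border { Replicate, Wrap, Mirror };

// Map a possibly out-of-range coordinate i into the valid range [0, n).
inline int border_index(int i, int n, Border mode) {
    switch (mode) {
    case Border::Replicate:
        return i < 0 ? 0 : (i >= n ? n - 1 : i);
    case Border::Wrap:
        return ((i % n) + n) % n;
    case Border::Mirror: {
        // Reflect without repeating the edge pixel: -1 -> 1, n -> n - 2.
        if (n == 1) return 0;
        const int period = 2 * (n - 1);
        const int m = ((i % period) + period) % period;
        return m < n ? m : period - m;
    }
    }
    return 0;
}

// Copy a width x height image into a buffer padded by `pad` pixels per side.
inline std::vector<unsigned char> pad_image(const std::vector<unsigned char>& src,
                                            int width, int height, int pad,
                                            Border mode) {
    const int pw = width + 2 * pad;
    const int ph = height + 2 * pad;
    std::vector<unsigned char> dst(static_cast<std::size_t>(pw) * ph);
    for (int y = 0; y < ph; ++y) {
        const int sy = border_index(y - pad, height, mode);
        for (int x = 0; x < pw; ++x) {
            const int sx = border_index(x - pad, width, mode);
            dst[static_cast<std::size_t>(y) * pw + x] =
                src[static_cast<std::size_t>(sy) * width + sx];
        }
    }
    return dst;
}
```

The padded buffer could then be uploaded and passed to a regular convolution such as nppiFilter_8u, with the region of interest restricted to the original image area so the filter never reads outside the buffer.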



NPP’s Box Filter (nppiFilterBox) is Broken

Surprisingly, the box filter function (nppiFilterBox_8u) that ships with CUDA as part of the NPP library is broken! It is the same function that is used in the “Box Filter with NPP” sample.

If you import this sample from the CUDA SDK and try it with masks of size 13 and above, the filter produces garbage output (tested with CUDA 6.5). At this point, I have no idea why this is happening or why such a simple filter would not work for larger mask sizes. An alternative is to use the convolution filters (such as nppiFilter_8u).
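To tell garbage from correct output, it helps to have a slow but obviously correct host-side reference to compare against. The sketch below is such a reference, not NPP code: box_filter_reference is a hypothetical name, the image is assumed to be a tightly packed single-channel 8-bit buffer, and the mask is assumed to be square, odd-sized and anchored at its center. It computes what a convolution with an all-ones mask divided by the mask area would give; the integer division truncates, so a real library may differ by one because of rounding.

```cpp
#include <cstddef>
#include <vector>

// Reference box filter over the interior of a width x height 8-bit image.
// Pixels whose window would leave the image are left at 0.
inline std::vector<unsigned char> box_filter_reference(
        const std::vector<unsigned char>& src, int width, int height, int mask) {
    std::vector<unsigned char> dst(src.size(), 0);
    const int r = mask / 2;  // window radius for an odd mask size
    for (int y = r; y < height - r; ++y) {
        for (int x = r; x < width - r; ++x) {
            int sum = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    sum += src[static_cast<std::size_t>(y + dy) * width + (x + dx)];
                }
            }
            // Same as an all-ones kernel with divisor mask * mask.
            dst[static_cast<std::size_t>(y) * width + x] =
                static_cast<unsigned char>(sum / (mask * mask));
        }
    }
    return dst;
}
```

Comparing the interior pixels of the GPU result against this reference for mask sizes 11, 13 and 15 should show quickly where the output starts to diverge.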


EDIT (12/5/2014): I reported this bug to NVIDIA, and today I received an email indicating that the bug has been fixed and that the fix will be available in the next version of the CUDA toolkit.

Blog Created

Seeing how often programmers struggle with the same issue twice, I decided to start this blog. I will try to note here the problems that I encounter while coding, so that when I, or other programmers, run into them again, the solution is already available somewhere.

I will note the issues that required more than a simple Google search to solve.

Never get stuck on the same issue twice! 🙂