Practical Kinect Stereo Calibration for the Highest Accuracy

I’ve been meaning to write up this post for a while, but I’ve been putting it off 🙂

The release of the Kinect sensor by Microsoft spawned a plethora of research in robotics, computer vision and many other fields. Many of these efforts involved using the Kinect for purposes other than what it was originally meant for, which is to say pretty much anything other than gesture recognition and skeleton tracking.

Even though the Kinect is a very capable device, many people (including us!) overlook the fact that it was simply not designed to be used as a general-purpose depth and RGB camera. This observation is further bolstered by the fact that the Kinect was not released with a public API, and the official SDK, which was released much later, was missing a lot of functionality critical to various computer vision applications. For instance, the SDK provides no built-in way to obtain the color and depth camera calibration values, extrinsics or distortion models. Rather conveniently, the SDK designers chose to expose only simple functionality, like mapping pixels from world coordinates to RGB coordinates (but no way to back-project an image or depth point to a ray). Needless to say, in many practical computer vision applications one needs to re-calibrate the camera regularly to minimize the error caused by mis-calibration.

Nevertheless, the choice made by the SDK designers is understandable: there are proprietary algorithms and methods implemented in the Kinect software layer, and it may not always be possible to give public access to them. Also, third-party open source libraries (such as libfreenect) have reverse-engineered many of the inner workings of the Kinect sensor and supply the end user with a set of much-needed functionalities.

With all this, let’s get started! (If you are impatient, feel free to jump to the list of tips I have compiled for you at the end of this post.) I have also included PDFs containing checkerboards suitable for printing on large sheets, PVC or aluminum Dibond.


Align Depth and Color Frames – Depth and RGB Registration

Sometimes it is necessary to create a point cloud from a given depth and color (RGB) frame. This is especially the case when a scene is captured using a depth camera such as the Kinect. The process of aligning the depth and the RGB frame is called “registration”, and it is very easy to do (yet the algorithm’s pseudo-code is surprisingly hard to find with a simple Google search! 😀)

To perform registration, you need four pieces of information:

  1. The depth camera intrinsics:
    1. Focal lengths fx_d and fy_d (in pixel units)
    2. Optical centers (sometimes called image centers) cx_d and cy_d
  2. The RGB camera intrinsics:
    1. Focal lengths fx_rgb and fy_rgb (in pixel units)
    2. Optical centers (sometimes called image centers) cx_rgb and cy_rgb
  3. The extrinsics relating the depth camera to the RGB camera. This is a 4×4 matrix containing rotation and translation values.
  4. (Obviously) the depth and the RGB frames. Note that they do not have to have the same resolution; applying the intrinsics takes care of the resolution difference. With cameras such as the Kinect, the depth values should usually be in meters. The unit of the depth values is very important: using incorrect units results in a registration in which the colors and the depth values are clearly misaligned.
    Also, note that some data sets apply a scale and a bias to the depth values in the depth frame. Make sure to account for this scaling and offsetting before proceeding. In other words, make sure there are no scales applied to the depth values of your depth frame (a small sketch of this follows the list).
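As an example, here is a minimal sketch of undoing such an encoding. The variable names depthRaw and bias, and the assumption that the raw depth is stored in millimeters, are hypothetical stand-ins for whatever your data set's documentation specifies:

```matlab
% Undo a dataset-specific depth encoding before registration.
% depthRaw, bias and the millimeter assumption are hypothetical;
% substitute whatever your dataset's documentation specifies.
bias = 0;                                      % additive offset, if any
depthData = (double(depthRaw) - bias) / 1000;  % millimeters -> meters
depthData(depthRaw == 0) = 0;                  % keep no-measurement pixels at 0
```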

Let depthData contain the depth frame and rgbData contain the RGB frame. A MATLAB sketch of the registration is as follows (the numeric calibration values at the top are placeholders; substitute your own):
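This sketch assumes depthData is an H-by-W matrix of depths in meters, rgbData is an H2-by-W2-by-3 uint8 image, and ext is the 4×4 depth-to-RGB extrinsics; the intrinsics are placeholder numbers, not real calibration values:

```matlab
% --- Calibration parameters (placeholders; replace with your own) ---
fx_d   = 580;  fy_d   = 580;   % depth camera focal lengths (pixels)
cx_d   = 320;  cy_d   = 240;   % depth camera optical center
fx_rgb = 525;  fy_rgb = 525;   % RGB camera focal lengths (pixels)
cx_rgb = 320;  cy_rgb = 240;   % RGB camera optical center
ext    = eye(4);               % 4x4 depth-to-RGB extrinsics [R t; 0 1]

[depthH, depthW] = size(depthData);
[rgbH, rgbW, ~]  = size(rgbData);

% First group of for loops: back-project each depth pixel to a 3D point
% in the depth camera frame, then move it into the RGB camera frame.
points = zeros(depthH, depthW, 3);
for v = 1:depthH
    for u = 1:depthW
        z = depthData(v, u);
        if z <= 0, continue; end          % leave invalid pixels at (0,0,0)
        P = ext * [(u - cx_d) * z / fx_d;
                   (v - cy_d) * z / fy_d;
                   z;
                   1];
        points(v, u, :) = P(1:3);
    end
end

% Second group of for loops: project each 3D point onto the RGB image
% plane and pick up its color. Rounding is used instead of interpolation.
colors = zeros(depthH, depthW, 3);
for v = 1:depthH
    for u = 1:depthW
        Z = points(v, u, 3);
        if Z <= 0, continue; end                            % no valid depth
        x = round(points(v, u, 1) * fx_rgb / Z + cx_rgb);   % RGB column
        y = round(points(v, u, 2) * fy_rgb / Z + cy_rgb);   % RGB row
        if x >= 1 && x <= rgbW && y >= 1 && y <= rgbH       % visible to RGB?
            colors(v, u, :) = double(rgbData(y, x, :)) / 255;
        end
    end
end
```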

A few things to note here:

  1. The indices x and y in the second group of for loops may be invalid (i.e., fall outside the RGB image bounds), which indicates that the corresponding point is not visible to the RGB camera.
  2. Some kind of interpolation (e.g., bilinear, via interp2) may be desirable when sampling the RGB image at x and y. I just did rounding.
  3. This code can readily be used with the savepcd function to save the point cloud in a PCL-compatible format, as sketched below.
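A hypothetical usage sketch for note 3: the points and colors arrays from the code above are packed into one organized array and written out. The exact input layout savepcd expects depends on the version you have, so treat the cat call and the file name below as assumptions and check the function's help text:

```matlab
% Hypothetical usage sketch: pack the registered points and colors into
% one organized H-by-W-by-6 array (XYZ in channels 1-3, RGB in 4-6) and
% write a PCD file. Whether savepcd takes this exact layout depends on
% your version; consult its documentation.
cloud = cat(3, points, colors);
savepcd('registered.pcd', cloud);   % 'registered.pcd' is an example name
```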

The registration formulas were obtained from the paper “On-line Incremental 3D Human Body Reconstruction for HMI or AR Applications” by Almeida et al. (2011). The same formulas can be found here. Hope this helps 🙂