I’ve been meaning to write up this post for a while, but I’ve been putting it off 🙂
The release of the Kinect sensor by Microsoft spawned a plethora of research in robotics, computer vision and many other fields. Many of these efforts involved using the Kinect for purposes other than what it was originally meant for, which is pretty much anything other than gesture recognition and skeleton tracking!
Even though the Kinect is a very capable device, many people (including us!) overlook the fact that it was simply not designed to be used as an all-purpose depth and RGB camera. This is further evidenced by the fact that the Kinect was not released with a public API, and the official SDK, which was released much later, was missing a lot of functionality critical to various computer vision applications. For instance, the SDK does not provide built-in functionality for getting the color and depth cameras' calibration values, extrinsics and distortion models. Rather conveniently, the SDK designers simply chose to provide users with simple functionality, like mapping points from world coordinates to RGB coordinates (but no way to backproject an image or depth point to a ray). Needless to say, in many practical computer vision applications one needs to constantly re-calibrate the camera in order to minimize the error caused by mis-calibration.
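Backprojection itself is simple once you actually have the intrinsics; the problem is that the SDK won't hand them to you. As a sketch, here is how you would turn a pixel into a viewing ray given an intrinsic matrix (the numbers below are made-up placeholders, not the Kinect's factory values):

```python
import numpy as np

# Hypothetical pinhole intrinsics for illustration only -- the real
# factory values are exactly what the SDK keeps hidden.
K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])

def pixel_to_ray(u, v, K):
    """Backproject pixel (u, v) to a unit-length viewing ray in camera space."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)

print(pixel_to_ray(319.5, 239.5, K))  # principal point -> ray straight along +Z: [0. 0. 1.]
```

This is the "image point to ray" operation the SDK omits; with it, you can intersect rays with the depth data instead of being limited to the one-way mappings the SDK provides.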
Nevertheless, the choice by the SDK designers is understandable: there are proprietary algorithms and methods implemented in the Kinect software layer, and it may not always be possible to give public access to them. Also, 3rd party open source libraries (such as libfreenect) have tried to reverse-engineer many of the inner workings of the Kinect sensor and supply the end user with the functionality they need.
With all this, let’s get started! (If you are impatient, feel free to jump to the list of tips I have compiled for you at the end of this post). I have also included PDFs containing checkerboards suitable for printing on large sheets, PVC or aluminum dibond.
What Started it All:
For our particular application, we needed a precisely calibrated Kinect. The reason was that we needed to be able to easily create world rays off of RGB pixels and to do ray casting against the 3D point cloud. Without any calibration, using the official Kinect SDK, this is a sample of a point cloud that we obtained (click to see the large image):
Note the areas indicated by the arrows. There are all sorts of misalignments between the depth and the color data: red pixels from the bottle's cap appear on the dark surface, and parts of the floor have bled into the edges of the desk.
This prompted us to perform a manual calibration of our Kinect. Soon I came up with 3 routes I could take to tackle our problem:
- Find a way to extract the calibration values stored on the device, use them for backprojection (to get another, unrelated task done first), then play around with and optimize them, and reuse those values.
- Survey the literature to see what kind of calibration procedure is most applicable to the Kinect.
- Perform the manual good ol’ stereo calibration with checkerboards.
Option 1 – (La-Z-Boy’s Favorite):
Out of the 3, I decided to go with option 1 first, because it would at least let me finish a similar but unrelated task in my research. After a lot of searching, this blog post seemed to describe exactly what I was looking for and initially looked like a viable approach. Unfortunately, after looking into it more closely, I found out that the parameters stored on the device are, once again, only useful in the context of the functionality that the official SDK provides.
Long story short, the factory values give you a matrix (more specifically, a homography) that can map a depth point to a color point, but not the other way around. They also provide you with a depth unprojection matrix: you can unproject depth values into 3D world points using the matrix stored on the device, but you can't do the same for color pixels, nor can you go from color to depth and then to world coordinates. All of this makes sense if you think about it: the Kinect was meant to be used for skeleton tracking, not stereo vision! The only things needed for skeleton tracking are exactly the matrices that are stored on the device. The developers had probably never thought that people would seriously use this sensor for anything beyond that purpose, so they simply didn't put the infrastructure needed for those exotic applications on the device. Also, recall that there are proprietary algorithms used in the Kinect, and one means of protecting that intellectual property is obfuscating how things are done under the hood. In other words, don't expose more than necessary, or you'll do reverse-engineers a favor!
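For reference, the depth unprojection the device does support is just the pinhole camera model applied to the depth image. A minimal sketch (again with placeholder intrinsics, since the real factory values are the part that's hidden):

```python
import numpy as np

# Placeholder pinhole intrinsics -- NOT the Kinect's factory values.
fx, fy, cx, cy = 585.0, 585.0, 319.5, 239.5

def depth_to_points(depth):
    """Unproject an HxW metric depth map (0 = invalid) into an Nx3 point cloud."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.dstack((x, y, depth)).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # keep only valid measurements

# A single valid pixel near the image center, 2 m away:
depth = np.zeros((480, 640))
depth[240, 320] = 2.0
print(depth_to_points(depth))  # one point close to [0, 0, 2]
```

The missing piece on the device is the reverse direction for the color camera: nothing stored there lets you run the same unprojection for RGB pixels.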
Option 2 – (The Literature Reviewer’s Poison):
With option 1 being a dead end, I started looking into the literature, and unsurprisingly I found a paper dealing with calibrating the Kinect. The paper is titled “Joint Depth and Color Camera Calibration with Distortion Correction”, and the authors were kind enough to supply the MATLAB code and a how-to guide for their approach on their homepage that used to be here (don’t bother… the link is dead now for some reason… the toolbox itself is available here). Understandably, the toolbox’s code was research-grade, so it was not very straightforward to use and get comfortable with. Call me stupid, but it took me some time to be able to use it properly, and I had to go back and forth between the code and the paper to figure out how things worked, just so that I could actually use them! Worst of all, things had a tendency to break for no reason: one click of the wrong GUI button and you were dead! You had to redo a bunch of things from the beginning, and the polygon selection tool was a nightmare (thanks to MATLAB’s super slow ginput window, where the mouse lags like crazy on a high-resolution display).
I followed all of their instructions to the letter: I collected around 100 images of the calibration target, with varying distances and poses. I even used my Surface Book’s stylus to do precise polygon selection on the depth images. I wanted to get the best calibration possible, but no matter how hard I tried, the calibration results were bad and there were still a lot of misalignments visible in the point cloud. Also, backprojected pixels of the depth image were mostly off-target.
I recollected the images of my calibration target around 5 more times (so I had 5 sessions of about 100 images each!!), using checkerboard patterns printed at different sizes on different printers to make sure that my target was not the issue. After all this failed too, I was convinced that this toolbox was not made for me and I would probably have better luck with something else (the resulting point cloud had a lot of mismatches between the 3D data and the color image).
Option 3 – (The Conservative’s Bill):
Thus began my journey into option 3: the good ol’ trusty stereo calibration approach using checkerboards. In theory, you calibrate the RGB camera and the infrared (IR) camera of the Kinect and find the extrinsic matrix between them. For this to work, you also need to block the IR projector on the Kinect itself (the lone lens-looking thing on the far right side of the device). There are various tutorials for performing Kinect stereo calibration online (here and here are just a few examples).
The good thing is that both OpenCV and MATLAB already have the tools for this type of calibration. The only catch is that you need to capture the IR and RGB images simultaneously, which is unfortunately not possible with the Kinect. At any instant, the Kinect can give you either RGB + depth data or IR image + depth data. In other words, you can’t ask the Kinect to stream the color image and the IR image at the same time, because internally it treats the RGB and IR streams as a single image stream, and you can’t have two streams going over one (again, more evidence that the Kinect was not meant for applications other than skeleton tracking).
The workaround is to write an app on top of the SDK that, at capture time, opens the RGB stream, captures one frame, closes that stream, then opens an IR stream, captures the IR image and closes that stream too. This is theoretically good enough if you want to do a stereo calibration and need the image of the calibration target from both “heads” at the same time. The problem, however, is that opening a stream takes time (literally half a second each time, because the Kinect needs to initialize internally). So even if you capture the stereo pairs back to back, you have to make sure your sensor does not move one bit during the stream-switching time! The safest bet is therefore to use a tripod or something similar to keep your Kinect stable.
Luckily, there is an app in the MRPT library (shown below) made for this exact purpose: capturing IR and RGB images from the Kinect. It switches the streams for you, captures the images and can later do the stereo calibration using OpenCV too!
I used this app to capture the data and tried stereo calibration numerous times. I noticed that the calibration results with this app were slightly worse than those generated by MATLAB’s stereo calibration app. Also, MATLAB conveniently lets you get rid of bad images or outliers after each calibration iteration, possibly letting you improve your results further.
Each time I tried the calibration procedure, I took as many as 70 images with the target at various poses and distances relative to the camera. I noticed that even after getting rid of the outliers, my average reprojection errors were around 1 to 3 px, which indicates a bad calibration (a good calibration is supposed to give you subpixel average reprojection error).
I was convinced I was doing something wrong and my calibration target was to blame. I set out to find the best calibration target money could buy (well, not literally the best, but close enough!). After Googling a bit, I stumbled upon this post on ROS Answers, which suggested a custom print on aluminum dibond to ensure the pattern is as flat as possible. I called a bunch of print shops in my area; they told me they could do dibond printing, but it was going to take a week or so, and they quoted me around $120. Since I was running low on time, I decided to do a custom print of a 9×6 checkerboard pattern on a thick 37 x 25 inch PVC board (6mm thick). The print shop told me that at that thickness the board would take a lot of abuse and would not be very flexible (a claim that was later proved wrong, as the final product was crappy!).
The design (posted here), whose dimensions I had carefully checked, was meant to allow collecting calibration images from distances greater than 5 meters from the camera. Before proceeding with the job, I paid the print shop a visit and asked for a sample of the print and the material. They gave me a nice-looking sample PVC board with some black prints on top. I was picky enough to want to make sure the black ink would show up perfectly in IR images, so I took my Kinect to the print shop, hooked it up to my laptop and captured a few images of the sample PVC print with the RGB, IR and depth cameras. Once I was satisfied that the board would show up nicely in all images, I told them to proceed. It took the print shop two business days to print me this:
I put the pen there to give you a rough idea of the board’s size. Notice anything wrong…? I turned on the overhead light just to show you that nice and visible yellow glare, which is the reflection of the lamp! Even though I had specifically told them it should be matte, they laminated the finish, and kindly cut off the white margins of the target!!! 😮
In case you don’t know, those white margins around the checkerboard pattern are vital for the automatic checkerboard detection algorithms in MATLAB and OpenCV! Without those borders, the algorithms will NOT find your pattern and won’t detect any corner points in the image.
I paid for the board but told them I would probably need another one, simply because this one was not made to my specifications (for reference, this job cost me $63). I took the board home and was initially unable to use it: it was VERY flexible and slightly bent. I had to affix it to a large, flat, thick white cardboard to make the pattern stay flat and get it detected properly. With this new pattern, my reprojection errors improved slightly, but the reflective surface of the board caused problems (especially with the IR camera).
Before asking the print shop to redo the job, I made a new checkerboard pattern in Adobe Illustrator. This time the 9×6 pattern contained squares of size 90mm (the original 100mm was not really necessary and was making the board oversized). I also added more white margin around the checkerboard to ensure proper detection later. The new pattern was 93cm x 66cm, and I asked them to print it on 3mm-thick PVC using the matte ink they had shown me the first time. I decided that later I would glue the PVC sheet to an aluminum sheet to make it stay flat. After another couple of days, they printed me this:
The overhead light is still on, but as you can see, the pattern is now matte and does not cause a lot of reflection! 🙂
Another thing I noticed was that the IR camera on the Kinect sometimes acted strangely! If there is a lot of natural light in the room, the IR images will not be very noisy. Conversely, in an area with a lot of artificial light, the images will be SUPER noisy; so noisy, in fact, that the pattern may not be detected at all! Therefore, I decided to get an IR light projector so that I could shine enough IR light on the target to make sure it was visible in the IR image. I found this IR projector on Amazon, which looked promising. It’s meant for mounting on CCTV cameras. The only thing to look out for is that this projector comes equipped with a photocell so it can automatically shut itself off during the day when mounted on a CCTV camera. Since we need it on all the time, we can simply block the photoresistor and fool it into thinking it’s always night 😀 I jammed a precisely cut piece of cardboard into the opening of the photoresistor on mine. The red circle in the image below shows the white cardboard piece I used to fool the photocell:
With all these little pieces in place, I collected around 300 RGB and IR images of my calibration target, at various distances and poses relative to the camera. I used MATLAB’s stereo calibration app to do the final calibration and was finally able to get around 0.35px average reprojection error 🙂 The image below shows the final point cloud I was able to obtain:
There are still misalignments between the point cloud and the RGB data, but things are much better than before and the amount of color bleeding is minimal.
This quote describes how I feel today:
An autopsy revealed that the patient died from an autopsy
I thought this concluded my journey, but the story did not end here! Even though the calibration looked visually good, I found that my backprojected rays were sometimes off, even extremely inaccurate in some areas. I had tried looking into sensors other than the Kinect, but their image and depth quality was subpar. I have an Intel RealSense R200 which I wasn’t using much, and the point cloud I had obtained from it was very noisy compared to the Kinect’s, so I had never bothered to check how good its calibration values were.
One day, while looking at various issues posted on librealsense‘s repository, I saw this post, which suggested that there is a barometer inside Intel’s RealSense sensors that adjusts camera parameters based on ambient pressure changes! Right then I thought of my R200 sensor and figured I’d give it a try. When I hooked it up, I noticed that a new firmware was available for it. I updated the firmware, fired up the sensor and grabbed some point clouds. I realized how much time I had wasted only after I saw this point cloud:
Even with the firmware update, the quality leaves something to be desired. But as you can see, right out of the box the sensor’s calibration values are great: there is minimal color bleeding and misalignment. Upon further inspection, I found that the backprojected rays I get with this sensor are far better than the ones I get with my Kinect. As a result, I completely ditched my Kinect and switched to the R200. To date, I have not needed to manually calibrate my R200 (the factory calibration mostly just works). Also, librealsense gives you access to all the camera parameters stored on the device, which is extremely helpful!
Here are my 2 cents for anyone who wants to perform stereo calibration. Even though these tips are geared towards Kinect calibration, they are applicable to any stereo calibration scenario:
- If you are using the Kinect, stop right here! Don’t do this and run the other way. The Kinect is not meant for whatever it is you are trying to use it for 😀 Try to grab a sensor that is meant for what you want to do.
The RealSense family of sensors is geared towards developers and is actively supported. I’ve heard interesting things about the ZED camera but have never tried one. Bottom line: use something that’s meant for your purposes.
- OK… I see you’re still here. Don’t say I didn’t warn you! 😀
- Next, make sure you have the largest possible calibration pattern. Follow what I said above: get a nice pattern professionally printed, and make sure each square is at least 8cm x 8cm. Also, make sure one side of the pattern has an odd number of squares and the other an even number (e.g. 9×6 or 7×8); this is important for detecting the pose of the target correctly, and some toolboxes will not be able to detect the pattern at all if this requirement is not met. As mentioned before, the patterns I used, which are suitable for printing on large sheets, are uploaded here (for 9cm squares) and here (for 10cm squares).
- Make sure your printed pattern has enough white border around it; otherwise, most toolboxes will fail to detect it.
- Completely cover the IR projector of the Kinect with a thick material. The Kinect’s IR projector is surprisingly strong and can shine right through Post-it notes and other thin covers.
- Try to get the IR projector I linked above. This will make sure the captured IR images are as bright and noise-free as possible.
- Make sure the Kinect does not move. I used this mount to attach my Kinect to a tripod (beware of the screw’s threading when buying this; it is not compatible with all tripods out there).
- Try to get as many images of the calibration target as you can. My best calibration was obtained using 300 images, at distances from as close as 0.5 meters to as far as 10 meters from the camera. Make sure you rotate the pattern around the X, Y and Z axes. Also try to “tile” the view with images taken at the same distance: i.e. take one image, move the target to the next tile in the field of view, take another, and repeat until you’ve “tiled” the entire field of view. The goal is to cover as much of the field of view as possible at each distance.
- Use MATLAB’s stereo calibration app if possible. It allows you to get rid of the outliers after each calibration phase.
I spent so much time trying to get the best calibration possible. With this post, my hope is that you won’t have to 🙂