## Specify Target File in git cherry-pick

Cherry-picking with git could easily become a nightmare if some files are renamed or the directory structure of the project is changed. In these cases, it is often helpful to cherry-pick the problematic files individually.

Fortunately, this can be done in a few easy steps:

1. Create a patch file for that individual file: git show [commit hash] -- path/to/old/file > patchfile
2. Manually edit the newly created patchfile and replace all occurrences of the old path with the new path.
3. Apply the patch via git apply --3way patchfile

The --3way argument is very important and would tell git to do a 3-way comparison. After step 3, if git is unable to figure out how to merge the changes, it will add conflict markers to the resulting file and would allow one to manually resolve the conflicts.

The procedure above can be repeated for every individual file.

## Use PyTorch’s DataLoader with Variable Length Sequences for LSTM/GRU

When I first started using PyTorch to implement recurrent neural networks (RNN), I faced a small issue when I was trying to use DataLoader in conjunction with variable-length sequences. What I specifically wanted to do was to automate the process of distributing training data among multiple graphics cards. Even though there are numerous examples online talking about how to do the actual padding, I couldn’t find any concrete example of using DataLoader in conjunction with padding, and my many-months old question on their forum is still left unanswered!!

The standard way of working with inputs of variable lengths is to pad all the sequences with zeros to make their lengths equal to the length of the largest sequence. This padding is done with the pad_sequence function. PyTorch’s RNN (LSTM, GRU, etc) modules are capable of working with inputs of a padded sequence type and intelligently ignore the zero paddings in the sequence.

If the goal is to train with mini-batches, one needs to pad the sequences in each batch. In other words, given a mini-batch of size N, if the length of the largest sequence is L, one needs to pad every sequence with a length of smaller than L with zeros and make their lengths equal to L. Moreover, it is important that the sequences in the batch are in the descending order.

To do proper padding with DataLoader, we can use the collate_fn argument to specify a class that performs the collation operation, which in our case is zero padding. The following is a minimal example of a collation class that does the padding we need:

Note the importance of batch_first=True in my code above. By default, DataLoader assumes that the first dimension of the data is the batch number. Whereas, PyTorch’s RNN modules, by default, put batch in the second dimension (which I absolutely hate). Fortunately, this behavior can be changed for both the RNN modules and the DataLoader. I personally always prefer to have the batch be the first dimension of the data.

With my code above, DataLoader instance is created as follows:

The last remaining step here is to pass each batch to the RNN module during training/inference. This can be done by using the pack_padded_sequence function as follows:

## OpenGL not Found, “cannot find -lGL”, and Other Issues with NVIDIA Drivers

Under Ubuntu,  sometimes when the NVIDIA drivers are installed, CMake cannot properly find OpenGL (specifically libGL.so). As a result, linking issues may occur.

The easiest fix for this is to first create a symlink of nvidia-current to nvidia-xxx where xxx is the version of the current NVIDIA driver installed (e.g.  nvidia-387) . Then create a symlink from /usr/lib/libGL.so to /usr/lib/nvidia-current/libGL.so.

The command would be:

Note that having nvidia-current symlink is generally a good idea as it can prevent other linker errors when it comes to finding NVIDIA-specific libraries. Also, remember to update this symlink whenever NVIDIA drivers are also updated.

## Regional and Locale Settings Affects Parsing Decimal Strings in C#

Yesterday, someone who was trying to use Jackknife, our gesture recognition library, reported that he couldn’t reproduce our results with our C# code. Naturally, we started looking into the issue.

Strangely enough, the code worked fine for me (both Debug and Release configurations). I then built the code under multiple versions of Visual Studio, tried both Windows 8 and Windows 10, and even built the code on Mono under Ubuntu. Everything was working as expected. He even sent me his compiled binaries so that I could try his build, yet to my surprise I was still getting correct outputs on all my machines.

Since our C++ code was working fine for him, I knew the issue was connected to .NET somehow. While trying to figure out what was causing this mind-boggling issue, I noticed that he was in Europe and I got this crazy idea that his operating system’s locale settings were interfering with something somewhere. As a last resort, I changed the regional settings on my computer to his region and voila! I was able to reproduce his problem!!

It turned out that in his region, decimal numbers are separated by commas, rather than periods, which is common in the US. Consequently, in that region the decimal number 123.45 is written as 123,45. In our tool, we were manually parsing CSV files. Since we hadn’t thought of issues like this, we were naively calling double.Parse() on strings containing decimals. Therefore, .NET was using the operating system’s locale settings to parse the string, giving completely different results than the ones we expected.

One easy hack to fix this issue was to force the locale of the current thread to “en-US” locale. The code excerpt below does exactly that:

In our C++ code, however, we were using std::stod which does not care about locale settings, so it was working fine.

## Orthonormalize a Rotation Matrix

If you use a 3×3 R matrix to store the result of the multiplication of a series of rotation transformations, it could be the case that sometimes you end up with a matrix that is not orthonormal (i.e. det(R) != 1 and R.R' != eye).

In such cases, you need to re-orthonormalize the rotation matrix, which can be done in either of the two following ways:

1. Use the SVD decomposition as follows (MATLAB syntax):
2. Alternatively, you can express the rotation matrix as a quaternion, normalize the quaternion, then convert the quaternion back to the rotation matrix form. In MATLAB, you could do this:

Note that the above syntax requires MATLAB’s robotics toolbox.

## Visual Studio Setup Blocked or Can’t Uninstall

For some reason a Visual Studio 2015 Community Edition installation I had on one of my machines was experiencing problems: solution files were not loading properly, and what’s worse is that installation instance had gone totally missing from “Uninstall a Program” in Windows Control Panel. Also, every time I tried running the setup file vs_community.exe from the ISO image, I would get an error saying “The product version that you are trying to set up is earlier than the version already installed on this computer”.

What I ended up doing was to run the setup file with the two following switches:

vs_community.exe /uninstall /force

Luckily, it went forward with uninstalling the instance I had on my machine.

## Practical Kinect Stereo Calibration for the Highest Accuracy

I’ve been meaning to write up this post for a while, but I’ve been putting it off 🙂

The release of the Kinect sensor by Microsoft spawned a plethora of research in robotics, computer vision and many other fields. Many of these attempts involved using Kinect for purposes other that what it was originally meant for! That pretty much involves anything other than gesture recognition and skeleton tracking.

Even though Kinect is a very capable device, many people (including us!) don’t get the fact that Kinect was simply not designed for being used as an all-purpose depth and RGB camera. This observation is further bolstered by the fact that the Kinect was not released with a public API, and the official SDK which was released much later was missing a lot of functionality critical to various computer vision applications. For instance, the SDK does not provide a built-in functionality for getting color camera and depth camera calibration values, extrinsics and distortion models. Rather conveniently, the SDK designers simply chose to provide the users with simple functionality, like mapping pixels from world coordinates to RGB coordinates (but no way to do a backproject an image or depth point to ray). Needless to say that in many practical computer vision applications, one needs to constantly re-calibrate the camera in order to minimize the error caused by the mis-calibration of the camera.

Nevertheless, the choice by the SDK designers is understandable: there are proprietary algorithms and methods that are implemented in the Kinect software layer and it may not always be possible to give public access to them. Also, 3rd party open source libraries (such as libfreenect) have tried to reverse-engineer many of the innerworkings of the Kinect sensor and supply the end user with a set of needed functionalities.

With all this, let’s get started! (If you are impatient, feel free to jump to the list of tips I have compiled for you at the end of this post). I have also included PDFs containing checkerboards suitable for printing on large sheets, PVC or aluminum dibond.

## Computing the Distance Between a 3D Point and a Plücker Line

In order to solve an optimization problem with the goal of reducing the distance between a bunch of 3D points and lines, I was looking for the correct way of finding the distance between 3D points and a Plucker line representation.

The Plucker line $$L$$ passing through two lines $$A$$ and $$B$$ is defined as $$L = AB^T – BA^T$$ (for more details refer to [1]). After a lot of looking, I found that there is a simple method for finding this distance in [2]. A direct quote from the paper:

A Plucker line $$L = (n, m)$$ is described by a unit vector $$n$$ and a
moment $$m$$. This line representation allows to conveniently determine
the distance of a 3D point $$X$$ to the line

$$d(X, L) = ||X \times n – m||_2$$

where $$\times$$ denotes a cross product.

[1] Hartley, Richard, and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge university press, 2003.

[2] Brox, Thomas, et al. “Combined region and motion-based 3D tracking of rigid and articulated objects.” IEEE Transactions on Pattern Analysis and Machine Intelligence 32.3 (2010): 402-415.

## Tracking down BadImageFormatException in C#

I know It’s been quite some time since my last post, but what can I say? I’ve been quite busy 🙂 Here is the latest issue that I got stuck on for a few hours, and I thought I’d share what worked in the end.

I was working on a large C# project that had a lot of dependencies on native C++ DLLs in Visual Studio 2015. I had compiled all the 3rdparty dependencies to DLLs myself and had included them in the C# project. On my main development machine, everything was working fine, but I couldn’t get the project to work on another machine with a clean Windows 10 install. The project would compile fine but would fail to run with BadImageFormatException.

From prior experience, I knew this error had to do with a mismatch between dependency architectures somewhere (e.g. 32 bit binary is trying to load a 64 bit binary or vice versa). However, this time I was so sure that wasn’t the case because I had built almost all the DLL dependencies myself and I knew that everything was built in x64 mode. Also, it was working fine on my development machine.

After a few hours of frustration and using the Fuslogvw.exe tool without obtaining much more insight about the problem, I decided to use Dependency Walker to see what’s happening. This great little tool did magic! I noticed that one of the dependencies that I hadn’t built myself was actually a 32-bit binary. The reason this was not causing an issue on my development machine was that as bad luck would have it, the 64-bit version of that dependency was in my PATH variable on that machine, so my C# project would load up the correct DLL. However, on other machines with a clean Windows installation, this wasn’t the case and because the only found binary was the 32-bit version, I would receive that dreaded BadImageFormatException.

## librealsense with ARM support

Recently, I was able to successfully use RealSense R200 on my NVIDIA Jetson TX1 with librealsense. I had to replace some SSSE3 instructions in the code to get it to compile under ARM. I created a fork of librealsense on Github with all the changes I made. Check it out here: ttps://github.com/Maghoumi/librealsense

R200 works at 60 FPS with my Jetson TX1 flawlessly! 🙂