Abstract: | Due to advances in technology the amount of images from portable cameras has increased tremendously in recent years. Nowadays, most new mobile phones and cars have multiple cameras. Drones and other robots are often equipped with cameras as well. To be able to capture an image is not the same as to understand the content of it. Computer vision deals with the task of giving machines the ability to make sense of images like humans do. %Hopefully they can surpass us one day.
Animals have no trouble of navigating in the world using visual information but this is still a problem for robots.
The subfield of computer vision that builds 3D models from 2D images is called structure from motion.
Structure from motion is a very complex problem, and the best known techniques use a pipeline structure that solves many subproblems in a sequence. Two of the papers in this thesis focus on the relative pose problem which is solved in most pipelines. In the relative pose step, the camera pose is estimated between two images at a time and the results are then used to construct a complete camera trajectory.
In the first paper we address the problem of merging relative poses.
Each relative pose has information about the relative position and rotation between two images. The paper focuses on the so called multiple rotation averaging problem, where the optimal set of camera rotations is calculated from relative ones. This is an overdetermined problem that often contains bad estimates of some of the pairwise rotations. We construct an algorithm based on Lagrangian duality that also can be used to verify if a solution is globally optimal.
We show by experiments that the algorithm finds the optimal solution for both real and synthetic data, if the amount of noise is reasonable.
In the second paper we focus on the special case of relative pose, where the orientation of the two cameras is known beforehand, by other sensors or previous calculations. The problem is to calculate the relative location of the cameras and we provide two different algorithms that have optimality guarantees. The first algorithm has a low order polynomial time complexity independent of the number of outliers among the set of corresponding 2D points. The second algorithm is a branch and bound approach that is orders of magnitudes faster in practice. We evaluate the algorithms on real and synthetic data and compare it to standard RANSAC, which in some hard instances deteriorates compared to our optimal methods.
The third paper is similar to the second paper, but differs in one fundamental way. We solve the pair-wise relative pose with unknown correspondences between the images. This means that we estimate the epipolar geometry and the corresponding points in one step instead of two. The method has optimality guarantees and is built on branch and bound together with bipartite matching. In experiments we show that this procedure can solve harder problems and that more inlier correspondences can be obtained by optimizing over pose and correspondences simultaneously instead of in two steps.
The fourth paper is about matching images to a given 3D model (absolute pose). In this paper we use a novel outlier rejection scheme for the case of known rotation. Our method can identify outlier correspondences between the 3D and 2D points in a fast and efficient way. When most of the outliers are removed the remaining problems are easier to solve. In the paper we evaluate the method on real and synthetic data and the results show improved performance compared to RANSAC. |