Description
Overview
Camera pose estimation is a very robotic vision common task—we typically wish to know the pose (position and orientation) of the camera in the environment at all times. Pose estimation is also closely related to camera calibration. The general problem of pose estimation can be tricky (because you need to know something about the scene geometry). In this assignment, you will learn how to estimate the pose of the camera relative to a known object, in this case a planar checkerboard camera calibration target of fixed size. The goals are to:

provide practical experience with image smoothing and subpixel feature extraction, and

assist in understanding the nonlinear least squares optimization algorithm.
The due date for assignment submission is Monday, October 17, 2020, by 11:59 p.m. EDT. All submissions will be in Python 3 via Autolab; you may submit as many times as you wish until the deadline. To complete the assignment, you will need to review some material that goes beyond that discussed in the lectures—more details are provided below. The project has four parts, worth a total of 50 points.
Please clearly comment your code and ensure that you only make use of the Python modules and functions listed at the top of the code templates. We will view and run your code.
Implementation Details
For problems in which the pose of the camera must be determined with a high degree of accuracy (e.g., camera tracking for high fidelity 3D reconstruction), it is not unusual to insert a known calibration object into the scene; knowledge of the geometry of the calibration object can then be used to assist with pose estimation (assuming that the object is visible).
As part of this assignment, you will estimate the camera pose relative to a checkerboard target whose squares are 63.5 mm in size (on a side). You may assume that the checkerboard is perfectly flat, that is, the z coordinate of any point lying on the board is exactly zero (in the frame of the target). Sample images are shown in Figure 1. You may also assume that each image has already been unwarped to remove any lens distortion effects (you will be estimating pose parameters only; the intrinsic parameters are also assumed to be known already).
(a) (b) (c) (d) (e)
Figure 1: Sample images for use when testing your pose estimation algorithm.
ROB501: Computer Vision for Robotics Assignment #2: Camera Pose Estimation
Part 1: Image Smoothing and Subpixel Feature Extraction
To determine the camera pose, you will need to carefully extract a set of known feature points from an image of the target. Correspondences between the observed 2D image coordinates and the known 3D coordinates of the feature points (landmarks) then allows the pose to be determined (see Part 4). Typically, for a planar checkerboard target, crossjunction points are used, for two reasons: 1) they are easy to extract and 2) they are invariant to perspective transformations. A crossjunction is defined as the (ideal) point at which the diagonal black and white squares meet. In the sample images, the number of crossjunctions is 8 × 6 = 48.
There are variety of ways to identify the crossjunctions (for example, using the Harris corner detector). Usually, the coarse estimates of the crossjunction positions in each image are then refined using a saddle point detector, such as the one described in the following paper (included in the assignment archive):
L. Lucchese and S. K. Mitra, “Using Saddle Points for Subpixel Feature Detection in Camera Calibration Targets,” in Proceedings of the AsiaPacific Conference on Circuits and Systems (APCCAS), vol. 2, (Singapore), pp. 191–195, December 2002.
The saddle point is the best subpixel estimate of the true position of the crossjunction. Images are typically smoothed with a Gaussian filter prior to computing the saddle points—this is done because, in many cases, the crossjunctions are not clearly defined (due to, e.g., image quantization errors). If you zoom in on one of the sample images, you may notice this effect.
Your first task is to write a Python function that computes the position of the saddle point in a small image patch. This can be carried out by fitting a hyperbolic paraboloid to the smoothed intensity surface. The relevant fitting problem is defined by Eqn. (4) in the Lucchese paper, and is solved (unsurprisingly) using linear least squares! For this portion of the assignment, you should submit:

A single function in saddle_point.py, which accepts a small image patch as input and attempts to find a saddle point using the algorithm described in the paper. Note that the image coordinates of the saddle point should be returned with subpixel precision (i.e., as doubleprecision floating point numbers).
You will have access to the SciPy gaussian_filter function, which will perform image blurring (smoothing with a symmetric Gaussian kernel of fixed standard deviation)—feel free to try it out! For convenience, all of the Autolab tests for Part 1 use patches that have been preblurred in advance.
Note that two steps are required to find the saddle point: Eqn. (4) must be solved first, for the parameters α,

, γ, δ, ϵ, and ζ; the coordinates of the saddle point are then found using the next equation in the paper.
Part 2: Extracting All CrossJunctions
Your second task is to implement a Python function that returns an ordered list (rowmajor, from top left) of the crossjunction points on the target, to subpixel precision as determined by the saddle point detector. In every case, we will provide a bounding polygon that encloses the target (where the bounding polygon has points ordered clockwise from the upper left corner of the target); the upper left crossjunction should be taken as the origin of the target frame, with the xaxis pointing to the right in the image and the z axis extending into the page. Using this information, you will be able compute the metric coordinates of each (ideal) crossjunction (in metres)—we have also provided a world_points.npy file that contains this set of 3D coordinates. For this portion of the assignment, you should submit:

A single function in cross_junctions.py, which accepts an image and a bounding polygon and extracts all of the crossjunctions on the planar target, returning their coordinates with subpixel precision (in rowmajor order). The function should also accept the 3D (world) coordinates of the points (landmarks). The number of image features you extract must be the same as the number of landmarks.
ROB501: Computer Vision for Robotics Assignment #2: Camera Pose Estimation
The first set of (x, y) bounding polygon coordinates will always be closest to the upper leftmost crossjunction on the target (this should make ordering easier). Note that you may need to copy and paste the ‘innards’ of your saddle_point.py function into the appropriate section of the cross_junctions.py function. You should develop a novel way to coarsely localize the crossjunctions, followed by subpixel refinement with the saddle point detector. Note that this part of the assignment may require substantial effort.
Part 3: Camera Pose Jacobians
Upon completing Parts 1 and 2 of the assignment, you should have a function (in cross_junctions.py) that produces a series of 2D3D feature correspondences that can be used for pose estimation. You will implement pose estimation using a nonlinear least squares (NLS) procedure that incorporates all of the available data.
The solution to the PnP problem (i.e., PerspectivenPoints) is described in Section 6.2 of the Szeliski text. Although there is a linear algorithm for the problem, it does not work when all points are coplanar. Instead, we will provide you with an initial guess for the camera pose (±10^{◦} and 20 cm, approximately). You will need to know the camera intrinsic calibration matrix, which in this case is






564.9
0
337.3
0
0
1
K =
0
564.3
226.5
_{,}





where the focal lengths and principal point values are in pixels. To make things easier, we will use an Euler angle parameterization for the camera orientation. It will be necessary to solve for six parameters: the x, y, and z camera translation, and the roll, pitch, and yaw Euler angles that define the camera orientation. Note that we wish to solve for the pose of the camera relative to the target, T_{W C} (i.e., a 4 × 4 homogeneous pose matrix). When expressed in terms of the roll (ϕ), pitch (θ), and yaw (ψ) Euler angles, the rotation matrix that defines the orientation of the camera frame relative to the world (target) frame is

C_{WC}(ψ, θ, ϕ) = C(ψ) C(θ) C(ϕ)
(1)
cos ψ
− sin ψ
0
cos θ
0
sin θ
1
0
0
(2)
=
sin ψ
cos ψ
0
0
1
0
0
cos ϕ
− sin ϕ
0
0
1
− sin θ
0
cos θ
0
sin ϕ
cos ϕ
cos ψ cos θ
cos ψ sin θ sin ϕ − sin ψ cos ϕ
cos ψ sin θ cos ϕ + sin ψ sin ϕ
(3)
=
sin ψ cos θ
sin ψ sin θ sin ϕ + cos ψ cos ϕ
sin ψ sin θ cos ϕ − cos ψ sin ϕ
.
− sin θ
cos θ sin ϕ
cos θ cos ϕ
In order to compute the NLS solution, you will need the (full) Jacobian matrix for the image plane points (observations) with respect to the (six) pose parameters. The full Jacobian is composed of a series of 2 × 6 submatrices, stacked vertically. For this portion of the assignment, you should submit:

A single function in find_jacobian.py that computes a 2 × 6 Jacobian matrix for each image plane (feature) observation with respect to the camera pose parameters.
We will use the pinhole projection model, where the image plane coordinates of the projection of landmark j, with 3D position p¯_{j}, are







˜
˜
−1 _{¯}
(4)
xij ^{=} K ^{(}TW C_{i} ^{)}
pj






for camera pose i. Two helper functions to convert between Euler angles and rotation matrices are available on Autolab (and in the code package that accompanies this assignment document) to assist you.
ROB501: Computer Vision for Robotics Assignment #2: Camera Pose Estimation
Part 4: Camera Pose Estimation
The final step is to set up, and then solve, the nonlinear system of equations for pose estimation, using nonlinear least squares. For this portion of the assignment, you should submit:

A function in pose_estimate_nls.py, which accepts an intrinsic calibration matrix, a set of 2D3D correspondences (imagetarget) and an initial guess for the camera pose, and performs a nonlinear least squares optimization procedure to compute an updated, optimal estimate of the camera pose.
For testing, you may use the example images included in the assignment archive.
Grading
Points for each portion of the assignment will be determined as follows:

Saddle point function – 12 points (4 tests × 3 points per test)
Each test uses a different (preblurred) image patch containing one crossjunction. The estimate of the (subpixel) position of the saddle point must be within 1.0 pixels of the reference position to pass.

Crossjunction extraction – 15 points (3 tests × 5 points per test)
Each test uses a different image (of the calibration target); to pass, the extraction function must return the full set of image plane points (48), and the average position error (over all crossjunctions) must be less than 3 pixels.

Jacobian function – 11 points (2 tests; 6 points and 5 points)
The computed Jacobian matrix (for each test) is checked for accuracy—there is an exact solution for each camera pose and landmark point.

NLS pose estimation function – 12 points (3 tests × 4 points per test)
There are three tests, each of which uses a different (holdout) image of the calibration target. The returned, optimized pose solution must be ‘close’ to the reference solution (within 4 cm and 2^{◦}).
Total: 50 points
Grading criteria include: correctness and succinctness of the implementation of support functions, proper overall program operation and code commenting, and a correct composite image output (subject to some variation). Please note that we will test your code and it must run successfully. Code that is not properly commented or that looks like ‘spaghetti’ may result in an overall deduction of up to 10%.
4 of 4