# Large-scale bundle adjustment in scipyΒΆ

A bundle adjusmtent problem arises in 3-D reconstruction and it can be formulated as follows (taken from https://en.wikipedia.org/wiki/Bundle_adjustment):

Given a set of images depicting a number of 3D points from different viewpoints, bundle adjustment can be defined as the problem of simultaneously refining the 3D coordinates describing the scene geometry as well as the parameters of the relative motion and the optical characteristics of the camera(s) employed to acquire the images, according to an optimality criterion involving the corresponding image projections of all points.

More precisely. We have a set of points in real world defined by their coordinates $(X, Y, Z)$ in some apriori chosen "world coordinate frame". We photograph these points by different cameras, which are characterized by their orientation and translation relative to the world coordinate frame and also by focal length and two radial distortion parameters (9 parameters in total). Then we precicely measure 2-D coordinates $(x, y)$ of the points projected by the cameras on images. Our task is to refine 3-D coordinates of original points as well as camera parameters, by minimizing the sum of squares of reprojecting errors.

Let $\pmb{P} = (X, Y, Z)^T$ - a radius-vector of a point, $\pmb{R}$ - a rotation matrix of a camera, $\pmb{t}$ - a translation vector of a camera, $f$ - its focal distance, $k_1, k_2$ - its distortion parameters. Then the reprojecting is done as follows:

\begin{align} \pmb{Q} = \pmb{R} \pmb{P} + \pmb{t} \\ \pmb{q} = -\begin{pmatrix} Q_x / Q_z \\ Q_y / Q_z \end{pmatrix} \\ \pmb{p} = f (1 + k_1 \lVert \pmb{q} \rVert^2 + k_2 \lVert \pmb{q} \rVert^4) \pmb{q} \end{align}The resulting vector $\pmb{p}=(x, y)^T$ contains image coordinates of the original point. This model is called "pinhole camera model", a very good notes about this subject I found here http://www.comp.nus.edu.sg/~cs4243/lecture/camera.pdf

Now let's start solving some real bundle adjusment problem. We'll take a problem from http://grail.cs.washington.edu/projects/bal/.

```
from __future__ import print_function
```

```
import urllib
import bz2
import os
import numpy as np
```

First download the data file:

```
BASE_URL = "http://grail.cs.washington.edu/projects/bal/data/ladybug/"
FILE_NAME = "problem-49-7776-pre.txt.bz2"
URL = BASE_URL + FILE_NAME
```

```
if not os.path.isfile(FILE_NAME):
urllib.request.urlretrieve(URL, FILE_NAME)
```

Now read the data from the file:

```
def read_bal_data(file_name):
with bz2.open(file_name, "rt") as file:
n_cameras, n_points, n_observations = map(
int, file.readline().split())
camera_indices = np.empty(n_observations, dtype=int)
point_indices = np.empty(n_observations, dtype=int)
points_2d = np.empty((n_observations, 2))
for i in range(n_observations):
camera_index, point_index, x, y = file.readline().split()
camera_indices[i] = int(camera_index)
point_indices[i] = int(point_index)
points_2d[i] = [float(x), float(y)]
camera_params = np.empty(n_cameras * 9)
for i in range(n_cameras * 9):
camera_params[i] = float(file.readline())
camera_params = camera_params.reshape((n_cameras, -1))
points_3d = np.empty(n_points * 3)
for i in range(n_points * 3):
points_3d[i] = float(file.readline())
points_3d = points_3d.reshape((n_points, -1))
return camera_params, points_3d, camera_indices, point_indices, points_2d
```

```
camera_params, points_3d, camera_indices, point_indices, points_2d = read_bal_data(FILE_NAME)
```

Here we have numpy arrays:

`camera_params`

with shape`(n_cameras, 9)`

contains initial estimates of parameters for all cameras. First 3 components in each row form a rotation vector (https://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula), next 3 components form a translation vector, then a focal distance and two distortion parameters.`points_3d`

with shape`(n_points, 3)`

contains initial estimates of point coordinates in the world frame.`camera_ind`

with shape`(n_observations,)`

contains indices of cameras (from 0 to`n_cameras - 1`

) involved in each observation.`point_ind`

with shape`(n_observations,)`

contatins indices of points (from 0 to`n_points - 1`

) involved in each observation.`points_2d`

with shape`(n_observations, 2)`

contains measured 2-D coordinates of points projected on images in each observations.

And the numbers are:

```
n_cameras = camera_params.shape[0]
n_points = points_3d.shape[0]
n = 9 * n_cameras + 3 * n_points
m = 2 * points_2d.shape[0]
print("n_cameras: {}".format(n_cameras))
print("n_points: {}".format(n_points))
print("Total number of parameters: {}".format(n))
print("Total number of residuals: {}".format(m))
```

We chose a relatively small problem to reduce computation time, but scipy's algorithm is capable of solving much larger problems, although required time will grow proportionally.

Now define the function which returns a vector of residuals. We use numpy vectorized computations:

```
def rotate(points, rot_vecs):
"""Rotate points by given rotation vectors.
Rodrigues' rotation formula is used.
"""
theta = np.linalg.norm(rot_vecs, axis=1)[:, np.newaxis]
with np.errstate(invalid='ignore'):
v = rot_vecs / theta
v = np.nan_to_num(v)
dot = np.sum(points * v, axis=1)[:, np.newaxis]
cos_theta = np.cos(theta)
sin_theta = np.sin(theta)
return cos_theta * points + sin_theta * np.cross(v, points) + dot * (1 - cos_theta) * v
```

```
def project(points, camera_params):
"""Convert 3-D points to 2-D by projecting onto images."""
points_proj = rotate(points, camera_params[:, :3])
points_proj += camera_params[:, 3:6]
points_proj = -points_proj[:, :2] / points_proj[:, 2, np.newaxis]
f = camera_params[:, 6]
k1 = camera_params[:, 7]
k2 = camera_params[:, 8]
n = np.sum(points_proj**2, axis=1)
r = 1 + k1 * n + k2 * n**2
points_proj *= (r * f)[:, np.newaxis]
return points_proj
```

```
def fun(params, n_cameras, n_points, camera_indices, point_indices, points_2d):
"""Compute residuals.
`params` contains camera parameters and 3-D coordinates.
"""
camera_params = params[:n_cameras * 9].reshape((n_cameras, 9))
points_3d = params[n_cameras * 9:].reshape((n_points, 3))
points_proj = project(points_3d[point_indices], camera_params[camera_indices])
return (points_proj - points_2d).ravel()
```

You can see that computing Jacobian of `fun`

is cumbersome, thus we will rely on the finite difference approximation. To make this process time feasible we provide Jacobian sparsity structure (i. e. mark elements which are known to be non-zero):

```
from scipy.sparse import lil_matrix
```

```
def bundle_adjustment_sparsity(n_cameras, n_points, camera_indices, point_indices):
m = camera_indices.size * 2
n = n_cameras * 9 + n_points * 3
A = lil_matrix((m, n), dtype=int)
i = np.arange(camera_indices.size)
for s in range(9):
A[2 * i, camera_indices * 9 + s] = 1
A[2 * i + 1, camera_indices * 9 + s] = 1
for s in range(3):
A[2 * i, n_cameras * 9 + point_indices * 3 + s] = 1
A[2 * i + 1, n_cameras * 9 + point_indices * 3 + s] = 1
return A
```

Now we are ready to run optimization. Let's visualize residuals evaluated with the initial parameters.

```
%matplotlib inline
import matplotlib.pyplot as plt
```

```
x0 = np.hstack((camera_params.ravel(), points_3d.ravel()))
```

```
f0 = fun(x0, n_cameras, n_points, camera_indices, point_indices, points_2d)
```

```
plt.plot(f0)
```

```
A = bundle_adjustment_sparsity(n_cameras, n_points, camera_indices, point_indices)
```

```
import time
from scipy.optimize import least_squares
```

```
t0 = time.time()
res = least_squares(fun, x0, jac_sparsity=A, verbose=2, x_scale='jac', ftol=1e-4, method='trf',
args=(n_cameras, n_points, camera_indices, point_indices, points_2d))
t1 = time.time()
```

```
print("Optimization took {0:.0f} seconds".format(t1 - t0))
```

Setting `scaling='jac'`

was done to automatically scale the variables and equalize their influence on the cost function (clearly the camera parameters and coordinates of the points are very different entities). This option turned out to be crucial for successfull bundle adjustment.

Now let's plot residuals at the found solution:

```
plt.plot(res.fun)
```

We see much better picture of residuals now, with the mean being very close to zero. There are some spikes left. It can be explained by outliers in the data, or, possibly, the algorithm found a local minimum (very good one though) or didn't converged enough. Note that the algorithm worked with Jacobian finite difference aproximate, which can potentially block the progress near the minimum because of insufficient accuracy (but again, computing exact Jacobian for this problem is quite difficult).

*Section author: Nikolay Mayorov*