Auto Focus Implementation


To properly focus a camera, the computer will need to be able to evaluate the relative quality of focus as the lens is adjusted. This is easy for a human, but not so simple for a computer.

There are several proven approaches to determining focus quality using mathematics and image processing. These typically rely on the observation that, compared to out-of-focus images, in-focus images tend to have brighter bright areas, darker dark areas, and increased sharpness and detail. Mathematically, this translates into wider dynamic range, higher spatial frequencies and higher gradient slopes, respectively.

Dynamic range was ruled out because it is sensitive to varying light, a serious issue in welding. Of the remaining two categories, gradient analysis was chosen over frequency analysis (FFT) because it is a very fast calculation and is better suited to the most difficult welding setup to focus: grey, smooth metal.

The chosen calculation is called “Threshold Gradient Maximum” (TGM). In TGM, the first derivative is calculated at each pixel, and the values are summed over all pixels where the derivative exceeds a threshold value.

A standard Sobel image operator is used to calculate the derivative. The operator is applied twice, once along the horizontal and once along the vertical direction of an image. The Sobel matrices for the x and y directions in an image are:

$latex s_x = \left[\begin{array}{ c c c} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{array}\right]&bg=e1e1e1$

$latex s_y = \left[\begin{array}{ c c c} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{array}\right]&bg=e1e1e1$

To obtain a general result for edges with arbitrary orientation, the X and Y Sobel results (partial derivatives) are combined. The full Sobel result, “S”, is calculated like the hypotenuse of a right triangle.

$latex S = \sqrt{s_x^2 + s_y^2}&bg=e1e1e1$

Calculating a square root is expensive in CPU cycles, so it can be replaced by an approximation that sums the absolute values. Although geometrically inaccurate, this is still a good predictor of focus quality.

$latex S = |s_x| + |s_y|&bg=e1e1e1$
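The two ways of combining the Sobel results can be compared with a short sketch. This is a minimal pure-Python illustration, not the production code: the function names `sobel_at` and `sobel_map` are hypothetical, the image is assumed to be a list of rows of grey levels, and border pixels are simply skipped.

```python
import math

# Sobel kernels for the x and y directions, as given in the text.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1],
           [0, 0, 0],
           [-1, -2, -1]]


def sobel_at(image, row, col):
    """Return (gx, gy), the x and y Sobel results at an interior pixel."""
    gx = gy = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            value = image[row + dr][col + dc]
            gx += SOBEL_X[dr + 1][dc + 1] * value
            gy += SOBEL_Y[dr + 1][dc + 1] * value
    return gx, gy


def sobel_map(image, exact=False):
    """Combined Sobel result S for every interior pixel.

    exact=True uses the square-root form; the default uses the cheaper
    sum of absolute values.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(1, rows - 1):
        line = []
        for c in range(1, cols - 1):
            gx, gy = sobel_at(image, r, c)
            line.append(math.hypot(gx, gy) if exact else abs(gx) + abs(gy))
        result.append(line)
    return result


# A vertical edge: dark on the left, bright on the right.
edge = [[0, 0, 10, 10]] * 4
print(sobel_map(edge))  # [[40, 40], [40, 40]]
```

For a purely horizontal or vertical edge the two forms agree, since one of the partial derivatives is zero. They differ most on diagonal edges, where the absolute-value sum overestimates the true magnitude.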

If the video image is noisy, as is typical in low light where thermal noise adds “snow” to the scene, the Sobel function becomes overwhelmed and the true focus scores are obscured.

Experiments with actual weld scenes showed that a threshold that floats with the mean of the combined Sobel result is effective at suppressing image noise. The equation below shows the chosen threshold of 2.3 times the mean of the combined pixel scores.

$latex threshold = \frac{2.3}{\mathit{PixelCount}} * \sum_{Pixels} S&bg=e1e1e1$

The focus function value, or score, is the sum of the Sobel results that exceed the threshold for all pixels in the focus region:

$latex ImageScore = \sum_{Pixels} \left\{\begin{array}{ l l} S & \text{if } S > threshold \\ 0 & \text{otherwise} \end{array}\right.&bg=e1e1e1$
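The threshold and score calculations can be sketched in a few lines. This is an illustrative sketch only: the function name `tgm_score` and its `factor` parameter are hypothetical, and the input is assumed to be a flat list of combined Sobel results for the pixels in the focus region.

```python
def tgm_score(sobel_values, factor=2.3):
    """Sum the Sobel results that exceed a threshold floating with their mean."""
    if not sobel_values:
        return 0.0
    # Threshold is `factor` times the mean of the combined Sobel results.
    threshold = factor * sum(sobel_values) / len(sobel_values)
    # Only pixels above the threshold contribute to the score.
    return sum(s for s in sobel_values if s > threshold)


# Nine low-gradient (noise-like) pixels and one strong edge pixel:
# the mean is 10, so the threshold is about 23 and only the edge counts.
print(tgm_score([1] * 9 + [91]))  # 91
```

Because the threshold rises and falls with the mean, a uniformly noisy region with no strong edges scores zero: `tgm_score([5] * 10)` returns 0, since no pixel exceeds 2.3 times the mean.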

By repeating this process over a range of focus positions, we create a focus function whose maximum represents the position of best lens focus. An example focus function is shown below, with the actual best focus position highlighted with a red dot. The maximum function value identifies best focus, while the slope of the function around the maximum indicates the best direction to move for focus improvement.

Example Focus Function Over Image Frames Collected at Different Focus Positions


The image sequence below shows the progression of focus functions as the focus is changed. The scene has a white welding rod on a steel plate background.

In the center of each image, where the focus is calculated, the focus function is shown. Pixels whose gradient energy is below the threshold are shown in black, while those above the threshold are shown in grayscale (higher Sobel scores produce whiter pixels). The surrounding image is left unchanged for reference.

Visual Demonstration of Focus Function over a Range of Focus Positions


The image sequence starts in poor focus, moves through best focus, and then beyond. In the first frame (upper left), the focus function detects the rod edges with a sprinkling of white pixels, but most pixels remain black because their Sobel score (gradient energy) is below the threshold.

As the frames progress (top left to right, then bottom left to right), the edges of the rod become whiter and denser. The peak (best focus) is shown in frame 89 (lower left). After this, the edges fade again.


With the Sobel function, the computer can judge the relative quality of focus for each possible focus position. However, with over 800 possible positions, it can take 20 to 25 seconds to evaluate them individually.

To speed this process, a 3-layer search algorithm is employed:

1) Check if a focus can be found nearby (focus changes can be small between camera setups). A short search is done of positions near the current focus position. If a slope indicates that a peak is nearby, the search continues to locate it. This takes approximately 2 seconds.

2) If no focus is nearby (Sobel function is flat), a quick end-to-end search is performed. Using a large step size, e.g. 50, the system checks for indications of a nearby peak. If activity is found in one of the big-step samples, a local search, like that in step 1, is used to locate the actual peak. The time to find a focus if step 2 is required is approximately 7 seconds.

3) If the fast scan cannot find a peak, then the slow end-to-end scan of the full 800 focus positions is made, with the highest scoring position becoming the found focus point. Slow but sure, this full search takes up to 20 seconds.
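The three layers above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: `score_at` is a hypothetical callback that moves the lens to a position and returns the focus score there, positions are assumed to run from 0 to 799, and the `window` and `activity` parameters (how far to look nearby, and what score counts as "not flat") are invented for the example. A real system would also cache scores, since each call implies a lens move.

```python
def climb(score_at, start, lo, hi):
    """Follow the upward slope from `start` to the local peak position."""
    best, best_score = start, score_at(start)
    for direction in (1, -1):
        pos = best + direction
        while lo <= pos <= hi:
            score = score_at(pos)
            if score <= best_score:
                break  # the slope has turned; stop in this direction
            best, best_score = pos, score
            pos += direction
    return best


def find_focus(score_at, current, lo=0, hi=799,
               window=3, coarse_step=50, activity=1.0):
    """Return (best_position, layer_used) using the three-layer search."""
    # Layer 1: short search of positions near the current focus position.
    nearby = range(max(lo, current - window), min(hi, current + window) + 1)
    start = max(nearby, key=score_at)
    if score_at(start) > activity:
        return climb(score_at, start, lo, hi), 1

    # Layer 2: quick end-to-end scan with a large step size.
    start = max(range(lo, hi + 1, coarse_step), key=score_at)
    if score_at(start) > activity:
        return climb(score_at, start, lo, hi), 2

    # Layer 3: slow scan of every position; the highest score wins.
    return max(range(lo, hi + 1), key=score_at), 3


# Synthetic focus functions peaking at position 437 (illustrative only).
def broad(p):
    return max(0, 100 - abs(p - 437))


def narrow(p):
    return max(0, 5 - abs(p - 437))


print(find_focus(broad, current=430))   # (437, 1)
print(find_focus(broad, current=100))   # (437, 2)
print(find_focus(narrow, current=100))  # (437, 3)
```

The narrow peak falls between the coarse samples at 400 and 450, so only the full scan finds it, which is why the slow third layer is kept as a fallback.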