Steven Riddle 2008/2009

Stereo Vision based Obstacle Avoidance

Abstract

The purpose of this paper is describe an algorithm capable of running on embedded hardware. The hardware to be used is a Beagleboard, which is a single-board computer using an OMAP3 ARM processor capable of 600 mhz. The algorithm described can run real-time on this hardware because its resolution is reduced to reduce computation. Using the optimized reduced resolution algorithm, the detail in the depth map is still sufficient to avoid obstacles.

Introduction

Stereo computer vision is a problem in computer science with many solutions, each with their own problems. Given two images that were taken at the same height and angle, I attempted to find the depth of every pixel in those images. Our brains solve this problem 30 times a second to provide us with depth perception. Stereo computer vision is a branch of computer science that attempts to solve this problem and create disparity maps giving depth approximations for any given set of images.

In this paper I will be discussing two stereo algorithms, the hardware and software used in constructing my robot and the reasons why I chose them. There are also several major problems with stereo algorithms, such as occlusions, texture-less areas and repetition, which will also be discussed here.

Algorithm

Stereo computer vision has always been a computationally intensive task. The algorithm must match pixels on one image to the corresponding pixels in another image. Any image processing is at least an O(n2) algorithm and since most stereo algorithm use small windows to match pixels in one image to those in another. This increases the algorithm to O(n2 * w2 * d) where w is the size of the window and d is the range of disparities we are searching through.

Most stereo algorithms use the epi-polar constraint in order to reduce the window search to a one dimensional search. The idea is that any pixels along a scanline in one image will have its corresponding pixels on the same scanline in the other image. If the pixels were not on the same scanline, the search would have to be two dimensional which would increase the complexity of the algorithm.

The first algorithm I attempted to use was a Sum of Absolute Differences stereo algorithm. The principle behind this algorithm is to use a window (a small region) of pixels in right eye image and subtract the pixels from another window of the same size in left eye image. The process then repeats itself by moving the window in the left image along a line. The line represents the possible levels of disparity between the two images. The larger the disparity is, the closer the object is to the viewer. The smaller the disparity is, the further away it is from the viewer. A disparity of 0 represents infinity, as an object extremely far away will appear at the same position for both eyes. The process assumes that the epi-polar constraint is satisfied. We use the absolute of the subtraction of the pixels from one window to another and sum them up into a result. The result with the lowest absolute difference is assumed to the correct disparity (depth).



The next algorithm I tried was the Sum of the Squared Differences algorithm. The algorithm is very similar to the SAD algorithm but instead of taking the absolute value of the difference, you take the squared. The result is always positive and differences larger than 1 between the two images at that disparity affect the sum more because they are squared. As can be seen from figures 4 and 5 (SSD) and figures 2 and 3 (SAD) there is not much difference in using SSD to SAD. Therefore for optimization purposes I have opted to use SAD since it requires less multiplication.



While using this algorithm, one notices that there is a problem with the matching process. In the event that there is little or no detail in an area of the image, the matching function will find that there are many low matches for its window in the other image. Since there is no unique feature to ensure only one match, the result is generally incorrect. Take the kitchen example in figure 1. The area inside the rectangle in the left image is without texture or unique features. The region in the right images shows the range of disparty searched. The region of search also has no unique features either and so therefore the entire search range will provide a low match for the SAD algorithm.


Fig. 1 - Kitchen Stereo Image set

To get around the problem of regions without texture, I increased the size of the window along the x axis until there was an area with detail in it. The enlarged region would then have a higher chance of matching correctly. It still has problems, though, because certain boundaries of an object may not have a hard edge behind them, so enlarging the region may include areas which have a change in disparity. Finding a threshold for how much change is in a window proved to impossible without either including areas not part of the object or being too sensitive and not including a large enough area to provide an accurate match. A better method that I did not have time to implement could have been to use color segmentation.

Instead of using the SAD algorithm in the standard fashion which uses a window to calculate disparity for a single pixel, I opted to use the window to calulate the disparity for the entire window. Although this leads to artifacts when using a small window, as can be seen in figure 2, using a larger window produces far fewer artifacts. Even with a large window, artifacts can still occur in regions of low detail, and areas are also missed, such as the arm of the lamp in figure 3. I used the disparity of the window for the entire region of the window, as opposed to center pixel of the window , for optimization purposes. To ensure that the algorithm can be executed with low hardware requirement I have reduced the algorithm to O(n2*d) by removing the need to calculate a window per pixel.


Tsukuba Stereo Image set
Fig. 2 - SAD 3x3 Window Fig. 3 - SAD 21x21 Window
Fig. 4 - SSD 3x3 Window Fig. 5 - SSD 21x21 Window

Hardware

Beagleboard


The Beagleboard is the mobile computer I used in building my robot. The processor is a 600mhz ARM Cortex-A8 processor. It is a system on a chip board with a video processor capable of OpenGL ES 2.0 and a DSP capable of HD video acceleration.

The Beagleboard presented some problems, such as the lack of OpenGL drivers and lack of documentation on the C64x processor for most of the time that I spent on my thesis. It also had some difficulties with USB support. The board has only one USB port, so a hub had to be used. The USB hub also needed to be powered, as the Beagleboard's single USB port supplies only 100mA of power. The board's USB 2.0 support on the OTG (On The Go) port at the bottom of the image is broken. The broken support was causing buffer problems with my webcams with any USB 2.0 hubs, so a USB 1.1 powered hub had to be used to force the camera to use USB 1.1 instead of 2.0.

Linksys USBHUB4C ProConnect Compact USB 4-Port Hub


The hub used to connect the cameras and the Minyboard to the beagleboard. This hub was particularly hard to find, as it is outdated technology and most USB 1.1 Hubs were not powered.

Minyboard


The motor controller board for the robot. An ATMega168 Microcontroller that can be commanded via a USB connection using a CR2101 USB to serial connection. The board can use pulsed width modulation to control the speed of the motors.

The Minyboard was easy to set up and flash with a simple program to accept commands using the USB connection. The program written used the upper 4 bits of the byte sent to represent the left motor's speed and direction and the lower 4 bits to represent the right. The Minyboard was programmed to keep the motors at the last given speed until sent another byte.

Lithium Ion Battery


An old laptop battery from a Dell was used as the power supply for the robot. The battery used to be a 14.8V 3600mAh battery but was converted to 7.4V 7200mAh instead as there was no need for the higher voltage and increased battery life would be more beneficial.

Dalek Toy


A toy Dalek was used as the shell for the robot because of its maneuverable base and because I'm a fan of Doctor Who. Two holes were drilled in the front of its base so the cameras could be mounted on the inside. The base has been modified with a wood-based mounting board for mounting all the components. The Dalek has 2 motors in the back and a centered free moving wheel in the front, which makes it maneuverable like a tank. Rotating in place was very desirable as it makes it less likely to hit anything while it rotates. The original 4-AA-battery power supply is still used to power the Minyboard, which powers the motors.

Logitech Quickcam Connect


These are the cameras that are used in my robot. I have found several drawbacks to using these cameras, which have caused many problems for my thesis. These webcams only support adjusting Auto-Whitebalance (boolean), Brightness, Contrast and Gain. They have a maximum resolution of 640x480 but were used at 320x240 for speed.

There is still some auto-balance for brightness which cannot be turned off, which does cause problems when using an SAD or SSD algorithm. Since one image may be brighter than the other, what normally should be a perfect match will leave a uniform "leftover" result from the algorithm. If this leftover result is large enough another disparity level may have a lower result than the correct disparity.

Software

The BeagleBoard OS: Linux

Linux was the best choice because I was familiar with it and it is open source. I tried several kernel versions until I found one that worked correctly. Starting with 2.6.26-r64, I had problems with USB and was receiving Bad Huffman codes from my webcams. After several other, different, versions compiled I found that 2.6.28 had the fix needed to remove the Bad Huffman codes. The drivers used in the kernel for the cameras was the GSPCA driver using the zc3xx module. The drivers required to get the Minyboard running was the CR2101 kernel driver. The kernel driver is actually the driver for the CR2101 serial to USB chip used on the Minyboard. The Beagleboard also uses its own OMAPFB driver for video display, a simple framebuffer driver that allows a framebuffer console and X windows to be used.

Minyboard

The software written for the Minyboard was simple. There is no OS for the Minyboard. The program was written and flashed directly to the ATMega168 processor. The program opens a serial connection and waits for a byte. Once the byte is received it splits the byte into its upper and lower 4 bits. The upper 4 bits control the right motor and the lower 4 bits control the left. Both sets of 4 bits are interpreted as signed so that the motors can work in reverse.

Stereo Vision Program

The stereo vision program was written based on a free program called SVV (Simple Video Viewer). The code is a simple example of how to retrieve webcam images using V4L (Video for Linux). The example was modified to use 2 cameras and perform an SAD algorithm on the two retrived images. The resulting disparity map was combined with the two frames to display all three images at once in a window for testing purposes, but is removed in the final result. The program was also modified to accept two images from the console.

Further modification to the program was made so that the robot could make decisions based on the disparity map found. Using information from the disparity map, the robot would decide to go forward, turn left or turn right. The robot splits the disparity image it sees into 3 regions: left, center and right. The robot decides to move forward if there are no close disparities in all three regions. If there are disparities close in the right region it will turn left. If there are disparities close in the left region it will turn right. If there are close disparities in all regions it will continue turning in the direction it last turned. A close disparity in a region is a window that has a disparity of more than 80% of the maximum disparity range.

Assembling the Robot

The robot's base had several areas where it was possible to screw on a wooden base that I would use to mount all the hardware. I built a three-level framework, as can be seen in figure 7. At the bottom level is the battery and the voltage regulator. The voltage regulator was required to lower the voltage of the battery to 5V for the Beagleboard and the USB hub. The voltage regulator is connected to a switch below the bottom level for switching it on and off. The voltage regulator's output is connected to the Beagleboard and the USB hub.

On the middle level there is the Minyboard and the USB hub. The Minyboard draws its power from the orignal battery area provided with the toy. The Minyboard connects to the USB hub by using a cable that has had one male connector removed in order to solder it directly to the Minyboard. The USB hub's USB male connector has been removed and the wires are directly soldered to the Beagleboard which has also had its female connector removed. I removed the connectors because they take up too much space. On the third level is the beagleboard. It is held up at three points by three motherboard spacers.

The shell of the robot has two holes dremelled in the front. Glued inside the shell are the two Logitech Quickcam cameras as can be seen in figure 8. These cameras connect to the USB hub using standard connectors. Figure 9 shows the entire shell.

The result is in figure 10 which is the fully assembled Dalek. After switching the Dalek robot on, Linux boots and initializes the cameras and Minyboard and then executes my stereo vision program.


Fig. 7 - Dalek base with hardware mounted to wooden frame

Fig. 8 - Dalek shell with cameras in dremelled holes

Fig. 9 - Dalek shell

Fig. 10 - Fully Assembled Dalek

Problems Encountered with Stereo Vision

During my development on stereo vision I found that the main problem with most algorithms are the following:

Occlusions: When one pixel in one image is not present in the other image because it is covered by a foreground object, the pixel should not be matched to anything.

Texture-less Areas: When an area is approximately the same color over a whole patch, like a white wall, there is a problem with window matching in figuring out which possible match is most likely, as the rest of the wall in the matching image will have many false positive matches.

Depth Discontinuities: When using window matching, the window being used can only return an accurate disparity measurement if there is only one disparity in the window. When multiple disparities occur in a window this can lead to false matches because the algorithm will be most likely to select the disparity that represent the majority of the window and so therefore would be incorrect about the rest of the window.

Repetition: When a textured area repeats, this also causes matching problems because the repeated areas will cause false matches if one of the repetitions has a lesser SAD than the actual matching window.

I have also found during my development that most solutions to these problems seem to either consume large amounts of processing power or will end up trading one kind of a problem for another. For example, when using the technique with enlarging the window area to include areas with detail when there is insufficient detail to a region, this solution does solve the problem in areas without detail but can overrun into regions with sufficient detail under certain circumstances and give a false result for that area.

Conclusion and Future Development

I have enjoyed working on this project and I will continue to improve the Dalek. The Dalek can avoid most objects using its reduced-resolution disparity map. It does fail, however, when faced with a large textureless area. In order to avoid this, the Dalek would need to keep track of disparities it has seen in a form of a point cloud, which could be a future development. In my point of view, since the software can avoid obstacles under most circumstances it is a success.

I can see myself improving on the Dalek in the future by converting its SAD algorithm into a pixel shader algorithm which the Beagleboard does support. A pixel shader algorithm is a program that runs on video cards. Pixel shaders use a vertor-based assembly language which can be used to manipulate RGBA values quickly, and has certain operations such as reciprocal square root that run in one clock cycle, which are normally very costly operations on a processor. A variance normalised correlation algorithm could be used in a pixel shader very efficiently. The advantage of this algorithm is that different levels of brightness in the images received from the cameras would not affect the algorithm. This algorithm uses many costly operations such as square roots, which were considered too costly to put in this project due to limitations of hardware at the time. The OpenGL ES 2.0 drivers have not been available during the time spent on this project. Once available I plan to modify the algorithm to use pixel shaders.

Another modification that I plan to add is the ability to control the robot from an iPhone and display the disparity map, left and right camera image on the phone aswell. The Dalek will need a WIFI adapter added to it and the iPhone will be able to control the Dalek using its accelerometers and its WIFI.