Sep 072010

The Masters Thesis is no more. Now there will only be a Master’s Project, which requires only a 25 page writeup instead of a 50 page one. I’m also working with a different adviser. Currently I’m somewhere on page 11 with the writeup and I’m not sure I have enough material to fill 25 pages. If I come up short it’s likely that I’ll add a section on loading the map data to a database for retrieval, or a section on using the full history of the traveled path in order to find the current road location more accurately (instead of the current method of matching to one of the two nearest roads based on angle).

 Posted by at 6:30 pm
Jul 272010

Thesis Progress

In this paper, progress made on the thesis of Aaron Johnson during the summer semester of 2010 is outlined. The purpose of the software is to recognize and distinguish the road from visual and other types of data. The software includes unmodified third party software, modified software, and custom algorithms.

Software and Algorithms Used

Figure 1
The current state of the software is pictured to the left. The upper-left square is the current location of the vehicle in latitude/longitude coordinates. The upper-right square is map data and the current location of the vehicle. The bottom-left image is incoming camera data, and the bottom-right image is a perspective view of the map data.

The hardware and software used is available to any consumer. The GPS device used is the “GPS 18x OEM” by Garmin, a USB device which echoes the current latitude and longitude once per second. The camera used is the Rocketfish 2MP USB camera, and the computer used has a 2.53 GHz Intel Core 2 Due processor and runs Mac OS X 10.5.8 with Parallels Desktop 4.0 used to run Windows XP SP2.

The live camera data shown in the lower-left of Figure 1 is a result of custom software created after researching Java Media Framework. Descriptions of the other software and algorithms used can be found below.

Reading the Current Location

After connecting the GPS device, third party software provided by Garmin called “Spanner” interfaces with the device and echoes standard NMEA sentences to a COM port. Thus, in order to read the data from the device, software is used to listen to the COM port.

For connecting to the COM port, existing software called SerTerm is used. It was necessary to read and understand the code used in this program in order to modify it for the needs of reading the GPS data. It was modified in the following ways to allow the location data to be used.

  • The GUI is hidden,
  • a connection is created automatically when the program starts,
  • the NMEA data read is parsed at regular intervals when it arrives,
  • the latitude/longitude data is converted from degrees, minutes, seconds to decimal degrees, and
  • the program is adjusted to save the location data to a custom Java interface.

Reading Tiger Map Data

The map data used is freely available “Tiger” map data provided by the US Government. In order to read this data, which comes in a binary format (“ShapeFile” format), a program named OpenMap 4.6.5 was used. OpenMap is “a Java Beans based toolkit for building applications and applets needing geographic information” (OpenMap). It provides a variety of functionality and interfaces and consists of about 350,000 lines of code. In order to read the ShapeFile data and display the road data visually as lines on the screen, the following was done.

The OpenMap documentation was reviewed and code for related classes was modified in order to make previously unavailable data accessible (line data is converted from radians to degrees using already existing OpenMap methods). The line data is exported to a comma separated values (CSV) file where each road and each piece of a road is represented as a set of multiple latitude/longitude coordinates.

Custom software was created to read the road data from the CSV files, convert the latitude/longitude points to points relative to the screen, and finally, to draw lines representing the road. This exercise in computer graphics involved understanding the relationship between screen pixels and geospatial data enough to convert between the two, define the viewing window size (zoom), and assure the aspect ratio remained correct during rendering regardless of the height and width of the viewing area.

The location data from the GPS is used in conjunction with the map data to display the current location of the vehicle, as shown in the upper-right of Figure 1 above. Unfortunately, the real-time GPS data and the map data are not perfect and therefore it’s necessary to normalize the two data sets in order to determine the true location and direction of travel of the vehicle. A number of different techniques were considered before the chosen technique.

An example of the incongruent data and the solution found follows. On any given road, the location data from the GPS does not correspond to a location exactly on the road – the location given by the GPS may be slightly to the right of the location of the road, for example. Suppose the vehicle is traveling North on a road and approaching an intersection and that the GPS data tells us the true location is slightly to the right of the road. Unfortunately, we cannot simply “snap” the current location to the nearest road and always have the current location be associated with the correct road. In this example, while traveling North towards an intersection, this method would at one point snap the current location to an East/West bound road instead of the correct one, since the location data may be off horizontally but not vertically. To solve this problem, the set of previous GPS coordinates is used to determine the current direction (angle) of travel. The angle of travel is then joined with the angle of the roads nearest the current GPS location and the current location is set as the nearest road with a corresponding angle of travel. This algorithm does not yield perfect results in all cases and can be improved on, but it is very effective in most cases.

Creating a Perspective View

Creating the perspective view image shown in the lower-right of Figure 1 involved learning about computer graphics and using tools available in Java. The perspective view is created from the rendered map data using a translation and transformation. As with the other software, additional custom programming was necessary to assure no extra lines were drawn and to assure the transformation would function correctly.

Edge Detection

Many academic papers covering road recognition used edge detection as one heuristic for accomplishing the task, and the same was done here. An edge detection algorithm was used to separate non-road surfaces from road surfaces in the 2D image from the camera, as pictured below in Figure 2.

Figure 2
The current state of the image detection algorithm is pictured to the left. The upper-left image is the original view of the road from the camera. The upper-right image is the image after edge detection is applied. The lower-left image is the edge detected image after a threshold is applied, and the lower-right image is the image with the threshold applied and the road highlighted in white.

The algorithm has a simple premise and as such, is sometimes effective and sometimes not. Edge detection is applied to the image and it is assumed that the road surface has little to no edges while non-road surfaces have many edges. As can be seen in Figure 2, this is especially true for trees, cars, and houses. One advantage of the algorithm is that after optimization it takes only about 32 milliseconds to apply it to an image. The disadvantages are many, and include all of the cases where there are additional unexpected edges on the road or little to no edges on a non-road surface, as shown in Figure 3 and Figure 4.

Figure 3
In this image shadows on the road create additional edges on the road surface which prevents the algorithm from detecting the complete road surface.

Figure 4
In this image a lack of edges on the trees on the right, grass on the left, and sidewalk on the right result in these non-road surfaces being highlighted.

The algorithm works as follows. Intensity values from the color image are used (the image is converted to black and white) before the edges are highlighted. To highlight the edges, the pixels are multiplied with a basic edge detect matrix ({{0,-1, 0},{-1,4,-1},{0,-1,0}}). This results in the non-thresholded edge detected image as can be seen in the upper-right of Figures 2, 3, and 4. The result is that the intensity values of edges are increased while the intensity values of non-edges are decreased. A simple threshold is applied to this image in the next step – if the intensity value of a pixel is above 27 then the pixel is kept as an “edge” and otherwise it is kept as a non-edge. Edges are assigned an intensity value of 100 while non-edges are assigned an intensity value of 0, and the result is displayed as shown in the lower-left of Figures 2, 3, and 4. The final step is searching through the image for areas where there are few edges. For every pixel in the image, the intensity values of the portion around that pixel are summed. If the sum is below a certain threshold, then the center pixel and the pixels around it are marked as road.

During the final phase of the algorithm some assumptions are made. For example, it is assumed that the road does not exist in the upper portion of the image, so this section is skipped during the search process. Additionally, a small portion of the lower part of the image is skipped, since in this application of the algorithm a portion of the vehicle can be seen. While generally it is safe to assume that the road will not exist in the top of the image, this will not be the case on mountain roads or in hilly areas like San Francisco, for example. Similarly, the very bottom of the image may contain road, depending on the location of the camera and the vehicle used.

A number of methods are used to speed up the algorithm described, with the effect that without these modifications it takes roughly 600 milliseconds to process one image, or about 200 times as long. All of the improvements to the algorithm are done by caching values which are used multiple times, or by caching the result of calculations which are done often ahead of time (before the algorithm is applied). An example of values that are cached is an RGB to intensity cache, which is simply a 3 dimensional matrix which when accessed with RGB values returns the intensity value computed from using the red, green, and blue values. Creating the matrix is as simple as going through every value of red, green, and blue from 0 to 255 and computing the intensity, which is simply

(((11 * red) + (16 * green) + (5 * blue)) / 32)

Computing these values once and caching them saves a significant amount of time when converting a color image to black and white intensity values. Possibly the most effective caching technique used, however, is used in the final step when summing the number of edges within a certain area near each pixel. Since every pixel in the image is searched and the pixels around every pixel are summed, many pixels are used multiple times during the summations. Caching these sums is complicated compared to the other cached values because it must be done while the algorithm is being executed instead of ahead of time, as the results are specific to each image being processed, and because which values are cached is dependent on the implementation of the algorithm. In the program as shown, pixels are searched from left to right so that the sum of each column being considered can be computed, cached, and reused when the area around the next pixel is computed.

The current implementation of the final algorithm searches a certain area of the image and does not take into account which areas were previously marked as road. An improvement to the algorithm is possible by beginning the search in the lower-center of the image and then “growing” the road section outwards only where each new section of the road directly touches an old section of road. This will likely have slightly better results for situations like the one shown in Figure 4, where the non-road portion of the image on the right is not directly connected with the correctly marked road at the bottom center of the image. It would not improve the case where there are extra shadows in the road however, as shown in Figure 3.

Current Results and Future Improvements

The current best result of this work is the edge detection algorithm, as currently the map data and perspective view transformation have not been joined with the edge detection to aid in recognizing the location of the road in the image. In the future, the map data will be used in conjunction with the edge detection algorithm to better approximate the location of the road. This may be a very valuable heuristic, as the map data tells us the curvature of the road ahead and informs us of intersections, both of which are difficult to extract using an edge detection algorithm alone.

There are many opportunities for improvement, including using color information, intensity information, map data, and curve fitting as additional heuristics for road recognition. The most promising avenues for improving the current body of knowledge are either incorporating many multiple heuristics at once or simply joining the map data with the edge detection algorithm effectively.


Garmin. “GPS 18x OEM.” Last accessed July 25, 2010.

“NMEA 0183” Wikipedia and Contributors. Updated July 15, 2010. Last accessed July 25, 2010.

SerTerm.” Google Project Hosting. Last accessed July 25, 2010.

“Tiger Shapefile Data.” U.S. Census Bureau – Tiger/Line. Geography Division. Last updated June 29, 2010. Last accessed July 25, 2010.

OpenMap.” Raytheon / BBN Technologues. Last updated March 5, 2009. Last accessed July 25, 2010.

 Posted by at 8:57 pm