Jul 272010

Thesis Progress

In this paper, progress made on the thesis of Aaron Johnson during the summer semester of 2010 is outlined. The purpose of the software is to recognize and distinguish the road from visual and other types of data. The software includes unmodified third party software, modified software, and custom algorithms.

Software and Algorithms Used

Figure 1
The current state of the software is pictured to the left. The upper-left square is the current location of the vehicle in latitude/longitude coordinates. The upper-right square is map data and the current location of the vehicle. The bottom-left image is incoming camera data, and the bottom-right image is a perspective view of the map data.

The hardware and software used is available to any consumer. The GPS device used is the “GPS 18x OEM” by Garmin, a USB device which echoes the current latitude and longitude once per second. The camera used is the Rocketfish 2MP USB camera, and the computer used has a 2.53 GHz Intel Core 2 Due processor and runs Mac OS X 10.5.8 with Parallels Desktop 4.0 used to run Windows XP SP2.

The live camera data shown in the lower-left of Figure 1 is a result of custom software created after researching Java Media Framework. Descriptions of the other software and algorithms used can be found below.

Reading the Current Location

After connecting the GPS device, third party software provided by Garmin called “Spanner” interfaces with the device and echoes standard NMEA sentences to a COM port. Thus, in order to read the data from the device, software is used to listen to the COM port.

For connecting to the COM port, existing software called SerTerm is used. It was necessary to read and understand the code used in this program in order to modify it for the needs of reading the GPS data. It was modified in the following ways to allow the location data to be used.

  • The GUI is hidden,
  • a connection is created automatically when the program starts,
  • the NMEA data read is parsed at regular intervals when it arrives,
  • the latitude/longitude data is converted from degrees, minutes, seconds to decimal degrees, and
  • the program is adjusted to save the location data to a custom Java interface.

Reading Tiger Map Data

The map data used is freely available “Tiger” map data provided by the US Government. In order to read this data, which comes in a binary format (“ShapeFile” format), a program named OpenMap 4.6.5 was used. OpenMap is “a Java Beans based toolkit for building applications and applets needing geographic information” (OpenMap). It provides a variety of functionality and interfaces and consists of about 350,000 lines of code. In order to read the ShapeFile data and display the road data visually as lines on the screen, the following was done.

The OpenMap documentation was reviewed and code for related classes was modified in order to make previously unavailable data accessible (line data is converted from radians to degrees using already existing OpenMap methods). The line data is exported to a comma separated values (CSV) file where each road and each piece of a road is represented as a set of multiple latitude/longitude coordinates.

Custom software was created to read the road data from the CSV files, convert the latitude/longitude points to points relative to the screen, and finally, to draw lines representing the road. This exercise in computer graphics involved understanding the relationship between screen pixels and geospatial data enough to convert between the two, define the viewing window size (zoom), and assure the aspect ratio remained correct during rendering regardless of the height and width of the viewing area.

The location data from the GPS is used in conjunction with the map data to display the current location of the vehicle, as shown in the upper-right of Figure 1 above. Unfortunately, the real-time GPS data and the map data are not perfect and therefore it’s necessary to normalize the two data sets in order to determine the true location and direction of travel of the vehicle. A number of different techniques were considered before the chosen technique.

An example of the incongruent data and the solution found follows. On any given road, the location data from the GPS does not correspond to a location exactly on the road – the location given by the GPS may be slightly to the right of the location of the road, for example. Suppose the vehicle is traveling North on a road and approaching an intersection and that the GPS data tells us the true location is slightly to the right of the road. Unfortunately, we cannot simply “snap” the current location to the nearest road and always have the current location be associated with the correct road. In this example, while traveling North towards an intersection, this method would at one point snap the current location to an East/West bound road instead of the correct one, since the location data may be off horizontally but not vertically. To solve this problem, the set of previous GPS coordinates is used to determine the current direction (angle) of travel. The angle of travel is then joined with the angle of the roads nearest the current GPS location and the current location is set as the nearest road with a corresponding angle of travel. This algorithm does not yield perfect results in all cases and can be improved on, but it is very effective in most cases.

Creating a Perspective View

Creating the perspective view image shown in the lower-right of Figure 1 involved learning about computer graphics and using tools available in Java. The perspective view is created from the rendered map data using a translation and transformation. As with the other software, additional custom programming was necessary to assure no extra lines were drawn and to assure the transformation would function correctly.

Edge Detection

Many academic papers covering road recognition used edge detection as one heuristic for accomplishing the task, and the same was done here. An edge detection algorithm was used to separate non-road surfaces from road surfaces in the 2D image from the camera, as pictured below in Figure 2.

Figure 2
The current state of the image detection algorithm is pictured to the left. The upper-left image is the original view of the road from the camera. The upper-right image is the image after edge detection is applied. The lower-left image is the edge detected image after a threshold is applied, and the lower-right image is the image with the threshold applied and the road highlighted in white.

The algorithm has a simple premise and as such, is sometimes effective and sometimes not. Edge detection is applied to the image and it is assumed that the road surface has little to no edges while non-road surfaces have many edges. As can be seen in Figure 2, this is especially true for trees, cars, and houses. One advantage of the algorithm is that after optimization it takes only about 32 milliseconds to apply it to an image. The disadvantages are many, and include all of the cases where there are additional unexpected edges on the road or little to no edges on a non-road surface, as shown in Figure 3 and Figure 4.

Figure 3
In this image shadows on the road create additional edges on the road surface which prevents the algorithm from detecting the complete road surface.

Figure 4
In this image a lack of edges on the trees on the right, grass on the left, and sidewalk on the right result in these non-road surfaces being highlighted.

The algorithm works as follows. Intensity values from the color image are used (the image is converted to black and white) before the edges are highlighted. To highlight the edges, the pixels are multiplied with a basic edge detect matrix ({{0,-1, 0},{-1,4,-1},{0,-1,0}}). This results in the non-thresholded edge detected image as can be seen in the upper-right of Figures 2, 3, and 4. The result is that the intensity values of edges are increased while the intensity values of non-edges are decreased. A simple threshold is applied to this image in the next step – if the intensity value of a pixel is above 27 then the pixel is kept as an “edge” and otherwise it is kept as a non-edge. Edges are assigned an intensity value of 100 while non-edges are assigned an intensity value of 0, and the result is displayed as shown in the lower-left of Figures 2, 3, and 4. The final step is searching through the image for areas where there are few edges. For every pixel in the image, the intensity values of the portion around that pixel are summed. If the sum is below a certain threshold, then the center pixel and the pixels around it are marked as road.

During the final phase of the algorithm some assumptions are made. For example, it is assumed that the road does not exist in the upper portion of the image, so this section is skipped during the search process. Additionally, a small portion of the lower part of the image is skipped, since in this application of the algorithm a portion of the vehicle can be seen. While generally it is safe to assume that the road will not exist in the top of the image, this will not be the case on mountain roads or in hilly areas like San Francisco, for example. Similarly, the very bottom of the image may contain road, depending on the location of the camera and the vehicle used.

A number of methods are used to speed up the algorithm described, with the effect that without these modifications it takes roughly 600 milliseconds to process one image, or about 200 times as long. All of the improvements to the algorithm are done by caching values which are used multiple times, or by caching the result of calculations which are done often ahead of time (before the algorithm is applied). An example of values that are cached is an RGB to intensity cache, which is simply a 3 dimensional matrix which when accessed with RGB values returns the intensity value computed from using the red, green, and blue values. Creating the matrix is as simple as going through every value of red, green, and blue from 0 to 255 and computing the intensity, which is simply

(((11 * red) + (16 * green) + (5 * blue)) / 32)

Computing these values once and caching them saves a significant amount of time when converting a color image to black and white intensity values. Possibly the most effective caching technique used, however, is used in the final step when summing the number of edges within a certain area near each pixel. Since every pixel in the image is searched and the pixels around every pixel are summed, many pixels are used multiple times during the summations. Caching these sums is complicated compared to the other cached values because it must be done while the algorithm is being executed instead of ahead of time, as the results are specific to each image being processed, and because which values are cached is dependent on the implementation of the algorithm. In the program as shown, pixels are searched from left to right so that the sum of each column being considered can be computed, cached, and reused when the area around the next pixel is computed.

The current implementation of the final algorithm searches a certain area of the image and does not take into account which areas were previously marked as road. An improvement to the algorithm is possible by beginning the search in the lower-center of the image and then “growing” the road section outwards only where each new section of the road directly touches an old section of road. This will likely have slightly better results for situations like the one shown in Figure 4, where the non-road portion of the image on the right is not directly connected with the correctly marked road at the bottom center of the image. It would not improve the case where there are extra shadows in the road however, as shown in Figure 3.

Current Results and Future Improvements

The current best result of this work is the edge detection algorithm, as currently the map data and perspective view transformation have not been joined with the edge detection to aid in recognizing the location of the road in the image. In the future, the map data will be used in conjunction with the edge detection algorithm to better approximate the location of the road. This may be a very valuable heuristic, as the map data tells us the curvature of the road ahead and informs us of intersections, both of which are difficult to extract using an edge detection algorithm alone.

There are many opportunities for improvement, including using color information, intensity information, map data, and curve fitting as additional heuristics for road recognition. The most promising avenues for improving the current body of knowledge are either incorporating many multiple heuristics at once or simply joining the map data with the edge detection algorithm effectively.


Garmin. “GPS 18x OEM.” Last accessed July 25, 2010. https://buy.garmin.com/shop/shop.do?pID=27594

“NMEA 0183” Wikipedia and Contributors. Updated July 15, 2010. Last accessed July 25, 2010. http://en.wikipedia.org/wiki/NMEA

SerTerm.” Google Project Hosting. Last accessed July 25, 2010. http://code.google.com/p/serterm/

“Tiger Shapefile Data.” U.S. Census Bureau – Tiger/Line. Geography Division. Last updated June 29, 2010. Last accessed July 25, 2010. http://www.census.gov/geo/www/tiger/

OpenMap.” Raytheon / BBN Technologues. Last updated March 5, 2009. Last accessed July 25, 2010. http://www.openmap.org/

 Posted by at 8:57 pm
Jul 182010

Perspective Transform from Map Data to Road Image

Transforming a flat map into what the road image looks like is done by squishing the top of the image down and stretching the bottom two points outward. As in any other image, this changes the perceived perspective to be similar to our own. Some issues in making the transformed image match the road image are as follows. The ending height of the transformed image is dependent on the visible horizon. So if the visible horizon is very high in the image then, of course, the squished/transformed image must match this height. Another issue with height is the distance that the viewer can see. If the viewer can see a mile into the distance, then the before-transformed map image should be squished starting at a point higher on the map (i.e. a mile up), so that the correct road data will correspond to the visible road (i.e. so the length of the visible road will match the length of the map data that is overlaid on top of it). So there are at least two things we must know to transform the flat map image correctly to correspond to the perspective view of the road: the height of the horizon and the distance to the horizon. Unfortunately these things aren’t easily available. I think that people will rarely be able to see a mile in front of them while driving however, unless they live in the middle of a particular “nowhere” which happens to be flat, like Kansas or Nevada. In most situations the distance to the horizon is probably significantly limited by buildings, turns, etc.

Perspective Transform in Java

Doing a perspective transformation on a set of lines seems to be simple. However, there are some issues applying it to the road lines. When lines are drawn too far off screen the transformation doesn’t work correctly. Since the road lines drawn currently includes many off-screen lines, it’s not working… It may seem like it’s only necessary to figure out which lines actually need to be drawn and which don’t before doing the transformation. This is easier said than done though, when considering lines where neither point is visible, but the line crosses through a visible area. Also, this probably won’t work since some lines may be extremely long and still pass through the visible area (and still cause the error since they’re so long). So in reality it may be necessary to find only the points on each line which cross through the visible area, thus minimizing the length of the line and cutting off any part of the line not within the visible area, and then applying the transformation after all that is done.

 Posted by at 8:42 pm
Jul 122010

Noisy Map Data

The data actually obtained from the GPS device while driving does not match the freely available Tiger map data. This can’t be solved accurately by assuming the closest road point is the current position. Imagine a road like this.


The above lines are meant to represent two intersecting roads – one going directly North/South and the other going directly East/West. On the road pictured above, if the the map or GPS data is off slightly such that the GPS shows the current position as slightly to the right of the actual road then there is an issue. If traveling North from the bottom up and the current position is to the right of the road by some amount, say by two underscores (“__”) as shown in the image, then there will be a point during traveling upward that the closest road point will no longer be the North/South road even though the driver is still traveling upwards. Furthermore the “current location” marker would jump from the North/South road to the East/West road when the occurred, which wouldn’t look very appealing, besides being just wrong.

To clean up the noisy data effectively the angle of travel should be taken into account. The current and previous angles of travel can be obtained by saving the previous GPS locations of the vehicle. This will give you a list of angles. The map data itself is just a set of points, so the angle of each road section can be computed as well. These two sets of “angle” data can be brought together. The “current location” should snap to the nearest road section with an angle that is similar to the current angle of travel.

Issues Etc With Matching Angles

Although I haven’t tried this yet I imagine it will work quite well. When on a road with a long curve, the algorithm should choose the section of road with the closest matching angle, since there will be many road sections to choose from. When in the situation of a road section with a non-matching angle being the closest road, the algorithm will disregard the incorrect road section since the angle doesn’t match.

The only remaining issue I can think of currently is the following. If 1) the GPS location is off, say to the right by some amount and 2) then the driver turns right onto a road, then snapping to the nearest location on the road will result in a jump from where the two roads meet to a bit to the right on the new road. This could be avoided by detecting when a change of road happens and assuring somehow that the current position as displayed on the map is adjusted to be very near the intersection, but I imagine this won’t be necessary for my limited application for my thesis.

Note: I implemented this this week, only considering the current angle of travel, and it seems to work well.

Floating Road with Edge Detection

I realized this week that it is possible to improve my algorithm to get rid of the floating road pieces. Currently it marks road everywhere, then goes back through the image and “searches up” to look for spaces, and then goes through the image horizontally to get rid of road pieces which are too thin. It could simply assume that road pieces below a certain level are “ok” and begin searching horizontally across the bottom of the image. It would then only include further road pieces if they are touching the road pieces which already exist in the image. This algorithm may or may not be slower since it has to “grow” the road outward from certain pieces, but it may be faster because it only has to process the road points once instead of three times.

Additionally, in some cases it’s not be necessary to search every single pixel when searching for the road, since when examining “one pixel” it really looks at a square matrix of pixels for the edge density (i.e. “the road” has a certain width). Once the algorithm has found one road pixel it could then search the next furthest pixel which would result in a new section of road that touches the current section. Ordering the search in the manner will improve speed because it wouldn’t be necessary to search any of the pixels between two touching road sections, since the road sections have a certain width. The remaining issue for the new algorithm is when the next searched pixel is not recognized as a section of road.

Assume the road width (aka matrix width) is “roadWidth” pixels. The algorithm should begin the search at x where x is (roadWidth / 2). If this is recognized as road then the next pixel to search would be (currentX + roadWidth); call this pixel newX. If this second pixel (newX) is not recognized as road then we still must search all of the pixels between currentX and newX, beginning at newX and moving toward currentX, to try to expand the area recognized as road. … This would be complicated to implement efficiently.

Jul 052010

On many images there are random clusters of points that have been marked as road which are well above the road, in the grass, or on houses. They’re easily recognizable even without the original image because there’s a space between the large chunk of recognized road at the bottom and the incorrectly recognized road above it. I tried counting the number of pixels that were close to any given pixel to remove the chunks, but this didn’t work – the bad road pixels have a lot of other bad road pixels around them.

Next I tried “searching up” through the image and looking for any road that is floating above another section of road and getting rid of any “floating” road sections. This left thin vertical streaks of road in some places where some “floating” road squares were slightly touching lower road squares. So I then go back through the image horizontally and remove and sections of road that are not wide enough. The addition of this algorithm does give a slightly better result than without it and it doesn’t take much time, even on a 640×480 image.

I also sped up the various algorithms used by caching things that have to be computed often, such as an intensity-value-to-RGB-int cache and a red/green/blue to intensity value cache.

The quality of this is still mediocre though. There need to be either more or better heuristics involved.

 Posted by at 8:42 pm