Vision-based World Modeling Using Linear Representation of Occupancy Function 
(PhD Thesis Proposal)

Dmitry O. Gorodnichy
Department of Computing Science
University of Alberta

May 22, 1998


The contents of the thesis is influenced by the following professors, the disscussions with whom is greatly appreciated:
W. W.Armstrong, Walter Bischof, Russ Greiner, Xiaobo LiHong Zhang
 
1. The Reasons for the Research

1.1 The Challenge of World Modeling for Mobile Robots

The age of robots is coming [Moravec 1998]. Robots are invading Mars, exploring the ocean, catering food etc [Mobile Robots]. More complete list of area where robots are used now can be found in the description of AAAI Mobile Robot Competitions.

No matter what a mission a robot is carrying out in the environment, it must be able to learn and maintain the model of its environment. This explains the increasingly growing interest of researchers all over the world in the problem of world modeling for mobile robots. This problem is difficult and far from being solved.

Let us name the major practical limitations on a robot`s ability to acquire accurate models of the world [Thrun 1995]:

Sensor Limitations
Whatever sensors are used, they are not perfect. They have limited resolution, they provide data corrupted by noise, and the main, they provide only a limited amount of information. In particular, they provide information only around the robot. To acquire global information, the robot has to actively explore its environment.
Odometric errors
Robot's motion is inaccurate due to the drift and slippage. Hence, incorrectness in object location estimation.
Complexity of the environment
The environment around a robot is complex and dynamic. In most cases it is impossible to model the environment using simple structures only.
Time and memory constraints

In most applications a robot should operate in real time. Robot has also hardware limitations.

1.2 Occupancy Model of the World 

The choice of the model to be used to describe the world depends on three factors: 1) environment, 2) sensors and 3) task goals.

Recent research has produced three fundamental paradigms for world modeling for mobile robots: metric (occupancy-based) paradigm, geometrical (beacon recognition and tracking based) paradigm and topological (graph-based) paradigm [World Models].

In our research we consider the occupancy-based approach, which is superior to others for its simplicity in building, representing and maintaining the world model. This approach however has also disadvantages (as compared to other approaches): it is inefficient for navigation planning and requires redundant space representation [Thrun 1997].

Originally developed for building 2D maps,the occupancy-based approach has been recently extended to build 3D models of the world, which provide much more information about the surrounding world than 2D maps do [Moravec 1996].

This is why in our research we focus our attention on the occupancy-based approach.

1.3 Visual Sensor for Building the World Model 

As has been indicated by Moravec, a visual sensor is much preferable to a sonar sensor when dealing with small resolution occupancy grids, and especially, when dealing with a 3D occupancy representation of the world.

Therefore, for our research we consider a visual sensor. At the same time, we are not interested in a very expensive highly calibrated stereo setup, but rather we consider the simplest possible stereo setup with only one (possibly badly calibrated) camera. We find the challenge in showing that this stereo setup is sufficient for acquiring a model of the world which would suffice to carry out the robot's tasks, the major one of which is navigation.

In our research we do not consider robot navigation and planning problems, which are big areas for research themselves. However we shall be aware of these problems, as these are the areas where the results of our research can be directly applied.

1.4 Parametric Representation of Occupancy Function 

The development of the 3D occupancy approach was impeded significantly by the amount of data needing to be calculated with every new sensor reading. In particular, Moravec mentions the following numbers. Building a model of a room 6 by 6 by 2 meters, requires the update of several millions voxels for each of about 2000 evidence rays obtained from one 768x576 image frame. That is why only with appearance of extra powerful workstations (with 100 MIPS speed and 100MB memory range) the experiments with 3D occupancy model of the world became possible.

This explains our aspiration to find another, different from a conventional voxel-based, representation of the occupancy function of the world, which would be more optimal in memory consumption and data processing and which would be more convenient for navigation.
 

2. Outline of the research

2.1 Our setup

Analyzing the previous work in sensor based world modeling for mobile robots, we establish the major differences and advances of our approach. Their description follows.

Environment

In building the model of the world we do not assume any specific geometry of the world. This is inspired by the desire to navigate the robot in the outdoor environment. In other words, we do not assume that features, which are extracted from video images, are geometrically constrained. That is, each feature is considered as an independent source of information about the surrounding environment.

This assumption allows the utilization of probabilistic approaches in fusing different sensor data and is very suitable for the occupancy representation of the world [Elfes 1989].

It should be mentioned that this approach is different from most 3D scene reconstruction from video images approaches like that of Faugeras or Ayachi [Structure from Motion].

A view of the environment we are going to deal with is shown in Figures 1 and 4.

Robot

In the case of building a metric model of the world for mobile robots, a robot needs to maintain its own position on this map of the model.

We associate the world model with the system of coordinates centered on the initial position of the robot. The robot and its model are shown in Figure 1. An image as observed by a robot is shown in Figure 2.

It should be mentioned that although in our research with use a RWI B12 robot, our research is directed towards building the simplest possible robot. In particular, we consider a robot consisting of three parts only: 1) a mobile platform, 2) a pant-tilt unit and 3) a camera.

The mission for the robot

No matter what an ultimate task of a mobile robot is, the first task for it remains the same -- to navigate in the unknown environment. More exactly, the robot has to know how to move towards a goal not bumping onto obstacles.

This is the mission we want our robot to carry out. A goal will be a marked object, which is either seen from robot's initial position or which the robot has to find by moving and exploring the environment.

Sensor data

A lot of research has been done in fusing data from sonar sensors [Sensor Fusion]. This is mainly attributed to the well defined model of a sound sensor as compared to that of a visual sensor.

For the task of acquiring precise 3D information however, visual sensors are much more preferable than sonar sensors [Moravec 1996].

Yet, fusion of data from visual sensors is not well developed. This is due to the fact that different stereo setups require different sensor models.

Another issue is that a highly calibrated dual or trinocular stereo system, which are mostly used now for depth acquisition [UBC, Structure from Motion], are expensive. This is a certain restriction in using them in robotics.

Thus, the promise and the challenge of the research is to find a way of acquiring 3D information, which will be used in building the world model, with the simplest possible stereo setup, in particular, a single camera only.

As opposed to a stereo setup with a fixed camera configuration, a single camera stereo allows arbitrary motion of the camera. This gives more flexibility in tracking the features and exploring the world, which is also an advantage of using a single camera stereo.

Single camera sensor readings

The readings from a single camera stereo, which are 3D depth data, are obtained by the following three stage procedure (see Figure 3). A set of features is selected in the first frame (stage 1). Each feature is then tracked along the epipolar line in the second frame, which is grabbed after camera has moved, and the best match is obtained (stage 2). The depth to those features which are selected and tracked is then calculated on the basis of the disparity of the features in two frames (stage 3).

As a result of either camera distortion, changing light conditions, incorrect registration of features, or odometry errors, the obtained 3D information is not certain. All the above mentioned factors affect the certainty of the obtained 3D depth data.

A typical depth map obtained with a single camera by looking around the robot is shown in Figure 4. The figure shows 1) the second video frame of the last pair of stereo images, 2) features, which were selected in the first frame (shown in black) and tracked in the second frame (shown in white) and 3) the projection of the acquired 3D data points (x,y,z) onto Oxy (bird's eye view) plane. The confidence of the point is shown by the intensity of the pixel: the whiter the pixel, the more confidence is associated with the point.

2.2 The main objective of the thesis

Each detected 3D point (either correctly or incorrectly detected) provides a continuum of data for world modeling. In particular, it provides information about an occupied point - the detected 3D point itself, and unoccupied (empty) points - between the camera and the occupied point

As can be seen from Figure 4, a visual sensor provides uncertain and sometime contradictory data. The occupancy model of the world can be built thus by fusing these uncertain data in such a way that uncertainty is decreased by using the excessive amount of data.

This consists of several stages.

As can be seen, all described above stages affect the design and the representation of the occupancy functions.

The objective of our research is to design such an occupancy representation of the world that facilitates accomplishing of the these stages.

We believe that the adaptive logic network (ALN) [Armstrong 1996] representation of the occupancy functions is appropriate for world modeling.

We would like to show how they can be built, updated and used for navigation. We want to investigate how to use them with different data fusion techniques and which technique is the most optimal when combined with the ALN representation of the occupancy functions.

At the same time, we would like to show how to acquire 3D world models with a single camera stereo. This includes building a sensor model of single camera stereo and fusion of data obtained by this stereo.

The setup of our experiments described in previous section will be used to show the proof of the concept.
 

3. The way to achieve the goal

In this section we outline a way to approach the problem and summarize the previous work done in the area.

3.1 Designing the occupancy function

The way the values of the occupancy function are assigned to occupied and non-occupied points is determined by the combination rule which is used to fuse different data.

If we consider all data to be correct (or to be with the same degree of certainty), then the least square technique can be used to combine different data. Using the ALN terminology, this technique would be referred to as building the occupancy function on the basis of known sample points. In this case, it does not matter what values of the occupancy function are assigned to sample points as soon as we assign the smallest value to non-occupied points and the largest value to occupied points.

However in reality, data obtained from sensors have different degree of certainty, and the certainty of data depends 1) on the amount of data available and 2) on a sensor model. Therefore, other techniques which would take into account the uncertainty of data have to be considered.

Combining the pieces of evidence - fusion of uncertain data

Roughly speaking, the problem of combining the pieces of evidence in building the world model can be formulated as follows.
Problem 1:
1) At time t=0, a sensor reading tells us that ``probably point A is non-occupied''.
2) At time t=1, a sensor reading tells us that ``point A is very likely to be occupied''.
3) The questions are
Q1a: Is point A occupied or not? and
Q1b: How sure we are in asserting that?

Let us now review approaches mentioned in the literature treating this problem.

Probabilistic Approaches

The most straightforward way of assigning the values of the occupancy function consists in assigning the value of 1 to an occupied point and the value of 0 to an empty point. Then, the closer the occupancy value of a point is to one, the more likely the point is occupied.

This approach is inspired by probability theory. Using the Bayesian framework, Problem 1 can be reformulated as follows.

1) The prior probability of point A to be occupied (or empty) is equal to the conditional probability of point A to be occupied (or empty) given data tex2html_wrap_inline260:
 equation44
When no data are given, tex2html_wrap_inline262.

2) A sensor reading gives the data in terms of conditional probabilities of observing the data tex2html_wrap_inline264, given that a point is occupied and non-occupied (empty): tex2html_wrap_inline266 and tex2html_wrap_inline268.

3) The answer to questions Q1a and Q1b is given then by the Bayesian formula:
 equation47

In order to apply the Bayesian formula (Eq.  0.2), one has to assume independence of noise in different sensor readings. That is the following conditional independence is assumed
 equation52

Until recently this was the most established approach in data fusion [Mathiew&Elfes 1988], and different simplifications of it have been designed [Borenstein 1996, Martin&Moravec 1996].

Variations

A modification to the Bayesian approach described above was suggested by Konolige in 1995, who suggested to use odds-likelihood posterior
displaymath270
rather then only posterior probability P(Occ|D). In this case, the values of the occupancy function defined as tex2html_wrap_inline274 range from 0 (absolutely impossible) to tex2html_wrap_inline276 (absolutely true).

A similar approach was used later by Moravec [Martin&Moravec 1996, Moravec 1996], who considered the occupancy function defined as
displaymath278
It is this assignment of occupancy function used by Moravec in his latest research in building 3D evidence grids.

The above described approaches use only the first moments of an unknown occupancy value of a point. Estimating second moments (i.e. deviation and the covariance matrix) of the variables would lead us to the Extended Kalman Filter [Maybank].

Extended Kalman Filter

In Kalman filtering the uncertainty about the environment is propagated by means of estimating the first two moments of probability distribution of the spatial variables of the environment.

Whereas the Extended Kalman Filter has been intensively used for beacon-based world modeling approaches [Structure from Motion], it is not used in building of occupancy models of the world. This is explained by a huge dimension which would be required to update the state vector of the world made by concatenating vectors associated with each voxel of the world.

Evidential Approach

Assume that the reading of a sensor tells us: ``The point A is occupied''. Does this sensor reading decrease the probability of point A to be empty? According to the Bayesian approach -- yes. According to the Dempster-Shafer theory of evidence -- no. This is because the sensor reading provides no evidence for point being empty (it has only the evidence for point being occupied).

Using DS approach for world modeling for mobile robots was mainly inspired by the following reasoning.

In the case of Bayesian approach, if the probability of point A to be occupied P(Occ)=1/2, then it is not clear is that because there is not enough information about that point (and hence more exploration in this area is required) or is that because the data obtained about the point are contradictory (meaning that the environment is to complex there and extra exploration in this area most likely will not help). Calculating second moments of the occupancy value of the point would elucidate such an ambiguity. However, computational expenses incurred by this calculation would make this probabilistic approach unsuitable for robotics, and this probably why the second moments are not used so far in building the occupancy models.

According to the Dempster-Shafer theory, the likelihood of a point A to be occupied is determined by two functions, called belief and plausibility functions. These functions can be considered as a pessimistic and an optimistic estimates of a point to be occupied and are defined as
 

 equation64

 equation68
where m(x) is the basic probability assignment, satisfying the condition
equation72
and is determined by a sensor model.

The answer to questions Q1a and Q1b is obtained then by using written below (Eq. 0.7) Dempster's rule of combination and calculating functions Bel(Occ) and Pl(Occ):
 equation75

To apply the Dempster-Shafer combination rule one has to assume the DS-independence of the evidence data, which, roughly speaking, can be defined as probabilistic independence of sources of these data [Voorbraak 1995]. This is not realistic for many cases. But still, because of the simplicity of DS rule, it is widely used now in mobile robotics [Moreno&Puente 1992, Hughes&Murphy 1992, Durrant-Whyte et al 1995].

It should be mentioned that in the above references, Dempster-Shafer combination rule is applied to fuse data obtained from sonar sensors, which is attributed to the easiness in assigning basic probability assignment m(x) for sonar sensors. This explains also why this rule has been applied only for building two-dimensional occupancy grids.

3.2 The ALN representation of the occupancy function

Which combination rule and what values of the occupancy function are better to use for our research has still to be decided. But let us assume now that the choice of a fusing technique and the output range of the occupancy function has been made. Then the question is:
Q2a: How small should the resolution of the occupancy function of the world be?
Q2b: What is the optimal way of representing the occupancy function?

To make the question more specific, consider a room 5 by 5 by 2 meters, which the robot has to explore using its sensors and where it is going to navigate.

What is the best way of representing the mapping of continuum of points of the room (in tex2html_wrap_inline292 onto an interval, let say, [0,1] (in R)?

The way it is done now [Moravec 1996] is to divide the whole space of the room into voxels and then to assign a value to each element of thus built 3D array. This requires a lot of space and computational power. In the case of the room it would require several millions voxels.

The resolution of the grid does not depend on the complexity of the environment and it does not allow efficient navigation planning.

We think there is a better way of representing the occupancy function -- a parametrical way. Let us justify our conjecture.

Outline of the idea

The idea is to use the piecewise linear surfaces instead of the array of data. This is illustrated in Figure 5.a . The occupancy function is then represented in terms of MIN and MAX operators applied to 4D surfaces. The example of a part of this representation
 

 eqnarray86

First, the significant reduction of space consumption could be expected.

Second, as soon as the occupancy functions are built, i.e. after the equations of surfaces have been calculated, it is easy to find the empty area available for navigation (Figure 5.b). The left part of surface equations is simple made equal to the occupancy level. For example, f=0.4 in Eq. 0.8.

Even more, it is easy to incorporate the size of the robot into equations of the surfaces (Figure 5.c).

Obtaining plane equations of needed for navigation 2D maps from 4D surface equations should also be straightforward.

And the main, ALN representation of the occupancy functions would allow efficient way of using them for navigation, which, as mentioned in Section 1.2, so far is the main limitation of the occupancy model of the world. If the goal is seen, then the way to the goal can be found in terms of line equations. This is illustrated in Figure 5.d.

The main question for the research is how to update the occupancy function of the world with a new sensor reading coming, when it is represented in terms of equations rather than in terms of voxels.

Fusion of data -- learning the ALN

The ALN linear pieces are built in the so called training stage: given a set of sample points (x,y,z,F), the ALN finds the function f, which approximates the function F=F(x,y,z) in the least square sense.

In the case of modeling the occupancy function, the values of F are determined by a sensor model. In the simplest case, F=1 for occupied points and F=0 for empty points.

Notice that a stereo visual sensor reading gives only the ``occupied'' points, i.e. those points which are visible. Only after an ``occupied'' point has been given, a set of ``empty'' points is generated.

In the case of voxel representation of the occupancy function, the number of generated ``empty'' points will be equal to the number of voxels between an ``occupied'' point and the camera. This results in the redundancy of data. Also, the data produced in such a way will be very unevenly distributed: there will be much more ``0''s in the function output than ``1''s.

The ALN representation of the occupancy function, in its turn, would allow more flexible generation of sample points. There would be generated as many ``empty'' points as needed.

It still remains unclear though when to use the fusion techniques overviewed in Section 3.1 and when to use the least square approximation achieved by the ALN.

Most likely in the beginning stage, e.g. when acquiring the model from the first position, data will be combined using one of the combination rules described in Section 3.1. Then this data will be converted by the ALN to the surfaces equations. Then, when robot moves to the second position, both the data from the ALN built occupancy model and the new data will be combined using again the combination rule, and so on.

As of now, this is only a tentative idea and it still need to be investigated.

3.3 Building the visual sensor model

In order to apply any of the described above fusing techniques, we need to know the level of confidence associated with the data obtained from a sensor. The question is:
Q3: How to assign the confidence and the basic probabilistic assignments to the data obtained from a single camera stereo ?

This is achieved by constructing what is called a sensor model.

As described in Section 2.1.5 (see Figure 3), a single camera stereo visual sensor gives the estimate of the location (x,y,z) of point A, corresponding to a selected and tracked feature in the image frame.

In the ideal case, i.e. in the case of perfect matching in tracking stage and infinitesimal resolution of the image, the basic probabilistic assignments can be assigned to the points on the line of view as follows:
 

eqnarray101
Figure 7.a shows a model of the visual sensor for this ideal case.

Let us now consider the case of a real camera and discuss what influences the estimation of depth obtained by a visual sensor, and what affects the degree of certainty associated with the depth value.

Image

Figure 6 shows the image of monochrome green rectangles as observed by Sony XC-999 camera. The warping of the picture and different intensities of the same green colour can be clearly seen. This results in imperfect tracking and matching of features, which, in turn, results in under- or over-estimating of the depth value corresponding to a feature (see Figure 3).

This is the major reason of uncertainty in depth data, which is observed in Figure 4. However, this is not the only reason. Because of the limited resolution of the image, depth cannot be found precisely. This is illustrated in Figure 3. Thus, the error analysis of the stereo procedure should be done in order to build a sensor model.

The way the features are selected and tracked has been described in our previously and it is not a concern in this thesis. We will proceed to the last stage of the stereo procedure -- calculation of the location (x,y,z) of a visible point (let us call it point A), which corresponds to a selected and tracked feature in the image frame.

Depth calculation technique

The positions of a feature in the first and the second frame determine two directions of view, which are defined by unit vectors tex2html_wrap_inline318 and tex2html_wrap_inline320, where Focus is the focus of the camera.

The depth to the feature is then found using the least square minimization:
 equation122
where R is the rotation matrix and tex2html_wrap_inline326 is the translation vector of the camera.

Here we distinguish two cases:

Precise case) Robot does not move. Only pan-tilt unit moves. The rotation matrix and the translation vector of the camera are known. Then r' and r can be easily found by taking the partial derivatives of (0.12) over r' and r.

Imprecise case) Robot moves. The rotation matrix and the translation vector of the camera are not known. In this case they need to be found using the motion parallax of the features.

A common way to do it is by calculating the essential matrix E of the camera. This is done by resolving the epipolar constraint, which should hold for all detected and tracked features tex2html_wrap_inline338:
equation129
Then the essential matrix E needs to be decomposed into R and tex2html_wrap_inline326. Here, the singular value decomposition is used to find the eigen-vector corresponding to the smallest eigen-value of a matrix. For more detail see [Essential Matrix].

The problem of robust estimation of the essential matrix is still one of the most challenging problems in computer vision. In our research we consider this problem only from the prospective of error analysis.

Incorporating error estimation and matching error in sensor model

Taking into account the quantization error

The following result concerning the stereo quantization error has been obtained [Error Analysis]. For non-convergent dual camera stereo system, the range error tex2html_wrap_inline346 can be estimated as
equation133
where tex2html_wrap_inline348 is the disparity error, b is the baseline, f is the focal length.

It can be shown also that the closer a feature is to a border of the image, the greater is the disparity error tex2html_wrap_inline348.

For the case of a single camera stereo, the relationship will be approximately the same.

Therefore, stereoscopic depth estimation is more accurate 1) for nearby objects than for distant ones, and 2) for the objects in the center of the image frame, than on the margin of the image frame.

The second conclusion is also explained by the fact that the image is much more warped (because of the lense distortion) on the margin, than in the center of the image, which results in more frequent mistracking of features on the margin, than in the center of the image.

In order to decrease depth estimation error 1) we disregard the marginal features (the same as they do in [Navy Center]), 2) we make baseline as large as possible (The disparity of the features in our stereo varies from 5 pixels (for distant objects) to 30 pixels (close objects) in an image of size 160 by 120, which is the same as in [UBC] ), 3) when over-viewing the surrounding world, we make sure that each selected feature is observed at least twice, so that it appears in the middle of the image at least once. This is achieved by adjusting the angle of the camera rotation.

Taking into account the mismatch error in tracking

In the tracking stage of the stereo calculation, the epipolar line in the second frame is scanned pixel by pixel, and for each pixel a N - dimensional vector tex2html_wrap_inline356 is produced using a mask of size N. This vector is then compared with a template vector tex2html_wrap_inline358, obtained from the feature pixel in the first frame (see Figure 3). The pixel that produces the best match is considered to correspond to the same feature.

The goodness of matching between tex2html_wrap_inline358 and tex2html_wrap_inline356 is represented by the error function E, which is chosen as the Euclidean distance between tex2html_wrap_inline358 and tex2html_wrap_inline356 squared, i.e.
 equation147

As was mentioned before, because of many reasons what we call ``mistracking'' happens, that is the best match on the epipolar line, in fact, does not correspond to the feature selected in the first frame.

The way this problem can be resolved (or more exactly, it can be taken into account) consists in choosing not only one (the best) match, but a number of good matches, i. e. a number of pixels that produced small errors E. Furthermore, the value of match error E should be used then in assigning the level of confidence associated with a feature.

This idea, seemingly being very straightforward, has not been implemented yet in world modeling from visual sensors to the best of our knowledge. A similar idea has been mentioned in [Moravec 1996] though.

Summary

Recapitulating, in the case of a stereo visual sensor the certainty assigned to the depth value of a point is a function of

in the case of static robot, and it is also a function of in the case of a moving robot.

All these dependencies should be incorporated into the sensor model. That is, the basic probabilistic assignments should be assigned to the points on a ray of view to reflect these dependencies. Figure 6.b shows how a model of the visual sensor would look like in this case. In the figure the model is shown approximately. More rigorous investigation of this has to be done.

3.4 Summary

Although by now we have formulated the directions to achieve our main goal - building a model of the world using the ALN representation of occupancy functions with a single camera stereo, most of the questions are still open.

Below we summarize what still should be done.

Computer Vision part
Build the model of the stereo which takes into account the error of the visual data due to mistracking, quantization and imprecision of calculating stereo parameters (Section 3.3).
Reasoning With Uncertainty (AI) part
Design the scheme for combining both probabilistic-evidential approach of fusing data and that of the ALN (Sections 3.1 and 3.2). This includes
Navigation Planing (Robotics) part
Develop the techniques for using the ALN built occupancy function for planing the navigation. In planing, the knowledge of both which parts of the world are occupied and which parts are not explored yet should be utilized (Section 3.2.1).
All of these parts require theoretical investigation. However, we consider a real robot-environment setup described in Section 2.1 for the purpose of verifying and demonstrating the theoretical conclusions.
 

4. Selected and sorted bibliography

Hans P. Moravec Robot: mere machine to transcendent mind, Oxford University Press. (on track for publication in Fall 1998). http://www.frc.ri.cmu.edu/ hpm/book97/book97.index.html

Sixth Annual Mobile Robot Competition. http://spbtrc.gtri.gatech.edu/AAAI97

Mobile Robots

Georgia Tech Mobile Robot Laboratory. http://www.cc.gatech.edu/aimosaic/robot-lab/mrl-online-publications.html (No fusion or uncertainty, neuro-based, multy-agent, Ronald C. Arkin, Tucker Balch). Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces (Juan Carlos Santamaria, Richard S. Sutton, and Ashwin Ram).

Navy Center for Applied Research in Artificial Intelligence. http://www.aic.nrl.navy.mil/ yamauchi

Mobile Robot Exploration and Map-Building with Continuous Localization Brian Yamauchi, Alan Schultz, and William Adams, to appear in Proceedings of the 1998 IEEE International Conference on Robotics and Automation, Leuven, Belgium, May 1998

Frontier-Based Exploration Using Multiple Robots Brian Yamauchi, to appear in Proceedings of the Second International Conference on Autonomous Agents (Agents '98), Minneapolis, MN, May 1998

A Frontier-Based Approach for Autonomous Exploration Brian Yamauchi, Proceedings of the 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation, Monterey, CA, July 1997, pp. 146-151

Spatial Learning for Navigation in Dynamic Environments Brian Yamauchi and Randall Beer, IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, Special Issue on Learning Autonomous Robots, Vol. 26, No. 3, June 1996, pp. 496-505

University of British Columbia. Spinoza Project. http://www.cs.ubc.ca/nest/lci/robots/papers/papers.htm

Vladimir Tucakov, Michael Sahota, Don Murray, Alan Mackworth, Jim Little, Stewart Kingdon, Cullen Jennings, Rod Barman. Spinoza: A Stereoscopic Visually Guided Molibe Robot. To be published in HICSS97. Available at http://www.cs.ubc.ca/spider/tucakov/research/spin_paper/paper.html

Stereo Vision Based Mapping and Navigation for Mobile Robots Cullen Jennings and Don Murray Submitted to ICRA97.

Temporally Coherent Stereo: Improving Performance Through Knowledge of Motion Vladimir Tucakov and David Lowe Submitted to ICRA97

Center for Intelligent Machines at McGill University. http://www.cim.mcgill.ca/ dudek (Greg Dudek)

University of Bonn. http://www.cs.uni-bonn.de/ rhino/publications.html

S. Thrun, A. Bücken, W. Burgard, D. Fox, T. Fröhlinghaus, D. Hennig, T. Hofmann, M. Krell, and T. Schmidt, Map Learning and High-Speed Navigation in RHINO, in AI-based Mobile Robots: Case studies of successful robot systems, Kortenkamp, D. and Bonasso, R.P. and Murphy, R. (eds.), MIT Press (to appear).

J. Buhmann, W. Burgard, A.B. Cremers, D. Fox, T. Hofmann, F. Schneider, J. Strikos and S. Thrun. The Mobile Robot Rhino , AI Magazin, 16:1, 1995.

Robot Learning Laboratory at the School of Computer Science at Carnegie Mellon University http://www.cs.cmu.edu/ thrun

The Robotics Institute at Carnegie Mellon University. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/alv/member/www/projects/3DVfAN.html (3-D Vision for Autonomous Navigation, Martial Hebert)

World Models

S. Thrun, D. Fox, and W. Burgard, 1998. A Probabilistic Approach to Concurrent Mapping and Localization for Mobile Robots. Machine Learning 31, 29-53 and Autonomous Robots 5, 253-271, (joint issue).

W. Burgard, D. Fox, and D. Hennig. Fast Grid-based Position Tracking for Mobile Robots, Proceedings of the 21st German Conference on Artificial Intelligence (KI 97), Freiburg, Germany, 1997.

S. Thrun and A. Buecken. Integrating Grid-Based and Topological Maps for Mobile Robot Navigation, National Conference on Artificial Intelligence, (AAAI-96). For more detail, see also our Technical Report: Learning Maps for Indoor Mobile Robot Navigation

Reasoning With Uncertainty

R. Szeliski Bayesian modeling of uncertainty in low-level vision International journal of computer vision, 5(3), pp. 271-301, 1990.

D. Pagas, E. Nebot, H. Durrant-Whyte An evidential Approach to Probabilistic Map-Building Reasoning with Uncertainty in Robotics, Proceedings of Intern. Workshop, pp. 165-169, Dec. 1995

F. Voorbraak. Reasoning with Uncertainty in AI, ibid.

A. Elfes. Robot navigation: integrating perception, environmental constraints and task execution within a probabilistic framework, ibid.

Kurt Konolige. A Refined Method for Occupnacy Grid Innterpretation, ibid. Kurt Konolige's Home Page (independence for occupancy grids, sonar) http://www.ai.sri.com/ konolige.

Denis Laurendeau "Probabilistic Octree Modeling of a 3-D Dynamic Environment" in Proc. of the IEEE Int'l Conf. on Robotics and Automation, 1997. http://www.gel.ulaval.ca/ vision/home_ang.html

Sensor Fusion: Occupancy Models, Evidence Grids, Sensor Models

W. Burgard, D. Fox, D. Hennig, and T. Schmidt. Estimating the Absolute Position of a Mobile Robot Using Position Probability Grids, Proceedings of the Thirteenth Nationa Conference on Artificial Intelligence (AAAI-96).

Borenstein, J, and Koren, Y., 1991, "Histogramic In-motion Mapping for Mobile Robot Obstacle Avoidance." IEEE Journal of Robotics and Automation, Vol. 7, No. 4, 1991, pp. 535-539.

Feng, L., Borenstein, J., and Everett, B., 1994, "Where am I? Sensors and Methods for Autonomous Mobile Robot Localization." Technical Report, The University of Michigan UM-MEAM-94-21, December 1994. ftp://ftp.eecs.umich.edu/people/johannb/pos96???.pdf (readme.txt)

Larry Mathies, Alberto Elfes. Integration of Sonar and Stereo Range Data Using a Grid-Based Represenation, 1988.

Hans P. Moravec. Robot Spatial Perception by Stereoscopic Vision and 3D Evidence Grids CMU-RI-TR-96-34 September 1996 in http://www.cs.cmu.edu/Reports/robotics.html

Martin, M.C.; Moravec, H.P. Robot Evidence Grids Carnegie Mellon University, Robotics Institute Technical Report CMU-RI-TR-96-06, http://www.frc.ri.cmu.edu/ hpm/hpm.pubs.html.

S. Thrun. A Bayesian Approach to Landmark Discovery and Active Perception in Mobile Robot Navigation, Technical Report CMU-CS-96-122, Carnegie Mellon University, Pittsburgh, PA, 1996.

P.Payer, Denis Laurendeau, C.Gosselin. Merging Uncertainty into Probabilistic Octree Models, Proc. VI-98, Vancouver 18-20 June 1998. (Range Sensor Model, Laval University)

Learning

S. Thrun. An Approach to Learning Mobile Robot Navigation. Robotics and Autonomous Systems, 15: 301-319, 1995.

S. Thrun. Bayesian Landmark Learning for Mobile Robot Localization, to appear in Machine Learning.

S. Thrun, 1998. Learning Metric-Topological Maps for Indoor Mobile Robot Navigation, AI Journal 99(1), 21-71.

Stereo Computer vision

T. Froehlinghaus and J. Buhmann. Real-Time Phase-Based Stereo for a Mobile Robot. Proceedings of the First Euromicro Workshop on Advanced Mobile robots. pp. 178-185, 1996.

Error Analysis

Rodriguez, Aggarwal. Stochastic Alanysis of Stereo Quantazation Error, PAMI - 12/5, pp 467-470, 1990.

L. Mathies, S. Shafer. Error Modeling in Stereo Navigation, Journal RA - 3/3, pp. 239-247, 1987.

S. Blostein, T, Huang. Error Modeling in Stereo Navigation, PAMI-9/6, pp. 752-772, 1987

Position estimation, Essential matrix

T. Svoboda and T. Pajdla. Efficient Motion Analysis. Research report, K335/95/95. Czech Technical University, Prague, October 1995.

T. Svoboda a Peter Sturm. What can be done with badly calibrated camera in ego-motion analysis?. Technical report CTU-CMP-1996-01, Center for Machine Perception, Czech Technical University, Prague, September 1996

R. I. Hartley, R. Gupta and T. Chang Stereo from uncalibrated cameras Proc. IEEE Conf. on Computer Vision and Pattern Recognition 761-764, 1992

R. I. Hartley, R. Gupta and T. Chang Estimation of relative camera position for uncalibarated cameras. Computer vision-ECCV '92 : Second European Conference on Computer Vision, Santa Margherita,Ligure, Italy, May 19-22, 1992 CALL NUMBER: TA 1632 E89 1994

R. I. Hartley Euclidian reconstruction from uncalibrated views. Proc. of the Second Europe-US Workshop on Invariance, Ponta Delgada, Azores,187-202,October 1993,

R. I. Hartley In defence of the 8-point algorithm Fifth International Conference on Computer Vision 1064-1070 IEEE Computer Society Press 1995

The following assume error-free data:

Longuet-Higgings. Novel techniques of non-destructive examination : proceedings of a Royal Society Discussion Meeting held on 9 and 10 July 1985 ISBN: 0854032924

X.Zhuang. A simplification to linear two-view motion algorithm Computer Vision, Graphics and Image Processing, 49 (1989) pp. 175-178.

See also [Kanatani 1993] below.

3D Scene reconstruction from Video images

Nicolas Ayache, Artificial Vision for Mobile Robots (translation from French, 1989, INRIA), The MIT Press, London, 1991.

Ayache, N. and Lustman, F. (1991), `Trinocular Stereo Vision for Robotics', IEEE Trans. on Pattern Analysis and Machine Intelligence 13(1), 73-85.

A. Blake and A. Yuille eds. Active Vision, The MIT Press, London, 1992.

Buffa, M., Faugeras, O. D. ; Zhang, Z. (1992), Obstacle Avoidance and Trajectory Planning for an Indoor Mobile Robot Using Stereo Vision and Delaunay Triangulation, I. Masaki, ed., `Vision-based Vehicle Guidance', Springer-Verlag, chapter 13, pp. 268-283.

Faugeras, O. Three-Dimensional Computer Vision - A Geometric Viewpoint, The MIT Press, Cambridge, 1993 Oxford University Press, 1993.

K. Kanatani. Geometric Computation for Machine Vision,

A. Mitiche. Computational Analysis of Visual Motion, Plenum Press, New York and London 1994.

J. Weng, T.S. Huang and N. Ahuja. Motion and Structure from Image Sequences , Springer Series in Information Sciences, vol. 29, Springer-Verlag, New York, 1992.

Z. Zhang and O. Faugeras. 3D Dynamic Scene Analysis , Springer Series in Information Sciences, vol. 27, Springer-Verlag, New York, 1992.

Y. Zhou and R. Chellappa. Artificial Neural Networks for Computer Vision, Research Notes in Neural Computing, vol. 5, Springer-Verlag, New York, 1992.

Other

W.W Armstrong, M.M.Thomas, Adaptive Logic Networks, sect. in C1.8 in Handbook of Neural Computation, Institute of Physics Publishing and Oxford University Press - USA, 1996 (looseleaf)

A. Albert. Regression and Moore-Penrose pseudoinverse, Academic New-York, 1972.

P. Maybeck. Stochastic Models, Estimation and Control, Vol. 1 and 2, Academic Press, London, 1979

Contact me for the full list, reprints and copies.