1.1 The Challenge of World Modeling for Mobile Robots
The age of robots is coming [Moravec 1998]. Robots are invading Mars, exploring the ocean, catering food etc [Mobile Robots]. More complete list of area where robots are used now can be found in the description of AAAI Mobile Robot Competitions.
No matter what a mission a robot is carrying out in the environment, it must be able to learn and maintain the model of its environment. This explains the increasingly growing interest of researchers all over the world in the problem of world modeling for mobile robots. This problem is difficult and far from being solved.
Let us name the major practical limitations on a robot`s ability to acquire accurate models of the world [Thrun 1995]:
In most applications a robot should operate in real time. Robot has also hardware limitations.
The choice of the model to be used to describe the world depends on three factors: 1) environment, 2) sensors and 3) task goals.
Recent research has produced three fundamental paradigms for world modeling for mobile robots: metric (occupancy-based) paradigm, geometrical (beacon recognition and tracking based) paradigm and topological (graph-based) paradigm [World Models].
In our research we consider the occupancy-based approach, which is superior to others for its simplicity in building, representing and maintaining the world model. This approach however has also disadvantages (as compared to other approaches): it is inefficient for navigation planning and requires redundant space representation [Thrun 1997].
Originally developed for building 2D maps,the occupancy-based approach has been recently extended to build 3D models of the world, which provide much more information about the surrounding world than 2D maps do [Moravec 1996].
This is why in our research we focus our attention on the occupancy-based approach.
1.3 Visual Sensor for Building the World Model
As has been indicated by Moravec, a visual sensor is much preferable to a sonar sensor when dealing with small resolution occupancy grids, and especially, when dealing with a 3D occupancy representation of the world.
Therefore, for our research we consider a visual sensor. At the same time, we are not interested in a very expensive highly calibrated stereo setup, but rather we consider the simplest possible stereo setup with only one (possibly badly calibrated) camera. We find the challenge in showing that this stereo setup is sufficient for acquiring a model of the world which would suffice to carry out the robot's tasks, the major one of which is navigation.
In our research we do not consider robot navigation and planning problems, which are big areas for research themselves. However we shall be aware of these problems, as these are the areas where the results of our research can be directly applied.
1.4 Parametric Representation of Occupancy Function
The development of the 3D occupancy approach was impeded significantly by the amount of data needing to be calculated with every new sensor reading. In particular, Moravec mentions the following numbers. Building a model of a room 6 by 6 by 2 meters, requires the update of several millions voxels for each of about 2000 evidence rays obtained from one 768x576 image frame. That is why only with appearance of extra powerful workstations (with 100 MIPS speed and 100MB memory range) the experiments with 3D occupancy model of the world became possible.
This explains our aspiration to find another, different from a conventional
voxel-based, representation of the occupancy function of the world, which
would be more optimal in memory consumption and data processing and which
would be more convenient for navigation.
2. Outline of the research
2.1 Our setup
Analyzing the previous work in sensor based world modeling for mobile
robots, we establish the major differences and advances of our approach.
Their description follows.
Environment
In building the model of the world we do not assume any specific geometry
of the world. This is inspired by the desire to navigate the robot in the
outdoor environment. In other words, we do not assume that features, which
are extracted from video images, are geometrically constrained. That is,
each feature is considered as an independent source of information about
the surrounding environment.
This assumption allows the utilization of probabilistic approaches in
fusing different sensor data and is very suitable for the occupancy representation
of the world [Elfes 1989].
It should be mentioned that this approach is different from most 3D
scene reconstruction from video images approaches like that of Faugeras
or Ayachi [Structure from Motion].
A view of the environment we are going to deal with is shown in Figures 1
and 4.
Robot
In the case of building a metric model of the world for mobile robots,
a robot needs to maintain its own position on this map of the model.
We associate the world model with the system of coordinates centered
on the initial position of the robot.
The robot and its model are shown in Figure 1.
An image as observed by a robot is shown in Figure 2.
It should be mentioned that although in our research with use a RWI
B12 robot, our research is directed towards building the simplest possible
robot. In particular, we consider a robot consisting of three parts only:
1) a mobile platform, 2) a pant-tilt unit and 3) a camera.
The mission for the robot
No matter what an ultimate task of a mobile robot is, the first task
for it remains the same -- to navigate in the unknown environment. More
exactly, the robot has to know how to move towards a goal not bumping onto
obstacles.
This is the mission we want our robot to carry out. A goal will be a
marked object, which is either seen from robot's initial position or which
the robot has to find by moving and exploring the environment.
Sensor data
A lot of research has been done in fusing data from sonar sensors [Sensor
Fusion]. This is mainly attributed to the well defined model of a sound
sensor as compared to that of a visual sensor.
For the task of acquiring precise 3D information however, visual sensors
are much more preferable than sonar sensors [Moravec 1996].
Yet, fusion of data from visual sensors is not well developed. This
is due to the fact that different stereo setups require different sensor
models.
Another issue is that a highly calibrated dual or trinocular stereo
system, which are mostly used now for depth acquisition [UBC, Structure
from Motion], are expensive. This is a certain restriction in using them
in robotics.
Thus, the promise and the challenge of the research is to find a way
of acquiring 3D information, which will be used in building the world model,
with the simplest possible stereo setup, in particular, a single camera
only.
As opposed to a stereo setup with a fixed camera configuration, a single
camera stereo allows arbitrary motion of the camera. This gives more flexibility
in tracking the features and exploring the world, which is also an advantage
of using a single camera stereo.
Single camera sensor readings
The readings from a single camera stereo, which are 3D depth data, are
obtained by the following three stage procedure (see Figure 3).
A set of features is selected in the first frame (stage 1). Each feature
is then tracked along the epipolar line in the second frame, which is grabbed
after camera has moved, and the best match is obtained (stage 2). The depth
to those features which are selected and tracked is then calculated on
the basis of the disparity of the features in two frames (stage 3).
As a result of either camera distortion, changing light conditions,
incorrect registration of features, or odometry errors, the obtained 3D
information is not certain. All the above mentioned factors affect the
certainty of the obtained 3D depth data.
A typical depth map obtained with a single camera by looking around
the robot is shown in Figure 4.
The figure shows 1) the second video frame of the last pair of stereo images,
2) features, which were selected in the first frame (shown in black) and
tracked in the second frame (shown in white) and 3) the projection of the
acquired 3D data points (x,y,z) onto Oxy (bird's
eye view) plane. The confidence
of the point is shown by the intensity of the pixel: the whiter the pixel,
the more confidence is associated with the point.
2.2 The main objective of the thesis
Each detected 3D point (either correctly or incorrectly detected) provides
a continuum of data for world modeling. In particular, it provides information
about an occupied point - the detected 3D point itself, and unoccupied
(empty) points - between the camera and the occupied point
As can be seen from Figure 4,
a visual sensor provides uncertain and sometime contradictory data. The
occupancy model of the world can be built thus by fusing these uncertain
data in such a way that uncertainty is decreased by using the excessive
amount of data.
This consists of several stages.
The objective of our research is to design such an occupancy representation
of the world that facilitates accomplishing of the these stages.
We believe that the adaptive logic network (ALN) [Armstrong 1996] representation
of the occupancy functions is appropriate for world modeling.
We would like to show how they can be built, updated and used for navigation.
We want to investigate how to use them with different data fusion techniques
and which technique is the most optimal when combined with the ALN representation
of the occupancy functions.
At the same time, we would like to show how to acquire 3D world models
with a single camera stereo. This includes building a sensor model
of single camera stereo and fusion of data obtained by this stereo.
The setup of our experiments described in previous section will be used
to show the proof of the concept.
3. The way to achieve the goal
In this section we outline a way to approach the problem and summarize
the previous work done in the area.
3.1 Designing the occupancy function
The way the values of the occupancy function are assigned to occupied
and non-occupied points is determined by the combination rule which is
used to fuse different data.
If we consider all data to be correct (or to be with the same degree
of certainty), then the least square technique can be used to combine different
data. Using the ALN terminology, this technique would be referred to as
building the occupancy function on the basis of known sample points. In
this case, it does not matter what values of the occupancy function are
assigned to sample points as soon as we assign the smallest value to non-occupied
points and the largest value to occupied points.
However in reality, data obtained from sensors have different degree
of certainty, and the certainty of data depends 1) on the amount of data
available and 2) on a sensor model. Therefore, other techniques which would
take into account the uncertainty of data have to be considered.
Combining the pieces of evidence - fusion of uncertain data
Roughly speaking, the problem of combining the pieces of evidence in
building the world model can be formulated as follows.
Let us now review approaches mentioned in the literature treating this
problem.
Probabilistic Approaches
The most straightforward way of assigning the values of the occupancy
function consists in assigning the value of 1 to an occupied point and
the value of 0 to an empty point. Then, the closer the occupancy value
of a point is to one, the more likely the point is occupied.
This approach is inspired by probability theory. Using the Bayesian
framework, Problem 1 can be reformulated as follows.
1) The prior probability of point A to be occupied (or empty)
is equal to the conditional probability of point A to be occupied
(or empty) given data 2) A sensor reading gives the data in terms of conditional probabilities
of observing the data 3) The answer to questions Q1a and Q1b is given then by the Bayesian
formula:
In order to apply the Bayesian formula (Eq. 0.2),
one has to assume independence of noise in different sensor readings. That
is the following conditional independence is assumed
Until recently this was the most established approach in data fusion
[Mathiew&Elfes 1988], and different simplifications of it have been
designed [Borenstein 1996, Martin&Moravec 1996].
Variations
A modification to the Bayesian approach described above was suggested
by Konolige in 1995, who suggested to use odds-likelihood posterior
A similar approach was used later by Moravec [Martin&Moravec 1996,
Moravec 1996], who considered the occupancy function defined as
The above described approaches use only the first moments of an unknown
occupancy value of a point. Estimating second moments (i.e. deviation and
the covariance matrix) of the variables would lead us to the Extended Kalman
Filter [Maybank].
Extended Kalman Filter
In Kalman filtering the uncertainty about the environment is propagated
by means of estimating the first two moments of probability distribution
of the spatial variables of the environment.
Whereas the Extended Kalman Filter has been intensively used for beacon-based
world modeling approaches [Structure from Motion], it is not used in building
of occupancy models of the world. This is explained by a huge dimension
which would be required to update the state vector of the world made by
concatenating vectors associated with each voxel of the world.
Evidential Approach
Assume that the reading of a sensor tells us: ``The point A is occupied''.
Does this sensor reading decrease the probability of point A to be empty?
According to the Bayesian approach -- yes. According to the Dempster-Shafer
theory of evidence -- no. This is because the sensor reading provides no
evidence for point being empty (it has only the evidence for point being
occupied).
Using DS approach for world modeling for mobile robots was mainly inspired
by the following reasoning.
In the case of Bayesian approach, if the probability of point A
to be occupied P(Occ)=1/2, then it is not clear is that because
there is not enough information about that point (and hence more exploration
in this area is required) or is that because the data obtained about the
point are contradictory (meaning that the environment is to complex there
and extra exploration in this area most likely will not help). Calculating
second moments of the occupancy value of the point would elucidate such
an ambiguity. However, computational expenses incurred by this calculation
would make this probabilistic approach unsuitable for robotics, and this
probably why the second moments are not used so far in building the occupancy
models.
According to the Dempster-Shafer theory, the likelihood of a point A
to be occupied is determined by two functions, called belief and
plausibility functions. These functions can be considered as a pessimistic
and an optimistic estimates of a point to be occupied and are defined as
The answer to questions Q1a and Q1b is obtained then by using written
below (Eq. 0.7)
Dempster's rule of combination and calculating functions Bel(Occ)
and Pl(Occ):
To apply the Dempster-Shafer combination rule one has to assume the
DS-independence of the evidence data, which, roughly speaking, can
be defined as probabilistic independence of sources of these data [Voorbraak
1995]. This is not realistic for many cases. But still, because of the
simplicity of DS rule, it is widely used now in mobile robotics [Moreno&Puente
1992, Hughes&Murphy 1992, Durrant-Whyte et al 1995].
It should be mentioned that in the above references, Dempster-Shafer
combination rule is applied to fuse data obtained from sonar sensors,
which is attributed to the easiness in assigning basic probability assignment
m(x) for sonar sensors. This explains also why this rule
has been applied only for building two-dimensional occupancy grids.
3.2 The ALN representation of the occupancy function
Which combination rule and what values of the occupancy function are
better to use for our research has still to be decided. But let us assume
now that the choice of a fusing technique and the output range of the occupancy
function has been made. Then the question is:
To make the question more specific, consider a room 5 by 5 by 2 meters,
which the robot has to explore using its sensors and where it is going
to navigate.
What is the best way of representing the mapping of continuum of points
of the room (in The way it is done now [Moravec 1996] is to divide the whole space of
the room into voxels and then to assign a value to each element of thus
built 3D array. This requires a lot of space and computational power. In
the case of the room it would require several millions voxels.
The resolution of the grid does not depend on the complexity of the
environment and it does not allow efficient navigation planning.
We think there is a better way of representing the occupancy function
-- a parametrical way. Let us justify our conjecture.
Outline of the idea
The idea is to use the piecewise linear surfaces instead of the array
of data. This is illustrated in Figure 5.a
. The occupancy function is then represented in terms of MIN and MAX operators
applied to 4D surfaces. The example of a part of this representation
First, the significant reduction of space consumption could be expected.
Second, as soon as the occupancy functions are built, i.e. after the
equations of surfaces have been calculated, it is easy to find the empty
area available for navigation (Figure 5.b).
The left part of surface equations is simple made equal to the occupancy
level. For example, f=0.4 in Eq. 0.8.
Even more, it is easy to incorporate the size of the robot into equations
of the surfaces (Figure 5.c).
Obtaining plane equations of needed for navigation 2D maps from 4D surface
equations should also be straightforward.
And the main, ALN representation of the occupancy functions would allow
efficient way of using them for navigation, which, as mentioned in Section
1.2, so far is the main limitation of the occupancy model of the world.
If the goal is seen, then the way to the goal can be found in terms of
line equations. This is illustrated in Figure 5.d.
The main question for the research is how to update the occupancy function
of the world with a new sensor reading coming, when it is represented in
terms of equations rather than in terms of voxels.
Fusion of data -- learning the ALN
The ALN linear pieces are built in the so called training stage: given
a set of sample points (x,y,z,F), the ALN finds
the function f, which approximates the function F=F(x,y,z)
in the least square sense.
In the case of modeling the occupancy function, the values of F
are determined by a sensor model. In the simplest case, F=1 for
occupied points and F=0 for empty points.
Notice that a stereo visual sensor reading gives only the ``occupied''
points, i.e. those points which are visible. Only after an ``occupied''
point has been given, a set of ``empty'' points is generated.
In the case of voxel representation of the occupancy function, the number
of generated ``empty'' points will be equal to the number of voxels between
an ``occupied'' point and the camera. This results in the redundancy of
data. Also, the data produced in such a way will be very unevenly distributed:
there will be much more ``0''s in the function output than ``1''s.
The ALN representation of the occupancy function, in its turn, would
allow more flexible generation of sample points. There would be generated
as many ``empty'' points as needed.
It still remains unclear though when to use the fusion techniques overviewed
in Section 3.1 and when to use the least square approximation achieved
by the ALN.
Most likely in the beginning stage, e.g. when acquiring the model from
the first position, data will be combined using one of the combination
rules described in Section 3.1. Then this data will be converted by the
ALN to the surfaces equations. Then, when robot moves to the second position,
both the data from the ALN built occupancy model and the new data will
be combined using again the combination rule, and so on.
As of now, this is only a tentative idea and it still need to be investigated.
3.3 Building the visual sensor model
In order to apply any of the described above fusing techniques, we need
to know the level of confidence associated with the data obtained from
a sensor. The question is:
This is achieved by constructing what is called a sensor model.
As described in Section 2.1.5 (see Figure 3),
a single camera stereo visual sensor gives the estimate of the location
(x,y,z) of point A, corresponding to a selected
and tracked feature in the image frame.
In the ideal case, i.e. in the case of perfect matching in tracking
stage and infinitesimal resolution of the image, the basic probabilistic
assignments can be assigned to the points on the line of view as follows:
Let us now consider the case of a real camera and discuss what influences
the estimation of depth obtained by a visual sensor, and what affects the
degree of certainty associated with the depth value.
Image
Figure 6
shows the image of monochrome green rectangles as observed by Sony XC-999
camera. The warping of the picture and different intensities of the same
green colour can be clearly seen. This results in imperfect tracking and
matching of features, which, in turn, results in under- or over-estimating
of the depth value corresponding to a feature (see Figure 3).
This is the major reason of uncertainty in depth data, which is observed
in Figure 4.
However, this is not the only reason. Because of the limited resolution
of the image, depth cannot be found precisely. This is illustrated in Figure 3.
Thus, the error analysis of the stereo procedure should be done in order
to build a sensor model.
The way the features are selected and tracked has been described in
our previously and it is not a concern in this thesis. We will
proceed to the last stage of the stereo procedure -- calculation of the
location (x,y,z) of a visible point (let us call it
point A), which corresponds to a selected and tracked feature in
the image frame.
Depth calculation technique
The positions of a feature in the first and the second frame determine
two directions of view, which are defined by unit vectors The depth to the feature is then found using the least square minimization:
Here we distinguish two cases:
Precise case) Robot does not move. Only pan-tilt unit moves.
The rotation matrix and the translation vector of the camera are known.
Then r' and r can be easily found by taking the partial derivatives
of (0.12)
over r' and r.
Imprecise case) Robot moves. The rotation matrix and the translation
vector of the camera are not known. In this case they need to be found
using the motion parallax of the features.
A common way to do it is by calculating the essential matrix
E of the camera. This is done by resolving the epipolar constraint,
which should hold for all detected and tracked features The problem of robust estimation of the essential matrix is still one
of the most challenging problems in computer vision. In our research we
consider this problem only from the prospective of error analysis.
Incorporating error estimation and matching error in sensor model
Taking into account the quantization error
The following result concerning the stereo quantization error has been
obtained [Error Analysis]. For non-convergent dual camera stereo system,
the range error It can be shown also that the closer a feature is to a border of the
image, the greater is the disparity error For the case of a single camera stereo, the relationship will be approximately
the same.
Therefore, stereoscopic depth estimation is more accurate 1) for nearby
objects than for distant ones, and 2) for the objects in the center of
the image frame, than on the margin of the image frame.
The second conclusion is also explained by the fact that the image is
much more warped (because of the lense distortion) on the margin, than
in the center of the image, which results in more frequent mistracking
of features on the margin, than in the center of the image.
In order to decrease depth estimation error 1) we disregard the marginal
features (the same as they do in [Navy Center]), 2) we make baseline as
large as possible (The disparity of the features
in our stereo varies from 5 pixels (for distant objects) to 30 pixels
(close objects) in an image of size 160 by 120, which is the same as in
[UBC] ),
3) when over-viewing the surrounding world, we make sure that each selected
feature is observed at least twice, so that it appears in the middle of
the image at least once. This is achieved by adjusting the angle of the
camera rotation.
Taking into account the mismatch error in tracking
In the tracking stage of the stereo calculation, the epipolar line in
the second frame is scanned pixel by pixel, and for each pixel a N
- dimensional vector The goodness of matching between As was mentioned before, because of many reasons what we call ``mistracking''
happens, that is the best match on the epipolar line, in fact, does not
correspond to the feature selected in the first frame.
The way this problem can be resolved (or more exactly, it can be taken
into account) consists in choosing not only one (the best) match, but a
number of good matches, i. e. a number of pixels that produced small errors
E. Furthermore, the value of match error E should be used
then in assigning the level of confidence associated with a feature.
This idea, seemingly being very straightforward, has not been implemented
yet in world modeling from visual sensors to the best of our knowledge.
A similar idea has been mentioned in [Moravec 1996] though.
Summary
Recapitulating, in the case of a stereo visual sensor the certainty
assigned to the depth value of a point is a function of
All these dependencies should be incorporated into the sensor model.
That is, the basic probabilistic assignments should be assigned to the
points on a ray of view to reflect these dependencies. Figure 6.b
shows how a model of the visual sensor would look like in this case. In
the figure the model is shown approximately. More rigorous investigation
of this has to be done.
3.4 Summary
Although by now we have formulated the directions to achieve our main
goal - building a model of the world using the ALN representation of
occupancy functions with a single camera stereo, most of the questions
are still open.
Below we summarize what still should be done.
4. Selected and sorted bibliography
Hans P. Moravec Robot: mere machine to transcendent mind, Oxford University
Press. (on track for publication in Fall 1998). http://www.frc.ri.cmu.edu/
hpm/book97/book97.index.html
Sixth Annual Mobile Robot Competition. http://spbtrc.gtri.gatech.edu/AAAI97
Mobile Robots
Georgia Tech Mobile Robot Laboratory. http://www.cc.gatech.edu/aimosaic/robot-lab/mrl-online-publications.html
(No fusion or uncertainty, neuro-based, multy-agent, Ronald C. Arkin, Tucker
Balch). Experiments with Reinforcement Learning in Problems with Continuous
State and Action Spaces (Juan Carlos Santamaria, Richard S. Sutton, and
Ashwin Ram).
Navy Center for Applied Research in Artificial Intelligence.
http://www.aic.nrl.navy.mil/ yamauchi
Mobile Robot Exploration and Map-Building with Continuous Localization
Brian Yamauchi, Alan Schultz, and William Adams, to appear in Proceedings
of the 1998 IEEE International Conference on Robotics and Automation, Leuven,
Belgium, May 1998
Frontier-Based Exploration Using Multiple Robots Brian Yamauchi, to
appear in Proceedings of the Second International Conference on Autonomous
Agents (Agents '98), Minneapolis, MN, May 1998
A Frontier-Based Approach for Autonomous Exploration Brian Yamauchi,
Proceedings of the 1997 IEEE International Symposium on Computational Intelligence
in Robotics and Automation, Monterey, CA, July 1997, pp. 146-151
Spatial Learning for Navigation in Dynamic Environments Brian Yamauchi
and Randall Beer, IEEE Transactions on Systems, Man, and Cybernetics -
Part B: Cybernetics, Special Issue on Learning Autonomous Robots, Vol.
26, No. 3, June 1996, pp. 496-505
University of British Columbia. Spinoza Project. http://www.cs.ubc.ca/nest/lci/robots/papers/papers.htm
Vladimir Tucakov, Michael Sahota, Don Murray, Alan Mackworth, Jim Little,
Stewart Kingdon, Cullen Jennings, Rod Barman. Spinoza: A Stereoscopic Visually
Guided Molibe Robot. To be published in HICSS97. Available at http://www.cs.ubc.ca/spider/tucakov/research/spin_paper/paper.html
Stereo Vision Based Mapping and Navigation for Mobile Robots Cullen
Jennings and Don Murray Submitted to ICRA97.
Temporally Coherent Stereo: Improving Performance Through Knowledge
of Motion Vladimir Tucakov and David Lowe Submitted to ICRA97
Center for Intelligent Machines at McGill University. http://www.cim.mcgill.ca/
dudek (Greg Dudek)
University of Bonn. http://www.cs.uni-bonn.de/ rhino/publications.html
S. Thrun, A. Bücken, W. Burgard, D. Fox, T. Fröhlinghaus,
D. Hennig, T. Hofmann, M. Krell, and T. Schmidt, Map Learning and High-Speed
Navigation in RHINO, in AI-based Mobile Robots: Case studies of successful
robot systems, Kortenkamp, D. and Bonasso, R.P. and Murphy, R. (eds.),
MIT Press (to appear).
J. Buhmann, W. Burgard, A.B. Cremers, D. Fox, T. Hofmann, F. Schneider,
J. Strikos and S. Thrun. The Mobile Robot Rhino , AI Magazin, 16:1, 1995.
Robot Learning Laboratory at the School of Computer Science at Carnegie
Mellon University http://www.cs.cmu.edu/ thrun
The Robotics Institute at Carnegie Mellon University. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/alv/member/www/projects/3DVfAN.html
(3-D Vision for Autonomous Navigation, Martial Hebert)
World Models
S. Thrun, D. Fox, and W. Burgard, 1998. A Probabilistic Approach to
Concurrent Mapping and Localization for Mobile Robots. Machine Learning
31, 29-53 and Autonomous Robots 5, 253-271, (joint issue).
W. Burgard, D. Fox, and D. Hennig. Fast Grid-based Position Tracking
for Mobile Robots, Proceedings of the 21st German Conference on Artificial
Intelligence (KI 97), Freiburg, Germany, 1997.
S. Thrun and A. Buecken. Integrating Grid-Based and Topological Maps
for Mobile Robot Navigation, National Conference on Artificial Intelligence,
(AAAI-96). For more detail, see also our Technical Report: Learning Maps
for Indoor Mobile Robot Navigation
Reasoning With Uncertainty
R. Szeliski Bayesian modeling of uncertainty in low-level vision International
journal of computer vision, 5(3), pp. 271-301, 1990.
D. Pagas, E. Nebot, H. Durrant-Whyte An evidential Approach to Probabilistic
Map-Building Reasoning with Uncertainty in Robotics, Proceedings of Intern.
Workshop, pp. 165-169, Dec. 1995
F. Voorbraak. Reasoning with Uncertainty in AI, ibid.
A. Elfes. Robot navigation: integrating perception, environmental constraints
and task execution within a probabilistic framework, ibid.
Kurt Konolige. A Refined Method for Occupnacy Grid Innterpretation,
ibid. Kurt Konolige's Home Page (independence for occupancy grids, sonar)
http://www.ai.sri.com/ konolige.
Denis Laurendeau "Probabilistic Octree Modeling of a 3-D Dynamic Environment"
in Proc. of the IEEE Int'l Conf. on Robotics and Automation, 1997. http://www.gel.ulaval.ca/
vision/home_ang.html
Sensor Fusion: Occupancy Models, Evidence Grids, Sensor Models
W. Burgard, D. Fox, D. Hennig, and T. Schmidt. Estimating the Absolute
Position of a Mobile Robot Using Position Probability Grids, Proceedings
of the Thirteenth Nationa Conference on Artificial Intelligence (AAAI-96).
Borenstein, J, and Koren, Y., 1991, "Histogramic In-motion Mapping for
Mobile Robot Obstacle Avoidance." IEEE Journal of Robotics and Automation,
Vol. 7, No. 4, 1991, pp. 535-539.
Feng, L., Borenstein, J., and Everett, B., 1994, "Where am I? Sensors
and Methods for Autonomous Mobile Robot Localization." Technical Report,
The University of Michigan UM-MEAM-94-21, December 1994. ftp://ftp.eecs.umich.edu/people/johannb/pos96???.pdf
(readme.txt)
Larry Mathies, Alberto Elfes. Integration of Sonar and Stereo Range
Data Using a Grid-Based Represenation, 1988.
Hans P. Moravec. Robot Spatial Perception by Stereoscopic Vision and
3D Evidence Grids CMU-RI-TR-96-34 September 1996 in http://www.cs.cmu.edu/Reports/robotics.html
Martin, M.C.; Moravec, H.P. Robot Evidence Grids Carnegie Mellon University,
Robotics Institute Technical Report CMU-RI-TR-96-06, http://www.frc.ri.cmu.edu/
hpm/hpm.pubs.html.
S. Thrun. A Bayesian Approach to Landmark Discovery and Active Perception
in Mobile Robot Navigation, Technical Report CMU-CS-96-122, Carnegie Mellon
University, Pittsburgh, PA, 1996.
P.Payer, Denis Laurendeau, C.Gosselin. Merging Uncertainty into Probabilistic
Octree Models, Proc. VI-98, Vancouver 18-20 June 1998. (Range Sensor Model,
Laval University)
Learning
S. Thrun. An Approach to Learning Mobile Robot Navigation. Robotics
and Autonomous Systems, 15: 301-319, 1995.
S. Thrun. Bayesian Landmark Learning for Mobile Robot Localization,
to appear in Machine Learning.
S. Thrun, 1998. Learning Metric-Topological Maps for Indoor Mobile Robot
Navigation, AI Journal 99(1), 21-71.
Stereo Computer vision
T. Froehlinghaus and J. Buhmann. Real-Time Phase-Based Stereo for a
Mobile Robot. Proceedings of the First Euromicro Workshop on Advanced Mobile
robots. pp. 178-185, 1996.
Error Analysis
Rodriguez, Aggarwal. Stochastic Alanysis of Stereo Quantazation Error,
PAMI - 12/5, pp 467-470, 1990.
L. Mathies, S. Shafer. Error Modeling in Stereo Navigation, Journal
RA - 3/3, pp. 239-247, 1987.
S. Blostein, T, Huang. Error Modeling in Stereo Navigation, PAMI-9/6,
pp. 752-772, 1987
Position estimation, Essential matrix
T. Svoboda and T. Pajdla. Efficient Motion Analysis. Research report,
K335/95/95. Czech Technical University, Prague, October 1995.
T. Svoboda a Peter Sturm. What can be done with badly calibrated camera
in ego-motion analysis?. Technical report CTU-CMP-1996-01, Center for Machine
Perception, Czech Technical University, Prague, September 1996
R. I. Hartley, R. Gupta and T. Chang Stereo from uncalibrated cameras
Proc. IEEE Conf. on Computer Vision and Pattern Recognition 761-764, 1992
R. I. Hartley, R. Gupta and T. Chang Estimation of relative camera position
for uncalibarated cameras. Computer vision-ECCV '92 : Second European Conference
on Computer Vision, Santa Margherita,Ligure, Italy, May 19-22, 1992 CALL
NUMBER: TA 1632 E89 1994
R. I. Hartley Euclidian reconstruction from uncalibrated views. Proc.
of the Second Europe-US Workshop on Invariance, Ponta Delgada, Azores,187-202,October
1993,
R. I. Hartley In defence of the 8-point algorithm Fifth International
Conference on Computer Vision 1064-1070 IEEE Computer Society Press 1995
The following assume error-free data:
Longuet-Higgings. Novel techniques of non-destructive examination :
proceedings of a Royal Society Discussion Meeting held on 9 and 10 July
1985 ISBN: 0854032924
X.Zhuang. A simplification to linear two-view motion algorithm Computer
Vision, Graphics and Image Processing, 49 (1989) pp. 175-178.
See also [Kanatani 1993] below.
3D Scene reconstruction from Video images
Nicolas Ayache, Artificial Vision for Mobile Robots (translation from
French, 1989, INRIA), The MIT Press, London, 1991.
Ayache, N. and Lustman, F. (1991), `Trinocular Stereo Vision for Robotics',
IEEE Trans. on Pattern Analysis and Machine Intelligence 13(1), 73-85.
A. Blake and A. Yuille eds. Active Vision, The MIT Press, London,
1992.
Buffa, M., Faugeras, O. D. ; Zhang, Z. (1992), Obstacle Avoidance and
Trajectory Planning for an Indoor Mobile Robot Using Stereo Vision and
Delaunay Triangulation, I. Masaki, ed., `Vision-based Vehicle Guidance',
Springer-Verlag, chapter 13, pp. 268-283.
Faugeras, O. Three-Dimensional Computer Vision - A Geometric Viewpoint,
The MIT Press, Cambridge, 1993 Oxford University Press, 1993.
K. Kanatani. Geometric Computation for Machine Vision,
A. Mitiche. Computational Analysis of Visual Motion, Plenum Press,
New York and London 1994.
J. Weng, T.S. Huang and N. Ahuja. Motion and Structure from Image
Sequences , Springer Series in Information Sciences, vol. 29, Springer-Verlag,
New York, 1992.
Z. Zhang and O. Faugeras. 3D Dynamic Scene Analysis , Springer
Series in Information Sciences, vol. 27, Springer-Verlag, New York, 1992.
Y. Zhou and R. Chellappa. Artificial Neural Networks for Computer
Vision, Research Notes in Neural Computing, vol. 5, Springer-Verlag,
New York, 1992.
Other
W.W Armstrong, M.M.Thomas, Adaptive Logic Networks, sect. in
C1.8 in Handbook of Neural Computation, Institute of Physics Publishing
and Oxford University Press - USA, 1996 (looseleaf)
A. Albert. Regression and Moore-Penrose pseudoinverse, Academic
New-York, 1972.
P. Maybeck. Stochastic Models, Estimation and Control, Vol. 1
and 2, Academic Press, London, 1979
Contact me for the full list, reprints and copies.
As can be seen, all described above stages affect the design and the representation
of the occupancy functions.
Problem 1:
1) At time t=0, a sensor reading tells us that ``probably point
A is non-occupied''.
2) At time t=1, a sensor reading tells us that ``point A
is very likely to be occupied''.
3) The questions are
Q1a: Is point A occupied or not? and
Q1b: How sure we are in asserting that?
:
When no data are given,
.
,
given that a point is occupied and non-occupied (empty):
and
.
rather then only posterior probability P(Occ|D).
In this case, the values of the occupancy function defined as
range from 0 (absolutely impossible) to
(absolutely true).
It is this assignment of occupancy function used by Moravec in his
latest research in building 3D evidence grids.
where m(x) is the basic probability assignment,
satisfying the condition
and is determined by a sensor model.
Q2a: How small should the resolution of the occupancy function
of the world be?
Q2b: What is the optimal way of representing the occupancy function?
onto an interval, let say, [0,1] (in R)?
Q3: How to assign the confidence and the basic probabilistic
assignments to the data obtained from a single camera stereo ?
Figure 7.a
shows a model of the visual sensor for this ideal case.
and
,
where Focus is the focus of the camera.
where R is the rotation matrix and
is the translation vector of the camera.
:
Then the essential matrix E needs to be decomposed into R
and
.
Here, the singular value decomposition is used to find the eigen-vector
corresponding to the smallest eigen-value of a matrix. For more detail
see [Essential Matrix].
can be estimated as
where
is the disparity error, b is the baseline, f is the focal
length.
.
is produced using a mask of size N. This vector is then compared
with a template vector
,
obtained from the feature pixel in the first frame (see Figure 3).
The pixel that produces the best match is considered to correspond to the
same feature.
and
is represented by the error function E, which is chosen as the Euclidean
distance between
and
squared, i.e.
in the case of static robot, and it is also a function of
in the case of a moving robot.
All of these parts require theoretical investigation. However, we consider
a real robot-environment setup described in Section 2.1 for the purpose
of verifying and demonstrating the theoretical conclusions.