- •Preface
- •Biological Vision Systems
- •Visual Representations from Paintings to Photographs
- •Computer Vision
- •The Limitations of Standard 2D Images
- •3D Imaging, Analysis and Applications
- •Book Objective and Content
- •Acknowledgements
- •Contents
- •Contributors
- •2.1 Introduction
- •Chapter Outline
- •2.2 An Overview of Passive 3D Imaging Systems
- •2.2.1 Multiple View Approaches
- •2.2.2 Single View Approaches
- •2.3 Camera Modeling
- •2.3.1 Homogeneous Coordinates
- •2.3.2 Perspective Projection Camera Model
- •2.3.2.1 Camera Modeling: The Coordinate Transformation
- •2.3.2.2 Camera Modeling: Perspective Projection
- •2.3.2.3 Camera Modeling: Image Sampling
- •2.3.2.4 Camera Modeling: Concatenating the Projective Mappings
- •2.3.3 Radial Distortion
- •2.4 Camera Calibration
- •2.4.1 Estimation of a Scene-to-Image Planar Homography
- •2.4.2 Basic Calibration
- •2.4.3 Refined Calibration
- •2.4.4 Calibration of a Stereo Rig
- •2.5 Two-View Geometry
- •2.5.1 Epipolar Geometry
- •2.5.2 Essential and Fundamental Matrices
- •2.5.3 The Fundamental Matrix for Pure Translation
- •2.5.4 Computation of the Fundamental Matrix
- •2.5.5 Two Views Separated by a Pure Rotation
- •2.5.6 Two Views of a Planar Scene
- •2.6 Rectification
- •2.6.1 Rectification with Calibration Information
- •2.6.2 Rectification Without Calibration Information
- •2.7 Finding Correspondences
- •2.7.1 Correlation-Based Methods
- •2.7.2 Feature-Based Methods
- •2.8 3D Reconstruction
- •2.8.1 Stereo
- •2.8.1.1 Dense Stereo Matching
- •2.8.1.2 Triangulation
- •2.8.2 Structure from Motion
- •2.9 Passive Multiple-View 3D Imaging Systems
- •2.9.1 Stereo Cameras
- •2.9.2 3D Modeling
- •2.9.3 Mobile Robot Localization and Mapping
- •2.10 Passive Versus Active 3D Imaging Systems
- •2.11 Concluding Remarks
- •2.12 Further Reading
- •2.13 Questions
- •2.14 Exercises
- •References
- •3.1 Introduction
- •3.1.1 Historical Context
- •3.1.2 Basic Measurement Principles
- •3.1.3 Active Triangulation-Based Methods
- •3.1.4 Chapter Outline
- •3.2 Spot Scanners
- •3.2.1 Spot Position Detection
- •3.3 Stripe Scanners
- •3.3.1 Camera Model
- •3.3.2 Sheet-of-Light Projector Model
- •3.3.3 Triangulation for Stripe Scanners
- •3.4 Area-Based Structured Light Systems
- •3.4.1 Gray Code Methods
- •3.4.1.1 Decoding of Binary Fringe-Based Codes
- •3.4.1.2 Advantage of the Gray Code
- •3.4.2 Phase Shift Methods
- •3.4.2.1 Removing the Phase Ambiguity
- •3.4.3 Triangulation for a Structured Light System
- •3.5 System Calibration
- •3.6 Measurement Uncertainty
- •3.6.1 Uncertainty Related to the Phase Shift Algorithm
- •3.6.2 Uncertainty Related to Intrinsic Parameters
- •3.6.3 Uncertainty Related to Extrinsic Parameters
- •3.6.4 Uncertainty as a Design Tool
- •3.7 Experimental Characterization of 3D Imaging Systems
- •3.7.1 Low-Level Characterization
- •3.7.2 System-Level Characterization
- •3.7.3 Characterization of Errors Caused by Surface Properties
- •3.7.4 Application-Based Characterization
- •3.8 Selected Advanced Topics
- •3.8.1 Thin Lens Equation
- •3.8.2 Depth of Field
- •3.8.3 Scheimpflug Condition
- •3.8.4 Speckle and Uncertainty
- •3.8.5 Laser Depth of Field
- •3.8.6 Lateral Resolution
- •3.9 Research Challenges
- •3.10 Concluding Remarks
- •3.11 Further Reading
- •3.12 Questions
- •3.13 Exercises
- •References
- •4.1 Introduction
- •Chapter Outline
- •4.2 Representation of 3D Data
- •4.2.1 Raw Data
- •4.2.1.1 Point Cloud
- •4.2.1.2 Structured Point Cloud
- •4.2.1.3 Depth Maps and Range Images
- •4.2.1.4 Needle Map
- •4.2.1.5 Polygon Soup
- •4.2.2 Surface Representations
- •4.2.2.1 Triangular Mesh
- •4.2.2.2 Quadrilateral Mesh
- •4.2.2.3 Subdivision Surfaces
- •4.2.2.4 Morphable Model
- •4.2.2.5 Implicit Surface
- •4.2.2.6 Parametric Surface
- •4.2.2.7 Comparison of Surface Representations
- •4.2.3 Solid-Based Representations
- •4.2.3.1 Voxels
- •4.2.3.3 Binary Space Partitioning
- •4.2.3.4 Constructive Solid Geometry
- •4.2.3.5 Boundary Representations
- •4.2.4 Summary of Solid-Based Representations
- •4.3 Polygon Meshes
- •4.3.1 Mesh Storage
- •4.3.2 Mesh Data Structures
- •4.3.2.1 Halfedge Structure
- •4.4 Subdivision Surfaces
- •4.4.1 Doo-Sabin Scheme
- •4.4.2 Catmull-Clark Scheme
- •4.4.3 Loop Scheme
- •4.5 Local Differential Properties
- •4.5.1 Surface Normals
- •4.5.2 Differential Coordinates and the Mesh Laplacian
- •4.6 Compression and Levels of Detail
- •4.6.1 Mesh Simplification
- •4.6.1.1 Edge Collapse
- •4.6.1.2 Quadric Error Metric
- •4.6.2 QEM Simplification Summary
- •4.6.3 Surface Simplification Results
- •4.7 Visualization
- •4.8 Research Challenges
- •4.9 Concluding Remarks
- •4.10 Further Reading
- •4.11 Questions
- •4.12 Exercises
- •References
- •1.1 Introduction
- •Chapter Outline
- •1.2 A Historical Perspective on 3D Imaging
- •1.2.1 Image Formation and Image Capture
- •1.2.2 Binocular Perception of Depth
- •1.2.3 Stereoscopic Displays
- •1.3 The Development of Computer Vision
- •1.3.1 Further Reading in Computer Vision
- •1.4 Acquisition Techniques for 3D Imaging
- •1.4.1 Passive 3D Imaging
- •1.4.2 Active 3D Imaging
- •1.4.3 Passive Stereo Versus Active Stereo Imaging
- •1.5 Twelve Milestones in 3D Imaging and Shape Analysis
- •1.5.1 Active 3D Imaging: An Early Optical Triangulation System
- •1.5.2 Passive 3D Imaging: An Early Stereo System
- •1.5.3 Passive 3D Imaging: The Essential Matrix
- •1.5.4 Model Fitting: The RANSAC Approach to Feature Correspondence Analysis
- •1.5.5 Active 3D Imaging: Advances in Scanning Geometries
- •1.5.6 3D Registration: Rigid Transformation Estimation from 3D Correspondences
- •1.5.7 3D Registration: Iterative Closest Points
- •1.5.9 3D Local Shape Descriptors: Spin Images
- •1.5.10 Passive 3D Imaging: Flexible Camera Calibration
- •1.5.11 3D Shape Matching: Heat Kernel Signatures
- •1.6 Applications of 3D Imaging
- •1.7 Book Outline
- •1.7.1 Part I: 3D Imaging and Shape Representation
- •1.7.2 Part II: 3D Shape Analysis and Processing
- •1.7.3 Part III: 3D Imaging Applications
- •References
- •5.1 Introduction
- •5.1.1 Applications
- •5.1.2 Chapter Outline
- •5.2 Mathematical Background
- •5.2.1 Differential Geometry
- •5.2.2 Curvature of Two-Dimensional Surfaces
- •5.2.3 Discrete Differential Geometry
- •5.2.4 Diffusion Geometry
- •5.2.5 Discrete Diffusion Geometry
- •5.3 Feature Detectors
- •5.3.1 A Taxonomy
- •5.3.2 Harris 3D
- •5.3.3 Mesh DOG
- •5.3.4 Salient Features
- •5.3.5 Heat Kernel Features
- •5.3.6 Topological Features
- •5.3.7 Maximally Stable Components
- •5.3.8 Benchmarks
- •5.4 Feature Descriptors
- •5.4.1 A Taxonomy
- •5.4.2 Curvature-Based Descriptors (HK and SC)
- •5.4.3 Spin Images
- •5.4.4 Shape Context
- •5.4.5 Integral Volume Descriptor
- •5.4.6 Mesh Histogram of Gradients (HOG)
- •5.4.7 Heat Kernel Signature (HKS)
- •5.4.8 Scale-Invariant Heat Kernel Signature (SI-HKS)
- •5.4.9 Color Heat Kernel Signature (CHKS)
- •5.4.10 Volumetric Heat Kernel Signature (VHKS)
- •5.5 Research Challenges
- •5.6 Conclusions
- •5.7 Further Reading
- •5.8 Questions
- •5.9 Exercises
- •References
- •6.1 Introduction
- •Chapter Outline
- •6.2 Registration of Two Views
- •6.2.1 Problem Statement
- •6.2.2 The Iterative Closest Points (ICP) Algorithm
- •6.2.3 ICP Extensions
- •6.2.3.1 Techniques for Pre-alignment
- •Global Approaches
- •Local Approaches
- •6.2.3.2 Techniques for Improving Speed
- •Subsampling
- •Closest Point Computation
- •Distance Formulation
- •6.2.3.3 Techniques for Improving Accuracy
- •Outlier Rejection
- •Additional Information
- •Probabilistic Methods
- •6.3 Advanced Techniques
- •6.3.1 Registration of More than Two Views
- •Reducing Error Accumulation
- •Automating Registration
- •6.3.2 Registration in Cluttered Scenes
- •Point Signatures
- •Matching Methods
- •6.3.3 Deformable Registration
- •Methods Based on General Optimization Techniques
- •Probabilistic Methods
- •6.3.4 Machine Learning Techniques
- •Improving the Matching
- •Object Detection
- •6.4 Quantitative Performance Evaluation
- •6.5 Case Study 1: Pairwise Alignment with Outlier Rejection
- •6.6 Case Study 2: ICP with Levenberg-Marquardt
- •6.6.1 The LM-ICP Method
- •6.6.2 Computing the Derivatives
- •6.6.3 The Case of Quaternions
- •6.6.4 Summary of the LM-ICP Algorithm
- •6.6.5 Results and Discussion
- •6.7 Case Study 3: Deformable ICP with Levenberg-Marquardt
- •6.7.1 Surface Representation
- •6.7.2 Cost Function
- •Data Term: Global Surface Attraction
- •Data Term: Boundary Attraction
- •Penalty Term: Spatial Smoothness
- •Penalty Term: Temporal Smoothness
- •6.7.3 Minimization Procedure
- •6.7.4 Summary of the Algorithm
- •6.7.5 Experiments
- •6.8 Research Challenges
- •6.9 Concluding Remarks
- •6.10 Further Reading
- •6.11 Questions
- •6.12 Exercises
- •References
- •7.1 Introduction
- •7.1.1 Retrieval and Recognition Evaluation
- •7.1.2 Chapter Outline
- •7.2 Literature Review
- •7.3 3D Shape Retrieval Techniques
- •7.3.1 Depth-Buffer Descriptor
- •7.3.1.1 Computing the 2D Projections
- •7.3.1.2 Obtaining the Feature Vector
- •7.3.1.3 Evaluation
- •7.3.1.4 Complexity Analysis
- •7.3.2 Spin Images for Object Recognition
- •7.3.2.1 Matching
- •7.3.2.2 Evaluation
- •7.3.2.3 Complexity Analysis
- •7.3.3 Salient Spectral Geometric Features
- •7.3.3.1 Feature Points Detection
- •7.3.3.2 Local Descriptors
- •7.3.3.3 Shape Matching
- •7.3.3.4 Evaluation
- •7.3.3.5 Complexity Analysis
- •7.3.4 Heat Kernel Signatures
- •7.3.4.1 Evaluation
- •7.3.4.2 Complexity Analysis
- •7.4 Research Challenges
- •7.5 Concluding Remarks
- •7.6 Further Reading
- •7.7 Questions
- •7.8 Exercises
- •References
- •8.1 Introduction
- •Chapter Outline
- •8.2 3D Face Scan Representation and Visualization
- •8.3 3D Face Datasets
- •8.3.1 FRGC v2 3D Face Dataset
- •8.3.2 The Bosphorus Dataset
- •8.4 3D Face Recognition Evaluation
- •8.4.1 Face Verification
- •8.4.2 Face Identification
- •8.5 Processing Stages in 3D Face Recognition
- •8.5.1 Face Detection and Segmentation
- •8.5.2 Removal of Spikes
- •8.5.3 Filling of Holes and Missing Data
- •8.5.4 Removal of Noise
- •8.5.5 Fiducial Point Localization and Pose Correction
- •8.5.6 Spatial Resampling
- •8.5.7 Feature Extraction on Facial Surfaces
- •8.5.8 Classifiers for 3D Face Matching
- •8.6 ICP-Based 3D Face Recognition
- •8.6.1 ICP Outline
- •8.6.2 A Critical Discussion of ICP
- •8.6.3 A Typical ICP-Based 3D Face Recognition Implementation
- •8.6.4 ICP Variants and Other Surface Registration Approaches
- •8.7 PCA-Based 3D Face Recognition
- •8.7.1 PCA System Training
- •8.7.2 PCA Training Using Singular Value Decomposition
- •8.7.3 PCA Testing
- •8.7.4 PCA Performance
- •8.8 LDA-Based 3D Face Recognition
- •8.8.1 Two-Class LDA
- •8.8.2 LDA with More than Two Classes
- •8.8.3 LDA in High Dimensional 3D Face Spaces
- •8.8.4 LDA Performance
- •8.9 Normals and Curvature in 3D Face Recognition
- •8.9.1 Computing Curvature on a 3D Face Scan
- •8.10 Recent Techniques in 3D Face Recognition
- •8.10.1 3D Face Recognition Using Annotated Face Models (AFM)
- •8.10.2 Local Feature-Based 3D Face Recognition
- •8.10.2.1 Keypoint Detection and Local Feature Matching
- •8.10.2.2 Other Local Feature-Based Methods
- •8.10.3 Expression Modeling for Invariant 3D Face Recognition
- •8.10.3.1 Other Expression Modeling Approaches
- •8.11 Research Challenges
- •8.12 Concluding Remarks
- •8.13 Further Reading
- •8.14 Questions
- •8.15 Exercises
- •References
- •9.1 Introduction
- •Chapter Outline
- •9.2 DEM Generation from Stereoscopic Imagery
- •9.2.1 Stereoscopic DEM Generation: Literature Review
- •9.2.2 Accuracy Evaluation of DEMs
- •9.2.3 An Example of DEM Generation from SPOT-5 Imagery
- •9.3 DEM Generation from InSAR
- •9.3.1 Techniques for DEM Generation from InSAR
- •9.3.1.1 Basic Principle of InSAR in Elevation Measurement
- •9.3.1.2 Processing Stages of DEM Generation from InSAR
- •The Branch-Cut Method of Phase Unwrapping
- •The Least Squares (LS) Method of Phase Unwrapping
- •9.3.2 Accuracy Analysis of DEMs Generated from InSAR
- •9.3.3 Examples of DEM Generation from InSAR
- •9.4 DEM Generation from LIDAR
- •9.4.1 LIDAR Data Acquisition
- •9.4.2 Accuracy, Error Types and Countermeasures
- •9.4.3 LIDAR Interpolation
- •9.4.4 LIDAR Filtering
- •9.4.5 DTM from Statistical Properties of the Point Cloud
- •9.5 Research Challenges
- •9.6 Concluding Remarks
- •9.7 Further Reading
- •9.8 Questions
- •9.9 Exercises
- •References
- •10.1 Introduction
- •10.1.1 Allometric Modeling of Biomass
- •10.1.2 Chapter Outline
- •10.2 Aerial Photo Mensuration
- •10.2.1 Principles of Aerial Photogrammetry
- •10.2.1.1 Geometric Basis of Photogrammetric Measurement
- •10.2.1.2 Ground Control and Direct Georeferencing
- •10.2.2 Tree Height Measurement Using Forest Photogrammetry
- •10.2.2.2 Automated Methods in Forest Photogrammetry
- •10.3 Airborne Laser Scanning
- •10.3.1 Principles of Airborne Laser Scanning
- •10.3.1.1 Lidar-Based Measurement of Terrain and Canopy Surfaces
- •10.3.2 Individual Tree-Level Measurement Using Lidar
- •10.3.2.1 Automated Individual Tree Measurement Using Lidar
- •10.3.3 Area-Based Approach to Estimating Biomass with Lidar
- •10.4 Future Developments
- •10.5 Concluding Remarks
- •10.6 Further Reading
- •10.7 Questions
- •References
- •11.1 Introduction
- •Chapter Outline
- •11.2 Volumetric Data Acquisition
- •11.2.1 Computed Tomography
- •11.2.1.1 Characteristics of 3D CT Data
- •11.2.2 Positron Emission Tomography (PET)
- •11.2.2.1 Characteristics of 3D PET Data
- •Relaxation
- •11.2.3.1 Characteristics of the 3D MRI Data
- •Image Quality and Artifacts
- •11.2.4 Summary
- •11.3 Surface Extraction and Volumetric Visualization
- •11.3.1 Surface Extraction
- •Example: Curvatures and Geometric Tools
- •11.3.2 Volume Rendering
- •11.3.3 Summary
- •11.4 Volumetric Image Registration
- •11.4.1 A Hierarchy of Transformations
- •11.4.1.1 Rigid Body Transformation
- •11.4.1.2 Similarity Transformations and Anisotropic Scaling
- •11.4.1.3 Affine Transformations
- •11.4.1.4 Perspective Transformations
- •11.4.1.5 Non-rigid Transformations
- •11.4.2 Points and Features Used for the Registration
- •11.4.2.1 Landmark Features
- •11.4.2.2 Surface-Based Registration
- •11.4.2.3 Intensity-Based Registration
- •11.4.3 Registration Optimization
- •11.4.3.1 Estimation of Registration Errors
- •11.4.4 Summary
- •11.5 Segmentation
- •11.5.1 Semi-automatic Methods
- •11.5.1.1 Thresholding
- •11.5.1.2 Region Growing
- •11.5.1.3 Deformable Models
- •Snakes
- •Balloons
- •11.5.2 Fully Automatic Methods
- •11.5.2.1 Atlas-Based Segmentation
- •11.5.2.2 Statistical Shape Modeling and Analysis
- •11.5.3 Summary
- •11.6 Diffusion Imaging: An Illustration of a Full Pipeline
- •11.6.1 From Scalar Images to Tensors
- •11.6.2 From Tensor Image to Information
- •11.6.3 Summary
- •11.7 Applications
- •11.7.1 Diagnosis and Morphometry
- •11.7.2 Simulation and Training
- •11.7.3 Surgical Planning and Guidance
- •11.7.4 Summary
- •11.8 Concluding Remarks
- •11.9 Research Challenges
- •11.10 Further Reading
- •Data Acquisition
- •Surface Extraction
- •Volume Registration
- •Segmentation
- •Diffusion Imaging
- •Software
- •11.11 Questions
- •11.12 Exercises
- •References
- •Index
1 Introduction
large-scale texture on the scene object can result in passive stereo giving low density 3D reconstructions, at least in the local regions where the surface texture or features (e.g. corners) are missing. This has a number of effects. Firstly, it makes it difficult to comprehensively determine the size and shape of the imaged object. Secondly, it is difficult to get good shape visualizations, when the imaged object is rendered from many different viewpoints. In contrast, as long as the surface is not too dark (low reflectivity) or specular, and does not have too many deep concavities (‘missing parts’), active stereo systems allow comprehensive shape measurements and give good renderings for multi-viewpoint visualizations. Thus, when the density of features is low, or the resolution of image sensing is low compared to the scale of the imaged texture, an active stereo system is the preferred solution. However, the need to scan a laser spot or stripe or to project a structured light pattern brings with it extra complexity and expense, and potential eye-safety issues. The use of spot or stripe scanning also brings with it additional reliability issues associated with moving parts.
There is another side to this discussion, which takes into account the availability of increasingly high-resolution CCD/CMOS image sensors. For passive stereo systems, these now allow previously smooth surfaces to appear textured at the higher resolution scale. A good example is the human face, where the random pattern of facial pores can be extracted and hence used to solve the correspondence problem in a passive system. Of course, higher resolution sensors bring with them higher data rates and hence a higher computational burden. Thus, to achieve reasonable real-time performance, improved resolution is developing in tandem with faster processing architectures, such as GPUs, which are starting to have a big impact on dense, real-time passive stereo [56].
1.5 Twelve Milestones in 3D Imaging and Shape Analysis
In the development towards the current state-of-the-art in 3D imaging and analysis systems, we now outline a small selection of scientific and technological milestones. As we have pointed out, there are many historical precursors to this modern subject area, from the ancient Greeks referring to the optical projection of images (Aristotle, circa 350 BC) to Albrecht Dürer's first mechanical perspective drawing (1525 CE) to Hauck's establishment of the relationship between projective geometry and photogrammetry (1883 CE). Here, however, we will present a very small selection of relatively modern milestones, from circa 1970, that are generally thought to fall within the fields of computer vision or computer graphics.
Footnote: Twelve milestones is a small number, with the selection somewhat subjective and open to debate. We are merely attempting to give a glimpse of the subject's development and diversity, not a definitive and comprehensive history.
R. Koch et al.
Fig. 1.6 Extracted stripe deformation when scanning a polyhedral object. Figure reprinted from [45] with permission
1.5.1 Active 3D Imaging: An Early Optical Triangulation System
The development of active rangefinders based on optical triangulation appears regularly in the literature from the early 1970s. In 1972, Shirai [45] presented a system that used a stripe projection system and a TV camera to recognize polyhedral objects. The stripe projector is rotated in steps so that a vertical plane of light passes over a polyhedral object of interest. A TV camera captures and stores the deformation of the projected stripe at a set of projection angles, as shown in Fig. 1.6. A set of processing steps enabled shapes to be recognized based on the interrelation of their scene planes. The assumption of polyhedral scene objects reduced the complexity of their processing, which suited the limitations of the available computational power at that time.
1.5.2 Passive 3D Imaging: An Early Stereo System
One of the first computer-based passive stereo systems employed in a clearly defined application was that of Gennery who, in 1977, presented a stereo vision system for an autonomous vehicle [18]. In this work, interest points are extracted and area-based correlation is used to find correspondences across the stereo image pair. The relative pose of the two cameras is computed from these matches, camera distortion is corrected and, finally, 3D points are triangulated from the correspondences. The extracted point cloud is then used in the autonomous vehicle application to distinguish between the ground and objects above the ground surface.
1.5.3 Passive 3D Imaging: The Essential Matrix
When 8 or more image correspondences are given for a stereo pair, captured by cameras with known intrinsic parameters, it is possible to linearly estimate the relative position and orientation (pose) of the two viewpoints from which the two projective
images were captured. Once these relative viewpoints are known, the 3D structural information of the scene can easily be recovered from the image correspondences. The origin of this approach, where the relative viewpoint pose is captured in a 3 × 3 Essential Matrix, is due to Longuet-Higgins in his 1981 Nature paper entitled A computer algorithm for reconstructing a scene from two projections [35]. It was previously known that relative camera viewpoints could be determined iteratively from just 5 correspondences, but the extra correspondences allowed Longuet-Higgins to present a much more direct linear solution. Note that only the direction of the displacement between the two viewpoints can be recovered, which means that the absolute scale of the 3D scene reconstruction is unknown. (Put simply, shape but not size is recovered, but the correct scale can be determined from a known dimension in the scene.)
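As a sketch of the linear estimation step (not Longuet-Higgins' own formulation, and with our own naming and array conventions), assuming at least 8 correspondences are given in normalized image coordinates, i.e. pixel coordinates pre-multiplied by the inverse intrinsic matrix:

```python
import numpy as np

def essential_from_correspondences(x1, x2):
    """Linear 8-point estimate of the essential matrix.

    x1, x2: (n, 2) arrays of corresponding points in normalized image
    coordinates, n >= 8. Each pair satisfies x2_h^T E x1_h = 0 for the
    homogeneous points x1_h, x2_h.
    """
    n = x1.shape[0]
    A = np.zeros((n, 9))
    for i in range(n):
        u1, v1 = x1[i]
        u2, v2 = x2[i]
        # One row per correspondence, encoding x2_h^T E x1_h = 0
        # with E flattened row-major.
        A[i] = [u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0]
    # E (up to scale) is the right singular vector of A with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Project onto the essential manifold: singular values (s, s, 0).
    U, s, Vt = np.linalg.svd(E)
    sigma = (s[0] + s[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt
```

Each correspondence contributes one row of the 9-column design matrix; the solution is the null-space direction of that matrix, subsequently projected so that it has the two-equal-singular-values structure of a valid essential matrix.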
1.5.4 Model Fitting: The RANSAC Approach to Feature Correspondence Analysis
Matching corresponding features between images or surfaces is essential for both 3D shape reconstruction methods and 3D shape matching techniques. Selecting, for example, a set of 8 correct correspondences is vital to the estimation of the Essential Matrix. Unfortunately, this is a very difficult problem and mismatches often occur, so robust selection of feature correspondences is of the utmost importance. The seminal 1981 paper by M.A. Fischler and R.C. Bolles, Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography [37], opened the way to handling correspondence sets containing a large percentage of outliers. From the available set of n candidate correspondences, which are matched on the basis of local properties, random subsets of p = 8 correspondences are drawn and a candidate Essential Matrix is computed for each. The remaining n − p candidate correspondences are then tested against each of these hypotheses, and the solution with the largest 'consensus' (i.e. the most support in terms of the number of inliers) is selected as the best. Although computationally expensive, this yields excellent results. From the 1990s, many variants of this basic algorithm, with improved performance, have been employed in computer vision and 3D shape analysis.
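The paradigm is easiest to see on a toy model: the sketch below applies RANSAC to 2D line fitting rather than Essential Matrix estimation, but the structure (minimal random samples, hypothesis scoring by consensus, best-supported solution wins) is the same; all names, iteration counts and thresholds are illustrative:

```python
import numpy as np

def ransac_line(points, n_iters=200, inlier_tol=0.05, rng=None):
    """Fit a 2D line with RANSAC. points: (n, 2) array.
    Returns (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1,
    plus a boolean inlier mask."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(points)
    best_line, best_inliers = None, None
    for _ in range(n_iters):
        # Minimal sample: two distinct points define a candidate line.
        i, j = rng.choice(n, size=2, replace=False)
        p, q = points[i], points[j]
        d = q - p
        norm = np.hypot(d[0], d[1])
        if norm < 1e-12:
            continue
        # Unit normal (a, b) and offset c of the hypothesized line.
        a, b = -d[1] / norm, d[0] / norm
        c = -(a * p[0] + b * p[1])
        # Consensus: count points within inlier_tol of the line.
        dist = np.abs(points @ np.array([a, b]) + c)
        inliers = dist < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers
```

For Essential Matrix estimation, the minimal sample becomes p = 8 correspondences, the hypothesis becomes the matrix computed from them, and the consensus test becomes the epipolar residual of the remaining candidates.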
1.5.5 Active 3D Imaging: Advances in Scanning Geometries
The practical development of 3D laser range sensors closely follows the availability of new electronic components and electro-optical technologies. A novel active triangulation method was proposed by Rioux in 1984 [40]. To obtain a large field of view using small triangulation angles, without sacrificing precision, the concept of synchronized scanners was proposed. Such a system has the advantage that the
number of ‘missing parts’ (i.e. the ‘shadow effect’) can be reduced. These occur where parts of the scene are not simultaneously accessible to both the laser and the image sensor. Using a special scanning mirror arrangement, both the emitted laser beam and receiver optics are rotated simultaneously, in a synchronized fashion, so that the laser spot in the sensor plane can be maintained closer to the image center, while the projected beam remains inherently in focus over a large depth of field and high resolution can be maintained despite a short physical baseline.
1.5.6 3D Registration: Rigid Transformation Estimation from 3D Correspondences
Given a set of surface correspondences between two 3D data sets of the same or similar objects in different poses, how do we compute the six-degree-of-freedom rigid transformation between them? If we have this rotation and translation information, we can bring the two scans into alignment, a process called registration. In the second half of the 1980s, several researchers presented solutions to this problem; for example, both Faugeras and Hebert [15] and Horn [25] derived formulations where the rigid body rotation is represented using quaternions. In Horn's work, an optimal unit quaternion (a 4-vector) is estimated as the eigenvector corresponding to the maximum eigenvalue of a matrix. Once the rotation has been estimated, it is trivial to compute the translation using this rotation and the centroids of the two point clouds.
The singular value decomposition (SVD) approach of Arun et al. [2] is also very widely used. Here, a 3 × 3 cross-covariance matrix H is formed using the correspondences and an SVD of this matrix, H = USV^T, yields the estimated rotation matrix as R̂ = UV^T.
Rigid body transformation estimation forms the core of rigid 3D registration algorithms, such as Iterative Closest Points, which is described next.
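A minimal sketch of this SVD-based estimation might look as follows, assuming H is formed as the sum of outer products (q_i − q̄)(p_i − p̄)^T so that R̂ = UV^T maps the first cloud onto the second; the determinant guard is a standard safeguard against degenerate (reflection) solutions, and the function name and conventions are our own:

```python
import numpy as np

def rigid_transform(p, q):
    """Least-squares rigid alignment q ≈ R p + t from corresponding
    point sets p, q of shape (n, 3), in the style of Arun et al."""
    p_bar, q_bar = p.mean(axis=0), q.mean(axis=0)
    # 3x3 cross-covariance of the centered correspondences:
    # H = sum_i (q_i - q_bar)(p_i - p_bar)^T
    H = (q - q_bar).T @ (p - p_bar)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the optimal solution.
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    R = U @ D @ Vt
    # Translation follows from the rotation and the two centroids.
    t = q_bar - R @ p_bar
    return R, t
```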
1.5.7 3D Registration: Iterative Closest Points
As long as three or more correspondences in general position (non-collinear) are given between two overlapping 3D point clouds, the rigid body transformation between them can be estimated using one of several methods, as previously mentioned. In 1992, Besl and McKay proposed the seminal iterative closest point (ICP) algorithm [5]. Algorithms based on ICP are currently the de facto standard for rigid 3D shape registration tasks. The basis of the algorithm is that it iterates two steps until convergence: establishing tentative correspondences via 'closest points' across the two shapes, and updating the rigid transformation parameters. As long as the initial rotational and translational displacement between a pair of 3D shapes is sufficiently small, then convergence to a global minimum is always possible and
Fig. 1.7 Example of model reconstruction. Partial 3D views of the object of interest are acquired (left). After registration all the 3D views are transformed to the common reference system and merged (right). Figure generated by Alessandro Negrente, reproduced from [17]
high quality correspondences can be established. Over the last two decades, many variants of ICP have been proposed to improve the speed and accuracy of the registration process. An example of the registration required for model construction from partial 3D views is given in Fig. 1.7.
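The two-step loop can be sketched as follows. This is an illustrative minimal version, assuming small point clouds, a reasonable initial alignment, brute-force closest-point search and a fixed iteration count; it is not the full Besl-McKay algorithm, and the names are our own:

```python
import numpy as np

def icp(src, dst, n_iters=20):
    """Minimal ICP sketch aligning src (n, 3) to dst (m, 3).
    Returns R, t such that src @ R.T + t approximates dst."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(n_iters):
        moved = src @ R.T + t
        # Step 1: tentative correspondences — each moved source point
        # is matched to its closest target point (brute force).
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        # Step 2: rigid transformation update from the tentative
        # correspondences, via the SVD of the cross-covariance.
        s_bar, m_bar = src.mean(axis=0), matched.mean(axis=0)
        H = (matched - m_bar).T @ (src - s_bar)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
        R = U @ D @ Vt
        t = m_bar - R @ s_bar
    return R, t
```

Practical variants replace the brute-force search with a k-d tree, add outlier rejection, and terminate on a convergence test rather than a fixed iteration count.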
1.5.8 Passive 3D Imaging: The Fundamental Matrix and Camera Self-calibration
In 1992, Luong, Faugeras and Maybank extended the Essential Matrix to uncalibrated cameras through the Fundamental Matrix. While Essential Matrix estimation requires the camera intrinsic parameters to be known in advance, now arbitrary cameras could be used and calibrated from the image data alone. The papers by Faugeras, What can be seen in three dimensions with an uncalibrated stereo rig? [13], and by Faugeras, Luong and Maybank, Camera self-calibration: Theory and experiments [14], started a new research area within computer vision that today allows us to reconstruct large 3D environments from arbitrary image collections; for example, those taken by tourists and uploaded to the web. There are even web services that enable us simply to upload our pictures of a scene and obtain 3D representations from them.
The basic algorithm for estimating the Fundamental Matrix from 8 correspondences was rather sensitive to correspondence errors, and researchers were sceptical about its usability in noisy imaging conditions. This problem was tackled in 1995 by Richard Hartley in his famous work entitled In Defense of the 8-Point Algorithm [22, 23]. Hartley showed that image normalization is vital for practical Fundamental Matrix estimation.
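A sketch of this conditioning step, under the common convention of translating the point centroid to the origin and scaling the mean distance from it to √2 (the function name is ours, and the Fundamental Matrix estimation itself is omitted):

```python
import numpy as np

def normalization_transform(pts):
    """Hartley-style normalization: a 3x3 similarity T that maps the
    (n, 2) points so their centroid is the origin and their mean
    distance from it is sqrt(2).

    Usage: estimate F_hat from points transformed by T1, T2, then
    undo the conditioning with F = T2.T @ F_hat @ T1."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    # Similarity transform in homogeneous coordinates: scale s,
    # translation moving the centroid to the origin.
    return np.array([[s, 0.0, -s * centroid[0]],
                     [0.0, s, -s * centroid[1]],
                     [0.0, 0.0, 1.0]])
```

The point is purely numerical: with raw pixel coordinates the entries of the 8-point design matrix differ by many orders of magnitude, and conditioning the points before the linear solve makes the estimate far more stable.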