[2.1] 3D Imaging, Analysis and Applications-Springer-Verlag London (2012).pdf


R. Koch et al.

of the Kinect-XBox gaming platform. To achieve high frame rates, extremely simple and computationally cheap 3D features based on depth comparisons were employed, and randomized decision forest classifiers were trained and implemented to run on a GPU. The training data uses 100,000 captured poses, each of which generates a synthetic set of 15 base meshes spanning a range of body sizes, shapes, viewpoints, clothing and hairstyles. This system is an archetypal modern, successful 3D recognition system that shows what can be achieved when machine learning techniques, a large corpus of training data and fast computational architectures are brought together.
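To make the idea of a depth-comparison feature concrete, the following is a minimal sketch and not the system's actual implementation. It assumes one form commonly described in the literature: the difference between the depths at two probe pixels, whose offsets are divided by the depth at the reference pixel so that the response is roughly independent of distance from the camera. The names `depth_feature`, `probe` and `BACKGROUND`, and the choice of a large constant for out-of-image probes, are hypothetical.

```python
from typing import List, Tuple

Depth = List[List[float]]   # depth[row][col] in metres (assumed layout)
BACKGROUND = 10.0           # hypothetical depth returned for probes outside the image


def probe(depth: Depth, row: int, col: int) -> float:
    """Depth at (row, col), or BACKGROUND when the probe leaves the image."""
    if 0 <= row < len(depth) and 0 <= col < len(depth[0]):
        return depth[row][col]
    return BACKGROUND


def depth_feature(depth: Depth, pixel: Tuple[int, int],
                  u: Tuple[float, float], v: Tuple[float, float]) -> float:
    """Difference of two depth probes around `pixel`.

    The offsets u and v are divided by the depth at `pixel`, so a nearby
    body and a distant body are probed at comparable relative positions.
    Assumes the depth at `pixel` is positive.
    """
    r, c = pixel
    d = depth[r][c]
    r1, c1 = r + round(u[0] / d), c + round(u[1] / d)
    r2, c2 = r + round(v[0] / d), c + round(v[1] / d)
    return probe(depth, r1, c1) - probe(depth, r2, c2)


# Tiny synthetic image: background at 10 m, two foreground pixels at 2 m.
image = [[BACKGROUND] * 5 for _ in range(5)]
image[2][2] = 2.0
image[2][3] = 2.0
response = depth_feature(image, (2, 2), u=(0.0, 2.0), v=(0.0, -2.0))
```

Each node of a decision tree would then compare such a response against a learned threshold. The feature costs only three depth reads and a subtraction per pixel, which is why this family of features suits GPU evaluation at high frame rates.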

1.6 Applications of 3D Imaging

Probably the most mature techniques in range sensing are 3D scanner applications in geodetic, architectural and industrial surveys. Airborne LIDAR systems are now used routinely to survey building structures in cities, leading to the generation of digital elevation maps (DEMs). High-precision DEMs of rural regions and forest areas are also produced for various purposes. Multiple reflection response LIDAR systems record and evaluate the multiple reflections that arise in forest areas from height differences between the solid forest floor and the foliage. These systems are combined with photogrammetric DEM estimation and semantic segmentation from aerial imagery to automate map generation (e.g. for navigation) and topographic map creation. Figure 1.12 depicts an example of a 3D DEM, mapped with texture from a satellite image. Chapters 9 and 10 will explicitly deal with these applications.
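As an illustration of how multiple reflections per laser pulse can be evaluated, the sketch below bins returns into a regular grid and derives a ground elevation map, a surface elevation map and their difference (the vegetation height). It is a simplified sketch under stated assumptions: the lowest return in a cell is taken as the forest floor, which fails where no pulse penetrates the foliage, and the input format `(x, y, [z of each return])` and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]
Pulse = Tuple[float, float, Sequence[float]]  # x, y, elevations of all returns (metres)


def grid_returns(pulses: List[Pulse], cell_size: float
                 ) -> Tuple[Dict[Cell, float], Dict[Cell, float]]:
    """Return (ground, surface) elevation maps keyed by grid cell.

    ground[cell]  = lowest return seen in the cell (assumed forest floor)
    surface[cell] = highest return seen in the cell (top of foliage or roof)
    """
    ground: Dict[Cell, float] = {}
    surface: Dict[Cell, float] = {}
    for x, y, returns in pulses:
        cell = (int(x // cell_size), int(y // cell_size))
        lowest, highest = min(returns), max(returns)
        ground[cell] = min(ground.get(cell, lowest), lowest)
        surface[cell] = max(surface.get(cell, highest), highest)
    return ground, surface


def canopy_height(ground: Dict[Cell, float],
                  surface: Dict[Cell, float]) -> Dict[Cell, float]:
    """Height difference between the surface and ground maps, per cell."""
    return {cell: surface[cell] - ground[cell] for cell in ground}


pulses = [
    (0.5, 0.5, [120.0, 112.0, 100.0]),  # three reflections: crown, branch, floor
    (0.7, 0.2, [118.0, 101.0]),         # two reflections in the same cell
    (1.5, 0.5, [100.5]),                # single reflection from open ground
]
ground, surface = grid_returns(pulses, cell_size=1.0)
heights = canopy_height(ground, surface)
```

In this toy input the first cell yields a 20 m vegetation height and the second cell, with a single reflection, yields zero.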

Fig. 1.12 Bird's eye view of a mountainous area from a 3D DEM mapped with texture from a satellite image. Copyright METI/NASA, reprinted with permission from ERSDAC (Earth Remote Sensing Data Analysis Center), http://www.ersdac.or.jp/GDEM/E/2.html

1 Introduction

Fig. 1.13 Example 3D scans from the Bosphorus dataset with occlusions, expression variations and pose variations [41]. Figure adapted from [11]

Ground-based 3D scanner stations are used to reconstruct 3D structures, such as buildings (indoor and outdoor), for the purpose of documentation and mensuration. One application area is the preservation and electronic documentation of cultural heritage, such as the 3D scanning of famous buildings, old castles, churches and archaeological artifacts. The conversion of these data into 3D models and their presentation by electronic means will help to preserve cultural diversity and will enable online access to such sites. Another application area is the precise 3D documentation of industrial sites and fabrication plants for as-built control and quality control. It is very important to document the changes in building structures and industrial plants, since this will greatly facilitate planning and integration of new structures. A challenge in reconstructing complete sites with range imaging devices is the need to fuse all surface data into a consistent 3D representation and to extract functional objects from the 3D points. This involves the registration of partial surfaces into complete models, as was illustrated in Fig. 1.7.
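The fusion step can be sketched as follows, assuming that registration has already produced a rigid pose (a 3x3 rotation and a translation) for every partial scan; estimating those poses, for example with an iterative closest point method, is not shown. Each scan is mapped into a common frame and overlapping samples are merged by averaging all points that fall into the same voxel. The function names and the voxel-averaging strategy are illustrative choices, not a method prescribed by the text.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Rotation = Sequence[Sequence[float]]     # 3x3 rotation matrix, row-major
Scan = Tuple[List[Point], Rotation, Point]  # points, rotation, translation


def transform(points: List[Point], rotation: Rotation,
              translation: Point) -> List[Point]:
    """Apply p' = R p + t to every point of one partial scan."""
    out: List[Point] = []
    for x, y, z in points:
        out.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)))
    return out


def fuse(scans: List[Scan], voxel_size: float) -> List[Point]:
    """Merge registered scans; return one averaged point per occupied voxel."""
    voxels: Dict[Tuple[int, ...], List[Point]] = {}
    for points, rotation, translation in scans:
        for p in transform(points, rotation, translation):
            key = tuple(int(c // voxel_size) for c in p)
            voxels.setdefault(key, []).append(p)
    return [tuple(sum(c) / len(pts) for c in zip(*pts))
            for pts in voxels.values()]


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # 90 degrees about the z axis

scan_a: Scan = ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], IDENTITY, (0.0, 0.0, 0.0))
# Scan B sees the second point from a rotated, shifted viewpoint.
scan_b: Scan = ([(1.0, 0.0, 0.0)], ROT_Z_90, (1.0, -1.0, 0.0))
model = fuse([scan_a, scan_b], voxel_size=0.5)
```

Here the point observed by both scans collapses to a single sample in the fused model, which is the minimal behavior expected of a consistent 3D representation.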

The handling of dynamic and time-varying 3D shapes is another emerging field which relies heavily on 3D imaging techniques. With the advent of range video cameras that capture depth at video frame rates, like the ToF 3D range cameras or the Kinect sensor, it becomes possible to observe deforming shapes over time and to model these deformations accordingly [27]. Of particular interest is the tracking and modeling of human motion, expression and behavior. Much work has investigated traditional dynamic shape techniques for human motion capture, such as active motion capture systems using markers or markerless multiview shape from silhouette, and much has already been achieved regarding motion and behavior modeling. Novel and improved range imaging technology is now facilitating and significantly influencing motion and behavior modeling. As already discussed, a recent example is the very successful combination of the Kinect sensor with the XBox human motion capture system (footnote 24) for interactive games. Prior approaches that utilized 2D images only are significantly inferior to the detailed tracking of body and limbs that is becoming available with range data.

In addition to human pose and gesture tracking, the analysis of human faces and face recognition systems are gaining importance. In particular, face recognition can be improved significantly if 3D face models and 3D face data are available. Systems based on 3D face shape are much more robust when dealing with changing illumination and pose, and facial views can be normalized for better recognition. Figure 1.13 shows a set of rendered face scans from the Bosphorus dataset [41]. Current research aims to provide 3D face recognition under occlusion, facial expression variations and pose variations. Chapter 8 discusses this research area in detail.

Footnote 24: Kinect and XBox are trademarks of Microsoft Corporation.

Fig. 1.14 Depth-based 3D virtual studio application. Top left: Hybrid range-color camera system. Top right: One original color view of the scene. Bottom left: Range image with mixed real and virtual content. Bottom right: Automatic composition of real and virtual content, including mutual occlusion, color and shadow mixing. Images reprinted from [30] with permission

The exploitation of range data will open new opportunities to automate data processing and to address new application areas. One such area is digital post processing of film and video. The insertion of virtual content into a movie is an important part of film production and a substantial cost factor as well. Traditionally, 2D color keying is used to separate objects from the background or to insert virtual content into a 3D scene, and dynamic objects are even segmented out manually in the post production process. If range data is available, then the concept of depth keying can be applied, where a person is separated from the background by distance evaluation. In unconstrained environments, this is much more robust than traditional 2D image-based color keying or motion segmentation approaches. As an example, Fig. 1.14 gives an overview of such a hybrid system that directly exploits a dynamic 3D range camera based on the time-of-flight principle, in addition to a set of color cameras for visualization [30]. This technique may be used in 3D virtual studios that mix virtual and real content for TV and film production. The system reconstructs a 3D environment model, together with the depth and color of moving persons. Since all data contains both depth and color, it is easy to separate dynamically moving objects from the static environment and even to insert additional virtual content for video