Rivero, L. Encyclopedia of Database Technologies and Applications. 2006.

and manipulation languages for each metamodel of interest separately rather than strive for a generic-across-metamodels solution. With this approach, however, in addition to a set of particular MMt implementations I1, …, In for each metamodel M1, …, Mn, we will need to implement an even greater set of model translations Iij, since many of the MMt tasks are themselves heterogeneous, for example, integration of heterogeneous data sources on the Web or navigating from ER diagrams to SQL schemas and then to their XML presentations. Developing and maintaining such a system would be extremely laborious.

Thus, genericness across metamodels appears to be a necessary working property of MMt systems. To realize it, we must find a model representation format that would combine metamodel universality with clear and compact encoding of particular models.

C. Heuristic gMMt—Discovering and specifying correspondences between models: As a rule, models involved in an MMt application present different views on the same data universe. Hence, we need to specify the overlap between models and present it as a model mapping. Here we face a highly nontrivial heuristic problem of discovering correspondences and discrepancies between models. Indeed, these models were often designed by different people in different contexts and for different goals. They may present the same data in different syntaxes, under different names, and in different formats. In particular, the infamous problems of synonyms and homonyms in data schemas have to be resolved. In addition, data considered basic in one model can be derived in another model, that is, derivable/computable from the basic data of the other model.

Generic Model Management

In particular, data in one model can be metadata for another model. All these correspondences must be discovered (the first phase of heuristic MMt) and specified in a suitable format provided by abstract MMt theory (second phase). Of course, the first phase cannot be fully automated. Yet we may hope that some sophisticated algorithms based on special AI techniques (particularly machine learning) could assist the human in the essentially heuristic job of model matching.

To illustrate the concepts introduced above, we consider a very simple example in Figure 2. Two simple relational schemas are presented in Column (a) of Figure 2. Suppose we know that table Person with primary key column pin and table Emplee with primary key e-num refer to the same real-world objects. In addition, suppose that Person.name is nothing but concatenation of

Emplee.l-name with Emplee.f-name. More formally, this description is presented in Column (b), where we use the notation of Bernstein (2003) with minor distinctions. In detail, we have introduced a new model, each element of which is a relationship between the original models’ elements: Elements M, m1, and m3 are 1:1 and m2 is a 1:2 relationship (of course, general M:N relationships are also possible). In addition, elements m2 and m3 have expressions attached, which state, correspondingly, that Person.name is the concatenation of Emplee.l-name and Emplee.f-name, and Emplee.age is equal to the difference between the current year and

Person.bdate.year.
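The correspondence just described can be sketched in code. The dictionary encoding and Python-friendly names (b_date for b-date, e_num for e-num) are our own illustration, not Bernstein's notation:

```python
import datetime

# Hedged sketch of the Figure 2 correspondence; attribute names follow
# the example schemas, with hyphens replaced by underscores.
person = {"pin": 1234, "name": "Doe, John",
          "b_date": datetime.date(1970, 5, 1)}
emplee = {"e_num": 42, "f_name": "John", "l_name": "Doe", "age": None}

def m2_name(e):
    # m2: Person.name is the concatenation of Emplee.l-name and Emplee.f-name
    return e["l_name"] + ", " + e["f_name"]

def m3_age(p, current_year):
    # m3: Emplee.age equals the current year minus Person.b-date.year
    return current_year - p["b_date"].year

emplee["age"] = m3_age(person, 2006)
```

Evaluating the attached expressions over one record makes the "basic in one model, derived in the other" point concrete: age is never stored on the Person side, yet it is computable from it.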

Abstracting from this concrete specification, we come to the specification in Cell (c1) of Figure 2, where nodes denote models, and edges are relations between them.

Figure 2. Specifying correspondence between models (to be continued in Math-I, Figure 1, p. 352)

(a) Two relational schemas to be related:

Table PERSON (
    pin #    integer
    name     string
    b-date   date
);

Table EMPLEE (
    e-num #  integer
    f-name   string
    l-name   string
    age      integer
);

(b) Semi-formal concrete specification of their correspondence: a correspondence model whose elements are linked to the schemas' elements by dashed lines: M relates Person to Emplee; m1 relates pin to e-num; m2 relates name to f-name and l-name, with the attached expression {name = l-name + ",_" + f-name}; m3 relates b-date to age, with the attached expression {age = current.year - b-date.year}.

(c) Semi-formal abstract specifications of (c1) model correspondence: nodes M1 and M2 connected through the mapping node Map12 with projection edges p1 and p2; and (c2) model merge: the diagram of (c1) extended with the merged model MΣ, the mappings Map1Σ and Map2Σ, and the projection edges q1, q2, q1Σ, and q2Σ.


Each element of the abstract schema (c1) has an extent shown on the concrete schema (b): Extents of nodes are trees of elements, and extents of edges are sets of pairs (shown by dashed lines) of the corresponding elements.

Finally, Figure 2’s Cell (c2) shows an abstract specification of a model merge operation: It takes the model diagram (c1) as its input and produces three more nodes with associated four edges:

(MΣ, Map1Σ, Map2Σ) = merge(M1, M2, Map12)

The concrete “extent” of this operation is a procedure that takes the configuration in Column (b) as its input, manipulates its elements by predefined rules, and produces a new merged model together with mappings to the original models (this is not shown in Column (b)). We will return to this example in Math-I and consider it in more detail with better notation.
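A toy sketch of such a merge procedure, under the simplifying assumption that models are flat lists of element names and the mapping is a list of matched pairs (the encoding and the rule of keeping M1's name for a collapsed pair are our own choices, not Bernstein's operator):

```python
def merge(m1, m2, map12):
    """Toy merge: elements paired by map12 collapse into one merged
    element (keeping M1's name, a design choice of this sketch);
    unmatched elements are copied through. Mirrors
    (M_sigma, Map1_sigma, Map2_sigma) = merge(M1, M2, Map12)."""
    pairs_of = {}
    for a, b in map12:                        # map12 may relate one M1
        pairs_of.setdefault(a, []).append(b)  # element to several M2 ones
    merged, map1s, map2s = [], {}, {}
    for a in m1:
        merged.append(a)
        map1s[a] = a
        for b in pairs_of.get(a, []):
            map2s[b] = a
    for b in m2:
        if b not in map2s:            # unmatched M2 elements survive as-is
            merged.append(b)
            map2s[b] = b
    return merged, map1s, map2s

# The running example: m1 (pin/e-num), m2 (name/f-name,l-name),
# and m3 (b-date/age) are all identified, so their elements collapse.
m_sigma, map1s, map2s = merge(
    ["pin", "name", "b-date"],
    ["e-num", "f-name", "l-name", "age"],
    [("pin", "e-num"), ("name", "f-name"),
     ("name", "l-name"), ("b-date", "age")],
)
```

The returned map1s and map2s play the role of Map1Σ and Map2Σ: every element of each input model has an image in the merged model.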

In this simple example, stating correspondence between models was an easy task. In real situations it may be a far more complicated problem. Finding systematic solutions to such sorts of problems is the essence of heuristic gMMt. Particularly, the heuristic operation match takes two models as input and produces the model diagram on Figure 2(c1). Then it becomes the input for merge.
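A minimal illustration of what a heuristic match might do, pairing elements by name similarity alone. Real matchers such as Cupid combine many more cues; the threshold here is an arbitrary choice for the example:

```python
from difflib import SequenceMatcher

def match(model1, model2, threshold=0.5):
    """Toy heuristic match: pair each element of model1 with its most
    name-similar element of model2, if the similarity clears the
    threshold. Purely illustrative of the match operation's signature."""
    pairs = []
    for a in model1:
        best, score = None, threshold
        for b in model2:
            s = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if s > score:
                best, score = b, s
        if best is not None:
            pairs.append((a, best))
    return pairs

# Lexical similarity recovers a name/f-name pairing but, as the text
# warns, misses correspondences like pin/e-num that carry no hint.
found = match(["pin", "name"], ["e-num", "f-name", "l-name"])
```

The output of match is exactly the kind of pair list that the merge example above consumes, which is why the text says the match result "becomes the input for merge."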

All three parts of gMMt are equally important in making it a working environment capable of running practical applications. It is worth stressing, however, that abstract MMt is the specificational cornerstone of the entire building. It provides patterns that, on one hand, must be implemented in a concrete (yet generic) representation and, on the other hand, should be instantiated by data obtained from heuristic algorithms. Compact and adequate abstract MMt patterns would greatly ease the job of the other two MMt components.

A BIT OF HISTORY AND THE CURRENT SITUATION

MMt problems probably appeared on the stage simultaneously with the first DBMSs. As the technology matured, their weight in the problem basket of data management gradually increased, until the recent Web invasion into the field made data integration and transformation, a core MMt problem, a key issue for the entire field.

In each particular area of MMt problems, the database community has accumulated a certain experience in both theory and practice. However, inventing a modeling language or writing an MMt application (though laborious) turned out to be much easier heuristically than extracting a common core of particular cases and abstracting it into a generic framework spanning a wide class of languages and applications. Moreover, the very formulation of the desired genericness in precise terms turned out to be a problem because of the absence of a language in which to discuss and specify generic notions (cf. Drew, King, McLeod, Rusinkiewicz, & Silberschatz, 1993).

Nevertheless, under the pressure of practical needs and with a noticeable similarity amongst methods and tools addressing different MMt tasks, particular fragments of the main gMMt ideas have been gradually emerging and developing. Of primary importance among them was the regained interest in mappings between models. The latter first appeared in the literature probably in Paolini and Pelagatti (1977) and Kalinichenko (1978) and then were forgotten until the mid-’90s, when they reappeared on the stage in Diskin and Kadish (1997) and Diskin (1998). These papers proposed an integral formal framework (including a generic notion of query language) for what they called “data-model-independent” metamodeling, and the framework was essentially based on mappings (arrows) between models. Diskin (1999) presented an even more abstract version of the arrow framework and demonstrated its usefulness for clarifying meta-metamodeling issues and formalizing relations between modeling languages, particularly the sublanguages constituting UML. Independently, a detailed study of mappings in the context of the relational data model was undertaken in Miller, Haas, and Hernandez (2000) and Yan, Miller, Haas, and Fagin (2001).

The crucial step towards the formation of gMMt as an integral discipline was made by Bernstein et al. (2000), who articulated the issue and described a detailed agenda of what needed to be done. They once again stated the primary role of model mappings for generic MMt, introduced a set of basic operations with models and mappings, and demonstrated their use in a few application scenarios. Also, they proposed an object tree structure as a generic presentation of models. In fact, Bernstein et al. outlined all three parts of gMMt and initiated a few lines of research in the field. Finally, they coined the very term “generic MMt,” thus establishing a new discipline.2

To date, the following results have been achieved. Impressive progress has been made in heuristic MMt, where a few sophisticated algorithms for model matching were elaborated and a few methods developed; see Rahm and Bernstein (2001) for a survey and Halevy and Madhavan (2003) for a discussion of promising novel ideas in the field. A few tools based on semi-automated methods were implemented and tested in practical industrial projects (Madhavan, Bernstein, & Rahm, 2001; Mork & Bernstein, 2004; Velegrakis, Miller, & Popa, 2003; Yan et al., 2001). There has been great progress in concrete MMt as well. A convenient graph-based generic presentation for models was developed, and a first full implementation of a gMMt environment was built on its base (Melnik, Rahm, & Bernstein, 2003).

Table 2. Problems of generic MMt by Bernstein (2003)

Machineries are to be built for:

1. Filling the gap between models and mappings, which are syntactic structures, and their semantics, where models serve as templates for data instances and mappings govern translation of instances.

2. Representing models in a maximally generic yet tractable way.

3. A general-purpose interface between MMt operators and expression manipulation / inference engines (see Figure 2 for an example of expressions).

Clear and well-defined semantics is needed for:

4. Operations of (a) Mapping Composition, (b) Model Merge, and (c) Model Translation.

5. Generic MMt as a whole. Particularly, evaluation of the completeness of a given set of basic operators is required. So far, the very notion of completeness of abstract MMt operations has not been developed.

As for abstract MMt, a set of basic operations was further elaborated and applied to a few MMt application scenarios (Bernstein, 2003; Bernstein & Rahm, 2000). Analysis of mappings strongly modulated by the relational data model was continued in Madhavan, Bernstein, Domingos, and Halevy (2002); Velegrakis et al. (2003); and Fagin, Kolaitis, Miller, and Popa (2003). Also, a general categorical framework for MMt, focused on a generic pattern for specifying constraints in data models, was proposed in Alagic and Bernstein (2001).3 However, despite a few partial advances, abstract MMt remains the least developed part in gMMt. Due to its fundamental role for the entire program, unsolved abstract MMt problems essentially hinder gMMt’s progress.

Problems to be Solved...

Bernstein (2003)—the most complete-to-date presentation of gMMt’s goals and means—considers the problems presented in Table 2 to be the most pressing and evaluates the agenda of their solution as requiring “many years and many research groups to develop” (p. 220).

Such a cautious estimation is not surprising: The desired genericness of specifications on one hand and their graph-based (rather than string-based) nature on the other lead to an entirely new sort of specification problem. Indeed, so far the major theoretical achievements in database research have been tied to one or another data model, and there are well-developed notions of query, constraint, and instance specialized to a particular data model, e.g., relational (rigorously formalized) or a particular dialect of ER or OO (much less clearly formalized). Abstract generalization of these notions in a rigorous way presents a new sort of problem, where such well-known and venerable mathematical means as first-order logic or relational algebra cannot help. Diskin, Kadish, and Piessens (1999) discuss the issue in detail.

... and Their Solutions: Category Theory vs. Generic MMt

Fortunately, software engineers and researchers do not need to develop specificational foundations for gMMt from scratch. Mathematical category theory (CT) is a discipline specially devoted to specifying operations with complex objects in an abstract generic way independent of the objects’ internal structure.

Categorically, everything that one wants to say about objects must be said in terms of morphisms (arrows, or mappings) between objects. Such a reformulation is not obvious and is sometimes genuinely nontrivial, but the essential benefit is that the objects’ internal structure does not occur in specifications. This allows CT to design truly generic patterns (necessarily graph-based) for specifying and manipulating complex structures, and these patterns can be readily employed in gMMt.

There are three major categorical prescriptions that jointly form a framework where each of the problems in Table 2 can be successfully managed. Moreover, this framework eliminates the three major obstacles to realizing a truly generic MMt environment, which so far have been considered insurmountable (Bernstein, 2003):

1. Operation of model translation is inevitably metamodel specific.

2. Mechanisms for dealing with expressions also must be metamodel specific.

3. A compromise between expressiveness of model representation and tractability of operations over it is inevitable.

In this section, we will only name the three concepts—Kleisli morphism, fibration, and sketch—leaving their definitions and explanations of how they work for Math-I and Math-II.

Prescription 1 (for homogeneous abstract MMt)—Consider model mappings as Kleisli morphisms: This prescription sets up a proper framework for working with derived data and managing Problems 3 and 4(a, b) and a good part of Problem 5. In fact, Problems 3 and 4(a) are eliminated by reducing them to composition of Kleisli morphisms: Since Kleisli morphisms are functional mappings, their composition is easy and presents a form of term substitution. Kleisli morphisms put MMt on truly arrow-based (hence, generic) foundations and serve as the main vehicle of MMt’s categorization (see Math-I for details). Particularly, their convenience for specifying schema repositories, including relations between schemas (such as views and refinement) and operations over them (like schema integration), was shown in Cadish and Diskin (1996), Diskin and Kadish (1997), and Diskin (1998).
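The idea of composition as term substitution can be sketched as follows. The tuple encoding of expressions and the direction conventions are our own illustration, not Diskin's formalism:

```python
# A mapping in Kleisli style sends each element of one model to an
# expression over elements of another model; expressions are encoded
# as nested tuples ('op', arg1, arg2, ...) with strings as variables.

def substitute(expr, mapping):
    """Replace each variable occurring in expr with the expression
    that `mapping` assigns to it (variables it does not mention are
    left alone)."""
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    op, *args = expr
    return (op, *(substitute(a, mapping) for a in args))

def compose(f, g):
    """Kleisli-style composition: push f's expressions through g by
    term substitution, yielding one mapping instead of a chain."""
    return {k: substitute(e, g) for k, e in f.items()}

# f expresses Person attributes over Emplee; g expresses an Emplee
# attribute over a further (invented) model with a 'surname' element.
f = {"name": ("concat", "l-name", "f-name")}
g = {"l-name": ("upper", "surname")}
```

Because the mappings are functional, composing them never requires inspecting the models' internal structure, which is exactly the generic behavior the prescription is after.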

Prescription 2 (for heterogeneous abstract MMt)—In a heterogeneous environment, consider data as a fibration of the data instances over the universe of schemas, in its turn fibred over the universe of metaschemas: To realize this prescription, we need to choose some generic representation for data and schemas, for example, graphs. Then the fibrational view on data prescribes supplying (labeling) each item (an object identifier or a reference) in the data graph D with its type and organizing the latter into a graph—the schema of the data S.4 Then, data can be seen as a graph mapping δ: D → S. Applied at the metalevel, the fibrational view suggests that in a heterogeneous universe, we need to keep each model with its metamodel and label each item of the model with its type from the metamodel. This way we come to a mapping σ: S → M with M a graph specifying the metamodel. The fibrational view on data and metadata allows managing Problems 1 and 4(c) and, jointly with Prescription 1, Problem 5.

Following Prescriptions 1 and 2, we can build a specification framework where any reasonable MMt routine, including data and schema translation, integration, and evolution, is presented as a composition of a few basic diagram operations with data, schemas, metaschemas, and mappings between them. Definitions of these operations do not in any way depend on the internal structure of data, schemas, and mappings. The basics of the approach are presented in Math-I and II.

Prescription 3 (for concrete MMt)—Consider models as (visual presentations of categorical) sketches and data as sketch instances in the category of sets and mappings5: It is mathematically proven that any data that can be specified formally can be specified by a sketch as well. Hence, the sketch format gives a truly universal specification language of absolute expressiveness. Generally speaking, the sketch presentation of models is not a must for gMMt, but it greatly simplifies many issues by providing specifications combining the following advantages:

Extreme compactness: A sketch consists of only three sorts of items—nodes, arrows, and diagram predicate declarations.

Clear and simple semantics: For data modeling, nodes are sets, and arrows are mappings between them.

Clear and simple formal definitions of model mappings: This property is especially valuable for MMt.

Applied at the metalevel, where models are data whose schemas are metamodels, Prescription 3 suggests considering models as instances of sketches specifying metamodels. This allows us to apply many results of categorical logic to MMt problems. Details about sketches and their applications to data modeling can be found in Diskin (2003); Diskin and Kadish (2003); and Diskin, Kadish, Piessens, and Johnson (2000).
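A minimal illustration of the sketch idea, with an assumed encoding (nodes, arrows, and a single "key" predicate declaring injectivity), not the full categorical definition:

```python
# A sketch: nodes, arrows (source, target), and diagram predicate
# declarations. An instance assigns a set to each node and a total
# function to each arrow, and must satisfy every declared predicate.

sketch = {
    "nodes": {"Person", "String"},
    "arrows": {"pin": ("Person", "String")},
    "predicates": [("key", "pin")],   # pin must be injective
}

def is_instance(inst, sketch):
    for name, (src, tgt) in sketch["arrows"].items():
        fn = inst["arrows"][name]
        if set(fn) != inst["nodes"][src]:
            return False                      # not total on the source set
        if not set(fn.values()) <= inst["nodes"][tgt]:
            return False                      # values outside the target set
    for pred, arrow in sketch["predicates"]:
        if pred == "key":
            fn = inst["arrows"][arrow]
            if len(set(fn.values())) != len(fn):
                return False                  # injectivity (key) fails
    return True

good = {"nodes": {"Person": {"a", "b"}, "String": {"001", "002"}},
        "arrows": {"pin": {"a": "001", "b": "002"}}}
bad = {"nodes": {"Person": {"a", "b"}, "String": {"001"}},
       "arrows": {"pin": {"a": "001", "b": "001"}}}
```

The three ingredients match the "extreme compactness" point above: everything about the model is said with nodes, arrows, and predicate declarations, and instance-hood is a purely structural check.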

CONCLUSION

Generic MMt is an actively developing novel research field. Its creation has been motivated by purely practical reasons of effective metadata management. Nevertheless, gMMt quickly came to specification issues requiring fairly abstract mathematical means. Fortunately, the latter are already developed in category theory and can be readily adapted for MMt needs.

After a specification framework for gMMt is developed, the following two major tasks are in order. The first is to further develop a theory and algorithms for schema matching and to test them in various practical situations. It is a large and practically unlimited research and industrial field in-between DB and AI. The second major task is to build effective implementations of the gMMt environment and to estimate their efficiency in a variety of contexts and practical applications. Together these tasks form a broad experimental agenda still in its infancy. However, most likely it will rapidly mature and progress under the pressure of such important applications as data warehousing and the Web.

REFERENCES

Alagic, S., & Bernstein, P. (2001). A model theory for generic schema management. In Eighth International Workshop on Databases and Programming Languages, DBPL’2001 (pp. 228-246).

Bernstein, P. (2003). Applying model management to classical metadata problems. In International Conference on Innovative Database Research, CIDR’2003 (pp. 209-220).

Bernstein, P., Halevy, A., & Pottinger, R. (2000). A vision for management of complex models. SIGMOD Record, 29(4), 55-63.

Bernstein, P., & Rahm, E. (2000). Data warehouse scenarios for model management. In International Conference on Entity-Relationship Modeling, ER’2000 (pp. 1- 15).

Cadish, B., & Diskin, Z. (1996). Heterogeneous view integration via sketches and equations. In Lecture notes in artificial intelligence: Vol. 1079. Foundations of Intelligent Systems: Ninth International Symposium, ISMIS’96 (pp. 603-612). Springer.

Diskin, Z. (1998). The arrow logic of meta-specifica- tions: A formalized graph-based framework for structuring schema repositories (Tech. Rep. No. TUM-I9820). In H. Kilov, B. Rumpe, & I. Simmonds (Eds.), Seventh OOPSLA Workshop on Behavioral Semantics of OO Business and System Specifications. Technische Universitaet Muenchen.

Diskin, Z. (1999). Abstract metamodeling, I: How to reason about meta- and metamodeling in a formal way. In K. Baclawski, H. Kilov, A. Thalassinidis, & K. Tyson (Eds.), Eighth OOPSLA Workshop on Behavioral Semantics (pp. 32-48). Northeastern University.

Diskin, Z. (2003). Mathematics of UML: Making the odysseys of UML less dramatic. In K. Baclawski, & H. Kilov (Eds.), Practical foundations of business system specifications (pp. 145-178). Kluwer.

Diskin, Z., & Kadish, B. (1997). A graphical yet formalized framework for specifying view systems. In Advances in Databases and Information Systems, ADBIS’97: Vol. 1. Regular papers (pp. 123-132). St. Petersburg, Russia: Nevsky Dialect.

Diskin, Z., & Kadish, B. (2003). Variable set semantics for keyed generalized sketches: Formal semantics for object identity and abstract syntax for conceptual modeling. Data & Knowledge Engineering, 47, 1-59.

Diskin, Z., Kadish, B., & Piessens, F. (1999). What vs. how of visual modeling: The arrow logic of graphic notations. In H. Kilov, B. Rumpe, & I. Simmonds (Eds.), Behavioral specifications in businesses and systems (pp. 27-44). Kluwer.


Diskin, Z., Kadish, B., Piessens, F., & Johnson, M. (2000). Universal arrow foundations for visual modeling. In Lecture notes in artificial intelligence: Vol. 1889. International Conference on the Theory and Applications of Diagrams (pp. 345-360). Springer.

Drew, P., King, R., McLeod, D., Rusinkiewicz, M., & Silberschatz, A. (1993). Report on the workshop on semantic heterogeneity and interoperation in multidatabase systems. SIGMOD Record, 22(3).

Fagin, R., Kolaitis, P., Miller, R., & Popa, L. (2003). Data exchange: Semantics and query answering. In International Conference on Database Theory, ICDT’2003 (pp. 207-224).

Goguen, J., & Burstall, R. (1992). Institutions: Abstract model theory for specification and programming. Journal of ACM, 39(1), 95-146.

Halevy, A., & Madhavan, J. (2003). Corpus-based knowledge representation. In International Joint Conference on Artificial Intelligence, IJCAI’03.

Kalinichenko, L. (1978). Data model transformation method based on axiomatic data model extension. In International Conference on Very Large Data Bases, VLDB’78 (pp. 549-555).

Madhavan, J., Bernstein, P., Domingos, P., & Halevy, A. (2002). Representing and reasoning about mappings between domain models. In 18th Conference of American Association for Artificial Intelligence, AAAI’2002.

Madhavan, J., Bernstein, P., & Rahm, E. (2001). Generic schema matching with Cupid. In International Conference on Very Large Databases, VLDB’2001.

Melnik, S., Rahm, E., & Bernstein, P. (2003). Developing metadata-intensive applications with Rondo. Journal of Web Semantics, 1, 47-74.

Miller, R., Haas, L., & Hernandez, M. (2000). Schema mapping as query discovery. In International Conference On Very Large Databases, VLDB’2000.

Mork, P., & Bernstein, P. (2004). Adapting a generic match algorithm to align ontologies of human anatomy. In International Conference on Data Engineering, ICDE’2004 (pp. 787-790).

Paolini, P., & Pelagatti, G. (1977). Formal definition of mapping in a database. In ACM SIGMOD Conference on Management of Data (pp. 40-46).

Rahm, E., & Bernstein, P. (2001). A survey of approaches to automatic schema matching. VLDB Journal, 10(4), 334-350.


Velegrakis, Y., Miller, R., & Popa, L. (2003). Mapping adaptation under evolving schemas. In International Conference on Very Large Databases, VLDB’2003.

Yan, L.-L., Miller, R., Haas, L., & Fagin, R. (2001). Data-driven understanding and refinement of schema mappings. In ACM SIGMOD Conference on Management of Data.

KEY TERMS

Data Model: A specification language in which we can model and talk about models of a given class and their instances, manipulations with instances and models, queries, and constraints. A typical example of a well-developed data model is the relational model. The ER data model, though less formal, is another example.

Data Transformation/Translation: Transformation of data stored in one format (metamodel) to another format.

Generic Model Management (gMMt): An MMt environment/system applicable to a wide range of MMt tasks across a wide range of metamodels.

Instance (of a Model): A set of data items structured according to the model. Data instantiate the model (or data schema in this context).

Metadata: Data specifying data, their structure, and presentation.

Metamodel (of a Given Class of Models): A model whose instances are the models of the class. Models of a given metamodel are called similar.

Model: A metadata artifact such as a relational or XML schema, interface definition, or Web site layout.

Model Mapping or Morphism: Informally, a correspondence between models. Formally, it is a function (arrow) sending elements of the source model to elements of the target model. The graph of this function can be reified as a special model, often also called a model mapping in the literature. We prefer the term correspondence model for reified mappings and use model mapping in the narrow sense of mappings-as-functions.

ENDNOTES

1. Partially supported by Grant 05.1526 from the Latvian Council of Science.

2. Inventing a good term is not just linguistic sugar; rather, it is an act of stating a notion or concept, and it may significantly speed up the progress in the field.

3. The framework is based on the notion of institution (Goguen & Burstall, 1992), which was invented for “model management” in mathematical logic (where models are logical theories). A discussion of institutions’ applicability to MMt problems can be found in Diskin (1999), where a bit more general version of institutions was developed for precise formulation of meta-metamodeling concepts.

4. This is nothing but the XML view on data, and indeed an XML document can be viewed as a fibration of data over tags, as soon as we arrange data and the set of tags into graphs. In a sense, XML’s authors invented fibrations independently of category theorists.

5. A sketch is a graph-based format used in category theory for specifying mathematical structures (see Math-I for details).


Geometric Quality in Geographic Information

José Francisco Zelasco

UNBA, Argentina and UNCPBA, Argentina

Gaspar Porta

Carthage College, USA

José Luis Fernandez Ausinaga

Pop-Vision Company, Argentina

INTRODUCTION

A typical way to build surface numerical models or Digital Elevation Models (DEMs) for Geographical Information Systems (GIS) is by processing the stereo images obtained from, for example, aerial photography or SPOT satellite data. These GIS can perform many computations involving their geographic databases. The quality control of a geographic database, and in particular its topological and geometric integrity, is therefore an important topic (Guptill & Morrison, 1995; Harvey, 1997; Laurini & Milleret-Raffort, 1993; Ubeda & Servigne, 1996). The geometric quality control of the stored DEM is what we are concerned with here. “Quality” means the geometric precision measured in terms of the difference between a DEM and a reference DEM (R-DEM). We assume the R-DEM is a faithful model of the actual surface. Its point density may be greater than the DEM point density.

BACKGROUND

In the literature, several unsatisfactory solutions have been proposed for DEM control with respect to a reference. A critical problem in the error estimation (evaluated using the difference referred to in the previous paragraph) is to establish, for each selected point of the DEM, the corresponding homologous point in the R-DEM. Other kinds of problems and errors are related to the existence of aberrant points, systematization, and so forth. These problems were studied for horizontal errors in maps in Abbas (1994), Grussenmeyer, Hottier, and Abbas (1994), and Hottier (1996a). These authors found that the model-reference dissymmetry was the most important factor in determining homologous pairs.

Several solutions have been proposed for the punctual control method (e.g., recognition algorithms, filtering methods, adjustment of histograms to theoretical laws) without obtaining completely satisfactory results (Dunn, Harrison, & White, 1990; Lester & Chrisman, 1991). Later,

Abbas (1994), Grussenmeyer (1994), and Hottier (1996a) presented an alternative to the punctual control method: the linear control method based on the dissymmetry of the Hausdorff distance. Habib (1997) analyzed precision and accuracy in altimetry and mentioned some of the proposals of the last decade for the elevation control of quality.

In the case of the DEMs, to assess the difference that gives rise to the error we wish to compute, we need to identify without ambiguity each point M in the DEM with its homologous point P in the R-DEM. Two reasons that make this task difficult follow:

1. Many points M, such as those situated on regular sides, which are indistinguishable from their neighbors, are not identifiable. Potentially identifiable points are those located on sharp slope variations and possibly those with zero slope (tops, bottoms, and passes).

2. A point identifiable on the DEM is not necessarily identifiable in the R-DEM because of a difference in scale (generalization) or aberrant errors.

Finding identifiable homologous pairs of points is difficult for an operator, and automating this is a very delicate process.

In the case of the precision assessment of a DEM from an ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer; Hirano, Welch, & Lang, 2002), profiles or benchmarks (particular points) are used for the elevation accuracy assessment. This elevation accuracy assessment refers to the vertical error of the DEM. However, the horizontal error is not taken into account when using profiles and, also, there is a model-reference discrepancy and the choice and number of the benchmarks might not be a good stochastic sample of points.

In light of these difficulties, Zelasco and Ennis (2000) proposed an estimator for the variances of the vertical and horizontal errors. The values obtained for the estimator according to the type of terrain, its unevenness, and the number of sample points in the DEM were studied using

Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.


simulations (Zelasco, 2002; Zelasco, Ennis, & Blangino, 2001). Throughout these studies, it is assumed that the errors in elevation and in the horizontal plane are independent normal random variables—in agreement with Liapounov’s theorem (Hottier, 1996b). This proposed method is called the Perpendicular Distance Estimation Method (PDEM).

Let e(Mk) = Mk − Pk, where Mk and Pk are the kth homologous pair of points in the DEM and R-DEM, respectively. Estimating σ²(e) (the variance of the error, i.e., of the distance between M and P) without measuring each vector e(Mk) completely is the key advantage of the proposed PDEM. Only the projection of each e(Mk) in particular directions matters. Given an Mk, it becomes unnecessary to find the homologous Pk in the R-DEM. How this is done is the subject of the following section.

For three-dimensional DEMs built with the usual techniques, such as photogrammetry or remote sensing, there are expected normal deviations in altimetry (σz) and in the horizontal plane (σp) that are a priori known. They are generally different from each other due to the manner in which the values of the z coordinate (altimetry) and the x,y coordinates (the horizontal plane) are evaluated. The goal of this PDEM is to evaluate the actual errors (a posteriori) of a given DEM.
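The key observation, that projections of the error vectors onto chosen directions already carry the variance information, can be illustrated with a toy Monte Carlo simulation. This is our own sketch of the principle, not Zelasco's implementation; all numbers are invented:

```python
import random

# Error vectors have independent normal components with horizontal
# deviation sigma_p (x and y) and vertical deviation sigma_z. The
# projection onto a unit direction d then has variance
#   sigma_p^2 * (dx^2 + dy^2) + sigma_z^2 * dz^2,
# so measuring projections alone constrains sigma_p and sigma_z.

random.seed(0)
sigma_p, sigma_z = 1.0, 3.0    # a priori horizontal / vertical deviations
d = (0.6, 0.0, 0.8)            # a unit direction (0.36 + 0.64 = 1)

proj = [random.gauss(0, sigma_p) * d[0]
        + random.gauss(0, sigma_p) * d[1]
        + random.gauss(0, sigma_z) * d[2]
        for _ in range(100_000)]
sample_var = sum(p * p for p in proj) / len(proj)
expected = sigma_p**2 * (d[0]**2 + d[1]**2) + sigma_z**2 * d[2]**2
```

Sampling projections along several directions of known cosines yields a small linear system in σp² and σz², which is the spirit of estimating the variances without ever pairing M with its homologous P.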

THE QUALITY EVALUATION METHOD

Description of the PDEM

The gap between the DEM and the R-DEM is represented by a function that links each point of the DEM to the vector e(M) = M - P in R3, where P is the point homologous to M in the R-DEM. M and P represent the same relief element; however, P is not vertically aligned with M. The DEM precision is a function of the vectors e(Mk) of a sample of {Mk} points. This function is used in the construction of the covariance matrix σ²(e). If the errors in the three directions are different, for instance in radar interferometry, the rank of σ²(e) is three. Its eigenvalues are the variances along the three principal directions.

When the errors in the two orthogonal directions of the plane are equal (stereo-image techniques), we can consider that the rank of the covariance matrix σ²(e) is two. The eigenvalues are the variances σz² and σx² = σy² = σx,y². In a rotated frame of reference, the new covariance matrix will no longer be diagonal. The new components are expressed in terms of the eigenvalues of σ²(e). The following expression corresponds to the first component σx'² of the new covariance matrix:

σx'² = a11²σx² + a12²σy² + a13²σz² (1)

where the a1j are the direction cosines of the unit vector along the x' axis of the rotated reference frame.

The value of the variance in a given direction is equal to the square of the distance between the two planes normal to that direction: one tangent to the distribution indicative ellipsoid (whose semi-axes are a = σx, b = σy, c = σz) and the other passing through its center.

In Figure 1, the plane normal to the x’ direction and tangent to the distribution indicative ellipsoid is drawn in the general case.

The algebraic form of equation (1) is used to obtain the variance in any direction d, replacing the coefficients a1j with the direction cosines (cos α, cos β, cos γ) corresponding to this direction d:

σd² = σx²cos²α + σy²cos²β + σz²cos²γ (2)

σd² can be estimated by Σei²/n, 0 < i < I, where the ei are the error components in the d direction.
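As a quick sanity check, the relation in equation (2) can be verified numerically: project independent normal errors onto a direction d and compare the empirical variance of the projections with the formula. The standard deviations below are hypothetical values chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical a priori standard deviations of the error components.
sx, sy, sz = 2.0, 2.0, 5.0

# Independent normal errors in x, y, z for a large sample.
n = 200_000
e = rng.normal(0.0, [sx, sy, sz], size=(n, 3))

# A direction d given by its direction cosines (cos a, cos b, cos g).
d = np.array([1.0, 2.0, 3.0])
d /= np.linalg.norm(d)

# Empirical variance of the error component along d ...
var_empirical = np.mean((e @ d) ** 2)

# ... against equation (2): sd2 = sx2 cos2a + sy2 cos2b + sz2 cos2g.
var_formula = sx**2 * d[0]**2 + sy**2 * d[1]**2 + sz**2 * d[2]**2

print(var_empirical, var_formula)
```

With a large sample the two printed values agree closely, as equation (2) predicts.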

We construct a mesh of triangles from the points of the R-DEM; we call this a model of the surface. Given a point M in the DEM, its projection onto the x,y plane is inside the projection of a unique triangle T from this model; we call T the triangle corresponding to M. The error component ei in the direction di is the distance di from each point to the plane of the corresponding triangle. The squares of the distances di between each point of the model and the plane of the corresponding triangle allow us to establish a least squares estimator for the assessment of (σx², σy², σz²).
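The two geometric ingredients of this construction, testing whether the x,y projection of a DEM point falls inside a projected triangle and measuring the point-to-plane distance, can be sketched as follows; the triangle and point coordinates are invented for illustration.

```python
import numpy as np

def plane_distance(point, tri):
    """Distance from a 3-D point to the plane of triangle tri (3x3 vertex array)."""
    normal = np.cross(tri[1] - tri[0], tri[2] - tri[0])
    normal = normal / np.linalg.norm(normal)
    return abs(np.dot(point - tri[0], normal))

def cross2(u, v):
    """z component of the 2-D cross product."""
    return u[0] * v[1] - u[1] * v[0]

def contains_xy(point, tri):
    """True if the x,y projection of point lies inside the x,y projection of tri
    (same-sign test on the three edge cross products)."""
    p = point[:2]
    a, b, c = tri[0, :2], tri[1, :2], tri[2, :2]
    s = [cross2(b - a, p - a), cross2(c - b, p - b), cross2(a - c, p - c)]
    return all(x >= 0 for x in s) or all(x <= 0 for x in s)

# A hypothetical R-DEM triangle and a DEM point M (illustration values).
T = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 5.0]])
M = np.array([2.0, 2.0, 1.5])

if contains_xy(M, T):
    d_i = plane_distance(M, T)  # the error component normal to the triangle plane
```

The distance d_i computed this way is exactly the perpendicular-distance measurement the PDEM relies on.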

Recall that if the errors in the horizontal plane are independent of the direction, the problem is reduced from three to two unknowns (see Figure 2): σx = σy, and the distribution indicative ellipsoid is an ellipsoid of revolution. The maximum slope direction in each R-DEM triangle (equal to the angle made by the normal to the triangle and the z coordinate) determines the direction cosines in two dimensions. The position of the homologous point in the R-DEM does not need to be known; this is the strength of the PDEM. By having determined the corresponding triangle, we capture the required information without needing to find the particular homologous point of the R-DEM. Therefore, we only need to determine the distance di, which is the error component in the direction normal to the triangle.

Figure 1. Distribution indicative ellipsoid and tangent plane Π

From (2) and with

σx² = σy² = σx,y² (3)

we get

σd² = σx,y²(cos²α + cos²β) + σz²cos²γ (4)

and finally, since cos²α + cos²β = 1 − cos²γ = sin²γ,

σd² = σx,y²sin²γ + σz²cos²γ (5)

If we assume that we have only one value of the distance di for each direction di, we may consider that di² is an estimator of σdi², so we get the solution with the I relations

di² = σx,y²sin²γi + σz²cos²γi, (6)

which is more precise than using σd² estimated by means of Σei²/n, 0 < i < I. Additionally, this avoids having to give a different weight to each equation.
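A minimal numerical sketch of this estimation: generate squared distances di² consistent with relation (6) and recover (σx,y², σz²) by linear least squares over the I relations. The true variances and the slope angles below are hypothetical illustration values, not figures from the article.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical true variances used to generate the simulated distances.
s_xy2_true, s_z2_true = 4.0, 25.0

# Slope angles gamma_i of the R-DEM triangles (one per accepted DEM point).
gamma = rng.uniform(0.1, 1.2, size=20_000)

# E[d_i^2] follows relation (6); d_i ~ N(0, sd_i^2) gives d_i^2 = sd_i^2 * chi2_1.
sd2 = s_xy2_true * np.sin(gamma)**2 + s_z2_true * np.cos(gamma)**2
d2 = sd2 * rng.chisquare(1, size=gamma.size)

# Linear least squares on d_i^2 = s_xy^2 sin^2(g_i) + s_z^2 cos^2(g_i).
A = np.column_stack([np.sin(gamma)**2, np.cos(gamma)**2])
(s_xy2_hat, s_z2_hat), *rest = np.linalg.lstsq(A, d2, rcond=None)

print(s_xy2_hat, s_z2_hat)  # close to the true values 4.0 and 25.0
```

Note that the system is linear in the two unknown variances, so no iterative fitting is needed.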

HOW TO EMPLOY THE PDEM

Recall that an R-DEM will be used as the control for a DEM, real or simulated. Also recall that an R-DEM, being a discrete set, allows constructing a mesh of K triangles defined by the points on the reference surface. Each triangle Tk, (k = 1,…,K), belongs to a plane which has a unit normal vector nk, (k = 1,…,K).

The mesh of K triangles of the R-DEM is constructed using, for instance, the Delaunay method in two dimensions.
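A possible sketch of this construction with SciPy's Delaunay triangulation: triangulate the x,y projections of the R-DEM points, then compute the unit normal nk of each triangle Tk. The sampled reference surface below is invented for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)

# Hypothetical R-DEM sample: points (x, y, z) on the reference surface.
xy = rng.uniform(0.0, 100.0, size=(200, 2))
z = 0.3 * xy[:, 0] + np.sin(xy[:, 1] / 10.0)  # an invented relief
points = np.column_stack([xy, z])

# Delaunay triangulation computed in two dimensions, on the x,y projection.
mesh = Delaunay(xy)

# Unit normal vector n_k of each triangle T_k, k = 1, ..., K.
v = points[mesh.simplices]  # shape (K, 3, 3): K triangles, 3 vertices each
normals = np.cross(v[:, 1] - v[:, 0], v[:, 2] - v[:, 0])
normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
```

Conveniently, `mesh.find_simplex(p)` then returns the index of the triangle whose x,y projection contains a 2-D point p, or -1 if none does, which is exactly the "corresponding triangle" lookup the method needs.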

In the case of a simulation, we have to

determine the mass center of each triangle.

adopt a standard deviation for each coordinate σx, σy, and σz, and add three random normal variables to the mass center coordinates to simulate the points of the DEM to be evaluated. With σx = σy we evaluate σx,y (horizontal plane standard deviation). In this case, the distribution indicative ellipsoid is a surface of revolution.

Figure 2. Distribution indicative ellipse and tangent straight line ρ
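These two steps might be sketched as follows; the mass-center coordinates and the adopted standard deviations are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical mass centers of the K R-DEM triangles.
centers = rng.uniform(0.0, 100.0, size=(1000, 3))

# Adopted a priori standard deviations, with sigma_x = sigma_y.
s_x = s_y = 2.0
s_z = 5.0

# Add three independent normal variables to each mass center
# to simulate the points of the DEM to be evaluated.
simulated_dem = centers + rng.normal(0.0, [s_x, s_y, s_z], size=centers.shape)
```

The simulated DEM then carries known errors, so the PDEM estimates can be checked against the adopted values.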

In the case of a real DEM, we have to

determine the triangle of the R-DEM containing the homologous point of each point of the DEM. Recall that the procedure is as follows: Given a DEM point Ms, its homologous point P in the R-DEM will be in the plane defined by the triangle Ts if the projection of the point Ms on the x,y plane is inside the projection of the triangle Ts on the same plane. We establish a practical limit to avoid border problems, given by the a priori standard deviation σp (in the horizontal plane), which allows us to select the points of the DEM in the sample. If the horizontal distance between the projection of Ms and the limits of the projected triangle is at least three times greater than σp, the point is accepted. Otherwise, it is rejected. This is because σp is the standard deviation of a normal random variable.
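The border safeguard can be sketched as a point-to-edge distance test on the projections. The triangle, point, and σp values below are invented for illustration, and the test assumes the point's projection is already known to lie inside the triangle.

```python
import numpy as np

def dist_point_segment(p, a, b):
    """Euclidean distance from 2-D point p to the segment ab."""
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(ap - t * ab)

def accept(point_xy, tri_xy, sigma_p):
    """Accept a DEM point whose x,y projection lies at least 3*sigma_p away
    from every edge of the projected triangle (border-problem safeguard)."""
    edges = [(tri_xy[0], tri_xy[1]), (tri_xy[1], tri_xy[2]), (tri_xy[2], tri_xy[0])]
    return all(dist_point_segment(point_xy, a, b) >= 3.0 * sigma_p for a, b in edges)

# Hypothetical projected triangle and a priori horizontal deviation sigma_p.
tri_xy = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 30.0]])
print(accept(np.array([8.0, 8.0]), tri_xy, 1.0))  # well inside the margin: True
print(accept(np.array([1.0, 1.0]), tri_xy, 1.0))  # too close to an edge: False
```

The 3σp margin mirrors the usual three-sigma rule for a normal random variable, which is the justification given in the text.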

In both cases, we have to

calculate the distance from each point (point of the simulated or of the real DEM) to the plane of the corresponding triangle.

evaluate, using the PDEM, the standard deviation of the DEM in altimetry (σz) and in the horizontal plane (σx,y). In the case of the simulation, their values should be similar to the ones already used.

Note 1: Knowledge of the angle γ (in (5)) makes it possible to select an R-DEM with more or less unevenness.

Note 2: For each one of the planes having the direction d given by the unit normal vector nk in question, the distance from the point Mj, (j = 1, …, J ≤ I), to the corresponding plane is a measure (in the direction d) of the error ed.

DISCUSSION

Evaluating the horizontal error is the most delicate task in the geometric quality control of a geographical database. When the unevenness of the R-DEM surface is insufficient, there is not enough information to evaluate the horizontal error; the assessment of the horizontal error will not be reasonably precise. With enough unevenness, the horizontal error can be comfortably estimated. The elevation error estimation is better than the horizontal one. The quality of the horizontal error assessment will depend on the unevenness of the reference surface as well as on the dimension of the sample (Zelasco, 2002).

With simulations, the error precision values can be calculated, but with a real DEM, the error precision is unknown (it can only be estimated). Applying the PDEM, two values are obtained: the elevation error and the horizontal error of a DEM. With these estimated values (in elevation and in the horizontal plane), simulations may be performed to evaluate the standard deviation of the estimated errors. These standard deviations determine whether to keep or to reject the estimated errors, particularly in the case of the horizontal error.
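Such a simulation study might look as follows: repeat the least-squares recovery of (σx,y², σz²) from relation (6) many times and measure the spread of the estimates. All numbers are hypothetical illustration values.

```python
import numpy as np

rng = np.random.default_rng(4)

def pdem_estimate(s_xy2, s_z2, gamma, rng):
    """One simulated PDEM run: draw d_i^2 consistent with relation (6)
    and recover (s_xy^2, s_z^2) by least squares."""
    sd2 = s_xy2 * np.sin(gamma)**2 + s_z2 * np.cos(gamma)**2
    d2 = sd2 * rng.chisquare(1, size=gamma.size)
    A = np.column_stack([np.sin(gamma)**2, np.cos(gamma)**2])
    sol, *rest = np.linalg.lstsq(A, d2, rcond=None)
    return sol

# Slope angles of an assumed R-DEM; more unevenness means larger angles.
gamma = rng.uniform(0.1, 1.0, size=5000)

# Repeat the simulation to measure the spread of the estimated errors.
runs = np.array([pdem_estimate(4.0, 25.0, gamma, rng) for _ in range(200)])
sd_xy2_hat, sd_z2_hat = runs.std(axis=0)
# The horizontal estimate is expected to spread more than the vertical one.
```

The standard deviations of the two estimate sequences quantify how trustworthy each estimated error is, which is the acceptance criterion described above.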

FUTURE TRENDS

It would be quite interesting to consider the effects of different error values for the x and y coordinates. It might also be of interest to study how to modify the proposed method to take into account the correlation between the error in elevation and the error in a particular direction of the horizontal plane. In this way, the PDEM may be extended to the case of interferometric SAR.

CONCLUSION

Estimating the elevation error (σz) by means of the commonly used vertical distance estimator gives poor results with an uneven surface. Using the PDEM, the elevation error evaluation will almost always be very good. The horizontal error evaluation (σx,y) will be satisfactory when the unevenness and the number of points of the sample are appropriate.

REFERENCES

Abbas, I. (1994). Base de données vectorielles et erreur cartographique: Problèmes posés par le contrôle ponctuel; Une méthode alternative fondée sur la distance de Hausdorff: Le contrôle linéaire. Thèse, Université Paris 7.

Abbas, I., Grussenmeyer, P., & Hottier, P. (1994). Base de données géolocalisées: Estimation de l'erreur cartographique: Une méthode fondée sur la distance de Hausdorff: le contrôle linéaire. Remarques sur le contrôle ponctuel. Notes de cours, ENSG IGN, France.

Dunn, R., Harrison, A. R., & White, J. C. (1990). Positional accuracy and measurement error in digital databases of land use: An empirical study. International Journal of Geographic Information Systems, 4(4), 385-398.

Grussenmeyer, P., Hottier P., & Abbas, I. (1994). Le contrôle topographique d’une carte ou d’une base de données constituées par voie photogrammétrique. Journal XYZ 59, Association Française de Topographie, 39-45.

Guptill, S. C., & Morrison, J. L. (Eds.). (1995). Elements of spatial data quality. Oxford, UK: Elsevier.

Habib, M. (1997). Etude, par simulation, de la précision altimétrique et planimétrique d’un MNT obtenu par corrélation d’images spatiales (capteur à balayage)— Précision d’un MNT régulier—Erreur de rendu d’un semis régulier—Description d’une structure de MNT régulier. Thèse, Univerisité Paris 7.

Harvey, F. (1997). Quality needs more than standards. In M. Goodchild & R. Jeansoulin (Eds.), Data quality in geographic information: From error to uncertainty (pp. 37-42). Paris: Hermès.

Hirano, A., Welch, R., & Lang, H. (2002). Mapping from ASTER stereo image data: DEM validation and accuracy assessment. Photogrammetry and Remote Sensing, 57, 356-370.

Hottier, P. (1996a). La méthode du contrôle linéaire en planimétrie - Propositions pour une typologie des détails anormaux. Rapport de recherche, ENSG, IGN, France.

Hottier, P. (1996b). Précis de statistiques. ENSG, IGN, France.

Hottier, P. (1996c). Qualité géométrique de la planimétrie. Contrôle ponctuel et contrôle linéaire. Dossier: La notion de précision dans le GIS. Journal Géomètre 6, juin, Ordre des Géomètres-Experts Français, 34-42.

Lester, M., & Chrisman, N. (1991). Not all slivers are skinny: A comparison of two methods for detecting positional error in categorical maps. GIS/LIS, 2, 648-656.

Laurini, R., & Milleret-Raffort, F. (1993). Les bases de données en géomatique. Paris: Editions HERMES.

Ubeda, T., & Servigne, S. (1996, September). Geometric and topological consistency of spatial data. Proceedings of the 1st International Conference on Geocomputation (Vol. 1, pp. 830-842). Leeds, UK.

Zelasco, J. F. (2002). Contrôle de qualité des modèles numériques des bases de données géographiques. Journal XYZ 90, Association Française de Topographie, 50-55.

