
Rivero L. Encyclopedia of database technologies and applications. 2006


Database Engineering Focusing on Modern Dynamism Crises

Luiz Camolesi, Jr.

Methodist University of Piracicaba, Brazil

Marina Teresa Pires Vieira

Methodist University of Piracicaba, Brazil

INTRODUCTION

Researchers in several areas (sociology, philosophy, and psychology), among them Herbert Spencer and Abraham Maslow, attribute the human actions that continually change the environment to the search for the satisfaction of individual and collective needs. In other fields of science, this behavior poses a challenge for ethical research on the concepts, methodologies, and technologies aimed at optimizing and qualifying the actions involved in these continual changes to obtain better results.

Specifically in computer science, software engineering is a critical sub-area for this research and its application (Lehman & Stenning, 1997), since it involves the construction of models and orientation for their use in the development of resources, such as software, to support the user’s needs (Perry & Staudenmayer, 1994). Databases are included in this context as a component for data storage.

Considering the premise of continuous changes (Table 1) and the human needs involved (Khan & Khang, 2004), the consequences for software and for the database used are obvious. In the field of computational science, these changes in the modern world are reflected in evolutionary features for software and databases (Brereton, Budgen & Bennett, 1999), based on database concepts, structures, and processes that allow for rapid, albeit not traumatic, shifts to new industrial, commercial, or scientific systems (McFadden, Hoffer & Prescott, 1999) in new contexts (temporal scenarios) (Camolesi, 2004).

Table 1. Types of changes

Evolution: actions for a (variant) element’s technological progress, improvement, modernization, or correction.

Revolution: alteration actions on an element that can influence the element’s purpose in the context.

Involution: simplification actions on an element, regression in its conception or content.

Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.

BACKGROUND

Database models must comprise representation elements that are adaptable to the user’s varying and dynamic needs, and contain the taxonomy needed for their manipulation. Thus, traditional (generic) database models such as the Entity-Relationship (ERM) and Relational (RM) models (Siau, 2004) have been extended with appropriate “profiles” for specific applications and requirements. Considering their purpose of supporting changes, the “profile models” commonly referenced in scientific research are:

Version Model: considering versions as database objects derived from others (originating from them, but containing alterations), models of this profile must be applied to a database characterized by the explicit and voluntary storage of the historical information about object changes (Conradi & Westfechtel, 1998). The features frequently specified in version models are:

Derivation Structure: establishes the data structure for organizing versions, for example, stack, tree, or acyclic digraph, linked by special relationships representing linear or nonlinear derivation actions;

Versionable Element: establishes which variant elements (database objects) can have versions created and represented in the database;

Property of Versioning: a feature that serves to define a versionable element. This property must be dynamically established for each element during either its creation or its definition;

Version Status: status set (state or situation) for the versions;

Manipulation of Versions: the creation, update, and deletion of versions can be accomplished implicitly either by the database system or by the user through specific command language;

Operation Restrictions: the manipulation of versions by users can be granted unconditionally, or restrictions may be imposed for each operation (create, update, and delete).
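The version model features listed above can be illustrated with a minimal Python sketch. It is not taken from the cited literature: the class names, the two-value status set, and the rule that only working versions may be updated are assumptions chosen for illustration. Versions are linked to the versions they derive from, which yields an acyclic digraph by construction.

```python
from dataclasses import dataclass, field
from enum import Enum


class VersionStatus(Enum):
    # Hypothetical status set; the article only states that one exists.
    WORKING = "working"
    STABLE = "stable"


@dataclass
class Version:
    version_id: str
    payload: dict
    status: VersionStatus = VersionStatus.WORKING
    parents: list = field(default_factory=list)  # derivation links


class VersionedElement:
    """A versionable element whose versions form an acyclic derivation digraph."""

    def __init__(self, name, versionable=True):
        self.name = name
        self.versionable = versionable  # Property of Versioning
        self.versions = {}

    def create(self, version_id, payload, derived_from=()):
        # Operation restriction: only versionable elements accept new versions.
        if not self.versionable:
            raise PermissionError(f"{self.name} is not versionable")
        if version_id in self.versions:
            raise ValueError(f"version {version_id} already exists")
        # Parents must already exist, so no cycle can ever be introduced.
        parents = [self.versions[p] for p in derived_from]
        version = Version(version_id, dict(payload), parents=parents)
        self.versions[version_id] = version
        return version

    def update(self, version_id, **changes):
        # Assumed restriction: stable versions are frozen.
        version = self.versions[version_id]
        if version.status is not VersionStatus.WORKING:
            raise PermissionError("only working versions may be updated")
        version.payload.update(changes)

    def ancestors(self, version_id):
        """Return the ids of all versions this one derives from (depth-first)."""
        seen, order = set(), []
        stack = list(self.versions[version_id].parents)
        while stack:
            version = stack.pop()
            if version.version_id not in seen:
                seen.add(version.version_id)
                order.append(version.version_id)
                stack.extend(version.parents)
        return order
```

A version created with more than one entry in `derived_from` represents a nonlinear (merging) derivation; a single parent gives the linear case.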

Time Model: in models of this profile, the elements that represent the time dimension are established essentially to control the evolution of the database. The reliability of such time-based models depends on the unambiguous definition of the temporal limits to be imposed in any business or scientific database system. When evaluating systems that use time representations, database designers find many variations and ambiguous representations that can corrupt the processing of time values simply because it is not known how time is represented, how it can be analyzed, or how it should be converted. Using a Time Model for the homogeneous representation of the time and interval data types enables the designer to improve evolution control. Based on extensive research on time representation and utilization (Allen, 1991; Bettini et al., 1998), the following features are identified for a homogeneous data type definition:

Moment: a time instant value;

Granularity: precision domain of a time instant; this feature can be based on the ISO 8601 standard (International Organization for Standardization, 2000), for example, PnYnMnDTnHnMnS, or on any other standard established by the application;

Orientation: reference system for temporal representation, for example, Gregorian calendar (UTC or Coordinated Universal Time), Chinese calendar, Jewish calendar or others;

Direction: every orientation has a moment of origin (0), and a time may be a moment preceding or following this origin;

Application: specification of the use of the temporal representation, allowing for the semantic recognition of the type, independently of the context in which it is inserted. The attribute application should indicate one of the following values:

Occurrence: to specify a moment (time or interval) to carry through an action (either a past situation or a future one);

Duration: using time or interval to specify the duration of an action (either a past situation or a future one);

Frequency: to specify a moment (time or interval) used to record repetitions.
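As an illustration of a homogeneous time data type, the following Python sketch bundles the five features (moment, granularity, orientation, direction, and application) into one value. The class and field names are assumptions, the orientation is fixed to the Gregorian calendar in UTC, and granularity is modeled as simple truncation; a real Time Model may define these differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Application(Enum):
    OCCURRENCE = "occurrence"
    DURATION = "duration"
    FREQUENCY = "frequency"


class Granularity(Enum):
    # Illustrative precision levels, modeled here as truncation units.
    DAY = "day"
    HOUR = "hour"
    MINUTE = "minute"
    SECOND = "second"


_TRUNCATE = {
    Granularity.DAY: dict(hour=0, minute=0, second=0, microsecond=0),
    Granularity.HOUR: dict(minute=0, second=0, microsecond=0),
    Granularity.MINUTE: dict(second=0, microsecond=0),
    Granularity.SECOND: dict(microsecond=0),
}


@dataclass(frozen=True)
class Moment:
    value: datetime
    granularity: Granularity
    application: Application
    orientation: str = "Gregorian/UTC"  # the only orientation in this sketch

    def __post_init__(self):
        # Naive datetimes are ambiguous, which is what the model tries to avoid.
        if self.value.tzinfo is None:
            raise ValueError("a Moment requires a timezone-aware datetime")

    def normalized(self):
        """Convert to UTC and truncate to the declared granularity."""
        utc = self.value.astimezone(timezone.utc)
        return utc.replace(**_TRUNCATE[self.granularity])

    def direction(self, origin):
        """-1 if the moment precedes the origin, +1 if it follows, 0 if equal."""
        a, b = self.normalized(), origin.astimezone(timezone.utc)
        return (a > b) - (a < b)


@dataclass(frozen=True)
class Interval:
    start: Moment
    end: Moment

    def __post_init__(self):
        if self.end.normalized() < self.start.normalized():
            raise ValueError("interval ends before it starts")

    def length(self):
        return self.end.normalized() - self.start.normalized()
```

Normalizing every value before comparison is the design choice that matters here: two moments recorded in different time zones or precisions become directly comparable.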

Configuration Model: this “profile model” is based on and related to the version model, with version aggregations defined as configurations or releases (Conradi & Westfechtel, 1998). The configurations or releases are logical aggregation components or artifacts, selected and arranged to satisfy the needs of applications based on composition abstractions (Sabin & Weigel, 1998). Applications that require the composition of database objects are typically related to engineering, such as Computer-Aided Software Engineering (CASE) and Computer-Aided Design (CAD);

Integrity Model: required in all data models (RM, OOM, and ORM), elements are established in models of this profile to support data consistency and integrity. In certain typical databases, the number of constraints and actions for database integrity may involve hundreds of elements, which require periodic reviews since they are strictly related to continually changing real situations. The elements in these models vary in form and purpose, the most common being rules, business rules (Date, 2000), and database constraints (Doorn & Rivero, 2002; OMG, 2003);

User Model: necessary in all data models (RM, OOM, and ORM), the elements that represent the users are established in models of this profile. The modeling of user features is critical in the evolution of a database because it involves a diversity of needs and changes in the operational behavior (insert, delete, alter, and select) in the database. The modeling of human behavior to design access privileges is a well-consolidated analytical process (Middleton, Shadbolt & Roure, 2004). However, this process must take into account the static and dynamic aspects of people and their activities (Perry & Staudenmayer, 1994). Analyses based on a static approach define User Roles that are appropriate for traditional applications, whose activities change with relative infrequency. In dynamic applications, however, static definitions are insufficient, since the requirements call for temporary and specific activities that are necessary to support the dynamic definition of User Roles.
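The difference between static and dynamic User Roles can be sketched as follows. This is an assumed design, not one prescribed by the cited works: a role grant with no validity bounds behaves as a static role, while a grant with a validity interval models a temporary, specific activity. The names `RoleGrant` and `allowed` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class RoleGrant:
    role: str
    privileges: frozenset                  # e.g. {"insert", "delete", "alter", "select"}
    valid_from: Optional[datetime] = None  # None means unbounded (static role)
    valid_to: Optional[datetime] = None    # exclusive upper bound

    def active(self, at):
        """True if the grant is in force at the given moment."""
        return ((self.valid_from is None or self.valid_from <= at)
                and (self.valid_to is None or at < self.valid_to))


def allowed(grants, operation, at):
    """An operation is allowed if any active grant carries the privilege."""
    return any(g.active(at) and operation in g.privileges for g in grants)
```

For example, a user holding a permanent "clerk" grant with `select` and a one-month "auditor" grant with `alter` may alter data only during that month.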


Database engineering involves database models and profile models applied in development methodologies (Elmasri & Navathe, 2003), that is, a collection of correlated techniques arranged in a logical order, with orientation rules for the materialization of an objective. The objective of a database engineering methodology should be the creation of an optimized database (based on ORM or OOM) flexible to changes implemented through well-executed engineering phases (elicitation of requirements, viability analysis, design, testing, and implementation), particularly in the design stages (conceptual, logical, and physical) in which database models and profile models are used.

The logical order of a methodology established by a database designing or engineering group should reflect the group’s level of maturity and its knowledge of the work context, that is, the group’s dedication to the design, which can be based on two distinct methodologies:

Sequential Engineering: traditional methodology for database engineering in which the phases are executed linearly and can be based on the bottom-up approach, that is, from details of the data requirements (attributes) to the recognition of elements (entities or objects). In top-down sequential engineering, the elements are first identified and then refined in detail;

Concurrent Engineering: methodologies initially created for conventional areas of engineering (mechanical and electrical) that establish the simultaneous development of a “product” through the cooperation of design groups working separately but in constant communication and exchange of information (Carter & Baker, 1992). Concurrent (or Simultaneous) Engineering has been adapted to the broad and complex process of software engineering, but in this case, systems may be required to support the cooperative work among project groups, such as Groupware software, researched in the CSCW (Computer Supported Cooperative Work) area.

The methodology may also depend on the stage of the database’s life cycle (creation, implementation, maintenance). Database maintenance supervised by the database administrator (DBA) is motivated solely by the system’s degenerating performance, which results from the constant modernization (restructuring) of the database (Ponniah, 2003). If degeneration is not prevented, it can lead to increasing information access costs, rendering the use of a database unfeasible.

In some cases, the database maintenance operations are insufficient or not sufficiently adapted to maintain the database qualities. This is usually the case when maintenance and documentation updates in a database have been neglected over a long period. Thus, evolution and change-oriented database engineering should be:

Reverse Engineering: methodologies to recognize and represent the structural, functional, and behavioral aspects of an already operating database (Blaha & Premerlani, 1998). This mode of engineering focuses on the process steps needed to understand and represent a current database, and must include an analysis of the context in which the database was engineered. Depending on the case, the database should be treated as a “black box” to recognize its data, or physically analyzed to identify special data structures and storage solutions;

Reengineering: methodologies for converting databases in obsolete environments to more modern technologies. Reengineering introduces techniques for the restudy and conversion of obsolete data to new realities and the resulting redesign of databases (Sage, 1995). Reengineering can comprise three phases: Reverse Engineering, Delta (∆), and Forward Engineering. Reverse Engineering involves the abstract and clear definition of data representations. In the Delta phase, the designer executes modifications in database systems to incorporate new functionalities (positive ∆) or to reduce them (negative ∆) in order to accomplish complete or partial implementations. The third and last phase, Forward Engineering, refers to the normal development of database systems following the natural stages.
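One simple way to picture the Delta phase is as the difference between the schema recovered by Reverse Engineering and the target schema handed to Forward Engineering. The sketch below assumes schemas are represented as a mapping from table name to a set of column names, and reads positive ∆ as added elements and negative ∆ as removed ones; both the representation and the function name `schema_delta` are illustrative assumptions.

```python
def schema_delta(old, new):
    """Compare two schemas given as {table name: set of column names}.

    Returns (positive, negative): the columns added to and removed from
    each table. A table that exists on one side only contributes all of
    its columns.
    """
    positive, negative = {}, {}
    for table in old.keys() | new.keys():
        before = old.get(table, set())
        after = new.get(table, set())
        if after - before:
            positive[table] = after - before
        if before - after:
            negative[table] = before - after
    return positive, negative
```

Calling `schema_delta` on a recovered schema and a redesigned one gives the designer an explicit list of what the Delta phase must add and what it must retire.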

MAIN THRUST

The profile models were created or inserted in Object-Oriented Models (OOM); however, the creation and adaptation of the Object-Oriented paradigm to traditional data models enabled profile models to be widely applied, for example, in the Unified Modeling Language (UML) (Naiburg & Maksimchuk, 2002) and in Object-Relational Models (ORM). Although not described in the current literature as profile models, these models have well-defined purposes with a variable degree of flexibility in relation to the generic database models.

Though still incipient, the composition of these profile models for use in database engineering allows a database to be defined with features that are essential for its changeability, a characteristic associated with the quality features listed in Table 2, allowing rapid changes with the smallest possible impact on the database and its users.


Table 2. Features to evolution

Traceability: Monitoring is essential for efficiently tracing the work of the designer and the team, aiming at a better evaluation of production (modeling and engineering). To this end, the influences (OMG, 2001) among elements should be recorded in the project so that database engineers can automatically recognize the consequences of an alteration for all the elements of a system (Ramesh & Jarke, 2001). These consequences can take the form of data inconsistencies, which demand the revision of every element reached, directly or indirectly, by the alteration. Correctly defining the influences makes it possible to optimize inconsistency checks and consequently to reduce maintenance costs.

Flexibility: Degree of ease with which changes can be made with the smallest possible impact on the project and the database user. The models used and the project carried out must be capable of supporting new requirements (Domingues, Lloret & Zapata, 2003).

Portability: The need to transfer the database to new environments (operating systems or DBMS), new technologies (programming languages), or interfaces can drive the evolution of a database. To this end, the database modeling must be capable of supporting these changes with low impact.

Compatibility: Degree of ease with which new elements can be inserted in the database model. To this end, the database modeling must follow rigid standards of concepts and techniques so that it can accept specifications or components of the same standard.
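The traceability feature relies on recorded influences among elements. A minimal sketch, assuming influences are stored as a directed graph (a dictionary from an element to the elements it influences), finds every element reached directly or indirectly by an alteration with a breadth-first traversal. The function name and the graph representation are assumptions, not part of the cited metamodel.

```python
from collections import deque


def affected_elements(influences, changed):
    """Return every element reached, directly or indirectly, from `changed`.

    `influences` maps an element name to the names of the elements it
    influences. The changed element itself is not part of the result.
    """
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for target in influences.get(current, ()):
            if target not in seen and target != changed:
                seen.add(target)
                queue.append(target)
    return seen
```

The returned set is the list of elements whose consistency must be reviewed after the alteration; elements outside it can be skipped, which is where the reduction in maintenance cost comes from.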

The importance of changeability is demonstrated by the emerging frameworks and patterns that seek to unify and integrate the profile models in database engineering.

FUTURE TRENDS

Database development methodologies, models, and processes are constantly updated to support the changing requirements that occur during requirements recovery or maintenance processes (Paton & Diaz, 1999). With these scenarios, the status of traditional database design has shifted from a complex software engineering phase to that of engineering (Roddick et al., 2000). Thus, database engineering today serves as the foundation for the development of data modeling adapted to frequent changes in dynamic requirements established by the user.

Recognizing the need for Database Evolution is a more complex task (Table 3) that involves questions such as: What should be altered? When should it be executed? How should it be executed? What are the benefits? What is the cost of this process? Who will be affected?

These issues have motivated research on emerging topics such as:

Database Administration Policy or Database Maintenance Policy (Moore et al., 2001) with the definition of data-driven rules for the management of expired solutions and the innovation of models;

Aspect-Oriented Database Engineering with new perspectives on evolution based on the separation of concerns related to evolution. Aspect modeling (Table 2) supports the nonfunctional properties of a database and the identification of influences, using specific languages to represent profile models;

Database Modeling Languages and Database Engineering Tools with resources to assure the Features to Evolution (Table 2) in database engineering, not forgetting that the choice of the degree of evolution depends on the maturity of the design team and the stability (or instability) of users’ requirements.

Table 3. Database maintenance operations

Correction of the Database Schema in response to problems found during the database operation phase.

Integration of Database Schemas for the creation of a single database.

Adaptation of the Database Schema to the new elements resulting from new requirements.

Updating of the schema elements to conform to changing user requirements and business evolution.

Refinement of the elements of the Database Schema that were insufficiently detailed during the design phase.

Incorporation of several databases into a “Federated Database”, maintaining the independence of each individual database.

CONCLUSION

As can be seen from the above descriptions, Database Evolution is a vast subject under constant discussion and innovation, which explains why so many subjects require analysis, particularly those involving profile models. Despite specific research on this theme, pragmatic interest in the process of Database Evolution is not usually reflected in that research, owing to the lack of detail or of progress achieved.

The difficulties encountered in much of this research may be attributed to the theme’s complexity. The evolutionary process can be defined by many variations in form: voluntary or involuntary; explicit or implicit; temporal or atemporal; recorded or not in databases in historical form; executed through a simple update of information or involving the physical or logical restructuring of the database; following predefined semantic rules; originating from the user (through an interface) or from the DBMS (Database Management System).

Database Administrators (DBA) should clearly indicate that the process of changes in a database, regardless of the characteristics and techniques utilized, should maintain or expand the database’s adaptability, flexibility, and efficiency (Domingues et al., 2003) to conform to new technologies and to the company’s growth-related requirements, without neglecting the crucial problem of application adaptations (Hick & Hainaut, 2003).

REFERENCES

Allen, J.F. (1991). Time and time again. International Journal of Intelligent Systems, 6(4), 1-13.

Bettini, C., Dyreson, C.E., Evans, W.S., Snodgrass, R.T., & Wang, X.S. (1998). A glossary of time granularity concepts. Lecture Notes in Computer Science, 1399, 406-413.

Blaha, M., & Premerlani, W. (1998). Object-oriented modeling and design for database applications. Indianapolis, IN: Prentice Hall.

Brereton, P., Budgen, D., Bennett, K., et al. (1999). The future of software. Communications of the ACM, 42(12), 78-84.

Camolesi, L. (2004). Survivability and applicability in database constraints: Temporal boundary to data integrity scenarios. Proceedings of V IEEE International Conference on Information Technology: Coding and Computing, (Vol. 1, pp. 518-522).

Carter, D.E., & Baker, B.S. (1992). CE: Concurrent engineering. Boston: Addison-Wesley.

Conradi, R., & Westfechtel, B. (1998). Version models for software configuration management. ACM Computing Surveys, 30(2), 232-282.

Date, C.J. (2000). What not how: The business rules approach to applications development. Boston: Addison-Wesley.

Domingues, E., Lloret, J., & Zapata, M.A. (2003). Architecture for managing database evolution. Lecture Notes in Computer Science, 2784, 63-74.

Doorn, J.H., & Rivero, L.C. (2002). Database integrity: Challenges and solutions. Hershey, PA: Idea Group Publishing.

Elmasri, R., & Navathe S.B. (2003). Fundamentals of database systems (4th ed.). Boston: Addison-Wesley.

Hick, J.M., & Hainaut, J.L. (2003). Strategy for database application evolution: The DB-MAIN approach. Lecture Notes in Computer Science, 2813, 291-306.

International Organization for Standardization. (2000). ISO 8601: Data elements and interchange formats - information interchange - representation of dates and times. Technical Committee ISO/TC 154.

Khan, K., & Khang, Y. (2004). Managing corporate information systems evolution and maintenance. Hershey, PA: Idea Group Publishing.

Lehman, M.M., & Stenning, V. (1997). Laws of software evolution revisited. Lecture Notes in Computer Science, 1149, 108-124.

McFadden, F.R., Hoffer, J.A., & Prescott, M.B. (1999). Modern database management. Boston: Addison-Wesley.

Middleton, S.E., Shadbolt, N.R., & de Roure, D.C. (2004). Ontological user profiling in recommender systems. ACM Transactions on Information Systems, 22(1), 54-88.

Moore, B., Ellesson, E., Strassner, J., & Westerinen, A. (2001). Policy core information model. Version 1 Specification, Internet Engineering Task Force.

Naiburg, E.J., & Maksimchuk, R.A. (2002). UML for database design. Boston: Addison-Wesley.

Object Management Group (OMG). (2001). SPEM: Software process engineering metamodel.


Object Management Group (OMG). (2003). OCL: Object constraint language. Version 2.0.

Paton, N.W., & Diaz, O. (1999). Active database systems. ACM Computing Surveys, 31(1), 63-103.

Perry, D.E., & Staudenmayer, N.A. (1994). People, organizations and process improvement. IEEE Software, 11(4), 36-45.

Ponniah, P. (2003). Database design and development: An essential guide for IT professionals. Indianapolis, IN: Wiley-IEEE Press.

Ramesh, B., & Jarke, M. (2001). Toward reference models for requirements traceability. IEEE Transactions on Software Engineering, 27(1), 58-93.

Roddick, J.F., et al. (2000). Evolution and change in data management: Issues and directions. ACM SIGMOD Record, 29(1), 21-25.

Sabin, D., & Weigel, R. (1998). Product configuration framework: A survey. IEEE Intelligent Systems, 13(4), 42-49.

Sage, A.P. (1995). Systems engineering and systems management for reengineering. The Journal of Systems and Software, 30(1/2), 3-25.

Siau, K. (2004). Advanced topics in database research (Vol. 3). Hershey, PA: Idea Group Publishing.

KEY TERMS

Business Rule: Expression or statement that represents a restriction of data or operations in a business domain. A collection of Business Rules is a behavioral guide to support the Business Policy (organizational strategy). Business Rules can be implemented as Database Constraints, depending on the form and complexity of the restriction (Date, 2000).

Configuration: Collection of versions of different elements that make up a complex element. Versions in a configuration must be stable; that is, they cannot be altered but can be replaced by a new version of the same element. Configurations are abstractions representing semantic relationships among elements. Configurations serve to establish the scope of an application (temporal, spatial, or user) (Sabin & Weigel, 1998). Revised Configurations consolidated by users can be defined as Releases. Versions of a release can be neither altered nor replaced by another version. Configurations defined as releases cannot be deleted because they are or were used. User identifiers, validity intervals for use, and constraints are some important items of information associated with releases in databases.
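The rules that distinguish a configuration from a release can be sketched in a few lines of Python. The class and function names are hypothetical, and versions are referred to only by identifier; the sketch enforces that a version in a configuration can be replaced but that a release accepts no replacement and cannot be deleted.

```python
class Configuration:
    """A named selection of one version per element."""

    def __init__(self, name):
        self.name = name
        self.members = {}      # element name -> version id
        self.released = False

    def bind(self, element, version_id):
        """Select, or replace, the version of an element in this configuration."""
        if self.released:
            raise PermissionError("versions of a release cannot be replaced")
        self.members[element] = version_id

    def release(self):
        """Consolidate the configuration; from now on it is immutable."""
        self.released = True


def delete_configuration(catalog, name):
    """Remove a configuration from a {name: Configuration} catalog."""
    if catalog[name].released:
        raise PermissionError("releases cannot be deleted")
    del catalog[name]
```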

Database Constraints: Boolean functions associated with elements of a database used to evaluate the integrity of actions or data that are inserted, removed, or updated. A Database Constraint can be defined as a set of predicates P1, P2, ..., Pk, where each predicate possesses the form C1 θ C2, and where C1 is an attribute, θ is a comparison operator, and C2 is either an attribute or a constant (Camolesi, 2004). The specification of a database constraint can be formalized using the Object Constraint Language (OCL) (OMG, 2003).
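The predicate form C1 θ C2 translates directly into a small evaluator. In the sketch below, a constraint is a list of (C1, θ, C2) triples checked against one row given as a dictionary; treating the set of predicates as a conjunction is an assumption made here, as are the helper names `attr`, `const`, and `holds`.

```python
import operator

# Comparison operators allowed for theta.
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def attr(name):
    """Mark C2 as a reference to another attribute of the same row."""
    return ("attr", name)


def const(value):
    """Mark C2 as a constant."""
    return ("const", value)


def holds(constraint, row):
    """Evaluate a constraint, a list of (C1, theta, C2) predicates, on a row.

    C1 is always an attribute name. All predicates must hold (assumed
    conjunction).
    """
    def resolve(term):
        kind, value = term
        return row[value] if kind == "attr" else value

    return all(OPERATORS[theta](row[c1], resolve(c2))
               for c1, theta, c2 in constraint)
```

For instance, `[("salary", ">=", const(0)), ("start", "<=", attr("end"))]` states that a salary is non-negative and that a period does not end before it starts.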

Database Maintenance Policy: Policy can be defined as a plan or guide to constrain the actions executed by an individual or a group (Moore et al., 2001). The Database Maintenance Policy establishes rules for the action of database designers that are executed in response to intrinsic and constant maintenance-related alteration needs, based on the designers’ empirical knowledge and work capacity. The definition of goals, implementation phases, results, and desired outcomes is information associated with every policy or subset of rules.

Influence: Type of dependency association to conceptually and logically represent interrelations among elements and to define their levels of involvement prior to a process of alteration of the data (OMG, 2001). Influence modeling is based on impact specifications of modifications in database objects during the engineering process.

Temporal Scenario: A concept that adequately represents and includes the characteristics typical of evolution and is used in adaptive, dynamic, or flexible systems. The role of temporal scenarios is to represent situations in which the database requirements have been or will be modified (Camolesi, 2004). Every scenario reaches a moment in time when its existence begins (opening). Starting from this moment, the actors begin to perform (acting). Scenarios may reach a moment in time when they cease to exist (closing). Scenarios can open/close several times (recurrent scenarios). The definition of the opening and closing moments is obligatory for the specification of the temporal existence of temporary scenarios. The exceptions to this rule are the permanent scenarios, which do not close, and the ones that do not open.
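The life cycle of a temporal scenario (opening, acting, closing, and possible recurrence) can be sketched as a list of opening and closing moments. The class below is an illustrative assumption rather than the cited model: a permanent scenario is simply one whose last period is never closed, and a scenario that never opens has no periods at all.

```python
from datetime import datetime


class TemporalScenario:
    def __init__(self, name):
        self.name = name
        self.periods = []  # [opening, closing] pairs; closing is None while open

    def open(self, at):
        if self.periods:
            last_closing = self.periods[-1][1]
            if last_closing is None:
                raise ValueError("scenario is already open")
            if at < last_closing:
                raise ValueError("cannot reopen before the previous closing")
        self.periods.append([at, None])

    def close(self, at):
        if not self.periods or self.periods[-1][1] is not None:
            raise ValueError("scenario is not open")
        if at < self.periods[-1][0]:
            raise ValueError("cannot close before the opening")
        self.periods[-1][1] = at

    def is_acting(self, at):
        """True if the moment falls inside any opening/closing period."""
        return any(opening <= at and (closing is None or at < closing)
                   for opening, closing in self.periods)
```

A recurrent scenario is obtained by calling `open` and `close` repeatedly; each pair adds one period to the history.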

Time and Interval (datatype): Fundamental datatypes to establish the temporal reference in a database. Time is symbolized by real numbers, based on a sequence of representative values to meet the application (Allen, 1991). Interval is an aggregation of two time moments intended to delimit and characterize the interval (Allen, 1991). The interval may represent continuous or discrete time.


User Role: Collection of structural and behavioral characteristics (i.e., privileges) established by the DBA, DBMS, or applications for use in dynamic or static privileges. Dynamic user roles are typically employed in flexible and evolutionary systems. User profiling refers to typical knowledge-based modeling to recognize user roles, based on questionnaires and interviews to identify behavioral patterns (Middleton et al., 2004).

Variant: Applied to many structural elements of a database (class, table, constraint, etc.), it is used to define mutable elements or elements whose modification is highly probable. An invariant is the opposite of a variant, that is, an element not involved in the evolution of the database. Variants of database objects or object properties can be modified and can generate versions.

Version: An alternative of a database element, with variations in structure, function, or behavior, used in the context of an application as a boundary of the work executed. A version can represent a state of an evolving element with variant and invariant properties (Conradi & Westfechtel, 1998). A collection of actions executed on a database element can generate an element version if it results in significant alterations of the element’s characteristics.

Database Query Personalization

Georgia Koutrika

University of Athens, Greece

INTRODUCTION

Traditional database and information retrieval systems have followed a query-based information access paradigm (i.e., information is returned to the user on the basis of a query issued). As a result, users issuing the same query are provided with the same answer. With the advent of the World Wide Web and hand-held electronic devices such as palmtops and cellular phones, information access entered a new era. Increasing amounts of information become available to a growing mass of untrained lay users through various access media. A user searching Web-resident information may have to reformulate a query several times and sift through many results until a satisfactory answer, if any, is obtained. As purely query-driven approaches may be inappropriate in this context, the need for a shift towards a more user-centered information access paradigm arises. To this end, different approaches aim at personalizing the overall user experience at different levels: content selection, content presentation, and user interaction. There is no generally accepted definition of personalization, so I adopt a broad one as follows:

Personalization is the approach of providing an overall customized, individualized user experience by taking into account the needs, preferences, and particular characteristics of a user or group of users.

Focusing on the level of personalized content selection, several distinct lines of research exist. There are two broad categories: filter-based and personalized approaches.

Filter-based approaches filter system responses on the basis of a user profile, storing long-term user interests. In particular, information filtering methods employ profiles comprised of keywords (Foltz & Dumais, 1992). Anatagonomy is a representative example of customized Web-based newspapers employing information filtering methods (Sakagami, Kamba, & Sugiura, 1997). Web-accessible databases employ continuous queries to allow users to obtain new results from the underlying collection or stream without issuing the same query repeatedly (Chen, DeWitt, Tian, & Wang, 2000; Liu, Pu, & Tang, 1999). User preferences are also used for providing recommendations (Karypis, 2001; Shahabi, Banaei-Kashani, Chen, & McLeod, 2001), and for ranking search results (Glover, Lawrence, Birmingham & Lee Giles, 1999; Smyth, Bradley, & Rafter, 2002).

It is worth noting that query-based and filter-based approaches have been characterized as two sides of the same coin (Belkin & Croft, 1992). Filter-based approaches deal only partially with the problem of information overload. This problem still haunts Web searches in which users may issue various queries expressing different information needs.

Personalized approaches are based on the observation that different people find different things relevant; therefore, they may expect different answers to the same query. Consider a simple example: John and Ann access a Web-based movies database, searching for comedies. John is a fan of director W. Allen, and Ann is not. Traditional query-based systems would consider only the query issued and return the same, exhaustive list of comedies to both users. Focusing on the user enables a shift from what is called consensus relevancy, in which the computed relevancy for the entire population is presumed relevant for each user, toward personal relevancy, in which relevancy is computed based on each individual’s characteristics (Pitkow et al., 2002). Personal agents and query personalization approaches belong here. Personal agents represent virtual assistants that learn user preferences by dialoguing in natural language with customers. These preferences are used for the formulation of personalized queries and generation of individual recommendations (André & Rist, 2002; Semeraro, Degemmis, Lops, Thiel, & L’Abbate, 2003). Query personalization approaches exploit user preferences stored in profiles to dynamically enhance any user query by integrating preferences relevant to it (Koutrika & Ioannidis, 2004b; Pitkow et al., 2002). Personalized results may be ranked according to user preferences.

Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.


Figure 1. Information access paradigms

[Diagram not reproduced. Panel labels: Query-based paradigm (Query, Content, Selection; e.g., traditional IR and databases); Filter-based paradigm (Profile, Content, Selection; e.g., information filtering, continuous queries); Personalized paradigm (Query, Profile, Content, Selection; e.g., query personalization, personal agents).]

Figure 1 summarizes the aforementioned information access paradigms.

BACKGROUND

Storing user preferences in user profiles gives a system the opportunity to return more focused, personalized (and hopefully smaller) answers. The primary ways to personalize a search for an active searcher are query augmentation and result ranking. Returning to the previous example, John’s personalized results would include W. Allen’s comedies, and Ann’s would not. Which preferences are relevant to a specific request and how they affect the final answer are dynamically determined based on the query, the profile, and the personalization philosophy adopted. For example, when Ann is searching for a theatre to go to, the system should also consider her preference for downtown theatres. Query personalization approaches have recently attracted interest in both the information retrieval and database research communities.

Outride is a personalized IR system (Pitkow et al., 2002) that exploits user profiles defined upon the ontology of the open directory project (ODP). For query augmentation, the similarity between the query term and the user model is computed to decide which, if any, of the stored keywords are relevant to and should be included in the query. For example, if an individual is interested in coffee information and then searches for java, the system may dynamically augment the query by adding the relevant term coffee to provide the user with results about Java coffee. If another user interested in programming searches for java, the system may augment the query by adding the term programming or language to provide this user with results about the Java programming language. To rank search results based upon the user profile, metadata from the pages are compared via vector methods against the profile.
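The augmentation step described above can be illustrated with a minimal sketch. It is not Outride's actual implementation: the term vectors, the cosine similarity measure, the threshold, and the names `TERM_VECTORS`, `cosine`, and `augment` are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical term vectors: each term is described by weights over a few
# made-up topics. A real system would derive these from an ontology such as ODP.
TERM_VECTORS = {
    "java": {"beverages": 0.5, "software": 0.5},
    "coffee": {"beverages": 1.0},
    "programming": {"software": 1.0},
    "tennis": {"sports": 1.0},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def augment(query_term, profile_terms, threshold=0.5):
    """Return the query term plus every profile keyword similar enough to it."""
    q = TERM_VECTORS.get(query_term, {})
    extra = [
        t for t in profile_terms
        if t != query_term and cosine(q, TERM_VECTORS.get(t, {})) >= threshold
    ]
    return [query_term] + extra


coffee_fan = augment("java", ["coffee", "tennis"])
programmer = augment("java", ["programming", "tennis"])
```

With these assumed vectors, the same query term is extended with coffee for one profile and with programming for the other, while the unrelated keyword tennis is left out in both cases.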

Another personalized IR system is described by Liu, Yu, and Meng (2002). A query is mapped to a set of categories stored in a user profile. Each category consists of a set of weighted terms; a term's weight reflects the significance of the term in representing the user’s interest in that category.
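As a rough illustration of such a profile, the sketch below maps a query to categories by summing the weights of the query terms in each category. The summation rule, the sample profile, and the function name `map_query_to_categories` are assumptions; the source does not state how Liu, Yu, and Meng compute the mapping.

```python
def map_query_to_categories(query, profile, top_n=2):
    """Rank profile categories by the total weight of the query terms they contain."""
    terms = query.lower().split()
    scores = {}
    for category, weights in profile.items():
        score = sum(weights.get(t, 0.0) for t in terms)
        if score > 0:
            scores[category] = score
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]


# Hypothetical user profile: category -> {term: weight}.
profile = {
    "cooking": {"apple": 0.7, "recipe": 0.9, "pie": 0.6},
    "computers": {"apple": 0.4, "laptop": 0.8},
    "travel": {"flight": 0.9},
}

pie_categories = map_query_to_categories("apple pie", profile)
laptop_categories = map_query_to_categories("apple laptop", profile)
```

The ambiguous term apple contributes to two categories, and the remaining query term decides which category is ranked first.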

Koutrika and Ioannidis (2004b) have implemented a first personalized database system and provide a query personalization framework that specifies which preferences from a profile should affect a given query. Based on this framework, the top K preferences are integrated into the query, producing a personalized query that returns results satisfying at least L of them. Preferences are stored as degrees of interest in atomic query elements. Consider a user who accesses a movies database looking for films released in 2004. The corresponding SQL query could look like this:

Select title from MOVIES where year = 2004

Assuming the user is interested in comedies, the system may execute a personalized query that could look like this:

Select title from MOVIES, GENRES where MOVIES.id = GENRES.id
and year = 2004 and genre = 'comedy'
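The two queries above can be tried against a small in-memory database. The following sketch uses Python's standard `sqlite3` module; the table layouts and sample rows are assumptions made only so the queries have something to run on, and an ORDER BY clause has been added to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MOVIES (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE GENRES (id INTEGER, genre TEXT);
    INSERT INTO MOVIES VALUES (1, 'Movie A', 2004), (2, 'Movie B', 2004), (3, 'Movie C', 2003);
    INSERT INTO GENRES VALUES (1, 'comedy'), (2, 'drama'), (3, 'comedy');
""")

# The query as issued by the user.
plain = "SELECT title FROM MOVIES WHERE year = 2004 ORDER BY title"

# The personalized query, with the comedy preference integrated.
personalized = """
    SELECT title FROM MOVIES, GENRES
    WHERE MOVIES.id = GENRES.id AND year = 2004 AND genre = 'comedy'
    ORDER BY title
"""

plain_titles = [row[0] for row in conn.execute(plain)]
personalized_titles = [row[0] for row in conn.execute(personalized)]
```

On this sample data the plain query returns both 2004 films, whereas the personalized query keeps only the 2004 comedy.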

The aforementioned efforts have shown that the benefits of personalized search can be significant, appreciably decreasing the time it takes people to find information. In the following section, I focus on personalization of database queries.

DATABASE QUERY PERSONALIZATION

A personalized database system keeps a repository of user profiles (Koutrika, 2003). Information in profiles is either inserted explicitly by the user or collected implicitly by monitoring user interaction with the system (profile creation). When a new query is issued, query personalization proceeds in three stages: (a) preference selection (preferences relevant to the query are extracted from the user profile); (b) preference integration (these preferences are integrated into the query producing one that will be executed); and (c) personalized query execution (the personalized query is executed and returns results ranked based on their interest). The general architecture of such a system is depicted in Figure 2. Towards personalized database systems, several challenging issues need to be addressed. Table 1 summarizes many of them.

Figure 2. Personalized database system

[Diagram not reproduced. Labeled elements: User input, User query, Results, Profile Creation, Preference Selection, Preference Integration, Personalized Query Execution, Query Personalization, User Profiles, Data.]

Table 1. A summary of query personalization issues

User Preference Modeling: Expressing user preferences in queries and storing user preferences in profiles

Preference Elicitation: Algorithms for the (semi-)automatic derivation of user preferences

Preference Query Languages: Languages that allow for the expression of user preferences in queries

Query Personalization Logic: Specifying how a query and a profile are combined in order to provide personalized answers

Preference Combination and Ranking: Ways in which preferences are combined. Ranking results based on the preferences satisfied

Query Personalization Algorithms: Algorithms for the implementation of query personalization based on a preference model and specific logic
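The three personalization stages can be sketched end to end as follows. This is a simplified illustration and not the algorithm of Koutrika and Ioannidis: it assumes every stored preference is relevant to the query, represents each preference as an SQL condition paired with a degree of interest, and ranks results by the number of preferences satisfied. The single-table schema, the sample rows, and the names `PROFILE`, `select_preferences`, `integrate`, and `execute` are hypothetical.

```python
import sqlite3

# Hypothetical profile: (SQL condition on a movie, degree of interest in [0, 1]).
# Conditions are trusted strings here; real input would need proper sanitizing.
PROFILE = [
    ("genre = 'comedy'", 0.9),
    ("director = 'W. Allen'", 0.8),
    ("duration < 120", 0.5),
    ("language = 'English'", 0.2),
]


def select_preferences(profile, k):
    """Stage (a): keep the K preferences with the highest degree of interest."""
    return sorted(profile, key=lambda p: -p[1])[:k]


def integrate(base_where, preferences, at_least):
    """Stage (b): build a query whose results satisfy at least `at_least` preferences."""
    # In SQLite a comparison evaluates to 0 or 1, so the sum counts matches.
    satisfied = " + ".join(f"({cond})" for cond, _ in preferences) or "0"
    return (
        f"SELECT title, {satisfied} AS satisfied FROM MOVIES "
        f"WHERE ({base_where}) AND ({satisfied}) >= {at_least} "
        "ORDER BY satisfied DESC, title"
    )


def execute(conn, sql):
    """Stage (c): run the personalized query and return the ranked rows."""
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MOVIES (title TEXT, year INTEGER, genre TEXT, "
    "director TEXT, duration INTEGER)"
)
conn.executemany(
    "INSERT INTO MOVIES VALUES (?, ?, ?, ?, ?)",
    [
        ("Movie A", 2004, "comedy", "W. Allen", 100),
        ("Movie B", 2004, "comedy", "Other", 130),
        ("Movie C", 2004, "drama", "W. Allen", 110),
        ("Movie D", 2003, "comedy", "W. Allen", 90),
    ],
)

top_k = select_preferences(PROFILE, k=3)
sql = integrate("year = 2004", top_k, at_least=2)
results = execute(conn, sql)
```

With K = 3 and L = 2, only the 2004 films that satisfy at least two of the three selected preferences are returned, and the film satisfying all three is ranked first.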

User Preference Modeling and Preference Query Languages

There is a plethora of preference types: conditional and unconditional, positive and negative, atomic and aggregate, and so forth. Therefore, an effective and efficient preference model has to combine expressivity and concision. Because information stored in database systems is structured, unlike the unstructured information stored in information retrieval systems, it lends itself to more expressive user models than the simple keyword-based ones used in the latter.

User preferences and profiles have recently attracted a broader interest in the database community. Database research approaches fall into two categories: those dealing with the expression of preferences at the query level and those dealing with the representation and storage of preferences in user profiles.

Database research approaches dealing with the expression of preferences at the query level may be qualitative or quantitative. In the qualitative approach, preferences between tuples in the answer to a query are specified directly, typically using binary preference relations. Two frameworks have been independently proposed, in which preference relations are defined using logical formulas (Chomicki, 2003) or special preference constructors (Kießling & Köstler, 2002). Preference relations are embedded into relational query languages through a relational operator that selects from its input the set of the most preferred tuples. This operator is variously called winnow (Chomicki, 2003) or BMO (Kießling & Köstler, 2002a). Kießling and Köstler (2002b) propose a structured query language, Preference-SQL, which extends SQL for the expression and handling of user preferences. In the quantitative approach, preferences are specified indirectly using scoring functions that associate a numeric score with every tuple of the query answer. A framework for using and combining quantitative preferences in queries has been proposed (Agrawal & Wimmers, 2000). Several efforts have focused on algorithms for efficiently answering top-K queries (Bruno, Chaudhuri, & Gravano, 2002; Chaudhuri & Gravano, 1999; Hristidis, Koudas, & Papakonstantinou, 2001). On the other hand, models for representing and storing user preferences in profiles have been recently presented (Koutrika & Ioannidis, 2004a, 2004b).
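The difference between the two approaches can be made concrete with a small sketch. The qualitative part implements a winnow-style operator as a plain filter over a list of tuples, given a binary preference relation; the quantitative part ranks the same tuples with a scoring function and keeps the top K. The sample data, the preference relation, the scoring weights, and all function names are assumptions for illustration, and the naive quadratic filter is not one of the optimized algorithms from the cited work.

```python
import heapq


def winnow(tuples, prefers):
    """Qualitative: keep the tuples to which no other input tuple is preferred."""
    return [t for t in tuples if not any(prefers(other, t) for other in tuples)]


def top_k(tuples, score, k):
    """Quantitative: keep the K tuples with the highest numeric score."""
    return heapq.nlargest(k, tuples, key=score)


movies = [
    {"title": "A", "genre": "comedy", "rating": 7.5},
    {"title": "B", "genre": "comedy", "rating": 8.1},
    {"title": "C", "genre": "drama", "rating": 6.9},
    {"title": "D", "genre": "drama", "rating": 6.0},
]


def prefers(a, b):
    """Hypothetical relation: within the same genre, a higher rating is preferred."""
    return a["genre"] == b["genre"] and a["rating"] > b["rating"]


def score(m):
    """Hypothetical scoring function mixing rating and a liking for comedies."""
    return 0.7 * m["rating"] / 10 + 0.3 * (m["genre"] == "comedy")


most_preferred = [m["title"] for m in winnow(movies, prefers)]
best_two = [m["title"] for m in top_k(movies, score, 2)]
```

The winnow-style filter returns the best-rated film of each genre because films of different genres are incomparable under this relation, whereas the scoring function imposes a total order in which both comedies outrank the dramas.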

None of the aforementioned proposals encompasses all possible preference notions. The difference in foci, as well as variations in terminology, makes the results obtained in one area difficult to use in another. This situation calls for specialized and collaborative research towards preference models that are appropriate for expression of preferences at the query level and the formulation of profiles in a unifying manner.

Preference Combination and Ranking

Different preferences may be combined. What is the user interest in a combination of preferences? How are results ranked based upon the user profile? These are issues that must be addressed by a personalized database system. Koutrika and Ioannidis (2004b) propose the use of ranking functions for estimating the degree of interest in results, based on preferences satisfied. Yet, much experimentation is required towards an appropriate and intuitive solution.
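One possible shape for such a ranking function is sketched below. The particular formula, which combines degrees of interest d_1, ..., d_n as 1 - (1 - d_1)...(1 - d_n) so that each additional satisfied preference increases the estimate, is an assumption chosen for illustration and is not claimed to be the function used by Koutrika and Ioannidis. The names `combined_interest` and `rank_results` and the sample degrees are likewise hypothetical.

```python
from math import prod


def combined_interest(degrees):
    """Estimate interest in a result from the degrees of the preferences it satisfies."""
    return 1.0 - prod(1.0 - d for d in degrees)


def rank_results(satisfied_by_result):
    """Order result identifiers by decreasing estimated degree of interest."""
    return sorted(
        satisfied_by_result,
        key=lambda r: (-combined_interest(satisfied_by_result[r]), r),
    )


# Hypothetical results, each with the degrees of interest of the preferences it satisfies.
satisfied = {
    "Movie A": [0.9, 0.8],
    "Movie B": [0.9],
    "Movie C": [0.5, 0.5],
}

ranking = rank_results(satisfied)
```

Under this assumed function a result that satisfies no preference scores 0, and two moderate preferences (0.5 and 0.5) combine to 0.75, which is still below a single strong preference of 0.9.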

Preference Elicitation

Users may manually create and update their profiles by inserting and editing information related to their preferences. Alternatively, the system may collect and store information in profiles by monitoring user interac-
