- Information Technology
- 5. Find synonyms of the following expressions among the words and word combinations of the previous exercises:
- 6. Use each of Exercise 3 words/expressions in the sentences from the text.
- Information Technology's Role Today
- Unit 2
- A. Comprehension
- B. Vocabulary
- Historical preamble
- Unit 3 Computer
- A. Comprehension
- 11) Circuit Implementation Exercises
- 3. Make a summary of the text using the words from Vocabulary Exercises. B. Vocabulary
- 4. Give English-Russian equivalents of the following words and expressions:
- 5. Find the word not belonging to the given synonymic group. Explain your choice.
- Harvard or von Neumann?
- Архитектура компьютера
- B. Vocabulary
- Unit 9 Operating System
- Unit 10 Data Conversion
- 6. Translate the words/expressions into English:
- 7. Interpret the following abbreviations:
- 8. Read the text. Give the title to it. Make an outline of the text and a one-sentence summary of each part.
- Конвертация данных
- Unit 11 Data Storage
- A. Comprehension
- Unit 12 Data Processing
- Processor
- Exercises
- Define the term 'data processing'.
- Explain the reference to data-processing systems as information systems, their difference.
- Answer these questions:
- 4. Summarize the text using the words from Vocabulary Exercises.
- 5. Give English-Russian equivalents of the following words and expressions:
- 6. Find the word belonging to the given synonymic group among the words and word combinations from the previous exercise:
- Data Validation
- Unit 13
- Information Retrieval
- A Typical IR System
- Exercises A. Comprehension
- B. Vocabulary
- Performance Measures
- History
- Information Overload
- Data Transmission
- Applications and History
- Protocols and Handshaking
- A. Comprehension
- B. Vocabulary
- C. Reading and Discussion
- Protocol
- Unit 15
- A. Comprehension
- B. Vocabulary
- Unit 16
- B. Vocabulary
- C. Reading and Discussion
- History
- Internet
- Visualization of the various routes through a portion of the Internet
- Internet Structure
- A. Comprehension
- C. Reading and Discussion
- Voice Telephony
- Internet Creation
- Web Design
- A. Comprehension
- Glossary
- Variable — переменная (величина)
B. Vocabulary
4. Give English-Russian equivalents of the following words and expressions:
entity; размер, степень; item; трихотомия (деление на три части, на три элемента); surrogate; совмещение, наложение; inference; детализировать, уточнять; trite; (логический) вывод, умозаключение; iterate; элемент (данных); extent; заменять, замещать; refine; объект, категория; peg; замена; trichotomy; повторять, говорить или делать что-то еще раз; substitute; банальный, избитый, неоригинальный; overlap; стержень.
5. Find the word belonging to the given synonymic group among the words and word combinations from the previous exercise:
size, amount, degree, level, scope;
specify, make more exact/precise/accurate, itemize, work out in detail;
unit, thing, object, matter;
repeat, review, follow;
coincidence, combining, matching, overlay, stacking;
replace with, exchange, use instead;
stale, banal, commonplace, unoriginal;
substitute, replacement, stand-in, deputy;
element, character, cell;
core, pivot, stem, bar;
conclusion, deduction, supposition, assumption, suggestion.
C. Reading and Discussion
6. Translate the words. Read the text and answer the questions: 1) What is needed for evaluating the performance of information retrieval systems? 2) How are documents represented for the efficiency of information retrieval? 3) What are the performance measures?
ill-posed; cutoff; immanent; fraction; recall; transcendent; fallout; set-theoretic; tuple; dimension; fuzzy; scalar value
Performance Measures
Many different measures for evaluating the performance of information retrieval systems have been proposed. The measures require a collection of documents and a query. All common measures described here assume a ground truth notion of relevancy: every document is known to be either relevant or non-relevant to a particular query. In practice, queries may be ill-posed and there may be different shades of relevancy.
Precision
Precision is the fraction of the documents retrieved that are relevant to the user's information need.
In binary classification, precision is analogous to positive predictive value. Precision takes all retrieved documents into account. It can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system. This measure is called precision at n or P@n.
Note that the meaning and usage of «precision» in the field of Information Retrieval differs from the definition of accuracy and precision within other branches of science and technology.
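The two definitions above can be sketched in a few lines of Python (an illustration only; the document IDs and the relevance judgments are invented, not taken from the text):

```python
# Precision and P@n over a ranked result list.
# "relevant" is the ground-truth set of relevant documents for the query.

def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for d in retrieved if d in relevant) / len(retrieved)

def precision_at_n(ranked, relevant, n):
    """P@n: precision computed over only the topmost n results."""
    return precision(ranked[:n], relevant)

ranked = ["d3", "d1", "d7", "d2", "d5"]  # system's ranked output
relevant = {"d1", "d2", "d9"}            # ground-truth relevant documents

print(precision(ranked, relevant))          # 2 of 5 retrieved are relevant: 0.4
print(precision_at_n(ranked, relevant, 2))  # 1 of the top 2 is relevant: 0.5
```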
Recall
Recall is the fraction of the documents that are relevant to the query that are successfully retrieved.
In binary classification, recall is called sensitivity. So it can be looked at as the probability that a relevant document is retrieved by the query.
It is trivial to achieve recall of 100% by returning all documents in response to any query. Therefore recall alone is not enough; one also needs to measure the number of non-relevant documents retrieved, for example by computing the precision.
Fall-Out
In binary classification, fall-out is closely related to specificity. More precisely: fall-out = 1 - specificity. It can be looked at as the probability that a non-relevant document is retrieved by the query.
It is trivial to achieve fall-out of 0% by returning zero documents in response to any query.
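Recall and fall-out are complementary: recall looks at the relevant documents, fall-out at the non-relevant ones. Both can be sketched together (a hypothetical example; the collection and the relevance judgments are invented):

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved (sensitivity)."""
    if not relevant:
        return 0.0
    return sum(1 for d in relevant if d in retrieved) / len(relevant)

def fallout(retrieved, relevant, collection):
    """Fraction of the non-relevant documents that were retrieved (1 - specificity)."""
    non_relevant = collection - relevant
    if not non_relevant:
        return 0.0
    return sum(1 for d in non_relevant if d in retrieved) / len(non_relevant)

collection = {"d1", "d2", "d3", "d4", "d5", "d6"}
relevant = {"d1", "d2", "d3"}
retrieved = {"d1", "d2", "d5"}

print(recall(retrieved, relevant))               # 2 of 3 relevant found: ~0.667
print(fallout(retrieved, relevant, collection))  # 1 of 3 non-relevant found: ~0.333

# Returning the whole collection trivially gives recall 1.0 - but fall-out 1.0 too:
print(recall(collection, relevant), fallout(collection, relevant, collection))
```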
Average Precision

Precision and recall are based on the whole list of documents returned by the system. Average precision emphasizes returning more relevant documents earlier.
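One common way to compute average precision (there are several variants; this sketch and its data are illustrative) is to average P@k over the ranks k at which relevant documents appear, so that relevant documents ranked earlier contribute more:

```python
def average_precision(ranked, relevant):
    """Mean of P@k taken at each rank k where a relevant document appears."""
    if not relevant:
        return 0.0
    hits, precisions = 0, []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)  # P@k at this relevant document
    return sum(precisions) / len(relevant)

# Same relevant set, two rankings that differ only in ordering:
relevant = {"d1", "d2"}
early = ["d1", "d2", "d8", "d9"]  # relevant documents first
late = ["d8", "d9", "d1", "d2"]   # relevant documents last

print(average_precision(early, relevant))  # (1/1 + 2/2) / 2 = 1.0
print(average_precision(late, relevant))   # (1/3 + 2/4) / 2 ~= 0.417
```

Both rankings have identical precision and recall over the whole list; only average precision distinguishes them.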
For information retrieval to be efficient, the documents are typically transformed into a suitable representation. There are several representations which can be illustrated by the relationship of some
common models. The models are categorized according to two dimensions: the mathematical basis and the properties of the model.
First Dimension: Mathematical Basis
Set-theoretic models represent documents as sets of words or phrases. Similarities are usually derived from set-theoretic operations on those sets. Common models are: standard Boolean model, extended Boolean model, fuzzy retrieval.
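The standard Boolean model can be shown with a toy sketch (the documents and query are invented for illustration): a document matches an AND query exactly when its term set contains every query term.

```python
# Each document is represented purely as a set of its terms.
docs = {
    "d1": {"information", "retrieval", "systems"},
    "d2": {"database", "systems"},
    "d3": {"information", "overload"},
}

def boolean_and(query_terms, docs):
    """Return the documents whose term sets contain every query term."""
    return {d for d, terms in docs.items() if query_terms <= terms}  # subset test

print(sorted(boolean_and({"information", "systems"}, docs)))  # ['d1']
```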
Algebraic models represent documents and queries usually as vectors, matrices or tuples. The similarity of the query vector and document vector is represented as a scalar value.
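The algebraic (vector space) idea can be sketched with raw term-frequency vectors and cosine similarity as the scalar value (a simplification; real systems typically weight terms, e.g. with tf-idf, and the sample texts are invented):

```python
import math
from collections import Counter

def to_vector(text):
    """Bag-of-words term-frequency vector for a document or query."""
    return Counter(text.lower().split())

def cosine_similarity(v1, v2):
    """Scalar similarity between two term vectors (0.0 .. 1.0)."""
    dot = sum(v1[t] * v2[t] for t in v1)  # missing terms count as 0
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

doc = to_vector("information retrieval ranks documents by relevance")
query = to_vector("information retrieval")
print(round(cosine_similarity(query, doc), 3))  # 0.577
```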
Probabilistic models treat the process of document retrieval as a probabilistic inference. Similarities are computed as probabilities that a document is relevant for a given query. Probabilistic theorems like Bayes' theorem are often used in these models.
Second Dimension: Properties of the Model
Models without term-interdependencies treat different terms/words as independent. This fact is usually represented in vector space models by the orthogonality assumption of term vectors or in probabilistic models by an independence assumption for term variables.
Models with immanent term interdependencies allow a representation of interdependencies between terms. However, the degree of the interdependency between two terms is defined by the model itself. It is usually directly or indirectly derived (e.g. by dimensional reduction) from the co-occurrence of those terms in the whole set of documents.
Models with transcendent term interdependencies allow a representation of interdependencies between terms, but they do not specify how the interdependency between two terms is defined; they rely on an external source (for example, a human or a sophisticated algorithm) for the degree of interdependency between two terms.
7. State whether the following statements are true or false. Correct the false ones.
1. Fall-out can be looked at as the probability that a non-relevant document is retrieved by the query and is called sensitivity.
2. Recall is the fraction of the documents retrieved that are relevant to the user's information need and is not enough alone, but one needs to measure the number of non-relevant documents also. It is closely related to specificity.
3. Precision is the fraction of the documents that are relevant to the query that are successfully retrieved, analogous to positive predictive value, that can also be evaluated at a given cut-off rank.
4. Set-theoretic models treat the process of document retrieval as a probabilistic inference. Similarities are usually derived from set-theoretic operations on those sets.
5. Algebraic models represent documents and queries usually as vectors, matrices or tuples. The similarity of the query vector and document vector is represented as a scalar value.
6. Probabilistic models represent documents as sets of words or phrases. Similarities are computed as probabilities that a document is relevant for a given query. Probabilistic theorems like Bayes' theorem are often used in these models.
7. Models with transcendent term interdependencies allow a representation of interdependencies between terms.
8. Models with immanent term interdependencies treat different terms/words as independent. However, the degree of the interdependency between two terms is defined by the model itself.
9. Models without term-interdependencies allow a representation of interdependencies between terms, but they do not specify how the interdependency between two terms is defined.
8. Translate the text without a dictionary.