14. The notion of lexicon in Computational Lexicography

Contemporary linguistic theories place ever-greater reliance on the lexicon, because the lexicon may be viewed as the central repository of linguistic knowledge. For the computational linguist, the lexicon is the ‘bottleneck’ of natural language processing systems. Work on this bottleneck includes attempts to manipulate machine-readable versions of printed dictionaries and to transform them into computational lexicons.

In the early days the lexicon was equated merely with ‘a dictionary, a book teaching the signification of words’. Nowadays, the lexicon is generally understood as ‘the vocabulary of a language, especially in dictionary form, offering various types of linguistic information’ (D. Crystal); it is also called lexis.

  • George Grimes (1988) describes the lexicon as ‘simply the totality of all the information there is about words and word-like objects in a natural language, it registers items and their properties in contrast to the grammar, which registers combinations of items and their properties’.

  • Paul Bennett makes a distinction between a grammar (i.e. ‘a set of rules for the formation of meaningful and well-formed sentences’) and a lexicon (i.e. ‘a set of words and expressions whose use is governed by those rules’).

Grimes’s definition of the lexicon is interesting because it raises the question of whether it is possible to create a ‘theory-neutral’ lexicon. His definition also bears on the problem of what a lexicon should contain, since individual lexicons will have their own specifications, depending on the purpose for which they were built.

  • A more recent definition was suggested by I. Mel’čuk (1992). He views the lexicon as ‘a specific list of lexical units of a language, arranged in a specific way and supplied with specific information, the whole being designed for a specific purpose’.

  • Bloomfield’s definition: within the framework of American structural linguistics the lexicon was treated as peripheral.

  • Chomsky’s definition: the lexicon was conceptualized as an independent component of linguistic theory by Noam Chomsky. However, in his theory lexical facts were not only said to be of a different type from general facts; the lexicon was also still viewed as a ‘wastebin’.

Chomsky suggests a distinction between “Internalized language” (I-language, also called linguistic competence), i.e. mental knowledge of the language, assumed to exist in a homogeneous speaker-hearer community, and “Externalized language” (E-language, also called linguistic performance), i.e. everyday speech and writing (newspapers, televised speeches, dialogues, etc.).

The relation between the lexicons of the E- and I-language may be formulated in terms of an Associative Lexicon (AL), suggested by Makkai in 1980. The AL is an information retrieval system that represents, in visual and audible form, the knowledge native speakers possess about the lexis of their language. The human brain is the primary ‘information retrieval system’, activating our ability to associate lexemes with one another. Any artificial system we build must therefore try to do justice to human sociopsychological reality. The natural ALs we carry in our heads are dialectally and sociolinguistically limited.

The AL represents the cumulative knowledge of most available geographic and sociological dialects. It indicates that members of various speech communities are capable of learning from one another, either by memorization or by immigration.

The differences between a printed dictionary and the AL are as follows:

- conventional dictionaries tend to form natural semantic nets around concretely observable and abstract entities, while the AL aims at building associative groups of lexemes.

- conventional dictionaries traditionally rely on alphabetization, by which they try to present the totality of the available lexis in the form of a list, while the AL presents sets of lexemes according to their frequency of use.
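
The contrast between the two orderings can be illustrated with a small sketch. It is not Makkai's system: the `Lexeme` class, the word list, the usage counts and the association links are all invented for illustration, and the function names are hypothetical. The sketch only shows how the same entries might be listed alphabetically, listed by frequency of use, or gathered into an associative group.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    form: str
    frequency: int  # invented usage count, for illustration only
    associates: set = field(default_factory=set)  # invented association links


# Toy data: forms, counts and associations are assumptions of this sketch.
LEXEMES = [
    Lexeme("doctor", 120, {"nurse", "hospital"}),
    Lexeme("apple", 45, {"fruit"}),
    Lexeme("nurse", 80, {"doctor", "hospital"}),
    Lexeme("hospital", 95, {"doctor", "nurse"}),
    Lexeme("fruit", 60, {"apple"}),
]


def alphabetical_listing(lexemes):
    """Ordering of a conventional printed dictionary: one A-Z list."""
    return [lx.form for lx in sorted(lexemes, key=lambda lx: lx.form)]


def frequency_listing(lexemes):
    """Ordering by frequency of use: most frequent lexeme first."""
    return [lx.form for lx in sorted(lexemes, key=lambda lx: -lx.frequency)]


def associative_group(lexemes, start):
    """Collect every lexeme reachable from `start` via association links."""
    by_form = {lx.form: lx for lx in lexemes}
    group, pending = set(), [start]
    while pending:
        form = pending.pop()
        if form in group or form not in by_form:
            continue  # already visited, or not an entry in this toy lexicon
        group.add(form)
        pending.extend(by_form[form].associates)
    return group


print(alphabetical_listing(LEXEMES))
print(frequency_listing(LEXEMES))
print(sorted(associative_group(LEXEMES, "doctor")))
```

In the alphabetical list, "doctor" and "nurse" are separated by unrelated entries, whereas the associative group keeps them together with "hospital". This is one simple way to picture the difference described above.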