Thesaurus: what is it? A thesaurus dictionary that is more than a dictionary

THESAURUSES. LINGUISTIC PRINCIPLES OF THESAURUS CONSTRUCTION

3.1. Thesaurus concept

Thesaurus (from the Greek θησαυρός - treasure, storehouse), or ideographic dictionary (from the Greek idea - concept, representation, idea, and graphō - to write, to describe), has several meanings in modern linguistics: 1) a special type of dictionary of general or special vocabulary that records the semantic relations between lexical units; 2) a dictionary for finding a word through its semantic connections with other words; 3) a particular way of organizing (arranging) words in a dictionary; 4) a way of organizing the vocabulary that makes it possible to “model the world” economically.

In its first, original meaning of “repository, treasure”, the term thesaurus was used by L.V. Shcherba in his article “An Attempt at a General Theory of Lexicography”, where it forms his third opposition: thesaurus versus ordinary (explanatory or translation) dictionary. Shcherba writes: “When people say thesaurus today, they most often mean the ‘Thesaurus Linguae Latinae’, an enterprise of five German academies that began back in 1900 and has so far been completed, with omissions, only up to the letter M. The distinctive feature of this type of dictionary is that it contains all the words found in the given language at least once, and that under each word absolutely all quotations from the texts available in that language are given. The basis of this opposition, thesaurus versus ordinary (explanatory or translation) dictionary, is the opposition of ‘linguistic material’ and ‘linguistic system’, concepts that I tried to substantiate in my article ‘On the Threefold Aspect of Linguistic Phenomena and on Experiment in Linguistics’.”

The second meaning of the term is associated with the widely known thesaurus by P.M. Roget (Roget's Thesaurus of English Words and Phrases, 1852) and with the dictionaries that continue its tradition, such as the ideographic dictionary of O.S. Baranov.

In this interpretation, the term thesaurus denotes a particular way of organizing and arranging the vocabulary in a dictionary (see the third meaning of the term).

The fourth meaning of the term thesaurus is associated with the universal recognition of this way of organizing the vocabulary, which makes it possible to “model the world” economically. From this point of view, a thesaurus dictionary is “a systematic ordering of the vocabulary of any scientific or technical field and, in the most general case, of the general literary vocabulary and even the entire vocabulary of a given language.”

According to Yu.N. Karaulov, a general-language thesaurus, by fixing in the structure and interrelations of its headings, sections, zones and areas the wide possibilities for the non-verbal connection of ideas, makes it possible to take human values into account.

A.N. Baranov and D.O. Dobrovolsky, in their preface “From the Editors” to the “Dictionary-Thesaurus of Modern Russian Idioms”, define the thesaurus as a special type of dictionary that differs from others (in particular, explanatory and bilingual dictionaries) in the way it organizes linguistic material: in a thesaurus, language units are not arranged alphabetically, as in an ordinary dictionary, but grouped by meaning.

L.P. Krysin calls the thesaurus (ideographic dictionary) a special kind of explanatory dictionary, a dictionary “in reverse”: “If in an explanatory dictionary,” he writes, “the ‘entry’ to a dictionary article is a word and the content of the article is the interpretation of its meaning, then in an ideographic dictionary the ‘entry’ is the meaning, the idea (hence the name of this type of dictionary, ideographic), and the content of the article is a list of words expressing that meaning. And if an explanatory dictionary is an indispensable tool for understanding a text, an ideographic dictionary can be used in generating a text: very often a person wants to express a certain thought but cannot find suitable words for it; an ideographic dictionary facilitates this search.”

There are two main types of thesauri:

linguistic thesaurus - a dictionary containing a list of natural-language words selected through content analysis of texts and systematized according to an accepted classification system;

statistical thesaurus - an information retrieval dictionary containing a list of words selected as a result of statistical analysis of texts on a specific topic and grouped into dictionary entries based on the frequency of co-occurrence of these words in the same texts.
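The grouping principle of a statistical thesaurus can be illustrated with a short Python sketch. It assumes that the texts are already tokenized into lists of words; for every pair of words it counts the number of texts in which both occur, and words whose co-occurrence count reaches a threshold are placed in one dictionary entry. The function names, the sample texts and the min_count threshold are hypothetical, and real statistical thesauri usually rely on more refined association measures than raw counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(texts: list[list[str]]) -> Counter:
    """Count, for each unordered pair of words, the number of texts containing both."""
    counts: Counter = Counter()
    for words in texts:
        # set() makes a pair count at most once per text, however often the words repeat
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


def build_entries(texts: list[list[str]], min_count: int = 2) -> dict[str, list[str]]:
    """Build entries: each headword lists the words it co-occurs with in at least
    min_count texts, most frequent first, ties broken alphabetically."""
    related: dict[str, Counter] = {}
    for (a, b), n in cooccurrence_counts(texts).items():
        if n >= min_count:
            related.setdefault(a, Counter())[b] = n
            related.setdefault(b, Counter())[a] = n
    return {
        head: [w for w, _ in sorted(related[head].items(), key=lambda kv: (-kv[1], kv[0]))]
        for head in sorted(related)
    }


texts = [
    ["router", "network", "packet"],
    ["network", "packet", "protocol"],
    ["router", "network", "protocol"],
    ["printer", "paper"],
]
print(build_entries(texts))
# {'network': ['packet', 'protocol', 'router'], 'packet': ['network'],
#  'protocol': ['network'], 'router': ['network']}
```

With the default threshold of 2, the words printer and paper, which co-occur in only one text, do not form an entry.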

Information retrieval thesauri (IRTs) facilitate the search for information during its automatic processing. An IRT reveals the semantic relationships between lexical units as fully as possible. As stated in the GOST standard on IRTs, “a monolingual information retrieval thesaurus is a controlled and changing dictionary of lexical units, based on the vocabulary of one natural language, displaying semantic relationships between lexical units and intended for processing and retrieving information.”

The basic unit of an IRT is the descriptor (descriptor term). The alphabetical lexical-semantic part of an IRT is a set of descriptor entries.

Descriptive dictionaries aim to describe the vocabulary of a certain field completely and to record all its uses; they register every available relevant case. A typical example of a descriptive dictionary is the “Explanatory Dictionary of the Living Great Russian Language” by V.I. Dahl (the first edition, in four volumes, was published in 1863-1866). Its author's goal was not to standardize the language but to describe the full diversity of Great Russian speech, including its dialectal and vernacular forms.

Each descriptor entry begins with the descriptor itself; below it, as prescribed by the GOST standard, are given the synonyms of this descriptor and the other lexical units linked to it by genus-species or associative relations.
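The structure of a descriptor entry can be modeled as a small data structure. The Python sketch below is illustrative only: the field names (synonyms, broader, narrower, related), the class names and the sample entries about computer networks are assumptions, not the layout prescribed by the GOST standard. It shows two things a descriptor entry makes possible: mapping a synonym to its descriptor, and widening a query through genus-species links.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DescriptorEntry:
    """One descriptor entry of a monolingual IRT (field names are illustrative)."""
    descriptor: str
    synonyms: list[str] = field(default_factory=list)  # non-descriptors referred to this descriptor
    broader: list[str] = field(default_factory=list)   # genus (higher-level) descriptors
    narrower: list[str] = field(default_factory=list)  # species (lower-level) descriptors
    related: list[str] = field(default_factory=list)   # associatively related descriptors


class DescriptorThesaurus:
    """Hypothetical container that indexes entries by descriptor and by synonym."""

    def __init__(self, entries: list[DescriptorEntry]) -> None:
        self.entries = {e.descriptor: e for e in entries}
        # Every synonym points to its descriptor, which keeps the vocabulary controlled.
        self.preferred = {s: e.descriptor for e in entries for s in e.synonyms}

    def normalize(self, term: str) -> Optional[str]:
        """Return the descriptor for a term, or None if the term is unknown."""
        if term in self.entries:
            return term
        return self.preferred.get(term)

    def expand(self, term: str) -> set[str]:
        """Return the descriptor together with its narrower (species) terms."""
        descriptor = self.normalize(term)
        if descriptor is None:
            return set()
        return {descriptor, *self.entries[descriptor].narrower}


thesaurus = DescriptorThesaurus([
    DescriptorEntry("computer network", synonyms=["network"],
                    narrower=["local area network", "wide area network"]),
    DescriptorEntry("local area network", synonyms=["LAN"],
                    broader=["computer network"], related=["network topology"]),
])
print(thesaurus.normalize("LAN"))  # local area network
print(sorted(thesaurus.expand("network")))
# ['computer network', 'local area network', 'wide area network']
```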

Thus, thesauri, especially in electronic format, are one of the most effective tools for describing individual subject areas.

In its pure form, a thesaurus is rare. In real thesauri the original idea is either simplified or supplemented with extraneous information that the user may nevertheless need. The best known today include the “Russian Semantic Dictionary” by Yu.N. Karaulov, the “Dictionary of identical names” by N.Yu. Shvedova and the “Thematic Dictionary of the Russian Language” by L.G. Sayakhova et al.

Summary. L.V. Shcherba used the term thesaurus for a dictionary that records, as far as possible, all the contexts in which a given word occurs. A characteristic feature of such thesauri is that they list all the words that appear in a given language at least once, and under each word all quotations from the texts available in that language are given. In Shcherba's terms, the content of a thesaurus is language material, whereas the content of an ordinary dictionary is the language system.



This characterization is complemented by cross-references of various kinds: most often paradigmatic (synonymic or antonymic) ones, which indicate the commonality or opposition of meanings, and, in addition, various associative connections (i.e. syntagmatic connections).

Thus, the task of a thesaurus (ideographic dictionary) is to give an idea of the semantic organization of a certain cross-section of linguistic material, showing the main semantic fields, their internal structure and external connections. A thesaurus is a clear demonstration of the systemic nature of a language, allowing one to see many types of relationships connecting individual linguistic units and groups of units.

3.2. The history of representing conceptual knowledge about the world in the form of a thesaurus

The need to arrange words according to similarity, contiguity, and analogy of their meanings has been felt throughout the observable history of human thought.

To trace the origins of the idea of representing conceptual knowledge about the world in the form of a thesaurus, it helps to turn to the history of compiling thesauri (ideographic dictionaries).

Thus, at the dawn of civilization, when people could express their thoughts in writing only with ideograms and symbols, the only possible dictionary was probably one in which words were arranged in thematic groups. A lexicographer of that time would have found it hard to use any criterion for classifying words other than the relationships that exist in reality itself.

Unfortunately, we have no evidence as to whether the peoples who used ideographic writing actually had such dictionaries. Among the most ancient attempts at ideographic classification known to us is the Attikai Lexeis of Aristophanes of Byzantium (died 180 BC), the Greek grammarian and head of the Library of Alexandria.

In the 2nd century AD the major work “Onomasticon” appeared, compiled on Greek material by the lexicographer and sophist Julius Pollux (Greek name Polydeuces), a native of the Egyptian city of Naucratis. Pollux wrote several works, but only the “Onomasticon” has come down to us (Pollux Yu. Onomasticon. M., 1956).


The Onomasticon consists of 10 books. The books are essentially separate treatises, each containing the most important words on a particular topic: the first book deals with gods and kings; the second with people, their life and physical constitution; the third with kinship and civil relations, etc. The words included in the dictionary are accompanied by brief explanations. The dictionary was first printed in Venice in 1502.

Between the 2nd and 3rd centuries AD the remarkable Sanskrit dictionary “Amarakosha” (Amarakosha. Paris, 1839) was compiled. Its author was the ancient Indian poet, grammarian and lexicographer Amarasimha, called “one of the nine pearls that adorn the throne of Vikramaditya.” The name Amarakosha means “the treasury of Amara.” The dictionary contains 10 thousand words. To make the explanations of word meanings easier to memorize, the entries are written in verse. All the material is divided into 3 books; each book includes several chapters, and a chapter is in turn divided into sections where necessary. The first book is devoted to the sky, the gods and everything directly related to them. The second book contains words related to the earth, settlements, plants, animals and humans (man is considered first as a living being and then as a social being: the entire caste structure of the author's society appears before our eyes, with priests, as God's trustees, at the very top, below them warriors and kings, lower still landowners, and at the very bottom artisans, jugglers, servants, etc.). The third book is strictly linguistic, as is clear from the titles of its six chapters.

The dictionary became known to European scholars only at the end of the 18th century, when its first part was published in Rome in 1798. It was published in full, with an English translation, in 1808 by the English Sanskrit scholar H.T. Colebrooke. In 1839 a French translation by A. Loiseleur-Deslongchamps appeared. Further development of the idea of semantic classification of vocabulary is associated with the problem of a so-called world language.

Summary. This, in the most general terms, is the first stage in the development of the tradition of ideographic classification of vocabulary, a stage that can be called the prehistory of ideographic dictionaries. We now turn to modern classificatory thesaurus dictionaries.

It is easy to see how different the works described are from alphabetical dictionaries. Whereas in alphabetical dictionaries the order of presentation is governed by a conventional and entirely neutral instrument, the alphabet, in an ideographic dictionary the lexicographer's own worldview becomes decisive.

3.3. Principles of classification of dictionaries-thesauruses

As shown above, the problem of classifying thesauri is not new; for several decades it has attracted the attention of a number of Russian and foreign linguists (C. Marello, V.V. Morkovkin, L.P. Stupin, V.V. Dubichinsky, etc.). Research in this area has produced alternative classifications of these lexicographic works. One of the most recent classifications is based on the following criteria: 1) the type of semantic connections between vocabulary units; 2) the size of the vocabulary; 3) the degree of generalization of the vocabulary; 4) the treatment of the meaning of lexemes; 5) the grammatical and stylistic labeling of lexemes; 6) the demonstration of how lexemes function; 7) the number of languages represented; 8) the type of semiotic means used to convey the meaning of lexemes. This classification builds on earlier classifications by O.M. Karpova and I. Burkhanov (Burchanov I. On the Ideographic Description of Stylistically and Pragmatically Relevant Aspects of Lexical Meanings. London, 1996); the terminology used in it was introduced into the lexicographic apparatus by V.V. Morkovkin, Yu.N. Karaulov and C. Marello. The classification criteria were formulated by O.M. Karpova. C. Marello, in turn, distinguishes three types of thesauri:

cumulative, which are groupings of words without defining their meanings;

definitive, interpreting each lexical unit of a group of words;

bi- and multilingual thesauri for travelers (Marello C. The Thesaurus // W.D.D. 1990. V. 2. P. 1083).

Cumulative thesauri not only make it possible to find a clearer, more accurate or stylistically more appropriate word within a given semantic field, but also serve as the basis for thematic computer databanks.

Definitive thesauri can include, along with definitions of meaning, etymological information and quotations from literary works, which reveals the encyclopedic orientation of this type of thesaurus. In addition, dictionaries of this type introduce the user to the relevant system of concepts, explain the essence, similarities and differences of concepts and their paradigmatic and syntagmatic connections, and sometimes provide information about the pronunciation, grammatical, word-formation and other properties of the lexical units denoting these concepts.

Bilingual and multilingual thesauri for travelers are usually organized by thematic sections (numbers, food, transport, hotels, etc.), with translation equivalents in two or more languages.

To display the types of existing thesaurus dictionaries as completely as possible, a multi-level classification is created. Firstly, according to the type of semantic connections between vocabulary units, thesauri are divided into three large classes:

1. Associative thesaurus (terminology by Yu.N. Karaulov).

2. Analogical thesaurus (terminology by V.V. Morkovkin).

3. Ideographic (ideological) thesaurus (terminology by L.V. Shcherba and V.V. Morkovkin).

These three types of thesauri reflect, respectively, the following types of semantic connections between lexemes:

1. Semantic-syntactic connections, on the basis of which words are combined into groups or pairs whose occurrence and existence are determined by a double connection, semantic and syntactic. Such connections are established mainly between nouns and the verbs and adjectives that perform a predicative function in a sentence, for example:

a) between an action and the organ (instrument) with which it is performed: grab - hand, see - eye, sail - boat, etc.;

b) between verbs of action that require one particular subject and that subject: bark - dog, neigh - horse, etc.;

c) between verbs and the specific object they require: chop - wood, eat - food, etc.

Hence, an associative thesaurus is a thesaurus dictionary that organizes lexical units on the basis of the semantic-syntactic connections between them and arranges the groups by the written form (i.e. alphabetically) of their central words.
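A minimal Python sketch of this arrangement, assuming that the semantic-syntactic pairs have already been extracted and that the noun is taken as the central word (both are assumptions for illustration). The pairs are those from the examples above, plus growl - dog, which is added only so that one group has more than one member.

```python
from collections import defaultdict


def associative_thesaurus(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (verb, noun) pairs under the noun as the central word."""
    groups = defaultdict(set)
    for verb, noun in pairs:
        groups[noun].add(verb)
    # Sorting the central words alphabetically mirrors arrangement by their written form.
    return {center: sorted(groups[center]) for center in sorted(groups)}


pairs = [
    ("grab", "hand"), ("see", "eye"), ("sail", "boat"),     # action - organ/instrument
    ("bark", "dog"), ("neigh", "horse"), ("growl", "dog"),  # action - subject
    ("chop", "wood"), ("eat", "food"),                      # action - object
]
print(associative_thesaurus(pairs))
# {'boat': ['sail'], 'dog': ['bark', 'growl'], 'eye': ['see'], 'food': ['eat'],
#  'hand': ['grab'], 'horse': ['neigh'], 'wood': ['chop']}
```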

2. Lexico-semantic connections. Here words are grouped by their principal property, lexical meaning. Lexico-grammatical connections, through which individual meanings of words are realized, are also taken into account.

Thus, an analogical thesaurus is a lexicographic reference work whose main macrostructural unit is the lexical-semantic group; the groups are arranged in alphabetical order of their semantic dominants.
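Looking up analogues in such a dictionary amounts to finding a word's group through its dominant. A Python sketch under stated assumptions: the two English verb groups and their dominants are invented for illustration, the function names are hypothetical, and a real analogical thesaurus would also record lexico-grammatical information.

```python
def build_index(groups: dict[str, list[str]]) -> dict[str, str]:
    """Map every word, including each dominant, to the dominant of its group."""
    index = {}
    for dominant, members in groups.items():
        index[dominant] = dominant
        for word in members:
            index[word] = dominant
    return index


def analogues(word: str, groups: dict[str, list[str]], index: dict[str, str]) -> list[str]:
    """Return the other words of the lexical-semantic group to which word belongs."""
    dominant = index.get(word)
    if dominant is None:
        return []
    return [w for w in [dominant, *groups[dominant]] if w != word]


groups = {
    "walk": ["stroll", "stride", "wander"],
    "look": ["glance", "stare", "gaze", "peer"],
}
index = build_index(groups)
for dominant in sorted(groups):  # macrostructure: groups in alphabetical order of dominants
    print(dominant, groups[dominant])
print(analogues("stare", groups, index))  # ['look', 'glance', 'gaze', 'peer']
```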

3. Subject (thematic) connections, where words are combined into one group on the basis of the similarity or common function of the objects and processes they denote: household items, body parts, types of clothing, buildings, etc.

Thus, an ideographic thesaurus is a lexicographic work that represents lexical units as part of subject (thematic) groups and organizes them into a hierarchical structure designed to represent conceptualized knowledge about the world.

Within the framework of the same criterion, we further subdivide the types. Thus, the ideographic thesaurus is represented by the following 4 types:


The ideographic thesaurus proper.

Thematic dictionary.

Systematic dictionary.

Thematic-systematic dictionary.


The ideographic thesaurus proper is a special type of ideographic dictionary whose macrostructure is organized according to an a priori synoptic scheme superimposed on the vocabulary of the language. Unlike other types of ideographic dictionary, it is characterized by a logical, strictly ordered classification structure based on scientific taxonomy, even when general vocabulary is being described (New Webster's Thesaurus. Landoll, 1991).

A thematic dictionary is a special type of ideographic thesaurus whose main macrostructural unit is the thematic group: lexemes united on the basis of the classification of their denotata (referents) and considered in terms of their relevance to a specific topic.

A systematic dictionary is a special type of ideographic thesaurus whose classification structure is intended to represent the actual semantic relationships between the lexical units of a language. At its core, this structure is a lexico-grammatical classification of the vocabulary, in other words its paradigmatic structure, described in terms of subordination and coordination.

A thematic-systematic dictionary is a special type of ideographic dictionary that combines a thematic and a systematic dictionary.

Summary. The classification of linguistic thesauri considered above includes the following types of dictionaries: the analogical thesaurus (terminology by V.V. Morkovkin), the ideographic (ideological) thesaurus (terminology by L.V. Shcherba and V.V. Morkovkin) and the associative thesaurus (terminology by Yu.N. Karaulov). Popular thesauri and their features are presented next.

3.4. Popular thesauri and their features

The most famous thesaurus dictionary, to which the term itself owes its existence, was created on English material: the constantly reprinted Roget's Thesaurus of English Words and Phrases (1852) by P.M. Roget.

It is important to note that the author of the Thesaurus of English Words and Phrases made full use of the experience available by that time. “The principle that guided me in classifying words,” writes P.M. Roget, “is the same as that used to classify individuals in the various areas of natural history. Therefore, the sections I have distinguished correspond to the natural families of botany and zoology, and the series of words are bound by the same relationships that unite the natural series of plants and animals.”

P.M. Roget believed that a convincing classification of words by meaning is impossible until the objects of reality named by these words have been properly studied and ordered. He therefore begins his work by dividing the conceptual field of the English language into four large classes: abstract relations, space, matter, and spirit (mind, will, feelings). These classes are further divided into genera, which in turn are divided into species.

Scholars point to the following shortcomings of P.M. Roget's ideographic dictionary: 1) a not entirely convincing nomenclature of the main conceptual classes; 2) abstract logic prevails over the natural connections of words; 3) relative inconvenience of use (this shortcoming has been largely corrected in subsequent editions).

Modern Russian lexicography has several dictionaries that should be classified as thesaurus (ideographic) dictionaries. Among them are the “Russian Semantic Dictionary” created under the direction of Yu.N. Karaulov, the “Russian Semantic Dictionary” edited by N.Yu. Shvedova, the “Thematic Dictionary of the Russian Language” by L.G. Sayakhova, D.M. Khasanova and V.V. Morkovkin, the “Dictionary of Lexical-Semantic Groups of Russian Verbs” edited by E.V. Kuznetsova, the “Ideographic Dictionary of the Russian Language” by O.S. Baranov, “The Conceptosphere of the Inner World of Man in the Russian Language” by V.I. Ubiyko, and the comprehensive educational dictionary “The Lexical Basis of the Russian Language” edited by V.V. Morkovkin.

Let's get to know some of them.

The “Dictionary-Thesaurus of Modern Russian Idioms” edited by A.N. Baranov and D.O. Dobrovolsky consists of four main parts: 1) the Synopsis; 2) the Legend; 3) the Main Body of the Dictionary-Thesaurus; 4) the indexes. The Synopsis gives a general idea of the structure of the Main Body: it lists all taxa with their subtaxa and the corresponding paradigmatic references. The Main Body is a collection of dictionary entries grouped into taxa and subtaxa according to the meaning of the idioms they describe. Each entry contains an idiom and examples of its use in modern Russian. The Synopsis, Legend and indexes are service parts of the Dictionary-Thesaurus that allow the user to work quickly and efficiently. The Legend is used when examples of idiom use are not needed, since it reproduces all the information except the examples; in effect, it is the word list of the Dictionary. The units of this word list are lemmas. A lemma here represents the idiom in its original (dictionary) form and includes, as far as possible, all its significant variants. For example, the idiom stand still belongs to the lemma mark time, stand still, skid in place.
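The relations between the Main Body, the Synopsis, the Legend and lemmas can be sketched in Python. This is an illustrative model only: the class and function names and the taxon labels are hypothetical, the usage example is a placeholder, and only the lemma itself is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IdiomEntry:
    """One entry of the Main Body (field names are illustrative)."""
    taxon: str
    subtaxon: str
    lemma: list[str]  # dictionary form of the idiom followed by its significant variants
    examples: list[str] = field(default_factory=list)


def synopsis(entries: list[IdiomEntry]) -> dict[str, list[str]]:
    """List every taxon with its subtaxa, in order of first appearance."""
    result: dict[str, list[str]] = {}
    for e in entries:
        subtaxa = result.setdefault(e.taxon, [])
        if e.subtaxon not in subtaxa:
            subtaxa.append(e.subtaxon)
    return result


def legend(entries: list[IdiomEntry]) -> list[tuple[str, str, list[str]]]:
    """Reproduce every entry without its usage examples."""
    return [(e.taxon, e.subtaxon, e.lemma) for e in entries]


def find_lemma(entries: list[IdiomEntry], idiom: str) -> Optional[IdiomEntry]:
    """Find the entry whose lemma includes the idiom or one of its variants."""
    return next((e for e in entries if idiom in e.lemma), None)


entries = [
    IdiomEntry("Taxon A", "Subtaxon A.1",  # hypothetical taxon labels
               ["mark time", "stand still", "skid in place"],
               examples=["<usage example>"]),  # placeholder, not a real citation
]
print(find_lemma(entries, "stand still").lemma[0])  # mark time
print(legend(entries))
```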

The dictionary contains two indexes. At the end of the book there is an article, “Theoretical Concept of the Dictionary-Thesaurus of Modern Russian Idiomatics”, which analyzes the scientific features of the project in detail.

The “Russian Semantic Dictionary” created under the direction of Yu.N. Karaulov includes 10 thousand Russian words divided into 1600 conceptual groups. The groups are identified on the basis of recurrent elements in the definitions of words in explanatory dictionaries, for example “action”, “property”, “tool”, etc.
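The idea of forming groups from recurrent elements of definitions can be sketched as follows. This is a simplification under stated assumptions: the four toy English definitions are invented, a key element is matched as a whole word anywhere in a definition, and the function name is hypothetical; the actual dictionary relied on lexicographic analysis rather than on such string matching. Only the element names action, property and tool come from the text.

```python
import re
from collections import defaultdict


def group_by_definition_elements(definitions: dict[str, str],
                                 elements: list[str]) -> dict[str, list[str]]:
    """Put a word into the group of every key element that occurs in its definition."""
    groups = defaultdict(list)
    for word, definition in sorted(definitions.items()):
        tokens = set(re.findall(r"[a-z]+", definition.lower()))
        for element in elements:
            if element in tokens:
                groups[element].append(word)
    return dict(groups)


definitions = {
    "hammer": "a tool for driving nails",
    "saw": "a tool with a toothed blade for cutting",
    "running": "the action of moving quickly on foot",
    "redness": "the property of being red",
}
print(group_by_definition_elements(definitions, ["action", "property", "tool"]))
# {'tool': ['hammer', 'saw'], 'property': ['redness'], 'action': ['running']}
```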

The “Russian Semantic Dictionary” created under the direction of academician N.Yu. Shvedova is based on somewhat different principles, characteristic of both ideographic and explanatory dictionaries. First, all the words of the language are divided into four classes: 1) pointing words (pronouns), 2) naming words (notional words), 3) connecting words (conjunctions, prepositions, linking verbs), 4) qualifying words (modal words, particles, interjections). Second, within each class the words are distributed by parts of speech. Third, within each part of speech, sets and subsets are identified on the basis of thematic proximity or, conversely, opposition of word meanings.

DUDEN is a book with pictures (drawings) on various subjects on the left-hand pages, with every part numbered down to the smallest detail. On the right-hand pages this numbered list is accompanied by names (sometimes in two languages). For example, railway equipment, stations and tracks are drawn over a whole page, and on the right are the names of the switches, semaphores, rail spikes, etc.

The “Thematic Dictionary of the Russian Language” by L.G. Sayakhova, D.M. Khasanova and V.V. Morkovkin contains 25 thousand lexical units grouped into three large classes, “Man”, “Society” and “Nature”, which branch step by step into smaller subclasses. For example, the class “Man” includes the subclasses “Human body and organism”, “Human life”, “Appearance of a person”, “Emotional appearance of a person”, etc. Each subclass is in turn divided into more specific ones: “Emotional world of a person” - “Mental properties of a person” - “Temperament”, “Character” - “Common features of character”, etc. The meaning and use of the words in each class are illustrated by the most common phrases. For example, the word “laughter”, which belongs to the subgroup “Expression of feelings and emotions” of the class “Man”, is accompanied by such combinations as cheerful laughter, joyful laughter, a child's laughter, burst into laughter, etc.
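The stepwise branching of classes can be represented as a tree. The Python sketch below uses some of the headings named in the text; the root node, the class and function names, and the placement of the subgroup containing laughter directly under “Man” are assumptions, since the text does not specify the intermediate levels. find_path returns the chain of headings that leads to a word, and outline prints the hierarchy as a table of contents.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Heading:
    """A class or subclass of a thematic dictionary."""
    name: str
    children: list["Heading"] = field(default_factory=list)
    words: dict[str, list[str]] = field(default_factory=dict)  # word -> typical collocations


def find_path(node: Heading, word: str) -> Optional[list[str]]:
    """Depth-first search for the chain of headings under which a word is listed."""
    if word in node.words:
        return [node.name]
    for child in node.children:
        path = find_path(child, word)
        if path is not None:
            return [node.name, *path]
    return None


def outline(node: Heading, depth: int = 0) -> list[str]:
    """Indented table of contents of the hierarchy."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


character = Heading("Character", [Heading("Common features of character")])
mental = Heading("Mental properties of a person", [Heading("Temperament"), character])
man = Heading("Man", [
    Heading("Human body and organism"),
    Heading("Human life"),
    Heading("Emotional world of a person", [mental]),
    # Exact level not given in the text; placed directly under "Man" for simplicity.
    Heading("Expression of feelings and emotions", words={
        "laughter": ["cheerful laughter", "joyful laughter", "child's laughter", "burst into laughter"],
    }),
])
root = Heading("Thematic dictionary", [man, Heading("Society"), Heading("Nature")])

print(find_path(root, "laughter"))
print("\n".join(outline(root)))
```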

Summary. Thesauri, especially in electronic form, are one of the most effective tools for describing individual subject areas.

The term thesaurus has long been widely used in linguistics to denote a special type of dictionary that reflects, to one degree or another, the “picture of the world” or “linguistic model of the world” (in Yu.N. Karaulov's terms). The thesaurus as a “treasury” has broadened its semantic scope and acquired a new meaning: the term came to denote a dictionary that not only absorbs all the lexical riches of a language but also organizes them in a certain logical and systematic way. In a thesaurus dictionary, words are combined into groups on the basis of their ability to convey a certain concept.

The thesaurus dictionary has always been considered in linguistics as a kind of universal system that ensures the storage of collective (for a particular society) knowledge about the world in verbal form. Unlike other dictionaries, in a thesaurus-dictionary this knowledge is stored in structured form, reflecting our ideas about the “structure of the world.”

The best-known and most popular thesauri at present are the English Roget's Thesaurus, the “Ideographic Dictionary of the Russian Language” by O.S. Baranov, the “Russian Semantic Dictionary” by Yu.N. Karaulov, the “Russian Semantic Dictionary” of academician N.Yu. Shvedova, DUDEN, and the “Thematic Dictionary of the Russian Language” by L.G. Sayakhova, D.M. Khasanova and V.V. Morkovkin.

The first stage in creating the thesaurus was collecting information about the structure and types of thesauri and the programs that support them. The second stage was choosing a programming language and a scheme for building the future thesaurus. The third stage was finding information to fill it; for this I used the teaching and methodological complex “Computer Networks”.

Here are a couple of examples of thesauri (see Figure 1.1 and Figure 1.2):

Figure 1.1 - Information retrieval system “Thesaurus.com”

Figure 1.2 - Dictionary of gender terms

After collecting the necessary information, I began creating the thesaurus. HTML was chosen as the language for building it. HyperText Markup Language (HTML) is, strictly speaking, a markup language rather than a programming language, and many people have long regarded it as more than just a language: the very concept of HTML covers methods for designing hypertext documents, page design, hypertext editors, browsers and much more. A user who has mastered it can do serious things with simple means and, above all, quickly, which is highly valued today!

In HTML you can create your own multimedia products and distribute them on any media. Such products, made as sets of HTML pages, require no specialized software, since everything needed to work with them (a Web browser) is part of the standard software of most personal computers.

The code of a Web page is usually typed in an ordinary text editor, but other tools can also be used, for example the Adobe Dreamweaver CS3 editor, and pages can be generated or extended with programming languages such as JavaScript, Pascal, C, C++, BASIC and Prolog.

To begin with, the thesaurus will consist of three frames: a title frame, a links frame, and a content frame, as shown in Figure 1.3.

Figure 1.3 - Thesaurus diagram

The following HTML tags and attributes were used to create the sketch of the thesaurus:

<title>text</title> - the site title;

<frameset rows="120,*"> - two horizontal frames, 120 px high and the remaining space;

noresize - disables resizing of the frame borders;

cols (attribute of <frameset>) - vertical frames;

name (attribute of <frame>) - gives the frame a name so that information can be sent to this frame.

To fill the frames with information, we write the code in the documents: “new.txt” - the “Title” frame, “nav.txt” - the “Links” frame, “main.txt” - the “Content” frame.
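The three-frame layout can be produced by a short Python script that writes the frameset page. In this sketch the title text, the frame names title, nav and main, and the 200-pixel width of the links frame are assumptions; the 120-pixel title frame, noresize and the three file names come from the text. Links in nav.txt would then open pages in the content frame through target="main".

```python
from html import escape


def frameset_page(title: str, title_src: str, nav_src: str, main_src: str,
                  nav_width: str = "200") -> str:
    """Build a three-frame page: a 120px title frame on top, links and content frames below."""
    return "\n".join([
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" '
        '"http://www.w3.org/TR/html4/frameset.dtd">',
        "<html>",
        "<head>",
        f"  <title>{escape(title)}</title>",
        "</head>",
        '<frameset rows="120,*">',  # title frame 120px high, the rest of the window below it
        f'  <frame src="{escape(title_src)}" name="title" noresize>',
        f'  <frameset cols="{escape(nav_width)},*">',  # links on the left, content on the right
        f'    <frame src="{escape(nav_src)}" name="nav" noresize>',
        f'    <frame src="{escape(main_src)}" name="main" noresize>',
        "  </frameset>",
        "</frameset>",
        "</html>",
    ])


page = frameset_page("Thesaurus", "new.txt", "nav.txt", "main.txt")
print(page)
```

The frameset and frame elements belong to HTML 4.01 and are obsolete in HTML5. Browsers also usually display files with a .txt extension as plain text rather than rendering them, so the frame documents may need an .html extension to be shown as pages.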

The document “new.txt” contains the code responsible for the name of the thesaurus itself. Main tags: