What is a compiler, an interpreter, a translator. Compiler architecture and its components. The main idea of recursive descent is that each nonterminal of the grammar has a corresponding procedure that recognizes any string generated by that nonterminal.

Each computer has its own programming language - a command language, or machine language - and can execute only programs written in that language. In principle, any algorithm can be described in machine language, but the programming costs would be extremely high, because machine language can describe and process only primitive data structures: bits, bytes and words. Programming in machine code requires excessive detail in the program and is accessible only to programmers who know the design and operation of the computer well. High-level languages (Fortran, PL/1, Pascal, C, Ada, etc.), with rich data structures and means of processing them that do not depend on the language of any particular computer, made it possible to overcome this difficulty.

High-level algorithmic languages enable the programmer to describe algorithms for solving many applied problems quite simply and conveniently. Such a description is called the source program, and the high-level language is the input language.

A language processor is a machine-language program that allows the computer to understand and execute programs in the input language. There are two main types of language processors: interpreters and translators.

An interpreter is a program that accepts a program in the input language and, as it recognizes the constructs of the input language, carries them out, producing the results of the computations prescribed by the source program.

A translator is a program that accepts a source program as input and produces at its output a functionally equivalent program, called the object program. The object program is written in an object language. In a particular case the object language may be machine language, and then the program obtained at the translator's output can be executed on the computer immediately (interpreted); in that case the computer serves as an interpreter of the object program in machine code. In the general case the object language need not be machine language or anything close to it (autocode). The object language may be an intermediate language - a language lying between the input language and machine language.

If an intermediate language is used as the object language, two ways of constructing the translator are possible.

The first option: a translator from the intermediate language to machine language already exists (or is being developed), and it is used as the last block of the translator being designed.

The second option is to build an interpreter for the intermediate-language commands and use it as the last block of the translator. The advantage of interpreters shows in debugging and interactive translators, which let the user work interactively, even making changes to the program without retranslating it completely.

Interpreters are also used in program emulation - executing, on a host machine, programs compiled for another (object) machine. This option is used, in particular, when programs that will run on a specialized computer are debugged on a general-purpose computer.

A translator whose input language is close to machine language (autocode or assembly language) is traditionally called an assembler. A translator for a high-level language is called a compiler.

Significant progress has been made in compiler construction in recent years. The first compilers used so-called direct translation methods - predominantly heuristic methods in which, starting from a general idea, a separate algorithm for translation into a machine-language equivalent was developed for each language construct. These methods were slow and unstructured.

The design methodology of modern compilers is based on the compositional syntax-directed method of language processing. It is compositional in the sense that the transformation of the source program into an object program is implemented as a composition of functionally independent mappings with explicitly identified input and output data structures. These mappings are constructed by viewing the source program as a composition of the main aspects (levels) of the input-language description - vocabulary, syntax, semantics and pragmatics - and by extracting these aspects from the source program during compilation. Let us consider these aspects in order to obtain a simplified compiler model.

The basis of any natural or artificial language is its alphabet - the set of elementary symbols allowed in the language (letters, digits and special characters). Symbols can be combined into words - elementary constructions of the language that are treated in a text (program) as indivisible units with a definite meaning.

A word can also consist of a single character. For example, in Pascal the words are identifiers, keywords, constants and delimiters - in particular, arithmetic and logical operators, parentheses, commas and other symbols. The set of words of a language, together with a description of the ways they are represented, constitutes the vocabulary of the language.

Words of a language are combined into more complex constructions - sentences. In programming languages the simplest sentence is a statement. Sentences are built from words and simpler sentences according to the rules of syntax. The syntax of a language is the description of its correct sentences. The description of the meaning of sentences, that is, of the meanings of words and their internal connections, is the semantics of the language. In addition, a specific program exerts a certain influence on the translator - its pragmatics. Taken together, the syntax, semantics and pragmatics of a language form its semiotics.

Translating a program from one language to another consists, in general, of changing the alphabet, vocabulary and syntax of the program's language while preserving its semantics. The process of translating a source program into an object program is usually divided into several independent subprocesses (translation phases), implemented by corresponding blocks of the translator. It is convenient to distinguish lexical analysis, syntax analysis, semantic analysis and object-program synthesis. In many real compilers, however, these phases are broken into several subphases, and other phases may be present as well (for example, object-code optimization). Fig. 1.1 shows a simplified functional model of the translator.

According to this model, the input program first undergoes lexical processing. The purpose of lexical analysis is to translate the source program into the compiler's internal language, in which keywords, identifiers, labels and constants are reduced to a single format and replaced by conventional codes - numeric or symbolic - called descriptors. Each descriptor consists of two parts: the class (type) of the token and a pointer to the memory address where information about that particular token is stored. This information is usually organized into tables. Simultaneously with the translation of the source program into the internal language, the lexical-analysis stage performs lexical control - the detection of words not permitted in the program.
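The descriptor scheme just described can be sketched in a few lines of C. This is a toy illustration, not code from any real compiler: the token classes, the fixed-size lexeme table and the names (intern, next_token) are all invented for the example, and no overflow checks are made.

```c
/* A toy lexical analyzer producing descriptors: (token class, table index).
   The lexeme table and all names here are invented for this sketch;
   buffer sizes are fixed and unchecked, as befits a toy. */
#include <ctype.h>
#include <string.h>

enum token_class { TK_IDENT, TK_NUMBER, TK_DELIM, TK_END };

/* A descriptor: the class of the token plus an index into the lexeme table. */
struct token {
    enum token_class cls;
    int index;              /* position of the lexeme in the table, -1 for TK_END */
};

char table[64][32];         /* toy lexeme table */
int table_len = 0;

/* Store a lexeme, reusing an existing entry if the same text was seen before. */
static int intern(const char *s) {
    for (int i = 0; i < table_len; i++)
        if (strcmp(table[i], s) == 0) return i;
    strcpy(table[table_len], s);
    return table_len++;
}

/* Read one token starting at src[*pos]; advance *pos past it. */
struct token next_token(const char *src, int *pos) {
    while (src[*pos] == ' ') (*pos)++;
    char buf[32];
    int n = 0;
    struct token t;
    if (src[*pos] == '\0') { t.cls = TK_END; t.index = -1; return t; }
    if (isalpha((unsigned char)src[*pos])) {          /* identifier */
        while (isalnum((unsigned char)src[*pos])) buf[n++] = src[(*pos)++];
        buf[n] = '\0';
        t.cls = TK_IDENT; t.index = intern(buf); return t;
    }
    if (isdigit((unsigned char)src[*pos])) {          /* integer constant */
        while (isdigit((unsigned char)src[*pos])) buf[n++] = src[(*pos)++];
        buf[n] = '\0';
        t.cls = TK_NUMBER; t.index = intern(buf); return t;
    }
    buf[0] = src[(*pos)++]; buf[1] = '\0';            /* single-char delimiter */
    t.cls = TK_DELIM; t.index = intern(buf);
    return t;
}
```

Scanning the string "x1 + 42" yields the descriptors (TK_IDENT, 0), (TK_DELIM, 1), (TK_NUMBER, 2); the lexemes themselves are stored once in the table.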

The parser takes the output of the lexical analyzer and translates the sequence of token descriptors into the form of an intermediate program. The intermediate program is essentially a representation of the program's syntax tree, which reflects the structure of the source program, i.e. the order of its statements and the connections between them. During the construction of the syntax tree, syntactic control is performed - the detection of syntax errors in the program.

The actual output of the parser may be the sequence of commands needed to build the intermediate program, to access the directory tables, and to issue a diagnostic message when required.
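The recursive-descent idea stated at the beginning of this section - one recognizing procedure per grammar nonterminal - can be sketched as follows. The grammar (single-digit numbers, '+', '*' and parentheses) and the error convention (returning -1) are invented for this toy example; the functions evaluate the expression while recognizing it.

```c
/* A minimal recursive-descent sketch for the invented grammar
     expr   -> term { '+' term }
     term   -> factor { '*' factor }
     factor -> digit | '(' expr ')'
   Each nonterminal becomes one C function. -1 signals a syntax error
   (a toy convention: the grammar admits only non-negative values). */

static const char *p;              /* current position in the input */

static int parse_expr(void);

static int parse_factor(void) {
    if (*p == '(') {
        p++;
        int v = parse_expr();
        if (v < 0 || *p != ')') return -1;
        p++;
        return v;
    }
    if (*p >= '0' && *p <= '9') return *p++ - '0';
    return -1;                     /* syntax error */
}

static int parse_term(void) {
    int v = parse_factor();
    while (v >= 0 && *p == '*') {
        p++;
        int r = parse_factor();
        if (r < 0) return -1;
        v *= r;
    }
    return v;
}

static int parse_expr(void) {
    int v = parse_term();
    while (v >= 0 && *p == '+') {
        p++;
        int r = parse_term();
        if (r < 0) return -1;
        v += r;
    }
    return v;
}

/* Parse a whole string: its value on success, -1 on a syntax error. */
int parse(const char *src) {
    p = src;
    int v = parse_expr();
    return (v >= 0 && *p == '\0') ? v : -1;
}
```

Note how the call structure of the three functions mirrors the derivation tree of the input: parsing "2+3*4" calls parse_term twice from parse_expr, and the second call consumes the whole product 3*4.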

Fig. 1.1. Simplified functional model of the translator

Synthesis of the object program begins, as a rule, with the distribution and allocation of memory for the main program objects. Each sentence of the source program is then examined, and semantically equivalent sentences of the object language are generated. The input information here is the program's syntax tree and the output tables of the lexical analyzer - the identifier table, the constant table and others. Analysis of the tree determines the sequence of generated object-program commands, and the identifier table determines which kinds of commands are valid for the operand values in the generated commands (for example, whether fixed-point or floating-point commands should be generated, etc.).
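A minimal sketch of this synthesis step: walking the syntax tree of an arithmetic expression in postorder and emitting commands for a hypothetical stack machine. The node layout and the command names (PUSH, ADD, MUL) are invented for the illustration.

```c
/* A toy sketch of object-program synthesis: a postorder walk of a
   syntax tree that emits stack-machine commands into a text buffer.
   The node structure and command names are invented for this example. */
#include <stdio.h>
#include <string.h>

struct node {                      /* syntax-tree node */
    char op;                       /* '+', '*', or 0 for a constant leaf */
    int value;                     /* the constant, for leaves */
    struct node *left, *right;
};

/* Append stack-machine commands for the subtree n to out (out must be a
   NUL-terminated buffer large enough to hold the result). */
void gen(const struct node *n, char *out) {
    char buf[32];
    if (n->op == 0) {              /* leaf: push the constant */
        sprintf(buf, "PUSH %d\n", n->value);
        strcat(out, buf);
        return;
    }
    gen(n->left, out);             /* operands first: postorder walk */
    gen(n->right, out);
    strcat(out, n->op == '+' ? "ADD\n" : "MUL\n");
}
```

For the tree of 2 + 3 * 4 the walk emits PUSH 2, PUSH 3, PUSH 4, MUL, ADD, i.e. the commands in exactly the order a stack machine needs them.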

The actual generation of the object program is often preceded by semantic analysis, which includes various kinds of semantic processing. One kind is checking the semantic conventions of the program. Examples of such conventions: every identifier is declared exactly once in the program, a variable is defined before it is used, and so on. Semantic analysis can also be performed in later translation phases, for example in the program-optimization phase, which may also be included in the translator. The goal of optimization is to reduce the time or memory resources required to execute the object program.
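One of the semantic conventions just mentioned - each identifier is declared exactly once and before its first use - can be checked with a simple table of declared names. The input encoding ('D' declares an identifier, 'U' uses one) is invented for this sketch.

```c
/* A toy semantic check over a list of (operation, identifier) pairs:
   'D' declares a one-letter identifier, 'U' uses one. The encoding is
   invented for this example. Returns:
     0 - semantically correct,
     1 - an identifier is declared twice,
     2 - an identifier is used before its declaration. */
int check(const char (*ops)[2], int n) {
    char declared[26] = {0};       /* one flag per identifier 'a'..'z' */
    for (int i = 0; i < n; i++) {
        int id = ops[i][1] - 'a';
        if (ops[i][0] == 'D') {
            if (declared[id]) return 1;   /* duplicate declaration */
            declared[id] = 1;
        } else {
            if (!declared[id]) return 2;  /* use before declaration */
        }
    }
    return 0;
}
```

A real compiler performs the same check against its identifier table, with scopes and types attached; the principle is unchanged.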

These are the main aspects of the translation process for high-level languages. The organization of the various translation phases and the practical methods of their mathematical description are discussed in more detail below.

A translator (from the English "translate") is a translating program. It converts a program written in one programming language into a program consisting of machine instructions (a binary file), or directly performs the actions of the program.

Translators are implemented as compilers, interpreters, preprocessors and emulators. Compilers and interpreters differ significantly in how they do their work.

A compiler reads the entire program, translates it, and creates a complete machine-language version of the program: a binary file containing a list of machine instructions. The binary file (an executable, a library or an object file) is executed by the operating system without the compiler's participation.

An interpreter translates the program line by line (one statement at a time) into machine code (instructions of the processor, the OS or another environment), executes the translated statement (program line), and then proceeds to the next line of the program text. The interpreter does not generate executable files; it itself performs all the actions written in the source program.

After a program has been compiled, neither the source program nor the compiler is needed any longer. A program processed by an interpreter, in contrast, must be retranslated into machine language every time it is launched.

Compiled programs run faster, but interpreted ones are easier to fix and change.

Each specific language is oriented either towards compilation or towards interpretation, depending on the purpose for which it was created. For example, Pascal is usually used to solve rather complex problems in which program speed is important, so this language is usually implemented with a compiler.

On the other hand, BASIC was created as a language for novice programmers, for whom line-by-line execution of a program has undeniable advantages.

Sometimes there is both a compiler and an interpreter for the same language. In this case, you can use an interpreter to develop and test the program, and then compile the debugged program to improve its execution speed.

A preprocessor is a translator from one programming language to another that does not create an executable file or execute the program.

Preprocessors are convenient for extending the capabilities of a language and for programming convenience: at the writing stage the programmer uses a more human-friendly dialect of the language, and the preprocessor translates it into the text of a standard programming language, which can then be compiled by a standard compiler.

An emulator is software and/or hardware that operates in a target operating system and hardware platform and is designed to execute programs produced in another operating system or for hardware different from the target, while allowing the same operations to be performed in the target environment as in the emulated system.

Languages based on emulation include systems such as Java, .NET and Mono, in which the program is compiled into a special bytecode at build time, producing a binary file suitable for execution in any operating-system and hardware environment; the resulting bytecode is executed on the target machine by a simple and fast interpreter (a virtual machine).
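The "simple and fast interpreter" for bytecode can be sketched as a small stack machine. The instruction set below (PUSH, ADD, MUL, HALT) is invented for the example and is, of course, far simpler than real JVM or IL bytecode.

```c
/* A toy bytecode virtual machine: a loop that fetches an operation
   code and dispatches on it, operating on an evaluation stack.
   The instruction set is invented for this sketch. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Execute bytecode until OP_HALT; return the value on top of the stack.
   No bounds or underflow checks - this is an illustration only. */
int run(const int *code) {
    int stack[64], sp = 0;         /* evaluation stack and its pointer */
    for (int pc = 0; ; ) {
        switch (code[pc++]) {
        case OP_PUSH: stack[sp++] = code[pc++]; break;  /* operand follows */
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        }
    }
}
```

The same bytecode array runs unchanged on any machine for which this interpreter compiles, which is exactly the portability argument the text makes; a JIT compiler differs only in translating the bytecode into native instructions instead of dispatching on it.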

A disassembler is a software tool designed to decode binary code, presenting it as text in assembly language or another programming language; this makes it possible to analyze the algorithm of the original program and to use the resulting text for any needed modification of the program - for example, changing the addresses of external devices or of accesses to system and network resources, or revealing hidden functions of the binary code (for example, of a computer virus or other malicious program: a Trojan, a worm, a keylogger, etc.).


Translators

Since the text of a program written in Pascal is not understandable to the computer, it must be translated into machine language. This translation of a program from a programming language into machine code is called translation, and it is performed by special programs - translators.

There are three types of translators: interpreters, compilers and assemblers.

An interpreter is a translator that performs statement-by-statement (command-by-command) processing and execution of the source program.

A compiler converts (translates) the entire program into a machine-language module, after which the program is written into the computer's memory and only then executed.

Assemblers translate a program written in assembly language (autocode) into a program in machine language.
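The core of what an assembler does - replacing mnemonic commands with numeric operation codes - can be sketched in a few lines. The mnemonics and opcode numbers below are invented for the illustration, not taken from any real instruction set.

```c
/* A toy "assembler" step: mapping a mnemonic to its numeric opcode.
   The command names and codes are invented for this example. */
#include <string.h>

/* Returns the opcode for a known mnemonic, or -1 for an unknown one
   (a real assembler would report a diagnostic here). */
int assemble_one(const char *mnemonic) {
    static const char *names[] = { "LOAD", "ADD", "STORE", "HALT" };
    for (int i = 0; i < 4; i++)
        if (strcmp(mnemonic, names[i]) == 0) return i;
    return -1;
}
```

A full assembler repeats this lookup for every line, resolves symbolic labels into addresses, and packs operands into the instruction encoding, but the one-to-one mnemonic-to-opcode mapping remains its essence.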

Any translator solves the following main tasks:

Analyzes the translated program, in particular determines whether it contains syntax errors;

Generates an output program (often called an object or working program) in the computer's command language (in some cases the translator generates the output program in an intermediate language, for example assembly language);

Allocates memory for the output program (in the simplest case this consists of assigning each program fragment - variables, constants, arrays and other objects - its own memory addresses).
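The third task - memory allocation in its simplest form - amounts to assigning each object the next free address. A minimal sketch, with object sizes chosen arbitrarily for the example:

```c
/* A toy memory allocator of the kind the text describes: given the
   sizes of n program objects, assign each one a consecutive address.
   Returns the total amount of memory required. */
int allocate(const int *sizes, int n, int *addr) {
    int next = 0;                  /* next free address */
    for (int i = 0; i < n; i++) {
        addr[i] = next;            /* the object's assigned address */
        next += sizes[i];
    }
    return next;
}
```

For objects of sizes 4, 8 and 4 bytes this sketch assigns addresses 0, 4 and 12 and reports 16 bytes in total; real translators additionally respect alignment requirements and separate code, data and stack areas.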

Introduction to .NET and C#

The programmer writes a program in a language that he or she understands, while the computer executes only programs written in machine code. The set of tools for writing and editing a program, converting it into machine code and executing it is called a development environment.

The development environment contains:

    Text editor for entering and editing program text

    Compiler for translating a program into machine command language

    Tools for debugging and launching programs for execution

    Shared libraries with reusable software elements

    Help system, etc.

The .NET platform ("dot net"), developed by Microsoft, includes not only the multi-language development environment Visual Studio .NET but also many other tools, such as database tools, e-mail tools, etc.

The most important tasks in the development of modern software are:

    Portability - ability to run on different types of computers

    Security – impossibility of unauthorized actions

    Reliability – failure-free operation under given conditions

    Using off-the-shelf components to speed up development

    Cross-language interoperability – the ability to use several programming languages together.

All these tasks are solved within the .NET platform.

To ensure portability, the platform's compilers translate programs not into machine code but into the intermediate language MSIL (Microsoft Intermediate Language), or simply IL. IL contains no commands specific to a particular operating system or type of computer. The IL program is executed by the CLR (Common Language Runtime), which is specific to each type of computer. Translation of the IL program into the machine code of a specific computer is performed by a JIT (Just-In-Time) compiler.

The program execution diagram on the .NET platform is shown in Fig. 1.

The compiler creates a program assembly - a file with the extension .exe or .dll that contains the IL code. Program execution is organized by the CLR environment, which monitors the validity of operations, performs memory allocation and cleanup, and handles execution errors. This ensures the safety and reliability of programs.

The price for these advantages is a decrease in program performance and the need to install .NET on the computer to execute ready-made programs.

So, .NET is a programming platform.

C# (C Sharp) is one of the programming languages of the .NET platform. It is included with Visual Studio - Visual Studio .NET (versions 2008, 2010, 2012). In addition to C#, Visual Studio .NET includes Visual Basic .NET and Visual C++.

One of Microsoft's reasons for developing a new language was to create a component-oriented language for the .NET Framework platform.

Fig. 1. Program execution diagram in .NET

The .NET Framework consists of two parts:

    First, it includes a huge library of classes that can be called from C# programs - several thousand of them. This eliminates the need to write everything yourself, so programming in C# largely means writing your own code that calls classes stored in the .NET Framework as needed.

    Second, it includes the .NET runtime environment, which controls the launch and operation of ready-made programs.

The .NET platform is an open environment: third-party developers have created dozens of .NET compilers for languages such as Ada, COBOL, Fortran, Lisp, Oberon, Perl and Python.

The .NET platform is actively developing, and new versions are being released. Use the Project Properties menu to find out which version of the .NET platform you are using.

In theory, a .NET program can run on any operating system on which .NET is installed. In practice, the only official platform is the Windows operating system, but there are unofficial .NET implementations for Unix-like systems such as Linux and Mac OS X (Mono is a free-software implementation of the .NET Framework).


Lecture: Software standards and licenses

UNIX family standards. C programming language standards. System V Interface Definition (SVID). POSIX committees. X/Open, OSF and the Open Group. Licenses for software and documentation.
Content

  • 3.1. UNIX family standards
    • C Programming Language Standards
    • System V Interface Definition (SVID)
    • POSIX committees
    • X/Open, OSF and Open Group
  • 3.2. Software and Documentation Licenses

3.1. UNIX family standards

Standards for the UNIX operating system arose because the system was ported to many hardware platforms. Its first versions ran on PDP hardware, but in 1976 and 1978 the system was ported to Interdata and VAX. From 1977 to 1981, two competing branches took shape: AT&T UNIX and BSD. The goals of developing the standards probably differed: one was to legitimize the primacy of a particular version, another was to ensure the portability of the system and of application programs between different hardware platforms. In this connection one speaks of program mobility, a property relating both to the source code of programs and to executable programs.

The following material is presented in chronological order of appearance of the standards.

C Programming Language Standards

This standard does not relate directly to UNIX, but since C is the base language both for this family and for other operating systems, we mention the standard of this programming language here. It began with the publication in 1978 of the first edition of the book by B. Kernighan and D. Ritchie; this de facto standard is often called K&R. The authors of that book worked on UNIX together with Ken Thompson; moreover, the first of them suggested the name of the system, and the second invented the C language itself. The corresponding text is available on the Internet [45].

However, an industry standard for the C programming language was released in 1989 by ANSI and was named X3.159-1989. Here is what is written about this standard [46]:

"The standard was adopted to improve the portability of programs written in C between various types of OS. Thus, in addition to the syntax and semantics of the C language, the standard includes recommendations on the contents of the standard library. Support for the ANSI C standard is indicated by the predefined symbolic name __STDC__."

In 1988, based on this standard of the programming language, the second edition of Kernighan and Ritchie's book about C was released. Note that companies producing software products for developing programs in C may create their own libraries and even slightly extend the set of other language facilities.
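The conformance indication mentioned in the quotation can be checked directly in source code: a conforming compiler predefines the macro __STDC__, which a program may test with the preprocessor. A minimal sketch (the function name is invented for the example):

```c
/* Returns 1 if this translation unit is compiled by a Standard C
   compiler (which predefines the macro __STDC__), and 0 otherwise. */
int compiled_by_standard_c(void) {
#if defined(__STDC__)
    return 1;
#else
    return 0;
#endif
}
```

Later revisions of the standard added the companion macro __STDC_VERSION__, whose value identifies the particular edition (C95, C99, C11, ...), and the same #if technique applies to it.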

System V Interface Definition (SVID)

Another direction in the development of UNIX standards stems from the fact that not only enthusiasts thought about creating "standards". The system's main developers, faced with the appearance of many "variants", decided to publish their own documents. Thus appeared the standards produced by USG, the AT&T subsidiary that had been developing documentation for AT&T's UNIX versions since the system was created. The first document appeared in 1984, based on SVR2; it was called the SVID (System V Interface Definition). A four-volume description was released after SVR4. These standards were supplemented by a set of test programs, SVVS (System V Verification Suite), whose main purpose was to let developers judge whether their system could claim the name System V [14].

Note that the situation with the SVID standard is somewhat similar to that of the C language standard. The book published by the authors of the language is one of the standards, but not the only one. Standard C, released later, is the result of collective work; it passed a stage of public discussion and can apparently claim a leading role in the list of standards. Likewise, SVVS is a set of tests that allows one to judge whether a system is worthy of the name System V - just one of the versions of UNIX; it does not take into account the operating-system development experience of other manufacturers.

POSIX committees

Work on UNIX standards was begun by a group of enthusiasts in 1980. The goal was to formally define the services that operating systems provide to applications. This programming-interface standard became the basis of the POSIX document (Portable Operating System Interface for Computing Environment) [14]. The first POSIX working group was formed in 1985 from the UNIX-oriented standards committee /usr/group, also called UniForum [47]. The name POSIX was proposed by GNU founder Richard Stallman.

Early versions of POSIX define many system services necessary for the operation of application programs, described through an interface specified for the C language (the system call interface). The ideas contained in it were used by the ANSI (American National Standards Institute) committee when creating the C language standard mentioned earlier. The initial set of functions in the first versions was based on AT&T UNIX (version SVR4 [48]), but later the POSIX specifications broke away from that particular OS. The approach of organizing a system around a set of basic system functions has been applied beyond UNIX as well (for example, in Microsoft's WinAPI).

In 1988 the 1003.1-1988 standard was published, defining the API (Application Programming Interface). Two years later a new version, the IEEE 1003.1-1990 standard, was adopted. It defined the general rules of the programming interface for both system calls and library functions. Further additions to it were approved, defining services for real-time systems, POSIX threads, and so on. The POSIX 1003.2-1992 standard is also important: the definition of the command interpreter and utilities.

There is a translation [1] of these two groups of documents, called POSIX.1 (the application program interface) and POSIX.2 (the command interpreter and utilities - the user interface). The translation contains three chapters: basic concepts, system services, and utilities. The "System services" chapter is divided into several parts, each grouping services with similar functions. For example, in the section "Basic I/O", the seventh part, devoted to directory operations, describes three functions (opendir, readdir and closedir). Each is defined in four paragraphs: "Syntax", "Description", "Return value" and "Errors".

For readers familiar with the C programming language, here is an example of fragments of such a description.


In fact, this description gives an idea of how the "system call interface" is specified. The "Syntax" paragraph for the readdir function gives the following lines:

#include <sys/types.h>
#include <dirent.h>

struct dirent *readdir(DIR *dirp);

The second paragraph (“Description”) contains the following text:

"The types and data structures used in directory definitions are defined in the file dirent.h. The internal composition of directories is implementation-defined. When read using the readdir function, an object of type struct dirent is formed, containing as a field the character array d_name, which contains the character-terminated NUL local name file.

Readdir reads the current directory element and sets the position pointer to the next element. The open directory is specified by the dirp pointer. Element containing empty names, is skipped."

And here is what is given in the “Return value” paragraph:

"Readdir, upon successful completion, returns a pointer to an object of type struct dirent containing the directory element read. The read element can be stored in static memory and is overwritten by the next such call applied to the same open directory. Calling readdir for different open directories does not overlap the read information. In If an error occurs or the end of the file is reached, a null pointer is returned."

The paragraph "Errors in the standard" states the following:

"Readdir and closedir encountered an error. Dirp is not a pointer to an open directory."

This example shows how the services provided by an application are described. The requirements for the operating system (implementation) is that it “...must support all required utilities, functions, header files ensuring the behavior specified in the standard. The _POSIX_VERSION constant has the value 200112L [ 49 ]".

In the world computer technology There is such a phrase: "POSIX programming". This can be learned using various manuals on UNIX systems programming and operating systems (for example, [ 5 ]). There is a separate book with this title [ 3 ]. Note that the preface to this book states that it describes "... a threefold standard..." as it is based on the latest 2003 version of POSIX, which is based on three standards: IEEE Std 1003.1, technical standard Open Group and ISO/IEC 9945.

How can you verify that a particular system complies with the POSIX standard? Formalizing such a question is not as simple as it seems at first glance. Modern versions offer 4 types of compliance (four semantic meanings of the word “compliance”: full, international, national, extended).

The documents under consideration provide lists of two kinds of interface facilities: mandatory (kept as compact as possible) and optional. The latter must either be processed in the prescribed manner or return the fixed error code ENOSYS, indicating that the function is not implemented.

Note that the POSIX document set has been changing for many years, but the developers of new versions always try to preserve as much continuity with previous versions as possible, while something new appears in each more recent edition. For example, the 2004 document combined four parts [ 50 ]:

  • Base Definitions volume (XBD) – definition of terms, concepts and interfaces common to all volumes of this standard;
  • System Interfaces volume (XSH) – system level interfaces and their binding to the C language, which describes the mandatory interfaces between application programs and the operating system, in particular – system call specifications;
  • Shell and Utilities volume (XCU) – definition of standard command interpreter interfaces (the so-called POSIX shell), as well as the basic functionality of Unix utilities;
  • Rationale (Informative) volume (XRAT) – additional, including historical, information about the standard.

Like the first editions, the document in its main part describes the groups of services provided. Each element is described there under the following headings: NAME, SYNOPSIS, DESCRIPTION, RETURN VALUE, ERRORS and, finally, EXAMPLES.

Modern versions of the standard define the requirements for both the operating system and application programs. Let's give an example [ 51 ].

The readdir() function must return a pointer to the structure corresponding to the next directory entry. Whether directory entries named "dot" and "dot-dot" are returned is not specified by the standard. In this example four outcomes are possible, and the requirement on the application program is that it must be prepared for any of them.

In conclusion, we present an excerpt from the lecture course by Sukhomlinov ("INTRODUCTION TO THE ANALYSIS OF INFORMATION TECHNOLOGIES", Sukhomlinov V.A., Part V: Methodology and the POSIX OSE system of standards), devoted to the scope of applicability of the standards [ 52 ]:

"The scope of applicability of POSIX OSE (Open System Environment) standards is to provide the following capabilities (also called openness properties) for developed information systems:

  • Application Portability at the Source Code Level, i.e. providing the ability to transfer programs and data presented in the source code of programming languages ​​from one platform to another.
  • System Interoperability, i.e. supporting interconnectivity between systems.
  • User Portability, i.e. providing the ability for users to work on different platforms without retraining.
  • Adaptability to new standards (Accommodation of Standards) related to achieving the goals of open systems.
  • Adaptability to new information technologies (Accommodation of new System Technology) based on the universality of the classification structure of services and the independence of the model from implementation mechanisms.
  • Scalability of application platforms (Application Platform Scalability), reflecting the ability to transfer and reuse application software in relation to different types and configurations of application platforms.
  • Scalability of distributed systems (Distributed System Scalability), reflecting the ability of application software to function regardless of the development of the topology and resources of distributed systems.
  • Implementation Transparency, i.e. hiding the features of their implementation from users behind system interfaces.
  • Systematic and accurate specifications of user functional requirements (User Functional Requirements), which ensures completeness and clarity in determining user needs, including in determining the composition of applicable standards."

This allows you to solve the following problems:

  • integration of information systems from components from various manufacturers;
  • efficiency of implementations and development, thanks to the accuracy of the specifications and to compliance with standard solutions that reflect the state of the art;
  • efficiency of application software transfer, thanks to the use of standardized interfaces and transparency of mechanisms for implementing system services.

The standards also formally define the following important operating system concepts: user, file, process, terminal, host, network node, time, and the linguistic and cultural environment. Formal definitions as such are not given there; instead, the operations applicable to these objects and their inherent attributes are introduced.

In total, there are more than three dozen elements in the list of POSIX standards. Their names traditionally begin with the letter "P", followed by a four-digit number with additional symbols.

There are also group names for the standards: POSIX1, POSIX2, etc. For example, POSIX1 unites the standards for basic OS interfaces (P1003.1x, where x is either empty or a character from a to g; thus there are 7 documents in this group), and POSIX3 concerns testing methods (two documents: P2003 and P2003n).

The translator usually also diagnoses errors, compiles identifier dictionaries, produces program texts for printing, etc.

Translation of a program is the transformation of a program written in one programming language into a program in another language that is, in a certain sense, equivalent to the first.

The language in which the input program is written is called the source language, and the program itself the source code. The output language is called the target language or object code.

The concept of translation applies not only to programming languages but also to other computer languages, such as markup languages like HTML, and to natural languages like English or Russian. However, this article deals only with programming languages; for natural languages, see: Translation.

Types of translators

  • Address. A functional device that converts a virtual address into a real memory address.
  • Dialog. Provides the use of a programming language in time-sharing mode.
  • Multi-pass. Builds an object module over several passes over the source program.
  • Back. The same as a detranslator. See also: decompiler, disassembler.
  • Single-pass. Builds an object module in one sequential pass over the source program.
  • Optimizing. Performs code optimization in the generated object module.
  • Syntax-oriented (syntax-driven). Receives as input a description of the syntax and semantics of a language together with a text in the described language, and translates the text in accordance with the given description.
  • Test. A set of assembly-language macros that allow various debugging procedures to be set up in programs written in assembly language.

Implementations

The purpose of translation is to convert a text from one language into another language that the addressee of the text understands. In the case of translator programs, the addressee is a technical device (a processor) or an interpreter program.

There are a number of other examples in which the architecture of a developed series of computers was based on, or strongly depended on, some model of program structure. Thus, the GE/Honeywell Multics series was based on a semantic model of executing programs written in PL/1. The Burroughs B5500, B6700 ... B7800 series was based on a runtime model of programs written in extended ALGOL. ...

The i432 processor, like these earlier architectures, is also based on a semantic model of program structure. Unlike its predecessors, however, the i432 is not based on the model of a specific programming language. Instead, the developers' main goal was to provide direct runtime support both for abstract data (that is, programming with abstract data types) and for domain-specific operating systems. ...

The advantage of the compiler: the program is compiled once, and no additional transformations are needed at each execution. Accordingly, no compiler is required on the target machine for which the program is compiled. Disadvantage: a separate compilation step slows down writing and debugging and makes it harder to run small, simple, or one-off programs.

If the source language is an assembly language (a low-level language close to machine language), then the compiler of that language is called an assembler.

The opposite method of implementation is when the program is executed by an interpreter with no translation at all. The interpreter models in software a machine whose fetch-execute cycle operates on instructions of a high-level language rather than on machine instructions. This software simulation creates a virtual machine that implements the language. The approach is called pure interpretation. Pure interpretation is typically used for languages with a simple structure (for example, APL or Lisp). Command-line interpreters, which process commands in scripts in UNIX or in batch files (.bat) in MS-DOS, also usually work in pure interpretation mode.
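The fetch-execute cycle of a pure interpreter can be sketched in a few lines. The mini-language below (integer assignments and a `print` statement) is invented for illustration; the point is that the source text itself is read and executed statement by statement, with no prior translation step:

```python
# A minimal "pure interpretation" sketch: each statement of the source
# text is analyzed and executed immediately; nothing is compiled ahead
# of time. The mini-language here is hypothetical.

def interpret(source: str) -> list[int]:
    env: dict[str, int] = {}        # the interpreter's variable store
    output: list[int] = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("print "):            # statement: print <var>
            output.append(env[line.split()[1]])
        else:                                    # statement: <var> = <expr>
            name, expr = (s.strip() for s in line.split("=", 1))
            env[name] = eval(expr, {}, env)      # evaluate against the store
    return output

program = """
x = 2
y = x * 10
print y
"""
print(interpret(program))   # [20]
```

Note the property discussed below: an error in a line is discovered only when the interpreter reaches that line.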

The advantage of a pure interpreter: the absence of intermediate translation steps makes the interpreter simpler to implement and more convenient to use, including in dialog mode. The disadvantage: an interpreter must be present on the target machine where the program is to be executed. The property of a pure interpreter that errors in the interpreted program are detected only when execution of the erroneous command (or line) is attempted can be considered both a disadvantage and an advantage.

There are compromises between compilation and pure interpretation in the implementation of programming languages, in which the interpreter, before executing the program, translates it into an intermediate language (for example, into bytecode or p-code) that is more convenient to interpret; that is, we are dealing with an interpreter with a built-in translator. This method is called mixed implementation. An example of a mixed language implementation is Perl. The approach combines advantages of both compiler and interpreter (greater execution speed and ease of use) as well as their disadvantages (extra resources are needed to translate and store the program in the intermediate language; an interpreter must be available to execute the program on the target machine). Also, as with a compiler, a mixed implementation requires the source code to be free of errors (lexical, syntactic and semantic) before execution.
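A mixed scheme can be sketched as follows: the source expression is translated once into a simple stack bytecode, which a separate loop then interprets. The instruction names (PUSH, ADD, MUL) and the two-operator expression language are invented for this sketch, not taken from Perl or any real p-code:

```python
import re

# Translate '1 + 2 * 3'-style expressions into postfix stack bytecode
# (a tiny shunting-yard with two precedence levels).
def compile_expr(expr: str) -> list[tuple]:
    out: list[tuple] = []
    ops: list[str] = []
    prec = {"+": 1, "*": 2}
    for t in re.findall(r"\d+|[+*]", expr):
        if t.isdigit():
            out.append(("PUSH", int(t)))
        else:
            while ops and prec[ops[-1]] >= prec[t]:
                out.append(("ADD" if ops.pop() == "+" else "MUL",))
            ops.append(t)
    while ops:
        out.append(("ADD" if ops.pop() == "+" else "MUL",))
    return out

# Interpret the bytecode: this loop never touches the source text again.
def run(code: list[tuple]) -> int:
    stack: list[int] = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack[0]

bytecode = compile_expr("1 + 2 * 3")   # translated once...
print(run(bytecode))                   # ...run from the compact form; prints 7
```

The cost of translation is paid once; repeated execution works on the compact intermediate form, which is exactly the trade-off described above.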

As computer resources grow and heterogeneous networks (including the Internet) connecting computers of different types and architectures expand, a new kind of interpretation has emerged in which source (or intermediate) code is compiled into machine code directly at run time, "on the fly." Already-compiled sections of code are cached, so that when they are reached again they immediately receive control, without recompilation. This approach is called dynamic compilation.

The advantage of dynamic compilation is that the speed of program interpretation becomes comparable to the execution speed of programs in conventional compiled languages, while the program itself is stored and distributed in a single form independent of target platforms. The disadvantage is greater implementation complexity and greater resource requirements than for simple compilers or pure interpreters.

This method works well for

Programs, like people, need a translator to get from one language to another.

Basic Concepts

A program is a linguistic representation of a computation: i → P → P(i). An interpreter is a program whose input is a program P and some input data x; it performs P on x: I(P, x) = P(x). The fact that a single interpreter is capable of running all possible programs (those representable in a formal system) is Turing's profound and significant discovery.

The processor is an interpreter of machine-language programs. It is generally too expensive to write interpreters for high-level languages directly, so such languages are translated into a form that is easier to interpret.

Some types of translators have very strange names:

  • An assembler translates assembly language programs into machine language.
  • The compiler translates from a high-level language to a lower-level language.

A translator is a program that takes as input a program P in some language S and outputs a program Q in a language T such that the two have the same semantics: P → translator → Q, that is, ∀x. P(x) = Q(x).

Translating an entire program into something that can then be interpreted is called ahead-of-time, or AOT, compilation. AOT compilers can be chained one after another, the last of them often being an assembler, for example:

Source code → Compiler (translator) → Assembly code → Assembler (translator) → Machine code → CPU (interpreter).

Online, or dynamic, compilation occurs when part of a program is translated while other, previously compiled parts are being executed. Just-in-time translators remember what they have already done so they do not have to retranslate the same source code over and over again. They can even perform adaptive compilation and recompilation based on the behavior of the program's runtime environment.

Many languages allow code to be executed at translation time and new code to be compiled during program execution.

Translation stages

Translation consists of analysis and synthesis stages:

Source code → Analyzer → Intermediate (conceptual) representation → Generator (synthesizer) → Target code.

This is due to the following reasons:

  • Any other method is not suitable. Word-by-word translation simply doesn't work.
  • A good engineering solution: if you need translators for M source languages and N target languages, you only have to write M + N simple programs (half-compilers) rather than M × N complex programs (full translators).

In practice, however, an intermediate representation is very rarely expressive and powerful enough to cover all conceivable source and target languages, although some have come close.

Real compilers go through many stages. When creating your own compiler, you do not have to repeat all the hard work people have already put into representations and generators. You can translate your language directly into JavaScript or C and use existing JavaScript engines and C compilers to do the rest. You can also use existing intermediate representations and

Translator notation

A translator is a program or technical device that involves three languages: source, target, and base. They can be written in a T-shape, with the source on the left, the target on the right, and the base at the bottom.

There are three types of compilers:

  • A translator is a self-compiler if its source language matches the base one.
  • A compiler whose target language is equal to the base language is called self-resident.
  • A translator is a cross-compiler if its target and base languages ​​are different.

Why is it important?

Even if you never make a real compiler, it's good to know about the technology behind it, because the concepts used for it are used everywhere, for example in:

  • text formatting;
  • databases;
  • advanced computer architectures;
  • generalized;
  • graphical interfaces;
  • scripting languages;
  • controllers;
  • virtual machines;
  • machine translation.

Additionally, if you want to write preprocessors, assemblers, loaders, debuggers, or profilers, you must go through the same steps as when writing a compiler.

You can also learn to write better programs, since creating a translator for a language means understanding its subtleties and ambiguities more deeply. Studying the general principles of translation can also make you a better language designer. Does it really matter how cool a language is if it cannot be implemented efficiently?

Comprehensive technology

Compiler technology covers many different areas of computer science:

  • formal language theory: grammars, parsing, computability;
  • computer architecture: instruction sets, RISC vs CISC, pipelining, cores, clock cycles, etc.;
  • programming language concepts: e.g. sequence control, conditional execution, iteration, recursion, functional decomposition, modularity, synchronization, metaprogramming, scope, constants, subtypes, templates, type inference, prototypes, annotations, streams, monads, mailboxes, continuations, wildcards, regular expressions, transactional memory, inheritance, polymorphism, parameter modes, etc.;
  • abstract machines and virtual machines;
  • algorithms: regular expressions, parsing algorithms, graph algorithms, learning algorithms;
  • programming languages: syntax, semantics (static and dynamic), support for paradigms (structured, object-oriented, functional, logic, stack-based, concurrent, metaprogramming);
  • software engineering (compilers are usually large and complex): localization, caching, componentization, APIs, reuse, synchronization.

Compiler design

Some problems that arise when developing a real translator:

  • Issues with the source language. Is it easy to compile? Is there a preprocessor? How are types handled? Are there libraries?
  • Grouping of compiler passes: single-pass or multi-pass?
  • The degree of optimization desired. A quick-and-dirty translation with little or no optimization may be fine. Heavy optimization will slow the compiler down, but better code at run time may be worth it.
  • The required degree of error detection. Can the translator simply stop at the first error? When should it stop? Should the compiler be trusted to correct errors?
  • Availability of tools. Unless the source language is very small, a scanner generator and a parser generator are a must. There are also code generators, but they are not as common.
  • The kind of target code to generate. You can choose among pure machine code, augmented machine code, or virtual machine code. Or simply write a front end that produces a popular intermediate representation such as LLVM, RTL, or JVM bytecode. Or do source-to-source translation into C or JavaScript.
  • The target code format. You can choose a portable memory image.
  • Retargeting. With multiple back ends, it is good to have a common front end. For the same reason, it is better to have one back end for many front ends.

Compiler architecture: components

These are the main functional components of a translator that generates machine code (if the output program is in C or virtual machine code, fewer stages are needed):

  • The input program (a stream of characters) enters the scanner (lexical analyzer), which converts it into a stream of tokens.
  • The parser (syntax analyzer) builds an abstract syntax tree from them.
  • The semantic analyzer derives semantic information and checks the tree nodes for errors. The result is a semantic graph: an abstract syntax tree with additional properties and established links.
  • The intermediate code generator builds a flow graph (tuples grouped into basic blocks).
  • The machine-independent code optimizer performs both local (within a basic block) and global (across all blocks) optimization, mostly staying within subroutine boundaries; it removes redundant code and simplifies calculations. The result is a modified flow graph.
  • The target code generator chains the basic blocks into straight-line code with control transfers, creating an assembly-language object file with (possibly inefficient) virtual registers.
  • The machine-dependent code optimizer allocates registers and performs instruction scheduling, converting the assembly program into real assembly code that makes good use of the pipeline.

In addition, error detection subsystems and a symbol table manager are used.

Lexical analysis (scanning)

The scanner converts the stream of source characters into a stream of tokens, removing whitespace and comments and expanding macros.

Scanners often have to deal with issues such as case sensitivity, indentation, line breaks, and nested comments.

Errors that may be encountered during scanning are called lexical errors and include:

  • characters that do not belong to the alphabet;
  • exceeding the allowed number of characters in a word or line;
  • an unclosed character or string literal;
  • end of file inside a comment.
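A scanner of the kind described here is naturally written with regular expressions. The token classes below and the treatment of lexical errors are illustrative choices, not those of any particular compiler:

```python
import re

# Each token class is a named regular expression; SKIP covers spaces
# and line comments, and ERROR catches characters outside the alphabet.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=()]"),
    ("SKIP",    r"[ \t]+|#[^\n]*"),
    ("NEWLINE", r"\n"),
    ("ERROR",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text: str) -> list[tuple[str, str]]:
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind in ("SKIP", "NEWLINE"):
            continue                       # whitespace and comments vanish
        if kind == "ERROR":
            raise SyntaxError(f"lexical error: unexpected {m.group()!r}")
        tokens.append((kind, m.group()))
    return tokens

print(scan("x = 1"))   # [('IDENT', 'x'), ('OP', '='), ('NUMBER', '1')]
```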

Syntax analysis (parsing)

The parser converts a sequence of tokens into an abstract syntax tree. Each tree node is stored as an object with named fields, many of which are themselves tree nodes. There are no cycles at this stage. When creating a parser, you need to consider the complexity class of the grammar (LL or LR) and find out whether any disambiguation rules are needed. Some languages actually require a bit of semantic analysis here.

Errors encountered at this stage are called syntactic errors. For example:

  • k = 5 * (7 - y;
  • j = /5;
  • 56 = x * 4.
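Such errors are caught naturally by a recursive-descent parser, in which each nonterminal of the grammar has a corresponding procedure that recognizes the chains generated by that nonterminal. A minimal sketch for a toy expression grammar (the grammar and the tokenization are invented for illustration):

```python
# A recursive-descent sketch: each nonterminal of the grammar
#   Expr   -> Term (('+'|'-') Term)*
#   Term   -> Factor (('*'|'/') Factor)*
#   Factor -> NUMBER | IDENT | '(' Expr ')'
# gets its own procedure. A missing expected token raises a syntax error.

import re

def parse_expr(text: str):
    tokens = re.findall(r"\d+|\w+|[+\-*/()]|\S", text) + ["<eof>"]
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        pos += 1

    def expect(tok):
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        advance()

    def expr():                      # Expr -> Term (('+'|'-') Term)*
        node = term()
        while peek() in ("+", "-"):
            op = peek(); advance()
            node = (op, node, term())
        return node

    def term():                      # Term -> Factor (('*'|'/') Factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = peek(); advance()
            node = (op, node, factor())
        return node

    def factor():                    # Factor -> NUMBER | IDENT | '(' Expr ')'
        t = peek()
        if t == "(":
            advance()
            node = expr()
            expect(")")
            return node
        if re.fullmatch(r"\w+", t):
            advance()
            return t
        raise SyntaxError(f"unexpected token {t!r}")

    tree = expr()
    expect("<eof>")
    return tree

print(parse_expr("5 * (7 - y)"))   # ('*', '5', ('-', '7', 'y'))
```

On input like `5 * (7 - y` the call expect(")") fails and reports the error, which is how an unclosed parenthesis of the first kind shown above would be caught.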

Semantic analysis

At this stage it is necessary to check validity rules and to link up parts of the syntax tree (resolving name references, inserting operations for implicit type conversions, etc.) to form a semantic graph.

Obviously, the set of validity rules varies from language to language. When compiling Java-like languages, translators may find:

  • multiple declarations of a variable within one scope;
  • references to a variable before its declaration;
  • references to an undeclared name;
  • violations of accessibility rules;
  • too many or too few arguments in a method call;
  • type mismatches.
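Two of the checks in this list, repeated declarations and references to undeclared names, can be sketched over a flat statement list. The tiny statement format (`var x` for a declaration) is invented for illustration:

```python
# A sketch of one semantic-analysis pass: walk the statements with a set
# of declared names, reporting redeclarations and undeclared references.

def check(statements: list[str]) -> list[str]:
    declared: set[str] = set()
    errors: list[str] = []
    for st in statements:
        words = st.split()
        if words[0] == "var":                      # declaration: var x
            name = words[1]
            if name in declared:
                errors.append(f"multiple declaration of {name!r}")
            declared.add(name)
        else:                                      # use: x = y
            for name in (w for w in words if w.isidentifier()):
                if name not in declared:
                    errors.append(f"reference to undeclared name {name!r}")
    return errors

print(check(["var x", "var x", "x = y"]))
# ["multiple declaration of 'x'", "reference to undeclared name 'y'"]
```

A real analyzer would of course work over the syntax tree and handle nested scopes; the set here plays the role of the symbol table.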

Generation

Intermediate code generation produces a flow graph composed of tuples grouped into basic blocks.

Code generation produces real machine code. In traditional compilers for RISC machines, the first stage creates assembly code with an infinite number of virtual registers. For CISC machines this is usually not done.
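The "infinite supply of virtual registers" stage can be sketched by flattening an expression tree into three-address tuples; the tuple shapes and register names here are illustrative:

```python
# Flatten a tree such as ('*', '5', ('-', '7', 'y')) into three-address
# code, inventing a fresh virtual register t1, t2, ... for each result.

def gen(tree) -> list[tuple]:
    code: list[tuple] = []
    counter = 0

    def walk(node) -> str:
        nonlocal counter
        if isinstance(node, str):          # leaf: constant or variable
            return node
        op, lhs, rhs = node
        a, b = walk(lhs), walk(rhs)
        counter += 1
        reg = f"t{counter}"
        code.append((op, a, b, reg))       # three-address tuple: reg = a op b
        return reg

    walk(tree)
    return code

for quad in gen(("*", "5", ("-", "7", "y"))):
    print(quad)
# ('-', '7', 'y', 't1')
# ('*', '5', 't1', 't2')
```

Register allocation, the later machine-dependent stage described above, then maps t1, t2, ... onto the finite set of real registers.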

SECTION 7. Translation, compilation and interpretation

A program is a sequence of instructions intended for execution by a computer. Nowadays programs take the form of text written to files. This text is the product of the programmer's work and, despite the specifics of the formal language, remains readable text for the programmer.

The process of creating a program involves several stages. The design stage is followed by the programming stage, at which the program text is written. For programmers this text is easier to understand than binary code, because its mnemonics and names carry additional information.

The file with the source text of the program (also called the source module) is processed by a translator, which converts the program from the programming language into a machine-readable sequence of codes.

A translator is a program or technical device that performs translation of a program: a program that translates from one language to another and, in particular, from one programming language to another; a processing program designed to transform a source program into an object module.

Translators are implemented as compilers or interpreters. In how they carry out their work, a compiler and an interpreter differ significantly.

A compiler (from the English compiler: one who compiles, assembles) is a translator that converts a program written in the source language into an object module; a program that translates program text in a high-level language into an equivalent machine-language program.

· A program designed to translate a high-level language into absolute code or, sometimes, into assembly language. The input to the compiler (the source code) is a description of an algorithm or a program in a problem-oriented language; the output is an equivalent description of the algorithm in a machine-oriented language (object code).

Compilation is the translation of a program written in the source language into an object module. It is carried out by the compiler.

To compile is to translate a program from a problem-oriented language into a machine-oriented language.

The compiler reads the entire program entirely, translates it and creates a complete version of the program in machine language, which is then executed.

An interpreter (from the English interpreter: one who interprets) translates and executes a program line by line. The interpreter takes the next statement from the program text, analyzes its structure and then immediately executes it (usually, after analysis, the statement is translated into some intermediate representation or even machine code for more efficient further execution). Only after the current statement has been successfully executed does the interpreter move on to the next one. Moreover, if the same statement is executed many times, the interpreter processes it each time as if it were encountered for the first time. As a result, programs that perform large volumes of computation will run slowly. In addition, to run the program on another computer an interpreter must be present there too; without it, the text is just a set of characters.



Put another way, the interpreter simulates a computational virtual machine whose basic instructions are not the elementary processor commands but the statements of the programming language.

Differences between compilation and interpretation.

1. Once a program is compiled, neither the source program nor the compiler is needed any more. At the same time, a program processed by an interpreter must be translated into machine language anew each time it is launched.

2. Compiled programs run faster, but interpreted ones are easier to fix and change.

3. Each particular language is oriented either toward compilation or toward interpretation, depending on the purpose for which it was created. For example, Pascal is usually used to solve rather complex problems in which program speed matters, so it is usually implemented with a compiler.

BASIC, on the other hand, was created as a language for novice programmers, for whom line-by-line execution of a program has undeniable advantages.

Almost all low-level and third-generation programming languages, such as assembly, C, or Modula-2, are compiled, while higher-level languages, such as Python or SQL, are interpreted.

Sometimes both a compiler and an interpreter exist for the same language. In this case you can use the interpreter to develop and test the program, and then compile the debugged program to improve its execution speed. There is an interpenetration of translation and interpretation: interpreters can be compiling ones (including with dynamic compilation), and translators may require interpretation for metaprogramming constructs (for example, macros in assembly language, conditional compilation in C, or templates in C++).

4. Translation and interpretation are different processes: translation converts programs from one language into another, while interpretation is responsible for executing programs. However, since the purpose of translation is usually to prepare a program for interpretation, the two are usually considered together.

Conclusion: the drawback of a compiler is the laboriousness of translating programming languages oriented toward processing data of complex structure, often unknown in advance or changing dynamically while the program runs. In that case many additional checks have to be inserted, the availability of operating system resources analyzed, resources seized and released dynamically, and complex objects formed and processed in computer memory, which is quite difficult to implement at the level of hard-coded machine instructions and nearly impossible for some tasks.

With the help of an interpreter, on the contrary, it is possible to stop the program at any time, examine the contents of memory, organize a dialogue with the user, perform arbitrarily complex transformations, and at the same time constantly monitor the state of the surrounding software and hardware environment, thereby achieving high reliability of operation. When executing each statement, the interpreter checks many characteristics of the operating system and, if necessary, informs the developer in as much detail as possible about emerging problems. In addition, the interpreter is very convenient to use as a tool for learning programming, as it allows you to understand the principles of operation of any individual operator in the language.


The compilation process is divided into several stages:

1. Preprocessing. The source text is processed: macros are expanded and header files included.

2. Lexical and syntax analysis. The program is converted into a chain of tokens and then into an internal tree representation.

3. Global optimization. The internal representation of the program is repeatedly transformed to reduce the program's size and execution time.

4. Code generation. The internal representation is converted into blocks of processor instructions, which are turned into assembly text or object code.

5. Assembly. If assembly text was generated, it is assembled to obtain object code.

6. Linking. The linker combines multiple object files into an executable file or library.

In the lexical analysis (LA) phase, the input program, which is a stream of characters, is divided into lexemes: words formed in accordance with the definitions of the language. The main formalisms underlying the implementation of lexical analyzers are finite state machines and regular expressions. The lexical analyzer can operate in two main modes: either as a subroutine called by the parser for each next lexeme, or as a complete pass whose result is a file of lexemes. While extracting lexemes, the LA may either build the tables of names and constants itself and hand over the value of each lexeme on request, or leave the name table to be built in later phases (for example, during parsing).

At the LA stage some (simple) errors are detected: invalid characters, incorrectly written numbers, identifiers, etc.

Let's take a closer look at the stage of lexical analysis.

The main task of lexical analysis is to split the input text, which consists of a sequence of single characters, into a sequence of words, or lexemes, i.e. to extract these words from the continuous character sequence. From this point of view, all characters of the input sequence are divided into those belonging to some lexeme and those separating lexemes (delimiters). In some cases there may be no delimiters between lexemes. On the other hand, in some languages lexemes may contain insignificant characters (for example, the space character in Fortran). In C, the separating role of a delimiter character can be suppressed ("\" at the end of a line inside a string "...").

Typically, all lexemes are divided into classes. Examples of such classes are numbers (integer, octal, hexadecimal, real, etc.), identifiers, and strings. Keywords and punctuation symbols (sometimes called delimiter characters) are singled out separately. Typically, keywords form a finite subset of the identifiers. In some languages (for example, PL/1) the meaning of a lexeme may depend on its context, and it is impossible to carry out lexical analysis in isolation from syntactic analysis.

From the point of view of the subsequent phases of analysis, the lexical analyzer produces two kinds of information: for the syntactic analyzer, which runs after the lexical one, what matters is the sequence of lexeme classes, delimiters and keywords; for contextual analysis, which runs after the syntactic one, what matters is the specific values of individual lexemes (identifiers, numbers, etc.).

Thus, the general scheme of operation of the lexical analyzer is as follows. First, a single lexeme is extracted (possibly with the help of delimiter characters). Keywords are recognized either by explicit extraction directly from the text, or by first extracting an identifier and then checking whether it belongs to the set of keywords.

If the extracted lexeme is a delimiter, then it (more precisely, some attribute of it) is issued as the result of lexical analysis. If the extracted lexeme is a keyword, the attribute of the corresponding keyword is issued. If the extracted lexeme is an identifier, the identifier attribute is issued and the identifier itself is stored separately. Finally, if the extracted lexeme belongs to any of the other lexeme classes (for example, it is a number, a string, etc.), then the attribute of the corresponding class is issued and the value of the lexeme is stored separately.
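The scheme described above can be sketched as a small scanner. Here keywords are recognized by first extracting an identifier and then checking membership in a keyword set; the token classes, the keyword set and the delimiter set are illustrative assumptions, not taken from any particular language.

```python
# Minimal sketch of the token-extraction scheme: each call appends a pair
# (class attribute, lexeme value). Classes and character sets are illustrative.
KEYWORDS = {"if", "then", "else", "while"}
DELIMITERS = "+-*/()=;"

def tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():                      # separator between lexemes
            i += 1
        elif ch.isalpha():                    # identifier or keyword
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            word = text[i:j]
            # keyword check after extracting an identifier
            cls = "KEYWORD" if word in KEYWORDS else "IDENT"
            tokens.append((cls, word))
            i = j
        elif ch.isdigit():                    # number lexeme
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("NUMBER", text[i:j]))
            i = j
        elif ch in DELIMITERS:                # delimiter: its attribute is itself
            tokens.append(("DELIM", ch))
            i += 1
        else:                                 # a simple LA-level error
            raise ValueError(f"invalid character {ch!r}")
    return tokens
```

For example, `tokenize("if x1 then y = 42;")` yields the keyword `if`, the identifiers `x1` and `y`, the delimiters `=` and `;`, and the number `42`, each paired with its class attribute.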

The lexical analyzer can be either an independent translation phase or a subroutine that works on the "give me a token" principle. In the first case (Fig. 3.1, a) the output of the analyzer is a file of lexemes; in the second (Fig. 3.1, b) a lexeme is issued on each call to the analyzer (in this case, as a rule, the lexeme class attribute is returned as the result of the "lexical analyzer" function, and the value of the lexeme is passed through a global variable). In terms of processing lexeme values, the analyzer can either simply issue the value of each lexeme, in which case the construction of object tables (of identifiers, strings, numbers, etc.) is deferred to later phases, or it can construct the object tables itself. In the latter case, the value of a lexeme is a pointer to the entry in the corresponding table.

Fig. 3.1

The operation of the lexical analyzer is specified by some finite state machine. However, a direct description of a finite state machine is inconvenient from a practical point of view. Therefore, to specify a lexical analyzer, as a rule, either a regular expression or a right-linear grammar is used. All three formalisms (finite state machines, regular expressions and right-linear grammars) have the same expressive power. In particular, from a regular expression or a right-linear grammar one can construct a finite state machine recognizing the same language.
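In practice this often means specifying the analyzer as a list of regular expressions, one per lexeme class, and letting a library compile their alternation into a matching automaton. A minimal sketch using Python's `re` module (the token classes are illustrative):

```python
import re

# One regular expression per lexeme class; re compiles the alternation
# into an automaton that does the recognition for us.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/()=;]"),
    ("SKIP",   r"\s+"),          # whitespace separators, discarded
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def scan(text):
    """Yield (class, value) pairs, working in 'give me a token' style."""
    for m in MASTER_RE.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())
```

Because `scan` is a generator, it naturally supports the subroutine mode of operation: the parser pulls one token per call instead of reading a whole token file.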

The main task of parsing is the analysis of the program's structure. As a rule, the structure is understood as a tree corresponding to a derivation in the context-free grammar of the language. Currently, either LL(1) analysis (and its variant, recursive descent) or LR(1) analysis and its variants (LR(0), SLR(1), LALR(1) and others) are most often used. Recursive descent is more often used when a parser is programmed by hand, LR(1) when parser-generator tools are used.
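The key idea of recursive descent is that each nonterminal of the grammar is recognized by a procedure of its own. A sketch for a tiny illustrative expression grammar, parsing a token list into a syntax tree of nested tuples (the grammar and tree format are assumptions for the example):

```python
# Recursive descent: one procedure per nonterminal of the grammar
#   expr   -> term   { ("+" | "-") term }
#   term   -> factor { ("*" | "/") factor }
#   factor -> NUMBER | "(" expr ")"
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def expr():                       # recognizes the nonterminal "expr"
        node = term()
        while peek() in ("+", "-"):
            node = (eat(), node, term())
        return node

    def term():                       # recognizes the nonterminal "term"
        node = factor()
        while peek() in ("*", "/"):
            node = (eat(), node, factor())
        return node

    def factor():                     # recognizes the nonterminal "factor"
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        return int(eat())             # NUMBER token

    tree = expr()
    if peek() is not None:
        raise SyntaxError("trailing input")
    return tree
```

Note how operator precedence falls out of the grammar itself: `parse(["2", "+", "3", "*", "4"])` builds the tree `("+", 2, ("*", 3, 4))`, with multiplication bound tighter because `term` sits below `expr`.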

The result of the parsing is a syntax tree with links to a table of names. The parsing process also reveals errors related to the program structure.

At the stage of contextual analysis, dependencies between parts of the program that cannot be described by context-free syntax are identified. These are mainly "declaration - use" relationships, in particular analysis of object types, analysis of scopes, correspondence of parameters, labels and others. In the process of contextual analysis a symbol table is built, which can be considered a table of names supplemented with information about the declarations (properties) of objects.

The main formalism used in contextual analysis is attribute grammars. The result of the contextual analysis phase is an attributed program tree. Information about objects can be either dispersed in the tree itself or concentrated in separate symbol tables. During contextual analysis, errors related to the incorrect use of objects can also be detected.
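A minimal sketch of one such declaration-use check: a symbol table is filled from the declarations, and every use is then looked up in it. The representation of declarations as (name, type) pairs is an assumption made for the example.

```python
# Declaration-use check over a symbol table mapping each declared name
# to its properties (here just an illustrative type string).
def check(decls, uses):
    symtab = {}
    for name, typ in decls:
        if name in symtab:                    # duplicate declaration
            raise TypeError(f"{name} redeclared")
        symtab[name] = typ
    for name in uses:
        if name not in symtab:                # use without declaration
            raise NameError(f"{name} used but not declared")
    return symtab
```

A real contextual analyzer would of course also handle nested scopes (a stack of such tables) and type compatibility, but the principle is the same.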

The program can then be translated into an internal representation. This is done for optimization purposes and/or for convenience of code generation. Another purpose of converting a program into an internal representation is the desire to have a portable compiler: then only the last phase (code generation) is machine-dependent. Prefix or postfix notation, a directed graph, triples, quadruples and others can be used as the internal representation.
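Two of the listed internal forms can be illustrated on the expression `b + c * d`, given as a syntax tree of nested `(op, left, right)` tuples (the tree and triple encodings are assumptions for the example):

```python
# Sketch: converting a syntax tree into postfix notation and into triples.
def to_postfix(node):
    if isinstance(node, tuple):
        op, left, right = node
        return to_postfix(left) + to_postfix(right) + [op]
    return [node]

def to_triples(node, triples):
    """Flatten the tree into (op, arg1, arg2) triples; an argument may
    refer to the result of an earlier triple by its index, written '(i)'."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    a1 = to_triples(left, triples)
    a2 = to_triples(right, triples)
    triples.append((op, a1, a2))
    return f"({len(triples) - 1})"
```

For the tree `("+", "b", ("*", "c", "d"))` the postfix form is `b c d * +`, and the triple form is `0: (*, c, d)` followed by `1: (+, b, (0))`, where `(0)` refers to the result of triple 0.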

There may be several optimization phases. Optimizations are usually divided into machine-dependent and machine-independent, local and global. Some machine-dependent optimization is performed during the code generation phase. Global optimization tries to take into account the structure of the entire program, local optimization only small fragments of it. Global optimization is based on global flow analysis, which is performed on the program graph and essentially represents a transformation of this graph. It may involve interprocedural analysis, intermodular analysis, live-variable analysis, and so on.
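One of the simplest machine-independent optimizations is constant folding: an operation whose operands are known at compile time is evaluated by the compiler instead of at run time. A sketch over the nested-tuple tree representation (an assumption for the example):

```python
import operator

# Constant folding: recursively replace (op, const, const) subtrees
# with their computed value.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)   # evaluate now, not at run time
    return (op, left, right)
```

For example, `fold(("+", "x", ("*", 2, 3)))` reduces the subtree `2 * 3` to `6`, leaving `("+", "x", 6)`; only the multiplication disappears because `x` is not known at compile time.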

Finally, code generation is the last phase of translation. Its result is either an assembler module or an object (or load) module. During code generation some local optimizations may be performed, such as register allocation, choosing between long and short branches, and taking instruction cost into account when choosing a particular instruction sequence. Various techniques have been developed for code generation, such as decision tables, pattern matching (including dynamic programming), and various syntactic methods.
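Among the syntactic techniques, the simplest is a postfix-order walk of the syntax tree that emits one instruction per node. A minimal sketch targeting a hypothetical stack machine (the instruction set and tree format are assumptions for the example):

```python
# Syntax-directed code generation for a hypothetical stack machine:
# leaves emit PUSH (constants) or LOAD (variables), interior nodes emit
# the arithmetic instruction after both operands are on the stack.
def gen(node, code):
    if isinstance(node, tuple):
        op, left, right = node
        gen(left, code)
        gen(right, code)
        code.append({"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}[op])
    elif isinstance(node, int):
        code.append(f"PUSH {node}")
    else:
        code.append(f"LOAD {node}")
```

For the tree `("+", "b", ("*", "c", 2))` this emits `LOAD b`, `LOAD c`, `PUSH 2`, `MUL`, `ADD`, i.e. exactly the postfix order of the expression.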

Of course, certain phases of the translator may be absent altogether or combined. In the simplest case of a one-pass translator there is no explicit phase of generating an intermediate representation and no optimization; the remaining phases are merged into one, and no syntax tree is explicitly constructed.