
A translator is a program that performs translation: it converts a program written in a high-level language into a program consisting of machine instructions. The translator usually also diagnoses errors, builds identifier dictionaries, produces program listings, and so on. The language in which the input program is written is called the source language, and the program itself the source code. The output language is called the target language or object code.

In general, the concept of translation applies not only to programming languages but also to other languages - both formal computer languages (such as markup languages like HTML) and natural ones (Russian, English, etc.).

Types of translators

    Dialog. Provides the use of a programming language in time-sharing mode.

    Syntax-oriented (syntax-driven). Receives as input a description of the syntax and semantics of a language together with a text in that language, and translates the text in accordance with the given description.

    Single-pass. Forms an object module in a single sequential pass over the source program.

    Multi-pass. Forms an object module over several passes over the source program.

    Optimizing. Performs code optimization in the generated object module.

    Test. A set of assembly-language macros that allow various debugging procedures to be set up in programs written in assembly language.

    Address. A functional device that converts a virtual address into a real memory address.

    Reverse (detranslator). Given a program in machine code, produces an equivalent program in some programming language (see: disassembler, decompiler).

Translators are implemented as compilers or interpreters, which differ significantly in how they carry out their work.

A compiler reads the entire program, translates it, and creates a complete machine-language version of the program, which is then executed. The input to the compiler (the source code) is a description of the algorithm or program in a problem-oriented language; the output is an equivalent description of the algorithm in a machine-oriented language (the object code).

Types of compilers

    Vectorizing. Translates source code into machine code for computers equipped with a vector processor.

    Flexible. Designed in a modular manner, driven by tables and programmed in a high-level language or implemented using a compiler of compilers.

    Dialog. See: dialogue translator.

    Incremental. Retranslates program fragments and additions to the program without recompiling the entire program.

    Interpretive (step-by-step). Sequentially performs independent compilation of each individual statement (command) of the source program.

    Compiler of compilers. A translator that accepts a formal description of a programming language and generates a compiler for this language.

    Debug. Eliminates certain types of syntax errors.

    Resident. Permanently resides in random access memory and is available for reuse by many tasks.

    Self-compiling. Written in the same language from which it translates.

    Universal. Based on a formal description of the syntax and semantics of the input language. The components of such a compiler are a kernel and syntactic and semantic loaders.

More broadly, translators are implemented as compilers, interpreters, preprocessors and emulators; the compiler and the interpreter differ significantly in how they do their work.

A compiler reads the entire program, translates it, and creates a complete machine-language version: a binary file containing a list of machine instructions. The binary file (it may be an executable, a library or an object file) is then executed by the operating system without the compiler taking part.

An interpreter, by contrast, translates and executes the source program statement by statement. Once a program has been compiled, neither the source program nor the compiler is needed any more; a program processed by an interpreter, however, must be re-translated into machine language every time it is launched.

Compiled programs run faster, but interpreted ones are easier to fix and change.

Each specific language is oriented either towards compilation or interpretation - depending on the purpose for which it was created. For example, Pascal is usually used to solve rather complex problems in which program speed is important. Therefore, this language is usually implemented using a compiler.

On the other hand, BASIC was created as a language for novice programmers, for whom line-by-line execution of a program has undeniable advantages.

Sometimes there is both a compiler and an interpreter for the same language. In this case, you can use an interpreter to develop and test the program, and then compile the debugged program to improve its execution speed.

A preprocessor is a translator from one programming language into another that neither creates an executable file nor executes the program.

Preprocessors are convenient for extending the capabilities of a language and making programming more convenient: at the stage of writing a program, a dialect of the language friendlier to the programmer is used, and the preprocessor translates it into the text of a standard programming language, which a standard compiler can then compile.
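
As a minimal illustration of this idea, the C preprocessor rewrites an extended dialect (macros, file inclusion) into plain C before the compiler proper runs. The REPEAT macro below is invented for the example, not part of any standard library:

#include <stdio.h>

/* A made-up "dialect extension": a counted loop that the
   preprocessor expands into standard C before compilation. */
#define REPEAT(var, n) for (int var = 0; var < (n); ++var)

int main(void)
{
    REPEAT(i, 3) {          /* expands to: for (int i = 0; i < 3; ++i) */
        printf("pass %d\n", i);
    }
    return 0;
}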

An emulator is software and/or hardware that operates on a certain target operating system and hardware platform and is designed to execute programs produced for a different operating system or different hardware, while allowing the same operations to be performed in the target environment as in the simulated system.

Emulated languages include systems such as Java, .NET and Mono, in which, when a program is built, it is compiled into a special bytecode and a binary file is obtained that is suitable for execution in any operating and hardware environment; the bytecode is then executed on the target machine by a simple and fast interpreter (virtual machine).

A reassembler or disassembler is a software tool for deciphering binary code and presenting it as assembly text or as text in another programming language. It allows the algorithm of the original program to be analyzed and the resulting text to be used for any necessary modification of the program - for example, changing the addresses of external devices or access to system and network resources - or to reveal hidden functions of the binary code (for example, a computer virus or other malicious program: a Trojan, a worm, a keylogger, etc.).


Translators

Since the text of a program written in Pascal is not understandable to a computer, it must be translated into machine language. This translation of a program from a programming language into machine-code language is called translation, and it is performed by special programs called translators.

There are three types of translators: interpreters, compilers and assemblers.

An interpreter is a translator that performs statement-by-statement (command-by-command) processing and execution of the source program.

A compiler converts (translates) the entire program into a machine-language module, after which the program is loaded into the computer's memory and only then executed.

Assemblers translate a program written in assembly language (autocode) into a program in machine language.

Any translator solves the following main tasks:

Analyzes the translated program, in particular determines whether it contains syntax errors;

Generates an output program (often called an object or working program) in a computer command language (in some cases, the translator generates an output program in an intermediate language, for example, assembly language);

Allocates memory for the output program (in the simplest case, this consists of assigning to each program fragment, variable, constant, array and other object its own memory address).

Introduction to .NET and C#

The programmer writes a program in a language that the programmer understands, and the computer only executes programs written in machine code language. The set of tools for writing, editing and converting a program into machine code and executing it is called the development environment.

The development environment contains:

    Text editor for entering and editing program text

    Compiler for translating a program into machine command language

    Tools for debugging and launching programs for execution

    Shared libraries with reusable software elements

    Help system, etc.

The .NET platform, developed by Microsoft, includes not only a multi-language development environment called Visual Studio .NET, but also many other tools, such as database support, e-mail support, and so on.

The most important tasks in the development of modern software are:

    Portability - ability to run on different types of computers

    Security – impossibility of unauthorized actions

    Reliability – failure-free operation under given conditions

    Using off-the-shelf components to speed up development

    Cross-language interoperation – the ability to use several programming languages together.

All these tasks are solved within the .NET platform.

To ensure portability, the platform's compilers translate programs not into machine code but into the intermediate language MSIL (Microsoft Intermediate Language), or simply IL. IL contains no commands that depend on the operating system or the type of computer. The IL program is executed by the CLR (Common Language Runtime), which is specific to each type of computer. The translation of the IL program into the machine code of the specific computer is performed by a JIT (Just In Time) compiler.

The program execution diagram on the .NET platform is shown in Fig. 1.

The compiler creates a program assembly - a file with the extension .exe or .dll that contains the IL code. Program execution is organized by the CLR environment, which checks the validity of operations, performs memory allocation and cleanup, and handles execution errors. This ensures the safety and reliability of programs.

The price for these advantages is a decrease in the performance of programs and the need to install .NET on the computer to execute ready-made programs.

So, .NET is a programming platform.

C# (C Sharp) is one of the programming languages of the .NET platform. It is included with Visual Studio - Visual Studio .NET (versions 2008, 2010, 2012). In addition to C#, Visual Studio .NET includes Visual Basic .NET and Visual C++.

One of Microsoft's reasons for developing a new language was to create a component-oriented language for the .NET Framework platform.

Fig.1 Program execution diagram in .NET

The .NET Framework consists of two parts:

    First, it includes a huge library of classes that can be called from C# programs. There are several thousand classes, which eliminates the need to write everything from scratch. Programming in C# therefore largely consists of writing your own code that calls classes stored in the .NET Framework as needed.

    Second, it includes the .NET runtime environment, which controls the launching and operation of ready-made programs.

The .NET platform is an open environment - third-party developers have created dozens of compilers for .NET for the languages ​​Ada, COBOL, Fortran, Lisp, Oberon, Perl, Python, etc.

The .NET platform is actively developing: new versions are released regularly. Using the Project > Properties menu you can find out which version of the .NET platform you are using.

In theory, a .NET program can run on any operating system on which .NET is installed. In practice, the only official platform is the Windows operating system, although unofficial .NET implementations exist for Unix-like systems such as Linux, Mac OS X and others (Mono is a free-software implementation of the .NET Framework).


Lecture: Software standards and licenses

UNIX family standards. C programming language standards. System V Interface Definition (SVID). POSIX committees. X/Open, OSF and the Open Group. Licenses for software and documentation.
Content

  • 3.1. UNIX family standards
    • C Programming Language Standards
    • System V Interface Definition (SVID)
    • POSIX committees
    • X/Open, OSF and Open Group
  • 3.2. Software and Documentation Licenses

3.1. UNIX family standards

The reason for the emergence of standards for the UNIX operating system was that it was ported to many hardware platforms. Its first versions ran on PDP hardware, but in 1976 and 1978 the system was ported to Interdata and VAX machines. From 1977 to 1981, two competing branches took shape: AT&T UNIX and BSD. The goals behind the standards probably differed: one was to legitimize the primacy of one's own version, the other to ensure portability of the system and of application programs between different hardware platforms. In this context one speaks of program mobility; this property applies both to program source code and to executable programs.

The following material is presented in chronological order of appearance of the standards.

C Programming Language Standards

This standard does not apply directly to UNIX. But since C is the base language both for this family and for other operating systems, we mention the standard of this programming language. It began with the publication in 1978 of the first edition of the book by B. Kernighan and D. Ritchie; this de facto standard is often called K&R. Its authors worked on UNIX together with Ken Thompson; moreover, the first of them suggested the name of the system, and the second invented the C language itself. The corresponding text is available on the Internet [ 45 ].

However, the industry standard for the C programming language was released in 1989 by ANSI and was named X3.159-1989. This is what is written about this standard [ 46 ]:

“The standard was adopted to improve the portability of programs written in the C language between different types of OS. Thus, in addition to the syntax and semantics of the C language, the standard included recommendations on the content of the standard library. Support for the ANSI C standard is indicated by the predefined symbolic name __STDC__.”

In 1988, the second edition of Kernighan and Ritchie's book about C was released, based on this standard. Note that companies producing software products for developing C programs may supply their own libraries and even slightly extend the set of language tools.

System V Interface Definition (SVID)

Another direction in the development of UNIX standards stems from the fact that not only enthusiasts thought about creating "standards". The main developers of the system, faced with many "variants", decided to publish their own documents. Thus arose the standards produced by USG, the organization that documented AT&T's versions of UNIX from the time that subsidiary was formed to work on the operating system. The first document appeared in 1984 and was based on SVR2; it was called the SVID (System V Interface Definition). A four-volume description was released after SVR4. These standards were supplemented by a set of test programs, the SVVS (System V Verification Suite). The main purpose of these tools was to let developers judge whether their system could lay claim to the name System V [ 14 ].

Note that the situation with the SVID standard is somewhat similar to that of the C language standard. The book published by the authors of a programming language is one standard, but not the only one. The C standard released later is the result of collective work, passed a stage of public discussion and, apparently, can claim a leading role in the list of standards. Likewise, SVVS is a set of tests that only judges whether a system is worthy of the name System V, just one version of UNIX; it does not take into account the operating system development experience of other manufacturers.

POSIX committees

Work on UNIX standards was begun by a group of enthusiasts in 1980. The goal formulated was to formally define the services that operating systems provide to applications. This software interface standard became the basis of the POSIX document (Portable Operating System Interface for Computing Environment) [ 14 ]. The first POSIX working group was formed in 1985 from the UNIX-oriented standards committee /usr/group, also called UniForum [ 47 ]. The name POSIX was proposed by GNU founder Richard Stallman.

Early versions of POSIX defined the set of system services necessary for the operation of application programs, described through an interface specified for the C language (the system call interface). The ideas contained in it were used by the ANSI (American National Standards Institute) committee when creating the C language standard mentioned earlier. The initial set of functions in the first versions was based on AT&T UNIX (version SVR4 [ 48 ]). Later, however, the POSIX specifications were decoupled from this particular OS. The approach of organizing a system around a set of basic system functions has been applied outside UNIX as well (for example, Microsoft's WinAPI).

In 1988, the 1003.1-1988 standard was published, defining the API (Application Programming Interface). Two years later a new version, IEEE 1003.1-1990, was adopted. It defined the general rules of the programming interface for both system calls and library functions. Further additions to it have been approved, defining services for real-time systems, POSIX threads, and so on. The POSIX 1003.2-1992 standard, defining the command interpreter and utilities, is also important.

There is a translation [ 1 ] of these two groups of documents, which are called POSIX.1 (application program interface) and POSIX.2 (command interpreter and utilities - the user interface). The translation contains three chapters: basic concepts, system services, and utilities. The "System services" chapter is divided into several parts, each grouping services with similar functions. For example, the seventh part of the "Basic I/O" section, devoted to directory operations, describes three functions (opendir, readdir and closedir). They are defined in four paragraphs: "Syntax", "Description", "Return value" and "Errors".

For those familiar with the algorithmic programming language C, here is an example of fragments of such a description.


In fact, this description gives an idea of ​​how the "System Call Interface" is specified. In the "Syntax" section about the readdir function, the following lines are given:

#include <sys/types.h>

#include <dirent.h>

struct dirent *readdir(DIR *dirp);

The second paragraph (“Description”) contains the following text:

"The types and data structures used in directory definitions are defined in the dirent.h file. The internal composition of directories is implementation-defined. When read using the readdir function, an object of type struct dirent is formed, containing as a field the character array d_name, which contains the character-terminated NUL local name file.

Readdir reads the current directory element and sets the position pointer to the next element. The open directory is specified by the dirp pointer. Elements containing empty names are skipped."

And here is what is given in the “Return value” paragraph:

"Readdir, upon successful completion, returns a pointer to an object of type struct dirent containing the directory element read. The read element can be stored in static memory and is overwritten by the next such call applied to the same open directory. Calling readdir for different open directories does not overlap the read information. In If an error occurs or the end of the file is reached, a null pointer is returned."

The "Standard Errors" section states the following:

"Readdir and closedir encountered an error. Dirp is not a pointer to an open directory."

This example shows how the services provided to an application are described. The requirement for the operating system (the implementation) is that it "...must support all required utilities, functions and header files, ensuring the behavior specified in the standard. The _POSIX_VERSION constant has the value 200112L [ 49 ]".
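
As a hedged sketch of how this interface is used in practice (a minimal program of our own, not part of the standard's text), the following C code lists the names in a directory with opendir, readdir and closedir as described above:

#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main(int argc, char *argv[])
{
    const char *path = (argc > 1) ? argv[1] : ".";

    DIR *dirp = opendir(path);              /* open the directory stream */
    if (dirp == NULL) {
        perror("opendir");
        return 1;
    }

    struct dirent *entry;
    while ((entry = readdir(dirp)) != NULL) /* NULL means end or error   */
        printf("%s\n", entry->d_name);      /* NUL-terminated local name */

    closedir(dirp);
    return 0;
}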

In the world of computer technology there is a phrase: "POSIX programming". It can be learned from various tutorials on UNIX system programming and operating systems (for example, [ 5 ]), and there is a separate book with this title [ 3 ]. Note that the preface to this book says that it describes "... a threefold standard ...", since it is based on the latest version, POSIX 2003, which rests on three documents: IEEE Std 1003.1, the Open Group technical standard, and ISO/IEC 9945.

How can one verify that a particular system complies with the POSIX standard? Formalizing this question is not as simple as it seems at first glance. Modern versions offer four types of compliance (four semantic meanings of the word "compliance"): full, international, national and extended.

The documents under consideration provide lists of two types of interface tools: mandatory (if possible, it is assumed to be compact) and optional. The latter must either be processed in the prescribed manner or return a fixed ENOSYS code value indicating that the function is not implemented.

Note that the POSIX document set has been changing for many years. But the developers of new versions always try to maintain continuity with previous versions as much as possible. Something new may appear in more recent editions. For example, the 2004 document combined four parts [ 50 ]:

  • Base Definitions volume (XBD) – definition of terms, concepts and interfaces common to all volumes of this standard;
  • System Interfaces volume (XSH) – system level interfaces and their binding to the C language, which describes the mandatory interfaces between application programs and the operating system, in particular – system call specifications;
  • Shell and Utilities volume (XCU) – definition of standard command interpreter interfaces (the so-called POSIX shell), as well as the basic functionality of Unix utilities;
  • Rationale (Informative) volume (XRAT) – additional, including historical, information about the standard.

Like the first editions, the main part of the document describes the groups of services provided. Each element is described there in the following paragraphs: NAME, SYNOPSIS, DESCRIPTION, RETURN VALUE, ERRORS and, finally, EXAMPLES.

Modern versions of the standard define the requirements for both the operating system and application programs. Let's give an example [ 51 ].

The readdir() function must return a pointer to a structure corresponding to the next directory element. Whether directory elements named "dot" and "dot-dot" are returned is not specified by the standard. Four outcomes are thus possible in this example, and the requirement on the application program is that it must be prepared for any of them.

And in conclusion, we present an excerpt from the course of lectures by Sukhomlinov (“INTRODUCTION TO THE ANALYSIS OF INFORMATION TECHNOLOGIES”, Sukhomlinov V.A. Part V. Methodology and system of POSIX OSE standards), dedicated to the scope of applicability of the standards [ 52 ]:

"The scope of applicability of the POSIX OSE (Open System Environment) standards is to provide the following capabilities (also called openness properties) for developed information systems:

  • Application Portability at the Source Code Level, i.e. providing the ability to transfer programs and data presented in the source code of programming languages ​​from one platform to another.
  • System Interoperability, i.e. supporting interconnectivity between systems.
  • User Portability, i.e. providing the ability for users to work on different platforms without retraining.
  • Adaptability to new standards (Accommodation of Standards) related to achieving the goals of open systems.
  • Adaptability to new information technologies (Accommodation of new System Technology) based on the universality of the classification structure of services and the independence of the model from implementation mechanisms.
  • Scalability of application platforms (Application Platform Scalability), reflecting the ability to transfer and reuse application software in relation to different types and configurations of application platforms.
  • Scalability of distributed systems (Distributed System Scalability), reflecting the ability of application software to function regardless of the development of the topology and resources of distributed systems.
  • Implementation Transparency, i.e. hiding the features of their implementation from users behind system interfaces.
  • Systematic and accurate specifications of user functional requirements (User Functional Requirements), which ensures completeness and clarity in determining user needs, including in determining the composition of applicable standards."

This allows you to solve the following problems:

  • integration of information systems from components from various manufacturers;
  • efficiency of implementations and developments, thanks to the accuracy of specifications and compliance with standard solutions that reflect the advanced scientific and technical level;
  • efficiency of application software transfer, thanks to the use of standardized interfaces and transparency of mechanisms for implementing system services.

The standards also formally define the following important operating system concepts: user, file, process, terminal, host, network node, time, and the linguistic and cultural environment. Explicit definitions as such are not given; instead, the operations applicable to these concepts and their inherent attributes are introduced.

In total, there are more than three dozen elements in the list of POSIX standards. Their names traditionally begin with the letter "P", followed by a four-digit number with additional symbols.

There are also group names for the standards: POSIX1, POSIX2, etc. For example, POSIX1 is associated with standards for basic OS interfaces (P1003.1x, where x is either empty or a character from a to g; thus, there are 7 documents in this group), and POSIX3 with testing methods (two documents - P2003 and P2003n).

Each computer has its own language - a command language or machine language - and can execute only programs written in that language. In principle, any algorithm can be described in machine language, but the programming costs would be extremely high. This is because machine language can describe and process only primitive data structures: bits, bytes, words. Programming in machine code requires excessive detail in the program and is accessible only to programmers with a good knowledge of the computer's structure and operation. High-level languages (Fortran, PL/1, Pascal, C, Ada, etc.), with developed data structures and processing facilities independent of any particular computer's language, made it possible to overcome this difficulty.

High-level algorithmic languages allow the programmer to describe algorithms for many applied problems quite simply and conveniently. Such a description is called the source program, and the high-level language is the input language.

A language processor is a machine-language program that allows the computer to understand and execute programs in the input language. There are two main types of language processors: interpreters and translators.

An interpreter is a program that accepts a program in the input language and, as constructs of the input language are recognized, executes them, producing the results of the computations prescribed by the source program.

A translator is a program that accepts a source program as input and generates at its output a functionally equivalent program called the object program. The object program is written in an object language. In the particular case where the object language is machine language, the program produced by the translator can be executed immediately on the computer; the computer is then an interpreter of the object program in machine code. In general, however, the object language need not be machine language or anything close to it (autocode). The object language may also be an intermediate language - a language lying between the input language and machine language.

If an intermediate language is used as an object language, then two options for constructing a translator are possible.

The first option is that for the intermediate language there is (or is being developed) another translator from the intermediate language to the machine language, and it is used as the last block of the designed translator.

The second option for building a translator using an intermediate language is to build an interpreter for the intermediate-language commands and use it as the final block of the translator. The advantage of interpreters shows in debugging and interactive translators, which let the user work in an interactive mode, even making changes to the program without fully re-translating it.

Interpreters are also used in program emulation - executing, on a host machine, programs compiled for a different (object) machine. This option is used, in particular, when debugging programs on a general-purpose computer that will later run on a specialized computer.

A translator whose input language is close to machine language (autocode or assembly language) is traditionally called an assembler. A translator from a high-level language is called a compiler.

Significant progress has been made in compiler construction in recent years. The first compilers used so-called direct translation methods - predominantly heuristic methods in which, starting from a general idea, a separate algorithm for translation into a machine equivalent was developed for each language construct. These methods were slow and unstructured.

The design methodology of modern compilers is based on the compositional, syntax-directed method of language processing. It is compositional in the sense that the conversion of a source program into an object program is implemented as a composition of functionally independent mappings with explicitly identified input and output data structures. These mappings are constructed by viewing the source program as a composition of the main aspects (levels) of the input language description - vocabulary, syntax, semantics and pragmatics - and by extracting these aspects from the source program during its compilation. Let us consider these aspects in order to obtain a simplified model of a compiler.

The basis of any natural or artificial language is its alphabet - the set of elementary characters allowed in the language (letters, digits and service characters). Characters can be combined into words - elementary constructions of the language, treated in a text (program) as indivisible symbols having a definite meaning.


A word can also consist of a single character. For example, in Pascal the words are identifiers, keywords, constants and delimiters - in particular, arithmetic and logical operators, parentheses, commas and other symbols. The words of a language, together with a description of the ways they are represented, constitute its vocabulary.

The words of a language are combined into more complex constructions - sentences. In programming languages the simplest sentence is a statement. Sentences are built from words and simpler sentences according to the rules of syntax. The syntax of a language is a description of its correct sentences. The description of the meaning of sentences, i.e. of the meanings of words and their internal connections, is the semantics of the language. In addition, a specific program exerts a certain influence on the translator - its pragmatics. Taken together, the syntax, semantics and pragmatics of a language form its semiotics.

Translating a program from one language to another consists, in general, of changing the alphabet, vocabulary and syntax of the program's language while preserving its semantics. The process of translating a source program into an object program is usually divided into several independent subprocesses (translation phases), implemented by corresponding blocks of the translator. It is convenient to regard lexical analysis, parsing, semantic analysis and object program synthesis as the main phases of translation. However, in many real compilers these phases are broken into several subphases, and there may be other phases as well (for example, object code optimization). Fig. 1.1 shows a simplified functional model of the translator.

According to this model, the input program first undergoes lexical processing. The purpose of lexical analysis is to translate the source program into the internal language of the compiler, in which keywords, identifiers, labels and constants are reduced to a uniform format and replaced by conditional codes - numeric or symbolic - called descriptors. Each descriptor consists of two parts: the class (type) of the token and a pointer to the memory address where information about the specific token is stored. This information is usually organized in tables. Simultaneously with the translation of the source program into the internal language, lexical control is performed at this stage - detecting words not permitted in the program.
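
A hedged sketch of such a descriptor in C (the set of token classes and the table layout are assumptions made for illustration, not taken from any particular compiler):

/* Token classes distinguished by the lexical analyzer. */
enum token_class {
    TOK_KEYWORD,
    TOK_IDENTIFIER,
    TOK_LABEL,
    TOK_CONSTANT,
    TOK_DELIMITER
};

/* A descriptor: the token's class plus a reference (here an index)
   into the table holding the details of this particular token. */
struct descriptor {
    enum token_class cls;    /* class (type) of the token        */
    int              index;  /* entry in the corresponding table */
};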

The parser takes the output of the lexical analyzer and translates the sequence of token descriptors into the form of an intermediate program. The intermediate program is essentially a representation of the program's syntax tree, which reflects the structure of the source program, i.e. the order of and connections between its statements. During the construction of the syntax tree, syntactic control is performed - detecting syntax errors in the program.

The actual output of the parser may be the sequence of commands needed to build the intermediate representation, to access the reference tables, and to issue diagnostic messages when required.
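
Continuing the sketch above (again with invented names), a syntax-tree node produced by the parser might be represented in C as:

/* A node of the program's syntax tree. */
struct ast_node {
    struct descriptor token;        /* descriptor from the lexical analyzer */
    int               child_count;  /* number of subtrees                   */
    struct ast_node  *children[4];  /* operand/operator subtrees, NULL-padded */
};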

Fig. 1.1. Simplified functional model of the translator

Synthesis of the object program begins, as a rule, with the distribution and allocation of memory for the main program objects. Each sentence of the source program is then examined, and semantically equivalent sentences of the object language are generated. The input information here is the syntax tree of the program and the output tables of the lexical analyzer - the identifier table, the constant table and others. Analysis of the tree determines the sequence of object-program commands to be generated, and the identifier table determines which kinds of commands are valid for the operand values in the generated commands (for example, whether fixed-point or floating-point commands should be generated, etc.).

The actual generation of the object program is often preceded by semantic analysis, which includes various kinds of semantic processing. One kind is checking the semantic conventions of the program. Examples of such conventions: each identifier in the program is described exactly once, every variable is defined before it is used, and so on. Semantic analysis can also be performed in later phases of translation, for example in the program optimization phase, which may also be included in the translator. The goal of optimization is to reduce the time or memory resources required to execute the object program.

These are the main aspects of the process of translation from high-level languages. The organization of the individual translation phases and the practical methods of their mathematical description are discussed in more detail below.


Implementations

The purpose of translation is to convert a text from one language into another language that is understandable to the recipient of the text. In the case of translator programs, the recipient is a technical device (a processor) or an interpreter program.

There are a number of other examples in which the architecture of a computer series was based on, or strongly depended on, some model of program structure. Thus, the GE/Honeywell Multics series was based on a semantic model of the execution of programs written in PL/1. The Burroughs B5500, B6700 ... B7800 series was based on a run-time model of programs written in extended ALGOL. ...

The i432 processor, like these earlier architectures, is also based on a semantic model of program structure. However, unlike its predecessors, the i432 is not based on a model of a specific programming language. Instead, the developers' main goal was to provide direct run-time support both for abstract data (that is, programming with abstract data types) and for domain-specific operating systems. ...

The advantage of the compiler: the program is compiled once and no additional transformations are required each time it is executed. Accordingly, a compiler is not required on the target machine for which the program is compiled. Disadvantage: A separate compilation step slows down writing and debugging and makes it difficult to run small, simple, or one-off programs.

If the source language is an assembly language (a low-level language close to machine language), then its compiler is called an assembler.

The opposite implementation method executes the program with an interpreter, with no translation at all. The interpreter software models a machine whose fetch-execute cycle operates on instructions of a high-level language rather than on machine instructions. This software simulation creates a virtual machine that implements the language. This approach is called pure interpretation. Pure interpretation is usually applied to languages with a simple structure (for example, APL or Lisp). Command-line interpreters, which process commands in scripts in UNIX or in batch files (.bat) in MS-DOS, also usually work in pure interpretation mode.
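
A minimal sketch of such a fetch-execute cycle in C, for an invented three-instruction stack machine (in a real pure interpreter the instructions would come from analyzed source text; here they are pre-built in an array to keep the example short):

#include <stdio.h>

/* An invented instruction set for a tiny stack machine. */
enum op { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

struct insn { enum op op; int arg; };

/* The fetch-execute cycle modelled in software: fetch the next
   instruction, decode it, execute it, advance to the next one. */
static void run(const struct insn *code)
{
    int stack[64];
    int sp = 0;
    for (int pc = 0; ; pc++) {
        switch (code[pc].op) {
        case OP_PUSH:  stack[sp++] = code[pc].arg;       break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_PRINT: printf("%d\n", stack[sp - 1]);    break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    /* The expression "2 + 3" as a program for the virtual machine. */
    struct insn prog[] = {
        { OP_PUSH, 2 }, { OP_PUSH, 3 }, { OP_ADD, 0 },
        { OP_PRINT, 0 }, { OP_HALT, 0 }
    };
    run(prog);   /* prints 5 */
    return 0;
}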

The advantage of a pure interpreter: the absence of intermediate translation steps simplifies the implementation of the interpreter and makes it more convenient to use, including in dialog mode. The disadvantage: an interpreter must be present on the target machine where the program is to be executed. The property of a pure interpreter that errors in the interpreted program are detected only when an attempt is made to execute the erroneous command (or line) can be considered both a disadvantage and an advantage.

There are compromises between compilation and pure interpretation in the implementation of programming languages: before executing the program, the interpreter translates it into an intermediate language (for example, bytecode or p-code) more convenient for interpretation - that is, an interpreter with a built-in translator. This method is called mixed implementation; an example of a mixed language implementation is Perl. The approach combines advantages of both compiler and interpreter (greater execution speed and ease of use) as well as disadvantages (additional resources are required to translate and store the program in the intermediate language, and an interpreter must be present on the target machine to execute the program). As with a compiler, a mixed implementation requires that the source code be free of errors (lexical, syntactic and semantic) before execution.

With the growth of computer resources and the expansion of heterogeneous networks (including the Internet) connecting computers of different types and architectures, a new kind of interpretation emerged, in which the source (or intermediate) code is compiled into machine code directly at run time, "on the fly". Already-compiled sections of code are cached, so that when control reaches them again they execute immediately, without recompilation. This approach is called dynamic compilation.

The advantage of dynamic compilation is that the speed of program interpretation becomes comparable to the speed of program execution in conventional compiled languages, while the program itself is stored and distributed in a single form, independent of target platforms. The disadvantage is greater implementation complexity and greater resource requirements than in the case of simple compilers or pure interpreters.

This method works well for…


Introduction

1.1 Top-down analysis

1.2 Bottom-up analysis

1.2.1 LR(k) - grammars

1.2.1.1 LR(0) - grammars

1.2.2 LALR(1) - grammars

2. Translator development

2.1 Requirements analysis

2.2 Design

2.2.1 Designing a lexical analyzer

2.2.4 Software implementation of the parser

2.3 Coding

2.4 Testing

Conclusion

List of sources used

Appendix A. Listing of the translator program text

Appendix B. Test results

Appendix B. Translator program diagram

Introduction

Long gone are the days when, before writing a program, one had to understand and memorize dozens of machine instructions. A modern programmer formulates tasks in high-level programming languages and resorts to assembly language only in exceptional cases. As is known, algorithmic languages become available to the programmer only once translators for them have been created.

Programming languages ​​are quite different from each other in purpose, structure, semantic complexity, and implementation methods. This imposes its own specific features on the development of specific translators.

Programming languages are tools for solving problems in different subject areas, which determines the specifics of their organization and their differences in purpose. Examples include Fortran, oriented toward scientific calculations; C, intended for system programming; Prolog, which efficiently describes inference problems; and Lisp, used for recursive list processing. These examples could be continued. Each subject area places its own requirements on the organization of a language. Hence the variety of forms for representing operators and expressions, the differences in the sets of basic operations, and the drop in programming efficiency when solving problems outside the language's subject area. Linguistic differences are also reflected in the structure of translators. Lisp and Prolog are most often executed in interpretation mode, because they generate data types dynamically during computation. Fortran translators are characterized by aggressive optimization of the resulting machine code, made possible by the relatively simple semantics of the language's constructs - in particular, by the absence of mechanisms for aliasing variables through pointers or references. The presence of pointers in C imposes its own specific requirements on dynamic memory allocation.

The structure of a language characterizes the hierarchical relationships between its concepts, which are described by syntactic rules. Programming languages can differ greatly from one another in the organization of individual concepts and in the relationships between them. The PL/1 language allows arbitrary nesting of procedures and functions, whereas in C all functions must be at the outermost nesting level. C++ allows variables to be declared at any point of the program before their first use, while in Pascal variables must be defined in a special declaration section. PL/1 goes even further, allowing a variable to be declared after its use, or the declaration to be omitted altogether in favor of default rules. Depending on the decisions taken, a translator may analyze the program in one pass or in several, which affects translation speed.

The semantics of programming languages ​​varies widely. They differ not only in the implementation features of individual operations, but also in programming paradigms, which determine fundamental differences in program development methods. The specifics of the implementation of operations may concern both the structure of the data being processed and the rules for processing the same types of data. Languages ​​such as PL/1 and APL support matrix and vector operations. Most languages ​​work primarily with scalars, providing procedures and functions written by programmers for processing arrays. But even when performing the operation of adding two integers, languages ​​such as C and Pascal can behave differently.

Along with the traditional procedural programming, also called imperative, there are such paradigms as functional programming, logic programming and object-oriented programming. The structure of concepts and objects of languages ​​strongly depends on the chosen paradigm, which also affects the implementation of the translator.
Even the same language can be implemented in several ways. This is because the theory of formal grammars admits different methods of parsing the same sentences. Accordingly, translators can arrive at the same result (an object program) from the same source text in different ways.
At the same time, all programming languages ​​have a number of common characteristics and parameters. This commonality also determines the principles of organizing translators that are similar for all languages.
Programming languages ​​are designed to make programming easier. Therefore, their operators and data structures are more powerful than those in machine languages.
To increase the clarity of programs, symbolic and graphical representations of language constructs, more convenient for human perception, are used instead of numeric codes.
For any language the following are defined:
- the set of symbols that can be used to write correct programs (the alphabet), its basic elements;
- the set of correct programs (the syntax);
- the "meaning" of every correct program (the semantics).
Regardless of the specifics of the language, any translator can be considered a functional converter F, providing a unique mapping from X to Y, where X is a program in the source language, Y is a program in the output language. Therefore, the translation process itself can be formally represented quite simply and clearly: Y = F(X).
Formally, each correct program X is a string of characters from some alphabet A, converted into its corresponding string Y, composed of characters from the alphabet B.
A programming language, like any complex system, is defined through a hierarchy of concepts that defines the relationships between its elements. These concepts are interconnected in accordance with syntactic rules. Each program built according to these rules has a corresponding hierarchical structure.
In this regard, the following common features can be additionally distinguished for all languages ​​and their programs: each language must contain rules that allow generating programs corresponding to this language or recognizing the correspondence between written programs and a given language.

Another characteristic feature of all languages ​​is their semantics. It determines the meaning of language operations and the correctness of the operands. Chains that have the same syntactic structure in different programming languages ​​may differ in semantics (which, for example, is observed in C++, Pascal, Basic). Knowledge of the semantics of a language allows you to separate it from its syntax and use it for conversion to another language (to generate code).

The purpose of this course work is to develop an educational translator from a given simplified high-level text language.

1. Methods of grammar analysis

Let's look at the basic methods of grammatical parsing.

1.1 Top-down analysis

In top-down parsing, intermediate derivations move along the tree from the root toward the leaves. When the input string is scanned from left to right, leftmost derivations are naturally obtained. In deterministic parsing, the problem is which rule to apply to expand the leftmost nonterminal.

1.1.1 LL(k) - languages ​​and grammars

Consider the derivation tree during the construction of a leftmost derivation of a string. An intermediate string in the derivation consists of a string of terminals w, the leftmost nonterminal A, and the not-yet-expanded part x:

-S--

/ \

/ -A-x-\

/ | \

-w---u----

Figure 1

To continue parsing, the nonterminal A must be replaced according to one of the rules of the form A:y. If the parsing is to be deterministic (without backtracking), this rule must be chosen in a special way. A grammar is said to have the LL(k) property if, to select the rule, it suffices to consider only wAx and the first k characters of the unexamined string u. The first letter L (Left) refers to scanning the input string from left to right, the second to the leftmost derivation used.

Let us define two sets of strings:

a) FIRST(x) is the set of terminal strings derivable from x, truncated to k characters.

b) FOLLOW(A) is the set of terminal strings, truncated to k characters, that can immediately follow A in derivable strings.

A grammar has the LL(k) property if, given two leftmost derivations

S =>* wAx => wzx =>* wu

S =>* wAx => wtx =>* wv

the condition FIRST(u) = FIRST(v) implies z = t.

In the case k = 1, to choose the rule for A it suffices to know only the nonterminal A and a, the first character of the string u:

- rule A:x should be selected if a is in FIRST(x),

- rule A:e (where e denotes the empty string) should be selected if a is in FOLLOW(A).

The LL(k) property imposes quite strong restrictions on the grammar. For example, the LL(2) grammar S: aS | a does not have the LL(1) property, because FIRST(aS) = FIRST(a) = a. In this case the value of k can be reduced by "factorization" (taking the common factor out of brackets):

S: aA

A: S | e

The factorized grammar is LL(1): rule A: S is selected when the next input character is a (since a is in FIRST(S)), and rule A: e is selected at the end of the input (FOLLOW(A) contains only the end-of-input marker).

Any LL(k)-grammar is unambiguous. A left-recursive grammar does not belong to the class LL(k) for any k. Sometimes it is possible to convert a non-LL(1) grammar into an equivalent LL(1) grammar by eliminating left recursion and factorization. However, the problem of the existence of an equivalent LL(k)-grammar for an arbitrary non-LL(k)-grammar is undecidable.

1.1.2 Recursive descent method

The recursive descent method is intended for the case when the compiler is programmed in a high-level language in which recursive procedures are allowed.

The main idea of recursive descent is that each nonterminal of the grammar has a corresponding procedure that recognizes any string generated by that nonterminal. These procedures call each other as required.

Recursive descent can be used for any LL(1) grammar. Each nonterminal of the grammar has a corresponding procedure, which begins with a computed jump on the current input symbol and contains code corresponding to each rule for this nonterminal. For input symbols that belong to the selection set of a rule, the computed jump transfers control to the code matching that rule. For the remaining input symbols, control is transferred to the error handling procedure.

The code of any rule contains operations for each character included in the right side of the rule. The operations are arranged in the order in which the symbols appear in the rule. Following the last operation, the code contains a return from the procedure.

Using recursive descent in a high-level language makes programming and debugging easier.
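A minimal recursive-descent sketch for the factorized grammar above (S: aA; A: S | e); the character-based token stream, the '$' end marker and the error handling are simplifying assumptions of the example:

    using System;

    class RecursiveDescent
    {
        private readonly string input; // input string of terminals
        private int pos;               // current position (lookahead index)

        public RecursiveDescent(string input) { this.input = input; }

        private char Peek
        {
            get { return pos < input.Length ? input[pos] : '$'; } // '$' = end marker
        }

        private void Error(string expected)
        {
            throw new Exception("Syntax error at " + pos + ": expected " + expected);
        }

        // S : a A     -- 'a' is in FIRST(aA)
        private void S()
        {
            if (Peek == 'a') { pos++; A(); }
            else Error("'a'");
        }

        // A : S | e   -- choose S when the lookahead is in FIRST(S) = {a},
        //             -- choose e when it is in FOLLOW(A) = {$}
        private void A()
        {
            if (Peek == 'a') S();
            else if (Peek == '$') { /* apply A : e */ }
            else Error("'a' or end of input");
        }

        public void Parse()
        {
            S();
            if (Peek != '$') Error("end of input");
        }
    }

    // new RecursiveDescent("aaa").Parse() accepts; "aab" raises a syntax error.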

1.2 Bottom-up analysis

Let us consider bottom-up parsing, in which intermediate derivations move along the tree toward the root. If the symbols of the string are read from left to right, the parse tree looks like this:

Figure 2 - Derivation tree for bottom-up parsing: x is the intermediate chain from which the scanned part w of the input has been derived; b is the next input symbol and u is the unscanned remainder

The intermediate derivation has the form xbu, where x is a chain of terminals and nonterminals from which the scanned part w of the terminal string has been derived, bu is the unscanned part of the terminal string, and b is the next symbol. To continue the analysis, one can either append the symbol b to the scanned part of the chain (perform a so-called "shift"), or select at the end of x a chain z (x = yz) such that one of the grammar rules B:z can be applied to z, and replace x by the chain yB (perform a so-called "reduction"):

Figure 3 - After a shift (the symbol b is appended to the chain x). Figure 4 - After a reduction (the tail z of x = yz is replaced by B according to a rule B:z, giving yB)

If reductions are applied only to the last symbols of x, then we obtain rightmost derivations of the string. This parsing is called LR, where the letter L (Left) refers to scanning the string from left to right, and R (Right) refers to the rightmost derivations produced.

The order of the shift and reduce operations is essential. Therefore, deterministic parsing requires choosing between a shift and a reduction (and between different reduction rules) at each moment.

1.2.1 LR(k) - grammars

If, in the process of LR parsing, it is possible to make a deterministic shift/reduce decision considering only the string x and the first k characters of the unscanned part u of the input string (these k characters are called the lookahead string), the grammar is said to have the LR(k) property.

Figure 5 - Stack x and unscanned input u during LR(k) parsing: the decision is made from x and the first k symbols of u

The difference between LL(k) and LR(k) grammars in terms of a derivation tree:

Figure 6 - Derivation tree with a nonterminal A deriving the part v of the input, with w to its left and u to its right

In the case of LL(k) grammars, the rule applied to A can be uniquely determined from w and the first k characters of vu; in the case of LR(k) grammars, from w, v and the first k characters of u. This non-rigorous reasoning shows that LL(k) languages are a proper subset of LR(k) languages (for k > 0).

1.2.1.1 LR(0) - grammars

Let us first consider the simplest grammars of this class, LR(0). When parsing a string of an LR(0) language, the lookahead string is not needed at all: the choice between shift and reduction is made on the basis of the chain x. Since during parsing it changes only at the right end, it is called a stack. Assume that the grammar contains no useless symbols and the initial symbol does not occur on the right sides of the rules; then a reduction to the initial symbol signals the successful completion of parsing. Let us try to describe the set of chains of terminals and nonterminals that appear on the stack during all LR parses (in other words, during all rightmost derivations in the grammar).

Let us define the following sets:

L(A:v) - the left context of rule A:v - is the set of stack contents immediately before v is reduced to A during all successful LR parses. Obviously, every chain in L(A:v) ends in v. If the tail v of all such chains is cut off, we obtain the set of chains occurring to the left of A during all successful rightmost derivations. Let us denote this set L(A), the left context of the nonterminal A.

Let us construct a grammar for the sets L(A). The terminals of the new grammar are the terminals and nonterminals of the original one; the nonterminals of the new grammar will be denoted <A>, <B>, ...; their values are the left contexts of the corresponding nonterminals of the original grammar. If S is the initial symbol of the original grammar, then the grammar of left contexts contains the rule <S>: e - the left context of S contains the empty chain. For each rule of the original grammar, for example A: B C d E,

we add rules to the new grammar:

<B>: <A> - L(B) includes L(A)

<C>: <A> B - L(C) includes L(A) B

<E>: <A> B C d - L(E) includes L(A) B C d

The resulting grammar has a special form (such grammars are called left-linear); therefore the sets of left contexts are regular. It follows that whether a chain belongs to the left context of a nonterminal can be computed inductively by a finite automaton scanning the chain from left to right. Let us describe this process constructively.

Let us call an LR(0)-situation a grammar rule with one marked position between the symbols of the right side of the rule. For example, for the grammar S:A; A:aAA; A:b the following LR(0)-situations exist: S:_A; S:A_; A:_aAA; A:a_AA; A:aA_A; A:aAA_; A:_b; A:b_. (position is indicated by an underscore).

We will say that the chain x is consistent with the situation A:b_c if x = ab and a belongs to L(A). (In other words, the LR parse can be continued: x_... = ab_... :: abc_... :: aA_... :: S_.) In these terms, L(A:b) is the set of chains consistent with the situation A:b_, and L(A) is the set of chains consistent with the situation A:_b, for any rule A:b.

Let V(u) be the set of situations consistent with u. Let us show that the function V is inductive.

If the set V(u) includes the situation A:b_cd, then the situation A:bc_d belongs to V(uc) (here c is a terminal or nonterminal; b and d are possibly empty sequences of terminals and nonterminals). There are no other situations of the form A:b_d with nonempty b in V(uc). It remains to add situations of the form C:_... to V(uc) for each nonterminal C whose left context contains uc: if a situation A:..._C... (C a nonterminal) belongs to the set V(uc), then uc belongs to L(C), and V(uc) includes the situations C:_... for all C-rules of the grammar.

V(e) contains the situations S:_... (S is the start symbol), as well as the situations A:_... if the nonterminal A occurs immediately after _ in situations already included in V(e).

Finally, we are ready to define an LR(0) grammar. Let u be the contents of the stack during LR parsing, and V(u) the set of LR(0) situations consistent with u. If V(u) contains a situation of the form A:x_ (x a sequence of terminals and nonterminals), then u belongs to L(A:x) and the reduction of x to A is allowed. If V(u) contains a situation A:..._a... (a a terminal), then a shift is allowed. A shift-reduce conflict is said to exist if both a shift and a reduction are allowed for the same string u. A reduce-reduce conflict is said to exist if reductions by different rules are allowed.

A grammar is called LR(0) if no stack state arising during LR parsing contains a shift-reduce or reduce-reduce conflict.
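The inductive computation of the sets V(u) is, in effect, the classical closure of a set of LR(0) situations. A hedged C# sketch, where the Item and rule representations are assumptions of the example:

    using System.Collections.Generic;

    class Item
    {
        public int RuleIndex;  // which rule A : x
        public int Dot;        // position of '_' in the right side
        public Item(int r, int d) { RuleIndex = r; Dot = d; }
        public override bool Equals(object o)
        {
            var other = o as Item;
            return other != null && other.RuleIndex == RuleIndex && other.Dot == Dot;
        }
        public override int GetHashCode() { return RuleIndex * 31 + Dot; }
    }

    static class Lr0
    {
        // rules[i] = (Left, Right); rulesByLeft maps a nonterminal to its rule indices.
        public static HashSet<Item> Closure(
            HashSet<Item> items,
            List<KeyValuePair<string, string[]>> rules,
            Dictionary<string, List<int>> rulesByLeft)
        {
            var result = new HashSet<Item>(items);
            var work = new Queue<Item>(items);
            while (work.Count > 0)
            {
                var it = work.Dequeue();
                var right = rules[it.RuleIndex].Value;
                if (it.Dot >= right.Length) continue;        // situation A:x_ (reduction)
                var sym = right[it.Dot];
                if (!rulesByLeft.ContainsKey(sym)) continue; // a terminal follows '_'
                foreach (var r in rulesByLeft[sym])          // add C:_... for nonterminal C
                {
                    var added = new Item(r, 0);
                    if (result.Add(added)) work.Enqueue(added);
                }
            }
            return result;
        }
    }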

1.2.1.2 LR(k) - grammars

In LR(0) parsing, only the state of the stack is used to decide between shifting and reducing. LR(k) parsing also takes into account the first k symbols of the unscanned part of the input string (the so-called lookahead string). To justify the method, the reasoning of the previous paragraph should be carefully repeated with the appropriate changes to the definitions.

We also include a lookahead string in the left contexts of the rules. If a rightmost derivation uses the step wAu => wvu, then the pair (wv, FIRSTk(u)) belongs to Lk(A:v), and the pair (w, FIRSTk(u)) belongs to Lk(A). The set of left contexts, as in the LR(0) case, can be computed by induction on the stack string. Let us call an LR(k)-situation a pair: a grammar rule with a marked position and a lookahead string of length at most k. We separate the rule from the lookahead string by a vertical bar.

We will say that the chain x is consistent with the situation A:b_c|t if there is an LR parse x_yz = ab_yz :: abc_z :: aA_z :: S_ with FIRSTk(z) = t. The rules for inductively computing the sets of states Vk are as follows:

Vk(e) contains the situations S:_a|e for all rules S:a, where S is the start symbol. For each situation A:_Ba|u in Vk(e), each rule B:b, and each string x belonging to FIRSTk(au), the situation B:_b|x must be added to Vk(e).

If Vk(w) includes the situation A:b_cd|u, then the situation A:bc_d|u belongs to Vk(wc). For each situation A:b_Cd|u in Vk(wc), each rule C:f, and each string x belonging to FIRSTk(du), the situation C:_f|x must be added to Vk(wc).

We use the constructed sets of LR(k) states to resolve the shift-reduce question. Let u be the contents of the stack and x the lookahead string. Clearly, a reduction by rule A:b can be made if Vk(u) contains the situation A:b_|x. Deciding whether a shift is permissible requires care if the grammar contains e-rules. In a situation A:b_c|t (c nonempty), a shift is possible if c begins with a terminal and x belongs to FIRSTk(ct). Informally speaking, one may push the leftmost symbol of the rest of the rule onto the stack, preparing the subsequent reduction. If c begins with a nonterminal (the situation looks like A:b_Cd|t), then pushing a symbol onto the stack in preparation for a reduction to C is possible only if C does not generate the empty chain. For example, in the state V(e) = {S:_A|e; A:_AaAb|e,a; A:_|e,a} there are no permissible shifts, because when deriving terminal strings from A, at some step the rule A:e must be applied to the nonterminal A at the left end of the chain.

Let us define the set EFFk(x) consisting of all elements of the set FIRSTk(x) in whose derivation the nonterminal at the left end of x (if there is one) is not replaced by the empty chain. In these terms, a shift is permissible if the set Vk(u) contains a situation A:b_c|t with c nonempty and x belonging to EFFk(ct).

A grammar is called an LR(k) grammar if no LR(k) state contains two situations A:b_|u and B:c_d|v such that u belongs to EFFk(dv). Such a pair corresponds to a reduce-reduce conflict if d is empty, and to a shift-reduce conflict if d is nonempty.

In practice, LR(k) grammars are not used for k > 1, for two reasons. First, the number of LR(k) states is very large. Second, for any language defined by an LR(k) grammar there is an LR(1) grammar; moreover, for any deterministic context-free language there is an LR(1) grammar.

The number of LR(1) states for practically interesting grammars is also quite large, and such grammars rarely have the LR(0) property. In practice, a method intermediate between LR(0) and LR(1), known as LALR(1), is more often used.

1.2.2 LALR(1) - grammars

These methods are based on the same idea. Construct the set of canonical LR(0) states of the grammar. If this set contains no conflicts, the LR(0) parser can be used. Otherwise, try to resolve the conflicts that have arisen by considering a one-symbol lookahead. In other words, try to build an LR(1) parser with the set of LR(0) states.

The LALR(1) (Look-Ahead LR) method is as follows. Introduce an equivalence relation on the set of LR(1) situations: two situations are considered equivalent if they differ only in their lookahead strings. For example, the situations A:Aa_Ab|e and A:Aa_Ab|a are equivalent. Construct the canonical set of LR(1) states and merge the states consisting of sets of equivalent situations.

If the resulting set of states contains no LR(1) conflicts, and therefore allows the construction of an LR(1) parser, the grammar is said to have the LALR(1) property.
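The merging step can be expressed directly: group LR(1) situations by their LR(0) core (rule and dot position), ignoring lookaheads. A hedged C# sketch, with the Lr1Item representation assumed for the example:

    using System.Collections.Generic;
    using System.Linq;

    class Lr1Item
    {
        public int RuleIndex, Dot;
        public string Lookahead;
    }

    static class Lalr
    {
        // Two LR(1) states are merged when their sets of (RuleIndex, Dot) cores coincide;
        // the lookaheads of equivalent situations are united.
        public static List<List<Lr1Item>> MergeByCore(List<List<Lr1Item>> lr1States)
        {
            return lr1States
                .GroupBy(state => string.Join("|",
                    state.Select(i => i.RuleIndex + "." + i.Dot)
                         .Distinct().OrderBy(s => s)))
                .Select(g => g.SelectMany(s => s)
                              .GroupBy(i => new { i.RuleIndex, i.Dot })
                              .Select(gr => new Lr1Item
                              {
                                  RuleIndex = gr.Key.RuleIndex,
                                  Dot = gr.Key.Dot,
                                  Lookahead = string.Join(",",
                                      gr.Select(i => i.Lookahead).Distinct())
                              }).ToList())
                .ToList();
        }
    }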

2. Translator development

2.1 Requirements analysis

In this course work, it is necessary to develop an educational translator in the form of an interpreter for a language defined by the corresponding formal grammar. There are four main stages in developing the interpreter:

Designing a lexical analyzer;

Design of the pushdown automaton;

Software implementation of the parser;

Development of an interpretation module.

Development will be carried out under the Windows XP operating system on an IBM PC-compatible personal computer with an Intel Pentium IV processor.

Based on software development trends, the C# programming language in the Visual Studio 2010 environment was chosen to implement the educational translator.

2.2 Design

2.2.1 Designing a lexical analyzer

Lexical analysis involves scanning the translated (source) program and recognizing the lexemes that make up the sentences of the source text. Lexemes include, in particular, keywords, operation signs, identifiers, constants, special characters, etc.

The result of the work of a lexical analyzer (scanner) is a sequence of tokens, each usually represented by some fixed-length code (for example, an integer), together with messages about lexical errors, if any. If the token is, for example, a keyword, its code gives all the necessary information. For, say, an identifier, the name of the recognized identifier is additionally required; it is usually recorded in a table of identifiers, organized, as a rule, using lists. A similar table is needed for constants.

A lexeme can be described by two main features. One of them is that the lexeme belongs to a certain class (variables, constants, operations, etc.). The second attribute defines a specific element of this class.

The specific form of the symbol table (data structure) does not matter to the lexical or the syntax analyzer. Both only need the ability to obtain an index that uniquely identifies, for example, a given variable, and to use that index to update the information about the variable name in the symbol table.

The identifier table supports two main operations:

a) recording a new name in the table when processing variable descriptions;

b) searching for a name previously recorded in the table.

This makes it possible to detect such erroneous situations as multiple declarations of a variable and the use of an undeclared variable.
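A minimal sketch of such an identifier table; the names and error texts are illustrative assumptions, not the actual course-work code:

    using System;
    using System.Collections.Generic;

    class SymbolTable
    {
        private readonly Dictionary<string, int> indexByName = new Dictionary<string, int>();
        private readonly List<string> names = new List<string>();

        // a) record a new name while processing variable declarations
        public int Declare(string name)
        {
            if (indexByName.ContainsKey(name))
                throw new Exception("Variable '" + name + "' is declared more than once");
            indexByName[name] = names.Count;
            names.Add(name);
            return indexByName[name];
        }

        // b) search for a name previously recorded in the table
        public int Lookup(string name)
        {
            int index;
            if (!indexByName.TryGetValue(name, out index))
                throw new Exception("Variable '" + name + "' is not declared");
            return index;
        }
    }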

The development of a lexical analyzer partly consists of modeling various automata for recognizing identifiers, constants, reserved words, etc. If tokens of different types begin with the same character or the same sequence of characters, their recognition may need to be combined.

Running the lexical analyzer breaks the program into lexemes, after which each lexeme passes a length check (a lexeme cannot exceed 11 characters). Next, the correct placement of lexemes is checked (the keywords var, begin, end, for, to, do, end_for). Then variable lexemes are analyzed: they must not contain digits and must not be repeated. At the last stage the spelling of lexemes is checked (keywords, unknown identifiers). If at least one of the checks fails, the lexical analyzer reports an error.
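A hedged sketch of these checks; the whitespace-based splitting and the exact error texts are simplifying assumptions of the example (the keyword list follows the text above):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class Lexer
    {
        static readonly HashSet<string> Keywords = new HashSet<string>
            { "var", "begin", "end", "for", "to", "do", "end_for" };

        // Splits the source into lexemes and applies the checks described above:
        // the 11-character limit and identifiers that must not contain digits.
        public static List<string> Scan(string source)
        {
            var tokens = new List<string>();
            foreach (var word in source.Split(new[] { ' ', '\t', '\r', '\n' },
                                              StringSplitOptions.RemoveEmptyEntries))
            {
                if (word.Length > 11)
                    throw new Exception("Lexeme '" + word + "' is longer than 11 characters");
                if (!Keywords.Contains(word) && word.Any(char.IsLetter)
                                             && word.Any(char.IsDigit))
                    throw new Exception("Identifier '" + word + "' must not contain digits");
                tokens.Add(word);
            }
            return tokens;
        }
    }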

The diagram of the lexical analyzer program is shown in Appendix B in Figure B.1.

2.2.2 Design of the pushdown automaton

Let's define the following grammar:

G: (Vt, Va, I, R),

where Vt is the set of terminal symbols, Va is the set of nonterminal symbols, I is the initial symbol of the grammar, and R is the set of grammar rules.

For this grammar, we define sets of terminal and non-terminal symbols:

Let's compose the rules for our grammar G and present them in Table 1.

Table 1 - Grammar rules

Rule No. | Left side of the rule | Right side of the rule

LE | f ID = EX t EX d LE n;


The designations of lexemes, the translation of lexemes into codes, and the list of grammar symbols are given in Tables 2, 3 and 4 respectively.

Table 2 - Designations of lexemes

Token designation

keyword “begin” (beginning of the description of actions)

keyword “end” (end of action description)

keyword "var" (variable description)

keyword "read" (data input operator)

keyword "write" (data output operator)

keyword "for" (loop statement)

keyword "to"

keyword "do"

keyword "end_for" (end of loop statement)

variable type "integer"

addition operation

subtraction operation

multiplication operation

separator character ":"

separator character ";"

separator character "("

separator character ")"

separator character ","


separator character "="

Table 3 - Translation of lexemes into codes

<digit>

<letter>

Table 4 - List of grammar symbols

Designation

Explanation

Program

Description of calculations

Description of Variables

List of variables

Operator

Assignment

Expression

Subexpression

Binary operations

Unary operations

List of assignments

Identifier

Constant

Let's build a deterministic bottom-up recognizer.

Consider the following relations in order to construct a deterministic bottom-up recognizer:

a) If there is a symbol B such that some rule of the grammar contains the chain AB, and x ∈ FIRST'(B), then the relation x AFTER A is defined between the symbols x and A.

b) If the grammar contains a rule B -> bA (A, B ∈ Va) and x ∈ FOLLOW'(B), then the relation A COVER x is defined between A and x.

Our grammar remains the same, that is:

G: (Vt, Va, I, R),

and the rules of grammar G are given in Table 5.

Table 5 - Grammar rules

Rule No. | Left side of the rule | Right side of the rule

LE | f ID = EX t EX d LE n; ?


Here ? is the marker for the end of the chain.

Let us define some details:

a) An identifier ID consists of letters of the Latin alphabet, that is, u = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z};

b) A constant CO consists of digits, that is, k = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

For a mixed precedence strategy to be applicable to our grammar, the following conditions must be met:

a) absence of e-rules;

b) there must be no rules for which both x AFTER A and A COVER x hold, that is, x AFTER A ∩ A COVER x = ∅;

c) for rules of the form A -> bYg it is necessary that B AFTER x ∩ B COVER x = ∅, that is, in the grammar either B AFTER x or A AFTER x holds, where x is the symbol preceding the chain b.

a) FIRST'(PG) = {PG} ∪ FIRST(DE) = {PG, v, :, i, ;}

FIRST'(AL) = FIRST(b LE e) = {AL, b, e}

FIRST'(DE) = FIRST(v LV : i ;) = {DE, v, :, i, ;}

FIRST'(LV) = FIRST(ID , LV) = {LV, ID}

FIRST'(OP) = {OP, ID, CO}

FIRST'(EQ) = FIRST(ID = EX ;) = {EQ, =, ;}

FIRST'(EX) = {EX, SB, -}

FIRST'(BO) = {BO, +, *, -}

FIRST'(SB) = FIRST((EX) SB) ∪ FIRST(OP) ∪ FIRST(BO) = {SB, (, ), OP, BO}

FIRST'(LE) = FIRST(EQ) = {LE, (, ), =, ;, f, t, d, n, w, r}

FIRST'(UO) = {UO, -}

FIRST'(ID) = FIRST'(u) = {u}

FIRST'(CO) = FIRST'(k) = {k}

FIRST'(e) = {e}
FIRST'(b) = {b}
FIRST'(v) = {v}
FIRST'(w) = {w}
FIRST'(r) = {r}
FIRST'(i) = {i}
FIRST'(f) = {f}
FIRST'(d) = {d}
FIRST'(n) = {n}
FIRST'(c) = {c}
FIRST'(+) = {+}
FIRST'(*) = {*}
FIRST'(-) = {-}
FIRST'(,) = {,}
FIRST'(;) = {;}
FIRST'(:) = {:}
FIRST'(=) = {=}
FIRST'(() = {(}
FIRST'()) = {)}
FIRST'(u) = {u}
FIRST'(k) = {k}

b) FOLLOW'(AL) = {?} ∪ FOLLOW'(PG) = {?, b, e}

FOLLOW'(DE) = {?} ∪ FIRST'(AL) = {?, b, e}

FOLLOW'(LV) = {?} ∪ FIRST'(:) = {?, :}

FOLLOW'(OP) = {?} ∪ FIRST'(SB) = {?, ;, ), d, t, +, -, *}

FOLLOW'(EQ) = {?} ∪ FIRST'(LE) = {?, (, ), ;, f, =, t, d, n, w, r}

FOLLOW'(EX) = {?} ∪ FIRST'(t) ∪ FIRST'(d) ∪ FIRST'(;) ∪ FIRST'()) = {?, t, d, ;, )}

FOLLOW'(BO) = {?} ∪ FIRST'(SB) = {?, (, ), OP, BO}

FOLLOW'(UO) = {?} ∪ FIRST'(SB) = {?, (, ), OP, BO}

FOLLOW'(SB) = {?} ∪ FOLLOW'(EX) = {?, t, d, ;, ), +, *, -}

FOLLOW'(LE) = {?} ∪ FIRST'(e) ∪ FIRST'(n) = {?, e, n}

FOLLOW'(ID) = {?} ∪ FOLLOW'(OP) ∪ FIRST'(=) = {?, ;, ), d, t, +, -, *, =}

FOLLOW'(CO) = {?} ∪ FOLLOW'(OP) = {?, ;, ), d, t, +, -, *, =}

FOLLOW'(b) = {?} ∪ FIRST'(LE) = {?, u, =, ;}

FOLLOW'(e) = {?} ∪ FOLLOW'(AL) = {?, b}

FOLLOW'(v) = {?} ∪ FIRST'(LV) = {?, u}

FOLLOW'(w) = {?} ∪ FIRST'(() = {?, (}

FOLLOW'(r) = {?} ∪ FIRST'(() = {?, (}

FOLLOW'(i) = {?} ∪ FIRST'(;) = {?, ;}

FOLLOW'(f) = {?} ∪ FIRST'(ID) = {?, u}

FOLLOW'(d) = {?} ∪ FIRST'(LE) = {?, u, =, ;}

FOLLOW'(n) = {?} ∪ FIRST'(i) = {?, i}

FOLLOW'(+) = {?} ∪ FOLLOW'(BO) = {?, +, *, -}

FOLLOW'(-) = {?} ∪ FOLLOW'(BO) = {?, +, *, -}

FOLLOW'(*) = {?} ∪ FOLLOW'(BO) = {?, +, *, -}

FOLLOW'(;) = {?} ∪ FOLLOW'(DE) ∪ FOLLOW'(LE1) ∪ FOLLOW'(EQ) = {?, b, e, l, u}

FOLLOW'(:) = {?} ∪ FIRST'(i) = {?, i}

FOLLOW'(=) = {?} ∪ FIRST'(EX) = {?, (, ), u, k, +, -, *}

FOLLOW'(() = {?} ∪ FIRST'(DE) = {?, v, :, i, ;}

FOLLOW'()) = {?} ∪ FIRST'(;) = {?, ;}

FOLLOW'(,) = {?} ∪ FIRST'(LV) = {?, u}

FOLLOW'(u) = {?} ∪ FIRST'(ID) = {?, u}

FOLLOW'(k) = {?} ∪ FIRST'(CO) = {?, k}

c) PG -> DE AL

AL AFTER DE = {b, e} AFTER DE = {(b DE), (e DE)}

e AFTER LE = {(e LE)}

LE AFTER b = {(, ), =, ;, f, t, d, n, w, r} AFTER b = {(( b), () b), (= b), (; b), (f b), (t b), (d b), (n b), (w b), (r b)}

; AFTER i = {(; i)}

i AFTER : = {(i :)}

: AFTER LV = {(: LV)}

LV AFTER v = {(ID v)}

LV AFTER , = {(ID ,)}

, AFTER ID = {(, u)}

LE AFTER EQ = {(, ), =, ;, f, t, d, n, w, r} AFTER EQ = {(( EQ), () EQ), (= EQ), (; EQ), (f EQ), (t EQ), (d EQ), (n EQ), (w EQ), (r EQ)}

LE -> r (DE);

; AFTER ) = {(; ))}

) AFTER DE = {() DE)}

DE AFTER ( = {(v (), (: (), (i (), (; (), (e ()}

( AFTER r = {(( r)}

LE -> w (DE);

; AFTER ) = {(; ))}

) AFTER DE = {() DE)}

DE AFTER ( = {(v (), (: (), (i (), (; (), (e ()}

( AFTER w = {(( w)}

LE -> f ID = EX t EX d LE n;

; AFTER n = {(; n)}

n AFTER LE = {(n LE)}

LE AFTER d = {(( d), () d), (; d), (f d), (t d), (d d), (n d), (w d), (r d)}

d AFTER EX = {(d EX)}

EX AFTER t = {BO, -} AFTER t = {(BO t), (- t)}

t AFTER EX = {(t EX)}

EX AFTER = = {BO, -} AFTER = = {(BO =), (- =)}

= AFTER ID = {(= ID)}

ID AFTER f = {(ID f)}

EQ -> ID = EX;

; AFTER EX = {(; EX)}

EX AFTER = = {BO, -} AFTER = = {(BO =), (- =)}

= AFTER u = {(= u)}

SB AFTER UO = {(, ), OP, BO} AFTER UO = {(( UO), (OP UO), (BO UO)}

) AFTER EX = {() EX)}

EX AFTER ( = {BO, -} AFTER ( = {(BO (), (- ()}

SB -> SB BO SB

SB AFTER BO = {(, ), OP, BO} AFTER BO = {(( BO), () BO), (OP BO), (BO BO)}

BO AFTER SB = {+, *, -} AFTER SB = {(+ SB), (* SB), (- SB)}

ID AFTER u = {(u u)}

d) PG -> DE AL

AL COVER PG = AL COVER FOLLOW'(PG) = {(AL ?)}

e COVER AL = e COVER FOLLOW'(AL) = {(e b), (e ?)}

; COVER DE = ; COVER FOLLOW'(DE) = {(; b), (; ?)}

LV COVER LV = LV COVER FOLLOW'(LV) = {(LV :), (LV ?)}

ID COVER LV = ID COVER FOLLOW'(LV) = {(ID :), (ID ?)}

; COVER LE = ; COVER FOLLOW'(LE) = {(; e), (; ?), (; n)}

LE -> f ID = EX t EX d LE n;

; COVER LE = ; COVER FOLLOW'(LE) = {(; e), (; ?), (; n)}

EQ COVER LE = EQ COVER FOLLOW'(LE) = {(EQ e), (EQ ?), (EQ n)}

EQ -> ID = EX;

; COVER EQ = ; COVER FOLLOW'(EQ) = {(; (), (; )), (; ;), (; f), (; ?), (; =), (; t), (; d), (; n), (; w), (; r)}

SB COVER EX = SB COVER FOLLOW'(EX) = {(SB t), (SB ?), (SB d), (SB )), (SB ;), (SB (), (SB =), (SB f), (SB n), (SB w), (SB r)}

) COVER SB = ) COVER FOLLOW'(SB) = {() t), () ?), () d), () )), () ;)}

OP COVER SB = OP COVER FOLLOW'(SB) = {(OP t), (OP ?), (OP d), (OP )), (OP ;)}

SB -> SB BO SB

SB COVER SB = SB COVER FOLLOW'(SB) = {(SB t), (SB d), (SB ;), (SB )), (SB +), (SB -), (SB *), (SB ?)}

- COVER UO = - COVER FOLLOW'(UO) = {(- ?), (- -)}

+ COVER BO = + COVER FOLLOW'(BO) = {(+ +), (+ ?), (+ *), (+ -)}

* COVER BO = * COVER FOLLOW'(BO) = {(* +), (* ?), (* *), (* -)}

- COVER BO = - COVER FOLLOW'(BO) = {(- +), (- ?), (- *), (- -)}

ID COVER OP = ID COVER FOLLOW'(OP) = {(ID +), (ID ?), (ID *), (ID -)}

CO COVER OP = CO COVER FOLLOW'(OP) = {(CO +), (CO ?), (CO *), (CO -), (CO ;), (CO d), (CO t), (CO ))}

ID COVER ID = ID COVER FOLLOW'(ID) = {(ID )), (ID ?), (ID k), (ID +), (ID -), (ID *), (ID =), (ID t), (ID d)}

u COVER ID = u COVER FOLLOW'(ID) = {(u )), (u ?), (u k), (u +), (u -), (u *), (u =), (u t), (u d)}

CO COVER CO = CO COVER FOLLOW'(CO) = {(CO +), (CO ?), (CO *), (CO -), (CO ;), (CO d), (CO t), (CO ))}

k COVER CO = k COVER FOLLOW'(CO) = {(k +), (k ?), (k *), (k -), (k ;), (k d), (k t), (k ))}

One conflict situation was found when reducing by the rules

OP -> ID and ID -> u ID

We introduce ID1 and accordingly rewrite the rule as ID1 -> u ID.

We then perform the reduction operations:

ID1 COVER ID = ID1 COVER FOLLOW'(ID) = {(ID1 )), (ID1 ?), (ID1 k), (ID1 +), (ID1 -), (ID1 *), (ID1 =), (ID1 t), (ID1 d)}

For each pair (x, A) such that x AFTER A holds, we construct a transition function defining a shift action: δ(S0, x, A) = (S0, Ax).

δ(S0, b, DE) = (S0, DEb)
δ(S0, e, DE) = (S0, DEe)
δ(S0, e, LE) = (S0, LEe)
δ(S0, ), b) = (S0, b))
δ(S0, ;, b) = (S0, b;)
δ(S0, (, b) = (S0, b()
δ(S0, =, b) = (S0, b=)
δ(S0, f, b) = (S0, bf)
δ(S0, t, b) = (S0, bt)
δ(S0, d, b) = (S0, bd)
δ(S0, n, b) = (S0, bn)
δ(S0, w, b) = (S0, bw)
δ(S0, r, b) = (S0, br)
δ(S0, ;, i) = (S0, i;)
δ(S0, i, :) = (S0, :i)
δ(S0, :, LV) = (S0, LV:)
δ(S0, ID, v) = (S0, vID)
δ(S0, ID, ,) = (S0, ,ID)
δ(S0, ,, u) = (S0, u,)
δ(S0, (, EQ) = (S0, EQ()
δ(S0, ), EQ) = (S0, EQ))
δ(S0, =, EQ) = (S0, EQ=)
δ(S0, ;, EQ) = (S0, EQ;)
δ(S0, f, EQ) = (S0, EQf)
δ(S0, t, EQ) = (S0, EQt)
δ(S0, d, EQ) = (S0, EQd)
δ(S0, n, EQ) = (S0, EQn)
δ(S0, w, EQ) = (S0, EQw)
δ(S0, r, EQ) = (S0, EQr)
δ(S0, ;, )) = (S0, );)
δ(S0, (, DE) = (S0, DE()
δ(S0, v, () = (S0, (v)
δ(S0, ;, () = (S0, (;)
δ(S0, i, () = (S0, (i)
δ(S0, :, () = (S0, (:)
δ(S0, e, () = (S0, (e)
δ(S0, (, r) = (S0, r()
δ(S0, (, w) = (S0, w()
δ(S0, ;, n) = (S0, n;)
δ(S0, n, LE) = (S0, LEn)
δ(S0, (, d) = (S0, d()
δ(S0, ), d) = (S0, d))
δ(S0, ;, d) = (S0, d;)
δ(S0, f, d) = (S0, df)
δ(S0, t, d) = (S0, dt)
δ(S0, d, d) = (S0, dd)
δ(S0, n, d) = (S0, dn)
δ(S0, w, d) = (S0, dw)
δ(S0, r, d) = (S0, dr)
δ(S0, d, EX) = (S0, EXd)
δ(S0, BO, t) = (S0, tBO)
δ(S0, -, t) = (S0, t-)
δ(S0, t, EX) = (S0, EXt)
δ(S0, BO, =) = (S0, =BO)
δ(S0, -, =) = (S0, =-)
δ(S0, =, ID) = (S0, ID=)
δ(S0, ID, f) = (S0, fID)
δ(S0, ;, EX) = (S0, EX;)
δ(S0, =, u) = (S0, u=)
δ(S0, (, UO) = (S0, UO()
δ(S0, OP, UO) = (S0, UO OP)
δ(S0, BO, UO) = (S0, UO BO)
δ(S0, ), EX) = (S0, EX))
δ(S0, BO, () = (S0, (BO)
δ(S0, BO, -) = (S0, -BO)
δ(S0, (, BO) = (S0, BO()
δ(S0, ), BO) = (S0, BO))
δ(S0, OP, BO) = (S0, BO OP)
δ(S0, +, SB) = (S0, SB+)
δ(S0, *, SB) = (S0, SB*)
δ(S0, -, SB) = (S0, SB-)
δ(S0, u, u) = (S0, uu)

For each pair (A, x) such that A COVER x holds, we construct a transition function defining a reduction action: δ*(S0, A, x) = (S0, B), where B -> bA (the right side bA on top of the stack is replaced by B).

δ*(S0, AL, ?) = (S0, PG)
δ*(S0, e, b) = (S0, AL)
δ*(S0, n, ?) = (S0, AL)
δ*(S0, ;, b) = (S0, DE)
δ*(S0, ;, ?) = (S0, DE)
δ*(S0, ;, e) = (S0, DE)
δ*(S0, LV, :) = (S0, LV)
δ*(S0, LV, ?) = (S0, LV)
δ*(S0, ID, ?) = (S0, LV)
δ*(S0, ID, e) = (S0, LV)
δ*(S0, ;, e) = (S0, LE)
δ*(S0, ;, ?) = (S0, LE)
δ*(S0, ;, n) = (S0, LE)
δ*(S0, EQ, n) = (S0, LE)
δ*(S0, EQ, e) = (S0, LE)
δ*(S0, EQ, ?) = (S0, LE)
δ*(S0, ;, e) = (S0, LE)
δ*(S0, ;, ?) = (S0, LE)
δ*(S0, ;, () = (S0, EQ)
δ*(S0, ;, )) = (S0, EQ)
δ*(S0, ;, f) = (S0, EQ)
δ*(S0, ;, =) = (S0, EQ)
δ*(S0, ;, t) = (S0, EQ)
δ*(S0, ;, d) = (S0, EQ)
δ*(S0, ;, n) = (S0, EQ)
δ*(S0, ;, w) = (S0, EQ)
δ*(S0, ;, r) = (S0, EQ)
δ*(S0, SB, ?) = (S0, EX)
δ*(S0, SB, d) = (S0, EX)
δ*(S0, SB, )) = (S0, EX)
δ*(S0, SB, ;) = (S0, EX)
δ*(S0, SB, w) = (S0, EX)
δ*(S0, SB, r) = (S0, EX)
δ*(S0, SB, f) = (S0, EX)
δ*(S0, SB, =) = (S0, EX)
δ*(S0, SB, t) = (S0, EX)
δ*(S0, SB, ?) = (S0, SB)
δ*(S0, SB, () = (S0, SB)
δ*(S0, SB, )) = (S0, SB)
δ*(S0, SB, u) = (S0, SB)
δ*(S0, SB, k) = (S0, SB)
δ*(S0, SB, +) = (S0, SB)
δ*(S0, SB, -) = (S0, SB)
δ*(S0, SB, *) = (S0, SB)
δ*(S0, SB, e) = (S0, SB)
δ*(S0, ), t) = (S0, SB)
δ*(S0, ), ?) = (S0, SB)
δ*(S0, ), t) = (S0, SB)
δ*(S0, ), )) = (S0, SB)
δ*(S0, ), ;) = (S0, SB)
δ*(S0, -, ?) = (S0, UO)
δ*(S0, -, -) = (S0, UO)
δ*(S0, +, +) = (S0, BO)
δ*(S0, +, ?) = (S0, BO)
δ*(S0, +, *) = (S0, BO)
δ*(S0, -, +) = (S0, BO)
δ*(S0, -, ?) = (S0, BO)
δ*(S0, -, *) = (S0, BO)
δ*(S0, -, -) = (S0, BO)
δ*(S0, *, +) = (S0, BO)
δ*(S0, *, ?) = (S0, BO)
δ*(S0, *, *) = (S0, BO)
δ*(S0, *, -) = (S0, BO)
δ*(S0, u, +) = (S0, BO)
δ*(S0, u, ?) = (S0, BO)
δ*(S0, u, *) = (S0, BO)
δ*(S0, u, -) = (S0, BO)
δ*(S0, k, +) = (S0, BO)
δ*(S0, k, ?) = (S0, BO)
δ*(S0, k, *) = (S0, BO)
δ*(S0, k, -) = (S0, BO)
δ*(S0, CO, ?) = (S0, OP)
δ*(S0, CO, +) = (S0, OP)
δ*(S0, CO, *) = (S0, OP)
δ*(S0, CO, -) = (S0, OP)
δ*(S0, CO, ;) = (S0, OP)
δ*(S0, CO, d) = (S0, OP)
δ*(S0, CO, t) = (S0, OP)
δ*(S0, ID, -) = (S0, OP)
δ*(S0, ID, *) = (S0, OP)
δ*(S0, ID, ?) = (S0, OP)
δ*(S0, ID, () = (S0, OP)
δ*(S0, ID, )) = (S0, OP)
δ*(S0, ID, u) = (S0, OP)
δ*(S0, ID, k) = (S0, OP)
δ*(S0, ID, -) = (S0, OP)
δ*(S0, ID, +) = (S0, OP)
δ*(S0, u, )) = (S0, OP)
δ*(S0, ID1, *) = (S0, ID)
δ*(S0, ID1, ?) = (S0, ID)
δ*(S0, ID1, () = (S0, ID)
δ*(S0, ID1, )) = (S0, ID)
δ*(S0, ID1, u) = (S0, ID)
δ*(S0, ID1, k) = (S0, ID)
δ*(S0, ID1, -) = (S0, ID)
δ*(S0, ID1, +) = (S0, ID)
δ*(S0, u, )) = (S0, ID)
δ*(S0, u, ?) = (S0, BO)
δ*(S0, u, k) = (S0, ID)
δ*(S0, u, *) = (S0, ID)
δ*(S0, u, -) = (S0, ID)
δ*(S0, u, +) = (S0, ID)
δ*(S0, u, d) = (S0, ID)
δ*(S0, u, t) = (S0, ID)
δ*(S0, u, =) = (S0, ID)
δ*(S0, CO, ?) = (S0, CO)
δ*(S0, CO, +) = (S0, CO)
δ*(S0, CO, -) = (S0, CO)
δ*(S0, CO, *) = (S0, CO)
δ*(S0, CO, ;) = (S0, CO)
δ*(S0, CO, d) = (S0, CO)
δ*(S0, CO, t) = (S0, CO)
δ*(S0, CO, )) = (S0, CO)
δ*(S0, k, +) = (S0, CO)
δ*(S0, k, -) = (S0, CO)
δ*(S0, k, *) = (S0, CO)
δ*(S0, k, ;) = (S0, CO)
δ*(S0, k, d) = (S0, CO)
δ*(S0, k, t) = (S0, CO)
δ*(S0, k, )) = (S0, CO)
δ*(S0, k, () = (S0, CO)

2.2.3 Software implementation of the parser

The syntactic analyzer (parser) reads the lexeme file generated by the lexical analyzer, performs the grammatical analysis, issues messages about syntax errors if there are any, and creates an intermediate form of the source program. The basis for developing the parser is the design and implementation of the corresponding pushdown automaton.

For deterministic bottom-up parsing, after reducing the grammar to the required form, the AFTER and COVER relations are used to design the pushdown automaton, with a detailed description of all transitions of the transition function.

When designing the pushdown automaton, we built the transition functions that form the basis of the parser. All transitions are of two types:

a step of the automaton without reading an input symbol (an empty step);

a step of the automaton with reading an input symbol.

In the lexical analyzer we divided the program into lexemes and wrote them into a list. This list is then processed by the parser. The program (the list), the initial symbol (PG) and the bottom marker of the pushdown store (h0) are passed to the input, after which the required transition function is selected and a recursive call is made.
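A hedged sketch of this parsing loop; the two-table representation of the transition functions, the simplification of popping a single symbol per reduction, and all names are assumptions of the example, not the actual course-work code:

    using System;
    using System.Collections.Generic;

    class PushdownParser
    {
        // Shift table: (input symbol x, stack top A) -> push x.
        private readonly HashSet<Tuple<string, string>> shift;
        // Reduce table: (stack top A, lookahead x) -> nonterminal B for a rule B -> bA.
        private readonly Dictionary<Tuple<string, string>, string> reduce;

        public PushdownParser(HashSet<Tuple<string, string>> shift,
                              Dictionary<Tuple<string, string>, string> reduce)
        { this.shift = shift; this.reduce = reduce; }

        // tokens must end with the marker "?"; "h0" is the bottom marker of the store.
        public bool Parse(List<string> tokens)
        {
            var stack = new Stack<string>();
            stack.Push("h0");
            int pos = 0;
            while (true)
            {
                string top = stack.Peek(), look = tokens[pos];
                string reduced;
                if (reduce.TryGetValue(Tuple.Create(top, look), out reduced))
                {
                    // Reduction step (the real automaton pops the whole right side).
                    stack.Pop();
                    stack.Push(reduced);
                    if (reduced == "PG" && look == "?") return true; // reduced to the axiom
                }
                else if (shift.Contains(Tuple.Create(look, top)))
                {
                    stack.Push(look); // shift step: the input symbol is pushed
                    pos++;
                }
                else return false;    // neither a shift nor a reduction applies: syntax error
            }
        }
    }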

The diagram of the parser program is shown in Appendix B in Figure B.2.

2.2.4 Development of the interpretation module

When developing the interpretation module, the postfix form of notation is most often used as the intermediate form of the source program; it makes the process of executing (interpreting) the translated program quite easy to implement.

Let's consider the basic principles of forming and executing the postfix form of writing expressions.

The basic rules for converting an infix expression into a postfix one are as follows.

The read operands are added to the postfix notation and the operations are written to the stack.

If the operation at the top of the stack has a higher (or equal) priority than the currently read operation, then the operation on the stack is added to the postfix entry, and the current operation is pushed onto the stack. Otherwise (at lower priority), only the current operation is pushed onto the stack.

The read opening parenthesis is pushed onto the stack.

After the closing parenthesis is read, all operations up to the first opening parenthesis are popped from the stack and added to the postfix notation, after which both the opening and the closing parentheses are discarded, i.e. they are placed neither in the postfix notation nor on the stack.

After the entire expression is read, the remaining operations on the stack are added to the postfix entry.

Postfix notation of an expression allows it to be calculated as follows.

If the token is an operand, it is pushed onto the stack. If the token is an operation, the operation is performed on the last element or elements pushed onto the stack, and they are replaced on the stack by the result of the operation.

If the lexical and syntactic analyses complete successfully, we proceed to interpretation. First we assemble sentences from the words, then we translate the expressions into postfix notation and evaluate them.

The interpreter operation diagram is shown in Appendix B in Figure B.3.
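To make the two steps above concrete, here is a compact C# sketch of conversion to postfix form and its evaluation for expressions with +, - and *; the token representation and the priority table are assumptions of the example:

    using System;
    using System.Collections.Generic;

    static class Postfix
    {
        static readonly Dictionary<string, int> Priority =
            new Dictionary<string, int> { { "+", 1 }, { "-", 1 }, { "*", 2 } };

        // Infix -> postfix by the rules above (a variant of the shunting-yard algorithm).
        public static List<string> FromInfix(IEnumerable<string> tokens)
        {
            var output = new List<string>();
            var ops = new Stack<string>();
            foreach (var t in tokens)
            {
                if (Priority.ContainsKey(t))
                {
                    while (ops.Count > 0 && Priority.ContainsKey(ops.Peek())
                           && Priority[ops.Peek()] >= Priority[t])
                        output.Add(ops.Pop());  // pop operations of higher or equal priority
                    ops.Push(t);
                }
                else if (t == "(") ops.Push(t);
                else if (t == ")")
                {
                    while (ops.Peek() != "(") output.Add(ops.Pop());
                    ops.Pop();                   // both parentheses are discarded
                }
                else output.Add(t);              // an operand goes straight to the output
            }
            while (ops.Count > 0) output.Add(ops.Pop());
            return output;
        }

        // Evaluation of the postfix notation on a stack of operands.
        public static int Evaluate(IEnumerable<string> postfix)
        {
            var stack = new Stack<int>();
            foreach (var t in postfix)
            {
                if (Priority.ContainsKey(t))
                {
                    int b = stack.Pop(), a = stack.Pop();
                    stack.Push(t == "+" ? a + b : t == "-" ? a - b : a * b);
                }
                else stack.Push(int.Parse(t));
            }
            return stack.Pop();
        }
    }

    // FromInfix("2 + 3 * ( 4 - 1 )".Split(' ')) gives 2 3 4 1 - * +, which evaluates to 11.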

2.3 Coding

The program is implemented in C# language in the Visual Studio 2010 programming environment. The text of the program is presented in Appendix A.

The program contains five classes. The user interface is implemented in the MainForm class. The LexAnalysis class implements the lexical analysis module, SynAnalysis the syntactic analysis module, Intepreter the interpretation module, and ProgramisciJakPolska is an auxiliary class for converting expressions into reverse Polish (postfix) notation.

The purpose of the procedures and functions implemented in the program is described in tables 6,7,8.

Table 6 - Purpose of procedures and functions of lexical analysis
