Instructions for recognizing text with ABBYY FineReader. How to use ABBYY FineReader? Working with the program

The history of Abbyy FineReader goes back more than 20 years. The company marked its anniversary year, 2013, with the release of a full-fledged Abbyy FineReader Pro for Mac (as opposed to the Express Edition of 2009), and a couple of months later, in February 2014, Windows users received their “gift” as well: Abbyy FineReader 12 Professional and Corporate. Let me remind you that the previous version appeared back in 2011, and two and a half years is a long time, so let’s figure out how significant the changes are.

General information

The system requirements for the new version have hardly changed. The platform can be Windows or Windows Server, starting from XP and 2003 respectively. The hardware requirements are quite modest by today’s standards: a 32- or 64-bit processor with a clock speed of 1 GHz or more, at least 1 GB of RAM plus 512 MB for each computing core, etc. Only the disk space requirement has grown: installation now takes 850 MB rather than 700 (plus, as before, another 700 MB for working files).

Naturally, these are the minimum requirements; the full capabilities of Abbyy FineReader 12 Professional will be revealed only on relatively modern systems. In particular, let me remind you that the program parallelizes the processing of individual pages efficiently, uses all processor cores and loads any processor to almost 100%. It is not greedy when it comes to RAM, though, and even remains a 32-bit application.
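As a rough illustration of page-level parallelism, here is a minimal Python sketch. It is not how FineReader is implemented: `recognize_page` is a hypothetical stand-in for real per-page OCR, and a thread pool is used only to keep the example short (for truly CPU-bound work in Python a process pool would be the usual choice).

```python
import os
from concurrent.futures import ThreadPoolExecutor


def recognize_page(page_image: str) -> str:
    """Hypothetical stand-in for per-page OCR: just upper-cases the input."""
    return page_image.upper()


def recognize_document(pages, max_workers=None):
    """Recognize pages independently and return the results in page order."""
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even if pages finish out of order
        return list(pool.map(recognize_page, pages))


print(recognize_document(["page one", "page two", "page three"]))
```

Because each page is an independent unit of work, the number of workers can simply follow the number of available cores.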

The installation procedure has not changed either: a minimum of questions and options. Abbyy FineReader 12 Professional still comes with Abbyy Screenshot Reader, which becomes operational only after user registration.

After this, access to technical support will also open.

Even on the basis of this modest information, we can assume that the new version is an evolutionary update. Accordingly, from here on I will focus on the changes compared to the previous version, which can be divided into two main groups: working with the program (interface, auxiliary tools, ease of use) and OCR (quality and performance of the recognition itself).

Working with the program

Abbyy FineReader 12 Professional demonstrates some improvements in the user interface. This is immediately noticeable in the Tasks window, which opens by default when the program starts. It obviously imitates the tile concept of Windows 8.x and is adapted for finger control, especially since the program also supports basic gestures like scrolling and zooming. In fact, the changes affected only the “facade”, and only partly: next to the tiles there are regular controls, and while setting up any scenario you will have to deal with standard dialog boxes. Working with them with your fingers is quite problematic, especially on the 8-10″ screens that are becoming popular on Windows tablets.

It is really not difficult to imagine that the user of such a tablet equipped with a camera might want to quickly capture some printed document “on the go.” Meanwhile, the entire history of Windows, starting with the first edition of Tablet PC, confirms the pointlessness of adapting a standard desktop interface to touch controls. Apparently, for these purposes it is much better to create a special shell that follows all the Metro canons but uses the same “engine”. An example of such a solution is Internet Explorer in Windows 8.x. In addition, Abbyy even has some groundwork here in the form of Abbyy FineReader Touch for Windows 8, which uses the company’s cloud service.

Touch input aside, there are more changes in this category: from the quite expected update of the dialogs for opening and saving documents, which among other things provide easy access to cloud storage (if the corresponding agent and its folder are present in the system), to several more important and useful ones.

Page processing in Abbyy FineReader 12 Professional is now done in the background. This means that the former modal window with the status of operations is gone (its role is now played by the status line at the bottom of the screen) and, accordingly, the interface remains accessible. Thus, the user can work with the program in parallel with the recognition process (if it is, of course, long enough), for example, copy fragments of the text already obtained or even adjust the page layout, in which case the page will be queued and processed again.
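The idea of a background queue in which an edited page is simply queued again can be sketched as follows. This is only an illustration of the pattern, not FineReader’s actual architecture; `start_background_recognizer` and the `process_page` callback are hypothetical names.

```python
import queue
import threading
from collections import Counter


def start_background_recognizer(process_page):
    """Start a worker thread that recognizes pages taken from a queue."""
    jobs = queue.Queue()
    processed = Counter()  # how many times each page has been processed

    def worker():
        while True:
            page = jobs.get()
            if page is None:  # sentinel: stop the worker
                jobs.task_done()
                break
            process_page(page)
            processed[page] += 1
            jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, processed, thread


jobs, processed, thread = start_background_recognizer(lambda page: None)
for page in (1, 2, 3):
    jobs.put(page)  # initial recognition of every page
jobs.put(2)         # the user adjusted the layout of page 2: queue it again
jobs.put(None)      # no more work
jobs.join()         # the caller stays free until it chooses to wait here
print(dict(processed))
```

The caller only puts jobs on the queue and never blocks on recognition itself, which is what keeps the interface responsive.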

Unlike the previous version, the program also no longer flips through the pages during recognition or when the document is initially loaded with automatic recognition disabled. In Abbyy FineReader 12 Professional, the document is loaded and divided into pages almost instantly, and the thumbnails are built only as you manually scroll through the left panel. Among other things, this saves computing resources, quite noticeably on large multi-page documents.

The remaining changes in this class are not so interesting, although they may be useful in some scenarios, so we will talk about them briefly.

If you do not need to process the entire document, but only quote individual passages, then you can disable all automatic operations and select the necessary fragments of any type, immediately copying them to the clipboard - while analysis and recognition will be performed on the fly.

To get a result with a simpler structure than the original, you can disable the recreation of headers, footers, and other layout elements. This can be useful, for example, when preparing e-books.

On the subject of e-books: Abbyy FineReader 12 Professional supports the EPUB 2.0.1 and 3.0 formats.

The options for conversion to XLSX have been expanded: for example, it is now possible to clear formatting or keep images.

When saving the resulting documents to PDF with a text layer, you can now use the new Abbyy Precise Scan technology, which smooths the characters on the original page images. By the way, it is available only in color mode.

The effect of its work is quite noticeable, although not always, let’s say, “academic.” However, the readability of the smoothed characters should be higher in any case, and in this example the original really is of very low quality.


OCR

Now let's see what improvements have occurred in the recognition mechanisms themselves.

The developers report the next stage in improving ADRT technology, which, let me remind you, analyzes and recreates the logical structure of the document. It is claimed to work much more accurately now, especially with tables, lists, and diagrams. Demonstrating this with adequate examples is not so easy, but not impossible. Here, for example, are the recognition results (with default settings) for the same page in Abbyy FineReader 11 Professional (above) and Abbyy FineReader 12 Professional (below).


The old version selected and processed only the main text block, perhaps treating the remaining elements as “garbage” because of the low quality of the original. The new one, on the contrary, correctly identified the list and tried to recreate it. The result, however, is not ideal: the fact that not all markers were recognized can, again, be attributed to the quality of the image, but the program apparently still did not understand that it was looking at a table of contents, otherwise it would not have interpreted the numbers as letters. Still, progress is obvious, and with higher-quality originals there might have been no such complaints.

And here is how an “implicit” table without dividing lines is processed - Abbyy FineReader 11 Professional (above) and Abbyy FineReader 12 Professional (below).


It is clearly visible that the old version, unlike the new one, did not see a table structure here at all and limited itself to a set of unrelated text blocks. A comparison of the recognition results shows that Abbyy FineReader 12 Professional is close to ideal.

Unfortunately, this does not always happen, and already on the neighboring pages Abbyy FineReader 12 Professional showed results similar to those of Abbyy FineReader 11 Professional, although ADRT is exactly what should have tracked the identical table headers and understood that it was looking at a kind of “flowing” table.

But it is still clearly noticeable that the updated algorithms pay attention to a greater number of details than before. During testing of Abbyy FineReader 12 Professional, for example, there was even an attempt to interpret a picture containing neatly arranged text as a table. Much more often, the new version also tries to recreate various charts and diagrams on top of a background image, rather than from individual graphic and text blocks.

There are several other new features designed to improve the quality of recognition in Abbyy FineReader 12 Professional. As you know, one of the prerequisites for this is the quality of the original, especially if it was obtained using a camera rather than a scanner. That is why FineReader at one point introduced tools for pre-processing originals. In the new version, their list has been expanded: cropping along the page edges, lightening and leveling the background brightness, and removing colored elements have been added. The latter can be useful, for example, for processing documents with seals and stamps. In addition, the user can now enable the various methods individually.

Language support has also been improved. Firstly, support for Russian text with stress marks has appeared, and secondly, an increase in the recognition quality for Chinese, Japanese and Korean (up to 20%), Arabic (up to 60%), and Hebrew (up to 10%) is claimed. This has apparently been achieved by improving the classifiers and training them further.

And finally, one of the most burning questions for many readers: has the speed of the program increased? It is not so easy to answer this question substantively, especially with numbers: there are too many languages, each with its own nuances; the variety of originals is too great; there are too many unknown factors influencing the operation of the algorithms. Therefore, even the developers themselves are quite restrained, speaking of a 10-15% increase in the performance of Abbyy FineReader 12 Professional.

Such figures are usually obtained by processing fairly large sets of documents and, accordingly, represent something like “the average temperature across the hospital.” Therefore, it is useful to study some illustrative special cases in more detail, for example, the following two:

  • 10 pages of a full-color A4 booklet, scanned in color at a resolution of 300 dpi. The quality is good, the languages are Russian and English, the layout is complex;
  • a PDF of page images of a 138-page book containing a small number of color and black-and-white illustrations and several tables. The quality is low (starting, apparently, with the faint printing in the paper book), the languages are Ukrainian and Russian, the layout is simple.

Both documents were recognized in color mode, and the second one was also recognized in black and white, which was intended to simulate the process of preparing an e-book. All default settings were left unchanged, with the exception of the set of languages and, accordingly, the operating modes. A PC with an i5-3450 processor and 8 GB of memory was used as the test bench. The results are presented in the following table:

As you can see, for the PDF the speedup even exceeds the promised 15%; perhaps this is just one of the special cases that suits the latest optimizations in the recognition algorithms well. It should be borne in mind that the two versions, generally speaking, did different amounts of work. Just look at the table-processing illustrations above: it is hard to say which version had the harder job.

As for the number of errors, it was practically the same for both versions, although it was noticeable that they sometimes flagged different fragments and characters as uncertain, which appears to be evidence that the algorithms have been retrained. In any case, the majority of uncertainly recognized characters were identified absolutely correctly with the help of dictionaries, and the “gross” errors (incorrect interpretation of special and decorative symbols, text on graphics, etc.) coincided. So the difference can be considered vanishingly small.

Another question is how much such a performance improvement matters. Apparently, a gain of half a minute on 138 pages that still need to be checked and possibly corrected is not worth much. If tasks like the test ones are performed only occasionally, then you definitely don’t have to worry about performance. It’s a different matter when it comes to offline processing of large volumes of documents, which is available in Abbyy FineReader 12 Corporate. In this case, saving 15% of the time is already quite noticeable.
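The arithmetic behind this argument can be sketched in a few lines of Python. The workloads below are hypothetical and are not measurements from the test above; “15%” is read here as processing taking 15% less time.

```python
def time_saved(baseline_seconds: float, saving_fraction: float) -> float:
    """Time saved when processing takes `saving_fraction` less time."""
    if not 0.0 <= saving_fraction < 1.0:
        raise ValueError("saving_fraction must be in [0, 1)")
    return baseline_seconds * saving_fraction


# Hypothetical workloads (assumptions for illustration only).
occasional_job = 4 * 60        # a 4-minute one-off document
nightly_batch = 10 * 60 * 60   # a 10-hour unattended batch

for name, seconds in [("occasional", occasional_job), ("batch", nightly_batch)]:
    saved_minutes = time_saved(seconds, 0.15) / 60
    print(f"{name}: saves {saved_minutes:.1f} min")
```

The same percentage that amounts to well under a minute on a one-off job adds up to an hour and a half on the long batch.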

Summary

Although the new Abbyy FineReader 12 Professional did not promise anything revolutionary, at least a few of its changes deserve high praise. First of all, these are the improvements to ADRT technology in recognizing tables, charts and the logical structure of pages in general, which in some cases gives dramatically better results, and the background processing mode, which opens up new opportunities for interactive work with large documents.

There are also many other changes, although they are less significant. The move towards support for touch control is certainly justified today, but the path chosen is a flawed one: it is hardly possible to make a single interface equally comfortable for both mouse and fingers. However, for now, Windows tablets are just trying to break into the market, and the developers at Abbyy still have time.

Abbyy FineReader 12 Professional prices:

  • boxed version: 4,990 RUB;
  • download version: 4,490 RUB;
  • update: 2,690 RUB.

As usual, the answer to the question “is it worth replacing the old version with the new one?” depends on the situation. In any case, it is worth considering that the life cycle of FineReader is quite long, and if any of the described improvements plays a significant role for you, then in 2-3 years the cost of updating will certainly pay off, if not financially, then morally. Solving this question for yourself will finally help.

Hello. Today I will talk about how to use the Abbyy FineReader program to recognize text from an image that you may have obtained by scanning. Your scanned text will end up in full in a Microsoft Word document, and this recognized text can be edited! Recognizing text with Abbyy Finereader can be useful for those who study or work with texts and translations. The program, unfortunately, is paid. I once had a chance to try one of the free alternatives, but even very well scanned text was recognized simply terribly... Text recognition in Abbyy FineReader, on the other hand, turns out to be of very high quality! Now I will show you how to use the Abbyy FineReader program to quickly recognize text from an image.

ABBYY FineReader has a 30-day trial version that can recognize up to 100 pages and save no more than 3 pages from a document. That is, during this time you can see the capabilities of the program and make an informed decision on whether you need it and whether it’s worth buying.

How to install Abbyy FineReader!

Before using Abbyy Finereader you need to install it. Let's look at the installation process of this program...

First, select the program language. Click "OK".

We accept the terms of the license agreement (if you wish, you can read it if you are interested in what it says). Click “Next”.

Next, you must select the installation mode. In the typical mode, the program will not ask you anything and will install everything specified by default, namely all components: the Abbyy Finereader text recognition program itself, a component for Microsoft Office programs and a component for Windows Explorer (which allows you to quickly recognize images without opening the program separately). I advise you to choose the custom installation to configure it the way you need. Besides, it won’t take even 15 minutes :) Below is the folder where the program will be installed. It is advisable to leave the default folder so that there are no problems later when using the program. Click “Next”.

Program components. This window will appear if you select the “Custom” installation type. Components are something like auxiliary applications for the program. The first component is “Integration with Microsoft Office and Windows Explorer”. It will be displayed in the Microsoft Office menu, and if you right-click an image on your computer, the context menu will contain an item for this program. This is what your menu will look like in Microsoft Office after adding this component.

Here's what happens if you right-click on the image:

That is, a menu will appear in which you can run quick text recognition and send the results to Word, Excel or PDF.

The second component will allow you to recognize text from your computer screen. This means that you can take a screenshot and also recognize the text. If you do not want to install one of these components, or do not want to install both, then you need to click on the down arrow and select “This component will not be available.” Then the component will not be installed. I left both.

Next come 4 options. The first means that information about how you use the Abbyy Finereader program will be sent to the developer. I advise you not to check this option so that the program does not go online yet again to send information about your work with it. Besides, you never know what other information will be sent :) The 2nd option creates a shortcut to the program on the desktop. The 3rd means that the program will start when the computer is turned on, and the 4th will check for program updates. I leave a tick only next to the second one. Close all Microsoft Office applications, because the installer requires it, and click “Install”.

You need to wait a couple of minutes for the program to install and then click “Next”.

That's it, installation is complete! Click “Finish”.

How can I use Abbyy Finereader to recognize text from a scanned or any other image?

Let's look at how to use the program. For example, you have scanned text. Now, to recognize text in Abbyy FineReader, open the program. Click “Open”.

Select the image we need and click “Open”.

When you open the desired document, Abbyy Finereader will begin recognizing the text. The larger the document, the longer recognition will take. Recognition of one page may take several seconds.

After the text is recognized, all you have to do is save the result as a Microsoft Word document so you can then edit anything in it. To do this, click the “Save” button on the top toolbar, then select the folder in which the Word document will be saved and the name to save it under.

If you have a scanner connected to your computer, you can start scanning directly from the program, after which the scanned document will be recognized immediately. To do this, click the “Scan” button on the top toolbar. The next steps will depend on the driver program for your scanner. You only need to follow the instructions of the scanning wizard.

As you can see, everything is very simple and fast. Now you know how to use Abbyy FineReader to recognize text from images! I hope this information will help a lot of people :) Good luck!

This time I’ll tell you how to turn paper documents into electronic PDF format, as well as how to transfer a paper document to a computer in order to change the text. So, let's begin.
I have a paper document in my hands.

SCAN to PDF

Task: transfer this document to the computer (convert it into electronic form). Moreover, it must be kept exactly as it is so that it cannot be changed in the future (roughly speaking, you need to take a photo of the document). Then this electronic document must be sent to an email address. Moreover, the client requests it in PDF format.

Step by step:
1) I pass the document through the scanner
2) I save the resulting scan in PDF format to my computer
3) I send the resulting file by email
In my work, I use 2 programs to solve this problem:
Foxit Phantom or ABBYY FineReader. For clarity, I attach screenshots:
In Foxit Phantom, with the scanner turned on, you need to select FILE-CREATE PDF FROM SCANNER in the main menu...
The scan will run and you will be prompted to save the file. Select a location, enter the file name and save.

ABBYY FineReader has huge buttons in the toolbar. One of them is called SCAN to PDF. We use it.

If you need to scan a multi-page document, then, step by step:
1) Press button number 1, SCAN

We get a scanned page

We scan the next page in the same way (press button number 1, SCAN, again).
2) Save as PDF



As a result, we get a finished multi-page document in the form of a PDF file.

Now this file can be sent by email.

TEXT RECOGNITION

Task: convert a paper document into electronic form (to a computer)

Step by step:
1) Scan (button 1 SCAN)

2) Recognition (button 2 RECOGNIZE ALL)

Recognition should be understood as the process of converting a photograph (picture) into text (letters, numbers, signs). If you took a photo of a page of text, then after recognition 99% of the text from the paper will turn into electronic text. You can then change (edit) this electronic text on your computer the way you want.

3) Saving to a text editor (button 4 Save)
I advise you to select TRANSFER ALL PAGES TO MICROSOFT WORD

We get

I would like to point out some important details of the RECOGNITION procedure. There are nuances here.
Immediately after recognition, I advise you to look at the result, especially at the blocks that the FineReader program creates.

These are areas highlighted with rectangular frames. The frames come in different colors. If a frame is red, the block is recognized as a PICTURE. If it is black, then as TEXT. There are blocks of different types. The block type can be checked by RIGHT-clicking the block and selecting CHANGE BLOCK TYPE.

A little trick: you can select an arbitrary area and assign any block type to it. For example, using the left mouse button, let’s select the part of the text that is poorly recognized (click, hold and drag; the frame changes size).

As a result, the document in Word will have a block of text and a picture block. The picture block will look exactly as it did in the original. I use this method to preserve stamps, custom fonts, pictures and photographs.

PS: Knowing how to work with PDF and how to scan and recognize documents very often comes to the rescue in office work. Knowledge saves you time!

Although advances in artificial intelligence (AI) over the past 50 years have not brought “smart” machines one iota closer to the cognitive capabilities of humans, it would be unfair to completely deny the successes in this direction. The most obvious and striking example is chess (not to mention simpler games). A computer cannot yet imitate our thinking, but it is quite capable of compensating for this gap with a larger amount of specialized memory and a higher search speed. Vladimir Kramnik described the play of the Deep Fritz program that defeated him in 2006 as “inhuman” in the sense that it often contradicted the established (human) rules of strategy and tactics.

And just over a year ago, Watson, another brainchild of IBM (the company whose famous Deep Blue once laid the foundation for the triumphant chess victories of computers), made a new breakthrough, defeating two champions of the popular American quiz show Jeopardy by a wide margin. It is significant, however, that although Watson voiced the answers on its own, the questions were still transmitted to it in text form. This suggests that successes in many areas of AI application, such as speech and image recognition or machine translation, are quite modest, although this does not prevent us from using them in practice today. The greatest successes, perhaps, are demonstrated by optical character recognition (OCR) systems, with which almost all PC users are probably familiar in one way or another. Moreover, Russian developments in this area occupy a worthy place in the world - I mean ABBYY FineReader.

A little history

The current version of ABBYY FineReader is number 11, i.e. the application has gone through quite a long development path, and even the history of this process is of some interest. Without pretending to be an exhaustive chronicle, I will give only the main milestones over the last decade, during which I more or less followed FineReader:

Year  Version  Main features
2003  7.0  Recognition accuracy increased by up to 25%. This was most noticeable in tables, especially complex ones, with colored cells, hidden dividers, etc.
2005  8.0  Further optimization of recognition algorithms, primarily aimed at working not with document scans but with digital photographs. For this purpose, additional functions for preparing originals were added (elimination of distortions, alignment of lines, etc.).
2007  9.0  The emergence of ADRT technology, which takes into account the logical structure of the entire processed (multi-page) document and is able to identify repeating elements (headers and footers), connect “flowing” objects (tables), etc.
2009  10.0  Further improvement of ADRT and recognition algorithms, increasing the processing accuracy of low-resolution originals by up to 30%.
2011  11.0  The main attention is paid to the speed of the program. The “second coming” of the black and white mode, which on good-quality originals gives an additional acceleration of up to 30%.

Naturally, during the same time, FineReader expanded its support for document formats, improved the built-in tools and interface, improved the reconstruction of the structure of originals, etc. However, the highlighted points are directly related to OCR technologies and clearly demonstrate the stepwise development process characteristic of complex knowledge-intensive systems, where each “breakthrough” is followed by a certain “quiet” period needed to refine the new algorithms. These algorithms represent the main value of any OCR program, and therefore users rarely hear any detailed information about them. However, ABBYY kindly agreed to lift the veil of secrecy, and today we have the opportunity to look into the holy of holies of FineReader.

Basic principles

So, since OCR belongs to the field of AI, it is logical that developers strive to at least to some extent imitate the activity of our brain. Of course, the structure of our visual system is incredibly complex, but the basic “large-block” principles of its functioning have been sufficiently studied; usually there are three of them:

  1. Integrity - an object is considered as a collection of its parts and (for visual images) the spatial relationships between them. In turn, the parts receive an interpretation only as part of the entire object. This principle helps to build and refine hypotheses, quickly eliminating unlikely ones.
  2. Purposefulness - since any interpretation of data pursues a specific goal, recognition is a process of putting forward hypotheses about an object and purposefully testing them. A system operating in accordance with this principle will not only use computing power more economically, but will also be less likely to make mistakes.
  3. Adaptability - the system saves the information accumulated during operation and reuses it, i.e. it learns by itself. This principle makes it possible to create and accumulate new knowledge and avoid solving the same problems repeatedly.

FineReader is the only OCR system in the world that operates in accordance with the principles described above at all stages of document processing. The corresponding technology is called IPA, after the first letters of the English terms. For example, according to the principle of integrity, a fragment of an image will be interpreted as a character only if it contains all the structural parts of such objects and these parts are in the right relationships to each other. This helps to replace a search through a large number of templates (looking for a more or less suitable one) with a targeted test of a reasonable number of hypotheses, relying on previously accumulated information about the possible outlines of a character in the document being recognized.

However, IPA principles apply not only when analyzing fragments corresponding to (presumably) individual characters, but also when analyzing the entire source image of the page. Most OCR systems are based on recognizing the hierarchical structure of a document, i.e. the page is divided into basic structural elements such as tables, images and blocks of text, which, in turn, are divided into other characteristic objects (cells, paragraphs) and so on, down to individual characters.

Such an analysis can be carried out in two main ways: top-down, i.e. from the largest constituent elements to individual characters, or, conversely, bottom-up. Most often only one of them is used, but ABBYY has developed a special algorithm, MDA (multilevel document analysis), which combines both. Briefly, it looks like this: the structure of the page is analyzed using the top-down method, while the reconstruction of the electronic document once recognition is complete proceeds from the bottom up, and at all levels there is an additional feedback mechanism. As a result, the likelihood of gross errors associated with incorrect recognition of high-level objects is sharply reduced.
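The two directions can be illustrated with a toy Python sketch that works on plain text instead of an image. It is not ABBYY’s MDA: the `Node` class and both functions are hypothetical, and the feedback mechanism between levels is not modeled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # "page", "block", "line" or "word"
    raw: str   # the fragment of the source this node covers
    children: list = field(default_factory=list)


def analyze_top_down(raw_page: str) -> Node:
    """Top-down pass: split the page into blocks, then lines, then words."""
    page = Node("page", raw_page)
    for raw_block in raw_page.split("\n\n"):
        block = Node("block", raw_block)
        for raw_line in raw_block.splitlines():
            line = Node("line", raw_line)
            line.children = [Node("word", w) for w in raw_line.split()]
            block.children.append(line)
        page.children.append(block)
    return page


def synthesize_bottom_up(node: Node) -> str:
    """Bottom-up pass: rebuild the document from words up to the page."""
    if node.kind == "word":
        return node.raw
    separator = {"line": " ", "block": "\n", "page": "\n\n"}[node.kind]
    return separator.join(synthesize_bottom_up(c) for c in node.children)


tree = analyze_top_down("Title here\n\nfirst  line\nsecond line")
print(synthesize_bottom_up(tree))
```

In a real system each level would carry recognition hypotheses rather than finished text, and a failure at a lower level could send the analysis back up to reinterpret the enclosing block.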

ADRT

Historically, OCR systems have evolved from recognizing individual characters. This task is still the most important and the most difficult, and the most complex algorithms are associated with it. However, it soon became clear that higher-level information (for example, about the language of the document and the correct spelling of recognized words) could help in solving this problem, and this is how contextual and dictionary checks appeared. Then the desire to preserve formatting and recreate the physical structure of the document (i.e. the relative position of its various objects) led to the need for a detailed analysis of the whole page. It is clear that this also significantly affects the overall quality of recognition, since it helps to correctly process multi-column layouts, tables and other kinds of “non-linear” text arrangement.
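How a dictionary check can overrule the raw character-level guess is easy to show with a small Python sketch. It is a simplified illustration under stated assumptions, not FineReader’s algorithm: `best_word`, the probabilities and the tiny dictionary are all made up for the example.

```python
from itertools import product
from math import prod


def best_word(char_hypotheses, dictionary):
    """Pick the most probable candidate word, preferring dictionary words.

    char_hypotheses: one dict per character position, mapping a candidate
    character to its estimated probability.
    """
    candidates = []
    for combo in product(*(h.items() for h in char_hypotheses)):
        word = "".join(ch for ch, _ in combo)
        score = prod(p for _, p in combo)
        candidates.append((word in dictionary, score, word))
    # Dictionary membership outranks the raw score; the score breaks ties.
    _, _, word = max(candidates)
    return word


# Hypothetical recognizer output for a three-letter word.
hypotheses = [
    {"c": 0.9, "e": 0.1},
    {"a": 0.4, "o": 0.6},
    {"t": 0.8, "l": 0.2},
]
print(best_word(hypotheses, {"cat", "dog"}))  # dictionary check applied
print(best_word(hypotheses, set()))           # character scores only
```

Without a dictionary the highest-scoring characters win position by position; with one, a slightly less probable but valid word is preferred.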

Most modern OCR systems operate at exactly these three levels (characters, words, pages), using, as already mentioned, top-down or bottom-up approaches. However, ABBYY, in accordance with the principles of IPA, introduced one more level into FineReader: the multi-page document as a whole. First of all, this was needed to correctly reproduce the logical structure, which in modern documents is becoming more and more complex. But there are additional bonuses: increased accuracy and faster processing of repeating objects, and more correct identification (and therefore recognition) of objects “flowing” from page to page.

This is exactly why ADRT (Adaptive Document Recognition Technology) was developed: a technology for document analysis and synthesis at the logical level. Ultimately, it helps to make the result produced by FineReader as similar to the original as possible. To do this, the image of the entire document is analyzed, and the recognized words are combined into groups (clusters) depending on their style, surroundings and location on the page. In this way, the program effectively sees the “logic” of the document markup and can subsequently unify the formatting of the result.
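The idea of grouping fragments by style and position across the whole document can be sketched in Python as follows. This is a deliberately naive illustration, not ADRT itself: the function name, the tuple layout and the rule “repeats on every page” are assumptions made for the example.

```python
from collections import defaultdict


def find_running_elements(words, page_count):
    """Group text fragments by (text, style, vertical position) and report
    the groups that repeat on every page, e.g. headers and footers.

    words: iterable of (page_number, text, font_size, y_position) tuples.
    """
    clusters = defaultdict(set)
    for page, text, font_size, y in words:
        clusters[(text, font_size, y)].add(page)
    return sorted(
        text
        for (text, _, _), pages in clusters.items()
        if len(pages) == page_count
    )


# Hypothetical fragments from a two-page document.
fragments = [
    (1, "Annual Report", 9, 20),
    (1, "Introduction", 18, 120),
    (2, "Annual Report", 9, 20),
    (2, "Results", 18, 120),
]
print(find_running_elements(fragments, page_count=2))
```

Such a decision is only possible at the level of the whole document: looking at a single page, a running header is indistinguishable from ordinary small text at the top.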

Thanks to ADRT, FineReader, starting with version 9.0, has learned to detect, recognize and reproduce the following structural parts and document formatting elements:

  • main text;
  • headers and footers;
  • page numbers;
  • headings of the same level;
  • table of contents;
  • text inserts;
  • captions for drawings;
  • tables;
  • footnotes;
  • signature/seal zones;
  • fonts and styles.

Recognition process

In accordance with the MDA algorithm, the actual recognition proceeds from top to bottom, starting at the page level. Clearly, the more wrong decisions are made in the early stages of this process, the more errors there will be in the subsequent ones. This is why recognition accuracy depends so much on the quality of the originals, but the algorithms used to pre-process them can also have a significant impact. So, as color documents grew in popularity, FineReader gained an adaptive binarization (AB) procedure. If a document that has watermarks, or text on a textured or colored background, is scanned directly in black-and-white mode, “garbage” will invariably appear in the image, and it will then be quite difficult to separate from the “useful” image (since the original information about it is already lost). That is why FineReader prefers to work with color or grayscale images and to convert them into black and white itself (this process is called binarization). But that's not all. Since the colors of the text and background can vary within a page and even within individual lines, AB identifies words with more or less the same characteristics and selects for each the binarization parameters that are optimal from the point of view of recognition quality. This is precisely where the adaptivity of the algorithm lies, which makes it an example of the use of feedback in MDA. Clearly, the effectiveness of AB strongly depends on the design of the source documents; on ABBYY's test set, this algorithm increased recognition accuracy by 14.5%.
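To show why a single threshold is not enough, here is a minimal sketch of local thresholding. It is a generic textbook technique, not ABBYY's AB: the fixed tiles stand in for the word-level grouping described above, and `block`, `offset` and `threshold` are assumed parameters.

```python
from statistics import mean

def binarize_global(gray, threshold=128):
    """One threshold for the whole page: 0 = ink, 1 = background."""
    return [[0 if value < threshold else 1 for value in row] for row in gray]

def binarize_adaptive(gray, block=4, offset=10):
    """Threshold every block x block tile against its own mean brightness."""
    height, width = len(gray), len(gray[0])
    result = [[1] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            local_mean = mean(gray[r][c] for r in rows for c in cols)
            for r in rows:
                for c in cols:
                    # Ink is whatever is clearly darker than its own surroundings.
                    if gray[r][c] < local_mean - offset:
                        result[r][c] = 0
    return result

# Left half: dark text (40) on a light background (220).
# Right half: text (60) on a dark colored background (110).
page = [
    [220, 40, 220, 220, 110, 60, 110, 110],
    [220, 40, 220, 220, 110, 60, 110, 110],
]

global_result = binarize_global(page)
adaptive_result = binarize_adaptive(page)
```

With the global threshold the whole right half turns into ink and the character is lost, while the tile-based version keeps the stroke in both halves.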

But the most interesting part, of course, begins when the recognition process descends to the lowest levels. The so-called linear division procedure splits lines into words and words into individual letters; then, in accordance with the IPA principle, it generates a set of hypotheses (i.e. possible answers as to which character this is, which characters the word is divided into, etc.) and, having assigned each one a probability estimate, passes them to the input of the character recognition mechanism. The latter consists of a number of so-called classifiers, each of which also generates a number of hypotheses ranked by their estimated probability. The most important characteristic of any classifier is the average position of the correct hypothesis. Clearly, the closer it is to the top of the list, the less work remains for subsequent algorithms, for example dictionary checking. But for sufficiently well-tuned classifiers, what is most often assessed is recognition accuracy over the first three hypotheses or over the first one alone, i.e., roughly speaking, the ability to guess the correct answer in three attempts or in one. ABBYY uses the following types of classifiers in its systems: raster, feature, feature differential, contour, structural and structural differential, grouped at two logical levels.
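The three quality measures mentioned above are easy to compute once a classifier's ranked output is available. The sketch below uses made-up hypothesis lists; the function name and the data are assumptions for illustration only.

```python
def evaluate(ranked_hypotheses, truths):
    """Return (top-1 accuracy, top-3 accuracy, average position of the correct answer).

    ranked_hypotheses[i] is the classifier's list for sample i, best guess first.
    Positions are 1-based; samples whose list lacks the truth are skipped
    when averaging the position.
    """
    pairs = list(zip(ranked_hypotheses, truths))
    total = len(pairs)
    top1 = sum(hyps[0] == truth for hyps, truth in pairs) / total
    top3 = sum(truth in hyps[:3] for hyps, truth in pairs) / total
    positions = [hyps.index(truth) + 1 for hyps, truth in pairs if truth in hyps]
    average_position = sum(positions) / len(positions)
    return top1, top3, average_position

# Hypothetical classifier output for four character images.
outputs = [
    ["o", "0", "c"],
    ["l", "1", "i"],
    ["m", "n", "rn"],
    ["c", "e", "o", "G"],
]
truths = ["o", "1", "m", "G"]

top1, top3, average_position = evaluate(outputs, truths)
```

Here the correct answer is first in two of four cases, within the first three in three of four, and sits at position 2 on average.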

The operating principle of the raster classifier (RK) is based on a pixel-by-pixel comparison of a character image with reference templates. The templates are formed by averaging images from the training set and are reduced to a certain standard form; accordingly, the size, stroke thickness and slant of the image being recognized are normalized in advance as well. This classifier is easy to implement, fast and resistant to image defects, but its accuracy is relatively low, which is why it is used at the first stage to quickly generate a list of hypotheses.
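A toy version of this scheme fits in a few lines. The 3x3 glyphs and the absolute-difference distance are assumptions chosen for brevity, and the normalization of size, thickness and slant is omitted entirely.

```python
def build_template(samples):
    """Average equally sized binary glyph images pixel by pixel."""
    count = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [[sum(s[r][c] for s in samples) / count for c in range(cols)]
            for r in range(rows)]

def raster_distance(image, template):
    """Sum of per-pixel absolute differences (smaller means more similar)."""
    return sum(abs(p - t)
               for image_row, template_row in zip(image, template)
               for p, t in zip(image_row, template_row))

def raster_classify(image, templates):
    """Return candidate characters ordered from most to least similar."""
    return sorted(templates, key=lambda ch: raster_distance(image, templates[ch]))

# Hypothetical training images: 1 = ink, 0 = background.
I_SAMPLES = [
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0], [0, 1, 1]],
]
L_SAMPLES = [[[1, 0, 0], [1, 0, 0], [1, 1, 1]]]
DASH_SAMPLES = [[[0, 0, 0], [1, 1, 1], [0, 0, 0]]]

templates = {
    "I": build_template(I_SAMPLES),
    "L": build_template(L_SAMPLES),
    "-": build_template(DASH_SAMPLES),
}

# An "I" whose bottom pixel was lost during scanning.
damaged_i = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
ranking = raster_classify(damaged_i, templates)
```

The damaged glyph still ranks "I" first, which illustrates the tolerance to defects; the output is a ranked list rather than a single answer, as expected from a first-stage classifier.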

The feature classifier (PC), as its name suggests, is based on the presence of features of a particular character in the image. If there are N such features in total, then each hypothesis can be represented by a point in N-dimensional space; accordingly, the accuracy of a hypothesis is assessed by the distance from it to the point corresponding to the reference template (which is also derived from the training set). Clearly, the types and number of features largely determine the quality of recognition, so there are usually quite a lot of them. This classifier is also relatively fast and simple, but it is not very robust to various image defects. In addition, the PC operates not on the original image but on a certain model, an abstraction, so it ignores some of the information: for instance, the mere presence of certain important elements says nothing about their relative position. For this reason, the PC is used not instead of the RK, but together with it.
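The distance-based ranking can be sketched as a nearest-centroid classifier. The three features used here (number of holes, vertical strokes, horizontal strokes) and the training values are invented for the example; the real feature set is not described in the text.

```python
import math

def centroid(points):
    """Average the training points of one character into a reference point."""
    count = len(points)
    return tuple(sum(p[i] for p in points) / count for i in range(len(points[0])))

def feature_classify(features, standards):
    """Rank characters by Euclidean distance from the feature vector to each reference."""
    return sorted(standards, key=lambda ch: math.dist(features, standards[ch]))

# Hypothetical feature vectors: (holes, vertical strokes, horizontal strokes).
training = {
    "B": [(2, 1, 3), (2, 1, 2)],
    "D": [(1, 1, 2), (1, 1, 2)],
    "E": [(0, 1, 3), (0, 1, 3)],
}
standards = {ch: centroid(points) for ch, points in training.items()}

ranking = feature_classify((2, 1, 3), standards)
```

Note that nothing in such a vector records where the holes and strokes are located, which is the loss of information the paragraph above points out.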

The contour classifier (QC) is a special case of the PC; it differs in that it analyzes the contours of the presumed character, extracted from the original image. In general, its accuracy is lower than that of a full-fledged PC.

The feature differential classifier (MPC) is also similar to the PC, but it is used solely to distinguish between similar objects such as “m” and “rn”. Accordingly, it analyzes only those areas where the differences lie, and it receives as input not only the original images but also the hypotheses formed at the early stages of recognition. Its operating principle, however, differs somewhat from that of the PC. At the training stage, two “clouds” (groups of points) of possible values, one for each of the two alternatives, are formed in N-dimensional space; then a hyperplane is constructed that separates the “clouds” from each other and is approximately equidistant from them. The recognition result depends on which half-space the point corresponding to the original image falls into.

The MPC itself does not put forward hypotheses; it only refines existing ones (the list of which is generally re-sorted with a bubble sort). For this reason its effectiveness is not assessed directly; instead, it is indirectly equated with the characteristics of the entire first level of OCR recognition. Clearly, though, it depends on the correct choice of features and on how representative the set of reference samples is, and ensuring the latter is a rather labor-intensive task.
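The “two clouds and a hyperplane” idea can be shown with the simplest possible construction: the plane that perpendicularly bisects the segment between the two cloud centers. This is a stand-in, not the training method actually used; the two features (depth of the gap between strokes, height of the joining arch) and all numbers are assumptions.

```python
def train_hyperplane(cloud_a, cloud_b):
    """Build the plane halfway between the centers of two point clouds.

    Returns (normal, bias) such that normal . x + bias > 0 on cloud_a's side.
    """
    dim = len(cloud_a[0])
    center_a = [sum(p[i] for p in cloud_a) / len(cloud_a) for i in range(dim)]
    center_b = [sum(p[i] for p in cloud_b) / len(cloud_b) for i in range(dim)]
    normal = [a - b for a, b in zip(center_a, center_b)]
    midpoint = [(a + b) / 2 for a, b in zip(center_a, center_b)]
    bias = -sum(w * m for w, m in zip(normal, midpoint))
    return normal, bias

def decide(point, hyperplane, label_a, label_b):
    """Pick the label according to the half-space the point falls into."""
    normal, bias = hyperplane
    score = sum(w * x for w, x in zip(normal, point)) + bias
    return label_a if score > 0 else label_b

# Hypothetical 2-D features: (gap depth between strokes, height of joining arch).
cloud_m = [(0.1, 0.9), (0.2, 0.8)]
cloud_rn = [(0.8, 0.3), (0.9, 0.2)]
plane = train_hyperplane(cloud_m, cloud_rn)

first = decide((0.25, 0.7), plane, "m", "rn")
second = decide((0.7, 0.4), plane, "m", "rn")
```

The classifier only answers “which of these two”, so it makes sense only after earlier stages have narrowed the choice down to such a pair.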

The structural differential classifier (KFOR) was originally used for processing handwritten texts. Its task is to distinguish between similar objects such as “C” and “G”. The KFOR therefore relies on features characteristic of each pair of characters; its training process is even more complex than that of the MPC, and it runs more slowly than all the previous classifiers.

The structural classifier (SK) is a source of pride for ABBYY. It was originally developed for recognizing so-called hand-printed text, i.e. text that a person writes in “printed” letters, but was later applied to printed text as well. It is used at the final stages of recognition and comes into play quite rarely: only when at least two hypotheses with sufficiently high probabilities reach it.

The quality characteristics of all classifiers are collected in the following table. They, however, only allow the effectiveness of the algorithms to be compared with each other, since they are not absolute but were obtained by processing a specific test sample. It may seem that at the last stages of recognition the struggle is literally for a fraction of a percent, but in fact each classifier makes a significant contribution to recognition accuracy; for example, the SK reduces the number of errors by a noticeable 20%.

Classifier                                | RK    | PC    | QC    | MPC*  | KFOR** | SK**
Accuracy for the first three options, %   | 99.29 | 99.81 | 99.30 | 99.87 | 99.88  | -
Accuracy according to the first option, % | 97.57 | 99.13 | 95.10 | 99.26 | 99.69  | 99.73

* evaluation of the entire first level of the ABBYY OCR algorithm
** evaluation for the entire algorithm after adding the appropriate classifier

It is curious, however, that despite its quite high accuracy, the recognition algorithm itself does not make the final decision. In accordance with the MDA principle, hypotheses are put forward at each logical level, and their number can grow exponentially. Testing all hypotheses one after another is therefore unlikely to be effective, so ABBYY OCR systems use the method of structuring hypotheses, i.e., assigning them to one model or another. There are a couple of dozen such models; here are just a few of their types: dictionary word, non-dictionary word, Arabic numerals, Roman numerals, URL, regular expression. Each type can include many specific models (for example, a word in one of the known languages, in Latin script, in Cyrillic, etc.).
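A rough idea of what “assigning a hypothesis to a model” might look like is given below. The patterns, the tiny dictionary and the order in which models are tried are all assumptions for the sake of the example, not ABBYY's actual models.

```python
import re

# Hypothetical stand-in for a real language dictionary.
DICTIONARY = {"turn", "page", "scan"}

# Pattern-based models, tried in order.
PATTERN_MODELS = [
    ("URL", re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)),
    ("Arabic numerals", re.compile(r"\d+([.,]\d+)*")),
    ("Roman numerals",
     re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")),
]

def assign_model(hypothesis):
    """Return the name of the first model the hypothesis fits."""
    for name, pattern in PATTERN_MODELS:
        # The Roman-numeral pattern also matches "", so empty strings are excluded.
        if hypothesis and pattern.fullmatch(hypothesis):
            return name
    if hypothesis.lower() in DICTIONARY:
        return "dictionary word"
    if hypothesis.isalpha():
        return "non-dictionary word"
    return "unstructured"

examples = ["www.abbyy.com", "2014", "XIV", "turn", "tum", "t#m"]
assigned = [assign_model(text) for text in examples]
```

Once a hypothesis has a model, later checks can work with a handful of structured candidates instead of an exponentially growing list of raw character combinations.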

All final operations are carried out on hypotheses built using models. For example, contextual checking determines the language of the document and immediately reduces the likelihood of models that use the wrong alphabets, while dictionary checking compensates for errors when certain characters are recognized uncertainly: for example, the word “turn” is present in the English dictionary, in contrast to “tum” (in any case, it is not among the common words). Although the dictionary has a higher priority than any classifier, it is not necessarily the final authority, and in general it does not stop further checks: firstly, as mentioned above, there is a model for non-dictionary words, and secondly, the special organization of the dictionaries makes it possible to guess with high probability whether an unknown word can belong to a particular language. Nevertheless, dictionary checking (and the completeness of the dictionaries) has a significant impact on the recognition result, and in ABBYY’s own tests it reduces the number of errors by almost half.
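The interplay between classifier confidence and the dictionary can be sketched as a simple re-ranking step. The additive bonus and its value are assumptions; the text only says that the dictionary carries a high priority without being the final word.

```python
# Hypothetical stand-in for a real English dictionary.
DICTIONARY = {"turn", "page", "left"}
DICTIONARY_BONUS = 0.3  # assumed weight, not a documented value

def rerank(word_hypotheses):
    """Re-order (word, classifier_confidence) pairs, favoring dictionary words."""
    def score(item):
        word, confidence = item
        bonus = DICTIONARY_BONUS if word.lower() in DICTIONARY else 0.0
        return confidence + bonus
    return sorted(word_hypotheses, key=score, reverse=True)

# "rn" and "m" look alike, so the classifiers slightly prefer the wrong word.
ambiguous = rerank([("tum", 0.55), ("turn", 0.45)])

# A confidently recognized non-dictionary word still beats a weak dictionary match.
confident = rerank([("xqz", 0.99), ("page", 0.50)])
```

In the first case the dictionary flips the order in favor of “turn”; in the second the unknown word stays on top, matching the remark that the dictionary does not stop further checks.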

Not only OCR

Printed documents are far from the only ones of interest from the point of view of digitization and automatic processing. Quite often you have to work with forms, i.e. documents with predefined, fixed fields that are filled out by hand but relatively neatly (so-called hand-printed characters); various questionnaires can serve as an example. The technology for processing them has a separate name, ICR (intelligent character recognition), and differs quite significantly from OCR. Since in this case the task is not to recreate the entire document but to extract specific data from it, it breaks down into two main subtasks: finding the necessary fields and actually recognizing their contents.

This is a fairly specific area, and ABBYY offers a completely separate software product for it, ABBYY FlexiCapture. It is intended for building automated and semi-automated systems, involves customization for specific types of documents (for which special templates are created), can intelligently find various fields on pages and verify the data in them, etc. However, at its very core are character recognition algorithms similar to those used in FineReader, and the general scheme is very similar.

There is, nevertheless, an important difference: the structural classifier is a mandatory participant in the process, which is due to the specifics of hand-printed characters. In addition, ICR involves a large number of specific additional checks: for example, whether a character is crossed out, or whether the recognized characters actually form a date.
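The date check mentioned above is a good example of a field-level verification. The sketch below assumes a `DD.MM.YYYY` field format and a list of hypotheses already ranked by the recognizer; both are illustrative assumptions.

```python
from datetime import datetime

def is_valid_date(text, fmt="%d.%m.%Y"):
    """True if the text parses as a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

def pick_date(hypotheses):
    """Return the best-ranked hypothesis that is a real date, or None."""
    for candidate in hypotheses:
        if is_valid_date(candidate):
            return candidate
    return None

# The top hypothesis reads "02" where the writer meant "03"; 31 February does not exist.
chosen = pick_date(["31.02.2014", "31.03.2014"])

# Letters mistaken for digits: no hypothesis survives, so the field needs manual review.
unresolved = pick_date(["3l.O3.2014"])
```

A field for which no hypothesis passes the check is a natural candidate for the manual verification step that form-processing systems provide.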

FineReader is one of the most popular tools for scanning and for processing files of various types. This feature-rich software product was developed by the Russian company ABBYY; it lets you not only recognize documents but also process them (translate, change formats, etc.). Many users manage to install it but cannot immediately figure out how to use ABBYY FineReader. This article answers many of their questions.

The program allows you to scan and recognize text - and more

To understand what kind of program ABBYY FineReader 12 is, you need to look at all its capabilities in detail. The first and simplest function is scanning a document. There are two scanning options: with and without recognition. With a regular scan of a printed sheet, the scanned image is saved to the folder you specified on your computer.

ATTENTION. The sheet must be placed straight on the scanning surface of the device, along the guides marked on it. Do not let the original lie crooked, as this may lead to a poor-quality scan.

You must decide for yourself what you need FineReader for, since the utility offers extensive functionality. For example, you can choose which color mode to receive the image in, and it is possible to convert all photos to black and white. In black and white, recognition is faster and the quality of processing increases.

If you are interested in the text recognition function of ABBYY FineReader, you need to click a special button before scanning. In this case, there are several options for obtaining the result. By default, the recognized portion of the sheet is displayed on your screen, where you can copy it or edit it manually.

If you select other functions, you can immediately receive the file as a Word document or Excel table. Selecting functions is very simple, the menu is intuitive and easy to customize due to the fact that all the buttons you need are in front of your eyes.

IMPORTANT. Before ABBYY FineReader recognizes the text, you need to select the processing language accurately. Although the utility works fully automatically, a low-quality source sometimes prevents it from determining which language the source was written in. This greatly reduces the quality of the final result.

Multiple operating modes

To fully understand how to use ABBYY FineReader 12, you need to try two modes of operation: “Thorough” and “Quick recognition”. The second mode is suitable for high-quality images, and the first for low-quality files. The Thorough mode takes 3-5 times longer to process files.

The illustration shows the result of the program - text recognition from an image

What other functions are there?

Text recognition is not the only useful feature of ABBYY FineReader. For greater user convenience, there is