ABBYY FineReader 12: what the program is for and how it works


ABBYY® FineReader 12

Quick User Guide

This document contains basic information about ABBYY FineReader. The complete guide to working with ABBYY FineReader, Screenshot Reader and Hot Folder is available on the ABBYY website. If you do not have a permanent internet connection, you can download the user manual in PDF format.

What is ABBYY FineReader

Installing and launching ABBYY FineReader

System requirements

Program installation

Launching ABBYY FineReader

Working with ABBYY FineReader

Built-in tasks

Step by step document conversion

Document Structure Analysis and Region Editing

Activation and registration of the program

Activating ABBYY FineReader

Registering ABBYY FineReader

Data security

What is ABBYY FineReader

ABBYY FineReader is an Optical Character Recognition (OCR) system. It is designed to convert scanned documents, PDF documents and image files, including digital photographs, into editable formats.

Benefits of ABBYY FineReader 12

Speed and high accuracy of recognition

Support for most world languages

Checking recognition results

Simple and clear interface

Recognition of photographed documents

Saving documents in various formats and sending them to online storages

Free technical support for registered users

Installing and launching ABBYY FineReader

System requirements

1. 32-bit (x86) or 64-bit (x64) processor, 1 gigahertz (GHz) or faster.



2. Operating system Microsoft® Windows® 8, Microsoft® Windows® 7, Microsoft Windows Vista, Microsoft Windows XP, Microsoft Windows Server 2012/2012 R2, Microsoft Windows Server 2008/2008 R2, Microsoft Windows Server 2003.

To work with a localized interface, the operating system must provide the necessary language support.

3. RAM - 1024 MB.

When running on multi-core systems, an additional 512 MB of RAM is required for each additional core.

4. Free disk space: 850 MB for installing all program components, 700 MB for running the program.

5. Video card and monitor with a resolution of at least 1024×768 pixels.

6. Keyboard, mouse or other pointing device.
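The memory requirement in item 3 is simple arithmetic: 1024 MB plus 512 MB for each core beyond the first. A minimal sketch of that calculation follows; the function name and its parameters are illustrative and are not part of any ABBYY tool.

```python
def required_ram_mb(cores: int, base_mb: int = 1024, per_extra_core_mb: int = 512) -> int:
    """Minimum RAM in MB: the base amount plus 512 MB per additional core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return base_mb + per_extra_core_mb * (cores - 1)


print(required_ram_mb(1))  # single core: base amount only
print(required_ram_mb(4))  # four cores: base plus three extra cores
```

For a four-core machine this gives 1024 + 3 × 512 = 2560 MB.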

Program installation

To install ABBYY FineReader 12:

1. Run the Setup.exe file from the installation CD or from the program distribution folder.

You can install ABBYY FineReader 12 Corporate on a local network. You can download the System Administrator's Guide in PDF format from the ABBYY website.

Launching ABBYY FineReader

To launch ABBYY FineReader 12:

Select ABBYY FineReader 12 (ABBYY FineReader 12 Corporate) in the Start > Programs menu, or

In Microsoft Office applications, click the ABBYY FineReader launch button on the FineReader 12 panel, or

In Windows Explorer, select the image file and, in its context menu, select Open with ABBYY FineReader or the option to convert to the desired format.

Working with ABBYY FineReader

Processing documents using ABBYY FineReader consists of four stages:

Getting an image;

Document recognition;

Checking and editing the received text;

Saving recognition results.

Often this process consists of the same sequence of actions, for example, scanning, recognition and saving the recognized text in a specific format. To perform the most common tasks, the program provides built-in tasks that allow you to get recognized text at the click of a button. To recognize documents with a complex structure, you can set up and run each stage of processing yourself.

Built-in tasks

The built-in tasks are launched from the Task window, which is opened by default when the application starts. If the window is closed, click the Task button on the main toolbar of the program.

To convert a document using built-in tasks:

1. In the Task window, select the necessary tab with tasks:

Basic - contains the most frequently used built-in tasks in ABBYY FineReader;

Microsoft Word - tasks for converting to a Microsoft Word document;

Microsoft Excel - tasks for creating Microsoft Excel spreadsheets;


My Tasks - you can create your own custom tasks consisting of the steps required for you (only for ABBYY FineReader Corporate version).

2. In the Document language list, specify the recognition languages.

3. In the Color mode list, select a color mode:

Color - the color scheme of the document will not change;

Black and White - The pages of the document will be black and white, which will reduce the size of the FineReader document. Compared to Color mode, this mode takes less time to process a document.

Attention! After selecting the black-and-white mode, it will not be possible to restore the color appearance of the document. To get a color document, open a file that contains color images of the pages, or scan a paper document in color mode.

4. If necessary, set additional options for the tasks of converting to a Microsoft Word document, to a Microsoft Excel document, and to an Adobe PDF document in the right part of the window.

5. Click the button for the task you want.

ABBYY FineReader tasks are performed according to the settings specified in the Options dialog (Tools > Options... menu).

After launch, a task progress bar appears on the screen, containing a task progress indicator, a list of steps, as well as tips and warnings.

As a result of the task, a document of the required format will be created, and the images will be added to the FineReader document. If necessary, you can edit the selected areas on the images, check the recognized text and save the recognition results in a different format.

Step-by-step document conversion

To set up and launch each stage of document processing yourself, use the main window of ABBYY FineReader.

1. On the main toolbar, in the Document language drop-down list, specify the recognition languages.

2. Scan or open your images.

By default, document analysis and recognition will start automatically. You can change these settings on the Scan/Open tab of the Options dialog (Tools > Options... menu).

3. In the Image window, check the selected areas and edit them if necessary.

4. If you have changed areas, on the main toolbar, click the Recognize button.

5. In the Text window, check and, if necessary, edit the recognition results.

Analyzing the structure of a document and editing areas

The quality of the converted document depends on many factors: the original image, the recognition settings and the saving parameters. One of the most important steps is analyzing the logical structure of the document, i.e. marking the areas that contain text, pictures, tables and barcodes. Areas are marked in order to tell the system how to recognize particular parts of the image and in what order. This makes it possible to reproduce the original design of the document.

By default, document analysis in ABBYY FineReader is performed automatically. However, in complex documents, some areas may not be detected correctly. It is often more convenient to correct only those areas rather than mark up all the areas again. Tools for manually marking and editing areas are located on the toolbar of the Image window, as well as on the pop-up toolbars for the Text, Image, Background Image and Table areas. A pop-up toolbar appears next to the active area. To select an area, click it with the left mouse button.

With manual layout tools, you can:

Add or remove an area

Move area borders or the area itself

Add/remove a rectangular part of the area

Renumber areas

After all operations on editing areas are completed, start recognition again.

You can read more about how to work with manual marking tools, as well as about non-standard situations that may require additional settings, in the full help on the ABBYY website.

Activating and registering the program

Activating ABBYY FineReader

To use ABBYY FineReader 12 in full-featured mode, you may need to activate the product. Activation is completely secure and anonymous.

The easiest and fastest way is to activate the program via the Internet. You can also activate the program by e-mail or phone/fax. Detailed information about activation can be found on the ABBYY website.

Registering ABBYY FineReader

ABBYY invites you to become a registered user of ABBYY FineReader 12. By registering, you get a number of benefits:

Free technical support;

The ability to use the ABBYY Screenshot Reader application, designed to recognize text in screenshots of screen areas;

Restoration of the serial number in case of its loss;

Automatic product update;

Opportunity to receive information about special offers for ABBYY products.

You can register your copy of the program in one of the following ways:

Fill out the registration card during the program activation process. If you did not register the program during the activation process, you can do it later, at any time convenient for you.

From the Help menu, select Register... and fill out the registration form.

Register on the ABBYY website.

Data security

During the registration process, you consent to the voluntary transfer of your personal data to ABBYY. You also consent to the collection, processing and use of your personal data by ABBYY under the terms of confidentiality and in accordance with applicable law and the License Agreement. The personal data you provide will be used only within the ABBYY group of companies and will not be provided to third parties, except as provided by the applicable law under the License Agreement or by the License Agreement itself.

ABBYY has the right to send you e-mails containing news about products, price changes, special offers, and other information about products or the company only if you have confirmed your consent to receive information from ABBYY by checking the appropriate box during registration. You can remove your address from the list of subscribers at any time by contacting ABBYY.

The information contained in this document is subject to change without notice and does not represent any commitment on the part of ABBYY.

The software described in this document is furnished under a License Agreement. This software may only be used or copied in strict accordance with the terms of this agreement. Copying this software to any media, unless there is a special permission for this in the License Agreement or in the non-distribution agreement, is a violation of the Law of the Russian Federation "On the legal protection of computer programs and databases" and international law.

No part of this manual may be reproduced or transmitted for any purpose in any form or by any means, electronic or mechanical, including photocopying and recording on magnetic media, without the express written permission of ABBYY.

© ABBYY Production LLC, 2013. All rights reserved.

ABBYY, ABBYY FineReader, ADRT are registered trademarks or trademarks of ABBYY Software Ltd.

© 1984-2008 Adobe Systems Incorporated and their licensors. All rights reserved.

Protected by US Patents: 5,929,866; 5,943,063; 6,289,364; 6,563,502; 6,185,684; 6,205,549; 6,639,593;

7,213,269; 7,246,748; 7,272,628; 7,278,168; 7,343,551; 7,395,503; 7,389,200; 7,406,599; 6,754,382; Patent applications are being considered.

Adobe® PDF Library is licensed by Adobe Systems Incorporated.

Adobe, Acrobat®, the Adobe logo, the Acrobat logo, the Adobe PDF logo, and Adobe PDF Library are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries.

This program contains components owned by © 2008 Celartem, Inc. All rights reserved.

This software contains components owned by © 2011 Caminova, Inc. All rights reserved.

Based on AT&T Labs Technology.

DjVu® is covered by US Patent No. 6,058,214. Patent applications in other countries are being considered.

This program contains components owned by © 2013 University of New South Wales. All rights reserved.

© 2002-2008 Intel Corporation.

© 2010 Microsoft Corporation. All rights reserved.

Microsoft, Outlook, Excel, PowerPoint, Windows Vista, Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

© 1991-2013 Unicode, Inc. All rights reserved.

© 2010 Oracle and/or its affiliates. All rights reserved.

OpenOffice.org, the OpenOffice.org logo are trademarks or registered trademarks of Oracle and/or its affiliates.

JasPer License Version 2.0:

© 2001-2006 Michael David Adams © 1999-2000 Image Power, Inc.

© 1999-2000 The University of British Columbia

EPUB® is a registered trademark of IDPF (International Digital Publishing Forum)

This software contains components owned by © 2009 The FreeType Project (www.freetype.org). All rights reserved.

The product includes software developed by the OpenSSL project for use in the OpenSSL Toolkit. (http://www.openssl.org/). The product contains cryptographic software written by Eric Young ( [email protected]).

© 1998-2011 The OpenSSL Project. All rights reserved.

© 1995-1998 Eric Young ( [email protected]) All rights reserved.

This product includes software developed by Tim Hudson ( [email protected]).

Other trademarks are trademarks or registered trademarks


"What kind of program is FineReader?" The question concerns a program that is, in its own way, irreplaceable and very often comes in handy in office work. It is even a little strange to see such queries about a well-known and popular program that at one time was installed on almost every computer.

There is a list of programs that most users install as soon as they install the operating system. This happens not because there is an immediate need for absolutely each of them, but because these are office applications that will definitely be used, and very actively.

This list has always included, and will always include, the Microsoft Office suite, Adobe Reader, and a browser, which you download by launching Internet Explorer for the first and last time on a fresh Windows installation. Beyond that, the choice depends on the needs of the user: it may be Photoshop, Sony Vegas, and so on.

FineReader once belonged to this list and has lost popularity only in recent years. It is no longer seen on computers as often as it used to be. But it would be foolish to deny that its functionality is still relevant today.

What is the FineReader program?

The beauty of this application is that it can not only recognize and read text files in different formats, but also convert text from images into a DOC file.

That is, non-editable text is made editable. And since you often have to deal with a variety of document formats, this ability comes in handy.

Using FineReader to convert text saves a huge amount of time because the user doesn't have to enter text manually by retyping it from an image.

So now that you know what kind of program FineReader is, you will never again have to retype text from an image in order to edit it.

Fine Reader is one of the most popular programs for scanning and processing files of various types. It was developed by the Russian company ABBYY and makes it possible not only to recognize documents but also to process them (translate them, change formats, etc.). Many users manage to install ABBYY FineReader but cannot figure out right away how to use it. This article answers many of the questions that arise.

The program allows you to scan and recognize text - and not only

To understand what kind of program ABBYY FineReader 12 is, it is necessary to look at all its features in detail. The first and simplest function is scanning a document. There are two scanning options: with recognition and without it. With a regular scan of a printed sheet, the scanned image is saved to the specified folder on your computer.

ATTENTION. The sheet must be placed on the scanner glass exactly along the marked contours. Do not allow the original to be wrinkled, as this can lead to a poor-quality scan.

You must decide for yourself what you need FineReader for, since the utility has extensive functionality. For example, you can choose the color mode of the resulting image, and it is possible to convert all photos to black and white. Recognition in black and white is faster, and the processing quality improves.

If you are interested in the text recognition function of ABBYY FineReader, you need to press a special button before scanning. In this case, there are several options for obtaining the result. By default, the recognized part of the sheet is displayed on your screen, where you can copy it or edit it manually.

If you select other functions, you can immediately get the file as a Word document or an Excel spreadsheet. Selecting functions is very simple: the menu is intuitive and easy to configure, because all the buttons you need are right in front of you.

IMPORTANT. Before you recognize text in ABBYY FineReader, you need to select the processing language precisely. Although the utility works fully automatically, a low-quality original can make it impossible to determine which language the source document was written in. This greatly reduces the quality of the final result.

Multiple operating modes

To fully understand how to use ABBYY FineReader 12, you need to try its two modes of operation: "Thorough" and "Quick Recognition". The second mode is suitable for high-quality images, while the first is suitable for low-quality files. Thorough mode takes 3-5 times longer to process files.

The illustration shows the result of the program - text recognition from an image

What other features are there?

Text recognition is not the only useful feature of ABBYY FineReader. For convenience, a document can also be converted into the format the user needs (pdf, doc, xls, etc.).

Text change

To change the text in Fine Reader, open the "Service" - "Check" tab. A window will open in which you can edit the font, change characters, colors, etc. If you are editing an image, open the "Image Editor"; it closely resembles the simple Paint drawing tool and lets you make minimal edits.

ATTENTION. If you still can't figure out how to use ABBYY FineReader productively, you can read the "Help" section, which can be found in the application window, in the "About" tab.

Now you know what the FineReader program is for, and you can use it correctly at home or in the office. The functionality of the application is extensive; use it and you will see how indispensable it is for processing documents and files in office work.

Although the advances made in artificial intelligence (AI) over the past 50 years have not brought "smart" machines one iota closer to human cognitive capabilities, it would be unfair to completely deny progress in this direction. The most obvious and striking example is chess (not to mention simpler games). The computer cannot yet imitate our thinking, but it is quite capable of making up for this with a large amount of specialized memory and the speed of its search. Vladimir Kramnik described the play of the Deep Fritz program that beat him in 2006 as "inhuman", in the sense that it often contradicted established (human) rules of strategy and tactics.

A little more than a year ago, Watson, another brainchild of IBM (which once laid the foundation for the triumphant chess victories of computers with the famous Deep Blue), made a new breakthrough by defeating two champions of the popular American quiz show Jeopardy! by a wide margin. It is significant, however, that although Watson voiced the answers itself, the questions were still transmitted to it in text form. This suggests that progress in many areas of AI application (speech and image recognition, machine translation) is quite modest, although this does not prevent us from putting them into practice today. The greatest success, perhaps, is demonstrated by optical character recognition (OCR) systems, which almost all PC users are probably familiar with in one way or another. Moreover, Russian developments in this area occupy a worthy place in the world: I mean ABBYY FineReader.

A bit of history

The current version of ABBYY FineReader is number 11, which means that the application has come a long way in development, and even the history of this process is of some interest. Without pretending to be an exhaustive chronicle, I will give only the main milestones over the past decade, during which I more or less followed FineReader:

Year  Version  Key features
2003  7.0      Up to 25% increase in recognition accuracy. Most of all, this was reflected in tables, especially complex ones, with colored cells, hidden separators, etc.
2005  8.0      Further optimization of recognition algorithms, primarily aimed at working not with document scans, but with digital photographs. For this, additional functions for preparing originals appeared (elimination of distortions, alignment of lines, etc.).
2007  9.0      The advent of ADRT technology, which takes into account the logical structure of the entire processed (multi-page) document and is able to select repeating elements (headers and footers), connect "flowing" objects (tables), etc.
2009  10.0     Further improvement of ADRT and recognition algorithms, increasing the accuracy of processing low-resolution originals by up to 30%.
2011  11.0     The main attention is paid to the speed of the program. The "second coming" of the black-and-white mode, which gives an additional acceleration of up to 30% on good quality originals.

Naturally, over the same period FineReader also expanded its support for document formats, improved the built-in tools and interface, improved the reproduction of the structure of originals, etc. However, the highlights above are directly related to OCR technologies and illustrate quite well the stepwise development typical of complex, science-intensive systems, where each "breakthrough" is followed by a period of "calm" needed to refine the new algorithms. These algorithms represent the main value of any OCR program, and therefore detailed information about them rarely reaches users. However, ABBYY kindly agreed to lift the veil of secrecy, and today we have the opportunity to look into the inner sanctum of FineReader.

Basic principles

So, since OCR belongs to the field of AI, it is quite logical that developers strive to imitate the activity of our brain, at least to some extent. Of course, the structure of our visual system is incredibly complex, but the basic "large-block" principles of its functioning are sufficiently well studied; three of them are usually distinguished:

  1. Integrity - the object is considered as a set of its parts and (for visual images) the spatial relations between them. In turn, the parts are interpreted only as parts of the entire object. This principle helps to build and refine hypotheses, quickly cutting off the unlikely ones.
  2. Purposefulness - since any interpretation of data has a specific goal, recognition is also a process of putting forward hypotheses about an object and purposefully testing them. A system operating in accordance with this principle will not only use computing power more economically, but will also make fewer mistakes.
  3. Adaptability - the system saves the information accumulated in the course of its work and reuses it, i.e. it learns by itself. This principle makes it possible to create and accumulate new knowledge and to avoid solving the same problems again.

FineReader is the only OCR system in the world that operates in accordance with the principles described above at all stages of document processing. The corresponding technology is called IPA, after the first letters of the English terms. For example, according to the principle of integrity, a fragment of an image will be interpreted as a character only if it contains all the structural parts of such objects and those parts stand in certain relationships to one another. This helps to replace the enumeration of a large number of templates (in search of a more or less suitable one) with the purposeful testing of a reasonable number of hypotheses, which, moreover, draws on previously accumulated information about the character styles possible in the document being recognized.

However, IPA principles are applied not only when analyzing fragments that (presumably) correspond to individual characters, but also when analyzing the entire original page image. Most OCR systems are based on recognizing the hierarchical structure of a document, i.e. the page is broken down into basic structural elements such as tables, images and text blocks, which, in turn, are divided into other characteristic objects (cells, paragraphs) and so on, down to individual characters.

Such an analysis can be carried out in two main ways: from top to bottom, i.e. from constituent elements to individual characters, or, conversely, from bottom to top. Most often only one of them is used, but ABBYY has developed a special algorithm, MDA (multilevel document analysis), which combines both. In short, it looks like this: the page structure is analyzed top-down, and the electronic document is reconstructed bottom-up after recognition is complete; in addition, a feedback mechanism operates at all levels. As a result, the probability of gross errors caused by incorrect recognition of high-level objects is sharply reduced.
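The two directions can be illustrated with a toy document tree. The sketch below is not ABBYY's MDA: the node layout, the `word` helper and the `reconstruct` function are assumptions made for illustration, and the feedback mechanism between levels is left out entirely. It only shows a recursive walk that descends from the page to individual glyphs and assembles the text on the way back up.

```python
# Hypothetical document tree: page -> block -> line -> word -> glyph.
SEPARATORS = {"page": "\n\n", "block": "\n", "line": " ", "word": ""}


def word(text):
    """Build a word node whose children are already-recognized glyph leaves."""
    return {"kind": "word", "children": [{"kind": "glyph", "char": c} for c in text]}


def reconstruct(node):
    """Descend top-down to the glyphs, then assemble text bottom-up on the way back."""
    if node["kind"] == "glyph":
        return node["char"]
    parts = [reconstruct(child) for child in node["children"]]
    return SEPARATORS[node["kind"]].join(parts)


page = {"kind": "page", "children": [
    {"kind": "block", "children": [
        {"kind": "line", "children": [word("Quick"), word("guide")]},
        {"kind": "line", "children": [word("to"), word("OCR")]},
    ]},
    {"kind": "block", "children": [
        {"kind": "line", "children": [word("Page"), word("2")]},
    ]},
]}

print(reconstruct(page))
```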

ADRT

Historically, OCR systems evolved from single-character recognition. This task is still the most important and the most difficult, and the most complex algorithms are associated with it. However, it soon became clear that higher-level information (for example, about the language of the document and the correct spelling of recognized words) could help in solving it; this is how context and dictionary checks appeared. Then the desire to preserve formatting and recreate the physical structure of the document (i.e., the relative position of its various objects) led to the need for a detailed analysis of the entire page. Clearly, this also significantly affects the overall quality of recognition, since it helps to process multi-column layouts, tables and other "non-linear" text arrangements correctly.

Most modern OCR systems operate on these three levels (characters, words, pages), using, as already mentioned, either a top-down or a bottom-up approach. However, ABBYY, in accordance with the IPA principles, introduced one more level into FineReader: the entire multipage document. First of all, this was necessary for the correct reproduction of the logical structure, which is becoming increasingly complex in modern documents. But there are additional bonuses: increased accuracy and faster processing of repetitive objects, and more correct identification (and hence recognition) of objects "flowing" from page to page.
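As a rough illustration of a dictionary check, the sketch below re-ranks word hypotheses by adding a bonus to those found in a word list. The scores, the tiny dictionary and the bonus value are all invented for the example; the article does not describe how FineReader actually weighs dictionary evidence.

```python
def pick_word(hypotheses, dictionary, bonus=0.2):
    """Return the best word among (word, score) hypotheses.

    Words found in the dictionary get a fixed bonus added to their score;
    the bonus value is an arbitrary illustrative choice.
    """
    def adjusted(item):
        candidate, score = item
        return score + (bonus if candidate.lower() in dictionary else 0.0)

    return max(hypotheses, key=adjusted)[0]


dictionary = {"modern", "corner", "document"}
# The character-level score slightly prefers the misreading "rnodern".
hypotheses = [("rnodern", 0.55), ("modern", 0.45), ("modem", 0.40)]

print(pick_word(hypotheses, dictionary))
```

Here the character-level score alone would choose "rnodern", while the dictionary bonus tips the decision to "modern".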

This is what ADRT (Adaptive Document Recognition Technology) was designed for: it is a technology for analyzing and synthesizing a document at the logical level. Ultimately, it helps to make the result of FineReader's work as similar to the original as possible. To do this, the image of the entire document is analyzed, and the recognized words are combined into groups (clusters) depending on their style, surroundings and location on the page. In this way the program sees, as it were, the "logic" of the document's markup and can later unify the design of the result.
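The idea of grouping words into clusters by style can be sketched in a few lines. This is a deliberately simplified illustration, not ADRT itself: the word records, the choice of font size and boldness as the only style attributes, and the rule that the largest cluster is the main text are all assumptions.

```python
from collections import defaultdict


def cluster_by_style(words):
    """Group recognized words by a (font size, bold) style key."""
    clusters = defaultdict(list)
    for item in words:
        clusters[(item["size"], item["bold"])].append(item["text"])
    return dict(clusters)


def main_text_style(clusters):
    """Assume the style shared by the most words is the main text."""
    return max(clusters, key=lambda style: len(clusters[style]))


words = [
    {"text": "Chapter", "size": 18, "bold": True},
    {"text": "One", "size": 18, "bold": True},
    {"text": "The", "size": 11, "bold": False},
    {"text": "quick", "size": 11, "bold": False},
    {"text": "fox", "size": 11, "bold": False},
    {"text": "3", "size": 9, "bold": False},
]

clusters = cluster_by_style(words)
print(main_text_style(clusters))
print(clusters)
```

With clusters like these, every word in the same group can be given the same style in the output, which is one way to unify the design of the result.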

Thanks to ADRT, FineReader, starting from version 9.0, has learned to detect, recognize and reproduce the following structural parts and document formatting elements:

  • main text;
  • headers and footers;
  • page numbers;
  • headings of the same level;
  • table of contents;
  • text inserts;
  • captions for drawings;
  • tables;
  • footnotes;
  • signature/stamp zones;
  • fonts and styles.

Recognition process

According to the MDA algorithm, actual recognition proceeds from top to bottom, starting at the page level. Clearly, the more wrong decisions are made in the early stages of this process, the more errors there will be in the later ones. That is why recognition accuracy depends so much on the quality of the originals, but the algorithms that pre-process them can also matter significantly. For example, as color documents grew in popularity, FineReader gained an adaptive binarization (AB) procedure. If a document that has watermarks, or text on a textured or colored background, is scanned directly in black-and-white mode, "garbage" will invariably appear in the image, and it will then be quite difficult to separate from the "useful" image (because the original information about it is already lost). That is why FineReader prefers to work with color or grayscale images, converting them to black and white on its own (this process is called binarization). But that is not all. Since the colors of text and background can vary within a page and even within individual lines, AB picks out words with more or less the same characteristics and, for each, selects the binarization parameters that are optimal in terms of recognition quality. This is precisely where the algorithm is adaptive, which makes it an example of the use of feedback in MDA. Clearly, the effectiveness of AB strongly depends on the design of the source documents; on the ABBYY test base, this algorithm increased recognition accuracy by 14.5%.
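To see why a single threshold fails when the background changes, compare it with a threshold computed locally. The sketch below uses a generic local-mean threshold per pixel, which is a common textbook technique and not ABBYY's word-level procedure; the window radius, the offset and the sample pixel values are illustrative assumptions.

```python
def binarize_global(image, threshold=128):
    """Mark a pixel as ink (1) when it is darker than one fixed threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def binarize_adaptive(image, radius=1, offset=10):
    """Mark a pixel as ink (1) when it is clearly darker than its neighbourhood mean."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            out_row.append(1 if image[y][x] < local_mean - offset else 0)
        result.append(out_row)
    return result


# One grayscale row: ink (120) on a light background (200),
# then ink (20) on a dark background (100).
row_image = [[200, 120, 200, 200, 100, 20, 100, 100]]

print(binarize_global(row_image))    # the whole dark background turns into "ink"
print(binarize_adaptive(row_image))  # only the two ink pixels are marked
```

The global threshold treats the entire dark half as ink, while the local version keeps just the two text pixels, one on each background.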

But the most interesting part, of course, begins when the recognition process descends to the lowest levels. The so-called linear division procedure breaks lines into words and words into individual letters; then, in accordance with the IPA principles, it forms a set of hypotheses (that is, possible options for what a given character is, which characters a word divides into, etc.) and, having assigned each a probability estimate, passes them to the input of the character recognition mechanism. The latter consists of a series of so-called classifiers, each of which also generates a number of hypotheses, ranked by estimated probability. The most important characteristic of any classifier is the average position of the correct hypothesis in this ranking. Clearly, the closer it is to the top of the list, the less work is left for subsequent algorithms, such as dictionary checking. For sufficiently well-tuned classifiers, however, what is most often evaluated is recognition accuracy over the first three hypotheses or over the first one only, that is, roughly speaking, the ability to guess the correct answer in three attempts or in one. ABBYY uses the following types of classifiers in its systems: raster, feature, feature differential, contour, structural and structural differential, which are grouped into two logical levels.
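Both quality measures mentioned here are easy to compute from ranked hypothesis lists. The following sketch uses invented data; the function names are illustrative. Positions are 1-based, so a smaller average means the correct answer sits closer to the top of the list.

```python
def mean_correct_position(ranked_lists, truths):
    """Average 1-based position of the correct answer; lists that lack it are skipped."""
    positions = [
        hyps.index(truth) + 1
        for hyps, truth in zip(ranked_lists, truths)
        if truth in hyps
    ]
    return sum(positions) / len(positions)


def top_k_accuracy(ranked_lists, truths, k):
    """Share of samples whose correct answer is among the first k hypotheses."""
    hits = sum(1 for hyps, truth in zip(ranked_lists, truths) if truth in hyps[:k])
    return hits / len(truths)


# Made-up classifier output: each inner list is ranked from best to worst.
ranked = [["o", "0", "c"], ["l", "1", "i"], ["rn", "m", "n"], ["e", "c", "o"]]
truth = ["o", "1", "n", "e"]

print(mean_correct_position(ranked, truth))
print(top_k_accuracy(ranked, truth, k=1))
print(top_k_accuracy(ranked, truth, k=3))
```

On this sample the correct answers sit at positions 1, 2, 3 and 1, so the average position is 1.75, the first-hypothesis accuracy is 0.5 and the three-hypothesis accuracy is 1.0.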

The raster classifier (RK) works by comparing the character image pixel by pixel with reference templates. The templates are formed by averaging images from the training sample and are reduced to a standard form; accordingly, the size, stroke thickness and slant of the image being recognized are also normalized beforehand. This classifier is easy to implement, fast and resistant to image defects, but its accuracy is relatively low, which is why it is used at the first stage, to quickly generate a list of hypotheses.
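A pixel-by-pixel comparison against averaged templates can be sketched as follows. The 3×3 glyphs, the absolute-difference distance and the function names are assumptions chosen for brevity, and the normalization of size, thickness and slant is skipped; this is not ABBYY's implementation.

```python
def average_template(samples):
    """Pixel-wise mean of equally sized binary images (lists of rows)."""
    count = len(samples)
    height, width = len(samples[0]), len(samples[0][0])
    return [
        [sum(img[y][x] for img in samples) / count for x in range(width)]
        for y in range(height)
    ]


def pixel_distance(image, template):
    """Sum of absolute per-pixel differences between an image and a template."""
    return sum(
        abs(p - t)
        for image_row, template_row in zip(image, template)
        for p, t in zip(image_row, template_row)
    )


def raster_classify(image, templates):
    """Return (label, distance) hypotheses sorted from best to worst match."""
    scored = [(label, pixel_distance(image, tpl)) for label, tpl in templates.items()]
    return sorted(scored, key=lambda item: item[1])


templates = {
    "I": average_template([
        [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        [[0, 1, 0], [0, 1, 0], [0, 1, 1]],  # a slightly noisy training sample
    ]),
    "T": average_template([
        [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    ]),
}

# A "T" with one pixel of its top bar missing.
unknown = [[1, 1, 0], [0, 1, 0], [0, 1, 0]]
print(raster_classify(unknown, templates))
```

The output is a ranked list rather than a single answer, which matches the role described above: quickly producing hypotheses for later stages to refine.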

The feature classifier (PC), as its name implies, is based on detecting the features of a particular character in the image. If there are N such features, each hypothesis can be represented by a point in N-dimensional space; the accuracy of the hypothesis is then estimated by the distance from that point to the point corresponding to the standard (which is also derived from the training sample). Clearly, the types and number of features largely determine the quality of recognition, so there are usually quite a lot of them. This classifier is also relatively fast and simple, but not very resistant to image defects. In addition, the PC works not with the original image but with a model, an abstraction, so it ignores some of the information: for example, the mere presence of certain important elements says nothing about their relative position. For this reason, the PC is used not instead of the RK but together with it.
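The "point in N-dimensional space" description can be sketched as a nearest-standard ranking. The three features used here (closed loops, stroke ends, width-to-height ratio) and all the numbers are invented for illustration; the article does not list ABBYY's actual features.

```python
import math


def mean_vector(vectors):
    """Build a standard as the per-feature mean of the training vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def feature_classify(features, standards):
    """Rank labels by Euclidean distance from the feature point to each standard."""
    return sorted(standards, key=lambda label: math.dist(features, standards[label]))


# Hypothetical features: (closed loops, stroke ends, width / height).
standards = {
    "O": mean_vector([[1, 0, 0.9], [1, 0, 1.1]]),
    "C": mean_vector([[0, 2, 1.0], [0, 2, 0.8]]),
    "L": mean_vector([[0, 2, 0.4], [0, 2, 0.6]]),
}

ranking = feature_classify([0, 2, 0.85], standards)
```

Note that "C" and "L" share the first two feature values, so only the third feature separates them; this is the kind of information loss that makes it sensible to combine the PC with the RK.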

The contour classifier (QC) is a special case of the PC; it differs in that it analyzes the contours of the presumed character, extracted from the original image. In general, its accuracy is lower than that of a full-fledged PC.

The feature differential classifier (MPC) is also similar to the PC, but it is used solely to distinguish similar objects such as "m" and "rn". Accordingly, it analyzes only the areas where the differences lie, and it receives not only the initial images but also the hypotheses formed at earlier stages of recognition. Its operating principle, however, is somewhat different from that of the PC. At the training stage, two "clouds" (groups of points) of possible values, one for each of the two options, are formed in N-dimensional space; then a hyperplane is built that separates the clouds from each other and is approximately equidistant from them. The recognition result depends on which half-space the point corresponding to the original image falls into.
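One simple way to obtain a hyperplane that is equidistant from two clouds is to take the perpendicular bisector of the segment joining their centers. The sketch below does exactly that; it is an assumption chosen for brevity, since the article does not say how ABBYY constructs the hyperplane, and the two features are invented.

```python
def train_differential(cloud_a, cloud_b):
    """Return (normal, bias) of the plane bisecting the segment between cloud centers."""
    center_a = [sum(column) / len(cloud_a) for column in zip(*cloud_a)]
    center_b = [sum(column) / len(cloud_b) for column in zip(*cloud_b)]
    normal = [a - b for a, b in zip(center_a, center_b)]
    midpoint = [(a + b) / 2 for a, b in zip(center_a, center_b)]
    bias = -sum(n * m for n, m in zip(normal, midpoint))
    return normal, bias


def differential_decide(point, normal, bias, label_a, label_b):
    """Choose a label by the half-space the point falls into."""
    score = sum(n * p for n, p in zip(normal, point)) + bias
    return label_a if score >= 0 else label_b


# Hypothetical features: (gap between the first two stems, height of the joining arc).
cloud_m = [[0.0, 3.0], [0.2, 2.8]]
cloud_rn = [[1.0, 2.0], [1.2, 1.8]]
normal, bias = train_differential(cloud_m, cloud_rn)

first = differential_decide([0.1, 3.0], normal, bias, "m", "rn")
second = differential_decide([0.9, 2.0], normal, bias, "m", "rn")
```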

The MPC itself does not put forward hypotheses; it only refines the existing list (which is generally sorted by the bubble method). Its effectiveness is therefore not measured directly but is indirectly equated with the characteristics of the entire first level of OCR recognition. Clearly, though, it depends on how well the features are chosen and on how representative the sample of standards is, and assembling such a sample is a rather laborious task.
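How a pairwise classifier might reorder an existing list with bubble sort can be sketched as follows. The lookup table of pairwise verdicts stands in for real differential classifiers and is purely hypothetical, as are the function names.

```python
def refine_hypotheses(hypotheses, prefers_second):
    """Bubble sort: swap neighbors whenever the pairwise check prefers the lower one."""
    ranked = list(hypotheses)
    for end in range(len(ranked) - 1, 0, -1):
        for i in range(end):
            if prefers_second(ranked[i], ranked[i + 1]):
                ranked[i], ranked[i + 1] = ranked[i + 1], ranked[i]
    return ranked


# Hypothetical verdicts: for the pair {"rn", "m"} the differential check says "m".
verdicts = {frozenset(("rn", "m")): "m"}


def prefers_second(first, second):
    """True if a verdict exists for this pair and it favors the second hypothesis."""
    return verdicts.get(frozenset((first, second))) == second


refined = refine_hypotheses(["rn", "m", "nn"], prefers_second)
```

Pairs without a verdict keep their original order, so the list is only refined, never rebuilt.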

The structural differential classifier (SDK) was originally used to process handwritten text. Its task is to distinguish between similar objects such as "C" and "G". The SDK therefore relies on features characteristic of each pair of characters; its training is even more complicated than that of the MPC, and it runs more slowly than all the previous classifiers.

The structural classifier (SC) is the pride of ABBYY. It was originally developed for recognizing so-called hand-printed text, i.e. text that a person writes in block letters, but was later applied to printed text as well. It is used at the final stages of recognition and comes into play quite rarely, namely only when at least two hypotheses reach it with sufficiently high probabilities.

The quality characteristics of all the classifiers are collected in the following table. However, they only allow the algorithms to be compared with each other, since they are not absolute but were obtained by processing a specific test sample. It may seem that at the last stages of recognition the struggle is literally over fractions of a percent, but in fact each classifier makes a significant contribution to recognition accuracy: the SC, for example, reduces the number of errors by a significant 20%.

                                          RK     PC     QC     MPC*   SDK**  SC**
Accuracy for the first three options, %   99.29  99.81  99.30  99.87  99.88  -
Accuracy for the first option, %          97.57  99.13  95.10  99.26  99.69  99.73

* evaluation of the entire first level of the ABBYY OCR algorithm
** estimate for the whole algorithm after adding the corresponding classifier

It is curious, however, that despite the rather high accuracy, the recognition algorithm itself does not make the final decision. According to the MDA principle, hypotheses are put forward at each logical level, and their number can grow exponentially. Testing all hypotheses one after another is therefore unlikely to be efficient, which is why ABBYY OCR systems structure the hypotheses, i.e. assign them to certain models. There are a couple of dozen model types; here are just a few: dictionary word, non-dictionary word, Arabic numerals, Roman numerals, URL, regular expression. Each type can include many specific models (for example, a word in one of the known languages, in Latin or Cyrillic script, etc.).
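Assigning a hypothesis to models can be pictured as a set of simple membership tests. The sketch below covers only a few of the model types named above; the regular expressions, the tiny dictionary, and the function name are assumptions for illustration, not ABBYY's actual models.

```python
import re

# Roman numerals up to 3999; an empty string is excluded separately below.
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
# A deliberately loose URL pattern, good enough for a sketch.
URL = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
ARABIC = re.compile(r"[0-9]+")


def assign_models(hypothesis, dictionary):
    """Return the names of all models the hypothesis fits."""
    models = []
    if hypothesis.lower() in dictionary:
        models.append("dictionary word")
    if ARABIC.fullmatch(hypothesis):
        models.append("Arabic numerals")
    if hypothesis and ROMAN.fullmatch(hypothesis):
        models.append("Roman numerals")
    if URL.fullmatch(hypothesis):
        models.append("URL")
    if not models:
        models.append("non-dictionary word")
    return models


dictionary = {"turn", "mix"}
```

A hypothesis may fit several models at once: with this dictionary, "MIX" is both a dictionary word and a Roman numeral, and choosing between such readings is left to the later contextual checks.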

All final operations are performed on hypotheses that have already been assigned to models. For example, a contextual check determines the language of the document and immediately and significantly lowers the likelihood of models that use the wrong alphabets, while a dictionary check compensates for errors in uncertainly recognized characters: the word "turn", for example, is in the English dictionary, unlike "tum" (or at any rate it is not among the common words). Although the dictionary has a higher priority than any classifier, it is not necessarily the final authority and in general does not stop further checks: first, as mentioned above, there is a non-dictionary word model, and second, the special organization of the dictionaries makes it possible to judge with a high degree of probability whether an unknown word could belong to a particular language. Nevertheless, the dictionary check (and the completeness of the dictionaries) has a significant effect on the recognition result: in ABBYY's own tests it reduces the number of errors by almost half.
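The "turn" versus "tum" example can be sketched as a re-ranking step in which dictionary words get a score bonus while other hypotheses stay in the list. The scores and the size of the bonus are invented; the article only says that the dictionary outranks the classifiers without being the final authority.

```python
def rerank_with_dictionary(hypotheses, dictionary, bonus=2.0):
    """Multiply the score of dictionary words by a bonus and sort by the new score.

    hypotheses is a list of (word, classifier_score) pairs. Words outside the
    dictionary are kept, since a non-dictionary word may still be correct.
    """
    rescored = [
        (word, score * bonus if word.lower() in dictionary else score)
        for word, score in hypotheses
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# The classifiers slightly prefer "tum"; the dictionary check reverses the order.
reranked = rerank_with_dictionary([("tum", 0.55), ("turn", 0.45)], {"turn"})
best_word = reranked[0][0]
```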

Not only OCR

Printed documents are far from the only ones worth digitizing and processing automatically. Quite often you have to work with forms, i.e. documents with predefined, fixed fields that are filled in by hand but relatively neatly (so-called hand-printed characters); various questionnaires are an example. The technology for processing them has its own name, ICR (intelligent character recognition), and differs quite significantly from OCR. Since the task here is not to recreate the entire document but to extract specific data from it, it splits into two main subtasks: finding the required fields and recognizing their contents.

This is a rather specific area, and ABBYY offers a completely separate software product for it, ABBYY FlexiCapture. It is designed for building automated and semi-automated systems, is configured for specific types of documents by means of special templates, can intelligently find various fields on pages and verify the data in them, and so on. However, it is based on character recognition algorithms similar to those used in FineReader, and the general scheme is very similar.

However, there is still an important difference: the structural classifier is a mandatory participant in the process, which is due to the specifics of handwritten characters. In addition, ICR involves a large number of specific additional checks: for example, whether a character has been struck through, or whether the recognized characters actually form a date.
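The last check mentioned, whether the recognized characters actually form a date, can be sketched with the standard library. The day.month.year format is an assumption; a real form template would specify its own field format.

```python
from datetime import datetime


def is_valid_date(text, date_format="%d.%m.%Y"):
    """Return True if the recognized text parses as a real calendar date."""
    try:
        datetime.strptime(text, date_format)
    except ValueError:
        return False
    return True


leap_day = is_valid_date("29.02.2012")     # a real date in a leap year
impossible = is_valid_date("31.02.2014")   # February has no 31st day
misread = is_valid_date("3l.01.2014")      # letter "l" recognized instead of "1"
```

A failed check of this kind signals that the field should be re-examined or passed to a human verifier rather than accepted as is.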

An overview of a program for scanning and recognizing text in images, and its installation on the Windows 7 operating system.

Almost every computer user has faced the task of scanning books or magazines for subsequent text recognition, or simply recognizing text in an image such as a photograph. Probably the most popular (and probably the best) program of this kind is a product of our Russian company ABBYY, namely FineReader.

The latest version of this product to date is FineReader 12, so today we will look at the features of ABBYY FineReader 12 Professional and install a trial version of it on the Windows 7 operating system.

I want to structure today's article as follows: first we will talk about the features and advantages of the program, then go over the system requirements for the computer and the OS on which it will be installed, and then look in detail at installing FineReader 12 Professional and at the limitations of the trial version. Since the program is popular and almost everyone has used it at least once, whether at home, at a friend's, or at work, we will not cover exactly how to scan and recognize text, especially since detailed instructions are available on the official website. A trial version can also be downloaded from the official website; at the moment the program page is http://www.abbyy.ru/download/finereader/

On this page you can download both the manual (User's Guide) and the trial version of the program. To do so, click the download button on the right; you will be asked for your email address, and it must be a valid one, since the download link will be sent to it. After entering the email, click "Submit", and the message "Thank you for your interest in ABBYY products. A link to download the program has been sent to your e-mail" will appear. You can check your mailbox right away: it will contain a message with a download link, and following that link starts the download. If you like the program, you can buy it on the same ABBYY website. Now that you know where to get the program, let's talk about its features and benefits.

Features and Benefits of ABBYY FineReader 12 Professional

ABBYY FineReader is a program that recognizes text in images, so the text does not have to be retyped; it can also scan documents directly from a scanner.

ABBYY is a world leader in developing such programs and has won a large number of awards, which gives FineReader a huge advantage over competitors.

Another distinctive feature of FineReader is that it recognizes text in images with high accuracy, so the result needs practically no reformatting afterwards; this is probably its main advantage.

ABBYY ships FineReader with support for 190 languages, which is another advantage over competitors worldwide. It also supports many formats for saving recognition results, such as Word, Excel, OpenOffice and others, and a huge variety of image formats from which it can recognize text, such as JPEG, BMP, PNG, TIFF, GIF, PDF, DJVU, PCX, DCX and others.

On top of that, in my opinion it has a fairly convenient, intuitive interface, so anyone, even a novice computer user, can use this program.

As mentioned above, version 12 is already available, so let's talk about what is new in it and what advantages it has over the previous version 11.

Firstly, it is worth noting that version 11 supported 188 languages, and now there are 190 (maybe someone was waiting for FineReader to support their language :)).

Secondly, according to the developers, recognition speed has increased. New functions have also been added: page recognition in the background, instant opening of multi-page documents, automatic cropping of excess parts of the image, removal of stamps and marks on office documents to improve recognition quality, the ability to disable structural elements such as footnotes, footers, and tables of contents, and text formatting tools in the results verification window.

Also according to the developers, part of the existing functionality has been improved and optimized; in general, there are plenty of changes.

ABBYY FineReader 12 Professional trial version limitations

After you download the trial version of FineReader 12 (351 megabytes) and install it, it will work for 30 days, recognize no more than 100 pages, and save no more than 3 pages of a document at a time. In practice, this is enough to appreciate the program's advantages.

If you are satisfied with the program, you can buy it on the same official website and then activate it. As for the cost, ABBYY FineReader 12 Professional is licensed in two ways: by subscription, i.e. an annual license, and as a perpetual license, i.e. once and for all.

At the moment:

  • An annual license costs 1990 rubles (download version).
  • A perpetual license costs 4990 rubles (boxed version) or 4490 rubles (download version).

Which one suits you is for you to decide. The cost, as you can see, is not that high, especially if you actively scan or photograph documents and then recognize the text.

System requirements for installing ABBYY FineReader 12 Professional

ABBYY FineReader 12 Professional supports the following operating systems: Windows XP, Windows Vista, Windows 7, Windows 8/8.1, Windows Server 2003/2008/2008 R2/2012/2012 R2.

According to the developers, to install FineReader 12 Professional you need a computer with a processor clock speed of 1 GHz or higher, 1024 MB of RAM, and 850 MB of free disk space. A monitor with a resolution of at least 1280 × 1024 pixels and an Internet connection for activating the product are also recommended.

Installing ABBYY FineReader 12 Professional on Windows 7

Step 1

After downloading the program, you will have the file ABBYY_FineReader_12_Professional.exe; launch it, for example by double-clicking. A window for unpacking the installation files will open; click "Install".


Unpacking will begin


Step 2

Then the installation menu will appear; if you bought the boxed version, the disc menu will look exactly the same. Click "Installing ABBYY FineReader 12".


Step 3

Next you need to choose the program language; the installer detected it correctly by default, so just click "OK".

Step 4


Step 5

Then, since we are novice users, we select the "Typical" installation mode and click "Next".


Step 6

At this step we choose the initial settings; I, for example, set the checkboxes as follows and clicked "Install".


And here comes the installation.



Step 7

The installation does not take long, about 5 minutes; when it finishes, a window announcing the completion of the installation will appear. Click "Finish".


Step 8

The installation is now complete, and a shortcut for launching the program will appear on the desktop; we use it to start the program.

Each time the trial version starts, a window offering to buy a license will appear, but we are still evaluating, so click "Run the program".

And now, finally, the program itself opens, and we can admire what is, as I said earlier, an excellent interface.


I suggest we stop here. Let me remind you once again that a detailed user manual can be downloaded from the ABBYY website. Well, that's all for now! Good luck!
