3.11. OCR and Installing Tesseract for Squish

Table of Contents

3.11.1. OCR Functionality in Squish
3.11.2. Configuring the Package
3.11.3. Performing Unattended Installations

3.11.1. OCR Functionality in Squish

Optical Character Recognition (OCR) is a technology that enables the digitization of scanned images with printed or handwritten text into machine-readable data that can later be used for electronic editing. Image sources fed to OCR software include image-only PDFs, scanned documents, handwritten manuscripts or camera images, among others.

Applications of OCR technology are wide and varied and include automatic data entry, passport recognition in airports, digitizing dated newspapers, automatic number-plate recognition and assistive technology for the visually impaired. The advantages of using OCR to digitize text are clear: OCR saves a great deal of storage space by converting paper documents into electronic documents; printed texts become far easier to search; once a text has been converted into machine-encoded form, it can be revised with a standard word processor; and digital backups of printed text (e.g., legal paperwork or newspapers) can be made frequently and stored more securely than documents kept in printed form.

Squish includes OCR as a complement to its already powerful Object-based and Image-based recognition methods. Variability in a component's visual appearance is particularly prominent for onscreen text when trying to create platform-independent tests, due to a wide assortment of fonts, font sizes, decorations and rendering modes. Thus, Image-based recognition methods, including Fuzzy Image Search, are generally unsuitable for locating text onscreen. OCR therefore allows for efficient text handling in those scenarios where the same text is rendered with different parameters, making it look largely dissimilar in a pixel-to-pixel comparison (e.g., due to varying letter widths, different kerning or shifting line break positions).

Squish uses the free Tesseract OCR library as its primary engine to facilitate text recognition. In order to use the Tesseract OCR engine, the package, including all of the language files, needs to be installed independently of Squish. Any other OCR engine can potentially be substituted for use with Squish.

Tesseract for Squish is supplied as a single, easy-to-install binary package that contains the engine libraries and the full set of language files. The packages for all supported platforms can be found in the download portal.

3.11.2. Configuring the Package

Please download the Tesseract for Squish package for your operating system from your customer area onto your computer, and execute it.

[Note] On Linux

On Linux, you first need to make the downloaded .run file executable. Popular desktop environments allow this by right-clicking the file and enabling the Execute permission. You can also make the installer executable on the command line by issuing the following:

$ chmod a+x
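The permission change can also be wrapped in a small check, which is handy in setup scripts. The following is only a sketch: the function name make_executable is hypothetical, and the file name in the usage comment is borrowed from the unattended installation example in Section 3.11.3, so your download may be named differently.

```shell
#!/bin/sh
# Sketch: make a downloaded installer executable and verify the result.
# "make_executable" is a hypothetical helper name, not part of Squish.
make_executable() {
    # $1: path to the downloaded .run file
    if [ ! -f "$1" ]; then
        echo "installer not found: $1" >&2
        return 1
    fi
    # Grant execute permission to all users, then confirm it took effect.
    chmod a+x "$1" && [ -x "$1" ]
}

# Example call (file name assumed from Section 3.11.3):
#   make_executable ./tesseract-4.0.0-for-squish.x64.run
```

The function returns a non-zero status if the file is missing or the permission could not be set, so a calling script can stop before trying to launch the installer.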

The installation program will guide you through the configuration process by presenting multiple pages.

The Tesseract for Squish setup program in action.
[Tip] Changing Configuration Settings

Once you start the installer, you can go back to change a configuration setting using the Back button and proceed to the following pages using the Next button.

Installation Folder

This step determines the location on your system in which Tesseract for Squish will be installed.

A picture of the target selection page from the froglogic Tesseract for Squish setup program.

Acknowledging the Terms of Use

A picture of the Apache license text page from the froglogic Tesseract for Squish installation program.

After selecting the installation folder, you will be presented with the license under which you are permitted to use your copy of Tesseract for Squish. Please read the entire license text carefully before proceeding. Click one of the two radio buttons (I accept the license. or I do not accept the license.) that appear below the license text to indicate whether you agree or disagree with the terms. If you disagree, you cannot install or use Tesseract for Squish. To terminate the installation, click the Cancel button.

If you accept the license, the Next button will become enabled, and you can proceed to the next step of the configuration process.

Tesseract Engine Registration

In order to use Tesseract with Squish, its installation path needs to be registered with Squish. The Tesseract for Squish package installer will perform the registration during the installation if the Register the Tesseract installation with Squish option is selected.

A picture of the engine registration page from the froglogic Tesseract for Squish setup program.

If you choose not to register the Tesseract installation with Squish, you can do so later by entering the chosen installation path in the OCR Preferences pane of the Squish IDE or by editing the ocr.ini file manually.

Ready to Install

At this point all the configuration options have been set and the installation is ready to launch. A page is shown which displays the disk space required by the Tesseract for Squish installation.

A picture of the configuration review page from the froglogic Tesseract for Squish configuration program.

Executing the Installation

The installation program now commences installing Tesseract for Squish on your system. You can click the Show Details button to get a detailed list of actions performed as part of the installation.

A picture of the Tesseract for Squish setup program installing a package.

You can close the installer at any time, e.g. by closing the window or by pressing the Cancel button (only visible on platforms other than macOS). All changes made so far will be rolled back.

Concluding the Configuration

Congratulations! You have finished installing Tesseract for Squish. This page concludes the setup of your Tesseract for Squish binary package.

A picture of the final page from the froglogic Tesseract for Squish installation program.

You should now click the Finish button to close the installation program.

3.11.3. Performing Unattended Installations

It is possible to perform the installation of Tesseract for Squish completely unattended, passing any required values up front. An unattended installation requires no user interaction and produces the same result as stepping through the installer interface manually. To perform an unattended installation, invoke the Tesseract for Squish installation program from the command line, passing at least the argument unattended=1:

$ ./tesseract-4.0.0-for-squish.x64.run unattended=1 <more options...>

That argument will launch the installation without any graphical user interface. Instead, progress information and potential error messages are written to the console.
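Because progress and error messages go to the console, it can be useful to keep a copy of them when the installer runs from a script or a CI job. The sketch below uses plain shell redirection only. The helper name run_logged and the log file name are hypothetical, and whether the installer reports failure through its exit status is an assumption that this section does not confirm.

```shell
#!/bin/sh
# Sketch: run a command, keep its console output in a log file,
# echo the output back to the console, and preserve the exit status.
# "run_logged" is a hypothetical helper name, not part of Squish.
run_logged() {
    # $1: log file; remaining arguments: the command to run
    log_file=$1
    shift
    "$@" >"$log_file" 2>&1
    rc=$?
    cat "$log_file"
    return $rc
}

# Example call (installer name taken from the command shown above):
#   run_logged install.log ./tesseract-4.0.0-for-squish.x64.run unattended=1
```

Writing to the file first and printing it afterwards avoids a pipeline, which in a plain POSIX shell would hide the exit status of the command being logged.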

In addition to the unattended=1 argument, you may want to pass the targetdir=<PATH> argument to set the target installation directory, or the register=0 argument to disable the automatic registration of the engine with Squish:

$ ./tesseract-4.0.0-for-squish.x64.run unattended=1 targetdir=/opt/tesseract register=0
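When the same installation is repeated on several machines, the argument list can be assembled by a small helper. This is a minimal sketch under stated assumptions: only unattended=1, targetdir=<PATH> and register=0 come from this section, while the function name build_installer_args and its parameters are hypothetical. The script prints the resulting command line rather than running it, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: assemble the arguments for an unattended installation.
# Only unattended=1, targetdir=<PATH> and register=0 are documented above;
# the function and parameter names are hypothetical.
build_installer_args() {
    # $1: target directory (empty: do not pass targetdir at all)
    # $2: "no" to skip registration with Squish; anything else adds nothing
    args="unattended=1"
    if [ -n "$1" ]; then
        args="$args targetdir=$1"
    fi
    if [ "$2" = "no" ]; then
        args="$args register=0"
    fi
    printf '%s\n' "$args"
}

# Print the full command line for review.
# Limitation: this simple string form does not handle paths containing spaces.
echo "./tesseract-4.0.0-for-squish.x64.run $(build_installer_args /opt/tesseract no)"
```

With the values shown, the printed line matches the example command above.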