Anyline OCR

With the Anyline OCR Module, you can create your own OCR use case on the fly with little effort. It offers a variety of parameters that let you adjust the scanning process to your use case.

This section describes the parameters in detail.
If you are looking for a How-To on loading the Anyline OCR Module on your platform, please refer to the following sections:

Simultaneous Barcode Scanning

Starting with SDK 3.8, Anyline supports simultaneous barcode scanning for any module. Additional information can be found under Simultaneous Barcode Scanning.

Parameters

scanMode

The scanMode provides the basis for the scanning experience. There are three options: AUTO, LINE, and GRID.

Range | Unit | Type | Default | Mandatory
AUTO, LINE or GRID | - | String | - |

Tip

As a rule of thumb: if you can place a grid on top of the text you want to scan, use GRID mode; otherwise, use LINE mode. AUTO mode detects the valid text within the cutout automatically.

scanMode.AUTO

New in version 3.11.

The AUTO mode automatically detects the text to be scanned when it is placed within the cutout. It determines whether the text consists of one or multiple lines, uppercase or lowercase characters, and/or numbers, and adjusts the scan parameters accordingly.

Android/Cordova | iOS
AUTO | ALAuto

In this mode, all parameters are optional. The parameters in the following table can be set to improve the scanning process.

Parameter | Advised | Remark
minCharHeight | - | The AUTO mode automatically detects the text height within the cutout.
maxCharHeight | - | The AUTO mode automatically detects the text height within the cutout.
tesseractLanguages | - | Set this parameter if the font you are trying to scan differs from a standard sans-serif font; otherwise there is no need.
charWhitelist | yes | Helps to filter false positives, such as the number 8 instead of the letter B.
validationRegex | yes | Validates the result against the desired structure and therefore helps to avoid false scans, especially if the cutout is not placed on top of the text when scanning starts.
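As an illustration, the sketch below assembles an AUTO-mode configuration that sets only the two advised parameters and serializes it to JSON. Only the parameter names come from this section; the flat dictionary layout and the JSON serialization are assumptions, and the structure the SDK actually expects on your platform may differ.

```python
import json

# Hypothetical AUTO-mode configuration: the parameter names are taken from this
# documentation, but the flat dictionary layout is an assumption.
auto_config = {
    "scanMode": "AUTO",
    # Only digits are expected, so a letter such as "B" cannot be confused with "8".
    "charWhitelist": "0123456789",
    # Example pattern (not a predefined one): exactly six digits.
    "validationRegex": "^[0-9]{6}$",
}

# minCharHeight and maxCharHeight are deliberately omitted:
# AUTO mode detects the text height on its own.
config_json = json.dumps(auto_config, indent=2)
print(config_json)
```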

Known Limitations

As of version 3.12, the AUTO mode automatically detects text of up to 30 characters per line and up to 7 lines.
Version 3.11 did not support multiple lines, and lowercase character detection was performed by checking the charWhitelist.

scanMode.LINE

The LINE mode is the best option for scanning one or more lines of text of arbitrary length.

Examples include an IBAN, or a mail header with an unknown number of lines of unknown length.

Android/Cordova | iOS
LINE | ALLine

scanMode.GRID

With the GRID scan mode, you can scan text that is evenly laid out in a grid, such as loyalty codes on cans.

Other examples are bottle cap codes, Scrabble letters, or any other use case in which the text fits into an imaginary grid.

Android/Cordova | iOS
GRID | ALGrid

minCharHeight

Defines the minimum height a symbol must have to be considered in the scanning process.

For example, if you know that the text you are going to scan is rather large, setting this to a high value prevents smaller contours in the image from being taken into account.

Range | Unit | Type | Default | Mandatory
- | Pixel | Integer | 15 |

maxCharHeight

Defines the maximum height a symbol may have to be considered in the scanning process.

For example, if you know that the text you are going to scan is rather small, setting this to a low value prevents bigger contours in the image from being taken into account.

Range | Unit | Type | Default | Mandatory
- | Pixel | Integer | 60 |
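To make the effect of minCharHeight and maxCharHeight concrete, the toy sketch below filters a list of detected contours by height. The `Contour` type and `filter_by_char_height` function are hypothetical, and treating both bounds as inclusive is an assumption; the SDK performs this filtering internally.

```python
from typing import NamedTuple


class Contour(NamedTuple):
    """Hypothetical stand-in for a detected contour (not an SDK type)."""
    label: str
    height_px: int


def filter_by_char_height(contours, min_char_height=15, max_char_height=60):
    """Keep contours whose height lies within the bounds (inclusive bounds are assumed)."""
    return [c for c in contours if min_char_height <= c.height_px <= max_char_height]


# Defaults from the tables above: minCharHeight = 15, maxCharHeight = 60.
candidates = [Contour("speck", 4), Contour("digit", 32), Contour("logo", 120)]
kept = filter_by_char_height(candidates)
print([c.label for c in kept])  # ['digit']
```

Raising minCharHeight for large text discards specks early; lowering maxCharHeight for small text discards large shapes such as logos.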

tesseractLanguages

The OCR part of the SDK relies on so-called traineddata files, which are specific to a font and language.
This parameter tells the module which traineddata file to use when performing the OCR.

You can use one of the default traineddata files that come with the SDK bundle, such as eng_no_dict or deu.

Range | Unit | Type | Default | Mandatory
- | - | String | - |

Load the traineddata file on Android

On Android, the traineddata files must first be copied via copyTrainedData.

Tip

If you have a font you want to use, you can head over to trainyourtesseract.com and create a traineddata file for free.

charWhitelist

Defines a whitelist of characters that are allowed in a result.

Setting this parameter carefully has two benefits:
First, it improves the accuracy of the results. If your code can contain the digit 8 but never the letter B, removing B from the charWhitelist improves the confidence of the result, since the two symbols can look alike.
Second, together with validationRegex, this parameter helps prevent incorrect results.

Range | Unit | Type | Default | Mandatory
- | - | String | - |

Missing Characters in charWhitelist

If the scan detects symbols that are not in the charWhitelist, scanning performance may suffer.
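According to this section, the whitelist affects recognition itself (removing B raises the confidence for 8). The sketch below is only a post-hoc check that shows which strings a digits-only whitelist admits and which characters would signal a misread; `uses_only_whitelisted` and `unexpected_chars` are hypothetical helpers, not SDK functions.

```python
def uses_only_whitelisted(result: str, char_whitelist: str) -> bool:
    """Return True if every character of result appears in char_whitelist."""
    allowed = set(char_whitelist)
    return all(ch in allowed for ch in result)


def unexpected_chars(result: str, char_whitelist: str) -> list:
    """Characters outside the whitelist, sorted for a stable output."""
    return sorted(set(result) - set(char_whitelist))


# Digits only: the letter "B" is excluded, so it cannot be mistaken for "8".
whitelist = "0123456789"
print(uses_only_whitelisted("8812", whitelist))  # True
print(unexpected_chars("B812", whitelist))       # ['B']
```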

validationRegex

Defines a regular expression against which the detected result is validated.

If a detected result does not match the validationRegex, it will not be returned.

The regular expression uses the ECMAScript regular expression pattern syntax.

Hint

As of version 3.12, the Anyline OCR Module provides predefined regular expressions for
URL, EMAIL, ISBN, VIN, IMEI, and PRICE.
Please see the iOS API Reference and the Android API Reference for further details.

Range | Unit | Type | Default | Mandatory
- | - | String | - |
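The sketch below shows how a validationRegex rejects a result with the wrong structure, using an Austrian IBAN (AT followed by 18 digits) as an example. Python's `re` module stands in for the ECMAScript engine: the two syntaxes agree for simple anchored patterns like this one, but they are not identical, so test your final pattern in an ECMAScript engine. `passes_validation` is a hypothetical helper, and whole-string matching is an assumption; with `^...$` anchors, the pattern behaves the same either way.

```python
import re


def passes_validation(result: str, validation_regex: str) -> bool:
    """Hypothetical check: the whole result must match the pattern."""
    return re.fullmatch(validation_regex, result) is not None


# "AT" followed by 18 digits, the length of an Austrian IBAN without spaces.
pattern = r"^AT[0-9]{18}$"
print(passes_validation("AT611904300234573201", pattern))  # True
print(passes_validation("AT61190430023457320", pattern))   # False (one digit short)
```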

minConfidence

Defines the minimum confidence the SDK must have in a result to consider it valid.

Confidence

The confidence describes how certain the SDK is that the detected result matches the text being scanned.

Range | Unit | Type | Default | Mandatory
0 - 100 | - | Integer | 60 |
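The threshold logic can be sketched as follows. The helper is hypothetical, and whether a result exactly at minConfidence is accepted (`>=` here) is an assumption; the range check mirrors the documented 0 - 100 range.

```python
def is_accepted(confidence: int, min_confidence: int = 60) -> bool:
    """Accept a result whose confidence reaches the threshold (>= is an assumption)."""
    if not 0 <= min_confidence <= 100:
        raise ValueError("minConfidence must be between 0 and 100")
    return confidence >= min_confidence


print(is_accepted(72))      # True  (default threshold of 60)
print(is_accepted(55))      # False
print(is_accepted(55, 50))  # True  (lowered threshold)
```

A lower threshold returns results sooner but admits more misreads; pairing it with charWhitelist and validationRegex offsets that risk.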

Additional Settings in LINE Mode

removeSmallContours

If set to true, small contours in the text will not be considered during the scanning process.

Range | Unit | Type | Default | Mandatory
true or false | - | boolean | false |

Tip

If your use case only includes Latin capital letters and/or numbers, set this to true.
If you also want to scan lowercase letters or other symbols, set this to false, as it may otherwise remove details such as the dot of a lowercase i.
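The toy sketch below shows why removeSmallContours can damage lowercase text. Its criterion (drop any contour shorter than 30% of the tallest one) is a hypothetical stand-in. This section does not describe how the SDK decides what counts as "small", and the `Glyph` type is likewise invented for illustration.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    """Hypothetical contour: what it depicts and its height in pixels."""
    part: str
    height_px: int


def remove_small_contours(glyphs, ratio=0.3):
    """Drop contours shorter than ratio * tallest contour (the ratio is hypothetical)."""
    if not glyphs:
        return []
    tallest = max(g.height_px for g in glyphs)
    return [g for g in glyphs if g.height_px >= ratio * tallest]


# The word "Hi": the dot of the lowercase "i" is a separate, small contour.
word = [Glyph("H", 40), Glyph("i stem", 30), Glyph("i dot", 6)]
print([g.part for g in remove_small_contours(word)])  # ['H', 'i stem']
```

Without its dot, the remaining stem can be read as an "l" or a "1", which is why the option should stay off for lowercase text.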

minSharpness

Defines the minimum sharpness an image must have to be processed further by the SDK.

It is used to avoid time-consuming processing of blurry images, which are unlikely to return a result.

Range | Unit | Type | Default | Mandatory
0 - 100 | - | Integer | 0 (=Off) |

Experimental

This parameter is experimental. It is recommended to start with a sharpness of 50 and gradually increase the value until you reach a threshold that gives satisfying results.
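The gate and the recommended tuning procedure can be sketched as below. The per-frame `sharpness` score is computed by the SDK and is treated here as a given number; the inclusive comparison and the step size of 5 are assumptions. Only the 0 (=Off) default and the starting value of 50 come from this section.

```python
def passes_sharpness(sharpness: int, min_sharpness: int = 0) -> bool:
    """A threshold of 0 disables the check (as documented); otherwise compare (>= assumed)."""
    if min_sharpness == 0:
        return True
    return sharpness >= min_sharpness


def tuning_schedule(start: int = 50, step: int = 5, stop: int = 100) -> list:
    """Thresholds to try in order, starting at the recommended value of 50 (step is hypothetical)."""
    return list(range(start, stop + 1, step))


print(passes_sharpness(10))      # True  (check disabled by default)
print(passes_sharpness(40, 50))  # False (frame skipped as too blurry)
print(tuning_schedule()[:3])     # [50, 55, 60]
```

During tuning, you would test each threshold in turn on real captures and keep the highest one that still returns results reliably.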

removeWhitespaces

If set to true, any whitespace in the returned result will be removed.

Range | Unit | Type | Default | Mandatory
true or false | - | boolean | false |

Tip

This is useful when the information is printed with whitespace for better readability, but the whitespace is not part of the data. Scanning IBANs is one example of this scenario.
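The effect of removeWhitespaces on an IBAN printed in groups of four can be reproduced in a line of Python. The helper name is hypothetical; `str.split()` with no arguments splits on any run of whitespace (spaces, tabs, newlines).

```python
def remove_whitespaces(result: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines) from the result."""
    return "".join(result.split())


print(remove_whitespaces("AT61 1904 3002 3457 3201"))  # AT611904300234573201
```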

Additional Settings in GRID Mode

charCountX

Defines the number of symbols in the horizontal direction in the grid.
For example, if a code to scan consists of 2 rows with 4 symbols each, this would be set to 4.

Range | Unit | Type | Default | Mandatory
- | - | Integer | 1 |

charCountY

Defines the number of symbols in the vertical direction in the grid.
For example, if a code to scan consists of 2 rows with 4 symbols each, this would be set to 2.

Range | Unit | Type | Default | Mandatory
- | - | Integer | 1 |
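For the 2 × 4 example above, the sketch below shows how charCountX and charCountY determine the expected number of symbols (charCountX × charCountY) and how a flat result maps back onto grid rows. It assumes the result arrives as a single row-major string, which is an assumption; `split_into_grid` is a hypothetical helper.

```python
def split_into_grid(result: str, char_count_x: int, char_count_y: int) -> list:
    """Split a flat, row-major result into char_count_y rows of char_count_x symbols."""
    expected = char_count_x * char_count_y
    if len(result) != expected:
        raise ValueError(f"expected {expected} symbols, got {len(result)}")
    return [result[row * char_count_x:(row + 1) * char_count_x]
            for row in range(char_count_y)]


# 2 rows with 4 symbols each: charCountX = 4, charCountY = 2
print(split_into_grid("ABCD1234", 4, 2))  # ['ABCD', '1234']
```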

charPaddingXFactor

Defines the average horizontal distance between two characters, measured as a percentage of the character width.

Range | Unit | Type | Default | Mandatory
- | Percent | Double | 1.0 |

charPaddingYFactor

Defines the average vertical distance between two characters, measured as a percentage of the character width.

Range | Unit | Type | Default | Mandatory
- | Percent | Double | 1.0 |
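As a worked example of charPaddingXFactor, the sketch below estimates the width of one grid row. It assumes that the factor scales the gap between neighbouring characters as a fraction of the character width, so 1.0 means a gap as wide as one character. This section does not say whether the distance is measured edge to edge or centre to centre, so treat the formula as an illustration only.

```python
def expected_row_width(char_width_px: float, char_count_x: int,
                       char_padding_x_factor: float = 1.0) -> float:
    """Approximate row width: glyph widths plus (count - 1) gaps of factor * glyph width (assumed model)."""
    gap = char_padding_x_factor * char_width_px
    return char_count_x * char_width_px + (char_count_x - 1) * gap


# 4 characters, each 20 px wide:
print(expected_row_width(20, 4))       # 140.0 (default factor 1.0: 80 + 3 * 20)
print(expected_row_width(20, 4, 0.5))  # 110.0 (tighter spacing: 80 + 3 * 10)
```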

isBrightTextOnDark

If set to true, the SDK looks for bright symbols on a dark background. If set to false, the SDK looks for dark symbols on a bright background.

Range | Unit | Type | Default | Mandatory
true or false | - | boolean | false |
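The SDK handles the polarity internally once isBrightTextOnDark is set. The toy below only illustrates a common preprocessing idea, inverting an 8-bit grayscale image so that bright text on a dark background becomes dark text on a bright one. It is not necessarily what the SDK does, and the `invert` helper is hypothetical.

```python
def invert(pixels):
    """Invert 8-bit grayscale values (0 = black, 255 = white)."""
    return [[255 - p for p in row] for row in pixels]


# A bright vertical stroke (250) on a dark background (10).
image = [[10, 250, 10],
         [10, 250, 10]]
print(invert(image))  # [[245, 5, 245], [245, 5, 245]]
```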

Setting a Custom Command File

If your use case requires special optimisation, Anyline will provide you with a Custom Command File (.ale).

To load the custom command file, please refer to the platform-specific implementations.

Settings and Custom Command File

Note that the custom command file overrides all settings made in the Anyline OCR config, so you don't have to set the parameters manually; they are already optimized for your use case.