{WebTwainObject}.Addon.OCR

{WebTwainObject} denotes the WebTwain instance.

For server-side OCR, see Server-Side OCR.

Methods

       
Download() DownloadLangData() IsModuleInstalled() SetLanguage()
SetOutputFormat() SetPageSetMode() GetIfUseDetectedFont() SetIfUseDetectedFont()
GetUnicodeFontName() SetUnicodeFontName() GetMinFontSizeforMoreAccurateResult() SetMinFontSizeforMoreAccurateResult()
Recognize() RecognizeFile() RecognizeRect() RecognizeSelectedImages()

Download

Syntax

/**
 * Download and install the OCR add-on on the local system.
 * @param path The URL to download the add-on (typically a ZIP file).
 * @param successCallback A callback function that is executed if the request succeeds.
 * @param failureCallback A callback function that is executed if the request fails.
 * @argument errorCode The error code.
 * @argument errorString The error string.
 */
Download(
    path: string,
    successCallback: () => void,
    failureCallback: (
        errorCode: number,
        errorString: string
    ) => void
): void;

IsModuleInstalled

Syntax

/**
 * Return whether the OCR engine has been installed.
 */
IsModuleInstalled(): boolean;

DownloadLangData

Syntax

/**
 * Download and install an OCR language package.
 * @param path The URL to download the package (typically a ZIP file).
 * @param successCallback A callback function that is executed if the request succeeds.
 * @param failureCallback A callback function that is executed if the request fails.
 * @argument errorCode The error code.
 * @argument errorString The error string.
 */
DownloadLangData(
    path: string,
    successCallback: () => void,
    failureCallback: (
        errorCode: number,
        errorString: string
    ) => void
): void;
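
The three installation methods above are typically combined into a one-time setup step. The following is a minimal sketch of that flow, not Dynamsoft's reference implementation. `OcrInstaller` is a hypothetical local interface that copies only the signatures documented here, `ensureOcrReady` is an illustrative helper name, and both URLs are supplied by the caller.

```typescript
// Hypothetical local interface: it mirrors only the three signatures documented
// above so the sketch type-checks without the Dynamsoft typings.
interface OcrInstaller {
  IsModuleInstalled(): boolean;
  Download(
    path: string,
    successCallback: () => void,
    failureCallback: (errorCode: number, errorString: string) => void
  ): void;
  DownloadLangData(
    path: string,
    successCallback: () => void,
    failureCallback: (errorCode: number, errorString: string) => void
  ): void;
}

// Install the add-on only when it is missing, then fetch the language package.
// Assumption: the language data is requested on every call; a real page may
// skip this once the package is known to be present locally.
function ensureOcrReady(
  ocr: OcrInstaller,
  addonUrl: string,
  langDataUrl: string,
  onReady: () => void,
  onError: (errorCode: number, errorString: string) => void
): void {
  const fetchLanguage = (): void => ocr.DownloadLangData(langDataUrl, onReady, onError);
  if (ocr.IsModuleInstalled()) {
    fetchLanguage();
  } else {
    ocr.Download(addonUrl, fetchLanguage, onError);
  }
}
```

In a page, `ocr` would be `{WebTwainObject}.Addon.OCR`, and `onReady` is the place to start configuring and running recognition.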

GetIfUseDetectedFont

Syntax

/**
 * Return whether the output uses the fonts detected by the OCR system or the default/provided ones. Only valid when the result format is PDF.
 */
GetIfUseDetectedFont(): boolean;

SetIfUseDetectedFont

Syntax

/**
 * Set whether the output uses the fonts detected by the OCR system or the default/provided ones. Only valid when the result format is PDF.
 * @param value Whether to use the detected font.
 */
SetIfUseDetectedFont(value: boolean): boolean;

GetMinFontSizeforMoreAccurateResult

Syntax

/**
 * Return the font size threshold used to decide where the more accurate regional OCR is applied.
 */
GetMinFontSizeforMoreAccurateResult(): number;

SetMinFontSizeforMoreAccurateResult

Syntax

/**
 * Set the font size threshold used to decide where the more accurate regional OCR is applied.
 * @param size Specify the font size.
 */
SetMinFontSizeforMoreAccurateResult(size: number): number;

Usage notes

If the font size is set to 0, no regional accurate OCR is performed.

GetUnicodeFontName

Syntax

/**
 * Return the font name for OCR. Only valid when the output format is PDF.
 */
GetUnicodeFontName(): string;

SetUnicodeFontName

Syntax

/**
 * Set the font name for OCR. Only valid when the output format is PDF.
 * @param name Specify a font to be used for the OCR.
 */
SetUnicodeFontName(name: string): boolean;

Usage notes

The name parameter of SetUnicodeFontName() should be the name (without the file extension) of an existing Windows font in the directory C:\Windows\Fonts.

The definition of a “unicode” font is loose. Any font can be provided here; however, it must support the characters of the language to be used. Some fonts support only some languages, but certain fonts such as ArialUni support all common languages.

The font set with SetUnicodeFontName() is only used when SetIfUseDetectedFont() was called with false as the argument for value. In other words, the set font is only used if the engine doesn’t use the detected fonts.
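
The ordering described in these notes can be sketched as follows. This is an illustration only: `OcrFontSettings` is a hypothetical local interface that copies the two setter signatures above, `useFixedPdfFont` is an illustrative helper name, and treating a `false` return value as failure is an assumption, since the return value is not described here.

```typescript
// Hypothetical local interface mirroring the two font-related setters above.
interface OcrFontSettings {
  SetIfUseDetectedFont(value: boolean): boolean;
  SetUnicodeFontName(name: string): boolean;
}

// Force a specific font in the PDF output. Detected fonts are switched off
// first because, per the notes above, the named font is ignored otherwise.
// Assumption: a `false` return value from either setter signals failure.
function useFixedPdfFont(ocr: OcrFontSettings, fontName: string): boolean {
  if (!ocr.SetIfUseDetectedFont(false)) {
    return false;
  }
  return ocr.SetUnicodeFontName(fontName);
}
```

For example, `useFixedPdfFont(ocr, "ArialUni")` would request the font mentioned above, provided it exists in C:\Windows\Fonts.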

SetLanguage

Syntax

/**
 * Set the target language for the OCR operation.
 * @param language Specify the target language.
 */
SetLanguage(language: Dynamsoft.DWT.EnumDWT_OCRLanguage | string): boolean;

SetOutputFormat

Syntax

/**
 * Set the output format for the OCR operation.
 * @param format Specify the output format.
 */
SetOutputFormat(format: Dynamsoft.DWT.EnumDWT_OCROutputFormat | number): boolean;

SetPageSetMode

Syntax

/**
 * Set the page layout analysis mode for the OCR operation.
 * @param mode Specify the OCR page layout analysis mode.
 */
SetPageSetMode(mode: Dynamsoft.DWT.EnumDWT_OCRPageSetMode | number): boolean;

Usage notes

The default language is eng, which indicates English. To use a language, its language data must be available locally. If it is not available yet, download it with DownloadLangData().

The default format is OCROF_PDFIMAGEOVERTEXT, which indicates an Image-over-Text PDF.

The default mode is PSM_AUTO, which indicates automatic page segmentation.
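
The three setters can be applied together before recognition. The sketch below assumes a hypothetical local interface `OcrConfig` (a copy of the three signatures above, narrowed to string and number arguments), a hypothetical `OcrOptions` shape, and an illustrative helper `configureOcr`. It also assumes that a `false` return value means the setting was rejected, which is not stated in this reference.

```typescript
// Hypothetical local interface mirroring the three setters above.
interface OcrConfig {
  SetLanguage(language: string): boolean;
  SetOutputFormat(format: number): boolean;
  SetPageSetMode(mode: number): boolean;
}

// Hypothetical options bag; the numeric values are expected to come from the
// Dynamsoft.DWT enumerations named in the signatures above.
interface OcrOptions {
  language: string;     // e.g. "eng", the documented default
  outputFormat: number; // a Dynamsoft.DWT.EnumDWT_OCROutputFormat value
  pageSetMode: number;  // a Dynamsoft.DWT.EnumDWT_OCRPageSetMode value
}

// Apply all three settings and return the names of those that were rejected.
// Assumption: each setter returns false when it rejects the value.
function configureOcr(ocr: OcrConfig, options: OcrOptions): string[] {
  const rejected: string[] = [];
  if (!ocr.SetLanguage(options.language)) rejected.push("language");
  if (!ocr.SetOutputFormat(options.outputFormat)) rejected.push("outputFormat");
  if (!ocr.SetPageSetMode(options.pageSetMode)) rejected.push("pageSetMode");
  return rejected;
}
```

An empty array means every setting was accepted. A rejected language is a hint that its language data may still need to be downloaded.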

Recognize

Syntax

/**
 * Perform OCR on the specified image in the buffer.
 * @param index Specify the image.
 * @param successCallback A callback function that is executed if the request succeeds.
 * @param failureCallback A callback function that is executed if the request fails.
 * @argument imageId The imageId of the image which can be used to find the index.
 * @argument result The OCR result.
 * @argument errorCode The error code.
 * @argument errorString The error string.
 */
Recognize(
    index: number,
    successCallback: (
        imageId: number,
        result: OCRResult
    ) => void,
    failureCallback: (
        errorCode: number,
        errorString: string
    ) => void
): void;
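
Because Recognize() reports its outcome through two callbacks, it can be wrapped in a Promise for use with async/await. The following is a minimal sketch: `OcrResultLike` and `OcrRecognizer` are hypothetical local interfaces trimmed from the declarations on this page, and `recognizeImage` is an illustrative helper name.

```typescript
// Hypothetical local types: trimmed copies of the declarations on this page.
interface OcrResultLike {
  Get(): string;
  GetErrorCode(): number;
  GetErrorString(): string;
}
interface OcrRecognizer {
  Recognize(
    index: number,
    successCallback: (imageId: number, result: OcrResultLike) => void,
    failureCallback: (errorCode: number, errorString: string) => void
  ): void;
}
interface RecognizeOutcome {
  imageId: number;
  result: OcrResultLike;
}

// Wrap the callback pair in a Promise: it resolves with the success arguments
// and rejects with an Error built from the failure arguments.
function recognizeImage(ocr: OcrRecognizer, index: number): Promise<RecognizeOutcome> {
  return new Promise<RecognizeOutcome>((resolve, reject) => {
    ocr.Recognize(
      index,
      (imageId, result) => resolve({ imageId, result }),
      (errorCode, errorString) => reject(new Error(`OCR failed (${errorCode}): ${errorString}`))
    );
  });
}
```

A caller could then write `const { result } = await recognizeImage(ocr, 0);` inside an async function and handle failures with try/catch.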

RecognizeFile

Syntax


/**
 * Perform OCR on the specified local file.
 * @param path Specify a local file.
 * @param successCallback A callback function that is executed if the request succeeds.
 * @param failureCallback A callback function that is executed if the request fails.
 * @argument path The file path.
 * @argument result The OCR result.
 * @argument errorCode The error code.
 * @argument errorString The error string.
 */
RecognizeFile(
    path: string,
    successCallback: (
        path: string,
        result: OCRResult
    ) => void,
    failureCallback: (
        errorCode: number,
        errorString: string
    ) => void
): void;

RecognizeRect

Syntax

/**
 * Perform OCR on the specified rectangular area on the image.
 * @param index Specify the image.
 * @param left Specify the rectangle (leftmost coordinate in pixels).
 * @param top Specify the rectangle (topmost coordinate in pixels).
 * @param right Specify the rectangle (rightmost coordinate in pixels).
 * @param bottom Specify the rectangle (bottommost coordinate in pixels).
 * @param successCallback A callback function that is executed if the request succeeds.
 * @param failureCallback A callback function that is executed if the request fails.
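 * @argument imageId The imageId of the image which can be used to find the index.
 * @argument left The leftmost coordinate of the rectangle (in pixels).
 * @argument top The topmost coordinate of the rectangle (in pixels).
 * @argument right The rightmost coordinate of the rectangle (in pixels).
 * @argument bottom The bottommost coordinate of the rectangle (in pixels).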
 * @argument result The OCR result.
 * @argument errorCode The error code.
 * @argument errorString The error string.
 */
RecognizeRect(
    index: number,
    left: number,
    top: number,
    right: number,
    bottom: number,
    successCallback: (
        imageId: number,
        left: number,
        top: number,
        right: number,
        bottom: number,
        result: OCRResult
    ) => void,
    failureCallback: (
        errorCode: number,
        errorString: string
    ) => void
): void;
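
Since RecognizeRect() takes four pixel coordinates, it can help to check the rectangle before calling it. The sketch below is illustrative: `OcrRectRecognizer`, `OcrRectResult`, `Region`, `isValidRegion` and `recognizeRegion` are hypothetical names, and the validity rule (non-negative coordinates, right greater than left, bottom greater than top) is an assumption rather than documented engine behavior.

```typescript
// Hypothetical local types mirroring the RecognizeRect signature above.
interface OcrRectResult {
  GetErrorCode(): number;
  GetErrorString(): string;
}
interface OcrRectRecognizer {
  RecognizeRect(
    index: number, left: number, top: number, right: number, bottom: number,
    successCallback: (
      imageId: number, left: number, top: number, right: number, bottom: number,
      result: OcrRectResult
    ) => void,
    failureCallback: (errorCode: number, errorString: string) => void
  ): void;
}
interface Region { left: number; top: number; right: number; bottom: number }

// Assumption: a usable region has non-negative pixel coordinates with
// right > left and bottom > top. NaN values fail every comparison.
function isValidRegion(r: Region): boolean {
  return r.left >= 0 && r.top >= 0 && r.right > r.left && r.bottom > r.top;
}

// Run OCR on one region; returns false without calling the engine when the
// region fails the check above.
function recognizeRegion(
  ocr: OcrRectRecognizer,
  index: number,
  region: Region,
  onResult: (result: OcrRectResult) => void,
  onError: (errorCode: number, errorString: string) => void
): boolean {
  if (!isValidRegion(region)) {
    return false;
  }
  ocr.RecognizeRect(
    index, region.left, region.top, region.right, region.bottom,
    (_imageId, _left, _top, _right, _bottom, result) => onResult(result),
    onError
  );
  return true;
}
```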

RecognizeSelectedImages

Syntax

/**
 * Perform OCR on the selected images in the buffer.
 * @param successCallback A callback function that is executed if the request succeeds.
 * @param failureCallback A callback function that is executed if the request fails.
 * @argument result The OCR result.
 * @argument errorCode The error code.
 * @argument errorString The error string.
 */
RecognizeSelectedImages(
    successCallback: (
        result: OCRResult
    ) => void,
    failureCallback: (
        errorCode: number,
        errorString: string
    ) => void
): void;

Usage notes

interface OCRResult {
  /**
   * Return a base64 string that contains the result of the OCR.
   * Newlines are represented by the newline character: '\n'.
   */
  Get(): string;
  /**
   * Return the error code.
   */
  GetErrorCode(): number;
  /**
   * Return the error string.
   */
  GetErrorString(): string;
  /**
   * Return the output format.
   */
  GetFormat(): number;
  /**
   * Return the source information. It could be the index of the OCR'd image or the path of the OCR'd file.
   */
  GetInput(): number | string;
  /**
   * Save the OCR result as a file.
   * @param path The path to save the file.
   */
  Save(path: string): boolean;
  /**
   * Return the number of pagesets in the OCR result.
   */
  GetPageSetCount(): number;
  /**
   * Return the content of a pageset.
   * @param index Specify the pageset.
   */
  GetPageSetContent(index: number): PageSet;
}
interface PageSet {
  /**
   * Return the number of pages in the pageset.
   */
  GetPageCount(): number;
  /**
   * Return the content of the specified page.
   * @param index Specify the page.
   */
  GetPageContent(index: number): Page;
}
interface Page {
  /**
   * Return the number of lines in the page.
   */
  GetLineCount(): number;
  /**
   * Return the content of the specified line.
   * @param index Specify the line.
   */
  GetLineContent(index: number): Line;
}
interface Line {
  /**
   * Return the number of words in the line.
   */
  GetWordCount(): number;
  /**
   * Return the coordinates of the rectangle that contains the line. The coordinates (in pixels) are in the sequence of "left,top,right,bottom" like "121,125,892,143".
   */
  GetLineRect(): string;
  /**
   * Return the content of the specified word.
   * @param index Specify the word.
   */
  GetWordContent(index: number): Word;
}
interface Word {
  /**
   * Return the font name of the word.
   */
  GetFontName(): string;
  /**
   * Return the font size of the word.
   */
  GetFontSize(): number;
  /**
   * Return the text of the word.
   */
  GetText(): string;
  /**
   * Return the coordinates of the rectangle that contains the specified word. The coordinates are in the sequence of "left,top,right,bottom" like "121,126,157,139".
   * @param index Specify the word.
   */
  GetWordRect(index: number): string;
}
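
The nesting of these interfaces (result, pageset, page, line, word) can be walked with plain loops. The sketch below collects the text and rectangle of every line. The `...Like` interfaces are hypothetical trimmed copies of the declarations above, `parseRect` and `collectLines` are illustrative helper names, and two things are assumptions: that all indexes are zero-based, and that joining words with a single space is an acceptable reconstruction of a line.

```typescript
// Hypothetical trimmed copies of the interfaces declared above.
interface WordLike { GetText(): string }
interface LineLike {
  GetWordCount(): number;
  GetLineRect(): string;
  GetWordContent(index: number): WordLike;
}
interface PageLike { GetLineCount(): number; GetLineContent(index: number): LineLike }
interface PageSetLike { GetPageCount(): number; GetPageContent(index: number): PageLike }
interface ResultLike { GetPageSetCount(): number; GetPageSetContent(index: number): PageSetLike }

interface Rect { left: number; top: number; right: number; bottom: number }
interface RecognizedLine { text: string; rect: Rect }

// Parse "left,top,right,bottom" (for example "121,125,892,143") into numbers.
function parseRect(text: string): Rect {
  const parts = text.split(",");
  if (parts.length !== 4 || !parts.every((p) => /^\s*-?\d+\s*$/.test(p))) {
    throw new Error(`Unexpected rectangle string: ${text}`);
  }
  // The defaults are never used after the check above; they only keep the
  // destructured values typed as plain numbers.
  const [left = 0, top = 0, right = 0, bottom = 0] = parts.map(Number);
  return { left, top, right, bottom };
}

// Walk pageset -> page -> line -> word.
// Assumptions: indexes are zero-based, and words are joined with one space.
function collectLines(result: ResultLike): RecognizedLine[] {
  const lines: RecognizedLine[] = [];
  for (let s = 0; s < result.GetPageSetCount(); s++) {
    const pageSet = result.GetPageSetContent(s);
    for (let p = 0; p < pageSet.GetPageCount(); p++) {
      const page = pageSet.GetPageContent(p);
      for (let l = 0; l < page.GetLineCount(); l++) {
        const line = page.GetLineContent(l);
        const words: string[] = [];
        for (let w = 0; w < line.GetWordCount(); w++) {
          words.push(line.GetWordContent(w).GetText());
        }
        lines.push({ text: words.join(" "), rect: parseRect(line.GetLineRect()) });
      }
    }
  }
  return lines;
}
```

With a result from one of the Recognize methods, `collectLines(result).map((line) => line.text).join("\n")` would give a plain-text rendering under the same assumptions.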

Server-Side

The following are the Java APIs of the server-side OCR engine.

API | Description
void setProductKey(String strProductKey) | Set the ProductKey of the OCR engine.
String getOCRDllPath() | Return the path of the OCR engine.
void setOCRDllPath(String strOCRDllPath) | Set the path of the OCR engine.
String getOCRLanguage() | Return the target language.
void setOCRLanguage(String strOCRLanguage) | Set the target language for the OCR.
int getOCRMinFontSizeDoMoreOCR() | Return the font size which determines whether the engine should perform OCR again on areas with a bigger font size.
void setOCRMinFontSizeDoMoreOCR(int iMinFontSizeDoMoreOCR) | Set the font size threshold. The engine performs OCR again on areas where the font size is bigger than this value.
int getOCRPageSetMode() | Return the mode used to analyze the OCR input.
void setOCRPageSetMode(int iOCRPageSetMode) | Set how pages are determined when analyzing the OCR input.
int getOCRPdfFontSize() | Return the font size for the output PDF file.
void setOCRPdfFontSize(int iPdfFontSize) | Set the font size for the output PDF file.
int getOCRResultFormat() | Return the result format for the OCR.
void setOCRResultFormat(int iOCRResultFormat) | Set the result format for the OCR.
String getOCRTessDataPath() | Return the path of the language packages.
void setOCRTessDataPath(String strOCRTessDataPath) | Set the path of the language packages.
String getOCRUnicodeFontName() | Return the detected OCR font name.
void setOCRUnicodeFontName(String strOCRUnicodeFontName) | Set a font to be used by OCR when isOCRUseDetectedFont returns false.
boolean isOCRNumbericOnly() | Return whether the OCR engine recognizes numbers only.
void setOCRIsNumbericOnly(boolean bNumbericOnly) | Set whether the OCR engine should recognize numbers only.
boolean isOCRUseDetectedFont() | Return whether the PDF output uses the font detected by the OCR engine.
void setOCRUseDetectedFont(boolean bUseDetectedFont) | Set whether the PDF output uses the font detected by the OCR engine or the default/provided one.
byte array ocrFile(String strImagePath, out byte array aryOCRResult) | Perform OCR on an image on the disk. aryOCRResult returns details of the OCR result, which you can get by calling getValue() on it.
byte array ocrImage(byte array aryImageBuffer, out byte array aryOCRResult) | Perform OCR on an image in the buffer. aryOCRResult returns details of the OCR result, which you can get by calling getValue() on it.

© 2003–2022 Dynamsoft. All rights reserved.