Recently I was required to do some analysis of some of the APIs offered within the Azure Cognitive Services suite. The APIs initially selected for analysis are listed below; however, one of them was later dropped for reasons explained further down.
- OCR
- Analyze Layout (v2.0)
- Analyze Layout (v2.1 preview)
- Analyze Invoice (v2.1 preview)
- Analyze Forms (v2.0)
- Analyze Forms (v2.1 preview)
This analysis was performed using a set of 8 different documents.
OCR vs Analyze Layout (v2.0)
- The OCR engine used within Analyze Layout differs from the one offered by the OCR API. This was concluded because, in one particular instance, the same text was OCRed differently by the two APIs
- When using scanned documents, Analyze Layout seemed to pick up noise that adds no value
- On a couple of documents, the coordinates of the extracted data were completely different between the two APIs
- On both APIs, text that spans multiple lines is extracted as separate lines
- OCR doesn't produce any structure to define table data
- The JSON result produced by Analyze Layout includes a new section, pageResults, which is used to define table-structured data
- On some occasions, Analyze Layout extracted table-structured data that does not map to any actual table in the document
- On some occasions, Analyze Layout extracted only parts of a table
- When a document with 2 pages was used, a table structure was identified on page 2 but not on page 1
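The pageResults section mentioned above can be walked to rebuild the tables the service found. Below is a minimal sketch, assuming a trimmed, hypothetical Analyze Layout response (the field names `pageResults`, `tables`, `rows`, `columns`, `cells`, `rowIndex`, `columnIndex`, and `text` follow the v2.0 layout schema; the sample values are invented):

```python
import json

# Hypothetical, trimmed Analyze Layout response. The real service returns
# this payload under "analyzeResult" once the async operation completes.
sample = json.loads("""
{
  "analyzeResult": {
    "pageResults": [
      {
        "page": 2,
        "tables": [
          {
            "rows": 2,
            "columns": 2,
            "cells": [
              {"rowIndex": 0, "columnIndex": 0, "text": "Item"},
              {"rowIndex": 0, "columnIndex": 1, "text": "Qty"},
              {"rowIndex": 1, "columnIndex": 0, "text": "Widget"},
              {"rowIndex": 1, "columnIndex": 1, "text": "3"}
            ]
          }
        ]
      }
    ]
  }
}
""")

def tables_per_page(result):
    """Map page number -> list of tables, each a row-major grid of cell text."""
    pages = {}
    for page in result["analyzeResult"]["pageResults"]:
        grids = []
        for table in page["tables"]:
            # Pre-size the grid, then drop each cell into place by its indices.
            grid = [[""] * table["columns"] for _ in range(table["rows"])]
            for cell in table["cells"]:
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            grids.append(grid)
        pages[page["page"]] = grids
    return pages

print(tables_per_page(sample))
```

Grids like this also make it easy to spot the behaviour noted above, e.g. a table detected on page 2 but not on page 1.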
OCR vs Analyze Layout (v2.1)
- The OCR engine used within Analyze Layout differs from the one offered by the OCR API. This was concluded because, in one particular instance, the same text was OCRed differently. In fact, the value produced also differed from the value produced by v2.0
- On both APIs, text that spans multiple lines is extracted as separate lines
- OCR doesn't produce any structure to define table data
- The JSON result produced by Analyze Layout includes a new section, pageResults, which is used to define table-structured data
- On a number of occasions, Analyze Layout was capable of extracting table-structure data
- When a document with multiple pages was used, table-structure data was extracted on both pages
- On one specific document, a particular symbol was identified as a selection mark (which looks to be the identification of a checkbox)
- In some instances within the table-structure data, related text spanning multiple lines was amalgamated into a single cell
- Appearance metadata related to the extracted data seems to have been introduced. However, this looked to be static, as the same style was observed across all the data
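The selection marks and appearance metadata observed above can be pulled out of the v2.1-preview response in a few lines. This is a sketch against a hypothetical, trimmed payload (the `selectionMarks`, `state`, and `appearance.style` fields follow the v2.1-preview readResults schema; the sample values are invented):

```python
import json

# Hypothetical, trimmed Analyze Layout v2.1-preview response fragment,
# showing the selectionMarks and appearance metadata discussed above.
sample = json.loads("""
{
  "analyzeResult": {
    "readResults": [
      {
        "page": 1,
        "lines": [
          {"text": "Approved",
           "appearance": {"style": {"name": "other", "confidence": 1.0}}}
        ],
        "selectionMarks": [
          {"state": "selected", "confidence": 0.92},
          {"state": "unselected", "confidence": 0.88}
        ]
      }
    ]
  }
}
""")

def selection_marks_per_page(result):
    """Collect the state of every selection mark (checkbox-like symbol) per page."""
    marks = {}
    for page in result["analyzeResult"]["readResults"]:
        marks[page["page"]] = [m["state"] for m in page.get("selectionMarks", [])]
    return marks

def line_styles(result):
    """Gather the appearance style name reported for each line of text."""
    return [line["appearance"]["style"]["name"]
            for page in result["analyzeResult"]["readResults"]
            for line in page["lines"]]

print(selection_marks_per_page(sample))
print(line_styles(sample))
```

Listing the style names across a whole document, as `line_styles` does, is one quick way to confirm the observation that the appearance metadata appears static.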
Analyze Invoice (v2.1)
- The JSON result produced excludes the lines section that is included in the OCR and Analyze Layout APIs
- The JSON result produced includes a new section, pageResults, which is used to define table-structured data. Compared to the Analyze Layout APIs, this excludes references to lines information
- The JSON result produced includes a new section, documentResults, which helps classify content within the related document
- The classification data consists of key-value pairs for single matches, but it is also capable of classifying table-structured data
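The documentResults classification can be flattened into plain key-value pairs for downstream use. Below is a minimal sketch, assuming a trimmed, hypothetical Analyze Invoice response (the `documentResults`, `docType`, `fields`, `text`, and `confidence` names follow the v2.1-preview prebuilt schema; the field names and values in the sample are invented):

```python
import json

# Hypothetical, trimmed Analyze Invoice response showing the
# documentResults classification section described above.
sample = json.loads("""
{
  "analyzeResult": {
    "documentResults": [
      {
        "docType": "prebuilt:invoice",
        "fields": {
          "VendorName": {"type": "string", "text": "Contoso Ltd.", "confidence": 0.98},
          "InvoiceTotal": {"type": "string", "text": "110.00", "confidence": 0.95}
        }
      }
    ]
  }
}
""")

def classified_fields(result):
    """Flatten documentResults into simple name -> (text, confidence) pairs."""
    fields = {}
    for doc in result["analyzeResult"]["documentResults"]:
        for name, value in doc["fields"].items():
            fields[name] = (value["text"], value["confidence"])
    return fields

print(classified_fields(sample))
```

Keeping the confidence alongside each extracted value makes it easy to route low-confidence matches to manual review.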
Analyze Forms (v2.1)
To analyze this API, the FOTT tool was used. This simplifies the generation of the JSON messages to be sent to the respective APIs. To understand how this tool works, the steps within this video were followed: Steps to use FOTT tool.
It is assumed / concluded that Analyze Invoice makes use of Analyze Forms, but with already pre-trained data. Similarly, other existing APIs cater for Business Cards, etc.
Analyze Forms (v2.0)
This API wasn't analyzed because the FOTT tool doesn't support this version; hence, generating the JSON messages manually would have been a complex task. There was also a chance that the results would be similar to, or worse than, those produced by v2.1.