Recently I was required to do some analysis of some of the APIs offered within the Azure Cognitive Services suite. The APIs initially selected for analysis are listed below; however, one of them was later dropped for reasons explained further down.
- OCR
- Analyze Layout (v2.0)
- Analyze Layout (v2.1 preview)
- Analyze Invoice (v2.1 preview)
- Analyze Forms (v2.0)
- Analyze Forms (v2.1 preview)
This analysis was performed using a set of 8 different documents.
OCR vs Analyze Layout (v2.0)
- The OCR engine used within Analyze Layout differs from the one offered by the OCR API. This was concluded because, in one particular instance, the same text was OCRed differently by the two APIs
- When using scanned documents, Analyze Layout seemed to pick up noise that adds no value
- On a couple of documents, the coordinates of the extracted data were completely different between the two APIs
- On both APIs, text that spans multiple lines is extracted as separate lines
- OCR doesn't produce any structure to define table data
- The JSON result produced by Analyze Layout includes a new section, pageResults, which is used to define table-structured data
- On some occasions, Analyze Layout extracted table-structured data that does not map to any actual table in the document
- On some occasions, Analyze Layout extracted only parts of a table
- When a document with 2 pages was used, a table structure was identified on page 2 but not on page 1
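The pageResults section mentioned above can be walked to rebuild the tables the service found. Below is a minimal sketch, assuming a trimmed, hypothetical Analyze Layout response (the field names `pageResults`, `tables`, `rows`, `columns`, `cells`, `rowIndex`, `columnIndex`, and `text` follow the v2.0 layout schema; the sample values are invented):

```python
import json

# Hypothetical, trimmed Analyze Layout response. The real service returns
# this payload under "analyzeResult" once the async operation completes.
sample = json.loads("""
{
  "analyzeResult": {
    "pageResults": [
      {
        "page": 2,
        "tables": [
          {
            "rows": 2,
            "columns": 2,
            "cells": [
              {"rowIndex": 0, "columnIndex": 0, "text": "Item"},
              {"rowIndex": 0, "columnIndex": 1, "text": "Qty"},
              {"rowIndex": 1, "columnIndex": 0, "text": "Widget"},
              {"rowIndex": 1, "columnIndex": 1, "text": "3"}
            ]
          }
        ]
      }
    ]
  }
}
""")

def tables_per_page(result):
    """Map page number -> list of tables, each a row-major grid of cell text."""
    pages = {}
    for page in result["analyzeResult"]["pageResults"]:
        grids = []
        for table in page["tables"]:
            # Pre-size the grid, then drop each cell into place by its indices.
            grid = [[""] * table["columns"] for _ in range(table["rows"])]
            for cell in table["cells"]:
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            grids.append(grid)
        pages[page["page"]] = grids
    return pages

print(tables_per_page(sample))
```

Grids like this also make it easy to spot the behaviour noted above, e.g. a table detected on page 2 but not on page 1.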
OCR vs Analyze Layout (v2.1)
- The OCR engine used within Analyze Layout differs from the one offered by the OCR API. This was concluded because, in one particular instance, the same text was OCRed differently. In fact, the value produced also differed from the value produced by v2.0
- On both APIs, text that spans multiple lines is extracted as separate lines
- OCR doesn't produce any structure to define table data
- The JSON result produced by Analyze Layout includes a new section, pageResults, which is used to define table-structured data
- On a number of occasions, Analyze Layout was capable of extracting table-structure data
- When a document with multiple pages was used, table-structure data was extracted on both pages
- On one specific document, a particular symbol was identified as a selection mark (which looks to be the identification of a checkbox)
- In some instances within the table-structure data, related text spanning multiple lines was amalgamated into a single cell
- Appearance metadata related to the extracted data seems to have been introduced. However, this looked to be static, as the same style was observed across all the data
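The selection marks and appearance metadata observed above can be pulled out of the v2.1-preview response in a few lines. This is a sketch against a hypothetical, trimmed payload (the `selectionMarks`, `state`, and `appearance.style` fields follow the v2.1-preview readResults schema; the sample values are invented):

```python
import json

# Hypothetical, trimmed Analyze Layout v2.1-preview response fragment,
# showing the selectionMarks and appearance metadata discussed above.
sample = json.loads("""
{
  "analyzeResult": {
    "readResults": [
      {
        "page": 1,
        "lines": [
          {"text": "Approved",
           "appearance": {"style": {"name": "other", "confidence": 1.0}}}
        ],
        "selectionMarks": [
          {"state": "selected", "confidence": 0.92},
          {"state": "unselected", "confidence": 0.88}
        ]
      }
    ]
  }
}
""")

def selection_marks_per_page(result):
    """Collect the state of every selection mark (checkbox-like symbol) per page."""
    marks = {}
    for page in result["analyzeResult"]["readResults"]:
        marks[page["page"]] = [m["state"] for m in page.get("selectionMarks", [])]
    return marks

def line_styles(result):
    """Gather the appearance style name reported for each line of text."""
    return [line["appearance"]["style"]["name"]
            for page in result["analyzeResult"]["readResults"]
            for line in page["lines"]]

print(selection_marks_per_page(sample))
print(line_styles(sample))
```

Listing the style names across a whole document, as `line_styles` does, is one quick way to confirm the observation that the appearance metadata appears static.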
Analyze Invoice (v2.1)
- The JSON result produced excludes the lines section that is included in the OCR and Analyze Layout APIs
- The JSON result produced includes a new section, pageResults, which is used to define table-structured data. Compared to the Analyze Layout APIs, this excludes references to lines information
- The JSON result produced includes a new section, documentResults, which helps classify content within the related document
- The classification data consists of key-value pairs for single matches, but it is also capable of classifying table-structured data
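The documentResults classification can be flattened into plain key-value pairs for downstream use. Below is a minimal sketch, assuming a trimmed, hypothetical Analyze Invoice response (the `documentResults`, `docType`, `fields`, `text`, and `confidence` names follow the v2.1-preview prebuilt schema; the field names and values in the sample are invented):

```python
import json

# Hypothetical, trimmed Analyze Invoice response showing the
# documentResults classification section described above.
sample = json.loads("""
{
  "analyzeResult": {
    "documentResults": [
      {
        "docType": "prebuilt:invoice",
        "fields": {
          "VendorName": {"type": "string", "text": "Contoso Ltd.", "confidence": 0.98},
          "InvoiceTotal": {"type": "string", "text": "110.00", "confidence": 0.95}
        }
      }
    ]
  }
}
""")

def classified_fields(result):
    """Flatten documentResults into simple name -> (text, confidence) pairs."""
    fields = {}
    for doc in result["analyzeResult"]["documentResults"]:
        for name, value in doc["fields"].items():
            fields[name] = (value["text"], value["confidence"])
    return fields

print(classified_fields(sample))
```

Keeping the confidence alongside each extracted value makes it easy to route low-confidence matches to manual review.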
Analyze Forms (v2.1)
To analyze this API, the FOTT tool was used. This simplifies the generation of the JSON messages to be sent to the respective APIs. To understand how this tool works, the steps within this video were followed: Steps to use FOTT tool.
It is assumed / concluded that Analyze Invoice makes use of Analyze Forms, but with already pre-trained data. Similarly, other existing APIs cater for Business Cards, etc.
Analyze Forms (v2.0)
This API wasn't analyzed because the FOTT tool doesn't support this version; hence, generating the JSON messages manually would have been a complex task. There was also a chance that the results would be similar to, or worse than, those produced by v2.1.