Logo Classification
The Logo classification module enables you to train and publish models that auto-populate ontology fields by identifying document logos. Identifying the logo identifies the document source, which lets the application look up source-specific information in the database. If the information is available, it is used to auto-populate the ontology fields as required. Auto-populating saves time because it is quicker and more reliable than regular OCR-and-rule-based extraction.
Note
To use business keyword modeling for a given subtype of documents, the subtype must have a field of Source Name type.
The accuracy and reliability of the models in finding correct data depend on how well they have been trained.
Logo classification models internally use machine learning (ML) concepts to identify the logos present in the documents.
To open the Logo classification page, go to the main menu, and then under ML Studio, click Logo classification.
The following activities are involved in publishing a logo classification model:
- Uploading a batch
- Setting annotation
- Publishing the model
- Self-training the model
Uploading a batch
To train and publish a model, the first step is to upload a batch of documents that can be sampled and annotated. To upload a batch, follow these steps:
Column header descriptions
COLUMN NAME | DESCRIPTION |
---|---|
Batch Name | Displays the name of a given batch. |
Duplicate Files | Displays the number of duplicate files present in the batch. |
Error Files | Displays the number of error files. |
Last Updated Date | Displays the date on which the batch was last updated. |
Version | Displays the version number of the batch. |
Status | Displays the current status of the batch. To delete a batch, click the delete icon. |
- In the upper-right corner of the panel, click Add. The Add Batch window opens.
- Enter the batch name in the Batch Name field.
- Click Select File to browse and add files to the batch.
  Note
  - Select a file that has at least three pages, or three single-page documents. You can set the maximum file upload count in application settings. The more files you provide, the better the model performs. However, there is a trade-off between the number of files and the time required to process them.
  - Permitted file types are PDF, JPG, JPEG, PNG, TIFF, ZIP, BMP, DOC, XPS, and TXT.
- Click Save to create the batch. After the batch is created, it appears in the panel grid.
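Before uploading, it can help to check a folder of files against the permitted file types listed above. The following is a minimal sketch of such a pre-check, not part of the application; the function name `split_by_permitted_type` and the sample file names are hypothetical, and the application performs its own validation on upload.

```python
from pathlib import Path

# File types listed as permitted in this section.
PERMITTED_EXTENSIONS = {
    "pdf", "jpg", "jpeg", "png", "tiff", "zip", "bmp", "doc", "xps", "txt",
}


def split_by_permitted_type(filenames):
    """Split file names into (accepted, rejected) lists by extension.

    The comparison is case-insensitive, so "scan.PDF" is treated like "scan.pdf".
    Hypothetical helper for illustration only.
    """
    accepted, rejected = [], []
    for name in filenames:
        extension = Path(name).suffix.lower().lstrip(".")
        if extension in PERMITTED_EXTENSIONS:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


# Example with made-up file names.
accepted, rejected = split_by_permitted_type(
    ["invoice_01.PDF", "scan.tiff", "notes.docx", "logo.gif"]
)
print("Upload:", accepted)
print("Skip:", rejected)
```

This check covers only the file type. It does not verify the three-page minimum or the maximum file upload count configured in application settings.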
Setting annotation and training the model
Annotating a batch means labeling the logos identified in the documents of that batch. Each label records which company or organization a document belongs to. After annotating the required batch, you can train and publish the required model.
To annotate a batch, perform the following steps:
- In the Upload Batch panel, select the batch that you want to annotate, and then, in the bottom-right corner of the section, click Logo Identification. The Upload Batch panel collapses and the Set Annotation panel expands. A message on the page indicates that the documents in the selected batch are being processed.
  Note
  You cannot select a batch for which a model is already published.
- After all the documents in the batch are processed, the logo images appear one below another in the panel.
- Select and label the logos one by one. To label a logo, click it and select the source name from the Source Name drop-down list. To create a new category, click Add New Source in the drop-down list, and then click Save.
On the Set Annotation panel, you can perform the following actions:
- Rotate: Click the Rotate icon to rotate the image.
- Mark ROI: Click the Mark ROI icon to mark a region of interest (ROI).
- Delete: Click the Delete icon to delete the marked ROI.
- Save: Click the Save icon to save the marked ROI.
- Discard Image: Click the Discard Image icon to discard the page.
Note
A selected logo appears highlighted in the document viewer. To change the captured area, click the highlight and adjust the boundary that appears around the logo.
You must annotate all the logos. To omit a logo, delete it.
- After labeling the logos, click Start Training to train the model. A message indicating that the model is being trained appears on the page. After successful training, the model name (same as the source name) appears in the Model Metrics panel. The model version is set to one more than the highest version number already present in the Model Metrics panel.
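The versioning rule in the last step can be illustrated with a short sketch. It assumes that versions are written with two decimal places (such as 1.00) and that the first model receives version 1.00; the function name `next_model_version` is hypothetical and does not describe the application's internal implementation.

```python
from decimal import Decimal


def next_model_version(existing_versions):
    """Return the version for a newly trained model.

    Rule from the text: one more than the highest existing version.
    Assumption: versions are strings such as "1.00", and the first
    model starts at "1.00".
    """
    if not existing_versions:
        return "1.00"
    highest = max(Decimal(version) for version in existing_versions)
    return f"{highest + 1:.2f}"


print(next_model_version(["1.00", "2.00"]))  # one above the highest, 2.00
print(next_model_version([]))                # assumed starting version
```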
Model metrics
The Model Metrics panel displays the logo classification models available in the application. The models appear stacked in the order they were created (unless a filter or sort is applied). Each model has a version number one higher than that of its predecessor.
Column header descriptions
COLUMN NAME | DESCRIPTION |
---|---|
Batch Name | Displays the name of a given batch. |
Model Version | Displays the version of a given model. |
Train Set File Count | Displays the number of image profiles that have been used to train a given model. |
Test Set File Count | Displays the number of image profiles that were used to test the model. |
Date of Publish | Displays the date when a given model was published. |
F1 Score/Accuracy | Displays the accuracy results (in %) based on the test done on the available logos. (See Test Set File Count column header.) |
Class Name | Displays the class name to which the model is associated. |
Status | Displays the status of the model. |
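For readers unfamiliar with the F1 Score/Accuracy column, the following sketch shows the standard definitions of accuracy and per-class F1 score on a small test set. This guide does not state exactly how the application computes the displayed value, so treat this as a general illustration; the class names and predictions are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of test items whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """Standard F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(t == cls and p == cls for t, p in pairs)
    false_pos = sum(t != cls and p == cls for t, p in pairs)
    false_neg = sum(t == cls and p != cls for t, p in pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Made-up test set: true source names and the model's predictions.
y_true = ["Acme", "Acme", "Globex", "Globex"]
y_pred = ["Acme", "Globex", "Globex", "Globex"]

print(f"Accuracy: {accuracy(y_true, y_pred) * 100:.0f}%")
print(f"F1 (Globex): {f1_for_class(y_true, y_pred, 'Globex') * 100:.0f}%")
```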
Some of the models in the panel may not yet be published. To publish an unpublished model, select the model and click Publish. The Publish window opens, displaying a message that the latest version is already published.
Click Yes to continue, or click No to cancel.
When you publish a Ready to Publish model, the following message is displayed: "Model published. Appending previous model to latest model."
Publishing a model
Publishing a model makes it available to the application for processing documents. Published models are applied one by one, starting from the earliest available version (1.00) and moving to the latest, until one of them provides a result.
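The order in which published models are applied can be sketched as a simple fallback loop. In this illustration, each model is a hypothetical callable that returns a source name or `None` when it finds no match; the function name `classify_with_published_models` and the data structure are assumptions, not the application's actual interface.

```python
from decimal import Decimal


def classify_with_published_models(document, published_models):
    """Apply models from the earliest version to the latest.

    published_models maps a version string (for example "1.00") to a
    hypothetical callable that returns a source name, or None if the
    model provides no result. Returns (version, result) for the first
    model that provides a result, or None if none does.
    """
    ordered = sorted(published_models.items(), key=lambda item: Decimal(item[0]))
    for version, model in ordered:
        result = model(document)
        if result is not None:
            return version, result
    return None


# Made-up models: version 1.00 finds nothing, version 2.00 recognizes the logo.
models = {
    "2.00": lambda document: "Globex",
    "1.00": lambda document: None,
}
print(classify_with_published_models("invoice_01.pdf", models))
```

Sorting by numeric value rather than by string keeps the order correct once versions reach 10.00 and above.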
Note:
Removing an ML model is a permanent action; once removed, the model cannot be recovered. If you need it in the future, you must either re-publish or re-train the model according to your requirements.
To remove a published model, follow these steps:
- Open the Model Information page. This page provides an overview of all the trained ML models available.
- On the Model Information page, under the Custom tab, locate the Logo Classification ML model that you want to remove. Use the Search field to filter and find specific records in the displayed grid (see Searching data).
- In the Action column, click the remove icon for the Logo Classification ML model that you want to remove. An Attention window opens.
- Click Yes to permanently remove the ML model from the list. You can view the removed model on the Model Metrics page.
Re-publishing a model is a crucial step in ensuring that the latest improvements or modifications are reflected in its performance.
Note:
Before initiating the re-publishing process, carefully review the existing details of the model and make any necessary updates to align with your desired changes.
To re-publish a model, follow these steps:
- On the Model Metrics page, identify the ML model that you want to re-publish. Models that are not currently published are displayed with the status UnPublished. Use the Search field to filter and find specific records in the displayed grid (see Searching data).
- Select the unpublished model that you want to re-publish. The Publish button becomes visible.
- Click Publish to re-publish the model. Once published, the status of the model changes from UnPublished to Published, indicating that the re-publishing process was successful.
You can view the published model on the Model Information page. Take a moment to review the details associated with the re-published model.
Self-training the model
The Initiate Self Learning feature of TruCap+ lets you self-train the application. The data for self-training is generated automatically from the processed documents. When you initiate self-learning, the data from all processed documents is added as a self-trained batch. You can then annotate, train, and publish the self-trained batch.
To self-train a model, in the top-right corner of the Upload Batch panel, click Initiate Self Learning.