Logo Classification

The Logo classification module enables you to train and publish models that auto-populate ontology fields by identifying document logos. Identifying the logo identifies the source of the document, which enables the application to look up source-specific information in the database. If the information is available, it is used to auto-populate the ontology fields as required. Auto-populating saves time because it is quicker and more reliable than regular OCR-and-rule-based extraction.
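The lookup described above can be pictured with a minimal sketch. It is not the application's implementation: the `SOURCE_DB` dictionary, the `auto_populate` function, and the sample source and field names are all hypothetical stand-ins chosen for illustration.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the application's source database.
SOURCE_DB = {
    "Acme Corp": {"Source Name": "Acme Corp", "Currency": "USD"},
}


def auto_populate(source_name: Optional[str], fields: dict) -> dict:
    """Fill empty fields from the stored record of the identified source."""
    record = SOURCE_DB.get(source_name) if source_name else None
    if record is None:
        # No logo match or no stored data: leave the fields unchanged
        # so that regular extraction can handle them.
        return dict(fields)
    populated = dict(fields)
    for name, value in record.items():
        # Only fill fields that exist on the document and are still empty.
        if name in populated and not populated[name]:
            populated[name] = value
    return populated
```

In this sketch, a document whose logo is not recognized, or whose source has no stored record, passes through with its fields untouched.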

Note

To use logo classification for a given subtype of documents, the subtype must have a field of the Source Name type.

The accuracy and reliability of the models in finding correct data depend on how well they have been trained.

Logo classification models internally use Machine Learning (ML) concepts to identify logos present in the documents.

To open the Logo classification page, go to the main menu, and then under ML Studio, click Logo classification.

The following activities are involved in publishing a logo classification model:

  1. Uploading a batch

  2. Setting annotation

  3. Publishing the model

  4. Self-training the model

Uploading a batch

To train and publish a model, the first step is to upload a batch of documents that can be sampled and annotated. To upload a batch, follow these steps:

  1. On the Logo Classification page, click to expand the Upload Batch panel.

Column header descriptions

COLUMN NAME          DESCRIPTION
Batch Name           Displays the name of a given batch.
Duplicate Files      Displays the number of duplicate files present in the batch.
Error Files          Displays the number of error files.
Last Updated Date    Displays the date on which the batch was last updated.
Version              Displays the version number of the batch.
Status               Displays the current status of the batch.

To delete a batch, click the delete icon corresponding to the batch name.

  2. In the upper-right corner of the panel, click Add. The Add Batch window opens.

  3. Enter the batch name in the Batch Name field.

  4. Click Select File to browse and add files to the batch.

    Note

    1. Select a file that contains at least three pages, or at least three single-page documents. You can set the maximum file upload count in the application settings. The more files you upload, the better the model performs. However, there is a trade-off between the number of files and the time required to process them.

    2. Permitted file types are PDF, JPG, JPEG, PNG, TIFF, ZIP, BMP, DOC, XPS, and TXT.

  5. Click Save to create the batch. After the batch is created, it appears in the panel grid.
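Before uploading, it can help to check a set of files against the rules in the note above. The following is a minimal sketch of such a check, not part of the product: the `check_batch` function and its `max_files` parameter (standing in for the maximum file upload count from the application settings) are hypothetical. It checks only the file type and file count; the three-page minimum is not verified because that would require opening each file.

```python
from pathlib import Path

# File types listed as permitted in the note above.
PERMITTED_EXTENSIONS = {
    ".pdf", ".jpg", ".jpeg", ".png", ".tiff",
    ".zip", ".bmp", ".doc", ".xps", ".txt",
}


def check_batch(file_names, max_files):
    """Return a list of problems found in a candidate batch (empty if none)."""
    problems = []
    if not file_names:
        problems.append("no files selected")
    if len(file_names) > max_files:
        problems.append(f"too many files: {len(file_names)} > {max_files}")
    for name in file_names:
        # Compare extensions case-insensitively, so "scan.PNG" is accepted.
        if Path(name).suffix.lower() not in PERMITTED_EXTENSIONS:
            problems.append(f"unsupported file type: {name}")
    return problems
```

An empty list means that the files pass these two checks; each entry otherwise names one problem to fix before uploading.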

Setting annotation and training the model

Annotating a batch refers to labeling the logos identified in a given batch of documents. You label a logo to indicate which company or organization the document belongs to. After annotating the required batch, you can train and publish the required model.

To annotate a batch, perform the following steps: 

  1. In the Upload Batch panel, select the batch that you want to annotate, and then in the bottom-right corner of the section, click Logo Identification. The Upload Batch panel collapses and the Set Annotation panel expands. Also, a message is displayed on the page that the documents (in the selected batch) are being processed.

    Note

    You cannot select a batch for which a model is already published.

  2. After all the documents in the batch are processed, the logo images appear one below another in the panel.

  3. Select and label the logos one by one. To label a logo, click the required logo and select the source name from the Source Name drop-down list. If required, you can also create a new category by clicking Add New Source in the drop-down list, and then clicking Save.

    On the Set Annotation panel, you can perform the following activities:

    1. Rotate: Click the Rotate icon to rotate the image.

    2. Mark ROI: Click the Mark ROI icon to mark a region of interest (ROI).

    3. Delete: Click the Delete icon to delete the marked ROI.

    4. Save: Click the Save icon to save the marked ROI.

    5. Discard Image: Click the Discard Image icon to discard the page.

    Note

    A selected logo appears highlighted in the document viewer. You can click on the highlight and then adjust the boundary that appears around the logo to change the captured area.

    You must annotate all the logos. To omit a logo, delete it.

  4. After labeling the logos, click Start Training to train the model. A message indicating that the model is being trained appears on the page. After successful training, the model name (same as the source name) appears in the Model Metrics panel on the page. The model version is set to one more than the highest version number that previously existed in the Model Metrics panel.
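The version rule in the last step can be written as a one-line calculation. The sketch below is illustrative only: `next_model_version` is a hypothetical helper, and it assumes that versions are whole numbers and that the first model receives version 1.

```python
def next_model_version(existing_versions):
    """Return one more than the highest existing version, or 1 if none exist."""
    # default=0 covers the case where no model has been trained yet.
    return max(existing_versions, default=0) + 1
```

Under these assumptions, a panel that already lists versions 1, 2, and 3 would assign version 4 to the newly trained model.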

Model metrics

The Model Metrics panel displays the logo classification models available in the application. The models appear stacked in the order in which they were created (unless a filter or sorting is applied). Each model has a version number incremented from that of its predecessor.

Column header descriptions

COLUMN NAME             DESCRIPTION
Batch Name              Displays the name of a given batch.
Model Version           Displays the version of a given batch.
Train Set File Count    Displays the number of image profiles that have been used to train a given model.
Test Set File Count     Displays the number of image profiles that were used to test the model.
Date of Publish         Displays the date when a given model was published.
F1 Score/Accuracy       Displays the accuracy results (in %) based on the test done on the available logos. (See Test Set File Count column header.)
Class Name              Displays the class name to which the model is associated.
Status                  Displays the status of the model.
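This guide does not state how the F1 Score/Accuracy value is calculated. As background only, the following minimal sketch shows the standard definitions of accuracy and per-class F1 score on a labeled test set. The function names and sample labels are hypothetical, and the application may compute its metric differently.

```python
def accuracy_percent(true_labels, predicted_labels):
    """Percentage of test items whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


def f1_for_class(true_labels, predicted_labels, cls):
    """Standard F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
    fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
    fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with true labels A, A, B, B and predictions A, B, B, B, three of the four predictions are correct, so the accuracy is 75%.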

In the panel, some models may not yet be published. To publish an unpublished model, select the model and click Publish. The Publish window opens and displays a message stating that the latest version is already published.

Click Yes to continue, or click No to cancel.

When you publish a Ready to Publish model, the following message is displayed: Model published. Appending previous model to latest model.

Publishing a model

Publishing a model makes it available to the application for processing documents. Published models are applied one by one, from the earliest available version (1.00) to the latest, until one of them provides a result.
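The order in which published models are tried can be sketched as a simple loop. This is an illustration of the behavior described above, not the application's code: the `classify_with_published_models` function and the representation of a model as a `(version, predict)` pair, where `predict` returns `None` when it finds no result, are assumptions.

```python
def classify_with_published_models(document, models):
    """Try published models from the earliest version to the latest.

    `models` is a list of (version, predict) pairs. Returns a
    (version, result) pair for the first model that provides a result,
    or None if no model does.
    """
    for version, predict in sorted(models, key=lambda m: m[0]):
        result = predict(document)
        if result is not None:
            # Stop at the first model that provides a result.
            return version, result
    return None
```

In this sketch, later versions are consulted only when every earlier version fails to provide a result.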

Note:

Removing an ML model is a permanent action; once removed, the model cannot be recovered. If you need it in the future, you must either re-publish or re-train the model according to your requirements.

To remove a published model, follow these steps:

  1. Access the Model Information page. This page provides a comprehensive overview of all the trained ML models available.

  2. On the Model Information page, under the Custom tab, find the Logo Classification ML model that you want to remove. Use the Search field to filter and find specific records within the displayed grid (refer to Searching data).

  3. In the Action column, click the remove icon corresponding to the Logo Classification ML model. An Attention window opens.

  4. Click Yes to permanently remove the ML model from the list. You can view the removed model on the Model Metrics page.

Re-publishing a model is a crucial step in ensuring that the latest improvements or modifications are reflected in its performance.

Note:

Before initiating the re-publishing process, carefully review the existing details of the model and make any necessary updates to align with your desired changes.

To re-publish a model, follow these steps:

  1. On the Model Metrics page, identify the ML model that you want to re-publish. Models that are not currently published are displayed as UnPublished.

  2. Select the unpublished model in the panel that you want to re-publish. A Publish button becomes visible. Use the Search field to filter and find specific records within the displayed grid (refer to Searching data).

  3. Click Publish to re-publish the model. Once published, the status of the model changes from UnPublished to Published indicating that the re-publishing process was successful.

    You can view the published model on the Model Information page. Take a moment to review the details associated with the re-published model.

Self-training the model

The Initiate Self Learning feature of TruCap+ enables the user to self-train the application. The data for self-training is generated automatically from the processed documents. Once the user initiates self-learning, the data from all processed documents is added as a self-trained batch. The user can then annotate, train, and publish the self-trained batch.

To self-train a model, in the upper-right corner of the Upload Batch panel, click Initiate Self Learning.