Tutorial #7: Explore IDEAL Group’s “Tesseract,” Online OCR Implementation

FIRST, SIGN UP:

To sign up for CRIS OCR, go to SIGN UP or LOGIN and click "I want to register". A "New User Registration" dialog box will appear (see Figure 1).

Enter an email address, name, and password, then click the Register button. Here are some credentials you can use to test the technology:

SECOND, DOWNLOAD DOCUMENTS FOR TESTS

Test documents to download, submit to the OCR engine, and otherwise experiment with:

DOCUMENTATION AND INSTRUCTIONS:

Figure 1. Signup Page View

After a successful sign-up, you are logged into the system automatically and see the user dashboard shown in Figure 2. The dashboard is described in Section 2 below.

Figure 2. Successful Sign Up

Using the Archives System

After you log in or register successfully in the CRIS Archives Application, a "Logout" button appears in the top right corner. Use it to log out of the application when you have completed your work.

On the top left, there are two buttons for adding files to the CRIS Archives Application:

  1. "Upload File": Uploads a PDF file into the archives application. The application then performs OCR on the uploaded PDF to extract its text.
  2. "Create File": Creates a new file. Click this button if you would like to start a fresh file rather than perform OCR on an existing one.
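
To make the "Upload File" step more concrete, here is a minimal sketch of how a PDF could be OCRed page by page with Tesseract. It is not the CRIS implementation, which is not described in this tutorial. The function names `load_pdf_pages`, `tesseract_recognize`, and `ocr_pages` and the file name `scan.pdf` are hypothetical, and the use of the third-party `pdf2image` and `pytesseract` packages (with Poppler and Tesseract installed locally) is an assumption.

```python
from typing import Any, Callable, Iterable, List


def load_pdf_pages(pdf_path: str, dpi: int = 300) -> List[Any]:
    """Render each PDF page to an image (assumes pdf2image and Poppler are installed)."""
    from pdf2image import convert_from_path  # lazy import: third-party dependency
    return convert_from_path(pdf_path, dpi=dpi)


def tesseract_recognize(page_image: Any) -> str:
    """Run Tesseract on one page image (assumes pytesseract and Tesseract are installed)."""
    import pytesseract  # lazy import: third-party dependency
    return pytesseract.image_to_string(page_image)


def ocr_pages(pages: Iterable[Any], recognize: Callable[[Any], str]) -> str:
    """Recognize every page in order and join the page texts with a blank line."""
    return "\n\n".join(recognize(page).strip() for page in pages)


# Hypothetical usage, requires the optional dependencies above:
# text = ocr_pages(load_pdf_pages("scan.pdf"), tesseract_recognize)
```

Passing the recognizer in as a function keeps the page loop independent of the OCR engine, so it can be tried out with a stand-in function before Tesseract is installed.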

Below the buttons are two tables, "Uploaded" and "Shared". Both are initially empty.

Uploaded:

Here you will see the PDF files that you have uploaded or created using the "Upload File" or "Create File" buttons.

By default, 100 records are shown, but you can change the number of records displayed.

You can also type in the Search box to find matching file names.
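
The behaviour of the table can be pictured with a short sketch. The function name `visible_files` is hypothetical, and case-insensitive substring matching is an assumption; the tutorial does not say how the Search box actually matches file names.

```python
from typing import List


def visible_files(file_names: List[str], query: str = "", page_size: int = 100) -> List[str]:
    """Return the file names that contain the query, capped at page_size rows."""
    needle = query.strip().lower()
    # An empty query matches every name, so the unfiltered list is shown.
    matches = [name for name in file_names if needle in name.lower()]
    return matches[:page_size]


names = ["Report.pdf", "notes.pdf", "annual-report-2019.pdf"]
print(visible_files(names, "report"))
```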

After you upload a file to the system, it appears as a single row with the following entries:

  • File Name
  • Tesseract (the OCR engine): This column has two buttons:
    • Edit Button: Edits the OCR output generated from the PDF file, or the text file you created.
    • Ebook Button: Downloads the ebook for the corresponding OCRed document.
  • Action: Actions that you can perform on each file:
    • Share: Invites another user to edit the OCR output. After clicking the button, enter the email address of the user with whom you would like to share the document. That user must already have an account in the system.
    • Delete: Deletes the entry for the file from the system.

Shared:

  • The system allows you to invite collaborators to edit a file in your account. This table lists the files that other users have uploaded or created and have invited you to edit.

Steps for Uploading a file for OCR

Click the "Upload File" button. You will be taken to a page where you can drag and drop the file you would like to upload, or click the upload area to select a file.

  • Once you select the file, wait for the file uploader to reach 100% and show the message "File uploaded successfully and queued for processing".
  • You can upload more files using the same process, or click "Check Files" to go back to the list of files.

In the list of files in the Uploaded section, you can search for the name of the file you just uploaded.

  • Click the "Edit" button. If processing of the file is not yet complete, you will see the message "The file submitted by you is still being processed."
  • If the OCR process has completed successfully, you will be taken to the editor, where the original uploaded file and the OCR output are shown next to each other.
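
The two outcomes of clicking "Edit" can be summarised in a small sketch. The state names in `OcrStatus` and the function `on_edit_clicked` are hypothetical; only the quoted message comes from the application, and failed OCR runs are not modelled because the tutorial does not describe them.

```python
from enum import Enum


class OcrStatus(Enum):
    """Assumed processing states for an uploaded file."""
    QUEUED = "queued"
    PROCESSING = "processing"
    DONE = "done"


STILL_PROCESSING = "The file submitted by you is still being processed."


def on_edit_clicked(status: OcrStatus) -> str:
    """Return what the user sees after clicking "Edit" for a file in the given state."""
    if status is OcrStatus.DONE:
        return "open editor"  # original file and OCR output shown side by side
    return STILL_PROCESSING


print(on_edit_clicked(OcrStatus.QUEUED))
```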

 

Steps for Creating an EPUB:

Once the OCR process has successfully completed, you are taken to the page where you can see the original file and the OCR output in an editor side by side.

The browser-based editor offers standard word-processing functions similar to those in MS Word. You can format the OCR output and correct it to match the original document.

Please make sure to mark the headings in the document as headings, because the EPUB generator uses them to create the table of contents.

Once you have finished formatting and correcting the OCR output, click the EPUB button in the editor to export the document in EPUB format.

The exported EPUB format is readable on any fully compatible EPUB reader.
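
To illustrate why marked headings matter for the table of contents, here is a minimal sketch that collects headings from HTML and turns them into an indented outline. It assumes the editor stores headings as HTML `<h1>` to `<h6>` elements, which this tutorial does not confirm. The names `HeadingCollector` and `build_toc` are hypothetical and are not part of the CRIS EPUB generator.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every <h1>..<h6> element in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[Tuple[int, str]] = []
        self._level = 0  # 0 means "not currently inside a heading"
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._parts).strip()))
            self._level = 0


def build_toc(html: str) -> List[str]:
    """Return one outline line per heading, indented two spaces per level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return ["  " * (level - 1) + text for level, text in collector.headings]


sample = "<h1>Intro</h1><p>Body text.</p><h2>Scope</h2><h1>Results</h1>"
print("\n".join(build_toc(sample)))
```

Text that is only made bold or enlarged, rather than marked as a heading, would be invisible to a collector like this one, which is why unmarked headings would be missing from a generated table of contents.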