Basic OCR with Tesseract and OpenCV

Syndication Cloud

Originally Posted On: https://medium.com/building-a-simple-text-correction-tool/basic-ocr-with-tesseract-and-opencv-34fae6ab3400

You have probably been in a situation where you had a picture containing some text you needed and were too lazy to type it all out. I have too, which is why I decided to write an application to extract text from images. To do this, I used the Tesseract OCR library and OpenCV, and wrote the application in C++.

Getting the dependencies

Before we begin, we need to install the following. I am using Ubuntu 16.04, so the instructions below are specific to that release.

  • Tesseract 4.0, using the commands below. When you are done, you will have a command-line tool called tesseract and an API we can call from C++.
sudo add-apt-repository ppa:alex-p/tesseract-ocr
sudo apt-get update
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

If you are working in the .NET ecosystem and want to skip the complex setup process, IronOCR offers a streamlined alternative. It comes with Tesseract 5 built-in, so there is no need for separate installations or path configurations.

using IronOcr;
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("image.png");
var result = ocr.Read(input);
Console.WriteLine(result.Text);

IronOCR handles image preprocessing automatically, which means you can get accurate results without manually tuning OpenCV parameters. It supports over 125 languages out of the box and works across Windows, Linux, and macOS.

For developers who prefer C# or need enterprise-grade OCR with minimal configuration, it is worth checking out: https://ironsoftware.com/csharp/ocr/

The Code

We start by adding the headers we need:

#include <string>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

Next, we create two string variables, one for the image path and the other for the output text, then we use OpenCV to read the image:

string outText, imPath = argv[1];
Mat im = cv::imread(imPath, IMREAD_COLOR);
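One thing worth knowing here: cv::imread returns an empty Mat when the file can't be opened, so a small guard avoids feeding Tesseract a bad buffer later. A minimal defensive sketch (the argc check is my addition, assuming the path arrives as the first command-line argument, as it does in this program):

```cpp
// Defensive sketch: bail out early if no path was given
// or the image failed to load.
if (argc < 2) {
    cerr << "Usage: basic_ocr <image-path>" << endl;
    return 1;
}
string outText, imPath = argv[1];
Mat im = cv::imread(imPath, IMREAD_COLOR);
if (im.empty()) {
    cerr << "Could not read image: " << imPath << endl;
    return 1;
}
```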

The next step is to create a Tesseract object that we’ll use to access the API:

tesseract::TessBaseAPI *ocr = new tesseract::TessBaseAPI();

Then we will initialize Tesseract to use English as the language and the LSTM OCR engine (which uses deep learning, rather than the legacy Tesseract engine that uses traditional machine learning):

ocr->Init(NULL, "eng", tesseract::OEM_LSTM_ONLY);

Next, we will set the page segmentation mode. According to the Tesseract Wiki,

By default Tesseract expects a page of text when it segments an image. If you’re just seeking to OCR a small region try a different segmentation mode, using the --psm argument.

You can check this link for a list of all the modes available. We will be using the default mode (PSM_AUTO below):

ocr->SetPageSegMode(tesseract::PSM_AUTO);

Now we can set the image:

ocr->SetImage(im.data, im.cols, im.rows, 3, im.step);
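A note on the arguments: 3 is the number of bytes per pixel and im.step is the number of bytes per row of the Mat. Also, OpenCV loads images in BGR channel order while Tesseract nominally expects RGB; recognition is usually fine either way, but if you see odd results you can convert first. This is an optional tweak on my part, not part of the original code:

```cpp
// Optional: convert OpenCV's BGR layout to the RGB order
// Tesseract expects, then hand over the raw pixel buffer.
cv::cvtColor(im, im, cv::COLOR_BGR2RGB);
ocr->SetImage(im.data, im.cols, im.rows, 3, im.step);
```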

and get the output text using the Tesseract API method GetUTF8Text():

outText = string(ocr->GetUTF8Text());

We can now output the text to the console:

cout << outText;

Finally, we’ll destroy the Tesseract object to free up memory:

ocr->End();
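For reference, here is the walkthrough assembled into a single file. This is just the snippets above stitched together, with minimal error handling added, the headers restored (exact names may vary slightly with your Tesseract/OpenCV versions), and a delete added since the API object was allocated with new (End() frees Tesseract's internal data but not the object itself):

```cpp
// basic_ocr.cpp -- the full program assembled from the snippets above.
#include <string>
#include <iostream>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int main(int argc, char* argv[])
{
    if (argc < 2) {
        cerr << "Usage: basic_ocr <image-path>" << endl;
        return 1;
    }

    // Read the input image.
    string outText, imPath = argv[1];
    Mat im = cv::imread(imPath, IMREAD_COLOR);
    if (im.empty()) {
        cerr << "Could not read image: " << imPath << endl;
        return 1;
    }

    // Initialize Tesseract: English, LSTM engine, automatic page segmentation.
    tesseract::TessBaseAPI *ocr = new tesseract::TessBaseAPI();
    ocr->Init(NULL, "eng", tesseract::OEM_LSTM_ONLY);
    ocr->SetPageSegMode(tesseract::PSM_AUTO);

    // Hand the pixel buffer to Tesseract and run recognition.
    ocr->SetImage(im.data, im.cols, im.rows, 3, im.step);
    outText = string(ocr->GetUTF8Text());
    cout << outText;

    // Free Tesseract's internal data, then the object itself.
    ocr->End();
    delete ocr;
    return 0;
}
```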

Testing the Application

First, let’s build the application:

g++ -O3 -std=c++11 basic_ocr.cpp `pkg-config --cflags --libs tesseract opencv` -o basic_ocr

The above command assumes that you named the C++ file basic_ocr.cpp.

We can test the application by running:

./basic_ocr test_image.jpg

where test_image.jpg is the image to be read by the application:


The output from the application is:

1.1 What is computer vision? As humans, we perceive the three-dimensional structure of the world around us with apparent
 ease. Think of how vivid the three-dimensional percept is when you look at a vase of flowers
 sitting on the table next to you. You can tell the shape and translucency of each petal through
 the subtle patterns of light and Shading that play across its surface and effortlessly segment
 each flower from the background of the scene (Figure 1.1). Looking at a framed group por-
 trait, you can easily count (and name) all of the people in the picture and even guess at their
 emotions from their facial appearance. Perceptual psychologists have spent decades trying to
 understand how the visual system works and, even though they can devise optical illusions!
 to tease apart some of its principles (Figure 1.3), a complete solution to this puzzle remains
 elusive (Marr 1982; Palmer 1999; Livingstone 2008).

We can see that the application performs well, with only a few errors (such as the lost paragraph breaks and the stray exclamation mark).

Conclusion

We have successfully built an application with C++ to extract text from images. Now we never need to type anything again :).

Thanks for following this far. The code for this project can be found here. If you have any comments or suggestions, please drop them below.