Textman: Text Extraction with FastAPI and Pytesseract

Daniel Boadzie
5 min read | Dec 14, 2020

In Africa and some other parts of the world, many businesses still keep a large volume of their data in physical files that have not been digitized. This significantly hinders progress in these companies because valuable insights that could help them make better decisions are locked away in these files.

Just imagine what insights could be uncovered if this data were digitized and analyzed. Fortunately, that is what we will be working toward in this article. We will create a web app that takes in a picture containing text and extracts the text from the image using a cool library called Pytesseract, a Python wrapper around the Tesseract OCR engine.

Setup

Let’s get started by creating a virtual environment and installing the packages we will need for our app.


# create env
conda create --name textman python

# activate it
conda activate textman

NB: Windows users can refer to this article to install Tesseract: https://medium.com/quantrium-tech/installing-and-using-tesseract-4-on-windows-10-4f7930313f82

# on Ubuntu install tesseract first 
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev


# and then install pytesseract
pip install pytesseract
pip install "fastapi[all]"
pip install opencv-python
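
Before moving on, it can help to confirm that the Tesseract binary is actually reachable, since pytesseract only wraps the command-line program. The sketch below is a minimal check using only the standard library. The helper name find_tesseract is made up for this illustration, and it assumes the executable is called tesseract and sits on your PATH.

```python
import shutil
import subprocess


def find_tesseract(binary="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary)


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; install it before running the app")
else:
    # `tesseract --version` prints a version banner (on stdout or stderr,
    # depending on the release), so capture both streams.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    banner = (result.stdout or result.stderr).strip()
    print(banner.splitlines()[0] if banner else path)
```

If the check prints the "not found" message, revisit the installation step for your operating system before continuing.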

Now let’s create a directory called textman and add the following files and folders:

.
├── app.py
├── README.md
├── requirements.txt
└── templates
    └── index.html

Build the app

Let’s now add the following content to the app.py file:

from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates
import numpy as np
import cv2
import pytesseract


app = FastAPI()
templates = Jinja2Templates(directory="templates")


@app.get("/")
def home(request: Request):
    # Render the upload form
    return templates.TemplateResponse("index.html", {"request": request})


def read_img(img):
    # Run Tesseract OCR on the decoded image and return the recognized text
    text = pytesseract.image_to_string(img)
    return text


@app.post("/extract_text")
async def extract_text(request: Request):
    label = ""
    form = await request.form()
    # "upload_file" is the name of the file input in index.html
    contents = await form["upload_file"].read()
    if contents:
        # Wrap the raw bytes in a NumPy buffer that OpenCV can decode
        file_bytes = np.frombuffer(contents, dtype=np.uint8)
        frame = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)
        # imdecode returns None when the upload is not a valid image
        if frame is not None:
            label = read_img(frame)

    return templates.TemplateResponse("index.html", {"request": request, "label": label})

First, we imported our libraries, instantiated the app, and pointed FastAPI at our template directory so it can find index.html. We then created a home route that displays index.html. We also created a function, read_img(), that takes in an image and extracts the text from it in just one line of code. Cool, right? I know! Finally, we created a POST route that gets the image file from the form we will create in index.html, and then uses NumPy and OpenCV to convert the uploaded bytes into an image array that Pytesseract can read.

Our markup will now look like the following:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="preconnect" href="https://fonts.gstatic.com" />
<link
href="https://fonts.googleapis.com/css2?family=Montserrat:wght@300&display=swap"
rel="stylesheet"
/>
<link
href="https://unpkg.com/tailwindcss@^2/dist/tailwind.min.css"
rel="stylesheet"
/>
<title>Image To Text</title>
<style>
body {
content: "By boadzie Daniel";
margin: 0;
min-height: 100vh;
background-color: #ffffff;
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 80 80' width='80' height='80'%3E%3Cpath fill='%2335323a' fill-opacity='0.17' d='M14 16H9v-2h5V9.87a4 4 0 1 1 2 0V14h5v2h-5v15.95A10 10 0 0 0 23.66 27l-3.46-2 8.2-2.2-2.9 5a12 12 0 0 1-21 0l-2.89-5 8.2 2.2-3.47 2A10 10 0 0 0 14 31.95V16zm40 40h-5v-2h5v-4.13a4 4 0 1 1 2 0V54h5v2h-5v15.95A10 10 0 0 0 63.66 67l-3.47-2 8.2-2.2-2.88 5a12 12 0 0 1-21.02 0l-2.88-5 8.2 2.2-3.47 2A10 10 0 0 0 54 71.95V56zm-39 6a2 2 0 1 1 0-4 2 2 0 0 1 0 4zm40-40a2 2 0 1 1 0-4 2 2 0 0 1 0 4zM15 8a2 2 0 1 0 0-4 2 2 0 0 0 0 4zm40 40a2 2 0 1 0 0-4 2 2 0 0 0 0 4z'%3E%3C/path%3E%3C/svg%3E");
font-family: "Montserrat", sans-serif;
font-weight: 700;
text-align: center;
display: flex;
align-items: center;
justify-content: center;
}
</style>
</head>
<body>
<section class="container text-gray-500 mx-auto px-4 py-4 flex flex-col">
<div>
<h3 class="text-6xl font-bold">Textman</h3>
<p class="mt-2 italic text-2xl text-left lg:text-center font-semibold">
Upload a picture and have the text extracted from it.
</p>
</div>
<form method="post" action="/extract_text" enctype="multipart/form-data">
<div
class="flex w-full h-40 items-center justify-center bg-grey-lighter"
>
<label
class="w-64 flex flex-col items-center px-4 py-4 bg-green-600 text-white rounded-lg shadow-lg tracking-wide uppercase border border-blue cursor-pointer hover:bg-blue hover:text-white"
>
<svg
class="w-8 h-8"
fill="#fff"
xmlns="http://www.w3.org/2000/svg"
viewBox="0 0 20 20"
>
<path
d="M16.88 9.1A4 4 0 0 1 16 17H5a5 5
0 0 1-1-9.9V7a3 3 0 0 1 4.52-2.59A4.98 4.98 0 0 1 17
8c0 .38-.04.74-.12 1.1zM11 11h3l-4-4-4 4h3v3h2v-3z"
/>
</svg>
<span class="mt-2 text-base leading-normal">Select a file</span>
<input type="file" name="upload_file" class="hidden" />
</label>
</div>
<div>
<input
class="text-center w-30 text-white bg-blue-500 hover:bg-blue-400 border-0 py-1 px-2 focus:outline-none hover:bg-red-600 rounded text-lg"
type="submit"
value="Extract text"
/>
</div>
</form>
{% if label %}
<div
class="w-1/2 mr-auto ml-auto rounded-lg mt-4 bg-gray-600 text-white py-4 px-4"
>
<p class="text-lg">{{label}}</p>
</div>
{% endif %}
</section>
</body>
</html>

We are using Tailwind CSS, a cool utility-first CSS framework, and Hero Patterns for the cool background.

Our logic in the template is simple. We first check whether a label was passed to the template, and if so we display it using the Jinja2 template engine.
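
The conditional in the template can be tried in isolation. This is a minimal sketch that uses Jinja2 directly, with a one-line template string standing in for index.html. The sample text is invented for the example.

```python
from jinja2 import Template

# Stand-in for the {% if label %} block in index.html
snippet = Template("{% if label %}<p>{{ label }}</p>{% endif %}")

with_text = snippet.render(label="Hello from Tesseract")
without_text = snippet.render(label="")

print(repr(with_text))     # the paragraph is rendered
print(repr(without_text))  # empty string: nothing is shown
```

An empty or missing label is falsy in Jinja2, which is why the home route, which passes no label, shows only the upload form.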

The only thing left now is to run our app. To do so, add the following to app.py:

# app.py 
# import uvicorn at the top of app.py
import uvicorn

# then add the following to the bottom of app.py
if __name__ == "__main__":
    uvicorn.run("app:app", host="127.0.0.1", port=8000, reload=True)

Finally, run the app with:

python app.py

If all goes well, you should see the Textman upload page when you open http://127.0.0.1:8000 in your browser.

In just a few lines of code, we’ve created an awesome text extractor. How cool is that? The code for this app is available here.

Conclusion

Machine learning has numerous applications in industry. Text extraction is just one of them, and it can transform a business and simplify lives. I hope this article motivates you to try out these cool technologies and perhaps change a life. Thanks for reading!

Daniel Boadzie

Data Scientist | AI Engineer | Software Engineer | Trainer | Svelte Enthusiast. Find out more about me here: https://www.linkedin.com/in/boadzie/