Summaryman: Text Summarization using Gensim and FastAPI

Daniel Boadzie
5 min read · Dec 22, 2020

summaryman

A few months ago, I wrote an article demonstrating text summarization using a wordcloud on Streamlit. In this article, I want to build the same app using FastAPI and Gensim. The code for this article can be found here.

Let’s dive in by creating our virtual environment. It is important to make a habit of creating one for each project so your package versions don’t conflict.

# create env
# this assumes you have anaconda installed
# including python gives the env its own pip
conda create --name summarizer python

# activate it
conda activate summarizer

Then let’s install our dependencies from the requirements.txt file, which can be found here, by running the following command in our env:

pip install -r requirements.txt

Once our packages are installed, we are ready to build our app. But first, we will create our folders and files in a directory of our choice. Our folder structure will look like the following:

.
├── app.py
├── README.md
├── requirements.txt
└── templates
    └── index.html

2 directories, 6 files

Let’s begin with app.py, where our magic will happen. This file will look like the following:

from io import BytesIO
import base64

from fastapi import FastAPI
from starlette.requests import Request
from fastapi.templating import Jinja2Templates
import nltk
from nltk.tokenize import sent_tokenize
# gensim.summarization was removed in Gensim 4.0, so this needs gensim<4.0
from gensim.summarization import summarize
from wordcloud import WordCloud, STOPWORDS

app = FastAPI()

templates = Jinja2Templates(directory="templates")

nltk.download('punkt')  # sentence tokenizer data used by sent_tokenize


@app.get("/")
def home(request: Request):
    # show the empty form
    return templates.TemplateResponse("index.html", {"request": request})


@app.post("/")
async def summarize_text(request: Request):
    sumary = ""
    word_cloud = ""  # stays empty if the form is incomplete
    form = await request.form()  # form parsing requires python-multipart
    if form.get("message") and form.get("word_count"):
        word_count = form["word_count"]
        text = form["message"]
        sumary = summarize(text, word_count=int(word_count))
        sentences = sent_tokenize(sumary)  # split the summary into sentences
        sents = dict.fromkeys(sentences)  # drop duplicates, keep sentence order
        sumary = ' '.join(sents)
        if sumary:  # WordCloud cannot render empty text
            word_cloud = wordcloud(sumary)
    return templates.TemplateResponse("index.html", {"request": request, "sumary": sumary, "wordcloud": word_cloud})


def wordcloud(text):
    """Render text as a word cloud and return it as a base64-encoded PNG."""
    stopwords = set(STOPWORDS)
    image = WordCloud(width=800, height=800,
                      background_color='white',
                      stopwords=stopwords,
                      min_font_size=10).generate(text).to_image()
    img = BytesIO()
    image.save(img, "PNG")
    img.seek(0)
    img_b64 = base64.b64encode(img.getvalue()).decode()
    return img_b64

In the code above, we first import our libraries. Then we create an instance of FastAPI in a variable called app, specify our templates folder, and create two routes: a GET route and a POST route. The GET route shows our form and the POST route processes our form data. The form accepts two fields: the text to summarize and a word count specifying the number of words expected in the summary. To avoid duplicate sentences in our summary, and hence improve it, we tokenize the summary (split it into sentences) with the nltk library and drop any repeats.
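The deduplication step can be looked at in isolation. The following is only a minimal sketch: `split_sentences` is a hypothetical, naive regex splitter standing in for nltk's `sent_tokenize` so the example has no dependencies, and `dedupe_sentences` is an illustrative name rather than part of the app.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption, not nltk): break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def dedupe_sentences(text):
    """Drop repeated sentences while keeping their first-seen order."""
    # dict.fromkeys preserves insertion order, so the summary still reads in sequence
    unique = dict.fromkeys(split_sentences(text))
    return " ".join(unique)


summary = "FastAPI is fast. Gensim summarizes text. FastAPI is fast."
print(dedupe_sentences(summary))
```

Keeping the first occurrence of each sentence means the cleaned summary follows the order of the original text, which a plain set would not guarantee.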

Finally, we create a function that takes text and returns a wordcloud image of it (in PNG format, encoded as a base64 string). In our case, we want a wordcloud of the summarized text, so we call the function in our POST route.

The template for our app will look like the following:

<!DOCTYPE html>
<html>
<head>
<!--Import Google Icon Font-->
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta http-equiv="X-UA-Compatible" content="ie=edge" />

<!-- Compiled and minified CSS -->
<link
href="https://unpkg.com/tailwindcss@^2/dist/tailwind.min.css"
rel="stylesheet"
/>
<link rel="preconnect" href="https://fonts.gstatic.com">
<link href="https://fonts.googleapis.com/css2?family=Rajdhani:wght@500&display=swap" rel="stylesheet">
<title>Summaryman</title>
<style>
body {
background-color: #e2e2e2;
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='28' height='49' viewBox='0 0 28 49'%3E%3Cg fill-rule='evenodd'%3E%3Cg id='hexagons' fill='%23cecbd3' fill-opacity='0.4' fill-rule='nonzero'%3E%3Cpath d='M13.99 9.25l13 7.5v15l-13 7.5L1 31.75v-15l12.99-7.5zM3 17.9v12.7l10.99 6.34 11-6.35V17.9l-11-6.34L3 17.9zM0 15l12.98-7.5V0h-2v6.35L0 12.69v2.3zm0 18.5L12.98 41v8h-2v-6.85L0 35.81v-2.3zM15 0v7.5L27.99 15H28v-2.31h-.01L17 6.35V0h-2zm0 49v-8l12.99-7.5H28v2.31h-.01L17 42.15V49h-2z'/%3E%3C/g%3E%3C/g%3E%3C/svg%3E");
display: flex;
/* height: 98vh; */
/* margin-top: 100px; */
justify-content: center;
align-items: center;
font-family: 'Rajdhani', sans-serif;
}
</style>
</head>

<body>
<section class="text-gray-700 body-font">
<div class="container px-5 py-10 mx-auto flex flex-col">
<div class="flex flex-col items-center text-center justify-center"
>
<h1 class="mb-4 font-bold title-font mt-2 text-gray-600 text-4xl">
Summaryman
</h1>
</div>
<div class="lg:w-3/6 mx-auto">
<div class="rounded-lg h-64 overflow-hidden">
<form action="" method="post">
<div class="relative mb-1">

<textarea
placeholder="Message"
id="message"
name="message"
class="w-full bg-white rounded border border-gray-300 focus:border-indigo-500 h-32 text-base outline-none text-gray-700 py-1 px-3 resize-none leading-6 transition-colors duration-200 ease-in-out"
required
></textarea>
<input placeholder="Word count" type="number" id="word_count" name="word_count" class="w-full bg-gray-100 rounded border border-gray-300 focus:border-indigo-500 text-base outline-none text-gray-700 py-1 px-3 leading-8 transition-colors duration-200 ease-in-out" required>
</div>
<input
class="mt-2 text-white bg-blue-500 border-0 py-1 px-4 focus:outline-none hover:bg-blue-400 rounded text-lg"
type="submit"
value="Summarize"
/>
</form>
</div>
</div>
<div class="flex flex-col sm:flex-row mt-0 lg:w-1/2 mr-auto ml-auto">
{% if sumary %}

<div class="h-full bg-gray-200 p-8 rounded">
<svg xmlns="http://www.w3.org/2000/svg" fill="currentColor" class="block w-5 h-5 text-gray-400 mb-4" viewBox="0 0 975.036 975.036">
<path d="M925.036 57.197h-304c-27.6 0-50 22.4-50 50v304c0 27.601 22.4 50 50 50h145.5c-1.9 79.601-20.4 143.3-55.4 191.2-27.6 37.8-69.399 69.1-125.3 93.8-25.7 11.3-36.8 41.7-24.8 67.101l36 76c11.6 24.399 40.3 35.1 65.1 24.399 66.2-28.6 122.101-64.8 167.7-108.8 55.601-53.7 93.7-114.3 114.3-181.9 20.601-67.6 30.9-159.8 30.9-276.8v-239c0-27.599-22.401-50-50-50zM106.036 913.497c65.4-28.5 121-64.699 166.9-108.6 56.1-53.7 94.4-114.1 115-181.2 20.6-67.1 30.899-159.6 30.899-277.5v-239c0-27.6-22.399-50-50-50h-304c-27.6 0-50 22.4-50 50v304c0 27.601 22.4 50 50 50h145.5c-1.9 79.601-20.4 143.3-55.4 191.2-27.6 37.8-69.4 69.1-125.3 93.8-25.7 11.3-36.8 41.7-24.8 67.101l35.9 75.8c11.601 24.399 40.501 35.2 65.301 24.399z"></path>
</svg>
<p class="leading-relaxed mb-6">{{sumary}}</p>
<img class="rounded-md w-full h-64 mr-auto mt-2" src="data:image/png;base64,{{wordcloud}}">
</div>
{% endif %}
</div>

</div>
</section>
</body>
</html>

We are using Tailwind CSS, a cool utility-first CSS library that makes building websites fast, and Hero Patterns for the cool background.

Our logic in the template is simple. We first check whether there is a summary and, if so, display it using the Jinja2 template engine. We also use a base64 data URI in the src attribute of the img tag to display our wordcloud.
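To make the data URI idea concrete, here is a small sketch of how raw PNG bytes become a string a browser can render inline. The helper name `png_to_data_uri` is hypothetical; in the app, the template adds the `data:image/png;base64,` prefix itself and only receives the encoded part.

```python
import base64


def png_to_data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URI usable in an <img> src attribute."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


# The 8-byte PNG file signature stands in for a real image here.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
print(png_to_data_uri(PNG_SIGNATURE))
```

Embedding the image this way avoids writing the wordcloud to disk or serving it from a separate route, at the cost of a larger HTML response.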

We can now run the app by adding the following to app.py:

# app.py 
# import uvicorn at the top of app.py
import uvicorn

# then add the following to the bottom of app.py
if __name__ == "__main__":
    uvicorn.run("app:app", host="127.0.0.1", port=8000, reload=True)

Then finally run the app with:

python app.py

If all goes well, you should see the following:

Summaryman
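Once the server is running, the form can also be exercised without a browser. The sketch below uses only the standard library and assumes the app is listening on the host and port shown earlier; `build_summarize_request` and `post_form` are hypothetical helper names, while the field names match the inputs in the template.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_summarize_request(text, word_count, url="http://127.0.0.1:8000/"):
    """Build a form-encoded POST carrying the same fields as the HTML form."""
    body = urlencode({"message": text, "word_count": word_count}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


def post_form(request):
    """Send the request and return the rendered HTML (needs a running server)."""
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `post_form(build_summarize_request(some_text, 50))` should return the page with the summary and the embedded wordcloud, provided the text is long enough for Gensim to summarize.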

Conclusion

I love working with FastAPI. It is fast and comes with a lot of cool features that make building Machine Learning apps a breeze. I hope you will fall in love with FastAPI too, especially when building ML apps.

Written by Daniel Boadzie

Data scientist | AI Engineer | Software Engineering | Trainer | Svelte Enthusiast. Find out more about me here: https://www.linkedin.com/in/boadzie/
