Background tasks for NLP

Those people dancing in the photo are similar to web applications and their backend servers. They need to keep time with each other. Otherwise, the interactions get out of step and, eventually, one has to stop and wait for the other. CPU-intensive NLP tasks in the middle of the dance would be heavy feet and slow, clunky movement. So let’s offload the heavy tasks to the background and let the dance flow.

This past week I have been pushing on with my mission to expose NLP routines to job hunters. Previously I configured an AWS server and deployed an initial skeleton version of a backend and frontend service. As I pushed on with adding features, I encountered bizarre CORS issues which appeared perplexing and wasted a lot of my build time. Let’s catch you up on how I created and solved my own CORS issues, changes I made to my strategy, and then move on with background tasks to do that heavy lifting keeping the dance rhythm flowing.

Earlier, I struggled with FastAPI and serving the frontend UI’s static content (index.html and the Vue.js chunks). Several chats and exchanges with "experts" suggested having Nginx serve the static content and then reverse proxy API calls to the FastAPI application. Initially, this seemed to work fine, but later the browsers (Safari, Chrome, Firefox) started reporting pre-flight CORS issues. I didn’t want any cross-origin calls at all, as resumes (CVs) contain personal or sensitive personal information that needs to be securely processed. It seemed that many bugs in the FastAPI app presented, client-side, as ‘refused to connect’ and other apparent CORS issues. In addition, Pydantic model validation errors in the request body seemed to return obscure network errors to the frontend.

In Nginx, I had configured two ‘sites-enabled’. One served up static content, and the other acted as the reverse proxy. Each site was effectively a separate origin, hence the CORS cross-talk within my own server. In hindsight, this was just a dirty shortcut. I cleaned that up and left a single clean ‘sites-enabled’ definition. All traffic now flows to the FastAPI app through the Nginx reverse proxy. Of course, I’d forgotten that HTTP traffic does not flow directly to the FastAPI application; instead, it flows through a UNIX socket to Gunicorn and its Uvicorn workers. Mixing up development servers with production deployment consumed a lot of time.

Back at the FastAPI code, I removed the CORS middleware, which made me much more comfortable. Configuring the StaticFiles() and Jinja2Templates() directory paths allowed me to serve both the static content (index.html) and the API calls from a single app instance. Over in the frontend code, checking that the payload structure matched the Pydantic model requirements in the backend allowed me to resolve all the ‘alleged CORS and network issues’. Testing with the Swagger UI showed the FastAPI app was correct, but bugs in the frontend code were reported as ‘CORS’, ‘network issue’, ‘connection refused’ and all sorts of weirdness.

I had underestimated the learning curve moving from Flask to FastAPI, and well, with my usual quick and dirty prototyping approach, I ended up with strange errors and no real clear root cause. Going back to clean and clear first principles allowed me to get back to a production server working correctly without any mistakes and, hence, continue building out the NLP services that are my main interest.

I liked that photo by David Pupaza, and I imagined a kindred spirit. On every project, I get to that stage when I say, ‘this is my last time’, but web applications are like jigsaws.

Recap

Allow me to provide a small recap, as I have just covered, in retrospect, some essential details that might also catch you out in your projects.

A Pydantic model

There is a very clear write-up about data models, inheritance from the base model and validation here.

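from typing import Optional

from pydantic import BaseModel

class Text(BaseModel):
    text: str  # the text submitted for keyword extraction
    keywords: Optional[dict] = None  # filled in by the backend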

My model, Text, has two attributes: text, a string, and keywords, a Python dictionary. The Vue.js application passes the desired text for keyword extraction, and then the text and keywords are returned as a JSON object to the client side.

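from fastapi.encoders import jsonable_encoder
from fastapi.responses import JSONResponse

@app.post("/api/keywords")
def keywords(text: Text):
    # Lang (defined elsewhere in the app) performs the keyword extraction
    text.keywords = Lang.keywords(text.text)
    json_compatible = jsonable_encoder(text)
    return JSONResponse(content=json_compatible)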

Traffic to /api/keywords, as shown in the code above, requires an instance of Text in the incoming request body. I struggled because I was sending a bare string on the Axios call. However, the Axios call needs to send a JSON object with at least the text attribute.
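To see why the payload shape matters, it can help to validate candidate payloads directly against the model, outside of FastAPI. The sketch below assumes Pydantic is installed and repeats the Text model from above; is_valid_payload and the sample payloads are hypothetical, made up purely for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Text(BaseModel):
    text: str
    keywords: Optional[dict] = None


def is_valid_payload(payload: dict) -> bool:
    """Return True if the payload can be parsed into a Text instance."""
    try:
        Text(**payload)
        return True
    except ValidationError:
        return False


# A JSON object with at least "text" matches the model.
good_payload = {"text": "Experienced Python developer"}
# An object without "text" does not; FastAPI would answer with a 422 error.
bad_payload = {"body": "Experienced Python developer"}

print(is_valid_payload(good_payload))
print(is_valid_payload(bad_payload))
```

A check like this separates genuine validation failures from the network or CORS noise that the browser may report for the same request.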

Serving static content with FastAPI

The FastAPI documentation is unambiguous on this and most other topics. The snag is that error messages can be somewhat cryptic when Gunicorn, Uvicorn, and Nginx get involved.

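from fastapi import FastAPI, Request
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
app = FastAPI(openapi_url=None)
templates = Jinja2Templates(directory="./dist")
app.mount("/static", StaticFiles(directory="./dist/static"), name="static")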

Serving static content turned out to be pretty straightforward, but I mixed up the directory configuration, resulting in perplexing behaviour.

@app.get("/")
def home(request: Request):
    return templates.TemplateResponse("index.html", context={"request": request})

When a request arrives for the index.html page and the Vue.js application, FastAPI serves that and the associated static content (images, JavaScript, CSS) from the directories you defined in the code. Those static files arrive as part of the deployment process, which you can see in the following image.

npm run build

(npm run serve is for development - run build creates the dist package)

The FastAPI instance has to have the correct path to link to the dist/static/ files.
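One way to catch a mixed-up directory configuration early is to resolve the paths explicitly and fail at startup if the build output is missing. This is only a sketch under the assumption that the Vue.js build lands in a dist folder with a static subfolder, as in the code above; resolve_dist_paths is a hypothetical helper, not part of FastAPI.

```python
from pathlib import Path
from typing import Tuple


def resolve_dist_paths(base_dir: Path) -> Tuple[Path, Path]:
    """Return (dist_dir, static_dir) under base_dir, or raise if either is missing."""
    dist_dir = base_dir / "dist"
    static_dir = dist_dir / "static"
    for path in (dist_dir, static_dir):
        if not path.is_dir():
            # Fail loudly at startup rather than serving confusing 404s later.
            raise FileNotFoundError(f"Expected build directory not found: {path}")
    return dist_dir, static_dir


# Possible usage inside the application module (assumed layout):
# dist_dir, static_dir = resolve_dist_paths(Path(__file__).resolve().parent)
# templates = Jinja2Templates(directory=str(dist_dir))
# app.mount("/static", StaticFiles(directory=str(static_dir)), name="static")
```

Anchoring the paths to the module's own location, rather than to "./dist", avoids surprises when Gunicorn starts the app from a different working directory.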

Nginx site

As I mentioned, I had to re-configure my Nginx server.

The Nginx server block for the www.justresumes.net service

Note that the proxy_pass points to a UNIX socket, but the value must still start with the http:// scheme, in the form http://unix:/path/to/socket (a placeholder path). Often we see articles that show FastAPI running on http://localhost:8000, but that is purely for development. In my case, I do get that mixed up.

FastAPI is fabulous, but there is a bit of a learning curve; it isn’t as simple as taking Flask routes and dropping them in!

Heavy lifting, those unwieldy CPU intensive steps

We want to avoid adding sticky, long-running tasks into our dance routine. Instead, we want the client sending an API request and the backend responding in rhythm; otherwise, the audience (the user) sees a slow, awkward interaction, and that makes for a bad user experience.

But what is a slow, unwieldy task anyway? I use a Python class I developed a few years ago called NLP. The class has several methods, which can be sticky or fast jobs when introduced into the dance. One way to understand the class behaviour is to measure the performance of the method calls on varying amounts of text data.

A small exercise by the author using %timeit in a Jupyter notebook. Image by the author.

Text2, referred to above, is a 6-minute article, whereas Text1 is the opening paragraph of the same piece. The times shown were measured on a Mac Mini M1 and will vary from server to server.
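Outside a notebook, a similar measurement can be sketched with the standard library's timeit module. The extract_keywords function below is a hypothetical stand-in, not a method of my NLP class, and the two sample texts are invented; the point is only the shape of the comparison between a short and a long input.

```python
import timeit
from collections import Counter


def extract_keywords(text: str, top_n: int = 5) -> dict:
    """Hypothetical stand-in for an NLP method: count the most frequent longer words."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 4)
    return dict(counts.most_common(top_n))


def time_call(text: str, repeats: int = 20) -> float:
    """Return the average seconds per call over `repeats` runs."""
    total = timeit.timeit(lambda: extract_keywords(text), number=repeats)
    return total / repeats


short_text = "Background tasks keep heavy NLP work away from the request cycle."
long_text = short_text * 200  # crude stand-in for a longer article

print(f"short: {time_call(short_text):.6f} s per call")
print(f"long:  {time_call(long_text):.6f} s per call")
```

The absolute numbers matter less than how they grow with the size of the text, since that growth decides whether a method stays fast on a full resume.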

Another way to understand the subject is to consider the expectations of the actual users. Jakob Nielsen of the Nielsen Norman Group (‘World Leaders in Research-Based User Experience’) sets out a succinct description of our human tolerance for delays in computer system interaction and their impact on our work.

According to Jakob, there are three primary limits when we consider these sticky tasks:

Reacting instantaneously: A 0.1-second response leaves the user feeling the system or UI is reacting instantaneously. Response times below 0.1 seconds are probably not required and would not be perceived by the user; they are simply expensive to run and maintain, at a price point that alienates the customer.
The flow of thought: A 1.0-second response is about the limit before the user’s train of thought is interrupted or other distractions creep in. The user notices a 1.0-second delay, and it no longer feels like working on the computer directly.
User attention span: A 10-second response time is the limit for retaining the user’s focus and attention. Response times beyond 10 seconds are likely to frustrate the user. The user bounces off!

Therefore, my perspective is that we have two scenarios:

≤ 1-second response for the total round trip: on average, this will be acceptable to the user.
> 1-second response: likely to be a sticky task that needs careful feedback messages in the user interface; these are the candidates for background tasks.

What contract do you have with your user? And really, how big are the NLP tasks in a person’s CV or resume? These are the questions to be asked before adding background tasks, as background processing introduces delays and overhead for everyone all the time.
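One way to put that question to a measured timing is a small helper that compares the round trip against the limits above. This is a minimal sketch using Nielsen's published limits of 0.1 and 1 second; classify_response and its labels are hypothetical names chosen for illustration.

```python
INSTANT_LIMIT = 0.1  # seconds: the response feels instantaneous
FLOW_LIMIT = 1.0     # seconds: the user's flow of thought stays uninterrupted


def classify_response(seconds: float) -> str:
    """Map a measured round-trip time onto the scenarios discussed above."""
    if seconds <= INSTANT_LIMIT:
        return "instant"
    if seconds <= FLOW_LIMIT:
        return "inline"  # acceptable within the normal request/response cycle
    return "background"  # sticky task: move it out of the request and add UI feedback


for measured in (0.045, 0.6, 4.0):
    print(measured, classify_response(measured))
```

Only timings that land in the last bucket justify the extra moving parts of a background queue.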

Now, of course, there are many factors governing the user’s perception of response time: their equipment, mobile signal coverage, point-to-point network connection speeds and, lastly, the size of the text the user wishes to process and the treatments performed on it. What about a quick test?

A test

After fixing all the bugs, let’s see how the current application is performing!

Having authenticated at www.justresumes.net, the author has pasted in a piece of text.

So when we press Submit, the browser will send the text to the FastAPI application, which will perform keyword extraction and send the text and keywords back to the browser. The browser will then paint the keywords in yellow on the screen for the user to see.

Having pressed Submit, the application shows the result on the screen. Image by the author from www.justresumes.net, owned by the author.

So what was the response timing? It felt instant to me!

A screenshot from Chrome Inspector, showing the timing of an API call at 45.00 ms.

It took 45 milliseconds, well under the 0.1-second threshold for an instant reaction.

So that presents a dilemma: I wanted to add Celery workers, RabbitMQ and a Redis backend, plus that feedback mechanism in the frontend, but it seems I might not need that level of complication for my use cases. Admittedly, my NLP techniques are pretty elementary so far, so we will have to focus more on the NLP part and check the timings. If we get sticky tasks, then adding background services will become necessary.
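If sticky tasks do appear, the general pattern is to accept the job, hand back an identifier immediately and let the frontend poll for the result. The following is only a minimal in-process sketch of that pattern using Python's standard library, not the Celery, RabbitMQ and Redis stack mentioned above; slow_keywords, submit_job and job_status are hypothetical names, and the 0.2-second sleep stands in for real NLP work.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict

executor = ThreadPoolExecutor(max_workers=2)
jobs: Dict[str, Future] = {}  # job id -> pending or finished work


def slow_keywords(text: str) -> dict:
    """Hypothetical sticky task: pretend the extraction takes a while."""
    time.sleep(0.2)
    return {"word_count": len(text.split())}


def submit_job(text: str) -> str:
    """Queue the task and return a job id straight away."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(slow_keywords, text)
    return job_id


def job_status(job_id: str) -> dict:
    """Report progress so the frontend can poll and show feedback."""
    future = jobs[job_id]
    if not future.done():
        return {"status": "pending"}
    return {"status": "done", "result": future.result()}


job_id = submit_job("a resume with quite a few words in it")
print(job_status(job_id))  # polled immediately after submitting
time.sleep(0.5)
print(job_status(job_id))  # polled again after the task has had time to finish
```

Threads keep the sketch short, but CPU-bound NLP work would normally be handed to separate worker processes, which is what Celery workers provide.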

Thanks for reading this article, which is part of my journey to expose NLP routines to job hunters. It is a crazy ride through data privacy, cybersecurity and providing a performant service to users who have a conditioned attention span. We cannot simply pass the CV (resume) details to a language API without understanding who owns that API, what safeguards are in place, and even where the processing will take place and on what basis!