
Django Background Tasks (async)

Async is a general programming concept that is complex and difficult to understand, and I definitely do not pretend to understand even a fraction of the science behind it. But I do know that in web applications there are long-running tasks that you want to hand off to a background process, so they don't block the page from loading while you wait for them to complete.

So this post documents my journey trying some different approaches that can be used to accomplish my first use case: sending an email from a contact form. You would think that this relatively trivial task would be easily accomplished by a batteries-included framework like Django, but unless I’m missing something glaringly obvious (quite possible) I could not for the life of me find any definitive documentation or tutorials on the best way to approach this.

In version 3.0 of Django, one of the headline features was the first implementation of asyncio support—a crucial first building block, but by itself not that useful. So for now it appears that to perform any background tasks, third-party extensions should be used.

The three main contenders in this space seem to be:

  • Celery: a well-maintained and popular Python package that can be used with Django without any Django-specific extensions.
  • Django-Q: a task queue, scheduler and worker application designed specifically for Django.
  • Django-RQ: a Django extension for the popular RQ (Redis Queue) Python library.

These three packages are also the three most popular packages in the Django Packages site in the Workers, Queues and Tasks category.

TL;DR I found Django-RQ to be my preference, and I was able to easily extend the simple contact form app to include a longer-running task that could report its progress from the queue. You can see the code behind this project here: django-django_rq-advanced-example.

Check out my code in the GitHub examples below and try out the packages yourself to see which one fits you best.

The aim of this exercise was to see which was the easiest to configure and use, and also which one felt the nicest to me—a subjective rating that’s difficult to put into words.

For all three packages, I created the same basic Django project with a single app that has a basic, unstyled contact form. Submissions from the contact form are saved to a database, and then an email is sent to the site owner.

All three packages rely on some form of outside broker to receive the tasks and manage the queues. Redis was the easiest choice for each of them, and for my testing I launched a basic Redis server in a Docker container with no additional security.

The installation of all three packages is similar: add the respective package name and Redis to your requirements.txt file and install with pip. Celery requires a bit more Django configuration because it's not a native Django application: you need to create a celery.py file in the project root with some application code, and then edit your project's __init__.py file to ensure this application gets loaded when the project starts. For Django-Q and Django-RQ you just add the respective package names to the INSTALLED_APPS section of your settings.py file. All three then need some further configuration in settings.py for application-specific options as well as the details of how to connect to Redis.
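As a rough illustration of what that configuration looks like, here is a minimal Django-RQ setup, assuming a Redis server running locally on the default port (the queue name and connection values are just the common defaults, not taken from my example repos):

```python
# settings.py — a minimal Django-RQ configuration sketch.
# Assumes Redis is reachable at localhost:6379 with no password.

INSTALLED_APPS = [
    # ... your other apps ...
    "django_rq",
]

RQ_QUEUES = {
    "default": {
        "HOST": "localhost",
        "PORT": 6379,
        "DB": 0,
    },
}
```

With this in place, `python manage.py rqworker default` starts a worker that consumes jobs from the default queue.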

Once installed and configured, you can then call your background tasks from your views where needed. Celery is a bit more structured and wants you to store these tasks in a tasks.py file in the application. I liked this approach, so I used it for the other two as well, although I don't think it's required; it just feels like good practice.

Both Celery and Django-RQ use decorators with the task so all your async imports and logic are within the tasks.py file. You just call the function name with .delay() from your view and it gets added to the queue. Django-Q was a bit different in that the function in the tasks.py file doesn’t even know that it’s a background task—it’s just a regular function with no importing from Django-Q in this file. Instead, the view file imports an async_task function which is used to put a specific task on the queue. I’m not sure which approach is better, or if I just read the docs wrong, but I preferred the approach that Celery and Django-RQ used.
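To show the contrast between the two styles, here is a sketch of how the same task might look under each approach. The function and field names here are my own placeholders, not code from my example repos:

```python
# tasks.py — the Django-RQ style: the decorator marks the function
# as a background job, and the view calls it with .delay().
import django_rq
from django.core.mail import send_mail

@django_rq.job
def send_contact_email(name, message):
    # Runs on the worker, not in the request/response cycle.
    send_mail(
        subject=f"Contact form message from {name}",
        message=message,
        from_email="noreply@example.com",      # placeholder address
        recipient_list=["owner@example.com"],  # placeholder address
    )

# In the view (Django-RQ / Celery style):
#   send_contact_email.delay(name, message)

# The Django-Q style instead keeps tasks.py free of any queue imports;
# the view enqueues the plain function by its dotted path:
#   from django_q.tasks import async_task
#   async_task("myapp.tasks.send_contact_email", name, message)
```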

For logging, Celery has its own logging functionality, which is imported and appears to work just fine. I'm guessing there is a lot more functionality buried deep in the code, but I didn't investigate too much. Django-Q and Django-RQ both just use standard Python logging, which I preferred for my use case.

Celery and Django-Q both have extensive documentation in a "Read the Docs" format. Django-RQ relies solely on a single GitHub README page which, despite being brief, was very easy to consume and allowed me to figure out how to get the basic example working quickly. Django-RQ is part of the larger RQ library, so for anything more advanced you need to refer to the RQ docs too.

After trying them all out, I decided that I preferred Django-RQ. It just seemed easier to get working, the configuration was easy to do, and you get access to some nice features fairly easily like the ability to have multiple queues ready and waiting. The use case for this is that you might have a queue for high-priority tasks, and another for slow, long-running tasks that are not time-critical, and then a default queue for everything else. This can help prevent queues from getting backed up. I’m sure the other packages had this feature too, but Django-RQ made it easy to find and use.

Moving on to something more advanced

Now that I had picked Django-RQ as my favourite, I wanted to try and do something a bit more advanced.

When I was learning Flask, I followed an excellent tutorial from Miguel Grinberg on how to implement dynamic progress bars using Celery and Flask, and had managed to get it working nicely on my banking application while processing large CSV imports. Using Django-RQ and Django, I was able to easily implement the same functionality.

This is achieved by saving additional meta information about the current job running on the queue, then using a Django view to read that information and return the status as a JSON response. A Django template with a sprinkling of JavaScript and jQuery calls the job_status view and updates the page with the progress and status information. In the example I posted to GitHub I'm just updating a div element with the information received, but a more practical use case would be to build a nice progress bar with something like Bootstrap and use the progress figure to change its width attribute. In my example I used a simple sleep timer to turn the contact form email task into a long-running task so you can watch the progress.
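The pattern can be sketched roughly as follows. The task, view and queue names here are illustrative placeholders, assuming the default queue from a standard Django-RQ setup:

```python
# Inside the running task, RQ exposes the current job so we can
# attach progress metadata and persist it back to Redis.
from rq import get_current_job

def long_running_task(total=10):
    job = get_current_job()
    for i in range(total):
        # ... do one unit of real work here ...
        job.meta["progress"] = int((i + 1) / total * 100)
        job.save_meta()  # write the updated meta back to Redis

# A job_status view can then fetch the job by id and return its
# state and progress as JSON for the template's JavaScript to poll.
import django_rq
from django.http import JsonResponse

def job_status(request, job_id):
    job = django_rq.get_queue("default").fetch_job(job_id)
    if job is None:
        return JsonResponse({"state": "unknown"}, status=404)
    return JsonResponse({
        "state": job.get_status(),
        "progress": job.meta.get("progress", 0),
    })
```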

One more thing…

All of the above may become irrelevant in the future when Django implements full view-based support for async code. So I will be looking forward to testing that out in due course. Also, I’m still wondering why none of my usual go-to Django sites have much information on this topic. I’ll keep digging to see if there are easier ways to achieve this.

Cover image: “queue” by per Corell
https://www.flickr.com/photos/62811941@N00/
https://creativecommons.org/licenses/by-sa/2.0/
