With the release of Python 3.4, I updated my book, Introduction to Programming with Python. Some topics fall outside its scope, since it is aimed at beginners, so I'm starting a series of short posts about interesting subjects that I think are worth discussing and might even become the basis for a new book.

One of the new features in Python 3.4 is the asyncio module, which brings routines for asynchronous programming to the standard library. Asynchronous programming is a bit different from what we're used to writing in Python, but it's an excellent alternative to threads and a good choice for problems with a lot of input and output (I/O). Let's start with a simple example:

import asyncio

def print_and_repeat(loop):
    print('Hello World')
    loop.call_later(2, print_and_repeat, loop)

loop = asyncio.get_event_loop()
loop.call_soon(print_and_repeat, loop)
loop.run_forever()

The mechanism used in the example is quite simple. The loop variable holds the event loop, which we obtain with asyncio.get_event_loop(); this function returns the current event loop. On the next line we call call_soon to schedule a function call: it adds to the loop a call to print_and_repeat, defined earlier. The second parameter of call_soon is actually the parameter that will be passed to print_and_repeat. So loop.call_soon(print_and_repeat, loop) adds to the event loop a call to print_and_repeat with loop as its first argument. Confusing? Let's see what happens when we run it:

Z:\articles> c:\python34\python asyncio1.py
Hello World
Hello World
Hello World
Hello World

The program runs and stays on the loop.run_forever() line. run_forever() is what actually processes the scheduled events, so the program needs it to do anything at all. Try removing this line: nothing is printed and the program ends almost immediately.
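By the way, run_forever() is not the only way to drive the loop. If you want it to run for a while and then hand control back to the program, you can schedule a call to loop.stop(). Here is a minimal sketch of that idea (the say_hello callback and the 5-second delay are just illustrations):

import asyncio

def say_hello():
    print('Hello World')

loop = asyncio.get_event_loop()
loop.call_soon(say_hello)
# Schedule loop.stop() for 5 seconds from now; run_forever() then returns.
loop.call_later(5, loop.stop)
loop.run_forever()
loop.close()
print('The loop has stopped and the program continues here.')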

Let's go back to our example. Did you notice how "Hello World" was printed several times, as if it were inside a for or while loop? That happens in the last line of print_and_repeat, where we schedule another call to print_and_repeat using call_later. The first parameter of call_later is how long to wait before calling the function, the second is the function itself, followed by the arguments to pass to that function, just as we did with call_soon earlier.

The equivalent code without events would be something like:

import time

while True:
    print("Hello World")
    time.sleep(2)

If it's so simple, why complicate things? Because events make it easy to interleave the execution of different pieces of code, something that is hard to achieve without threads. Essentially, we hand the event loop a list of tasks and ask it to run each one as it becomes ready. The loop keeps updating that list and may execute the tasks in a different order than the one in which they were added.
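Here is a small sketch of that idea (the task names are made up): the callbacks are added in one order but executed in another, according to the delays we asked for. Running it prints C, then B, then A.

import asyncio

def tarefa(nome):
    print('Executando', nome)

loop = asyncio.get_event_loop()
loop.call_later(2, tarefa, 'A')   # added first, runs last (after 2 s)
loop.call_later(1, tarefa, 'B')   # added second, runs after 1 s
loop.call_soon(tarefa, 'C')       # runs as soon as the loop starts
loop.call_later(3, loop.stop)     # stop the loop so the program can end
loop.run_forever()
loop.close()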

Let's see another example with call_later. Besides print_and_repeat and a new function, faz_algo, this one also runs calcula_algo, a heavier function whose amount of work depends on a random limit:

import asyncio
import time
import random

def calcula_algo(loop, id):
    # CPU-bound work: the upper limit is drawn at random
    # (the exact bounds below are illustrative).
    limite = random.randint(10000, 50000)
    print("Calculando %d" % id)
    z = 1
    for x in range(1, limite):
        z *= x
    print("Fim do Cálculo %d" % id)
    loop.call_soon(calcula_algo, loop, id + 1)

def faz_algo(loop):
    espera = random.random()
    print("Fazendo algo... espera = %f" % espera)
    loop.call_later(espera, faz_algo, loop)

def print_and_repeat(loop):
    global último
    agora = time.time()
    print('Alô - Tempo decorrido: %f' % (agora - último))
    último = agora
    loop.call_later(2, print_and_repeat, loop)

último = time.time()
loop = asyncio.get_event_loop()
loop.call_soon(print_and_repeat, loop)
loop.call_soon(faz_algo, loop)
loop.call_soon(calcula_algo, loop, 1)
loop.run_forever()

When you run this program, you’ll see:

Z:\articles> c:\python34\python asyncio2.py
Alô - Tempo decorrido: 0.007002
Fazendo algo... espera = 0.761420
Calculando 1
Fim do Cálculo 1
Calculando 2
Fim do Cálculo 2
Calculando 3
Fim do Cálculo 3
Calculando 4
Fim do Cálculo 4
Fazendo algo... espera = 0.395006
Alô - Tempo decorrido: 6.790526
Calculando 5
Fim do Cálculo 5
Calculando 6
Fim do Cálculo 6
Calculando 7
Fim do Cálculo 7
Calculando 8
Fim do Cálculo 8
Fazendo algo... espera = 0.540093
Alô - Tempo decorrido: 5.859909
Calculando 9
Fim do Cálculo 9
Calculando 10
Fim do Cálculo 10
Calculando 11
Fim do Cálculo 11
Calculando 12
Fim do Cálculo 12
Fazendo algo... espera = 0.174978
Alô - Tempo decorrido: 6.830562
Calculando 13
Fim do Cálculo 13

Notice how the delays have become much more important: more than 6 seconds pass between calls to print_and_repeat, and faz_algo is delayed significantly as well. This behavior is expected, because calcula_algo is what we call CPU bound, that is, a function that demands the processor's attention, unlike an operation that, say, creates a file (I/O bound).

To use your computer well, you need to start separating your problems into CPU bound and I/O bound. For CPU-bound problems, threads or multiple processes offer the best performance, since we have several processors in the same computer (in Python, as we'll see, multiple processes work better). For I/O-bound problems, that is, those that access the disk or the network (or wait for keyboard input), asynchronous events are faster and easier to program.
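To give an idea of the I/O-bound case, here is a small sketch using asyncio coroutines in the Python 3.4 style (@asyncio.coroutine and yield from), a part of the module we'll explore in later posts. asyncio.sleep stands in for a real disk or network wait, and the function name and delays are just illustrations. The three waits happen at the same time, so the total time stays close to the longest one, not the sum of all three:

import asyncio
import time

@asyncio.coroutine
def acessa_io(id, espera):
    print('Iniciando %d (espera de %.1f s)' % (id, espera))
    # asyncio.sleep simulates waiting for the network or the disk.
    yield from asyncio.sleep(espera)
    print('Fim de %d' % id)

inicio = time.time()
loop = asyncio.get_event_loop()
tarefas = [acessa_io(1, 1.0), acessa_io(2, 2.0), acessa_io(3, 1.5)]
loop.run_until_complete(asyncio.wait(tarefas))
loop.close()
print('Tempo total: %.1f s' % (time.time() - inicio))  # about 2 s, not 4.5 s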

In mixed problems, where we have both CPU-bound and I/O-bound code to execute, a hybrid solution must be applied.
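One possible hybrid approach, sketched below with made-up names, is the loop's run_in_executor() method: it hands a CPU-bound function to a pool of separate processes (a concurrent.futures.ProcessPoolExecutor), while the event loop keeps running the quick, I/O-style callbacks. Again, this uses the Python 3.4 coroutine style:

import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def pesado(n):
    # CPU-bound work, in the spirit of calcula_algo.
    z = 1
    for x in range(1, n):
        z *= x
    return n

def tick(loop):
    # A light callback that keeps running while the heavy work happens elsewhere.
    print('tick %.1f' % time.time())
    loop.call_later(0.5, tick, loop)

@asyncio.coroutine
def principal(loop, executor):
    # The heavy calculation runs in another process; the loop is not blocked.
    n = yield from loop.run_in_executor(executor, pesado, 40000)
    print('Cálculo de %d terminado' % n)

if __name__ == '__main__':
    executor = ProcessPoolExecutor(2)
    loop = asyncio.get_event_loop()
    loop.call_soon(tick, loop)
    loop.run_until_complete(principal(loop, executor))
    loop.close()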

Threads, for example, are relatively expensive to create and difficult to control and program. We'll cover other details of the asyncio module in the next posts. Before moving on with asynchronous methods, though, let's compare the execution times of the async solution, multiple threads, and multiple processes. We'll evaluate versions of the calcula_algo function modified for each of these forms of parallelization. The benchmark will be the total execution time of 20 calls to calcula_algo.

Before starting, let's remove the random part of the function and turn the value of limite into a constant. This way the comparison is fairer and doesn't depend on the number returned by randint().

import asyncio
import time

def calcula_algo(loop, id):
    limite = 40000
    print("Calculando %d" % id)
    z = 1
    for x in range(1, limite):
        z *= x
    print("Fim do Cálculo %d" % id)
    if id < 20:
        # Schedule the next calculation.
        loop.call_soon(calcula_algo, loop, id + 1)
    else:
        # After the 20th call, stop the loop so run_forever() returns.
        loop.stop()

inicio = time.time()
loop = asyncio.get_event_loop()
loop.call_soon(calcula_algo, loop, 1)
loop.run_forever()
fim = time.time()
print("Tempo total: %f s" % (fim - inicio))

Run the program and see that the calculations are done sequentially, that is, one call after the other. On my computer the test ran in approximately 30.37 seconds. The execution time on your computer may vary, since it depends on your processor and on what your machine is doing during the tests. Now let's solve the same problem with multiple threads:

import threading
import time

def calcula_algo(id):
    limite = 40000
    print("Calculando %d" % id)
    z=1
    for x in range(1,limite):
        z*=x
    print("Fim do Cálculo %d" % id)

inicio = time.time()
ativos = []
for x in range(20):
    t = threading.Thread(target=calcula_algo, args=(x,))
    t.start()
    ativos.append(t)
for t in ativos:
    t.join()
fim = time.time()
print("Tempo total: %f s" % (fim-inicio))

Compare the output of the multi-threaded program with the output of the asynchronous one. Notice how the calls start almost simultaneously and finish in an unpredictable order. This lack of predictability is one of the reasons we avoid threads, especially in Python. Because of a characteristic of the Python interpreter, a global lock called the GIL (Global Interpreter Lock), multi-threaded Python programs are not efficient for this kind of work: only one thread runs Python code at a time, which brings us back to the performance of the asynchronous program, plus the cost of creating and switching between threads. In my tests this program performed slightly worse than the async solution, finishing in 30.67 seconds. If you watch CPU usage while it runs, you'll see it doesn't come close to 100%: on my Core i7, with 4 physical cores and hyperthreading (8 logical processors), usage didn't go above 20%.

Let’s see a more Pythonic solution using multiple processes and the excellent multiprocessing module.

import sys
import time
from multiprocessing import Pool

def calcula_algo(id):
    limite = 40000
    print("Calculando %d" % id)
    z=1
    for x in range(1,limite):
        z*=x
    print("Fim do Cálculo %d" % id)

if __name__ == '__main__':
    nproc = int(sys.argv[1])
    print("Executing with %d processes." % nproc)
    inicio = time.time()
    processos = Pool(nproc)
    processos.map(calcula_algo,list(range(20)))
    fim = time.time()
    print("Tempo total: %f s" % (fim-inicio))

Run the program several times, passing 1, 2, 4, 8, and 20 as the parameter on each run. The parameter indicates how many processes our pool will have. A pool is a set of processes initialized by the multiprocessing module; they remain available to our program and are real operating system processes, not mere threads. All communication between the processes is handled by the multiprocessing module, which is remarkable, since this kind of task is much harder to do in most other languages.

One of the advantages of multiprocessing is that each process runs its own Python interpreter, so they can run truly simultaneously, without the GIL problems we mentioned earlier. Look at your system's task manager during the run and you'll see several python processes (python.exe on Windows) running at the same time; CPU usage should now reach 100% for a while. The chart below summarizes the results of all the tests on my computer.

The multiprocessing solution improves as we add processes, but this improvement stabilizes around the number of processors in your machine and starts to degrade slightly after that.
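Since the sweet spot tracks the number of processors, a reasonable default (just a suggestion, not part of the benchmark above) is to size the pool with multiprocessing.cpu_count() instead of passing the number on the command line:

import time
from multiprocessing import Pool, cpu_count

def calcula_algo(id):
    limite = 40000
    print("Calculando %d" % id)
    z = 1
    for x in range(1, limite):
        z *= x
    print("Fim do Cálculo %d" % id)

if __name__ == '__main__':
    nproc = cpu_count()  # number of logical processors on this machine
    print("Executing with %d processes." % nproc)
    inicio = time.time()
    processos = Pool(nproc)
    processos.map(calcula_algo, list(range(20)))
    fim = time.time()
    print("Tempo total: %f s" % (fim - inicio))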

The poor performance of the asyncio module here is just an example of bad usage :-D. Asynchronous methods should be used with functions that don't block and finish quickly; using them with CPU-bound functions won't give good results. On the other hand, precisely because of the GIL, asynchronous methods can simplify the programming work while matching the performance of multiple threads: since the callbacks run one after the other, there is no need to synchronize access to shared data. Even with the GIL, a Python program that uses threads must be written as if its functions ran simultaneously, because the interpreter can jump from one thread to another in the middle of a function.
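To make that last point concrete, here is a small sketch (the shared counter is just an illustration): even with the GIL, the interpreter can switch threads in the middle of contador += 1, so without a lock updates can be lost. The asynchronous callbacks we wrote earlier never needed this kind of protection, because they run one at a time.

import threading

contador = 0
trava = threading.Lock()

def incrementa(n):
    global contador
    for _ in range(n):
        # Without the lock, the read-modify-write below could interleave
        # with another thread and lose updates, even with the GIL.
        with trava:
            contador += 1

threads = [threading.Thread(target=incrementa, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(contador)  # 400000 - guaranteed only because of the lock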

We’ll see another comparison between threads and asynchronous methods in another post using files.

I hope this post gave you a small sample of what we can do in Python; in the next posts I'll cover more practical examples. The important thing is to understand the difference between asynchronous execution, threads, and multiple processes. We'll also see how to use a thread pool in Python and how to combine the advantages of threads and asynchronous methods.

Thank you for reading!