Using Python’s Multiprocessing Library to Improve Algorithm Runtime

In a previous post, I wrote about a combinatorial object known as a Latin square. I also discussed some research I had previously done and some research directions for working with these objects. In particular, I mentioned the need for efficient algorithms to generate and process these objects. One objective I had for future work was to find a mapping that would take the values of an n x n Latin square, where n is the order of the Latin square, and map them to a number, ideally one that can be represented with relatively few bits.

Recently, I have been trying different formulas for determining such a number and have created a Python script to test them. For small orders of Latin squares, where there aren't that many, this worked just fine and, up to order 5, performed reasonably (finishing the process in ~8 minutes). However, there are many Latin squares of order 6 (over 1,000,000 reduced Latin squares, based on our definition), and checking them took ~6.5 hours. This led me to search for ways to speed up the algorithm, which I was able to do with Python's multiprocessing library.

In this post, I will introduce the multiprocessing library in Python with a very simple example that gives a general overview of its usage. I will also include the serial and multiprocessing versions of my Python script to show how this can easily be implemented in any application.

A Brief Look at the Multiprocessing Library

The Python 3 multiprocessing library is a package for spawning processes, with an interface similar to the threading library. For the sake of this post, only the Pool class will be used, since it is easy to use and makes basic parallelism simple to implement. As an example, imagine needing to square the first 10,000,000 integers starting at 0. Implemented serially, the Python code might look like this
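(A minimal sketch; the timing harness is an illustrative addition.)

```python
import time

def square(n):
    """Square a single number."""
    return n * n

def main():
    values = range(10_000_000)
    start = time.time()
    results = [square(v) for v in values]   # square each number in turn
    print(f"Squared {len(results):,} numbers in {time.time() - start:.2f} seconds")

if __name__ == '__main__':
    main()
```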

A serial script to square the first 10,000,000 numbers starting with 0.

Using multiprocessing.Pool it takes very little extra work to make this run in parallel.
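(Again a minimal sketch, with the same illustrative timing harness.)

```python
import time
from multiprocessing import Pool

def square(n):
    """Square a single number."""
    return n * n

def main():
    values = range(10_000_000)
    start = time.time()
    # Create a pool of 8 worker processes and map square() over the values
    with Pool(processes=8) as pool:
        results = pool.map(square, values)
    print(f"Squared {len(results):,} numbers in {time.time() - start:.2f} seconds")

if __name__ == '__main__':
    main()
```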

A parallel implementation of the squaring problem using 8 processes.

As seen here, by adding just a couple of lines of code we have successfully made this a multiprocessing application. The main difference is the modification of two lines: the creation of the pool and the call to its map method.

In the multiprocessing code, a multiprocessing.Pool object is created with a number passed in to represent the number of processes to spawn. Then, pool.map is called to map each value in the list of numbers (values) to the function being applied (square). Everything else is handled by the multiprocessing library and the operating system.

One thing to note is that the if __name__ == '__main__': guard is, more or less, required: without it, the script errors out due to how new processes are spawned. At least, it errored when I ran it on a Windows system.

Running the serial script for 10,000,000 numbers completed in approx. 1.69 seconds. The parallel script completed in approx. 1.98 seconds. This highlights an important aspect of parallelizing an application: not all applications should be run with multiple threads/processes. Although this simple example provides good insight into how quickly the multiprocessing library can be put to use, the overhead of spawning processes and passing data between them took more time than the parallelism saved.

A Real Application

As mentioned in the introduction, I recently used the multiprocessing library in a Python tool I had created and, unlike the example above, saw massive performance gains. Below is the serial code I was using
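(A condensed sketch of it; the file name, the line format, and the mapping formula inside get_value are illustrative placeholders, since the real mapping formula is the subject of the research itself.)

```python
def get_value(line):
    """Parse one line of the file (a flattened Latin square) and map
    it to a single number. The formula here is a placeholder."""
    entries = [int(c) for c in line.split()]
    value = 0
    for i, entry in enumerate(entries):
        value = value * 31 + entry * (i + 1)
    return value

def check_file(path):
    """Map every square in the file to a number and count duplicates."""
    with open(path) as f:
        lines = f.read().splitlines()
    seen = {}
    duplicates = 0
    for line in lines:
        value = get_value(line)
        if value in seen:
            duplicates += 1   # the mapping failed to be unique
        else:
            seen[value] = line
    return duplicates

if __name__ == '__main__':
    import os
    if os.path.exists('squares_order6.txt'):  # illustrative data file name
        print(check_file('squares_order6.txt'), 'duplicates found')
```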

A serial implementation of the algorithm I was using to check if my mapping was unique for each Latin square.

The nature of the code above is not really important. Since most readers won’t know (or care) what a Latin square is, I will try to focus on the implementation of multiprocessing and the discussion of the performance gains. For those who are interested, I have added some sparse comments throughout the code to help describe what is happening.

Two parts of the algorithm will be parallelized in the multiprocessing implementation: determining the value that represents each Latin square, and checking whether each value is unique; the latter is far more time-consuming.

To use the multiprocessing.Pool class, we need functions that the pool can map the values in the provided list onto. So, to begin, we pull the parts of the code we want to run with more than one process out into separate functions
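(Sketched here; the parsing and the mapping formula are illustrative placeholders, while the names get_value and build_dict are the ones used in the real script.)

```python
def get_value(line):
    """Parse one line of the file (a flattened Latin square) and map
    it to a single number. The formula here is a placeholder."""
    entries = [int(c) for c in line.split()]
    value = 0
    for i, entry in enumerate(entries):
        value = value * 31 + entry * (i + 1)
    return value

def build_dict(values):
    """Aggregate a chunk of mapped values into a dictionary, counting
    occurrences; a count above 1 means a duplicate was found."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return counts
```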

These two functions will be used by pool.map to spread the work across many processes.

The first function here determines the number that the Latin square is mapped to and the second aggregates the results into a dictionary, checking for duplicates along the way.

Now, with this functionality separated into functions, we can use the multiprocessing library to speed up the algorithm.
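(A condensed sketch of the full script; the file name, the chunk helper, and the placeholder mapping in get_value are illustrative.)

```python
from multiprocessing import Pool

def get_value(line):
    """Parse one line (a flattened Latin square) and map it to a number.
    The formula here is a placeholder."""
    entries = [int(c) for c in line.split()]
    value = 0
    for i, entry in enumerate(entries):
        value = value * 31 + entry * (i + 1)
    return value

def build_dict(values):
    """Aggregate a chunk of mapped values, counting occurrences.
    A count greater than 1 means the mapping was not unique."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return counts

def chunk(items, n):
    """Split items into chunks of roughly len(items) / n, one per process."""
    size = max(1, len(items) // n)
    return [items[i:i + size] for i in range(0, len(items), size)]

def main(path, processes=20):
    with open(path) as f:
        lines = f.read().splitlines()
    with Pool(processes) as pool:
        # Map each square (line) to its number in parallel
        values = pool.map(get_value, lines)
        # Build per-chunk dictionaries in parallel
        dicts = pool.map(build_dict, chunk(values, processes))
    # Merge the per-chunk dictionaries and count duplicate values
    totals = {}
    for d in dicts:
        for value, count in d.items():
            totals[value] = totals.get(value, 0) + count
    return sum(1 for c in totals.values() if c > 1)

if __name__ == '__main__':
    import os
    if os.path.exists('squares_order6.txt'):  # illustrative data file name
        print(main('squares_order6.txt'), 'duplicate values found')
```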

The Latin square number uniqueness checking algorithm with multiprocessing implemented.

That's quite a bit of code, so I will break out the most important part, i.e., the implementation of the multiprocessing

The part of the code above that implements the multiprocessing.

Once again, a multiprocessing.Pool object is created, this time with 20 processes available to the pool, since I am running this on my server. The comments are pretty good, but for thoroughness: the same process pool is used both to map the get_value function over each square in the file (lines, a list of strings representing the Latin squares) and to create a list of dictionaries via the build_dict function. This latter step accounted for most of the processing time in the serial script.

Performance Results

The results of the serial script are seen in the table below

Order   Serial Runtime
4       ~0.025 seconds
5       ~8 minutes
6       ~400 minutes

Results of the serial script, taking up to approx. 400 minutes (6+ hours) to complete.

The script takes an incredible amount of time! Not only that, but as the order increases (even slightly), the time to complete grows much, much longer. This is because the number of Latin squares grows combinatorially with the order of the Latin square; imagine having to run this for orders 8, 9, or 10! (Running the script for any of these orders would probably take long enough to render it impractical.) Let's take a look at the results with multiprocessing added

Spawning 20 processes made the runtime of this check script much more manageable. For order 6, which took over 6 hours in the serial run, the script finished in 2.14 minutes, a substantial improvement. The same can be said for order 5, which went from a runtime of approx. 8 minutes down to just 3.75 seconds. We can also see that, for order 4, the multiprocessing overhead was greater than the speedup from using many processes, as the runtime went from 0.025 seconds to 0.13 seconds. In either case, the runtime is fast enough that it doesn't make much of a difference.


Conclusion

In this post, I provided a brief look at Python's multiprocessing library, in particular the multiprocessing.Pool class. Pool allows any developer to quickly improve the runtime of slow(er) algorithms, often with the addition of only a few lines of code. I provided a small example (which didn't benefit from multiprocessing) as well as a real-world application that benefited greatly from only small tweaks to its implementation. I have also used the multiprocessing library in a similar way to process news articles for my stock research site, which performs sentiment analysis on thousands of news articles every hour. Applications for this easy-to-use library abound, and I hope this article has helped you implement multiprocessing or at least demonstrated how it might be used in the wild.
