Improving performance

While Nutils is not (yet) the fastest tool in its class, with some effort it is possible to achieve sufficient performance to allow simulations of over a million degrees of freedom. The matrix backend is the most important thing to get right, but there are a few other factors that are worth considering.

Enable parallel processing

On multi-core architectures, the most straightforward acceleration path available is to use parallel assembly, activated using the NUTILS_NPROCS environment variable. Both Linux and OS X both are supported. Unfortunately, the feature is currently disabled on Windows as it does not support the fork system call that is used by the current implementation.

On Windows, the easiest way to enjoy parallel speedup is to make use of the new Windows Subsystem for Linux (WSL2), which is complete Linux environment running on top of Windows. To install it simply select one of the many Linux distributions from the Windows store, such as Ubuntu 20.04 LTS or Debian GNU/Linux.

Disable threads

Many Numpy installations default to using the openBLAS library to provide its linear algebra routines, which supports multi-threading using the openMP parallelization standard. While this is useful in general, it is in fact detrimental in case Nutils is using parallel assembly, in which case the numerical operations are best performed sequentially. This can be achieved by setting the OMP_NUM_THREADS environment variable.

In Linux this can be done permanently by adding the following line to the shell's configuration file. In Linux this is typically ~/.bashrc:

export OMP_NUM_THREADS=1

The downside to this approach is that multithreading is disabled for all applications that use openBLAS, not just Nutils. Alternatively in Linux the setting can be specified one-off in the form of a prefix:

OMP_NUM_THREADS=1 NUTILS_NPROCS=8 python myscript.py

Consider a faster interpreter

The most commonly used Python interpreter is without doubt the CPython reference implementation, but it is not the only option. Before taking an application in production it may be worth testing if other implementations have useful performance benefits.

One interpreter of note is Pyston, which brings just-in-time compilation enhancements that in a typical application can yield a 20% speed improvement. After Pyston is installed, Nutils and dependencies can be installed as before simply replacing python by pyston3. As packages will be installed from source some development libraries may need to be installed, but what is missing can usually be inferred from the error messages.