Friday, August 7, 2015

GSoC: Vec, the little brother of NumPy array

Back from Bilbao, I have been busy testing and hunting bugs. Resolving all the issues took me quite some time, because before that I had not been running the full test suite regularly.

Since then I have been working on vectorizing "user code". The primary goal of this project is to speed up trace loops that iterate over NumPy arrays, but as I said in the last post it might (or might not) also make sense to optimize traces found in the user program.

Vec, the little brother of NumPy array

Using the following snippet one can build a vector class in Python and let the optimization speed up the computation.

import array
class Vec(object):
    # ...
    def __init__(self, content, type='d'):
        self.size = len(content)
        self.type = type
        self.array = array.array(type, content)

    def __add__(self, other):
        return self.add(other)

    def add(self, other, out=None):
        # ...
        # Ensure that other is the right type and size;
        # out must be allocated if it is None
        # ...
        i = 0
        # execute pypy with --jit vectorize_user=1 to
        # enable the optimization
        while i < self.size:
            out.array[i] = self.array[i] + other.array[i]
            i += 1
        return out

    # ...
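The checks elided by the comments in `add` could look something like the following. This is a hypothetical sketch (the post does not show the actual validation code), folded into a minimal, self-contained version of the class:

```python
import array

class Vec(object):
    def __init__(self, content, type='d'):
        self.size = len(content)
        self.type = type
        self.array = array.array(type, content)

    def add(self, other, out=None):
        # Hypothetical version of the elided checks: both operands
        # must agree in element type and size.
        if not isinstance(other, Vec) or other.type != self.type:
            raise TypeError("expected a Vec of the same element type")
        if other.size != self.size:
            raise ValueError("size mismatch: %d vs %d" % (self.size, other.size))
        # Allocate a zero-filled output vector if none was supplied.
        if out is None:
            out = Vec([0.0] * self.size, type=self.type)
        i = 0
        while i < self.size:
            out.array[i] = self.array[i] + other.array[i]
            i += 1
        return out
```

For example, `Vec([1.0, 2.0]).add(Vec([3.0, 4.0]))` yields a vector holding `[4.0, 6.0]`, while adding vectors of mismatched size raises a `ValueError`.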

After tracing the loop in the add function, a slightly better vector loop is generated. Let's run the following program:

# jit warmup
a, b, c = Vec(...), Vec(...), Vec(...)
# start time.time()
for i in range(500):
    c = a * b
    a = c - b
# stop time.time()
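The timing comments above can be turned into a small harness. This is a sketch of the methodology described in the post (5 runs, mean reported); the helper name `bench` and the repeat count parameter are my own:

```python
import time

def bench(f, repeat=5):
    # Run f() several times and return the mean wall-clock time,
    # mirroring the measurement in the post: warm up the JIT first,
    # then time with time.time() and average over 5 runs.
    times = []
    for _ in range(repeat):
        start = time.time()
        f()
        times.append(time.time() - start)
    return sum(times) / len(times)
```

One would pass the benchmark loop as a zero-argument callable, e.g. `bench(lambda: run_vector_loop())`.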

The following results can be obtained (after the JIT has warmed up):
PyPy (vecopt):    ~0.005
PyPy (no vecopt): ~0.008
CPython:          ~0.040

PyPy shows higher variance in execution time; the garbage collector might be the reason for that. The program was run 5 times and the mean value is shown above.

What about Python lists?

Honestly, I'm unsure whether there is a real benefit. Since PyPy stores lists of integers/floats that are fully homogeneous without the overhead of wrapping each element in a PyObject, SIMD operations could be applied to normal Python lists as well.
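As an illustration, a loop like the following over fully homogeneous float lists is the kind of candidate such an optimization could target. This is plain Python with no PyPy-specific API; the function name and inputs are my own example:

```python
def saxpy(a, xs, ys):
    # Element-wise a*x + y over two homogeneous float lists.
    # On PyPy such lists are stored unboxed, so the traced loop
    # operates on raw doubles -- a potential SIMD candidate.
    out = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        out[i] = a * xs[i] + ys[i]
        i += 1
    return out
```

For instance, `saxpy(2.0, [1.0, 2.0], [3.0, 4.0])` returns `[5.0, 8.0]`.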

The problem with this optimization is that the program must run for a very long time and spend a significant fraction of that time in the optimized trace loop. The short evaluation above shows that there might be potential. I will investigate further, because this is also a great way to find bugs in the implementation.

Traces emitted by user programs are much more complex than the ones in the NumPy library. While working on this over the last week I found many edge cases and was even reminded that I have left some TODOs in the source code.
