I've got a piece of code that takes an input and checks whether it meets some requirements. The input is composed of a list of objects of a class called S.
class S:
    def __init__(self, f, t, tf, timeline):
        self.f = f
        self.t = t
        self.tf = tf
        self.timeline = timeline
To know whether a combination of objects meets the requirements, I have functions that take a list of N objects and return True or False.
input1 = [S_1, ..., S_N]
def c1(input1):
    if condition_c1_valid:
        return True
    else:
        return False
Now let's consider this example:
import itertools

possible_objects = [S(f, t, tf, timeline) for f in [...] for t in [...] ...]
inputs_to_check = list(itertools.combinations_with_replacement(possible_objects, 5))

results = list()
for inp in inputs_to_check:
    if c1(inp):
        results.append(inp)
Right now, my solution uses a for loop over the inputs and checks the N conditions every time, keeping the inputs that meet all the conditions.
Could this be computed at once in a matrix fashion? (Vectorized)
I was thinking of something like this: (pseudo code)
Data[input, c1, ..., cN]
return where(all(c1, ..., cN) is True)
Can anyone tell me if it is achievable, and point me towards examples? In the end, my list of inputs to check is very large, so it would be interesting to send the computation to the GPU. I thought that maybe this could be achieved through TensorFlow...
Thanks for the tips :)
EDIT: The example above is far from the reality. I'm using nested for loops on a large set, with roughly 6th- or 7th-degree complexity. The current solution is optimized with generators, but I would like to push this further.
In the most general sense, you won't be able to vectorize this. CPython is notoriously bad at parallel processing due to the GIL, and its primary vectorization library (numpy) deals with primitive types (integers, floats, etc.), not Python objects such as S.
There are a few things that could help:
If f, t, tf, timeline are numbers (which they look like they may be), then you could form four numpy arrays of these values and pass those through a vectorized version of c1 which returns a boolean array. You could then do np.asarray(input1)[c1_vec(f_vec, t_vec, tf_vec, timeline_vec)]. A rough sketch appears at the end of this answer.
You said you've used generators instead of lists, but just to be sure, your example should read as:
possible_objects = (S(f, t, tf, timeline) for f in (...) for t in (...) ...)
inputs_to_check = itertools.combinations_with_replacement(possible_objects, 5)
results = [inp for inp in inputs_to_check if c1(inp)]
This avoids writing a lot of objects to memory that you never need, which saves a lot of time.
Use PyPy. It uses a JIT compiler to massively speed up python for loops. For very large loops this will get up to near C speed.
You mention using a GPU. CPython doesn't even run on more than one CPU core, so running this on a GPU would be pointless unless you use another implementation.
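Here is a minimal sketch of the first suggestion above, assuming f, t and tf are plain numbers and using the list-based inputs_to_check from the question; the condition inside c1_vec is invented purely for illustration, since I don't know your real requirements:

import numpy as np

# Stack the numeric fields of every candidate input (a tuple of 5 S objects)
# into 2-D arrays, one row per candidate input.
f_arr = np.array([[s.f for s in inp] for inp in inputs_to_check], dtype=float)
t_arr = np.array([[s.t for s in inp] for inp in inputs_to_check], dtype=float)
tf_arr = np.array([[s.tf for s in inp] for inp in inputs_to_check], dtype=float)

def c1_vec(f_arr, t_arr, tf_arr):
    # Made-up condition, only to show the shape of a vectorized check:
    # every object must satisfy tf > t, and the f values must sum below 100.
    return np.all(tf_arr > t_arr, axis=1) & (f_arr.sum(axis=1) < 100)

mask = c1_vec(f_arr, t_arr, tf_arr)
results = [inp for inp, keep in zip(inputs_to_check, mask) if keep]

This only pays off if the real conditions can be expressed as array operations like the above; the GPU question then reduces to whether the same array expressions can be evaluated by a GPU array library such as TensorFlow.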
Related
(I am quite a newbie in Python, so lots of things puzzle me even after reading the tutorial...)
Initially, I had the code like the following:
strings = ['>abc', 'qwertyu', '>def', 'zxcvbnm']
matrix = zip(*strings)
for member in matrix:
    print("".join(member))  # characters are printed as expected
-- which did what I expected. But then for some reason I wanted to determine the number of members in matrix; as len(matrix) gave an error, I decided to copy it by converting it to a list: mtxlist = list(matrix). Surprisingly, after this line the content of matrix seems to have changed, or at least I cannot use it the same way as above:
strings = ['>abc', 'qwertyu', '>def', 'zxcvbnm']
matrix = zip(*strings)
mtxlist = list(matrix)  # this assignment empties (?) the matrix
for member in matrix:
    print("".join(member))  # nothing printed
Can anybody explain what is going on there?
You're using Python 3, correct?
zip returns an iterator that can only be consumed once. If you want to use it more than once, then your options are:
Write zip(*strings) each time you need it.
matrix = tuple(zip(*strings))
(iterate matrix as many times as you like. This is the easy option. The downside is that if zip(*strings) is big then it uses a lot of memory that the iterator doesn't.)
matrix1, matrix2 = itertools.tee(zip(*strings))
(iterate each of matrix1 and matrix2 once. This is worse than the tuple in your usage, but it's useful if you want to partially consume matrix1, then use some of matrix2, more of matrix1, etc)
def matrix():
return zip(*strings)
# or
matrix = lambda: zip(*strings)
(iterate but using matrix(), not matrix, as many times as you like. Doesn't use extra memory for a copy of the result like the tuple solution, but the syntax for using it is a bit annoying)
class ReusableIterable:
    def __init__(self, func):
        self.func = func
    def __iter__(self):
        return iter(self.func())

matrix = ReusableIterable(lambda: zip(*strings))
(iterate using matrix as many times as you like. Deals with the syntax annoyance, although you still have to beware that if you modify strings between iterations over matrix then you'll get different results.)
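For example, with the strings from the question, the ReusableIterable version can be iterated any number of times:

strings = ['>abc', 'qwertyu', '>def', 'zxcvbnm']
matrix = ReusableIterable(lambda: zip(*strings))
for member in matrix:
    print("".join(member))  # columns printed
for member in matrix:
    print("".join(member))  # printed again; __iter__ rebuilds the zip each time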
I have already written the following piece of code, which does exactly what I want, but it is way too slow. I am certain that there is a way to make it faster, but I can't seem to find how it should be done. The first part of the code is just to show what is of which shape.
two images of measurements (VV1 and HH1)
precomputed values, VV simulated and HH simulated, which both depend on 3 parameters (precomputed for (101, 31, 11) values)
the index 2 is just to put the VV and HH images in the same ndarray, instead of making two 3D arrays
VV1 = numpy.ndarray((54, 43)).flatten()
HH1 = numpy.ndarray((54, 43)).flatten()
precomp = numpy.ndarray((101, 31, 11, 2))
two of the three parameters we let vary
comp = numpy.zeros((len(parameter1), len(parameter2)))
for i, (vv, hh) in enumerate(zip(VV1, HH1)):
    comp0 = numpy.zeros((len(parameter1), len(parameter2)))
    for j in range(len(parameter1)):
        for jj in range(len(parameter2)):
            comp0[j, jj] = numpy.min((vv - precomp[j, jj, :, 0])**2 + (hh - precomp[j, jj, :, 1])**2)
    comp += comp0
The obvious thing I know I should do is get rid of as many for-loops as I can, but I don't know how to make numpy.min behave properly when working with more dimensions.
A second thing (less important if it can get vectorized, but still interesting) I noticed is that it takes mostly CPU time, and not RAM. I have searched a long time already, but I can't find a way to write something like MATLAB's "parfor" instead of "for". (Is it possible to make an @parallel decorator, if I just put the for-loop in a separate method?)
edit: in reply to Janne Karila: yeah, that definitely improves it a lot,
for (vv, hh) in zip(VV1, HH1):
    comp += numpy.min((vv - precomp[..., 0])**2 + (hh - precomp[..., 1])**2, axis=2)
This is definitely a lot faster, but is there any possibility to remove the outer for-loop too? And is there a way to make a for-loop parallel, with an @parallel decorator or something?
This can replace the inner loops over j and jj:
comp0 = numpy.min((vv-precomp[...,0])**2+(hh-precomp[...,1])**2, axis=2)
This may be a replacement for the whole loop, though all this indexing is stretching my mind a bit (it creates a large intermediate array, though):
comp = numpy.sum(
    numpy.min((VV1.reshape(-1, 1, 1, 1) - precomp[numpy.newaxis, ..., 0])**2
              + (HH1.reshape(-1, 1, 1, 1) - precomp[numpy.newaxis, ..., 1])**2,
              axis=3),
    axis=0)
One way to parallelize the loop is to construct it in such a way as to use map. In that case, you can then use multiprocessing.Pool to use a parallel map.
I would change this:
for (vv, hh) in zip(VV1, HH1):
    comp += numpy.min((vv - precomp[..., 0])**2 + (hh - precomp[..., 1])**2, axis=2)
To something like this:
def buildcomp(vvhh):
    vv, hh = vvhh
    return numpy.min((vv - precomp[..., 0])**2 + (hh - precomp[..., 1])**2, axis=2)

if __name__ == '__main__':
    from multiprocessing import Pool
    nthreads = 2
    p = Pool(nthreads)
    complist = p.map(buildcomp, np.column_stack((VV1, HH1)))
    comp = np.dstack(complist).sum(-1)
Note that the dstack assumes that each array in complist is 2-D, because it will add a third axis and sum along it. This will slow it down a bit because you have to build the list, stack it, then sum it, but these are all either parallel or numpy operations.
I also changed the zip to a numpy operation np.column_stack, since zip is much slower for long arrays, assuming they're already 1d arrays (which they are in your example).
I can't easily test this so if there's a problem, feel free to let me know.
In computer science, there is the concept of Big O notation, used for getting an approximation of how much work is required to do something. To make a program fast, do as little as possible.
This is why Janne's answer is so much faster: you do fewer calculations. Taking this principle further, we can apply the concept of memoization, because you are CPU bound rather than RAM bound. You can use a memoization library if it needs to be more complex than the following example.
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

memo = AutoVivification()

def memoize(n, arr, end):
    # numpy arrays are not hashable, so key only on the scalar n and the
    # slice index end; arr is always the same precomputed array here
    if end not in memo[n]:
        memo[n][end] = (n - arr[..., end])**2
    return memo[n][end]
for (vv, hh) in zip(VV1, HH1):
    first = memoize(vv, precomp, 0)
    second = memoize(hh, precomp, 1)
    comp += numpy.min(first + second, axis=2)
Anything that has already been computed gets saved to memory in the dictionary, and we can look it up later instead of recomputing it. You can even break down the math being done into smaller steps that are each memoized if necessary.
The AutoVivification dictionary is just to make it easier to save the results inside of memoize, because I'm lazy. Again, you can memoize any of the math you do, so if numpy.min is slow, memoize it too.
I had a pretty compact way of computing the partition function of an Ising-like model using itertools, lambda functions, and large NumPy arrays. Given a network consisting of N nodes and Q "states"/node, I have two arrays, h-fields and J-couplings, of sizes (N,Q) and (N,N,Q,Q) respectively. J is upper-triangular, however. Using these arrays, I have been computing the partition function Z using the following method:
# Set up lambda functions and iteration tuples of the form (A_1, A_2, ..., A_n)
iters = itertools.product(range(Q), repeat=N)
hf = lambda s: h[range(N), s]
jf = lambda s: np.array([J[fi, fj, s[fi], s[fj]]
                         for fi, fj in itertools.combinations(range(N), 2)]).flatten()

# Initialize and populate partition function array
pf = np.zeros(tuple([Q for i in range(N)]))
for it in iters:
    hterms = np.exp(hf(it)).prod()
    jterms = np.exp(-jf(it)).prod()
    pf[it] = jterms * hterms

# Calculates partition function
Z = pf.sum()
This method works quickly for small N and Q, say (N,Q) = (5,2). However, for larger systems (N,Q) = (18,3), this method cannot even create the pf array due to memory issues because it has Q^N nontrivial elements. Any ideas on how to either overcome this memory issue or how to alter the code to work on subarrays?
Edit: Made a small mistake in the definition of jf. It has been corrected.
You can avoid the large array just by initializing Z to 0, and incrementing it by jterms * hterms in each iteration. This still won't get you out of calculating and summing Q^N numbers, however. To do that, you probably need to figure out a way to simplify the partition function algebraically.
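A minimal sketch of that accumulation, reusing hf and jf from the question:

Z = 0.0
for it in itertools.product(range(Q), repeat=N):
    hterms = np.exp(hf(it)).prod()
    jterms = np.exp(-jf(it)).prod()
    Z += jterms * hterms  # accumulate instead of storing Q**N entries in pf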
Not sure what you are trying to compute, but I tested your code with ChrisB's suggestion and jf will not work for Q=3.
Perhaps you shouldn't use a dense numpy array to encode your function? You could try sparse arrays or just straight Python with Numba compilation. This blogpost shows using Numba on the simple Ising model with good performance.
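If you try the Numba route, a rough sketch (not the blogpost's code, just one assumed way to structure it) is to push the whole state loop into a compiled function. It assumes h has shape (N, Q) and J has shape (N, N, Q, Q) with only the upper triangle used, as in the question:

import numpy as np
from numba import njit

@njit
def partition_function(h, J, N, Q):
    Z = 0.0
    s = np.zeros(N, dtype=np.int64)  # current state, treated as a base-Q counter
    for _ in range(Q ** N):
        # exponent for this state, matching exp(hf(it)).prod() * exp(-jf(it)).prod()
        e = 0.0
        for i in range(N):
            e += h[i, s[i]]
            for j in range(i + 1, N):
                e -= J[i, j, s[i], s[j]]
        Z += np.exp(e)
        # advance the base-Q counter to the next state
        k = N - 1
        while k >= 0:
            s[k] += 1
            if s[k] < Q:
                break
            s[k] = 0
            k -= 1
    return Z

For (N, Q) = (18, 3) this still visits roughly 4e8 states, so it is slow but feasible once compiled, and it never allocates the Q**N array.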
I've written some code to find all the items that are in one iterable and not another and vice versa. I was originally using the built in set difference, but the computation was rather slow as there were millions of items being stored in each set. Since I know there will be at most a few thousand differences I wrote the below version:
def differences(a_iter, b_iter):
    a_items, b_items = set(), set()

    def remove_or_add_if_none(a_item, b_item, a_set, b_set):
        if a_item is None:
            if b_item in a_set:
                a_set.remove(b_item)
            else:
                b_set.add(b)

    def remove_or_add(a_item, b_item, a_set, b_set):
        if a in b_set:
            b_set.remove(a)
            if b in a_set:
                a_set.remove(b)
            else:
                b_set.add(b)
            return True
        return False

    for a, b in itertools.izip_longest(a_iter, b_iter):
        if a is None or b is None:
            remove_or_add_if_none(a, b, a_items, b_items)
            remove_or_add_if_none(b, a, b_items, a_items)
            continue
        if a != b:
            if remove_or_add(a, b, a_items, b_items) or \
                    remove_or_add(b, a, b_items, a_items):
                continue
            a_items.add(a)
            b_items.add(b)
    return a_items, b_items
However, the above code doesn't seem very pythonic so I'm looking for alternatives or suggestions for improvement.
Here is a more pythonic solution:
a, b = set(a_iter), set(b_iter)
return a - b, b - a
Pythonic does not mean fast, but rather elegant and readable.
Here is a solution that might be faster:
a, b = set(a_iter), set(b_iter)
# Get all the candidate return values
symdif = a.symmetric_difference(b)
# Since symdif has much fewer elements, these might be faster
return symdif - b, symdif - a
Now, about writing custom “fast” algorithms in Python instead of using the built-in operations: it's a very bad idea.
The set operators are heavily optimized, and written in C, which is generally much, much faster than Python.
You could write an algorithm in C (or Cython), but then keep in mind that Python's set algorithms were written and optimized by world-class geniuses.
Unless you're extremely good at optimization, it's probably not worth the effort. On the other hand, if you do manage to speed things up substantially, please share your code; I bet it'd have a chance of getting into Python itself.
For a more realistic approach, try eliminating calls to Python code. For instance, if your objects have a custom equality operator, figure out a way to remove it.
But don't get your hopes up. Working with millions of pieces of data will always take a long time. I don't know where you're using this, but maybe it's better to make the computer busy for a minute than to spend the time optimizing set algorithms?
I think your code is broken: try it with [1,1] and [1,2] and you'll get that 1 is in one set but not the other.
> print differences([1,1],[1,2])
(set([1]), set([2]))
You can trace this back to the effect of the if a != b test (which assumes something about ordering that is not present in simple set differences).
Without that test, which probably discards many values, I don't think your method is going to be any faster than built-in sets. The argument goes something like this: you really do need to create one set in memory to hold all the data (your bug came from not doing that). A naive set approach creates two sets. So the best you can do is save half the time, and you also have to do the work, in Python, of what is probably efficient C code.
I would have thought python set operations would be the best performance you could get out of the standard library.
Perhaps it's the particular implementation you chose that's the problem, rather than the data structures and attendant operations themselves. Here's an alternate implementation that should give you better performance.
For sequence comparison tasks in which the sequences are large, avoid, if at all possible, putting the objects that comprise the sequences into the containers used for the comparison; it's better to work with indices instead. If the objects in your sequences are unordered, then sort them.
So for instance, I use NumPy, the numerical Python library, for these sorts of tasks:
# a, b are 'fake' index arrays of type boolean
import numpy as NP
a, b = NP.random.randint(0, 2, 10), NP.random.randint(0, 2, 10)
a, b = NP.array(a, dtype=bool), NP.array(b, dtype=bool)
# items a and b have in common:
NP.sum(NP.logical_and(a, b))
# the converse (the differences)
NP.sum(NP.logical_xor(a, b))
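If you need the differing positions themselves rather than just their count, one index-based option (still using the NP alias from above) is:

diff_idx = NP.nonzero(NP.logical_xor(a, b))[0]  # indices where exactly one of a, b is True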
I'm just starting with NumPy so I may be missing some core concepts...
What's the best way to create a NumPy array from a dictionary whose values are lists?
Something like this:
d = { 1: [10,20,30] , 2: [50,60], 3: [100,200,300,400,500] }
Should turn into something like:
data = [
[10,20,30,?,?],
[50,60,?,?,?],
[100,200,300,400,500]
]
I'm going to do some basic statistics on each row, eg:
deviations = numpy.std(data, axis=1)
Questions:
What's the best / most efficient way to create the numpy.array from the dictionary? The dictionary is large; a couple of million keys, each with ~20 items.
The number of values for each 'row' is different. If I understand correctly, numpy wants uniform row sizes, so what do I fill in for the missing items to make std() happy?
Update: One thing I forgot to mention - while the python techniques are reasonable (eg. looping over a few million items is fast), it's constrained to a single CPU. Numpy operations scale nicely to the hardware and hit all the CPUs, so they're attractive.
You don't need to create numpy arrays to call numpy.std().
You can call numpy.std() in a loop over all the values of your dictionary. The list will be converted to a numpy array on the fly to compute the standard deviation.
The downside of this method is that the main loop will be in python and not in C. But I guess this should be fast enough: you will still compute std at C speed, and you will save a lot of memory as you won't have to store 0 values where you have variable size arrays.
If you want to further optimize this, you can store your values into a list of numpy arrays, so that you do the python list -> numpy array conversion only once.
If you find that this is still too slow, try using Psyco to optimize the Python loop.
If this is still too slow, try using Cython together with the numpy module. This Tutorial claims impressive speed improvements for image processing. Or simply program the whole std function in Cython (see this for benchmarks and examples with the sum function).
An alternative to Cython would be to use SWIG with numpy.i.
If you want to use only numpy and have everything computed at C level, try grouping all the records of the same size together in different arrays and call numpy.std() on each of them. It should look like the following example.
example with O(N) complexity:
import numpy
list_size_1 = []
list_size_2 = []
for row in data.itervalues():
    if len(row) == 1:
        list_size_1.append(row)
    elif len(row) == 2:
        list_size_2.append(row)
list_size_1 = numpy.array(list_size_1)
list_size_2 = numpy.array(list_size_2)
std_1 = numpy.std(list_size_1, axis = 1)
std_2 = numpy.std(list_size_2, axis = 1)
While there are already some pretty reasonable ideas present here, I believe the following is worth mentioning.
Filling missing data with any default value would spoil the statistical characteristics (std, etc.). Evidently that's why Mapad proposed the nice trick of grouping same-sized records.
The problem with it (assuming no a priori data on record lengths is at hand) is that it involves even more computations than the straightforward solution:
at least O(N*logN) 'len' calls and comparisons for sorting with an efficient algorithm
O(N) checks on the second pass through the list to obtain groups (their beginning and end indexes on the 'vertical' axis)
Using Psyco is a good idea (it's strikingly easy to use, so be sure to give it a try).
It seems that the optimal way is to take the strategy described by Mapad in bullet #1, but with a modification: don't generate the whole list, but iterate through the dictionary, converting each row into a numpy.array and performing the required computations. Like this:
for row in data.itervalues():
    np_row = numpy.array(row)
    this_row_std = numpy.std(np_row)
    # compute any other statistic descriptors needed and then save to some list
In any case a few million loops in Python won't take as long as one might expect. Besides, this doesn't look like a routine computation, so who cares if it takes an extra second or minute if it is run once in a while or even just once.
A generalized variant of what was suggested by Mapad:
from numpy import array, mean, std

def get_statistical_descriptors(a):
    ax = len(a.shape) - 1
    functions = [mean, std]
    return [f(a, axis=ax) for f in functions]

def process_long_list_stats(data):
    groups = {}
    for key, row in data.iteritems():
        size = len(row)
        try:
            groups[size].append(key)
        except KeyError:
            groups[size] = [key]
    results = []
    for gr_keys in groups.itervalues():
        gr_rows = array([data[k] for k in gr_keys])
        stats = get_statistical_descriptors(gr_rows)
        results.extend(zip(gr_keys, zip(*stats)))
    return dict(results)
numpy dictionary
You can use a structured array to preserve the ability to address a numpy object by a key, like a dictionary.
import numpy as np

dd = {'a': 1, 'b': 2, 'c': 3}
dtype = [(key, float) for key in dd.keys()]
values = [tuple(dd.values())]
numpy_dict = np.array(values, dtype=dtype)
numpy_dict['c']
will now output
array([ 3.])
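A small follow-up, in case it helps: the structured array also keeps dictionary-like access for several keys at once (the exact return type depends on your NumPy version):

numpy_dict[['a', 'c']]  # the 'a' and 'c' fields together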