I'm programming a scientific application in Python, and the performance of my algorithm so far is terrible. I'm trying to find an efficient way to code what I'm doing. Basically, I have to multiply and sum terms like this:
def get_thing(self, chi, n):
    return np.sum(self.an[n][j] * pow(chi, -j) for j in xrange(1, self.j))
where self.an[i][j] is a previously generated array. Then I'll have to do this:
pot = np.sum(self.coeffs[n] * self.get_thing(chi, n) for n in xrange(0, self.n))
where chi changes and cannot be cached, as it's a point that is being generated outside this class. Of course, this is extremely slow and not very bright. How can I improve this?
Thanks!
Within get_things you could certainly simplify things as something like:
def get_thing(self, chi, n):
    return np.sum(self.an[n, 1:self.j] * np.power(chi, -np.arange(1, self.j)))
Note that you don't want to index numpy arrays using [i][j] notation; use [i, j] instead.
You may be able to make further improvements using higher-level broadcasting, as @eat suggested.
Edit:
Made a couple of changes to the above code to try to get the indexing to match the OP's, and fixed a sign error in my code.
Simply put: try to do the computations at a higher level of abstraction, i.e. try to avoid Python-level looping.
Study carefully how to do element-wise operations and how broadcasting operates, and last but not least don't forget the power of linear algebra!
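For example, a rough sketch of a fully vectorized version, assuming self.an behaves as a 2-D array of shape (self.n, self.j) and self.coeffs has length self.n (names taken from the question; this is a sketch, not a drop-in method):
def get_pot(self, chi):
    # chi**(-j) for j = 1, ..., self.j - 1 (float exponents so an integer chi also works)
    powers = np.power(chi, -np.arange(1, self.j, dtype=float))
    # inner sum over j for every n at once: shape (self.n,)
    inner = np.dot(self.an[:self.n, 1:self.j], powers)
    # weighted sum over n, i.e. the quantity called pot in the question
    return np.dot(self.coeffs[:self.n], inner)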
I'm looking to set up a constraint check in Python using PULP. Suppose I have variables X1,...,Xn and a constraint (an AffineExpression) A1*X1 + ... + An*Xn <= B, where A1,...,An and B are all constants.
Given an assignment for X (e.g. X1=1, X2=4,...Xn=2), how can I check if the constraints are satisfied? I know how to do this with matrices using Numpy, but wondering if it's possible to do using PULP to let the library handle the work.
My hope here is that I can check specific variable assignments. I do not want to run an optimization algorithm on the problem (e.g. prob.solve()).
Can PULP do this? Is there a different Python library that would be better? I've thought about Google's OR-Tools but have found the documentation is a little bit harder to parse through than PULP's.
It looks like this is possible doing the following:
Define PULP variables and constraints and add them to an LpProblem
Make a dictionary of your assignments in the form {'variable name': value}
Use LpProblem.assignVarsVals(your_assignment_dict) to assign those values
Run LpProblem.valid() to check that your assignment meets all constraints and variable restrictions
Note that this will almost certainly be slower than using numpy and Ax <= b. Formulating the problem might be easier, but performance will suffer due to how PULP runs these checks.
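For what it's worth, here is a minimal sketch of those steps; the problem, variable names and values are made up purely for illustration:
from pulp import LpProblem, LpVariable, LpMinimize

# Toy problem with a single constraint: 3*x1 + 2*x2 <= 10
prob = LpProblem("feasibility_check", LpMinimize)
x1 = LpVariable("x1", lowBound=0)
x2 = LpVariable("x2", lowBound=0)
prob += 3 * x1 + 2 * x2 <= 10, "capacity"

# Assign the values you want to test, then ask whether everything holds
prob.assignVarsVals({"x1": 1, "x2": 3})
print(prob.valid())  # True only if all constraints and variable bounds are satisfied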
You can stay in numpy and accomplish this. For a single row of the matrix, set a equal to that row of A, multiply it element-wise by x, and sum the result to check whether that one constraint is satisfied. For example:
a = A[0, :]
row_sum = a*x
sum(row_sum) <= B[0]
The last line will return just True or False. Then if you want to change a single index you could update your row_sum array by using
row_sum[3] = a[3]*new_val
and run your analysis again.
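If you want to check every constraint at once rather than row by row, a short sketch (A, x and b are invented here, but assumed to already be numpy arrays of compatible shapes in your case):
import numpy as np

A = np.array([[3.0, 2.0], [1.0, 4.0]])
b = np.array([10.0, 8.0])
x = np.array([1.0, 3.0])

satisfied = A.dot(x) <= b   # boolean vector, one entry per constraint
print(satisfied)            # [ True False]
print(np.all(satisfied))    # True only if every constraint holds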
I have read this blog which shows how an algorithm had a 250x speed-up by using numpy. I have tried to improve the following code by using numpy but I couldn't make it work:
for i in nodes[1:]:
    for lb in range(2, diameter+1):
        not_valid_colors = set()
        valid_colors = set()
        for j in nodes:
            if j == i:
                break
            if distances[i-1, j-1] >= lb:
                not_valid_colors.add(c[j, lb])
            else:
                valid_colors.add(c[j, lb])
        c[i, lb] = choose_color(not_valid_colors, valid_colors)
return c
Explanation
The code above is part of an algorithm used to calculate the self-similar dimension of a graph. It basically works by constructing dual graphs G', in which a node is connected to every other node whose distance from it is greater than or equal to a given value (Lb), and then computing a graph coloring on those dual networks.
The algorithm description is the following:
Assign a unique id from 1 to N to all network nodes, without assigning any colors yet.
For all Lb values, assign the color value 0 to the node with id=1, i.e. C[1][Lb] = 0.
Set the id value i = 2. Repeat the following until i = N.
a) Calculate the distance l_ij from i to all the nodes in the network with id j less than i.
b) Set Lb = 1
c) Select one of the unused colors C[j][l_ij] from all nodes j < i for which l_ij ≥ Lb. This is the color C[i][Lb] of node i for the given Lb value.
d) Increase Lb by one and repeat (c) until Lb = Lb_max.
e) Increase i by 1.
I wrote it in Python, but it takes more than a minute when I try to use it on small networks of 100 nodes with p=0.9.
As I'm still new to Python and numpy, I haven't found a way to improve its efficiency.
Is it possible to remove the loops by using numpy.where to find where the paths are longer than the given Lb? I tried to implement it but it didn't work...
Vectorized operations with numpy arrays are fast since actual calculations are done with underlying libraries such as BLAS and LAPACK without Python overheads. With loop-intensive operations, you will not see those benefits.
You usually have to figure out a way to vectorize operations (usually possible with a smart use of array slicing). Some operations are inherently loop-intensive, however, and sometimes it is not easy to vectorize them (which seems to be the case for your code).
In those cases, you can first try Numba, which generates optimized machine code from a Python function without any modifications. (You just annotate the function and it will automatically do it for you). I do not have a lot of experience with it, and have not tried using this for complicated functions.
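As a rough illustration of the Numba approach, here is a toy stand-in for a loop-heavy distance calculation (not your coloring routine, which would also need choose_color to be Numba-compatible):
import numpy as np
from numba import njit

@njit
def count_pairs_within(distances, lb):
    # Count node pairs (j < i) whose distance is at least lb,
    # using plain nested loops that Numba compiles to machine code
    n = distances.shape[0]
    count = 0
    for i in range(n):
        for j in range(i):
            if distances[i, j] >= lb:
                count += 1
    return count

d = np.random.rand(200, 200)
print(count_pairs_within(d, 0.5))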
If this does not work, then you can use Cython, which converts Python-like code (with typed variables) into efficient C code automatically and generates a Python extension module that you can import and use in Python. That will usually give you at least an order of magnitude (usually two orders of magnitude) speedup for loop-intensive operations. I generally find Cython easy to use since unlike pure C, one can access your numpy arrays directly in Cython code.
I recommend using the Anaconda Python distribution, since you will be able to install these packages easily. I'm sorry I don't have a specific answer for your code.
If you want to go to numpy, you can just change the lists into arrays: for example, distances[i-1][j-1] becomes distances[i-1, j-1] once you declare distances as a numpy array, and the same goes for c[i][lb]. You should think a bit more about valid_colors and not_valid_colors, because with numpy arrays you cannot append things: an array has a fixed length, so you would have to fix a maximum size beforehand. Another idea is that once you have everything in numpy, you can cythonize your code (http://docs.cython.org/src/tutorial/cython_tutorial.html), which makes all your loops very fast. In any case, if you don't want Cython and you look at the blog, you will see that distances is declared as an array in main().
I am new to numpy, and I'm already a little sick of its syntax.
Something which could be written like this in Octave/matlab
1/(2*m) * (X * theta - y)' * (X*theta -y)
Becomes this in numpy
np.true_divide(((X.dot(theta)-y).transpose()).dot((X.dot(theta)-y)),2*m)
This is much harder for me to write and debug. Is there any better way to write matrix operations like above so as to make life easier?
You can make some simplifications. By using from __future__ import division at the beginning of your program, all division will automatically be "true" division, so you won't need to use true_divide. (In Python 3 you don't even need to do this, since true division is automatically the default.) Also, you can use .T instead of .transpose(). Your code then becomes
1/(2*m) * ((X.dot(theta) - y).T).dot((X.dot(theta) - y))
which is a bit better.
In Python 3.5, a new matrix multiplication operator @ is being added for basically this exact reason. This is not out yet, but when it is (and when numpy is updated to make use of it), your code will become very similar to the Octave version:
1/(2*m) * (X @ theta - y).T @ (X @ theta - y)
You could try using np.matrix instead of np.ndarray for 2-dimensional arrays. It overloads the * operator so that it means matrix multiplication, so you can do away with all the .dots. Here are the docs.
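A small sketch of what that looks like; the shapes below are invented just to make the expression well-defined:
import numpy as np

X = np.matrix(np.random.rand(5, 3))
theta = np.matrix(np.random.rand(3, 1))
y = np.matrix(np.random.rand(5, 1))
m = X.shape[0]

# With np.matrix, * means matrix multiplication, so the Octave form carries over
cost = 1.0 / (2 * m) * (X * theta - y).T * (X * theta - y)
print(cost)  # a 1x1 matrix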
There is a better way, but you will have to consult the numpy documentation to find it.
This page lists a bunch of equivalencies between matlab and numpy with simpler syntax. For example, a.transpose() can be written as a.T.
You can also look at the individual documentation for these functions, such as the one for true_divide which explains that the Python 3 / method works to do the same.
Good day, I'm writing a Python module for some numeric work. Since there's a lot of stuff going on, I've been spending the last few days optimizing the code to improve calculation times.
However, I have a question concerning Numba.
Basically, I have a class with some fields which are numpy arrays, which I initialize in the following way:
def __init__(self):
    a = numpy.arange(0, self.max_i, 1)
    self.vibr_energy = self.calculate_vibr_energy(a)

def calculate_vibr_energy(self, i):
    return numpy.exp(-self.harmonic * i - self.anharmonic * (i ** 2))
So, the code is vectorized, and using Numba's JIT results in some improvement. However, sometimes I need to access the calculate_vibr_energy function from outside the class, and pass a single integer instead of an array in place of i.
As far as I understand, if I use Numba's JIT on the calculate_vibr_energy, it will have to always take an array as an argument.
So, which of the following options is better:
1) Create a new function calculate_vibr_energy_single(i), which will only take a single integer number, and use Numba on it too
2) Replace all usages of the function that are similar to this one:
myclass.calculate_vibr_energy(1)
with this:
tmp = np.array([1])
myclass.calculate_vibr_energy(tmp)[0]
Or are there other, more efficient (or at least, more Python-ic) ways of doing that?
I have only played a little with numba so far, so I may be mistaken, but as far as I've understood it, using the "autojit" decorator should give you functions that can take arguments of any type.
See e.g. http://numba.pydata.org/numba-doc/dev/pythonstuff.html
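For what it's worth, autojit from older Numba releases has since been folded into plain jit (which compiles lazily when no signature is given), so a present-day sketch of the same idea might look like the following, with the method pulled out into a free function just for illustration:
import numpy as np
from numba import jit

@jit(nopython=True)
def calculate_vibr_energy(harmonic, anharmonic, i):
    # Works for a single number or a whole array, since Numba compiles
    # a separate specialization for each argument type it sees
    return np.exp(-harmonic * i - anharmonic * (i ** 2))

print(calculate_vibr_energy(0.1, 0.01, 3))             # single integer
print(calculate_vibr_energy(0.1, 0.01, np.arange(5)))  # numpy array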
There's this script called svnmerge.py that I'm trying to tweak and optimize a bit. I'm completely new to Python though, so it's not easy.
The current problem seems to be related to a class called RevisionSet in the script. In essence what it does is create a large hashtable(?) of integer-keyed boolean values. In the worst case - one for each revision in our SVN repository, which is near 75,000 now.
After that it performs set operations on such huge arrays - addition, subtraction, intersection, and so forth. The implementation is the simplest O(n) implementation, which, naturally, gets pretty slow on such large sets. The whole data structure could be optimized because there are long spans of continuous values. For example, all keys from 1 to 74,000 might contain true. Also the script is written for Python 2.2, which is a pretty old version and we're using 2.6 anyway, so there could be something to gain there too.
I could try to cobble this together myself, but it would be difficult and take a lot of time - not to mention that it might be already implemented somewhere. Although I'd like the learning experience, the result is more important right now. What would you suggest I do?
You could try doing it with numpy instead of plain python. I found it to be very fast for operations like these.
For example:
# Create 1000000 numbers between 0 and 1000, takes 21ms
x = numpy.random.randint(0, 1000, 1000000)
# Get all items that are larger than 500, takes 2.58ms
y = x > 500
# Add 10 to those items, takes 26.1ms
x[y] += 10
Since that's with far more elements than your 75,000 revisions, I think it should not be a problem either :)
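To make that concrete for revision sets, here is a rough sketch in which the array index is the revision number and the value marks membership, so set operations become element-wise boolean logic (the ranges are invented for illustration):
import numpy as np

max_rev = 75000
a = np.zeros(max_rev + 1, dtype=bool)
b = np.zeros(max_rev + 1, dtype=bool)
a[1:74001] = True        # revisions 1..74000
b[50000:60001] = True    # revisions 50000..60000

union = a | b
intersection = a & b
difference = a & ~b
print(np.flatnonzero(intersection)[:5])  # first few revisions in both sets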
Here's a quick replacement for RevisionSet that makes it into a set. It should be much faster. I didn't fully test it, but it worked with all of the tests that I did. There are undoubtedly other ways to speed things up, but I think that this will really help because it actually harnesses the fast implementation of sets rather than doing loops in Python which the original code was doing in functions like __sub__ and __and__. The only problem with it is that the iterator isn't sorted. You might have to change a little bit of the code to account for this. I'm sure there are other ways to improve this, but hopefully it will give you a good start.
class RevisionSet(set):
    """
    A set of revisions, held in dictionary form for easy manipulation. If we
    were to rewrite this script for Python 2.3+, we would subclass this from
    set (or UserSet). As this class does not include branch
    information, it's assumed that one instance will be used per
    branch.
    """
    def __init__(self, parm):
        """Constructs a RevisionSet from a string in property form, or from
        a dictionary whose keys are the revisions. Raises ValueError if the
        input string is invalid."""
        revision_range_split_re = re.compile('[-:]')
        if isinstance(parm, set):
            print "1"
            self.update(parm.copy())
        elif isinstance(parm, list):
            self.update(set(parm))
        else:
            parm = parm.strip()
            if parm:
                for R in parm.split(","):
                    rev_or_revs = re.split(revision_range_split_re, R)
                    if len(rev_or_revs) == 1:
                        self.add(int(rev_or_revs[0]))
                    elif len(rev_or_revs) == 2:
                        self.update(set(range(int(rev_or_revs[0]),
                                              int(rev_or_revs[1])+1)))
                    else:
                        raise ValueError, 'Ill formatted revision range: ' + R

    def sorted(self):
        return sorted(self)

    def normalized(self):
        """Returns a normalized version of the revision set, which is an
        ordered list of couples (start,end), with the minimum number of
        intervals."""
        revnums = sorted(self)
        revnums.reverse()
        ret = []
        while revnums:
            s = e = revnums.pop()
            while revnums and revnums[-1] in (e, e+1):
                e = revnums.pop()
            ret.append((s, e))
        return ret

    def __str__(self):
        """Convert the revision set to a string, using its normalized form."""
        L = []
        for s, e in self.normalized():
            if s == e:
                L.append(str(s))
            else:
                L.append(str(s) + "-" + str(e))
        return ",".join(L)
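A quick usage sketch (the revision strings are made up, and the print statement is Python 2 to match the class above) showing that the built-in set operators now do the heavy lifting:
a = RevisionSet("1-74000")
b = RevisionSet("50000-75000")
print len(a | b), len(a & b), len(a - b)  # union, intersection, difference sizes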
Addition:
By the way, I compared doing unions, intersections and subtractions of the original RevisionSet and my RevisionSet above, and the above code is from 3x to 7x faster for those operations when operating on two RevisionSets that have 75000 elements. I know that other people are saying that numpy is the way to go, but if you aren't very experienced with Python, as your comment indicates, then you might not want to go that route because it will involve a lot more changes. I'd recommend trying my code, seeing if it works and if it does, then see if it is fast enough for you. If it isn't, then I would try profiling to see what needs to be improved. Only then would I consider using numpy (which is a great package that I use quite frequently).
"For example, all keys from 1 to 74,000 contain true"
Why not work on a subset? Just 74001 to the end.
Pruning 74/75th of your data is far easier than trying to write an algorithm more clever than O(n).
You should rewrite RevisionSet to have a set of revisions. I think the internal representation for a revision should be an integer and revision ranges should be created as needed.
There is no compelling reason to use code that supports python 2.3 and earlier.
Just a thought. I used to do this kind of thing using run-length coding in binary image manipulation. That is, store each set as a series of counts: number of bits off, number of bits on, number of bits off, and so on.
Then you can do all sorts of boolean operations on them as variations on a simple merge algorithm.
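A toy sketch of that idea: represent a set as a sorted list of inclusive (start, end) runs and implement union as a merge; intersection and difference follow the same merge pattern (the ranges below are invented):
def union(runs_a, runs_b):
    # Merge two lists of inclusive (start, end) runs into one normalized list
    merged = sorted(runs_a + runs_b)
    out = []
    for start, end in merged:
        if out and start <= out[-1][1] + 1:  # overlaps or touches the last run
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out

print(union([(1, 74000)], [(73990, 74010), (74050, 75000)]))
# [(1, 74010), (74050, 75000)]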