Optimizing nested Python for loops? - python

I am having some trouble with optimizing the following for loops. I have looked into the map function, comprehension expressions, and a bit of itertools and generators, but am unsure at how to approach optimization for nested loops. Any help or suggestions is greatly appreciated. Thanks in advance!
Note that object architecture is:
self.variables.parameters.index1/index2/value
self.variables.rate
self.state.num
Loop 1:
mat1 = np.zeros(m, n)
for idx, variable in enumerate(self.variables):
for parameter in variable.parameters:
tracking_idx = parameter.index1 + parameter.index2
mat1[tracking_idx, idx] = parameter.value
Loop 2:
mat2 = []
for variable in self.variables:
rate = variable.rate
for parameter in variable.parameters:
if parameter.value < 0 and self.state.num[parameter.index1, parameter.index2] <= 0:
rate = 0
mat2.append(rate)

With a numpy tag I assume 'optimize' means cast these loops as compiled numpy array expressions. I don't think there's a way to do without first collecting all the data as numpy arrays, which will require the same loops.
You have a list of n variables. Each variable has a list of ? parameters. Each parameter has 3 or 4 attributes.
So
rates = np.array([v.rate for v in variables])
values = np.array([[p.value for p in v.parameters] for v in variables]
index1s = <dito> (?,n) array
index2s = <dita> (?,n) array
self.state.num is already a 2d array, size compatible with the range of values in index1s and index2s.
Given those 1 and 2d arrays we should be able to derive mat1 and mat2 with whole array operations. If ? is small relative to m and the range of values in index1 and index2 it might be worth do this. I don't have a realistic feel for your data.
===========
You mention the map function, comprehension expressions, and a bit of itertools and generators. These can make the code look a little cleaner but don't make much difference in speed.
I demonstrated the use of list comprehensions. Those expression can be more elaborate, but often at the cost of readability. I like writing helper functions to hide details. A generator comprehension can replace a list comprehension that feeds another. Maps cover the same functionality.
Since variables and parameters have attributes I assume you have defined them in classes. You could write methods that extract those attributes as simple lists or arrays.
class Variable(....):
....
def get_values(self):
return [p.value for p in self.parameters]
def get_rate(self, state):
rate = self.rate
for parameter in self.parameters:
if parameter.value < 0 and
state.num[parameter.index1, parameter.index2] <= 0:
rate = 0
return rate
values = [v.get_values() for v in variables]
rates = [v.get_rate(self.state) for v in variables]
You could even write these helper function without the class structure.
That doesn't speed up anything; it just hides some details in the object.

Related

Is there an equivalent of Python's list and append feature in Matlab?

This is more of a Matlab programming question than it is a math question.
I'd like to run gradient descent multiple on different learning rates. I have a set of learning rates
alpha = [0.3, 0.1, 0.03, 0.01, 0.003, 0.001];
and each time I run gradient descent, I get a vector J_vals as output. However, I don't know Matlab well enough to know how to implement this besides doing something like:
[theta, J_vals] = gradientDescent(...., alpha(1),...);
J1 = J_vals;
[theta, J_vals] = gradientDescent(...., alpha(2),...);
J2 = J_vals;
and so on.
I thought about using a for loop, but then I don't know how I would deal with the J_vals's (not sure how to apply the for loop to J1, J2, and so on). Perhaps it would look something like this:
for i = len(alpha)
[theta, J_vals] = gradientDescent(..., alpha(i),...);
J(i) = J_vals;
end
Then I would have a vector of vectors.
In Python, I would just run a for loop and append each new result to the end of a list. How do I implement something like this in Matlab? Or is there a more efficient way?
If you know how many loops you are going have and the size of the J_vals (or at least a reasonable upper bound) I would suggest pre-allocating the size of the container array
J = zeros(n,1);
then on each loop insert the new values
J(start:start+n) = J_vals
That way you don't reallocate memory. If you don't know, you can append the values to the array. For example,
J = []; % initialize
for i = len(alpha)
[theta, J_vals] = gradientDescent(..., alpha(i),...);
J = [J; J_vals]; % Append column row
end
but this is re-allocating the size of the array every loop. If it's not too many loops then it should be ok.
Matlab's "cell arrays" are kind of like lists in Python. They are similar in that you can put variable datatypes into them. Nobody seems to be too sure, but most likely the cell array is implemented as an array of object pointers. That means that it is still somewhat expensive to append to it (cell_array{length(cell_array) + 1} = new_data), but at least you are only appending a pointer instead of the entire column. You would still have to convert the cell array to a normal matrix afterward using cell2mat.
The most idiomatic Matlab solution is to pre-allocate (as #dpmcmlxxvi suggested).
I think what you are describing is a really common use case, and it's unfortunate that Matlab requires such a verbose idiom for this. Also it's frustrating that the documentation is opaque on how cell arrays are implemented and whether it is expensive to append to a cell array.
Your solution works just fine as long as you add a : for the row subscript (assuming J_vals is a column vector):
for i = len(alpha)
[theta, J_vals] = gradientDescent(..., alpha(i),...);
J(:, i) = J_vals;
%// ^... all rows, column 'i'
end
You could even put that as the return value:
for i = len(alpha)
[theta, J(:, i)] = gradientDescent(..., alpha(i),...);
%// ^... add returned value directly to our list
end
Both of these methods allow you to preallocate your matrix for a potential speed gain.
If you want to build your list as you go, you can use the method in #dpmcmlxxvi's answer, or you can use the special subscript end. Neither of these methods are compatible with preallocation, though.
for i = len(alpha)
[theta, J(:, end+1)] = gradientDescent(..., alpha(i),...);
%// ^... add new vector after the current end of list
end
I would also like to suggest you not use i as a variable name in Matlab. I know it's natural for other languages, but in Matlab it overwrites the built-in imaginary constant i.
See: https://stackoverflow.com/a/14790765/1377097

Initialize Multiple Numpy Arrays (Multiple Assignment) - Like MATLAB deal()

I was unable to find anything describing how to do this, which leads to be believe I'm not doing this in the proper idiomatic Python way. Advice on the 'proper' Python way to do this would also be appreciated.
I have a bunch of variables for a datalogger I'm writing (arbitrary logging length, with a known maximum length). In MATLAB, I would initialize them all as 1-D arrays of zeros of length n, n bigger than the number of entries I would ever see, assign each individual element variable(measurement_no) = data_point in the logging loop, and trim off the extraneous zeros when the measurement was over. The initialization would look like this:
[dData gData cTotalEnergy cResFinal etc] = deal(zeros(n,1));
Is there a way to do this in Python/NumPy so I don't either have to put each variable on its own line:
dData = np.zeros(n)
gData = np.zeros(n)
etc.
I would also prefer not just make one big matrix, because keeping track of which column is which variable is unpleasant. Perhaps the solution is to make the (length x numvars) matrix, and assign the column slices out to individual variables?
EDIT: Assume I'm going to have a lot of vectors of the same length by the time this is over; e.g., my post-processing takes each log file, calculates a bunch of separate metrics (>50), stores them, and repeats until the logs are all processed. Then I generate histograms, means/maxes/sigmas/etc. for all the various metrics I computed. Since initializing 50+ vectors is clearly not easy in Python, what's the best (cleanest code and decent performance) way of doing this?
If you're really motivated to do this in a one-liner you could create an (n_vars, ...) array of zeros, then unpack it along the first dimension:
a, b, c = np.zeros((3, 5))
print(a is b)
# False
Another option is to use a list comprehension or a generator expression:
a, b, c = [np.zeros(5) for _ in range(3)] # list comprehension
d, e, f = (np.zeros(5) for _ in range(3)) # generator expression
print(a is b, d is e)
# False False
Be careful, though! You might think that using the * operator on a list or tuple containing your call to np.zeros() would achieve the same thing, but it doesn't:
h, i, j = (np.zeros(5),) * 3
print(h is i)
# True
This is because the expression inside the tuple gets evaluated first. np.zeros(5) therefore only gets called once, and each element in the repeated tuple ends up being a reference to the same array. This is the same reason why you can't just use a = b = c = np.zeros(5).
Unless you really need to assign a large number of empty array variables and you really care deeply about making your code compact (!), I would recommend initialising them on separate lines for readability.
Nothing wrong or un-Pythonic with
dData = np.zeros(n)
gData = np.zeros(n)
etc.
You could put them on one line, but there's no particular reason to do so.
dData, gData = np.zeros(n), np.zeros(n)
Don't try dData = gData = np.zeros(n), because a change to dData changes gData (they point to the same object). For the same reason you usually don't want to use x = y = [].
The deal in MATLAB is a convenience, but isn't magical. Here's how Octave implements it
function [varargout] = deal (varargin)
if (nargin == 0)
print_usage ();
elseif (nargin == 1 || nargin == nargout)
varargout(1:nargout) = varargin;
else
error ("deal: nargin > 1 and nargin != nargout");
endif
endfunction
In contrast to Python, in Octave (and presumably MATLAB)
one=two=three=zeros(1,3)
assigns different objects to the 3 variables.
Notice also how MATLAB talks about deal as a way of assigning contents of cells and structure arrays. http://www.mathworks.com/company/newsletters/articles/whats-the-big-deal.html
If you put your data in a collections.defaultdict you won't need to do any explicit initialization. Everything will be initialized the first time it is used.
import numpy as np
import collections
n = 100
data = collections.defaultdict(lambda: np.zeros(n))
for i in range(1, n):
data['g'][i] = data['d'][i - 1]
# ...
How about using map:
import numpy as np
n = 10 # Number of data points per array
m = 3 # Number of arrays being initialised
gData, pData, qData = map(np.zeros, [n] * m)

Numpy Array index problems

I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me an array of zeros that will be 5X5. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
for j in xrange(0,10,2):
.........
"do function and fill the array corresponding to (i,j)"
however, right now what I would like to be a[2,10] is a function of 2 and 10 but instead the index for a function of 2 and 10 would be a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I could not in Cython. Further I used Joblib to Parallelize the operation. I stored the results in a list because an array was not filling right when running in Parallel. I then used Itertools to split the list into individual results and Pandas to organize the results.
Thank you for all the help
Some tips for your to get the things done keeping a good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys)
return x**2 + y**2 + x*y
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding about how the arrays's operations work. This will help you to design a function that can handle properly the arrays.
First, you lack some parenthesis with zeros, the first argument should be a tuple :
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i/2 and j/2 :
for i in xrange(0,10,2):
for j in xrange(0,10,2):
# do function and fill the array corresponding to (i,j)
a[i/2, j/2] = 1
But I second Saullo Castro, you should try to vectorize your computations.

Python - Difficulty Calculating Partition Function using Large NumPy arrays

I had a pretty compact way of computing the partition function of an Ising-like model using itertools, lambda functions, and large NumPy arrays. Given a network consisting of N nodes and Q "states"/node, I have two arrays, h-fields and J-couplings, of sizes (N,Q) and (N,N,Q,Q) respectively. J is upper-triangular, however. Using these arrays, I have been computing the partition function Z using the following method:
# Set up lambda functions and iteration tuples of the form (A_1, A_2, ..., A_n)
iters = itertools.product(range(Q),repeat=N)
hf = lambda s: h[range(N),s]
jf = lambda s: np.array([J[fi,fj,s[fi],s[fj]] \
for fi,fj in itertools.combinations(range(N),2)]).flatten()
# Initialize and populate partition function array
pf = np.zeros(tuple([Q for i in range(N)]))
for it in iters:
hterms = np.exp(hf(it)).prod()
jterms = np.exp(-jf(it)).prod()
pf[it] = jterms * hterms
# Calculates partition function
Z = pf.sum()
This method works quickly for small N and Q, say (N,Q) = (5,2). However, for larger systems (N,Q) = (18,3), this method cannot even create the pf array due to memory issues because it has Q^N nontrivial elements. Any ideas on how to either overcome this memory issue or how to alter the code to work on subarrays?
Edit: Made a small mistake in the definition of jf. It has been corrected.
You can avoid the large array just by initializing Z to 0, and incrementing it by jterms * iterms in each iteration. This still won't get you out of calculating and summing Q^N numbers, however. To do that, you probably need to figure out a way to simplify the partition function algebraically.
Not sure what you are trying to compute but I tested your code with ChrisB suggestion and jf will not work for Q=3.
Perhaps you shouldn't use a dense numpy array to encode your function? You could try sparse arrays or just straight Python with Numba compilation. This blogpost shows using Numba on the simple Ising model with good performance.

Best way to create a NumPy array from a dictionary?

I'm just starting with NumPy so I may be missing some core concepts...
What's the best way to create a NumPy array from a dictionary whose values are lists?
Something like this:
d = { 1: [10,20,30] , 2: [50,60], 3: [100,200,300,400,500] }
Should turn into something like:
data = [
[10,20,30,?,?],
[50,60,?,?,?],
[100,200,300,400,500]
]
I'm going to do some basic statistics on each row, eg:
deviations = numpy.std(data, axis=1)
Questions:
What's the best / most efficient way to create the numpy.array from the dictionary? The dictionary is large; a couple of million keys, each with ~20 items.
The number of values for each 'row' are different. If I understand correctly numpy wants uniform size, so what do I fill in for the missing items to make std() happy?
Update: One thing I forgot to mention - while the python techniques are reasonable (eg. looping over a few million items is fast), it's constrained to a single CPU. Numpy operations scale nicely to the hardware and hit all the CPUs, so they're attractive.
You don't need to create numpy arrays to call numpy.std().
You can call numpy.std() in a loop over all the values of your dictionary. The list will be converted to a numpy array on the fly to compute the standard variation.
The downside of this method is that the main loop will be in python and not in C. But I guess this should be fast enough: you will still compute std at C speed, and you will save a lot of memory as you won't have to store 0 values where you have variable size arrays.
If you want to further optimize this, you can store your values into a list of numpy arrays, so that you do the python list -> numpy array conversion only once.
if you find that this is still too slow, try to use psycho to optimize the python loop.
if this is still too slow, try using Cython together with the numpy module. This Tutorial claims impressive speed improvements for image processing. Or simply program the whole std function in Cython (see this for benchmarks and examples with sum function )
An alternative to Cython would be to use SWIG with numpy.i.
if you want to use only numpy and have everything computed at C level, try grouping all the records of same size together in different arrays and call numpy.std() on each of them. It should look like the following example.
example with O(N) complexity:
import numpy
list_size_1 = []
list_size_2 = []
for row in data.itervalues():
if len(row) == 1:
list_size_1.append(row)
elif len(row) == 2:
list_size_2.append(row)
list_size_1 = numpy.array(list_size_1)
list_size_2 = numpy.array(list_size_2)
std_1 = numpy.std(list_size_1, axis = 1)
std_2 = numpy.std(list_size_2, axis = 1)
While there are already some pretty reasonable ideas present here, I believe following is worth mentioning.
Filling missing data with any default value would spoil the statistical characteristics (std, etc). Evidently that's why Mapad proposed the nice trick with grouping same sized records.
The problem with it (assuming there isn't any a priori data on records lengths is at hand) is that it involves even more computations than the straightforward solution:
at least O(N*logN) 'len' calls and comparisons for sorting with an effective algorithm
O(N) checks on the second way through the list to obtain groups(their beginning and end indexes on the 'vertical' axis)
Using Psyco is a good idea (it's strikingly easy to use, so be sure to give it a try).
It seems that the optimal way is to take the strategy described by Mapad in bullet #1, but with a modification - not to generate the whole list, but iterate through the dictionary converting each row into numpy.array and performing required computations. Like this:
for row in data.itervalues():
np_row = numpy.array(row)
this_row_std = numpy.std(np_row)
# compute any other statistic descriptors needed and then save to some list
In any case a few million loops in python won't take as long as one might expect. Besides this doesn't look like a routine computation, so who cares if it takes extra second/minute if it is run once in a while or even just once.
A generalized variant of what was suggested by Mapad:
from numpy import array, mean, std
def get_statistical_descriptors(a):
if ax = len(shape(a))-1
functions = [mean, std]
return f(a, axis = ax) for f in functions
def process_long_list_stats(data):
import numpy
groups = {}
for key, row in data.iteritems():
size = len(row)
try:
groups[size].append(key)
except KeyError:
groups[size] = ([key])
results = []
for gr_keys in groups.itervalues():
gr_rows = numpy.array([data[k] for k in gr_keys])
stats = get_statistical_descriptors(gr_rows)
results.extend( zip(gr_keys, zip(*stats)) )
return dict(results)
numpy dictionary
You can use a structured array to preserve the ability to address a numpy object by a key, like a dictionary.
import numpy as np
dd = {'a':1,'b':2,'c':3}
dtype = eval('[' + ','.join(["('%s', float)" % key for key in dd.keys()]) + ']')
values = [tuple(dd.values())]
numpy_dict = np.array(values, dtype=dtype)
numpy_dict['c']
will now output
array([ 3.])

Categories

Resources