I have a Python function that performs two procedures:
Preprocessing: read in data from an array and compute some values that I will need later, to prevent repeated computation.
Iteration: compute a 'summary' of the data at every stage and use it to solve an optimisation problem.
The code is as follows:
```python
import numpy as np

def iterative_hessian(data, targets,
                      sketch_method, sketch_size, num_iters):
    '''
    Original problem is min 0.5*||Ax-b||_2^2
    iterative_hessian asks us to minimise 0.5*||S_Ax||_2^2 - <A^Tb, x>
    for a summary of the data S_A
    '''
    A = data
    y = targets
    n, d = A.shape
    x0 = np.zeros(shape=(d,))
    m = int(sketch_size)  # sketching dimension
    ATy = A.T @ y
    covariance_mat = A.T.dot(A)
    for n_iter in range(int(num_iters)):
        S_A = m**(-0.5)*np.random.normal(size=(m, n)) @ A  # Gaussian sketch of A, shape (m, d)
        B = S_A.T.dot(S_A)
        z = ATy - covariance_mat @ x0 + np.dot(S_A.T, np.dot(S_A, x0))
        x_new = np.linalg.solve(B, z)
        x0 = x_new
    return np.ravel(x0)
```
In practice I do not use the S_A = m**(-0.5)*np.random.normal(size=(m, n)) line but a different random transform that is faster to apply; in principle, though, the Gaussian version is sufficient for the question. This code works well for what I need, but I was wondering if there is a reasonable way to do the following:
Instead of repeating the line S_A = m**(-0.5)*np.random.normal(size=(m, n)) for every iteration, is there a way to specify the number of independent random copies (num_iters - which can be thought of as between 10 and 30) of S_A that are needed prior to the iteration and scan through the input only once to generate all such copies? I think this would store the S_A variables in some kind of multi-dimensional array but I'm not sure how best to do this, or whether it is even practical. I have tried a basic example doing this in parallel but it is slower than repeatedly passing through the matrix.
Suppose that I want to endow this function with more properties; for instance, I want to return the average time taken on the line x_new = np.linalg.solve(B,z). Doing this is straightforward (import a time module and put the code in the function); however, this will always time the function, and perhaps I only want to do this when testing. An easy way around this is to add a parameter time_updates = False to the function definition and then branch: if time_updates == False, proceed as above; otherwise, run a copy of the exact same code with some timing functionality added. Is there a better way to do this, perhaps using classes in Python?
My intention is to use this iteration on blocks of data read in from a file that doesn't fit into memory. Whilst it might be possible to store a block in memory, it would be convenient if the function passed over each block only once rather than num_iters times. Passing over the computed quantities (S_A, covariance_mat, etc.) repeatedly is fine, however.
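For the first and third points, one option is to accumulate all num_iters sketches in a single pass: because the entries of a Gaussian sketch are independent, the columns of each S_t that multiply a given block of rows can be drawn when that block is read, and (S_t @ A) is the sum of the per-block products. The sketch below assumes Gaussian sketches, data supplied as an iterable of (A_block, y_block) row blocks, and the hypothetical names ihs_one_pass and seed; a faster structured transform would need its own per-block formulation. It keeps num_iters·m·d floats for the sketched data plus d×d for A^T A, and never revisits a block.

```python
import numpy as np


def ihs_one_pass(blocks, d, sketch_size, num_iters, seed=None):
    """Iterative Hessian sketch that reads each (A_block, y_block) pair exactly once.

    blocks: any iterable of row blocks (A_block of shape (rows, d), y_block of
    shape (rows,)), e.g. chunks read from a file. Hypothetical helper, not a library API.
    """
    m = int(sketch_size)
    rng = np.random.default_rng(seed)
    SA = np.zeros((num_iters, m, d))  # SA[t] = S_t @ A for every iteration t
    cov = np.zeros((d, d))            # A^T A
    ATy = np.zeros(d)                 # A^T y
    for A_blk, y_blk in blocks:
        rows = A_blk.shape[0]
        # Columns of every S_t that multiply these rows: fresh i.i.d. draws per block,
        # so the result has the same distribution as one (m, n) Gaussian S_t per t.
        S_blk = rng.normal(size=(num_iters, m, rows)) / np.sqrt(m)
        SA += S_blk @ A_blk           # (num_iters, m, rows) @ (rows, d) -> (num_iters, m, d)
        cov += A_blk.T @ A_blk
        ATy += A_blk.T @ y_blk
    x = np.zeros(d)
    for t in range(num_iters):
        B = SA[t].T @ SA[t]
        # same update as the question's solve(B, ATy - cov @ x + B @ x), rearranged
        x = x + np.linalg.solve(B, ATy - cov @ x)
    return x
```

Because blocks is only iterated once, it can be a generator that reads chunks from disk. The temporary S_blk has num_iters·m·rows entries, so the block size also bounds that memory.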
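For the timing question, one alternative to duplicating the loop is to make the solver a parameter (for example a hypothetical solve=np.linalg.solve keyword on iterative_hessian, called as solve(B, z)) and pass in a small callable class only when testing. In the sketch below, TimedSolve and run_updates are hypothetical names; run_updates merely stands in for the iteration loop.

```python
import time

import numpy as np


class TimedSolve:
    """Callable wrapper around a solver that records the wall-clock time of each call."""

    def __init__(self, solve=np.linalg.solve):
        self.solve = solve
        self.times = []

    def __call__(self, B, z):
        start = time.perf_counter()
        result = self.solve(B, z)
        self.times.append(time.perf_counter() - start)
        return result

    def mean_time(self):
        return sum(self.times) / len(self.times) if self.times else 0.0


def run_updates(B_list, z_list, solve=np.linalg.solve):
    """Hypothetical stand-in for the iteration loop: it only needs *some* solve callable."""
    return [solve(B, z) for B, z in zip(B_list, z_list)]
```

Normal runs call run_updates(Bs, zs) unchanged; a test passes timer = TimedSolve() as solve and reads timer.mean_time() afterwards. The loop body is written once, and the timing code lives entirely in the wrapper.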
Related
After some research on Stack Overflow, I didn't find a simple answer to my problem, so I am sharing my code in the hope of finding some help.
```python
import numpy as np

S=np.random.random((495,930,495,3,3))
#The shape of S is (495,930,495,3,3)
#I want to calculate for each small array (z,y,x,3,3) some features
for z in range(S.shape[0]):
    for y in range(S.shape[1]):
        for x in range(S.shape[2]):
            res[z,y,x,0]=np.array(np.linalg.det(S[z,y,x])/np.trace(S[z,y,x]))
            res[z,y,x,1]=np.array(S[z,y,x].mean())
            res[z,y,x,2:]=np.array(np.linalg.eigvals(S[z,y,x]))
```
Here is my problem: the S array is huge, so I was wondering whether it is possible to make this for loop faster.
I had to reduce the shape to (49,93,49,3,3) so that it runs in acceptable time on my hardware. I was able to shave off 5-10% by avoiding unnecessary work (not optimizing your algorithm). Unnecessary work includes, but is not limited to:
Performing (global) lookups
Calculating the same value several times
You might also want to try a different Python runtime, such as PyPy instead of CPython.
Here is my updated version of your script:
```python
#!/usr/bin/python
import numpy as np

def main():
    # avoid lookups
    array = np.array
    trace = np.trace
    eigvals = np.linalg.eigvals
    det = np.linalg.det
    #The shape of S is (495,930,495,3,3)
    shape = (49,93,49,3,3) # so my computer can run it
    S=np.random.random(shape)
    # missing from the question; assumed to hold 5 features per 3x3 matrix
    # (det/trace, mean, 3 eigenvalues)
    res = np.empty(shape[:3] + (5,))
    #I want to calculate for each small array (z,y,x,3,3) some features
    # get shape only once, instead of z times for shape1 and z*y times for shape2
    shape1 = S.shape[1]
    shape2 = S.shape[2]
    for z in range(S.shape[0]):
        for y in range(shape1):
            for x in range(shape2):
                # get value once instead of 4 times
                s = S[z,y,x]
                res[z,y,x,0]=array(det(s)/trace(s))
                res[z,y,x,1]=array(s.mean())
                # complex eigenvalues would be cast to real here, discarding the imaginary part
                res[z,y,x,2:]=array(eigvals(s))

# function to have local (vs. global) lookups
main()
```
Runtime was reduced from 25 to 23 seconds (measured with hyperfine).
Useful references:
Why does Python code run faster in a function?
Python: Two simple functions, Why is the first one faster than the second one
Python import X or from X import Y? (performance)
Is there a performance cost putting python imports inside functions?
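Beyond trimming lookups, np.linalg.det and np.linalg.eigvals both accept stacks of matrices with shape (..., M, M), and np.trace can reduce over the last two axes, so the triple loop can be replaced by whole-array calls that run in C. A minimal sketch follows; the names features and features_in_slices and the complex output dtype (eigenvalues of non-symmetric matrices can be complex) are my assumptions. The full (495, 930, 495, 3, 3) float64 array alone is about 16 GB, so it would likely have to be processed in slices, e.g. one z-plane at a time as in the second function.

```python
import numpy as np


def features(S):
    """Per-matrix features for a stack S of shape (..., 3, 3).

    Returns shape (..., 5): det/trace, mean, and the three eigenvalues.
    Complex dtype because eigenvalues of non-symmetric matrices may be complex.
    """
    out = np.empty(S.shape[:-2] + (5,), dtype=complex)
    out[..., 0] = np.linalg.det(S) / np.trace(S, axis1=-2, axis2=-1)
    out[..., 1] = S.mean(axis=(-2, -1))
    out[..., 2:] = np.linalg.eigvals(S)  # batched over all leading axes
    return out


def features_in_slices(S):
    """Same result, computed one z-plane at a time to bound temporary memory."""
    res = np.empty(S.shape[:-2] + (5,), dtype=complex)
    for z in range(S.shape[0]):
        res[z] = features(S[z])
    return res
```

If the 3x3 matrices are known to be symmetric, np.linalg.eigvalsh would return real eigenvalues and the output could stay float64.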
I have some code that calculates the value of a large number of discrete actions and outputs the best action and its value.
```python
A_max = 0
for i in ...:
    A = f(i)
    if A > A_max:
        x = i
        A_max = A
```
I'd like to parallelize this code to save time. My understanding is that since calculating f(i) doesn't depend on calculating f(j) first, I can just use joblib.Parallel for that part of the code and get something like:
```python
results = Parallel(n_jobs=-1)(delayed(f)(i) for i in ...)
A_max = max(results)
x = results.index(A_max)
```
Is this correct?
My next issue is that my code contains a dictionary that the function f alters as it does its calculation. My understanding is that if the code is parallelized, each concurrent process will be altering the same dictionary. Is this correct, and if so, would creating copies of the dictionary at the beginning of f solve the issue?
Finally, in the documentation I'm seeing references to backends called "loky" and "threading". What is the difference between these backends?
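A minimal sketch of the parallel argmax, assuming f is independent per candidate (f and best_action below are hypothetical stand-ins). Parallel returns results in input order, so the position of the maximum has to be mapped back to the candidate; results.index(A_max) only equals i when the candidates are 0, 1, 2, ....

```python
from joblib import Parallel, delayed


def f(i):
    """Hypothetical stand-in for the expensive, independent evaluation of action i."""
    return -(i - 7) ** 2


def best_action(candidates, n_jobs=-1):
    """Evaluate f on every candidate in parallel and return (best candidate, its value)."""
    candidates = list(candidates)
    # Parallel returns results in the same order as the inputs
    results = Parallel(n_jobs=n_jobs)(delayed(f)(i) for i in candidates)
    # position of the largest value (first one on ties, like the sequential loop)
    best = max(range(len(results)), key=results.__getitem__)
    # map the position back to the candidate itself
    return candidates[best], results[best]
```

With this stand-in f, best_action(range(20)) returns (7, 0). On the dictionary question: with process-based backends (the default loky, or multiprocessing), each worker operates on its own copy of any dictionary f touches, so f's changes are not sent back to the parent process; with backend="threading", all threads share the parent's dictionary, and concurrent mutation then needs care.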
I am looking for the correct approach to use a variable number of parameters as input for the optimizer in scipy.
I have a set of input parameters p1,...,pn and I calculate a quality criterion with a function func(p1,...,pn). I want to minimize this value.
The input parameters are either 0 or 1, indicating whether they should be used or not. I cannot simply delete all unused ones from the parameter list, since my function for the quality criterion requires them to be "0" to remove unused terms from equations.
```python
def func(parameters):
    ...  # calculate one scalar as the quality criterion

solution = optimize.fmin_l_bfgs_b(func, parameters, approx_grad=1,
                                  bounds=((0.0, 5.0), ..., (0.0, 5.0)))  # This will vary all parameters
```
Within my code the optimizer runs without errors, but of course all given parameters are changed to achieve the best solution.
Is there a way to have e.g. 10 input parameters for func, but only 5 of them are used in the optimizer?
So far I can only think of changing my func definition so that it no longer needs the "0" input for unused parameters. I would appreciate any ideas on how to avoid that.
Thanks a lot for the help!
If I understand correctly, you are asking for a constrained best fit, such that rather than finding the best [p0,p1,p2...p10] for function func(), you want to find the best [p0, p1, ...p5] for function func() under the condition that p6=fixed6, p7=fixed7, p8=fixed8... and so on.
Translating it into Python code is straightforward if you use args=(something) in scipy.optimize.fmin_l_bfgs_b. First, write a partially fixed function func_fixed():
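```python
import numpy as np
from scipy import optimize

def func_fixed(p_var, p_fixed):
    # the optimizer passes p_var as a NumPy array, so join the parts explicitly
    # (p_var + p_fixed would add element-wise); assumes the fixed parameters come last
    return func(np.concatenate([p_var, p_fixed]))

solution = optimize.fmin_l_bfgs_b(func_fixed, x0=guess_parameters,
                                  approx_grad=True,
                                  bounds=your_bounds,  # bounds for the variable parameters only
                                  args=(your_fixed_parameters,))  # this is the deal; note the trailing comma
```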
func_fixed() is not strictly necessary (a lambda would work too), but it reads much more easily this way.
I recently solved a similar problem where I want to optimise a different subset of parameters at each run but need all parameters to calculate the objective function. I added two arguments to my objective function:
an index array x_idx which indicates which parameters to optimise, i.e. 0 don't optimise and 1 optimise
an array x0 with the initial values of all parameters
In the objective function, I build the full parameter list from the index array: positions marked 1 take the next value from the parameters being optimised, and positions marked 0 keep their initial value.
```python
import numpy
import scipy.optimize

def objective_function(x_optimised, x_idx, x0):
    x = []
    j = 0
    for i, idx in enumerate(x_idx):
        if idx == 1:  # optimised parameter: take the next value from the optimizer
            x.append(x_optimised[j])
            j = j + 1
        else:  # fixed parameter: keep its initial value
            x.append(x0[i])
    x = numpy.array(x)
    return sum(x**2)

if __name__ == '__main__':
    x_idx = [1, 1, 0]
    x0 = [1.1, 1.3, 1.5]
    x_initial = [x for i, x in enumerate(x0) if x_idx[i] == 1]
    xopt, fopt, n_iter, funcalls, warnflag = scipy.optimize.fmin(
        objective_function, x_initial, args=(x_idx, x0),
        maxfun=200, full_output=True)
    print(xopt)
```
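The same idea can be written with a boolean mask, which lets NumPy scatter the optimised values in one assignment, and it works with scipy.optimize.minimize using method="L-BFGS-B" (the bounded optimizer from the question). A minimal sketch, assuming func takes the full parameter vector; optimise_subset and objective are hypothetical names, and objective is only a stand-in quadratic.

```python
import numpy as np
from scipy.optimize import minimize


def objective(x_full):
    """Stand-in objective that needs every parameter (minimum at 0, 1, 2, ...)."""
    return np.sum((x_full - np.arange(len(x_full))) ** 2)


def optimise_subset(func, x0, mask, bounds=None):
    """Minimise func over the entries where mask is True; the rest stay at x0."""
    x0 = np.asarray(x0, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    def wrapped(x_free):
        x = x0.copy()
        x[mask] = x_free  # fixed entries keep their x0 values
        return func(x)

    free_bounds = None if bounds is None else [b for b, m in zip(bounds, mask) if m]
    res = minimize(wrapped, x0[mask], method="L-BFGS-B", bounds=free_bounds)
    x_opt = x0.copy()
    x_opt[mask] = res.x
    return x_opt, res
```

With mask = [True, False, True, False, True] and x0 = zeros(5), only entries 0, 2 and 4 move (towards 0, 2 and 4 here); entries 1 and 3 stay at 0 as required by the quality criterion.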
I had a pretty compact way of computing the partition function of an Ising-like model using itertools, lambda functions, and large NumPy arrays. Given a network consisting of N nodes and Q "states"/node, I have two arrays, h-fields and J-couplings, of sizes (N,Q) and (N,N,Q,Q) respectively. J is upper-triangular, however. Using these arrays, I have been computing the partition function Z using the following method:
```python
import itertools
import numpy as np

# h: (N, Q) fields and J: (N, N, Q, Q) upper-triangular couplings, defined elsewhere
# Set up lambda functions and iteration tuples of the form (A_1, A_2, ..., A_n)
iters = itertools.product(range(Q), repeat=N)
hf = lambda s: h[range(N), s]
jf = lambda s: np.array([J[fi, fj, s[fi], s[fj]]
                         for fi, fj in itertools.combinations(range(N), 2)]).flatten()
# Initialize and populate partition function array
pf = np.zeros(tuple([Q for i in range(N)]))
for it in iters:
    hterms = np.exp(hf(it)).prod()
    jterms = np.exp(-jf(it)).prod()
    pf[it] = jterms * hterms
# Calculates partition function
Z = pf.sum()
```
This method works quickly for small N and Q, say (N,Q) = (5,2). However, for larger systems (N,Q) = (18,3), this method cannot even create the pf array due to memory issues because it has Q^N nontrivial elements. Any ideas on how to either overcome this memory issue or how to alter the code to work on subarrays?
Edit: Made a small mistake in the definition of jf. It has been corrected.
You can avoid the large array just by initializing Z to 0 and incrementing it by jterms * hterms in each iteration. This still won't get you out of calculating and summing Q^N numbers, however. To do that, you probably need to find a way to simplify the partition function algebraically.
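Following the accumulate-instead-of-store idea, the sum can also be taken over sub-blocks of states ("subarrays"): enumerate every configuration of the last r nodes once as an array, then loop in Python only over the configurations of the remaining N - r nodes. Since exp(a)·exp(b) = exp(a + b), each state's weight is exp(sum of h terms minus sum of J terms), the same product as in the question. A minimal sketch, assuming h has shape (N, Q) and J has shape (N, N, Q, Q) with only i < j entries used; partition_function and block_nodes are hypothetical names. It still evaluates all Q**N terms (about 3.9e8 for N = 18, Q = 3), so it fixes memory, not the exponential cost, and very large fields could overflow np.exp.

```python
import itertools

import numpy as np


def partition_function(h, J, block_nodes=8):
    """Sum exp(sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j]) over all Q**N states,
    handling Q**block_nodes states per vectorised step instead of storing all Q**N."""
    N, Q = h.shape
    r = min(block_nodes, N)  # nodes enumerated inside one vectorised block
    pairs = np.array(list(itertools.combinations(range(N), 2)), dtype=int).reshape(-1, 2)
    fi, fj = pairs.T
    # every configuration of the last r nodes, shape (Q**r, r)
    tail = np.array(list(itertools.product(range(Q), repeat=r)), dtype=int).reshape(-1, r)
    nodes = np.arange(N)
    Z = 0.0
    for head in itertools.product(range(Q), repeat=N - r):
        # full states for this block: the fixed head prefix followed by every tail
        prefix = np.broadcast_to(np.array(head, dtype=int), (len(tail), N - r))
        s = np.hstack([prefix, tail])  # shape (Q**r, N)
        energy = h[nodes, s].sum(axis=1) - J[fi, fj, s[:, fi], s[:, fj]].sum(axis=1)
        Z += np.exp(energy).sum()
    return Z
```

Only len(tail) × N state indices and one energy per tail state are held at a time, so block_nodes trades memory for fewer Python-level iterations.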
Not sure what you are trying to compute, but I tested your code with ChrisB's suggestion, and jf will not work for Q=3.
Perhaps you shouldn't use a dense NumPy array to encode your function? You could try sparse arrays, or just straight Python with Numba compilation. This blog post shows Numba applied to the simple Ising model with good performance.
I'm just starting with NumPy so I may be missing some core concepts...
What's the best way to create a NumPy array from a dictionary whose values are lists?
Something like this:
```python
d = {1: [10, 20, 30], 2: [50, 60], 3: [100, 200, 300, 400, 500]}
```
Should turn into something like:
```
data = [
    [10, 20, 30, ?, ?],
    [50, 60, ?, ?, ?],
    [100, 200, 300, 400, 500]
]
```
I'm going to do some basic statistics on each row, eg:
```python
deviations = numpy.std(data, axis=1)
```
Questions:
What's the best / most efficient way to create the numpy.array from the dictionary? The dictionary is large; a couple of million keys, each with ~20 items.
The number of values in each 'row' is different. If I understand correctly, NumPy wants uniform row lengths, so what do I fill in for the missing items to make std() happy?
Update: One thing I forgot to mention: while the Python techniques are reasonable (e.g. looping over a few million items is fast), they are constrained to a single CPU. NumPy operations scale nicely to the hardware and hit all the CPUs, so they're attractive.
You don't need to create numpy arrays to call numpy.std().
You can call numpy.std() in a loop over all the values of your dictionary. Each list will be converted to a NumPy array on the fly to compute the standard deviation.
The downside of this method is that the main loop will be in python and not in C. But I guess this should be fast enough: you will still compute std at C speed, and you will save a lot of memory as you won't have to store 0 values where you have variable size arrays.
If you want to further optimize this, you can store your values into a list of numpy arrays, so that you do the python list -> numpy array conversion only once.
If you find that this is still too slow, try using Psyco to optimize the Python loop.
If this is still too slow, try using Cython together with the numpy module. This tutorial claims impressive speed improvements for image processing. Or simply program the whole std function in Cython (see this for benchmarks and examples with the sum function).
An alternative to Cython would be to use SWIG with numpy.i.
If you want to use only NumPy and have everything computed at C level, try grouping all the records of the same size together in different arrays and call numpy.std() on each of them. It should look like the following example.
Example with O(N) complexity:
```python
import numpy

# data is the dictionary from the question; add one list per row length that occurs
list_size_1 = []
list_size_2 = []
for row in data.values():
    if len(row) == 1:
        list_size_1.append(row)
    elif len(row) == 2:
        list_size_2.append(row)
list_size_1 = numpy.array(list_size_1)
list_size_2 = numpy.array(list_size_2)
std_1 = numpy.std(list_size_1, axis=1)
std_2 = numpy.std(list_size_2, axis=1)
```
While there are already some pretty reasonable ideas here, I believe the following is worth mentioning.
Filling missing data with any default value would spoil the statistical characteristics (std, etc.). Evidently that's why Mapad proposed the nice trick of grouping same-sized records.
The problem with it (assuming no a priori information about record lengths is at hand) is that it involves even more computation than the straightforward solution:
at least O(N*logN) 'len' calls and comparisons for sorting with an efficient algorithm
O(N) checks on a second pass through the list to obtain the groups (their beginning and end indexes on the 'vertical' axis)
Using Psyco is a good idea (it's strikingly easy to use, so be sure to give it a try).
It seems that the optimal way is to take the strategy described by Mapad in bullet #1, but with a modification - not to generate the whole list, but iterate through the dictionary converting each row into numpy.array and performing required computations. Like this:
```python
import numpy

for row in data.values():
    np_row = numpy.array(row)
    this_row_std = numpy.std(np_row)
    # compute any other statistic descriptors needed and then save to some list
```
In any case, a few million loop iterations in Python won't take as long as one might expect. Besides, this doesn't look like a routine computation, so an extra second or minute hardly matters if it runs only once in a while, or even just once.
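On the question of what to fill in: one fill value that does not spoil the statistics is NaN, combined with NumPy's NaN-aware reductions such as np.nanstd and np.nanmean, which skip the padding. A minimal sketch; dict_to_padded is a hypothetical name. The padded array needs len(d) × (longest row) floats, so it suits rows of similar length, such as the ~20 items per key mentioned in the question.

```python
import numpy as np


def dict_to_padded(d):
    """Return (keys, array) with one row per key, padded with NaN to the longest row."""
    keys = list(d)
    width = max(len(v) for v in d.values())
    out = np.full((len(keys), width), np.nan)
    for r, k in enumerate(keys):
        row = d[k]
        out[r, :len(row)] = row
    return keys, out


d = {1: [10, 20, 30], 2: [50, 60], 3: [100, 200, 300, 400, 500]}
keys, data = dict_to_padded(d)
deviations = np.nanstd(data, axis=1)  # NaN padding is ignored row by row
```

Each entry of deviations matches np.std of the original list for that key, and the reduction itself runs as one vectorised NumPy call.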
A generalized variant of what was suggested by Mapad:
```python
import numpy
from numpy import mean, std


def get_statistical_descriptors(a):
    ax = numpy.ndim(a) - 1  # reduce along the last axis (within each row)
    functions = [mean, std]
    return [f(a, axis=ax) for f in functions]


def process_long_list_stats(data):
    # group the keys by row length so that each group forms a rectangular array
    groups = {}
    for key, row in data.items():
        size = len(row)
        try:
            groups[size].append(key)
        except KeyError:
            groups[size] = [key]
    results = []
    for gr_keys in groups.values():
        gr_rows = numpy.array([data[k] for k in gr_keys])
        stats = get_statistical_descriptors(gr_rows)
        # pair each key with its (mean, std) tuple
        results.extend(zip(gr_keys, zip(*stats)))
    return dict(results)
```
You can use a structured array to preserve the ability to address a numpy object by a key, like a dictionary.
```python
import numpy as np

dd = {'a': 1, 'b': 2, 'c': 3}
# one float field per key (no eval needed to build the dtype)
dtype = [(key, float) for key in dd]
values = [tuple(dd.values())]
numpy_dict = np.array(values, dtype=dtype)
numpy_dict['c']
```
will now output
array([3.])