After some research on Stack Overflow, I didn't find a simple answer to my problem, so I am sharing my code here in the hope of getting some help.
import numpy as np

S = np.random.random((495,930,495,3,3))
#The shape of S is (495,930,495,3,3)
#I want to calculate for each small array (z,y,x,3,3) some features
for z in range(S.shape[0]):
    for y in range(S.shape[1]):
        for x in range(S.shape[2]):
            res[z,y,x,0]=np.array(np.linalg.det(S[z,y,x])/np.trace(S[z,y,x]))
            res[z,y,x,1]=np.array(S[z,y,x].mean())
            res[z,y,x,2:]=np.array(np.linalg.eigvals(S[z,y,x]))
Here is my problem: the S array is huge, so I was wondering whether it is possible to make this for loop faster.
I had to reduce the shape to (49,93,49,3,3) so that it runs in acceptable time on my hardware. I was able to shave off 5-10% by avoiding unnecessary work (not optimizing your algorithm). Unnecessary work includes, but is not limited to:
Performing (global) lookups
Calculating the same value several times
You might also want to try a different python runtime, such as PyPy instead of CPython.
Here is my updated version of your script:
#!/usr/bin/python
import numpy as np

def main():
    # avoid lookups
    array = np.array
    trace = np.trace
    eigvals = np.linalg.eigvals
    det = np.linalg.det

    #The shape of S is (495,930,495,3,3)
    shape = (49,93,49,3,3)  # so my computer can run it
    S = np.random.random(shape)
    res = np.ndarray(shape)  # missing from the question, I hope this is correct
    #I want to calculate for each small array (z,y,x,3,3) some features

    # get shape only once, instead of z times for shape1 and z*y times for shape2
    shape1 = S.shape[1]
    shape2 = S.shape[2]
    for z in range(S.shape[0]):
        for y in range(shape1):
            for x in range(shape2):
                # get value once instead of 4 times
                s = S[z,y,x]
                res[z,y,x,0] = array(det(s)/trace(s))
                res[z,y,x,1] = array(s.mean())
                res[z,y,x,2:] = array(eigvals(s))

# function to have local (vs. global) lookups
main()
Runtime was reduced from 25 to 23 seconds (measured with hyperfine).
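Beyond these micro-optimisations, the loop itself can be removed: np.linalg.det and np.linalg.eigvals accept stacked (..., 3, 3) arrays, and the trace and mean can be taken over the last two axes. A rough sketch (using the reduced shape; keeping only the real part of the eigenvalues is an assumption on my side, since the random 3x3 blocks are not symmetric):

import numpy as np

S = np.random.random((49, 93, 49, 3, 3))

res = np.empty(S.shape[:3] + (5,))
# det, trace and mean all operate on the stacked 3x3 blocks at once
res[..., 0] = np.linalg.det(S) / np.trace(S, axis1=-2, axis2=-1)
res[..., 1] = S.mean(axis=(-2, -1))
# eigvals also accepts stacked matrices; taking .real is an assumption (see above)
res[..., 2:] = np.linalg.eigvals(S).real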
Useful references:
Why does Python code run faster in a function?
Python: Two simple functions, Why is the first one faster than the second one
Python import X or from X import Y? (performance)
Is there a performance cost putting python imports inside functions?
In my newbie Python 3.7 project, the arguments of many functions are numpy.ndarrays. These must be two-dimensional r x n matrices. The row dimension r is essential: some functions require 1 x n vectors, others 2 x n matrices, with r up to three and possibly more. There are also functions defined for any r x n array. (The column dimension n is not essential for design purposes.)
From my Matlab experience, this requirement can get confusing and error-prone. So I've considered the following approaches:
Document the method arguments (of course!)
Unit tests (of course!)
Do validation and throw exceptions inside some functions. (However, this is not very functional, nor performant.)
Define data classes: OneRow, TwoRows, ThreeRows and FourPlusRows. Each has an ndarray field, validated in the constructor. The upside includes type hints and a better domain modelling, a la DDD. A downside is extra complexity.
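A minimal sketch of the data-class approach (the validation logic here is purely illustrative, not taken from any particular library):

from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class TwoRows:
    """A 2 x n matrix; the row count is checked once, at construction."""
    data: np.ndarray

    def __post_init__(self):
        if self.data.ndim != 2 or self.data.shape[0] != 2:
            raise ValueError(f"expected a 2 x n array, got shape {self.data.shape}")

# usage: TwoRows(np.zeros((2, 5))) is fine, TwoRows(np.zeros((3, 5))) raises ValueError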
Question: Given the type hints introduced in Python 3 and the trend towards functional programming, what's the current pythonic approach to this problem?
One of the best things about Python is duck typing, and Numpy is in general very compatible with that design approach. Say you have a vector-only function vecfunc. You can add some boilerplate to the beginning of the function that will inflate any 1D arrays into 1 x n vectors:
def vecfunc(arr):
    if arr.ndim==1:
        arr = arr[None, :]
    ...function body goes here...
This will avoid any problems due to arr having too few dimensions, and will likely still give correct behavior in most cases. However, it doesn't do anything to prevent a user from passing in, say, an r x n x m array, or a 15 x n array. Ultimately, you're going to have to go with approach 3 for a bunch of this stuff and just throw some exceptions where it seems appropriate. For example:
def vecfunc(arr):
    if not 0 < arr.ndim < 3:
        raise ValueError("arr must have ndim of 1 or 2. arr.ndim: %d" % arr.ndim)
    elif arr.ndim==1:
        arr = arr[None, :]
If it makes you feel any better, the code bases of both numpy and scipy have those kinds of shape-based exception checks in a number of functions, when and where they're needed.
Of course, you could always leave off adding those kinds of exception checks until the very end of developing any given function. You may be surprised at the range of input that produces reasonable behavior.
If you're dead set on type annotations, you can get something similar by writing your code using Cython. For example, if you wanted an add function that only took 2D integer arrays, you could write the following function in a .pyx file:
import numpy as np

def add(long[:, :] arr1, long[:, :] arr2):
    assert tuple(arr1.shape) == tuple(arr2.shape)
    result = np.zeros((arr1.shape[0], arr1.shape[1]), dtype=np.long)
    cdef long[:, :] result_view = result
    for x in range(arr1.shape[0]):
        for y in range(arr1.shape[1]):
            result_view[x, y] = arr1[x, y] + arr2[x, y]
    return result
For more details on writing and compiling Cython, see the docs linked above.
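For reference, a minimal build script for such an extension might look like the following (the file name add.pyx is assumed here):

# setup.py -- compile the Cython extension with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("add.pyx"))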
This isn't so much "type annotations" as it is actual strong typing, but it may do what you want. Sadly, I wasn't able to find a way to fix the size of a single dimension, just the total number of dimensions.
I have a function written in python which does two procedures:
Preprocessing: read in data from an array and compute some values that I will later need to prevent repeated computation
Iterate and compute a 'summary' of the data at every stage and use this to solve an optimisation problem.
The code is as follows:
import numpy as np

def iterative_hessian(data, targets,
                      sketch_method, sketch_size, num_iters):
    '''
    Original problem is min 0.5*||Ax-b||_2^2
    iterative_hessian asks us to minimise 0.5*||S_Ax||_2^2 - <A^Tb, x>
    for a summary of the data S_A
    '''
    A = data
    y = targets
    n,d = A.shape
    x0 = np.zeros(shape=(d,))
    m = int(sketch_size)  # sketching dimension
    ATy = A.T@y
    covariance_mat = A.T.dot(A)
    for n_iter in range(int(num_iters)):
        S_A = m**(-0.5)*np.random.normal(size=(m, n))
        B = S_A.T.dot(S_A)
        z = ATy - covariance_mat@x0 + np.dot(S_A.T, np.dot(S_A,x0))
        x_new = np.linalg.solve(B,z)
        x0 = x_new
    return np.ravel(x0)
In practice I do not use the S_A = m**(-0.5)*np.random.normal(size=(m, n)) line but a different random transform which is faster to apply; in principle, though, it is sufficient for the question. This code works well for what I need, but I was wondering if there is a reasonable way to do the following:
Instead of repeating the line S_A = m**(-0.5)*np.random.normal(size=(m, n)) in every iteration, is there a way to generate all of the independent random copies of S_A that are needed (num_iters of them, which can be thought of as between 10 and 30) before the iteration starts, scanning through the input only once? I think this would store the S_A matrices in some kind of multi-dimensional array, but I'm not sure how best to do this, or whether it is even practical. I have tried a basic example doing this in parallel, but it is slower than repeatedly passing through the matrix.
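For this first point, a minimal sketch of what such a pre-generated stack could look like (the sizes below are purely illustrative):

import numpy as np

num_iters, m, n = 20, 100, 500                 # illustrative sizes only
# draw every sketching matrix up front; S_all[k] plays the role of S_A in iteration k
S_all = m ** (-0.5) * np.random.normal(size=(num_iters, m, n))

for n_iter in range(num_iters):
    S_A = S_all[n_iter]                        # no new random draw inside the loop
    # ... rest of the iteration as in iterative_hessian ...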
Suppose that I want to endow this function with more features, for instance returning the average time taken on the line x_new = np.linalg.solve(B,z). Doing this is straightforward: import the time module and put the timing code in the function. However, this will always time the function, and perhaps I only want to do so when testing. An easy way around this is to add a parameter time_updates = False to the function definition and then, if time_updates is False, proceed as above; otherwise run the exact same code with some timing functionality added. Is there a better way to do this, perhaps using classes in Python?
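For this second point, one lightweight alternative to duplicating the loop body is to factor the solve into a small helper that only times itself when asked; a hedged sketch (timed_solve and timings are hypothetical names, not part of the code above):

import time

import numpy as np

def timed_solve(B, z, timings=None):
    """Solve B x = z; if a list is passed as timings, record the elapsed time."""
    if timings is None:
        return np.linalg.solve(B, z)
    t0 = time.perf_counter()
    x = np.linalg.solve(B, z)
    timings.append(time.perf_counter() - t0)
    return x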
My intention is to use this iteration on blocks of data read in from a file which doesn't fit into memory. Whilst it might be possible to store a block in memory, it would be convenient if the function only passed over that block once rather than num_iters times. Passing over the computed quantities, S_A, covariance_matrix etc., is fine, however.
This question already has an answer here:
Recursive definitions in Pandas
I'm trying to implement a low-pass filter on accelerometer data (with x-acceleration (ax), y-acceleration (ay), z-acceleration (az)).
I have calculated my alpha to be 0.2.
The DC component along the x direction is calculated using the formula
new_ax[n] = (1-alpha)*new_ax[n-1] + (alpha * ax[n])
I'm able to calculate this for a small dataset with a few thousand records, but I have a dataset with a million records and it takes forever to run with the code below. I would appreciate any help improving the time complexity of my code.
### df is a pandas dataframe object
n_ax = []
seq = range(0, 1000000, 128)
for w in range(len(seq)):
    prev_x = 0
    if w+1 <= len(seq):
        subdf = df[seq[w]:seq[w+1]]
        for i in range(len(subdf)):
            n_ax.append((1-alpha)*prev_x + (alpha*subdf.ax[i]))
            prev_x = n_ax[i]
First, it seems you don't need
if w+1 <= len(seq):
since w never exceeds len(seq) - 1, so the condition is always true.
To decrease processing time, just use the numpy module:
import numpy
Here you will find arrays and methods that are much faster than built-in lists. For example, instead of looping through every element of a numpy array to do some processing, you can apply a numpy function directly to the array and get the result in seconds rather than hours. As an example:
data = numpy.arange(0, 1000000, 128)
shiftData = numpy.arange(128, 1000000, 128)
result = (1-alpha)*data[:-1] + shiftData
Check out some tutorials on numpy. I use this module for processing image data; by comparison, looping through lists would have taken me 2 weeks to process 5000+ images, while using numpy types takes at most 2 minutes.
Assuming you are using python 2.7.
Use xrange.
Computing len(seq) inside the loop is not necessary, since its value is not changing.
Accessing seq is not really needed, since you can compute the indices on the fly.
You don't really need the if statement, since in your code it always evaluates to true (w in range(len(seq)) means w's maximum value will be len(seq)-1).
The slicing you are doing to get subdf is not really necessary, since you can access df directly (and slicing creates a new list).
See the code below.
n_ax = []
SUB_SAMPLE = 128
SAMPLE_LEN = 1000000
seq_len = SAMPLE_LEN/SUB_SAMPLE
for w in xrange(seq_len):
    prev_x = 0
    for i in xrange(w*SUB_SAMPLE,(w+1)*SUB_SAMPLE):
        new_x = (1-alpha)*prev_x + (alpha*df.ax[i])
        n_ax.append(new_x)
        prev_x = new_x
I cannot think of any other obvious optimizations. If this is still slow, perhaps you should consider copying the df data to a native Python data type. If these are all floats, use the Python array module, which gives very good performance.
And if you still need better performance, you can try parallelism with the multiprocessing module, or write a C module that takes an array in memory and does the computation, and call it with ctypes python library.
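If bringing in SciPy is an option, the recursion itself can be vectorized with scipy.signal.lfilter, which implements exactly this kind of first-order IIR filter. A sketch, using synthetic data in place of df.ax and reproducing the original code's restart of the filter every 128 samples:

import numpy as np
from scipy.signal import lfilter

alpha = 0.2
ax = np.random.random(999936)          # stand-in for df.ax, trimmed to a multiple of 128
# new_ax[n] = (1-alpha)*new_ax[n-1] + alpha*ax[n]  maps to  b = [alpha], a = [1, -(1-alpha)]
blocks = ax.reshape(-1, 128)           # the original loop resets prev_x every 128 samples
n_ax = lfilter([alpha], [1.0, -(1.0 - alpha)], blocks, axis=1).ravel()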
I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me a 5x5 array of zeros. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        .........
        "do function and fill the array corresponding to (i,j)"
however, right now I would like the value for (2, 10) to sit at a[2,10], since it is a function of 2 and 10, but instead the index for a function of 2 and 10 would be a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I could not vectorize in Cython. Further, I used Joblib to parallelize the operation. I stored the results in a list because an array was not being filled correctly when running in parallel. I then used itertools to split the list into individual results and pandas to organize the results.
Thank you for all the help.
Some tips for you to get things done while keeping good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys):
    return xs**2 + ys**2 + xs*ys
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding of how array operations work. This will help you design functions that handle arrays properly.
First, you are missing some parentheses in the call to zeros; the first argument should be a tuple:
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i/2 and j/2 :
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        # do function and fill the array corresponding to (i,j)
        a[i/2, j/2] = 1
But I second Saullo Castro, you should try to vectorize your computations.
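For instance, a possible vectorized version of the fill, assuming the function can operate element-wise (the f below is just a placeholder):

import numpy as np

def f(i, j):
    return i**2 + j**2               # placeholder for the actual function of (i, j)

# grids holding the i and j values themselves, laid out on the 5x5 array
I, J = np.meshgrid(np.arange(0, 10, 2), np.arange(0, 10, 2), indexing='ij')
a = f(I, J)                          # a[i//2, j//2] now holds f(i, j)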
I had a pretty compact way of computing the partition function of an Ising-like model using itertools, lambda functions, and large NumPy arrays. Given a network consisting of N nodes and Q "states"/node, I have two arrays, h-fields and J-couplings, of sizes (N,Q) and (N,N,Q,Q) respectively. J is upper-triangular, however. Using these arrays, I have been computing the partition function Z using the following method:
# Set up lambda functions and iteration tuples of the form (A_1, A_2, ..., A_n)
iters = itertools.product(range(Q), repeat=N)
hf = lambda s: h[range(N), s]
jf = lambda s: np.array([J[fi,fj,s[fi],s[fj]]
                         for fi,fj in itertools.combinations(range(N),2)]).flatten()

# Initialize and populate partition function array
pf = np.zeros(tuple([Q for i in range(N)]))
for it in iters:
    hterms = np.exp(hf(it)).prod()
    jterms = np.exp(-jf(it)).prod()
    pf[it] = jterms * hterms

# Calculates partition function
Z = pf.sum()
This method works quickly for small N and Q, say (N,Q) = (5,2). However, for larger systems (N,Q) = (18,3), this method cannot even create the pf array due to memory issues because it has Q^N nontrivial elements. Any ideas on how to either overcome this memory issue or how to alter the code to work on subarrays?
Edit: Made a small mistake in the definition of jf. It has been corrected.
You can avoid the large array just by initializing Z to 0 and incrementing it by jterms * hterms in each iteration. This still won't get you out of calculating and summing Q^N numbers, however. To do that, you probably need to figure out a way to simplify the partition function algebraically.
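A minimal sketch of that accumulation, reusing the hf/jf definitions from the question and small random h and J purely for illustration:

import itertools

import numpy as np

N, Q = 5, 2                                    # small example sizes
h = np.random.random((N, Q))
J = np.random.random((N, N, Q, Q))             # only the i < j entries are used below

hf = lambda s: h[range(N), s]
jf = lambda s: np.array([J[fi, fj, s[fi], s[fj]]
                         for fi, fj in itertools.combinations(range(N), 2)])

Z = 0.0                                        # accumulate instead of filling a Q**N array
for it in itertools.product(range(Q), repeat=N):
    Z += np.exp(hf(it)).prod() * np.exp(-jf(it)).prod()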
Not sure what you are trying to compute, but I tested your code with ChrisB's suggestion and jf will not work for Q=3.
Perhaps you shouldn't use a dense numpy array to encode your function? You could try sparse arrays or just straight Python with Numba compilation. This blogpost shows using Numba on the simple Ising model with good performance.
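To make the Numba route concrete, here is a hedged sketch that enumerates the Q**N states with a base-Q counter inside a jitted function, so nothing of size Q**N is ever stored; it assumes the same energy convention as the question's code (exp of the h terms times exp of minus the J terms):

import numpy as np
from numba import njit

@njit
def partition_function(h, J, N, Q):
    """Sum exp(sum_i h[i, s_i]) * exp(-sum_{i<j} J[i, j, s_i, s_j]) over all Q**N states."""
    Z = 0.0
    s = np.zeros(N, dtype=np.int64)            # current state, used as a base-Q counter
    for _ in range(Q ** N):
        e = 0.0
        for i in range(N):
            e += h[i, s[i]]
            for j in range(i + 1, N):
                e -= J[i, j, s[i], s[j]]
        Z += np.exp(e)
        k = N - 1                              # advance the counter to the next state
        while k >= 0:
            s[k] += 1
            if s[k] < Q:
                break
            s[k] = 0
            k -= 1
    return Z

N, Q = 10, 3
h = np.random.random((N, Q))
J = np.random.random((N, N, Q, Q))             # only the upper-triangular entries are used
print(partition_function(h, J, N, Q))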