Is there an easier way to get the sum of all values (assuming they are all numbers) in an ndarray :
import numpy as np
m = np.array([[1,2],[3,4]])
result = 0
(dim0,dim1) = m.shape
for i in range(dim0):
for j in range(dim1):
result += m[i,j]
print result
The above code seems somewhat verbose for a straightforward mathematical operation.
Thanks!
Just use numpy.sum():
result = np.sum(matrix)
or equivalently, the .sum() method of the array:
result = matrix.sum()
By default this sums over all elements in the array - if you want to sum over a particular axis, you should pass the axis argument as well, e.g. matrix.sum(0) to sum over the first axis.
As a side note your "matrix" is actually a numpy.ndarray, not a numpy.matrix - they are different classes that behave slightly differently, so it's best to avoid confusing the two.
Yes, just use the sum method:
result = m.sum()
For example,
In [17]: m = np.array([[1,2],[3,4]])
In [18]: m.sum()
Out[18]: 10
By the way, NumPy has a matrix class which is different than "regular" numpy arrays. So calling a regular ndarray matrix causes some cognitive dissonance. To help others understand your code, you may want to change the name matrix to something else.
Related
I would like to calculate the log-ratios for my 2D array, e.g.
a = np.array([[3,2,1,4], [2,1,1,6], [1,5,9,1], [7,8,2,2], [5,3,7,8]])
The formula is ln(x/g(x)), where g(x) is the geometric mean of each row. I execute it like this:
logvalues = np.array(a) # the values will be overwritten through the code below.
for i in range(len(a)):
row = np.array(a[i])
geo_mean = row.prod()**(1.0/len(row))
flr = lambda x: math.log(x/geo_mean)
logvalues = np.array([flr(x) for x in row])
I was wondering if there is any way to vectorise the above lines (preferably without introducing other modules) to make it more efficient?
This should do the trick:
geo_means = a.prod(1)**(1/a.shape[1])
logvalues = np.log(a/geo_means[:, None])
Another way you could do this is just write the function as though for a single 1-D array, ignoring the 2-D aspect:
def f(x):
return np.log(x / x.prod()**(1.0 / len(x)))
Then if you want to apply it to all rows in a 2-D array (or N-D array):
>>> np.apply_along_axis(f, 1, a)
array([[ 0.30409883, -0.10136628, -0.79451346, 0.5917809 ],
[ 0.07192052, -0.62122666, -0.62122666, 1.17053281],
[-0.95166562, 0.65777229, 1.24555895, -0.95166562],
[ 0.59299864, 0.72653003, -0.65976433, -0.65976433],
[-0.07391256, -0.58473818, 0.26255968, 0.39609107]])
Some other general notes on your attempt:
for i in range(len(a)): If you want to loop over all rows in an array it's generally faster to do simply for row in a. NumPy can optimize this case somewhat, whereas if you do for idx in range(len(a)) then for each index you have to again index the array with a[idx] which is slower. But even then it's better not to use a for loop at all where possible, which you already know.
row = np.array(a[i]): The np.array() isn't necessary. If you index an multi-dimensional array the returned value is already an array.
lambda x: math.log(x/geo_mean): Don't use math functions with NumPy arrays. Use the equivalents in the numpy module. Wrapping this in a function adds unnecessary overhead as well. Since you use this like [flr(x) for x in row] that's just equivalent to the already vectorized NumPy operations: np.log(row / geo_mean).
There is a simple function, which intends to accept a scalar parameter, but also works for a numpy matrix. Why does the function fun works for a matrix?
>>> import numpy as np
>>> def fun(a):
return 1.0 / a
>>> b = 2
>>> c = np.mat([1,2,3])
>>> c
matrix([[1, 2, 3]])
>>> fun(b)
0.5
>>> fun(c)
matrix([[ 1. , 0.5 , 0.33333333]])
>>> v_fun = np.vectorize(fun)
>>> v_fun(b)
array(0.5)
>>> v_fun(c)
matrix([[ 1. , 0.5 , 0.33333333]])
It seems like fun is vectorized somehow, because the explictly vectorized function v_fun behaves same on matrix c. But their get different outputs on scalar b. Could anybody explain it? Thanks.
What happens in the case of fun is called broadcasting.
General Broadcasting Rules
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
they are equal, or
one of them is 1
If these conditions are not met, a ValueError: frames are not aligned exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.
fun already works for both scalars and arrays - because elementwise division is defined for both (their own methods). fun(b) does not involve numpy at all, that just a Python operation.
np.vectorize is meant to take a function that only works with scalars, and feed it elements from an array. In your example it first converts b into an array, np.array(b). For both c and this modified b, the result is an array of matching size. c is a 2d np.matrix, and result is the same. Notice that fun(b) is type array, not matrix.
This not a good example of using np.vectorize, nor an example of broadcasting. np.vectorize is a rather 'simple minded' function and doesn't handle scalars in a special way.
1/c or even b/c works because c, an array 'knows' about division. Similarly array multiplication and addition are defined: 1+c or 2*c.
I'm tempted to mark this as a duplicate of
Python function that handles scalar or arrays
I'm having trouble understanding what B = A(~any(A < threshold, 2), :); (in MATLAB) does given array A with dimensions N x 3.
Ultimately, I am trying to implement a function do perform the same operation in Python (so far, I have something like B = A[not any(A[:,1] < threshold), :], which I know to be incorrect), and I was wondering what the numpy equivalent to such an operation would be.
Thank you!
Not much of difference really. In MATLAB, you are performing ANY along the rows with any(...,2). In NumPy, you have axis to denote those dimensions and for a 2D array, it would be np.any(...,axis=1).
Thus, the NumPy equivalent implementation would be -
import numpy as np
B = A[~np.any(A < threshold,axis=1),:]
This indexing is also termed as slicing in NumPy terminology. Since, we are slicing along the first axis, we can drop the all-elements-selection along the rest of the axes. So, it would simplify to -
B = A[~np.any(A < threshold,axis=1)]
Finally, we can use the method ndarray.any and skip the mention of axis parameter to shorten the code further, like so -
B = A[~(A < threshold).any(1)]
I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me an array of zeros that will be 5X5. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
for j in xrange(0,10,2):
.........
"do function and fill the array corresponding to (i,j)"
however, right now what I would like to be a[2,10] is a function of 2 and 10 but instead the index for a function of 2 and 10 would be a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I could not in Cython. Further I used Joblib to Parallelize the operation. I stored the results in a list because an array was not filling right when running in Parallel. I then used Itertools to split the list into individual results and Pandas to organize the results.
Thank you for all the help
Some tips for your to get the things done keeping a good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys)
return x**2 + y**2 + x*y
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding about how the arrays's operations work. This will help you to design a function that can handle properly the arrays.
First, you lack some parenthesis with zeros, the first argument should be a tuple :
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i/2 and j/2 :
for i in xrange(0,10,2):
for j in xrange(0,10,2):
# do function and fill the array corresponding to (i,j)
a[i/2, j/2] = 1
But I second Saullo Castro, you should try to vectorize your computations.
I would like to apply a function to a monodimensional array 3 elements at a time, and output for each of them a single element.
for example I have an array of 13 elements:
a = np.arange(13)**2
and I want to apply a function, let's say np.std as an example.
Here is the equivalent list comprehension:
[np.std(a[i:i+3]) for i in range(0, len(a),3)]
[1.6996731711975948,
6.5489609014628334,
11.440668201153674,
16.336734339790461,
0.0]
does anyone know a more efficient way using numpy functions?
The simplest way is to reshape it and apply the function along an axis.
import numpy as np
a = np.arange(12)**2
b = a.reshape(4,3)
print np.std(b, axis=1)
If you need a little better performance than that, you could try stride_tricks. Below is the same as above except using stride_tricks. I was wrong about the performance gain, because as you can see below, b becomes exactly the same view as b above. I wouldn't be surprised if they compiled to exactly the same thing.
import numpy as np
a = np.arange(12)**2
b = np.lib.stride_tricks.as_strided(a, shape=(4,3), strides=(a.itemsize*3, a.itemsize))
print np.std(b, axis=1)
Are you talking about something like vectorize? http://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
You can reshape it. But that does require that the size not change. If you can tack on some bogus entries at the end you can do this:
[np.std(s) for s in a.reshape(-1,3)]