My stack is something like this:
array([[[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]]])
I want this result:
array([[ 1.5,  2. ,  2.5],
       [ 3. ,  3.5,  4. ],
       [ 4.5,  5. ,  5.5]])
I updated my question; I think it's clearer now.
Well, first, you don't have a stack of 2D arrays, you have three separate variables.
Fortunately, most functions in NumPy take an array_like argument. And the tuple (a, b, c) is "array-like" enough—it'll be converted into the 3D array that you should have had in the first place.
Anyway, the obvious function to take the mean is np.mean. As the docs say:
The average is taken over the flattened array by default, otherwise over the specified axis.
So just specify the axis you want—the newly-created axis 0.
np.mean((a,b,c), axis=0)
In your updated question, you now have a single 2x3x3 array, a, instead of three 2x2 arrays, a, b, and c, and you want the mean across the first axis (the one with dimension 2). This is the same thing, but slightly easier:
np.mean(a, axis=0)
Of course the mean of 4, 7, and 3 is 4.666666666666667, not 4. In your updated question, that seems to be what you want; in your original question… I'm not sure if you wanted to truncate or round, or if you wanted the median or something else rather than the mean, but those are all easy (add dtype=np.int64 to the call, call .round() on the result, call np.median instead of np.mean, etc.).
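Here is that computation on the updated 2x3x3 stack, as a quick check:

```python
import numpy as np

# The 2x3x3 stack from the updated question
a = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
              [[2, 2, 2], [2, 2, 2], [2, 2, 2]]])

# Average the two 3x3 layers elementwise by collapsing axis 0
result = np.mean(a, axis=0)
print(result)
# [[1.5 2.  2.5]
#  [3.  3.5 4. ]
#  [4.5 5.  5.5]]
```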
>>> a = np.array([[1,2],[3,4]])
>>> b = np.array([[1,5],[6,7]])
>>> c = np.array([[1,8],[8,3]])
>>> np.mean((a,b,c), axis=0)
array([[ 1. , 5. ],
[ 5.66666667, 4.66666667]])
From your expected output, it seems you are looking for the median rather than the mean.
>>> np.median((a,b,c), axis=0)
array([[ 1., 5.],
[ 6., 4.]])
I am trying to create a numpy ndarray object with the following code:
a = np.ndarray(shape=(3,3),dtype='float32',buffer=np.array([[100,2,3],[4,5,6],[7,8,9]]))
and this returns the following:
[[1.4e-43 0.0e+00 2.8e-45]
[0.0e+00 4.2e-45 0.0e+00]
[5.6e-45 0.0e+00 7.0e-45]]
why does it return the different values than what I've specified?
It seems like float32 is changing things, since with dtype='int', like:
a = np.ndarray(shape=(3,3),dtype='int',buffer=np.array([[100,2,3],[4,5,6],[7,8,9]]))
this returns the expected values:
[[100 2 3]
[ 4 5 6]
[ 7 8 9]]
Why doesn't it work when dtype='float32'?
As written, a is nothing but an array of garbage values with dtype 'float32'.
The problem in the first case is a mismatch between the dtype you provided to np.ndarray() ('float32') and the default integer dtype of np.array() (int64 on most platforms): the integer buffer's raw bytes get reinterpreted as float32.
To get the required result, pass dtype='float32' to np.array() as well:
a = np.ndarray(shape=(3, 3), dtype='float32',
               buffer=np.array([[100, 2, 3], [4, 5, 6], [7, 8, 9]], dtype='float32'))
[[100., 2., 3.]
[4., 5., 6.]
[7., 8., 9.]]
Remember that a's dtype is float, which is why the values print with a decimal point.
The second case works because dtype='int' matches the default dtype of np.array(), so the buffer's bytes are interpreted as you intended.
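A small sketch showing both cases side by side (this assumes a platform where np.array()'s default integer dtype is int64; on some platforms, e.g. Windows, it is int32 and the garbage values will differ):

```python
import numpy as np

buf = np.array([[100, 2, 3], [4, 5, 6], [7, 8, 9]])  # default integer dtype

# Mismatched dtypes: the integers' raw bytes are reinterpreted as float32,
# producing tiny denormal values like 1.4e-43
bad = np.ndarray(shape=(3, 3), dtype='float32', buffer=buf)

# Matching dtypes: the bytes are read back as the float32 values we wrote
good = np.ndarray(shape=(3, 3), dtype='float32',
                  buffer=np.array([[100, 2, 3], [4, 5, 6], [7, 8, 9]],
                                  dtype='float32'))
print(good)
```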
I have an array of N-dimensional vectors.
data = np.array([[5, 6, 1], [2, 0, 8], [4, 9, 3]])
In [1]: data
Out[1]:
array([[5, 6, 1],
[2, 0, 8],
[4, 9, 3]])
I'm using sklearn's pairwise_distances function to compute a matrix of distance values. Note that this matrix is symmetric about the diagonal.
dists = pairwise_distances(data)
In [2]: dists
Out[2]:
array([[ 0. , 9.69535971, 3.74165739],
[ 9.69535971, 0. , 10.48808848],
[ 3.74165739, 10.48808848, 0. ]])
I need the indices corresponding to the top N values in this matrix dists, because these indices will correspond to the pairs of rows in data that represent vectors with the greatest distances between them.
I have tried doing np.argmax(np.max(dists, axis=1)) to get the index of the row containing the max value, and np.argmax(np.max(dists, axis=0)) to get the index of the column containing it, but note that:
In [3]: np.argmax(np.max(dists, axis=1))
Out[3]: 1
In [4]: np.argmax(np.max(dists, axis=0))
Out[4]: 1
and:
In [5]: dists[1, 1]
Out[5]: 0.0
Because the matrix is symmetric about the diagonal, and because argmax returns the first index it finds with the max value, I end up with the diagonal cell at the row and column where the max value is stored, instead of the row and column of the top values themselves.
At this point I'm sure I could write some more code to find the values I'm looking for, but surely there is an easier way to do what I'm trying to do. So I have two questions that are more or less equivalent:
How can I find the indices corresponding to the top N values in a matrix, or, equivalently, how can I find the vectors with the top N pairwise distances from an array of vectors?
I'd ravel, argsort, and then unravel. I'm not claiming this is the best way, only that it's the first way that occurred to me, and I'll probably delete it in shame after someone posts something more obvious. :-)
That said (choosing the top 2 values, arbitrarily):
In [73]: dists = sklearn.metrics.pairwise_distances(data)
In [74]: dists[np.tril_indices_from(dists, -1)] = 0
In [75]: dists
Out[75]:
array([[ 0. , 9.69535971, 3.74165739],
[ 0. , 0. , 10.48808848],
[ 0. , 0. , 0. ]])
In [76]: ii = np.unravel_index(np.argsort(dists.ravel())[-2:], dists.shape)
In [77]: ii
Out[77]: (array([0, 1]), array([1, 2]))
In [78]: dists[ii]
Out[78]: array([ 9.69535971, 10.48808848])
As a slight improvement over the otherwise very good answer by DSM, instead of using np.argsort(), it is more efficient to use np.argpartition() if the order of the N greatest is of no consequence.
Partitioning an array arr at index i rearranges the elements so that the element at index i lands in its final sorted position: everything to its left is less than or equal to it, and everything to its right is greater than or equal to it. Neither side is itself sorted. The advantage is that this runs in linear time.
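A sketch of the same top-N extraction with np.argpartition, reusing the upper-triangular dists from the answer above:

```python
import numpy as np

dists = np.array([[0., 9.69535971,  3.74165739],
                  [0., 0.,         10.48808848],
                  [0., 0.,          0.        ]])
n = 2

flat = dists.ravel()
# Indices of the n largest values, in no guaranteed order, found in linear time
top = np.argpartition(flat, -n)[-n:]
ii = np.unravel_index(top, dists.shape)
print(np.sort(dists[ii]))  # the two largest pairwise distances
```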
I want to find the pair of values, and their index numbers, in a meshgrid that are closest to another pair of values. Suppose I have two vectors a = np.array([0.1, 0.5, 0.9]) and b = np.array([0, 3, 6, 10]) and two meshgrids X, Y = np.meshgrid(a, b). For illustration, they look as follows:
X= array([[ 0.1, 0.5, 0.9],
[ 0.1, 0.5, 0.9],
[ 0.1, 0.5, 0.9],
[ 0.1, 0.5, 0.9]])
Y =array([[ 0, 0, 0],
[ 3, 3, 3],
[ 6, 6, 6],
[10, 10, 10]])
Now, I have another array called c of dimension (2 x N). For illustration suppose c contains the following entries:
c = array([[ 0.07268017, 0.08816632, 0.11084398, 0.13352165, 0.1490078 ],
[ 0.00091219, 0.00091219, 0.00091219, 0.00091219, 0.00091219]])
Denote a column vector of c by x. For each such x I want to find the index pair (i, j) whose grid point (X[i, j], Y[i, j]) is closest to x.
To complicate matters a bit, I am in fact not only looking for the index with the smallest distance (i,j) but also the second smallest distance (i',j').
All my approaches so far turned out to be extremely complicated and involved a lot of side routes. Does someone have an idea for how to tackle the problem efficiently?
If X, Y always come from meshgrid(), your minimization is separable in X and Y. Just find the closest elements of X to c[0,] and the closest elements of Y to c[1,] ---
you don't need to calculate the 2-dimensional metric.
If either a or b have uniform steps, you can save yourself even more time if you scale the corresponding values of c onto the indexes. In your example, all(a == 0.1+0.4*arange(3)), so you can find the x values by inverting: x = (c[0,] - 0.1)/0.4. If you have an invertible (possibly non-linear) function that maps integers onto b, you can similarly find y values directly by applying the inverse function to c[1,].
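A minimal sketch of the separable search on the example data (the names x_idx and y_idx are my own):

```python
import numpy as np

a = np.array([0.1, 0.5, 0.9])
b = np.array([0, 3, 6, 10])
c = np.array([[0.07268017, 0.08816632, 0.11084398, 0.13352165, 0.1490078],
              [0.00091219, 0.00091219, 0.00091219, 0.00091219, 0.00091219]])

# Nearest element of a for every c[0, :] and of b for every c[1, :],
# searched independently -- no 2-D distance computation needed
x_idx = np.abs(a[:, None] - c[0]).argmin(axis=0)
y_idx = np.abs(b[:, None] - c[1]).argmin(axis=0)
print(x_idx, y_idx)
# [0 0 0 0 0] [0 0 0 0 0]
```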
This is more a comment than an answer, but I like to [... lots of stuff mercifully deleted, that you can still see using the revision history ...]
Complete Revision
As a follow-up to my own comment, please look at the following.
Setup
In [25]: from numpy import *
In [26]: from scipy.spatial import KDTree
In [27]: X= array([[ 0.1, 0.5, 0.9],
[ 0.1, 0.5, 0.9],
[ 0.1, 0.5, 0.9],
[ 0.1, 0.5, 0.9]])
In [28]: Y =array([[ 0, 0, 0],
[ 3, 3, 3],
[ 6, 6, 6],
[10, 10, 10]])
In [29]: c = array([[ 0.07268017, 0.08816632, 0.11084398, 0.13352165, 0.1490078 ],
[ 0.00091219, 0.00091219, 0.00091219, 0.00091219, 0.00091219]])
Solution
Two lines of code; please notice that you have to pass the transpose of your c array.
In [30]: tree = KDTree(column_stack((X.ravel(), Y.ravel())))
In [31]: tree.query(c.T,k=2)
Out[31]:
(array([[ 0.02733505, 0.4273208 ],
[ 0.01186879, 0.41183469],
[ 0.01088228, 0.38915709],
[ 0.03353406, 0.36647949],
[ 0.04901629, 0.35099339]]), array([[0, 1],
[0, 1],
[0, 1],
[0, 1],
[0, 1]]))
Comment
To interpret the result, the excellent scipy docs inform you that tree.query() gives you back two arrays, containing respectively for each point in c
a scalar or an array of length k>=2 giving you the distances
from the point to the closest point on grid, the second closest, etc,
a scalar or an array of length k>=2 giving you the indices
pointing to the grid point(s) closest (next close etc).
To access the grid points, KDTree maintains a copy of the grid data, e.g.
In [32]: tree.data[[0,1]]
Out[32]:
array([[ 0.1, 0. ],
[ 0.5, 0. ]])
where [0,1] is the first element of the second output array.
Should you need the row and column indices of the closest point(s) in the mesh matrices, it is simply a matter of using divmod.
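For instance, to turn the flat indices [0, 1] from the query above back into (row, column) positions in the 4x3 mesh:

```python
import numpy as np

ncols = 3                   # X.shape[1], the number of mesh columns
flat = np.array([0, 1])     # flat indices returned by tree.query for one point
rows, cols = divmod(flat, ncols)
print(rows, cols)
# [0 0] [0 1]
```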
I have the code below and I would like to convert all zeros in the data to None (as I do not want to plot that data in matplotlib). However, the code is not working and 0.0 is still being printed:
sd_rel_track_sum = np.sum(sd_rel_track, axis=1)
for i in sd_rel_track_sum:
    print i
    if i == 0:
        i = None
return sd_rel_track_sum
Can anyone think of a solution to this, or just a way to convert all 0s to None, or simply not plot the zero values in matplotlib?
Why not use numpy for this?
>>> values = np.array([3, 5, 0, 3, 5, 1, 4, 0, 9], dtype=np.double)
>>> values[ values==0 ] = np.nan
>>> values
array([ 3., 5., nan, 3., 5., 1., 4., nan, 9.])
It should be noted that values cannot be an integer type array.
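If your data starts out as an integer array, a quick sketch of the cast-then-replace approach:

```python
import numpy as np

ints = np.array([3, 5, 0, 3, 5, 1, 4, 0, 9])  # integer dtype: cannot hold nan
vals = ints.astype(float)                     # so convert to float first
vals[vals == 0] = np.nan
print(vals)
```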
Using numpy is of course the better choice, unless you have any good reasons not to use it ;) For that, see Daniel's answer.
If you want to have a bare Python solution, you might do something like this:
values = [3, 5, 0, 3, 5, 1, 4, 0, 9]
def zero_to_nan(values):
    """Replace every 0 with 'nan' and return a copy."""
    return [float('nan') if x == 0 else x for x in values]
print(zero_to_nan(values))
gives you:
[3, 5, nan, 3, 5, 1, 4, nan, 9]
Matplotlib won't plot nan (not a number) values.
I have come across the numpy.apply_along_axis function in some code. And I don't understand the documentation about it.
This is an example of the documentation:
>>> def new_func(a):
... """Divide elements of a by 2."""
... return a * 0.5
>>> b = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> np.apply_along_axis(new_func, 0, b)
array([[ 0.5, 1. , 1.5],
[ 2. , 2.5, 3. ],
[ 3.5, 4. , 4.5]])
As far as I thought I understood the documentation, I would have expected:
array([[ 0.5, 1. , 1.5],
[ 4 , 5 , 6 ],
[ 7 , 8 , 9 ]])
i.e. having applied the function along the axis [1,2,3] which is axis 0 in [[1,2,3], [4,5,6], [7,8,9]]
Obviously I am wrong. Could you correct me?
apply_along_axis applies the supplied function along 1D slices of the input array, with the slices taken along the axis you specify. So in your example, new_func is applied over each slice of the array along the first axis. It becomes clearer if you use a vector valued function, rather than a scalar, like this:
In [20]: b = np.array([[1,2,3], [4,5,6], [7,8,9]])
In [21]: np.apply_along_axis(np.diff,0,b)
Out[21]:
array([[3, 3, 3],
[3, 3, 3]])
In [22]: np.apply_along_axis(np.diff,1,b)
Out[22]:
array([[1, 1],
[1, 1],
[1, 1]])
Here, numpy.diff (i.e. the arithmetic difference of adjacent array elements) is applied along each slice of either the first or second axis (dimension) of the input array.
The function is applied to 1-D slices, here taken along axis=0; you can choose another axis with the axis argument. A usage of this paradigm is:
np.apply_along_axis(np.cumsum, 0, b)
The function was performed on each subarray along dimension 0. So, it is meant for 1-D functions and returns a 1D array for each 1-D input.
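For example, applying np.cumsum along axis 0 runs a cumulative sum down each column:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
out = np.apply_along_axis(np.cumsum, 0, b)  # cumulative sum of each column
print(out)
# [[ 1  2  3]
#  [ 5  7  9]
#  [12 15 18]]
```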
Another example is :
np.apply_along_axis(np.sum, 0, b)
Provides a scalar output for a 1-D array.
Of course you could just set the axis parameter in cumsum or sum to do the above, but the point here is that it can be used for any 1-D function you write.