I have an array A, and an array bins whose elements give each row of A its bin assignment. I want to construct an array S such that
S[0, :] = (A[(bins == 0), :]).sum(axis=0)
This is rather easy to do with np.stack and list comprehensions, but it seems overly complicated and not terribly readable. Is there a more general way to sum (or even apply some general function to) slices of arrays with bin assignments? scipy.stats.binned_statistic is along the right lines, but requires that bin assignments and values to compute the functions on are the same shape (since I am using slices, this is not the case).
For example, if
A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])
and
bins = np.array([0, 1, 0, 2])
then it should result in
S = np.array([[10., 10., 10., 10.],
              [ 2.,  3.,  4.,  5.],
              [ 8.,  7.,  6.,  5.]])
Here's an approach with matrix multiplication using np.dot -
(bins == np.arange(bins.max()+1)[:,None]).dot(A)
Sample run -
In [40]: A = np.array([[1., 2., 3., 4.],
...: [2., 3., 4., 5.],
...: [9., 8., 7., 6.],
...: [8., 7., 6., 5.]])
In [41]: bins = np.array([0, 1, 0, 2])
In [42]: (bins == np.arange(bins.max()+1)[:,None]).dot(A)
Out[42]:
array([[ 10.,  10.,  10.,  10.],
       [  2.,   3.,   4.,   5.],
       [  8.,   7.,   6.,   5.]])
Performance boost
A more efficient way to create the mask (bins == np.arange(bins.max()+1)[:,None]) would be like so -
mask = np.zeros((bins.max()+1, len(bins)), dtype=bool)
mask[bins, np.arange(len(bins))] = 1
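As a quick check (a sketch using the sample A and bins from the question), the mask built this way matches the broadcasted comparison and gives the same sums:

```python
import numpy as np

A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])
bins = np.array([0, 1, 0, 2])

# One-hot mask: mask[k, i] is True when row i of A belongs to bin k
mask = np.zeros((bins.max() + 1, len(bins)), dtype=bool)
mask[bins, np.arange(len(bins))] = 1

S = mask.dot(A)  # row k of S sums the rows of A assigned to bin k
```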
You can use np.add.reduceat:
import numpy as np
# index to sort the bins
sort_index = bins.argsort()
# indices where the array needs to be split at
indices = np.concatenate(([0], np.where(np.diff(bins[sort_index]))[0] + 1))
# sum values where the bins are the same
np.add.reduceat(A[sort_index], indices, axis=0)
# array([[ 10.,  10.,  10.,  10.],
#        [  2.,   3.,   4.,   5.],
#        [  8.,   7.,   6.,   5.]])
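For completeness, a sketch of another route (not part of the answer above): np.add.at accumulates rows into an output buffer and handles unsorted bins directly, with no sort or split indices needed:

```python
import numpy as np

A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])
bins = np.array([0, 1, 0, 2])

S = np.zeros((bins.max() + 1, A.shape[1]))
# Unbuffered in-place addition: row i of A is added to row bins[i] of S
np.add.at(S, bins, A)
```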
Related
I have a big numpy array and want to split it. I have read this solution, but it could not help me. The target column can have several values, but I know which one I want to split on. In my simplified example the target column is the third one, and I want to split it based on the value 2. This is my array:
import numpy as np
big_array = np.array([[0., 10., 2.],
                      [2., 6., 2.],
                      [3., 1., 7.1],
                      [3.3, 6., 7.8],
                      [4., 5., 2.],
                      [6., 6., 2.],
                      [7., 1., 2.],
                      [8., 5., 2.1]])
Consecutive rows with this value (2.) form one split. The next rows (three and four), which do not have 2., form another one. Then the value 2. appears again in my data and forms another split, and the remaining non-2. rows (the last row) form the final one. The result should look like this:
spl_array = [np.array([[0., 10., 2.],
                       [2., 6., 2.]]),
             np.array([[3., 1., 7.1],
                       [3.3, 6., 7.8]]),
             np.array([[4., 5., 2.],
                       [6., 6., 2.],
                       [7., 1., 2.]]),
             np.array([[8., 5., 2.1]])]
I appreciate any help in advance.
First, find which rows contain 2 and which do not. This gives an array of True and False values. Transform it to an array of zeros and ones, then check where consecutive values differ (e.g. the differences of [0, 0, 1, 1, 0] are [0, 1, 0, -1]).
Based on those changes, numpy's where gives the indices of the change points.
Insert index 0 and the length of the big array, so you can zip the indices into left and right slice boundaries.
import numpy as np
big_array = np.array([[0., 10., 2.],
                      [2., 6., 2.],
                      [3., 1., 7.1],
                      [3.3, 6., 7.8],
                      [4., 5., 2.],
                      [6., 6., 2.],
                      [7., 1., 2.],
                      [8., 5., 2.1]])
# 1 where the row contains a 2., 0 otherwise (note: this checks the
# whole row for membership, not just the third column)
idx = np.array([2. in row for row in big_array], dtype=int)
slices = list(np.where(np.diff(idx) != 0)[0] + 1)
slices.insert(0,0)
slices.append(len(big_array))
result = list()
for left, right in zip(slices[:-1], slices[1:]):
    result.append(big_array[left:right])
'''
[array([[ 0., 10.,  2.],
        [ 2.,  6.,  2.]]),
 array([[3. , 1. , 7.1],
        [3.3, 6. , 7.8]]),
 array([[4., 5., 2.],
        [6., 6., 2.],
        [7., 1., 2.]]),
 array([[8. , 5. , 2.1]])]
'''
You can do this with np.split:
np.split(
    big_array,
    # cast the boolean mask to int first: np.diff on a boolean array
    # raises a TypeError in modern NumPy
    np.flatnonzero(np.diff((big_array[:, 2] == 2).astype(np.int8)) != 0) + 1
)
Output
[array([[ 0., 10.,  2.],
        [ 2.,  6.,  2.]]),
 array([[3. , 1. , 7.1],
        [3.3, 6. , 7.8]]),
 array([[4., 5., 2.],
        [6., 6., 2.],
        [7., 1., 2.]]),
 array([[8. , 5. , 2.1]])]
I have a 3d numpy array of the following form:
array([[[ 1.,  5.,  4.],
        [ 1.,  5.,  4.],
        [ 1.,  2.,  4.]],

       [[ 3.,  6.,  4.],
        [ 6.,  6.,  4.],
        [ 6.,  6.,  4.]]])
Is there an efficient way to convert it to a 2d array of the form:
array([[1, 1, 1, 5, 5, 2, 4, 4, 4],
[3, 6, 6, 6, 6, 6, 4, 4, 4]])
Thanks a lot!
In [54]: arr = np.array([[[ 1., 5., 4.],
    ...:                  [ 1., 5., 4.],
    ...:                  [ 1., 2., 4.]],
    ...:                 [[ 3., 6., 4.],
    ...:                  [ 6., 6., 4.],
    ...:                  [ 6., 6., 4.]]])
In [61]: arr.reshape((arr.shape[0], -1), order='F')
Out[61]:
array([[ 1., 1., 1., 5., 5., 2., 4., 4., 4.],
[ 3., 6., 6., 6., 6., 6., 4., 4., 4.]])
The array arr has shape (2, 3, 3). We wish to keep the first axis of length 2, and flatten the two axes of length 3.
If we call arr.reshape(h, w) then NumPy will attempt to reshape arr to shape (h, w). If we call arr.reshape(h, -1) then NumPy will replace the -1 with whatever integer is needed for the reshape to make sense -- in this case, arr.size/h.
Hence,
In [63]: arr.reshape((arr.shape[0], -1))
Out[63]:
array([[ 1., 5., 4., 1., 5., 4., 1., 2., 4.],
[ 3., 6., 4., 6., 6., 4., 6., 6., 4.]])
This is almost what we want, but notice that the values in each subarray, such as
[[ 1., 5., 4.],
[ 1., 5., 4.],
[ 1., 2., 4.]]
are being traversed by marching from left to right before going down to the next row.
We want to march down the rows before going on to the next column.
To achieve that, use order='F'.
Usually the elements in a NumPy array are visited in C-order -- where the last index moves fastest. If we visit the elements in F-order then the first index moves fastest. Since in a 2D array of shape (h, w), the first axis is associated with the rows and the last axis the columns, traversing the array in F-order marches down each row before moving on to the next column.
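The difference can be seen on one of the subarrays with np.ravel, which accepts the same order argument:

```python
import numpy as np

sub = np.array([[1., 5., 4.],
                [1., 5., 4.],
                [1., 2., 4.]])

c = sub.ravel(order='C')  # C-order: left to right, then down (last index fastest)
f = sub.ravel(order='F')  # F-order: down each column first (first index fastest)
```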
I am attempting to add two arrays.
np.zeros((6,9,20)) + np.array([1,2,3,4,5,6,7,8,9])
I want to get something out that is like
array([[[ 1.,  1.,  1., ...,  1.,  1.,  1.],
        [ 2.,  2.,  2., ...,  2.,  2.,  2.],
        [ 3.,  3.,  3., ...,  3.,  3.,  3.],
        ...,
        [ 7.,  7.,  7., ...,  7.,  7.,  7.],
        [ 8.,  8.,  8., ...,  8.,  8.,  8.],
        [ 9.,  9.,  9., ...,  9.,  9.,  9.]],

       [[ 1.,  1.,  1., ...,  1.,  1.,  1.],
        [ 2.,  2.,  2., ...,  2.,  2.,  2.],
        [ 3.,  3.,  3., ...,  3.,  3.,  3.],
        ...,
        [ 7.,  7.,  7., ...,  7.,  7.,  7.],
        [ 8.,  8.,  8., ...,  8.,  8.,  8.],
        [ 9.,  9.,  9., ...,  9.,  9.,  9.]],
So the entries are added to each matrix along the corresponding column. I know I could code it with a loop of some sort, but I am looking for a more elegant / faster solution.
You can bring broadcasting into play after extending the dimensions of the second array with None or np.newaxis, like so -
np.zeros((6,9,20))+np.array([1,2,3,4,5,6,7,8,9])[None,:,None]
If I understand you correctly, the best thing to use is NumPy's Broadcasting. You can get what you want with the following:
np.zeros((6,9,20))+np.array([1,2,3,4,5,6,7,8,9]).reshape((1,9,1))
I prefer using the reshape method to the slice notation for the indices the way Divakar shows, because I've done a fair bit of work manipulating shapes as variables, and it's a bit easier to pass tuples around in variables than slices. You can also do things like this:
array1.reshape(array2.shape)
By the way, if you're really looking for something as simple as an array that runs from 0 to N-1 along an axis, check out mgrid. You can get your above output with just
np.mgrid[0:6,1:10,0:20][1]
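As a sanity check (a sketch, not from the answer), the mgrid expression agrees with the broadcasting solutions above:

```python
import numpy as np

via_broadcast = np.zeros((6, 9, 20)) + np.arange(1, 10)[None, :, None]
via_mgrid = np.mgrid[0:6, 1:10, 0:20][1]  # component 1 varies along axis 1

same = np.array_equal(via_broadcast, via_mgrid)
```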
You could use tile (but you would also need swapaxes to get the correct shape).
A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
B = np.tile(A, (6, 20, 1))
C = np.swapaxes(B, 1, 2)
In Python we do something like this, for example:
n = 30
A = numpy.zeros(shape=(n,n))
for i in range(0, n):
    for j in range(0, n):
        A[i, j] = i + j
        # i+j is just an example of an assignment
That's how we fill a 2-dim array: just use nested loops to walk over the rows and columns. But my friend asked me why it is so complicated. Could you show me another way to do it?
He told me Mathematica has easier ways to manage n-dim arrays (I'm not sure; I've never used Mathematica).
Can you give me an alternative way to assign values in an n-dim matrix/array (in NumPy) or list (ordinary Python)?
You are looking for numpy.fromfunction:
>>> numpy.fromfunction(lambda x, y: x + y, (5, 5))
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.,  5.],
       [ 2.,  3.,  4.,  5.,  6.],
       [ 3.,  4.,  5.,  6.,  7.],
       [ 4.,  5.,  6.,  7.,  8.]])
You can simplify slightly using operator:
>>> from operator import add
>>> numpy.fromfunction(add, (5, 5))
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.,  5.],
       [ 2.,  3.,  4.,  5.,  6.],
       [ 3.,  4.,  5.,  6.,  7.],
       [ 4.,  5.,  6.,  7.,  8.]])
You can use the mathematical rules for matrices and vectors:
n = 30
w = numpy.arange(n).reshape(1,-1)
A = w+w.T
I'm working with 3-dimensional arrays (for the purpose of this example you can imagine they represent the RGB values at X, Y coordinates of the screen).
>>> import numpy as np
>>> a = np.floor(10 * np.random.random((2, 2, 3)))
>>> a
array([[[ 7., 3., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 1., 1.]]])
What I would like to do, is to set to an arbitrary value the G channel for those pixels whose G channel is already below 5. I can manage to isolate the pixel I am interested in using:
>>> a[np.where(a[:, :, 1] < 5)]
array([[ 7., 3., 1.],
[ 8., 1., 1.]])
but I am struggling to understand how to assign a new value to the G channel only. I tried:
>>> a[np.where(a[:, :, 1] < 5)][1] = 9
>>> a
array([[[ 7., 3., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 1., 1.]]])
...but it seems not to produce any effect. I also tried:
>>> a[np.where(a[:, :, 1] < 5), 1] = 9
>>> a
array([[[ 7., 3., 1.],
[ 9., 9., 9.]],
[[ 4., 6., 8.],
[ 9., 9., 9.]]])
...(failing to understand what is happening). Finally I tried:
>>> a[np.where(a[:, :, 1] < 5)][:, 1] = 9
>>> a
array([[[ 7., 3., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 1., 1.]]])
I suspect I am missing something fundamental on how NumPy works (this is the first time I use the library). I would appreciate some help in how to achieve what I want as well as some explanation on what happened with my previous attempts.
Many thanks in advance for your help and expertise!
EDIT: The outcome I would like to get is:
>>> a
array([[[ 7., 9., 1.], # changed the second number here
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 9., 1.]]]) # changed the second number here
>>> import numpy as np
>>> a = np.array([[[ 7.,  3.,  1.],
...                [ 9.,  6.,  9.]],
...               [[ 4.,  6.,  8.],
...                [ 8.,  1.,  1.]]])
>>> a
array([[[ 7., 3., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 1., 1.]]])
>>> a[:,:,1][a[:,:,1] < 5] = 9
>>> a
array([[[ 7., 9., 1.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 8., 9., 1.]]])
a[:,:,1] gives you the G channel. I subset it with a[:,:,1] < 5 used as an index, then assigned the value 9 to the selected elements.
There is no need to use where; you can index an array directly with the boolean array resulting from your comparison.
>>> a = np.array([[[ 7.,  3.,  1.],
...                [ 9.,  6.,  9.]],
...               [[ 4.,  6.,  8.],
...                [ 8.,  1.,  1.]]])
>>> a[a[:, :, 1] < 5]
array([[ 7., 3., 1.],
[ 8., 1., 1.]])
>>> a[a[:, :, 1] < 5]=9
>>> a
array([[[ 9., 9., 9.],
[ 9., 6., 9.]],
[[ 4., 6., 8.],
[ 9., 9., 9.]]])
You do not list the expected output in your question, so I am not sure this is what you want. Note that this assigns 9 to all three channels of every matched pixel, not just the G channel.
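To get exactly the output the question's EDIT asks for (only the G channel changed), the same boolean mask can be combined with an index on the last axis; a sketch contrasting the two assignments:

```python
import numpy as np

a = np.array([[[7., 3., 1.],
               [9., 6., 9.]],
              [[4., 6., 8.],
               [8., 1., 1.]]])

g_low = a[:, :, 1] < 5   # (2, 2) boolean mask over pixels

b = a.copy()
b[g_low] = 9             # overwrites the whole RGB triple of matched pixels

c = a.copy()
c[g_low, 1] = 9          # overwrites only the G channel of matched pixels
```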