numpy classification comparison with 3d array - python

I'm trying to do some basic classification of numpy arrays...
I want to compare a 2d array against a 3d array, along the 3rd dimension, and make a classification based on the corresponding z-axis values.
so given 3 arrays that are stacked into a 3d array:
import numpy as np
a1 = np.array([[1,1,1],[1,1,1],[1,1,1]])
a2 = np.array([[3,3,3],[3,3,3],[3,3,3]])
a3 = np.array([[5,5,5],[5,5,5],[5,5,5]])
a3d = dstack((a1,a2,a3))
and another 2d array
a2d = np.array([[1,2,4],[5,5,2],[2,3,3]])
I want to be able to compare a2d against a3d, and return a 2d array of which level of a3d is closest. (or I suppose any custom function that can compare each value along the z-axis, and return a value base on that comparison.)
EDIT
I modified my arrays to more closely match my data. a1 would be the minimum values, a2 the average values, and a3 the maximum values. So I want to output if each a2d value is closer to a1 (classed "1") a2 (classed "2") or a3 (classed "3"). I'm doing as a 3d array because in the real data, it won't be a simple 3-array choice, but for SO purposes, it helps to keep it simple. We can assume that in the case of a tie, we'll take the lower, so 2 would be classed as level "1", 4 as level "2".

You can use the following list comprehension :
>>> [sum(sum(abs(i-j)) for i,j in z) for z in [zip(i,a2d) for i in a3d]]
[30.0, 22.5, 30.0]
In preceding code i create the following list with zip,that is the zip of each sub array of your 3d list then all you need is calculate the sum of the elemets of subtract of those pairs then sum of them again :
>>> [zip(i,a2d) for i in a3d]
[[(array([ 1., 3., 1.]), array([1, 2, 1])), (array([ 2., 2., 1.]), array([5, 5, 4])), (array([ 3., 1., 1.]), array([9, 8, 8]))], [(array([ 4., 6., 4.]), array([1, 2, 1])), (array([ 5. , 6.5, 4. ]), array([5, 5, 4])), (array([ 6., 4., 4.]), array([9, 8, 8]))], [(array([ 7., 9., 7.]), array([1, 2, 1])), (array([ 8., 8., 7.]), array([5, 5, 4])), (array([ 9., 7., 7.]), array([9, 8, 8]))]]
then for all of your sub arrays you'll have the following list:
[30.0, 22.5, 30.0]
that for each sub-list show a the level of difference with 2d array!and then you can get the relative sub-array from a3d like following :
>>> a3d[l.index(min(l))]
array([[ 4. , 6. , 4. ],
[ 5. , 6.5, 4. ],
[ 6. , 4. , 4. ]])
Also you can put it in a function:
>>> def find_nearest(sub,main):
... l=[sum(sum(abs(i-j)) for i,j in z) for z in [zip(i,sub) for i in main]]
... return main[l.index(min(l))]
...
>>> find_nearest(a2d,a3d)
array([[ 4. , 6. , 4. ],
[ 5. , 6.5, 4. ],
[ 6. , 4. , 4. ]])

You might consider a different approach using numpy.vectorize which lets you efficiently apply a python function to each element of your array.
In this case, your python function could just classify each pixel with whatever breaks you define:
import numpy as np
a2d = np.array([[1,2,4],[5,5,2],[2,3,3]])
def classify(x):
if x >= 4:
return 3
elif x >= 2:
return 2
elif x > 0:
return 1
else:
return 0
vclassify = np.vectorize(classify)
result = vclassify(a2d)

Thanks to #perrygeo and #Kasra - they got me thinking in a good direction.
Since I want a classification of the closest 3d array's z value, I couldn't do simple math - I needed the (z)index of the closest value.
I did it by enumerating both axes of the 2d array, and doing a proximity compare against the corresponding (z)index of the 3d array.
There might be a way to do this without iterating the 2d array, but at least I'm avoiding iterating the 3d.
import numpy as np
a1 = np.array([[1,1,1],[1,1,1],[1,1,1]])
a2 = np.array([[3,3,3],[3,3,3],[3,3,3]])
a3 = np.array([[5,5,5],[5,5,5],[5,5,5]])
a3d = np.dstack((a1,a2,a3))
a2d = np.array([[1,2,4],[5,5,2],[2,3,3]])
classOut = np.empty_like(a2d)
def find_nearest_idx(array,value):
idx = (np.abs(array-value)).argmin()
return idx
# enumerate to get indices
for i,a in enumerate(a2d):
for ii,v in enumerate(a):
valStack = a3d[i,ii]
nearest = find_nearest_idx(valStack,v)
classOut[i,ii] = nearest
print classOut
which gets me
[[0 0 1]
[2 2 0]
[0 1 1]]
This tells me that (for example) a2d[0,0] is closest to the 0-index of a3d[0,0], which in my case means it is closest to the min value for that 2d position. a2d[1,1] is closest to the 2-index, which in my case means closer to the max value for that 2d position.

Related

Efficiently index 2d numpy array using two 1d arrays

I have a large 2d numpy array and two 1d arrays that represent x/y indexes within the 2d array. I want to use these 1d arrays to perform an operation on the 2d array.
I can do this with a for loop, but it's very slow when working on a large array. Is there a faster way? I tried using the 1d arrays simply as indexes but that didn't work. See this example:
import numpy as np
# Two example 2d arrays
cnt_a = np.zeros((4,4))
cnt_b = np.zeros((4,4))
# 1d arrays holding x and y indices
xpos = [0,0,1,2,1,2,1,0,0,0,0,1,1,1,2,2,3]
ypos = [3,2,1,1,3,0,1,0,0,1,2,1,2,3,3,2,0]
# This method works, but is very slow for a large array
for i in range(0,len(xpos)):
cnt_a[xpos[i],ypos[i]] = cnt_a[xpos[i],ypos[i]] + 1
# This method is fast, but gives incorrect answer
cnt_b[xpos,ypos] = cnt_b[xpos,ypos]+1
# Print the results
print 'Good:'
print cnt_a
print ''
print 'Bad:'
print cnt_b
The output from this is:
Good:
[[ 2. 1. 2. 1.]
[ 0. 3. 1. 2.]
[ 1. 1. 1. 1.]
[ 1. 0. 0. 0.]]
Bad:
[[ 1. 1. 1. 1.]
[ 0. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 0. 0. 0.]]
For the cnt_b array numpy is obviously not summing correctly, but I'm unsure how to fix this without resorting to the (v. inefficient) for loop used to calculate cnt_a.
Another approach by using 1D indexing (suggested by #Shai) extended to answer the actual question:
>>> out = np.zeros((4, 4))
>>> idx = np.ravel_multi_index((xpos, ypos), out.shape) # extract 1D indexes
>>> x = np.bincount(idx, minlength=out.size)
>>> out.flat += x
np.bincount calculates how many times each of the index is present in the xpos, ypos and stores them in x.
Or, as suggested by #Divakar:
>>> out.flat += np.bincount(idx, minlength=out.size)
We could compute the linear indices, then accumulate into zeros-initialized output array with np.add.at. Thus, with xpos and ypos as arrays, here's one implementation -
m,n = xpos.max()+1, ypos.max()+1
out = np.zeros((m,n),dtype=int)
np.add.at(out.ravel(), xpos*n+ypos, 1)
Sample run -
In [95]: # 1d arrays holding x and y indices
...: xpos = np.array([0,0,1,2,1,2,1,0,0,0,0,1,1,1,2,2,3])
...: ypos = np.array([3,2,1,1,3,0,1,0,0,1,2,1,2,3,3,2,0])
...:
In [96]: cnt_a = np.zeros((4,4))
In [97]: # This method works, but is very slow for a large array
...: for i in range(0,len(xpos)):
...: cnt_a[xpos[i],ypos[i]] = cnt_a[xpos[i],ypos[i]] + 1
...:
In [98]: m,n = xpos.max()+1, ypos.max()+1
...: out = np.zeros((m,n),dtype=int)
...: np.add.at(out.ravel(), xpos*n+ypos, 1)
...:
In [99]: cnt_a
Out[99]:
array([[ 2., 1., 2., 1.],
[ 0., 3., 1., 2.],
[ 1., 1., 1., 1.],
[ 1., 0., 0., 0.]])
In [100]: out
Out[100]:
array([[2, 1, 2, 1],
[0, 3, 1, 2],
[1, 1, 1, 1],
[1, 0, 0, 0]])
you can iterate on both lists at once, and increment for each couple (if you are not used to it, zip can combine lists)
for x, y in zip(xpos, ypos):
cnt_b[x][y] += 1
But this will be about the same speed as your solution A.
If your lists xpos/ypos are of length n, I don't see how you can update your matrix in less than o(n) since you'll have to check each pair one way or an other.
Other solution: you could count (with collections.Counter possibly) the similar index pairs (ex: (0, 3) etc...) and update the matrix with the count value. But I doubt it would be much faster, since you the time gained on updating the matrix would be lost on counting multiple occurrences.
Maybe I am totally wrong tho, in which case I'd be curious too to see a not o(n) answer
I think you are looking for ravel_multi_index funciton
lidx = np.ravel_multi_index((xpos, ypos), cnt_a.shape)
converts to "flatten" 1D indices into cnt_a and cnt_b:
np.add.at( cnt_b, lidx, 1 )

how to multiply 2 numpy array with different dimensions

I try to multiply 2 matrix x,y with shape (41) and (41,6)
as it is supposed to broadcast the single matrix to every arrow in the multi-dimensions
I want to do it as :
x*y
but i get this error
ValueError: operands could not be broadcast together with shapes (41,6) (41,)
Is there anything I miss here to make that possible ?
Broadcasting involves 2 steps
give all arrays the same number of dimensions
expand the 1 dimensions to match the other arrays
With your inputs
(41,6) (41,)
one is 2d, the other 1d; broadcasting can change the 1d to (1, 41), but it does not automatically expand in the other direction (41,1).
(41,6) (1,41)
Neither (41,41) or (6,41) matches the other.
So you need to change your y to (41,1) or the x to (6,41)
x.T*y
x*y[:,None]
I'm assuming, of course, that you want element by element multiplication, not the np.dot matrix product.
You can try out this, it will works!
>>> import numpy as np
>>> x = np.array([[1, 2], [1, 2], [1, 2]])
>>> y = np.array([1, 2, 3])
>>> np.dot(y,x)
array([ 6, 12])
Not exactly sure, what you are trying to achieve. Maybe you could give an example of your input and your expected output. One possibility is:
import numpy as np
x = np.array([[1, 2], [1, 2], [1, 2]])
y = np.array([1, 2, 3])
res = x * np.transpose(np.array([y,]*2))
This will multiply each column of x with y, so the result of the above example is:
array([[1, 2],
[2, 4],
[3, 6]])
The multiplication of a ND array (say A) with a 1D one (B) is performed on the last axis by default, which means that the multiplication A * B is only valid if
A.shape[-1] == len(B)
A manipulation on A and B is needed to multiply A with B on another axis than -1:
Method 1: swapaxes
Swap the axes of A so that the axis to multiply with B appear on last postion
C = (A.swapaxes(axis, -1) * B).swapaxes(axis, -1)
example
A = np.arange(2 * 3 * 4).reshape((2, 3, 4))
B = np.array([0., 1., 2.])
print(A)
print(B)
pormpts :
(A)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
(B)
[0. 1. 2.]
A * B returns :
ValueError: operands could not be broadcast together with shapes (2,3,4) (3,)
now multiply A with B on axis 1
axis = 1
C = (A.swapaxes(axis, -1) * B).swapaxes(axis, -1)
returns C :
array([[[ 0., 0., 0., 0.],
[ 4., 5., 6., 7.],
[16., 18., 20., 22.]],
[[ 0., 0., 0., 0.],
[16., 17., 18., 19.],
[40., 42., 44., 46.]]])
Note that first raws of A have been multiplied by 0
last raws have been multiplied by 2
Method 2: reshape B
make B have the same number of dimensions than A, place the items of B on the dimension to be multiplied with A
A * B.reshape((1, len(B), 1))
or equivalently using the convenient 'numpy.newaxis' syntax :
A * B[np.newaxis, :, np.newaxis]
Depends on what you're expecting. One simple solution would be:
y*x
That should give you a matrix of dimensions (1,6).
If you wish to multiply X of dimension (n) to Y of dimension(n,m), you may consider the answers from this post
Tips can be found in the Wikipedia as well:
In Python with the numpy numerical library or the sympy symbolic library, multiplication of array objects as a1*a2 produces the Hadamard product, but with otherwise matrix objects m1*m2 will produce a matrix product.
Simply speaking, slice it up to arrays and perform x*y, or use other routes to fit the requirement.
So, if x has shape (41,6) and y (41,), I'd use np.expand_dims() to add an empty, second dimension (index 1) to y, i.e.,
x * np.expand_dims(y, 1)
This will automatically yield a result with shape (41,6).

Rank each number by position in a list of arrays - Python

I have a list of arrays and I want to rank the numbers against the similar positioned numbers in the other arrays in the list.
x = [[12,7,3],
[4 ,5,6],
[7 ,8,9]]
I tried the following and it ranked each numbers against all the numbers and also
the smallest number is ranked 1
scipy.stats.rankdata(x)
array([ 9.,5.5,1.,2., 3.,4.,5.5,7.,8. ])
I want to rank with largest number ranked 1 and each number only ranked against the number that is in the same position in each array of the list.
This is the output that I need.
[[1,2,3]
[3,3,2],
[2,1,1]]
You can also use magic numpy.argsort.
import numpy as np
x = np.array([[12,7,3],
[4 ,5,6],
[7 ,8,9]])
y = x.shape[0] - np.argsort(np.argsort(x, axis = 0), axis = 0)
Output:
In [111]: y
Out[111]:
array([[1, 2, 3],
[3, 3, 2],
[2, 1, 1]])
You have to specify the axis along which to rank. By default it gives the rank according to the ascending order. You can use your knowledge of the shape of your data to turn it into what you want. Example with your data
x = [[12,7,3],
[4 ,5,6],
[7 ,8,9]]
3-scipy.stats.mstats.rankdata(x,axis=0)+1
# you will get
array([[ 1., 2., 3.],
[ 3., 3., 2.],
[ 2., 1., 1.]])

Scipy Sparse Matrix special substraction

I'm doing a project and I'm doing a lot of matrix computation in it.
I'm looking for a smart way to speed up my code. In my project, I'm dealing with a sparse matrix of size 100Mx1M with around 10M non-zeros values. The example below is just to see my point.
Let's say I have:
A vector v of size (2)
A vector c of size (3)
A sparse matrix X of size (2,3)
v = np.asarray([10, 20])
c = np.asarray([ 2, 3, 4])
data = np.array([1, 1, 1, 1])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
X = coo_matrix((data,(row,col)), shape=(2,3))
X.todense()
# matrix([[0, 1, 1],
# [1, 0, 1]])
Currently I'm doing:
result = np.zeros_like(v)
d = scipy.sparse.lil_matrix((v.shape[0], v.shape[0]))
d.setdiag(v)
tmp = d * X
print tmp.todense()
#matrix([[ 0., 10., 10.],
# [ 20., 0., 20.]])
# At this point tmp is csr sparse matrix
for i in range(tmp.shape[0]):
x_i = tmp.getrow(i)
result += x_i.data * ( c[x_i.indices] - x_i.data)
# I only want to do the subtraction on non-zero elements
print result
# array([-430, -380])
And my problem is the for loop and especially the subtraction.
I would like to find a way to vectorize this operation by subtracting only on the non-zero elements.
Something to get directly the sparse matrix on the subtraction:
matrix([[ 0., -7., -6.],
[ -18., 0., -16.]])
Is there a way to do this smartly ?
You don't need to loop over the rows to do what you are already doing. And you can use a similar trick to perform the multiplication of the rows by the first vector:
import scipy.sparse as sps
# number of nonzero entries per row of X
nnz_per_row = np.diff(X.indptr)
# multiply every row by the corresponding entry of v
# You could do this in-place as:
# X.data *= np.repeat(v, nnz_per_row)
Y = sps.csr_matrix((X.data * np.repeat(v, nnz_per_row), X.indices, X.indptr),
shape=X.shape)
# subtract from the non-zero entries the corresponding column value in c...
Y.data -= np.take(c, Y.indices)
# ...and multiply by -1 to get the value you are after
Y.data *= -1
To see that it works, set up some dummy data
rows, cols = 3, 5
v = np.random.rand(rows)
c = np.random.rand(cols)
X = sps.rand(rows, cols, density=0.5, format='csr')
and after run the code above:
>>> x = X.toarray()
>>> mask = x == 0
>>> x *= v[:, np.newaxis]
>>> x = c - x
>>> x[mask] = 0
>>> x
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
>>> Y.toarray()
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
The way you are accumulating your result requires that there are the same number of non-zero entries in every row, which seems a pretty weird thing to do. Are you sure that is what you are after? If that's really what you want you could get that value with something like:
result = np.sum(Y.data.reshape(Y.shape[0], -1), axis=0)
but I have trouble believing that is really what you are after...

NumPy array indexing a 2D matrix

I've a little issue while working on same big data. But for now, let's assume I've got an NumPy array filled with zeros
>>> x = np.zeros((3,3))
>>> x
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
Now I want to change some of these zeros with specific values. I've given the index of the cells I want to change.
>>> y = np.array([[0,0],[1,1],[2,2]])
>>> y
array([[0, 0],
[1, 1],
[2, 2]])
And I've got an array with the desired (for now random) numbers, as follow
>>> z = np.array(np.random.rand(3))
>>> z
array([ 0.04988558, 0.87512891, 0.4288157 ])
So now I thought I can do the following:
>>> x[y] = z
But than it's filling the whole array like this
>>> x
array([[ 0.04988558, 0.87512891, 0.4288157 ],
[ 0.04988558, 0.87512891, 0.4288157 ],
[ 0.04988558, 0.87512891, 0.4288157 ]])
But I was hoping to get
>>> x
array([[ 0.04988558, 0, 0 ],
[ 0, 0.87512891, 0 ],
[ 0, 0, 0.4288157 ]])
EDIT
Now I've used a diagonal index, but what in the case my index is not just diagonal. I was hoping following works:
>>> y = np.array([[0,1],[1,2],[2,0]])
>>> x[y] = z
>>> x
>>> x
array([[ 0, 0.04988558, 0 ],
[ 0, 0, 0.87512891 ],
0.4288157, 0, 0 ]])
But it's filling whole array just like above
Array indexing works a bit differently on multidimensional arrays
If you have a vector, you can access the first three elements by using
x[np.array([0,1,2])]
but when you're using this on a matrix, it will return the first few rows. Upon first sight, using
x[np.array([0,0],[1,1],[2,2]])]
sounds reasonable. However, NumPy array indexing works differently: It still treats all those indices in a 1D fashion, but returns the values from the vector in the same shape as your index vector.
To properly access 2D matrices you have to split both components into two separate arrays:
x[np.array([0,1,2]), np.array([0,1,2])]
This will fetch all elements on the main diagonal of your matrix. Assignments using this method is possible, too:
x[np.array([0,1,2]), np.array([0,1,2])] = 1
So to access the elements you've mentioned in your edit, you have to do the following:
x[np.array([0,1,2]), np.array([1,2,0])]

Categories

Resources