Create a matrix from a function - python

I want to create a matrix from a function, such that the (3,3) matrix C has values equal to 1 if the row index is smaller than a given threshold k.
import numpy as np
k = 3
C = np.fromfunction(lambda i,j: 1 if i < k else 0, (3,3))
However, this piece of code throws an error
"The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()" and I do not really understand why.

The code for fromfunction is:
dtype = kwargs.pop('dtype', float)
args = indices(shape, dtype=dtype)
return function(*args, **kwargs)
You see it calls function just once - with the whole array of indices. It is not iterative.
In [672]: idx = np.indices((3,3))
In [673]: idx
Out[673]:
array([[[0, 0, 0],
[1, 1, 1],
[2, 2, 2]],
[[0, 1, 2],
[0, 1, 2],
[0, 1, 2]]])
Your lambda expects scalar i,j values, not a 3d array
lambda i,j: 1 if i < k else 0
idx<3 is a 3d boolean array. The error arises when that is use in an if context.
np.vectorize or np.frompyfunc is better if you want to apply a scalar function to a set of arrays:
In [677]: np.vectorize(lambda i,j: 1 if i < 2 else 0)(idx[0],idx[1])
Out[677]:
array([[1, 1, 1],
[1, 1, 1],
[0, 0, 0]])
However it isn't faster than more direct iterative approaches, and way slower than functions that operation on whole arrays.
One of many whole-array approaches:
In [680]: np.where(np.arange(3)[:,None]<2, np.ones((3,3),int), np.zeros((3,3),int))
Out[680]:
array([[1, 1, 1],
[1, 1, 1],
[0, 0, 0]])

As suggested by #MarkSetchell you need to vectorize your function:
k = 3
f = lambda i,j: 1 if i < k else 0
C = np.fromfunction(np.vectorize(f), (3,3))
and you get:
C
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])

The problem is that np.fromfunction does not iterate over all elements, it only returns the indices in each dimension. You can use np.where() to apply a condition based on those indices, choosing from two alternatives depending on the condition:
import numpy as np
k = 3
np.fromfunction(lambda i, j: np.where(i < k, 1, 0), (5,3))
which gives:
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0],
[0, 0, 0]])
This avoids naming the lambda without things becoming too unwieldy. On my laptop, this approach was about 20 times faster than np.vectorize().

Related

Searching values in matrix in python

I am trying to resolve one task with matrix. I have function: fill_area(matrix, coordinates, value). If I would input coordinates for example (1,0) and value 3, it should write 3 in position [1][0] and rewrite the value to 3. If near numbers have the same value, it should also rewrite them. I would like to use an stack or queue for numbers, which have to change an value.
But I have no idea, how to check all nearest position.
matrix = [[2, 0, 1],
[0, 0, 1],
[0, 1, 1]]
fill_area(matrix, (1, 0), 3)
matrix = [[2, 0, 1],
[3, 3, 1],
[3, 1, 1]]
The preferred way to handle matrices is numpy but there seems to be no obvious way to handle this special case while avoiding loops over items in numpy. I take it from your example that you define "nearness" as the patch around the given coordinates but in coordinate-direction only. I have two versions, one for the entire coordinate-patch around the given coordinates, the other in ij-direction only (per your example). Both are list comprehensions, for better readability you might want to change that to regular loops.
def fill_area(mat, ij, val):
'''set patch around ij in mat to val for all positions where the original mat value == mat val at ij, full patch'''
return [[val if abs(i-ij[0])<2 and abs(j-ij[1])<2 and mat[i][j] == mat[ij[0]][ij[1]]
else mat[i][j] for j in range(len(mat[1]))] for i in range(len(mat))]
def fill_area1(mat, ij, val):
'''set patch around ij in mat to val for all positions where the original mat value == mat val at ij, patch ij-dir'''
return [[val if ((abs(i-ij[0])<2 and j==ij[1]) or (abs(j-ij[1])<2 and i==ij[0])) and mat[i][j] == mat[ij[0]][ij[1]]
else mat[i][j] for j in range(len(mat[1]))] for i in range(len(mat))]
matrix = [[2, 0, 1],
[0, 0, 1],
[0, 1, 1]]
print('entire patch: ', fill_area(matrix, (1, 0), 3))
print('ij-directional: ', fill_area1(matrix, (1, 0), 3))
which produces
entire patch: [[2, 3, 1], [3, 3, 1], [3, 1, 1]]
ij-directional: [[2, 0, 1], [3, 3, 1], [3, 1, 1]]

How to calculate intersection in numpy.array? [duplicate]

I have an array X:
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
And I wish to find the index of the row of several values in this array:
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
For this example I would like a result like:
[0,3,4]
I have a code doing this, but I think it is overly complicated:
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
result = []
for s in searched_values:
idx = np.argwhere([np.all((X-s)==0, axis=1)])[0][1]
result.append(idx)
print(result)
I found this answer for a similar question but it works only for 1d arrays.
Is there a way to do what I want in a simpler way?
Approach #1
One approach would be to use NumPy broadcasting, like so -
np.where((X==searched_values[:,None]).all(-1))[1]
Approach #2
A memory efficient approach would be to convert each row as linear index equivalents and then using np.in1d, like so -
dims = X.max(0)+1
out = np.where(np.in1d(np.ravel_multi_index(X.T,dims),\
np.ravel_multi_index(searched_values.T,dims)))[0]
Approach #3
Another memory efficient approach using np.searchsorted and with that same philosophy of converting to linear index equivalents would be like so -
dims = X.max(0)+1
X1D = np.ravel_multi_index(X.T,dims)
searched_valuesID = np.ravel_multi_index(searched_values.T,dims)
sidx = X1D.argsort()
out = sidx[np.searchsorted(X1D,searched_valuesID,sorter=sidx)]
Please note that this np.searchsorted method assumes there is a match for each row from searched_values in X.
How does np.ravel_multi_index work?
This function gives us the linear index equivalent numbers. It accepts a 2D array of n-dimensional indices, set as columns and the shape of that n-dimensional grid itself onto which those indices are to be mapped and equivalent linear indices are to be computed.
Let's use the inputs we have for the problem at hand. Take the case of input X and note the first row of it. Since, we are trying to convert each row of X into its linear index equivalent and since np.ravel_multi_index assumes each column as one indexing tuple, we need to transpose X before feeding into the function. Since, the number of elements per row in X in this case is 2, the n-dimensional grid to be mapped onto would be 2D. With 3 elements per row in X, it would had been 3D grid for mapping and so on.
To see how this function would compute linear indices, consider the first row of X -
In [77]: X
Out[77]:
array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
We have the shape of the n-dimensional grid as dims -
In [78]: dims
Out[78]: array([10, 7])
Let's create the 2-dimensional grid to see how that mapping works and linear indices get computed with np.ravel_multi_index -
In [79]: out = np.zeros(dims,dtype=int)
In [80]: out
Out[80]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Let's set the first indexing tuple from X, i.e. the first row from X into the grid -
In [81]: out[4,2] = 1
In [82]: out
Out[82]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Now, to see the linear index equivalent of the element just set, let's flatten and use np.where to detect that 1.
In [83]: np.where(out.ravel())[0]
Out[83]: array([30])
This could also be computed if row-major ordering is taken into account.
Let's use np.ravel_multi_index and verify those linear indices -
In [84]: np.ravel_multi_index(X.T,dims)
Out[84]: array([30, 66, 61, 24, 41])
Thus, we would have linear indices corresponding to each indexing tuple from X, i.e. each row from X.
Choosing dimensions for np.ravel_multi_index to form unique linear indices
Now, the idea behind considering each row of X as indexing tuple of a n-dimensional grid and converting each such tuple to a scalar is to have unique scalars corresponding to unique tuples, i.e. unique rows in X.
Let's take another look at X -
In [77]: X
Out[77]:
array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
Now, as discussed in the previous section, we are considering each row as indexing tuple. Within each such indexing tuple, the first element would represent the first axis of the n-dim grid, second element would be the second axis of the grid and so on until the last element of each row in X. In essence, each column would represent one dimension or axis of the grid. If we are to map all elements from X onto the same n-dim grid, we need to consider the maximum stretch of each axis of such a proposed n-dim grid. Assuming we are dealing with positive numbers in X, such a stretch would be the maximum of each column in X + 1. That + 1 is because Python follows 0-based indexing. So, for example X[1,0] == 9 would map to the 10th row of the proposed grid. Similarly, X[4,1] == 6 would go to the 7th column of that grid.
So, for our sample case, we had -
In [7]: dims = X.max(axis=0) + 1 # Or simply X.max(0) + 1
In [8]: dims
Out[8]: array([10, 7])
Thus, we would need a grid of at least a shape of (10,7) for our sample case. More lengths along the dimensions won't hurt and would give us unique linear indices too.
Concluding remarks : One important thing to be noted here is that if we have negative numbers in X, we need to add proper offsets along each column in X to make those indexing tuples as positive numbers before using np.ravel_multi_index.
Another alternative is to use asvoid (below) to view each row as a single
value of void dtype. This reduces a 2D array to a 1D array, thus allowing you to use np.in1d as usual:
import numpy as np
def asvoid(arr):
"""
Based on http://stackoverflow.com/a/16973510/190597 (Jaime, 2013-06)
View the array as dtype np.void (bytes). The items along the last axis are
viewed as one value. This allows comparisons to be performed which treat
entire rows as one value.
"""
arr = np.ascontiguousarray(arr)
if np.issubdtype(arr.dtype, np.floating):
""" Care needs to be taken here since
np.array([-0.]).view(np.void) != np.array([0.]).view(np.void)
Adding 0. converts -0. to 0.
"""
arr += 0.
return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
idx = np.flatnonzero(np.in1d(asvoid(X), asvoid(searched_values)))
print(idx)
# [0 3 4]
The numpy_indexed package (disclaimer: I am its author) contains functionality for performing such operations efficiently (also uses searchsorted under the hood). In terms of functionality, it acts as a vectorized equivalent of list.index:
import numpy_indexed as npi
result = npi.indices(X, searched_values)
Note that using the 'missing' kwarg, you have full control over behavior of missing items, and it works for nd-arrays (fi; stacks of images) as well.
Update: using the same shapes as #Rik X=[520000,28,28] and searched_values=[20000,28,28], it runs in 0.8064 secs, using missing=-1 to detect and denote entries not present in X.
Here is a pretty fast solution that scales up well using numpy and hashlib. It can handle large dimensional matrices or images in seconds. I used it on 520000 X (28 X 28) array and 20000 X (28 X 28) in 2 seconds on my CPU
Code:
import numpy as np
import hashlib
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
#hash using sha1 appears to be efficient
xhash=[hashlib.sha1(row).digest() for row in X]
yhash=[hashlib.sha1(row).digest() for row in searched_values]
z=np.in1d(xhash,yhash)
##Use unique to get unique indices to ind1 results
_,unique=np.unique(np.array(xhash)[z],return_index=True)
##Compute unique indices by indexing an array of indices
idx=np.array(range(len(xhash)))
unique_idx=idx[z][unique]
print('unique_idx=',unique_idx)
print('X[unique_idx]=',X[unique_idx])
Output:
unique_idx= [4 3 0]
X[unique_idx]= [[5 6]
[3 3]
[4 2]]
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
S = np.array([[4, 2],
[3, 3],
[5, 6]])
result = [[i for i,row in enumerate(X) if (s==row).all()] for s in S]
or
result = [i for s in S for i,row in enumerate(X) if (s==row).all()]
if you want a flat list (assuming there is exactly one match per searched value).
Another way is to use cdist function from scipy.spatial.distance like this:
np.nonzero(cdist(X, searched_values) == 0)[0]
Basically, we get row numbers of X which have distance zero to a row in searched_values, meaning they are equal. Makes sense if you look on rows as coordinates.
I had similar requirement and following worked for me:
np.argwhere(np.isin(X, searched_values).all(axis=1))
Here's what worked out for me:
def find_points(orig: np.ndarray, search: np.ndarray) -> np.ndarray:
equals = [np.equal(orig, p).all(1) for p in search]
exists = np.max(equals, axis=1)
indices = np.argmax(equals, axis=1)
indices[exists == False] = -1
return indices
test:
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6],
[0, 0]])
find_points(X, searched_values)
output:
[0,3,4,-1]

Quick method to find cross index between two numpy arrays [duplicate]

I have an array X:
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
And I wish to find the index of the row of several values in this array:
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
For this example I would like a result like:
[0,3,4]
I have a code doing this, but I think it is overly complicated:
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
result = []
for s in searched_values:
idx = np.argwhere([np.all((X-s)==0, axis=1)])[0][1]
result.append(idx)
print(result)
I found this answer for a similar question but it works only for 1d arrays.
Is there a way to do what I want in a simpler way?
Approach #1
One approach would be to use NumPy broadcasting, like so -
np.where((X==searched_values[:,None]).all(-1))[1]
Approach #2
A memory efficient approach would be to convert each row as linear index equivalents and then using np.in1d, like so -
dims = X.max(0)+1
out = np.where(np.in1d(np.ravel_multi_index(X.T,dims),\
np.ravel_multi_index(searched_values.T,dims)))[0]
Approach #3
Another memory efficient approach using np.searchsorted and with that same philosophy of converting to linear index equivalents would be like so -
dims = X.max(0)+1
X1D = np.ravel_multi_index(X.T,dims)
searched_valuesID = np.ravel_multi_index(searched_values.T,dims)
sidx = X1D.argsort()
out = sidx[np.searchsorted(X1D,searched_valuesID,sorter=sidx)]
Please note that this np.searchsorted method assumes there is a match for each row from searched_values in X.
How does np.ravel_multi_index work?
This function gives us the linear index equivalent numbers. It accepts a 2D array of n-dimensional indices, set as columns and the shape of that n-dimensional grid itself onto which those indices are to be mapped and equivalent linear indices are to be computed.
Let's use the inputs we have for the problem at hand. Take the case of input X and note the first row of it. Since, we are trying to convert each row of X into its linear index equivalent and since np.ravel_multi_index assumes each column as one indexing tuple, we need to transpose X before feeding into the function. Since, the number of elements per row in X in this case is 2, the n-dimensional grid to be mapped onto would be 2D. With 3 elements per row in X, it would had been 3D grid for mapping and so on.
To see how this function would compute linear indices, consider the first row of X -
In [77]: X
Out[77]:
array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
We have the shape of the n-dimensional grid as dims -
In [78]: dims
Out[78]: array([10, 7])
Let's create the 2-dimensional grid to see how that mapping works and linear indices get computed with np.ravel_multi_index -
In [79]: out = np.zeros(dims,dtype=int)
In [80]: out
Out[80]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Let's set the first indexing tuple from X, i.e. the first row from X into the grid -
In [81]: out[4,2] = 1
In [82]: out
Out[82]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Now, to see the linear index equivalent of the element just set, let's flatten and use np.where to detect that 1.
In [83]: np.where(out.ravel())[0]
Out[83]: array([30])
This could also be computed if row-major ordering is taken into account.
Let's use np.ravel_multi_index and verify those linear indices -
In [84]: np.ravel_multi_index(X.T,dims)
Out[84]: array([30, 66, 61, 24, 41])
Thus, we would have linear indices corresponding to each indexing tuple from X, i.e. each row from X.
Choosing dimensions for np.ravel_multi_index to form unique linear indices
Now, the idea behind considering each row of X as indexing tuple of a n-dimensional grid and converting each such tuple to a scalar is to have unique scalars corresponding to unique tuples, i.e. unique rows in X.
Let's take another look at X -
In [77]: X
Out[77]:
array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
Now, as discussed in the previous section, we are considering each row as indexing tuple. Within each such indexing tuple, the first element would represent the first axis of the n-dim grid, second element would be the second axis of the grid and so on until the last element of each row in X. In essence, each column would represent one dimension or axis of the grid. If we are to map all elements from X onto the same n-dim grid, we need to consider the maximum stretch of each axis of such a proposed n-dim grid. Assuming we are dealing with positive numbers in X, such a stretch would be the maximum of each column in X + 1. That + 1 is because Python follows 0-based indexing. So, for example X[1,0] == 9 would map to the 10th row of the proposed grid. Similarly, X[4,1] == 6 would go to the 7th column of that grid.
So, for our sample case, we had -
In [7]: dims = X.max(axis=0) + 1 # Or simply X.max(0) + 1
In [8]: dims
Out[8]: array([10, 7])
Thus, we would need a grid of at least a shape of (10,7) for our sample case. More lengths along the dimensions won't hurt and would give us unique linear indices too.
Concluding remarks : One important thing to be noted here is that if we have negative numbers in X, we need to add proper offsets along each column in X to make those indexing tuples as positive numbers before using np.ravel_multi_index.
Another alternative is to use asvoid (below) to view each row as a single
value of void dtype. This reduces a 2D array to a 1D array, thus allowing you to use np.in1d as usual:
import numpy as np
def asvoid(arr):
"""
Based on http://stackoverflow.com/a/16973510/190597 (Jaime, 2013-06)
View the array as dtype np.void (bytes). The items along the last axis are
viewed as one value. This allows comparisons to be performed which treat
entire rows as one value.
"""
arr = np.ascontiguousarray(arr)
if np.issubdtype(arr.dtype, np.floating):
""" Care needs to be taken here since
np.array([-0.]).view(np.void) != np.array([0.]).view(np.void)
Adding 0. converts -0. to 0.
"""
arr += 0.
return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
idx = np.flatnonzero(np.in1d(asvoid(X), asvoid(searched_values)))
print(idx)
# [0 3 4]
The numpy_indexed package (disclaimer: I am its author) contains functionality for performing such operations efficiently (also uses searchsorted under the hood). In terms of functionality, it acts as a vectorized equivalent of list.index:
import numpy_indexed as npi
result = npi.indices(X, searched_values)
Note that using the 'missing' kwarg, you have full control over behavior of missing items, and it works for nd-arrays (fi; stacks of images) as well.
Update: using the same shapes as #Rik X=[520000,28,28] and searched_values=[20000,28,28], it runs in 0.8064 secs, using missing=-1 to detect and denote entries not present in X.
Here is a pretty fast solution that scales up well using numpy and hashlib. It can handle large dimensional matrices or images in seconds. I used it on 520000 X (28 X 28) array and 20000 X (28 X 28) in 2 seconds on my CPU
Code:
import numpy as np
import hashlib
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
#hash using sha1 appears to be efficient
xhash=[hashlib.sha1(row).digest() for row in X]
yhash=[hashlib.sha1(row).digest() for row in searched_values]
z=np.in1d(xhash,yhash)
##Use unique to get unique indices to ind1 results
_,unique=np.unique(np.array(xhash)[z],return_index=True)
##Compute unique indices by indexing an array of indices
idx=np.array(range(len(xhash)))
unique_idx=idx[z][unique]
print('unique_idx=',unique_idx)
print('X[unique_idx]=',X[unique_idx])
Output:
unique_idx= [4 3 0]
X[unique_idx]= [[5 6]
[3 3]
[4 2]]
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
S = np.array([[4, 2],
[3, 3],
[5, 6]])
result = [[i for i,row in enumerate(X) if (s==row).all()] for s in S]
or
result = [i for s in S for i,row in enumerate(X) if (s==row).all()]
if you want a flat list (assuming there is exactly one match per searched value).
Another way is to use cdist function from scipy.spatial.distance like this:
np.nonzero(cdist(X, searched_values) == 0)[0]
Basically, we get row numbers of X which have distance zero to a row in searched_values, meaning they are equal. Makes sense if you look on rows as coordinates.
I had similar requirement and following worked for me:
np.argwhere(np.isin(X, searched_values).all(axis=1))
Here's what worked out for me:
def find_points(orig: np.ndarray, search: np.ndarray) -> np.ndarray:
equals = [np.equal(orig, p).all(1) for p in search]
exists = np.max(equals, axis=1)
indices = np.argmax(equals, axis=1)
indices[exists == False] = -1
return indices
test:
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6],
[0, 0]])
find_points(X, searched_values)
output:
[0,3,4,-1]

Using a numpy array to assign values to another array

I have the following numpy array matrix ,
matrix = np.zeros((3,5), dtype = int)
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
Suppose I have this numpy array indices as well
indices = np.array([[1,3], [2,4], [0,4]])
array([[1, 3],
[2, 4],
[0, 4]])
Question: How can I assign 1s to the elements in the matrix where their indices are specified by the indices array. A vectorized implementation is expected.
For more clarity, the output should look like:
array([[0, 1, 0, 1, 0], #[1,3] elements are changed
[0, 0, 1, 0, 1], #[2,4] elements are changed
[1, 0, 0, 0, 1]]) #[0,4] elements are changed
Here's one approach using NumPy's fancy-indexing -
matrix[np.arange(matrix.shape[0])[:,None],indices] = 1
Explanation
We create the row indices with np.arange(matrix.shape[0]) -
In [16]: idx = np.arange(matrix.shape[0])
In [17]: idx
Out[17]: array([0, 1, 2])
In [18]: idx.shape
Out[18]: (3,)
The column indices are already given as indices -
In [19]: indices
Out[19]:
array([[1, 3],
[2, 4],
[0, 4]])
In [20]: indices.shape
Out[20]: (3, 2)
Let's make a schematic diagram of the shapes of row and column indices, idx and indices -
idx (row) : 3
indices (col) : 3 x 2
For using the row and column indices for indexing into input array matrix, we need to make them broadcastable against each other. One way would be to introduce a new axis into idx, making it 2D by pushing the elements into the first axis and allowing a singleton dim as the last axis with idx[:,None], as shown below -
idx (row) : 3 x 1
indices (col) : 3 x 2
Internally, idx would be broadcasted, like so -
In [22]: idx[:,None]
Out[22]:
array([[0],
[1],
[2]])
In [23]: indices
Out[23]:
array([[1, 3],
[2, 4],
[0, 4]])
In [24]: np.repeat(idx[:,None],2,axis=1) # indices has length of 2 along cols
Out[24]:
array([[0, 0], # Internally broadcasting would be like this
[1, 1],
[2, 2]])
Thus, the broadcasted elements from idx would be used as row indices and column indices from indices for indexing into matrix for setting elements in it. Since, we had -
idx = np.arange(matrix.shape[0]),
Thus, we would end up with -
matrix[np.arange(matrix.shape[0])[:,None],indices] for setting elements.
this involves loop and hence may not be very efficient for large arrays
for i in range(len(indices)):
matrix[i,indices[i]] = 1
> matrix
Out[73]:
array([[0, 1, 0, 1, 0],
[0, 0, 1, 0, 1],
[1, 0, 0, 0, 1]])

Can numpy bincount work with 2D arrays?

I am seeing behaviour with numpy bincount that I cannot make sense of. I want to bin the values in a 2D array in a row-wise manner and see the behaviour below. Why would it work with dbArray but fail with simarray?
>>> dbArray
array([[1, 0, 1, 0, 1],
[1, 1, 1, 1, 1],
[1, 1, 0, 1, 1],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 1, 0]])
>>> N.apply_along_axis(N.bincount,1,dbArray)
array([[2, 3],
[0, 5],
[1, 4],
[4, 1],
[3, 2],
[3, 2]], dtype=int64)
>>> simarray
array([[2, 0, 2, 0, 2],
[2, 1, 2, 1, 2],
[2, 1, 1, 1, 2],
[2, 0, 1, 0, 1],
[1, 0, 1, 1, 2],
[1, 1, 1, 1, 1]])
>>> N.apply_along_axis(N.bincount,1,simarray)
Traceback (most recent call last):
File "<pyshell#31>", line 1, in <module>
N.apply_along_axis(N.bincount,1,simarray)
File "C:\Python27\lib\site-packages\numpy\lib\shape_base.py", line 118, in apply_along_axis
outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (3)
The problem is that bincount isn't always returning the same shaped objects, in particular when values are missing. For example:
>>> m = np.array([[0,0,1],[1,1,0],[1,1,1]])
>>> np.apply_along_axis(np.bincount, 1, m)
array([[2, 1],
[1, 2],
[0, 3]])
>>> [np.bincount(m[i]) for i in range(m.shape[1])]
[array([2, 1]), array([1, 2]), array([0, 3])]
works, but:
>>> m = np.array([[0,0,0],[1,1,0],[1,1,0]])
>>> m
array([[0, 0, 0],
[1, 1, 0],
[1, 1, 0]])
>>> [np.bincount(m[i]) for i in range(m.shape[1])]
[array([3]), array([1, 2]), array([1, 2])]
>>> np.apply_along_axis(np.bincount, 1, m)
Traceback (most recent call last):
File "<ipython-input-49-72e06e26a718>", line 1, in <module>
np.apply_along_axis(np.bincount, 1, m)
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 117, in apply_along_axis
outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (1)
won't.
You could use the minlength parameter and pass it using a lambda or partial or something:
>>> np.apply_along_axis(lambda x: np.bincount(x, minlength=2), axis=1, arr=m)
array([[3, 0],
[1, 2],
[1, 2]])
As #DSM has already mentioned, bincount of a 2d array cannot be done without knowing the maximum value of the array, because it would mean an inconsistency of array sizes.
But thanks to the power of numpy's indexing, it was fairly easy to make a faster implementation of 2d bincount, as it doesn't use concatenation or anything.
def bincount2d(arr, bins=None):
if bins is None:
bins = np.max(arr) + 1
count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
indexing = np.arange(len(arr))
for col in arr.T:
count[indexing, col] += 1
return count
t = np.array([[1,2,3],[4,5,6],[3,2,2]], dtype=np.int64)
print(bincount2d(t))
P.S.
This:
t = np.empty(shape=[10000, 100], dtype=np.int64)
s = time.time()
bincount2d(t)
e = time.time()
print(e - s)
gives ~2 times faster result, than this:
t = np.empty(shape=[100, 10000], dtype=np.int64)
s = time.time()
bincount2d(t)
e = time.time()
print(e - s)
because of the for loop iterating over columns. So, it's better to transpose your 2d array, if shape[0] < shape[1].
UPD
Better than this can't be done (using python alone, I mean):
def bincount2d(arr, bins=None):
if bins is None:
bins = np.max(arr) + 1
count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
indexing = (np.ones_like(arr).T * np.arange(len(arr))).T
np.add.at(count, (indexing, arr), 1)
return count
This is a function that does exactly what you want, but without any loops.
def sub_sum_partition(a, partition):
"""
Generalization of np.bincount(partition, a).
Sums rows of a matrix for each value of array of non-negative ints.
:param a: array_like
:param partition: array_like, 1 dimension, nonnegative ints
:return: matrix of shape ('one larger than the largest value in partition', a.shape[1:]). The i's element is
the sum of rows j in 'a' s.t. partition[j] == i
"""
assert partition.shape == (len(a),)
n = np.prod(a.shape[1:], dtype=int)
bins = ((np.tile(partition, (n, 1)) * n).T + np.arange(n, dtype=int)).reshape(-1)
sums = np.bincount(bins, a.reshape(-1))
if n > 1:
sums = sums.reshape(-1, *a.shape[1:])
return sums

Categories

Resources