reducing colors with numpy - python

I'm writing a script to reduce the number of colors in a list by finding clusters. The problem I run into is that the clusters have different sizes. Here is my starting point, after the original list of 6 colors has already been separated into 3 clusters:
import numpy
a = numpy.array([
[12, 44, 52],
[27, 0, 71],
[81, 99, 92]
])
b = numpy.array([
[ 12, 13, 93],
[128, 128, 128]
])
c = numpy.array([
[ 57, 14, 255]
])
clusters = numpy.array([a,b,c])
print(numpy.min(clusters, axis=1))
However now the function numpy.min() starts to throw an error - I suspect it's because of the differently sized arrays.
The cluster arrays will always have the shape (x, 3) (x number of colors, 3 components). I want to get an array with the minimums of all components of the colors in one cluster (n, 3) (n is number of clusters) - so array([12, 0, 52], [12, 13, 93], [57, 14, 255]) in this case.
Is there a way to do this? As I mentioned it works as long as all clusters have multiple values.

Since your arrays a, b and c don't have an equal shape, you can't put them in the same array (at least if you don't pad with some value). You could calculate the minimum first and then generate an array from these minima:
numpy.array([arr.min(axis=0) for arr in (a, b, c)])
Which gives you:
array([[ 12, 0, 52],
[ 12, 13, 93],
[ 57, 14, 255]])
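If the number of clusters is large, the Python-level loop can be avoided by concatenating the clusters and reducing over segments. The following is only a sketch of that idea, assuming every cluster is non-empty; the names stacked, starts and minima are illustrative and not part of the original answer.

```python
import numpy as np

a = np.array([[12, 44, 52], [27, 0, 71], [81, 99, 92]])
b = np.array([[12, 13, 93], [128, 128, 128]])
c = np.array([[57, 14, 255]])
clusters = [a, b, c]  # a plain list, since the shapes differ

# Stack all colors into one (total_colors, 3) array.
stacked = np.concatenate(clusters, axis=0)

# Start offset of each cluster inside the stacked array.
lengths = [len(cluster) for cluster in clusters]
starts = np.cumsum([0] + lengths[:-1])

# Component-wise minimum over each segment (assumes no empty cluster).
minima = np.minimum.reduceat(stacked, starts, axis=0)
print(minima)
```

For a handful of clusters the list comprehension above is simpler and just as good; the segmented reduction only pays off when there are many clusters.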

Related

How to index 3d array (t, x, y) using another 3d array

I have a 3d array of values,
vals = np.array([
[
[10, 20, 30],
[40, 50, 60],
],
[
[15, 25, 35],
[45, 55, 65],
],
])
and a corresponding 3d array of coordinates
coords = np.array([
[
[0,1],
[0,2],
[1,1]
],
[
[0,0],
[1,1],
[1,2]
]
])
Each inner-most array of coords represents (x,y) coordinates corresponding to one of the 2d arrays within vals. For example, the coordinate [0,1] in coords corresponds to the value 20 and the coordinate [1,2] in coords corresponds to the value 65.
How do I use coords to subset vals in this manner?
I can solve this specific example like so
np.array([
vals[0][coords[0][:, 0], coords[0][:, 1]],
vals[1][coords[1][:, 0], coords[1][:, 1]]
])
array([[20, 30, 50],
[15, 55, 65]])
but obviously I'd like a more dynamic solution.
Funny how writing my questions always seems to lead me to an answer.. Staring at the answer matrix,
array([[20, 30, 50],
[15, 55, 65]])
I asked myself, "how would I reproduce this matrix from raw index values?". For example, to extract the value 20, I know I can do
vals[0, 0, 1]
If I wanted to extract the first row of values in the answer, [20, 30, 50] I should do
vals[[0,0,0], [0,0,1], [1,2,1]]
Then to get the full answer matrix, I should do
vals[[[0,0,0],[1,1,1]], [[0,0,1],[0,1,1]], [[1,2,1],[0,1,2]]]
From here, I set my focus on producing those three index matrices. They can be constructed as follows:
i1 = np.arange(coords.shape[0])[:, None].repeat(coords.shape[1], axis=1)
i2 = coords[:,:,0]
i3 = coords[:,:,1]
# Thus the generalized solution
vals[i1, i2, i3]
This answer is extremely similar to the advanced indexing solution mentioned by @Psidom in the comments, but perhaps less elegant.
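The explicit repeat for the first index can be left to broadcasting. The sketch below is one way to write that (the variable names t_index and picked are illustrative); it relies on a column of shape (t, 1) broadcasting against the (t, k) coordinate arrays.

```python
import numpy as np

vals = np.array([
    [[10, 20, 30], [40, 50, 60]],
    [[15, 25, 35], [45, 55, 65]],
])
coords = np.array([
    [[0, 1], [0, 2], [1, 1]],
    [[0, 0], [1, 1], [1, 2]],
])

# Column vector of first-axis indices, shape (t, 1); it broadcasts
# against the (t, k) arrays of x and y coordinates.
t_index = np.arange(coords.shape[0])[:, None]
picked = vals[t_index, coords[..., 0], coords[..., 1]]
print(picked)
```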

Search a number in a sorted 2D array

I'm trying to find the number that I'm looking for in a 2D list. However, the list has to be sorted before searching.
Searching for a number in the 2D array seems to work fine. The part I'm stuck on is sorting the 2D array in a way that keeps the search working. Let's assume I want to sort a 3x3 2D array such as:
[[8, 27, 6],
[1, 0, 11],
[10, 9, 3]]
Then, I will look for a number by running a binary search over the sorted 2D array. My mid value will be the middle element of the range being searched.
This is just an example; what I want to accomplish is to fill the array with random numbers and then sort it across rows and columns. I'm using Python's random.randint() function to generate the numbers. Then I try to sort the 2D array, but it isn't actually sorted before the search continues.
import random
n = 5
m = 5
def findnum_arr(array, num):
    # Binary search that treats the 2D list as one flat sorted sequence
    low = 0
    high = n * m - 1
    while (high >= low):
        mid = (low + high) // 2
        i = mid // m
        j = mid % m
        if (num == array[i][j]):
            return True
        if (num < array[i][j]):
            high = mid - 1
        else:
            low = mid + 1
    return False
if __name__ == '__main__':
    multi_array = [[random.randint(0, 20) for x in range(n)] for y in range(m)]
    sorted(multi_array)
Sorted:
[[0, 1, 3],
[6, 8, 9],
[10, 11, 27]]
This should be the sorted 2D array. Is it possible to sort both the rows and the columns this way with the sorted function?
Calling sorted on a nested list just orders the inner lists by comparing them element by element, starting with the first element; the contents of each inner list are left untouched.
Example:
arr = [[8, 27, 6],[1, 0, 11],[10, 15, 3], [16, 12, 14], [4, 9, 13]]
is going to return
[[1, 0, 11], [4, 9, 13], [8, 27, 6], [10, 15, 3], [16, 12, 14]]
To sort it the way you want, you have to flatten the list, sort it, and then reshape it.
For this, I would try introducing numpy.
import numpy as np
a = np.array(sorted(sum(arr, [])))
#sorted(sum(arr, [])) flattens the list
b = np.reshape(a, (-1,3)).tolist()
EDITED FOR CLARITY: You can use your m and n as parameters in np.reshape. The first parameter (m) sets the number of rows (inner arrays), while the second (n) sets the number of elements in each row.
Passing -1 for either parameter makes NumPy infer that dimension from the other one and from the total number of elements.
b would return
[[0, 1, 3], [4, 6, 8], [9, 10, 11], [12, 13, 14], [15, 16, 27]]
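As a variation on the same flatten-sort-reshape idea, NumPy can do the flattening itself. This is only a sketch, assuming all rows of arr have the same length; np.sort with axis=None sorts the flattened array, and the name sorted_rows is illustrative.

```python
import numpy as np

arr = [[8, 27, 6], [1, 0, 11], [10, 15, 3], [16, 12, 14], [4, 9, 13]]

# axis=None flattens the array before sorting it.
flat_sorted = np.sort(np.array(arr), axis=None)

# -1 lets NumPy infer the number of rows from the row width of 3.
sorted_rows = flat_sorted.reshape(-1, 3).tolist()
print(sorted_rows)
```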
I finally found a proper solution that uses neither numpy nor the sum() function.
import random
# n, m and findnum_arr are the ones defined in the question above
if __name__ == '__main__':
    x = 7
    multi_array = [[random.randint(0, 200) for x in range(n)] for y in range(m)]
    # Another way to flatten, if you are using itertools:
    # one_array = sorted(itertools.chain.from_iterable(multi_array))
    one_array = sorted([x for row in multi_array for x in row])
    # Rebuild rows of width m, matching the mid // m and mid % m indexing in findnum_arr
    sorted_2d = [one_array[i:i+m] for i in range(0, len(one_array), m)]
    print("multi_array list is: \n{0}\n".format(multi_array))
    print("sorted 2D array: \n{0}\n".format(sorted_2d))
    if not findnum_arr(sorted_2d, x):
        print("Not Found")
    else:
        print("Found")
output:
multi_array list is:
[[40, 107, 23, 27, 42], [150, 84, 108, 191, 172], [154, 22, 161, 26, 31], [18, 150, 197, 77, 191], [96, 124, 81, 125, 186]]
sorted 2D array:
[[18, 22, 23, 26, 27], [31, 40, 42, 77, 81], [84, 96, 107, 108, 124], [125, 150, 150, 154, 161], [172, 186, 191, 191, 197]]
Not Found
I wanted to stay within the standard library: flatten the 2D array into a 1D list and sort it, then use a list comprehension to rebuild the 1D list into a 2D array. This sounds like a lot of work, but it seems to work fine. Let me know if there is a better and faster way to do it without numpy :)
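Since the data is flattened and sorted anyway, one possible standard-library alternative is to run the binary search on the flat list with the bisect module and rebuild the 2D layout only for display. This is a sketch under that assumption; the helper name contains_sorted is hypothetical and the sample data is the 3x3 array from the question.

```python
import bisect


def contains_sorted(flat_sorted, num):
    """Return True if num occurs in the ascending list flat_sorted."""
    pos = bisect.bisect_left(flat_sorted, num)
    return pos < len(flat_sorted) and flat_sorted[pos] == num


multi_array = [[8, 27, 6], [1, 0, 11], [10, 9, 3]]

# Flatten and sort once.
flat_sorted = sorted(value for row in multi_array for value in row)

# Rebuild rows of the original width, only needed for display.
width = len(multi_array[0])
sorted_2d = [flat_sorted[i:i + width] for i in range(0, len(flat_sorted), width)]

found_9 = contains_sorted(flat_sorted, 9)
found_7 = contains_sorted(flat_sorted, 7)
print(sorted_2d, found_9, found_7)
```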

identifying sub-arrays in numpy

I have two two dimensional arrays a and b (#columns of a <= #columns in b). I would like to find an efficient way of matching a row in array a to a contiguous part of a row in array b.
a = np.array([[ 25, 28],
[ 84, 97],
[105, 24],
[ 28, 900]])
b = np.array([[ 25, 28, 84, 97],
[ 22, 25, 28, 900],
[ 11, 12, 105, 24]])
The output should be np.array([[0,0], [0,1], [1,0], [2,2], [3,1]]). Row 0 in array a matches Row 0 in array b (first two positions). Row 1 in array a matches row 0 in array b (third and fourth positions).
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, for efficient patch extraction, and then compare those patches against each row of a, all in a vectorized manner. Then, get the matching indices with np.argwhere -
# a and b from posted question
In [325]: from skimage.util.shape import view_as_windows
In [428]: w = view_as_windows(b,(1,a.shape[1]))
In [429]: np.argwhere((w == a).all(-1).any(-2))[:,::-1]
Out[429]:
array([[0, 0],
[1, 0],
[0, 1],
[3, 1],
[2, 2]])
Alternatively, we could get the indices by the order of rows in a by pushing forward the first axis of a while performing broadcasted comparisons -
In [444]: np.argwhere((w[:,:,0] == a[:,None,None,:]).all(-1).any(-1))
Out[444]:
array([[0, 0],
[0, 1],
[1, 0],
[2, 2],
[3, 1]])
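If scikit-image is not available, the same windowing can probably be done with NumPy alone. The sketch below assumes NumPy 1.20 or newer, where np.lib.stride_tricks.sliding_window_view exists; the names windows and matches are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([[25, 28], [84, 97], [105, 24], [28, 900]])
b = np.array([[25, 28, 84, 97], [22, 25, 28, 900], [11, 12, 105, 24]])

# All contiguous windows of width a.shape[1] along each row of b;
# shape is (rows of b, windows per row, window width).
windows = sliding_window_view(b, a.shape[1], axis=1)

# Compare every row of a with every window, then collapse the window axes:
# hit[i, j] is True if row i of a occurs somewhere in row j of b.
hit = (windows[None] == a[:, None, None, :]).all(-1).any(-1)
matches = np.argwhere(hit)
print(matches)
```

The comparison builds a boolean array of shape (rows of a, rows of b, windows, width), so memory use grows with the product of those sizes.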
Another way I can think of is to loop over each row in a and perform a 2D correlation between b, which you can consider a 2D signal, and that row of a.
We would look for results equal to the sum of squares of all values in that row of a. Subtracting this sum of squares from the correlation result turns candidate matches into zeros, so any row of b that produces a 0 is a row where the subarray may have been found. Strictly speaking, a zero is a necessary condition rather than a sufficient one, because a different window can yield the same dot product, so it is worth confirming candidates with a direct comparison. If you are using floating-point numbers, compare against some small threshold just above 0.
If you can use SciPy, the scipy.signal.correlate2d method is what I had in mind.
import numpy as np
from scipy.signal import correlate2d
a = np.array([[ 25, 28],
[ 84, 97],
[105, 24]])
b = np.array([[ 25, 28, 84, 97],
[ 22, 25, 28, 900],
[ 11, 12, 105, 24]])
EPS = 1e-8
result = []
for (i, row) in enumerate(a):
    out = correlate2d(b, row[None,:], mode='valid') - np.square(row).sum()
    locs = np.where(np.abs(out) <= EPS)[0]
    unique_rows = np.unique(locs)
    for res in unique_rows:
        result.append((i, res))
We get:
In [32]: result
Out[32]: [(0, 0), (0, 1), (1, 0), (2, 2)]
The time complexity of this could be better, especially since we're looping over each row of a to find any subarrays in b.

Multiply NumPy ndarray with every element in another binary ndarray of different size

I have two ndarrays :
a = [[30,40],
[60,90]]
b = [[0,0,1],
[1,0,1],
[1,1,1]]
Please note that a might be larger, but it is always a square array, e.g. (50,50) or (100,100).
The wanted result is :
Result = [[a*0,a*0,a*1],
[a*1,a*0,a*1],
[a*1,a*1,a*1]]
I managed to get the right answer with this code, but I suspect there is a built-in NumPy function that accomplishes this task faster.
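import numpy as np
# The function name is illustrative; the original snippet was a bare function body.
def block_multiply(a, b):
    totalrows = []
    for row in range(b.shape[0]):
        cells = []
        for column in range(b.shape[1]):
            print(row, column)
            # Scale the whole block a by one entry of b
            cells.append(b[row, column] * a)
        totalrows.append(np.concatenate(cells, axis=1))
    return np.concatenate(totalrows, axis=0)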
Indeed there's a NumPy built-in np.kron for such block-based elementwise multiplication problems. To solve your case, it could be used like so -
np.kron(b,a)
Sample run -
In [50]: a
Out[50]:
array([[30, 40],
[60, 90]])
In [51]: b
Out[51]:
array([[0, 0, 1],
[1, 0, 1],
[1, 1, 1]])
In [52]: np.kron(b,a)
Out[52]:
array([[ 0, 0, 0, 0, 30, 40],
[ 0, 0, 0, 0, 60, 90],
[30, 40, 0, 0, 30, 40],
[60, 90, 0, 0, 60, 90],
[30, 40, 30, 40, 30, 40],
[60, 90, 60, 90, 60, 90]])
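To see what np.kron computes here, the same block layout can be reproduced with plain broadcasting and a reshape. This is only an illustrative sketch for the 2D case (the name blocks is not from the original answer), with a check that both routes agree.

```python
import numpy as np

a = np.array([[30, 40], [60, 90]])
b = np.array([[0, 0, 1], [1, 0, 1], [1, 1, 1]])

# blocks[i, k, j, l] == b[i, j] * a[k, l]
blocks = b[:, None, :, None] * a[None, :, None, :]

# Merge (i, k) into rows and (j, l) into columns to lay the blocks out.
via_broadcast = blocks.reshape(b.shape[0] * a.shape[0], b.shape[1] * a.shape[1])

via_kron = np.kron(b, a)
print(np.array_equal(via_broadcast, via_kron))
```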
3D array case
Now, let's say we are working with a as a 3D array (m,n,p) and b as (q,r) and assuming you are looking to perform such a block-wise multiplication iteratively along the last axis of a. Thus, the shapes are to be multiplied along the first two axes on the two inputs to get the output array. To achieve such an output, we need to extend the dimension of b by introducing a singleton dimension as the last axis. The final output would be of shape (m*q,n*r,p*1). The implementation would be simply -
np.kron(b[...,None],a)
Shape check -
In [161]: a = np.random.randint(0,99,(4,5,2))
...: b = np.random.randint(0,99,(6,7))
...:
In [162]: np.kron(b[...,None],a).shape
Out[162]: (24, 35, 2)

Python, neighbors on a regular grid

Let's suppose I have a set of 2D coordinates that represent the centers of cells of a 2D regular mesh. I would like to find, for each cell in the grid, the two closest neighbors in each direction.
The problem is quite straightforward if one assigns to each cell an index defined as follows:
idx_cell = idx+N*idy
where N is the total number of cells in the grid, idx=x/dx and idy=y/dx, with x and y being the x-coordinate and the y-coordinate of a cell and dx its size.
For example, the neighboring cells for a cell with idx_cell=5 are the cells with idx_cell equal to 4,6 (for the x-axis) and 5+N,5-N (for the y-axis).
The problem that I have is that my implementation of the algorithm is quite slow for large (N>1e6) data sets.
For instance, to get the neighbors of the x-axis I do
[x[(idx_cell==idx_cell[i]-1)|(idx_cell==idx_cell[i]+1)] for i in cells]
Do you think there's a faster way to implement this algorithm?
You are basically reinventing the indexing scheme of a multidimensional array. It is relatively easy to code, but you can use the two functions unravel_index and ravel_multi_index to your advantage here.
If your grid is of M rows and N columns, to get the idx and idy of a single item you could do:
>>> M, N = 12, 10
>>> np.unravel_index(4, shape=(M, N))  # the keyword is shape since NumPy 1.16 (formerly dims)
(0, 4)
This also works if, instead of a single index, you provide an array of indices:
>>> np.unravel_index([15, 28, 32, 97], shape=(M, N))
(array([1, 2, 3, 9], dtype=int64), array([5, 8, 2, 7], dtype=int64))
So if cells has the indices of several cells you want to find neighbors to:
>>> cells = np.array([15, 28, 32, 44, 87])
You can get their neighbors as:
>>> idy, idx = np.unravel_index(cells, shape=(M, N))
>>> neigh_idx = np.vstack((idx-1, idx+1, idx, idx))
>>> neigh_idy = np.vstack((idy, idy, idy-1, idy+1))
>>> np.ravel_multi_index((neigh_idy, neigh_idx), dims=(M,N))
array([[14, 27, 31, 43, 86],
[16, 29, 33, 45, 88],
[ 5, 18, 22, 34, 77],
[25, 38, 42, 54, 97]], dtype=int64)
Or, if you prefer it like that:
>>> np.ravel_multi_index((neigh_idy, neigh_idx), dims=(M,N)).T
array([[14, 16, 5, 25],
[27, 29, 18, 38],
[31, 33, 22, 42],
[43, 45, 34, 54],
[86, 88, 77, 97]], dtype=int64)
The nicest thing about going this way is that ravel_multi_index has a mode keyword argument you can use to handle items on the edges of your lattice, see the docs.
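As a small illustration of that mode argument, the sketch below picks cells on the border of the same 12 x 10 grid and lets neighbor indices wrap around (periodic boundaries). The choice of mode='wrap' is only an example; mode='clip' would instead clamp out-of-range indices to the nearest edge.

```python
import numpy as np

M, N = 12, 10
cells = np.array([0, 9, 119])  # two corners of the first row and the last cell

# Positional second argument works across NumPy versions.
idy, idx = np.unravel_index(cells, (M, N))

neigh_idx = np.vstack((idx - 1, idx + 1, idx, idx))
neigh_idy = np.vstack((idy, idy, idy - 1, idy + 1))

# Out-of-range row or column indices wrap around to the opposite edge.
wrapped = np.ravel_multi_index((neigh_idy, neigh_idx), (M, N), mode='wrap')
print(wrapped)
```

With the default mode='raise', the same call would fail for these border cells because some neighbor indices fall outside the grid.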
