I have a 3D integer tensor X with X.shape=(m, n, k).
I'd like to treat X as an (m, n) matrix whose entries are k-sized integer vectors and count how many unique entries there are in each row.
So for example
>>> X
array([[[0, 1, 2],
[0, 1, 2],
[1, 2, 3],
[1, 2, 3]],
[[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8]]])
>>> X.shape
(2, 4, 3)
>>> count_unique(X)
[2, 4]
Since in the first row of the tensor there are 2 unique vectors and in the second row there are 4
Bonus points for returning the actual unique vectors, e.g.
>>> get_unique(X)
[[[0, 1, 2], [1, 2, 3]],
 [[3, 4, 5], [4, 5, 6], [5, 6, 7], [6, 7, 8]]]
My solution (partially vectorized) for the first question
count_unique = lambda X: [len(np.unique(row, axis=0)) for row in X]
unique_list = []
for sublist in X.tolist():  # plain lists, so `in` compares whole vectors
    tmp_unique_list = []
    for element in sublist:
        if element not in tmp_unique_list:
            tmp_unique_list.append(element)
    unique_list.append(tmp_unique_list)
Output:
> unique list
[[[0, 1, 2], [1, 2, 3]], [[3, 4, 5], [4, 5, 6], [5, 6, 7], [6, 7, 8]]]
And the count:
> [len(elem) for elem in unique_list]
[2, 4]
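For the bonus part, here is a minimal sketch that applies `np.unique(..., axis=0)` to each row and returns both the counts and the unique vectors. The helper name `unique_per_row` is hypothetical (not from the question), and note that `np.unique` returns the vectors in sorted order rather than in order of first appearance.

```python
import numpy as np

def unique_per_row(X):
    """Return (counts, uniques) for a 3D array treated as rows of k-vectors.

    `unique_per_row` is an illustrative name, not part of NumPy.
    """
    # With axis=0, np.unique deduplicates whole k-vectors within one row.
    uniques = [np.unique(row, axis=0) for row in X]
    counts = [len(u) for u in uniques]
    return counts, uniques

X = np.array([[[0, 1, 2], [0, 1, 2], [1, 2, 3], [1, 2, 3]],
              [[3, 4, 5], [4, 5, 6], [5, 6, 7], [6, 7, 8]]])
counts, uniques = unique_per_row(X)
print(counts)  # [2, 4]
```

The result is a list of arrays rather than one array because rows can have different numbers of unique vectors.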
Suppose I have two NumPy arrays
x = [[1, 2, 8],
[2, 9, 1],
[3, 8, 9],
[4, 3, 5],
[5, 2, 3],
[6, 4, 7],
[7, 2, 3],
[8, 2, 2],
[9, 5, 3],
[10, 2, 3],
[11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]
Note:
(The values in x are not sorted in any way; I chose them only to make the example easier to follow.)
(These are just example values. x and y can contain arbitrarily many different numbers, but x always has as many rows as y has values.)
I want to efficiently split the array x into sub-arrays according to the values in y.
My desired outputs would be
z_0 = [[1, 2, 8],
[2, 9, 1],
[4, 3, 5],
[10, 2, 3],
[11, 2, 4]]
z_1 = [[3, 8, 9],
[5, 2, 3],
[6, 4, 7],]
z_2 = [[7, 2, 3],
[8, 2, 2],
[9, 5, 3]]
Assuming that y starts with zero and is not sorted but grouped, what is the most efficient way to do this?
Note: This question is the unsorted version of this question:
Split a NumPy array into subarrays according to the values (sorted in ascending order) of another array
One way to solve this is to build up a list of filter indexes for each y value and then simply select those elements of x. For example:
z_0 = x[[i for i, v in enumerate(y) if v == 0]]
z_1 = x[[i for i, v in enumerate(y) if v == 1]]
z_2 = x[[i for i, v in enumerate(y) if v == 2]]
Output
array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]])
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]])
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])
If you want to be more generic and support different sets of numbers in y, you could use a comprehension to produce a list of arrays e.g.
z = [x[[i for i, v in enumerate(y) if v == m]] for m in set(y)]
Output:
[array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]]),
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]]),
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])]
If y is also an np.array and the same length as x you can simplify this to use boolean indexing:
z = [x[y==m] for m in set(y)]
Output is the same as above.
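The approaches above scan y once per distinct label. As a sketch of an alternative that sorts once instead, the rows can be reordered with a stable `np.argsort` and then cut at the group boundaries with `np.split`. This is a different technique from the answer above, and it assumes x and y are NumPy arrays of the same length.

```python
import numpy as np

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3],
              [6, 4, 7], [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3],
              [11, 2, 4]])
y = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0])

# A stable sort keeps the original row order inside each label group.
order = np.argsort(y, kind="stable")
# return_index gives the position where each label first appears
# in the sorted label array, i.e. the start of each group.
labels, starts = np.unique(y[order], return_index=True)
# starts[0] is always 0, so split at the remaining group boundaries.
z = np.split(x[order], starts[1:])
```

Here `labels[i]` is the y value that `z[i]` belongs to, so the labels do not have to start at zero or be consecutive.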
Just use list comprehension and boolean indexing
x = np.array(x)
y = np.array(y)
z = [x[y == i] for i in range(y.max() + 1)]
z
Out[]:
[array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]]),
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]]),
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])]
Slight variation.
from operator import itemgetter
label = itemgetter(1)
Associate the implied information with the label ... (index,label)
y1 = [thing for thing in enumerate(y)]
Sort on the label
y1.sort(key=label)
Group by label and construct the results
import itertools
d = {}
for key,group in itertools.groupby(y1,label):
    d[f'z{key}'] = [x[i] for i,k in group]
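If the sort is not needed, the same dictionary can be built in a single pass. The following is a sketch using `collections.defaultdict` instead of `itertools.groupby` (a swapped-in technique, not the one above); it assumes x and y are plain lists of equal length.

```python
from collections import defaultdict

x = [[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
     [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]

groups = defaultdict(list)
for row, label in zip(x, y):
    # Rows are appended in input order, so each group keeps
    # the relative order of the original list.
    groups[f"z{label}"].append(row)
```

After the loop, `groups["z1"]` holds the rows labelled 1, and so on for the other labels.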
Pandas solution:
>>> import pandas as pd
>>> df = pd.DataFrame({'points':[thing for thing in x],'cat':y})
>>> z = df.groupby('cat').agg(list)
>>> z
points
cat
0 [[1, 2, 8], [2, 9, 1], [4, 3, 5], [10, 2, 3], ...
1 [[3, 8, 9], [5, 2, 3], [6, 4, 7]]
2 [[7, 2, 3], [8, 2, 2], [9, 5, 3]]
I wish to split a list into same-sized chunks using a "sliding window", but instead of truncating the list at the ends, I want to wrap around so that the final chunk can be spread across the beginning and end of the list.
For example, given a list:
l = [1, 2, 3, 4, 5, 6]
I wish to generate chunks of size n=3 as follows:
[1, 2, 3], [4, 5, 6]
[2, 3, 4], [5, 6, 1]
[3, 4, 5], [6, 1, 2]
Or the same list with chunks of size n=2 should be split as follows:
[1, 2], [3, 4], [5, 6]
[2, 3], [4, 5], [6, 1]
The list may not be divided evenly into n sublists (e.g. if the original list has length 7 and n=3 - or any value other than 7 or 1). The rounded value len(l) / n can be used to determine the split size, as in the usual case.
This post is related, although it does not wrap around as I need. I have tried but not managed anything useful. Any suggestions would be most welcome!
You can use itertools.islice over a wrap-around iterator generated from itertools.cycle:
from itertools import cycle, islice
def rolling_chunks(l, n):
    return ([list(islice(cycle(l), i + j, i + j + n)) for j in range(0, len(l), n)] for i in range(n))
so that list(rolling_chunks(l, 3)) returns:
[[[1, 2, 3], [4, 5, 6]], [[2, 3, 4], [5, 6, 1]], [[3, 4, 5], [6, 1, 2]]]
and that list(rolling_chunks(l, 2)) returns:
[[[1, 2], [3, 4], [5, 6]], [[2, 3], [4, 5], [6, 1]]]
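The same wrap-around can also be written with modular indexing, which avoids restarting `cycle` from the beginning for every chunk. This is a sketch of that variant; the name `rolling_chunks_mod` is hypothetical, and it returns a list rather than a generator.

```python
def rolling_chunks_mod(l, n):
    """For each offset 0..n-1, return the wrap-around chunks of size n."""
    size = len(l)
    return [
        # (i + j + k) % size wraps indices past the end back to the start.
        [[l[(i + j + k) % size] for k in range(n)] for j in range(0, size, n)]
        for i in range(n)
    ]

l = [1, 2, 3, 4, 5, 6]
print(rolling_chunks_mod(l, 3))
# [[[1, 2, 3], [4, 5, 6]], [[2, 3, 4], [5, 6, 1]], [[3, 4, 5], [6, 1, 2]]]
```

When `len(l)` is not a multiple of `n`, the last chunk of each row wraps around and reuses elements from the start of the list, for example `[7, 1, 2]` for a list of length 7 with `n=3`.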
I'm trying to find a way to fill an array with rows of values. It's much easier to express my desired output with an example. Given the input of an N x M matrix, array1,
array1 = np.array([[2, 3, 4],
[4, 8, 3],
[7, 6, 3]])
I would like to output an array of arrays in which each element is a block that repeats the respective row of array1 N times. The output would be
[[[2, 3, 4],
[2, 3, 4],
[2, 3, 4]],
[[4, 8, 3],
[4, 8, 3],
[4, 8, 3]],
[[7, 6, 3],
[7, 6, 3],
[7, 6, 3]]]
You can reshape the array from 2d to 3d, then use numpy.repeat() along the desired axis:
np.repeat(array1[:, None, :], 3, axis=1)
#array([[[2, 3, 4],
# [2, 3, 4],
# [2, 3, 4]],
# [[4, 8, 3],
# [4, 8, 3],
# [4, 8, 3]],
# [[7, 6, 3],
# [7, 6, 3],
# [7, 6, 3]]])
Or equivalently you can use numpy.tile:
np.tile(array1[:, None, :], (1,3,1))
Another solution which is sometimes useful is the following
out = np.empty((3,3,3), dtype=array1.dtype)
out[...] = array1[:, None, :]
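If the result only needs to be read, not modified, `np.broadcast_to` can produce the same shape without copying any data. The sketch below assumes the repeat count equals the number of rows, as in the example; the names `n_rows`, `n_cols`, and `view` are illustrative.

```python
import numpy as np

array1 = np.array([[2, 3, 4],
                   [4, 8, 3],
                   [7, 6, 3]])
n_rows, n_cols = array1.shape

# broadcast_to returns a read-only view; no row data is duplicated.
view = np.broadcast_to(array1[:, None, :], (n_rows, n_rows, n_cols))

# Take a copy when a writable, independent array is needed.
out = view.copy()
```

The view is cheap for large inputs, but writing to it raises an error, which is why the copy is shown as a separate step.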
I have an ndarray with the following content:
[0, 1]
[0, 5]
[1, 7]
[2, 9]
[2, 4]
[2, 4]
[3, 8]
[4, 2]
[4, 7]
Now I'd like to keep only the first row whenever several rows share the same first element. The result would be:
[0, 1]
[1, 7]
[2, 9]
[3, 8]
[4, 2]
How can I achieve this with numpy?
Given an input data as:
x = np.array([
[0, 1],
[0, 5],
[1, 7],
[2, 9],
[2, 4],
[2, 4],
[3, 8],
[4, 2],
[4, 7],
])
Then you could use numpy.unique with return_index set to True (as @Divakar mentioned in the comments) to find the index of the first occurrence of each unique first element.
idx = np.unique(x[:,0], return_index=True)[1]
Then you can just access them as:
x[idx]
Hope this helps.
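When rows with the same first element are already adjacent, as in the example, the sort inside `np.unique` can be skipped by comparing each key with its predecessor. This is a sketch of that alternative, not the method above, and it gives a different result if equal keys are scattered through the array.

```python
import numpy as np

x = np.array([[0, 1], [0, 5], [1, 7], [2, 9], [2, 4],
              [2, 4], [3, 8], [4, 2], [4, 7]])

keys = x[:, 0]
# A row starts a new group when its key differs from the previous row's key;
# the very first row always starts a group.
first = np.concatenate(([True], keys[1:] != keys[:-1]))
result = x[first]
print(result.tolist())  # [[0, 1], [1, 7], [2, 9], [3, 8], [4, 2]]
```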
I have a numpy array say
a = array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
I have an array 'replication' of the same size, where replication[i,j] (>= 0) denotes how many times a[i][j] should be repeated along the row. Obviously, the replication array satisfies the invariant that np.sum(replication[i]) has the same value for all i.
For example, if
replication = array([[1, 2, 1],
[1, 1, 2],
[2, 1, 1]])
then the final array after replicating is:
new_a = array([[1, 2, 2, 3],
[4, 5, 6, 6],
[7, 7, 8, 9]])
Presently, I am doing this to create new_a:
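# allocate new_a: each row holds np.sum(replication[0]) items
h = a.shape[0]
w = a.shape[1]
new_a = np.empty((h, replication[0].sum()), dtype=a.dtype)
for row in range(h):
    ll = [[a[row][j]]*replication[row][j] for j in range(w)]
    new_a[row] = np.array([item for sublist in ll for item in sublist])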
However, this seems to be too slow because it goes through Python lists. Can I do this entirely in NumPy, without using Python lists?
You can flatten out your replication array, then use the .repeat() method of a:
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
replication = np.array([[1, 2, 1],
                        [1, 1, 2],
                        [2, 1, 1]])
new_a = a.repeat(replication.ravel()).reshape(a.shape[0], -1)
print(repr(new_a))
# array([[1, 2, 2, 3],
# [4, 5, 6, 6],
# [7, 7, 8, 9]])
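The final `reshape` only works when every row of `replication` sums to the same value. A small wrapper can check that invariant first and fail with a clearer message; this is a sketch, and the function name `replicate_rows` is hypothetical.

```python
import numpy as np

def replicate_rows(a, replication):
    """Repeat a[i, j] replication[i, j] times along each row.

    `replicate_rows` is an illustrative name, not a NumPy function.
    """
    a = np.asarray(a)
    replication = np.asarray(replication)
    row_totals = replication.sum(axis=1)
    # Unequal totals would make the rows of the result ragged.
    if not (row_totals == row_totals[0]).all():
        raise ValueError("every row of replication must sum to the same value")
    return np.repeat(a.ravel(), replication.ravel()).reshape(a.shape[0], -1)

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
replication = np.array([[1, 2, 1], [1, 1, 2], [2, 1, 1]])
print(replicate_rows(a, replication).tolist())
# [[1, 2, 2, 3], [4, 5, 6, 6], [7, 7, 8, 9]]
```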