I have a quick question about the numpy unique function. I want to return the unique column values for each row
import numpy as np
a = np.array([[3, 2, 3, 2, 1, 3, 1, 2, 1, 3, 1, 2, 2, 2, 3, 3],
[3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
[3, 3, 3, 2, 3, 3, 3, 2, 2, 2, 3, 2, 2, 3, 1, 1]]) # a.shape is (3,16)
np.unique(a)
array([1, 2, 3]) # not what I want
np.unique(a,axis=1)
array([[1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3],
[2, 3, 1, 1, 2, 2, 3, 1, 2, 2, 3],
[2, 3, 2, 3, 2, 3, 2, 1, 1, 2, 3]]) # also not what I want, and I'm not even sure what it's doing
np.apply_along_axis(np.unique,1,a)
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]) # this is what I want
The problem is that I also want to use other features of np.unique, like returning index values. Can anyone help me get np.unique to work by itself?
You can loop over rows and collect unique values:
import numpy as np
a = np.array([[3, 2, 3, 2, 1, 3, 1, 2, 1, 3, 1, 2, 2, 2, 3, 3],
[3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
[3, 3, 3, 2, 3, 3, 3, 2, 2, 2, 3, 2, 2, 3, 1, 1]])
arr = np.empty((0,3), int)
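for row in a:
    # use the current row here; np.unique(a) would return the unique values of the whole array
    arr = np.append(arr, np.array([np.unique(row)]), axis=0)
print(arr)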
Output:
[[1 2 3]
[1 2 3]
[1 2 3]]
NumPy cannot return a 2D array whose rows have different lengths. Your example happens to have exactly 3 distinct values in every row, which is why np.apply_along_axis works; if one row contained a 4, or only 1s and 2s, it would fail.
To obtain what you are looking for you will need to use a normal Python list as the result. You can build it using a list comprehension:
import numpy as np
a = np.array([[1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1],
[3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
[3, 3, 3, 2, 3, 3, 4, 2, 2, 2, 3, 2, 2, 3, 1, 1]])
r = [ np.unique(row) for row in a ]
print(r)
# [array([1, 2]), array([1, 2, 3]), array([1, 2, 3, 4])]
r = [np.unique(row, return_index=True) for row in a]
print(r)
# [(array([1, 2]), array([0, 1])),
# (array([1, 2, 3]), array([11, 1, 0])),
# (array([1, 2, 3, 4]), array([14, 3, 0, 6]))]
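Since the question also asks about the other features of np.unique, the same per-row loop can carry them along. The following is only a minimal sketch: the names per_row, first_idx and counts are made up for illustration, while return_index and return_counts are regular np.unique keyword arguments.

```python
import numpy as np

a = np.array([[1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1],
              [3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
              [3, 3, 3, 2, 3, 3, 4, 2, 2, 2, 3, 2, 2, 3, 1, 1]])

# One dict per row, because the rows can have different numbers of unique values
per_row = []
for row in a:
    values, first_idx, counts = np.unique(row, return_index=True, return_counts=True)
    per_row.append({"values": values, "first_idx": first_idx, "counts": counts})

for i, info in enumerate(per_row):
    print(i, info["values"], info["first_idx"], info["counts"])
```

Keeping the results in a plain Python list avoids the ragged-array problem described above.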
One thing you could do is build a mask of the values that are the first of their kind on each row. This can be done using numpy.
Here's one way to do it (hopefully, numpy experts could suggest something less convoluted):
np.sum(np.cumsum(np.cumsum(a==np.unique(a)[:,None,None],axis=2),axis=2)==1,axis=0)
array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
Such a mask offers many processing options such as finding indices of the first occurrence on each line (using np.argwhere), erasing/assigning first or subsequent occurrences, and more.
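As a rough illustration of those options, the sketch below rebuilds the mask in a slightly more spelled-out form (one cumulative sum plus an explicit check that the cell holds the value, which is a variation on the one-liner above, not the same expression) and then reads positions and values from it. The names layers, first, positions and first_values are assumptions made for this example.

```python
import numpy as np

a = np.array([[1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1],
              [3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
              [3, 3, 3, 2, 3, 3, 4, 2, 2, 2, 3, 2, 2, 3, 1, 1]])

# One boolean layer per distinct value: shape (n_values, n_rows, n_cols)
layers = a == np.unique(a)[:, None, None]
# A cell is a first occurrence when it holds the value and the running
# count of that value along the row has just reached 1
first = (np.cumsum(layers, axis=2) == 1) & layers
mask = first.any(axis=0)

# (row, column) pairs of every first occurrence
positions = np.argwhere(mask)
# Unique values of each row in order of first appearance (not sorted)
first_values = [row[row_mask] for row, row_mask in zip(a, mask)]

print(positions)
print(first_values)
```

Unlike np.unique, this keeps the values in the order in which they first appear on each row.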
I'm having difficulty adding to a list iteratively.
Here's a MWE:
# Given a nested list of values, or sets
sets = [[1, 2, 3], [1, 2, 4], [1, 2, 5]]
# add a value to each sublist giving the number of that set in the list.
n_sets = len(sets)
for s in range(n_sets):
    (sets[s]).insert(0, s)
# Now repeat those sets reps times
reps = 4
expanded_sets = [item for item in sets for i in range(reps)]
# then assign a repetition number to each occurrence of a set.
rep_list = list(range(reps)) * n_sets
for i in range(n_sets * reps):
    (expanded_sets[i]).insert(0, rep_list[i])
expanded_sets
which returns
[[3, 2, 1, 0, 0, 1, 2, 3],
[3, 2, 1, 0, 0, 1, 2, 3],
[3, 2, 1, 0, 0, 1, 2, 3],
[3, 2, 1, 0, 0, 1, 2, 3],
[3, 2, 1, 0, 1, 1, 2, 4],
[3, 2, 1, 0, 1, 1, 2, 4],
[3, 2, 1, 0, 1, 1, 2, 4],
[3, 2, 1, 0, 1, 1, 2, 4],
[3, 2, 1, 0, 2, 1, 2, 5],
[3, 2, 1, 0, 2, 1, 2, 5],
[3, 2, 1, 0, 2, 1, 2, 5],
[3, 2, 1, 0, 2, 1, 2, 5]]
instead of the desired
[[0, 0, 1, 2, 3],
[1, 0, 1, 2, 3],
[2, 0, 1, 2, 3],
[3, 0, 1, 2, 3],
[0, 1, 1, 2, 4],
[1, 1, 1, 2, 4],
[2, 1, 1, 2, 4],
[3, 1, 1, 2, 4],
[0, 2, 1, 2, 5],
[1, 2, 1, 2, 5],
[2, 2, 1, 2, 5],
[3, 2, 1, 2, 5]]
Just for fun, the first loop returns an expected value of sets
[[0, 1, 2, 3], [1, 1, 2, 4], [2, 1, 2, 5]]
but after the second loop sets changed to
[[3, 2, 1, 0, 0, 1, 2, 3], [3, 2, 1, 0, 1, 1, 2, 4], [3, 2, 1, 0, 2, 1, 2, 5]]
I suspect the issue has something to do with copies and references. I've tried adding .copy() and slices in various places, but with the indexed sublists I haven't come across a combo that works. I'm running Python 3.10.6.
Thanks for looking!
Per suggested solution, [list(range(reps)) for _ in range(n_sets)] doesn't correctly replace the list(range(reps)) * n_sets, since it gives [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]] instead of
the desired [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]. Do I need to flatten, or is there a syntax with the _ notation that gives me a single list?
Further update . . .
replacing
rep_list = list(range(reps)) * n_sets
with
rep_list_nest = [list(range(reps)) for _ in range(n_sets)]
rep_list = [i for sublist in rep_list_nest for i in sublist]
gives the same undesired result for expanded_sets.
The problem is here:
expanded_sets = [item
for item in sets
for i in range(reps)]
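This list does not contain four separate copies of each element of sets. It contains four references to the same list object, followed by four references to the next one, and so on, so every insert through one of them shows up in all four (and in sets itself).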
Creating copies of item fixes the issue:
expanded_sets = [item.copy()
for item in sets
for i in range(reps)]
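The difference between the two comprehensions can be checked directly with the is operator. This is a small sketch with shortened, made-up data (two sets, two repetitions); the names aliased and copied are only for illustration.

```python
sets = [[0, 1, 2, 3], [1, 1, 2, 4]]
reps = 2

# Without .copy(): every repetition is the very same list object
aliased = [item for item in sets for _ in range(reps)]
# With .copy(): every repetition is an independent list
copied = [item.copy() for item in sets for _ in range(reps)]

print(aliased[0] is aliased[1])  # True
print(copied[0] is copied[1])    # False

# Mutating one aliased entry changes its "twin" and the original in sets
aliased[0].insert(0, 99)
print(aliased[1])  # [99, 0, 1, 2, 3]
print(sets[0])     # [99, 0, 1, 2, 3]
print(copied[0])   # [0, 1, 2, 3]
```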
If you want a more Pythonic approach, recognize that each result row is a repetition counter and a set index prepended to the matching element of sets, so the rows can be built directly instead of being mutated:
from itertools import product
sets = [[1, 2, 3], [1, 2, 4], [1, 2, 5]]
reps = 4
# enumerate pairs each set with its index; product repeats every pair once per repetition
expanded_sets = [[inner_counter, outer_counter] + sets_elem
for (outer_counter, sets_elem), inner_counter in product(enumerate(sets), range(reps))]
For example:
data = [[3, 0, 1, 1, 1, 0, 2, 1, 2, 3],
[0, 5, 3, 2, 2, 1, 1, 1, 3, 0],
[1, 3, 5, 3, 2, 1, 1, 1, 2, 1],
[1, 2, 3, 4, 1, 1, 2, 1, 1, 1],
[1, 2, 2, 1, 4, 0, 2, 2, 2, 1],
[0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
[2, 1, 1, 2, 2, 0, 4, 3, 2, 2],
[1, 1, 1, 1, 2, 0, 3, 3, 1, 1],
[2, 3, 2, 1, 2, 0, 2, 1, 5, 2],
[3, 0, 1, 1, 1, 0, 2, 1, 2, 4]]
I want to print the largest number in a nested list, for example 5 in [2, 3, 2, 1, 2, 0, 2, 1, 5, 2], which is located at data[8][8].
I also want to print the index at which it was found.
This should help you:
data = [[3, 0, 1, 1, 1, 0, 2, 1, 2, 3], [0, 5, 3, 2, 2, 1, 1, 1, 3, 0], [1, 3, 5, 3, 2, 1, 1, 1, 2, 1], [1, 2, 3, 4, 1, 1, 2, 1, 1, 1], [1, 2, 2, 1, 4, 0, 2, 2, 2, 1], [0, 1, 1, 1, 0, 1, 0, 0, 0, 0], [2, 1, 1, 2, 2, 0, 4, 3, 2, 2], [1, 1, 1, 1, 2, 0, 3, 3, 1, 1], [2, 3, 2, 1, 2, 0, 2, 1, 5, 2], [3, 0, 1, 1, 1, 0, 2, 1, 2, 4]]
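for row_index, lst in enumerate(data):
    maximum = max(lst)
    # enumerate gives the row position directly; data.index(lst) would return the first match if two rows were equal
    index = [row_index, lst.index(maximum)]
    print(f"Largest Number = {maximum} , Index = [{index[0]}][{index[1]}]")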
Output:
Largest Number = 3 , Index = [0][0]
Largest Number = 5 , Index = [1][1]
Largest Number = 5 , Index = [2][2]
Largest Number = 4 , Index = [3][3]
Largest Number = 4 , Index = [4][4]
Largest Number = 1 , Index = [5][1]
Largest Number = 4 , Index = [6][6]
Largest Number = 3 , Index = [7][6]
Largest Number = 5 , Index = [8][8]
Largest Number = 4 , Index = [9][9]
The above answer is good. However, if you wish to use the beauty of NumPy and python you can use the following snippet. The snippet also generalizes if there are multiple max elements.
import numpy as np
data = [[3, 0, 1, 1, 1, 0, 2, 1, 2, 3],
[0, 5, 3, 2, 2, 1, 1, 1, 3, 0],
[1, 3, 5, 3, 2, 1, 1, 1, 2, 1],
[1, 2, 3, 4, 1, 1, 2, 1, 1, 1],
[1, 2, 2, 1, 4, 0, 2, 2, 2, 1],
[0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
[2, 1, 1, 2, 2, 0, 4, 3, 2, 2],
[1, 1, 1, 1, 2, 0, 3, 3, 1, 1],
[2, 3, 2, 1, 2, 0, 2, 1, 5, 2],
[3, 0, 1, 1, 1, 0, 2, 1, 2, 4]]
data = np.array(data)  # convert the nested list so that .max() and elementwise == work
# For max value across the matrix
indices = np.where(data == data.max())
max_indices = list(zip(indices[0], indices[1]))
print(max_indices)
[(1, 1), (2, 2), (8, 8)]
# If you want it for each row
max_args = data.max(axis=1)
result = [(i,j,np.where(data[j]==i)[0][0]) for j,i in enumerate(max_args)]
print(result)
# Format (number,row,col)
[(3, 0, 0),
(5, 1, 1),
(5, 2, 2),
(4, 3, 3),
(4, 4, 4),
(1, 5, 1),
(4, 6, 6),
(3, 7, 6),
(5, 8, 8),
(4, 9, 9)]
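For the per-row case, a possible alternative is to let argmax find the column of the first maximum in each row and then pick the values with integer-array indexing. This is a sketch that should give the same (number, row, col) tuples as the comprehension above; the names rows, cols and values are assumptions for this example.

```python
import numpy as np

data = np.array([[3, 0, 1, 1, 1, 0, 2, 1, 2, 3],
                 [0, 5, 3, 2, 2, 1, 1, 1, 3, 0],
                 [1, 3, 5, 3, 2, 1, 1, 1, 2, 1],
                 [1, 2, 3, 4, 1, 1, 2, 1, 1, 1],
                 [1, 2, 2, 1, 4, 0, 2, 2, 2, 1],
                 [0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
                 [2, 1, 1, 2, 2, 0, 4, 3, 2, 2],
                 [1, 1, 1, 1, 2, 0, 3, 3, 1, 1],
                 [2, 3, 2, 1, 2, 0, 2, 1, 5, 2],
                 [3, 0, 1, 1, 1, 0, 2, 1, 2, 4]])

cols = data.argmax(axis=1)        # column of the first maximum in each row
rows = np.arange(data.shape[0])   # 0, 1, ..., number of rows - 1
values = data[rows, cols]         # the maxima themselves, one per row

# Plain Python ints so the printout looks the same on every NumPy version
result = [(int(v), int(r), int(c)) for v, r, c in zip(values, rows, cols)]
print(result)
```

Like list.index, argmax reports only the first position when a row contains its maximum more than once.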
I'm trying to mark the value and indices of max values in a 3D array, getting the max in the third axis.
Now this would have been obvious in a lower dimension:
argmaxes=np.argmax(array)
maximums=array[argmaxes]
but NumPy doesn't understand the second syntax properly for higher than 1D.
Let's say my 3D array has shape (8,8,250). argmaxes=np.argmax(array,axis=-1) would return an (8,8) array with numbers from 0 to 249. My expected output is an (8,8) array containing the maximum along the 3rd dimension. I can get this with maxes=np.max(array,axis=-1), but that repeats the same calculation (I need both values and indices for later calculations).
I can also just do a crude nested loop:
for i in range(8):
    for j in range(8):
        maxes[i,j]=array[i,j,argmaxes[i,j]]
But is there a nicer way to do this?
You can use advanced indexing. This is a simpler case when shape is (8,8,3):
import numpy as np
arr = np.random.randint(99, size=(8,8,3))
x, y = np.indices(arr.shape[:-1])
arr[x, y, np.argmax(arr,axis=-1)]  # index the same array that argmax was computed on
Sample run:
>>> x
array([[0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7, 7, 7]])
>>> y
array([[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7]])
>>> np.argmax(arr,axis=-1)
array([[2, 1, 1, 2, 0, 0, 0, 1],
[2, 2, 2, 1, 0, 0, 1, 0],
[1, 2, 0, 1, 1, 1, 2, 0],
[1, 0, 0, 0, 2, 1, 1, 0],
[2, 0, 1, 2, 2, 2, 1, 0],
[2, 2, 0, 1, 1, 0, 2, 2],
[1, 1, 0, 1, 1, 2, 1, 0],
[2, 1, 1, 1, 0, 0, 2, 1]], dtype=int64)
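Another option that may read a little more directly is np.take_along_axis, which exists for exactly this kind of "use the argmax result as an index" lookup. A minimal sketch for the (8,8,250) shape from the question, with random data and a fixed seed as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 99, size=(8, 8, 250))

argmaxes = np.argmax(arr, axis=-1)  # shape (8, 8), values from 0 to 249

# take_along_axis needs an index array with as many dimensions as arr,
# so add a trailing axis, look the values up, then drop that axis again
maxes = np.take_along_axis(arr, argmaxes[..., None], axis=-1).squeeze(-1)

print(maxes.shape)  # (8, 8)
```

The result should match both np.max(arr, axis=-1) and the arr[x, y, argmaxes] lookup shown above, without building the x and y index grids.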
This is a visual example of array to help to understand it better:
I try to use classification_report from sklearn.metrics:
sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2, output_dict=False)
As input for the predictions and the labels I have one list each, of the following form:
for pred:
[array([0, 0, 0, 3, 0, 3, 2, 2, 1, 1, 2, 0, 2, 3, 0, 2, 2, 2, 2, 3, 3, 2,
2, 2, 2, 2, 2, 1, 0, 3, 2, 2, 0, 2, 2, 3, 2, 0, 0, 0, 0, 0, 2, 2,
2, 1, 0, 0, 0, 2, 2, 0, 2, 0, 2, 1, 0, 2, 2, 3, 0, 2, 0, 2])]
for true:
[array([2, 3, 3, 2, 3, 3, 2, 3, 2, 2, 3, 3, 3, 2, 3, 2, 3, 2, 3, 2, 3, 3,
2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 3, 2,
2, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 2, 2, 3, 2, 2, 2, 2, 2])]
For the sklearn function above I need a simple list. The array produces an error:
ValueError: multiclass-multioutput is not supported
I already tried .tolist(), but it didn't work for me.
I am looking for a way to convert my list of arrays to a simple list.
Thanks for your help.
Each of those objects is already a list, each of which contains a single element, which is an array.
To access the 1st element and convert it to a list, try something like:
from numpy import array
x = [array([2, 3, 3, 2, 3, 3, 2, 3, 2, 2, 3, 3, 3, 2, 3, 2, 3, 2, 3, 2, 3, 3,
2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 3, 2,
2, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 2, 2, 3, 2, 2, 2, 2, 2])]
x_list = list(x[0])
And x_list will contain the array element in list form.
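One likely reason the .tolist() attempt failed is that it was called on the outer object, which is a plain Python list and has no such method; the array inside it does. The sketch below uses shortened, made-up data to show the difference, and also np.concatenate for the case where the outer list holds more than one array.

```python
import numpy as np

x = [np.array([2, 3, 3, 2]), np.array([3, 2])]  # shortened, hypothetical data

# list(...) keeps NumPy scalar types; .tolist() converts to plain Python ints
via_list = list(x[0])
via_tolist = x[0].tolist()
print(type(via_list[0]), type(via_tolist[0]))

# Join every array in the outer list into one flat Python list
flat = np.concatenate(x).tolist()
print(flat)  # [2, 3, 3, 2, 3, 2]
```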
Way 1: Just index the lists e.g. pred[0]
Code:
from numpy import array
from sklearn.metrics import classification_report
pred = [array([0, 0, 0, 3, 0, 3, 2, 2, 1, 1, 2, 0, 2, 3, 0, 2, 2, 2, 2, 3, 3, 2,
2, 2, 2, 2, 2, 1, 0, 3, 2, 2, 0, 2, 2, 3, 2, 0, 0, 0, 0, 0, 2, 2,
2, 1, 0, 0, 0, 2, 2, 0, 2, 0, 2, 1, 0, 2, 2, 3, 0, 2, 0, 2])]
test = [array([0, 0, 0, 3, 0, 3, 2, 2, 1, 1, 2, 0, 2, 3, 0, 2, 2, 2, 2, 3, 3, 2,
2, 2, 2, 2, 2, 1, 0, 3, 2, 2, 0, 2, 2, 3, 2, 0, 0, 0, 0, 0, 2, 2,
2, 1, 0, 0, 0, 2, 2, 0, 2, 0, 2, 1, 0, 2, 2, 3, 0, 2, 0, 2])]
classification_report(test[0], pred[0])  # signature is (y_true, y_pred); assuming test holds the true labels
Way 2:
Reform it to match sklearn requirements:
pred = [array([0, 0, 0, 3, 0, 3, 2, 2, 1, 1, 2, 0, 2, 3, 0, 2, 2, 2, 2, 3, 3, 2,
2, 2, 2, 2, 2, 1, 0, 3, 2, 2, 0, 2, 2, 3, 2, 0, 0, 0, 0, 0, 2, 2,
2, 1, 0, 0, 0, 2, 2, 0, 2, 0, 2, 1, 0, 2, 2, 3, 0, 2, 0, 2])]
test = [array([0, 0, 0, 3, 0, 3, 2, 2, 1, 1, 2, 0, 2, 3, 0, 2, 2, 2, 2, 3, 3, 2,
2, 2, 2, 2, 2, 1, 0, 3, 2, 2, 0, 2, 2, 3, 2, 0, 0, 0, 0, 0, 2, 2,
2, 1, 0, 0, 0, 2, 2, 0, 2, 0, 2, 1, 0, 2, 2, 3, 0, 2, 0, 2])]
flat_pred = [item for sublist in pred for item in sublist]
flat_test = [item for sublist in test for item in sublist]
print(classification_report(flat_test, flat_pred))  # (y_true, y_pred); assuming flat_test holds the true labels
I am looking for a way to create a colormap in Python/matplotlib that is tied to designated integer values and stays tied to them. Here is an example of how I define the MATLAB colormap:
c_map = [1 1 1;... % Integer assignment = 0 - Air - white
0.6 1 0.8;... % Integer assignment = 1 - Water - cyan
1 0.6 0.2]; % Integer assignment = 2 - Sediments - orange
it is later called during the plotting routine by:
colormap(c_map);
pcolor(xvec,zvec,ColorIntegers);
shading interp, colormap, caxis([0 3]), axis ij
The integer values always stay the same (i.e., Air = 0 , Water = 1, sediments = 2).
I've scoured the matplotlib documentation and Stack Overflow, but haven't found a way to create this style of colormap, which maps specific integers to specific colors. Most questions deal with diverging, jet, centered, linear or non-linear colormaps rather than a fixed color per value. Each color must correspond to that specific integer value.
Any help will be appreciated, thank you in advance.
Take a look at the ListedColormap.
For example,
In [103]: y
Out[103]:
array([[1, 1, 2, 2, 2, 1, 0, 0, 0, 1, 2, 2, 2, 2, 2, 1],
[1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 0],
[1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 0],
[1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 3, 3, 2, 1, 0],
[1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1],
[1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 2, 2, 1],
[1, 1, 0, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1],
[1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1],
[0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 2],
[0, 1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 1, 1, 2, 2],
[1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 2, 2, 2, 2, 2],
[2, 1, 1, 0, 0, 1, 2, 2, 3, 3, 3, 2, 2, 2, 2, 1],
[2, 1, 1, 0, 1, 2, 2, 3, 3, 3, 3, 2, 1, 1, 1, 1],
[2, 1, 1, 0, 2, 3, 3, 3, 3, 3, 2, 1, 0, 1, 1, 0]])
In [104]: c_map = [[1, 1, 1], [0.6, 1, 0.8], [1, 0.6, 0.2], [0.75, 0.25, 0.25]]
In [105]: cm = matplotlib.colors.ListedColormap(c_map)
In [106]: imshow(y, interpolation='nearest', cmap=cm)
generates
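One caveat with passing only cmap: imshow by default scales the data range onto the colormap, so if a particular array contains no 0 (or no 2), the colors can shift. A possible way to pin each integer to its color is to add a BoundaryNorm with one bin per integer. The sketch below assumes the three classes from the question (0 = air, 1 = water, 2 = sediments) and uses a small made-up grid.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap

# Same colors as the MATLAB c_map: 0 = air, 1 = water, 2 = sediments
c_map = [[1, 1, 1], [0.6, 1, 0.8], [1, 0.6, 0.2]]
cmap = ListedColormap(c_map)

# One bin per integer: [-0.5, 0.5) -> color 0, [0.5, 1.5) -> color 1, [1.5, 2.5) -> color 2
norm = BoundaryNorm([-0.5, 0.5, 1.5, 2.5], cmap.N)

# Hypothetical data that contains no 0 at all
grid = np.array([[1, 1, 2],
                 [2, 2, 1]])

fig, ax = plt.subplots()
im = ax.imshow(grid, interpolation="nearest", cmap=cmap, norm=norm)
fig.colorbar(im, ticks=[0, 1, 2])
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Passing vmin=0 and vmax=2 instead of norm is a simpler alternative when the classes are consecutive integers.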