Updating by index in a multi-dimensional numpy array - python

I am using numpy to tally a lot of values across many large arrays, and keep track of which positions the maximum values appear in.
In particular, imagine I have a 'counts' array:
data = numpy.array([[ 5, 10, 3],
[ 6, 9, 12],
[13, 3, 9],
[ 9, 3, 1],
...
])
counts = numpy.zeros(data.shape, dtype=int)
data is going to change a lot, but I want 'counts' to reflect the number of times the max has appeared in each position:
max_value_indices = numpy.argmax(data, axis=1)
# this is now [1, 2, 0, 0, ...] representing the positions of 10, 12, 13 and 9, respectively.
From what I understand of broadcasting in numpy, I should be able to say:
counts[max_value_indices] += 1
What I expect is for the array to be updated to:
[[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[1, 0, 0],
...
]
But instead this increments ALL the values in counts, giving me:
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
...
]
I also thought that if I transformed max_value_indices into a 100x1 array, it might work:
counts[max_value_indices[:,numpy.newaxis]] += 1
but this has the effect of updating just the elements in rows 0, 1, and 2:
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0],
...
]
I'm also happy to turn the indices array into an array of 0's and 1's, and then add it to the counts array each time, but I'm not sure how to construct that.

You could use so-called advanced integer indexing (aka Multidimensional list-of-locations indexing):
In [24]: counts[np.arange(data.shape[0]),
np.argmax(data, axis=1)] += 1
In [25]: counts
Out[25]:
array([[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[1, 0, 0]])
The first array, np.arange(data.shape[0]), specifies the row; the second array, np.argmax(data, axis=1), specifies the column.
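A self-contained version of this answer, using only the four rows shown in the question, is sketched below. It also hints at why the original attempt failed: a single integer array indexes only the first axis, so counts[max_value_indices] selects whole rows rather than one element per row. The sketch additionally builds the 0/1 array the question asks about by indexing an identity matrix; the variable names are illustrative.

```python
import numpy as np

data = np.array([[ 5, 10,  3],
                 [ 6,  9, 12],
                 [13,  3,  9],
                 [ 9,  3,  1]])
counts = np.zeros(data.shape, dtype=int)

# Column of the maximum in each row
max_idx = np.argmax(data, axis=1)

# Pair every row index with its argmax column: one element per row
rows = np.arange(data.shape[0])
counts[rows, max_idx] += 1

# Alternative: a 0/1 array with a single 1 per row, added to the counts
one_hot = np.eye(data.shape[1], dtype=int)[max_idx]
counts_alt = np.zeros(data.shape, dtype=int)
counts_alt += one_hot

print(counts)
```

Each row index appears only once per update, so += is safe here. If the same (row, column) pair could repeat within a single update, np.add.at(counts, (rows, max_idx), 1) accumulates every occurrence instead of applying it once.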

Related

How can I set the diagonal of an N-dim tensor to 0 along given dims?

I’m trying to figure out a way to set the diagonal of a 3-dimensional Tensor (along 2 given dims) equal to 0. For example, if I have a Tensor of shape [N,N,N] and want to set the diagonal along dim=1,2 equal to 0, how could that be done?
I tried using fill_diagonal_, but that only zeroes the k-th diagonal element of the k-th sub-array, i.e.:
>>> data = torch.ones(3,4,4)
>>> data.fill_diagonal_(0)
tensor([[[0, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 0, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 0, 1],
[1, 1, 1, 1]]])
whereas I want the entire diagonal of each sub-matrix set to 0. The desired outcome would be:
tensor([[[0, 1, 1, 1],
[1, 0, 1, 1],
[1, 1, 0, 1],
[1, 1, 1, 0]],
[[0, 1, 1, 1],
[1, 0, 1, 1],
[1, 1, 0, 1],
[1, 1, 1, 0]],
[[0, 1, 1, 1],
[1, 0, 1, 1],
[1, 1, 0, 1],
[1, 1, 1, 0]]])
Secondly, the reason I ask about a given pair of dimensions is that I need to repeat this zeroing along 2 different pairs of dimensions (e.g. dim=(1,2) then dim=(0,1)) to get the masking I need.
Is there a way to mask a given diagonal over 2 arbitrary dimensions for a 3D-tensor?
You can do this with a for loop over the sub-tensors:
# zero the diagonal over dims (1, 2): loop over dim 0
for i in range(data.size(0)):
    data[i].fill_diagonal_(0)
If you need to perform this over any two dimensions of a 3D tensor, apply the fill to the appropriate slices instead:
# zero the diagonal over dims (0, 2): loop over dim 1
for i in range(data.size(1)):
    data[:,i].fill_diagonal_(0)
# zero the diagonal over dims (0, 1): loop over dim 2
for i in range(data.size(2)):
    data[:,:,i].fill_diagonal_(0)
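The loops above can also be replaced by a single advanced-indexing assignment: index both diagonal axes with the same arange and use full slices everywhere else. The sketch below is written in NumPy so it runs without PyTorch; zero_diagonal is a hypothetical helper name. The same tuple of slices and torch.arange index tensors should carry over to a PyTorch tensor, and in PyTorch data.diagonal(dim1=1, dim2=2).zero_() is another option, since diagonal returns a view.

```python
import numpy as np

def zero_diagonal(arr, dim1, dim2):
    """Set the diagonal over axes dim1 and dim2 to 0, in place (hypothetical helper)."""
    n = min(arr.shape[dim1], arr.shape[dim2])
    diag = np.arange(n)
    # Full slices on every axis except the two diagonal axes,
    # which share the same index array
    index = [slice(None)] * arr.ndim
    index[dim1] = diag
    index[dim2] = diag
    arr[tuple(index)] = 0
    return arr

data = np.ones((3, 4, 4), dtype=int)
zero_diagonal(data, 1, 2)  # every 4x4 sub-matrix gets a zero diagonal

# The two-step masking from the question: dims (1, 2), then dims (0, 1)
cube = np.ones((3, 3, 3), dtype=int)
zero_diagonal(cube, 1, 2)
zero_diagonal(cube, 0, 1)
```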

Splitting a sorted array of repeated elements

I have an array of repeated elements, where each repeated element represents a class. What I would like to do is obtain the indices of the repeated elements and partition them into slices, where the n-th slice holds the n-th occurrence of each of the 3 classes, in order of appearance. For example:
np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
split by occurrence into groups of 3:
[0, 2, 1], [2, 0, 1], [2, 1, 0, 0]
I would like to find the indices of the repeated elements and split the array into groups of 3, where each slice contains the indices of the next occurrence of each of the 3 classes.
So for the array and its splits, I'd like to obtain the following:
array: [0, 2, 2, 1, 0, 1, 2, 1, 0, 0]
indices: [0, 1, 3], [2, 4, 5], [6, 7, 8, 9]
I've tried the following:
import numpy as np
a = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
length = np.arange(len(a))
array_set = [length[a == unique] for unique in np.unique(a)]
But I can't figure out how to split the partitions in order of first occurrence, as in the examples above.
Here is a way to split the array into groups of 3; note that the last 0 is left out:
# unique values
uniques = np.unique(a)
# counting occurrence of each unique value
occ = np.cumsum(a == uniques[:,None], axis=1)
# maximum common occurrence
max_occ = occ.max(axis=1).min()
# masking the first occurrences
u = (occ[None,...] == (np.arange(max_occ)+1)[:,None, None])
# the indexes
idx = np.sort(np.argmax(u, axis=-1), axis=-1)
# the partitions
partitions = a[idx]
Output:
# idx
array([[0, 1, 3],
[2, 4, 5],
[6, 7, 8]])
# partitions
array([[0, 2, 1],
[2, 0, 1],
[2, 1, 0]])
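The steps of this answer can be wrapped into one reusable function, sketched below; first_occurrence_groups is a hypothetical name. It also recovers the indices dropped from the last, incomplete group with np.setdiff1d, which is an addition to the original answer.

```python
import numpy as np

def first_occurrence_groups(a):
    """Group together the k-th occurrence of every value (hypothetical helper)."""
    uniques = np.unique(a)
    # occ[v, j]: how many times uniques[v] has appeared in a[:j + 1]
    occ = np.cumsum(a == uniques[:, None], axis=1)
    # Number of complete groups is limited by the rarest value
    max_occ = occ.max(axis=1).min()
    # u[k, v, j] is True where uniques[v] has appeared exactly k + 1 times
    u = occ[None, ...] == (np.arange(max_occ) + 1)[:, None, None]
    # argmax returns the first True, i.e. the position of the (k+1)-th occurrence
    idx = np.sort(np.argmax(u, axis=-1), axis=-1)
    return idx, a[idx]

a = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
idx, partitions = first_occurrence_groups(a)
# Indices that did not fit into a complete group
leftover = np.setdiff1d(np.arange(len(a)), idx)
```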
This problem can be solved with np.concatenate(...), some index bookkeeping and np.split(...), although these are relatively slow methods.
Let's start by concatenating the classes and recording the indices where you will split:
classes = [[0, 2, 1], [2, 0, 1], [2, 1, 0, 0]]
split_idx = np.cumsum(list(map(len, classes[:-1])))
flat_classes = np.concatenate(classes)
Next you need the indices that sort the initial array, and the indices where each group of equal values starts. In this case the sorted array is [0,0,0,0,1,1,1,2,2,2] and the groups start at 0, 4 and 7.
c = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
idx = np.argsort(c, kind='stable')  # stable: equal labels keep their original order
u, cnt = np.unique(c, return_counts=True)
marker_idx = np.r_[0, np.cumsum(cnt[:-1])]
Now comes the trickiest part. At each step of iterating over flat_classes, exactly one of the group pointers (which start at 0, 4 and 7) advances, so you can accumulate these advances in an array called counter, with one column per group, and then read only the column that changed at each step:
take = np.zeros((len(flat_classes), len(u)), dtype=int)
take[np.arange(len(flat_classes)), flat_classes] = 1
counter = np.cumsum(take, axis=0)
counter = counter + marker_idx - np.ones(len(u), dtype=int)
active_idx = counter[np.arange(len(flat_classes)), flat_classes]
splittable = idx[active_idx]  # map positions in the sorted order back to positions in c
output = np.split(splittable, split_idx)
Output
[array([0, 1, 3], dtype=int64),
array([2, 4, 5], dtype=int64),
array([6, 7, 8, 9], dtype=int64)]
Remark: the main idea of the solution is to track how the pointers into the sorting indices (idx) advance. Here is how they change for this problem:
>>> counter
array([[0, 3, 6],
[0, 3, 7],
[0, 4, 7],
[0, 4, 8],
[1, 4, 8],
[1, 5, 8],
[1, 5, 9],
[1, 6, 9],
[2, 6, 9],
[3, 6, 9]])
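The second answer's steps can be collected into a single function, sketched below; split_by_classes is a hypothetical name. Two points are assumptions made explicit here: the class labels must be the integers 0..k-1 (they are used as column indices into take), and the sort must be stable, because the default np.argsort kind ('quicksort') is not guaranteed to keep equal labels in their original order on larger inputs.

```python
import numpy as np

def split_by_classes(c, classes):
    """Map each label in `classes` to the next unused position of that label in c.

    Hypothetical helper; assumes labels are the integers 0..k-1.
    """
    split_idx = np.cumsum([len(g) for g in classes[:-1]])
    flat_classes = np.concatenate(classes)
    # Stable sort so the i-th entry of each block is the i-th occurrence
    idx = np.argsort(c, kind="stable")
    _, cnt = np.unique(c, return_counts=True)
    marker_idx = np.r_[0, np.cumsum(cnt[:-1])]  # start of each label's block
    n, k = len(flat_classes), len(cnt)
    take = np.zeros((n, k), dtype=int)
    take[np.arange(n), flat_classes] = 1
    # counter[i, j]: position in idx of the latest use of label j after step i
    counter = np.cumsum(take, axis=0) + marker_idx - 1
    active_idx = counter[np.arange(n), flat_classes]
    return np.split(idx[active_idx], split_idx)

c = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
parts = split_by_classes(c, [[0, 2, 1], [2, 0, 1], [2, 1, 0, 0]])

# A longer input, where sort stability actually matters
parts_long = split_by_classes(np.array([1, 0] * 20), [[1, 0]] * 20)
```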

Merge three numpy arrays, keep largest value

I want to merge three numpy arrays, for example:
a = np.array([[0,0,1],[0,1,0],[1,0,0]])
b = np.array([[1,0,0],[0,1,0],[0,0,1]])
c = np.array([[0,1,0],[0,2,0],[0,1,0]])
a = array([[0, 0, 1],
[0, 1, 0],
[1, 0, 0]])
b = array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
c = array([[0, 1, 0],
[0, 2, 0],
[0, 1, 0]])
The desired result is to overlay them, keeping the largest value wherever more than one element is nonzero, as in the middle:
array([[1, 1, 1],
[0, 2, 0],
[1, 1, 1]])
I solved this by iterating over all elements with multiple if-conditions. Is there a more compact and more beautiful way to do this?
You can try stacking the arrays together along an extra dimension with NumPy's np.dstack and then taking the maximum along that added dimension:
# Stacking arrays together
d = np.dstack([a,b,c])
d.max(axis=2)
Out:
array([[1, 1, 1],
[0, 2, 0],
[1, 1, 1]])
NumPy's ufunc.reduce applies a function cumulatively along a given axis (axis 0 by default). We can pass the list of arrays, which is treated as one stacked array, and reduce it with np.maximum to keep the accumulated elementwise maximum:
np.maximum.reduce([a,b,c])
array([[1, 1, 1],
[0, 2, 0],
[1, 1, 1]])
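For comparison, the sketch below runs both answers side by side, plus a pairwise np.maximum chain, which is an addition not taken from the answers; the pairwise form can be handy when the arrays arrive one at a time rather than as a list.

```python
import numpy as np

a = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
b = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
c = np.array([[0, 1, 0], [0, 2, 0], [0, 1, 0]])

# Stack along a new last axis, then take the max over that axis
via_dstack = np.dstack([a, b, c]).max(axis=2)

# Reduce the list (treated as a (3, 3, 3) array) along axis 0
via_reduce = np.maximum.reduce([a, b, c])

# Pairwise elementwise maximum, one array at a time
via_pairwise = np.maximum(np.maximum(a, b), c)
```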

Numpy: swap values of 2D array based on a separate vector

Let's say I have a 4x3 numpy array, like so:
[[0, 1, 2],
[2, 0, 1],
[0, 2, 1],
[1, 2, 0]]
And let's say that I have an additional vector:
[2,
1,
2,
1]
For each row, I want to find the index of the value found in my additional vector, and swap it with the first column in my numpy array.
For example, the first entry in my vector is 2, and in the first row of my numpy array, 2 is in the 3rd column, so I want to swap the first and third columns for that row, and continue this for each additional row.
[[2, 1, 0], # the number in the 0th position (0) and 2 have swapped placement
[1, 0, 2], # the number in the 0th position (2) and 1 have swapped placement
[2, 0, 1], # the number in the 0th position (0) and 2 have swapped placement
[1, 2, 0]] # the number in the 0th position (1) and 1 have swapped placement
What's the best way to accomplish this?
Setup
arr = np.array([[0, 1, 2], [2, 0, 1], [0, 2, 1], [1, 2, 0]])
vals = np.array([2, 1, 2, 1])
First, you need to find the index of your values, which we can do using broadcasting and argmax (this finds the first matching index, not necessarily the only one):
idx = (arr == vals[:, None]).argmax(1)
# array([2, 2, 1, 0], dtype=int64)
Now swap using integer-array (advanced) indexing and tuple assignment:
r = np.arange(len(arr))
arr[r, idx], arr[:, 0] = arr[:, 0], arr[r, idx]
Output:
array([[2, 1, 0],
[1, 0, 2],
[2, 0, 1],
[1, 2, 0]])
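The same swap can be written as a function that leaves the input untouched, sketched below; swap_value_to_front is a hypothetical name. It copies column 0 explicitly, so the result does not rely on arr[:, 0] being a view whose values are read before the first assignment writes into arr. Note that if a row does not contain its target value, argmax returns 0 and that row is left unchanged; check (arr == vals[:, None]).any(axis=1) first if that case matters.

```python
import numpy as np

def swap_value_to_front(arr, vals):
    """Return a copy of arr where vals[row] is swapped into column 0 of each row."""
    out = arr.copy()
    rows = np.arange(len(out))
    # First column where each row equals its target value
    idx = (out == vals[:, None]).argmax(axis=1)
    # Explicit copy of column 0 before overwriting it
    first_col = out[:, 0].copy()
    out[:, 0] = out[rows, idx]
    out[rows, idx] = first_col
    return out

arr = np.array([[0, 1, 2], [2, 0, 1], [0, 2, 1], [1, 2, 0]])
vals = np.array([2, 1, 2, 1])
swapped = swap_value_to_front(arr, vals)
```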

Numpy 3D array arranging and reshaping

I have a 3D numpy array that I need to reshape and arrange. For example, I have
x=np.array([np.array([np.array([1,0,1]),np.array([1,1,1]),np.array([0,1,0]),np.array([1,1,0])]),np.array([np.array([0,0,1]),np.array([0,0,0]),np.array([0,1,1]),np.array([1,0,0])]),np.array([np.array([1,0,0]),np.array([1,0,1]),np.array([1,1,1]),np.array([0,0,0])])])
It has shape (3,4,3); when printing it I get:
array([[[1, 0, 1],
[1, 1, 1],
[0, 1, 0],
[1, 1, 0]],
[[0, 0, 1],
[0, 0, 0],
[0, 1, 1],
[1, 0, 0]],
[[1, 0, 0],
[1, 0, 1],
[1, 1, 1],
[0, 0, 0]]])
Now I need to rearrange this array into shape (4,3,3) by taking the same index from each subarray and putting them together, to end up with this:
array([[[1,0,1],[0,0,1],[1,0,0]],
[[1,1,1],[0,0,0],[1,0,1]],
[[0,1,0],[0,1,1],[1,1,1]],
[[1,1,0],[1,0,0],[0,0,0]]])
I tried reshape and all kinds of stacking, but nothing arranged the array the way I need. I know I can do it manually, but for large arrays doing it manually isn't an option.
Any help will be much appreciated.
Thanks
swapaxes will do what you want. That is, if your input array is x and your desired output is y, then
np.all(y==np.swapaxes(x, 1, 0))
should give True.
For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes:
import numpy as np
foo = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]])
foo.transpose(1, 0, 2)
result:
array([[[ 1, 2],
[ 5, 6],
[ 9, 10]],
[[ 3, 4],
[ 7, 8],
[11, 12]]])
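The sketch below checks swapaxes and transpose against the desired output from the question, and adds np.stack(list(x), axis=1) as a third option that is not from the answers. reshape alone cannot produce this result because it keeps the elements in their existing order and only changes the shape, whereas swapaxes and transpose permute the axes. Both return views of x, so np.ascontiguousarray is used where a contiguous copy is wanted.

```python
import numpy as np

x = np.array([[[1, 0, 1], [1, 1, 1], [0, 1, 0], [1, 1, 0]],
              [[0, 0, 1], [0, 0, 0], [0, 1, 1], [1, 0, 0]],
              [[1, 0, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0]]])

y_swap = np.swapaxes(x, 0, 1)        # view of x, shape (4, 3, 3)
y_transpose = x.transpose(1, 0, 2)   # same permutation, also a view
y_stack = np.stack(list(x), axis=1)  # builds a new array

# swapaxes/transpose only change strides; copy if contiguity matters
y = np.ascontiguousarray(y_swap)
```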
