How to merge 4 columns based on user-defined criterion - python

I need to merge 4 columns of an array into a single column
array([[0, 0, 0, 1],
[1, 0, 0, 0],
...,
[0, 1, 0, 0]])
The result should be:
array([3, 0, ..., 1])
In particular, I want to get column indices (starting from 0 and ending with 3) for those columns that have a value 1.

This assumes each row contains only zeros and ones, with exactly one 1 per row.
EDIT: I lost sight of the numpy part while trying to clarify the question. The code below will not work on a numpy array (ndarray rows have no index method), but I'll leave it in case you are looking for something that works with plain lists.
a = [[0, 0, 0, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0]]
# One way
single_list = []
for x in a:
    single_list.append(x.index(1))
# or
using_list_comprehension = [x.index(1) for x in a]
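If the data starts out as a numpy array, one way to reuse the list-based approach is to convert it with tolist() first. This is only a minimal sketch; the names rows and column_indices are illustrative and not from the answer.

```python
import numpy as np

a = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])

# ndarray rows have no .index method, so convert to nested Python lists first
rows = a.tolist()

# For each row, find the position of the single 1
column_indices = [row.index(1) for row in rows]
print(column_indices)  # [3, 0, 0, 1]
```

Note that row.index(1) raises ValueError for a row without a 1, so this relies on the one-1-per-row assumption.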

If every row has exactly one 1, then the following will work:
np.where(a == 1)[1]
For example:
>>> a = np.array([[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]])
>>> np.where(a == 1)[1]
array([3, 0, 1])
We can see it from the following:
>>> np.where(a == 1)
(array([0, 1, 2]), array([3, 0, 1]))
array([0, 1, 2]) holds the row indices of the ones, and array([3, 0, 1]) holds the column indices. This means the ones sit at coordinates (0, 3), (1, 0), (2, 1). Because each row contains exactly one 1, there is exactly one column index per row, in row order.
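A related sketch, not part of the answer above: under the same assumption (only zeros and ones, exactly one 1 per row), argmax along axis 1 should return the same column indices, because the 1 is the maximum of its row. The variable names are illustrative.

```python
import numpy as np

a = np.array([[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]])

# Sanity check of the assumption: exactly one 1 in every row
assert (a.sum(axis=1) == 1).all()

# Column indices from np.where (second element of the returned tuple)
cols_where = np.where(a == 1)[1]

# Column index of the maximum of each row
cols_argmax = a.argmax(axis=1)

print(cols_where)   # [3 0 1]
print(cols_argmax)  # [3 0 1]
```

One difference to keep in mind: argmax always returns one value per row, so a row with no 1 would silently map to column 0, whereas np.where would simply return fewer indices.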

Related

How to keep a fixed size of unique values in random positions in an array while replacing others with a mask?

This may be a very simple question, as I am still exploring Python. I am using numpy for this.
Updated 09/30/21: I adopted and modified the code from the answers; it is shown below for future reference. I also added an elif branch in the loop for classes that have fewer elements than the wanted size. Some of the code may be unnecessary.
new_array = test_array.copy()
uniques, counts = np.unique(new_array, return_counts=True)
print("classes:", uniques, "counts:", counts)
for unique, count in zip(uniques, counts):
    # print(unique, count)
    if unique != 0 and count > 3:
        ids = np.random.choice(count, count-3, replace=False)
        new_array[tuple(i[ids] for i in np.where(new_array == unique))] = 0
    elif unique != 0 and count <= 3:
        ids = np.random.choice(count, count, replace=False)
        new_array[tuple(i[ids] for i in np.where(new_array == unique))] = unique
Below is the original question.
Let's say I have a 2D array like this:
test_array = np.array([[0,0,0,0,0],
[1,1,1,1,1],
[0,0,0,0,0],
[2,2,2,4,4],
[4,4,4,2,2],
[0,0,0,0,0]])
print("existing classes:", np.unique(test_array))
# "existing classes: [0 1 2 4]"
Now I want to keep a fixed number of values (e.g. 2) in each class that != 0 (in this case two 1s, two 2s, and two 4s) and replace the rest with 0, where the values being replaced are chosen at random on each run (or from a seed).
For example, with run 1 I will have
([[0,0,0,0,0],
[1,0,0,1,0],
[0,0,0,0,0],
[2,0,0,0,4],
[4,0,0,2,0],
[0,0,0,0,0]])
with another run it might be
([[0,0,0,0,0],
[1,1,0,0,0],
[0,0,0,0,0],
[2,0,2,0,4],
[4,0,0,0,0],
[0,0,0,0,0]])
etc. Could anyone help me with this?
My strategy is:
1. Create a new array initialized to all zeros.
2. Find the elements in each class.
3. For each class:
   a. Randomly sample two of its elements to keep.
   b. Set those elements of the new array to the class value.
The trick is keeping the shape of the indexes appropriate so you retain the shape of the original array.
import numpy as np
test_array = np.array([[0,0,0,0,0],
                       [1,1,1,1,1],
                       [0,0,0,0,0],
                       [2,2,2,4,4],
                       [4,4,4,2,2],
                       [0,0,0,0,0]])
def sample_classes(arr, n_keep=2, random_state=42):
    # Use the argument, not the global test_array
    classes, counts = np.unique(arr, return_counts=True)
    rng = np.random.default_rng(random_state)
    out = np.zeros_like(arr)
    for klass, count in zip(classes, counts):
        # Find locations of the class elements
        indexes = np.nonzero(arr == klass)
        # Sample up to n_keep elements of the class (all of them if the class is smaller)
        keep_idx = rng.choice(count, min(n_keep, count), replace=False)
        # Select the kept elements and reformat for indexing the output array and retaining its shape
        keep_idx_reshape = tuple(ind[keep_idx] for ind in indexes)
        out[keep_idx_reshape] = klass
    return out
You can use it like this:
In [3]: sample_classes(test_array)
Out[3]:
array([[0, 0, 0, 0, 0],
[0, 1, 1, 0, 0],
[0, 0, 0, 0, 0],
[2, 0, 0, 4, 0],
[4, 0, 0, 2, 0],
[0, 0, 0, 0, 0]])
In [4]: sample_classes(test_array, n_keep=3)
Out[4]:
array([[0, 0, 0, 0, 0],
[1, 0, 1, 1, 0],
[0, 0, 0, 0, 0],
[0, 2, 0, 4, 0],
[4, 4, 0, 2, 2],
[0, 0, 0, 0, 0]])
In [5]: sample_classes(test_array, random_state=88)
Out[5]:
array([[0, 0, 0, 0, 0],
[0, 0, 1, 1, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[4, 0, 4, 2, 2],
[0, 0, 0, 0, 0]])
In [6]: sample_classes(test_array, random_state=88, n_keep=4)
Out[6]:
array([[0, 0, 0, 0, 0],
[0, 1, 1, 1, 1],
[0, 0, 0, 0, 0],
[2, 2, 0, 4, 4],
[4, 4, 0, 2, 2],
[0, 0, 0, 0, 0]])
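The index handling inside the loop can be looked at in isolation. The sketch below uses a fixed selection (positions 0 and 3 of class 2) in place of the random draw, purely for illustration; the names rows, cols and keep are not from the answer.

```python
import numpy as np

test_array = np.array([[0,0,0,0,0],
                       [1,1,1,1,1],
                       [0,0,0,0,0],
                       [2,2,2,4,4],
                       [4,4,4,2,2],
                       [0,0,0,0,0]])

# np.nonzero returns one index array per dimension: (row_indices, col_indices)
rows, cols = np.nonzero(test_array == 2)
print(rows)  # [3 3 3 4 4]
print(cols)  # [0 1 2 3 4]

# Hypothetical fixed selection standing in for rng.choice
keep = np.array([0, 3])

# Apply the same selection to every index array so rows and columns stay paired
keep_idx = tuple(ind[keep] for ind in (rows, cols))

out = np.zeros_like(test_array)
out[keep_idx] = 2
print(np.argwhere(out == 2).tolist())  # [[3, 0], [4, 3]]
```

Because the selection is applied to the row and column index arrays together, the result can index the 2D output directly and no flattening or reshaping is needed.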
Here is my not-so-elegant solution:
def unique(arr, num=2, seed=None):
    np.random.seed(seed)
    vals = {}
    for i, row in enumerate(arr):
        for j, val in enumerate(row):
            if val in vals and val != 0:
                vals[val].append((i, j))
            elif val != 0:
                vals[val] = [(i, j)]
    new = np.zeros_like(arr)
    for val in vals:
        np.random.shuffle(vals[val])
        while len(vals[val]) > num:
            vals[val].pop()
        for row, col in vals[val]:
            new[row,col] = val
    return new
The following should be O(n log n) in array size
def keep_k_per_class(data,k,rng):
    out = np.zeros_like(data)
    unq,cnts = np.unique(data,return_counts=True)
    assert (cnts >= k).all()
    # calculate class boundaries from class sizes
    CNTS = cnts.cumsum()
    # indirectly group classes together by partial sorting
    idx = data.ravel().argpartition(CNTS[:-1])
    # the following lines implement simultaneous drawing without replacement
    # from all classes
    # lower boundaries of intervals to draw random numbers from
    # for each class they start with the lower class boundary
    # and from there grow one by one - together with the
    # swapping out below this implements "without replacement"
    lb = np.add.outer(np.arange(k),CNTS-cnts)
    pick = rng.integers(lb,CNTS,lb.shape)
    for l,p in zip(lb,pick):
        # populate output array
        out.ravel()[idx[p]] = unq
        # swap out used indices so still available ones occupy a linear
        # range (per class)
        idx[p] = idx[l]
    return out
Examples:
>>> rng = np.random.default_rng()
>>> keep_k_per_class(test_array,2,rng)
array([[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[2, 0, 2, 0, 4],
[0, 4, 0, 0, 0],
[0, 0, 0, 0, 0]])
>>> keep_k_per_class(test_array,2,rng)
array([[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 2, 0, 0, 0],
[4, 0, 4, 0, 2],
[0, 0, 0, 0, 0]])
and a large one
>>> BIG = np.add.outer(np.tile(test_array,(100,100)),np.arange(0,500,5))
>>> BIG.size
30000000
>>> res = keep_k_per_class(BIG,30,rng)
### takes ~4 sec
### check
>>> np.unique(np.bincount(res.ravel()),return_counts=True)
(array([ 0, 30, 29988030]), array([100, 399, 1]))
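The grouping step of keep_k_per_class can be checked on its own. The following is a small sketch on the 6x5 example array; it only illustrates that partitioning at the cumulative class counts puts each class into one contiguous block, and the names flat and grouped are illustrative.

```python
import numpy as np

test_array = np.array([[0,0,0,0,0],
                       [1,1,1,1,1],
                       [0,0,0,0,0],
                       [2,2,2,4,4],
                       [4,4,4,2,2],
                       [0,0,0,0,0]])

flat = test_array.ravel()
unq, cnts = np.unique(flat, return_counts=True)

# Exclusive upper boundary of each class block in the grouped order
CNTS = cnts.cumsum()
print(CNTS)  # [15 20 25 30]

# Partition at the class boundaries; the last boundary equals flat.size and is dropped
idx = flat.argpartition(CNTS[:-1])
grouped = flat[idx]

# Every block [CNTS - cnts, CNTS) now holds a single class value
for value, lo, hi in zip(unq, CNTS - cnts, CNTS):
    assert (grouped[lo:hi] == value).all()
```

Within a block the order of the original positions in idx is not specified, which does not matter here because positions are drawn at random from each block anyway.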

In a matrix having a row of zeros, how do I replace the corresponding diagonal entry of the matrix with a one?

I have a square matrix, A, whose values are zero or one and that contains one or more rows of
zeros. For each row of zeros, I wish to replace the corresponding diagonal entry of A with a one.
For example, suppose
A=np.array([[0,1,1,0,1],[0,0,1,1,1],[0,0,0,0,0],[0,1,0,0,0],[0,0,0,0,0]])
for which rows 3 and 5 (counting from one) are all zeros. I wish to set the corresponding diagonal entries, A[2,2] and A[4,4] in numpy's zero-based indexing, equal to one.
The matrix is:
>>> A
array([[0, 1, 1, 0, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 0, 0]])
We can find out the sum of all the rows:
>>> A.sum(axis=1)
array([3, 3, 0, 1, 0])
We want all the diagonals corresponding to 0-sum rows to be set to 1.
Thus, the following works:
>>> row_sums = A.sum(axis=1)
>>> A[row_sums == 0, row_sums == 0] = 1
>>> A
array([[0, 1, 1, 0, 1],
[0, 0, 1, 1, 1],
[0, 0, 1, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 0, 1]])
Note that this works because row_sums == 0 is True for the desired rows:
>>> row_sums == 0
array([False, False, True, False, True])
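Each boolean mask is converted to the integer indices [2, 4], and the two index arrays are paired element-wise, and thus A[row_sums == 0, row_sums == 0] selects exactly the required elements A[2, 2] and A[4, 4].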
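The same update can be written with explicit integer indices, which some readers may find easier to follow than the double boolean mask. This is a sketch under the question's assumptions (a square 0/1 matrix); zero_rows and B are illustrative names.

```python
import numpy as np

A = np.array([[0,1,1,0,1],[0,0,1,1,1],[0,0,0,0,0],[0,1,0,0,0],[0,0,0,0,0]])

# Indices of the rows that contain no nonzero entry
zero_rows = np.flatnonzero(~A.any(axis=1))
print(zero_rows)  # [2 4]

# Work on a copy so the original matrix stays untouched
B = A.copy()

# Pair each row index with the same column index, i.e. the diagonal entry
B[zero_rows, zero_rows] = 1
print(B[2, 2], B[4, 4])  # 1 1
```

Using any() instead of a row sum also behaves sensibly if the matrix ever contains negative values, where a row could sum to zero without being all zeros.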

Changing the order of a matrix in numpy

I have a matrix
test = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,1],[0,1,1,0]])
How do I reorder the columns so that they are like this matrix? (Basically the last row becomes the first row in reverse order and so on...)
np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]])
Just reverse both axes:
test[::-1,::-1]
array([[0, 1, 1, 0],
[1, 0, 1, 0],
[1, 1, 0, 1],
[0, 0, 1, 0]])
Update (ahh... Okay, I think I understand now.)
You can use a negative step for both the outer (row) and inner (column) axes.
test[::-1, ::-1]
Output:
array([[0, 1, 1, 0],
[1, 0, 1, 0],
[1, 1, 0, 1],
[0, 0, 1, 0]])
To reverse both the rows and the columns you can also use np.flip. In your case:
test = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,1],[0,1,1,0]])
flipped = np.flip(test, axis=(0, 1))  # avoid naming the result "reversed", which shadows a builtin
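A quick sketch to confirm that the approaches in these answers agree on the example from the question. np.rot90 with k=2 is added here as a further equivalent for 2-D arrays; it is not mentioned in the answers.

```python
import numpy as np

test = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,1],[0,1,1,0]])
expected = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]])

# Negative steps on both axes
by_slicing = test[::-1, ::-1]

# np.flip over both axes (a tuple of axes needs NumPy 1.15 or later)
by_flip = np.flip(test, axis=(0, 1))

# Rotating a 2-D array by 180 degrees reverses both axes
by_rot = np.rot90(test, 2)

print(np.array_equal(by_slicing, expected))  # True
print(np.array_equal(by_flip, expected))     # True
print(np.array_equal(by_rot, expected))      # True
```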

numpy roll along a single axis

I have a numpy array with binary values that I need to change in the following way: The value of every element must be shifted one column to the left but only within the same row. As an example, I have the following array:
>>> arr = np.array([[0,0,1,0],[1,0,0,0],[0,0,1,1]])
>>> arr
array([[0, 0, 1, 0],
[1, 0, 0, 0],
[0, 0, 1, 1]])
And it needs to be transformed to:
>>> arr
array([[0, 1, 0, 0],
[0, 0, 0, 1],
[0, 1, 1, 0]])
I know that np.roll(arr,-1) would roll the values one cell to the left, but it doesn't seem to be able to roll them within the rows they belong to (i.e. the element in cell [1,0] goes to [0,3] instead of the desired [1,3]). Is there a way of doing this?
Thanks in advance.
roll accepts an axis parameter:
np.roll(arr,-1, axis=1)
array([[0, 1, 0, 0],
[0, 0, 0, 1],
[0, 1, 1, 0]])
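To make the difference concrete, here is a short sketch comparing the two calls on the array from the question. Without axis, np.roll flattens the array, shifts it, and restores the shape, which is why values cross row boundaries.

```python
import numpy as np

arr = np.array([[0,0,1,0],[1,0,0,0],[0,0,1,1]])

# No axis: the flattened array is rolled, so values wrap into the previous row
flat_roll = np.roll(arr, -1)
print(flat_roll)
# [[0 1 0 1]
#  [0 0 0 0]
#  [0 1 1 0]]

# axis=1: each row is rolled independently
row_roll = np.roll(arr, -1, axis=1)
print(row_roll)
# [[0 1 0 0]
#  [0 0 0 1]
#  [0 1 1 0]]
```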

NumPy: sort matrix rows by number of non-zero entries

import numpy as np
def calc_size(matrix, index):
    return np.nonzero(matrix[index,:])[1].size
def swap_rows(matrix, frm, to):
    matrix[[frm, to],:] = matrix[[to, frm],:]
I am using NumPy with Python 2.7.
How can I sort a matrix's rows by their number of nonzero entries? I already wrote these two helper functions for the job, but how do I hand them to a sorting routine? The fullest rows should come first.
If you have an array arr:
array([[0, 0, 0, 0, 0],
[1, 0, 1, 1, 1],
[0, 1, 0, 1, 1],
[1, 1, 1, 1, 1]])
You could sort the array's rows according to the number of zero entries by writing:
>>> arr[(arr == 0).sum(axis=1).argsort()]
array([[1, 1, 1, 1, 1],
[1, 0, 1, 1, 1],
[0, 1, 0, 1, 1],
[0, 0, 0, 0, 0]])
This first counts the number of zero entries in each row with (arr == 0).sum(axis=1): this produces the array [5, 1, 2, 0].
Next, argsort sorts the indices of this array by their corresponding value, giving [3, 1, 2, 0].
Lastly, this argsorted array is used to rearrange the rows of arr.
P.S. If you have a matrix m (and not an array), you may need to ravel before using argsort:
m[(m == 0).sum(axis=1).ravel().argsort()]
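A variant sketch, not part of the answer above, that counts nonzero entries directly and sorts in descending order. The use of np.count_nonzero and kind="stable" (so that rows with equal counts keep their original relative order) are choices made for this illustration.

```python
import numpy as np

arr = np.array([[0, 0, 0, 0, 0],
                [1, 0, 1, 1, 1],
                [0, 1, 0, 1, 1],
                [1, 1, 1, 1, 1]])

# Number of nonzero entries in each row
nonzero_counts = np.count_nonzero(arr, axis=1)
print(nonzero_counts)  # [0 4 3 5]

# Negate the counts so the fullest rows come first
order = np.argsort(-nonzero_counts, kind="stable")
print(order)  # [3 1 2 0]

sorted_rows = arr[order]
print(sorted_rows)
# [[1 1 1 1 1]
#  [1 0 1 1 1]
#  [0 1 0 1 1]
#  [0 0 0 0 0]]
```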
