This may be a very simple question, as I am still exploring Python. I am using numpy for this problem.
Updated 09/30/21: I adopted and modified the code shown below, posted here for future reference. I also added an elif in the loop for classes that have fewer counts than the wanted size. Some of the code may be unnecessary, though.
new_array = test_array.copy()
uniques, counts = np.unique(new_array, return_counts=True)
print("classes:", uniques, "counts:", counts)
for unique, count in zip(uniques, counts):
    #print (unique, count)
    if unique != 0 and count > 3:
        # zero out all but 3 randomly chosen elements of this class
        ids = np.random.choice(count, count-3, replace=False)
        new_array[tuple(i[ids] for i in np.where(new_array == unique))] = 0
    elif unique != 0 and count <= 3:
        # 3 or fewer elements: keep them all (this assignment changes nothing)
        ids = np.random.choice(count, count, replace=False)
        new_array[tuple(i[ids] for i in np.where(new_array == unique))] = unique
Below is the original question.
Let's say I have a 2D array like this:
test_array = np.array([[0,0,0,0,0],
                       [1,1,1,1,1],
                       [0,0,0,0,0],
                       [2,2,2,4,4],
                       [4,4,4,2,2],
                       [0,0,0,0,0]])
print("existing classes:", np.unique(test_array))
# "existing classes: [0 1 2 4]"
Now I want to keep a fixed number of values (e.g. 2) in each class that != 0 (in this case two 1s, two 2s, and two 4s) and replace the rest with 0. Which values get replaced should be random on each run (or determined by a seed).
For example, with run 1 I will have
([[0,0,0,0,0],
[1,0,0,1,0],
[0,0,0,0,0],
[2,0,0,0,4],
[4,0,0,2,0],
[0,0,0,0,0]])
With another run it might be
([[0,0,0,0,0],
[1,1,0,0,0],
[0,0,0,0,0],
[2,0,2,0,4],
[4,0,0,0,0],
[0,0,0,0,0]])
etc. Could anyone help me with this?
My strategy is:
1. Create a new array initialized to all zeros
2. Find the elements in each class
3. For each class:
   a. Randomly sample two of its elements to keep
   b. Set those elements of the new array to the class value
The trick is keeping the index arrays in the right shape so that the output retains the shape of the original array.
import numpy as np
test_array = np.array([[0,0,0,0,0],
                       [1,1,1,1,1],
                       [0,0,0,0,0],
                       [2,2,2,4,4],
                       [4,4,4,2,2],
                       [0,0,0,0,0]])
def sample_classes(arr, n_keep=2, random_state=42):
    # Use the array that was passed in, not the global test_array
    classes, counts = np.unique(arr, return_counts=True)
    rng = np.random.default_rng(random_state)
    out = np.zeros_like(arr)
    for klass, count in zip(classes, counts):
        # Find locations of the class elements
        indexes = np.nonzero(arr == klass)
        # Sample up to n_keep elements of the class (all of them if the class is smaller)
        keep_idx = rng.choice(count, min(count, n_keep), replace=False)
        # Select the kept elements and reformat for indexing the output array and retaining its shape
        keep_idx_reshape = tuple(ind[keep_idx] for ind in indexes)
        out[keep_idx_reshape] = klass
    return out
You can use it like this:
In [3]: sample_classes(test_array)
Out[3]:
array([[0, 0, 0, 0, 0],
[0, 1, 1, 0, 0],
[0, 0, 0, 0, 0],
[2, 0, 0, 4, 0],
[4, 0, 0, 2, 0],
[0, 0, 0, 0, 0]])
In [4]: sample_classes(test_array, n_keep=3)
Out[4]:
array([[0, 0, 0, 0, 0],
[1, 0, 1, 1, 0],
[0, 0, 0, 0, 0],
[0, 2, 0, 4, 0],
[4, 4, 0, 2, 2],
[0, 0, 0, 0, 0]])
In [5]: sample_classes(test_array, random_state=88)
Out[5]:
array([[0, 0, 0, 0, 0],
[0, 0, 1, 1, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[4, 0, 4, 2, 2],
[0, 0, 0, 0, 0]])
In [6]: sample_classes(test_array, random_state=88, n_keep=4)
Out[6]:
array([[0, 0, 0, 0, 0],
[0, 1, 1, 1, 1],
[0, 0, 0, 0, 0],
[2, 2, 0, 4, 4],
[4, 4, 0, 2, 2],
[0, 0, 0, 0, 0]])
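A quick way to sanity-check an output like the ones above is to count what survived. The helper below is only a sketch: `check_sampled` is a hypothetical name (not part of numpy), it assumes 0 is the background value, and `run_1` is simply the first example output from the question.

```python
import numpy as np

def check_sampled(original, sampled, n_keep):
    """Return True if `sampled` keeps at most n_keep cells per nonzero class."""
    kept = sampled != 0
    # Every surviving value must sit where the original had the same value
    if not np.array_equal(sampled[kept], original[kept]):
        return False
    classes, counts = np.unique(original[original != 0], return_counts=True)
    for klass, count in zip(classes, counts):
        # A class smaller than n_keep can only keep what it has
        if np.count_nonzero(sampled == klass) != min(count, n_keep):
            return False
    return True

test_array = np.array([[0,0,0,0,0],
                       [1,1,1,1,1],
                       [0,0,0,0,0],
                       [2,2,2,4,4],
                       [4,4,4,2,2],
                       [0,0,0,0,0]])
run_1 = np.array([[0,0,0,0,0],
                  [1,0,0,1,0],
                  [0,0,0,0,0],
                  [2,0,0,0,4],
                  [4,0,0,2,0],
                  [0,0,0,0,0]])
print(check_sampled(test_array, run_1, n_keep=2))       # True
print(check_sampled(test_array, test_array, n_keep=2))  # False
```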
Here is my not-so-elegant solution:
def unique(arr, num=2, seed=None):
    np.random.seed(seed)
    vals = {}
    for i, row in enumerate(arr):
        for j, val in enumerate(row):
            if val in vals and val != 0:
                vals[val].append((i, j))
            elif val != 0:
                vals[val] = [(i, j)]
    new = np.zeros_like(arr)
    for val in vals:
        np.random.shuffle(vals[val])
        while len(vals[val]) > num:
            vals[val].pop()
        for row, col in vals[val]:
            new[row,col] = val
    return new
The following should be O(n log n) in array size:
def keep_k_per_class(data,k,rng):
    out = np.zeros_like(data)
    unq,cnts = np.unique(data,return_counts=True)
    assert (cnts >= k).all()
    # calculate class boundaries from class sizes
    CNTS = cnts.cumsum()
    # indirectly group classes together by partial sorting
    idx = data.ravel().argpartition(CNTS[:-1])
    # the following lines implement simultaneous drawing without replacement
    # from all classes
    # lower boundaries of intervals to draw random numbers from
    # for each class they start with the lower class boundary
    # and from there grow one by one - together with the
    # swapping out below this implements "without replacement"
    lb = np.add.outer(np.arange(k),CNTS-cnts)
    pick = rng.integers(lb,CNTS,lb.shape)
    for l,p in zip(lb,pick):
        # populate output array
        out.ravel()[idx[p]] = unq
        # swap out used indices so still available ones occupy a linear
        # range (per class)
        idx[p] = idx[l]
    return out
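The "swap out" step in the comments above resembles a partial Fisher-Yates shuffle. As a rough single-class sketch (the name `draw_k_without_replacement` is made up for illustration, and it handles one pool at a time rather than all classes at once):

```python
import numpy as np

def draw_k_without_replacement(pool, k, rng):
    """Pick k distinct items from pool using the shrinking-range trick."""
    pool = np.array(pool)  # work on a copy so the caller's data is untouched
    picked = []
    for lower in range(k):
        # Items still available occupy positions lower .. len(pool)-1
        p = rng.integers(lower, len(pool))
        picked.append(pool[p])
        # Move the item at the lower boundary into the used slot, so the
        # remaining items again form one contiguous range
        pool[p] = pool[lower]
    return np.array(picked)

rng = np.random.default_rng(0)
sample = draw_k_without_replacement(np.arange(10), 4, rng)
print(sample)
```

Each item can be drawn at most once because a used slot is immediately refilled with a still-unused item and the lower boundary then moves past the slot that item came from.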
Examples:
>>> rng = np.random.default_rng()
>>> keep_k_per_class(test_array,2,rng)
array([[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[2, 0, 2, 0, 4],
[0, 4, 0, 0, 0],
[0, 0, 0, 0, 0]])
>>> keep_k_per_class(test_array,2,rng)
array([[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 2, 0, 0, 0],
[4, 0, 4, 0, 2],
[0, 0, 0, 0, 0]])
And a large one:
>>> BIG = np.add.outer(np.tile(test_array,(100,100)),np.arange(0,500,5))
>>> BIG.size
30000000
>>> res = keep_k_per_class(BIG,30,rng)
### takes ~4 sec
### check
>>> np.unique(np.bincount(res.ravel()),return_counts=True)
(array([ 0, 30, 29988030]), array([100, 399, 1]))
I have a square matrix, A, whose values are zero or one and that contains one or more rows of
zeros. For each row of zeros, I wish to replace the corresponding diagonal entry of A with a one.
For example, suppose
A=np.array([[0,1,1,0,1],[0,0,1,1,1],[0,0,0,0,0],[0,1,0,0,0],[0,0,0,0,0]])
for which rows 3 and 5 (counting from 1) are all zeros. I wish to set the corresponding diagonal entries, A[2,2] and A[4,4] in numpy's 0-based indexing, equal to one.
The matrix is:
>>> A
array([[0, 1, 1, 0, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 0, 0]])
We can find out the sum of all the rows:
>>> A.sum(axis=1)
array([3, 3, 0, 1, 0])
We want all the diagonals corresponding to 0-sum rows to be set to 1.
Thus, the following works:
>>> row_sums = A.sum(axis=1)
>>> A[row_sums == 0, row_sums == 0] = 1
>>> A
array([[0, 1, 1, 0, 1],
[0, 0, 1, 1, 1],
[0, 0, 1, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 0, 1]])
Note that this works because row_sums == 0 is True for the desired rows:
>>> row_sums == 0
array([False, False, True, False, True])
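When two boolean masks are used as indices, numpy converts each one to the integer positions of its True entries (here [2, 4]) and pairs them element-wise, and thus A[row_sums == 0, row_sums == 0] selects exactly A[2,2] and A[4,4].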
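The same idea can be written with explicit integer indices, which some readers find easier to follow. This is just a sketch of an equivalent form; `zero_rows` is a name chosen here, and `any` is used instead of the row sum on the assumption that you may later want it to work for matrices whose entries could cancel out.

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 1],
              [0, 0, 1, 1, 1],
              [0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0]])

# Integer positions of the rows that contain no nonzero entry
zero_rows = np.flatnonzero(~A.any(axis=1))
# Pairing the index array with itself addresses the diagonal entries
A[zero_rows, zero_rows] = 1
print(zero_rows)  # [2 4]
print(A)
```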
I have a matrix
test = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,1],[0,1,1,0]])
How do I reorder the columns so that they are like this matrix? (Basically the last row becomes the first row in reverse order and so on...)
np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]])
Just reverse both axes:
test[::-1,::-1]
array([[0, 1, 1, 0],
[1, 0, 1, 0],
[1, 1, 0, 1],
[0, 0, 1, 0]])
Update (ahh... Okay, I think I understand now.)
You can use a negative step for both the row slice and the column slice.
test[::-1, ::-1]
Output:
array([[0, 1, 1, 0],
[1, 0, 1, 0],
[1, 1, 0, 1],
[0, 0, 1, 0]])
To reverse both the rows and the columns you can use np.flip. In your case:
test = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,1],[0,1,1,0]])
flipped = np.flip(test, axis=(0, 1))
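For a 2D array, the approaches in these answers should all give the same result. A small sketch comparing them (the variable names are made up here; `np.rot90` is added only as a further equivalent, since a 180 degree rotation reverses both axes):

```python
import numpy as np

test = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,1],[0,1,1,0]])

by_slicing = test[::-1, ::-1]   # negative step on rows and columns
by_flip = np.flip(test)         # axis=None flips every axis
by_rot = np.rot90(test, 2)      # two quarter turns = 180 degrees

print(np.array_equal(by_slicing, by_flip))  # True
print(np.array_equal(by_slicing, by_rot))   # True
print(by_slicing)
```

Slicing and np.flip return views of the original data, so call .copy() on the result if you plan to modify it independently.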
I have a numpy array with binary values that I need to change in the following way: The value of every element must be shifted one column to the left but only within the same row. As an example, I have the following array:
>>> arr = np.array([[0,0,1,0],[1,0,0,0],[0,0,1,1]])
>>> arr
array([[0, 0, 1, 0],
[1, 0, 0, 0],
[0, 0, 1, 1]])
And it needs to be transformed to:
>>> arr
array([[0, 1, 0, 0],
[0, 0, 0, 1],
[0, 1, 1, 0]])
I know that np.roll(arr,-1) would roll the values one cell to the left, but it doesn't seem to be able to roll them within the rows they belong to (i.e. the element on cell [1,0] goes to [0,3] instead of the desired [1,3]). Is there a way of doing this?
Thanks in advance.
roll accepts an axis parameter:
np.roll(arr,-1, axis=1)
array([[0, 1, 0, 0],
[0, 0, 0, 1],
[0, 1, 1, 0]])
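To see why the axis argument matters, it may help to compare both calls side by side. In this sketch (variable names are illustrative), omitting the axis makes np.roll shift the flattened array and then restore the shape, which is what lets values wrap into the previous row:

```python
import numpy as np

arr = np.array([[0,0,1,0],[1,0,0,0],[0,0,1,1]])

flat_roll = np.roll(arr, -1)         # rolls the flattened array, wraps across rows
row_roll = np.roll(arr, -1, axis=1)  # rolls each row independently

print(flat_roll)
# [[0 1 0 1]
#  [0 0 0 0]
#  [0 1 1 0]]
print(row_roll)
# [[0 1 0 0]
#  [0 0 0 1]
#  [0 1 1 0]]
```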
import numpy as np
def calc_size(matrix, index):
    return np.nonzero(matrix[index,:])[1].size
def swap_rows(matrix, frm, to):
    matrix[[frm, to],:] = matrix[[to, frm],:]
Numpy - Python 2.7
How can I sort a matrix's rows by their number of nonzero entries? I already wrote these two helper methods, but I don't know how to hand them to a sorting routine. The fullest rows should be at the beginning!
If you have an array arr:
array([[0, 0, 0, 0, 0],
[1, 0, 1, 1, 1],
[0, 1, 0, 1, 1],
[1, 1, 1, 1, 1]])
You could sort the array's rows according to the number of zero entries by writing:
>>> arr[(arr == 0).sum(axis=1).argsort()]
array([[1, 1, 1, 1, 1],
[1, 0, 1, 1, 1],
[0, 1, 0, 1, 1],
[0, 0, 0, 0, 0]])
This first counts the number of zero entries in each row with (arr == 0).sum(axis=1): this produces the array [5, 1, 2, 0].
Next, argsort sorts the indices of this array by their corresponding value, giving [3, 1, 2, 0].
Lastly, this argsorted array is used to rearrange the rows of arr.
P.S. If you have a matrix m (and not an array), you may need to ravel before using argsort:
m[(m == 0).sum(axis=1).ravel().argsort()]
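An equivalent way to phrase this, closer to the wording of the question, is to count the nonzero entries directly and sort in descending order. The following is a sketch for a plain ndarray (not an np.matrix); the variable names are chosen here for illustration, and the stable sort is an optional choice that keeps tied rows in their original order.

```python
import numpy as np

arr = np.array([[0, 0, 0, 0, 0],
                [1, 0, 1, 1, 1],
                [0, 1, 0, 1, 1],
                [1, 1, 1, 1, 1]])

# Number of nonzero entries in each row: [0 4 3 5]
nonzero_per_row = np.count_nonzero(arr, axis=1)
# Negate so that argsort yields the fullest rows first
order = np.argsort(-nonzero_per_row, kind="stable")
sorted_rows = arr[order]
print(order)        # [3 1 2 0]
print(sorted_rows)
```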