I am finding the lowest value in the array "value" using a min function. The lowest value is assigned a 1, and everything else a 0. I have several descriptive column variables: drug, size, strength, form, and time. I want to find the min value for each unique key rather than the lowest value in the entire "value" array.
I have tried running loops for each column variable.
import numpy as np

def min_mask(arr):
    m = np.min(arr)
    return np.vectorize(lambda x: x == m)(arr).astype(int)

if __name__ == '__main__':
    my_arr = np.array(meltDF["value"])
    print(min_mask(my_arr))
There are many options here, for example:
1) Pre-initialize the mask and use argmin to fill in the appropriate places:
arr = np.random.rand(10, 4)
indices = np.argmin(arr, axis=0)
mask = np.zeros_like(arr, dtype=int)
mask[indices, range(len(indices))] = 1
2) Using apply_along_axis is probably the style you prefer:
def is_minimum(v):
    return v == np.min(v)

mask = np.apply_along_axis(is_minimum, axis=0, arr=arr).astype(int)
These solutions assume that each column corresponds to a unique key.
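If the unique keys are instead combinations of your descriptive columns (drug, size, strength, form, time) in the meltDF DataFrame rather than separate array columns, a pandas groupby is probably closer to what you want. A minimal sketch, assuming meltDF exists with those key columns plus "value":

import pandas as pd

key_cols = ["drug", "size", "strength", "form", "time"]  # assumed key columns from the question

# per-key minimum broadcast back onto every row, then compared elementwise
group_min = meltDF.groupby(key_cols)["value"].transform("min")
meltDF["is_min"] = (meltDF["value"] == group_min).astype(int)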
You can compare elements to their column-wise mins, then cast to uint8 to save a bit of space:
>>> import numpy as np
>>> np.random.seed(444)
>>> arr = np.random.rand(10, 4)
>>> (arr == arr.min(axis=0)).astype(np.uint8)
array([[0, 0, 0, 0],
[1, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 1, 0, 0],
[0, 0, 1, 0]], dtype=uint8)
Because of NumPy's broadcasting, the comparison arr == arr.min(axis=0) produces a result with the same shape as arr, even though arr.min(axis=0) has shape (4,).
Note that if columns have duplicate minimums, this may generate more than one "1" in a single column.
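If a column can contain a tied minimum and you want at most one 1 per column, one option (my own sketch, not part of the answer above) is to keep only the first occurrence of the minimum in each column:

mask = arr == arr.min(axis=0)                       # may contain ties
first_only = mask & (np.cumsum(mask, axis=0) == 1)  # keep only the first True per column
result = first_only.astype(np.uint8)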
Related
I'm working with numpy and I have a problem with indexing. I have a numpy array of zeros and a 2D array of indices. I need to use these indices to set values in the array of zeros to 1. I tried something, but it's not working; here is what I tried.
import numpy as np
idx = np.array([[0, 3, 4],
                [1, 3, 5],
                [0, 4, 5]])  # Array of indices

zeros = np.zeros(6)  # Array of zeros [0, 0, 0, 0, 0, 0]
repeat = np.tile(zeros, (idx.shape[0], 1))  # This repeats the array of zeros to match the number of rows of the index array

res = []
for i, j in zip(repeat, idx):
    res.append(i[j] = 1)  # Here I try to replace the matching indices with the value 1
output = np.array(res)
but I get the syntax error
expression cannot contain assignment, perhaps you meant "=="?
my desired output should be
output = [[1, 0, 0, 1, 1, 0],
[0, 1, 0, 1, 0, 1],
[1, 0, 0, 0, 1, 1]]
This is just an example; the idx array can be bigger. I think the problem is the indexing, and I believe there is a much simpler way of doing this without repeating the array of zeros and using the zip function, but I can't figure it out. Any help would be appreciated, thank you!
EDIT: When I change the = to ==, I get a boolean array, which I don't need, so I don't know what's happening there either.
You can use np.put_along_axis to assign values into the array repeat based on indices in idx. This is more efficient than a loop (and easier).
import numpy as np
idx = np.array([[0, 3, 4],
                [1, 3, 5],
                [0, 4, 5]])  # Array of indices

zeros = np.zeros(6).astype(int)  # Array of zeros [0, 0, 0, 0, 0, 0]
repeat = np.tile(zeros, (idx.shape[0], 1))
np.put_along_axis(repeat, idx, 1, axis=1)
repeat will then be:
array([[1, 0, 0, 1, 1, 0],
[0, 1, 0, 1, 0, 1],
[1, 0, 0, 0, 1, 1]])
FWIW, you can also make the array of zeros directly by passing in the shape:
np.zeros([idx.shape[0], 6])
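Putting those pieces together, a compact version of the whole thing might look like this (a sketch, keeping the int dtype so the output matches the desired one):

import numpy as np

idx = np.array([[0, 3, 4],
                [1, 3, 5],
                [0, 4, 5]])

output = np.zeros((idx.shape[0], 6), dtype=int)  # build the zero matrix directly
np.put_along_axis(output, idx, 1, axis=1)        # scatter 1s at the given column indices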
How do I set a limited number of random values, within a given range, in a numpy matrix?
Meaning, instead of:
random_matrix = np.random.rand(5, 5)
[[0.38555213 0.96454126 0.91586422 0.92638243 0.85516641]
[0.64717218 0.2716665 0.70945594 0.74754943 0.48870502]
[0.23381316 0.01992578 0.86749684 0.85797792 0.19308509]
[0.63565231 0.7056163 0.69110815 0.73506642 0.804646 ]
[0.35512519 0.54900446 0.66311323 0.04899527 0.49349834]]
the desired result is, for example, 3 random integers in the range 1-5 placed in a zero matrix:
0,0,0,4,0
0,0,0,0,0
0,1,0,0,0
0,0,0,3,0
0,0,0,0,0
Thanks in advance
If I understand the question correctly, you want to create a matrix that is zero everywhere except for 3 random positions, which get a random value in the range 1-5.
For this I would suggest:
null_matrix = np.zeros((5,5), dtype=np.int32)
rng = np.random.default_rng()
x = rng.choice(5, size=3, replace=False)
y = rng.choice(5, size=3, replace=False)
null_matrix[x, y] = rng.choice(np.arange(1, 6), 3)  # values 1..5 inclusive
print(null_matrix)
Output:
array([[0, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[4, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 2]], dtype=int32)
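Note that choosing x and y independently with replace=False also forces the three cells onto distinct rows and distinct columns. If any three distinct cells are acceptable, a flat-index variant (my own sketch) would be:

import numpy as np

rng = np.random.default_rng()
null_matrix = np.zeros((5, 5), dtype=np.int32)

flat_idx = rng.choice(null_matrix.size, size=3, replace=False)  # 3 distinct cells out of 25
null_matrix.flat[flat_idx] = rng.integers(1, 6, size=3)         # values 1..5 inclusive
print(null_matrix)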
I want to set items in array B to 0 according to the rows of array A (checking each row's second value), using the following criteria (toy code):
import numpy as np
A = np.array([[1,3], [2,5], [6,2]] )
B = np.array([[1,1,0,0,0],[1,0,0,2,0],[0,0,2,2,2],[0,0,0,2,0],[6,6,0,0,0]])
for i in A:
    if i[1] <= 2:
        B[B == i[0]] = 0
# result
>>> B
array([[1, 1, 0, 0, 0],
[1, 0, 0, 2, 0],
[0, 0, 2, 2, 2],
[0, 0, 0, 2, 0],
[0, 0, 0, 0, 0]])
But in a numpy way, that is, with NO for loops :) Thanks!
You can use a conditional list comprehension to create a list of the first value of each pair whose second value is less than or equal to two (in the example A, it is the last item, which gives a value of 6).
Then use boolean indexing with np.isin to find the elements of B that are contained in the values from the previous condition, and set those values to zero.
target_val = 2
B[np.isin(B, [a[0] for a in A if a[1] <= target_val])] = 0
>>> B
array([[1, 1, 0, 0, 0],
[1, 0, 0, 2, 0],
[0, 0, 2, 2, 2],
[0, 0, 0, 2, 0],
[0, 0, 0, 0, 0]])
Alternatively, you could use np.where, which returns a new array instead of modifying B in place:
np.where(np.isin(B, [a[0] for a in A if a[1] <= target_val]), 0, B)
In one line: B[np.isin(B, A[A[:, 1] <= 2][:, 0])] = 0
Explanation:
c = A[:, 1] <= 2   # vectorize the original `if i[1]<=2:` check across the rows of A
                   # i.e., a mask that is True where the second value of a pair is <= 2
d = A[c, 0]        # index A with the mask and select the old `i[0]` values, here just `6`
e = np.isin(B, d)  # mask B according to where its values are in the above
B[e] = 0           # and zero out those positions, i.e. where the old B value is 6
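For completeness, here is the one-liner run on the toy data from the question:

import numpy as np

A = np.array([[1, 3], [2, 5], [6, 2]])
B = np.array([[1, 1, 0, 0, 0],
              [1, 0, 0, 2, 0],
              [0, 0, 2, 2, 2],
              [0, 0, 0, 2, 0],
              [6, 6, 0, 0, 0]])

B[np.isin(B, A[A[:, 1] <= 2][:, 0])] = 0  # zero out values whose row in A has a second entry <= 2
print(B)  # the 6s in the last row become 0, matching the expected result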
In the code that I am writing, I have three 2D numpy arrays with the same dimensions (m x n). Each 2D array contains info about a specific trait, and each corresponding cell (with a specific row/col value) across all three 2D arrays corresponds to a specific person. The three 2D arrays are trait1, trait2, and trait3. As an example, person (0, 0) will have traits 1 and 2 but not trait 3 if trait1 and trait2 have a value of 1 at location (0, 0) but trait3 does not.
What would be an efficient method of updating a 2D array at a specific location based on the values of other corresponding 2D arrays of the same dimension at the same location? That is, how can I efficiently update a 2D array at a specific location such that the other 2D arrays at this same location fulfill specific conditions?
I am currently trying to update the values of the 2D array trait1 and trait2 according to the current values of trait1 and trait2 (such that the corresponding trait1 value == 1, and the corresponding trait2 value == 0); I am also trying to update the values of trait3 according to the current values of trait1, and trait2 (under the same conditions as the previous). However, I am having trouble doing this without using nested for loops, which greatly slows down my program.
Below is my current approach, which works, but is much too slow for my purposes:
for i in range(0, m):
    for j in range(0, n):
        if trait1[i][j] == 1:
            if trait2[i][j] == 0:
                trait1[i][j] = 0
                trait2[i][j] = 1
                new_color(i, j, 1)  # updates the color of the specific person on a grid
                trait3[i][j] = 0
        elif trait1[i][j] == 0:
            if trait2[i][j] <= 0:
                trait1[i][j] = 1
                trait2[i][j] = 0
                new_color(i, j, 0)
Numpy arrays are really slow if you loop over them in Python. If you can use matrix operations / numpy functions for everything, it will go much faster.
In your case, you could first extract the indices you're interested in, and then update your matrices like this:
import numpy as np
np.random.seed(1)
# Generate some sample data
trait1, trait2, trait3 = ( np.random.randint(0,2, [4,4]) for _ in range(3) )
In [4]: trait1
Out[4]:
array([[1, 1, 0, 0],
[1, 1, 1, 1],
[1, 0, 0, 1],
[0, 1, 1, 0]])
In [5]: trait2
Out[5]:
array([[0, 1, 0, 0],
[0, 1, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0]])
In [6]: trait3
Out[6]:
array([[1, 1, 1, 1],
[1, 0, 0, 0],
[1, 1, 1, 1],
[1, 1, 0, 1]])
And then:
cond1_idx = np.where((trait1 == 1) & (trait2==0))
cond2_idx = np.where((trait1 == 0) & (trait2<=0))
trait1[cond1_idx] = 0
trait2[cond1_idx] = 1
trait3[cond1_idx] = 0
[ new_color(i, j, 1) for i,j in zip(*cond1_idx) ]
trait1[cond2_idx] = 1
trait2[cond2_idx] = 0
[ new_color(i, j, 0) for i,j in zip(*cond2_idx) ]
Result:
In [2]: trait1
Out[2]:
array([[0, 1, 1, 1],
[0, 1, 0, 0],
[1, 1, 1, 0],
[0, 0, 0, 1]])
In [3]: trait2
Out[3]:
array([[1, 1, 0, 0],
[1, 1, 1, 1],
[1, 0, 0, 1],
[1, 1, 1, 0]])
In [4]: trait3
Out[4]:
array([[0, 1, 1, 1],
[0, 0, 0, 0],
[1, 1, 1, 0],
[1, 0, 0, 1]])
I can't really test the new_color calls, though, since I don't have that function.
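A small variant of the same idea (my own sketch, equivalent in behavior): keep the conditions as boolean masks and use np.argwhere to get the (i, j) pairs for the new_color calls:

cond1 = (trait1 == 1) & (trait2 == 0)
cond2 = (trait1 == 0) & (trait2 <= 0)

trait1[cond1] = 0
trait2[cond1] = 1
trait3[cond1] = 0
for i, j in np.argwhere(cond1):
    new_color(i, j, 1)

trait1[cond2] = 1
trait2[cond2] = 0
for i, j in np.argwhere(cond2):
    new_color(i, j, 0)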
Given 3-dimensional boolean data:
np.random.seed(13)
bool_data = np.random.randint(2, size=(2,3,6))
>> bool_data
array([[[0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]],
[[1, 0, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0]]])
I wish to count the number of consecutive 1's bounded by two 0's in each row (along axis=1) and return a single array with the tally. For bool_data, this would give array([1, 1, 2, 4]).
Due to the 3D structure of bool_data and the variable tallies for each row, I had to clumsily convert the tallies into nested lists, flatten them using itertools.chain, then back-convert the list into an array:
# count consecutive 1's bounded by two 0's
def count_consect_ones(input):
    return np.diff(np.where(input == 0)[0]) - 1

# run tallies across all rows in bool_data
consect_ones = []
for i in range(len(bool_data)):
    for j in range(len(bool_data[i])):
        res = count_consect_ones(bool_data[i, j])
        consect_ones.append(list(res[res != 0]))
>> consect_ones
[[], [1, 1], [], [2], [4], []]
# combines nested lists
from itertools import chain
consect_ones_output = np.array(list(chain.from_iterable(consect_ones)))
>> consect_ones_output
array([1, 1, 2, 4])
Is there a more efficient or clever way for doing this?
consect_ones.append(list(res[res!=0]))
If you use .extend instead, the contents of the sequence are appended directly. That saves the step of combining the nested lists afterwards:
consect_ones.extend(res[res!=0])
Furthermore, you could skip the indexing, and iterate over the dimensions directly:
consect_ones = []
for i in bool_data:
    for j in i:
        res = count_consect_ones(j)
        consect_ones.extend(res[res != 0])
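Another small simplification along the same lines (a sketch building on the loop above): reshape the leading dimensions away so a single loop covers every row, and concatenate the per-row results directly into an array:

rows = bool_data.reshape(-1, bool_data.shape[-1])   # collapse the first two axes
runs = [count_consect_ones(row) for row in rows]
consect_ones_output = np.concatenate([r[r != 0] for r in runs])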
We could use a trick: pad the columns with zeros, look for ramp-up and ramp-down indices on a flattened version, and finally filter out the indices corresponding to the border islands, giving us a vectorized solution, like so -
# Input 3D array : a
b = np.pad(a, ((0,0),(0,0),(1,1)), 'constant', constant_values=(0,0))
# Get ramp-up and ramp-down indices/ start-end indices of 1s islands
s0 = np.flatnonzero(b[...,1:]>b[...,:-1])
s1 = np.flatnonzero(b[...,1:]<b[...,:-1])
# Filter only valid ones that are not at borders
n = b.shape[2]
valid_mask = (s0%(n-1)!=0) & (s1%(n-1)!=a.shape[2])
out = (s1-s0)[valid_mask]
Explanation -
The idea of padding zeros at either end of each row as "sentinels" is that when we compare the two shifted slices, we can detect the ramp-up and ramp-down places with b[...,1:]>b[...,:-1] and b[...,1:]<b[...,:-1] respectively. Thus, we get s0 and s1 as the start and end indices of each island of 1s. Now, we don't want the border islands, so we trace their column indices back to the original un-padded input array, hence the s0%(n-1) and s1%(n-1) bits. We need to remove every island of 1s that starts at the left border or ends at the right border, so we check whether s0's column index is 0 and whether s1's column index is a.shape[2]; these give us the valid islands. The island lengths are obtained with s1-s0, so we mask that with valid_mask to get the desired output.
Sample input, output -
In [151]: a
Out[151]:
array([[[0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]],
[[1, 0, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0]]])
In [152]: out
Out[152]: array([1, 1, 2, 4])
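For reuse, the same steps can be wrapped into a function (the name count_bounded_runs is my own); running it on the sample bool_data reproduces the expected tally:

import numpy as np

def count_bounded_runs(a):
    # lengths of 1s islands bounded by 0s on both sides, rows taken along the last axis
    b = np.pad(a, ((0, 0), (0, 0), (1, 1)), 'constant', constant_values=(0, 0))
    s0 = np.flatnonzero(b[..., 1:] > b[..., :-1])   # island starts
    s1 = np.flatnonzero(b[..., 1:] < b[..., :-1])   # island ends
    n = b.shape[2]
    valid_mask = (s0 % (n - 1) != 0) & (s1 % (n - 1) != a.shape[2])
    return (s1 - s0)[valid_mask]

np.random.seed(13)
bool_data = np.random.randint(2, size=(2, 3, 6))
print(count_bounded_runs(bool_data))   # [1 1 2 4]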