I'm working with NumPy and have a problem with indexing. I have a NumPy array of zeros and a 2D array of indices, and I need to use these indices to set the corresponding values of the zeros array to 1. I tried the following, but it's not working:
import numpy as np
idx = np.array([[0, 3, 4],
                [1, 3, 5],
                [0, 4, 5]])  # Array of indices
zeros = np.zeros(6)  # Array of zeros [0, 0, 0, 0, 0, 0]
repeat = np.tile(zeros, (idx.shape[0], 1))  # Repeat the array of zeros once per row of the index array
res = []
for i, j in zip(repeat, idx):
    res.append(i[j] = 1)  # Here I try to set the matching indices to 1
output = np.array(res)
but I get this syntax error:
expression cannot contain assignment, perhaps you meant "=="?
My desired output is:
output = [[1, 0, 0, 1, 1, 0],
[0, 1, 0, 1, 0, 1],
[1, 0, 0, 0, 1, 1]]
This is just an example; the idx array can be bigger. I think the problem is the indexing, and I believe there is a much simpler way of doing this without repeating the array of zeros and using zip, but I can't figure it out. Any help would be appreciated, thank you!
EDIT: When I replace = with ==, I get a boolean array, which I don't need, so I don't understand what's happening there either.
You can use np.put_along_axis to assign values into the array repeat based on indices in idx. This is more efficient than a loop (and easier).
import numpy as np
idx = np.array([[0, 3, 4],
                [1, 3, 5],
                [0, 4, 5]])  # Array of indices
zeros = np.zeros(6, dtype=int)  # Array of zeros [0, 0, 0, 0, 0, 0]
repeat = np.tile(zeros, (idx.shape[0], 1))  # One row of zeros per row of idx
np.put_along_axis(repeat, idx, 1, axis=1)  # Set repeat[r, idx[r, c]] = 1 for every r, c
repeat will then be:
array([[1, 0, 0, 1, 1, 0],
[0, 1, 0, 1, 0, 1],
[1, 0, 0, 0, 1, 1]])
FWIW, you can also create the array of zeros directly by passing in the shape (dtype=int gives integers instead of the default floats):
np.zeros((idx.shape[0], 6), dtype=int)
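A minimal sketch of the same fix, assuming integer output is wanted: the zero matrix is created directly with the right shape and dtype=int, and the ones are written in place. It also shows plain fancy indexing as an alternative, where a column of row numbers broadcasts against idx so that every (row, idx[row, c]) pair is set. The names n_cols, out, out2 and rows are illustrative.

```python
import numpy as np

idx = np.array([[0, 3, 4],
                [1, 3, 5],
                [0, 4, 5]])
n_cols = 6  # width of each output row

# Option 1: np.put_along_axis writes 1 at out[r, idx[r, c]] for every r, c
out = np.zeros((idx.shape[0], n_cols), dtype=int)
np.put_along_axis(out, idx, 1, axis=1)

# Option 2: fancy indexing; rows has shape (3, 1) and broadcasts against idx (3, 3)
out2 = np.zeros((idx.shape[0], n_cols), dtype=int)
rows = np.arange(idx.shape[0])[:, None]
out2[rows, idx] = 1

print(out.tolist())  # [[1, 0, 0, 1, 1, 0], [0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 1, 1]]
```

As for the original error: i[j] = 1 is an assignment statement, not an expression, so it can't be passed to append. With == you get the comparison i[j] == 1 (a boolean array) and nothing is assigned. Writing i[j] = 1 on its own line and then calling res.append(i) would work, because each i yielded by zip is a view of one row of repeat.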
I am a newbie in PyTorch. Even though I have read the documentation, it is unclear to me how torch.argmax() works when applied to dimension 1 of a 4-dimensional input. Also, how does keepdims=True change the output?
Here is an example of each case:
k = torch.rand(2, 3, 4, 4)
print(k)
tensor([[[[0.2912, 0.4818, 0.1123, 0.3196],
[0.6606, 0.1547, 0.0368, 0.9475],
[0.4753, 0.7428, 0.5931, 0.3615],
[0.6729, 0.7069, 0.1569, 0.3086]],
[[0.6603, 0.7777, 0.3546, 0.2850],
[0.3681, 0.5295, 0.8812, 0.6093],
[0.9165, 0.2842, 0.0260, 0.1768],
[0.9371, 0.9889, 0.6936, 0.7018]],
[[0.5880, 0.0349, 0.0419, 0.3913],
[0.5884, 0.9408, 0.1707, 0.1893],
[0.3260, 0.4410, 0.6369, 0.7331],
[0.9448, 0.7130, 0.3914, 0.2775]]],
[[[0.9433, 0.8610, 0.9936, 0.1314],
[0.8627, 0.3103, 0.3066, 0.3547],
[0.3396, 0.1892, 0.0385, 0.5542],
[0.4943, 0.0256, 0.7875, 0.5562]],
[[0.2338, 0.2498, 0.4749, 0.2520],
[0.4405, 0.1605, 0.6219, 0.8955],
[0.2326, 0.1816, 0.5032, 0.8732],
[0.2089, 0.6131, 0.1898, 0.0517]],
[[0.1472, 0.8059, 0.6958, 0.9047],
[0.6403, 0.2875, 0.5746, 0.5908],
[0.8668, 0.4602, 0.8224, 0.9307],
[0.2077, 0.5665, 0.8671, 0.4365]]]])
argmax = torch.argmax(k, axis=1)
print(argmax)
tensor([[[1, 1, 1, 2],
[0, 2, 1, 0],
[1, 0, 2, 2],
[2, 1, 1, 1]],
[[0, 0, 0, 2],
[0, 0, 1, 1],
[2, 2, 2, 2],
[0, 1, 2, 0]]])
argmax = torch.argmax(k, axis=1, keepdims=True)
print(argmax)
tensor([[[[1, 1, 1, 2],
[0, 2, 1, 0],
[1, 0, 2, 2],
[2, 1, 1, 1]]],
[[[0, 0, 0, 2],
[0, 0, 1, 1],
[2, 2, 2, 2],
[0, 1, 2, 0]]]])
If k is a tensor of shape (2, 3, 4, 4), then by definition torch.argmax with axis=1 gives an output of shape (2, 4, 4): the dimension being reduced is removed. To understand why this happens, it helps to understand what happens in lower dimensions first.
If I have a 2D (2, 2) tensor A, like:
[[1,2],
[3,4]]
Then torch.argmax(A, axis=1) gives an output of shape (2,) with values [1, 1]. The axis argument is the axis along which to operate, so axis=1 means that, for each row, argmax compares the values across the columns before deciding on a max. For row 0, it looks at the values 1, 2 and decides that 2 (at index 1) is the max. For row 1, it looks at the values 3, 4 and decides that 4 (at index 1) is the max. So the argmax result is [1, 1].
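A small runnable check of the 2D case with the same A. It uses dim and keepdim, the argument names from the PyTorch documentation; B is an extra, illustrative non-square tensor that makes it easy to see which dimension disappears.

```python
import torch

A = torch.tensor([[1, 2],
                  [3, 4]])
row_max = torch.argmax(A, dim=1)                     # one index per row, shape (2,)
row_max_keep = torch.argmax(A, dim=1, keepdim=True)  # same values, shape (2, 1)
print(row_max.tolist())       # [1, 1]
print(row_max_keep.tolist())  # [[1], [1]]

# Non-square example: reducing dim 1 leaves one index per row,
# reducing dim 0 leaves one index per column
B = torch.tensor([[1, 9, 2],
                  [8, 3, 4]])
print(torch.argmax(B, dim=1).tolist())  # [1, 0]
print(torch.argmax(B, dim=0).tolist())  # [1, 0, 1]
```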
Moving up to 3D, let's have a hypothetical array of dimensions (I, J, K). If we call argmax with axis = 1, we can break it down to the following:
import torch

I, J, K = 3, 4, 5
A = torch.rand(I, J, K)
out = torch.zeros((I, K), dtype=torch.int32)
# For every (i, k) pair, take the argmax over the J values along axis 1
for i in range(I):
    for k in range(K):
        out[i, k] = torch.argmax(A[i, :, k])
print(out)
print(torch.argmax(A, axis=1))
Out:
tensor([[3, 3, 2, 3, 2],
[1, 1, 0, 1, 0],
[0, 1, 0, 3, 3]], dtype=torch.int32)
tensor([[3, 3, 2, 3, 2],
[1, 1, 0, 1, 0],
[0, 1, 0, 3, 3]])
So what happens is that, in your 3D tensor, you're once again computing the argmax along axis 1. For each unique pair (i, k) there are exactly J values along axis 1, and the index of the maximum value among those J values is written to position (i, k) of the output.
If you understand this, you can understand what happens in 4D. For any 4D tensor of shape (I, J, K, L), if you call argmax with axis=1, then for each combination (i, k, l) there are exactly J values along axis 1, and the argmax of those J values is stored at output[i, k, l].
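The 4D case can be checked the same way as the 3D loop. A sketch, assuming random input (so ties, where argmax returns the first maximal index, are practically impossible): the triple loop fills output[i, k, l] with the argmax over the J values A[i, :, k, l], and the result is compared with a single torch.argmax call.

```python
import torch

torch.manual_seed(0)
I, J, K, L = 2, 3, 4, 4
A = torch.rand(I, J, K, L)

# output[i, k, l] = index of the largest of the J values A[i, :, k, l]
out = torch.empty((I, K, L), dtype=torch.long)
for i in range(I):
    for k in range(K):
        for l in range(L):
            out[i, k, l] = torch.argmax(A[i, :, k, l])

vectorized = torch.argmax(A, dim=1)
print(vectorized.shape)              # torch.Size([2, 4, 4])
print(torch.equal(out, vectorized))  # True
```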
The keepdims argument merely preserves the number of dimensions of your tensor. For example, argmax along axis 1 of the 4D tensor gives a 3D result of shape (I, K, L), but with keepdims=True the result is 4D as well, with shape (I, 1, K, L).
Argmax gives the index of the highest value along a given dimension, so the number of dimensions is not an issue. When you apply argmax along a dimension, PyTorch by default removes that dimension, since its values are replaced by a single index. If you don't want to remove that dimension and would rather keep it with size 1, use keepdims=True (keepdim=True in the PyTorch documentation).
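To see that keepdim only changes the shape, here is a sketch with a random tensor of the same shape as the question's k (the values differ, since no seed was given there). The kept size-1 dimension is handy in practice: torch.gather expects an index tensor with the same number of dimensions as its input, so the keepdim result can be passed to it directly.

```python
import torch

torch.manual_seed(0)
k = torch.rand(2, 3, 4, 4)

idx = torch.argmax(k, dim=1)                     # shape (2, 4, 4): dim 1 is removed
idx_keep = torch.argmax(k, dim=1, keepdim=True)  # shape (2, 1, 4, 4): dim 1 kept with size 1
print(idx.shape, idx_keep.shape)
print(torch.equal(idx_keep.squeeze(1), idx))     # True: same indices, different shape

# The indices line up with k's dimensions, so gather picks out the maximal values
max_vals = torch.gather(k, 1, idx_keep)          # shape (2, 1, 4, 4)
print(torch.equal(max_vals, k.max(dim=1, keepdim=True).values))  # True
```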
In the code that I am writing, I have three 2D NumPy arrays with the same dimensions (m x n). Each array holds information about one specific trait, and corresponding cells (same row/column) across all three arrays refer to the same person. The three arrays are trait1, trait2, and trait3. For example, person (0, 0) has traits 1 and 2, but not trait 3, if only trait1 and trait2 have a value of 1 at location (0, 0).
What would be an efficient method of updating a 2D array at a specific location based on the values of other corresponding 2D arrays of the same dimension at the same location? That is, how can I efficiently update a 2D array at a specific location such that the other 2D arrays at this same location fulfill specific conditions?
I am currently trying to update the values of trait1 and trait2 based on their current values (where the corresponding trait1 value == 1 and the corresponding trait2 value == 0); I am also trying to update trait3 based on the current values of trait1 and trait2, under the same conditions. However, I can't do this without nested for loops, which greatly slow down my program.
Below is my current approach, which works, but is much too slow for my purposes:
for i in range(m):
    for j in range(n):
        if trait1[i][j] == 1:
            if trait2[i][j] == 0:
                trait1[i][j] = 0
                trait2[i][j] = 1
                new_color(i, j, 1)  # updates the color of the specific person on a grid
                trait3[i][j] = 0
        elif trait1[i][j] == 0:
            if trait2[i][j] <= 0:
                trait1[i][j] = 1
                trait2[i][j] = 0
                new_color(i, j, 0)
NumPy arrays are indeed slow if you loop over them element by element in Python. If you can use array operations / NumPy functions for everything, it will be much faster.
In your case, you could first extract the indices you're interested in, and then update your arrays like this:
import numpy as np
np.random.seed(1)
# Generate some sample data
trait1, trait2, trait3 = ( np.random.randint(0,2, [4,4]) for _ in range(3) )
In [4]: trait1
Out[4]:
array([[1, 1, 0, 0],
[1, 1, 1, 1],
[1, 0, 0, 1],
[0, 1, 1, 0]])
In [5]: trait2
Out[5]:
array([[0, 1, 0, 0],
[0, 1, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0]])
In [6]: trait3
Out[6]:
array([[1, 1, 1, 1],
[1, 0, 0, 0],
[1, 1, 1, 1],
[1, 1, 0, 1]])
And then:
# Compute both sets of indices before modifying anything
cond1_idx = np.where((trait1 == 1) & (trait2 == 0))
cond2_idx = np.where((trait1 == 0) & (trait2 <= 0))
trait1[cond1_idx] = 0
trait2[cond1_idx] = 1
trait3[cond1_idx] = 0
for i, j in zip(*cond1_idx):
    new_color(i, j, 1)
trait1[cond2_idx] = 1
trait2[cond2_idx] = 0
for i, j in zip(*cond2_idx):
    new_color(i, j, 0)
Result:
In [2]: trait1
Out[2]:
array([[0, 1, 1, 1],
[0, 1, 0, 0],
[1, 1, 1, 0],
[0, 0, 0, 1]])
In [3]: trait2
Out[3]:
array([[1, 1, 0, 0],
[1, 1, 1, 1],
[1, 0, 0, 1],
[1, 1, 1, 0]])
In [4]: trait3
Out[4]:
array([[0, 1, 1, 1],
[0, 0, 0, 0],
[1, 1, 1, 0],
[1, 0, 0, 1]])
I can't test the new_color calls, though, since I don't have that function.
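A sketch of the same approach using the boolean masks directly (indexing with a boolean array works without np.where), checked against the question's nested loop on random data. loop_update and mask_update are illustrative names; both work on copies, and the new_color calls are omitted because that function isn't available here.

```python
import numpy as np

def loop_update(t1, t2, t3):
    """The question's nested loop (new_color calls omitted), applied to copies."""
    t1, t2, t3 = t1.copy(), t2.copy(), t3.copy()
    m, n = t1.shape
    for i in range(m):
        for j in range(n):
            if t1[i, j] == 1:
                if t2[i, j] == 0:
                    t1[i, j] = 0
                    t2[i, j] = 1
                    t3[i, j] = 0
            elif t1[i, j] == 0:
                if t2[i, j] <= 0:
                    t1[i, j] = 1
                    t2[i, j] = 0
    return t1, t2, t3

def mask_update(t1, t2, t3):
    """Vectorized version using boolean masks, applied to copies."""
    t1, t2, t3 = t1.copy(), t2.copy(), t3.copy()
    # Evaluate both conditions before any assignment
    cond1 = (t1 == 1) & (t2 == 0)
    cond2 = (t1 == 0) & (t2 <= 0)
    t1[cond1] = 0
    t2[cond1] = 1
    t3[cond1] = 0
    t1[cond2] = 1
    t2[cond2] = 0
    return t1, t2, t3

rng = np.random.default_rng(0)
traits = [rng.integers(0, 2, size=(50, 40)) for _ in range(3)]
same = all(np.array_equal(x, y)
           for x, y in zip(loop_update(*traits), mask_update(*traits)))
print(same)  # True
```

Evaluating both masks up front keeps the result independent of the order of the assignments: a cell switched by cond2 ends up with t1 == 1 and t2 == 0, so a cond1 mask computed after the cond2 assignments would flip it straight back. If new_color is needed, np.nonzero(cond1) gives the row and column coordinates to loop over.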
Given 3-dimensional boolean data:
np.random.seed(13)
bool_data = np.random.randint(2, size=(2,3,6))
>> bool_data
array([[[0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]],
[[1, 0, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0]]])
I wish to count the number of consecutive 1s bounded by two 0s in each row (i.e. in each bool_data[i, j], scanning along the last axis) and return a single array with the tallies. For bool_data, this would give array([1, 1, 2, 4]).
Because of the 3D structure of bool_data and the varying number of tallies per row, I had to clumsily convert the tallies into nested lists, flatten them using itertools.chain, and then convert the list back into an array:
# count consecutive 1's bounded by two 0's
def count_consect_ones(input):
    return np.diff(np.where(input==0)[0])-1
# run tallies across all rows in bool_data
consect_ones = []
for i in range(len(bool_data)):
    for j in range(len(bool_data[i])):
        res = count_consect_ones(bool_data[i, j])
        consect_ones.append(list(res[res!=0]))
>> consect_ones
[[], [1, 1], [], [2], [4], []]
# combines nested lists
from itertools import chain
consect_ones_output = np.array(list(chain.from_iterable(consect_ones)))
>> consect_ones_output
array([1, 1, 2, 4])
Is there a more efficient or clever way of doing this?
consect_ones.append(list(res[res!=0]))
If you use .extend instead, the elements of the sequence are appended directly. That saves the step of combining the nested lists afterwards:
consect_ones.extend(res[res!=0])
Furthermore, you could skip the indexing, and iterate over the dimensions directly:
consect_ones = []
for i in bool_data:
    for j in i:
        res = count_consect_ones(j)
        consect_ones.extend(res[res!=0])
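Putting the question's helper together with .extend gives a complete loop-based version. A sketch that uses the bool_data values printed in the question rather than regenerating them from the seed; the helper's parameter is renamed to row here (an illustrative choice) to avoid shadowing the built-in input.

```python
import numpy as np

def count_consect_ones(row):
    # Gaps between consecutive zeros, minus 1, are the lengths of the runs of 1s between them
    return np.diff(np.where(row == 0)[0]) - 1

bool_data = np.array([[[0, 0, 0, 0, 0, 0],
                       [0, 1, 0, 0, 1, 0],
                       [0, 0, 0, 0, 0, 1]],
                      [[1, 0, 1, 1, 0, 0],
                       [0, 1, 1, 1, 1, 0],
                       [1, 1, 1, 0, 0, 0]]])

consect_ones = []
for plane in bool_data:
    for row in plane:
        res = count_consect_ones(row)
        consect_ones.extend(res[res != 0])  # drop zero-length gaps (adjacent zeros)

consect_ones_output = np.array(consect_ones)
print(consect_ones_output)  # [1 1 2 4]
```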
We could use a trick: pad each row with a zero at both ends, look for ramp-up and ramp-down indices on a flattened version, and finally filter out the indices corresponding to islands at the borders. This gives us a vectorized solution, like so -
import numpy as np

# Input 3D array : a
# Pad each row with one zero at both ends (last axis only)
b = np.pad(a, ((0,0),(0,0),(1,1)), 'constant', constant_values=(0,0))
# Get ramp-up and ramp-down indices/ start-end indices of 1s islands
s0 = np.flatnonzero(b[...,1:]>b[...,:-1])
s1 = np.flatnonzero(b[...,1:]<b[...,:-1])
# Filter only valid ones that are not at borders
n = b.shape[2]
valid_mask = (s0%(n-1)!=0) & (s1%(n-1)!=a.shape[2])
out = (s1-s0)[valid_mask]
Explanation -
The idea behind padding zeros at both ends of each row as "sentinels" is that when we compare one-off sliced versions of the array, we can detect the ramp-up and ramp-down places with b[...,1:]>b[...,:-1] and b[...,1:]<b[...,:-1] respectively. Thus, we get s0 and s1 as the start and end indices of each island of 1s. Now, we don't want the islands at the borders, so we need to trace their column indices back to the original unpadded input array, hence s0%(n-1) and s1%(n-1). We need to remove all islands of 1s that start at the left border or end at the right border. The starts and ends are s0 and s1, so we check whether s0%(n-1) is 0 and whether s1%(n-1) is a.shape[2]; what remains are the valid ones. The island lengths are obtained with s1-s0, so we mask them with valid_mask to get the desired output.
Sample input, output -
In [151]: a
Out[151]:
array([[[0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]],
[[1, 0, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0]]])
In [152]: out
Out[152]: array([1, 1, 2, 4])
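One way to gain confidence in the index arithmetic is to compare the vectorized version against the loop from the question on random data. A sketch; islands_vectorized and islands_loop are illustrative names, and islands_vectorized is simply the code above wrapped in a function.

```python
import numpy as np

def islands_vectorized(a):
    """Lengths of runs of 1s bounded by 0s on both sides, row by row along the last axis."""
    b = np.pad(a, ((0, 0), (0, 0), (1, 1)), 'constant', constant_values=(0, 0))
    s0 = np.flatnonzero(b[..., 1:] > b[..., :-1])  # run starts
    s1 = np.flatnonzero(b[..., 1:] < b[..., :-1])  # run ends (exclusive)
    n = b.shape[2]
    valid_mask = (s0 % (n - 1) != 0) & (s1 % (n - 1) != a.shape[2])
    return (s1 - s0)[valid_mask]

def islands_loop(a):
    """Loop-based reference, as in the question."""
    out = []
    for plane in a:
        for row in plane:
            res = np.diff(np.where(row == 0)[0]) - 1
            out.extend(res[res != 0])
    return np.array(out, dtype=int)

rng = np.random.default_rng(42)
samples = [rng.integers(0, 2, size=(3, 4, 10)) for _ in range(100)]
print(all(np.array_equal(islands_vectorized(a), islands_loop(a)) for a in samples))  # True
```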
Say I have a 1D NumPy array of numbers, myArray = np.array([1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 1, 2, 1, 1, 1]).
I want to create a 2D NumPy array that describes the first (column 1) and last (column 2) indices of every "streak" of consecutive 1s that is longer than 2.
So for the example above, the 2D array should look like this:
indicesArray = np.array([[5, 8],
                         [13, 15]])
This is because there are at least 3 consecutive ones at (zero-based) indices 5, 6, 7, 8 and at indices 13, 14, 15.
Any help would be appreciated.
Approach #1
Here's one approach inspired by this post -
def start_stop(a, trigger_val, len_thresh=2):
    # "Enclose" mask with sentinels to catch shifts later on
    mask = np.r_[False,np.equal(a, trigger_val),False]
    # Get the shifting indices
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    # Get lengths
    lens = idx[1::2] - idx[::2]
    # Keep runs longer than len_thresh; subtracting [0,1] makes the end index inclusive
    return idx.reshape(-1,2)[lens>len_thresh]-[0,1]
Sample run -
In [47]: myArray
Out[47]: array([1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 1, 2, 1, 1, 1])
In [48]: start_stop(myArray, trigger_val=1, len_thresh=2)
Out[48]:
array([[ 5, 8],
[13, 15]])
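As a sanity check of start_stop, here is a plain-Python reference built on itertools.groupby (a swapped-in technique, not part of the linked post), compared against start_stop on random arrays. start_stop_groupby is an illustrative name. The last line also shows how len_thresh changes the result on myArray.

```python
import numpy as np
from itertools import groupby

def start_stop(a, trigger_val, len_thresh=2):
    # Pad the boolean mask with False sentinels so every run has a start and an end edge
    mask = np.r_[False, np.equal(a, trigger_val), False]
    idx = np.flatnonzero(mask[1:] != mask[:-1])  # alternating run starts / exclusive ends
    lens = idx[1::2] - idx[::2]
    return idx.reshape(-1, 2)[lens > len_thresh] - [0, 1]  # make end indices inclusive

def start_stop_groupby(a, trigger_val, len_thresh=2):
    """Reference implementation: walk the runs of equal values with itertools.groupby."""
    out, pos = [], 0
    for val, grp in groupby(a):
        length = sum(1 for _ in grp)
        if val == trigger_val and length > len_thresh:
            out.append([pos, pos + length - 1])
        pos += length
    return np.array(out, dtype=int).reshape(-1, 2)

rng = np.random.default_rng(1)
agree = all(
    np.array_equal(start_stop(a, t), start_stop_groupby(a, t))
    for a in (rng.integers(0, 3, size=30) for _ in range(200))
    for t in (1, 2)
)
print(agree)  # True

myArray = np.array([1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 1, 2, 1, 1, 1])
print(start_stop(myArray, trigger_val=1, len_thresh=1).tolist())  # [[0, 1], [5, 8], [13, 15]]
```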
Approach #2
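Another approach, with binary_erosion -
import numpy as np
from scipy.ndimage import binary_erosion  # the scipy.ndimage.morphology namespace is deprecated
# Erosion with a length-3 structure keeps only the interior of runs of at least three 1s
mask = binary_erosion(myArray==1,structure=np.ones((3)))
idx = np.flatnonzero(mask[1:] != mask[:-1])
# Each eroded run is one element shorter at both ends; adding [0,1] recovers the original first and last indices
out = idx.reshape(-1,2)+[0,1]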