Having a matrix with d features and n samples, I would like to compare each feature of a sample (row) against the mean of the column corresponding to that feature, and then assign a label of 1 or 0.
E.g. for a matrix X = [x11, x12; x21, x22], I compute the means of the two columns (mu1, mu2), then compare x11 and x21 with mu1 (and so on) to check whether each is greater or smaller than the mean, and assign a label according to the if statement below.
I have the mean vector for the columns, i.e. of length d.
I am currently using for-loops, but these are not computationally efficient.
import numpy as np

X_copy = X_train.copy()  # copy, so that X_train itself is not overwritten
mu = np.mean(X_train, axis=0)
for i in range(X_train.shape[0]):
    for j in range(X_train.shape[1]):
        if X_train[i, j] < mu[j]:  # less than the mean of the column, assign 0
            X_copy[i, j] = 0
        else:
            X_copy[i, j] = 1  # greater than or equal to the column mean, assign 1
Is there any better alternative?
I don't have much experience with Python, so thank you for understanding.
Compare directly: the mean vector is broadcast against each row of the original array. Then convert the boolean result to int:
>>> X_train = np.random.rand(3, 4)
>>> X_train
array([[0.4789953 , 0.84095907, 0.53538172, 0.04880835],
[0.64554335, 0.50904539, 0.34069036, 0.5290601 ],
[0.84664389, 0.63984867, 0.66111495, 0.89803495]])
>>> (X_train >= X_train.mean(0)).astype(int)
array([[0, 1, 1, 0],
[0, 0, 0, 1],
[1, 0, 1, 1]])
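To check that the one-liner gives the same labels as the loop in the question, here is a small sketch wrapping both in functions. The names `binarize_loop` and `binarize_vectorized` are hypothetical, chosen only for illustration, and the random test matrix is arbitrary.

```python
import numpy as np

def binarize_loop(X):
    """Element-by-element version of the question's loop (slow for large X)."""
    mu = X.mean(axis=0)
    out = np.empty(X.shape, dtype=int)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # 1 if the value is at least the column mean, else 0
            out[i, j] = 1 if X[i, j] >= mu[j] else 0
    return out

def binarize_vectorized(X):
    """Broadcast the column means (shape (d,)) against every row of X (shape (n, d))."""
    return (X >= X.mean(axis=0)).astype(int)

rng = np.random.default_rng(0)
X = rng.random((5, 3))
print(np.array_equal(binarize_loop(X), binarize_vectorized(X)))  # True
```

The vectorized version also writes into a fresh integer array, so the input is never modified.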
Update:
NumPy uses broadcasting for operations between arrays. For example, when an array is compared with a scalar, the scalar is compared against every element of the array, one by one:
>>> X_train > 0.5
array([[False, True, True, False],
[ True, True, False, True],
[ True, True, True, True]])
>>> X_train > np.full(X_train.shape, 0.5) # Equivalent effect.
array([[False, True, True, False],
[ True, True, False, True],
[ True, True, True, True]])
Similarly, you can compare a vector with a 2D array, as long as the length of the vector matches the last dimension (the number of columns) of the array; the vector is then compared against every row:
>>> mu = X_train.mean(0)
>>> X_train > mu
array([[False, True, True, False],
[False, False, False, True],
[ True, False, True, True]])
>>> X_train > np.tile(mu, (X_train.shape[0], 1)) # Equivalent effect.
array([[False, True, True, False],
[False, False, False, True],
[ True, False, True, True]])
How do you compare along other axes? My English is not good, so this is difficult for me to explain; the official NumPy explanation should help you get started: Broadcasting
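As a sketch of comparing along the other axis (each element against the mean of its row instead of its column), keep the reduced axis with `keepdims=True`, so the means have shape (n, 1) and broadcast across the columns. The small array is made up for illustration.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])

# Column means: shape (3,), broadcast down the rows -> each element vs. its column mean
col_labels = (X >= X.mean(axis=0)).astype(int)

# Row means: keepdims=True gives shape (2, 1), broadcast across the columns
row_mu = X.mean(axis=1, keepdims=True)
row_labels = (X >= row_mu).astype(int)

# Without keepdims, X.mean(axis=1) has shape (2,), which does not broadcast
# against (2, 3); X.mean(axis=1)[:, None] is an equivalent way to add the axis.
print(col_labels.tolist())  # [[0, 0, 0], [1, 1, 1]]
print(row_labels.tolist())  # [[0, 1, 1], [0, 1, 1]]
```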
Take the following example. I have an array test and want to get a boolean mask that is True for all elements that are equal to any element of ref.
import numpy as np
test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5 ,4]])
ref = np.array([3, 4, 5])
I am looking for something equivalent to
mask = (test == ref[0]) | (test == ref[1]) | (test == ref[2])
which in this case should yield
>>> print(mask)
[[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]]
but without having to resort to any loops.
NumPy comes with a function isin that does exactly this:
np.isin(test, ref)
which returns
array([[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]])
You can use numpy broadcasting:
mask = (test[:,None] == ref[:,None]).any(1)
output:
array([[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]])
NB. this can be faster than numpy.isin, but it creates an (X, N, Y) intermediate array, where (X, Y) is the shape of test and N is the length of ref, so it will consume a lot of memory on very large arrays.
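A quick sketch checking that the two answers agree and showing the size of the broadcast intermediate. The second broadcasting form, `test[..., None] == ref`, is an equivalent variation (not from the answers above) that puts the comparison axis last, giving an (X, Y, N) intermediate instead.

```python
import numpy as np

test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5, 4]])
ref = np.array([3, 4, 5])

via_isin = np.isin(test, ref)

# (3, 1, 4) == (3, 1) broadcasts to (3, 3, 4): entry [i, j, k] is test[i, k] == ref[j]
pairwise = test[:, None] == ref[:, None]
via_broadcast = pairwise.any(axis=1)

# Same idea with the comparison axis last: (3, 4, 1) == (3,) -> (3, 4, 3)
via_last_axis = (test[..., None] == ref).any(axis=-1)

print(pairwise.shape)  # (3, 3, 4)
print(np.array_equal(via_isin, via_broadcast) and np.array_equal(via_isin, via_last_axis))  # True
```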
I have two tensors like this:
1st tensor
[[0,0],[0,1],[0,2],[1,3],[1,4],[2,1],[2,4]]
2nd tensor
[[0,1],[0,2],[1,4],[2,4]]
I want the result tensor to be like this:
[[0,0],[1,3],[2,1]] # differences between 1st tensor and 2nd tensor
I have tried set, list, torch.where, etc. and couldn't find a good way to achieve this. Is there an efficient way to get the rows that differ between two tensors of different sizes?
You can perform a pairwise comparison to see which rows of the first tensor are present in the second tensor.
import torch

a = torch.as_tensor([[0,0],[0,1],[0,2],[1,3],[1,4],[2,1],[2,4]])
b = torch.as_tensor([[0,1],[0,2],[1,4],[2,4]])
# Expand a to (7, 1, 2) to broadcast to all b
a_exp = a.unsqueeze(1)
# c: (7, 4, 2)
c = a_exp == b
# Since we want all components of a row to be equal, we reduce over the last dim
# c: (7, 4)
c = c.all(-1)
print(c)
# Out: row i compares the ith row of a against all rows of b
# Therefore, if a whole row is False, that row of a is not present in b
tensor([[False, False, False, False],
[ True, False, False, False],
[False, True, False, False],
[False, False, False, False],
[False, False, True, False],
[False, False, False, False],
[False, False, False, True]])
non_repeat_mask = ~c.any(-1)
# Apply the mask to a
print(a[non_repeat_mask])
tensor([[0, 0],
[1, 3],
[2, 1]])
If you feel like it, you can do it as a one-liner :)
a[~a.unsqueeze(1).eq(b).all(-1).any(-1)]
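The same broadcasting idea carries over unchanged to NumPy arrays. The sketch below is a NumPy translation (not part of the original answer), using the values from the question as plain arrays. Like the PyTorch version, it builds a (len(a), len(b), ncols) intermediate.

```python
import numpy as np

a = np.array([[0, 0], [0, 1], [0, 2], [1, 3], [1, 4], [2, 1], [2, 4]])
b = np.array([[0, 1], [0, 2], [1, 4], [2, 4]])

# (7, 1, 2) == (4, 2) -> (7, 4, 2); all(-1) gives row-vs-row equality, shape (7, 4)
matches = (a[:, None, :] == b).all(axis=-1)

# Keep the rows of a that match no row of b
diff = a[~matches.any(axis=-1)]
print(diff.tolist())  # [[0, 0], [1, 3], [2, 1]]
```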
In case someone is looking for a solution for 1-D tensors (dim=1), this is an adaptation of Guillem's solution:
a = torch.tensor(list(range(0, 10)))
b = torch.tensor(list(range(5,15)))
a[~a.unsqueeze(1).eq(b).any(1)]
outputs:
tensor([0, 1, 2, 3, 4])
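Here is another solution for when you want the symmetric difference (values that appear in only one of the two tensors), rather than only the values of the first that are missing from the second. Be careful when using it: order doesn't matter here, and it assumes neither tensor contains repeated values on its own, since any value that appears twice in the concatenation is dropped.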
combined = torch.cat((a, b))
uniques, counts = combined.unique(return_counts=True)
difference = uniques[counts == 1]
outputs
tensor([ 0, 1, 2, 3, 4, 10, 11, 12, 13, 14])
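For the 1-D case, NumPy also has a ready-made function, `np.setxor1d`, which returns the sorted unique values found in exactly one of the two inputs. It removes duplicates within each input first, so a value repeated only in one input is still reported. This sketch uses NumPy arrays rather than tensors.

```python
import numpy as np

a = np.arange(0, 10)
b = np.arange(5, 15)
print(np.setxor1d(a, b).tolist())  # [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]

# A value duplicated inside one input: the count-based approach sees 7 twice
# and drops it, while setxor1d still reports it.
a_dup = np.array([1, 7, 7])
b_small = np.array([1, 2])
print(np.setxor1d(a_dup, b_small).tolist())  # [2, 7]
```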
I essentially want to crop an image with NumPy. I have a 3-dimensional numpy.ndarray object, i.e.:
[ [ [0,0,0,0], [255,255,255,255], ....],
  [ [0,0,0,0], [255,255,255,255], ....] ]
where I want to remove whitespace, which, in context, is known to be either entire rows or entire columns of [0,0,0,0].
Letting each pixel just be a number for this example, I'm trying to do this:
Given this (EDIT: chose a slightly more complex example to clarify):
[[0,0,0,0,0,0],
 [0,0,1,1,1,0],
 [0,1,1,0,1,0],
 [0,0,0,1,1,0],
 [0,0,0,0,0,0]]
I'm trying to create this:
[ [0,1,1,1],
[1,1,0,1],
[0,0,1,1] ]
I can brute force this with loops, but intuitively I feel like numpy has a better means of doing this.
In general, you'd want to look into scipy.ndimage.label and scipy.ndimage.find_objects to extract the bounding box of contiguous regions fulfilling a condition.
However, in this case, you can do it fairly easily with "plain" numpy.
I'm going to assume you have an nrows x ncols x nbands array here. The other convention, nbands x nrows x ncols, is also quite common, so check the shape of your array.
With that in mind, you might do something similar to:
import numpy as np

mask = im == 0
all_zero = mask.all(axis=2)  # True where every band of the pixel is 0
rows = np.flatnonzero((~all_zero).sum(axis=1))
cols = np.flatnonzero((~all_zero).sum(axis=0))
crop = im[rows.min():rows.max()+1, cols.min():cols.max()+1, :]
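A self-contained sketch of the 3-D case, assuming an nrows x ncols x nbands array in which background pixels have every band equal to 0. The function name `crop_zero_border` and the small 4-band test image are hypothetical, made up for illustration.

```python
import numpy as np

def crop_zero_border(im):
    """Crop rows/columns whose pixels are zero in every band.

    Assumes im has shape (nrows, ncols, nbands) and at least one non-zero pixel.
    """
    background = (im == 0).all(axis=2)          # (nrows, ncols): True where a pixel is [0, 0, ..., 0]
    content = ~background
    rows = np.flatnonzero(content.any(axis=1))  # rows containing at least one non-background pixel
    cols = np.flatnonzero(content.any(axis=0))  # columns containing at least one non-background pixel
    return im[rows.min():rows.max() + 1, cols.min():cols.max() + 1, :]

im = np.zeros((4, 5, 4), dtype=np.uint8)
im[1, 1] = [255, 255, 255, 255]
im[2, 3] = [10, 0, 0, 255]  # partly zero, but not background
print(crop_zero_border(im).shape)  # (2, 3, 4)
```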
For your 2D example, it would look like:
import numpy as np
im = np.array([[0,0,0,0,0,0],
[0,0,1,1,1,0],
[0,1,1,0,1,0],
[0,0,0,1,1,0],
[0,0,0,0,0,0]])
mask = im == 0
rows = np.flatnonzero((~mask).sum(axis=1))
cols = np.flatnonzero((~mask).sum(axis=0))
crop = im[rows.min():rows.max()+1, cols.min():cols.max()+1]
print(crop)
Let's break down the 2D example a bit.
In [1]: import numpy as np
In [2]: im = np.array([[0,0,0,0,0,0],
...: [0,0,1,1,1,0],
...: [0,1,1,0,1,0],
...: [0,0,0,1,1,0],
...: [0,0,0,0,0,0]])
Okay, now let's create a boolean array that meets our condition:
In [3]: mask = im == 0
In [4]: mask
Out[4]:
array([[ True, True, True, True, True, True],
[ True, True, False, False, False, True],
[ True, False, False, True, False, True],
[ True, True, True, False, False, True],
[ True, True, True, True, True, True]], dtype=bool)
Also, note that the ~ operator functions as logical_not on boolean arrays:
In [5]: ~mask
Out[5]:
array([[False, False, False, False, False, False],
[False, False, True, True, True, False],
[False, True, True, False, True, False],
[False, False, False, True, True, False],
[False, False, False, False, False, False]], dtype=bool)
With that in mind, to find rows where all elements are false, we can sum across columns:
In [6]: (~mask).sum(axis=1)
Out[6]: array([0, 3, 3, 2, 0])
If no elements are True, we'll get a 0.
And similarly to find columns where all elements are false, we can sum across rows:
In [7]: (~mask).sum(axis=0)
Out[7]: array([0, 1, 2, 2, 3, 0])
Now all we need to do is find the first and last of these that are not zero. np.flatnonzero is a bit easier to use than nonzero in this case:
In [8]: np.flatnonzero((~mask).sum(axis=1))
Out[8]: array([1, 2, 3])
In [9]: np.flatnonzero((~mask).sum(axis=0))
Out[9]: array([1, 2, 3, 4])
Then, you can easily slice out the region based on min/max nonzero elements:
In [10]: rows = np.flatnonzero((~mask).sum(axis=1))
In [11]: cols = np.flatnonzero((~mask).sum(axis=0))
In [12]: im[rows.min():rows.max()+1, cols.min():cols.max()+1]
Out[12]:
array([[0, 1, 1, 1],
[1, 1, 0, 1],
[0, 0, 1, 1]])
One way of implementing this for arbitrary dimensions would be:
import numpy as np

def trim(arr, mask):
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]
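A usage sketch for `trim` on the 2-D example from the question, passing the non-zero elements as the mask. For a 3-D image, one option (an assumption, not part of the answer) is to reduce the mask over the band axis first, e.g. `(im != 0).any(axis=2)`; the resulting 2-D mask yields two slices, so only the first two axes are trimmed.

```python
import numpy as np

def trim(arr, mask):
    # One slice per mask axis, spanning the first to last True index along it
    bounding_box = tuple(
        slice(np.min(indexes), np.max(indexes) + 1)
        for indexes in np.where(mask))
    return arr[bounding_box]

im = np.array([[0, 0, 0, 0, 0, 0],
               [0, 0, 1, 1, 1, 0],
               [0, 1, 1, 0, 1, 0],
               [0, 0, 0, 1, 1, 0],
               [0, 0, 0, 0, 0, 0]])
print(trim(im, im != 0).tolist())  # [[0, 1, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]]

rgba = np.zeros((5, 6, 4))
rgba[1:4, 1:5] = 1.0
print(trim(rgba, (rgba != 0).any(axis=2)).shape)  # (3, 4, 4)
```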
A slightly more flexible solution (where you could indicate which axis to act on) is available in FlyingCircus (Disclaimer: I am the main author of the package).
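You could use the np.nonzero function to find the non-zero values, then take those elements from your original array and reshape them to the shape you want. Note that this only works when the region you keep is a solid rectangle with no zeros inside it, as in this example, and when you know its shape in advance: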
import numpy as np
n = np.array([ [0,0,0,0,0,0],
[0,0,1,1,1,0],
[0,0,1,1,1,0],
[0,0,1,1,1,0],
[0,0,0,0,0,0]])
elems = n[n.nonzero()]
In [415]: elems
Out[415]: array([1, 1, 1, 1, 1, 1, 1, 1, 1])
In [416]: elems.reshape(3,3)
Out[416]:
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
I want to check whether a NumPy array contains values that are in a set, and if so, set the corresponding area in another array to 1. If not, set keepRaster to 2.
numpyArray = ...  # some imported array
repeatSet = ([3, 5, 6, 8])
confusedRaster = numpyArray[numpy.where(numpyArray in repeatSet)]= 1
Yields:
<type 'exceptions.TypeError'>: unhashable type: 'numpy.ndarray'
Is there a way to loop through it?
for value in numpyArray.flat:
    if value in repeatSet:
        confusedRaster = 1
    else:
        keepRaster = 2
To clarify and ask for a bit more help:
What I am trying to do, and am currently doing, is putting a raster input into an array. I need to read values in the 2-D array and create another array based on those values. If the array value is in the set, the new value will be 1. If it is not in the set, the value will be derived from another input, but I'll say 77 for now. This is what I'm currently using. My test input has about 1500 rows and 3500 columns, and it always freezes at around row 350.
for rowd in range(0, width):
    for cold in range(0, height):
        if numpyarray.item(rowd, cold) in repeatSet:
            confusedArray[rowd][cold] = 1
        else:
            if numpyarray.item(rowd, cold) == 0:
                confusedArray[rowd][cold] = 0
            else:
                confusedArray[rowd][cold] = 2
In versions 1.4 and higher, NumPy provides the in1d function (in NumPy 2.0 and later it is deprecated in favor of np.isin).
>>> test = np.array([0, 1, 2, 5, 0])
>>> states = [0, 2]
>>> np.in1d(test, states)
array([ True, False, True, False, True], dtype=bool)
You can use that as a mask for assignment.
>>> test[np.in1d(test, states)] = 1
>>> test
array([1, 1, 1, 5, 1])
Here are some more sophisticated uses of numpy's indexing and assignment syntax that I think will apply to your problem. Note the use of bitwise operators to replace if-based logic:
>>> import numpy
>>> repeat_set = [3, 5, 6, 8]
>>> numpy_array = numpy.arange(9).reshape((3, 3))
>>> confused_array = numpy.arange(9).reshape((3, 3)) % 2
>>> mask = numpy.in1d(numpy_array, repeat_set).reshape(numpy_array.shape)
>>> mask
array([[False, False, False],
[ True, False, True],
[ True, False, True]], dtype=bool)
>>> ~mask
array([[ True, True, True],
[False, True, False],
[False, True, False]], dtype=bool)
>>> numpy_array == 0
array([[ True, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
>>> numpy_array != 0
array([[False, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
>>> confused_array[mask] = 1
>>> confused_array[~mask & (numpy_array == 0)] = 0
>>> confused_array[~mask & (numpy_array != 0)] = 2
>>> confused_array
array([[0, 2, 2],
[1, 2, 1],
[1, 2, 1]])
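In newer NumPy versions, `np.isin` does the same job as `in1d` but returns an array with the same shape as its first argument, so the `.reshape(numpy_array.shape)` step is not needed. A sketch with the same values:

```python
import numpy as np

repeat_set = [3, 5, 6, 8]
numpy_array = np.arange(9).reshape((3, 3))

# np.isin keeps the (3, 3) shape; np.in1d would return a flat array of length 9
mask = np.isin(numpy_array, repeat_set)
print(mask.shape)                 # (3, 3)
print(mask.astype(int).tolist())  # [[0, 0, 0], [1, 0, 1], [1, 0, 1]]
```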
Another approach would be to use numpy.where, which creates a brand new array, using values from the second argument where mask is true, and values from the third argument where mask is false. (As with assignment, the argument can be a scalar or an array of the same shape as mask.) This might be a bit more efficient than the above, and it's certainly more terse:
>>> numpy.where(mask, 1, numpy.where(numpy_array == 0, 0, 2))
array([[0, 2, 2],
[1, 2, 1],
[1, 2, 1]])
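The nested `numpy.where` can also be written with `np.select`, which takes a list of conditions and a matching list of values and, for each element, picks the value of the first condition that is True (`default` covers the remaining elements). This sketch is an alternative formulation, not part of the original answer, and reproduces the result above:

```python
import numpy as np

repeat_set = [3, 5, 6, 8]
numpy_array = np.arange(9).reshape((3, 3))
in_set = np.isin(numpy_array, repeat_set)

# First matching condition wins: in the set -> 1, else zero -> 0, else -> 2
result = np.select([in_set, numpy_array == 0], [1, 0], default=2)
print(result.tolist())  # [[0, 2, 2], [1, 2, 1], [1, 2, 1]]
```

With more than two or three cases, a flat list of conditions is often easier to read than deeply nested `where` calls.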
Here is one possible way of doing what you want:
import numpy as np

numpyArray = np.array([1, 8, 35, 343, 23, 3, 8])  # could be an n-dimensional array
repeatSet = np.array([3, 5, 6, 8])
mask = (numpyArray[...,None] == repeatSet[None,...]).any(axis=-1)
print(mask)
# [False  True False False False  True  True]
In recent NumPy you can combine np.isin and np.where to achieve this. The first returns a boolean array that is True where an element of the input is equal to any of the specified test elements (see the docs), while the second creates a new array that takes one value where the condition is True and another where it is False.
Example
I'll make an example with an arbitrary array, using values similar to the ones you provided.
import numpy as np
repeatSet = ([2, 5, 6, 8])
arr = np.array([[1,5,1],
[0,1,0],
[0,0,0],
[2,2,2]])
out = np.where(np.isin(arr, repeatSet), 1, 77)
>>> out
array([[77, 1, 77],
[77, 77, 77],
[77, 77, 77],
[ 1, 1, 1]])