Vectorizing a positionally reliant function in NumPy - python

I understand the concept of vectorization, and how you can avoid a loop when you want to adjust each individual element, but what I can't figure out is how to do this when the condition depends on the neighbouring values of a pixel.
For example, if I have a mask:
mask = np.array([[0,0,0,0],
                 [1,0,0,0],
                 [0,0,0,1],
                 [1,0,0,0]])
And I wanted to change an element by evaluating neighboring components in the mask, like so:
if sum(mask[j-1:j+2, i-1:i+2].flatten()) > 1 and mask[j,i] != 1:
    out[j,i] = 1
How can I vectorize the operation when I specifically need to access the neighboring elements?
Thanks in advance.
Full loop:
import numpy as np
mask = np.array([[0,0,0,0], [1,0,0,0], [0,0,0,1], [1,0,0,0]])
out = np.zeros(mask.shape)
for j in range(len(mask)):
for i in range(len(mask[0])):
if sum(mask[j-1:j+2,i-1:i+2].flatten())>1 and mask[j,i]!=1:
out[j,i]=1
Output:
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 0. 0.]]

Such a 'neighborhood sum' operation is often called a 2D convolution. Since you don't have any weighting, it is efficiently implemented by the (IMO somewhat poorly named) scipy.ndimage.uniform_filter, which computes the mean of a neighborhood (the sum is just the mean multiplied by the neighborhood size).
import numpy as np
from scipy.ndimage import uniform_filter
mask = np.array([[0,0,0,0], [1,0,0,0], [0,0,0,1], [1,0,0,0]])
neighbor_sum = 9 * uniform_filter(mask.astype(np.float32), 3, mode="constant")
neighbor_sum = np.rint(neighbor_sum).astype(int)
out = ((neighbor_sum > 1) & (mask != 1)).astype(int)
print(out)
Output (which differs from your example, but checking by hand it is correct, assuming you don't want the edges to wrap around):
[[0 0 0 0]
[0 0 0 0]
[1 1 0 0]
[0 0 0 0]]
If you do want the edges to wrap around (or other edge behavior), look at the mode argument of uniform_filter.
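If you'd rather avoid the SciPy dependency, the same neighborhood sum can be built in plain NumPy by zero-padding the mask and summing the nine shifted views. A minimal sketch, assuming the same 4x4 mask as above:

```python
import numpy as np

mask = np.array([[0,0,0,0], [1,0,0,0], [0,0,0,1], [1,0,0,0]])

# Zero-pad by one so edge pixels see zeros outside the array
padded = np.pad(mask, 1)

# The 3x3 neighborhood sum is the sum of the nine shifted views
neighbor_sum = sum(padded[dj:dj + mask.shape[0], di:di + mask.shape[1]]
                   for dj in range(3) for di in range(3))

out = ((neighbor_sum > 1) & (mask != 1)).astype(int)
print(out)  # same result as the uniform_filter version above
```

This trades a little memory (nine temporary views are summed) for having no dependency beyond NumPy itself.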

Related

Calculating the difference between each element against other randomly generated elements in python

I am calculating the difference of each element in a numpy array. My code is
import numpy as np
M = 10
x = np.random.uniform(0,1,M)
y = np.array([x])
# Calculate the difference
z = np.array(y[:,None]-y)
When I run my code I get [[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]. I don't get a 10 by 10 array.
Where do I go wrong?
You should read the broadcasting rules for numpy
y.T - x
Another way:
np.subtract.outer(x, x)
You are not getting a 10 by 10 array because y = np.array([x]) wraps x in an extra dimension, giving y shape (1, 10), so the subtraction broadcasts to shape (1, 1, 10) instead of (10, 10). Skip the wrapper and broadcast x directly:
z = x[:, None] - x
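A minimal runnable sketch of the broadcasting fix, dropping the extra dimension entirely:

```python
import numpy as np

M = 10
x = np.random.uniform(0, 1, M)

# x[:, None] has shape (10, 1); broadcasting against (10,) gives (10, 10)
z = x[:, None] - x
print(z.shape)  # (10, 10)
```

Each entry z[i, j] is x[i] - x[j], so the diagonal is all zeros.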

Convert probability vector into target vector in python?

I am doing logistic regression on the iris dataset from sklearn. I know the math and am trying to implement it. At the final step, I get a prediction vector; this vector represents the probability of each data point belonging to class 1 or class 2 (binary classification).
Now I want to turn this prediction vector into a target vector: if the probability is greater than 50%, the corresponding data point belongs to class 1, otherwise class 2. Use 0 to represent class 1 and 1 for class 2.
I know there is a for-loop version of this, just looping through the whole vector, but when the size gets large a for loop is very expensive, so I want to do it more efficiently with NumPy's vectorized operations, which are faster than looping.
Any suggestion on a faster method?
import numpy as np
a = np.matrix('0.1 0.82')
print(a)
a[a > 0.5] = 1
a[a <= 0.5] = 0
print(a)
Output:
[[ 0.1 0.82]]
[[ 0. 1.]]
Update:
import numpy as np
a = np.matrix('0.1 0.82')
print(a)
a = np.where(a > 0.5, 1, 0)
print(a)
A more general solution to a 2D array which has many vectors with many classes:
import numpy as np
a = np.array([[.5, .3, .2],
              [.1, .2, .7],
              [ 1,  0,  0]])
idx = np.argmax(a, axis=-1)
a = np.zeros( a.shape )
a[ np.arange(a.shape[0]), idx] = 1
print(a)
Output:
[[1. 0. 0.]
[0. 0. 1.]
[1. 0. 0.]]
Option 1: If you are doing binary classification and have a 1d prediction vector, then your solution is numpy.round:
prob = model.predict(X_test)
Y = np.round(prob)
Option 2: If you have an n-dimensional one-hot prediction matrix, but want to have labels then you can use numpy.argmax. This will return 1d vector with labels:
prob = model.predict(X_test)
y = np.argmax(prob, axis=1)
In case you want to proceed with a confusion matrix etc. afterwards and get the original format of a target variable in scikit again (array([1 0 ... 1])), you can use:
a = clf.predict_proba(X_test)[:,1]
a = np.where(a > 0.5, 1, 0)
The [:,1] refers to the second class (in my case: 1); the first class in my case was 0.
For multi-class, or as a more generalized solution, use
np.argmax(y_hat, axis=1)
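Putting the two options side by side, a small sketch with made-up probabilities (the values here are purely illustrative):

```python
import numpy as np

# Binary case: threshold a 1-D probability vector at 0.5
probs = np.array([0.1, 0.82, 0.4, 0.51])
labels = np.where(probs > 0.5, 1, 0)
print(labels)  # [0 1 0 1]

# Multi-class case: take the argmax of each row of a probability matrix
prob_matrix = np.array([[0.5, 0.3, 0.2],
                        [0.1, 0.2, 0.7]])
class_labels = np.argmax(prob_matrix, axis=1)
print(class_labels)  # [0 2]
```

Both run as single vectorized operations, with no Python-level loop over the vector.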

Reading 2d arrays into a 3d array in python

I searched stackoverflow but could not find an answer to this specific question. Sorry if it is a naive question, I am a newbie to python.
I have several 2d arrays (or lists) that I would like to read into a 3d array (list) in python. In Matlab, I can simply do
for i = 1:N
    % read 2d array "a"
    newarray(:,:,i) = a(:,:);
end
so newarray is a 3d array with "a" being the 2d slices arranged along the 3rd dimension.
Is there a simple way to do this in python?
Edit: I am currently trying the following:
for file in files:
    img = mpimg.imread(file)
    newarray = np.array(0.289*cropimg[:,:,0] + 0.5870*cropimg[:,:,1] + 0.1140*cropimg[:,:,2])
    i = i + 1
I tried newarray[:,:,i] and it gives me an error
NameError: name 'newarray' is not defined
Seems like I have to define newarray as a numpy array? Not sure.
Thanks!
If you're familiar with MATLAB, translating that into NumPy is fairly straightforward.
Let's say you have a couple of arrays:
a = np.eye(3)
b = np.arange(9).reshape((3, 3))
print(a)
# [[ 1. 0. 0.]
# [ 0. 1. 0.]
# [ 0. 0. 1.]]
print(b)
# [[0 1 2]
# [3 4 5]
# [6 7 8]]
If you simply want to put them into another dimension, pass them both to the array constructor in an iterable (e.g. a list) like so:
x = np.array([a, b])
print(x)
# [[[ 1. 0. 0.]
# [ 0. 1. 0.]
# [ 0. 0. 1.]]
#
# [[ 0. 1. 2.]
# [ 3. 4. 5.]
# [ 6. 7. 8.]]]
NumPy is smart enough to recognize that the arrays are all the same size, and it creates a new dimension to hold them all.
print(x.shape)
# (2, 3, 3)
You can loop through it, but if you want to apply the same operations to it across some dimensions, I would strongly suggest you use broadcasting so that NumPy can vectorize the operation and it runs a whole lot faster.
For example, across one dimension, let's multiply one slice by 2 and the other by 3. (If the multiplier isn't a pure scalar, we need to reshape it to the same number of dimensions as the array for broadcasting, with each dimension's size either matching the array's or equal to 1.) Note that I'm working along the 0th axis; your image data is probably arranged differently, and I don't have a handy image to load up to toy with.
y = x * np.array([2, 3]).reshape((2, 1, 1))
print(y)
#[[[ 2. 0. 0.]
# [ 0. 2. 0.]
# [ 0. 0. 2.]]
#
# [[ 0. 3. 6.]
# [ 9. 12. 15.]
# [ 18. 21. 24.]]]
Then we can add them up
z = np.sum(y, axis=0)
print(z)
#[[ 2. 3. 6.]
# [ 9. 14. 15.]
# [ 18. 21. 26.]]
If you're using NumPy arrays, you can translate almost directly from Matlab:
for i in range(1, N+1):
    # read 2d array "a"
    newarray[:, :, i] = a[:, :]
Of course you'd probably want to use range(N), because arrays use 0-based indexing. And obviously you're going to need to pre-create newarray in some way, just as you'd have to in Matlab, but you can translate that pretty directly too. (Look up the zeros function if you're not sure how.)
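For completeness, a sketch of that pre-allocation approach with np.zeros; the sizes and the per-step "read" are placeholders for whatever your data actually is:

```python
import numpy as np

N, H, W = 4, 3, 3  # assumed number of slices and slice shape
newarray = np.zeros((H, W, N))  # pre-allocate, MATLAB-style

for i in range(N):
    a = np.full((H, W), i)  # placeholder for the 2D array read at step i
    newarray[:, :, i] = a

print(newarray.shape)  # (3, 3, 4)
```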
If you're using lists, you can't do this directly—but you probably don't want to anyway. A better solution would be to build up a list of 2D lists on the fly:
newarray = []
for i in range(N):
    # read 2d list of lists "a"
    newarray.append(a)
Or, more simply:
newarray = [read_next_2d_list_of_lists() for i in range(N)]
Or, even better, make that read function a generator, then just:
newarray = list(read_next_2d_list_of_lists())
If you want to transpose the order of the axes, you can use the zip function for that.
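If you specifically want the MATLAB layout, with the 2D slices along the last axis, np.stack lets you pick the axis directly. A sketch with dummy slices standing in for your images:

```python
import numpy as np

slices = [np.full((3, 3), i) for i in range(4)]  # four dummy 2D slices

# axis=-1 stacks along a new last axis, like newarray(:,:,i) in MATLAB
newarray = np.stack(slices, axis=-1)
print(newarray.shape)  # (3, 3, 4)
```

With the default axis=0 you would get shape (4, 3, 3) instead, which is the layout np.array([a, b]) produces above.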

Appending np.arrays to a blank array in Python

I am trying to save the results from a loop in a np.array.
import numpy as np
p = np.array([])
points = np.array([[3,0,0], [-1,0,0]])
for i in points:
    for j in points:
        if j[0] != 0:
            n = i + j
            p = np.append(p, n)
However the resulting array is a 1D array of 6 members.
[2. 0. 0. -2. 0. 0.]
Instead I am looking for, but have been unable to produce:
[[2,0,0],[-2,0,0]]
Is there any way to get the result above?
Thank you.
One possibility is to turn p into a list, and convert it into a NumPy array right at the end:
p = []
for i in points:
    ...
    p.append(n)
p = np.array(p)
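A runnable sketch of that pattern with dummy data (the loop body here is just a stand-in for whatever computes each row):

```python
import numpy as np

p = []
for i in range(3):
    n = np.array([i, i, i])  # stand-in for the row computed in the loop
    p.append(n)
p = np.array(p)  # list of 1-D arrays -> one 2-D array
print(p.shape)  # (3, 3)
```

Appending to a Python list is O(1) per element, whereas np.append copies the whole array each time, so this is also the faster approach for long loops.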
What you're looking for is vertically stacking your results:
import numpy as np

p = np.empty((0, 3))
points = np.array([[3,0,0], [-1,0,0]])
for i in points:
    for j in points:
        if j[0] != 0:
            n = i + j
            p = np.vstack((p, n))
print(p)
which gives:
[[ 2. 0. 0.]
[-2. 0. 0.]]
Although you could also reshape your result afterwards:
import numpy as np

p = np.array([])
points = np.array([[3,0,0], [-1,0,0]])
for i in points:
    for j in points:
        if j[0] != 0:
            n = i + j
            p = np.append(p, n)
p = np.reshape(p, (-1, 3))
print(p)
which gives the same result.
I must warn you, however, that your code can fail when j[0] == 0, as that would leave n undefined...
See also: np.vstack, np.empty, np.reshape.

How to change chunks of data in a numpy array

I have a large numpy 1 dimensional array of data in Python and want entries x (500) to y (520) to be changed to equal 1. I could use a for loop but is there a neater, faster numpy way of doing this?
for x in range(500, 520):
    numpyArray[x] = 1.
Here is the for loop that could be used but it seems like there could be a function in numpy that I'm missing - I'd rather not use the masked arrays that numpy offers
You can use [] to access a range of elements:
import numpy as np
a = np.ones((10))
print(a) # Original array
# [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
startindex = 2
endindex = 4
a[startindex:endindex] = 0
print(a) # modified array
# [ 1. 1. 0. 0. 1. 1. 1. 1. 1. 1.]
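Applied to the original question (assuming an array of, say, 1000 entries), the same slice assignment replaces the loop in one step. Note the half-open interval: 500:520 touches indices 500 through 519, exactly like range(500, 520).

```python
import numpy as np

numpyArray = np.zeros(1000)  # assumed size; any length > 520 works
numpyArray[500:520] = 1.0    # same effect as the for loop over range(500, 520)
print(numpyArray[498:503])  # [0. 0. 1. 1. 1.]
```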
