Scatter plot with specific conditions - python

Suppose that I have this data set
x1 = np.array([0.1,0.3,0.1,0.6,0.4,0.6,0.5,0.9,0.4,0.7])
x2 = np.array([0.1,0.4,0.5,0.9,0.2,0.3,0.6,0.2,0.4,0.6])
c = np.array([ 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
I want a scatter plot of x1, x2 under the following condition: if c[i]==1, plot the point (x1[i], x2[i]) as a red X, but if c[i]==0, plot the point (x1[i], x2[i]) as a blue O.
Any idea on how to do it?

Since you're using NumPy, it is easy to get the elements (or indices) where a certain condition (comparison) is satisfied.
Comparing an existing array with a value gives an array of booleans, one per element:
print(c==0)
[False False False False False True True True True True]
You can also use a boolean array (of the same size as the original array) to select only some of its elements:
print(x1[c==1])
[0.1 0.3 0.1 0.6 0.4]
or perform more complex operations and/or set values:
x1[x2 < 0.5] = 0
print(x1)
x2[x2 < 0.5] = 10
print(x2)
[0. 0. 0.1 0.6 0. 0. 0.5 0. 0. 0.7]
[10. 10. 0.5 0.9 10. 10. 0.6 10. 10. 0.6]
For even more complicated conditions, there are logical functions for NumPy arrays.
This vectorized NumPy approach is typically much faster than a Python loop, so use it whenever possible.
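As a minimal sketch of the logical functions mentioned above, the snippet below combines two conditions with np.logical_and and np.logical_or on the question's data. The particular conditions (x2 below 0.5, x1 above 0.5) are only illustrative assumptions, not part of the original problem.

```python
import numpy as np

x1 = np.array([0.1, 0.3, 0.1, 0.6, 0.4, 0.6, 0.5, 0.9, 0.4, 0.7])
x2 = np.array([0.1, 0.4, 0.5, 0.9, 0.2, 0.3, 0.6, 0.2, 0.4, 0.6])
c = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Points labelled 1 whose x2 coordinate is also below 0.5
# (equivalent to (c == 1) & (x2 < 0.5))
mask_and = np.logical_and(c == 1, x2 < 0.5)

# Points that are labelled 0 or have x1 above 0.5
# (equivalent to (c == 0) | (x1 > 0.5))
mask_or = np.logical_or(c == 0, x1 > 0.5)

# Boolean masks index the arrays just like a single comparison does
selected_x1 = x1[mask_and]
selected_x2 = x2[mask_and]
print(selected_x1, selected_x2)
```

When using the & and | operators instead, each comparison needs its own parentheses because those operators bind more tightly than == and <.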
Applying the above, it is easy to solve your problem:
import numpy as np
import matplotlib.pyplot as plt
x1 = np.array([0.1,0.3,0.1,0.6,0.4,0.6,0.5,0.9,0.4,0.7])
x2 = np.array([0.1,0.4,0.5,0.9,0.2,0.3,0.6,0.2,0.4,0.6])
c=np.array([ 1,1,1,1,1,0,0,0,0,0 ])
plt.plot(x1[c==0], x2[c==0], 'bo')
plt.plot(x1[c==1], x2[c==1], 'rx')

I don't know whether this is an efficient way, but you can do this:
for p1, p2, label in zip(x1, x2, c):
    if label == 1:
        plt.scatter(p1, p2, marker='x', color='red')
    else:
        plt.scatter(p1, p2, marker='o', color='blue')
Check other marker shapes here:
https://matplotlib.org/3.1.1/api/markers_api.html
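A possible middle ground between the two answers is one scatter call per label rather than one per point. The sketch below assumes a small style table (the name styles and the legend labels are hypothetical) and uses the non-interactive Agg backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

x1 = np.array([0.1, 0.3, 0.1, 0.6, 0.4, 0.6, 0.5, 0.9, 0.4, 0.7])
x2 = np.array([0.1, 0.4, 0.5, 0.9, 0.2, 0.3, 0.6, 0.2, 0.4, 0.6])
c = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Hypothetical style table: label value -> keyword arguments for scatter
styles = {
    1: dict(marker="x", color="red", label="c == 1"),
    0: dict(marker="o", color="blue", label="c == 0"),
}

fig, ax = plt.subplots()
for value, style in styles.items():
    mask = c == value                      # boolean mask for this label
    ax.scatter(x1[mask], x2[mask], **style)
ax.legend()
```

Calling plt.show() afterwards (with an interactive backend) displays the figure. Adding a third label would only need one more entry in the style table.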

Related

How to compare two numpy arrays with multiple condition

I have 2 NumPy arrays like the below
array_1 = np.array([1.2, 2.3, -1.0, -0.5])
array_2 = np.array([-0.5, 1.3, 2.5, -0.9])
We can do the element-wise simple arithmetic calculation (addition, subtraction, division etc) easily using different np functions
array_sum = np.add(array_1, array_2)
print(array_sum) # [ 0.7 3.6 3.5 -0.4]
array_sign = np.sign(array_1 * array_2)
print(array_sign) # [-1. 1. 1. -1.]
However, I need to check element-wise multiple conditions for 2 arrays and want to save them in 2 new arrays (say X and Y).
For example, if the two elements have different signs (e.g. the 1st and 3rd element pairs of the given example), then X will contain 0 and Y will be the sum of the positive element and abs(negative element)
X = [0]
Y = [1.7]
When both elements are positive (e.g. the 2nd element pair of the given example), X will contain the lower value and Y will contain the greater value
X = [1.3]
Y = [2.3]
If both elements are negative, then, X will be 0 and Y will be the sum of the abs(negative element) and abs(negative element)
So, the final X and Y will be something like
X = [0, 1.3, 0, 0]
Y = [1.7, 2.3, 3.5, 1.4]
I have gone through some posts that describe comparison procedures between 2 arrays, but I can't see how to handle multiple conditions. Here the 2 arrays are very small, but my real arrays are very large (e.g. 2097152 elements per array).
Any ideas are highly appreciated.
Try with numpy.select:
conditions = [(array_1>0)&(array_2>0), (array_1<0)&(array_2<0)]
choiceX = [np.minimum(array_1, array_2), np.zeros(len(array_1))]
choiceY = [np.maximum(array_1, array_2), -np.add(array_1,array_2)]
X = np.select(conditions, choiceX)
Y = np.select(conditions, choiceY, np.add(np.abs(array_1), np.abs(array_2)))
>>> X
array([0. , 1.3, 0. , 0. ])
>>> Y
array([1.7, 2.3, 3.5, 1.4])
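Because the mixed-sign case and the both-negative case in the question lead to the same result (X is 0 and Y is the sum of the absolute values), the rules can also be sketched with two np.where calls. The names both_positive and abs_sum are introduced here for illustration only.

```python
import numpy as np

array_1 = np.array([1.2, 2.3, -1.0, -0.5])
array_2 = np.array([-0.5, 1.3, 2.5, -0.9])

# True only where both elements are strictly positive
both_positive = (array_1 > 0) & (array_2 > 0)

# |a| + |b| covers the mixed-sign case and the both-negative case alike
abs_sum = np.abs(array_1) + np.abs(array_2)

X = np.where(both_positive, np.minimum(array_1, array_2), 0.0)
Y = np.where(both_positive, np.maximum(array_1, array_2), abs_sum)
print(X)
print(Y)
```

np.select remains the more natural choice when there are three or more genuinely different branches.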
This will do it. It does require vertically stacking the two arrays. I'm sure someone will pipe up if there is a more efficient solution.
import numpy as np
array_1 = np.array([1.2, 2.3, -1.0, -0.5])
array_2 = np.array([-0.5, 1.3, 2.5, -0.9])
def pick(t):
    if t[0] < 0 or t[1] < 0:
        return (0, abs(t[0]) + abs(t[1]))
    return (t.min(), t.max())
print( np.apply_along_axis( pick, 0, np.vstack((array_1,array_2))))
Output:
[[0. 1.3 0. 0. ]
[1.7 2.3 3.5 1.4]]
The second line of the function can also be written:
return (0,np.abs(t).sum())
But since these will only be two-element arrays, I doubt that saves anything at all.

Apply logical and/or operations along an axis in numpy python [duplicate]

For machine learning, I'm applying the Parzen window algorithm.
I have an array (m,n). I would like to check on each row if any of the values is > 0.5 and if each of them is, then I would return 0, otherwise 1.
I would like to know if there is a way to do this without a loop thanks to numpy.
You can use np.all with axis=1 on a boolean array.
import numpy as np
arr = np.array([[0.8, 0.9], [0.1, 0.6], [0.2, 0.3]])
print(np.all(arr>0.5, axis=1))
>> [True False False]
import numpy as np
# Value Initialization
a = np.array([0.75, 0.25, 0.50])
y_predict = np.zeros((1, a.shape[0]))
#If the value is greater than 0.5, the value is 1; otherwise 0
y_predict = (a > 0.5).astype(float)
I have an array (m,n). I would like to check on each row if any of the values is > 0.5
That will be stored in b:
import numpy as np
a = np.random.rand(4, 3)  # placeholder: substitute any np.array of shape (m, n)
b = np.any(a > 0.5, axis=1)
and if each of them is, then I would return 0, otherwise 1.
I'm assuming you mean 'and if this is the case for all rows'. In this case:
c = 1 - 1 * np.all(b)
c contains your return value, either 0 or 1.
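Since the question can be read in more than one way, the sketch below puts the row-wise "any" and "all" tests side by side and shows one way to turn a per-row result into 0/1 values. It assumes the per-row reading (0 where every value in a row exceeds 0.5, otherwise 1), and the name flags is hypothetical.

```python
import numpy as np

arr = np.array([[0.8, 0.9], [0.1, 0.6], [0.2, 0.3]])
above = arr > 0.5                      # element-wise boolean array

any_per_row = np.any(above, axis=1)    # at least one value in the row is > 0.5
all_per_row = np.all(above, axis=1)    # every value in the row is > 0.5

# Assumed reading: 0 where all values in the row exceed 0.5, otherwise 1
flags = np.where(all_per_row, 0, 1)
print(any_per_row, all_per_row, flags)
```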

Aggregating 2 NumPy arrays by confidence

I have 2 np arrays containing values in the interval [0,1].
I want to create the third array, containing the most "confident" values, meaning to take elementwise, the number from the array which is closer to 1 or 0. Consider the following example:
[0.7,0.12,1,0.5]
[0.1,0.99,0.001,0.49]
so my constructed array would be:
[0.1,0.99,1,0.49]
import numpy as np
A = np.array([0.7,0.12,1,0.5])
B = np.array([0.1,0.99,0.001,0.49])
maxi = np.maximum(A,B)
mini = np.minimum(A,B)
# Find where the maximum is closer to 1 than the minimum is to 0
idx = 1-maxi < mini
maxi*idx + mini*~idx
returns
array([ 0.1 , 0.99, 1. , 0.49])
You can try this:
c=np.array([a[i] if min(1-a[i],a[i])<min(1-b[i],b[i]) else b[i] for i in range(len(a))])
The result is:
array([ 0.1 , 0.99, 1. , 0.49])
Another way of stating your "confidence" measure is to ask which of the two numbers is furthest away from 0.5. That is, which of the two numbers x yields the largest abs(0.5 - x). The following solution constructs a 2D array c with the original arrays as columns. Then we construct and apply a boolean mask based on abs(0.5 - c):
import numpy as np
a = np.array([0.7,0.12,1,0.5])
b = np.array([0.1,0.99,0.001,0.49])
# Combine
c = np.concatenate((a, b)).reshape((2, len(a))).T
# Create mask
b_or_a = np.asarray(np.argmax(np.abs((0.5 - c)), axis=1), dtype=bool)
mask = np.zeros(c.shape, dtype=bool)
mask[:, 0] = ~b_or_a
mask[:, 1] = b_or_a
# Apply mask
d = c[mask]
print(d) # [ 0.1 0.99 1. 0.49]
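The same distance-from-0.5 measure can also be sketched with a single np.where, without building the 2D array or the mask. Resolving ties in favour of a is an assumption made here; the question does not say what should happen when both values are equally far from 0.5.

```python
import numpy as np

a = np.array([0.7, 0.12, 1, 0.5])
b = np.array([0.1, 0.99, 0.001, 0.49])

# True where a is at least as far from 0.5 as b (ties go to a by assumption)
a_is_more_confident = np.abs(a - 0.5) >= np.abs(b - 0.5)

# Pick element-wise from a or b according to the mask
d = np.where(a_is_more_confident, a, b)
print(d)
```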

How to sample in the matrix according to the probability in each cell

I tried to code a formula from pattern recognition but I cannot find a proper function to do the work. The problem is that I have a binary adjacency matrix A (M*N) and want to assign the value 1 or 0 to each cell. Every cell has a fixed probability P of being 1, and is 0 otherwise. I searched for sampling methods in Python, and it seems that most of them only support sampling several elements from a list without considering probability. I really need help with this and any idea is appreciated.
You could use
A = (P > numpy.random.rand(4, 5)).astype(int)
Where P is your matrix of probabilities.
To make sure the probabilities are right you can test it using
P = numpy.ones((4, 5)) * 0.2
S = numpy.zeros((4, 5))
for i in range(100000):
    S += (P > numpy.random.rand(4, 5)).astype(int)
print(S)  # each element should be approximately 20000
print(S.mean())  # the average should be approximately 20000, too
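The comparison trick above can also be written with NumPy's Generator interface, which makes the draw reproducible through a seed. In this sketch the seed, the 200x200 shape, and the constant probability 0.2 are arbitrary assumptions; P can be any array of per-cell probabilities.

```python
import numpy as np

rng = np.random.default_rng(seed=0)    # seed chosen arbitrarily for reproducibility

# Hypothetical probability matrix: every cell is 1 with probability 0.2
P = np.full((200, 200), 0.2)

# A cell becomes 1 when a uniform draw from [0, 1) falls below its probability
A = (rng.random(P.shape) < P).astype(int)

# Equivalent idea: one Bernoulli trial per cell, broadcast over P
A_alt = rng.binomial(1, P)

print(A.mean(), A_alt.mean())          # both should be close to 0.2
```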
Let's say you have your matrix of probabilities of adjacency as follows :
# Create your matrix
matrix = np.random.randint(0, 10, (3, 3))/10.
# Returns :
array([[ 0. , 0.4, 0.2],
[ 0.9, 0.7, 0.4],
[ 0.1, 0. , 0.5]])
# Now you can use np.where
threshold = 0.5
np.where(matrix<threshold, 0, 1) # you can set your threshold as you like.
# Here set to 0.5
# Returns :
array([[0, 0, 0],
[1, 1, 0],
[0, 0, 1]])

Filtering through numpy arrays by one row's information

I am asking for help on filtering through numpy arrays. I currently have a numpy array which contains the following information:
[[x1_1, x1_2, ..., x1_n], [x2_1, x2_2, ..., x2_n], [y1, y2, ..., yn]]
ie. the array is essentially a dataset where x1, x2 are features (coordinates), and y is the output (value). Each data point has an appropriate x1, x2, and y, so for example, the info corresponding to data point i is x1_i, x2_i, and yi.
Now, I want to extract all the data points by filtering through y, meaning I want to know all the data points in which y > some value. In my case, I want the info (still with the same numpy structure) for all cases where y > 0. I don't really know how to do that -- I've been playing around with boolean indexing such as d[0:2,y>0] or d[d[2]>0], but haven't gotten anywhere.
A clarifying example:
Given the dataset:
d = [[0.1, 0.2, 0.3], [-0.1,-0.2,-0.3], [1,1,-1]]
I pull all points or instances where y > 0, ie. d[2] > 0, and it should return the values:
[[0.1, 0.2],[-0.1,-0.2],[1,1]]
Any advice or help would be appreciated.
You can use:
import numpy as np
d = np.array([[0.1, 0.2, 0.3], [-0.1,-0.2,-0.3], [1,1,-1]])
print (d)
[[ 0.1 0.2 0.3]
[-0.1 -0.2 -0.3]
[ 1. 1. -1. ]]
#select last row by d[-1]
print (d[-1]>0)
[ True True False]
print (d[:,d[-1]>0])
[[ 0.1 0.2]
[-0.1 -0.2]
[ 1. 1. ]]
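To illustrate why the attempts in the question behaved differently, the sketch below applies the same boolean mask along different axes. The variable names are hypothetical; the data is the example from the question.

```python
import numpy as np

d = np.array([[0.1, 0.2, 0.3], [-0.1, -0.2, -0.3], [1, 1, -1]])
mask = d[2] > 0                 # one boolean per data point (column)

# Mask on the second axis: keeps whole data points, all three rows
columns_kept = d[:, mask]

# Mask on the first axis: selects rows 0 and 1 instead, which is not the goal
rows_kept = d[mask]

# Row slice plus mask: keeps only the x1 and x2 rows of the selected points
features_only = d[0:2, mask]

print(columns_kept.shape, rows_kept.shape, features_only.shape)
```

Note that d must be a NumPy array for this to work; with a plain nested list, d[2] > 0 does not produce an element-wise mask.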
