Why is this boolean in this Bayes classifier? (Python question) - python

I'm studying GANs (and I'm a beginner in Python), and I found this part of the code in the previous exercises that I don't understand. Specifically, I don't understand why the boolean expression on the 9th line (Xk = X[Y == k]) is used, for the reasons I write down below.
import numpy as np
class BayesClassifier:
    def fit(self, X, Y):
        # assume classes are numbered 0...K-1
        self.K = len(set(Y))
        self.gaussians = []
        self.p_y = np.zeros(self.K)
        for k in range(self.K):
            Xk = X[Y == k]
            self.p_y[k] = len(Xk)
            mean = Xk.mean(axis=0)
            cov = np.cov(Xk.T)
            g = {'m': mean, 'c': cov}
            self.gaussians.append(g)
        # normalize p(y)
        self.p_y /= self.p_y.sum()
That boolean returns 0 or 1 depending on whether Y == k is true, so Xk will always be the first or the second value of the X list. I don't see the use of that.
In the 10th line, len(Xk) will always be 1, so why does it use that expression instead of a plain 1?
The mean and covariance in the following lines are calculated from only one value each time.
I feel that I'm not understanding something very basic.

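You should take into account that X and Y are NumPy arrays, not scalars, and some operators are overloaded for them, in particular == and Boolean-based indexing. == is an element-wise comparison, not a comparison of the whole arrays. It works the same way whether k is an array of the same shape or a single integer, as in the loop in your code; in the latter case every element of Y is compared with that integer.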
See how it works:
In [9]: Y = np.array([0,1,2])
In [10]: k = np.array([0,1,3])
In [11]: Y==k
Out[11]: array([ True, True, False])
So, the result of == is a Boolean array.
In [12]: X=np.array([0,2,4])
In [13]: X[Y==k]
Out[13]: array([0, 2])
The result is an array containing the elements of X at the positions where the condition is True.
Hence len(Xk) will be the number of samples whose label in Y equals k.
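To connect this back to the classifier's loop, here is a minimal sketch with a scalar k and a two-dimensional X. The toy data is made up for illustration and is not from the original exercise:

```python
import numpy as np

# Hypothetical toy data: 5 samples with 2 features each, labels in {0, 1}.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0],
              [9.0, 10.0]])
Y = np.array([0, 1, 0, 1, 1])

k = 1
mask = (Y == k)   # the scalar k is compared with every element of Y
Xk = X[mask]      # keeps only the rows of X whose label is k

print(mask)              # boolean array, one entry per sample
print(len(Xk))           # number of samples in class k
print(Xk.mean(axis=0))   # per-feature mean of class k
```

So len(Xk) is a per-class sample count rather than a constant 1, and the mean and covariance are computed over all rows of that class.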

Thanks, Artem,
You are right. I found another answer through another channel; here it is:
It's a NumPy array. NumPy arrays have a special feature called
boolean indexing that lets you filter out only the values in the array
where the filter returns True:
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html?fbclid=IwAR3sGlgSwhv3i7IETsIxp4ROu9oZvNaaaBxZS01DrM5ShjWWRz22ShP2rIg#boolean-or-mask-index-arrays
import numpy as np
a = np.array([1, 2, 3, 4, 5])
filter = a > 3
print(filter)
[False False False  True  True]
print(a[filter])
[4 5]

Related

add a label to a numpy lists [duplicate]

I have this function:
if elem < 0:
    elem = 0
else:
    elem = 1
I want to apply this function to every element in a NumPy array. With a for loop this only works for arrays with one fixed number of dimensions, but I need it to work regardless of the array's dimensions and shape. Is there any way to achieve this in Python with NumPy?
Or is there a general way to apply any function to every element of an n-dimensional NumPy array?
Isn't it just this?
arr = (arr >= 0).astype(int)
Or use np.where:
np.where(arr < 0, 0, 1)
You can use a boolean mask to define an array of decisions. Let's work through a concrete example. You have an array of positive and negative numbers and you want to take the square root only at non-negative locations:
arr = np.random.normal(size=100)
You compute a mask like
mask = arr >= 0
The most straightforward way to apply the mask is to create an output array, and fill in the required elements:
result = np.empty(arr.shape)
result[mask] = np.sqrt(arr[mask])
result[~mask] = arr[~mask]
This is not super efficient because you have to compute the inverse of the mask and apply it multiple times. For this specific example, you can take advantage of the fact that np.sqrt is a ufunc and use its where keyword:
result = arr.copy()
np.sqrt(arr, where=mask, out=result)
One popular way to apply the mask would be to use np.where, but I specifically constructed this example to show the caveats. The simplistic approach would be to compute
result = np.where(mask, np.sqrt(arr), arr)
np.where chooses the value from either np.sqrt(arr) or arr depending on whether mask is truthy or not. This is a very good method in many cases, but the values for both branches have to be computed in advance for every element, which is exactly what you want to avoid with a square root.
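The three approaches can be compared side by side. This is only a sketch: the seeded generator and the variable names are chosen here for illustration, and np.errstate is used just to silence the warning that the np.where variant triggers on negative inputs:

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility
arr = rng.normal(size=100)
mask = arr >= 0

# 1) Fill an output array through the mask and its inverse.
filled = np.empty(arr.shape)
filled[mask] = np.sqrt(arr[mask])
filled[~mask] = arr[~mask]

# 2) Let the ufunc skip the masked-out elements; they keep the copied values.
via_ufunc = arr.copy()
np.sqrt(arr, where=mask, out=via_ufunc)

# 3) np.where evaluates np.sqrt on every element first, so negative inputs
#    produce NaN (and a RuntimeWarning, silenced here) before being discarded.
with np.errstate(invalid="ignore"):
    via_where = np.where(mask, np.sqrt(arr), arr)

print(np.allclose(filled, via_ufunc), np.allclose(filled, via_where))
```

All three give the same result; they differ in how much work is done on elements that end up being discarded.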
TL;DR
Your specific example is looking for a representation of the mask itself. If you don't care about the type:
result = arr >= 0
If you do care about the type:
result = (arr >= 0).astype(int)
OR
result = np.clip(np.sign(arr) + 1, 0, 1).astype(int)
These solutions create a different array from the input. If you want to replace values in the same buffer,
mask = arr >= 0
arr[mask] = 1
arr[~mask] = 0
You can do something like this:
import numpy as np
a=np.array([-2,-1,0,1,2])
a[a>=0]=1
a[a<0]=0
>>> a
array([0, 0, 1, 1, 1])
An alternative to the above solutions is to combine a list comprehension with a ternary operator (note that this version only handles one-dimensional arrays).
my_array = np.array([-1.2, 3.0, -10.11, 5.2])
sol = np.asarray([0 if val < 0 else 1 for val in my_array])
Take a look at these sources:
https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
https://book.pythontips.com/en/latest/ternary_operators.html
Use numpy.vectorize():
import numpy as np
def unit(elem):
    if elem < 0:
        elem = 0
    else:
        elem = 1
    return elem  # without a return, the function would give None for every element
a = np.array([[1, 2, -0.5], [0.5, 2, 3]])
vfunc = np.vectorize(unit)
vfunc(a)
# array([[1, 1, 0], [1, 1, 1]])
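As a rough check that both the whole-array comparison and np.vectorize are independent of the array's shape, the sketch below runs them on inputs of different dimensionality. The input arrays are invented for illustration; np.vectorize is a convenience wrapper that still calls the Python function once per element, so the direct comparison is usually the faster choice:

```python
import numpy as np

def unit(elem):
    # Scalar version of the rule: 0 for negative values, 1 otherwise.
    return 0 if elem < 0 else 1

vfunc = np.vectorize(unit)  # convenience wrapper; still loops in Python

# Hypothetical inputs of different dimensionality.
inputs = [
    np.array([-1.5, 0.0, 2.0]),             # 1-D
    np.array([[1, 2, -0.5], [0.5, 2, 3]]),  # 2-D
    np.arange(-4, 4).reshape(2, 2, 2),      # 3-D
]

for arr in inputs:
    direct = (arr >= 0).astype(int)   # whole-array comparison
    looped = vfunc(arr)               # element-by-element call
    print(arr.shape, np.array_equal(direct, looped))
```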

What does sum(x%2==0) mean?? (python)

import numpy as np
x = np.array([1, -1, 2, 5, 7])
print(sum(x%2==0))
This is the code, and I can't understand what sum(x%2==0) means.
Does it mean summing the even numbers?
I'm studying for a school test, and my professor said the output of the above code is 1.
But I can't understand what sum(x%2==0) means.
x % 2 == 0 turns your array into [False, False, True, False, False],
because every element is replaced by a boolean that says whether the number is even (True) or odd (False).
Then the sum is evaluated, with False counting as 0 and True as 1:
0 + 0 + 1 + 0 + 0 = 1
import numpy as np
x = np.array([1, -1, 2, 5, 7])
# step 1: create an intermediate array which contains the modulo 2 of each element (if the element is even it will be True, otherwise False)
y = x % 2 == 0 # [False, False, True, False, False]
# step 2: sum the intermediate array up. In this case the False values count as 0 and the True values as 1. There is one True value so the sum is 1
z = sum(y) # 1
For your purposes, here's an explanation. For Stack Overflow's purposes, I'm recommending to close this question as it's more coding help than a novel coding question.
The operations in this expression are as follows:
# operation 1
intermediate_result_1 = x%2
# operation 2
intermediate_result_2 = (intermediate_result_1 == 0)
# operation 3
sum(intermediate_result_2)
Operation 1: the modulo operator returns the remainder when the first term is divided by the second term. Most basic operations (e.g. +, -, *, /, %, ==, !=) are implemented element-wise in NumPy, which means that the operation is performed independently on each element in the array. Thus, the output from operation 1 is:
intermediate_result_1 = np.array([1, 1, 0, 1, 1])
Operation 2: the same holds for the equality operator ==. Each element of the array is compared to the right-hand value, and the resulting array has True (or 1) where the equality holds, and False (or 0) otherwise.
intermediate_result_2 = np.array([False, False, True, False, False])
Operation 3: lastly, the built-in sum() iterates over the array and adds up all its values, with True counting as 1 and False as 0. Note that NumPy provides its own sum function, which also allows summing along individual dimensions. The sum of this array's elements is 1.
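A short sketch of the difference between counting the even numbers and summing them, and between the built-in sum and NumPy's own sum. The 2-D array m is an extra example added here, not part of the question:

```python
import numpy as np

x = np.array([1, -1, 2, 5, 7])
is_even = (x % 2 == 0)

print(sum(is_even))               # built-in sum: counts the True entries
print(np.sum(is_even))            # NumPy sum: same count
print(np.count_nonzero(is_even))  # explicit count of True entries
print(x[is_even].sum())           # sum of the even values themselves

# On a 2-D array the built-in sum adds the rows together instead of
# reducing to a single number, while np.sum reduces over all elements.
m = np.array([[1, 2], [4, 6]])
print(sum(m % 2 == 0))
print(np.sum(m % 2 == 0))
```

For the array in the question the count of even numbers is 1 and the sum of the even values is 2, which is why the two readings of the expression give different answers.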
NumPy makes it easy to operate on the whole array object.
As many answers already point out,
x%2==0 returns [False, False, True, False, False].
If you are still confused, try to understand it like this:
let's write a function that checks whether a value is even.
def is_even(ele):
    return ele%2==0
Then we use the map function:
map() function returns a map object (which is an iterator) of the
results after applying the given function to each item of a given
iterable (list, tuple etc.)
NOTE: copied from GeeksforGeeks
Then we take a simple list and map this function over it, like so:
l=[1, -1, 2, 5, 7] # this is not a np array
print(list(map(is_even, l))) # this prints [False, False, True, False, False]
print(sum(map(is_even, l))) # this prints 1

Fastest method to compare each character in string in a list of strings

I am working on a bioinformatics tool and built it using two loops to iterate over each character.
The inputs (i.e. seq1 and sequence) are strings of nucleotides such as 'AGATGCTAGTA' of identical lengths. The sequence_info is a list of all of the sequences.
It was incredibly slow, so I sped it up by using continue instead of adding zero and by storing bio_array as a NumPy array. Here's the new code.
def slow_function(seq1, sequence):
    calc = 0
    for i, nt in enumerate(seq1):
        if nt == sequence[i]:
            continue
        else:
            calc += bio_array[i]
    return float(calc)
for (sequence, sequence_location) in sequence_info:
    value = slow_function(seq1, sequence)
Using %%timeit in Jupyter notebooks, it still takes around 100 ms. I would need it to be around or below 1-5 ms. I have tried transforming the function into an iterator and using list comprehensions/map instead of loops, but these methods didn't have a significant effect.
I think that it might be possible to use NumPy, but I haven't been able to find a suitable method in the documentation or on Stack Overflow. As I need specific values from bio_array to be added together where there are mismatches in the sequence, I think I would need to compare each character in the string individually.
What would be the best way to make this code as fast as possible?
If I understand correctly, you wish to sum the elements of an array at the positions where the two sequences don't match. You can simply create character arrays from your sequences and then use NumPy boolean indexing to get the non-matching values. Here is a reduced example:
seq_a = np.array(list('ABCDEFGH'))
seq_b = np.array(list('ABCZEFZH'))
bio_array = np.array([1, 5, 9, 4, 3, 8, 2, 7])
Then, doing an element-wise comparison between seq_a and seq_b gives you:
>>> seq_a != seq_b
array([False, False, False, True, False, False, True, False])
You can then index bio_array with this result to get the relevant values and then sum them:
>>> bio_array[seq_a != seq_b]
array([4, 2])
>>> bio_array[seq_a != seq_b].sum()
6
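Since the question loops over many sequences, the same idea can be applied to all of them at once. The sketch below is an assumption-laden illustration, not the asker's code: to_codes is a hypothetical helper, the sequences are invented, and it assumes ASCII strings of equal length:

```python
import numpy as np

def to_codes(seq):
    # Hypothetical helper: view an ASCII string as an array of byte values.
    return np.frombuffer(seq.encode("ascii"), dtype=np.uint8)

seq1 = "ABCDEFGH"
sequences = ["ABCZEFZH", "ABCDEFGH", "XBCDEFGX"]   # illustrative data
bio_array = np.array([1, 5, 9, 4, 3, 8, 2, 7])

ref = to_codes(seq1)
# Stack all sequences into one (n_sequences, length) array.
stacked = np.vstack([to_codes(s) for s in sequences])

# Mismatch mask for every sequence at once; ref broadcasts across the rows.
mismatch = stacked != ref
# Sum of bio_array at the mismatching positions, one value per sequence.
scores = (mismatch * bio_array).sum(axis=1)
print(scores)
```

Converting the strings once up front and comparing in bulk avoids paying the per-call overhead for each sequence, which tends to matter more than the comparison itself when the sequences are short.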
You should accept @sshashank124's answer, but here's a quick bit of code to show what is going on and how much it differs:
import numpy as np
from timeit import timeit
def slow_function(seq1, seq2, costs):
    calc = 0
    for i, nt in enumerate(seq1):
        if nt == seq2[i]:
            continue
        else:
            calc += costs[i]
    return float(calc)
def shorter_slow_function(seq1, seq2, costs):
    return sum(costs[i] for i in range(len(seq1)) if seq1[i] != seq2[i])
def faster_numpy_function(seq1, seq2, costs):
    return costs[seq1 != seq2].sum()
x = np.array(list('ABCDE'))
y = np.array(list('XBCDY'))
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(timeit(lambda: slow_function(x, y, c)))
print(timeit(lambda: shorter_slow_function(x, y, c)))
print(timeit(lambda: faster_numpy_function(x, y, c)))
Results:
6.7421024
6.665790399999999
5.321171700000001

Python: element-wise comparison of array to non-array

I'm trying to plot some complex functions using numpy. Example of some working code:
import numpy as np
from PIL import Image
size = 1000
w = np.linspace(-10, 10, size)
x, y = np.meshgrid(w, w)
r = x + 1j*y
def f(q):
return np.angle(q)
z = f(r)
normalized = ((255/(np.amax(z) - np.amin(z)))*(z+abs(np.amin(z)))).astype(int)
data = [i for j in normalized for i in j]
img = Image.new('L', (size, size))
img.putdata(data[::-1]) #pixels are done bottom to top
img.show()
However, suppose I want the function f to have a simple comparison in it, like this:
def f(q):
    if np.abs(q) < 4:
        return 1
    else:
        return 0
I get the error
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
For the np.abs(q) < 4 check.
I did some digging and realized it's because Python is doing the operation on the entire r array, and it can't compare an array to an integer. So, I tried looking for ways to do element-wise comparisons.
This page looked promising: it says I can do element-wise comparisons by using np.less(a, b), so I tried
def f(q):
    if np.less(np.abs(q), 4):
        return 1
    else:
        return 0
and got the same ValueError. It seems as though both arguments for np.less() need to be arrays of the same size.
What I want is to compare each element of my array to a single, non-array quantity. I suppose I could make a dummy array of the same size filled with identical 4's, but there has to be a more elegant way of doing this.
The key is to return an array value instead of trying to coerce an array into a single bool, which is what if (some_array): keeps trying to do. There being no unambiguous way to decide what single boolean np.array([True, False]) should convert to, it doesn't even try.
So don't even branch:
def f(q):
    return abs(q) < 4
gives an array like
>>> f(np.array([1,3,5]))
array([ True, True, False], dtype=bool)
which as numbers will behave like
>>> f(np.array([1,3,5])).astype(int)
array([1, 1, 0])
and give
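Applied to the complex grid from the question, the branch-free version might look like the sketch below. The grid size is reduced to 5 purely for illustration. It also shows that np.less accepts a scalar as its second argument, so no dummy array of 4's is needed; the ValueError comes from the if statement, not from the comparison:

```python
import numpy as np

size = 5                      # small grid for illustration (the question uses 1000)
w = np.linspace(-10, 10, size)
x, y = np.meshgrid(w, w)
r = x + 1j * y

def f(q):
    # Element-wise test; the scalar 4 is broadcast against the whole array.
    return (np.abs(q) < 4).astype(int)

z = f(r)
print(z)

# np.less accepts a scalar second argument as well; the error in the
# question comes from `if`, not from the comparison itself.
same = np.less(np.abs(r), 4).astype(int)
print(np.array_equal(z, same))
```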
