Why does MultinomialNB output 0.5 when there is only one feature?

A naive Bayesian classification problem
Code
import numpy as np
X = np.array([[1], [2], [3], [4], [5], [6]], dtype=int)
Y = np.array([1, 1, 1, 0, 0, 0], dtype=int)
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.partial_fit(X, Y, [0, 1])
print(clf.predict_proba(X))
Output
[[ 0.5 0.5]
[ 0.5 0.5]
[ 0.5 0.5]
[ 0.5 0.5]
[ 0.5 0.5]
[ 0.5 0.5]]
MultinomialNB seems to compute only the prior probability.
How can I solve this problem?
For example, the expected output would be something like:
[[ 0.1 0.9]
[ 0.1 0.9]
[ 0.1 0.9]
[ 0.9 0.1]
[ 0.9 0.1]
[ 0.9 0.1]]
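
A possible explanation, sketched here rather than taken from the original thread: MultinomialNB normalises each class's feature probabilities so that they sum to 1 across features. With a single feature, that one probability is 1 for every class, so the likelihood carries no information and only the class priors remain. The snippet below checks this through `feature_log_prob_`, and tries `GaussianNB` as one alternative. The choice of `GaussianNB` is an assumption about what fits this data, not a recommendation from the question.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

X = np.array([[1], [2], [3], [4], [5], [6]], dtype=int)
Y = np.array([1, 1, 1, 0, 0, 0], dtype=int)

mnb = MultinomialNB().fit(X, Y)
# With one feature, each class's feature distribution has a single entry,
# so it normalises to probability 1 (log-probability 0) for every class.
print(mnb.feature_log_prob_)
# Only the (equal) class priors are left, hence 0.5 everywhere.
print(mnb.predict_proba(X))

# GaussianNB models the feature value itself (per-class mean and variance),
# so it can separate small values from large ones.
gnb = GaussianNB().fit(X, Y)
print(gnb.predict(X))
print(gnb.predict_proba(X))
```

The probabilities from `GaussianNB` will not be exactly the 0.1/0.9 values in the expected output above; they depend on the fitted means and variances.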

Related

Getting bincount of float values

I am wondering whether numpy has an easy function to count the values that fall within given ranges. For example:
import numpy as np
rand_vals = np.random.rand(10)
#Out > array([0.15068161, 0.51291888, 0.99576726, 0.05944532, 0.72641707,
#       0.09693093, 0.61988549, 0.19811334, 0.88184011, 0.16775108])
bins = np.linspace(0,1,11)
#Out> array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
#Expected Out > [2, 3, 0, 0, 0, 1, 1, 1, 1, 1]
#The first entry is 2 since there are two values between 0 and 0.1 (0.05944532, 0.09693093)
#The second entry is 3 since there are three values between 0.1 and 0.2 (0.15068161, 0.19811334, 0.16775108)
#So on ..

You can use numpy.histogram():
import numpy as np
rand_vals = np.random.rand(10)
bincount, bins = np.histogram(rand_vals, bins=np.linspace(0, 1, 11))
print(bincount) # e.g. [0 2 1 0 2 0 1 1 2 1] (the counts depend on the random draw)
print(bins) # => [0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
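
As a check, here is a small sketch that runs the same call on the exact values listed in the question instead of a fresh random draw. The name `counts` is only illustrative.

```python
import numpy as np

# The ten values printed in the question
rand_vals = np.array([0.15068161, 0.51291888, 0.99576726, 0.05944532, 0.72641707,
                      0.09693093, 0.61988549, 0.19811334, 0.88184011, 0.16775108])

# Eleven edges define ten bins of width 0.1
counts, edges = np.histogram(rand_vals, bins=np.linspace(0, 1, 11))
print(counts)  # [2 3 0 0 0 1 1 1 1 1]
```

This matches the expected output given in the question.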

Add a scalar to a numpy matrix based on the indices in a different numpy array

I'm sorry if this question isn't framed well. So I would rather explain with an example.
I have a numpy matrix:
a = np.array([[0.5, 0.8, 0.1], [0.6, 0.9, 0.3], [0.7, 0.4, 0.8], [0.8, 0.7, 0.6]])
And another numpy array as shown:
b = np.array([1, 0, 2, 2])
The values in b are guaranteed to be in range(a.shape[1]), and b.shape[0] == a.shape[0]. This is the operation I need to perform:
For every row index i, I need to subtract 1 from a[i][j], where j == b[i].
So in my example, a[0] == [0.5 0.8 0.1] and b[0] == 1. Therefore I need to subtract 1 from a[0][b[0]] so that a[0] = [0.5, -0.2, 0.1]. This has to be done for all rows of a. Is there a direct solution that avoids iterating through the rows or columns one by one?
Thanks.

Use numpy integer array indexing:
import numpy as np
a = np.array([[0.5, 0.8, 0.1], [0.6, 0.9, 0.3], [0.7, 0.4, 0.8], [0.8, 0.7, 0.6]])
b = np.array([1, 0, 2, 2])
a[np.arange(a.shape[0]), b] -= 1
print(a)
Output
[[ 0.5 -0.2 0.1]
[-0.4 0.9 0.3]
[ 0.7 0.4 -0.2]
[ 0.8 0.7 -0.4]]
As an alternative, use np.subtract.at (starting again from the original a):
np.subtract.at(a, (np.arange(a.shape[0]), b), 1)
print(a)
Output
[[ 0.5 -0.2 0.1]
[-0.4 0.9 0.3]
[ 0.7 0.4 -0.2]
[ 0.8 0.7 -0.4]]
The main idea is that:
np.arange(a.shape[0]) # shape[0] is the number of rows
generates the row indices:
[0 1 2 3]
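
The two approaches above give the same result here because every row index appears exactly once. They differ when an index repeats, which the following sketch illustrates on a small 1-D array (the arrays `x`, `y` and `idx` are made up for the illustration):

```python
import numpy as np

idx = np.array([0, 0, 1])  # index 0 appears twice

x = np.zeros(3)
x[idx] -= 1                # buffered: the repeated index is applied only once
print(x)                   # [-1. -1.  0.]

y = np.zeros(3)
np.subtract.at(y, idx, 1)  # unbuffered: every occurrence is applied
print(y)                   # [-2. -1.  0.]
```

So np.subtract.at is the one to reach for if the same position may need to be updated more than once.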

Tensorflow sparse tensor with vector value to dense tensor

I have some sparse indices:
[[0 0]
[0 1]
[1 0]
[1 1]
[1 2]
[2 0]]
The corresponding value of each index is:
[[0.1 0.2 0.3]
[0.4 0.5 0.6]
[0.7 0.8 0.9]
[1.0 1.1 1.2]
[1.3 1.4 1.5]
[1.6 1.7 1.8]]
How to convert the 6x3 value tensor to 3x3x3 dense tensor in tensorflow? The value for indices not specified in indices is zero vector [0. 0. 0.]. The dense tensor is just like this:
[[[0.1 0.2 0.3]
[0.4 0.5 0.6]
[0.0 0.0 0.0]]
[[0.7 0.8 0.9]
[1.0 1.1 1.2]
[1.3 1.4 1.5]]
[[1.6 1.7 1.8]
[0.0 0.0 0.0]
[0.0 0.0 0.0]]]

You can do that with tf.scatter_nd (the code below uses the TensorFlow 1.x session API):
import tensorflow as tf
with tf.Graph().as_default(), tf.Session() as sess:
    indices = tf.constant(
        [[0, 0],
         [0, 1],
         [1, 0],
         [1, 1],
         [1, 2],
         [2, 0]])
    values = tf.constant(
        [[0.1, 0.2, 0.3],
         [0.4, 0.5, 0.6],
         [0.7, 0.8, 0.9],
         [1.0, 1.1, 1.2],
         [1.3, 1.4, 1.5],
         [1.6, 1.7, 1.8]])
    out = tf.scatter_nd(indices, values, [3, 3, 3])
    print(sess.run(out))
Output:
[[[0.1 0.2 0.3]
[0.4 0.5 0.6]
[0. 0. 0. ]]
[[0.7 0.8 0.9]
[1. 1.1 1.2]
[1.3 1.4 1.5]]
[[1.6 1.7 1.8]
[0. 0. 0. ]
[0. 0. 0. ]]]
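
To make the scatter operation concrete, here is a rough NumPy analogue of the same idea. It is only an illustration of what the scatter does, not a replacement for the TensorFlow code. The name `dense` is illustrative.

```python
import numpy as np

indices = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [1, 2], [2, 0]])
values = np.array([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6],
                   [0.7, 0.8, 0.9],
                   [1.0, 1.1, 1.2],
                   [1.3, 1.4, 1.5],
                   [1.6, 1.7, 1.8]])

# Start from all zeros, then write each value vector at its (row, column) position
dense = np.zeros((3, 3, 3))
dense[indices[:, 0], indices[:, 1]] = values
print(dense)
```

One difference to keep in mind: if an index appeared twice, tf.scatter_nd would sum the corresponding values, whereas this NumPy assignment would keep only one of them.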

I am not aware of a way to do this in TensorFlow with a reshape-style function alone. The only approach I could think of is an iterative one: build a nested list and convert it back to a tensor. This is probably not the most efficient solution, but it may work for your code.
import numpy as np
import tensorflow as tf
# list of indices
idx = [[0, 0], [0, 1], [1, 0], [1, 1], [1, 2], [2, 0]]
# Original tensor to reshape
dense_tensor = tf.Variable([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2], [1.3, 1.4, 1.5], [1.6, 1.7, 1.8]])
# Temporary nested list, later converted to a tensor
c = np.zeros([3, 3, 3]).tolist()
# Position of the next unused row of dense_tensor; it must keep counting across rows of c
count = 0
for i in range(3):
    for j in range(3):
        if [i, j] in idx:
            c[i][j] = dense_tensor[count]
            count = count + 1
        else:
            c[i][j] = tf.Variable([0, 0, 0], dtype=tf.float32)
# Convert the nested list to a tensor
converted_tensor = tf.convert_to_tensor(c, dtype=tf.float32)
You can adjust the ranges to the size of the tensor you want; here they are 3 because you wanted a 3x3x3 tensor. Note that this relies on idx being listed in row-major order, matching the order of the rows of the value tensor. I hope this helps!

How to split numpy array based on rows and keep those values into different arrays

I have this numpy array:
sample=
[[0.8 0.2 0.7 0.1]
[0.7 0.5 0.5 0.0]
[0.7 0.5 0.5 0.1]
[0.7 0.5 0.3 0.3]
[0.9 0.6 0.2 0.1]
[0.8 0.6 0.5 0.0]]
I want to split it by rows (6 of them) and put each row into a separate numpy array.
For example:
sample_row_1 = [0.8 0.2 0.7 0.1]
sample_row_2 = [0.7 0.5 0.5 0.0]
sample_row_3 = [0.7 0.5 0.5 0.1]
sample_row_4 = [0.7 0.5 0.3 0.3]
sample_row_5 = [0.9 0.6 0.2 0.1]
sample_row_6 = [0.8 0.6 0.5 0.0]

There should be a compelling reason why basic array indexing via A[i] is insufficient and you need to extract to multiple variables.
And, if there is a compelling reason, you shouldn't define a variable number of variables. Use a dictionary instead:
import numpy as np
A = np.arange(16).reshape((4, 4))
arrs = {i: A[i] for i in range(A.shape[0])}
print(arrs)
{0: array([0, 1, 2, 3]),
1: array([4, 5, 6, 7]),
2: array([ 8, 9, 10, 11]),
3: array([12, 13, 14, 15])}

Or put them in a list of ndarrays:
import numpy as np
A = np.arange(16).reshape((4, 4))
y = [a for a in A]
print(y)
# and access by index
print(y[0])
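
Applied to the array from the question, the dictionary approach could look like the sketch below. The key names simply mirror the variable names the question asks for, and the final unpacking line is an extra illustration that only makes sense when the number of rows is fixed and known in advance.

```python
import numpy as np

sample = np.array([[0.8, 0.2, 0.7, 0.1],
                   [0.7, 0.5, 0.5, 0.0],
                   [0.7, 0.5, 0.5, 0.1],
                   [0.7, 0.5, 0.3, 0.3],
                   [0.9, 0.6, 0.2, 0.1],
                   [0.8, 0.6, 0.5, 0.0]])

# Keys mirror the names used in the question (sample_row_1 ... sample_row_6)
rows = {f"sample_row_{i + 1}": row for i, row in enumerate(sample)}
print(rows["sample_row_1"])

# Tuple unpacking also works, but breaks as soon as the row count changes
r1, r2, r3, r4, r5, r6 = sample
print(r6)
```

Each row obtained this way is a view into sample, so call .copy() on it if the rows need to be modified independently.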

Write coordinates to file from 4 numpy arrays Python

There are 4 numpy matrices, for example 3x3, with coordinates:
Xg [[-0.5 0.3 1.1]
[-0.5 0.3 1.1]
[-0.5 0.3 1.1]]
Yg [[-0.5 -0.5 -0.5]
[ 0.3 0.3 0.3]
[ 1.1 1.1 1.1]]
u [[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
v [[ 1.03793 0.25065 -0.28944]
[-0.21591 -0.93072 -0.10047]
[-0.08591 -0.11284 -0.06082]]
How can I write the coordinates to a file like this:
# each entry in the file should look like: ,{{x_coordinate,y_coordinate},{u_coordinate,v_coordinate}}
file = open("coordinates.txt", "w")
file.write(",{{" + x + "," + y + "},{" + u + "," + v + "}}")
#Output
,{{-0.5,-0.5},{1,1.03793}}, {{0.3,-0.5},{1,0.25065}}, {{1.1,-0.5},{1,-0.28944}},...

You could use nested for loops, like this:
X = [[-0.5, 0.3, 1.1],
     [-0.5, 0.3, 1.1],
     [-0.5, 0.3, 1.1]]
Y = [[-0.5, -0.5, -0.5],
     [0.3, 0.3, 0.3],
     [1.1, 1.1, 1.1]]
U = [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]
V = [[1.03793, 0.25065, -0.28944],
     [-0.21591, -0.93072, -0.10047],
     [-0.08591, -0.11284, -0.06082]]
with open("coordinates.txt", "w") as f:
    for i in range(3):
        for j in range(3):
            f.write("{{{0},{1}}}, {{{2}, {3}}}\n".format(X[j][i], Y[j][i], U[j][i], V[j][i]))
Which gives
{-0.5,-0.5}, {1, 1.03793}
{-0.5,0.3}, {1, -0.21591}
{-0.5,1.1}, {1, -0.08591}
{0.3,-0.5}, {1, 0.25065}
{0.3,0.3}, {1, -0.93072}
{0.3,1.1}, {1, -0.11284}
{1.1,-0.5}, {1, -0.28944}
{1.1,0.3}, {1, -0.10047}
{1.1,1.1}, {1, -0.06082}
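
If the goal is the single-line format shown in the question, a variation along the following lines might be closer. It is a sketch under a few assumptions: the inputs are numpy arrays as in the question, entries are visited row by row, the leading comma from the question's sample is left out, and the `:g` format (which keeps at most six significant digits) is precise enough.

```python
import numpy as np

Xg = np.array([[-0.5, 0.3, 1.1]] * 3)
Yg = np.array([[-0.5] * 3, [0.3] * 3, [1.1] * 3])
u = np.ones((3, 3))
v = np.array([[1.03793, 0.25065, -0.28944],
              [-0.21591, -0.93072, -0.10047],
              [-0.08591, -0.11284, -0.06082]])

# ravel() walks each array row by row, so the four values stay aligned.
# The ":g" format prints 1.0 as "1", as in the question's sample output.
entries = [
    "{{" + f"{x:g},{y:g}" + "},{" + f"{p:g},{q:g}" + "}}"
    for x, y, p, q in zip(Xg.ravel(), Yg.ravel(), u.ravel(), v.ravel())
]

with open("coordinates.txt", "w") as f:
    f.write(", ".join(entries))

print(entries[0])  # {{-0.5,-0.5},{1,1.03793}}
```

Building the list first and joining it afterwards avoids having to special-case the separator for the first or last entry.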
