searching k nearest neighbors in numpy


I'm new to Python. I want to use numpy and sklearn to do KNN. However, my data contains non-numeric columns. I set the dtype of genfromtxt to None, but the array then looks like this:
[('ADT1_YEAST', 0.58, 0.61, 0.47, 0.13, 0.5, 0.0, 0.48, 0.22, 'MIT')
('ADT2_YEAST', 0.43, 0.67, 0.48, 0.27, 0.5, 0.0, 0.53, 0.22, 'MIT')
('ADT3_YEAST', 0.64, 0.62, 0.49, 0.15, 0.5, 0.0, 0.53, 0.22, 'MIT') ...,
('ZNRP_YEAST', 0.67, 0.57, 0.36, 0.19, 0.5, 0.0, 0.56, 0.22, 'ME2')
('ZUO1_YEAST', 0.43, 0.4, 0.6, 0.16, 0.5, 0.0, 0.53, 0.39, 'NUC')
('G6PD_YEAST', 0.65, 0.54, 0.54, 0.13, 0.5, 0.0, 0.53, 0.22, 'CYT')]
Then I get a "data type not understood" error from the NearestNeighbors function.
Here is my code:
import numpy as np
from sklearn.neighbors import NearestNeighbors
npGem = np.genfromtxt('temp.data', dtype=None)
X = np.array(npGem)
nbrs = NearestNeighbors(n_neighbors=5, algorithm='ball_tree').fit(X)
Can anyone show me how to get the data read properly? Thanks in advance.

If I understand the problem, you're really asking how to encode the categorical variables so that they can be properly interpreted by the nearest neighbors algorithm. You can do this with sklearn, as explained in 4.2.4. Encoding categorical features. If, on the other hand, you have incomplete features, see 4.2.6. Imputation of missing values.
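As a rough illustration of what encoding the class column means, here is a NumPy-only sketch. It is a stand-in for the sklearn encoders mentioned above, not a copy of them; the labels are taken from the last column of the question's data, and the variable names are made up for the example.

```python
import numpy as np

# Class labels copied from the last column of the question's sample rows
labels = np.array(["MIT", "MIT", "MIT", "ME2", "NUC", "CYT"])

# Integer encoding: each distinct label gets an index into the sorted categories
categories, codes = np.unique(labels, return_inverse=True)

# One-hot encoding: one column per category, a single 1.0 in each row
one_hot = np.eye(len(categories))[codes]

print(categories)
print(codes)
print(one_hot.shape)
```

Integer codes impose an artificial ordering on the classes, so for a distance-based method such as nearest neighbors the one-hot form is usually the safer choice.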

I think you need to get the data into a matrix properly. I typically do something like this:
import numpy as np

features = []  # list of lists of the feature variables
classes = []   # list of the target variables
with open('temp.data') as f:  # the file from the question
    for line in f:
        line = line.strip().split()  # split the line into pieces on any whitespace
        features.append(line[1:-1])  # or whatever indices your features are in
        classes.append(line[-1])     # or whatever index your target variable is in
classes = np.array(classes)
features = np.array(features, dtype=float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent
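To show the parsing step end to end, here is a small sketch that reads three rows copied from the question out of an in-memory buffer (standing in for temp.data, which is assumed to be whitespace-delimited) and then finds neighbors with a brute-force NumPy distance matrix. The brute-force search is an illustrative substitute for sklearn's NearestNeighbors, not what the question uses.

```python
import io
import numpy as np

# Three rows copied from the question; a real run would open 'temp.data' instead
sample = io.StringIO(
    "ADT1_YEAST 0.58 0.61 0.47 0.13 0.5 0.0 0.48 0.22 MIT\n"
    "ADT2_YEAST 0.43 0.67 0.48 0.27 0.5 0.0 0.53 0.22 MIT\n"
    "ADT3_YEAST 0.64 0.62 0.49 0.15 0.5 0.0 0.53 0.22 MIT\n"
)

features, classes = [], []
for line in sample:
    parts = line.split()
    features.append(parts[1:-1])  # drop the name (first) and class (last) columns
    classes.append(parts[-1])

X = np.array(features, dtype=float)

# Pairwise Euclidean distances between all rows (fine for small data sets)
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

k = 2
# Indices of the k closest rows for each row; column 0 is the row itself
neighbors = np.argsort(dist, axis=1)[:, :k]
print(neighbors)
```

Once X is a plain float matrix like this, it can also be passed to NearestNeighbors(...).fit(X) as in the question.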

Related

Is there a way to fetch item from a particular index from lists within a dictionary(values)

I have a dictionary with keys named step1, step2, step3 and so on, and each key's value is a list with, say, 5 items.
My requirement is to get the average of the items at each index across all the lists:
mydict = {
    'step1': [0.94, 0.94, 0.94, 0.96, 0.94],
    'step2': [0.94, 0.94, 0.94, 0.94, 0.94],
    'step3': [0.92, 0.86, 0.98, 0.92, 0.94]
}
I can hard-code it like this, but I want to make it more dynamic:
avg_each_item1 = (
    mydict['step1'][0]
    + mydict['step2'][0]
    + mydict['step3'][0]
    + mydict['step4'][0]
    + mydict['step5'][0]
) / 5
Any quick tips are highly appreciated.

Please have a look at this snippet. Does it solve your problem?
import numpy as np  # make sure you have numpy installed
mydict = {'step1': [0.94, 0.94, 0.94, 0.96, 0.94],
          'step2': [0.94, 0.94, 0.94, 0.94, 0.94],
          'step3': [0.92, 0.86, 0.98, 0.92, 0.94]}
for k, v in mydict.items():
    # Average of each step's own list (per key, not per index)
    print("Average for", k, ":", np.average(v))
I hope this snippet helps.
Happy coding!

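Assuming the size of each list is known in advance, you can do it in one line (the output below appears to come from the asker's full five-step dictionary, not the three-step excerpt shown in the question):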
>>> [sum([mydict[j][i] for j in mydict])/len(mydict.keys()) for i in range(5)]
[0.9280000000000002, 0.892, 0.9640000000000001, 0.932, 0.9399999999999998]
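If the list length is not known in advance, one possible variant is to transpose the lists with zip and average each column. This is a sketch using only the three steps shown in the question, so its numbers will differ from the five-step output above.

```python
mydict = {
    'step1': [0.94, 0.94, 0.94, 0.96, 0.94],
    'step2': [0.94, 0.94, 0.94, 0.94, 0.94],
    'step3': [0.92, 0.86, 0.98, 0.92, 0.94],
}

# zip(*values) groups the items that share an index across all lists
averages = [sum(column) / len(column) for column in zip(*mydict.values())]
print(averages)
```

zip stops at the shortest list, so lists of unequal length would be silently truncated.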

Append in an array results in a list Python

I have the following code
import random

points = candies  # candies is the asker's 2D array of points
K = 5
centers = []
for i in range(K):
    centers.append(random.choice(points))
centers
which basically results in a list of arrays:
[array([0.6 , 0.92, 0.29]),
array([0.99, 0.23, 0.45]),
array([0.65, 0.6 , 0.03]),
array([0.21, 0.22, 0.55]),
array([0.62, 0.84, 0.83])]
What I want would be a single array like
array([[0.6 , 0.92, 0.29],
       [0.99, 0.23, 0.45],
       [0.65, 0.6 , 0.03],
       [0.21, 0.22, 0.55],
       [0.62, 0.84, 0.83]])
What do I have to change?

Either convert the list of arrays to a 2D array:
np.array(centers)
Or start right from an empty array and populate it:
centers = np.empty((K, 3))
for i in range(K):
    centers[i] = random.choice(points)
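A third option, sketched here as an assumption rather than as part of the answer above, is to skip the loop and pick row indices with NumPy's random generator. The points array below is a hypothetical stand-in for the asker's candies, reusing the rows printed in the question.

```python
import numpy as np

# Hypothetical stand-in for the asker's `candies` array
points = np.array([
    [0.60, 0.92, 0.29],
    [0.99, 0.23, 0.45],
    [0.65, 0.60, 0.03],
    [0.21, 0.22, 0.55],
    [0.62, 0.84, 0.83],
])
K = 5

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
# Draw K row indices with replacement, mirroring repeated random.choice calls
idx = rng.choice(len(points), size=K, replace=True)
centers = points[idx]  # fancy indexing returns a (K, 3) array directly
print(centers.shape)  # (5, 3)
```

Passing replace=False instead would guarantee K distinct rows, which is often what you want for initial cluster centers.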

Finding the probability of a variable in collection of lists

I have a selection of lists of variables
import numpy.random as npr
w = [0.02, 0.03, 0.05, 0.07, 0.11, 0.13, 0.17]
x = 1
y = False
z = [0.12, 0.2, 0.25, 0.05, 0.08, 0.125, 0.175]
v = npr.choice(w, x, y, z)
I want to find the probability of the value v being a particular value, e.g. False or 0.12.
How do I do this?
Here's what I've tried:
import numpy.random as npr
import math
w = [0.02, 0.03, 0.05, 0.07, 0.11, 0.13, 0.17]
x = 1
y = False
z = [0.12, 0.2, 0.25, 0.05, 0.08, 0.125, 0.175]
v = npr.choice(w, x, y, z)
from collections import Counter
c = Counter(0.02, 0.03, 0.05, 0.07, 0.11, 0.13, 0.17,1,False,0.12, 0.2, 0.25, 0.05, 0.08, 0.125, 0.175)
def probability(0.12):
    return float(c[v]/len(w,x,y,z))
This gives me an "invalid syntax" error at 0.12.

There are several issues in the code, I think you want the following:
import numpy.random as npr
import math
from collections import Counter
def probability(v=0.12):
    return float(c[v]/len(combined))
w = [0.02, 0.03, 0.05, 0.07, 0.11, 0.13, 0.17]
x = [1]
y = [False]
z = [0.12, 0.2, 0.25, 0.05, 0.08, 0.125, 0.175]
combined = w + x + y + z
v = npr.choice(combined)
c = Counter(combined)
print(probability())
print(probability(v=0.05))
1) def probability(0.12) is not valid syntax; a function parameter has to be a name, which can also have a default value (above I use 0.12).
2) len(w, x, y, z) does not work either, since len takes a single argument; you are probably looking for a list that combines all the elements of w, x, y and z. I put all of those in the list combined.
3) One would also have to put in an additional check, in case the user passes e.g. v=12345, which is not included in combined (I leave this to you).
The above will print
0.0625
0.125
which gives the expected outcome.
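For point 3, one possible sketch is shown below. Because a Counter returns 0 for keys it has never seen, an unknown value already yields a probability of 0.0 instead of raising; the empty-list guard and the function signature are assumptions added for illustration.

```python
from collections import Counter

w = [0.02, 0.03, 0.05, 0.07, 0.11, 0.13, 0.17]
z = [0.12, 0.2, 0.25, 0.05, 0.08, 0.125, 0.175]
combined = w + [1] + [False] + z


def probability(value, values=combined):
    """Relative frequency of value in values (0.0 if it never occurs)."""
    if not values:
        raise ValueError("values must not be empty")
    return Counter(values)[value] / len(values)


print(probability(0.12))   # one occurrence out of 16 values
print(probability(12345))  # not present, so the count is 0
```

Note that Python treats 1 == True and 0 == False, so a Counter would count True together with 1 and False together with 0 if both appeared in the list.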

Original arguments get overwritten

So I have this function in Python:
def newk(kor, flds):
    field=0.5*flds
    knw=[]
    for i in range(flds):
        ktemp=kor
        if ktemp[2]+i>field:
            ktemp[2]-=(i-1)
        else:
            ktemp[2]+=i
        knw+=[ktemp]
        print knw
        print ktemp
        print kor, '\n'
    return knw
which is called by:
knew=newk(kvals, folds)
My original kvals gets overwritten for some reason; kvals is a list.
Also, ktemp keeps accumulating the way knw is supposed to, and it throws
everything off. My output looks like this:
[[0.05, 0.05, 0.166667]] [0.05, 0.05, 0.166667] [0.05, 0.05, 0.166667]
[[0.05, 0.05, 1.166667], [0.05, 0.05, 1.166667]] [0.05, 0.05,
1.166667] [0.05, 0.05, 1.166667]
[[0.05, 0.05, -0.8333330000000001], [0.05, 0.05, -0.8333330000000001],
[0.05, 0.05, -0.8333330000000001]] [0.05, 0.05, -0.8333330000000001]
[0.05, 0.05, -0.8333330000000001]
K point values are: [0.05, 0.05, -0.8333330000000001] (original kvals was [0.05,0.05,0.166667])
But I need my output to look like this: knw would be [[0.05, 0.05, 0.166667], [0.05, 0.05, 1.166667], [0.05, 0.05, -0.833333]] and kvals would be [0.05, 0.05, 0.166667].
Also, when I change ktemp=kor in the loop to the constant ktemp=[0.05, 0.05, 0.166667], everything works.

When you write ktemp = kor, you end up with two names pointing at the same list object, so a modification to ktemp is the same as modifying kor. If you want a copy of the list, you need to write ktemp = kor[:] (assuming kor contains just numbers; if you want a 'deep copy' of a list of complex objects, that's a different issue).
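Applied to the function from the question, the fix might look like the sketch below. It is written for Python 3, drops the debugging prints, and assumes folds is 3, which the question does not state but which reproduces the desired output.

```python
def newk(kor, flds):
    field = 0.5 * flds
    knw = []
    for i in range(flds):
        ktemp = kor[:]  # shallow copy, so kor itself is never modified
        if ktemp[2] + i > field:
            ktemp[2] -= (i - 1)
        else:
            ktemp[2] += i
        knw.append(ktemp)
    return knw


kvals = [0.05, 0.05, 0.166667]
folds = 3  # assumed value, not given in the question
knew = newk(kvals, folds)
print(knew)
print(kvals)  # still the original list
```

Each iteration now starts from a fresh copy of kor, so the three entries in knw are independent lists and kvals keeps its original values.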

Creating and accessing list of lists in python

I am a bit new to Python; I started today. My code looks like this:
testcases=[
(([0.5,0.4,0.3],'HHTH'),[0.4166666666666667, 0.432, 0.42183098591549295, 0.43639398998330553]),
(([0.14,0.32,0.42,0.81,0.21],'HHHTTTHHH'),[0.5255789473684211, 0.6512136991788505, 0.7295055220497553, 0.6187139453483192, 0.4823974597714815, 0.3895729901052968, 0.46081730193074644, 0.5444108434105802, 0.6297110187222278]),
(([0.14,0.32,0.42,0.81,0.21],'TTTHHHHHH'),[0.2907741935483871, 0.25157009005730924, 0.23136284577678012, 0.2766575695593804, 0.3296000585271367, 0.38957299010529806, 0.4608173019307465, 0.5444108434105804, 0.6297110187222278]),
(([0.12,0.45,0.23,0.99,0.35,0.36],'THHTHTTH'),[0.28514285714285714, 0.3378256513026052, 0.380956725493104, 0.3518717367468537, 0.37500429586037076, 0.36528605387582497, 0.3555106542906013, 0.37479179323540324]),
(([0.03,0.32,0.59,0.53,0.55,0.42,0.65],'HHTHTTHTHHT'),[0.528705501618123, 0.5522060353798126, 0.5337142767315369, 0.5521920592821695, 0.5348391689038525, 0.5152373451083692, 0.535385450497415, 0.5168208803156963, 0.5357708613431963, 0.5510509656933194, 0.536055356823069])]
print 'Inputs'
print '======'
for inputs,output in testcases:
    print inputs[0]
print 'Outputs'
print '======='
for inputs,output in testcases:
    print output[0]
The above code gives the following output:
Inputs
======
[0.5, 0.4, 0.3]
[0.14, 0.32, 0.42, 0.81, 0.21]
[0.14, 0.32, 0.42, 0.81, 0.21]
[0.12, 0.45, 0.23, 0.99, 0.35, 0.36]
[0.03, 0.32, 0.59, 0.53, 0.55, 0.42, 0.65]
Outputs
=======
0.416666666667
0.525578947368
0.290774193548
0.285142857143
0.528705501618
But I need to access each whole row of the testcases list above, so that the output looks as follows. What should the code be?
([0.5,0.4,0.3],'HHTH'),[0.4166666666666667, 0.432, 0.42183098591549295, 0.43639398998330553])

This will print what you're after (if that's what you mean).
>>> for inputs,outputs in testcases:
...     print '%r, %r' % (inputs, outputs)
([0.5, 0.4, 0.3], 'HHTH'), [0.4166666666666667, 0.432, 0.42183098591549295, 0.43639398998330553]
([0.14, 0.32, 0.42, 0.81, 0.21], 'HHHTTTHHH'), [0.5255789473684211, 0.6512136991788505, 0.7295055220497553, 0.6187139453483192, 0.4823974597714815, 0.3895729901052968, 0.46081730193074644, 0.5444108434105802, 0.6297110187222278]
([0.14, 0.32, 0.42, 0.81, 0.21], 'TTTHHHHHH'), [0.2907741935483871, 0.25157009005730924, 0.23136284577678012, 0.2766575695593804, 0.3296000585271367, 0.38957299010529806, 0.4608173019307465, 0.5444108434105804, 0.6297110187222278]
([0.12, 0.45, 0.23, 0.99, 0.35, 0.36], 'THHTHTTH'), [0.28514285714285714, 0.3378256513026052, 0.380956725493104, 0.3518717367468537, 0.37500429586037076, 0.36528605387582497, 0.3555106542906013, 0.37479179323540324]
([0.03, 0.32, 0.59, 0.53, 0.55, 0.42, 0.65], 'HHTHTTHTHHT'), [0.528705501618123, 0.5522060353798126, 0.5337142767315369, 0.5521920592821695, 0.5348391689038525, 0.5152373451083692, 0.535385450497415, 0.5168208803156963, 0.5357708613431963, 0.5510509656933194, 0.536055356823069]
And if you really require the extra ) at the end of the line, change the print statement to:
print '%r, %r)' % (inputs, outputs)

It seems like you need a nested loop for that, e.g.:
for inputs,outputs in testcases:
    for output in outputs:
        print output
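For readers on Python 3, where print is a function, both answers could be sketched as follows. Only the first two test cases from the question are reproduced, and the names rows and flat_outputs are made up for the example.

```python
testcases = [
    (([0.5, 0.4, 0.3], 'HHTH'),
     [0.4166666666666667, 0.432, 0.42183098591549295, 0.43639398998330553]),
    (([0.14, 0.32, 0.42, 0.81, 0.21], 'HHHTTTHHH'),
     [0.5255789473684211, 0.6512136991788505, 0.7295055220497553,
      0.6187139453483192, 0.4823974597714815, 0.3895729901052968,
      0.46081730193074644, 0.5444108434105802, 0.6297110187222278]),
]

# One formatted line per test case: the (inputs, sequence) pair, then its outputs
rows = ['%r, %r' % (inputs, outputs) for inputs, outputs in testcases]
for row in rows:
    print(row)

# Nested loop, flattened: every individual output value in order
flat_outputs = [value for _, outputs in testcases for value in outputs]
print(len(flat_outputs))
```

Each element of testcases is a two-item tuple, so unpacking it as inputs, outputs in the for statement gives direct access to the whole row.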
