I have an array arr100 = np.ones(100). I need to replace these values with decimals,
where arr100[0] has a value 1, arr100[1] = 1/2, arr100[2] = 1/3,
and so on until arr100[99] = 1/100.
How can I do this using a for loop in Python?
You can do something like this:
import numpy as np
arr100 = np.ones(100)
for n in range(1, 101):
    arr100[n-1] /= n
which changes arr100 to:
array([1. , 0.5 , 0.33333333, 0.25 , 0.2 ,
0.16666667, 0.14285714, 0.125 , 0.11111111, 0.1 ,
[....]
Alternatively, assign each element by its index:
for i in range(100):
    arr100[i] = 1/(i+1)
You don't actually need an imported module (numpy) to do this. It can be done with a plain Python list.
So you can just do this:
result = [1/(i+1) for i in range(100)]
print(result)
which returns this:
[1.0, 0.5, 0.3333333333333333, 0.25, 0.2, 0.16666666666666666, 0.14285714285714285, 0.125, 0.1111111111111111, 0.1, 0.09090909090909091, 0.08333333333333333, 0.07692307692307693, 0.07142857142857142, 0.06666666666666667, 0.0625, 0.058823529411764705, 0.05555555555555555, 0.05263157894736842, 0.05, 0.047619047619047616, 0.045454545454545456, 0.043478260869565216, 0.041666666666666664, 0.04, 0.038461538461538464, 0.037037037037037035, 0.03571428571428571, 0.034482758620689655, 0.03333333333333333, 0.03225806451612903, 0.03125, 0.030303030303030304, 0.029411764705882353, 0.02857142857142857, 0.027777777777777776, 0.02702702702702703, 0.02631578947368421, 0.02564102564102564, 0.025, 0.024390243902439025, 0.023809523809523808, 0.023255813953488372, 0.022727272727272728, 0.022222222222222223, 0.021739130434782608, 0.02127659574468085, 0.020833333333333332, 0.02040816326530612, 0.02, 0.0196078431372549, 0.019230769230769232, 0.018867924528301886, 0.018518518518518517, 0.01818181818181818, 0.017857142857142856, 0.017543859649122806, 0.017241379310344827, 0.01694915254237288, 0.016666666666666666, 0.01639344262295082, 0.016129032258064516, 0.015873015873015872, 0.015625, 0.015384615384615385, 0.015151515151515152, 0.014925373134328358, 0.014705882352941176, 0.014492753623188406, 0.014285714285714285, 0.014084507042253521, 0.013888888888888888, 0.0136986301369863, 0.013513513513513514, 0.013333333333333334, 0.013157894736842105, 0.012987012987012988, 0.01282051282051282, 0.012658227848101266, 0.0125, 0.012345679012345678, 0.012195121951219513, 0.012048192771084338, 0.011904761904761904, 0.011764705882352941, 0.011627906976744186, 0.011494252873563218, 0.011363636363636364, 0.011235955056179775, 0.011111111111111112, 0.01098901098901099, 0.010869565217391304, 0.010752688172043012, 0.010638297872340425, 0.010526315789473684, 0.010416666666666666, 0.010309278350515464, 0.01020408163265306, 0.010101010101010102, 0.01]
Or you could do this to get the same values in a numpy array:
import numpy as np
arr100 = np.ones(100)
for i, j in enumerate(arr100):
    arr100[i] = 1/(i+1)
print(arr100)
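If the for loop is not a hard requirement, the same array can also be built with a single element-wise division. This is only a minimal sketch, assuming NumPy is installed; the names `denominators` and `reciprocals` are illustrative and do not come from the answers above.

```python
import numpy as np

# Denominators 1, 2, ..., 100 as floats.
denominators = np.arange(1, 101, dtype=float)

# Element-wise division: reciprocals[k] == 1 / (k + 1).
reciprocals = 1.0 / denominators

print(reciprocals[:4])
```

The division is applied to the whole array at once, so no index arithmetic such as `n-1` is needed.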
I have a dictionary with keys named step1, step2, step3, and so on, and each key maps to a list of, say, 5 items.
I need the average of the items at each position across all keys in the dictionary:
mydict = {
'step1': [0.94, 0.94, 0.94, 0.96, 0.94],
'step2': [0.94, 0.94, 0.94, 0.94, 0.94],
'step3': [0.92, 0.86, 0.98, 0.92, 0.94]
}
I can hard-code it like this, but I want to make it more dynamic:
avg_each_item1 = (
mydict['step1'][0]
+ mydict['step2'][0]
+ mydict['step3'][0]
+ mydict['step4'][0]
+ mydict['step5'][0]
) / 5
Any quick tips would be highly appreciated.
Please have a look at this snippet. Does it solve your problem?
import numpy as np  # make sure you have 'numpy' installed
mydict = {
    'step1': [0.94, 0.94, 0.94, 0.96, 0.94],
    'step2': [0.94, 0.94, 0.94, 0.94, 0.94],
    'step3': [0.92, 0.86, 0.98, 0.92, 0.94],
}
for k, v in mydict.items():
    # Calculate the average of the list stored under each step.
    print("Average For", k, ":", np.average(v))
I hope this snippet helps.
Happy Coding!
Assuming the size of each list is known in advance, you can do it in one line:
>>> [sum(mydict[j][i] for j in mydict)/len(mydict) for i in range(5)]
[0.9280000000000002, 0.892, 0.9640000000000001, 0.932, 0.9399999999999998]
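When the list length is not known in advance, the per-position grouping can be left to `zip`. The following is a sketch rather than a definitive answer: it uses the three-step dictionary shown in the question and assumes all lists have the same length (`zip` silently stops at the shortest one). The name `position_means` is illustrative.

```python
mydict = {
    'step1': [0.94, 0.94, 0.94, 0.96, 0.94],
    'step2': [0.94, 0.94, 0.94, 0.94, 0.94],
    'step3': [0.92, 0.86, 0.98, 0.92, 0.94],
}

# zip(*values) groups the i-th item of every list together,
# so neither the number of keys nor the list length is hard-coded.
position_means = [sum(column) / len(column)
                  for column in zip(*mydict.values())]

print(position_means)
```

Adding a step4 or step5 key changes the result without any change to the code.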
I have a list of lists:
list_of_lists = []
list_1 = [-1, 0.67, 0.23, 0.11]
list_2 = [-1]
list_3 = [0.54, 0.24, -1]
list_4 = [0.2, 0.85, 0.8, 0.1, 0.9]
list_of_lists.append(list_1)
list_of_lists.append(list_2)
list_of_lists.append(list_3)
list_of_lists.append(list_4)
The position is meaningful. I want to return a list that contains the mean per position, excluding -1. That is, I want:
[(0.54+0.2)/2, (0.67+0.24+0.85)/3, (0.23+0.8)/2, (0.11+0.1)/2, 0.9/1]
which is actually:
[0.37, 0.5866666666666667, 0.515, 0.10500000000000001, 0.9]
How can I do this in a pythonic way?
EDIT:
I am working with Python 2.7, and I am not looking for the mean of each list; instead, I'm looking for the mean of 'all list elements at position 0 excluding -1', and the mean of 'all list elements at position 1 excluding -1', etc.
The reason I had:
[(0.54+0.2)/2, (0.67+0.24+0.85)/3, (0.23+0.8)/2, (0.11+0.1)/2, 0.9/1]
is that values in position 0 are -1, -1, 0.54, and 0.2, and I want to exclude -1; position 1 has 0.67, 0.24, and 0.85; position 2 has 0.23, -1, and 0.8, etc.
A solution without third-party libraries (Python 3.4 or later, where both itertools.zip_longest and the statistics module are available):
from itertools import zip_longest
from statistics import mean
def f(lst):
    return [mean(x for x in t if x != -1) for t in zip_longest(*lst, fillvalue=-1)]
>>> f(list_of_lists)
[0.37, 0.5866666666666667, 0.515, 0.10500000000000001, 0.9]
It uses itertools.zip_longest with fillvalue set to -1 to "transpose" the list and set missing values to -1 (will be ignored at the next step). Then, a generator expression and statistics.mean are used to filter out -1s and get the average.
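One edge case is worth spelling out: `statistics.mean` raises `StatisticsError` when it receives no data, which would happen for a position where every value is -1. The sketch below handles that case by returning `None` for such a position. The function name `positional_means`, the `missing` parameter, and the choice of `None` are assumptions made for illustration, not part of the answer above.

```python
from itertools import zip_longest


def positional_means(rows, missing=-1):
    """Mean per position, skipping `missing`; None where nothing is left."""
    means = []
    # zip_longest pads the shorter rows with `missing`, so padded slots
    # are filtered out by the same test as the original -1 markers.
    for column in zip_longest(*rows, fillvalue=missing):
        kept = [x for x in column if x != missing]
        means.append(sum(kept) / len(kept) if kept else None)
    return means


list_of_lists = [
    [-1, 0.67, 0.23, 0.11],
    [-1],
    [0.54, 0.24, -1],
    [0.2, 0.85, 0.8, 0.1, 0.9],
]

result = positional_means(list_of_lists)
all_missing = positional_means([[-1, 1.0], [-1]])
print(result)
print(all_missing)
```

Whether an empty position should yield `None`, 0, or an error depends on how the result is used downstream.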
Here is a vectorised numpy-based solution.
import numpy as np
a = [[-1, 0.67, 0.23, 0.11],
[-1],
[0.54, 0.24, -1],
[0.2, 0.85, 0.8, 0.1, 0.9]]
# first create non-jagged numpy array
b = -np.ones([len(a), max(map(len, a))])
for i, j in enumerate(a):
    b[i][0:len(j)] = j
# count negatives per column (for use later)
neg_count = [np.sum(b[:, i]==-1) for i in range(b.shape[1])]
# set negatives to 0
b[b==-1] = 0
# calculate means
means = [np.sum(b[:, i]) / (b.shape[0] - neg_count[i])
         if (b.shape[0] - neg_count[i]) != 0 else 0
         for i in range(b.shape[1])]
# [0.37,
# 0.58666666666666667,
# 0.51500000000000001,
# 0.10500000000000001,
# 0.90000000000000002]
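A variant of the same idea pads with NaN instead of -1 and lets `np.nanmean` skip the missing entries, which removes the manual counting step. This is a sketch under the assumption that NumPy is installed; the names `padded` and `means` are illustrative.

```python
import numpy as np

a = [[-1, 0.67, 0.23, 0.11],
     [-1],
     [0.54, 0.24, -1],
     [0.2, 0.85, 0.8, 0.1, 0.9]]

# Pad to a rectangular array, using NaN for the missing slots.
padded = np.full((len(a), max(map(len, a))), np.nan)
for i, row in enumerate(a):
    padded[i, :len(row)] = row

# Treat the -1 markers as missing too.
padded[padded == -1] = np.nan

# nanmean ignores NaN entries when averaging down each column.
means = np.nanmean(padded, axis=0)
print(means)
```

For a column that contains only NaN, `np.nanmean` returns NaN and emits a RuntimeWarning, so that case may still need explicit handling.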
You can use the pandas module for this. The code would look like this:
import numpy as np
import pandas as pd
list_1 = [-1, 0.67, 0.23, 0.11, np.nan]
list_2 = [-1, np.nan, np.nan, np.nan, np.nan]
list_3 = [0.54, 0.24, -1, np.nan, np.nan]
list_4 = [0.2, 0.85, 0.8, 0.1, 0.9]
df = pd.DataFrame({"list_1": list_1, "list_2": list_2, "list_3": list_3, "list_4": list_4})
df = df.replace(-1, np.nan)
print(list(df.mean(axis=1)))
I'm new to Python. I want to use numpy and sklearn to do KNN. However, there's a nan in my data. I set the dtype argument of genfromtxt to None, but the resulting array looks like this:
[('ADT1_YEAST', 0.58, 0.61, 0.47, 0.13, 0.5, 0.0, 0.48, 0.22, 'MIT')
('ADT2_YEAST', 0.43, 0.67, 0.48, 0.27, 0.5, 0.0, 0.53, 0.22, 'MIT')
('ADT3_YEAST', 0.64, 0.62, 0.49, 0.15, 0.5, 0.0, 0.53, 0.22, 'MIT') ...,
('ZNRP_YEAST', 0.67, 0.57, 0.36, 0.19, 0.5, 0.0, 0.56, 0.22, 'ME2')
('ZUO1_YEAST', 0.43, 0.4, 0.6, 0.16, 0.5, 0.0, 0.53, 0.39, 'NUC')
('G6PD_YEAST', 0.65, 0.54, 0.54, 0.13, 0.5, 0.0, 0.53, 0.22, 'CYT')]
Then I get a "data type not understood" error from the NearestNeighbors function.
Here is my code:
import numpy as np
from sklearn.neighbors import NearestNeighbors
npGem = np.genfromtxt('temp.data', dtype=None)
X = np.array(npGem)
nbrs = NearestNeighbors(n_neighbors=5, algorithm='ball_tree').fit(X)
Can anyone show me how to read the data in properly? Thanks in advance.
If I understand the problem, you're really asking how to encode the categorical variables such that they can be properly interpreted by the nearest neighbors algorithm. You can do this with sklearn, as explained in "4.2.4. Encoding categorical features". On the other hand, if you have incomplete features, see "4.2.6. Imputation of missing values".
I think you need to get the data into a matrix properly. I typically do something like this:
import numpy as np
features = []  # list of lists of the feature variables
classes = []   # list of the target variables
with open('temp.data') as f:  # file name taken from the question
    for line in f:
        line = line.strip().split()  # split the line into pieces on any whitespace
        features.append(line[1:-1])  # or whatever indices your features are in
        classes.append(line[-1])     # or whatever index your target variable is in
classes = np.array(classes)
features = np.array(features, dtype=float)  # np.float was removed in NumPy 1.24
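The same split can also be done by `np.genfromtxt` itself, by reading only the numeric columns through its `usecols` parameter. The sketch below assumes the file is whitespace-delimited with the name in column 0, eight numeric features in columns 1 to 8, and the class label in column 9, which matches the rows printed in the question but is not confirmed there. The `io.StringIO` object stands in for 'temp.data'.

```python
import io

import numpy as np

# Three rows copied from the question, standing in for 'temp.data'.
sample = io.StringIO(
    "ADT1_YEAST 0.58 0.61 0.47 0.13 0.5 0.0 0.48 0.22 MIT\n"
    "ADT2_YEAST 0.43 0.67 0.48 0.27 0.5 0.0 0.53 0.22 MIT\n"
    "ZUO1_YEAST 0.43 0.4 0.6 0.16 0.5 0.0 0.53 0.39 NUC\n"
)

# Columns 1..8 hold the numeric features; column 0 is the name
# and column 9 the class label, so both are skipped here.
X = np.genfromtxt(sample, usecols=tuple(range(1, 9)))

print(X.shape, X.dtype)
```

Because only numeric columns are read, the result is an ordinary two-dimensional float array rather than a structured array of tuples, which is the kind of input a nearest-neighbors fit expects.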
I am a bit new to Python; I started today. My code looks like this:
testcases=[
(([0.5,0.4,0.3],'HHTH'),[0.4166666666666667, 0.432, 0.42183098591549295, 0.43639398998330553]),
(([0.14,0.32,0.42,0.81,0.21],'HHHTTTHHH'),[0.5255789473684211, 0.6512136991788505, 0.7295055220497553, 0.6187139453483192, 0.4823974597714815, 0.3895729901052968, 0.46081730193074644, 0.5444108434105802, 0.6297110187222278]),
(([0.14,0.32,0.42,0.81,0.21],'TTTHHHHHH'),[0.2907741935483871, 0.25157009005730924, 0.23136284577678012, 0.2766575695593804, 0.3296000585271367, 0.38957299010529806, 0.4608173019307465, 0.5444108434105804, 0.6297110187222278]),
(([0.12,0.45,0.23,0.99,0.35,0.36],'THHTHTTH'),[0.28514285714285714, 0.3378256513026052, 0.380956725493104, 0.3518717367468537, 0.37500429586037076, 0.36528605387582497, 0.3555106542906013, 0.37479179323540324]),
(([0.03,0.32,0.59,0.53,0.55,0.42,0.65],'HHTHTTHTHHT'),[0.528705501618123, 0.5522060353798126, 0.5337142767315369, 0.5521920592821695, 0.5348391689038525, 0.5152373451083692, 0.535385450497415, 0.5168208803156963, 0.5357708613431963, 0.5510509656933194, 0.536055356823069])]
print 'Inputs'
print '======'
for inputs,output in testcases:
    print inputs[0]
print 'Outputs'
print '======='
for inputs,output in testcases:
    print output[0]
The above code gives the following output:
Inputs
======
[0.5, 0.4, 0.3]
[0.14, 0.32, 0.42, 0.81, 0.21]
[0.14, 0.32, 0.42, 0.81, 0.21]
[0.12, 0.45, 0.23, 0.99, 0.35, 0.36]
[0.03, 0.32, 0.59, 0.53, 0.55, 0.42, 0.65]
Outputs
=======
0.416666666667
0.525578947368
0.290774193548
0.285142857143
0.528705501618
But I need to access each row of the testcases list above, so that the output looks like the following. What should the code be?
([0.5,0.4,0.3],'HHTH'),[0.4166666666666667, 0.432, 0.42183098591549295, 0.43639398998330553])
This will print what you're after (if that's what you mean).
>>> for inputs,outputs in testcases:
...     print '%r, %r' % (inputs, outputs)
([0.5, 0.4, 0.3], 'HHTH'), [0.4166666666666667, 0.432, 0.42183098591549295, 0.43639398998330553]
([0.14, 0.32, 0.42, 0.81, 0.21], 'HHHTTTHHH'), [0.5255789473684211, 0.6512136991788505, 0.7295055220497553, 0.6187139453483192, 0.4823974597714815, 0.3895729901052968, 0.46081730193074644, 0.5444108434105802, 0.6297110187222278]
([0.14, 0.32, 0.42, 0.81, 0.21], 'TTTHHHHHH'), [0.2907741935483871, 0.25157009005730924, 0.23136284577678012, 0.2766575695593804, 0.3296000585271367, 0.38957299010529806, 0.4608173019307465, 0.5444108434105804, 0.6297110187222278]
([0.12, 0.45, 0.23, 0.99, 0.35, 0.36], 'THHTHTTH'), [0.28514285714285714, 0.3378256513026052, 0.380956725493104, 0.3518717367468537, 0.37500429586037076, 0.36528605387582497, 0.3555106542906013, 0.37479179323540324]
([0.03, 0.32, 0.59, 0.53, 0.55, 0.42, 0.65], 'HHTHTTHTHHT'), [0.528705501618123, 0.5522060353798126, 0.5337142767315369, 0.5521920592821695, 0.5348391689038525, 0.5152373451083692, 0.535385450497415, 0.5168208803156963, 0.5357708613431963, 0.5510509656933194, 0.536055356823069]
And if you really require the extra ) at the end of the line, change the print statement to:
print '%r, %r)' % (inputs, outputs)
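The snippets in this thread use the Python 2 print statement, which is a syntax error on Python 3. A rough Python 3 equivalent is sketched below, using two of the rows from the question; the list name `rows` is illustrative, and the `!r` conversion in the f-string plays the role of `%r`.

```python
testcases = [
    (([0.5, 0.4, 0.3], 'HHTH'),
     [0.4166666666666667, 0.432, 0.42183098591549295, 0.43639398998330553]),
    (([0.12, 0.45, 0.23, 0.99, 0.35, 0.36], 'THHTHTTH'),
     [0.28514285714285714, 0.3378256513026052, 0.380956725493104,
      0.3518717367468537, 0.37500429586037076, 0.36528605387582497,
      0.3555106542906013, 0.37479179323540324]),
]

rows = []
for inputs, outputs in testcases:
    # !r formats each value with repr(), like %r in the Python 2 version.
    rows.append(f"{inputs!r}, {outputs!r}")

print("\n".join(rows))
```

Collecting the formatted rows in a list before printing also makes them easy to write to a file or compare in a test.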
Seems like you need a nested loop for that, e.g.:
for inputs, outputs in testcases:
    for output in outputs:
        print output