Read online excel file with a specific sheet and only selected columns - python

I have to read, with pandas, the CTG.xls file from the following path:
https://archive.ics.uci.edu/ml/machine-learning-databases/00193/.
From this file I have to select the sheet Data. Moreover, I have to select the columns from K to AT of the file, so that at the end I have a dataset with these columns:
["LB", "AC", "FM", "UC", "DL", "DS", "DP", "ASTV", "MSTV", "ALTV", "MLTV", "Width", "Min", "Max", "Nmax", "Nzeros", "Mode", "Mean", "Median", "Variance", "Tendency", "CLASS", "NSP"]
How can I do this using the read function in pandas?

Use:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls'
# read the Data sheet; skipfooter drops the trailing non-data rows
df = pd.read_excel(url, sheet_name='Data', skipfooter=3)
# drop the empty 'Unnamed: *' filler columns
df = df.drop(columns=df.filter(like='Unnamed').columns)
# the real column names sit in the first data row: promote them, then drop that row
df.columns = df.iloc[0].to_list()
df = df[1:].reset_index(drop=True)
Output
LB AC FM UC DL DS DP ASTV MSTV ALTV MLTV Width Min Max Nmax Nzeros Mode Mean Median Variance Tendency CLASS NSP
0 120 0 0 0 0 0 0 73 0.5 43 2.4 64 62 126 2 0 120 137 121 73 1 9 2
1 132 0.00638 0 0.00638 0.00319 0 0 17 2.1 0 10.4 130 68 198 6 1 141 136 140 12 0 6 1
2 133 0.003322 0 0.008306 0.003322 0 0 16 2.1 0 13.4 130 68 198 5 1 141 135 138 13 0 6 1
3 134 0.002561 0 0.007682 0.002561 0 0 16 2.4 0 23 117 53 170 11 0 137 134 137 13 1 6 1
4 132 0.006515 0 0.008143 0 0 0 16 2.4 0 19.9 117 53 170 9 0 137 136 138 11 1 2 1
... ... ... ... ... ... .. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..
2121 140 0 0 0.007426 0 0 0 79 0.2 25 7.2 40 137 177 4 0 153 150 152 2 0 5 2
2122 140 0.000775 0 0.006971 0 0 0 78 0.4 22 7.1 66 103 169 6 0 152 148 151 3 1 5 2
2123 140 0.00098 0 0.006863 0 0 0 79 0.4 20 6.1 67 103 170 5 0 153 148 152 4 1 5 2
2124 140 0.000679 0 0.00611 0 0 0 78 0.4 27 7 66 103 169 6 0 152 147 151 4 1 5 2
2125 142 0.001616 0.001616 0.008078 0 0 0 74 0.4 36 5 42 117 159 2 1 145 143 145 1 0 1 1
[2126 rows x 23 columns]
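If you prefer to avoid the post-processing, read_excel can also slice the K:AT range directly via usecols. A minimal sketch, assuming the real header sits in the second row of the Data sheet (header=1); depending on the sheet layout you may still need to drop blank 'Unnamed' columns:
import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls'
# usecols accepts Excel-style letter ranges like 'K:AT'
df = pd.read_excel(url, sheet_name='Data', header=1, usecols='K:AT', skipfooter=3)
print(df.columns.tolist())  # verify against the expected column names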

Related

Trying to predict the next number in ciphertext using tensorflow

I am experimenting with machine learning and I wanted to see how difficult it would be to predict a number given a series of other numbers. I have seen this done with simple sequences such as 1 to 10, but I wanted to try something harder: predicting the next byte of a ciphertext. Here is what I have tried so far:
import numpy as np
import matplotlib.pyplot as plt
#from sklearn.linear_model import LinearRegression
from tensorflow.keras import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.layers import Lambda, SimpleRNN
from tensorflow.keras import backend as K
from numpy.polynomial import polynomial as poly
from sklearn.feature_extraction import DictVectorizer
import Pyfhel
def generateInput(x, length):
    return np.append(x, [0 for i in range(length)], axis=0)

def main():
    HE = Pyfhel.Pyfhel()
    HE.contextGen(scheme='BFV', n=2048, q=34, t=34, t_bits=35, sec=128)
    HE.keyGen()
    a = "Hello"
    a = np.asarray(bytearray(a, "utf-8"))
    a = HE.encode(a)
    ct = HE.encrypt(a).to_bytes('none')
    ct = np.asarray([c for c in ct])
    length = 100  # How many records to take into account
    batch_size = 1
    n_features = 1
    epochs = 1
    generator = TimeseriesGenerator(ct, ct, stride=length, length=length, batch_size=batch_size)
    model = Sequential()
    model.add(SimpleRNN(100, activation='leaky_relu', input_shape=(length, n_features)))
    model.add(Dense(100, activation='leaky_relu', input_shape=(length, n_features)))
    model.add(Dense(256, activation='softmax'))
    model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=['accuracy'])
    history = model.fit(generator, epochs=epochs)
    for i in range(1, length):
        try:
            x_input = np.asarray(generateInput(ct[:i], length-len(ct[:i]))).reshape((1, length))
            yhat = model.predict(x_input).tolist()
            yhat_normalized = [float(i)/sum(yhat[0]) for i in yhat[0]]
            yhat_max = max(yhat_normalized)
            yhat_index = yhat_normalized.index(yhat_max)
            print("based on {} actual {} predicted {}".format(ct[:i], ct[i], yhat_index))
        except Exception as e:
            print("Error {}".format(e))

if __name__ == "__main__":
    main()
Now the problem is that all of my predictions are 0. Can anyone explain to me why this is happening? How can I fix this?
Here's what my current output looks like:
based on [94] actual 161 predicted 0
based on [ 94 161] actual 16 predicted 0
based on [ 94 161 16] actual 3 predicted 0
based on [ 94 161 16 3] actual 7 predicted 0
based on [ 94 161 16 3 7] actual 0 predicted 0
based on [ 94 161 16 3 7 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0] actual 105 predicted 0
based on [ 94 161 16 3 7 0 0 0 105] actual 128 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0] actual 78 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78] actual 6 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6] actual 78 predicted 0
... (the output continues in the same pattern for the rest of the sequence, with the input growing by one byte each step; the predicted value is 0 every time)

Trouble with PyTorchLSTM in Thinc

Running the following code:
from thinc.api import chain, PyTorchLSTM, Sigmoid, Embed, with_padded, with_array2d
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding + our word tokens
output_size = 1
embedding_dim = 400
hidden_dim = 256
n_layers = 2
model = chain(
    Embed(nV=vocab_size, nO=embedding_dim),
    with_padded(PyTorchLSTM(nI=embedding_dim, nO=hidden_dim, depth=n_layers)),
    with_array2d(Sigmoid(nI=hidden_dim, nO=output_size))
)
model.initialize(X=train_x[:5], Y=train_y[:5])
I get this error: ValueError: Provided 'x' array should be 2-dimensional, but found 3 dimension(s).
Here is x[0], y[0]
[ 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
21025 308 6 3 1050 207 8 2138 32 1 171 57
15 49 81 5785 44 382 110 140 15 5194 60 154
9 1 4975 5852 475 71 5 260 12 21025 308 13
1978 6 74 2395 5 613 73 6 5194 1 24103 5
1983 10166 1 5786 1499 36 51 66 204 145 67 1199
5194 19869 1 37442 4 1 221 883 31 2988 71 4
1 5787 10 686 2 67 1499 54 10 216 1 383
9 62 3 1406 3686 783 5 3483 180 1 382 10
1212 13583 32 308 3 349 341 2913 10 143 127 5
7690 30 4 129 5194 1406 2326 5 21025 308 10 528
12 109 1448 4 60 543 102 12 21025 308 6 227
4146 48 3 2211 12 8 215 23] 1
I am relatively new to building these models, but I think it has to do with the fact that the output of the PyTorch LSTM layer has two dimensions. In a typical torch LSTM you would stack the output from the LSTM layer (I think), but I'm not sure how to do that here. I assumed with_array2d would help, but it doesn't seem to.

Renaming a number of columns using for loop (python)

The dataframe below has a number of columns, but the column names are random numbers.
daily1=
0 1 2 3 4 5 6 7 8 9 ... 11 12 13 14 15 16 17 18 19 20
0 0 0 0 0 0 0 4 0 0 0 ... 640 777 674 842 786 865 809 674 679 852
1 0 0 0 0 0 0 0 0 0 0 ... 108 29 74 102 82 62 83 68 30 61
2 rows × 244 columns
I would like to organise the column names in numerical order (from 0 to 243).
I tried
for i, n in zip(daily1.columns, range(244)):
    asd = daily1.rename(columns={i: n})
asd
but the renamed columns do not show up in the output...
Ideal output is
0 1 2 3 4 5 6 7 8 9 ... 234 235 236 237 238 239 240 241 242 243
0 0 0 0 0 0 0 4 0 0 0 ... 640 777 674 842 786 865 809 674 679 852
1 0 0 0 0 0 0 0 0 0 0 ... 108 29 74 102 82 62 83 68 30 61
Could I get some advice guys? Thank you
If you want to reorder the columns, you can try this:
columns = sorted(list(df.columns), reverse=False)
df = df[columns]
If you just want to rename the columns, then you can try:
df.columns = [i for i in range(df.shape[1])]
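For completeness, a minimal self-contained sketch of the rename approach, with a small toy frame standing in for daily1:
import pandas as pd

# toy frame standing in for daily1, with arbitrary numeric column labels
daily1 = pd.DataFrame([[0, 4, 640], [0, 0, 108]], columns=[7, 3, 11])

# overwrite the labels with 0..n-1 in positional order
daily1.columns = [i for i in range(daily1.shape[1])]
print(daily1.columns.tolist())  # [0, 1, 2]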

Reading XML (NIST .n42 file) and data extraction

I have an XML file and I need to extract the data from the 'ChannelData' element in the XML below.
from xml.dom import minidom
xmldoc = minidom.parse('Annex_B_n42.xml')
itemlist = xmldoc.getElementsByTagName('ChannelData')
print(len(itemlist))
print(itemlist[0].attributes['compressionCode'].value)
for s in itemlist:
    print(s.attributes['compressionCode'].value)
This doesn't return the channel data, just the value 'None'.
I also tried another approach from a different example:
import xml.etree.ElementTree as ET

root = ET.parse('Annex_B_n42.xml').getroot()
#value=[]
for type_tag in root.findall('Spectrum'):
    value = type_tag.get('id')
    print(value)
    print("data from file " + str(value))
This did not work at all; value is not being populated. I really don't understand how to parse XML.
Here is the xml file
<?xml version="1.0"?>
<?xml-model href="http://physics.nist.gov/N42/2011/N42/schematron/n42.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<RadInstrumentData xmlns="http://physics.nist.gov/N42/2011/N42" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://physics.nist.gov/N42/2011/N42 file:///d:/Data%20Files/ANSI%20N42%2042/V2/Schema/n42.xsd" n42DocUUID="d72b7fa7-4a20-43d4-b1b2-7e3b8c6620c1">
<RadInstrumentInformation id="RadInstrumentInformation-1">
<RadInstrumentManufacturerName>RIIDs R Us</RadInstrumentManufacturerName>
<RadInstrumentModelName>iRIID</RadInstrumentModelName>
<RadInstrumentClassCode>Radionuclide Identifier</RadInstrumentClassCode>
<RadInstrumentVersion>
<RadInstrumentComponentName>Software</RadInstrumentComponentName>
<RadInstrumentComponentVersion>1.1</RadInstrumentComponentVersion>
</RadInstrumentVersion>
</RadInstrumentInformation>
<RadDetectorInformation id="RadDetectorInformation-1">
<RadDetectorCategoryCode>Gamma</RadDetectorCategoryCode>
<RadDetectorKindCode>NaI</RadDetectorKindCode>
</RadDetectorInformation>
<EnergyCalibration id="EnergyCalibration-1">
<CoefficientValues>-21.8 12.1 6.55e-03</CoefficientValues>
</EnergyCalibration>
<RadMeasurement id="RadMeasurement-1">
<MeasurementClassCode>Foreground</MeasurementClassCode>
<StartDateTime>2003-11-22T23:45:19-07:00</StartDateTime>
<RealTimeDuration>PT60S</RealTimeDuration>
<Spectrum id="RadMeasurement-1Spectrum-1" radDetectorInformationReference="RadDetectorInformation-1" energyCalibrationReference="EnergyCalibration-1">
<LiveTimeDuration>PT59.61S</LiveTimeDuration>
<ChannelData compressionCode="None">
0 0 0 22 421 847 1295 1982 2127 2222 2302 2276
2234 1921 1939 1715 1586 1469 1296 1178 1127 1047 928 760
679 641 542 529 443 423 397 393 322 272 294 227
216 224 208 191 189 163 167 173 150 137 136 129
150 142 160 159 140 103 90 82 83 85 67 76
73 84 63 74 70 69 76 61 49 61 63 65
58 62 48 75 56 61 46 56 43 37 55 47
50 40 38 54 43 41 45 51 32 35 29 33
40 44 33 35 20 26 27 17 19 20 16 19
18 19 18 20 17 45 55 70 62 59 32 30
21 23 10 9 5 13 11 11 6 7 7 9
11 4 8 8 14 14 11 9 13 5 5 6
10 9 3 4 3 7 5 5 4 5 3 6
5 0 5 6 3 1 4 4 3 10 11 4
1 4 2 11 9 6 3 5 5 1 4 2
6 6 2 3 0 2 2 2 2 0 1 3
1 1 2 3 2 4 5 2 6 4 1 0
3 1 2 1 1 0 1 0 0 2 0 1
0 0 0 1 0 0 0 0 0 0 0 2
0 0 0 1 0 1 0 0 2 1 0 0
0 0 1 3 0 0 0 1 0 1 0 0
0 0 0 0
</ChannelData>
</Spectrum>
</RadMeasurement>
</RadInstrumentData>
You can use BeautifulSoup to get the ChannelData tag value like the following (BeautifulSoup's default parser lowercases tag names, hence "channeldata"):
from bs4 import BeautifulSoup

with open('Annex_B_n42.xml') as f:
    xml = f.read()

bs_obj = BeautifulSoup(xml)
print(bs_obj.find_all("channeldata")[0].text)
That will print you
' 0 0 0 22 421 847 1295 1982 2127 2222 2302 2276 2234 1921 1939 1715 1586 1469 1296 1178 1127 1047 928 760 679 641 542 529 443 423 397 393 322 272 294 227 216 224 208 191 189 163 167 173 150 137 136 129 150 142 160 159 140 103 90 82 83 85 67 76 73 84 63 74 70 69 76 61 49 61 63 65 58 62 48 75 56 61 46 56 43 37 55 47 50 40 38 54 43 41 45 51 32 35 29 33 40 44 33 35 20 26 27 17 19 20 16 19
18 19 18 20 17 45 55 70 62 59 32 30 21 23 10 9 5 13 11 11 6 7 7 9 11 4 8 8 14 14 11 9 13 5 5 6 10 9 3 4 3 7 5 5 4 5 3 6 5 0 5 6 3 1 4 4 3 10 11 4 1 4 2 11 9 6 3 5 5 1 4 2 6 6 2 3 0 2 2 2 2 0 1 3 1 1 2 3 2 4 5 2 6 4 1 0 3 1 2 1 1 0 1 0 0 2 0 1 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 1 0 1 0 0 2 1 0 0 0 0 1 3 0 0 0 1 0 1 0 0 0 0 0 0 '
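If you need the channel counts as numbers rather than one long string, split the text and convert; a small follow-on to the snippet above:
counts = [int(v) for v in bs_obj.find_all("channeldata")[0].text.split()]
print(counts[:12])  # [0, 0, 0, 22, 421, 847, 1295, 1982, 2127, 2222, 2302, 2276]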
Try this:
import xml.etree.ElementTree as ET
root = ET.parse('Annex_B_n42.xml').getroot()
elems = root.findall(".//*[@compressionCode='None']")
print(elems[0].text)
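Note that the document declares a default namespace (http://physics.nist.gov/N42/2011/N42), which is why findall('Spectrum') in the question matched nothing. A namespace-aware ElementTree sketch that pulls ChannelData by name:
import xml.etree.ElementTree as ET

# map the default namespace declared on RadInstrumentData
ns = {'n42': 'http://physics.nist.gov/N42/2011/N42'}
root = ET.parse('Annex_B_n42.xml').getroot()

channel_data = root.find('.//n42:ChannelData', ns)
counts = [int(v) for v in channel_data.text.split()]
print(counts[:12])  # first few channel counts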

Pandas assign points ranging 5-1 depending upon timestamp, most recent get max point

While designing a recommendation system, I have stumbled upon a case where voting or something similar is required for a collaborative-filtering implementation.
But in our system we don't have any field for rating/voting. I want to derive a comparable rating from the timestamp at which the user watched the show.
This is what view-history looks like
subscriber_id content_id timestamp
1 123 1576833135000
1 124 1576833140000
1 125 1576833145000
1 126 1576833150000
1 127 1576833155000
1 128 1576833160000
1 129 1576833165000
1 130 1576833170000
1 131 1576833175000
1 132 1576833180000
2 123 1576833135000
2 124 1576833140000
2 125 1576833145000
2 126 1576833150000
2 127 1576833155000
2 128 1576833160000
2 129 1576833165000
2 130 1576833170000
2 131 1576833175000
2 132 1576833180000
2 133 1576833185000
2 134 1576833190000
2 135 1576833195000
2 136 1576833200000
2 137 1576833205000
2 138 1576833210000
2 139 1576833215000
2 140 1576833220000
2 141 1576833225000
2 142 1576833230000
2 143 1576833235000
2 144 1576833240000
I want to assign a number to each of these entries, ranging from 5 to 1 (5 being the most recent). I have implemented a rank system, but it does not produce the 5-1 range.
df1['rank'] = df1.sort_values(['subscriber_id','timestamp']) \
.groupby(['subscriber_id'])['timestamp'] \
.rank(method='max').astype(int)
Expected Output:
subscriber_id content_id timestamp rating
1 123 1576833135000 1
1 124 1576833140000 1
1 125 1576833145000 2
1 126 1576833150000 2
1 127 1576833155000 3
1 128 1576833160000 3
1 129 1576833165000 4
1 130 1576833170000 4
1 131 1576833175000 5
1 132 1576833180000 5
2 123 1576833135000 1
2 124 1576833140000 1
2 125 1576833145000 1
2 126 1576833150000 1
2 127 1576833155000 2
2 128 1576833160000 2
2 129 1576833165000 2
2 130 1576833170000 2
2 131 1576833175000 3
2 132 1576833180000 3
2 133 1576833185000 3
2 134 1576833190000 3
2 135 1576833195000 4
2 136 1576833200000 4
2 137 1576833205000 4
2 138 1576833210000 4
2 139 1576833215000 4
2 140 1576833220000 5
2 141 1576833225000 5
2 142 1576833230000 5
2 143 1576833235000 5
2 144 1576833240000 5
Any help would be much appreciated!
Now it makes sense. The solution is to build the list of ratings from the remainder you get when dividing the number of rows for the selected user by 5. There you go :)
import pandas as pd
from io import StringIO
data = StringIO("""
content_id subscriber_id timestamp
123 1 1576833135000
124 1 1576833140000
125 1 1576833145000
126 1 1576833150000
127 1 1576833155000
128 1 1576833160000
129 1 1576833165000
130 1 1576833170000
131 1 1576833175000
132 1 1576833180000
123 2 1576833135000
124 2 1576833140000
125 2 1576833145000
126 2 1576833150000
127 2 1576833155000
128 2 1576833160000
129 2 1576833165000
130 2 1576833170000
131 2 1576833175000
132 2 1576833180000
133 2 1576833185000
134 2 1576833190000
135 2 1576833195000
136 2 1576833200000
137 2 1576833205000
138 2 1576833210000
139 2 1576833215000
140 2 1576833220000
141 2 1576833225000
142 2 1576833230000
143 2 1576833235000
144 2 1576833240000
""")
# load data into data frame
df = pd.read_csv(data, sep=' ')
# get unique users
user_list = df['subscriber_id'].unique()
# collect results
results = pd.DataFrame(columns=['content_id','subscriber_id','timestamp','rating'])
for user in user_list:
    # select data range for one user
    df2 = df[df['subscriber_id'] == user]
    items_numer = df2.shape[0]
    modulo_remider = items_numer % 5
    ranks_repeat = int(items_numer / 5)
    # create rating list based on modulo
    if modulo_remider > 0:
        rating = []
        for i in range(1, 6, 1):
            l = [i for j in range(ranks_repeat)]
            for number in l:
                rating.append(number)
        if modulo_remider == 1:
            rating.insert(rating.index(5), 5)
        if modulo_remider == 2:
            rating.insert(rating.index(4), 4)
            rating.insert(rating.index(5), 5)
        if modulo_remider == 3:
            rating.insert(rating.index(3), 3)
            rating.insert(rating.index(4), 4)
            rating.insert(rating.index(5), 5)
        if modulo_remider == 4:
            rating.insert(rating.index(2), 2)
            rating.insert(rating.index(3), 3)
            rating.insert(rating.index(4), 4)
            rating.insert(rating.index(5), 5)
        df2.insert(3, 'rating', rating, True)
    else:
        rating = []
        for i in range(1, 6, 1):
            l = [i for j in range(ranks_repeat)]
            for number in l:
                rating.append(number)
        df2.insert(3, 'rating', rating, True)
    # collect results
    results = results.append(df2)
Result:
content_id subscriber_id timestamp rating
0 123 1 1576833135000 1
1 124 1 1576833140000 1
2 125 1 1576833145000 2
3 126 1 1576833150000 2
4 127 1 1576833155000 3
5 128 1 1576833160000 3
6 129 1 1576833165000 4
7 130 1 1576833170000 4
8 131 1 1576833175000 5
9 132 1 1576833180000 5
10 123 2 1576833135000 1
11 124 2 1576833140000 1
12 125 2 1576833145000 1
13 126 2 1576833150000 1
14 127 2 1576833155000 2
15 128 2 1576833160000 2
16 129 2 1576833165000 2
17 130 2 1576833170000 2
18 131 2 1576833175000 3
19 132 2 1576833180000 3
20 133 2 1576833185000 3
21 134 2 1576833190000 3
22 135 2 1576833195000 4
23 136 2 1576833200000 4
24 137 2 1576833205000 4
25 138 2 1576833210000 4
26 139 2 1576833215000 4
27 140 2 1576833220000 5
28 141 2 1576833225000 5
29 142 2 1576833230000 5
30 143 2 1576833235000 5
31 144 2 1576833240000 5
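A more compact alternative (a sketch, not part of the original answer) using groupby-apply: each subscriber's rows are sorted by timestamp and split into five blocks, with any remainder rows going to the highest ratings so the most recent views always get 5. Assumes df is the frame loaded above.
import numpy as np
import pandas as pd

def assign_ratings(group):
    n = len(group)
    q, r = divmod(n, 5)
    # ratings 1..(5-r) get q rows each; the top r ratings get q+1 rows each
    sizes = [q] * (5 - r) + [q + 1] * r
    group = group.sort_values('timestamp').copy()
    group['rating'] = np.repeat(np.arange(1, 6), sizes)
    return group

results = df.groupby('subscriber_id', group_keys=False).apply(assign_ratings)
This should reproduce the expected output for both subscribers (blocks of 2 for the 10-row user, blocks of 4-4-4-5-5 for the 22-row user).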
