Trouble with PyTorchLSTM in Thinc - python

Running the following code:
from thinc.api import chain, PyTorchLSTM, Sigmoid, Embed, with_padded, with_array2d

vocab_size = len(vocab_to_int) + 1  # +1 for the 0 padding + our word tokens
output_size = 1
embedding_dim = 400
hidden_dim = 256
n_layers = 2

model = chain(
    Embed(nV=vocab_size, nO=embedding_dim),
    with_padded(PyTorchLSTM(nI=embedding_dim, nO=hidden_dim, depth=n_layers)),
    with_array2d(Sigmoid(nI=hidden_dim, nO=output_size)),
)
model.initialize(X=train_x[:5], Y=train_y[:5])
I get this error: ValueError: Provided 'x' array should be 2-dimensional, but found 3 dimension(s).
Here are x[0] and y[0]:
[ 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
21025 308 6 3 1050 207 8 2138 32 1 171 57
15 49 81 5785 44 382 110 140 15 5194 60 154
9 1 4975 5852 475 71 5 260 12 21025 308 13
1978 6 74 2395 5 613 73 6 5194 1 24103 5
1983 10166 1 5786 1499 36 51 66 204 145 67 1199
5194 19869 1 37442 4 1 221 883 31 2988 71 4
1 5787 10 686 2 67 1499 54 10 216 1 383
9 62 3 1406 3686 783 5 3483 180 1 382 10
1212 13583 32 308 3 349 341 2913 10 143 127 5
7690 30 4 129 5194 1406 2326 5 21025 308 10 528
12 109 1448 4 60 543 102 12 21025 308 6 227
4146 48 3 2211 12 8 215 23] 1
I am relatively new to building these models, but I think it has to do with the fact that the output of the PyTorch LSTM layer has two dimensions. In a typical torch LSTM you'd stack the outputs from the LSTM layer (I think), but I'm not sure how to do that here. I assumed with_array2d would help, but it doesn't seem to.
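For what it's worth, one plausible reading (not a confirmed fix): train_y holds a single label per sequence, while the LSTM emits one vector per token, so the tail of the chain needs to pool the token vectors down to one row per sequence before the Sigmoid. A minimal sketch, assuming train_x is a list of integer token arrays; it swaps the with_array2d(Sigmoid(...)) tail for Thinc's list2ragged and reduce_mean layers:
from thinc.api import chain, Embed, PyTorchLSTM, Sigmoid
from thinc.api import with_array, with_padded, list2ragged, reduce_mean

model = chain(
    with_array(Embed(nV=vocab_size, nO=embedding_dim)),   # per-token embeddings
    with_padded(PyTorchLSTM(nI=embedding_dim, nO=hidden_dim, depth=n_layers)),
    list2ragged(),    # List[Floats2d] -> Ragged
    reduce_mean(),    # mean-pool each sequence -> one vector per sequence
    Sigmoid(nI=hidden_dim, nO=output_size),
)
model.initialize(X=train_x[:5], Y=train_y[:5])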

Related

Trying to predict the next number in cyphertext using tensorflow

I am experimenting with machine learning and I wanted to see how difficult it would be to predict a number given a series of other numbers. I have seen it accomplished with simple vectors such as 1-10, but I wanted to try something more difficult: predicting based on the ciphertext. Here is what I have tried so far:
import numpy as np
import matplotlib.pyplot as plt
#from sklearn.linear_model import LinearRegression
from tensorflow.keras import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.layers import Lambda, SimpleRNN
from tensorflow.keras import backend as K
from numpy.polynomial import polynomial as poly
from sklearn.feature_extraction import DictVectorizer
import Pyfhel

def generateInput(x, length):
    return np.append(x, [0 for i in range(length)], axis=0)

def main():
    HE = Pyfhel.Pyfhel()
    HE.contextGen(scheme='BFV', n=2048, q=34, t=34, t_bits=35, sec=128)
    HE.keyGen()
    a = "Hello"
    a = np.asarray(bytearray(a, "utf-8"))
    a = HE.encode(a)
    ct = HE.encrypt(a).to_bytes('none')
    ct = np.asarray([c for c in ct])
    length = 100  # How many records to take into account
    batch_size = 1
    n_features = 1
    epochs = 1
    generator = TimeseriesGenerator(ct, ct, stride=length, length=length, batch_size=batch_size)
    model = Sequential()
    model.add(SimpleRNN(100, activation='leaky_relu', input_shape=(length, n_features)))
    model.add(Dense(100, activation='leaky_relu', input_shape=(length, n_features)))
    model.add(Dense(256, activation='softmax'))
    model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=['accuracy'])
    history = model.fit(generator, epochs=epochs)
    for i in range(1, length):
        try:
            x_input = np.asarray(generateInput(ct[:i], length - len(ct[:i]))).reshape((1, length))
            yhat = model.predict(x_input).tolist()
            yhat_normalized = [float(i)/sum(yhat[0]) for i in yhat[0]]
            yhat_max = max(yhat_normalized)
            yhat_index = yhat_normalized.index(yhat_max)
            print("based on {} actual {} predicted {}".format(ct[:i], ct[i], yhat_index))
        except Exception as e:
            print("Error {}".format(e))

if __name__ == "__main__":
    main()
Now the problem is that all of my predictions are 0. Can anyone explain to me why this is happening? How can I fix this?
Here's what my current output looks like:
based on [94] actual 161 predicted 0
based on [ 94 161] actual 16 predicted 0
based on [ 94 161 16] actual 3 predicted 0
based on [ 94 161 16 3] actual 7 predicted 0
based on [ 94 161 16 3 7] actual 0 predicted 0
based on [ 94 161 16 3 7 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0] actual 105 predicted 0
based on [ 94 161 16 3 7 0 0 0 105] actual 128 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128] actual 0 predicted 0
... (truncated: the remaining windows grow one byte at a time up to length 99, and the predicted value is 0 on every single line, whatever the actual byte is)
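Two things worth checking, neither confirmed against the original setup (this assumes ct and generateInput from main() are in scope):
import numpy as np

# 1) The model declares input_shape=(length, n_features), i.e. (100, 1), so the
#    prediction input arguably needs a trailing feature axis:
x_input = generateInput(ct[:5], 95).reshape((1, 100, 1))

# 2) The output above shows long runs of zeros in the target bytes; if 0 dominates,
#    a cross-entropy model can minimise loss by always predicting the majority
#    class. A quick skew check:
values, counts = np.unique(ct, return_counts=True)
print(sorted(zip(counts, values), reverse=True)[:5])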

Read online excel file with a specific sheet and only selected columns

I have to read, with pandas, the CTG.xls file from the following path:
https://archive.ics.uci.edu/ml/machine-learning-databases/00193/
From this file I have to select the sheet Data. Moreover, I have to select from column K to column AT of the file, so that at the end I have a dataset with these columns:
["LB","AC","FM","UC","DL","DS","DP","ASTV","MSTV","ALTV" ,"MLTV" ,"Width","Min","Max" ,"Nmax","Nzeros","Mode","Mean" ,"Median" ,"Variance" ,"Tendency" ,"CLASS","NSP"]
How can I do this using pandas' read_excel function?
Use:
import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls'
df = pd.read_excel(url, sheet_name='Data', skipfooter=3)
# Drop the helper 'Unnamed' columns, then promote the first data row to the header.
df = df.drop(columns=df.filter(like='Unnamed').columns)
df.columns = df.iloc[0].to_list()
df = df[1:].reset_index(drop=True)
Output
LB AC FM UC DL DS DP ASTV MSTV ALTV MLTV Width Min Max Nmax Nzeros Mode Mean Median Variance Tendency CLASS NSP
0 120 0 0 0 0 0 0 73 0.5 43 2.4 64 62 126 2 0 120 137 121 73 1 9 2
1 132 0.00638 0 0.00638 0.00319 0 0 17 2.1 0 10.4 130 68 198 6 1 141 136 140 12 0 6 1
2 133 0.003322 0 0.008306 0.003322 0 0 16 2.1 0 13.4 130 68 198 5 1 141 135 138 13 0 6 1
3 134 0.002561 0 0.007682 0.002561 0 0 16 2.4 0 23 117 53 170 11 0 137 134 137 13 1 6 1
4 132 0.006515 0 0.008143 0 0 0 16 2.4 0 19.9 117 53 170 9 0 137 136 138 11 1 2 1
... ... ... ... ... ... .. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..
2121 140 0 0 0.007426 0 0 0 79 0.2 25 7.2 40 137 177 4 0 153 150 152 2 0 5 2
2122 140 0.000775 0 0.006971 0 0 0 78 0.4 22 7.1 66 103 169 6 0 152 148 151 3 1 5 2
2123 140 0.00098 0 0.006863 0 0 0 79 0.4 20 6.1 67 103 170 5 0 153 148 152 4 1 5 2
2124 140 0.000679 0 0.00611 0 0 0 78 0.4 27 7 66 103 169 6 0 152 147 151 4 1 5 2
2125 142 0.001616 0.001616 0.008078 0 0 0 74 0.4 36 5 42 117 159 2 1 145 143 145 1 0 1 1
[2126 rows x 23 columns]
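An alternative sketch, not from the answer above: read_excel can slice Excel columns by letter with usecols, which maps directly onto the K-to-AT requirement. The skiprows value is an assumption about where the named header row sits in the Data sheet:
import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls'
# usecols takes Excel-style letter ranges; K:AT covers the 23 wanted columns.
df = pd.read_excel(url, sheet_name='Data', usecols='K:AT',
                   skiprows=1,     # assumption: the real header is the sheet's second row
                   skipfooter=3)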

Renaming a number of columns using for loop (python)

The dataframe below has a number of columns, but the column names are random numbers.
daily1=
0 1 2 3 4 5 6 7 8 9 ... 11 12 13 14 15 16 17 18 19 20
0 0 0 0 0 0 0 4 0 0 0 ... 640 777 674 842 786 865 809 674 679 852
1 0 0 0 0 0 0 0 0 0 0 ... 108 29 74 102 82 62 83 68 30 61
2 rows × 244 columns
I would like to organise the column names in numerical order (from 0 to 243).
I tried
for i, n in zip(daily1.columns, range(244)):
    asd = daily1.rename(columns={i: n})
asd
but the output does not show what I expected...
Ideal output is
0 1 2 3 4 5 6 7 8 9 ... 234 235 236 237 238 239 240 241 242 243
0 0 0 0 0 0 0 4 0 0 0 ... 640 777 674 842 786 865 809 674 679 852
1 0 0 0 0 0 0 0 0 0 0 ... 108 29 74 102 82 62 83 68 30 61
Could I get some advice? Thank you.
If you want to reorder the columns, you can try:
columns = sorted(df.columns)
df = df[columns]
If you just want to rename the columns, then you can try:
df.columns = [i for i in range(df.shape[1])]
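As for why the loop in the question showed nothing: DataFrame.rename returns a new DataFrame, so every iteration rebuilt asd from the unmodified daily1 and only the final single rename survived. Building one mapping and renaming once avoids that; a sketch against the question's daily1:
# Map each existing column name to its position, then rename in one call.
mapping = {old: new for new, old in enumerate(daily1.columns)}
daily1 = daily1.rename(columns=mapping)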

Reading XML (NIST .n42 file) and data extraction

I have an XML file and I need to extract the data from 'ChannelData' in the XML below.
from xml.dom import minidom

xmldoc = minidom.parse('Annex_B_n42.xml')
itemlist = xmldoc.getElementsByTagName('ChannelData')
print(len(itemlist))
print(itemlist[0].attributes['compressionCode'].value)
for s in itemlist:
    print(s.attributes['compressionCode'].value)
This doesn't return the data, just the value 'None'.
I also tried another approach from a different example:
import xml.etree.ElementTree as ET

root = ET.parse('Annex_B_n42.xml').getroot()
#value=[]
for type_tag in root.findall('Spectrum'):
    value = type_tag.get('id')
    print(value)
print("data from file " + str(value))
This did not work at all; value is never populated. I really don't understand how to parse XML.
Here is the XML file:
<?xml version="1.0"?>
<?xml-model href="http://physics.nist.gov/N42/2011/N42/schematron/n42.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<RadInstrumentData xmlns="http://physics.nist.gov/N42/2011/N42" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://physics.nist.gov/N42/2011/N42 file:///d:/Data%20Files/ANSI%20N42%2042/V2/Schema/n42.xsd" n42DocUUID="d72b7fa7-4a20-43d4-b1b2-7e3b8c6620c1">
  <RadInstrumentInformation id="RadInstrumentInformation-1">
    <RadInstrumentManufacturerName>RIIDs R Us</RadInstrumentManufacturerName>
    <RadInstrumentModelName>iRIID</RadInstrumentModelName>
    <RadInstrumentClassCode>Radionuclide Identifier</RadInstrumentClassCode>
    <RadInstrumentVersion>
      <RadInstrumentComponentName>Software</RadInstrumentComponentName>
      <RadInstrumentComponentVersion>1.1</RadInstrumentComponentVersion>
    </RadInstrumentVersion>
  </RadInstrumentInformation>
  <RadDetectorInformation id="RadDetectorInformation-1">
    <RadDetectorCategoryCode>Gamma</RadDetectorCategoryCode>
    <RadDetectorKindCode>NaI</RadDetectorKindCode>
  </RadDetectorInformation>
  <EnergyCalibration id="EnergyCalibration-1">
    <CoefficientValues>-21.8 12.1 6.55e-03</CoefficientValues>
  </EnergyCalibration>
  <RadMeasurement id="RadMeasurement-1">
    <MeasurementClassCode>Foreground</MeasurementClassCode>
    <StartDateTime>2003-11-22T23:45:19-07:00</StartDateTime>
    <RealTimeDuration>PT60S</RealTimeDuration>
    <Spectrum id="RadMeasurement-1Spectrum-1" radDetectorInformationReference="RadDetectorInformation-1" energyCalibrationReference="EnergyCalibration-1">
      <LiveTimeDuration>PT59.61S</LiveTimeDuration>
      <ChannelData compressionCode="None">
        0 0 0 22 421 847 1295 1982 2127 2222 2302 2276
        2234 1921 1939 1715 1586 1469 1296 1178 1127 1047 928 760
        679 641 542 529 443 423 397 393 322 272 294 227
        216 224 208 191 189 163 167 173 150 137 136 129
        150 142 160 159 140 103 90 82 83 85 67 76
        73 84 63 74 70 69 76 61 49 61 63 65
        58 62 48 75 56 61 46 56 43 37 55 47
        50 40 38 54 43 41 45 51 32 35 29 33
        40 44 33 35 20 26 27 17 19 20 16 19
        18 19 18 20 17 45 55 70 62 59 32 30
        21 23 10 9 5 13 11 11 6 7 7 9
        11 4 8 8 14 14 11 9 13 5 5 6
        10 9 3 4 3 7 5 5 4 5 3 6
        5 0 5 6 3 1 4 4 3 10 11 4
        1 4 2 11 9 6 3 5 5 1 4 2
        6 6 2 3 0 2 2 2 2 0 1 3
        1 1 2 3 2 4 5 2 6 4 1 0
        3 1 2 1 1 0 1 0 0 2 0 1
        0 0 0 1 0 0 0 0 0 0 0 2
        0 0 0 1 0 1 0 0 2 1 0 0
        0 0 1 3 0 0 0 1 0 1 0 0
        0 0 0 0
      </ChannelData>
    </Spectrum>
  </RadMeasurement>
</RadInstrumentData>
You can use BeautifulSoup to get the ChannelData tag value like the following:
from bs4 import BeautifulSoup

with open('Annex_B_n42.xml') as f:
    xml = f.read()

# html.parser lower-cases tag names, hence "channeldata" below
bs_obj = BeautifulSoup(xml, 'html.parser')
print(bs_obj.find_all("channeldata")[0].text)
That will print:
' 0 0 0 22 421 847 1295 1982 2127 2222 2302 2276 2234 1921 1939 1715 1586 1469 1296 1178 1127 1047 928 760 679 641 542 529 443 423 397 393 322 272 294 227 216 224 208 191 189 163 167 173 150 137 136 129 150 142 160 159 140 103 90 82 83 85 67 76 73 84 63 74 70 69 76 61 49 61 63 65 58 62 48 75 56 61 46 56 43 37 55 47 50 40 38 54 43 41 45 51 32 35 29 33 40 44 33 35 20 26 27 17 19 20 16 19
18 19 18 20 17 45 55 70 62 59 32 30 21 23 10 9 5 13 11 11 6 7 7 9 11 4 8 8 14 14 11 9 13 5 5 6 10 9 3 4 3 7 5 5 4 5 3 6 5 0 5 6 3 1 4 4 3 10 11 4 1 4 2 11 9 6 3 5 5 1 4 2 6 6 2 3 0 2 2 2 2 0 1 3 1 1 2 3 2 4 5 2 6 4 1 0 3 1 2 1 1 0 1 0 0 2 0 1 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 1 0 1 0 0 2 1 0 0 0 0 1 3 0 0 0 1 0 1 0 0 0 0 0 0 '
Try this:
import xml.etree.ElementTree as ET

root = ET.parse('Annex_B_n42.xml').getroot()
elems = root.findall(".//*[@compressionCode='None']")
print(elems[0].text)
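For what it's worth, the ElementTree attempt in the question most likely failed because the file declares a default namespace (http://physics.nist.gov/N42/2011/N42), so bare tag names like 'Spectrum' never match. A namespace-qualified search, plus splitting the text into integers, is a reasonable sketch:
import xml.etree.ElementTree as ET

ns = {'n42': 'http://physics.nist.gov/N42/2011/N42'}
root = ET.parse('Annex_B_n42.xml').getroot()

# Find ChannelData anywhere in the tree, qualified by the default namespace.
channel_data = root.find('.//n42:ChannelData', ns)
counts = [int(c) for c in channel_data.text.split()]
print(len(counts), counts[:12])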

How to change this dataframe with python in order to use collaborative filtering

Here is my original data:
[image: table with columns cust_id, prd_id, prd_number]
As you can see, the cust_id column records the consumption record for each ID, the second column is the product name, and the third is the number they bought each time.
I want to get this kind of data:
[image: desired customer-by-product matrix]
The result should show, for each customer, which products they bought and how many; if they never bought a product, the value is None. I think this is a sparse matrix.
I have tried many ways and still can't fix it up...
Maybe pandas? NumPy?
There is a problem with duplicates, so I added a last row with the same cust_id and prd_id values to demonstrate it.
print (df)
cust_id prd_id prd_number
8 462 40 1
9 462 46 3
10 462 59 1
11 462 63 13
12 462 67 1
13 462 82 12
14 462 88 1
15 462 163 3
16 463 68 1
17 463 90 1
18 463 159 2
16 464 93 11
20 464 94 8
21 464 96 1
22 464 142 4
23 465 50 1
24 465 50 5
Then you need to groupby the columns cust_id and prd_id and aggregate prd_number with some function like mean() or sum(). Finally unstack, replacing NaN with 0:
print (df.groupby(['cust_id', 'prd_id'])['prd_number'].sum().unstack(fill_value=0))
prd_id 40 46 50 59 63 67 68 82 88 90 93 94 96 142 \
cust_id
462 1 3 0 1 13 1 0 12 1 0 0 0 0 0
463 0 0 0 0 0 0 1 0 0 1 0 0 0 0
464 0 0 0 0 0 0 0 0 0 0 11 8 1 4
465 0 0 6 0 0 0 0 0 0 0 0 0 0 0
prd_id 159 163
cust_id
462 0 3
463 2 0
464 0 0
465 0 0
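An equivalent sketch, not from the original answer: pivot_table does the groupby, aggregation, and unstack in a single call on the same df:
matrix = df.pivot_table(index='cust_id', columns='prd_id',
                        values='prd_number', aggfunc='sum', fill_value=0)
print(matrix)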
