Reshape data to new format for object detection - python

I have a dataset in this format in a DataFrame:
0--Parade/0_Parade_marchingband_1_849.jpg
2
449 330 122 149 0 0 0 0 0 0
0--Parade/0_Parade_Parade_0_904.jpg
1
361 98 263 339 0 0 0 0 0 0
0--Parade/0_Parade_marchingband_1_799.jpg
45
78 221 7 8 0 0 0 0 0
78 238 14 17 2 0 0 0 0 0
3 232 11 15 2 0 0 0 2 0
20 215 12 16 2 0 0 0 2 0
0--Parade/0_Parade_marchingband_1_117.jpg
23
69 359 50 36 1 0 0 0 0 1
227 382 56 43 1 0 1 0 0 1
296 305 44 26 1 0 0 0 0 1
353 280 40 36 2 0 0 0 2 1
885 377 63 41 1 0 0 0 0 1
819 391 34 43 2 0 0 0 1 0
727 342 37 31 2 0 0 0 0 1
598 246 33 29 2 0 0 0 0 1
740 308 45 33 1 0 0 0 2 1
0--Parade/0_Parade_marchingband_1_778.jpg
35
27 226 33 36 1 0 0 0 2 0
63 95 16 19 2 0 0 0 0 0
64 63 17 18 2 0 0 0 0 0
88 13 16 15 2 0 0 0 1 0
231 1 13 13 2 0 0 0 1 0
263 122 14 20 2 0 0 0 0 0
367 68 15 23 2 0 0 0 0 0
198 98 15 18 2 0 0 0 0 0
293 161 52 59 1 0 0 0 1 0
412 36 14 20 2 0 0 0 1 0
Can anyone tell me how to put these into a DataFrame where the first column contains all the .jpg paths and the next column contains the coordinates, with every set of coordinates kept next to the .jpg path it belongs to?
For example:
Column1 | Column2 | Column3
0--Parade/0_Parade_marchingband_1_849.jpg | 2 | 449 330 122 149 0 0 0 0 0 0
0--Parade/0_Parade_Parade_0_904.jpg | 1 | 361 98 263 339 0 0 0 0 0 0
0--Parade/0_Parade_marchingband_1_799.jpg | 45 | 78 221 7 8 0 0 0 0 0
| | 78 238 14 17 2 0 0 0 0 0
| | 3 232 11 15 2 0 0 0 2 0
| | 20 215 12 16 2 0 0 0 2 0
I have tried this:
count1 = 0
count2 = 0
dict1 = {}
dict2 = {}
dict3 = {}
for i in data[0]:
    if i.find('.jpg') == -1:
        dict1[count1] = i
        count1 += 1
    else:
        dict2[count2] = i
        count2 += 1
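A minimal sketch of one way to build that DataFrame, assuming `data[0]` holds the raw lines (the layout resembles WIDER FACE annotation files: an image path, a box count, then one box per line). `parse_annotations` is a hypothetical helper, not a library function. It starts a new group at every line ending in `.jpg` instead of trusting the count, because the counts shown (45, 23, 35) are larger than the number of box lines actually listed.

```python
import pandas as pd


def parse_annotations(lines):
    """Hypothetical helper: one row per box, tagged with its image path and declared count."""
    rows = []
    path, count = None, None
    expect_count = False
    for raw in lines:
        line = str(raw).strip()
        if not line:
            continue
        if line.endswith('.jpg'):
            # A path line starts a new image; the next line is its box count.
            path, count, expect_count = line, None, True
        elif expect_count:
            count, expect_count = int(line), False
        else:
            # Any other line is a box: keep all of its numbers as a list of ints.
            rows.append({'path': path, 'count': count,
                         'coords': [int(v) for v in line.split()]})
    return pd.DataFrame(rows, columns=['path', 'count', 'coords'])


# Usage with the question's DataFrame (assumes the lines are in column 0):
# boxes = parse_annotations(data[0])
# boxes.set_index(['path', 'count'])  # repeated paths are shown once in the display
```

Each box stays on its own row, so filtering or grouping by `path` later is straightforward; the blank cells in the desired layout are only a display effect of the index.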

Related

Read, Process and save bson file

I have a .bson file.
Inside the .bson file, I have a PDF whose data type is bytes.
I need to write the PDF stored inside the .bson file out as a readable PDF file. Does that make sense?
I need help with the steps I have to take in between.
Note: I already saved the content to a PDF file, and it says the file is damaged.
My code:
with open('LOL.bson') as myfile:
    content = myfile.read()
    print(content)
{"_id":{"$oid":"59d3522618206812388e35f1"},"files_id":{"$oid":"59d3522618206812388e35f0"},"n":0,"data":{"$binary":"JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhwdC1QVCkgL1N0cnVjdFRyZWVSb290IDUzIDAgUi9NYXJrSW5mbzw8L01hcmtlZCB0cnVlPj4+Pg0KZW5kb2JqDQoyIDAgb2JqDQo8PC9UeXBlL1BhZ2VzL0NvdW50IDEvS2lkc1sgMyAwIFJdID4+DQplbmRvYmoNCjMgMCBvYmoNCjw8L1R5cGUvUGFnZS9QYXJlbnQgMiAwIFIvUmVzb3VyY2VzPDwvRXh0R1N0YXRlPDwvR1M1IDUgMCBSL0dTOCA4IDA....
Decoding it with bson.json_util:
import bson.json_util
read_content = bson.json_util.loads(content)
print(read_content['data'])
b'%PDF-1.5\r\n%\xb5\xb5\xb5\xb5\r\n1 0 obj\r\n<</Type/Catalog/Pages 2 0 R/Lang(pt-PT) /StructTreeRoot 130 0 R/MarkInfo<</Marked true>>>>\r\nendobj\r\n2 0 obj\r\n<</Type/Pages/Count 1/Kids[ 3 0 R] >>\r\nendobj\r\n3 0 obj\r\n<</Type/Page/Parent 2 0 R/Resources<</ExtGState<</GS5 5 0 R/GS8 8 0 R>>/Font<</F1 6 0 R/F2 29 0 R>>/XObject<</Image9 9 0 R/Image11 11 0 R/Image13 13 0 R/Image15 15 0 R/Image17 17 0 R/Image19 19 0 R/Image21 21 0 R/Image23 23 0 R/Image25 25 0 R/Image27 27 0 R/Image32 32 0 R/Image34 34 0 R/Image35 35 0 R/Image37 37 0 R/Image39 39 0 R/Image41 41 0 R/Image43 43 0 R/Image45 45 0 R/Image47 47 0 R/Image49 49 0 R/Image51 51 0 R/Image53 53 0 R/Image55 55 0 R/Image57 57 0 R/Image59 59 0 R/Image61 61 0 R/Image63 63 0 R/Image65 65 0 R/Image67 67 0 R/Image69 69 0 R/Image71 71 0 R/Image73 73 0 R/Image75 75 0 R/Image77 77 0 R/Image79 79 0 R/Image81 81 0 R/Image83 83 0 R/Image85 85 0 R/Image87 87 0 R/Image89 89 0 R/Image91 91 0 R/Image93 93 0 R/Image95 95 0 R/Image97 97 0 R/Image99 99 0 R/Image101 101 0 R/Image103 103 0 R/Image105 105 0 R/Image107 107 0 R/Image109 109 0 R/Image111 111 0 R/Image113 113 0 R/Image115 115 0 R/Image117 117 0 R/Image119 119 0 R/Image121 121 0 R/Image123 123 0 R/Image125 125 0 R/Image127 127 0 R>>/Pattern<</P31 31 0 R/P33 33 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 960 540] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>\r\nendobj\r\n4 0 obj\r\n<</Filter/FlateDecode/Length 4008>>\r\nstream\r\nx\x9c\xbd[\xcb\x8e\x1d\xb7\x11\xdd\x0f0\xff\xd0K\xc9\x80Z|?\x00\xc3\x0b?"\xd8\x88\x11\'V\x90\x85\xe1\x850\x91\x15\x07\x1a\t\x91\x8c
read_content = bson.json_util.loads(content)
print(type(read_content['data']))
<class 'bytes'>
How can I save this .bson content in a readable format (PDF)?
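A hedged sketch, assuming the file is a text export with one Extended JSON document per line (the printed content is readable JSON with `$oid`/`$binary`, not binary BSON). The keys `files_id` and `n` match the layout of GridFS chunk documents, where a stored file is split into chunks numbered by `n`. If that is what this export is, a single chunk is only part of the PDF, which could explain the "damaged" message. The helper names are hypothetical; `bson.json_util.loads`, as used in the question, could replace the manual base64 decoding.

```python
import base64
import json


def rebuild_gridfs_file(lines, files_id=None):
    """Reassemble a file from exported GridFS chunk documents, one JSON document per line."""
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        if files_id is not None and doc['files_id']['$oid'] != files_id:
            continue  # chunk belongs to a different stored file
        payload = doc['data']['$binary']
        if isinstance(payload, dict):  # newer form: {"base64": ..., "subType": ...}
            payload = payload['base64']
        chunks.append((doc['n'], base64.b64decode(payload)))
    chunks.sort(key=lambda item: item[0])  # n is the chunk's position within the file
    return b''.join(data for _, data in chunks)


def save_pdf(src_path, dst_path, files_id=None):
    with open(src_path, encoding='utf-8') as src:
        pdf_bytes = rebuild_gridfs_file(src, files_id)
    # 'wb': the PDF is binary, so it must not be written in text mode
    with open(dst_path, 'wb') as out:
        out.write(pdf_bytes)


# save_pdf('LOL.bson', 'LOL.pdf', files_id='59d3522618206812388e35f0')
```

If the export is a single JSON array instead (for example, produced with `--jsonArray`), load the whole file once and iterate over the documents rather than over lines.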

Trying to predict the next number in cyphertext using tensorflow

I am experimenting with machine learning and wanted to see how difficult it would be to predict a number given a series of other numbers. I have seen this done with simple sequences such as 1-10, but I wanted to try something more difficult: predicting the next byte of a ciphertext. Here is what I have tried so far:
import numpy as np
import matplotlib.pyplot as plt
#from sklearn.linear_model import LinearRegression
from tensorflow.keras import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.layers import Lambda, SimpleRNN
from tensorflow.keras import backend as K
from numpy.polynomial import polynomial as poly
from sklearn.feature_extraction import DictVectorizer
import Pyfhel

def generateInput(x, length):
    return np.append(x, [0 for i in range(length)], axis=0)

def main():
    HE = Pyfhel.Pyfhel()
    HE.contextGen(scheme='BFV', n=2048, q=34, t=34, t_bits=35, sec=128)
    HE.keyGen()
    a = "Hello"
    a = np.asarray(bytearray(a, "utf-8"))
    a = HE.encode(a)
    ct = HE.encrypt(a).to_bytes('none')
    ct = np.asarray([c for c in ct])
    length = 100  # How many records to take into account
    batch_size = 1
    n_features = 1
    epochs = 1
    generator = TimeseriesGenerator(ct, ct, stride=length, length=length, batch_size=batch_size)
    model = Sequential()
    model.add(SimpleRNN(100, activation='leaky_relu', input_shape=(length, n_features)))
    model.add(Dense(100, activation='leaky_relu', input_shape=(length, n_features)))
    model.add(Dense(256, activation='softmax'))
    model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=['accuracy'])
    history = model.fit(generator, epochs=epochs)
    for i in range(1, length):
        try:
            x_input = np.asarray(generateInput(ct[:i], length-len(ct[:i]))).reshape((1, length))
            yhat = model.predict(x_input).tolist()
            yhat_normalized = [float(i)/sum(yhat[0]) for i in yhat[0]]
            yhat_max = max(yhat_normalized)
            yhat_index = yhat_normalized.index(yhat_max)
            print("based on {} actual {} predicted {}".format(ct[:i], ct[i], yhat_index))
        except Exception as e:
            print("Error {}".format(e))

if __name__ == "__main__":
    main()
Now the problem is that all of my predictions are 0. Can anyone explain to me why this is happening? How can I fix this?
Here's what my current output looks like:
based on [94] actual 161 predicted 0
based on [ 94 161] actual 16 predicted 0
based on [ 94 161 16] actual 3 predicted 0
based on [ 94 161 16 3] actual 7 predicted 0
based on [ 94 161 16 3 7] actual 0 predicted 0
based on [ 94 161 16 3 7 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0] actual 105 predicted 0
based on [ 94 161 16 3 7 0 0 0 105] actual 128 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0] actual 78 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78] actual 6 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6] actual 78 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78] actual 65 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65] actual 45 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45] actual 23 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23] actual 12 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12] actual 234 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234] actual 155 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155] actual 45 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45] actual 217 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217] actual 42 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42] actual 230 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230] actual 122 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122] actual 64 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64] actual 99 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99] actual 53 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53] actual 143 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143] actual 104 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104] actual 96 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96] actual 158 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158] actual 146 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0] actual 99 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99] actual 122 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122] actual 217 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217] actual 34 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34] actual 140 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140] actual 238 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238] actual 76 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76] actual 135 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135] actual 237 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0] actual 2 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0] actual 8 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0] actual 1 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0] actual 240 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240] actual 63 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63] actual 94 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94] actual 161 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161] actual 16 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16] actual 3 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3] actual 7 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0] actual 24 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24] actual 128 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0 0 0 0 0] actual 16 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0 0 0 0 0 16] actual 0 predicted 0
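One plausible reading of this output, offered as a hedged diagnostic rather than a verified fix: 0 is by far the most frequent byte in the sequences printed above, the model is trained for a single epoch on non-overlapping windows (`stride=length`), and `generateInput` pads every prefix with trailing zeros, so most prediction inputs are mostly zeros. A classifier that has not learned a useful pattern tends to output the majority class. The sketch below compares the model against a majority-class baseline. `majority_baseline` and `predicted_class` are hypothetical helper names; the second one also shows that, because softmax outputs already sum to 1, the normalise, `max`, `index` steps in the loop reduce to `np.argmax`.

```python
from collections import Counter

import numpy as np


def majority_baseline(values):
    """Most frequent value and the accuracy of always predicting it."""
    counts = Counter(int(v) for v in values)
    value, freq = counts.most_common(1)[0]
    return value, freq / len(values)


def predicted_class(probabilities):
    """Index of the largest softmax output (same index as normalise + max + index)."""
    return int(np.argmax(probabilities))


# Usage inside main(), after ct is built and the model is trained:
# print(majority_baseline(ct))  # the value a model that learned nothing would keep predicting
# yhat_index = predicted_class(model.predict(x_input)[0])
```

If the model's accuracy is no better than the baseline's, it has most likely collapsed to the majority class. Ciphertext from a secure scheme is also designed to look random, so there may be little beyond the zero bytes for it to learn.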

Trouble with PyTorchLSTM in Thinc

Running the following code:
from thinc.api import chain, PyTorchLSTM, Sigmoid, Embed, with_padded, with_array2d
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding + our word tokens
output_size = 1
embedding_dim = 400
hidden_dim = 256
n_layers = 2
model = chain(
    Embed(nV=vocab_size, nO=embedding_dim),
    with_padded(PyTorchLSTM(nI=embedding_dim, nO=hidden_dim, depth=n_layers)),
    with_array2d(Sigmoid(nI=hidden_dim, nO=output_size))
)
model.initialize(X=train_x[:5], Y=train_y[:5])
I get this error: ValueError: Provided 'x' array should be 2-dimensional, but found 3 dimension(s).
Here are x[0] and y[0]:
[ 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
21025 308 6 3 1050 207 8 2138 32 1 171 57
15 49 81 5785 44 382 110 140 15 5194 60 154
9 1 4975 5852 475 71 5 260 12 21025 308 13
1978 6 74 2395 5 613 73 6 5194 1 24103 5
1983 10166 1 5786 1499 36 51 66 204 145 67 1199
5194 19869 1 37442 4 1 221 883 31 2988 71 4
1 5787 10 686 2 67 1499 54 10 216 1 383
9 62 3 1406 3686 783 5 3483 180 1 382 10
1212 13583 32 308 3 349 341 2913 10 143 127 5
7690 30 4 129 5194 1406 2326 5 21025 308 10 528
12 109 1448 4 60 543 102 12 21025 308 6 227
4146 48 3 2211 12 8 215 23] 1
I am relatively new to building these models, but I think it has to do with the fact that the output of the PyTorch LSTM layer has two dimensions. In a typical PyTorch LSTM you'd stack the output from the LSTM layer (I think), but I'm not sure how to do that here. I assumed with_array2d would help, but it doesn't seem to.
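The shape question can be illustrated without Thinc. In this plain NumPy sketch (not Thinc's API; the function names are hypothetical), an LSTM run over a padded batch yields one hidden vector per time step, shape `(batch, seq_len, hidden)`, while `y[0]` is a single label per review, so the classifier needs one vector per sequence. Flattening time into rows gives one prediction per token; reducing over time gives one per review.

```python
import numpy as np


def per_token(outputs):
    """(batch, seq_len, hidden) -> (batch * seq_len, hidden): one row per time step."""
    batch, seq_len, hidden = outputs.shape
    return outputs.reshape(batch * seq_len, hidden)


def last_step(outputs):
    """(batch, seq_len, hidden) -> (batch, hidden): the final time step of each sequence."""
    return outputs[:, -1, :]


def mean_over_time(outputs):
    """(batch, seq_len, hidden) -> (batch, hidden): average over all time steps."""
    return outputs.mean(axis=1)
```

Because x[0] is left-padded (the zeros come first), `last_step` lands on the last real word, whereas `mean_over_time` would also average the padded positions unless they are masked out.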

Pandas Groupby, MultiIndex, Multiple Columns

I just worked on creating some columns using .transform() to count some entries.
I used this reference.
For example:
userID deviceName POWER_DOWN USER LOW_RSSI NONE CMD_SUCCESS
0 24 IR_00 85 0 39 0 0
1 24 IR_00 85 0 39 0 0
2 24 IR_00 85 0 39 0 0
3 24 IR_00 85 0 39 0 0
4 25 BED_08 0 109 78 0 0
5 25 BED_08 0 109 78 0 0
6 25 BED_08 0 109 78 0 0
7 24 IR_00 85 0 39 0 0
8 23 IR_09 2 0 0 0 0
9 23 V33_17 3 0 2 0 134
10 23 V33_17 3 0 2 0 134
11 23 V33_17 3 0 2 0 134
12 23 V33_17 3 0 2 0 134
I want to group them by userID and deviceName,
so that it would look like:
userID deviceName POWER_DOWN USER LOW_RSSI NONE CMD_SUCCESS
0 23 IR_09 2 0 0 0 0
1 V33_17 3 0 2 0 134
2 24 IR_00 85 0 39 0 0
3 25 BED_08 0 109 78 0 0
I also want them to be sorted by userID and maybe make userID and deviceName as multi-index.
I tried df = df.groupby(['userID', 'deviceName']),
but it returned a
<pandas.core.groupby.DataFrameGroupBy object at 0x00000249BBB13DD8>
rather than a DataFrame.
By the way, sorry, I don't know how to copy a Jupyter notebook's In and Out cells.
I believe you need drop_duplicates with sort_values:
df1 = df.drop_duplicates(['userID', 'deviceName']).sort_values('userID')
print (df1)
userID deviceName POWER_DOWN USER LOW_RSSI NONE CMD_SUCCESS
8 23 IR_09 2 0 0 0 0
9 23 V33_17 3 0 2 0 134
0 24 IR_00 85 0 39 0 0
4 25 BED_08 0 109 78 0 0
If you want to create a MultiIndex, add set_index:
df1 = (df.drop_duplicates(['userID', 'deviceName'])
         .sort_values('userID')
         .set_index(['userID', 'deviceName']))
print (df1)
POWER_DOWN USER LOW_RSSI NONE CMD_SUCCESS
userID deviceName
23 IR_09 2 0 0 0 0
V33_17 3 0 2 0 134
24 IR_00 85 0 39 0 0
25 BED_08 0 109 78 0 0
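An alternative sketch that keeps the groupby from the question: a GroupBy object only becomes a DataFrame after an aggregation. Because the counts were created with `.transform()`, every row inside a (userID, deviceName) group holds the same values, so `.first()` is enough here. `groupby` sorts the group keys by default, so the result is already a MultiIndex ordered by userID and then deviceName. `collapse_groups` is a hypothetical helper name.

```python
import pandas as pd


def collapse_groups(df, keys=('userID', 'deviceName')):
    """One row per key combination; groupby sorts the keys by default (sort=True)."""
    return df.groupby(list(keys)).first()


# df1 = collapse_groups(df)
```

Unlike `drop_duplicates`, `.first()` takes the first non-null value of each column separately, which only matches the row-based result when rows within a group are identical, as they are here.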

How to change this dataframe with python in order to use collaborative filtering

Here is my original data:
[Image: original data]
As you can see, the cust_id column holds the purchase records for each customer ID, the second column is the product, and the third is the number bought each time.
I want to get this kind of data:
[Image: desired result]
The result should show which products each customer bought and how many. If a customer never bought a product, the value should be None. I think this is a sparse matrix.
I have tried many approaches and still can't get it to work.
Maybe pandas? NumPy?
There is a problem with duplicates; I added a last row with the same cust_id and prd_id values to demonstrate it.
print (df)
cust_id prd_id prd_number
8 462 40 1
9 462 46 3
10 462 59 1
11 462 63 13
12 462 67 1
13 462 82 12
14 462 88 1
15 462 163 3
16 463 68 1
17 463 90 1
18 463 159 2
16 464 93 11
20 464 94 8
21 464 96 1
22 464 142 4
23 465 50 1
24 465 50 5
Then group by cust_id and prd_id, aggregating with a function such as mean() or sum(). Finally, unstack, replacing NaN with 0:
print (df.groupby(['cust_id', 'prd_id'])['prd_number'].sum().unstack(fill_value=0))
prd_id 40 46 50 59 63 67 68 82 88 90 93 94 96 142 159 163
cust_id
462 1 3 0 1 13 1 0 12 1 0 0 0 0 0 0 3
463 0 0 0 0 0 0 1 0 0 1 0 0 0 0 2 0
464 0 0 0 0 0 0 0 0 0 0 11 8 1 4 0 0
465 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0
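The same customer-by-product matrix can be sketched in one call with `pivot_table`, which groups, aggregates, and unstacks together. Passing `fill_value=None` leaves NaN for products a customer never bought, closer to the "None" asked for in the question; `fill_value=0` reproduces the table above. `user_item_matrix` is a hypothetical helper name.

```python
import pandas as pd


def user_item_matrix(df, fill_value=0):
    """Customers as rows, products as columns, summed purchase counts as values."""
    return df.pivot_table(index='cust_id', columns='prd_id', values='prd_number',
                          aggfunc='sum', fill_value=fill_value)


# matrix = user_item_matrix(df)                  # zeros where nothing was bought
# sparse = user_item_matrix(df, fill_value=None)  # NaN where nothing was bought
```

For a genuinely sparse structure, the filled matrix can be converted afterwards, for example with `scipy.sparse.csr_matrix(matrix.values)`.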
