I have the following dataframe in pandas:
code amnt pre_amnt
123 200 200
124 234 0
125 231 231
126 236 0
128 122 130
I want to do a subtraction only when pre_amnt is non zero. My desired dataframe would be
code amnt pre_amnt diff
123 200 200 0
124 234 0 0
125 231 231 0
126 236 0 0
128 122 130 8
So, if pre_amnt is zero then diff should be also 0. How can I do it in pandas?
Use numpy.where:
m = df['pre_amnt'] > 0
df['diff'] = np.where(m, df['pre_amnt'] - df['amnt'], 0)
Another solution with where:
df['diff'] = (df['pre_amnt'] - df['amnt']).where(m, 0)
print (df)
code amnt pre_amnt diff
0 123 200 200 0
1 124 234 0 0
2 125 231 231 0
3 126 236 0 0
4 128 122 130 8
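For reference, here is a minimal self-contained sketch of both variants above. It rebuilds the sample frame from the question; the column names diff_where and the variable mask are only illustrative, and the mask uses != 0 on the assumption that pre_amnt is never negative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "code": [123, 124, 125, 126, 128],
    "amnt": [200, 234, 231, 236, 122],
    "pre_amnt": [200, 0, 231, 0, 130],
})

# True where the subtraction should be applied
mask = df["pre_amnt"] != 0

# Variant 1: numpy.where picks the difference where mask is True, else 0
df["diff"] = np.where(mask, df["pre_amnt"] - df["amnt"], 0)

# Variant 2: Series.where keeps the difference where mask is True, else 0
df["diff_where"] = (df["pre_amnt"] - df["amnt"]).where(mask, 0)

print(df)
```

Both columns should hold the same values; which variant to use is mostly a matter of taste.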
Another approach:
data['diff'] = 0
data.loc[data['pre_amnt'] != 0, 'diff'] = abs(data['pre_amnt'] - data['amnt'])
code amnt pre_amnt diff
0 123 200 200 0
1 124 234 0 0
2 125 231 231 0
3 126 236 0 0
4 128 122 130 8
I am experimenting with machine learning and wanted to see how difficult it would be to predict a number given a series of other numbers. I have seen people do this with simple vectors such as 1-10. However, I wanted to try something more difficult and predict the next byte of a ciphertext from the preceding bytes. Here is what I have tried so far:
import numpy as np
import matplotlib.pyplot as plt
#from sklearn.linear_model import LinearRegression
from tensorflow.keras import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.layers import Lambda, SimpleRNN
from tensorflow.keras import backend as K
from numpy.polynomial import polynomial as poly
from sklearn.feature_extraction import DictVectorizer
import Pyfhel
def generateInput(x, length):
    return np.append(x, [0 for i in range(length)], axis=0)

def main():
    HE = Pyfhel.Pyfhel()
    HE.contextGen(scheme='BFV', n=2048, q=34, t=34, t_bits=35, sec=128)
    HE.keyGen()
    a = "Hello"
    a = np.asarray(bytearray(a, "utf-8"))
    a = HE.encode(a)
    ct = HE.encrypt(a).to_bytes('none')
    ct = np.asarray([c for c in ct])
    length = 100 # How many records to take into account
    batch_size = 1
    n_features = 1
    epochs = 1
    generator = TimeseriesGenerator(ct, ct, stride=length, length=length, batch_size=batch_size)
    model = Sequential()
    model.add(SimpleRNN(100, activation='leaky_relu', input_shape=(length, n_features)))
    model.add(Dense(100, activation='leaky_relu', input_shape=(length, n_features)))
    model.add(Dense(256, activation='softmax'))
    model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=['accuracy'])
    history = model.fit(generator, epochs=epochs)
    for i in range(1, length):
        try:
            x_input = np.asarray(generateInput(ct[:i], length-len(ct[:i]))).reshape((1, length))
            yhat = model.predict(x_input).tolist()
            yhat_normalized = [float(i)/sum(yhat[0]) for i in yhat[0]]
            yhat_max = max(yhat_normalized)
            yhat_index = yhat_normalized.index(yhat_max)
            print("based on {} actual {} predicted {}".format(ct[:i], ct[i], yhat_index))
        except Exception as e:
            print("Error {}".format(e))

if __name__=="__main__":
    main()
Now the problem is that all of my predictions are 0. Can anyone explain to me why this is happening? How can I fix this?
Here's what my current output looks like:
based on [94] actual 161 predicted 0
based on [ 94 161] actual 16 predicted 0
based on [ 94 161 16] actual 3 predicted 0
based on [ 94 161 16 3] actual 7 predicted 0
based on [ 94 161 16 3 7] actual 0 predicted 0
based on [ 94 161 16 3 7 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0] actual 105 predicted 0
based on [ 94 161 16 3 7 0 0 0 105] actual 128 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0] actual 78 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78] actual 6 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6] actual 78 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78] actual 65 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65] actual 45 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45] actual 23 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23] actual 12 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12] actual 234 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234] actual 155 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155] actual 45 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45] actual 217 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217] actual 42 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42] actual 230 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230] actual 122 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122] actual 64 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64] actual 99 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99] actual 53 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53] actual 143 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143] actual 104 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104] actual 96 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96] actual 158 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158] actual 146 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0] actual 99 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99] actual 122 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122] actual 217 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217] actual 34 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34] actual 140 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140] actual 238 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238] actual 76 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76] actual 135 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135] actual 237 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0] actual 2 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0] actual 8 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0] actual 1 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0] actual 240 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240] actual 63 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63] actual 94 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94] actual 161 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161] actual 16 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16] actual 3 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3] actual 7 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0] actual 24 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24] actual 128 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0 0 0 0] actual 0 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0 0 0 0 0] actual 16 predicted 0
based on [ 94 161 16 3 7 0 0 0 105 128 0 0 0 0 0 0 78 6
78 65 45 23 12 234 155 45 217 42 230 122 64 99 53 143 104 96
158 146 0 99 122 217 34 140 238 76 135 237 0 2 0 0 0 0
0 0 0 0 8 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 240 63 94 161 16 3 7 0 0 0 24
128 0 0 0 0 0 0 0 16] actual 0 predicted 0
I have 2 dataframes:
q = pd.DataFrame({'ID':[700,701,701,702,703,703,702],'TX':[0,0,1,0,0,1,1],'REF':[100,120,144,100,103,105,106]})
ID TX REF
0 700 0 100
1 701 0 120
2 701 1 144
3 702 0 100
4 703 0 103
5 703 1 105
6 702 1 106
and
p = pd.DataFrame({'ID':[700,701,701,702,703,703,702,708],'REF':[100,121,149,100,108,105,106,109],'NOTE':['A','B','V','V','T','A','L','M']})
ID REF NOTE
0 700 100 A
1 701 121 B
2 701 149 V
3 702 100 V
4 703 108 T
5 703 105 A
6 702 106 L
7 708 109 M
I wish to merge p with q so that the IDs are equal AND the REF in p is equal to or higher than the REF in q.
Example 1:
for p: ID=700 and REF=100 and
for q: ID=700 and REF=100. So that's a clear match!
Example 2:
for q:
1 701 0 120
2 701 1 144
they would match these rows of p:
1 701 121 B
2 701 149 V
this way:
1 701 0 120 121 B (121 is the next REF after 120)
2 701 1 144 149 V (149 is the next REF after 144)
When I use the code below (note: it matches on REF only, which is wrong; it should match on ID AND REF):
p = p.sort_values(by=['REF'])
q = q.sort_values(by=['REF'])
pd.merge_asof(p, q, on='REF', direction='forward').sort_values(by=['ID_x','TX'])
I get this problem:
My expected result should be something like this:
ID TX REF REF_2 NOTE
0 700 0 100 100 A
1 701 0 120 121 B
2 701 1 144 149 V
3 702 0 100 100 V
4 703 0 103 108 T
5 703 1 105 105 A
6 702 1 106 109 L
Does this work?
pd.merge_asof(q.sort_values(['REF', 'ID']),
p.sort_values(['REF', 'ID']),
on='REF',
direction='forward',
by='ID').sort_values('ID')
Output:
ID TX REF NOTE
0 700 0 100 A
5 701 0 120 B
6 701 1 144 V
1 702 0 100 V
4 702 1 106 L
2 703 0 103 A
3 703 1 105 A
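The answer's call drops p's own REF value, because merge_asof keeps only one column for the on key. The sketch below is one way to keep it, assuming a copy named REF_2 (taken from the question's expected output) is acceptable. The variables p_sorted, q_sorted and merged are illustrative names, not part of the original code.

```python
import pandas as pd

q = pd.DataFrame({'ID': [700, 701, 701, 702, 703, 703, 702],
                  'TX': [0, 0, 1, 0, 0, 1, 1],
                  'REF': [100, 120, 144, 100, 103, 105, 106]})
p = pd.DataFrame({'ID': [700, 701, 701, 702, 703, 703, 702, 708],
                  'REF': [100, 121, 149, 100, 108, 105, 106, 109],
                  'NOTE': ['A', 'B', 'V', 'V', 'T', 'A', 'L', 'M']})

# merge_asof requires both frames to be sorted by the "on" key
q_sorted = q.sort_values('REF')
# Copy p's REF so the matched value survives the merge as REF_2
p_sorted = p.assign(REF_2=p['REF']).sort_values('REF')

merged = pd.merge_asof(
    q_sorted, p_sorted,
    on='REF',             # approximate key: nearest REF in p ...
    by='ID',              # ... among rows with exactly the same ID
    direction='forward',  # ... that is equal to or higher than q's REF
).sort_values(['ID', 'TX']).reset_index(drop=True)

print(merged)
```

With direction='forward', each row of q is paired with the first row of p for the same ID whose REF is at or above q's REF, so ID 703 with REF 103 is matched to 105 rather than 108.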
The dataframe below has a number of columns, but the column names are arbitrary numbers.
daily1=
0 1 2 3 4 5 6 7 8 9 ... 11 12 13 14 15 16 17 18 19 20
0 0 0 0 0 0 0 4 0 0 0 ... 640 777 674 842 786 865 809 674 679 852
1 0 0 0 0 0 0 0 0 0 0 ... 108 29 74 102 82 62 83 68 30 61
2 rows × 244 columns
I would like to rename the columns so that they run in numerical order (from 0 to 243).
I tried
for i, n in zip(daily1.columns, range(244)):
    asd=daily1.rename(columns={i:n})
    asd
but no output was shown...
Ideal output is
0 1 2 3 4 5 6 7 8 9 ... 234 235 236 237 238 239 240 241 242 243
0 0 0 0 0 0 0 4 0 0 0 ... 640 777 674 842 786 865 809 674 679 852
1 0 0 0 0 0 0 0 0 0 0 ... 108 29 74 102 82 62 83 68 30 61
Could I get some advice guys? Thank you
If you want to reorder the columns, you can try this:
columns = sorted(list(df.columns), reverse=False)
df = df[columns]
If you just want to rename the columns then you can try
df.columns = [i for i in range(df.shape[1])]
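A small sketch contrasting the two options on a made-up frame (the column labels 17, 3, 250, 42 are invented for illustration). It also shows why the loop in the question cannot work: rename returns a new DataFrame rather than modifying daily1 in place, so each iteration starts again from the original frame and the renames never accumulate.

```python
import pandas as pd

# Hypothetical stand-in for daily1 with arbitrary numeric column labels
daily1 = pd.DataFrame([[0, 4, 640, 777], [0, 0, 108, 29]],
                      columns=[17, 3, 250, 42])

# Option 1: relabel the columns 0..n-1, keeping their current positions
renamed = daily1.copy()
renamed.columns = range(renamed.shape[1])

# Option 2: keep the labels but reorder the columns by label
reordered = daily1[sorted(daily1.columns)]

print(renamed)
print(reordered)
```

Option 1 matches the ideal output in the question, where the data stay in place and only the labels change.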
I just want a dot product. I am unsure of why I can't have it.
Here are some print statements that describe my data, which I picture as 60,000 vectors of length 784. However, I will just be using the first of these vectors.
print(type(data))
print(data.shape)
print(type(data[0]))
print(data[0].shape)
print(data[0])
print("Result of np.dot: " + str( np.dot(data[0],data[0])) )
print("Result of np.inner: " + str( np.inner(data[0],data[0]) ))
Output:
<class 'numpy.ndarray'>
(60000, 784)
<class 'numpy.ndarray'>
(784,)
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 13 73 0 0 1 4 0 0 0
0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0
36 136 127 62 54 0 0 0 1 3 4 0 0 3 0 0 0 0
0 0 0 0 0 0 0 0 6 0 102 204 176 134 144 123 23 0
0 0 0 12 10 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 155 236 207 178 107 156 161 109 64 23 77 130 72 15 0 0
0 0 0 0 0 0 0 0 0 1 0 69 207 223 218 216 216 163
127 121 122 146 141 88 172 66 0 0 0 0 0 0 0 0 0 1
1 1 0 200 232 232 233 229 223 223 215 213 164 127 123 196 229 0
0 0 0 0 0 0 0 0 0 0 0 0 0 183 225 216 223 228
235 227 224 222 224 221 223 245 173 0 0 0 0 0 0 0 0 0
0 0 0 0 0 193 228 218 213 198 180 212 210 211 213 223 220 243
202 0 0 0 0 0 0 0 0 0 0 1 3 0 12 219 220 212
218 192 169 227 208 218 224 212 226 197 209 52 0 0 0 0 0 0
0 0 0 0 6 0 99 244 222 220 218 203 198 221 215 213 222 220
245 119 167 56 0 0 0 0 0 0 0 0 0 4 0 0 55 236
228 230 228 240 232 213 218 223 234 217 217 209 92 0 0 0 1 4
6 7 2 0 0 0 0 0 237 226 217 223 222 219 222 221 216 223
229 215 218 255 77 0 0 3 0 0 0 0 0 0 0 62 145 204
228 207 213 221 218 208 211 218 224 223 219 215 224 244 159 0 0 0
0 0 18 44 82 107 189 228 220 222 217 226 200 205 211 230 224 234
176 188 250 248 233 238 215 0 0 57 187 208 224 221 224 208 204 214
208 209 200 159 245 193 206 223 255 255 221 234 221 211 220 232 246 0
3 202 228 224 221 211 211 214 205 205 205 220 240 80 150 255 229 221
188 154 191 210 204 209 222 228 225 0 98 233 198 210 222 229 229 234
249 220 194 215 217 241 65 73 106 117 168 219 221 215 217 223 223 224
229 29 75 204 212 204 193 205 211 225 216 185 197 206 198 213 240 195
227 245 239 223 218 212 209 222 220 221 230 67 48 203 183 194 213 197
185 190 194 192 202 214 219 221 220 236 225 216 199 206 186 181 177 172
181 205 206 115 0 122 219 193 179 171 183 196 204 210 213 207 211 210
200 196 194 191 195 191 198 192 176 156 167 177 210 92 0 0 74 189
212 191 175 172 175 181 185 188 189 188 193 198 204 209 210 210 211 188
188 194 192 216 170 0 2 0 0 0 66 200 222 237 239 242 246 243
244 221 220 193 191 179 182 182 181 176 166 168 99 58 0 0 0 0
0 0 0 0 0 40 61 44 72 41 35 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
Result of np.dot: 183
Result of np.inner: 183
I've done the calculation, and 183 is indeed an underestimate. Could I get an explanation as to what is happening here?
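The reason for this behavior is integer overflow: the elements are 8-bit unsigned integers, so the result of the dot product wraps around modulo 256. You can check the element type: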
print(type(data[0][0]))
result:
<class 'numpy.uint8'>
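One common way around this is to widen the dtype before taking the dot product. The sketch below uses a tiny made-up vector instead of the 784-element one from the question; the names v, wrapped, wide and exact are illustrative.

```python
import numpy as np

# Tiny stand-in for data[0]; the true dot product is 200*200 + 100*100 = 50000
v = np.array([200, 100], dtype=np.uint8)

# The result dtype is still uint8, so the value wraps around modulo 256
wrapped = np.dot(v, v)

# Cast to a wider integer type first, then take the dot product
wide = v.astype(np.int64)
exact = np.dot(wide, wide)

print(wrapped, exact)
```

For the data in the question, the equivalent would be np.dot(data[0].astype(np.int64), data[0].astype(np.int64)).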
While designing a recommendation system, I have stumbled upon a case where a rating, vote or something similar is required to implement collaborative filtering.
But our system has no field for ratings or votes. I would like to derive a comparable rating from the timestamp at which the user watched the show.
This is what the view history looks like:
subscriber_id content_id timestamp
1 123 1576833135000
1 124 1576833140000
1 125 1576833145000
1 126 1576833150000
1 127 1576833155000
1 128 1576833160000
1 129 1576833165000
1 130 1576833170000
1 131 1576833175000
1 132 1576833180000
2 123 1576833135000
2 124 1576833140000
2 125 1576833145000
2 126 1576833150000
2 127 1576833155000
2 128 1576833160000
2 129 1576833165000
2 130 1576833170000
2 131 1576833175000
2 132 1576833180000
2 133 1576833185000
2 134 1576833190000
2 135 1576833195000
2 136 1576833200000
2 137 1576833205000
2 138 1576833210000
2 139 1576833215000
2 140 1576833220000
2 141 1576833225000
2 142 1576833230000
2 143 1576833235000
2 144 1576833240000
I want to assign a rating to each of these entries, ranging from 5 to 1 (5 being the most recent). I have implemented a rank-based approach, but it does not map the ranks onto that range.
df1['rank'] = df1.sort_values(['subscriber_id','timestamp']) \
.groupby(['subscriber_id'])['timestamp'] \
.rank(method='max').astype(int)
Expected Output:
subscriber_id content_id timestamp rating
1 123 1576833135000 1
1 124 1576833140000 1
1 125 1576833145000 2
1 126 1576833150000 2
1 127 1576833155000 3
1 128 1576833160000 3
1 129 1576833165000 4
1 130 1576833170000 4
1 131 1576833175000 5
1 132 1576833180000 5
2 123 1576833135000 1
2 124 1576833140000 1
2 125 1576833145000 1
2 126 1576833150000 1
2 127 1576833155000 2
2 128 1576833160000 2
2 129 1576833165000 2
2 130 1576833170000 2
2 131 1576833175000 3
2 132 1576833180000 3
2 133 1576833185000 3
2 134 1576833190000 3
2 135 1576833195000 4
2 136 1576833200000 4
2 137 1576833205000 4
2 138 1576833210000 4
2 139 1576833215000 4
2 140 1576833220000 5
2 141 1576833225000 5
2 142 1576833230000 5
2 143 1576833235000 5
2 144 1576833240000 5
Any help would be much appreciated!
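Now it makes sense. The solution is to build the list of ratings per user from the quotient and remainder of dividing that user's number of rows by 5: each rating is repeated quotient times, and the remainder is distributed as one extra row for each of the highest ratings. There you go :)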
import pandas as pd
from io import StringIO
data = StringIO("""
content_id subscriber_id timestamp
123 1 1576833135000
124 1 1576833140000
125 1 1576833145000
126 1 1576833150000
127 1 1576833155000
128 1 1576833160000
129 1 1576833165000
130 1 1576833170000
131 1 1576833175000
132 1 1576833180000
123 2 1576833135000
124 2 1576833140000
125 2 1576833145000
126 2 1576833150000
127 2 1576833155000
128 2 1576833160000
129 2 1576833165000
130 2 1576833170000
131 2 1576833175000
132 2 1576833180000
133 2 1576833185000
134 2 1576833190000
135 2 1576833195000
136 2 1576833200000
137 2 1576833205000
138 2 1576833210000
139 2 1576833215000
140 2 1576833220000
141 2 1576833225000
142 2 1576833230000
143 2 1576833235000
144 2 1576833240000
""")
# load data into data frame
df = pd.read_csv(data, sep=' ')
# get unique users
user_list = df['subscriber_id'].unique()
# collect per-user results
frames = []
for user in user_list:
    # select data range for one user (copy so insert() does not act on a slice of df)
    df2 = df[df['subscriber_id'] == user].copy()
    items_numer = df2.shape[0]
    modulo_remider = items_numer % 5
    ranks_repeat = items_numer // 5
    # create rating list based on modulo
    if modulo_remider > 0:
        rating = []
        for i in range(1, 6, 1):
            l = [i for j in range(ranks_repeat)]
            for number in l:
                rating.append(number)
        # give one extra row to each of the highest ratings
        if modulo_remider == 1:
            rating.insert(rating.index(5), 5)
        if modulo_remider == 2:
            rating.insert(rating.index(4), 4)
            rating.insert(rating.index(5), 5)
        if modulo_remider == 3:
            rating.insert(rating.index(3), 3)
            rating.insert(rating.index(4), 4)
            rating.insert(rating.index(5), 5)
        if modulo_remider == 4:
            rating.insert(rating.index(2), 2)
            rating.insert(rating.index(3), 3)
            rating.insert(rating.index(4), 4)
            rating.insert(rating.index(5), 5)
        df2.insert(3, 'rating', rating, True)
    else:
        rating = []
        for i in range(1, 6, 1):
            l = [i for j in range(ranks_repeat)]
            for number in l:
                rating.append(number)
        df2.insert(3, 'rating', rating, True)
    # collect results
    frames.append(df2)
# DataFrame.append was removed in pandas 2.0; concatenate the per-user frames instead
results = pd.concat(frames)
Result:
content_id subscriber_id timestamp rating
0 123 1 1576833135000 1
1 124 1 1576833140000 1
2 125 1 1576833145000 2
3 126 1 1576833150000 2
4 127 1 1576833155000 3
5 128 1 1576833160000 3
6 129 1 1576833165000 4
7 130 1 1576833170000 4
8 131 1 1576833175000 5
9 132 1 1576833180000 5
10 123 2 1576833135000 1
11 124 2 1576833140000 1
12 125 2 1576833145000 1
13 126 2 1576833150000 1
14 127 2 1576833155000 2
15 128 2 1576833160000 2
16 129 2 1576833165000 2
17 130 2 1576833170000 2
18 131 2 1576833175000 3
19 132 2 1576833180000 3
20 133 2 1576833185000 3
21 134 2 1576833190000 3
22 135 2 1576833195000 4
23 136 2 1576833200000 4
24 137 2 1576833205000 4
25 138 2 1576833210000 4
26 139 2 1576833215000 4
27 140 2 1576833220000 5
28 141 2 1576833225000 5
29 142 2 1576833230000 5
30 143 2 1576833235000 5
31 144 2 1576833240000 5
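The same distribution rule can be written more compactly with NumPy. This is only a sketch of an alternative, not the answer's code: the helper recency_ratings and its levels parameter are hypothetical names, and the sample frame is rebuilt from the timestamps in the question (5000 ms apart).

```python
import numpy as np
import pandas as pd


def recency_ratings(n, levels=5):
    """Return n ratings from 1..levels in ascending order.

    Every rating appears n // levels times; the n % levels leftover rows
    go to the highest ratings, one extra row each.
    """
    k, r = divmod(n, levels)
    counts = np.full(levels, k)
    counts[levels - r:] += 1  # empty slice when r == 0, so nothing changes
    return np.repeat(np.arange(1, levels + 1), counts)


# Rebuild the view history from the question: 10 rows for user 1, 22 for user 2
start, step = 1576833135000, 5000
df = pd.DataFrame({
    "subscriber_id": [1] * 10 + [2] * 22,
    "content_id": list(range(123, 133)) + list(range(123, 145)),
    "timestamp": [start + step * i for i in range(10)]
                 + [start + step * i for i in range(22)],
})

# Oldest rows first within each user, so the most recent rows get rating 5
df = df.sort_values(["subscriber_id", "timestamp"])
df["rating"] = df.groupby("subscriber_id")["timestamp"].transform(
    lambda s: pd.Series(recency_ratings(len(s)), index=s.index)
)

print(df)
```

Unlike the loop above, this version also handles users with fewer than five rows, who simply receive the highest ratings only.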