I'm training a model whose output is a softmax layer of size 19. When I try model.predict(x), for each input, I get what appears to be a probability distribution across the 19 classes. I tried model.predict_classes, and got a numpy array of the size of x, with each output equal to 0. How can I get one hot vectors for the output?
The documentation of predict_classes is somewhat misleading: if you check its implementation carefully, you'll find that it works only for binary classification. To solve your problem, you can use the numpy library (specifically the argmax function) in the following way:
import numpy as np
classes = np.argmax(model.predict(x), axis = 1)
This gives you an array with a class number for each example. To get one-hot vectors, you can use the Keras built-in function to_categorical in the following manner:
import numpy as np
from keras.utils.np_utils import to_categorical
classes_one_hot = to_categorical(np.argmax(model.predict(x), axis = 1))
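If you prefer to stay in plain numpy, the same conversion can be sketched without the Keras utility; the fake_probs array below is a made-up stand-in for the output of model.predict(x):

```python
import numpy as np

# Made-up stand-in for model.predict(x): 3 examples, 19 classes
fake_probs = np.random.rand(3, 19)
fake_probs /= fake_probs.sum(axis=1, keepdims=True)  # normalize rows like a softmax

classes = np.argmax(fake_probs, axis=1)  # predicted class index per example
one_hot = np.eye(19)[classes]            # one-hot row per example

print(one_hot.shape)  # (3, 19)
```

Indexing an identity matrix with the class indices is a common numpy idiom for one-hot encoding.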
I am getting correct_eval as 0. I have used the Boston housing dataset, split into training and testing sets, and used TensorFlow (not Keras) for training the model. The neural network consists of 2 hidden layers of size 13 each, and the input size is also 13.
import pandas as pd
import numpy as np
data=pd.read_csv("Boston_Housing.csv")
x=data.iloc[:,0:13]
x=np.array(x)
y=data.iloc[:,13]
y=np.array(y)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y)
import tensorflow as tf
tf.__version__
input_width=13
num_layers=2
n_hidden_layer1=13
n_hidden_layer2=13
n_op=1
weights={
    "w_h1":tf.Variable(tf.random_normal([input_width,n_hidden_layer1])),
    "w_h2":tf.Variable(tf.random_normal([n_hidden_layer1,n_hidden_layer2])),
    "w_op":tf.Variable(tf.random_normal([n_hidden_layer2,n_op]))
}
biases={
    "b_h1":tf.Variable(tf.random_normal([n_hidden_layer1])),
    "b_h2":tf.Variable(tf.random_normal([n_hidden_layer2])),
    "b_op":tf.Variable(tf.random_normal([n_op]))
}
tf.trainable_variables()
def forwardPropagation(x,weights,biases):
    ip_h1=tf.add(tf.matmul(x,weights['w_h1']),biases['b_h1'])
    op_h1=tf.nn.relu(ip_h1)
    ip_h2=tf.add(tf.matmul(op_h1,weights['w_h2']),biases['b_h2'])
    op_h2=tf.nn.relu(ip_h2)
    ip_op=tf.add(tf.matmul(op_h2,weights['w_op']),biases['b_op'])
    op_op=tf.nn.relu(ip_op)
    return op_op
s=tf.Session()
s.run(tf.global_variables_initializer())
x=tf.placeholder("float",[None,input_width])
y=tf.placeholder("float",[None,n_op])
pred=forwardPropagation(x,weights,biases)
correct_pred=tf.equal(pred,y_train)
pred_eval,correct_eval=s.run([pred,correct_pred],feed_dict={x:x_train,y:y_train})
pred_eval,correct_eval
correct_eval.sum()
correct_eval
correct_eval is 0, which means no prediction is correct. The pred values are mostly 0 or completely random. Kindly help me resolve this.
Take a look at this line of code:
correct_pred=tf.equal(pred,y_train)
You are evaluating outputs from an untrained regression model using equality. There are a couple of problems with this.
The values in pred are produced by 3 layers that have random weights and biases. Each layer transforms the inputs using completely random transformations. Before you train your model on the dataset, the values in pred will be very different from the values in y_train.
pred and y_train both contain continuous values. It is almost always a bad idea to check for absolute equality between two continuous values, because they need to be exactly the same for the equality to be True. Say you have trained your model and the outputs in pred match the values in y_train very closely. Unless they match exactly up to the last significant digit, the comparison will still be False. Therefore, you always get correct_eval=0.
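A minimal numpy illustration of why exact equality fails for continuous values (the numbers here are made up):

```python
import numpy as np

pred = np.array([22.500001, 31.999999, 18.000002])  # hypothetical model outputs
target = np.array([22.5, 32.0, 18.0])               # hypothetical targets

exact = (pred == target)                     # element-wise exact equality
close = np.isclose(pred, target, atol=1e-3)  # tolerance-based comparison

print(exact.sum())  # 0 -- no exact matches despite near-identical values
print(close.sum())  # 3 -- all values match within tolerance
```

This is why comparisons of continuous predictions should use a tolerance (or a proper regression metric) rather than equality.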
Most probably, you will want to calculate a metric like the mean square error (MSE) between pred and y_train. tf.keras.losses.MeanSquaredError is the common way to calculate the MSE in Tensorflow 2.0.
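The MSE itself is just the mean of the squared differences; here is a small numpy sketch (with made-up values) of what tf.keras.losses.MeanSquaredError computes:

```python
import numpy as np

pred = np.array([2.5, 0.0, 2.0, 8.0])     # hypothetical predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])  # hypothetical targets

# mean of squared differences
mse = np.mean((pred - y_true) ** 2)
print(mse)  # 0.375
```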
As for this,
pred values are mostly 0 or completely random.
You are passing the outputs from the last layer through a ReLU function, which returns 0 for all negative inputs. Again, since the network's outputs come from random transformations, the outputs are random values with zeros in place of negative values. This is expected, and you will need to train your network for it to give any meaningful outputs.
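A quick numpy sketch of this effect: ReLU maps every negative input to 0, so roughly half of random normally-distributed outputs become exactly 0:

```python
import numpy as np

rng = np.random.RandomState(0)
raw = rng.randn(1000)            # random "pre-activation" outputs
relu_out = np.maximum(raw, 0.0)  # ReLU: negatives clipped to 0

print((relu_out == 0).mean())    # roughly half of the outputs are exactly 0
```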
It also looks like you are using Tensorflow 1.x, in which case you can use tf.losses.mean_squared_error.
Good luck!
I would like to plot all the different loss functions available in Keras. Therefore I have created a dataframe and invoked the loss function. But how can I get the values back from the tensor?
import numpy as np
import pandas as pd
from keras import losses
points = 100
df = pd.DataFrame({"error": np.linspace(-3,3,points)})
df["mean_squared_error"] = losses.mean_squared_error(np.zeros(points), df["error"])
df.plot(x="error")
The loss functions in Keras return a Tensor object. You need to evaluate that Tensor using the backend's eval() function to get its actual value. Further, if you look at the definition of a loss function in Keras, say mean_squared_error(), you will see a K.mean() operation that averages over the last axis, which is the output axis (not to be confused with the batch or sample axis). Therefore, you may need to pass the true and predicted values with a shape of (n_samples, n_outputs), hence the reshapes in the following code:
import numpy as np
import pandas as pd
from keras import losses
from keras import backend as K
points = 100
df = pd.DataFrame({"error": np.linspace(-3,3,points)})
mse_loss = losses.mean_squared_error(np.zeros((points,1)), df["error"].values.reshape(-1,1))
df["mean_squared_error"] = K.eval(mse_loss)
df.plot(x="error")
Here is the output plot:
Hello everyone!
I am trying to develop a neural network using Keras and TensorFlow which should be able to take variable-length arrays as input and give either some single value (see the toy example below) or classify them (that is a problem for later and will not be touched in this question).
The idea is fairly simple.
We have variable length arrays. I am currently using very simple toy data, which is generated by the following code:
import numpy as np
import pandas as pd
from keras import models as kem
from keras import activations as kea
from keras import layers as kel
from keras import regularizers as ker
from keras import optimizers as keo
from keras import losses as kelo
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
n = 100
x = pd.DataFrame(columns=['data','res'])
mms = MinMaxScaler(feature_range=(-1,1))
for i in range(n):
    k = np.random.randint(20,100)
    ss = np.random.randint(0,100,size=k)
    idres = np.sum(ss[np.arange(0,k,2)])-np.sum(ss[np.arange(1,k,2)])
    x.loc[i,'data'] = ss
    x.loc[i,'res'] = idres
x.res = mms.fit_transform(x.res)
x_train,x_test,y_train, y_test = train_test_split(x.data,x.res,test_size=0.2)
x_train = sliding_window(x_train.as_matrix(),2,2)
x_test = sliding_window(x_test.as_matrix(),2,2)
To put it simply, I generate arrays of random length, and the result (output) for each array is the sum of even-indexed elements minus the sum of odd-indexed elements. Obviously, it can be negative or positive. The output is then scaled to the range [-1,1] to fit the tanh activation function.
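For reference, the target for one array can be checked directly; ss below is a small made-up example:

```python
import numpy as np

ss = np.array([10, 3, 7, 2, 5])  # made-up variable-length input
# sum of even-indexed elements minus sum of odd-indexed elements
idres = np.sum(ss[0::2]) - np.sum(ss[1::2])
print(idres)  # (10 + 7 + 5) - (3 + 2) = 17
```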
The Sequential model is generated as follows:
model = kem.Sequential()
model.add(kel.LSTM(20,return_sequences=False,input_shape=(None,2),recurrent_activation='tanh'))
model.add(kel.Dense(20,activation='tanh'))
model.add(kel.Dense(10,activation='tanh'))
model.add(kel.Dense(5,activation='tanh'))
model.add(kel.Dense(1,activation='tanh'))
sgd = keo.SGD(lr=0.1)
mseloss = kelo.mean_squared_error
model.compile(optimizer=sgd,loss=mseloss,metrics=['accuracy'])
And the training of the model is done in the following way:
def calcMSE(model,x_test,y_test):
    nTest = len(x_test)
    sum = 0
    for i in range(nTest):
        restest = model.predict(np.reshape(x_test[i],(1,-1,2)))
        sum += (restest-y_test[0,i])**2
    return sum/nTest
i = 1
mse = calcMSE(model,x_test,np.reshape(y_test.values,(1,-1)))
lrPar = 0
lrSteps = 30
while mse>0.04:
    print("Epoch %i" % (i))
    print(mse)
    for j in range(len(x_train)):
        ntrain=j
        model.train_on_batch(np.reshape(x_train[ntrain],(1,-1,2)),np.reshape(y_train.values[ntrain],(-1,1)))
    i+=1
    mse = calcMSE(model,x_test,np.reshape(y_test.values,(1,-1)))
The problem is that the optimiser usually gets stuck around MSE=0.05 (on the test set). The last time I ran it, it got stuck around MSE=0.12 (on the test data).
Moreover, if you look at what the model gives on the test data (left column) in comparison with the correct output (right column):
[[-0.11888303]] 0.574923547401
[[-0.17038491]] -0.452599388379
[[-0.20098214]] 0.065749235474
[[-0.22307695]] -0.437308868502
[[-0.2218809]] 0.371559633028
[[-0.2218741]] 0.039755351682
[[-0.22247596]] -0.434250764526
[[-0.17094387]] -0.151376146789
[[-0.17089397]] -0.175840978593
[[-0.16988073]] 0.025993883792
[[-0.16984619]] -0.117737003058
[[-0.17087571]] -0.515290519878
[[-0.21933308]] -0.366972477064
[[-0.09379648]] -0.178899082569
[[-0.17016701]] -0.333333333333
[[-0.17022927]] -0.195718654434
[[-0.11681376]] 0.452599388379
[[-0.21438009]] 0.224770642202
[[-0.12475857]] 0.151376146789
[[-0.2225963]] -0.380733944954
And on training set the same is:
[[-0.22209576]] -0.00764525993884
[[-0.17096499]] -0.247706422018
[[-0.22228305]] 0.276758409786
[[-0.16986915]] 0.340978593272
[[-0.16994311]] -0.233944954128
[[-0.22131597]] -0.345565749235
[[-0.17088912]] -0.145259938838
[[-0.22250554]] -0.792048929664
[[-0.17097935]] 0.119266055046
[[-0.17087702]] -0.2874617737
[[-0.1167363]] -0.0045871559633
[[-0.08695849]] 0.159021406728
[[-0.17082921]] 0.374617737003
[[-0.15422876]] -0.110091743119
[[-0.22185338]] -0.7125382263
[[-0.17069265]] -0.678899082569
[[-0.16963181]] -0.00611620795107
[[-0.17089556]] -0.249235474006
[[-0.17073657]] -0.414373088685
[[-0.17089497]] -0.351681957187
[[-0.17138508]] -0.0917431192661
[[-0.22351067]] 0.11620795107
[[-0.17079701]] -0.0795107033639
[[-0.22246087]] 0.22629969419
[[-0.17044055]] 1.0
[[-0.17090379]] -0.0902140672783
[[-0.23420531]] -0.0366972477064
[[-0.2155242]] 0.0366972477064
[[-0.22192241]] -0.675840978593
[[-0.22220723]] -0.354740061162
[[-0.1671907]] -0.10244648318
[[-0.22705412]] 0.0443425076453
[[-0.22943887]] -0.249235474006
[[-0.21681401]] 0.065749235474
[[-0.12495813]] 0.466360856269
[[-0.17085686]] 0.316513761468
[[-0.17092516]] 0.0275229357798
[[-0.17277785]] -0.325688073394
[[-0.22193027]] 0.139143730887
[[-0.17088208]] 0.422018348624
[[-0.17093034]] -0.0886850152905
[[-0.17091317]] -0.464831804281
[[-0.22241674]] -0.707951070336
[[-0.1735626]] -0.337920489297
[[-0.16984227]] 0.00764525993884
[[-0.16756304]] 0.515290519878
[[-0.22193302]] -0.414373088685
[[-0.22419722]] -0.351681957187
[[-0.11561158]] 0.17125382263
[[-0.16640976]] -0.321100917431
[[-0.21557514]] -0.313455657492
[[-0.22241823]] -0.117737003058
[[-0.22165506]] -0.646788990826
[[-0.22238114]] -0.261467889908
[[-0.1709189]] 0.0902140672783
[[-0.17698884]] -0.626911314985
[[-0.16984172]] 0.587155963303
[[-0.22226149]] -0.590214067278
[[-0.16950315]] -0.469418960245
[[-0.22180589]] -0.133027522936
[[-0.2224243]] -1.0
[[-0.22236891]] 0.152905198777
[[-0.17089345]] 0.435779816514
[[-0.17422611]] -0.233944954128
[[-0.17177556]] -0.324159021407
[[-0.21572633]] -0.347094801223
[[-0.21509495]] -0.646788990826
[[-0.17086846]] -0.34250764526
[[-0.17595944]] -0.496941896024
[[-0.16803505]] -0.382262996942
[[-0.16983894]] -0.348623853211
[[-0.17078683]] 0.363914373089
[[-0.21560851]] -0.186544342508
[[-0.22416025]] -0.374617737003
[[-0.1723443]] -0.186544342508
[[-0.16319042]] -0.0122324159021
[[-0.18837349]] -0.181957186544
[[-0.17371364]] -0.539755351682
[[-0.22232121]] -0.529051987768
[[-0.22187822]] -0.149847094801
As you can see, the model outputs are all quite close to each other, unlike the targets, where the variability is much bigger (although I should admit that negative values dominate in both the training and test sets).
What am I doing wrong here? Why does training get stuck, or is this a normal process that I should leave running for much longer (I ran several hundred epochs a couple of times and it still stayed stuck)? I also tried using a variable learning rate (for example, cosine annealing with restarts, as in I. Loshchilov and F. Hutter, SGDR: Stochastic Gradient Descent with Warm Restarts, arXiv preprint arXiv:1608.03983, 2016).
I would appreciate any suggestions, both on the network structure and training approach, and on the coding/detail side.
Thank you very much in advance for your help.
I am attempting to gather the indices of specific tensors/(vectors/matrices) within a tensor in keras. Therefore, I attempted to use tf.gather with tf.where to get the indices to use in the gather function.
However, tf.where provides element-wise indices of the matching values when testing for equality. I would like to be able to find the indices (rows) of the tensors (vectors) which are equal to another vector.
This is especially useful for finding the one-hot vectors within a tensor which match a set of one-hot vectors of interest.
I have some code to illustrate the shortcoming so far:
# standard
import tensorflow as tf
import numpy as np
from sklearn.preprocessing import LabelBinarizer
sess = tf.Session()
# one-hot vector encoding labels
l = LabelBinarizer()
l.fit(['a','b','c'])
# input tensor
t = tf.constant(l.transform(['a','a','c','b', 'a']))
# find the indices where 'c' is label
# ***THIS WORKS***
np.all(t.eval(session = sess) == l.transform(['c']), axis = 1)
# We need to do everything in tensorflow and then wrap in Lambda layer for keras so...
from keras import backend as K
# ***THIS DOES NOT WORK***
K.all(t.eval(session = sess) == l.transform(['c']), axis = 1)
# go on from here to get smaller subset of vectors from another tensor with the indicies given by `tf.gather`
Clearly the code above shows that I have tried to get this conditional-by-axis operation to work, and it works fine in numpy, but the tensorflow version is not as easily ported from numpy.
Is there a better way to do this?
Similarly to what you did with np.all, you can use tf.reduce_all, which is the tensorflow equivalent of np.all:
tf.reduce_all(t.eval(session = sess) == l.transform(['c']), axis = 1)
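For completeness, here is a numpy sketch of the full pattern the question is after: find the rows equal to a target one-hot vector, then gather them from another array. This mirrors what tf.reduce_all, tf.where, and tf.gather would do on tensors; all values below are made up:

```python
import numpy as np

# one-hot rows, as LabelBinarizer would produce for ['a','a','c','b','a']
t = np.array([[1,0,0],[1,0,0],[0,0,1],[0,1,0],[1,0,0]])
target = np.array([0,0,1])          # one-hot vector for 'c'

mask = np.all(t == target, axis=1)  # row-wise equality (tf.reduce_all analogue)
idx = np.where(mask)[0]             # matching row indices (tf.where analogue)

other = np.arange(5) * 10           # some other array to gather from
gathered = other[idx]               # tf.gather analogue
print(idx, gathered)                # [2] [20]
```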
I'm building a character-based RNN model using Keras (Theano backend). One thing to note is that I don't want to use a prebuilt loss function. Instead, I want to calculate the loss only for some datapoints. Here's what I mean.
The vectorized training set and its labels look like this:
X_train = np.array([[0,1,2,3,4]])
y_train = np.array([[1,2,3,4,5]])
But I replaced the first k elements of y_train with 0 for some reason. So, for example, the new y_train is
y_train = np.array([[0,0,3,4,5]])
The reason why I set the first two elements to 0 is that I don't want to include them when computing the loss. In other words, I want to calculate the loss between X_train[2:] and y_train[2:].
Here's my try.
import numpy as np
np.random.seed(0) # for reproducibility
from keras.preprocessing import sequence
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Embedding
from keras.layers import LSTM
from keras.layers.wrappers import TimeDistributed
X_train = np.array([[0,1,2,3,4]])
y_train = np.array([[0,0,3,4,5]])
y_3d = np.zeros((y_train.shape[0], y_train.shape[1], 6))
for i in range(y_train.shape[0]):
    for j in range(y_train.shape[1]):
        y_3d[i, j, y_train[i,j]] = 1
model = Sequential()
model.add(Embedding(6, 5, input_length=5, dropout=0.2))
model.add(LSTM(5, input_shape=(5, 12), return_sequences=True) )
model.add(TimeDistributed(Dense(6))) #output classes =6
model.add(Activation('softmax'))
from keras import backend as K
import theano.tensor as T
def custom_objective(y_true,y_pred):
    # Find the last index of minimum value in y_true, axis=-1
    # For example, y_train = np.array([[0,0,3,4,5]]) in my example, and
    # I'd like to calculate the loss only between X_train[3:] and y_train[3:] because the values
    # in y_train[:3] (i.e. 0) are dummies. The following is pseudo code if y_true is a 1-d numpy array, which is not true.
    def rindex(y_true):
        for i in range(len(y_true), -1, -1):
            if y_true(i) == 0:
                return i
    starting_point = rindex(y_true)
    return K.categorical_crossentropy(y_pred[starting_point:], y_true[starting_point:])
model.compile(loss=custom_objective,
optimizer='adam',
metrics=['accuracy'])
model.fit(X_train, y_t, batch_size=batch_size, nb_epoch=1)
Apart from minor errors like the wrong parenthesis in line 35 and a wrong variable name in the last line, there are two problems with your code.
First, the model you defined will return a matrix of probability distributions (due to the softmax activation) for classes at each timestep.
But in custom_objective you are treating the output as vectors. You are already correctly transforming y_train to a matrix above.
So you would first have to get the actual predictions; the simplest approach is assigning the class with the highest probability, i.e.:
y_pred = y_pred.argmax(axis=2)
y_true = y_true.argmax(axis=2) # this reconstructs y_train resp. a subset thereof
The second problem is that you are treating these like real variables (numpy arrays).
However, y_true and y_pred are symbolic tensors. The error you get clearly states one of the resulting problems:
TypeError: object of type 'TensorVariable' has no len()
TensorVariables have no length, since it is simply not known before real values are inserted! This also makes iteration the way you implemented it impossible.
By the way, when you do iterate over real vectors backwards, you should use range(len(y_true)-1, -1, -1) to not go out of bounds, or even for val in y_true[::-1].
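A tiny example of the safe backward iteration on a real list:

```python
vals = [0, 0, 3, 4, 5]

last_zero = None
# iterate indices from last to first without going out of bounds
for i in range(len(vals) - 1, -1, -1):
    if vals[i] == 0:
        last_zero = i
        break

print(last_zero)  # 1 -- the last index holding a 0
```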
To achieve what you want, you need to treat the corresponding variables as what they are and use methods supplied for tensors.
The center of this calculation is the argmin function, which finds the minimum. By default it returns the first occurrence of the minimum. Since you want the last occurrence, we need to apply it to the reversed tensor and then convert the result back into an index into the original vector:
starting_point = y_true.shape[0] - y_true[::-1].argmin() - 1
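The same index arithmetic can be checked with numpy on a concrete array (using the y_train values from the question):

```python
import numpy as np

y = np.array([0, 0, 3, 4, 5])
# argmin on the reversed array gives the distance of the last minimum from the end
starting_point = y.shape[0] - y[::-1].argmin() - 1
print(starting_point)  # 1 -- index of the last 0
```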
Possibly, there might be an even simpler solution to your problem, as it looks like you are trying to implement something like masking.
You might want to take a look at the mask_zero=True flag for Embedding layers. This would work on the input side, though.