How to find number of parameters of a neuralfit model? - python

I am using the neuralfit library to evolve a neural network, but I can't figure out the total number of parameters of the model. I already monitor the size of the neural network, which should give the number of bias parameters, but it does not include the weights of the connections.
import neuralfit
import numpy as np
x = np.asarray([[0],[1],[2],[3],[4]])
y = np.asarray([[4],[3],[2],[1],[0]])
model = neuralfit.Model(1,1)
model.compile('alpha', loss='mse', monitors=['size'])
Epoch 100/100 - ... - loss: 0.000000 - size: 4

There are two ways to do this, and which one to use depends on what you want to count. If you want the total number of parameters that NeuralFit itself considers, you can do
num_param = len(model.get_nodes()) + len(model.get_connections())
which gives the number of biases + weights of the model.
Alternative method: via Keras
Since NeuralFit models can be exported to Keras, the usual way of counting the parameters of a Keras model also works. However, the result includes one additional parameter for every output node of the original model. This is because of the way that NeuralFit's models are converted.
import keras.backend as K
keras_model = model.to_keras()
num_param = np.sum([K.count_params(w) for w in keras_model.trainable_weights])


When is it appropriate to use sample_weights in keras?

According to this question, I learnt that class_weight in keras is applying a weighted loss during training, and sample_weight is doing something sample-wise if I don't have equal confidence in all the training samples.
So my questions would be,
1) Is the loss during validation weighted by the class_weight, or is it only weighted during training?
2) My dataset has 2 classes, and the class distribution is not seriously imbalanced; the ratio is approx. 1.7 : 1. Is it necessary to use class_weight to balance the loss, or even to use oversampling? Is it OK to leave the slightly imbalanced data as it is and treat it like a normal dataset?
3) Can I simply consider sample_weight as the weights I give to each training sample? My training samples can all be treated with equal confidence, so I probably don't need to use this.
From the keras documentation it says
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
So class_weight only affects the loss during training. I have myself been interested in understanding how class and sample weights are handled during testing and training. Looking at the keras github repo and the code for metric and loss, it does not seem that either loss or metric is affected by them. The printed values are quite hard to trace through the training code and its corresponding tensorflow backend training functions. So I decided to write some test code to check the possible scenarios; see the code below. The conclusion is that both class_weight and sample_weight only affect the training loss, with no effect on any metrics or on the validation loss. This is a little surprising, as val_sample_weights (which you can specify) seems to do nothing(??).
This type of question always depends on your problem: how skewed the data is and what you are trying to optimize the model for. If you are optimizing for accuracy, then as long as the training data is as skewed as the data the model sees in production, the best result will be achieved by simply training without any over/under-sampling and/or class weights.
If, on the other hand, one class is more important (or expensive) than another, then you should be weighting the data. An example is fraud prevention, where fraud is normally much more expensive than the income from non-fraud. I would suggest you try out unweighted classes, weighted classes and some under/over-sampling, and check which gives the best validation results. Use a validation function (or write your own) that best compares different models (for example, weighting true positives, false positives, true negatives and false negatives differently depending on cost).
A relatively new loss function that has shown great results in kaggle competitions on skewed data is focal loss. Focal loss reduces the need for over/under-sampling. Unfortunately, focal loss is not a built-in function in keras (yet), but it can be programmed manually.
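To make "programmed manually" concrete, here is a minimal NumPy sketch of the binary focal loss, -alpha_t * (1 - p_t)**gamma * log(p_t). The function name and the default values for gamma and alpha are assumptions chosen for illustration, not part of the answer above. Newer Keras releases do ship a `BinaryFocalCrossentropy` loss, so it is worth checking your version before writing your own.

```python
import numpy as np

def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over all samples (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log() never sees 0.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    # p_t is the probability the model assigned to the true class.
    p_t = np.where(y_true == 1, p, 1.0 - p)
    # alpha_t balances the positive class against the negative class.
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t)**gamma shrinks the contribution of well-classified samples.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

# Confident, correct predictions contribute far less than uncertain ones.
easy = binary_focal_loss([1, 0], [0.95, 0.05])
hard = binary_focal_loss([1, 0], [0.30, 0.70])
print(easy, hard)
```

With gamma=0 and alpha=0.5 this reduces to half of the ordinary binary cross-entropy, which is a handy sanity check when porting it to a Keras loss function.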
Yes, I think you are correct. I normally use sample_weight for two reasons: 1, the training data has some kind of measurement uncertainty which, if known, can be used to weight accurate data more than inaccurate measurements; or 2, we can weight newer data more than old, forcing the model to adapt to new behavior more quickly without ignoring valuable old data.
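Both uses of sample_weight can be sketched in a few lines of NumPy. The helper names, the inverse-variance weighting and the 30-day half-life are assumptions for illustration; the answer does not prescribe a particular formula.

```python
import numpy as np

def uncertainty_weights(sigma):
    """Inverse-variance weights, rescaled so that their mean is 1."""
    w = 1.0 / np.square(np.asarray(sigma, dtype=float))
    return w / w.mean()

def recency_weights(age_in_days, half_life_days=30.0):
    """Weights that halve every half_life_days, rescaled so that their mean is 1."""
    w = 0.5 ** (np.asarray(age_in_days, dtype=float) / half_life_days)
    return w / w.mean()

# A measurement with twice the standard deviation gets a quarter of the weight.
print(uncertainty_weights([1.0, 2.0]))
# A sample that is one half-life older gets half the weight.
print(recency_weights([0, 30, 60]))
```

Either array can then be passed as the `sample_weight` argument of `model.fit`, with one weight per training sample.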
The code for comparing with and without class_weights and sample_weights, while holding the model and everything else static.
import tensorflow as tf
import numpy as np
data_size = 100
input_size = 3  # assumed value; any small feature count works
classes = 3  # one class per key in the class_weights dictionaries used later
x_train = np.random.rand(data_size, input_size)
y_train = np.random.randint(0, classes, data_size)
# sample_weight_train = np.random.rand(data_size)
x_val = np.random.rand(data_size, input_size)
y_val = np.random.randint(0, classes, data_size)
# sample_weight_val = np.random.rand(data_size)
inputs = tf.keras.layers.Input(shape=(input_size,))
pred=tf.keras.layers.Dense(classes, activation='softmax')(inputs)
model = tf.keras.models.Model(inputs=inputs, outputs=pred)
loss = tf.keras.losses.sparse_categorical_crossentropy
metrics = tf.keras.metrics.sparse_categorical_accuracy
model.compile(loss=loss , metrics=[metrics], optimizer='adam')
# Make the model static, so we can compare it between different scenarios
for layer in model.layers:
    layer.trainable = False
# Base model without weights (same result as with all class weights set to one)
# model.fit(x=x_train, y=y_train, validation_data=(x_val, y_val))
class_weights = {0: 1., 1: 1., 2: 1.}
model.fit(x=x_train, y=y_train, class_weight=class_weights, validation_data=(x_val, y_val))
# which outputs:
> loss: 1.1882 - sparse_categorical_accuracy: 0.3300 - val_loss: 1.1965 - val_sparse_categorical_accuracy: 0.3100
# Changing the class weights to zero, to check which loss and metric are affected
class_weights = {0: 0, 1: 0, 2: 0}
model.fit(x=x_train, y=y_train, class_weight=class_weights, validation_data=(x_val, y_val))
# which outputs:
> loss: 0.0000e+00 - sparse_categorical_accuracy: 0.3300 - val_loss: 1.1945 - val_sparse_categorical_accuracy: 0.3100
# Changing the sample weights to zero, to check which loss and metric are affected
sample_weight_train = np.zeros(100)
sample_weight_val = np.zeros(100)
model.fit(x=x_train, y=y_train, sample_weight=sample_weight_train, validation_data=(x_val, y_val, sample_weight_val))
# which outputs:
> loss: 0.0000e+00 - sparse_categorical_accuracy: 0.3300 - val_loss: 1.1931 - val_sparse_categorical_accuracy: 0.3100
There are some small deviations between using weights and not using them (even when all weights are one), possibly because fit uses different backend functions for weighted and unweighted data, or because of rounding error?

Tensorflow 2.0 doesn't compute the gradient

I want to visualize the patterns that a given feature map in a CNN has learned (in this example I'm using vgg16). To do so I create a random image, feed it through the network up to the desired convolutional layer, choose the feature map and find the gradients with respect to the input. The idea is to change the input in a way that maximizes the activation of the desired feature map. Using tensorflow 2.0, I have a GradientTape that follows the function and then computes the gradient; however, the gradient returns None. Why is it unable to compute the gradient?
import tensorflow as tf
import matplotlib.pyplot as plt
import time
import numpy as np
from tensorflow.keras.applications import vgg16
class maxFeatureMap():
    def __init__(self, model):
        self.model = model
        self.optimizer = tf.keras.optimizers.Adam()
    def getNumLayers(self, layer_name):
        for layer in self.model.layers:
            if layer.name == layer_name:
                weights = layer.get_weights()
                num = weights[1].shape[0]
                return ("There are {} feature maps in {}".format(num, layer_name))
    def getGradient(self, layer, feature_map):
        pic = vgg16.preprocess_input(np.random.uniform(size=(1,96,96,3))) ## Creates values between 0 and 1
        pic = tf.convert_to_tensor(pic)
        model = tf.keras.Model(inputs=self.model.inputs,
                               outputs=self.model.layers[layer].output)
        with tf.GradientTape() as tape:
            ## predicts the output of the model and only chooses the feature_map indicated
            predictions = model.predict(pic, steps=1)[0][:,:,feature_map]
            loss = tf.reduce_mean(predictions)
        gradients = tape.gradient(loss, pic[0])
        self.optimizer.apply_gradients(zip(gradients, pic))
model = vgg16.VGG16(weights='imagenet', include_top=False)
x = maxFeatureMap(model)
x.getGradient(1, 24)
This is a common pitfall with GradientTape: the tape only traces tensors that are set to be "watched", and by default tapes watch only trainable variables (meaning tf.Variable objects created with trainable=True). To watch the pic tensor, you should add tape.watch(pic) as the very first line inside the tape context.
Also, I'm not sure if the indexing (pic[0]) will work, so you might want to remove that -- since pic has just one entry in the first dimension it shouldn't matter anyway.
Furthermore, you cannot use model.predict because this returns a numpy array, which basically "destroys" the computation graph chain so gradients won't be backpropagated. You should simply use the model as a callable, i.e. predictions = model(pic).
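Putting the points raised in this answer together, a minimal sketch could look like the following. To keep it self-contained it uses a single small convolutional layer as a stand-in for the truncated VGG16 model (an assumption, chosen so that no pretrained weights need to be downloaded); the tape logic is the same for any Keras model.

```python
import tensorflow as tf

# Stand-in for the truncated VGG16 model (assumption, not the asker's network).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(96, 96, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
])

pic = tf.random.uniform((1, 96, 96, 3))  # a plain tensor, not a tf.Variable
feature_map = 2  # index of the feature map to maximize (arbitrary choice)

with tf.GradientTape() as tape:
    # Plain tensors are not traced automatically, so watch the input explicitly.
    tape.watch(pic)
    # Call the model directly; model.predict would return a NumPy array
    # and cut the chain the tape needs for backpropagation.
    activations = model(pic)
    loss = tf.reduce_mean(activations[0, :, :, feature_map])

# Differentiate with respect to the whole input tensor, not pic[0].
gradients = tape.gradient(loss, pic)

# One gradient-ascent step on the image (the step size is an arbitrary choice).
updated_pic = pic + 0.1 * gradients
print(gradients.shape)
```

Because pic is a constant tensor rather than a tf.Variable, the update is written as a plain addition here instead of going through optimizer.apply_gradients, which expects variables.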
Did you define your own loss function? Did you convert tensor to numpy in your loss function?
As a beginner, I ran into the same problem:
When using tape.gradient(loss, variables), the result was None because I converted a tensor to a numpy array inside my own loss function. It seems to be a silly but common mistake for beginners.
FYI: when GradientTape is not working, a TensorFlow issue is also a possibility. Checking the TF github for known issues with the TF functions you are using is one way to narrow down the problem, for example:
Gradients do not exist for variables after tf.concat(). #37726.

Is it possible to update existing text classification model in tensorflow?

I am new to Python and have been performing text classification with tensorflow. I would like to know if this text classification model could be updated with every new data that I might acquire in future so that I would not have to train the model from scratch. Also, sometimes with time, the number of classes might also be more since I am mostly dealing with customer data. Is it possible to update this existing text classification model with data containing more number of classes by using the existing checkpoints?
Since you are asking two different questions, I will answer them separately:
1) Yes, you can continue training with the new data you have acquired. This is very simple: restore your model just as you do now to use it, but instead of running an output op such as outputs or prediction, run the optimizer operation.
This translates into the following code:
model = build_model()  # this is the function that builds the model graph
saver = tf.train.Saver()
with tf.Session() as session:
    saver.restore(session, "/path/to/model.ckpt")
    ########### keep training #########
    data_x, data_y = load_new_data(new_data_path)
    for epoch in range(1, epochs+1):
        all_losses = list()
        num_batches = 0
        for b_x, b_y in batchify(data_x, data_y):
            # run the optimizer op together with the loss op for monitoring
            _, loss = session.run([model.opt, model.loss], feed_dict={model.input: b_x, model.input_y: b_y})
            all_losses.append(loss)
            num_batches += 1
        print("epoch %d - loss: %.2f" % (epoch, sum(all_losses) / num_batches))
Note that you need to know the names of the operations defined by the model in order to run the optimizer (model.opt) and the loss op (model.loss), so that you can train and monitor the loss during training.
2) If you want to change the number of labels, it is a bit more complicated. If your network is a single feed-forward layer, there is not much you can do: the weight matrix changes dimensionality, so you need to retrain everything from scratch. On the other hand, if you have a multi-layer network (e.g. an LSTM followed by a dense layer that does the classification), you can restore the weights of the old model and train only the last layer from scratch. To do that, I recommend reading this answer.
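As a framework-agnostic illustration of the second point, the sketch below keeps the weights of every layer except the classification head and re-initialises the head for the new number of classes. It uses plain NumPy dictionaries instead of TensorFlow checkpoints; the layer names, sizes and helper functions are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Fresh weights for one dense layer (small random W, zero bias)."""
    return {"W": rng.normal(scale=0.1, size=(n_in, n_out)), "b": np.zeros(n_out)}

def rebuild_with_new_head(old_layers, num_classes, head="output"):
    """Copy every layer except the head; re-initialise the head for num_classes."""
    new_layers = {
        name: {key: value.copy() for key, value in layer.items()}
        for name, layer in old_layers.items()
        if name != head
    }
    # The head keeps its input size but gets a new output size.
    hidden_size = old_layers[head]["W"].shape[0]
    new_layers[head] = init_layer(hidden_size, num_classes)
    return new_layers

# Old model: 8 features -> 16 hidden units -> 3 classes.
old_layers = {"hidden": init_layer(8, 16), "output": init_layer(16, 3)}
# New model: same hidden layer, but 5 classes.
new_layers = rebuild_with_new_head(old_layers, num_classes=5)
print(new_layers["output"]["W"].shape)
```

In TensorFlow the same idea amounts to restoring only the variables of the shared layers and letting the new output layer start from its initialiser. With a single-layer network there are no shared layers to reuse, which is why that case needs full retraining.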

Keras variable length input for regression

I am trying to develop a neural network using Keras and TensorFlow that takes variable-length arrays as input and either outputs a single value (see the toy example below) or classifies them (that is a problem for later and is not covered in this question).
The idea is fairly simple.
We have variable length arrays. I am currently using very simple toy data, which is generated by the following code:
import numpy as np
import pandas as pd
from keras import models as kem
from keras import activations as kea
from keras import layers as kel
from keras import regularizers as ker
from keras import optimizers as keo
from keras import losses as kelo
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, normalize
n = 100
x = pd.DataFrame(columns=['data','res'])
mms = MinMaxScaler(feature_range=(-1,1))
for i in range(n):
    k = np.random.randint(20,100)
    ss = np.random.randint(0,100,size=k)
    idres = np.sum(ss[np.arange(0,k,2)])-np.sum(ss[np.arange(1,k,2)])
    x.loc[i,'data'] = ss
    x.loc[i,'res'] = idres
x.res = mms.fit_transform(x.res)
x_train,x_test,y_train, y_test = train_test_split(x.data,x.res,test_size=0.2)
x_train = sliding_window(x_train.as_matrix(),2,2)
x_test = sliding_window(x_test.as_matrix(),2,2)
Put simply, I generate arrays of random length, and the result (output) for each array is the sum of the elements at even positions minus the sum of the elements at odd positions. Obviously, it can be negative or positive. The output is then scaled to the range [-1,1] to fit the tanh activation function.
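The target and the windowing step can be checked in isolation with a few lines of NumPy. The function `to_pairs` below is a hypothetical stand-in for the question's `sliding_window(..., 2, 2)` helper, whose definition is not shown; it assumes that a window of 2 with a step of 2 means non-overlapping pairs.

```python
import numpy as np

def alternating_target(values):
    """Sum of the elements at even indices minus the sum at odd indices."""
    values = np.asarray(values)
    return int(values[0::2].sum() - values[1::2].sum())

def to_pairs(values):
    """Hypothetical stand-in for sliding_window(x, 2, 2): non-overlapping pairs."""
    values = np.asarray(values)
    usable = len(values) - len(values) % 2  # drop a trailing unpaired element
    return values[:usable].reshape(-1, 2)

sample = [5, 1, 4, 2, 3]
print(alternating_target(sample))  # (5 + 4 + 3) - (1 + 2)
pairs = to_pairs(sample)
print(pairs)
# One sequence as a batch of size 1 with 2 features per time step.
batch = pairs[np.newaxis, ...]
print(batch.shape)
```

Each row of `pairs` holds one even-position and one odd-position element, so the network only has to learn to accumulate the difference between the two columns over the time steps.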
The Sequential model is built as follows:
model = kem.Sequential()
sgd = keo.SGD(lr=0.1)
mseloss = kelo.mean_squared_error
The model is trained in the following way:
def calcMSE(model,x_test,y_test):
    nTest = len(x_test)
    sum = 0
    for i in range(nTest):
        restest = model.predict(np.reshape(x_test[i],(1,-1,2)))
        # accumulate the squared error for this sample
        sum += (restest[0,0]-y_test[0,i])**2
    return sum/nTest
i = 1
mse = calcMSE(model,x_test,np.reshape(y_test.values,(1,-1)))
lrPar = 0
lrSteps = 30
while mse>0.04:
    print("Epoch %i" % (i))
    for j in range(len(x_train)):
        ...  # per-sample training step (not shown)
    mse = calcMSE(model,x_test,np.reshape(y_test.values,(1,-1)))
The problem is that the optimiser usually gets stuck around MSE=0.05 (on the test set). The last time I tested, it actually got stuck around MSE=0.12 (on test data).
Moreover, if you will look at what the model gives on test data (left column) in comparison with the correct output (right column):
[[-0.11888303]] 0.574923547401
[[-0.17038491]] -0.452599388379
[[-0.20098214]] 0.065749235474
[[-0.22307695]] -0.437308868502
[[-0.2218809]] 0.371559633028
[[-0.2218741]] 0.039755351682
[[-0.22247596]] -0.434250764526
[[-0.17094387]] -0.151376146789
[[-0.17089397]] -0.175840978593
[[-0.16988073]] 0.025993883792
[[-0.16984619]] -0.117737003058
[[-0.17087571]] -0.515290519878
[[-0.21933308]] -0.366972477064
[[-0.09379648]] -0.178899082569
[[-0.17016701]] -0.333333333333
[[-0.17022927]] -0.195718654434
[[-0.11681376]] 0.452599388379
[[-0.21438009]] 0.224770642202
[[-0.12475857]] 0.151376146789
[[-0.2225963]] -0.380733944954
And on training set the same is:
[[-0.22209576]] -0.00764525993884
[[-0.17096499]] -0.247706422018
[[-0.22228305]] 0.276758409786
[[-0.16986915]] 0.340978593272
[[-0.16994311]] -0.233944954128
[[-0.22131597]] -0.345565749235
[[-0.17088912]] -0.145259938838
[[-0.22250554]] -0.792048929664
[[-0.17097935]] 0.119266055046
[[-0.17087702]] -0.2874617737
[[-0.1167363]] -0.0045871559633
[[-0.08695849]] 0.159021406728
[[-0.17082921]] 0.374617737003
[[-0.15422876]] -0.110091743119
[[-0.22185338]] -0.7125382263
[[-0.17069265]] -0.678899082569
[[-0.16963181]] -0.00611620795107
[[-0.17089556]] -0.249235474006
[[-0.17073657]] -0.414373088685
[[-0.17089497]] -0.351681957187
[[-0.17138508]] -0.0917431192661
[[-0.22351067]] 0.11620795107
[[-0.17079701]] -0.0795107033639
[[-0.22246087]] 0.22629969419
[[-0.17044055]] 1.0
[[-0.17090379]] -0.0902140672783
[[-0.23420531]] -0.0366972477064
[[-0.2155242]] 0.0366972477064
[[-0.22192241]] -0.675840978593
[[-0.22220723]] -0.354740061162
[[-0.1671907]] -0.10244648318
[[-0.22705412]] 0.0443425076453
[[-0.22943887]] -0.249235474006
[[-0.21681401]] 0.065749235474
[[-0.12495813]] 0.466360856269
[[-0.17085686]] 0.316513761468
[[-0.17092516]] 0.0275229357798
[[-0.17277785]] -0.325688073394
[[-0.22193027]] 0.139143730887
[[-0.17088208]] 0.422018348624
[[-0.17093034]] -0.0886850152905
[[-0.17091317]] -0.464831804281
[[-0.22241674]] -0.707951070336
[[-0.1735626]] -0.337920489297
[[-0.16984227]] 0.00764525993884
[[-0.16756304]] 0.515290519878
[[-0.22193302]] -0.414373088685
[[-0.22419722]] -0.351681957187
[[-0.11561158]] 0.17125382263
[[-0.16640976]] -0.321100917431
[[-0.21557514]] -0.313455657492
[[-0.22241823]] -0.117737003058
[[-0.22165506]] -0.646788990826
[[-0.22238114]] -0.261467889908
[[-0.1709189]] 0.0902140672783
[[-0.17698884]] -0.626911314985
[[-0.16984172]] 0.587155963303
[[-0.22226149]] -0.590214067278
[[-0.16950315]] -0.469418960245
[[-0.22180589]] -0.133027522936
[[-0.2224243]] -1.0
[[-0.22236891]] 0.152905198777
[[-0.17089345]] 0.435779816514
[[-0.17422611]] -0.233944954128
[[-0.17177556]] -0.324159021407
[[-0.21572633]] -0.347094801223
[[-0.21509495]] -0.646788990826
[[-0.17086846]] -0.34250764526
[[-0.17595944]] -0.496941896024
[[-0.16803505]] -0.382262996942
[[-0.16983894]] -0.348623853211
[[-0.17078683]] 0.363914373089
[[-0.21560851]] -0.186544342508
[[-0.22416025]] -0.374617737003
[[-0.1723443]] -0.186544342508
[[-0.16319042]] -0.0122324159021
[[-0.18837349]] -0.181957186544
[[-0.17371364]] -0.539755351682
[[-0.22232121]] -0.529051987768
[[-0.22187822]] -0.149847094801
As you can see, the model outputs are all quite close to each other, unlike the correct outputs, where the variability is much bigger (although, I should admit, negative values dominate in both the training and test sets).
What am I doing wrong here? Why does training get stuck, or is this normal and I should just leave it running much longer (I ran several hundred epochs a couple of times and it still stayed stuck)? I also tried a variable learning rate, for example cosine annealing with restarts (as in I. Loshchilov and F. Hutter, SGDR: Stochastic gradient descent with restarts, arXiv preprint arXiv:1608.03983, 2016).
I would appreciate any suggestions both from network structure and training approach and from coding/detailed sides.
Thank you very much in advance for help.

How do you train a linear model in tensorflow?

I have generated some input data in a CSV where 'awesomeness' is 'age * 10'. It looks like this:
age, awesomeness
67, 670
38, 380
32, 320
69, 690
40, 400
It should be trivial to write a tensorflow model that can predict 'awesomeness' from 'age', but I can't make it work.
When I run training, the output I get is:
accuracy: 0.0 <----------------------------------- What!??
accuracy/baseline_label_mean: 443.8
accuracy/threshold_0.500000_mean: 0.0
auc: 0.0
global_step: 6000
labels/actual_label_mean: 443.8
labels/prediction_mean: 1.0
loss: -2.88475e+09
precision/positive_threshold_0.500000_mean: 1.0
recall/positive_threshold_0.500000_mean: 1.0
Please note that this is obviously a completely contrived example, but that is because I was getting the same result with a more complex meaningful model with a much larger data set; 0% accuracy.
This is my attempt at the most minimal possible reproducible test case that I can make which exhibits the same behaviour.
Here's what I'm doing, based on the census example for the DNNClassifier from tflearn:
import pandas as pd
import tensorflow as tf
from os import path
COLUMNS = ["age", "awesomeness"]
CONTINUOUS_COLUMNS = ["age"]
OUTPUT_COLUMN = "awesomeness"
def build_estimator(model_dir):
    """Build an estimator."""
    age = tf.contrib.layers.real_valued_column("age")
    deep_columns = [age]
    m = tf.contrib.learn.DNNClassifier(model_dir=model_dir,
                                       feature_columns=deep_columns,
                                       hidden_units=[50, 10])
    return m
def input_fn(df):
    """Input builder function."""
    feature_cols = {k: tf.constant(df[k].values, shape=[df[k].size, 1]) for k in CONTINUOUS_COLUMNS}
    output = tf.constant(df[OUTPUT_COLUMN].values, shape=[df[OUTPUT_COLUMN].size, 1])
    return feature_cols, output
def train_and_eval(model_dir, train_steps):
    """Train and evaluate the model."""
    train_file_name, test_file_name = training_data()
    df_train = pd.read_csv(...)  # omitted for clarity
    df_test = pd.read_csv(...)
    m = build_estimator(model_dir)
    m.fit(input_fn=lambda: input_fn(df_train), steps=train_steps)
    results = m.evaluate(input_fn=lambda: input_fn(df_test), steps=1)
    for key in sorted(results):
        print("%s: %s" % (key, results[key]))
def training_data():
    """Return path to the training and test data"""
    training_datafile = path.join(path.dirname(__file__), 'data', '')
    test_datafile = path.join(path.dirname(__file__), 'data', 'data.test')
    return training_datafile, test_datafile
model_folder = 'scripts/model'  # Where to store the model
train_steps = 2000  # How many iterations to run while training
train_and_eval(model_folder, train_steps)
A couple of notes:
The original example tutorial this is based on is here
Notice I am using the DNNClassifier, not the LinearClassifier as I want specifically to deal with continuous input variables.
A lot of examples just use 'premade' data sets which are known to work with examples; my data set has been manually generated and is absolutely not random.
I have verified the csv loader is loading the data correctly as int64 values.
Training and test data are generated identically, but have different values in them; however, using the training data as the test data still returns 0% accuracy, so there is no question that something isn't working; this isn't just over-fitting.
First of all, you are describing a regression task, not a classification task. Therefore both DNNClassifier and LinearClassifier would be the wrong thing to use. That also makes accuracy the wrong quantity for telling whether your model works or not. I suggest you read up on these two different contexts, e.g. in the book "The Elements of Statistical Learning".
But here is a short answer to your problem. Say you have a linear model
awesomeness_predicted = slope * age
where slope is the parameter you want to learn from data. Let's say you have data age[0], ..., age[N-1] and the corresponding awesomeness values a_data[0], ..., a_data[N-1]. To measure how well your model works, we are going to use the mean squared error, that is
error = sum((a_data[i] - a_predicted[i])**2 for i in range(N)) / N
What you want to do now is start with a random guess for slope and gradually improve it using gradient descent. Here is a full working example in pure tensorflow
import tensorflow as tf
import numpy as np
DTYPE = tf.float32
## Generate Data
age = np.array([67, 38, 32, 69, 40])
awesomeness = 10 * age
## Generate model
# define the parameter of the model
slope = tf.Variable(initial_value=tf.random_normal(shape=(1,), dtype=DTYPE))
# define the data inputs to the model as variable size tensors
x = tf.placeholder(DTYPE, shape=(None,))
y_data = tf.placeholder(DTYPE, shape=(None,))
# specify the model
y_pred = slope * x
# use mean squared error as loss function
loss = tf.reduce_mean(tf.square(y_data - y_pred))
target = tf.train.AdamOptimizer().minimize(loss)
## Train Model
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(100000):
        _, training_loss = sess.run([target, loss],
                                    feed_dict={x: age, y_data: awesomeness})
    print("Training loss: ", training_loss)
    print("Found slope=", sess.run(slope))
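As a cross-check on the TensorFlow example above, the same one-parameter fit can be done in plain NumPy. This is a sketch rather than part of the original answer: it uses ordinary gradient descent instead of Adam, and the learning rate, starting point and step count are assumptions chosen to suit the unscaled ages.

```python
import numpy as np

age = np.array([67, 38, 32, 69, 40], dtype=float)
awesomeness = 10 * age

# Closed-form least-squares slope for y = slope * x (no intercept).
slope_closed = float(age @ awesomeness / (age @ age))

# Gradient descent on the mean squared error.
slope = 0.0           # starting guess
learning_rate = 1e-4  # small because the inputs are not rescaled
for _ in range(200):
    residual = awesomeness - slope * age
    # d/d(slope) of mean(residual**2) is -2 * mean(age * residual)
    gradient = -2.0 * np.mean(age * residual)
    slope -= learning_rate * gradient

print(slope_closed, slope)
```

Both estimates should agree with the factor used to generate the data. With raw ages as input, a noticeably larger learning rate makes the updates diverge, which is one reason to rescale features before training larger models.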
There are a few things I would like to say.
Assuming you load the data correctly:
-This looks like a regression task and you are using a classifier. I'm not saying it doesn't work at all, but this way you are giving a label to each entry of age, and training on the whole batch each epoch is very unstable.
-You are getting a huge value for the loss; your gradients are exploding. With this toy dataset you probably need to tune hyperparameters like the number of hidden neurons, the learning rate and the number of epochs. Try to log the loss value for each epoch and see if that may be the problem.
-Last suggestion: make your data work with a simpler model, possibly one suited to your task, like a regression model, and then scale up.
See also the following example, which uses tflearn to solve this.
""" Multiple Regression/Multi target Regression Example
The input features have 10 dimensions, and the target features have 2 dimensions.
"""
from __future__ import absolute_import, division, print_function
import tflearn
import numpy as np
# Regression data- 10 training instances
#10 input features per instance.
#2 output features per instance
# Multiple Regression graph, 10-d input layer
input_ = tflearn.input_data(shape=[None,10])
#10-d fully connected layer
r1 = tflearn.fully_connected(input_,10)
#2-d fully connected layer for output
r1 = tflearn.fully_connected(r1,2)
r1 = tflearn.regression(r1, optimizer='sgd', loss='mean_square',
                        metric='R2', learning_rate=0.01)
m = tflearn.DNN(r1)
m.fit(X, Y, n_epoch=100, show_metric=True, snapshot_epoch=False)
# Predict for 1 instance
print("\nInput features: ", testinstance)
print("\n Predicted output: ")
print(m.predict(testinstance))

