TensorFlow slow on first prediction, much faster after - python

I trained a model and saved the weights. At a later date, running the Python script from scratch, I load the model and the weights and make a prediction. This first prediction takes, for example, 10 seconds, and then all predictions afterward take 0.5 seconds.
I am measuring the time of the prediction only:
from time import perf_counter

t = perf_counter()
a = model.predict(p, verbose=0, workers=8).reshape(1, -1)
print(f'prediction took {perf_counter()-t} seconds')
I was expecting there to be no difference.
I have seen the post Tensorflow JS first prediction delay, but I am not sure how to "warm up" in my case.
I am writing a server, hence the concern: the first time someone issues a request for a prediction (and in my case that's 10 of them), the user has to wait a long time, which is not acceptable for this use case.
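For reference, the "warm up" from the linked post usually just means running one throwaway prediction on dummy input right after loading the model, so that TensorFlow's graph building and function tracing happen before the first real request. A minimal sketch, assuming a Keras model loaded from a saved file (the path and the fixed input shape are placeholder assumptions, not from my actual code):

import numpy as np
from time import perf_counter
from tensorflow import keras

# Hypothetical model path; replace with however the model is actually loaded.
model = keras.models.load_model("my_model.h5")

# One throwaway prediction on zeros forces TensorFlow to build/trace the
# prediction graph now, so the first real request only pays the fast path.
dummy = np.zeros((1,) + model.input_shape[1:])  # assumes a fixed input shape
t = perf_counter()
model.predict(dummy, verbose=0)
print(f'warm-up prediction took {perf_counter()-t:.2f} seconds')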
Thanks for the help!

Related

Keras model.predict() taking unreasonable amount of time

I am working on a project where we use a compiled Keras ANN model to classify different positions based on incoming sensor data. The data are continuously fed to the model for prediction by a daemon thread collecting data in the background. We are having a problem where model.predict() takes up to 2 seconds to finish, even for small inputs. Each data point is an array of 38 floats. The prediction time seems unaffected by the number of rows supplied, up to a certain amount; we have tried supplying it with only one row, and up to hundreds, and the elapsed time stays around 2 seconds. Isn't this time consumption abnormally high, even for the larger data sets?
If it helps:
Our program uses multi-threading to collect the data from the sensors and restructure it so that it fits the model's predict method. Two daemon threads run in the background collecting and restructuring data, while the main thread actively picks data from a queue of already structured data and classifies based on it. Here is the code where we classify based on the collected data:
values = []
rows = 0
while rows < 20:
    val = pred_queue.shift()
    if val != None:
        values.append(val)
        rows += 1
rows = 0
values = np.squeeze(values)
start_time = time.perf_counter()
predictions = model.predict(values)
elapsed_time = round(time.perf_counter() - start_time, 2)
print("Predict time: ", elapsed_time)
for i in range(len(predictions)):
    print(predictions[i].argmax())
    #print(f"Predicted {classification_res} in {elapsed_time}s!")
Some clarification of the code:
The shift() method returns the first entry in pred_queue. This will either be an array of 38 floats or None, depending on whether the queue is empty or not.
What could possibly make these predictions so slow?
Edit
The reason for the confusion around the prediction times is that we had run the same model on some data before compiling it. Those data points were collected from a CSV file, put into a pandas DataFrame, and finally passed to the predict method. They were not streamed live, and the dataset was much bigger: around 9000 rows, each containing 38 floats. That prediction took 0.3 seconds when we timed it, which is obviously much faster than our current speeds!
You can try to use the __call__ method directly, as the documentation of the predict method states (emphasis is mine):
Computation is done in batches. This method is designed for performance in large scale inputs. For small amount of inputs that fit in one batch, directly using __call__ is recommended for faster execution, e.g., model(x), or model(x, training=False) if you have layers such as tf.keras.layers.BatchNormalization that behaves differently during inference. Also, note the fact that test loss is not affected by regularization layers like noise and dropout.
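As a rough sketch of what that recommendation looks like in practice (the model is the one from the question; the 38-float rows and a batch of 20 are taken from its description, and this assumes TensorFlow 2 eager execution):

import numpy as np
import tensorflow as tf

# One small batch of 20 rows of 38 floats, as described in the question.
x = np.random.rand(20, 38).astype("float32")

# model.predict() spins up a full batched prediction loop on every call;
# calling the model directly avoids that per-call overhead for small inputs.
preds = model(tf.convert_to_tensor(x), training=False).numpy()
print(preds.argmax(axis=1))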
Note that the performance hit you are noticing could also be related to limited machine resources. Investigate CPU usage, RAM usage, etc.

What is the mechanism of the `tf.estimator.train_and_evaluate` function for controlling the training and evaluation period?

When I trained an SSD object detection model for 20K steps using the TensorFlow Object Detection API, I found that the training time varies:
It trained fast in the first 10 minutes, performing around 500 steps (i.e. 0.83 steps/second). Then it slowed down, taking about 40-50 minutes to perform a single training step, evaluate the model on the evaluation dataset, and save the checkpoint to disk. So I interrupted the training after a few steps and continued by restoring it.
Every time, it trained fast in the first 10 minutes and then slowed down sharply, as the figures showed.
The model's training is implemented with TensorFlow's Estimator API, tf.estimator.train_and_evaluate().
Can anyone explain how it works? How does the estimator control the training and evaluation period? I do not want to evaluate the model at every step!
If you look at EvalSpec and TrainSpec, there is an argument throttle_secs which is responsible for deciding when evaluation is called; controlling this is the way to control the train and eval cycles. Refer to this heated discussion, which has many details about Estimator methods! In general, train_and_evaluate works by building a graph of the training and evaluation operations. The training graph is created only once, but the evaluation graph is recreated every time you need to evaluate, which means it reloads the checkpoint that was created during training; that may be one reason why this is taking so long. Maybe the InMemoryEvaluatorHook mentioned in that discussion can help you out, since it does not reload the checkpoint every time evaluation is called.
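As a hedged sketch of where throttle_secs fits (the estimator and the two input functions are placeholders, not code from the question):

import tensorflow as tf  # tf.estimator is the TF1-era Estimator API

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=20000)

# Evaluation runs at most once every throttle_secs seconds (and only once a
# new checkpoint has been written), instead of after every training step.
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn,
                                  steps=100,
                                  start_delay_secs=120,
                                  throttle_secs=1800)  # at most every 30 minutes

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)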

Why does testing take longer than training?

I'm training sklearn's KNeighborsClassifier on the MNIST digits dataset.
Here is the code:
import time
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()
start_time = time.time()
print(start_time)
knn.fit(X_train, y_train)
elapsed_time = time.time() - start_time
print(elapsed_time)
It takes 40 seconds. However, when I predict on the test data, it takes more than a few minutes (still running), even though there is 6 times less test data than training data.
Here is the code:
y_pred = knn.predict(X_test)
print(confusion_matrix(y_test,y_pred))
Could you explain why it takes so much time (more than training)? Is there something I can do about it?
Think about how the k-NN algorithm works. It is a classic example of lazy learning: at prediction time, the distances to all of the original training data have to be calculated (to decide which points are the closest neighbours).
At training time, it doesn't need to do any of that expensive distance calculation.
So the difference mostly comes down to going from .fit() to .predict().
If you actually tried to predict on the training set, it would take even longer.
For more information, see e.g. Wikipedia.
For solutions: think about whether this algorithm is actually ideal for your case, or whether you could get by with a cruder approximation of the distance.
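If you do stay with k-NN, a rough sketch of the usual scikit-learn knobs that can shrink prediction time: a tree-based neighbour search plus parallel queries. The variable names are the ones from the question, and on high-dimensional data like raw MNIST pixels the gain from the tree index may be modest.

import time
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

# ball_tree indexes the training data once, so each query no longer scans
# every training point; n_jobs=-1 runs the neighbour queries in parallel.
knn = KNeighborsClassifier(n_neighbors=5, algorithm="ball_tree", n_jobs=-1)
knn.fit(X_train, y_train)

start_time = time.time()
y_pred = knn.predict(X_test)
print("predict took", round(time.time() - start_time, 2), "seconds")
print(confusion_matrix(y_test, y_pred))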

What does epochs mean in Doc2Vec and train when I have to manually run the iteration?

I am trying to understand the epochs parameter of the Doc2Vec constructor and the epochs parameter of the train() function.
In the following code snippet, I manually set up a loop of 4000 iterations. Is that required, or is passing 4000 as the epochs parameter to Doc2Vec enough? Also, how is epochs in Doc2Vec different from epochs in train()?
documents = Documents(train_set)
model = Doc2Vec(vector_size=100, dbow_words=1, dm=0, epochs=4000, window=5,
                seed=1337, min_count=5, workers=4, alpha=0.001, min_alpha=0.025)
model.build_vocab(documents)

for epoch in range(model.epochs):
    print("epoch " + str(epoch))
    model.train(documents, total_examples=total_length, epochs=1)
    ckpnt = model_name + "_epoch_" + str(epoch)
    model.save(ckpnt)
    print("Saving {}".format(ckpnt))
Also, how and when are the weights updated?
You don't have to manually run the iteration, and you shouldn't call train() more than once unless you're an expert who needs to do so for very specific reasons. If you've seen this technique in some online example you're copying, that example is likely outdated and misleading.
Call train() once, with your preferred number of passes as the epochs parameter.
Also, don't use a starting alpha learning-rate that is low (0.001) and then rises to a min_alpha value 25 times larger (0.025); that's not how this is supposed to work, and most users shouldn't need to adjust the alpha-related defaults at all. (Again, if you're getting this from an online example somewhere, that's a bad example. Let them know they're giving bad advice.)
Also, 4000 training epochs is absurdly large. A value of 10-20 is common in published work dealing with tens of thousands to millions of documents. If your dataset is smaller, it may not work well with Doc2Vec, though sometimes more epochs (or a smaller vector_size) can still learn something generalizable from tiny data - but still expect to use closer to dozens of epochs, not thousands.
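Putting that advice together, a rough sketch of the single-train() pattern (Documents, train_set and model_name are the objects from the question; the epoch count of 20 is just an illustrative choice):

from gensim.models.doc2vec import Doc2Vec

documents = Documents(train_set)

# Default alpha/min_alpha, a modest epoch count, and exactly one call to
# train(), which performs all of the passes and the alpha decay internally.
model = Doc2Vec(vector_size=100, dm=0, dbow_words=1, window=5,
                min_count=5, workers=4, epochs=20, seed=1337)
model.build_vocab(documents)
model.train(documents, total_examples=model.corpus_count, epochs=model.epochs)
model.save(model_name + "_final")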
A good intro (albeit with a tiny dataset that barely works with Doc2Vec) is the doc2vec-lee.ipynb Jupyter notebook that's bundled with gensim, and also viewable online at:
https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb
Good luck!

Debug python tensorflow issue

I am working on an audio dataset to train a neural network using the TensorFlow library, but there is a weird issue that I can't figure out. I am following this blog, Urban Sound Classification; the only difference is that I have my own dataset.
Everything works fine when I have a small amount of data, about 30 audio files or so, but when I use the complete dataset my training code simply runs a couple of iterations, outputs the cost, and then that is about it: no error, exception, or warning is thrown, and the TensorFlow session simply doesn't give any further results. Here is the code, for a better explanation:
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        _, cost = sess.run([optimizer, cost_function], feed_dict={X: tr_features, Y: tr_labels})
        cost_history = np.append(cost_history, cost)
    y_pred = sess.run(tf.argmax(y_, 1), feed_dict={X: ts_features})
    y_true = sess.run(tf.argmax(ts_labels, 1))
    print("Test accuracy: ", round(sess.run(accuracy,
          feed_dict={X: ts_features, Y: ts_labels}), 3))
So when I run the above code to train on the complete data (about 9000 files), it generates cost history for about 2 epochs and then stops generating it, but the code keeps executing as normal; sess.run() just stops outputting results. My guess is that the session stops because of some exception, but how do I debug this? I have nothing to go on. Can anyone advise?
Note: I am not sure if this is the right forum, but point me in the right direction and I will move the question if need be.
UPDATE 01:
So I have figured out some correlation between the amount of data, the learning rate, and the error. Here is my understanding of what is happening. While coding I used a subset of my original data, about 10-15 files, for training; the learning rate was 0.01 and it worked well (as in, it completed all its epochs).
When I used 500 files for training, it repeated the same behavior described in the original question (it would output 2 iterations and then, kaboom, no more output and no exception or error). I noticed that the cost was increasing over those iterations, so I tried lowering the learning rate, and voilà, it worked like a charm with a new learning rate of 0.001 (again, all epochs ran and successfully output results).
Finally, I ran the training on all of my data (about 9000 files) and observed the same behavior as previously discussed. So my question now is: how much should I lower the learning rate? What is the relationship between the learning rate and the amount of data?
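For what it's worth, a rough way to confirm the diverging-loss suspicion is to stop the loop as soon as the cost stops being finite, instead of letting the session keep running silently. This sketch reuses the variables from the snippet above (init, optimizer, cost_function, X, Y, tr_features, tr_labels); a learning rate that is too high typically shows up as the cost shooting to inf and then NaN.

import numpy as np

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        _, cost = sess.run([optimizer, cost_function],
                           feed_dict={X: tr_features, Y: tr_labels})
        # np.isfinite catches both inf and NaN, the usual signs of divergence.
        if not np.isfinite(cost):
            print("cost diverged at epoch", epoch, "- try a lower learning rate")
            break
        cost_history = np.append(cost_history, cost)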
