Calculating processing time of a deep learning model - python

My model deals with videos, and I want to calculate how fast it can process frames as in frames per second or processing time for 1 frame.
I have made a single function to get predictions: it takes raw frames as input, does all the preprocessing, and returns the classification. One of the preprocessing steps is sampling the frames from the video; basically, it reduces the number of frames that go into the deep learning model to one fifth. Without all the preprocessing, the model won't perform as expected.
So my question is, should I consider the preprocessing time as well? And, most importantly, is this processing time for all frames or just for the frames the model actually sees?
Sample code structure as below:
import time

start = time.time()
prediction = main(data)
end = time.time()
print("Time for 1 frame =", (end - start) / n_frames)  # let's say n_frames = 50
Inside the main function:
def main(data):
    preprocessed = preprocess(data)  # resizing, sampling down from 50 to 10 frames
    prediction = model.predict(preprocessed)
    return prediction
Example: The input is 50 frames, and the total time taken to preprocess them and make predictions is 1 second. (Note that the model only sees 10 preprocessed frames.)
So, is the processing time for one frame 1/50 seconds? Or should it be 1/10 seconds, since the model only gets to process 10 frames and the others simply get skipped in preprocessing? And where should I place the start and end time measurements?
Which way is the standard or correct way?

There is no standard way; it depends on what exactly you're going to use the results for.
If you're trying to ONLY demonstrate the time used up by the deep learning model, don't include the preprocessing steps.
If you want to time the end to end process, include the entire pipeline.
An even better solution would be to profile your code. This will give you a breakdown of how long each part of your code takes, so you don't have to pick one or the other.
In your case, since you want to put the time in terms of the number of frames, I don't care about how many frames your model sees or any of the inner workings of your pipeline. As a user, all I care about is: if I put X frames in, how long will it take? So go with the full 50 input frames, i.e. 50 frames per second in your example.
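As a concrete illustration of the profiling suggestion, here is a minimal sketch using Python's built-in cProfile (Python 3.8+; main and data are the names from the question):
import cProfile
import pstats

with cProfile.Profile() as pr:
    prediction = main(data)

# Show the 10 most expensive calls by cumulative time (preprocessing vs. predict).
pstats.Stats(pr).sort_stats("cumulative").print_stats(10)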

Related

How to arrange multiple multivariate time series of different length before passing it to Keras LSTM layer

I have a number of multivariate time series that are produced by the same kind of process but:
are of significantly different lengths;
each time series is an independent instance, and the measurements are taken at different, quite random timestamps;
each time series is related at every timestamp to two targets.
In other words:
each time series has a shape of (n_timestamps, n_features)
each target series has a shape of (n_timestamps, 2).
To give an example, this could be treated as stocks of different companies, each described by a few features, where the targets at a given timestamp are the probabilities that the final price at the end of the year will be higher than x, except that we learn them directly from magically given ground-truth probabilities (instead of observed 0/1 responses).
I want to be able to predict the target at each time point and I wanted to give RNNs a try. However, I'm having issues with figuring out how I should arrange the data before passing it to Keras LSTM layers. The main things I'm wondering about are:
I want my RNN to use data starting from the beginning of the series to make prediction at time t, not only last k timestamps. I can't really use the whole history directly without exploding the gradient (it's too long), therefore I need a way to "remember" previously learned weights even though in reality my RNN will loop over last k timestamps.
Each time series has a different length, so I'm unsure how to make things compatible with each other. I'm aware of padding as an option, but since the difference in length of examples can be as significant as 1000 vs 3000, this will result in many training examples that consist only of the padding value.
Since measurements are taken at different timestamps, I believe it may affect my network in the sense that it can't really learn that e.g. the last 10 timestamps are the most important. Or even if it can, these last 10 timestamps will span different lengths of real time for each input time series... How big of a problem is this? Should I start by resampling all examples to the same time points (e.g. by interpolating)?
My current thinking is that:
I can pad each of my example sequences to the same length (max(n_timestamps))
Create batches of short sequences of length k, where k represents the length of the loop of the RNN layer. In consequence, assuming I have 200 example sequences where the longest one has 3000 timestamps and my selected k is 50, it would result in 3000/50 = 60 batches of (200, 50) shape. Or should I make 3000-1 batches where one batch differs from the next one only by one timestamp (i.e. while the first batch has timestamps from 1 to 50, the next batch has timestamps from 2 to 51, etc.)?
Since padding was used, I would need to use a Masking layer. Quite a few of the rows in the prepared batches would consist of inputs that should be ignored completely (as they would only have the padding value for all 50 elements).
Is this the correct way to prepare the data for my problem? Can it be done better, so as not to introduce bottlenecks such as training on examples consisting only of the padding value (which should be ignored by the Masking layer)? Or how can I prepare the data to address points 1, 2 and 3 described above?
each time series has a shape of (n_timestamps, n_features)
each target series has a shape of (n_timestamps, 2).
Okay, this is pretty standard so far.
I want my RNN to use data starting from the beginning of the series to make prediction at time t, not only last k timestamps. I can't really use the whole history directly without exploding the gradient (it's too long), therefore I need a way to "remember" previously learned weights even though in reality my RNN will loop over last k timestamps.
Check and make sure you actually need this. An RNN (or a Transformer) can use any or all of the history that you give it. But that's assuming the history is useful for the predictions you're making.
I'd try training on standard-sized random clips of the data (like in this tutorial). I'd retrain it a few times with longer and longer clips and see if the model performance plateaus before I run out of memory.
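For instance, a minimal sketch of sampling a fixed-length random clip from one sequence (clip_len and the array shapes are illustrative assumptions, not the tutorial's code):
import numpy as np

def random_clip(seq, targets, clip_len=200):
    # seq: (n_timestamps, n_features), targets: (n_timestamps, 2)
    if len(seq) <= clip_len:
        return seq, targets
    start = np.random.randint(0, len(seq) - clip_len)
    return seq[start:start + clip_len], targets[start:start + clip_len]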
But in Keras it is relatively simple to do exactly the thing you're asking.
Keras RNN layers (LSTM, GRU) have a return_state argument. It allows you to run the model over part of a sequence, pause, execute a training step, and then continue running exactly where you left off.
(The stateful argument is another mechanism that provides a similar effect.)
The code ends up looking something like this:
import tensorflow as tf
from tensorflow import keras

class MyModel(keras.Model):
    ...
    def train_step(self, args):
        inputs, labels = args
        state = self.get_initial_state()
        # Consume the sequence in chunks of 100 steps, carrying the RNN state over.
        while tf.shape(inputs)[1] != 0:
            in_slice, inputs = inputs[:, :100], inputs[:, 100:]
            label_slice, labels = labels[:, :100], labels[:, 100:]
            with tf.GradientTape() as tape:
                result, state = self(in_slice, state)
                loss = self.loss(label_slice, result)
            vars = self.trainable_variables
            grads = tape.gradient(loss, vars)
            self.optimizer.apply_gradients(zip(grads, vars))
It may also be possible to use ForwardAccumulator to collect the gradients. In that case you don't need to cut the sequences into chunks because the memory used by forward accumulator doesn't grow with sequence length. I've never tried before so I don't have example code.
Each time series has a different length, so I'm unsure how to make things compatible with each other. I'm aware of padding as an option, but since the difference in length of examples can be as significant as 1000 vs 3000, this will result in many training examples that consist only of the padding value.
That might be okay, just inefficient. You can make batches of similar sequence lengths using Dataset.bucket_by_sequence_length.
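A minimal sketch, assuming TF 2.6+ and a dataset that yields (sequence, targets) pairs; the placeholder generator, n_features, bucket boundaries, and batch sizes are all illustrative assumptions:
import tensorflow as tf

n_features = 8  # illustrative

def gen():
    # Placeholder generator yielding (sequence, targets) pairs of varying length.
    for length in (1000, 1500, 3000):
        yield tf.zeros([length, n_features]), tf.zeros([length, 2])

ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec([None, n_features], tf.float32),
        tf.TensorSpec([None, 2], tf.float32),
    ),
)

# Group sequences of similar length so padding inside each batch stays small.
batched = ds.bucket_by_sequence_length(
    element_length_func=lambda seq, tgt: tf.shape(seq)[0],
    bucket_boundaries=[500, 1000, 2000],
    bucket_batch_sizes=[64, 32, 16, 8],
)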
Since measurements are taken at different timestamps, I believe it may affect my network in the sense that it can't really learn that e.g. the last 10 timestamps are the most important. Or even if it can, these last 10 timestamps will span different lengths of real time for each input time series... How big of a problem is this? Should I start by resampling all examples to the same time points (e.g. by interpolating)?
Interpolating to a fixed rate might be a reasonable thing to try if it doesn't make your data too much longer. Just think carefully about making predictions on interpolated values: there's some data leaking back in time from a future measurement.
Another approach would be to make the size of the time step a feature. If each input is tagged with how long it's been since the last input, the model can learn how to handle small or large steps.
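For example, a small sketch (the timestamps array and feature layout are assumptions):
import numpy as np

def add_time_delta_feature(timestamps, features):
    # timestamps: (n_timestamps,), features: (n_timestamps, n_features)
    deltas = np.diff(timestamps, prepend=timestamps[0])  # first step gets a delta of 0
    return np.concatenate([features, deltas[:, None]], axis=1)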
I can pad each of my example sequences to the same length (max(n_timestamps))
Yes. Pad, or make clips of a fixed size.
Create batches of short sequences of length k, where k represents the length of the loop of the RNN layer. In consequence, assuming I have 200 example sequences where the longest one has 3000 timestamps and my selected k is 50, it would result in 3000/50 = 60 batches of (200, 50) shape.
That would line up with the code example I gave.
Or should I make 3000-1 batches where one batch differs from the next one only by one timestamp
Either way is fine. But if you want to carry the state over from batch to batch (I'm skeptical that you actually need the carry over) then you need to do them chunk by chunk, not by single-stepping your window.
Since padding was used, I would need to use a Masking layer. Quite a few of the rows in the prepared batches would consist of inputs that should be ignored completely (as they would only have the padding value for all 50 elements).
Yeah, that'll be wasted computation, but it won't hurt anything.
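Putting the padding and masking pieces together, a minimal sketch (sequences, n_features, and the layer sizes are illustrative assumptions, not the full training setup discussed above):
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.sequence import pad_sequences

n_features = 8  # illustrative
# sequences: list of (n_timestamps_i, n_features) arrays; pad along the time axis.
x = pad_sequences(sequences, padding="post", dtype="float32", value=0.0)

model = keras.Sequential([
    layers.Masking(mask_value=0.0, input_shape=(None, n_features)),
    layers.LSTM(64, return_sequences=True),
    layers.Dense(2),
])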

Keras model.predict() taking unreasonable amount of time

I am working on a project where we are using a compiled Keras ANN model to classify different positions based on received sensor data. These data are continuously fed to the model for prediction by a daemon thread collecting data in the background. We are having a problem where model.predict() takes up to 2 seconds to finish, even when given small datasets. The data points are arrays containing 38 floats each. The prediction time seems unaffected by the number of rows supplied, up to a certain amount. We have tried supplying it with only one row, and with up to hundreds. The elapsed time stays around 2 seconds. Isn't this time consumption abnormally high, even for the larger datasets?
If it helps:
Our program is using multi-threading to be able to collect the data from the sensors and restructure them so that they fit the predict method of the model. Two daemon threads are running in the background collecting and restructuring data, while the main thread is actively picking data from a queue of already structured data and classifying based on these. Here is the code where we classify based on the data collected:
values = []
rows = 0
while rows < 20:
    val = pred_queue.shift()
    if val is not None:
        values.append(val)
        rows += 1
rows = 0

values = np.squeeze(values)
start_time = time.perf_counter()
predictions = model.predict(values)
elapsed_time = round(time.perf_counter() - start_time, 2)
print("Predict time: ", elapsed_time)
for i in range(len(predictions)):
    print(predictions[i].argmax())
    # print(f"Predicted {classification_res} in {elapsed_time}s!")
Some clarification of the code:
The shift() method returns the first entry in pred_queue. This will either be an array of 38 floats or None, depending on whether the queue is empty.
What could possibly make these predictions so slow?
Edit
The reason for the confusion around the prediction times is that we have run the same model on some data before compiling it. Those data points were collected from a CSV file, put into a pandas DataFrame, and finally passed to the predict method. These data were not streamed live, but the dataset was much bigger: around 9000 rows, each containing 38 floats. This prediction took 0.3 seconds when we timed it. Obviously much faster than our current speeds!
You can try to use the __call__ method directly, as the documentation of the predict method states (emphasis is mine):
Computation is done in batches. This method is designed for performance in large scale inputs. For small amount of inputs that fit in one batch, directly using __call__ is recommended for faster execution, e.g., model(x), or model(x, training=False) if you have layers such as tf.keras.layers.BatchNormalization that behaves differently during inference. Also, note the fact that test loss is not affected by regularization layers like noise and dropout.
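For example, a minimal sketch with the values array from the question (the dtype conversion and the training=False flag are my assumptions):
import numpy as np

x = np.asarray(values, dtype=np.float32)   # shape (n_rows, 38)
preds = model(x, training=False).numpy()   # skips predict()'s per-call batching overhead
print(preds.argmax(axis=1))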
Note that the performance hit you are noticing could be related to limited machine resources. Investigate CPU usage, RAM usage, etc.

Is one step equal to one epoch if I train my model on one (large) image?

I am implementing an active learning pipeline with the TensorFlow Object Detection API.
Therefore I am starting with one image from the xView dataset (about 3000x4000 px in size).
Now I am training my faster_rcnn network with a batch size of 1.
If there is only one image to train on and the batch size is 1, is every step (printed in the console) equal to one epoch?
Let's say after 20 active learning cycles there are 20 images in the training dataset and I train for 19 steps; the last image is never trained on, right?
If the number of images increases but the number of steps per active learning cycle stays the same, will the network never train on the later-added images, or will the training resume where it stopped (at image 19, for example)?
You seem to understand epoch correctly: it's a training pass in which you train once on each image in the data set. If your batch size is equal to the data set size, then yes, you have one iteration per epoch.
If you train for 19 steps (batch size = 1) and there are 20 images, then one of them will be left out of the near-epoch ... but the left-out image is not necessarily the "last" one ("last" depending on how your images are ordered). This depends on your data ingestion software -- which you didn't specify.
Most of these input packages employ a "shuffle" operation, a function that will randomly order the data set at the beginning of each epoch.
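For illustration (not the Object Detection API's actual input code), a tf.data-style pipeline where image_paths is a hypothetical list of the 20 training images; with reshuffle_each_iteration=True a different image can end up being the one skipped by each 19-step cycle:
import tensorflow as tf

dataset = (
    tf.data.Dataset.from_tensor_slices(image_paths)
    .shuffle(buffer_size=20, reshuffle_each_iteration=True)  # reorder every pass
    .repeat()
    .batch(1)
)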
I have also worked with one ingestion package that did as you suggest, picking up each pass (pseudo-epoch training group) from where the previous one left off. It also had an option to re-shuffle or not, each time the data set was exhausted.
For a definitive answer, you'll have to check your framework's documentation and the configuration choices made for your particular model. If there is no such documentation, you're stuck doing what I've had to do a few times: take ten minutes of primal scream time :-), and then read the code.

How to run an LSTM algorithm faster to send the result to the client side

I have about 100 datasets, and each dataset has about 3000 items. I want to get a prediction with an LSTM algorithm for the next 2 months using the last 300 items of each dataset. After that, I want to trigger this prediction from the client side (not for all datasets, just 1 dataset). When the user wants to see a prediction, the client will send the dataset name as a parameter, and a Python script runs the LSTM algorithm, creates the prediction for that 1 dataset, and sends it to the client. My scenario is something like this. But when I run this Python script, it takes a long time to work. I need to run it faster. Do you have any suggestions? You can also change my scenario. The most important point is that I need to show the prediction to the client within a few minutes for just 1 dataset. I'm open to all kinds of suggestions. Thanks.
One thing you could do is use NumPy arrays; they are much faster at dealing with large blocks of memory than standard Python lists.
import numpy as np
numpy documentation can be found here
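For example, a small sketch of converting the prediction window to a float32 array before calling the model (dataset_items, the window length of 300, and the (1, 300, 1) input shape are assumptions about your setup):
import numpy as np

# Use the last 300 items of the selected dataset as a single contiguous array.
window = np.asarray(dataset_items[-300:], dtype=np.float32).reshape(1, 300, 1)
prediction = model.predict(window)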

Augmentations in Keras ImageDataGenerator

I have two questions, please, concerning the ImageDataGenerator:
1) Are the same augmentations used on the whole batch, or does each image get its own random transformation?
e.g. for rotation, does the module rotate all the images in the batch by the same angle, or does each image get a random rotation angle?
2) The data in ImageDataGenerator.flow is looped over (in batches) indefinitely. Is there a way to stop this infinite loop, i.e. do the augmentation only n times? Because I need to modify the batch_size at each step (not each epoch).
Thanks
Answer from Francois Chollet:
1) Are the same augmentations used on the whole batch, or does each image get its own random transformation? e.g. for rotation, does the module rotate all the images in the batch by the same angle, or does each image get a random rotation angle?
Every single sample has a different unique transformation (e.g. a random rotation within a certain range).
2) The data in ImageDataGenerator.flow is looped over (in batches) indefinitely. Is there a way to stop this infinite loop, i.e. do the augmentation only n times? Because I need to modify the batch_size at each step (not each epoch). Thanks
Unclear what is meant here. But if you are using model.fit_generator(ImageDataGenerator.flow()) then you can specify samples_per_epoch=... to only yield a specific number of samples from the generator. If you want batch-level granularity, you could do:
for x, y in ImageDataGenerator().flow(X_train, y_train):  # iterate the generator directly; X_train/y_train are your training arrays
    model.train_on_batch(x, y)
In that case you can just break (it's a loop) after any number of batches that you want.
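For instance, a minimal sketch (the augmentation settings, batch_size, n_batches, X_train, y_train, and model are illustrative assumptions; the import path may be keras or tensorflow.keras depending on your install):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20)
n_batches = 100  # stop after this many augmented batches

for i, (x, y) in enumerate(datagen.flow(X_train, y_train, batch_size=32)):
    model.train_on_batch(x, y)
    if i + 1 >= n_batches:
        break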
@Neal: Thank you for the prompt answer! You were right, I probably need to better explain my task. My work is somewhat similar to classifying video sequences, but my data is saved in a database. I want my code to follow these steps for one epoch:
For i in range(number_of_sequences):
    Get N, the number of frames in sequence i (I think that's equivalent to batch_size; the N of each sequence is already saved in a list)
    Fetch N successive frames from my database and their labels: X_train, y_train
    For j in range(number_of_rotation):
        Perform (the same) data augmentation on all frames of the sequence (probably using datagen = ImageDataGenerator() and datagen.flow())
        Train the network on X, y
My first thought was to use model.fit_generator(generator = ImageDataGenerator().flow()), but this way I cannot modify my batch_size, and honestly I did not understand your solution.
Sorry for the long post; I'm still a novice in both Python and NNs, but I'm really a big fan of Keras ;)
Thanks!
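One possible sketch of the "same augmentation for all frames of a sequence" step above, assuming ImageDataGenerator's get_random_transform/apply_transform are used to apply one random transform to every frame (the augmentation settings, the frames array layout, and augment_sequence are illustrative assumptions):
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20, width_shift_range=0.1)

def augment_sequence(frames):
    # frames: (N, height, width, channels); reuse the same parameters for every frame.
    params = datagen.get_random_transform(frames.shape[1:])
    return np.stack([datagen.apply_transform(frame, params) for frame in frames])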
