How to compute/measure inference time of a Tensorflow model? - python

I have 2 TF models that I want to compare in terms of how fast they can predict. I'm aware that TF models have an evaluate method that reports inference time in ms per step. However, is there a way to get higher precision than this? If so, what's the proper way of doing it?

Try this:
import time

start_time = time.time()
model.predict(x)  # x is a placeholder for your input data
end_time = time.time()
duration = end_time - start_time

hours = duration // 3600
minutes = (duration - (hours * 3600)) // 60
seconds = duration - ((hours * 3600) + (minutes * 60))
msg = f'inference elapsed time was {hours} hours, {minutes:4.1f} minutes, {seconds:4.2f} seconds'
print(msg)  # print out the elapsed time
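If you need finer granularity than the ms/step that evaluate reports, a common approach (a sketch, assuming model is a compiled Keras model and x is a representative input batch) is to time predict yourself with time.perf_counter(), discard a few warm-up calls (the first ones pay for graph tracing and memory allocation), and average over many runs:
import time
import numpy as np

def measure_inference(model, x, warmup=5, runs=100):
    """Return mean and std of per-call latency in milliseconds."""
    for _ in range(warmup):
        model.predict(x, verbose=0)  # warm-up calls are not timed
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()  # high-resolution monotonic clock
        model.predict(x, verbose=0)
        times.append((time.perf_counter() - t0) * 1000.0)
    return np.mean(times), np.std(times)

mean_ms, std_ms = measure_inference(model, x)
print(f'inference: {mean_ms:.3f} ms +/- {std_ms:.3f} ms per call')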

Related

Operands could not be broadcast together with shapes (2,) (4,) when using OpenAI Gym in Jupyter

I have the following code:
def get_discrete_state(state): # we have to change the box type into discrete type to manipulate it
    discrete_state = state/step_size + np.array([15,10,1,10])
    return tuple(discrete_state.astype(np.int))

# iterate through our epochs
for epoch in range(epochs + 1):
    # set the initial time, so we can calculate how much each action takes
    t_initial = time.time()
    # get the discrete state for the restarted environment, so we know what's going on
    discrete_state = get_discrete_state(env.reset())
    # we create a boolean that will tell us whether our game is running or not
    done = False
    # our reward is initialized at zero at the beginning of every episode
    epoch_reward = 0
    # every 1000 epochs we print the episode number
    if epoch % 1000 == 0:
        print("Episode: " + str(epoch))
    while not done:
        # now we are in our game loop
        # if some random number is greater than epsilon, we take the best action we have explored so far
        if np.random.random() > epsilon:
            action = np.argmax(q_table[discrete_state])
        # else, we explore and take a random action
        else:
            action = np.random.randint(0, env.action_space.n)
        # now we initialize our new_state, reward, and done variables
        new_state, reward, done, _ = env.step(action)
        epoch_reward += reward
        # we discretize our new state
        new_discrete_state = get_discrete_state(new_state)
        # we render our environment every 2000 epochs
        if epoch % 2000 == 0:
            env.render()
        # if the game loop is still running, update the q-table
        if not done:
            max_new_q = np.max(q_table[new_discrete_state])
            current_q = q_table[discrete_state + (action,)]
            new_q = (1 - lr) * current_q + lr * (reward + gamma * max_new_q)
            q_table[discrete_state + (action,)] = new_q
        discrete_state = new_discrete_state
    # if epsilon is greater than .05, the reward is greater than the previous one, and we are past epoch 10000, we recalculate epsilon
    if epsilon > 0.05:
        if epoch_reward > prev_reward and epoch > 10000:
            epsilon = math.pow(epsilon_decay_value, epoch - 10000)
            if epoch % 500 == 0:
                print("Epsilon: " + str(epsilon))
    # we calculate the final time
    t_final = time.time()
    # total epoch time
    episode_total = t_final - t_initial
    total_time += episode_total
    # calculate and update rewards
    total_reward += epoch_reward
    prev_reward = epoch_reward
    # every 1000 episodes, print the average time and the average reward
    if epoch % 1000 == 0:
        mean = total_time / 1000
        print("Time Average: " + str(mean))
        total_time = 0
        mean_reward = total_reward / 1000
        print("Mean Reward: " + str(mean_reward))
        total_reward = 0
env.close()
If I try to run it, I first get:
/var/folders/6l/gfqkwfbd7rs176sshdhfz5f80000gn/T/ipykernel_74362/3082769315.py:2: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
discrete_state = state/step_size + np.array([15,10,1,10])
And after that:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [17], in <cell line: 2>()
4 t_initial = time.time()
6 #get the discrete state for the restarted environment, so we know what's going on
----> 7 discrete_state = get_discrete_state(env.reset())
9 #we create a boolean that will tell us whether our game is running or not
10 done = False
Input In [16], in get_discrete_state(state)
1 def get_discrete_state(state): #We have to change the box type into discrete type to manipulate it
----> 2 discrete_state = state/step_size + np.array([15,10,1,10])
3 return tuple(discrete_state.astype(np.int))
ValueError: operands could not be broadcast together with shapes (2,) (4,)
So why does this happen, and how can I fix it and make my code work? The code example should work just fine as it is, but for me it doesn't.
BONUS: It also seems like many OpenAI Gym code examples from various tutorial sources no longer work; why is that?
Thank you!
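A likely cause, offered as an assumption based on the shapes in the error rather than anything confirmed in the thread: in gym 0.26+ (and its successor Gymnasium), env.reset() returns a tuple (observation, info) instead of a bare observation, so get_discrete_state receives a 2-element tuple; NumPy turns it into a ragged shape-(2,) object array (hence the VisibleDeprecationWarning), which cannot broadcast against the shape-(4,) offset. A sketch of the fix:
# gym >= 0.26: reset() returns (observation, info), so unpack it
state, _ = env.reset()
discrete_state = get_discrete_state(state)

# step() likewise now returns five values instead of four
new_state, reward, terminated, truncated, _ = env.step(action)
done = terminated or truncated
Two further notes: np.int was removed in NumPy 1.24, so recent NumPy needs return tuple(discrete_state.astype(int)); and this API break is also a plausible answer to the BONUS question, since tutorials written against older gym versions stopped working when the reset()/step() signatures changed.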

Python "for loop" looping time limit

How do I set a time limit for a "for loop"? Say I would like each iteration of the loop to take 200 ms:
for data in online_database:
    # looping time = 200 ms (pseudocode: each iteration should take 200 ms)
    print(data)
Thanks!
import time

t_sleep = 0.2
for x in range(10):
    print(x)
    time.sleep(t_sleep)
This code sleeps for 0.2 seconds on every iteration.
Maybe something like this could help:
import time

for i in YourSequence:
    current_millis = round(time.monotonic() * 1000)
    max_millis = current_millis + 200  # 200 ms is the time budget for this iteration
    while round(time.monotonic() * 1000) < max_millis:
        # ---- your code ----
Additionally, if the loop body is large, you can periodically re-check whether you have crossed the time limit and bail out:
        if round(time.monotonic() * 1000) > max_millis:
            break
Something like this:
import time

for i in a:
    current_millis = round(time.monotonic() * 1000)
    max_millis = current_millis + 200
    while round(time.monotonic() * 1000) < max_millis:
        # your code ...
        if round(time.monotonic() * 1000) > max_millis:
            break
        # your code ...
        if round(time.monotonic() * 1000) > max_millis:
            break
        # your code ...
        break
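If the goal is for each iteration to start every 200 ms regardless of how long the loop body takes, a sleep-for-the-remainder pattern is usually more accurate than a fixed time.sleep(0.2), which adds 200 ms on top of the body's own runtime. A minimal sketch (online_database stands in for whatever iterable the question is reading from):
import time

PERIOD = 0.2  # target iteration period in seconds

for data in online_database:
    start = time.monotonic()
    print(data)  # the actual work for this iteration
    elapsed = time.monotonic() - start
    if elapsed < PERIOD:
        time.sleep(PERIOD - elapsed)  # sleep only for the rest of the 200 ms slot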

Not understanding how a moving average implementation works

I have a simple time series and code implementing the moving average:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
keras = tf.keras

def plot_series(time, series, format="-", start=0, end=None, label=None):
    plt.plot(time[start:end], series[start:end], format, label=label)
    plt.xlabel("Time")
    plt.ylabel("Value")
    if label:
        plt.legend(fontsize=14)
    plt.grid(True)

def trend(time, slope=0):
    return slope * time

def seasonal_pattern(season_time):
    """Just an arbitrary pattern, you can change it if you wish"""
    return np.where(season_time < 0.4,
                    np.cos(season_time * 2 * np.pi),
                    1 / np.exp(3 * season_time))

def seasonality(time, period, amplitude=1, phase=0):
    """Repeats the same pattern at each period"""
    season_time = ((time + phase) % period) / period
    return amplitude * seasonal_pattern(season_time)

def white_noise(time, noise_level=1, seed=None):
    rnd = np.random.RandomState(seed)
    return rnd.randn(len(time)) * noise_level

time = np.arange(4 * 365 + 1)
slope = 0.05
baseline = 10
amplitude = 40
series = baseline + trend(time, slope) + seasonality(time, period=365, amplitude=amplitude)
noise_level = 5
noise = white_noise(time, noise_level, seed=42)
series += noise

plt.figure(figsize=(10, 6))
plot_series(time, series)
plt.show()

def moving_average_forecast(series, window_size):
    """Forecasts the mean of the last few values.
    If window_size=1, then this is equivalent to naive forecast"""
    forecast = []
    for time in range(len(series) - window_size):
        forecast.append(series[time:time + window_size].mean())
    return np.array(forecast)

split_time = 1000
time_train = time[:split_time]
x_train = series[:split_time]
time_valid = time[split_time:]
x_valid = series[split_time:]

moving_avg = moving_average_forecast(series, 30)[split_time - 30:]

plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid, label="Series")
plot_series(time_valid, moving_avg, label="Moving average (30 days)")
I am not getting this part:
for time in range(len(series) - window_size):
    forecast.append(series[time:time + window_size].mean())
return np.array(forecast)
What I do not understand is how series[time:time + window_size] works. window_size is passed into the function and specifies how many days are used to compute the mean, e.g. 5 or 30 days.
When I try something similar to illustrate this to myself, like plot(series[time:time + 30]), it does not work.
Furthermore, I do not get how len(series) - window_size works.
Debug your code and add some print statements to see how it responds.
Write the results down and try to analyze them.
Step back and write a similar piece of code that reproduces the same output.
Compare:
if the output is the same, congrats;
if it is not, run both again with timers on and see which one is faster;
if your code is faster, congrats.
It seems like the function moving_average_forecast simply calculates the x-day rolling average. If that is the intention, then:
The line for time in range(len(series) - window_size): gives you an index time that runs from 0 to n - 1, where n = len(series) - window_size is the number of forecasts the loop produces. For example, if you have 11 data points and want 10-day rolling averages, then len(series) = 11 and window_size = 10, so range(1) yields time = [0]: one window (points 0 through 9) whose mean serves as the forecast for point 10.
The line series[time:time + window_size] indexes into series and extracts each window. Note that Python slices exclude the end index, so the slice returns exactly window_size points: in the first iteration (time = 0) series[0:10] is the first 10 data points, and so on. (It might look like it should be series[time:time + window_size - 1], but that would return only 9 points.)
Hope that helps.
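To see how the slicing behaves, it can help to print shapes directly and to compare against a vectorized equivalent; the sketch below (my own toy example, not from the thread) uses np.convolve with a uniform kernel to compute the same window means:
import numpy as np

series = np.arange(11, dtype=float)  # 11 toy data points: 0, 1, ..., 10
window_size = 10

# slice ends are exclusive, so each slice holds exactly window_size points
print(series[0:0 + window_size])  # [0. 1. ... 9.] -> 10 elements
print(len(series) - window_size)  # 1 -> the loop makes exactly one forecast

# vectorized window means (no Python loop); 'valid' yields len - window + 1 values
means = np.convolve(series, np.ones(window_size) / window_size, mode='valid')
print(means)       # [4.5 5.5] -> means of points 0..9 and of points 1..10
print(means[:-1])  # [4.5] -> matches moving_average_forecast, which stops one
                   # window early so each mean forecasts the following point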

Is there a way to increase the line length for an equation in Gekko after receiving "APM model error: string > 15000 characters"?

I'm using Gekko for an optimization problem with constraints that require summations over array variables. Because these arrays are long, I keep getting the error: APM model error: string > 15000 characters
The summation runs over three indices: i in range(1, years), n in range(1, i), and j in range(1, receptors). As i grows, the number of terms included in each summation increases. I want to leave the code as a summation with the following line:
m.Equation(emissions[:,3] == sum(sum(sum(f[n,j]*-r[j,2]*unit *(.001*(i-n)**2 + 0.062*(i-n)) for i in range(years)) for n in range(i))for j in range(rec)))
However, this constraint exceeds the 15,000-character limit for a single line.
I have previously solved the problem using for loops and intermediates to build all of these variables outside of the "constraint" environment. It gives the right answer, but takes a long time to compile the model (upwards of 4 hours for model building, and less than 3 minutes to solve it). The code looked like this:
for i in range(years):
    emissions[i,0] = s[i,1]
    emissions[i,1] = s[i,3]
    emissions[i,2] = s[i,5]
    emissions[i,3] = 0
    emissions[i,4] = 0
    emissions[i,5] = 0
    for n in range(i):
        for j in range(rec):
            # update + binary * flux * conversion * growth
            emissions[i,3] = m.Intermediate(emissions[i,3] + f[n,j] * -rankedcopy[j,2] * unit * (.001*(i-n)**2 + 0.062*(i-n)))
            emissions[i,4] = m.Intermediate(emissions[i,4] + f[n,j] * -rankedcopy[j,3] * unit * (.001*(i-n)**2 + 0.062*(i-n)))
            emissions[i,5] = m.Intermediate(emissions[i,5] + f[n,j] * -rankedcopy[j,4] * unit * (.001*(i-n)**2 + 0.062*(i-n)))
I'm hoping that avoiding the for loops will improve efficiency enough to let me expand the model, but I'm unsure of a way to increase the APM model string limit.
I am also open to other suggestions for how to embed intermediates into the summation.
Try using the m.sum() function as a built-in GEKKO object. If you use the Python sum function then it creates a large summation equation that needs to be interpreted at run-time and may exceed the equation size limit. The m.sum() creates the summation in byte-code instead.
m.Equation(emissions[:,3] == \
    m.sum(m.sum(m.sum(f[n,j]*-r[j,2]*unit*(.001*(i-n)**2 + 0.062*(i-n)) \
        for i in range(years)) for n in range(i)) for j in range(rec)))
Here is a simple example that shows the difference in performance.
from gekko import GEKKO
import numpy as np
import time

n = 5000
v = np.linspace(0, n-1, n)

# summation method 1 - Python sum
m = GEKKO()
t = time.time()
s = sum(v)
y = m.Var()
m.Equation(y == s)
m.solve(disp=False)
print(y.value[0])
print('Elapsed time: ' + str(time.time() - t))
m.cleanup()

# summation method 2 - Intermediates
m = GEKKO()
t = time.time()
s = 0
for i in range(n):
    s = m.Intermediate(s + v[i])
y = m.Var()
m.Equation(y == s)
m.solve(disp=False)
print(y.value[0])
print('Elapsed time: ' + str(time.time() - t))
m.cleanup()

# summation method 3 - Gekko sum
m = GEKKO()
t = time.time()
s = m.sum(v)
y = m.Var()
m.Equation(y == s)
m.solve(disp=False)
print(y.value[0])
print('Elapsed time: ' + str(time.time() - t))
m.cleanup()
Results
12497500.0
Elapsed time: 0.17874956130981445
12497500.0
Elapsed time: 5.171698570251465
12497500.0
Elapsed time: 0.1246955394744873
The 15,000 character limit for a single equation is a hard limit. We thought about making it adjustable with m.options.MAX_MEMORY but then large equations can make very dense matrix factorizations for the solver. It is often better to break up the equation or use other methods to reduce the equation size.
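If a single constraint still exceeds the limit even with m.sum(), another option (a sketch I'm offering beyond the original answer, reusing the question's f, r, unit, emissions, years, and rec names) is to split the triple sum so that no single equation string gets long: build the inner sums over j as intermediates, then sum only the intermediate names in the constraint:
for i in range(years):
    # each inner sum over j becomes one short intermediate equation
    inner = [m.Intermediate(m.sum([f[n,j] * -r[j,2] * unit
                                   * (.001*(i-n)**2 + 0.062*(i-n))
                                   for j in range(rec)]))
             for n in range(i)]
    # the outer constraint now only sums the (short) intermediate names
    if inner:
        m.Equation(emissions[i,3] == m.sum(inner))
    else:
        m.Equation(emissions[i,3] == 0)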

Parallelize loops using OpenCL in Python

I have a given dataset in the matrix y and I want to train different SOMs with it. The SOM is one-dimensional (a line), and its number of neurons varies. I train a SOM of size N=2 at first, and N=NMax at last, giving a total of NMax-2+1 SOMs. For each SOM, I want to store the weights once the training is over before moving on to the next SOM.
The whole point of using PyOpenCL here is that each of the outer loop iterations is independent of the others: for each value of N, the script doesn't care what happens for other values of N. One could achieve the same result by running the script NMax-2+1 times, changing the value of N manually.
With this in mind, I was hoping to perform each of these independent iterations at the same time on the GPU, so that the time spent is reduced significantly. The speed-up factor will be less than NMax-2+1, though, because each iteration is more expensive than the previous ones: for larger values of N, more calculations are made.
Is there a way to 'translate' this code to run on the GPU? I've never used OpenCL before, so let me know if this is too broad or silly so I can ask a more specific question. The code is self-contained, so feel free to try it out. The four constants declared at the beginning can be changed to whatever you like (given that NMax > 1 and all the others are strictly positive).
import numpy as np
import time

m = 3              # Dimension of datapoints
num_points = 2000  # Number of datapoints
iterMax = 150      # Maximum number of iterations
NMax = 3           # Maximum number of neurons
#%%
np.random.seed(0)
y = np.random.rand(num_points, m)  # Generate always the same dataset
sigma_0 = 5  # Initial value of width of the neighborhood function
eta_0 = 1    # Initial value of learning rate

w = list(range(NMax - 1))
wClusters = np.zeros((np.size(y, axis=0), NMax - 1))  # Clusters for each N

t_begin = time.perf_counter()  # Start time (time.clock() was removed in Python 3.8)
for N in range(NMax - 1):  # Number of neurons for this iteration
    w[N] = np.random.uniform(0, 1, (N+2, np.size(y, axis=1))) - 0.5  # Initialize weights
    iterCount = 1
    while iterCount < iterMax:
        # Mix up the input patterns
        mixInputs = y[np.random.permutation(np.size(y, axis=0)), :]
        # Sigma reduction
        sigma = sigma_0 - (sigma_0 / (iterMax + 1)) * iterCount
        s2 = 2 * sigma**2
        # Learning rate reduction
        eta = eta_0 - (eta_0 / (iterMax + 1)) * iterCount
        for selectedInput in mixInputs:  # Pick up one pattern
            # Search winning neuron
            aux = np.sum((selectedInput - w[N])**2, axis=-1)
            ii = np.argmin(aux)  # Neuron 'ii' is the winner
            jjs = abs(ii - list(range(N+2)))
            dists = np.min(np.vstack([jjs, abs(jjs - (N+2))]), axis=0)
            # Update weights
            w[N] = w[N] + eta * np.exp((-dists**2) / s2).T[:, np.newaxis] * (selectedInput - w[N])
        print(N+2, iterCount)
        iterCount += 1
    # Assign each datapoint to its nearest neuron
    for kk in range(np.size(y, axis=0)):
        aux = np.sum((y[kk, ] - w[N])**2, axis=-1)
        ii = np.argmin(aux)  # Neuron 'ii' is the winner
        wClusters[kk, N] = ii + 1
t_end = time.perf_counter()  # End time
#%%
print(t_end - t_begin)
I'm trying to give a somewhat complete answer.
First of all:
Can this code be adapted to run on the GPU using (py)OpenCL?
Most probably yes.
Can this be done automatically?
No (afaik).
Most of the questions I get about OpenCL are along the lines of: "Is it worth porting this piece of code to OpenCL for a speed gain?" You state that your outer loop is independent of the results of other runs, which makes the code basically parallelizable. In a straightforward implementation, each OpenCL work item would execute the same code with slightly different input parameters. Ignoring the overhead of data transfer between host and device, the running time of this approach would equal the running time of the slowest iteration. Depending on the iterations in your outer loop, this could be a massive speed gain. As long as the numbers stay relatively small, you could try the multiprocessing module in Python to parallelize these iterations on the CPU instead of the GPU.
Porting to the GPU usually only makes sense, if a huge number of processes are to be run in parallel (about 1000 or more). So in your case, if you really want an enormous speed boost, see if you can parallelize all calculations inside the loop. For example, you have 150 iterations and 2000 data points. If you could somehow parallelize these 2000 data points, this could offer a much bigger speed gain, which could justify the work of porting the whole code to OpenCL.
TL;DR:
Try parallelizing on CPU first. If you find the need to run more than several 100s of processes at the same time, move to GPU.
Update: Simple code for parallelizing on CPU using multiprocessing (without callback)
import numpy as np
import time
import multiprocessing as mp

m = 3              # Dimension of datapoints
num_points = 2000  # Number of datapoints
iterMax = 150      # Maximum number of iterations
NMax = 10          # Maximum number of neurons
#%%
np.random.seed(0)
y = np.random.rand(num_points, m)  # Generate always the same dataset
sigma_0 = 5  # Initial value of width of the neighborhood function
eta_0 = 1    # Initial value of learning rate

w = list(range(NMax - 1))
wClusters = np.zeros((np.size(y, axis=0), NMax - 1))  # Clusters for each N

def neuron_run(N):
    # NOTE: w and wClusters are module-level globals; each worker process
    # modifies its own copy, so the results never reach the parent process
    # (hence "without callback" above)
    w[N] = np.random.uniform(0, 1, (N+2, np.size(y, axis=1))) - 0.5  # Initialize weights
    iterCount = 1
    while iterCount < iterMax:
        # Mix up the input patterns
        mixInputs = y[np.random.permutation(np.size(y, axis=0)), :]
        # Sigma reduction
        sigma = sigma_0 - (sigma_0 / (iterMax + 1)) * iterCount
        s2 = 2 * sigma**2
        # Learning rate reduction
        eta = eta_0 - (eta_0 / (iterMax + 1)) * iterCount
        for selectedInput in mixInputs:  # Pick up one pattern
            # Search winning neuron
            aux = np.sum((selectedInput - w[N])**2, axis=-1)
            ii = np.argmin(aux)  # Neuron 'ii' is the winner
            jjs = abs(ii - list(range(N+2)))
            dists = np.min(np.vstack([jjs, abs(jjs - (N+2))]), axis=0)
            # Update weights
            w[N] = w[N] + eta * np.exp((-dists**2) / s2).T[:, np.newaxis] * (selectedInput - w[N])
        print(N+2, iterCount)
        iterCount += 1
    # Assign each datapoint to its nearest neuron
    for kk in range(np.size(y, axis=0)):
        aux = np.sum((y[kk, ] - w[N])**2, axis=-1)
        ii = np.argmin(aux)  # Neuron 'ii' is the winner
        wClusters[kk, N] = ii + 1

t_begin = time.perf_counter()  # Start time (time.clock() was removed in Python 3.8)
#%%
def apply_async():
    pool = mp.Pool(processes=NMax)
    for N in range(NMax - 1):
        pool.apply_async(neuron_run, args=(N,))
    pool.close()
    pool.join()
    print("Multiprocessing done!")  # the original used Python 2 print syntax

if __name__ == '__main__':
    apply_async()

t_end = time.perf_counter()  # End time
print(t_end - t_begin)
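One caveat about the multiprocessing sketch above (my observation, not part of the original answer): each worker process gets its own copy of w and wClusters, so assignments made inside neuron_run are lost when the worker exits. A common fix is to have the worker return its results and collect them through the AsyncResult objects that apply_async returns; a hedged sketch, where neuron_run_ret is a hypothetical variant of neuron_run that builds and returns local arrays instead of mutating globals:
import multiprocessing as mp
import numpy as np

def neuron_run_ret(N):
    # hypothetical variant: same training loop as neuron_run, but it fills
    # local arrays and returns them instead of writing to globals
    weights = np.random.uniform(0, 1, (N+2, np.size(y, axis=1))) - 0.5
    clusters = np.zeros(np.size(y, axis=0))
    # ... training loop omitted for brevity ...
    return N, weights, clusters

if __name__ == '__main__':
    with mp.Pool(processes=NMax) as pool:
        results = [pool.apply_async(neuron_run_ret, args=(N,))
                   for N in range(NMax - 1)]
        for res in results:
            N, weights, clusters = res.get()  # blocks until this worker finishes
            w[N] = weights
            wClusters[:, N] = clusters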
