I have trained a multiclass classifier for speech recognition using TensorFlow and then converted the model with the TFLite converter. The model can predict, but it always outputs a single class. I suspect the problem is in the inference code, because the .h5 model predicts multiple classes without any issue. I have been searching online for several days for some insight but can't quite figure it out. Here is my code; any suggestions would be really appreciated.
import sounddevice as sd
import numpy as np
import scipy.signal
import timeit
import python_speech_features
import tflite_runtime.interpreter as tflite
import importlib
# Parameters
debug_time = 0
debug_acc = 0
word_threshold = 0.95
rec_duration = 0.5 # 0.5
sample_length = 0.5
window_stride = 0.5 # 0.5
sample_rate = 8000 # The mic requires at least 44100 Hz to work
resample_rate = 8000
num_channels = 1
num_mfcc = 16
model_path = 'model.tflite'
mfccs_old = np.zeros((32, 25))
# Load model (interpreter)
interpreter = tflite.Interpreter(model_path)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details)
# Filter and downsample
def decimate(signal, old_fs, new_fs):
# Check to make sure we're downsampling
if new_fs > old_fs:
print("Error: target sample rate higher than original")
return signal, old_fs
# Downsampling is possible only by an integer factor
dec_factor = old_fs / new_fs
    if not dec_factor.is_integer():
        print("Error: can only downsample by integer factor")
        return signal, old_fs
# Do decimation
resampled_signal = scipy.signal.decimate(signal, int(dec_factor))
return resampled_signal, new_fs
# Callback that gets called every 0.5 seconds
def sd_callback(rec, frames, time, status):
# Start timing for debug purposes
start = timeit.default_timer()
# Notify errors
if status:
print('Error:', status)
global mfccs_old
# Compute MFCCs
mfccs = python_speech_features.base.mfcc(rec,
samplerate=resample_rate,
winlen=0.02,
winstep=0.02,
numcep=num_mfcc,
nfilt=26,
nfft=512, # 2048
preemph=0.0,
ceplifter=0,
appendEnergy=True,
winfunc=np.hanning)
delta = python_speech_features.base.delta(mfccs, 2)
mfccs_delta = np.append(mfccs, delta, axis=1)
mfccs_new = mfccs_delta.transpose()
mfccs = np.append(mfccs_old, mfccs_new, axis=1)
# mfccs = np.insert(mfccs, [0], 0, axis=1)
mfccs_old = mfccs_new
# Run inference and make predictions
in_tensor = np.float32(mfccs.reshape(1, mfccs.shape[0], mfccs.shape[1], 1))
interpreter.set_tensor(input_details[0]['index'], in_tensor)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
val = np.amax(output_data) # DEFINED FOR BINARY CLASSIFICATION, CHANGE TO MULTICLASS
ind = np.where(output_data == val)
prediction = ind[1].astype(int)
if val > word_threshold:
print('index:', ind[1])
        print('accuracy', val, '\n')
print(int(prediction))
if debug_acc:
# print('accuracy:', val)
# print('index:', ind[1])
print('out tensor:', output_data)
if debug_time:
print(timeit.default_timer() - start)
# Start recording from microphone
with sd.InputStream(channels=num_channels,
samplerate=sample_rate,
blocksize=int(sample_rate * rec_duration),
callback=sd_callback):
while True:
pass
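As an aside, for a multiclass output the top prediction can be read directly with argmax instead of the amax/where pair; a small sketch reusing the variables from the callback above:
probs = output_data[0]  # shape: (num_classes,)
prediction = int(np.argmax(probs))
val = float(probs[prediction])
if val > word_threshold:
    print('index:', prediction, 'score:', val)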
Since I figured out the issue, I am answering it myself in case others find it useful.
The issue was not having a "background noise" class in the dataset. Also make sure you have enough data for the background noise class. If you look at Google's Teachable Machine audio project (https://teachablemachine.withgoogle.com/train/audio), a "background noise" class is already there, and you cannot delete or disable it.
I tested both the code provided in TensorFlow's GitHub example (https://github.com/tensorflow/examples/blob/master/lite/examples/sound_classification/raspberry_pi/classify.py) and the one on TensorFlow's website (https://www.tensorflow.org/tutorials/audio/simple_audio). Both predict well as long as you have enough background noise samples in your dataset for the particular environment you are testing in.
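As a quick sanity check on class balance, you can count the samples per class; a sketch assuming a directory-per-class layout (like the Speech Commands format) under a hypothetical 'dataset' folder:
import os

data_dir = 'dataset'  # hypothetical: one subfolder per class, including background noise
for cls in sorted(os.listdir(data_dir)):
    cls_dir = os.path.join(data_dir, cls)
    if os.path.isdir(cls_dir):
        print(cls, len(os.listdir(cls_dir)))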
I made slight changes to TensorFlow's GitHub code to output the category name and confidence score.
# Loop until the user close the classification results plot.
while True:
# Wait until at least interval_between_inference seconds has passed since
# the last inference.
now = time.time()
diff = now - last_inference_time
if diff < interval_between_inference:
time.sleep(pause_time)
continue
last_inference_time = now
# Load the input audio and run classify.
tensor_audio.load_from_audio_record(audio_record)
result = classifier.classify(tensor_audio)
for category in result.classifications[0].categories:
print(category.category_name, category.score)
Hope it's helpful for people playing around with similar projects.
I'm trying to optimize the coordinates of the corners of an image. A similar technique works fine in Ceres Solver, but with torch.optim I'm having some issues: the optimizer for some reason does not change the parameters being optimized. I don't have much experience with PyTorch, so I'm pretty sure the error is trivial. Unfortunately, reading the documentation did not help me much.
Optimization model class:
class OptimizeCorners(torch.nn.Module):
def __init__(self, real_corners):
super().__init__()
self._real_corners = torch.nn.Parameter(real_corners)
def forward(self, real_image, synt_image, synt_corners, _threshold):
# Find homography
if visualize_warp_interpolate:
real_image_before_processing = real_image
synt_image_before_processing = synt_image
homography_matrix = kornia.geometry.homography.find_homography_dlt(synt_corners,
self._real_corners,
weights=None)
# Warp and resize synt image
synt_image = kornia.geometry.transform.warp_perspective(synt_image.float(),
homography_matrix,
dsize=(int(real_image.shape[2]),
int(real_image.shape[3])),
mode='bilinear',
padding_mode='zeros',
align_corners=True,
fill_value=torch.zeros(3))
# Interpolate images
real_image = torch.nn.functional.interpolate(real_image.float(),
scale_factor=5,
mode='bicubic',
align_corners=None,
recompute_scale_factor=None,
antialias=False)
synt_image = torch.nn.functional.interpolate(synt_image.float(),
scale_factor=5,
mode='bicubic',
align_corners=None,
recompute_scale_factor=None,
antialias=False)
# Calculate loss
loss_map = torch.sub(real_image, synt_image, alpha=1)
# if element > _threshold: element = 0
loss_map = torch.nn.Threshold(_threshold, 0)(loss_map)
cumulative_loss = torch.sqrt(torch.sum(torch.pow(loss_map, 2)) /
(loss_map.size(dim=2) * loss_map.size(dim=3)))
return torch.autograd.Variable(cumulative_loss.data, requires_grad=True)
Here is how I am trying to run the optimization:
# Convert corresponding images to PyTorch tensors
_image = kornia.utils.image_to_tensor(_image, keepdim=False)
_synt_image = kornia.utils.image_to_tensor(_synt_image, keepdim=False)
_corners = torch.from_numpy(_corners)
_synt_corners = torch.from_numpy(_synt_corners)
# Optimizer L-BFGS
n_iters = 100
h_lbfgs = []
lr = 1
optimize_corners = OptimizeCorners(_corners)
optimizer = torch.optim.LBFGS(optimize_corners.parameters(),
lr=lr)
for it in tqdm(range(n_iters), desc='Fitting corners',
leave=False, position=1):
loss = optimize_corners(_image, _synt_image, _synt_corners, _threshold)
optimizer.zero_grad()
loss.backward()
optimizer.step(lambda: optimize_corners(_image, _synt_image, _synt_corners, _threshold))
h_lbfgs.append(loss.item())
print(h_lbfgs)
Output from console:
[screenshot of console output]
So, as you can see, parameters to be optimized do not change.
UPD:
I changed return torch.autograd.Variable(cumulative_loss.data, requires_grad=True) to return cumulative_loss.requires_grad_(), and it actually works, but now I get this error after a few iterations:
[screenshot of console output showing the error]
UPD: this happens because the parameters being optimized turn into NaN after a few iterations.
After some time spent hugging the debugger, I found out that the main problem is that after a few iterations the backward() method starts to calculate the gradient incorrectly and outputs NaNs. Thus, the parameters being optimized are also calculated as NaNs. I didn't have a chance to find out exactly why this happens, because all the traces (I used torch.autograd.set_detect_anomaly(True)) pointed to the error occurring on the side of the C++ Torch engine, in the POW and SVD functions.
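For reference, a quick way to see where things blow up is to inspect the gradients right after backward(); a small diagnostic sketch, assuming the module and tensors from the snippets above are in scope:
loss = optimize_corners(_image, _synt_image, _synt_corners, _threshold)
optimizer.zero_grad()
loss.backward()
for name, param in optimize_corners.named_parameters():
    # grad is None means the graph was detached; non-finite values mean NaN/Inf gradients
    finite = param.grad is not None and bool(torch.isfinite(param.grad).all())
    print(name, 'gradient finite:', finite)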
In the end, in my case, the problem was solved by casting all parameters from float32 to float64 and reducing the learning rate.
Here is the final updated code:
# Convert corresponding images to PyTorch tensors
_image = kornia.utils.image_to_tensor(_image, keepdim=False).double()
_synt_image = kornia.utils.image_to_tensor(_synt_image, keepdim=False).double()
_corners = torch.from_numpy(_corners).double()
_synt_corners = torch.from_numpy(_synt_corners).double()
# Optimizer L-BFGS
optimize_corners = OptimizeCorners(_corners)
optimizer = torch.optim.LBFGS(optimize_corners.parameters(),
max_iter=20,
lr=0.01)
torch.autograd.set_detect_anomaly(True)
def closure():
optimizer.zero_grad()
loss = optimize_corners(_image, _synt_image, _synt_corners, _threshold)
loss.backward()
return loss
for it in tqdm(range(100), desc="Fitting corners", leave=False, position=1):
optimizer.step(closure)
def forward(self, real_image, synt_image, synt_corners, _threshold):
# Find homography
if visualize_warp_interpolate:
real_image_before_processing = real_image
synt_image_before_processing = synt_image
homography_matrix = kornia.geometry.homography.find_homography_dlt(synt_corners,
self._real_corners,
weights=None)
# Warp and resize synt image
synt_image = kornia.geometry.transform.warp_perspective(synt_image,
homography_matrix,
dsize=(int(real_image.shape[2]),
int(real_image.shape[3])),
mode='bilinear',
padding_mode='zeros',
align_corners=True,
fill_value=torch.zeros(3))
# Interpolate images
real_image = torch.nn.functional.interpolate(real_image,
scale_factor=10,
mode='bicubic',
align_corners=None,
recompute_scale_factor=None,
antialias=False)
synt_image = torch.nn.functional.interpolate(synt_image,
scale_factor=10,
mode='bicubic',
align_corners=None,
recompute_scale_factor=None,
antialias=False)
# Calculate loss
loss_map = torch.sub(real_image, synt_image, alpha=1)
# if element > _threshold: element = 0
loss_map = torch.nn.Threshold(_threshold, 0)(loss_map)
cumulative_loss = torch.sqrt(torch.sum(torch.pow(loss_map, 2)) /
(loss_map.size(dim=2) * loss_map.size(dim=3)))
return cumulative_loss.requires_grad_()
I am a TensorFlow newbie. I have a model generated using convNetKerasLarge.py and saved as a tflite model.
I am trying to test this saved model as follows:
import tensorflow as tf
import numpy as np
import glob
from skimage.transform import resize
from skimage import io
# out of previously used training and test set
start = 4001
# no of images
row_count = 1
end = start + row_count
n_image_rows = 106
n_image_cols = 106
np_val_images = np.zeros(shape=(1, 1))
np_val_labels = np.zeros(shape=(1, 1))
def prepare_validation_set():
global np_val_images
global np_val_labels
positive_samples = glob.glob('datasets/drunk_resize_frontal_faces/pos/*')[start:end]
# negative_samples = glob.glob('datasets/drunk_resize_frontal_faces/neg/*')[start:end]
# negative_samples = random.sample(negative_samples, len(positive_samples))
val_images = []
val_labels = []
for i in range(len(positive_samples)):
val_images.append(resize(io.imread(positive_samples[i]), (n_image_rows, n_image_cols)))
val_labels.append(1)
# for i in range(len(negative_samples)):
# val_images.append(resize(io.imread(negative_samples[i]), (n_image_rows, n_image_cols)))
# val_labels.append(0)
np_val_images = np.array(val_images)
np_val_labels = np.array(val_labels)
def run_tflite_model(tflite_file, index):
prepare_validation_set()
# Initialize the interpreter
interpreter = tf.lite.Interpreter(model_path=str(tflite_file))
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
test_image = np_val_images[index]
test_image = np.expand_dims(test_image, axis=0).astype(input_details["dtype"])
interpreter.set_tensor(input_details["index"], test_image)
interpreter.invoke()
output = interpreter.get_tensor(output_details["index"])[0]
print(output_details)
prediction = output.argmax()
print(prediction)
if __name__ == '__main__':
test_image_index = 1
tflite_model_file = "models/converted/model.tflite"
run_tflite_model(tflite_model_file, 0)
If I run this, I get a prediction of 0 even though the label should be 1, since I am inputting a positive image (FYI: test loss 0.08881912380456924, test accuracy 0.9729166626930237 with 10 epochs). I am confident that there is a mistake in my code which causes this; please help me find it.
The script you linked normalizes the data before training by subtracting the mean (here 0.5) and dividing by the standard deviation (here 1):
mean = np.array([0.5,0.5,0.5])
std = np.array([1,1,1])
X_train = X_train.astype('float')
X_test = X_test.astype('float')
for i in range(3):
X_train[:,:,:,i] = (X_train[:,:,:,i]- mean[i]) / std[i]
X_test[:,:,:,i] = (X_test[:,:,:,i]- mean[i]) / std[i]
If you don't repeat the same operations before doing a prediction with the model, the input you are passing to the model will not have the same characteristics as the data you trained with.
You could fix it by subtracting the mean (0.5) from the image when preparing the data, i.e.:
np_val_images = np.array(val_images) - 0.5
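If you want to mirror the training preprocessing exactly (per-channel mean and std, taken from the training script quoted above; with a std of 1 the division is a no-op), a sketch would be:
mean = np.array([0.5, 0.5, 0.5])
std = np.array([1.0, 1.0, 1.0])

# broadcasts over the channel axis of (N, H, W, 3) images in [0, 1]
np_val_images = (np.array(val_images).astype('float') - mean) / std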
I was playing with tflite and observed on my multicore CPU that it is not heavily stressed during inference time. I eliminated the IO bottleneck by creating random input data with numpy beforehand (random matrices resembling images), but tflite still doesn't utilize the full potential of the CPU.
The documentation mentions the possibility of tweaking the number of threads used. However, I was not able to find out how to do that in the Python API. Since I have seen people use multiple interpreter instances for different models, I thought one could maybe use multiple instances of the same model and run them in different threads/processes. I wrote the following short script:
import numpy as np
import os, time
import tflite_runtime.interpreter as tflite
from multiprocessing import Pool
# global, but for each process the module is loaded, so only one global var per process
interpreter = None
input_details = None
output_details = None
def init_interpreter(model_path):
global interpreter
global input_details
global output_details
interpreter = tflite.Interpreter(model_path=model_path)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.allocate_tensors()
print('done init')
def do_inference(img_idx, img):
print('Processing image %d'%img_idx)
print('interpreter: %r' % (hex(id(interpreter)),))
print('input_details: %r' % (hex(id(input_details)),))
print('output_details: %r' % (hex(id(output_details)),))
tstart = time.time()
    img = np.stack([img]*3, axis=2) # replicates the layer three times for RGB
img = np.array([img]) # create batch dimension
interpreter.set_tensor(input_details[0]['index'], img )
interpreter.invoke()
logit= interpreter.get_tensor(output_details[0]['index'])
pred = np.argmax(logit, axis=1)[0]
logit = list(logit[0])
duration = time.time() - tstart
return logit, pred, duration
def main_par():
optimized_graph_def_file = r'./optimized_graph.lite'
# init model once to find out input dimensions
interpreter_main = tflite.Interpreter(model_path=optimized_graph_def_file)
input_details = interpreter_main.get_input_details()
    input_w, input_h = tuple(input_details[0]['shape'][1:3])
num_test_imgs=1000
# pregenerate random images with values in [0,1]
    test_imgs = np.random.rand(num_test_imgs, input_w, input_h).astype(input_details[0]['dtype'])
scores = []
predictions = []
it_times = []
tstart = time.time()
with Pool(processes=4, initializer=init_interpreter, initargs=(optimized_graph_def_file,)) as pool: # start 4 worker processes
results = pool.starmap(do_inference, enumerate(test_imgs))
scores, predictions, it_times = list(zip(*results))
duration =time.time() - tstart
print('Parent process time for %d images: %.2fs'%(num_test_imgs, duration))
print('Inference time for %d images: %.2fs'%(num_test_imgs, sum(it_times)))
print('mean time per image: %.3fs +- %.3f' % (np.mean(it_times), np.std(it_times)) )
if __name__ == '__main__':
# main_seq()
main_par()
However, the memory address of the interpreter instance printed via hex(id(interpreter)) is the same for every process, while the memory addresses of the input/output details differ. So I am wondering whether this way of doing it is potentially wrong, even though I did see a speedup. If so, how could one achieve parallel inference with TFLite and Python?
tflite_runtime version: 1.14.0 from here (the x86-64 Python 3.5 version)
python version: 3.5
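Regarding the identical id() values: each worker is a separate process with its own virtual address space, so equal addresses across processes do not imply a shared interpreter. A tiny standalone sketch (independent of TFLite) that makes this visible by printing the process id next to the address:
import os
from multiprocessing import Pool

def report(_):
    # each worker returns its pid and the virtual address of a fresh object
    return os.getpid(), hex(id(object()))

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        print(pool.map(report, range(4)))  # different pids; addresses may coincide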
I know that this thread was created two and a half years ago.
For me,
import multiprocessing
tf.lite.Interpreter(modelfile, num_threads=multiprocessing.cpu_count())
works very well.
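A minimal end-to-end sketch of this (assuming a TensorFlow or tflite_runtime build recent enough to expose num_threads, and a placeholder model file 'model.tflite'):
import multiprocessing
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='model.tflite',
                                  num_threads=multiprocessing.cpu_count())
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']))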
I did not set an initializer; instead, I load the model and do the inference in the same function to work around this issue.
with Pool(processes=multiprocessing.cpu_count()) as pool:
results = pool.starmap(inference, enumerate(test_imgs))
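The inference function itself is not shown; a sketch of what it might look like (hypothetical, reusing the imports and the module-level interpreter = None global from the script above; MODEL_PATH is an assumed constant):
MODEL_PATH = './optimized_graph.lite'

def inference(img_idx, img):
    global interpreter
    if interpreter is None:
        # lazily build one interpreter per worker process on first use
        interpreter = tflite.Interpreter(model_path=MODEL_PATH)
        interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    img = np.stack([img] * 3, axis=2)[np.newaxis, ...]  # HxW -> 1xHxWx3 batch
    interpreter.set_tensor(input_details[0]['index'],
                           img.astype(input_details[0]['dtype']))
    interpreter.invoke()
    logits = interpreter.get_tensor(output_details[0]['index'])
    return list(logits[0]), int(np.argmax(logits, axis=1)[0])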
I'm having trouble with my first neural network. I simply cannot find the source of the error.
Problem
Reading the book "Make Your Own Neural Network" by Tariq Rashid, I tried to implement handwriting recognition using a NN that classifies images and determines which digit from 0 to 9 is written down.
After training the NN, the tests show a ~99% match for every digit, which is obviously wrong.
Suspicions
In the book the author approaches the NN matrices a bit differently than I do. For example, he multiplies the input-to-hidden weights by the input, whereas I do it the other way around, multiplying the input by the input-to-hidden weights.
Here is an illustration of the way I do matrix multiplication while querying the NN (feedforward):
I'm aware that the matrix dot product is not commutative, but I don't see where I would have made an error there.
Should I take a different approach, i.e. transpose all matrices and multiply them in a different order?
Is there a de facto standard for the dimensions of the input and output matrices, i.e. should they be shaped 1×n or n×1?
If this is the wrong approach, then it has certainly manifested itself in the backpropagation with gradient descent used for training.
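For what it's worth, the two conventions give the same numbers up to a transpose, which a tiny NumPy check confirms (arbitrary shapes for illustration):
import numpy as np

x = np.random.rand(1, 784)     # input as a 1 x n row vector
W = np.random.rand(784, 100)   # input-to-hidden weights

row_version = np.dot(x, W)       # my convention: shape (1, 100)
col_version = np.dot(W.T, x.T)   # the book's convention: shape (100, 1)
print(np.allclose(row_version, col_version.T))  # True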
Source code
import numpy as np
import matplotlib.pyplot
from matplotlib.pyplot import imshow
import scipy.special as scipy
from PIL import Image
class NeuralNetwork(object):
def __init__(self):
self.input_neuron_count = 28*28 # One for each pixel, 28*28 = 784 in total.
        self.hidden_neuron_count = 100 # Arbitrary.
self.output_neuron_count = 10 # One for each digit from 0 to 9.
        self.learning_rate = 0.1 # Arbitrary.
# Sampling the weights from a normal probability distribution
# centered around zero and with standard deviation
# that is related to the number of incoming links into a node,
# 1/√(number of incoming links).
generate_random_weight_matrix = lambda input_neuron_count, output_neuron_count: (
np.random.normal(0.0, pow(input_neuron_count, -0.5), (input_neuron_count, output_neuron_count))
)
self.input_x_hidden_weights = generate_random_weight_matrix(self.input_neuron_count, self.hidden_neuron_count)
self.hidden_x_output_weights = generate_random_weight_matrix(self.hidden_neuron_count, self.output_neuron_count)
self.activation_function = lambda value: scipy.expit(value) # Sigmoid function
def train(self, input_array, target_array):
inputs = np.array(input_array, ndmin=2)
targets = np.array(target_array, ndmin=2)
hidden_layer_input = np.dot(inputs, self.input_x_hidden_weights)
hidden_layer_output = self.activation_function(hidden_layer_input)
output_layer_input = np.dot(hidden_layer_output, self.hidden_x_output_weights)
output_layer_output = self.activation_function(output_layer_input)
output_errors = targets - output_layer_output
self.hidden_x_output_weights += self.learning_rate * np.dot(hidden_layer_output.T, (output_errors * output_layer_output * (1 - output_layer_output)))
hidden_errors = np.dot(output_errors, self.hidden_x_output_weights.T)
self.input_x_hidden_weights += self.learning_rate * np.dot(inputs.T, (hidden_errors * hidden_layer_output * (1 - hidden_layer_output)))
def query(self, input_array):
inputs = np.array(input_array, ndmin=2)
hidden_layer_input = np.dot(inputs, self.input_x_hidden_weights)
hidden_layer_output = self.activation_function(hidden_layer_input)
output_layer_input = np.dot(hidden_layer_output, self.hidden_x_output_weights)
output_layer_output = self.activation_function(output_layer_input)
return output_layer_output
Replication (Training and testing)
The original source of the training and testing data is the MNIST Database. I used the CSV version, which I downloaded from the book author's web page, The MNIST Dataset of Handwritten Digits.
Here is the code I have used for training and testing so far:
def prepare_data(handwritten_digit_array):
return ((handwritten_digit_array / 255.0 * 0.99) + 0.0001).flatten()
def create_target(digit_target):
target = np.zeros(10) + 0.01
target[digit_target] = target[digit_target] + 0.98
return target
# Training
neural_network = NeuralNetwork()
training_data_file = open('mnist_train.csv', 'r')
training_data = training_data_file.readlines()
training_data_file.close()
for data in training_data:
handwritten_digit_raw = data.split(',')
handwritten_digit_array = np.asfarray(handwritten_digit_raw[1:]).reshape((28, 28))
handwritten_digit_target = int(handwritten_digit_raw[0])
neural_network.train(prepare_data(handwritten_digit_array), create_target(handwritten_digit_target))
# Testing
test_data_file = open('mnist_test_10.csv', 'r')
test_data = test_data_file.readlines()
test_data_file.close()
for data in test_data:
handwritten_digit_raw = data.split(',')
handwritten_digit_array = np.asfarray(handwritten_digit_raw[1:]).reshape((28, 28))
handwritten_digit_target = int(handwritten_digit_raw[0])
output = neural_network.query(handwritten_digit_array.flatten())
print('target', handwritten_digit_target)
print('output', output)
This is one of those facepalm moments. The neural network has been working as expected all along. The truth is that I had overlooked the test results and misread numbers written in scientific notation.
Measured on the 10,000 test samples from the MNIST Database, this NN has an accuracy of 94.01%.
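One way to avoid misreading such outputs in the future is to print them without scientific notation, for example (illustrative values):
import numpy as np

np.set_printoptions(suppress=True, precision=4)
print(np.array([9.9173e-01, 4.2511e-04, 1.3061e-05]))
# shows plain decimals (0.9917, 0.0004, 0.0000) instead of scientific notation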
I have trained a model with images and now would like to extract the fc6 features to .npy files.
I'm using caffe.set_mode_gpu() to run the caffe.Classifier and extract the features.
Instead of extracting and saving the features per frame, I save all the features of a folder to a temp variable and write the result for the complete video to a single .npy file (reducing the number of write operations to disk).
I have also heard that I could use caffe.Net and then pass a batch of images, but I'm not sure what preprocessing has to be done and whether this is faster.
import os
import shutil
import sys
import glob
from multiprocessing import Pool
import numpy as np
import os, sys, getopt
import time
def keep_fldrs(path,listr):
ll =list()
for x in listr:
if os.path.isdir(path+x):
ll.append(x)
return ll
def keep_img(path,listr):
ll = list()
for x in listr:
if os.path.isfile(path+str(x)) & str(x).endswith('.jpg'):
ll.append(x)
return ll
def ifdir(path):
if not os.path.isdir(path):
os.makedirs(path)
# Main path to your caffe installation
caffe_root = '/home/anilil/projects/lstm/lisa-caffe-public/python'
# Model prototxt file
model_prototxt = '/home/anilil/projects/caffe2tensorflow/deploy_singleFrame.prototxt'
# Model caffemodel file
model_trained = '/home/anilil/projects/caffe2tensorflow/snapshots_singleFrame_flow_v2_iter_55000.caffemodel'
sys.path.insert(0, caffe_root)
import caffe
caffe.set_mode_gpu()
net = caffe.Classifier(model_prototxt, model_trained,
mean=np.array([128, 128, 128]),
channel_swap=(2,1,0),
raw_scale=255,
image_dims=(255, 255))
Root='/media/anilil/Data/Datasets/UCf_scales/ori_mv_vis/Ori_MV/'
Out_fldr='/media/anilil/Data/Datasets/UCf_scales/ori_mv_vis/feat_fc6/'
allcalsses=keep_fldrs(Root,os.listdir(Root))
for classin in allcalsses:
temp_class=Root+classin+'/'
temp_out_class=Out_fldr+classin+'/'
ifdir(temp_out_class)
allvids_folders=keep_fldrs(temp_class,os.listdir(temp_class))
for each_vid_fldr in allvids_folders:
temp_pres_dir=temp_class+each_vid_fldr+'/'
temp_out_pres_dir=temp_out_class+each_vid_fldr+'/'
ifdir(temp_out_pres_dir)
all_images=keep_img(temp_pres_dir,os.listdir(temp_pres_dir))
frameno=0
if os.path.isfile(temp_out_pres_dir+'video.npy'):
continue
start = time.time()
temp_npy= np.ndarray((len(all_images),4096),dtype=np.float32)
for each_image in all_images:
input_image = caffe.io.load_image(temp_pres_dir+each_image)
prediction = net.predict([input_image],oversample=False)
temp_npy[frameno,:]=net.blobs['fc6'].data[0]
frameno=frameno+1
np.save(temp_out_pres_dir+'video.npy',temp_npy)
end = time.time()
print "lenght of imgs {} and time taken is {}".format(len(all_images),(end - start))
print ('Class {} done'.format(classin))
Output
length of imgs 426 and time taken is 388.539139032
length of imgs 203 and time taken is 185.467905998
Time needed per image: around 0.9 seconds.
I found the best answer here in this post.
Until now I had used
net = caffe.Classifier(model_prototxt, model_trained,
mean=np.array([128, 128, 128]),
channel_swap=(2,1,0),
raw_scale=255,
image_dims=(255, 255))
to initialize a model and get the output per image.
But this method is really slow and requires around 0.9 seconds per image.
The best idea is to pass a batch of images (maybe 100, 200, or 250), depending on how much memory you have on your GPU.
For this I set caffe.set_mode_gpu(), as I have a GPU and it's faster when you send large batches.
Initialize the network with your trained model:
net=caffe.Net(model_prototxt,model_trained,caffe.TEST)
Create a Transformer and make sure to set the mean and other values depending on how you trained your model:
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1)) # height*width*channel -> channel*height*width
mean_file = np.array([128, 128, 128])
transformer.set_mean('data', mean_file) #### subtract mean ####
transformer.set_raw_scale('data', 255) # pixel value range
transformer.set_channel_swap('data', (2,1,0)) # RGB -> BGR
data_blob_shape = net.blobs['data'].data.shape
data_blob_shape = list(data_blob_shape)
Read a group of images and convert to the network input.
net.blobs['data'].reshape(len(all_images), data_blob_shape[1], data_blob_shape[2], data_blob_shape[3])
images = [temp_pres_dir+str(x) for x in all_images]
net.blobs['data'].data[...] = map(lambda x:
transformer.preprocess('data',caffe.io.load_image(x)), images)
Pass the batch of images through the network:
out = net.forward()
You can use this output as you wish.
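For instance, to pull the fc6 features for the whole batch, assuming the layer is named 'fc6' in your prototxt as in the per-frame code above:
fc6_features = net.blobs['fc6'].data.copy()  # shape: (len(all_images), 4096)
np.save(temp_out_pres_dir + 'video.npy', fc6_features)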
Speed for each image is now 20 msec