Caffe feature extraction is too slow? caffe.Classifier or caffe.Net - python

I have trained a model with images.
And now would like to extract the fc-6 features to .npy files.
I'm using caffe.set_mode_gpu()to run the caffe.Classifier and extract the features.
Instead of extracting and saving the feature per frame.
I save all the features of a folder to a temp variable and the result of the complete video to a npy file(decreasing the number of write operations to disk).
I have also heard that I could use the Caffe.Net and then pass a batch of images. But I'm not sure of what preprocessing has to be done and if this is faster ?
import os
import shutil
import sys
import glob
from multiprocessing import Pool
import numpy as np
import os, sys, getopt
import time
def keep_fldrs(path,listr):
ll =list()
for x in listr:
if os.path.isdir(path+x):
ll.append(x)
return ll
def keep_img(path,listr):
ll = list()
for x in listr:
if os.path.isfile(path+str(x)) & str(x).endswith('.jpg'):
ll.append(x)
return ll
def ifdir(path):
if not os.path.isdir(path):
os.makedirs(path)
# Main path to your caffe installation
caffe_root = '/home/anilil/projects/lstm/lisa-caffe-public/python'
# Model prototxt file
model_prototxt = '/home/anilil/projects/caffe2tensorflow/deploy_singleFrame.prototxt'
# Model caffemodel file
model_trained = '/home/anilil/projects/caffe2tensorflow/snapshots_singleFrame_flow_v2_iter_55000.caffemodel'
sys.path.insert(0, caffe_root)
import caffe
caffe.set_mode_gpu()
net = caffe.Classifier(model_prototxt, model_trained,
mean=np.array([128, 128, 128]),
channel_swap=(2,1,0),
raw_scale=255,
image_dims=(255, 255))
Root='/media/anilil/Data/Datasets/UCf_scales/ori_mv_vis/Ori_MV/'
Out_fldr='/media/anilil/Data/Datasets/UCf_scales/ori_mv_vis/feat_fc6/'
allcalsses=keep_fldrs(Root,os.listdir(Root))
for classin in allcalsses:
temp_class=Root+classin+'/'
temp_out_class=Out_fldr+classin+'/'
ifdir(temp_out_class)
allvids_folders=keep_fldrs(temp_class,os.listdir(temp_class))
for each_vid_fldr in allvids_folders:
temp_pres_dir=temp_class+each_vid_fldr+'/'
temp_out_pres_dir=temp_out_class+each_vid_fldr+'/'
ifdir(temp_out_pres_dir)
all_images=keep_img(temp_pres_dir,os.listdir(temp_pres_dir))
frameno=0
if os.path.isfile(temp_out_pres_dir+'video.npy'):
continue
start = time.time()
temp_npy= np.ndarray((len(all_images),4096),dtype=np.float32)
for each_image in all_images:
input_image = caffe.io.load_image(temp_pres_dir+each_image)
prediction = net.predict([input_image],oversample=False)
temp_npy[frameno,:]=net.blobs['fc6'].data[0]
frameno=frameno+1
np.save(temp_out_pres_dir+'video.npy',temp_npy)
end = time.time()
print "lenght of imgs {} and time taken is {}".format(len(all_images),(end - start))
print ('Class {} done'.format(classin))
Output
lenght of imgs 426 and time taken is 388.539139032
lenght of imgs 203 and time taken is 185.467905998
Time needed per image Around 0.9 Seconds now-

I found the best answer here in this post.
Till now I had used a
net = caffe.Classifier(model_prototxt, model_trained,
mean=np.array([128, 128, 128]),
channel_swap=(2,1,0),
raw_scale=255,
image_dims=(255, 255))
to initialize a model and get the output per image.
But this method is really slow and requires around .9 seconds per image.
The best Idea is to pass a batch of images(maybe 100,200,250) changing. Depending on how much memory you have on your GPU.
for this I set caffe.set_mode_gpu() as I have one and It's faster when you send large batches.
Initialize the model with ur trained model.
net=caffe.Net(model_prototxt,model_trained,caffe.TEST)
Create a Transformer and make sure to set mean and other values depending on how u trained your model.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1)) # height*width*channel -> channel*height*width
mean_file = np.array([128, 128, 128])
transformer.set_mean('data', mean_file) #### subtract mean ####
transformer.set_raw_scale('data', 255) # pixel value range
transformer.set_channel_swap('data', (2,1,0)) # RGB -> BGR
data_blob_shape = net.blobs['data'].data.shape
data_blob_shape = list(data_blob_shape)
Read a group of images and convert to the network input.
net.blobs['data'].reshape(len(all_images), data_blob_shape[1], data_blob_shape[2], data_blob_shape[3])
images = [temp_pres_dir+str(x) for x in all_images]
net.blobs['data'].data[...] = map(lambda x:
transformer.preprocess('data',caffe.io.load_image(x)), images)
Pass the batch of images through network.
out = net.forward()
You can use this output as you wish.
Speed for each image is now 20 msec

Related

tflite inference only predicts one label despite multiclass label training

I have trained a multiclass classifier for speech recognition using tensorflow. Then converted the model using tflite converter. The model can predict but it always outputs a single class. I suppose the problem is with the inference code because .h5 model can predict multiclass without any issue. I have been searching online for several days for some insight but I can't quite figure it out. Here is my code. Any suggestions would be really appreciated.
import sounddevice as sd
import numpy as np
import scipy.signal
import timeit
import python_speech_features
import tflite_runtime.interpreter as tflite
import importlib
# Parameters
debug_time = 0
debug_acc = 0
word_threshold = 0.95
rec_duration = 0.5 # 0.5
sample_length = 0.5
window_stride = 0.5 # 0.5
sample_rate = 8000 # The mic requires at least 44100 Hz to work
resample_rate = 8000
num_channels = 1
num_mfcc = 16
model_path = 'model.tflite'
mfccs_old = np.zeros((32, 25))
# Load model (interpreter)
interpreter = tflite.Interpreter(model_path)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details)
# Filter and downsample
def decimate(signal, old_fs, new_fs):
# Check to make sure we're downsampling
if new_fs > old_fs:
print("Error: target sample rate higher than original")
return signal, old_fs
# Downsampling is possible only by an integer factor
dec_factor = old_fs / new_fs
if not dec_factor.is_integer():
print("Error: can only downsample by integer factor")
# Do decimation
resampled_signal = scipy.signal.decimate(signal, int(dec_factor))
return resampled_signal, new_fs
# Callback that gets called every 0.5 seconds
def sd_callback(rec, frames, time, status):
# Start timing for debug purposes
start = timeit.default_timer()
# Notify errors
if status:
print('Error:', status)
global mfccs_old
# Compute MFCCs
mfccs = python_speech_features.base.mfcc(rec,
samplerate=resample_rate,
winlen=0.02,
winstep=0.02,
numcep=num_mfcc,
nfilt=26,
nfft=512, # 2048
preemph=0.0,
ceplifter=0,
appendEnergy=True,
winfunc=np.hanning)
delta = python_speech_features.base.delta(mfccs, 2)
mfccs_delta = np.append(mfccs, delta, axis=1)
mfccs_new = mfccs_delta.transpose()
mfccs = np.append(mfccs_old, mfccs_new, axis=1)
# mfccs = np.insert(mfccs, [0], 0, axis=1)
mfccs_old = mfccs_new
# Run inference and make predictions
in_tensor = np.float32(mfccs.reshape(1, mfccs.shape[0], mfccs.shape[1], 1))
interpreter.set_tensor(input_details[0]['index'], in_tensor)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
val = np.amax(output_data) # DEFINED FOR BINARY CLASSIFICATION, CHANGE TO MULTICLASS
ind = np.where(output_data == val)
prediction = ind[1].astype(int)
if val > word_threshold:
print('index:', ind[1])
print('accuracy', val, '/n')
print(int(prediction))
if debug_acc:
# print('accuracy:', val)
# print('index:', ind[1])
print('out tensor:', output_data)
if debug_time:
print(timeit.default_timer() - start)
# Start recording from microphone
with sd.InputStream(channels=num_channels,
samplerate=sample_rate,
blocksize=int(sample_rate * rec_duration),
callback=sd_callback):
while True:
pass
Since I figured out the issue, I am answering it myself in case others find it useful.
The issue is not having a "background noise" class in your dataset. Also make sure you have enough data for background noises. If you look at Google's teachable machine's audio project (https://teachablemachine.withgoogle.com/train/audio), a "background noise" class is already there, you cannot delete or disable the class.
I tested with both codes provided on tensorflow's github example (https://github.com/tensorflow/examples/blob/master/lite/examples/sound_classification/raspberry_pi/classify.py) and on tensorflow's website (https://www.tensorflow.org/tutorials/audio/simple_audio). They both work well for your prediction as long as you have enough background noise samples in your dataset considering the particular environment you are testing it in.
I made slight changes to the tensorflow's github code to output the category name and category confidence score.
# Loop until the user close the classification results plot.
while True:
# Wait until at least interval_between_inference seconds has passed since
# the last inference.
now = time.time()
diff = now - last_inference_time
if diff < interval_between_inference:
time.sleep(pause_time)
continue
last_inference_time = now
# Load the input audio and run classify.
tensor_audio.load_from_audio_record(audio_record)
result = classifier.classify(tensor_audio)
for category in result.classifications[0].categories:
print(category.category_name, category.score)
Hope it's helpful for people playing around with similar projects.

Google colab not loading image files while using tensorflow 2.0 batched dataset

A little bit of background, I am loading about 60,000 images to colab to train a GAN. I have already uploaded them to Drive and the directory structure contains folders for different classes (about 7-8) inside root. I am loading them to colab as follows:
root = "drive/My Drive/data/images"
root = pathlib.Path(root)
list_ds = tf.data.Dataset.list_files(str(root/'*/*'))
for f in list_ds.take(3):
print(f.numpy())
which gives the ouput:
b'drive/My Drive/data/images/folder_1/2994.jpg'
b'drive/My Drive/data/images/folder_1/6628.jpg'
b'drive/My Drive/data/images/folder_2/37872.jpg'
I am further processing them as follows:
def process_path(file_path):
label = tf.strings.split(file_path, '/')[-2]
image = tf.io.read_file(file_path)
image = tf.image.decode_jpeg(image)
image = tf.image.convert_image_dtype(image, tf.float32)
return image#, label
ds = list_ds.map(process_path)
BUFFER_SIZE = 60000
BATCH_SIZE = 128
train_dataset = ds.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
Each image is of size 128x128. Now coming to the problem when I try to view a batch in colab the execution goes on forever and never stops, for example, with this code:
for batch in train_dataset.take(4):
print([arr.numpy() for arr in batch])
Earlier I thought that batch_size might be a issue so tried changing it but still same problem. Can it be a problem due to colab as I am loading a large number of files?
Or due to the size of images as it was working when using MNIST(28x28)? If so, what are the possible solutions?
Thanks in advance.
EDIT:
After removing the shuffle statement, the last line gets executed within a few seconds. So I thought it could be a problem due to BUFFER_SIZE of shuffle, but even with a reduced BUFFER_SIZE, it is again taking a very long time to execute. Any workaround?
Here is how I load a 1.12GB zipped FLICKR image dataset from my personal Google Drive. First, I unzip the dataset in the colab environment. Some features that can speed up the performance is prefetch and autotune. Additionally, I use the local colab cache to store the processed images. This takes ~20 seconds to execute the first time (assuming you have unzipped the dataset). The cache then allows subsequent calls to load very fast.
Assuming you have authorized the google drive API, I start with unzipping the folder(s)
!unzip /content/drive/My\ Drive/Flickr8k
!unzip Flickr8k_Dataset
!ls
I then used your code with the addition of prefetch(), autotune, and cache file.
import pathlib
import tensorflow as tf
def prepare_for_training(ds, cache, BUFFER_SIZE, BATCH_SIZE):
if cache:
if isinstance(cache, str):
ds = ds.cache(cache)
else:
ds = ds.cache()
ds = ds.shuffle(buffer_size=BUFFER_SIZE)
ds = ds.batch(BATCH_SIZE)
ds = ds.prefetch(buffer_size=AUTOTUNE)
return ds
AUTOTUNE = tf.data.experimental.AUTOTUNE
root = "Flicker8k_Dataset"
root = pathlib.Path(root)
list_ds = tf.data.Dataset.list_files(str(root/'**'))
for f in list_ds.take(3):
print(f.numpy())
def process_path(file_path):
label = tf.strings.split(file_path, '/')[-2]
img = tf.io.read_file(file_path)
img = tf.image.decode_jpeg(img)
img = tf.image.convert_image_dtype(img, tf.float32)
# resize the image to the desired size.
img = tf.image.resize(img, [128, 128])
return img#, label
ds = list_ds.map(process_path, num_parallel_calls=AUTOTUNE)
train_dataset = prepare_for_training(ds, cache="./custom_ds.tfcache", BUFFER_SIZE=600000, BATCH_SIZE=128)
for batch in train_dataset.take(4):
print([arr.numpy() for arr in batch])
Here is a way to do it with keras flow_from_directory(). The benefit of this approach is that you avoid the tensorflow shuffle() which depending on the buffer size may require a processing of the whole dataset. Keras gives you an iterator which you can call to fetch the data batch and has the random shuffling built in.
import pathlib
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
root = "Flicker8k_Dataset"
BATCH_SIZE=128
train_datagen = ImageDataGenerator(
rescale=1./255 )
train_generator = train_datagen.flow_from_directory(
directory = root, # This is the source directory for training images
target_size=(128, 128), # All images will be resized
batch_size=BATCH_SIZE,
shuffle=True,
seed=42, #for the shuffle
classes=[''])
i = 4
for batch in range(i):
[print(x[0]) for x in next(train_generator)]

Tensorflow Pipeline for images and numpy files

I am working with tensorflow 2.0.0 and am trying to setup an efficient pipeline for feeding in ~90,000 png images of size (256, 256, 3) and their labels which are numpy arrays of size (256,256) for an image segmentation problem. These images and labels won't load fully into memory.
The data are stored in a directory like this:
'C:/Users/user/Documents/data/ims/' #png images
'C:/Users/user/Documents/data/masks/' #img labels/masks
The file names are the same save the extension so for example "test1.png" and "test1.npy" are an image/label pair.
The data are not split into training, validation, and test subsets yet.
I need to get to a point in which I have both the images and labels split into train, validation, and testing subsets, and also have a means to feed the data into a model for training.
I was following this guide here but could not figure out how to deal with the numpy files within the get_label function.
I thought I could write a function that splits the data into subsets via file names alone and then on the fly load the batches via the file names provided, but I can't figure out how to do this efficiently.
I'm currently doing this which either doesn't work because the files are too big or too slow because there are some many files to load into memory, either of which isn't a viable solution.
import tensorflow as tf
import numpy as np
import glob2 as glob
from imageio import imread
base = '/mnt/projects/CNN_Data/clean_data/'
image_path = sorted(glob.glob(base + 'ims/*.png'))
label_path = sorted(glob.glob(base + 'masks/*.npy'))
images = [imread(img).astype(np.float32)/255.0 for img in image_path]
labels = [np.load(path) for path in label_path]
Edit to add:
Here was my attempt following the tensorflow example that I linked above. It runs, but I can't get get_label to what I want.
import tensorflow as tf
import numpy as np
import os
AUTOTUNE = tf.data.experimental.AUTOTUNE
base = '/mnt/projects/CNN_Data/clean_data/'
list_ds = tf.data.Dataset.list_files(base + 'ims/*')
def get_label(file_path):
parts = tf.strings.split(file_path, os.path.sep)
parts[-2] == 'masks'
fname = tf.strings.split(parts[-1], '.')[0]
fname = tf.strings.join([fname, '.npy'])
parts[-1] == fname
return parts
def decode_img(img):
img = tf.image.decode_png(img, channels = 3)
img = tf.image.convert_image_dtype(img, tf.float32)
return img
def process_path(file_path):
label = get_label(file_path)
img = tf.io.read_file(file_path)
img = decode_img(img)
return img, label
labeled_ds = list_ds.map(process_path, num_parallel_calls=AUTOTUNE)

Why would this dataset implementation run out of memory?

I follow this instruction and write the following code to create a Dataset for images(COCO2014 training set)
from pathlib import Path
import tensorflow as tf
def image_dataset(filepath, image_size, batch_size, norm=True):
def preprocess_image(image):
image = tf.image.decode_jpeg(image, channels=3)
image = tf.image.resize(image, image_size)
if norm:
image /= 255.0 # normalize to [0,1] range
return image
def load_and_preprocess_image(path):
image = tf.read_file(path)
return preprocess_image(image)
all_image_paths = [str(f) for f in Path(filepath).glob('*')]
path_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds = ds.shuffle(buffer_size = len(all_image_paths))
ds = ds.repeat()
ds = ds.batch(batch_size)
ds = ds.prefetch(tf.data.experimental.AUTOTUNE)
return ds
ds = image_dataset(train2014_dir, (256, 256), 4, False)
image = ds.make_one_shot_iterator().get_next('images')
# image is then fed to the network
This code will always run out of both memory(32G) and GPU(11G) and kill the process. Here is the messages shown on terminal.
I also spot that the program get stuck at sess.run(opt_op). Where is wrong? How can I fix it?
The problem is this:
ds = ds.shuffle(buffer_size = len(all_image_paths))
The buffer that Dataset.shuffle() uses is an 'in memory' buffer so you are effectively trying to load the whole dataset in memory.
You have a couple of options (which you can combine) to fix this:
Option 1:
Reduce the buffer size to a much smaller number.
Option 2:
Move the shuffle() statment before the map() statement.
This means we would be shuffling before we load the images therefore we'd just be storing the filenames in the memory buffer for the shuffle rather than storing huge tensors.

Keras MobileNet example yields different answers on different computers

I have a very simple example with the Keras MobileNet implementation trying to classify a minivan. I run the same code on two different computers and get different results, not just slightly different but different enough that the classifications are not the same.
(note that Tensorflow=1.7.0 and Keras=2.1.5 on both computers)
Code below
import sys
import argparse
import numpy as np
from PIL import Image
import requests
from io import BytesIO
import time
try:
import matplotlib.pyplot as plt
HAS_MATPLOTLIB = True
except:
HAS_MATPLOTLIB = False
from keras.preprocessing import image
#from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from keras.applications.mobilenet import MobileNet, preprocess_input, decode_predictions
#model = ResNet50(weights='imagenet')
model = MobileNet()
target_size = (224, 224)
def predict(model, img, target_size, top_n=3):
"""Run model prediction on image
Args:
model: keras model
img: PIL format image
target_size: (w,h) tuple
top_n: # of top predictions to return
Returns:
list of predicted labels and their probabilities
"""
if img.size != target_size:
img = img.resize(target_size)
print "preprocessing input.."
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print "making predicition..."
preds = model.predict(x)
print "prediction made: %s" % preds
return decode_predictions(preds, top=top_n)[0]
if __name__=="__main__":
a = argparse.ArgumentParser()
a.add_argument("--image", help="path to image")
a.add_argument("--image_url", help="url to image")
args = a.parse_args()
if args.image is None and args.image_url is None:
a.print_help()
sys.exit(1)
if args.image is not None:
img = Image.open(args.image)
preds = predict(model, img, target_size)
if args.image_url is not None:
print "getting image from url"
response = requests.get(args.image_url)
print "image gotten from url"
img = Image.open(BytesIO(response.content))
print "predicting.."
before = time.time()
preds = predict(model, img, target_size)
print "total time to predict: %.2f" % (time.time() - before)
print preds
plot_preds(img, preds)
Now if I run this on my MacBook Pro
$ python classify_example_mobile.py --image_url http://i.imgur.com/cg37Ojo.jpg
[(u'n03770679', u'minivan', 0.39935172), (u'n02974003', u'car_wheel', 0.28071228), (u'n02814533', u'beach_wagon', 0.19400564)]
but if I then run it on another computer that I have
(venv) $ python classify_example_mobile.py --image_url http://i.imgur.com/cg37Ojo.jpg
[(u'n02974003', u'car_wheel', 0.39516035), (u'n02814533', u'beach_wagon', 0.27965376), (u'n03770679', u'minivan', 0.22706936)]
the predictions are reversed, it no longer picks minivan as the top result.
How could this be? I know that different architectures can have different floating-point math accuracy, but would that be enough to account for these results? I also know that models can vary depending on the way the weights are initialized during training, but this is a pre-trained model, so what gives?
edit - to be clear, the image is a picture of a minivan, so in this case one architecture gets it right and the other one gets it wrong - so this is a big deal for me. (http://i.imgur.com/cg37Ojo.jpg)
So I don't quite understand what is going on here, but the error appears to have gone away once I did some more preprocessing of the input, which makes me think that maybe I had different PIL versions of numpy versions or something.
I added these lines
img = img.convert("RGB")
and now the results between the two computers are identical

Categories

Resources