I am trying to run a TFLite model on Android for object detection. To that end, I have successfully trained the model on my own set of images as follows:
(a) Training:
!python3 object_detection/model_main.py \
--pipeline_config_path=/content/drive/My\ Drive/Detecto\ Tutorial/models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config \
--model_dir=training/
(after modifying the config file to point to my specific TFRecords)
(b) Export inference graph
!python /content/drive/'My Drive'/'Detecto Tutorial'/models/research/object_detection/export_inference_graph.py \
--input_type=image_tensor \
--pipeline_config_path=/content/drive/My\ Drive/Detecto\ Tutorial/models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config \
--output_directory={output_directory} \
--trained_checkpoint_prefix={last_model_path}
(c) Create tflite ready graph
!python /content/drive/'My Drive'/'Detecto Tutorial'/models/research/object_detection/export_tflite_ssd_graph.py \
--pipeline_config_path=/content/drive/My\ Drive/Detecto\ Tutorial/models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config \
--output_directory={output_directory} \
--trained_checkpoint_prefix={last_model_path} \
--add_postprocessing_op=true
I then created a tflite model from the graph file using tflite_convert as follows:
!tflite_convert \
--graph_def_file=/content/drive/My\ Drive/Detecto\ Tutorial/models/research/fine_tuned_model/tflite_graph.pb \
--output_file=/content/drive/My\ Drive/Detecto\ Tutorial/models/research/fine_tuned_model/detect3.tflite \
--output_format=TFLITE \
--input_shapes=1,300,300,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--inference_type=FLOAT \
--allow_custom_ops
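For reference, the same conversion can also be written with the TF1 converter Python API; the sketch below is an equivalent call under that assumption, with the Drive paths shortened to placeholders:

import tensorflow as tf

# Sketch: Python equivalent of the tflite_convert command above (placeholder paths).
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="fine_tuned_model/tflite_graph.pb",
    input_arrays=["normalized_input_image_tensor"],
    output_arrays=[
        "TFLite_Detection_PostProcess",
        "TFLite_Detection_PostProcess:1",
        "TFLite_Detection_PostProcess:2",
        "TFLite_Detection_PostProcess:3",
    ],
    input_shapes={"normalized_input_image_tensor": [1, 300, 300, 3]},
)
converter.allow_custom_ops = True  # TFLite_Detection_PostProcess is a custom op
with open("fine_tuned_model/detect3.tflite", "wb") as f:
    f.write(converter.convert())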
The above tflite model is validated independently and works fine (outside of Android).
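As a minimal sketch of such a standalone check with the TFLite Python interpreter (placeholder paths, random input instead of a real image), which also shows the four output tensors that matter later:

import numpy as np
import tensorflow as tf

# Sketch: load the converted model and run one dummy inference.
interpreter = tf.lite.Interpreter(model_path="fine_tuned_model/detect3.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(len(output_details))  # 4 outputs: boxes, classes, scores, number of detections

dummy_image = np.random.uniform(-1.0, 1.0, size=(1, 300, 300, 3)).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy_image)
interpreter.invoke()

for detail in output_details:
    print(detail["name"], interpreter.get_tensor(detail["index"]).shape)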
I now need to populate the tflite model with metadata so that it can be used by the sample Android code provided at the link below; otherwise I get an error when running it in Android Studio: not a valid Zip file and does not have associated files.
https://github.com/tensorflow/examples/blob/master/lite/examples/object_detection/android/README.md
The sample .tflite model provided at the same link is populated with metadata and works fine.
When I try to use the following link:
https://www.tensorflow.org/lite/convert/metadata#deep_dive_into_the_image_classification_example
populator = _metadata.MetadataPopulator.with_model_file('/content/drive/My Drive/Detecto Tutorial/models/research/fine_tuned_model/detect3.tflite')
populator.load_metadata_buffer(metadata_buf)
populator.load_associated_files(['/content/drive/My Drive/Detecto Tutorial/models/research/fine_tuned_model/labelmap.txt'])
populator.populate()
to add metadata (the rest of the code is practically the same, with the description changed to object detection instead of image classification and the location of labelmap.txt specified), it gives the following error:
<ipython-input-6-173fc798ea6e> in <module>()
1 populator = _metadata.MetadataPopulator.with_model_file('/content/drive/My Drive/Detecto Tutorial/models/research/fine_tuned_model/detect3.tflite')
----> 2 populator.load_metadata_buffer(metadata_buf)
3 populator.load_associated_files(['/content/drive/My Drive/Detecto Tutorial/models/research/fine_tuned_model/labelmap.txt'])
4 populator.populate()
1 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_lite_support/metadata/metadata.py in _validate_metadata(self, metadata_buf)
540 "The number of output tensors ({0}) should match the number of "
541 "output tensor metadata ({1})".format(num_output_tensors,
--> 542 num_output_meta))
543
544
ValueError: The number of output tensors (4) should match the number of output tensor metadata (1)
The 4 output tensors are the ones listed in output_arrays in the tflite_convert step above (someone may correct me there). I am not sure how to update the output tensor metadata accordingly.
Can anyone who has recently used object detection with a custom model (and then deployed it on Android) help? Or help me understand how to update the output tensor metadata to cover 4 tensors instead of 1.
Update on Jun 10, 2021:
See the latest tutorial about Metadata Writer Library on tensorflow.org.
Update:
The Metadata Writer library has been released. It currently supports image classifiers and object detectors, and more supported tasks are on the way.
Here is an example to write metadata for an object detector model:
Install the TFLite Support nightly PyPI package:
pip install tflite_support_nightly
Write metadata to the model using the following script:
from tflite_support.metadata_writers import object_detector
from tflite_support.metadata_writers import writer_utils
from tflite_support import metadata
ObjectDetectorWriter = object_detector.MetadataWriter
_MODEL_PATH = "ssd_mobilenet_v1_1_default_1.tflite"
_LABEL_FILE = "labelmap.txt"
_SAVE_TO_PATH = "ssd_mobilenet_v1_1_default_1_metadata.tflite"
writer = ObjectDetectorWriter.create_for_inference(
writer_utils.load_file(_MODEL_PATH), [127.5], [127.5], [_LABEL_FILE])
writer_utils.save_file(writer.populate(), _SAVE_TO_PATH)
# Verify the populated metadata and associated files.
displayer = metadata.MetadataDisplayer.with_model_file(_SAVE_TO_PATH)
print("Metadata populated:")
print(displayer.get_metadata_json())
print("Associated file(s) populated:")
print(displayer.get_packed_associated_file_list())
---------- Previous answer that writes metadata manually --------
Here is a code snippet you can use to populate metadata for object detection models, which is compatible with the TFLite Android app.
import os

from tflite_support import flatbuffers
from tflite_support import metadata as _metadata
from tflite_support import metadata_schema_py_generated as _metadata_fb

model_meta = _metadata_fb.ModelMetadataT()
model_meta.name = "SSD_Detector"
model_meta.description = (
"Identify which of a known set of objects might be present and provide "
"information about their positions within the given image or a video "
"stream.")
# Creates input info.
input_meta = _metadata_fb.TensorMetadataT()
input_meta.name = "image"
input_meta.content = _metadata_fb.ContentT()
input_meta.content.contentProperties = _metadata_fb.ImagePropertiesT()
input_meta.content.contentProperties.colorSpace = (
_metadata_fb.ColorSpaceType.RGB)
input_meta.content.contentPropertiesType = (
_metadata_fb.ContentProperties.ImageProperties)
input_normalization = _metadata_fb.ProcessUnitT()
input_normalization.optionsType = (
_metadata_fb.ProcessUnitOptions.NormalizationOptions)
input_normalization.options = _metadata_fb.NormalizationOptionsT()
input_normalization.options.mean = [127.5]
input_normalization.options.std = [127.5]
input_meta.processUnits = [input_normalization]
input_stats = _metadata_fb.StatsT()
input_stats.max = [255]
input_stats.min = [0]
input_meta.stats = input_stats
# Creates outputs info.
output_location_meta = _metadata_fb.TensorMetadataT()
output_location_meta.name = "location"
output_location_meta.description = "The locations of the detected boxes."
output_location_meta.content = _metadata_fb.ContentT()
output_location_meta.content.contentPropertiesType = (
_metadata_fb.ContentProperties.BoundingBoxProperties)
output_location_meta.content.contentProperties = (
_metadata_fb.BoundingBoxPropertiesT())
output_location_meta.content.contentProperties.index = [1, 0, 3, 2]
output_location_meta.content.contentProperties.type = (
_metadata_fb.BoundingBoxType.BOUNDARIES)
output_location_meta.content.contentProperties.coordinateType = (
_metadata_fb.CoordinateType.RATIO)
output_location_meta.content.range = _metadata_fb.ValueRangeT()
output_location_meta.content.range.min = 2
output_location_meta.content.range.max = 2
output_class_meta = _metadata_fb.TensorMetadataT()
output_class_meta.name = "category"
output_class_meta.description = "The categories of the detected boxes."
output_class_meta.content = _metadata_fb.ContentT()
output_class_meta.content.contentPropertiesType = (
_metadata_fb.ContentProperties.FeatureProperties)
output_class_meta.content.contentProperties = (
_metadata_fb.FeaturePropertiesT())
output_class_meta.content.range = _metadata_fb.ValueRangeT()
output_class_meta.content.range.min = 2
output_class_meta.content.range.max = 2
label_file = _metadata_fb.AssociatedFileT()
label_file.name = os.path.basename("label.txt")
label_file.description = "Label of objects that this model can recognize."
label_file.type = _metadata_fb.AssociatedFileType.TENSOR_VALUE_LABELS
output_class_meta.associatedFiles = [label_file]
output_score_meta = _metadata_fb.TensorMetadataT()
output_score_meta.name = "score"
output_score_meta.description = "The scores of the detected boxes."
output_score_meta.content = _metadata_fb.ContentT()
output_score_meta.content.contentPropertiesType = (
_metadata_fb.ContentProperties.FeatureProperties)
output_score_meta.content.contentProperties = (
_metadata_fb.FeaturePropertiesT())
output_score_meta.content.range = _metadata_fb.ValueRangeT()
output_score_meta.content.range.min = 2
output_score_meta.content.range.max = 2
output_number_meta = _metadata_fb.TensorMetadataT()
output_number_meta.name = "number of detections"
output_number_meta.description = "The number of the detected boxes."
output_number_meta.content = _metadata_fb.ContentT()
output_number_meta.content.contentPropertiesType = (
_metadata_fb.ContentProperties.FeatureProperties)
output_number_meta.content.contentProperties = (
_metadata_fb.FeaturePropertiesT())
# Creates subgraph info.
group = _metadata_fb.TensorGroupT()
group.name = "detection result"
group.tensorNames = [
output_location_meta.name, output_class_meta.name,
output_score_meta.name
]
subgraph = _metadata_fb.SubGraphMetadataT()
subgraph.inputTensorMetadata = [input_meta]
subgraph.outputTensorMetadata = [
output_location_meta, output_class_meta, output_score_meta,
output_number_meta
]
subgraph.outputTensorGroups = [group]
model_meta.subgraphMetadata = [subgraph]
b = flatbuffers.Builder(0)
b.Finish(
model_meta.Pack(b),
_metadata.MetadataPopulator.METADATA_FILE_IDENTIFIER)
metadata_buf = b.Output()
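Once the buffer is built, it can be written into the model together with the label file using the same populator calls shown in the question (paths shortened to placeholders):

populator = _metadata.MetadataPopulator.with_model_file("detect3.tflite")
populator.load_metadata_buffer(metadata_buf)
populator.load_associated_files(["labelmap.txt"])
populator.populate()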
Related
I have installed tflite_runtime 2.5.0.post1 using the !pip install --extra-index-url https://google-coral.github.io/py-repo/ tflite_runtime command on Windows 11 and tried to run inference with my image captioning model.
Below is my code:
import numpy as np
from PIL import Image
import tflite_runtime.interpreter as tflite
from keras.preprocessing.sequence import pad_sequences  # assumed source for pad_sequences used below
max_len = 20
word_to_idx = np.load('weights/word_to_idx.npy', allow_pickle = True).item()
idx_to_word = np.load('weights/idx_to_word.npy', allow_pickle = True).item()
FEATURE_GENERATION_MODEL_TFLITE = 'feature_generation_model.tflite'
CAPTION_GENERATION_MODEL_TFLITE = 'caption_generation_model.tflite'
def predict_caption(path):
a = Image.open(path)
a = a.resize((300, 300))
a = np.asarray(a, dtype = 'float32')
imgp = a.reshape(1, 300, 300, 3)
# Model 1
# feature extraction model
feat_interpreter = tflite.Interpreter(model_path = FEATURE_GENERATION_MODEL_TFLITE)
feat_interpreter.allocate_tensors()
input_index = feat_interpreter.get_input_details()[0]['index']
output_index = feat_interpreter.get_output_details()[0]['index']
feat_interpreter.set_tensor(input_index, imgp)
feat_interpreter.invoke()
feature_vector = feat_interpreter.get_tensor(output_index)
feature_vector = feature_vector.reshape((1, 1536))
# We got feature vector using the feature extration tflite model
# Now generating caption using these features
in_text = 'startseq'
for i in range(max_len):
seq = [word_to_idx[w] for w in in_text.split() if w in word_to_idx]
seq = pad_sequences([seq], maxlen = max_len, padding = 'post')
# Model 2
# Caption Generation Model
cap_interpreter = tflite.Interpreter(model_path = CAPTION_GENERATION_MODEL_TFLITE)
cap_interpreter.allocate_tensors()
input_index1 = cap_interpreter.get_input_details()[0]['index']
input_index2 = cap_interpreter.get_input_details()[1]['index']
output_index = cap_interpreter.get_output_details()[0]['index']
cap_interpreter.set_tensor(input_index1, feature_vector)
cap_interpreter.set_tensor(input_index2, np.float32(seq))
cap_interpreter.invoke()
y_pred = cap_interpreter.get_tensor(output_index)
y_pred = y_pred.argmax()
word = idx_to_word[y_pred]
in_text += ' '+word
if word == 'endseq':
break
final_caption = in_text.split()[1:-1]
final_caption = ' '.join(final_caption)
return final_caption
But when I call the predict_caption('images/image.jpg') function, it gives me this error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_10744\846162487.py in <module>
----> 1 predict_caption('images/image.jpg')
~\AppData\Local\Temp\ipykernel_10744\3775461012.py in predict_caption(path)
91 cap_interpreter.set_tensor(input_index1, feature_vector)
92 cap_interpreter.set_tensor(input_index2, np.float32(seq))
---> 93 cap_interpreter.invoke()
94
95 y_pred = cap_interpreter.get_tensor(output_index)
~\anaconda3\lib\site-packages\tflite_runtime\interpreter.py in invoke(self)
831 """
832 self._ensure_safe()
--> 833 self._interpreter.Invoke()
834
835 def reset_all_variables(self):
RuntimeError: Regular TensorFlow ops are not supported by this interpreter. Make sure you apply/link the Flex delegate before inference.Node number 9 (FlexTensorListReserve) failed to prepare.
I have no idea why this is happening, can anyone help here?
When I use import tensorflow.lite as tflite, this code works fine, but I don't want to use tensorflow; I want to use tflite_runtime.
This is how I converted my tensorflow model into tflite models:
# Feature Model
FEATURE_GENERATION_MODEL_TFLITE = 'feature_generation_model.tflite'
tf_lite_converter = tf.lite.TFLiteConverter.from_keras_model(feature_generation_model)
feature_tflite_model = tf_lite_converter.convert()
open(FEATURE_GENERATION_MODEL_TFLITE, 'wb').write(feature_tflite_model)
# Captioning Model
CAPTION_GENERATION_MODEL_TFLITE = 'caption_generation_model.tflite'
tf_lite_converter = tf.lite.TFLiteConverter.from_keras_model(image_captioning_model)
tf_lite_converter.optimizations = [tf.lite.Optimize.DEFAULT]
tf_lite_converter.experimental_new_converter = True
tf_lite_converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
caption_tflite_model = tf_lite_converter.convert()
open(CAPTION_GENERATION_MODEL_TFLITE, 'wb').write(caption_tflite_model)
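Because the caption model is converted with SELECT_TF_OPS, it may contain Flex (TF select) ops such as the FlexTensorListReserve node from the error, and those need the Flex delegate, which the plain tflite_runtime wheel does not bundle. As a rough check (a sketch assuming TF 2.9+, where the analyzer API is available), the ops inside the converted file can be listed like this:

import tensorflow as tf

# Sketch: list the ops in the converted model; any "Flex..." entries need the Flex delegate.
tf.lite.experimental.Analyzer.analyze(model_path=CAPTION_GENERATION_MODEL_TFLITE)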
If anyone wants to reproduce this, here is a link to Google Drive with the code, model weights, and tflite models.
I noticed some very strange behaviour with the 3D ResNet from facebookresearch. Using their sample code from the website, I receive different results when putting the model on GPU. While on CPU the correct class (archery) is predicted, the model fails to predict it on GPU. Can anyone replicate this and confirm that this is indeed the case? Does anyone know why this is happening and how to prevent it? Below is some code to quickly test it out:
import torch
import json
import urllib
from pytorchvideo.data.encoded_video import EncodedVideo
from torchvision.transforms import Compose, Lambda
from torchvision.transforms._transforms_video import (
CenterCropVideo,
NormalizeVideo,
)
from pytorchvideo.transforms import (
ApplyTransformToKey,
ShortSideScale,
UniformTemporalSubsample
)
def predict_archery(model, device):
json_url = "https://dl.fbaipublicfiles.com/pyslowfast/dataset/class_names/kinetics_classnames.json"
json_filename = "kinetics_classnames.json"
try:
urllib.URLopener().retrieve(json_url, json_filename)
except:
urllib.request.urlretrieve(json_url, json_filename)
with open(json_filename, "r") as f:
kinetics_classnames = json.load(f)
# Create an id to label name mapping
kinetics_id_to_classname = {}
for k, v in kinetics_classnames.items():
kinetics_id_to_classname[v] = str(k).replace('"', "")
side_size = 256
mean = [0.45, 0.45, 0.45]
std = [0.225, 0.225, 0.225]
crop_size = 256
num_frames = 8
sampling_rate = 8
frames_per_second = 30
# Note that this transform is specific to the slow_R50 model.
transform = ApplyTransformToKey(
key="video",
transform=Compose(
[
UniformTemporalSubsample(num_frames),
Lambda(lambda x: x / 255.0),
NormalizeVideo(mean, std),
ShortSideScale(
size=side_size
),
CenterCropVideo(crop_size=(crop_size, crop_size))
]
),
)
# The duration of the input clip is also specific to the model.
clip_duration = (num_frames * sampling_rate) / frames_per_second
url_link = "https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4"
video_path = 'archery.mp4'
try:
urllib.URLopener().retrieve(url_link, video_path)
except:
urllib.request.urlretrieve(url_link, video_path)
# Select the duration of the clip to load by specifying the start and end duration
# The start_sec should correspond to where the action occurs in the video
start_sec = 0
end_sec = start_sec + clip_duration
# Initialize an EncodedVideo helper class and load the video
video = EncodedVideo.from_path(video_path)
# Load the desired clip
video_data = video.get_clip(start_sec=start_sec, end_sec=end_sec)
# Apply a transform to normalize the video input
video_data = transform(video_data)
# Move the inputs to the desired device
inputs = video_data["video"]
inputs = inputs.to(device)
# Pass the input clip through the model
preds = model(inputs[None, ...])
# Get the predicted classes
post_act = torch.nn.Softmax(dim=1)
preds = post_act(preds)
pred_classes = preds.topk(k=5).indices[0]
# Map the predicted classes to the label names
pred_class_names = [kinetics_id_to_classname[int(i)] for i in pred_classes]
print("Top 5 predicted labels: %s" % ", ".join(pred_class_names))
if __name__ == '__main__':
# Choose device
# device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device = torch.device("cpu")
# Choose the `slow_r50` model
model = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True).to(device)
model = model.eval()
predict_archery(model, device)
Results on cpu:
Top 5 predicted labels: archery, throwing axe, playing paintball,
stretching arm, riding or walking with horse
Results on GPU:
Top 5 predicted labels: flying kite, air drumming, beatboxing,
smoking, reading book
Edit:
Apparently, this issue cannot be reproduced on Google Colab. I therefore assume the issue is related to the specific hardware / CUDA version. I am using an NVIDIA TITAN Xp and CUDA version 11.4.
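One way to narrow this down (a sketch, not part of the original report) is to push the same random clip-shaped tensor through the model on CPU and on GPU and compare the raw outputs; if they already diverge strongly, the problem is in the numerics rather than in the video preprocessing:

import torch

# Sketch: compare raw slow_r50 outputs on CPU vs GPU for an identical input.
model = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True).eval()
x = torch.randn(1, 3, 8, 256, 256)  # (batch, channels, frames, height, width)

with torch.no_grad():
    preds_cpu = model(x)
    preds_gpu = model.to('cuda')(x.to('cuda')).cpu()

print("max abs diff:", (preds_cpu - preds_gpu).abs().max().item())
print("same top-1:", preds_cpu.argmax(dim=1).item() == preds_gpu.argmax(dim=1).item())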
I'm studying TTS, and during training the run stops with the message Segmentation fault (core dumped).
import os
# Trainer: Where the ✨️ happens.
# TrainingArgs: Defines the set of arguments of the Trainer.
from trainer import Trainer, TrainerArgs
# GlowTTSConfig: all model related values for training, validating and testing.
from TTS.tts.configs.tacotron2_config import Tacotron2Config
# BaseDatasetConfig: defines name, formatter and path of the dataset.
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.tacotron2 import Tacotron2
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor
from TTS.config.shared_configs import BaseAudioConfig
# we use the same path as this script as our training folder.
output_path = os.path.dirname(os.path.abspath(__file__))
# DEFINE DATASET CONFIG
# Set LJSpeech as our target dataset and define its path.
# You can also use a simple Dict to define the dataset and pass it to your custom formatter.
dataset_config = BaseDatasetConfig(
name="ljspeech", meta_file_train="metadata.csv", path=os.path.join(output_path, "../LJSpeech-1.1/")
)
# INITIALIZE THE TRAINING CONFIGURATION
# Configure the model. Every config class inherits the BaseTTSConfig.
audio_config = BaseAudioConfig(
sample_rate=22050,
do_trim_silence=True,
trim_db=60.0,
signal_norm=False,
mel_fmin=0.0,
mel_fmax=8000,
spec_gain=1.0,
log_func="np.log",
ref_level_db=20,
preemphasis=0.0,
)
config = Tacotron2Config( # This is the config that is saved for the future use
audio=audio_config,
batch_size=4,
eval_batch_size=4,
num_loader_workers=2,
num_eval_loader_workers=2,
run_eval=True,
test_delay_epochs=-1,
r=6,
gradual_training=[[0, 6, 4], [10000, 4, 4], [50000, 3, 4], [100000, 2, 4]],
double_decoder_consistency=True,
epochs=1000,
text_cleaner="phoneme_cleaners",
use_phonemes=True,
phoneme_language="en-us",
phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
print_step=10,
print_eval=True,
mixed_precision=False,
output_path=output_path,
datasets=[dataset_config],
)
# INITIALIZE THE AUDIO PROCESSOR
# Audio processor is used for feature extraction and audio I/O.
# It mainly serves to the dataloader and the training loggers.
ap = AudioProcessor.init_from_config(config)
# INITIALIZE THE TOKENIZER
# Tokenizer is used to convert text to sequences of token IDs.
# If characters are not defined in the config, default characters are passed to the config
tokenizer, config = TTSTokenizer.init_from_config(config)
# LOAD DATA SAMPLES
# Each sample is a list of ```[text, audio_file_path, speaker_name]```
# You can define your custom sample loader returning the list of samples.
# Or define your custom formatter and pass it to the `load_tts_samples`.
# Check `TTS.tts.datasets.load_tts_samples` for more details.
from TTS.tts.datasets import load_tts_samples
# custom formatter implementation
def formatter(root_path, manifest_file, **kwargs): # pylint: disable=unused-argument
"""Assumes each line as ```<filename>|<transcription>```
"""
txt_file = os.path.join(root_path, manifest_file)
items = []
speaker_name = "my_speaker"
with open(txt_file, "r", encoding="utf-8") as ttf:
for line in ttf:
cols = line.split("|")
#wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
wav_file = "/media/DATA-2/TTS/coqui/LJSpeech-1.1/wavs/" + cols[0] + ".wav"
text = cols[1]
items.append({"text":text, "audio_file":wav_file, "speaker_name":speaker_name})
return items
# load training samples
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True, formatter=formatter)
# INITIALIZE THE MODEL
# Models take a config object and a speaker manager as input
# Config defines the details of the model like the number of layers, the size of the embedding, etc.
# Speaker manager is used by multi-speaker models.
print("================== train_samples ========================")
print("len data : ", len(train_samples))
print(train_samples)
print("======================== eval_samples ================")
print("len data : ", len(eval_samples))
print(eval_samples)
model = Tacotron2(config, ap, tokenizer, speaker_manager=None)
# INITIALIZE THE TRAINER
# Trainer provides a generic API to train all the 🐸TTS models with all its perks like mixed-precision training,
# distributed training, etc.
trainer = Trainer(
TrainerArgs(), config, output_path, model=model, train_samples=train_samples, eval_samples=eval_samples
)
# AND... 3,2,1... 🚀
trainer.fit()
Training runs until step 647/2162 and then stops like this:
--> STEP: 647/2162 -- GLOBAL_STEP: 647
| > decoder_loss: 35.02273 (33.81891)
| > postnet_loss: 37.02569 (35.82565)
| > stopnet_loss: 0.82287 (0.85986)
| > decoder_coarse_loss: 35.01795 (33.80500)
| > decoder_ddc_loss: 0.00264 (0.00408)
| > ga_loss: 0.00451 (0.00664)
| > decoder_diff_spec_loss: 0.42732 (0.43585)
| > postnet_diff_spec_loss: 4.44786 (4.47058)
| > decoder_ssim_loss: 0.99999 (0.99978)
| > postnet_ssim_loss: 0.99983 (0.99947)
| > loss: 29.33145 (28.48289)
| > align_error: 0.98594 (0.97906)
| > grad_norm: 3.32803 (3.73880)
| > current_lr: 0.00000
| > step_time: 0.59430 (0.45785)
| > loader_time: 0.00150 (0.00150)
Segmentation fault (core dumped)
There was a "CUBLAS_STATUS_EXECUTION_FAILED" error before, so I checked the PyTorch version; I'm using 1.11.0.
I also reduced the batch size because it previously ran out of memory.
What should I do?
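A minimal debugging sketch for a crash like this (not from the original script): enable Python's faulthandler at the top of the training script and temporarily run the data loaders in-process, so that a fault inside a worker shows up as a traceback instead of a bare core dump:

import faulthandler
faulthandler.enable()  # dump Python tracebacks if the process receives a fatal signal

# And in the Tacotron2Config above, temporarily set:
#   num_loader_workers=0,
#   num_eval_loader_workers=0,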
I'm using the common YoloV3 implementation to do some inference, which works fine with the regular inputs and outputs.
modelWeightPath = r"./yolov3.weights"
modelPath = r"./yolov3.cfg"
network = cv2.dnn.readNetFromDarknet(modelPath,modelWeightPath)
Since we are using some edge devices that often cannot "convert" the last few layers, I am trying to use the original implementation to run inference only on those last few layers.
I know how the layers are named (network.getLayerNames()) and I know what the data from the previous layer looks like, since I saved it for testing (see input data -> inputScale1 from the conv_81 layer).
inputLayers = ['permute_82','permute_94','permute_106']
inputData = [cv2.UMat(inputScale1),cv2.UMat(inputScale2),cv2.UMat(inputScale3)]
Now I'm not sure how to use that knowledge for inference, since all my attempts only produce exceptions.
network.setInput(blob=inputData[0],name=inputLayers[0]) - throws
outs = network.forward(outputlayers[0])
This throws the following exception: OpenCV(4.0.1) C:\ci\opencv-suite_1573470242804\work\modules\dnn\src\dnn.cpp:2929: error: (-204:Requested object was not found) Requested blob "permute_82" not found in function 'cv::dnn::dnn4_v20181221::Net::setInput'
network.setInputsNames(inputLayers)
network.setInput(inputData[0],name=inputLayers[0])
network.setInput(inputData[1],name=inputLayers[1])
network.setInput(inputData[2],name=inputLayers[2])
outs = network.forward() -> throws
Will throw: cv2.error: OpenCV(4.0.1) C:\ci\opencv-suite_1573470242804\work\modules\dnn\src\dnn.cpp:686: error: (-215:Assertion failed) inputs.size() == requiredOutputs in function 'cv::dnn::dnn4_v20181221::DataLayer::getMemoryShapes'
EDIT:
But the thing is that this example works:
imgPath = r'./frame_93.png'
image = cv2.imread(imgPath);
blobInputimage = cv2.dnn.blobFromImage(image,1.0 / 255.0,(416,416),(0, 0, 0))
network.setInputsNames(['conv_0'])
network.setInput(blobInputimage,name='conv_0')
output = network.forward('conv_81')
But you still cannot forward only from the permute layer to the yolo layer.
Does someone know a solution?
So far I was able to get the same result as if I ran inference with the regular network. To do this, I created a "new" network per scale from the .cfg file by removing all entries except the yolo entry. As an example, yolov3_scale_1.cfg looks like:
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
width=255
height=13
channels=13
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1
[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
The code to verify the output looks like this:
def getOutputLayerNames(network):
layer_names = network.getLayerNames()
outputlayers=[layer_names[i[0] - 1] for i in network.getUnconnectedOutLayers()]
return outputlayers
# Load Network
modelWeightPath = r".\yolov3.weights"
modelPath_scale_1 = r".\yolov3_scale_1.cfg"
network_scale_1 = cv2.dnn.readNetFromDarknet(modelPath_scale_1,modelWeightPath)
modelPath_scale_2 = r".\yolov3_scale_2.cfg"
network_scale_2 = cv2.dnn.readNetFromDarknet(modelPath_scale_2,modelWeightPath)
modelPath_scale_3 = r".\yolov3_scale_3.cfg"
network_scale_3 = cv2.dnn.readNetFromDarknet(modelPath_scale_3,modelWeightPath)
networks = [network_scale_1,network_scale_2,network_scale_3]
outputLayers1 = getOutputLayerNames(network_scale_1)
outputLayers2 = getOutputLayerNames(network_scale_2)
outputLayers3 = getOutputLayerNames(network_scale_3)
## Read FileStorage - Network input
pathToFile = r'.\previousLayerOutput.yml'
s = cv2.FileStorage()
s.open(pathToFile, cv2.FileStorage_READ)
# Get outputs to verify behaviour
inputScale1 = s.getNode('conv_81').mat()
inputScale2 = s.getNode('conv_93').mat()
inputScale3 = s.getNode('conv_105').mat()
ouputOfYoloScale1 = s.getNode('yolo_82').mat()
ouputOfYoloScale2 = s.getNode('yolo_94').mat()
ouputOfYoloScale3 = s.getNode('yolo_106').mat()
correctOutputs = [ouputOfYoloScale1,ouputOfYoloScale2,ouputOfYoloScale3]
inputs = [inputScale1,inputScale2,inputScale3]
outputs = []
for network_with_different_scales,imageInputScaled in zip(networks,inputs):
network_with_different_scales.setInputsNames('permute_0')
network_with_different_scales.setInput(imageInputScaled)
outputs.append(network_with_different_scales.forward('yolo_0'))
strides = [32,16,8] # need to do it manually
for outputIdx,stride in zip(range(0,len(outputs)),strides):
outputs[outputIdx][:,3] = outputs[outputIdx][:,3]/stride
outputs[outputIdx][:,2] = outputs[outputIdx][:,2]/stride
for output, correctOutput in zip(outputs,correctOutputs):
print(np.array_equal(output,correctOutput))
The console output:
True
True
True
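The per-scale forwarding can also be wrapped in a small helper so the stride correction stays next to the forward call (a sketch based on the verification code above):

def run_yolo_tail(scale_network, conv_output, stride):
    """Run only the [yolo] layer of a per-scale network on a saved conv output."""
    scale_network.setInputsNames('permute_0')
    scale_network.setInput(conv_output)
    detections = scale_network.forward('yolo_0')
    # Apply the per-scale stride correction done manually above.
    detections[:, 2] = detections[:, 2] / stride
    detections[:, 3] = detections[:, 3] / stride
    return detections

# Example usage with the networks and inputs loaded above:
outputs = [run_yolo_tail(net, inp, s) for net, inp, s in zip(networks, inputs, [32, 16, 8])]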
In spaCy < 3.0 I was able to train the NER component within the trained en_core_web_sm model:
python -m spacy train en model training validation --base-model en_core_web_sm --pipeline "ner" -R -n 10
Specifically, I need the tagger and the parser of the en_core_web_sm model.
spaCy's new version doesn't accept these commands anymore; they need to be set in the config file. According to spaCy's website, these components can be added with the corresponding source and then listed under frozen_components in the training section of the config file (I will provide my full config at the end of this question):
[components]
[components.tagger]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]
[components.parser]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]
.
.
.
[training]
frozen_components = ["tagger","parser"]
When I'm debugging, the following error occurs:
ValueError: [E922] Component 'tagger' has been initialized with an output dimension of 49 - cannot add any more labels.
When I put the tagger into the disabled components in the nlp section of the config file, or if I delete everything related to the tagger, debugging and training work. However, when applying the trained model to a text loaded into a doc, only the trained NER works and none of the other components do, e.g. the parser predicts everything as ROOT.
I also tried to train the NER model on its own and then add it to the loaded en_core_web_sm model:
MODEL_PATH = 'data/model/model-best'
nlp = spacy.load(MODEL_PATH)
english_nlp = spacy.load("en_core_web_sm")
ner_labels = nlp.get_pipe("ner")
english_nlp.add_pipe('ner_labels')
This leads to the following error:
ValueError: [E002] Can't find factory for 'ner_labels' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `#Language.component` (for function components) or `#Language.factory` (for class components).
Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel, en.lemmatizer
Does anyone have a suggestion for how I can either train my NER with the en_core_web_sm model or integrate my trained component?
Here's the full config file:
[paths]
train = "training"
dev = "validation"
vectors = null
init_tok2vec = null
[system]
gpu_allocator = null
seed = 0
[nlp]
lang = "en"
pipeline = ["tok2vec","tagger","parser","ner"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"#tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.tagger]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]
[components.parser]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]
[components.ner]
factory = "ner"
moves = null
update_with_oracle_cut_size = 100
[components.ner.model]
#architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null
[components.ner.model.tok2vec]
#architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"
[components.tok2vec]
factory = "tok2vec"
[components.tok2vec.model]
#architectures = "spacy.Tok2Vec.v2"
[components.tok2vec.model.embed]
#architectures = "spacy.MultiHashEmbed.v1"
width = ${components.tok2vec.model.encode.width}
attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
rows = [5000,2500,2500,2500]
include_static_vectors = false
[components.tok2vec.model.encode]
#architectures = "spacy.MaxoutWindowEncoder.v2"
width = 256
depth = 8
window_size = 1
maxout_pieces = 3
[corpora]
[corpora.dev]
#readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[corpora.train]
#readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 2000
gold_preproc = false
limit = 0
augmenter = null
[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = ["tagger","parser"]
before_to_disk = null
[training.batcher]
#batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null
[training.batcher.size]
#schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0
[training.logger]
#loggers = "spacy.ConsoleLogger.v1"
progress_bar = false
[training.optimizer]
#optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001
[training.score_weights]
ents_per_type = null
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
[pretraining]
[initialize]
vectors = "en_core_web_lg"
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]
I provided a longer answer on spaCy's discussion forum here, but in a nutshell: if you want to source and freeze your parser/tagger, use this in the config:
[components.tagger]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]
[components.parser]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]
[components.tok2vec]
source = "en_core_web_sm"
i.e. make sure that the tagger & parser can connect to the correct tok2vec instance they were initially trained on.
You can then create an independent NER component either on top of the sourced (and pretrained) tok2vec, or create a new internal tok2vec component for the NER, or create a second tok2vec component with a distinct name that you refer to as the upstream argument of the NER's Tok2VecListener.
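With that config in place, a quick sanity check after training is to load the trained pipeline and confirm that the tagger, parser and NER all produce output (a sketch; the model path is a placeholder for wherever the training output went):

import spacy

nlp = spacy.load("output/model-best")  # placeholder path to the trained pipeline
print(nlp.pipe_names)  # expect something like ['tok2vec', 'tagger', 'parser', 'ner']

doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")
print([(t.text, t.tag_, t.dep_) for t in doc])       # tagger / parser output
print([(ent.text, ent.label_) for ent in doc.ents])  # NER output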