How to provide input to a detectron2 builtin model? - python

I trained a model, and now I would like to use it to detect objects in images. Using the DefaultPredictor only the bounding boxes are returned; I would also need the masks. I saw that you can also perform inference with this method:
model.eval()
with torch.no_grad():
    outputs = model(inputs)
I think that's what I should use. The problem is that I don't know how to set up the inputs, starting from the images.
import os
import glob
import cv2
import torch

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.modeling import build_model
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.structures import ImageList

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/"
                                               "mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class
cfg.INPUT.FORMAT = "BGR"
# Just run these lines if you have the trained model in memory
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # set the testing threshold for this model

# build the model
model = build_model(cfg)
DetectionCheckpointer(model).load("output/model_final.pth")
model.eval()  # make sure it's in eval mode

image = cv2.imread("/kaggle/working/detectron2/images/73-ab1.jpg")
height, width = image.shape[:2]
image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
image = ImageList.from_tensors([image])

with torch.no_grad():
    inputs = image
    outputs = model(inputs)
Unfortunately, I think I'm doing something wrong. Can someone enlighten me?

See the Model Input Format for the builtin models.
Basically, the model in your code is not expecting an ImageList object, but a list of dicts where each dict needs to provide specific information about one image, as explained in the documentation linked above.
So, your inference code needs to be corrected to the following.
image = cv2.imread("/kaggle/working/detectron2/images/73-ab1.jpg")
height, width = image.shape[:2]
image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))

inputs = [{"image": image, "height": height, "width": width}]
with torch.no_grad():
    outputs = model(inputs)
You can also see this in the code - the forward method of the GeneralizedRCNN class.
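Since the goal was to get the masks, here is a minimal sketch of reading them from outputs; it assumes the standard detectron2 builtin-model output format, in which each element of outputs is a dict with an "instances" field:

# outputs is a list with one dict per input image; the "instances" field
# holds that image's detections (detectron2's documented output format).
instances = outputs[0]["instances"].to("cpu")

masks = instances.pred_masks.numpy()          # (num_instances, H, W) boolean masks
boxes = instances.pred_boxes.tensor.numpy()   # (num_instances, 4) XYXY boxes
scores = instances.scores.numpy()             # (num_instances,) confidence scores

print(f"detected {len(instances)} instances")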

Related

Visualize output from Mask RCNN

I followed this tutorial: https://github.com/lih0905/PyTorch_Study/blob/master/8)%20TorchVision%200.3%20Object%20Detection%20finetuning%20tutorial.ipynb and as you can see, there are output images at the end of the page and they look pretty good.
Now I did the same, but my images are cloudy; they look like trash.
I trained Mask R-CNN and saved the models, and here I'm loading epoch-2.pt to see the results.
This is my code:
PATH = '/home/Nezz/Train/ArT/models_vjezba/epoch-2.pt'
#model = torch.load(PATH)
#model.to(device)
model.load_state_dict(torch.load(PATH))

# pick one image from the test set
img, _ = dataset_test[21]

# put the model in evaluation mode
model.eval()
#evaluate(model, data_loader_test, device=device)
with torch.no_grad():
    prediction = model([img.to(device)])
#print(prediction)

imaag = Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
imag = Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())
imag.show()
imaag.show()
output:
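One thing worth knowing here: in eval mode, torchvision's Mask R-CNN returns masks as soft probability maps in [0, 1], which can look grey and cloudy when displayed directly, especially from an early checkpoint such as epoch-2.pt. A minimal sketch of binarizing the mask before display, with 0.5 as an assumed threshold:

import torch
from PIL import Image

# prediction[0]['masks'] has shape (N, 1, H, W) with float probabilities;
# binarize before converting to an 8-bit image (0.5 is an assumed threshold).
mask_prob = prediction[0]['masks'][0, 0].cpu()
mask_bin = (mask_prob > 0.5).to(torch.uint8) * 255
Image.fromarray(mask_bin.numpy()).show()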

Predict Image class after One shot model Training

I am making an image search using a one-shot model because I have very little data per class.
I am following this tutorial.
I have already prepared the data pipeline and trained the model, but I don't understand the single-image prediction process, which we generally do with model.predict.
I tried the following code but I think I am missing something.
img1 = cv2.imread("./images_evaluation/test.jpg", cv2.IMREAD_GRAYSCALE)
img1 = cv2.resize(img1, (105, 105))
img1 = np.expand_dims(img1, axis=2)

(test_image_names, train_image_names) = generate_oneshot_validation_trials(dataset, 20)
train_images = get_images(train_image_names, IMAGE_SHAPE)

images = np.tile(img1, (len(train_images), 1, 1, 1))
preds = siamese_model1.predict([images, train_images])
pred_idx = np.argmax(preds, axis=0)[0]
pred_char_name = train_image_names[pred_idx].split('/')[-2]
print(pred_char_name)  # here I get a different prediction on every try -- what's the reason?
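As a hedged sketch only (it assumes generate_oneshot_validation_trials samples a random support set on each call, and it reuses the variable names from the question's code), building the support set once and reusing it makes a single-image prediction repeatable:

import numpy as np

# Assumption: generate_oneshot_validation_trials() draws a random support set
# on each call, which is why the prediction can change from run to run.
# Build the support set once and reuse it for every query image.
(_, train_image_names) = generate_oneshot_validation_trials(dataset, 20)
train_images = get_images(train_image_names, IMAGE_SHAPE)

def predict_one(query_img):
    # Pair the same query with every support image and pick the best match.
    queries = np.tile(query_img, (len(train_images), 1, 1, 1))
    scores = siamese_model1.predict([queries, train_images])
    best = int(np.argmax(scores, axis=0)[0])
    return train_image_names[best].split('/')[-2]

print(predict_one(img1))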

'TFLiteKerasModelConverterV2' object has no attribute 'predict'

I am trying to predict values by loading a saved version of my model.
Here is the code for it:
def classifier(img, weights_file):
    # Load the model
    model = tf.lite.TFLiteConverter.from_keras_model(weights_file)
    # Create the array of the right shape to feed into the keras model
    data = np.ndarray(shape=(1, 200, 200, 3), dtype=np.float32)
    image = img
    # image sizing
    size = (200, 200)
    image = ImageOps.fit(image, size, Image.ANTIALIAS)
    # turn the image into a numpy array
    image_array = np.asarray(image)
    # Normalize the image
    normalized_image_array = image_array.astype(np.float32) / 255
    # Load the image into the array
    data[0] = normalized_image_array
    # run the inference
    prediction_percentage = model.predict(data)
    prediction = prediction_percentage.round()
    return prediction, prediction_percentage
My model throws an error " 'TFLiteKerasModelConverterV2' object has no attribute 'predict'"
Can anyone please tell me what I can change here?
You are creating a TFLiteConverter object from your weights file. The correct way to load model weights is with load_weights (link). Try:
model.load_weights(weights_file)
However, you would first need to define the model the same way you did when training it. If you saved your model in the SavedModel format, use
model = tf.keras.models.load_model(weights_file)
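For completeness, here is a minimal sketch of the corrected classifier, assuming the model was saved with model.save() (the 200x200 input size follows the question's code):

import numpy as np
import tensorflow as tf
from PIL import Image, ImageOps

def classifier(img, saved_model_path):
    # Load the trained Keras model (assumes it was saved with model.save()).
    model = tf.keras.models.load_model(saved_model_path)

    # Resize and normalize the image to the shape the model expects.
    image = ImageOps.fit(img, (200, 200), Image.LANCZOS)  # LANCZOS replaces the deprecated ANTIALIAS
    data = np.asarray(image, dtype=np.float32)[None, ...] / 255.0

    # A loaded Keras model exposes predict(); the TFLite converter object does not.
    prediction_percentage = model.predict(data)
    prediction = prediction_percentage.round()
    return prediction, prediction_percentage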

Only Using Colours From A Specific Part of a Picture For Style Transfer

I've got a neural style transfer model. I'm currently trying to use different parts of an image to transfer onto different pictures. I'm wondering how I can get the model to use only the colours present in an image. Below is an example:
The picture above is the style image that I obtained by thresholding the original image. Now the transferred picture is below:
Obviously it has transferred some of the black parts of the image, but I only want the non-black colours to be transferred. Below is the code for my model:
import torch
import torch.nn as nn
import torch.optim as optim
from PIL import Image
import torchvision.transforms as transforms
import torchvision.models as models
from torchvision.utils import save_image


class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        self.chosen_features = ["0", "5", "10", "19", "28"]
        self.model = models.vgg19(pretrained=True).features[:29]

    def forward(self, x):
        # Store relevant features
        features = []
        for layer_num, layer in enumerate(self.model):
            x = layer(x)
            if str(layer_num) in self.chosen_features:
                features.append(x)
        return features


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def load_image(image_name):
    image = Image.open(image_name)
    image = loader(image).unsqueeze(0)
    return image.to(device)


imsize = 384

loader = transforms.Compose(
    [
        transforms.Resize((imsize, imsize)),
        transforms.ToTensor(),
        # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

original_img = load_image("Content Image.jpg")
style_img = load_image("Adaptive Image 2.jpg")

# initialized generated as white noise or clone of original image.
# Clone seemed to work better for me.
generated = original_img.clone().requires_grad_(True)
# generated = load_image("20epoctom.png")

model = VGG().to(device).eval()

# Hyperparameters
total_steps = 10000
learning_rate = 0.001
alpha = 1
beta = 0.01
optimizer = optim.Adam([generated], lr=learning_rate)

for step in range(total_steps):
    # Obtain the convolution features in specifically chosen layers
    generated_features = model(generated)
    original_img_features = model(original_img)
    style_features = model(style_img)

    # Loss is 0 initially
    style_loss = original_loss = 0

    # iterate through all the features for the chosen layers
    for gen_feature, orig_feature, style_feature in zip(
        generated_features, original_img_features, style_features
    ):
        # batch_size will just be 1
        batch_size, channel, height, width = gen_feature.shape
        original_loss += torch.mean((gen_feature - orig_feature) ** 2)

        # Compute Gram Matrix of generated
        G = gen_feature.view(channel, height * width).mm(
            gen_feature.view(channel, height * width).t()
        )
        # Compute Gram Matrix of Style
        A = style_feature.view(channel, height * width).mm(
            style_feature.view(channel, height * width).t()
        )
        style_loss += torch.mean((G - A) ** 2)

    total_loss = alpha * original_loss + beta * style_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    if step % 500 == 0:
        print(total_loss)
        save_image(generated, f"Generated Pictures/{step//500} Iterations Generated Picture.png")
Any ideas on where to go from here would also be appreciated!
If you want some ways to preserve non-black color in your style transfer model, I suggest checking out the GitHub repo here. It has .ipynb notebooks with entire training pipelines, model weights, a good readme, etc. to reference. According to their readme, they try to implement this paper on preserving color in neural artistic style transfer, which should help you. You can also reference other repos and run some of them on Colab via this Papers with Code page here, though I do suggest looking at the first repo first.
If you want to do color transfer outside your style transfer model, and instead have two images transfer color with the help of some library functions, then I recommend looking at this tutorial.
Sarthak Jain
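As a concrete illustration of the second suggestion, here is a minimal sketch of statistics-based color transfer between two images in LAB space. This is a generic Reinhard-style technique, not the method from the linked repositories, and the file names are placeholders taken from the question's code:

import cv2
import numpy as np

def transfer_color(source_path, target_path):
    # Match the per-channel mean/std of the target image to the source image in LAB space.
    source = cv2.cvtColor(cv2.imread(source_path), cv2.COLOR_BGR2LAB).astype(np.float32)
    target = cv2.cvtColor(cv2.imread(target_path), cv2.COLOR_BGR2LAB).astype(np.float32)

    src_mean, src_std = source.mean(axis=(0, 1)), source.std(axis=(0, 1))
    tgt_mean, tgt_std = target.mean(axis=(0, 1)), target.std(axis=(0, 1))

    # Shift/scale the target's channel statistics toward the source's.
    result = (target - tgt_mean) / (tgt_std + 1e-6) * src_std + src_mean
    result = np.clip(result, 0, 255).astype(np.uint8)
    return cv2.cvtColor(result, cv2.COLOR_LAB2BGR)

# Example usage with the question's image names (placeholders):
# cv2.imwrite("color_transferred.png", transfer_color("Adaptive Image 2.jpg", "Content Image.jpg"))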

Invalid character found in base64 while using a deployed model on cloudml

For better context, I have uploaded a pre-trained model on Cloud ML. It's an InceptionV3 model converted from Keras to an acceptable format for TensorFlow.
from keras.applications.inception_v3 import InceptionV3
model = InceptionV3(weights='imagenet')
from keras.models import Model
intermediate_layer_model = Model(inputs=model.input, outputs=model.layers[311].output)

with tf.Graph().as_default() as g_input:
    input_b64 = tf.placeholder(shape=(1,),
                               dtype=tf.string,
                               name='input')
    input_bytes = tf.decode_base64(input_b64[0])
    image = tf.image.decode_image(input_bytes)
    image_f = tf.image.convert_image_dtype(image, dtype=tf.float32)
    input_image = tf.expand_dims(image_f, 0)
    output = tf.identity(input_image, name='input_image')

g_input_def = g_input.as_graph_def()

K.set_learning_phase(0)
sess = K.get_session()

from tensorflow.python.framework import graph_util
g_trans = sess.graph
g_trans_def = graph_util.convert_variables_to_constants(sess,
    g_trans.as_graph_def(),
    [intermediate_layer_model.output.name.replace(':0', '')])

with tf.Graph().as_default() as g_combined:
    x = tf.placeholder(tf.string, name="input_b64")

    im, = tf.import_graph_def(g_input_def,
                              input_map={'input:0': x},
                              return_elements=["input_image:0"])

    pred, = tf.import_graph_def(g_trans_def,
                                input_map={intermediate_layer_model.input.name: im,
                                           'batch_normalization_1/keras_learning_phase:0': False},
                                return_elements=[intermediate_layer_model.output.name])

    with tf.Session() as sess2:
        inputs = {"inputs": tf.saved_model.utils.build_tensor_info(x)}
        outputs = {"outputs": tf.saved_model.utils.build_tensor_info(pred)}
        signature = tf.saved_model.signature_def_utils.build_signature_def(
            inputs=inputs,
            outputs=outputs,
            method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME
        )
        # save as SavedModel
        b = tf.saved_model.builder.SavedModelBuilder('inceptionv4/')
        b.add_meta_graph_and_variables(sess2,
                                       [tf.saved_model.tag_constants.SERVING],
                                       signature_def_map={'serving_default': signature})
        b.save()
The generated pb file works fine when I use it locally. But when I deploy it on cloud ml I get the following error.
RuntimeError: Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="Invalid character found in base64.
[[Node: import/DecodeBase64 = DecodeBase64[_output_shapes=[<unknown>], _device="/job:localhost/replica:0/task:0/device:CPU:0"](import/strided_slice)]]")
Following is the code I use for getting local predictions.
import base64
import json

with open('MEL_BE_0.jpg', 'rb') as image_file:
    encoded_string = str(base64.urlsafe_b64encode(image_file.read()), 'ascii')

import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    MetaGraphDef = tf.saved_model.loader.load(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        'inceptionv4')
    input_tensor = tf.get_default_graph().get_tensor_by_name('input_b64:0')
    print(input_tensor)
    avg_tensor = tf.get_default_graph().get_tensor_by_name('import_1/avg_pool/Mean:0')
    print(avg_tensor)
    predictions = sess.run(avg_tensor, {input_tensor: [encoded_string]})
And finally following is the code snippet that I use for wrapping the encoded string in the request that is sent to the cloud-ml engine.
request_body= json.dumps({"key":"0", "image_bytes": {"b64": [encoded_string]}})
It looks like you are trying to do the base64 decoding in TensorFlow and use the {"b64": ...} JSON format. You need to do one or the other; we typically recommend the latter.
As a side note, your input placeholder must have an outer dimension of None. That can make some things tricky, e.g., you'll either have to reshape the dimensions to be size 1 (which will prevent you from using the batch prediction service in its current state) or you'll have to use tf.map_fn to apply the same set of transformations to each element of the input "batch". You can find an example of that technique in this example.
Finally, I recommend the use of tf.saved_model.simple_save.
Putting it all together, here is some modified code. Note that I'm inlining your input function (as opposed to serializing it to a graph def and reimporting):
HEIGHT = 299
WIDTH = 299

# Get Keras Model
from keras.applications.inception_v3 import InceptionV3
model = InceptionV3(weights='imagenet')
from keras.models import Model
intermediate_layer_model = Model(inputs=model.input, outputs=model.layers[311].output)

K.set_learning_phase(0)
sess = K.get_session()

from tensorflow.python.framework import graph_util
g_trans = sess.graph
g_trans_def = graph_util.convert_variables_to_constants(sess,
    g_trans.as_graph_def(),
    [intermediate_layer_model.output.name.replace(':0', '')])

# Create inputs to model and export
with tf.Graph().as_default() as g_combined:

    def decode_and_resize(image_bytes):
        image = tf.image.decode_image(image_bytes)
        # Note resize expects a batch_size, but tf.map_fn suppresses that index,
        # thus we have to expand then squeeze. Resize returns float32 in the
        # range [0, uint8_max]
        image = tf.expand_dims(image, 0)
        image = tf.image.resize_bilinear(
            image, [HEIGHT, WIDTH], align_corners=False)
        image = tf.squeeze(image, squeeze_dims=[0])
        image = tf.cast(image, dtype=tf.uint8)
        return image

    input_bytes = tf.placeholder(shape=(None,),
                                 dtype=tf.string,
                                 name='input')

    images = tf.map_fn(
        decode_and_resize, input_bytes, back_prop=False, dtype=tf.uint8)
    images = tf.image.convert_image_dtype(images, dtype=tf.float32)

    pred, = tf.import_graph_def(g_trans_def,
                                input_map={intermediate_layer_model.input.name: images,
                                           'batch_normalization_1/keras_learning_phase:0': False},
                                return_elements=[intermediate_layer_model.output.name])

    with tf.Session() as sess2:
        tf.saved_model.simple_save(
            sess2,
            export_dir='inceptionv4/',
            inputs={"inputs": input_bytes},
            outputs={"outputs": pred})
Note: I'm not 100% certain that the shapes of intermediate_layer_model and images are compatible. The shape of images will be [None, height, width, num_channels].
Also note that your local prediction code will change a bit. You don't base64 encode the images and you need to send a "batch"/list of images rather than single images. Something like:
with open('MEL_BE_0.jpg', 'rb') as image_file:
    encoded_string = image_file.read()

input_tensor = tf.get_default_graph().get_tensor_by_name('input:0')
print(input_tensor)
avg_tensor = tf.get_default_graph().get_tensor_by_name('import_1/avg_pool/Mean:0')
print(avg_tensor)
predictions = sess.run(avg_tensor, {input_tensor: [encoded_string]})
You didn't specify whether you're doing batch prediction or online prediction, which have similar but slightly different "formats" for the inputs. In either case, your model is not exporting a "key" field (did you mean to? It's probably helpful for batch prediction, but not for online).
For batch prediction, the file format is JSON lines; each line contains one example. Each line can be generated like so from Python:
example = json.dumps({"image_bytes": {"b64": ENCODED_STRING}})
(Note the omission of "key" for now). Since you only have one input, there is a shorthand:
example = json.dumps({"b64": ENCODED_STRING})
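As a quick sketch of writing such a batch prediction input file from Python (the file name instances.json is just a placeholder; standard base64 encoding is assumed for the {"b64": ...} marker):

import base64
import json

image_paths = ['MEL_BE_0.jpg']  # one line per image in the batch

with open('instances.json', 'w') as f:
    for path in image_paths:
        with open(path, 'rb') as image_file:
            encoded = base64.b64encode(image_file.read()).decode('ascii')
        # One example per line, matching the JSON lines format described above.
        f.write(json.dumps({"image_bytes": {"b64": encoded}}) + "\n")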
If you want to do online prediction, you'll note that if you are using gcloud to send requests, you actually use the same file format as for batch prediction.
In fact, we highly recommend using gcloud ml-engine local predict --json-instances=FILE --model-dir=... before deploying to the cloud to help debug.
If you intend to use some other client besides gcloud, e.g., in a web app, mobile app, frontend server, etc., then you won't be sending a file, and you need to construct the full request yourself. It's very similar to the file format above. Basically, take each line of the JSON lines file and put it in an array called "instances", i.e.,
request_body = json.dumps({"instances": [{"image_bytes": {"b64": encoded_string}}]})
You can use the same syntactic sugar if you'd like:
request_body = json.dumps({"instances": [{"b64": encoded_string}]})
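If you are calling the online prediction service from Python, one common option is the google-api-python-client discovery client (a sketch only; the project and model names below are placeholders):

import base64
import googleapiclient.discovery

PROJECT = 'my-project'  # placeholder
MODEL = 'my-model'      # placeholder

with open('MEL_BE_0.jpg', 'rb') as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode('ascii')

# Build a client for the Cloud ML Engine prediction API and send the instances.
service = googleapiclient.discovery.build('ml', 'v1')
name = 'projects/{}/models/{}'.format(PROJECT, MODEL)

response = service.projects().predict(
    name=name,
    body={'instances': [{'image_bytes': {'b64': encoded_string}}]}
).execute()

print(response['predictions'])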
I hope this helps!
