Is preprocessing such as input normalization performed by default by the TensorFlow Object Detection API?
I cannot find any documentation on it anywhere. There is an option called 'NormalizeImage' in the data augmentations, but I never see it used in any of the configuration files for the models in the zoo. I trained ssd_mobilenet_v3_small_coco_2020_01_14 for transfer learning to my custom class without using it, and everything works.
I know there is a similar question here, but it has had no answer in a couple of years and the network is different.
Testing with the following code (OpenCV 4.3.0 DNN module) produces the correct result:
import cv2 as cv

net = cv.dnn_DetectionModel('model/graph/frozen_inference_graph.pb', 'cvgraph.pbtxt')
net.setInputSize(300, 300)
#net.setInputScale(1.0 / 127.5)
#net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

frame = cv.imread('test/2_329_985_165605-561561.jpg')
classes, confidences, boxes = net.detect(frame, confThreshold=0.7)
for classId, confidence, box in zip(classes.flatten(), confidences.flatten(), boxes):
    print(classId, confidence)
    cv.rectangle(frame, box, color=(0, 255, 0))
cv.imshow('out', frame)
cv.waitKey()
Here, on the other hand, normalization is used. Using normalization in my case produces a wrong result: the bounding box is much bigger than it should be. I guess that input normalization is performed somewhere under the hood by TensorFlow?
Even though I am probably too late to help you, I want to answer the question, since I came across it when I had a pretty similar problem of understanding how the normalization is defined. Maybe it helps someone else.
I even posted my own question (here) but found the answer an hour later. As I cannot find the model you used (the TF1 model zoo leads to a dead link for ssd_mobilenet_v3_small_coco), I assume that the pipeline there looks similar to the one I used.
In the pipeline config, a feature extractor is defined:
feature_extractor {
  type: "ssd_mobilenet_v2_keras"
  depth_multiplier: 1.0
  ...
}
This refers to this feature extractor, in which the following preprocessing function is defined:
def preprocess(self, resized_inputs):
  """SSD preprocessing.

  Maps pixel values to the range [-1, 1].

  Args:
    resized_inputs: a [batch, height, width, channels] float tensor
      representing a batch of images.

  Returns:
    preprocessed_inputs: a [batch, height, width, channels] float tensor
      representing a batch of images.
  """
  return (2.0 / 255.0) * resized_inputs - 1.0
If you do the math, you'll see that this is exactly the same as
image = (image - 127.5) / 127.5
just written in a different form: (2.0 / 255.0) * x - 1.0 = (2x - 255) / 255 = (x - 127.5) / 127.5. I hope this helps someone!
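As a quick sanity check (not part of the original answer, just a verification sketch), you can compare the two formulas numerically over all 8-bit pixel values:

import numpy as np

x = np.arange(0, 256, dtype=np.float32)     # all possible 8-bit pixel values
a = (2.0 / 255.0) * x - 1.0                 # SSD feature extractor preprocessing
b = (x - 127.5) / 127.5                     # scale/mean style normalization
print(np.allclose(a, b))                    # prints True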
EDIT:
However, I just realized that this does not explain why the OP's model works better without preprocessing. I guess the OP's preprocessing must already be defined in the cvgraph, as stated in the OpenCV docs.
I'm currently trying to implement augmentations in Detectron2, and my code looks like this:
from detectron2.engine import DefaultTrainer
from detectron2.data import DatasetMapper, build_detection_train_loader
import detectron2.data.transforms as T

class AugmentationTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        mapper = DatasetMapper(cfg, is_train=True, augmentations=[
            T.RandomBrightness(0.5, 1.5),
            T.RandomContrast(2, 2),
            T.RandomSaturation(2, 2),
            T.RandomFlip(prob=1, horizontal=True, vertical=False),
            T.RandomFlip(prob=0.5, horizontal=False, vertical=True),
        ])
        return build_detection_train_loader(cfg, mapper=mapper)
I'm simply using this class as my trainer instead of the default one. I was wondering whether the bounding-box data also gets transformed correctly when augmentations like Flip or Rotate are used.
Thanks in advance!
I trained with the above augmentations, but my end result was worse than the training without augmentations, which confused me a lot.
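One way to check whether the boxes follow the augmentations is to pull a few samples from the mapped train loader and draw the ground-truth boxes back onto the augmented image. This is only a sketch, not from the original thread; the dataset name "my_dataset_train" is a placeholder for whatever dataset you have registered.

import cv2
from detectron2.config import get_cfg
from detectron2.data import DatasetMapper, build_detection_train_loader
import detectron2.data.transforms as T

cfg = get_cfg()
cfg.DATASETS.TRAIN = ("my_dataset_train",)   # hypothetical registered dataset name

loader = build_detection_train_loader(
    cfg,
    mapper=DatasetMapper(cfg, is_train=True, augmentations=[
        T.RandomFlip(prob=1.0, horizontal=True),
    ]),
)

for batch in loader:
    sample = batch[0]
    # "image" is a CHW uint8 tensor; "instances" holds the boxes mapped to the augmented image
    img = sample["image"].permute(1, 2, 0).numpy().copy()
    for x0, y0, x1, y1 in sample["instances"].gt_boxes.tensor.numpy().astype(int):
        cv2.rectangle(img, (x0, y0), (x1, y1), (0, 255, 0), 2)
    cv2.imwrite("augmented_sample.jpg", img)
    break

If the boxes still sit on the (now flipped) objects in augmented_sample.jpg, the DatasetMapper is transforming the annotations together with the image.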
I am currently using the diffusers StableDiffusionPipeline (from Hugging Face) to generate AI images with a Discord bot that I use with my friends. I was wondering whether it is possible to get a preview of the image being generated before it is finished.
For example, if an image takes 20 seconds to generate, then since it uses diffusion it starts off blurry and gradually gets better and better. What I want is to save the image on each iteration (or every few seconds) and see how it progresses. How would I be able to do this?
import os
import time

import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

class ImageGenerator:
    def __init__(self, socket_listener, pretty_logger, prisma):
        self.model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, use_auth_token=os.environ.get("HF_AUTH_TOKEN"))
        self.model = self.model.to("cuda")

    async def generate_image(self, data):
        start_time = time.time()
        with autocast("cuda"):
            image = self.model(data.description, height=self.default_height, width=self.default_width,
                               num_inference_steps=self.default_inference_steps, guidance_scale=self.default_guidance_scale)
        image.save(...)
The code I currently have is the above; however, it only returns the image when it is completely done. I have tried to look into how the image is generated inside the StableDiffusionPipeline, but I cannot find where the image is actually produced. If anybody could provide any pointers or tips on where to begin, that would be very helpful.
You can use the callback argument of the Stable Diffusion pipeline to get the latent-space representation of the image: link to documentation.
The implementation shows how the latents are converted back to an image; we just have to copy that code and decode the latents ourselves.
Here is a small example that saves the generated image every 5 steps:
from diffusers import StableDiffusionPipeline
import torch

# load model
model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, use_auth_token="YOUR TOKEN HERE")
model = model.to("cuda")

def callback(iter, t, latents):
    # convert latents to image
    with torch.no_grad():
        latents = 1 / 0.18215 * latents
        image = model.vae.decode(latents).sample

        image = (image / 2 + 0.5).clamp(0, 1)

        # we always cast to float32 as this does not cause significant overhead and is compatible with bfloat16
        image = image.cpu().permute(0, 2, 3, 1).float().numpy()

        # convert to PIL Images
        image = model.numpy_to_pil(image)

        # do something with the Images
        for i, img in enumerate(image):
            img.save(f"iter_{iter}_img{i}.png")

# generate image (note the `callback` and `callback_steps` argument)
image = model("tree", callback=callback, callback_steps=5)
To understand the Stable Diffusion model, I highly recommend this blog post.
I have been trying to detect moving vehicles, but the background subtraction fails due to varying light conditions caused by clouds (not the shadows of the clouds, just the changing illumination).
I have uploaded my input video here --> YouTube (30 secs)
Here is what I got using the various background subtraction methods available in OpenCV:
import numpy as np
import cv2

cap = cv2.VideoCapture('traffic_finalns.mp4')

#fgbgKNN = cv2.createBackgroundSubtractorKNN()
fgbgMOG = cv2.bgsegm.createBackgroundSubtractorMOG(100, 5, 0.7, 0)
#fgbgGMG = cv2.bgsegm.createBackgroundSubtractorGMG()
#fgbgMOG2 = cv2.createBackgroundSubtractorMOG2()
#fgbgCNT = cv2.bgsegm.createBackgroundSubtractorCNT(15, True, 15*60, True)

while(1):
    ret, frame = cap.read()

    # fgmaskKNN = fgbgKNN.apply(frame)
    fgmaskMOG = fgbgMOG.apply(frame)
    # fgmaskGMG = fgbgGMG.apply(frame)
    # fgmaskMOG2 = fgbgMOG2.apply(frame)
    # fgmaskCNT = fgbgCNT.apply(frame)

    # cv2.imshow('frame', frame)
    # cv2.imshow('fgmaskKNN', fgmaskKNN)
    cv2.imshow('fgmaskMOG', fgmaskMOG)
    # cv2.imshow('fgmaskGMG', fgmaskGMG)
    # cv2.imshow('fgmaskMOG2', fgmaskMOG2)
    # cv2.imshow('fgmaskCNT', fgmaskCNT)

    k = cv2.waitKey(20) & 0xff
    if k == 27:
        break

cap.release()
cv2.destroyAllWindows()
(The images below were taken at frame number 977.)
BackgroundSubtractorMOG: by varying the history parameter, some of the illumination could be suppressed, but not all of it, since the duration of the illumination varies.
BackgroundSubtractorMOG2:
BackgroundSubtractorGMG:
BackgroundSubtractorKNN:
BackgroundSubtractorCNT:
1] Improving the results of OpenCV background subtraction
For varying light conditions it is important to normalize your pixel values between 0 and 1; I do not see that happening in your code.
Background subtraction will not work with a single image (in your code you are reading an image).
If you are applying background subtraction to a sequence of frames, then the first frame of the background subtraction result is of no use.
You might want to adjust the arguments you are passing to cv2.bgsegm.createBackgroundSubtractorMOG() to get the best results. Play around with the threshold and see what results you get.
You can also apply a Gaussian filter (cv2.GaussianBlur()) to the individual frames to reduce noise and get better results.
You can try cv2.equalizeHist() on individual frames to improve their contrast (see the sketch after this list).
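A minimal sketch of that per-frame preprocessing in front of the MOG subtractor (reusing traffic_finalns.mp4 from the question; the blur kernel size is just a starting point):

import cv2

cap = cv2.VideoCapture('traffic_finalns.mp4')
fgbgMOG = cv2.bgsegm.createBackgroundSubtractorMOG()   # tune history/threshold as suggested above

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)                      # improve contrast
    gray = cv2.GaussianBlur(gray, (5, 5), 0)           # reduce noise
    fgmask = fgbgMOG.apply(gray)
    cv2.imshow('fgmaskMOG', fgmask)
    if cv2.waitKey(20) & 0xff == 27:
        break

cap.release()
cv2.destroyAllWindows()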
Anyway, you say that you are trying to detect moving objects. Nowadays there are many modern methods that use deep learning for object detection.
2] Use the TensorFlow Object Detection API
It does object detection in real time and also gives you the bounding-box coordinates of the detected objects.
Here are the results of the TensorFlow Object Detection API:
3] How about trying OpenCV optical flow?
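One way to use it (a sketch, not from the original answer): compute dense Farneback optical flow between consecutive frames and threshold the motion magnitude; the threshold of 2.0 here is an arbitrary starting value.

import cv2
import numpy as np

cap = cv2.VideoCapture('traffic_finalns.mp4')
ret, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    moving = (mag > 2.0).astype(np.uint8) * 255    # keep only pixels that moved noticeably
    cv2.imshow('moving pixels', moving)
    prev_gray = gray
    if cv2.waitKey(20) & 0xff == 27:
        break

cap.release()
cv2.destroyAllWindows()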
4] Simple subtraction
Your environment is static.
So take a frame of your environment and store it in a variable, say environment_frame.
Now read every frame from your video and simply subtract it from your environment frame: results = environment_frame - current_frame.
If np.sum(results) is greater than a threshold value, then we say there is an object.
But knowing that np.sum(results) exceeds the threshold only tells us that there is a moving object somewhere, not where it is.
The moving object is where the changed pixels are clustered together, which you can easily find with some clustering algorithm.
Do not forget to normalize your pixel values between 0 and 1 (a sketch of the whole idea follows).
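A rough sketch of this approach (same video as the question; the frame-level threshold, the per-pixel threshold, and the minimum blob area are guesses to be tuned, and connected components stands in for "some clustering algorithm"):

import cv2
import numpy as np

cap = cv2.VideoCapture('traffic_finalns.mp4')
ret, environment_frame = cap.read()                            # frame of the static environment
env = cv2.cvtColor(environment_frame, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0

FRAME_THRESHOLD = 500.0      # assumed: total change needed to say "something moved"
PIXEL_THRESHOLD = 0.25       # assumed: per-pixel change needed to mark a pixel as moving
MIN_AREA = 200               # assumed: ignore blobs smaller than this

while True:
    ret, frame = cap.read()
    if not ret:
        break
    current_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    results = np.abs(env - current_frame)                      # normalized per-pixel difference
    if np.sum(results) > FRAME_THRESHOLD:                      # something moved somewhere
        mask = (results > PIXEL_THRESHOLD).astype(np.uint8) * 255
        n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
        for x, y, w, h, area in stats[1:]:                     # label 0 is the background
            if area > MIN_AREA:
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('moving objects', frame)
    if cv2.waitKey(20) & 0xff == 27:
        break

cap.release()
cv2.destroyAllWindows()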
----------------------------UPDATED----------------------------------------
If you want to find helmets in real time, then your best bet is deep learning.
You can use a deep-learning technique like YOLO, which newer versions of OpenCV ship with, but I do not think they have a Python binding for YOLO in OpenCV.
Another real-time technique is the R-CNN family, which the TensorFlow Object Detection API already provides; I have mentioned it above.
If you want to use traditional computer vision methods, then you can try HOG and SVM on helmet data and then a sliding-window technique to find the helmet in your frame (this won't be real time); a sketch of that idea follows.
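A skeleton of the HOG + SVM + sliding-window idea (the window size, stride, and HOG layout are assumptions, and the labelled crops for training are something you have to collect yourself):

import cv2
import numpy as np

# HOG descriptor for 64x64 windows
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

def hog_features(crop):
    return hog.compute(cv2.resize(crop, (64, 64))).flatten()

# Train an SVM on labelled crops (train_crops / train_labels are yours to collect; 1 = helmet, 0 = no helmet)
svm = cv2.ml.SVM_create()
# X = np.array([hog_features(c) for c in train_crops], dtype=np.float32)
# svm.train(X, cv2.ml.ROW_SAMPLE, np.array(train_labels, dtype=np.int32))

# Slide a window over the frame and classify each crop
frame = cv2.imread('frame.jpg')
for y in range(0, frame.shape[0] - 64, 16):
    for x in range(0, frame.shape[1] - 64, 16):
        feat = hog_features(frame[y:y + 64, x:x + 64]).reshape(1, -1).astype(np.float32)
        # _, pred = svm.predict(feat)
        # if pred[0, 0] == 1:                       # window classified as helmet
        #     cv2.rectangle(frame, (x, y), (x + 64, y + 64), (0, 255, 0), 2)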
I've referred to this question, but I don't quite understand the second method provided by mrry:
overcome Graphdef cannot be larger than 2GB in tensorflow
Basically, I'm trying to use TF's built-in image transformation methods on images, and I'm running into the error given in the title.
Also, do I need to keep creating a new session for each iteration? Currently this process is a little slow, and I am not sure how to speed it up.
import tensorflow as tf
import os
from scipy.ndimage import imread
from scipy.misc import imresize, imsave, imshow
import matplotlib.pyplot as plt

for fish in Fishes:
    fish_images = os.listdir(os.path.join('C:\\Users\\Moondra\\Desktop\\Fishes', fish))  # get the image files
    os.makedirs(SAVE_DIR + fish, exist_ok=True)
    for num, fish_image in enumerate(fish_images):
        image = imread(os.path.join('C:\\Users\\Moondra\\Desktop\\Fishes', fish, fish_image))
        new_img = tf.image.adjust_brightness(image, .4)  # image transformation
        with tf.Session() as sess:
            new_image = sess.run(new_img)
            imsave(os.path.join(SAVE_DIR, fish, fish + str(num) + '.jpg'), new_image)
This is not how TF should be used.
You should create the graph once.
You should create the session once.
Your current code does both things in a loop, which causes the slowness and memory issues. The problem lies in the fact that TF is not an imperative language, so
new_img = tf.image.adjust_brightness(image, .4)  # image transformation
is not the application of a function to the image. It creates an operation in the graph and stores a reference to that operation in new_img, so each time you call this function your graph grows.
So in pseudo code it should be:

create placeholder for the image name
create transformed image op - new_img
create session
for each image:
    run the new_img op in the session, providing the path to the placeholder via feed_dict
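A concrete sketch of that structure, reusing the paths and the Fishes / SAVE_DIR variables from the question (for simplicity the placeholder here holds the decoded image data rather than the file name):

import os
import tensorflow as tf
from scipy.ndimage import imread
from scipy.misc import imsave

# Build the graph once: one placeholder for an image and the brightness op on top of it
image_ph = tf.placeholder(tf.uint8, shape=[None, None, 3])
bright_op = tf.image.adjust_brightness(image_ph, 0.4)

with tf.Session() as sess:                                   # one session for all images
    for fish in Fishes:
        fish_dir = os.path.join('C:\\Users\\Moondra\\Desktop\\Fishes', fish)
        os.makedirs(SAVE_DIR + fish, exist_ok=True)
        for num, fish_image in enumerate(os.listdir(fish_dir)):
            image = imread(os.path.join(fish_dir, fish_image))
            new_image = sess.run(bright_op, feed_dict={image_ph: image})
            imsave(os.path.join(SAVE_DIR, fish, fish + str(num) + '.jpg'), new_image)

This way the graph stays the same size no matter how many images you process, and only the feed data changes per iteration.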
Here is the basic script I use for drawing:
from graph_tool.all import *

g = load_graph("data.graphml")
g.set_directed(False)
pos = sfdp_layout(g)
graph_draw(g, pos=pos, output_size=(5000, 5000),
           vertex_text=g.vertex_index,
           vertex_fill_color=g.vertex_properties["color"],
           edge_text=g.edge_properties["name"],
           output="result.png")
The main problems here are the ugly edge text and vertices that sit too close to their parents. As I understand it, this happens because fit_view=True by default and the resulting image is scaled to fit the output size. When I set fit_view=False, the resulting image doesn't contain the graph (I see only a little piece of it).
Maybe I need a different output size for fit_view=False, or some additional steps?
Today I ran into the same problem.
It seems that you can use fit_view=0.9; by passing a float you can scale the fit. In that case the graph appears at 90% of the normal size; if you use 1, it will be the same size.
Hope it helps.
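For example, applied to the script from the question (only the fit_view argument is new; everything else is as posted):

from graph_tool.all import *

g = load_graph("data.graphml")
g.set_directed(False)
pos = sfdp_layout(g)
graph_draw(g, pos=pos, output_size=(5000, 5000),
           vertex_text=g.vertex_index,
           vertex_fill_color=g.vertex_properties["color"],
           edge_text=g.edge_properties["name"],
           fit_view=0.9,    # scale the drawing to 90% instead of filling the view completely
           output="result.png")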