I'm currently trying to implement augmentations in Detectron2, and my code looks like this:
from detectron2.data import DatasetMapper, build_detection_train_loader
from detectron2.data import transforms as T
from detectron2.engine import DefaultTrainer


class AugmentationTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        mapper = DatasetMapper(cfg, is_train=True, augmentations=[
            T.RandomBrightness(0.5, 1.5),
            T.RandomContrast(2, 2),
            T.RandomSaturation(2, 2),
            T.RandomFlip(prob=1, horizontal=True, vertical=False),
            T.RandomFlip(prob=0.5, horizontal=False, vertical=True),
        ])
        return build_detection_train_loader(cfg, mapper=mapper)
I'm simply using this class as my trainer instead of the default one. I was wondering whether the bounding-box data gets transformed correctly along with the image when using augmentations like flip or rotate, for example.
Thanks in advance!
I trained with the above augmentations, but my end result was worse than training without them, which confused me a lot.
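Incidentally, a quick way to check whether the boxes follow the image is to pull one batch from the mapped loader and visualize it: geometric augmentations applied through DatasetMapper are supposed to transform the ground-truth boxes together with the image, while photometric ones (brightness, contrast, saturation) leave them alone. Below is a minimal sketch of such a check; the dataset name is a placeholder, not something from the original post.

from detectron2.config import get_cfg
from detectron2.data import DatasetMapper, build_detection_train_loader
from detectron2.data import transforms as T
from detectron2.utils.visualizer import Visualizer

cfg = get_cfg()
cfg.DATASETS.TRAIN = ("my_train_set",)  # placeholder: a registered dataset name

mapper = DatasetMapper(cfg, is_train=True, augmentations=[
    T.RandomFlip(prob=1.0, horizontal=True, vertical=False),
])
loader = build_detection_train_loader(cfg, mapper=mapper)

# Grab one mapped sample and draw its (possibly transformed) boxes.
sample = next(iter(loader))[0]
img = sample["image"].permute(1, 2, 0).numpy()[:, :, ::-1]  # CHW BGR -> HWC RGB
vis = Visualizer(img)
out = vis.overlay_instances(boxes=sample["instances"].gt_boxes.tensor.numpy())
out.save("augmented_sample.png")  # a flipped image should come with flipped boxes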
I'm trying to use Panda3D to make a 3D game with Python. I used Blender to make a 3D model for the player, and gave it some animations and a green texture. It looks like this:
Then I exported it in .gltf Embedded, using default presets.
Finally, I tried using this code to display the model:
import panda3d.core as p3d
from direct.showbase.ShowBase import ShowBase
from direct.actor.Actor import Actor


class Game(ShowBase):
    def __init__(self, model_path):
        ShowBase.__init__(self)

        # Load my model
        self.player = Actor(model_path)
        # Add player to scene
        self.player.reparentTo(self.render)


if __name__ == "__main__":
    model_path = "player_model.gltf"

    # Create the game instance
    game = Game(model_path)
    game.run()
However, the model looks completely wrong:
I also get these warnings:
Known pipe types:
wglGraphicsPipe
(all display modules loaded.)
:Actor(warning): player_model.gltf is not a character!
:linmath(warning): Tried to invert singular LMatrix4.
:linmath(warning): Tried to invert singular LMatrix4.
...
:linmath(warning): Tried to invert singular LMatrix4.
How can I fix the problem? Is there something wrong with how I am exporting the model?
I was able to import a basic cube model with the same process and got the expected result. Is the problem perhaps caused by armature or animations?
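One way to narrow this down (just a suggestion, not from the original post): load the same file as a plain static model instead of wrapping it in an Actor. If the mesh renders correctly that way, the exported geometry is fine and the problem lies in how Actor picks up the armature/animations from the glTF file. This assumes the glTF loader you are already using is also available to loadModel.

from direct.showbase.ShowBase import ShowBase


class StaticTest(ShowBase):
    def __init__(self, model_path):
        ShowBase.__init__(self)
        # Load the file as a static model, bypassing Actor and any animation handling
        model = self.loader.loadModel(model_path)
        model.reparentTo(self.render)


if __name__ == "__main__":
    StaticTest("player_model.gltf").run()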
I'm trying to "blueify" an image that I will use for a PyTorch application (AI), which works better with "bluer" images. Specifically, I want each pixel to be blueish. I will put the code inside a class which I will put in a transforms.Compose for and pass it to the torchvision.datasets. ImageFolder tranform key word argument.
I tried to use the PyTorch torchvision.transforms.functional functions (adjust_hue, adjust_saturation, adjust_brightness). However, I was always getting images with different colors (e.g. green & purple, red & blue). I will put them inside a class which I will put in a transforms.Compose for and pass it to the torchvision.datasets.ImageFolder tranform key word argument.
Can you please help?
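For what it's worth, here is a minimal sketch of one way such a transform could look; the gain values and dataset path are assumptions, not something from your setup. It is a callable class that boosts the blue channel and damps the others, placed after ToTensor in a transforms.Compose.

import torch
from torchvision import transforms, datasets

class Blueify:
    def __init__(self, blue_gain=1.5, other_gain=0.6):  # assumed gains, tune as needed
        self.blue_gain = blue_gain
        self.other_gain = other_gain

    def __call__(self, img):
        # img is a float tensor of shape (3, H, W) in [0, 1] (output of ToTensor)
        r = img[0] * self.other_gain
        g = img[1] * self.other_gain
        b = img[2] * self.blue_gain
        return torch.clamp(torch.stack([r, g, b]), 0.0, 1.0)

transform = transforms.Compose([
    transforms.ToTensor(),
    Blueify(),
])
dataset = datasets.ImageFolder("path/to/data", transform=transform)  # hypothetical path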
I am currently using the diffusers StableDiffusionPipeline (from Hugging Face) to generate AI images with a Discord bot that I use with my friends. I was wondering whether it is possible to get a preview of the image being generated before it is finished.
For example, if an image takes 20 seconds to generate, since it is using diffusion it starts off blurry and gradually gets better and better. What I want is to save the image on each iteration (or every few seconds) and see how it progresses. How would I be able to do this?
import os
import time

import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

class ImageGenerator:
    def __init__(self, socket_listener, pretty_logger, prisma):
        self.model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, use_auth_token=os.environ.get("HF_AUTH_TOKEN"))
        self.model = self.model.to("cuda")

    async def generate_image(self, data):
        start_time = time.time()
        with autocast("cuda"):
            result = self.model(data.description, height=self.default_height, width=self.default_width,
                                num_inference_steps=self.default_inference_steps, guidance_scale=self.default_guidance_scale)
        image = result.images[0]  # the pipeline returns an output object, not a PIL image
        image.save(...)
The code I currently have is above; however, it only returns the image once it is completely done. I have tried to look into how the image is generated inside the StableDiffusionPipeline, but I cannot find where it actually happens. If anybody could provide pointers/tips on where to begin, that would be very helpful.
You can use the callback argument of the stable diffusion pipeline to get the latent space representation of the image: link to documentation
The implementation shows how the latents are converted back to an image. We just have to copy that code and decode the latents.
Here is a small example that saves the generated image every 5 steps:
from diffusers import StableDiffusionPipeline
import torch

# load model
model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, use_auth_token="YOUR TOKEN HERE")
model = model.to("cuda")


def callback(iter, t, latents):
    # convert latents to image
    with torch.no_grad():
        latents = 1 / 0.18215 * latents
        image = model.vae.decode(latents).sample

        image = (image / 2 + 0.5).clamp(0, 1)

        # we always cast to float32 as this does not cause significant overhead and is compatible with bfloat16
        image = image.cpu().permute(0, 2, 3, 1).float().numpy()

        # convert to PIL Images
        image = model.numpy_to_pil(image)

        # do something with the Images
        for i, img in enumerate(image):
            img.save(f"iter_{iter}_img{i}.png")


# generate image (note the `callback` and `callback_steps` argument)
image = model("tree", callback=callback, callback_steps=5)
To understand the stable diffusion model I highly recommend this blog post.
I am trying to cut scenes from a video using the scenedetect library in Python.
The usual technique detects changes in image composition via ContentDetector() objects; this is the standard approach recommended in their GitHub repository. Here is the example code they provide:
# Standard PySceneDetect imports:
from scenedetect import VideoManager
from scenedetect import SceneManager

# For content-aware scene detection:
from scenedetect.detectors import ContentDetector


def find_scenes(video_path, threshold=30.0):
    # Create our video & scene managers, then add the detector.
    video_manager = VideoManager([video_path])
    scene_manager = SceneManager()
    scene_manager.add_detector(
        ContentDetector(threshold=threshold))

    # Improve processing speed by downscaling before processing.
    video_manager.set_downscale_factor()

    # Start the video manager and perform the scene detection.
    video_manager.start()
    scene_manager.detect_scenes(frame_source=video_manager)

    # Each returned scene is a tuple of the (start, end) timecode.
    return scene_manager.get_scene_list()
However, there is an alternative technique, based on brightness, that uses ThresholdDetector() objects. If I substitute ThresholdDetector() for ContentDetector(), I no longer get a list of scenes... just a single initial frame.
What am I doing wrong?
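For reference, here is a sketch of the substitution being described (the threshold value of 12 is an assumed default, not taken from your code). As far as I understand, ThresholdDetector cuts on fades to/from black based on average frame brightness, so it only reports multiple scenes if the video actually contains such fades, which could explain getting just one scene back.

from scenedetect import VideoManager, SceneManager
from scenedetect.detectors import ThresholdDetector


def find_fade_scenes(video_path, threshold=12):
    video_manager = VideoManager([video_path])
    scene_manager = SceneManager()
    # Cut where the average frame brightness drops below `threshold` (fade out)
    # and rises above it again (fade in).
    scene_manager.add_detector(ThresholdDetector(threshold=threshold))

    video_manager.set_downscale_factor()
    video_manager.start()
    scene_manager.detect_scenes(frame_source=video_manager)
    return scene_manager.get_scene_list()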
Is preprocessing such as input normalization performed by default by the TensorFlow Object Detection API?
I cannot find any documentation on it. There is an option called 'NormalizeImage' in the DataAugmentations, but I never see it used in any of the configuration files for the models in the zoo. I trained ssd_mobilenet_v3_small_coco_2020_01_14 by transfer learning on my custom class without using it and everything works.
I know there is a similar question here, but it has gone unanswered for a couple of years and the network is different.
Testing with the following code (OpenCV 4.3.0 DNN module) produces the correct result:
import cv2 as cv

net = cv.dnn_DetectionModel('model/graph/frozen_inference_graph.pb', 'cvgraph.pbtxt')
net.setInputSize(300, 300)
#net.setInputScale(1.0 / 127.5)
#net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

frame = cv.imread('test/2_329_985_165605-561561.jpg')
classes, confidences, boxes = net.detect(frame, confThreshold=0.7)

for classId, confidence, box in zip(classes.flatten(), confidences.flatten(), boxes):
    print(classId, confidence)
    cv.rectangle(frame, box, color=(0, 255, 0))

cv.imshow('out', frame)
cv.waitKey()
While here normalization is used. Using normalization in my case produces a wrong result: the bounding box is much bigger than it should be. I guess that input normalization is performed somewhere under the hood by TensorFlow?
Even if I am probably too late to help you, I want to answer the question, as I came across it while having a pretty similar problem of understanding how the normalization is defined. Maybe it helps someone else.
I even posted my own question (here) but found the answer an hour later. As I can't find the model you used (the TF1 model zoo link for ssd_mobilenet_v3_small_coco is dead), I assume that its pipeline looks similar to the one I used.
In the pipeline config, a feature extractor is defined:
feature_extractor {
  type: "ssd_mobilenet_v2_keras"
  depth_multiplier: 1.0
  ...
}
This type uses this feature extractor, in which the following preprocessing function is defined:
def preprocess(self, resized_inputs):
    """SSD preprocessing.

    Maps pixel values to the range [-1, 1].

    Args:
      resized_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.

    Returns:
      preprocessed_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.
    """
    return (2.0 / 255.0) * resized_inputs - 1.0
If you do the math you'll see that this is exactly the same as
image = (image - 127.5) / 127.5
just formatted in a different way, since (2 / 255) * x - 1 = (2x - 255) / 255 = (x - 127.5) / 127.5. I hope this helps someone!
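(For illustration, a quick numeric check of that equivalence, not part of the original answer:)

import numpy as np

# both normalizations agree for every pixel value in [0, 255]
x = np.linspace(0, 255, 256)
assert np.allclose((2.0 / 255.0) * x - 1.0, (x - 127.5) / 127.5)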
EDIT:
However, I just realized that this does not explain why the OP's model works better without preprocessing. I guess the OP's preprocessing must already be defined in the cvgraph, as stated in the OpenCV docs.