I'm trying to coordinate two systems: one that was already pre-trained on Atari MsPacman-v0 and another that only supports the Gym version for training. With both systems working on RGB image representations, the color palette discrepancy is problematic (Gym output left, expected right):
Is there a simple way to fix this (i.e. a pixel-mapping trick or some environment setting I'm not aware of), or is there something more involved that I have to do? Of note, the actual simulation I'm running uses Gym.
Heyo, sorry about that! Looks like I'm dumb.
Context: I was trying to incorporate the SPACE detection model into DreamerV2, and I didn't see the little footnote in the SPACE repo:
For some reason we were using BGR images for our Atari dataset and our
pretrained models can only handle that. Please convert the images to
BGR if you are to test your own Atari images with the provided
pretrained models.
So yeah... if you see something like this, I guess this is what's wrong...
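For anyone who hits the same mismatch, here is a minimal sketch of the channel swap, assuming the frames come out of Gym as RGB NumPy arrays of shape (H, W, 3); the env name and reset call follow the older Gym API and are just illustrative:

    import gym

    env = gym.make("MsPacman-v0")
    obs = env.reset()  # RGB frame, shape (210, 160, 3) under the older Gym API

    # Reversing the last axis converts RGB -> BGR (and back, if applied twice),
    # which is what a model trained on BGR Atari frames expects to see.
    obs_bgr = obs[..., ::-1].copy()

    # Equivalent with OpenCV, if it is already a dependency:
    # import cv2
    # obs_bgr = cv2.cvtColor(obs, cv2.COLOR_RGB2BGR)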
I have images that are 4928x3280 and I'd like to crop them into 640x640 tiles with a certain percentage of overlap. The issue is that I have no idea how to handle the bounding boxes in my dataset when the images are cut up. I've found this paper (http://openaccess.thecvf.com/content_CVPRW_2019/papers/UAVision/Unel_The_Power_of_Tiling_for_Small_Object_Detection_CVPRW_2019_paper.pdf), but no code or other reference explaining how they did it. There are some examples on the internet that do YOLOv5 tiling, but without overlap, like this one (https://github.com/slanj/yolo-tiling).
Does anyone know how I could make this myself or if someone has an example of this for me?
If you want a ready-to-go library that handles tiling and inference for YOLOv5, there is SAHI:
https://github.com/obss/sahi
You can use it to create tiles with the matching annotations, to run inference, and to evaluate model performance.
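If you'd rather roll your own, here is a minimal sketch of the bookkeeping: tile origins laid out with overlap, and each box clipped into the tile's coordinate frame. The box format and the 30% visibility cut-off are assumptions, not anything taken from the paper:

    def tile_with_boxes(img_w, img_h, boxes, tile=640, overlap=0.2, min_visible=0.3):
        """boxes: list of (xmin, ymin, xmax, ymax) in full-image pixel coordinates.
        Yields (x0, y0, tile_boxes), with tile_boxes shifted into the tile's own
        frame and clipped to its borders."""
        stride = int(tile * (1.0 - overlap))
        xs = list(range(0, max(img_w - tile, 0) + 1, stride)) or [0]
        ys = list(range(0, max(img_h - tile, 0) + 1, stride)) or [0]
        if xs[-1] + tile < img_w:          # make sure the right edge is covered
            xs.append(img_w - tile)
        if ys[-1] + tile < img_h:          # make sure the bottom edge is covered
            ys.append(img_h - tile)

        for y0 in ys:
            for x0 in xs:
                tile_boxes = []
                for xmin, ymin, xmax, ymax in boxes:
                    cx1, cy1 = max(xmin, x0), max(ymin, y0)
                    cx2, cy2 = min(xmax, x0 + tile), min(ymax, y0 + tile)
                    if cx1 >= cx2 or cy1 >= cy2:
                        continue           # box does not touch this tile
                    visible = (cx2 - cx1) * (cy2 - cy1)
                    full = (xmax - xmin) * (ymax - ymin)
                    if full > 0 and visible / full < min_visible:
                        continue           # mostly cut off; skip to avoid sliver labels
                    tile_boxes.append((cx1 - x0, cy1 - y0, cx2 - x0, cy2 - y0))
                yield x0, y0, tile_boxes

The crop itself is then img[y0:y0 + tile, x0:x0 + tile], and YOLO-format labels still need the usual pixel-to-normalized conversion on top of this.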
I'm training a computer vision algorithm, and I want to give it some more robust data. For the application of the software I'm building, oftentimes people will take pictures of their computer screen with their phones and use those images, rather than the actual original image file to run the computer vision on.
Do you know what kind of transformations I can make to my already labeled image dataset to emulate what it would look like if someone used a cellphone to take a picture of a screen?
Like some qualities demonstrated below in the sample image of my screen for this question:
I guess this is what I'm thinking so far conceptually, but I'm not sure what libraries to use in Python:
The image resolution will probably drop, so downscaling it to something closer to what a phone camera actually captures
Adding random color aberrations, since photos of screens tend to show those rainbow-like moiré fringes
Warping the viewing angle, since a handheld photo is rarely taken perfectly square-on to the screen
Adding a faint pixel-looking grid to the images to make them look more like photos taken of screens.
Is there anything I missed, and do y'all have any library recommendations or starting code to help me? I really want to avoid relabelling all of my data...
Thanks in advance!
I found this: https://graphicdesign.stackexchange.com/a/14001
It seems to be exactly what I'm looking for, but how do I translate this into code? Any library recommendations?
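A rough sketch of what those transforms might look like in plain OpenCV/NumPy; the parameter ranges are made up, and note that the perspective warp has to be applied to any coordinate labels too (the same homography works on box corners via cv2.perspectiveTransform):

    import cv2
    import numpy as np

    def simulate_screen_photo(img, rng=np.random):
        """img: 8-bit BGR image. Returns (augmented image, homography used)."""
        h, w = img.shape[:2]

        # 1) Resolution loss: downscale then upscale to soften fine detail.
        scale = rng.uniform(0.4, 0.7)
        small = cv2.resize(img, (int(w * scale), int(h * scale)),
                           interpolation=cv2.INTER_AREA)
        img = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)

        # 2) Crude color fringing: shift the red and blue channels by a pixel or two.
        b, g, r = cv2.split(img)
        r = np.roll(r, rng.randint(-2, 3), axis=1)
        b = np.roll(b, rng.randint(-2, 3), axis=0)
        img = cv2.merge([b, g, r])

        # 3) Faint pixel grid, imitating the subpixel structure of a screen.
        grid = np.ones_like(img, dtype=np.float32)
        grid[::3, :] *= 0.9
        grid[:, ::3] *= 0.9
        img = np.clip(img.astype(np.float32) * grid, 0, 255).astype(np.uint8)

        # 4) Mild perspective warp (camera not square-on to the screen).
        jitter = 0.05 * min(h, w)
        src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
        dst = (src + rng.uniform(-jitter, jitter, size=(4, 2))).astype(np.float32)
        H = cv2.getPerspectiveTransform(src, dst)
        img = cv2.warpPerspective(img, H, (w, h))
        return img, H  # reuse H on label coordinates with cv2.perspectiveTransform

Libraries such as albumentations and imgaug also ship ready-made perspective, downscale, and noise transforms if you'd rather not hand-roll these.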
A little background: I am trying to get a qualitative/quantitative judgement on whether there exists a useful solution (if any) that a convolutional neural network can arrive at for a set of synthetic images containing 3 classes.
Now, I am trying to run t-SNE on a folder containing 3195 RGB images of resolution 256x256.
The first question I would like to ask is: am I converting my image folder into an appropriate format for use with t-SNE? The Python code can be seen here: https://i.stack.imgur.com/79gNy.png.
Secondly, I managed to get the t-SNE to run, although I am not sure if I am using it correctly, which can be seen here: https://i.stack.imgur.com/ZtOlR.png. The source code is basically just a slight modification of Alexander Fabisch's MNIST example on Jupyter Notebook (apologies, I cannot post more than two links since my reputation is <10).
So, I would like to ask whether there is anything blatantly wrong with forcing a t-SNE setup built for the MNIST dataset onto a set of RGB images?
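For reference, a minimal sketch of one reasonable way to feed an RGB image folder to scikit-learn's t-SNE: flatten each image to a vector and, optionally, compress with PCA first, since t-SNE is slow on ~200k-dimensional inputs. The folder path and the PCA size are placeholders:

    import glob
    import numpy as np
    from PIL import Image
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    # Load every image as one flat row vector: (n_images, 256 * 256 * 3).
    paths = sorted(glob.glob("images/*.png"))      # placeholder path
    X = np.stack([np.asarray(Image.open(p).convert("RGB"), dtype=np.float32).ravel() / 255.0
                  for p in paths])

    # PCA down to ~50 dimensions before t-SNE is the usual recipe for image data.
    X_reduced = PCA(n_components=50).fit_transform(X)
    X_embedded = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(X_reduced)
    print(X_embedded.shape)                        # (n_images, 2)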
Lastly, I ran into a difficulty with the code in the second imgur link posted above, specifically this snippet:
    imagebox = offsetbox.AnnotationBbox(
        offsetbox.OffsetImage(X[i].reshape(256, 256)), X_embedded[i])
The first argument to offsetbox.AnnotationBbox is a 256x256 image (because that's my image resolution), which basically covers up my entire screen and obscures the results, but I get an error when I try to change it:
ValueError: total size of new array must be unchanged
So, how can I reduce the size of the images being plotted? (Or is there another way to work around the issue?)
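A hedged note for anyone reading later: since these are RGB images, the reshape needs a third channel dimension, and matplotlib's OffsetImage takes a zoom parameter that shrinks the thumbnail without touching the data. Roughly (X, X_embedded, ax, and the loop index i are assumed to exist from the earlier snippet):

    from matplotlib import offsetbox

    thumb = offsetbox.OffsetImage(X[i].reshape(256, 256, 3),  # RGB needs the third axis
                                  zoom=0.15)                  # scale the thumbnail down
    imagebox = offsetbox.AnnotationBbox(thumb, X_embedded[i])
    ax.add_artist(imagebox)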
Well... solved everything using the C++ code provided for bh-tsne. Kindly close this thread; apologies for any inconvenience caused.
I'm a beginner in OpenCV with Python. I have many 16-bit grayscale images and need to detect the same object every time in the different images. I tried template matching in OpenCV with Python, but I needed different templates for different images, which is not desirable. Can anyone suggest an algorithm in Python to do this efficiently?
Your question is way too general. Feature matching is a very vast field.
The type of algorithm to be used totally depends on the object you want to detect, its environment etc.
So if your object won't change its size or angle in the image, then use template matching.
If the object will change its size and orientation, you can use SIFT or SURF.
If your object has unique color features that distinguish it from its background, you can use HSV thresholding.
If you have to detect a whole class of objects, for example all cricket bats, then you can train on a number of positive images to show the computer what the object looks like and negative images to show what it doesn't; this can be done with Haar cascade training.
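As a concrete starting point for the first option, here is a minimal template-matching sketch using the modern cv2 API. Note that matchTemplate expects 8-bit or 32-bit float inputs, so the 16-bit frames are rescaled first; the file names and the 0.7 threshold are placeholders:

    import cv2
    import numpy as np

    # Load a 16-bit grayscale frame and template, then rescale both to 8-bit.
    frame16 = cv2.imread("frame.tif", cv2.IMREAD_UNCHANGED)      # placeholder path
    templ16 = cv2.imread("template.tif", cv2.IMREAD_UNCHANGED)   # placeholder path
    frame = cv2.normalize(frame16, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    templ = cv2.normalize(templ16, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # Normalised cross-correlation; the best match is the maximum of the score map.
    scores = cv2.matchTemplate(frame, templ, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    h, w = templ.shape[:2]
    if max_val > 0.7:                                            # confidence threshold
        x, y = max_loc
        cv2.rectangle(frame, (x, y), (x + w, y + h), 255, 2)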
You can try a sliding-window method if your object looks the same in all samples.
One way to do this is to look for known colors, shapes, and sizes.
You could start by performing an HSV threshold: convert your image to HSV colorspace and then call
    cv2.inRange(source, (minHue, minSat, minVal), (maxHue, maxSat, maxVal))
Next, you could use cv2.findContours to find all the areas in your image that meet your color requirements. Then, you could use methods such as boundingRect and contourArea to find specific attributes of the object that you want.
What you will end up with is essentially a 'pipeline' that can take a frame, and look for a shape that fits the criteria you have set. Depending on the complexity of what you want to do (you didn't say what you're looking for), this may or may not work, but I have used it with reasonable success.
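Here is a minimal sketch of that pipeline, assuming an 8-bit BGR frame; the HSV bounds and the area filter are placeholders to be tuned for the actual object:

    import cv2

    frame = cv2.imread("frame.png")                          # placeholder; 8-bit BGR
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Keep only the pixels inside the chosen hue/saturation/value range.
    mask = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))    # example: a greenish object

    # Find connected blobs in the mask and filter them by size (OpenCV 4 return values).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 200:                         # ignore small noise blobs
            continue
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)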
GRIP is an application that allows you to threshold things in a visual way, and it will also generate Python code for you if you want. I don't really recommend using the generated code as-is because I've run into some problems that way. Here's the link to GRIP: https://github.com/WPIRoboticsProjects/GRIP
If the object you want to detect has a different size in every image and also varies slightly in shape, then I recommend using a Haar cascade for that object. If the object is very general, you can easily find a Haar cascade for it online. Otherwise, it is not very difficult to make Haar cascades yourself (it can be a little time consuming, though).
You can use this tutorial by sentdex to make a Haar cascade here.
Or, if you want to know how to use Haar cascades, you can find that at this link here.
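For the "how to use" side, the detection code is short; a sketch with the standard cv2 API, assuming you already have a trained cascade.xml:

    import cv2

    cascade = cv2.CascadeClassifier("cascade.xml")           # your trained cascade file
    gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)     # placeholder image

    # Returns (x, y, w, h) boxes; tune scaleFactor/minNeighbors for your data.
    detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in detections:
        cv2.rectangle(gray, (x, y), (x + w, y + h), 255, 2)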
I'm writing an OCR application to read characters from a screenshot image. Currently, I'm focusing only on digits. I'm partially basing my approach on this blog post: http://blog.damiles.com/2008/11/basic-ocr-in-opencv/.
I can successfully extract each individual character using some clever thresholding. Where things get a bit tricky is matching the characters. Even with fixed font face and size, there are some variables such as background color and kerning that cause the same digit to appear in slightly different shapes. For example, the below image is segmented into 3 parts:
Top: a target digit that I successfully extracted from a screenshot
Middle: the template: a digit from my training set
Bottom: the error (absolute difference) between the top and middle images
The parts have all been scaled (the distance between the two green horizontal lines represents one pixel).
You can see that despite both the top and middle images clearly representing a 2, the error between them is quite high. This causes false positives when matching other digits -- for example, it's not hard to see how a well-placed 7 can match the target digit in the image above better than the middle image can.
Currently, I'm handling this by having a heap of training images for each digit, and matching the target digit against those images, one-by-one. I tried taking the average image of the training set, but that doesn't resolve the problem (false positives on other digits).
I'm a bit reluctant to perform matching using a shifted template (it'd be essentially the same as what I'm doing now). Is there a better way to compare the two images than simple absolute difference? I was thinking of maybe something like the EMD (earth movers distance, http://en.wikipedia.org/wiki/Earth_mover's_distance) in 2D: basically, I need a comparison method that isn't as sensitive to global shifting and small local changes (pixels next to a white pixel becoming white, or pixels next to a black pixel becoming black), but is sensitive to global changes (black pixels that are nowhere near white pixels become black, and vice versa).
Can anybody suggest a more effective matching method than absolute difference?
I'm doing all this in OpenCV using the C-style Python wrappers (import cv).
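One cheap way to get the kind of shift tolerance described above, without changing the overall approach: blur both images before taking the absolute difference, so one-pixel shifts and small local changes cost little while misplaced strokes still produce a large error. A sketch with the modern cv2 API (the question uses the old import cv bindings, so this is a translation, not the original code), assuming both images have been resized to the same shape:

    import cv2

    a = cv2.imread("extracted_digit.png", cv2.IMREAD_GRAYSCALE)   # placeholder paths
    b = cv2.imread("template_digit.png", cv2.IMREAD_GRAYSCALE)

    # Blurring spreads each stroke over a few pixels, so a small global shift
    # or a one-pixel boundary change barely increases the difference.
    a_blur = cv2.GaussianBlur(a, (5, 5), 0)
    b_blur = cv2.GaussianBlur(b, (5, 5), 0)
    error = cv2.norm(a_blur, b_blur, cv2.NORM_L1)
    print("blurred L1 error:", error)      # compare this score across candidate digits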
I would look into using Haar cascades. I've used them for face detection/head tracking, and it seems like you could build up a pretty good set of cascades with enough '2's, '3's, '4's, and so on.
http://alereimondo.no-ip.org/OpenCV/34
http://en.wikipedia.org/wiki/Haar-like_features
OCR on noisy images is not easy, so simple approaches do not work well.
So, I would recommend using HOG to extract features and an SVM to classify. HOG seems to be one of the most powerful ways to describe shapes.
The whole processing pipeline is implemented in OpenCV; however, I do not know the function names in the Python wrappers. You should be able to train with the latest haartraining.cpp, which actually supports more than Haar features (HOG and LBP as well).
And I think the latest code (from trunk) is much improved over the official release (2.3.1).
HOG usually needs just a fraction of the training data used by other recognition methods; however, if you want to classify shapes that are partially occluded (or missing), you should make sure to include some such shapes in training.
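A minimal sketch of that pipeline using cv2.HOGDescriptor plus a scikit-learn SVM; the 20x20 window and the tiny placeholder dataset are assumptions for illustration, not the answer's own code:

    import cv2
    import numpy as np
    from sklearn.svm import SVC

    # HOG for small 20x20 digit patches: (window, block, block stride, cell, bins).
    hog = cv2.HOGDescriptor((20, 20), (10, 10), (5, 5), (5, 5), 9)

    def features(patch):
        """patch: 8-bit grayscale digit image resized to 20x20."""
        return hog.compute(patch).ravel()

    # Tiny placeholder dataset; in practice these come from your labelled digit patches.
    X_imgs = [np.full((20, 20), v, dtype=np.uint8) for v in (0, 255)]
    y = [0, 1]

    X = np.array([features(im) for im in X_imgs])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, y)

    # Classify a newly extracted digit the same way:
    # pred = clf.predict(features(new_patch).reshape(1, -1))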
I can tell you from my experience and from reading several papers on character classification that a good way to start is by reading about Principal Component Analysis (PCA), Fisher's Linear Discriminant Analysis (LDA), and Support Vector Machines (SVMs). These are classification methods that are extremely useful for OCR, and it turns out that OpenCV already includes excellent implementations of PCA and SVMs. I haven't seen any OpenCV code examples for OCR, but you can use some modified version of face classification to perform character classification. An excellent resource for face recognition code for OpenCV is this website.
Another library for Python that I recommend is scikits.learn. It is very easy to send cvArrays to scikits.learn and run machine learning algorithms on your data. A basic example for OCR using SVM is here.
Another more complicated example using manifold learning for handwritten character recognition is here.
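For reference, the kind of basic SVM-for-digits example the answer points at, reconstructed against the current scikit-learn package (the old scikits.learn import path has since changed to sklearn):

    from sklearn import datasets, svm
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # 8x8 grayscale digit images shipped with scikit-learn, flattened to 64-d vectors.
    digits = datasets.load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.25, random_state=0)

    clf = svm.SVC(gamma=0.001)        # classic parameters from the sklearn digits example
    clf.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))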