A little background: I am trying to get a qualitative/quantitative judgement on whether there is a useful solution (if any) that a convolutional neural network can arrive at for a set of synthetic images containing 3 classes.
Now, I am trying to run t-SNE on a folder containing 3195 RGB images of resolution 256x256.
The first question I would like to ask is: am I converting my image folder into an appropriate format for use with t-SNE? The Python code can be seen here: https://i.stack.imgur.com/79gNy.png.
Secondly, I managed to get t-SNE to run, although I am not sure whether I am using it correctly, which can be seen here: https://i.stack.imgur.com/ZtOlR.png. The source code is basically a slight modification of Alexander Fabisch's MNIST example on Jupyter Notebook (apologies, I cannot post more than two links since my reputation is below 10).
So, I would like to ask whether there is anything blatantly wrong with forcing a t-SNE setup built for the MNIST dataset onto a set of RGB images?
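For reference, a simplified sketch of the kind of preprocessing I mean, with a placeholder folder name and mostly default parameters (the exact code is in the screenshot linked above):

import glob
import numpy as np
from PIL import Image
from sklearn.manifold import TSNE

# Load every image in the folder and flatten it into one row per image.
paths = sorted(glob.glob("synthetic_images/*.png"))   # placeholder folder name/extension
X = np.array([np.asarray(Image.open(p).convert("RGB"), dtype=np.float32).ravel() / 255.0
              for p in paths])                        # shape: (n_images, 256*256*3)

# 2-D embedding, one (x, y) row per image.
X_embedded = TSNE(n_components=2, perplexity=30).fit_transform(X)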
Lastly, I encountered a difficulty with the code in the second imgur link posted above, specifically this part:
imagebox = offsetbox.AnnotationBbox(
    offsetbox.OffsetImage(X[i].reshape(256, 256)), X_embedded[i])
The first argument to offsetbox.AnnotationBbox is a 256x256 image (because that is my image resolution), which basically covers up my entire screen and obscures the results, but I get an error when I try to change it:
ValueError: total size of new array must be unchanged
So, how can I reduce the size of the images being plotted? (Or are there other ways to work around the issue?)
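For anyone hitting the same two issues: the reshape needs the RGB channel axis, and OffsetImage accepts a zoom argument that shrinks the thumbnail. A minimal sketch, with dummy arrays standing in for X and X_embedded:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import offsetbox

# Dummy stand-ins for the real data: flattened RGB images and their 2-D embedding.
X = np.random.rand(100, 256 * 256 * 3)
X_embedded = np.random.rand(100, 2)

fig, ax = plt.subplots()
ax.scatter(X_embedded[:, 0], X_embedded[:, 1], s=2)
for i in range(0, len(X), 10):                      # thin out the thumbnails a little
    thumb = X[i].reshape(256, 256, 3)               # RGB needs the channel axis in the reshape
    box = offsetbox.AnnotationBbox(
        offsetbox.OffsetImage(thumb, zoom=0.1),     # zoom shrinks the thumbnail on screen
        X_embedded[i], frameon=False)
    ax.add_artist(box)
plt.show()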
Well, solved everything using the C++ code provided for bh-tsne. Kindly close this thread; apologies for any inconvenience caused.
I'm trying to coordinate two systems: one that was already pre-trained on MuJoCo MsPacman-v0 and another that only supports the Gym version for training. With both systems working on the RGB image representations, the color palette discrepancy is problematic (Gym output left, expected right).
Is there a simple way to fix this (i.e. a pixel-mapping trick or some environment setting I'm not aware of), or is there something more involved that I have to do? Of note, the actual simulation I'm running uses Gym.
Heyo, sorry about that! Looks like I'm dumb.
Context: I was trying to incorporate the SPACE detection model into DreamerV2, and I didn't see the little footnote with SPACE:
"For some reason we were using BGR images for our Atari dataset and our pretrained models can only handle that. Please convert the images to BGR if you are to test your own Atari images with the provided pretrained models."
So yeah... if you see something like this, I guess this is what's wrong...
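For reference, the conversion itself is just a channel swap. A minimal sketch (the frame shape below is a dummy stand-in for whatever your pipeline actually produces):

import numpy as np

def rgb_to_bgr(frame: np.ndarray) -> np.ndarray:
    # Reverse the last (channel) axis of an H x W x 3 image: RGB <-> BGR.
    return frame[..., ::-1]

rgb_frame = np.zeros((210, 160, 3), dtype=np.uint8)  # dummy Atari-sized frame
bgr_frame = rgb_to_bgr(rgb_frame)                    # feed this to the BGR-pretrained model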
I will be using the values produced by this function as ground-truth labels for a computer vision task, where I will train a model using simulation data and test it using ArUco's real-world ones.
I calibrated my smartphone camera with a chessboard and got a reprojection error of 0.95. (I wasn't able to get below this; I tried all the possible options like ArUco and ChArUco, captured tens of images, and filtered out the bad ones, but nothing improved the error.) I read somewhere that this is expected from smartphones; if not, please let me know.
Now I placed my ArUco marker somewhere in the environment, captured images of it, and produced its pose relative to the camera using estimatePoseSingleMarkers.
Upon drawing the axes, everything looks perfect and accurate. However, when I compare the pose values with the ones generated from simulating the same environment with the same object and camera, the values are actually quite different, especially the z value.
I'm 100% sure that my simulation environment has no error, so I presume this gap is caused by ArUco.
Do you suggest any solution? How can I predict the error of ArUco?
Is there any other possible way to collect ground-truth labels?
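For concreteness, here is a minimal sketch of the pipeline I am describing, using the pre-4.7 cv2.aruco API (the intrinsics, dictionary, file name, and marker length below are placeholders rather than my actual values). Note that the translation comes back in the same units as the marker length passed in:

import cv2
import numpy as np

# Placeholder intrinsics; in practice these come from the chessboard calibration.
camera_matrix = np.array([[1000.0, 0.0, 640.0],
                          [0.0, 1000.0, 360.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)
marker_length = 0.05  # marker side length; tvec is returned in the same unit

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)  # placeholder dictionary
image = cv2.imread("marker_photo.jpg")                                 # placeholder file name
corners, ids, _ = cv2.aruco.detectMarkers(image, aruco_dict)
if ids is not None:
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_length, camera_matrix, dist_coeffs)
    print(tvecs[0].ravel())  # translation of the first detected marker, in the camera frame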
I want to make some kind of Python script that can do that kind of unwarping.
In my case I just want very simple unwarping, as follows:
Always having a similar background
Always placing the page at a similar position
Always having the same type of warped image
I tried the following methods, but they didn't work out.
I tried so many scanning apps, but no app can unwarp a 3D warp, for example Microsoft Office Lens.
I tried page_dewarp.py, but it does not work with pages that have spaces between blocks of text or separate text segments; most of the time, for that kind of image, it just reverses the curve from left to right or vice versa, and it is also unable to detect the actual text area, for example.
I found deep-learning-for-document-dewarping, which tries to solve this problem using pix2pixHD, but I am not sure it is going to work; the project does not provide trained models and does not currently solve the problem. Should I train a model with just the following training data, as described for pix2pixHD: train_A - warped input images and train_B - unwarped output images? I can generate training data by making warped and unwarped images in Blender 3D. That way I can generate many pairs from some scanned book pages by rendering both the unwarped image and a warped version of it, as if someone were photographing the pages, but virtually.
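If full Blender renders turn out to be more than is needed just to produce the pairs, a cheaper purely 2-D alternative would be to warp existing flat scans programmatically and keep the originals as the targets. A minimal sketch (the sinusoidal displacement and the file names are only illustrative; the train_A/train_B layout follows the pix2pixHD convention mentioned above):

import cv2
import numpy as np

def fake_page_curl(flat_page, amplitude=15.0):
    # Shift each row horizontally by a sine of its row index to imitate a page curl.
    h, w = flat_page.shape[:2]
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    map_x = xs + amplitude * np.sin(ys / h * np.pi)
    map_y = ys
    return cv2.remap(flat_page, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)

flat = cv2.imread("scanned_page.png")           # hypothetical flat scan
warped = fake_page_curl(flat)
cv2.imwrite("train_A/page_0001.png", warped)    # warped input
cv2.imwrite("train_B/page_0001.png", flat)      # unwarped target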
I want to write a script that converts unknown images (jpg, png, gif, bmp, tiff, etc.) to a specific resolution and format, as well as generating a thumbnail.
The problem is that a compression level that is totally fine for photos produces crap for exports of presentations, for example; so I want to vary the conversion settings based on the contents of the image.
Does anyone have experience in doing that kind of stuff in Python (or shell scripts whose output is easily parseable)?
My ideas are:
increase the contrast and check whether the histogram is left with only single spikes (a rough sketch of this follows below)
do a high-pass filtering of the image and check... what, exactly?
do recognition of known letters (along the lines of face recognition)
The goal is that the recognition should be quite fast (approx. 10 images/second) and quite easy to implement.
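A rough sketch of the first idea, just to make it concrete (the bin count, the number of top bins, and the 0.8 spike fraction are arbitrary guesses that would need tuning against real photos and slide exports):

import numpy as np
from PIL import Image

def looks_like_graphics(path, bins=64, top=5, spike_fraction=0.8):
    # Slides/screenshots tend to concentrate most pixels in a few histogram bins;
    # photos usually spread them out much more evenly.
    img = np.asarray(Image.open(path).convert("L"))
    hist, _ = np.histogram(img, bins=bins, range=(0, 255))
    return np.sort(hist)[-top:].sum() / hist.sum() >= spike_fraction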
This is a pretty trivial machine learning problem. I would research the MNIST dataset problems that teach you how to recognize handwritten characters; this process should be very similar. Check out this tutorial and see if you can modify it to recognize graphics vs. pictures. If your error rate ends up too high, you'll have to try more advanced machine learning techniques.
http://mxnet.io/tutorials/python/mnist.html
I have a set of images and I want to find out which images are of the same object. Here are the different ways in which the object may vary among the images:
Images of the object may be rotated 90, 180, or 270 degrees
The object may be in a different spot in the image, but always in full view
The object may be flipped within the image, either horizontally or vertically
I started by using the histogram of the image and MSE, but I am getting incorrect results because some of the objects have the same color distribution.
I am going for speed here, as my initial data set is 1000 images and will grow as the algorithm matures. So my initial thought was numpy/scipy, but I am rather lost here. I have no experience in this area. I have read through the other posts on SO, but they seem rather narrow in scope. Does anyone have experience or thoughts on how I could approach this?
Edit:
I cannot load any modules that are not part of the Python installation on the machine running the script. Anaconda is installed, so there are quite a few modules at my disposal, but no OpenCV.
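One numpy-only direction that fits these constraints is to brute-force the 8 rotation/flip variants and use phase correlation to ignore the object's position. A minimal sketch, assuming the images are already loaded as equally sized, square grayscale arrays (the 0.3 threshold is just a starting point that would need tuning):

import numpy as np
from itertools import product

def dihedral_variants(img):
    # The 8 rotations/flips allowed above: 0/90/180/270 degrees, mirrored or not.
    for flip, k in product((False, True), range(4)):
        yield np.rot90(np.fliplr(img) if flip else img, k)

def shift_invariant_score(a, b):
    # Phase correlation: the peak is near 1.0 when b is a translated copy of a.
    cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    cross /= np.abs(cross) + 1e-9
    return float(np.fft.ifft2(cross).real.max())

def same_object(a, b, threshold=0.3):
    return any(shift_invariant_score(a, v) >= threshold for v in dihedral_variants(b))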
Edit:
Attached example files and answer file (Example Files).