I found this guide which teaches how to refine the orientation of objects from images. I would love to know if it can and should be used to analyze the orientation of objects displayed in video streams.
The basis for the work is from the scientific publication found in this video. I want to know how they got information about the direction of the Fish's face.
Thanks,
Avishai
You will probably need library like opencv to get orientation information from the image. You can apply threshold after converting this image to grayscale and extract contour of the image. After that you need to follow something like below pattern to get orientation. Very easy, just a little bit search you can find a lot of similar examples as well.
rectangle_for_angle = cv2.minAreaRect(cntrs[0])
angle = rectangle_for_angle[-1]
rect_points = cv2.boxPoints(rectangle_for_angle)
rect_points_result = np.int0(rect_points)
#You can also draw rotated image
cv2.drawContours(image,[rect_points_result],0,(0,0,255),2)
Related
I am trying different image alignment approaches to align the images containing texts using Computer Vision. I have tested following image alignment approaches:
Probabilistic Houghlines Transform to align images according to the detected lines. https://medium.com/p/97b61eeffb20 is my implementation. But that didn't help me as expected.
Implemented SIFT and ORB to detect and align images according to the template image but instead of aligning all images, it distorts the image sometimes. I have used https://pyimagesearch.com/2020/08/31/image-alignment-and-registration-with-opencv/ as a reference.
Edge detection followed contour detection, corner detection and perspective transformation. But it doesn't work with images having different background types. This is the reference example https://pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/
morphology followed by contour detection and masking. Reference Crop exactly document paper from image
Trained the YOLO(You only look once) object detector to detect the documents but it detects the bounding box, my requirement is Quardilaterl with four image corners from which I can align documents using perspective transform.
Calculating the skewness and deskewing. Reference: https://github.com/sbrunner/deskew
But I couldn't align the document(identity documents such as citizenship, passport, license etc) images with different backgrounds perfectly using the above approaches.
This is a sample test image(important information are hidden due to privacy issue).
Is there are any other approaches of image alignment which can align the document images perfectly by correcting the skewness of the available text. My main focus is to extract the information form document using OCR preserving the information sequence in the document image.
Thank you!
To me, the third approach seems to be the most promising. But as you said, a cluttered background is a problem. Two ideas came to me about this:
Implementing a GUI as a fallback solution, so the user could select the contour.
Render some artificial dataset of official documents against a cluttered background and train a CNN to predict a segmentation map of the document. This map could be used then, as an initialization for the edge detection / contour detection. This answer contains two links to databases of images of official documents. Maybe these are of some use for you.
My goal is to transform an image captured by a camera and transform that image to orthographical image without effects of perspective.
I have a few objects of known size on a surface. I have a camera, placed above and directed to those objects, as exemplified in the scene. The camera is capturing images as in image captured by the camera. I want to get an orthographical image of the environment as in orthographical image I want to get.
I have read few posts, but did not really understand their relevance to my problem, as I am not expert on these transforms. The answer from this question made me think it is possible, although I did not get how.
I would appreciate a clear explanation or pointing a clear tutorial, using Python or Lua if possible.
Any help is appreciated.
This was not possible without distorting the image. A straightforward explanation is that the perspective causes some parts of the image to be not visible, for example the white line in the marked area is not visible, and there could be something small that we are not able to observe. For those parts, the algorithm is supposed to produce some kind of prediction based on heuristics.
Is there any predefined code for this or I have to write my own code?
Also, I do not have the camera properties for this, I have only the image taken in fisheye lens and now I have to flatten the images
OpenCV provides a module for working with fisheye images: https://docs.opencv.org/3.4/db/d58/group__calib3d__fisheye.html
This is a tutorial with an example application.
Keep in mind that your task might be a bit hard to achieve since the problem is under-determined. If you have some cues in the image (such as straight lines), that might help. Otherwise, you should seek a way of getting more information about the lens. If it's a known lens type, you might find calibration info online. Also, some images might have the lens used to capture them in the EXIF data.
I am trying to extract a subimage from a scanned paper like this:
https://cloud.kopa.ch/index.php/s/gGZm5xeMYlPfU81
The extracted images should be georeferenced and added to a webmap service, but thats not the question here.
How can I get the frame / its pixel coordinates to crop the image?
I am also free in creating the "layout" (similar to the example), which means I could add markers to get the frame better after scanning it again.
The workflow is:
generate layout - print map - draw on the map - scan it - crop "map-frame" - georeferencing this frame - show it on a webmap
The "map-frames" are preprocessed and I know their location/extent
Has anybody an idea how to crop the (scanned) images automatically to this "map-frame"?
I have to work with python and have the packages PIL, pillow and imagemagick for the image processing
Thanks for you help!
If you need more information, don't hesitate to ask
Here's an example I adapted form the Pillow docs, check them out for any further processing that you might need to perform:
from Pillow import Image
Image.open("/path/to/image.jpg")
box = (100, 100, 400, 400)
region = im.crop(box)
Also, it might prove valuable to search Stack Overflow for this kind of operation, I'm sure it has been discussed earlier.
As for finding the actual rectangle to crop you'll have to do some form of image analysis. In it's simplest form, conceptually that could be something along these lines:
Applying an S-curve filter to a black-and-white representation of your image
Iterate over all of the pixels in the image
Keep track of horizontal and vertical lines that has sufficiently black pixel values.
Use this data to determine the bounding box of the portion of the image your interested in.
Depending on your needs you might want to look into some computer vision library instead, which are well optimized for this and similar tasks. The one that springs to mind is OpenCV which is I would guess is well optimized and documented, and there's a python module available as well.
I am trying to add two images of different sizes using bitwise operations in OpenCV using python. I want a particular point in Image1(an image of face of a person) to coincide with a particular point in Image2(image of a spectacle frame). The particular points are not the cornermost points of the images.I know the 2 mid points of the frame glasses and the pupil of the eyes. I want the frame mid points to coincide with the pupil points of the eyes in the face. The code which I am using adds the second image's leftmost corner point to the specific point of Image1 as in Line 10, whereas i want the mid point of left glass frame to be added.
The face image can be any random image and the spectacle image is as -
I am using the code:
import cv2
import numpy as np
img_frame = cv2.imread('image1.jpg',1)
img_in = cv2.imread('face.jpg',1)
new_image = np.zeros(img_frame.shape,dtype=np.uint8)
i,j,k = img_frame.shape
for ii in range (1,i):
for jj in range (1,j):
pixel = img_frame[ii,jj]
img_in[339+ii,468+jj] = pixel
cv2.imwrite('pc2_with_frame_7.jpg',img_in)
cv2.imshow('win',img_in)
cv2.waitKey(0)
cv2.destroyWindow('win')
Any kind of help would be appreciated.
Thank you.
Ok, it seems nobody else much can help so I will offer what I can...
What you are trying to do is called alpha-compositing. You can read about it here on Wikipedia and also here in the OpenCV documentation.
My tool of choice for this would be ImageMagick, which is free and has Perl, Python, C/C++ bindings as well as command-line tools. If I start with this photo (face.jpg):
and take your glasses.jpg file and convert it to a PNG with transparency, whcih looks like this:
I can run the following ImageMagick command at the Terminal
composite glasses.png face.jpg out.jpg
and I get this:
It seems that OpenCV has problems maybe with transparency, and a solution is presented here. If you want to try the masking method suggested by #ypnos in that post, I have made you the necessary input files and you can download them from my website at:
glasses.png with alpha channel
input-mask.png