I am looking for a way to get the largest possible rectangle inside a polygon.
Currently I am using ArcPy for ArcGIS (a Python library), but there is no out-of-the-box solution for this. Instead there is a tool named Minimum Bounding Geometry, which returns the opposite result (a rectangle that contains the polygon):
https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/minimum-bounding-geometry.htm
Example of the required result:
The best I've found is described in an academic paper titled "Finding the largest area rectangle of arbitrary orientation in a closed contour". Available in PDF, but copyright precludes me from linking the doc, which you can get to via the publisher https://www.sciencedirect.com/science/article/abs/pii/S0096300312003207.
The algorithm is O(N^3) and the authors claim there is no other with lower time complexity (which does NOT imply that it is most efficient for all use cases).
(My implementation (C#) was for a client who owns the code, so you will have to roll your own version unless it's been open sourced in the meantime.)
I'm blacking out some information in several PDFs, but in some of them the rectangles made by the "draw_rect" function aren't being drawn correctly. I have checked the rectangles and they look right; moreover, I'm also using "add_redact_annot" with the exact same rectangle and it works fine.
def hide_text_rects(page, rects):
    for rect in rects:
        page.add_redact_annot(rect)
        page.draw_rect(rect, color=(0, 0, 0), fill=(0, 0, 0))
The rectangles seem to be mirrored and zoomed (scaled). I really don't know what to do because I can't find any related info in the docs.
Edit: I found that the PDFs with version 1.7 are the ones working correctly, and the other ones are version 1.5.
The probable reason for this behaviour is a sloppy specification of the page's coordinate system.
For example the standard point (0,0) = bottom-left in PDF may have been redefined to be top-left.
If this type of coordinate change is not wrapped within PDF stacking operators q / Q (as it should), then any insertions (of text, drawing etc.) appended to the page /Contents act under wrong assumptions, and appear dislocated.
Fix this by calling page.clean_contents() before doing any insertion.
You can also check whether this is required at all via page.is_wrapped. Please also consult the documentation, which has a dedicated section on this.
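Applied to the function from the question, the check could look roughly like this. It is only a sketch: it assumes page is a PyMuPDF page object and uses nothing beyond the attributes and methods already mentioned here (is_wrapped, clean_contents, add_redact_annot, draw_rect).

```python
def hide_text_rects(page, rects):
    """Black out rects on a PyMuPDF page, cleaning the content stream first if needed."""
    # is_wrapped is False when the existing /Contents are not enclosed in q / Q,
    # in which case later insertions may inherit a modified coordinate system.
    if not page.is_wrapped:
        page.clean_contents()
    for rect in rects:
        page.add_redact_annot(rect)
        page.draw_rect(rect, color=(0, 0, 0), fill=(0, 0, 0))

# Usage, assuming PyMuPDF is installed and "input.pdf" is a placeholder path:
#   import fitz
#   doc = fitz.open("input.pdf")
#   hide_text_rects(doc[0], rects)
```

Checking is_wrapped first avoids rewriting the content stream of pages that are already fine.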
I intend to make a 3D model based on multi-view stereo images (basically 2D plane images of the same object from different angles and orientations) inside Blender from scratch. However, I am new to Blender.
I wanted to know if there are any tutorials on how to project a single pixel or point into the space of Blender's 3D environment using Python. If not a tutorial, any documentation would do. I am still learning about this whole 3D construction thing and am pretty new to it, so I am not sure: maybe these points are represented using a 3-dimensional matrix/array?
Basically I want to implement 3D construction based on a paper written by some researchers. Mostly every such project is in C++. I want to do it in Python in Blender, and if I am capable enough, make these libraries open source.
Please suggest any prerequisites you think would help me. I have just started the 3rd year of my BSc Computer Science course, and I am very new to the world of Computer Graphics.
(My skillset is C, Java and Python.)
I would be very glad and appreciate any help.
Thank You
Link to website: https://vision.in.tum.de/research/image-based_3d_reconstruction/multiviewreconstruction
Yes, it can very likely be done in Blender, and in Python at least for small geometries / low resolution.
A valid approach for the kind of scenarios you seem to want to play with is based on the idea of "space carving" or "silhouette projection". A good description is in an old paper by Kutulakos and Seitz, which was based in part on earlier work by Szeliski.
Given a good estimation of the silhouettes, these methods can correctly reconstruct all convex portions of the object's surface, and the subset of concavities that are resolved in the photo hull. The remaining concavities are "patched" over and need to be reconstructed using a different method (e.g. stereo, or structured light). For the surfaces that can be reconstructed, space carving is generally more robust than stereo (since it is insensitive to the color and surface texture of the object), and can work on surfaces where structured light struggles (e.g. surfaces with specularities, or very dark objects with low reflectance for a laser stripe).
The basic idea is to use the silhouettes of the projection of the object in cameras around it to "remove" mass from an initial volume (e.g. a box) encompassing the object, a bit like a sculptor carving a statue by removing material from a block of marble.
Computationally, you can do it by representing the volume of space of interest using an octree, initialized with a minimal level of subdivision, and then progressively refined. The refinement consists of projecting the vertices of the octree leaves in the cameras, and identifying which leaves are completely outside or partially inside the silhouettes. The former are pruned, while the latter are split, and the process continues until no more leaves can be split or a maximum level of subdivision is reached. The hull of the octree is then extracted as a "watertight" mesh using standard methods.
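To make the carving step concrete, here is a minimal pure-Python sketch. Note the simplifications, all of which are my assumptions and not what a real implementation would do: it uses a flat voxel grid instead of an octree (so there is no pruning/splitting of leaves), the "cameras" are toy orthographic projections, and silhouettes are plain sets of pixels.

```python
from itertools import product

def carve(voxels, views):
    """Return the voxels whose projection falls inside every silhouette.

    views is a list of (project, silhouette) pairs: project maps an (x, y, z)
    voxel to a pixel, and silhouette is the set of pixels covered by the object.
    """
    kept = set(voxels)
    for project, silhouette in views:
        # A voxel that projects outside this silhouette cannot belong to the object.
        kept = {v for v in kept if project(v) in silhouette}
    return kept

# Toy setup (hypothetical): a 4x4x4 block seen by two orthographic cameras.
block = set(product(range(4), repeat=3))
square = {(1, 1), (1, 2), (2, 1), (2, 2)}
views = [
    (lambda v: (v[0], v[1]), square),  # camera looking along the z axis
    (lambda v: (v[1], v[2]), square),  # camera looking along the x axis
]
hull = carve(block, views)  # the 2x2x2 cube in the middle of the block
```

In Blender you would replace the lambdas with the real camera projections and turn the surviving voxels into a mesh; an octree only changes how the volume is stored and refined, not the test itself.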
Apart from the above paper, a far more detailed description can be found in an old patent by Geometrix - it sold a scanner based on the above ideas around the year 2000. Here is what it looked like:
I have an array made of 1 and 0 (image below), and I am working on a Python script that detects the borders of the central region (the big white blob) and marks all the internal points as 1. How would you do it?
I wrote a piece of code that does repeated connectivity search, but this doesn't seem the way to go - the region changes shape and new areas are added.
As I can't post a comment, I am putting this here.
I had a problem close to yours: I wanted to select several holes and then calculate the area, the roundness...
What I did was to use the Java implementation of Python (Jython), which let me use a library called ImageJ that is dedicated to image processing (everything is included in Fiji). Navigating the library is a bit tedious, but it is a powerful one.
Here is the wand tool: http://rsbweb.nih.gov/ij/developer/api/ij/gui/Wand.html
Have a look here for how to get the pixels of a ROI: http://fiji.sc/Introduction_into_Developing_Plugins#ImageJ.27s_API
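If you would rather stay in plain Python, the "mark all internal points as 1" part can also be done with a flood fill from the image border. This is just a sketch of that alternative, not what the wand tool does internally; it assumes the array is a list of lists of 0/1, uses 4-connectivity, and fills the holes of every region, not only the central one. The function name is made up for the example.

```python
from collections import deque

def fill_enclosed(grid):
    """Return a copy of a 0/1 grid with every 0 not connected to the border set to 1."""
    rows, cols = len(grid), len(grid[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every background pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and grid[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))
    # Breadth-first flood fill through 4-connected background pixels.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and not outside[nr][nc]):
                outside[nr][nc] = True
                queue.append((nr, nc))
    # Whatever the fill never reached is either foreground or an enclosed hole.
    return [[0 if outside[r][c] else 1 for c in range(cols)] for r in range(rows)]
```

Because the fill starts from the outside, it does not matter how the region changes shape, as long as its border stays closed.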
I have written a program in Python which automatically reads score sheets like this one
At the moment I am using the following basic strategy:
1. Deskew the image using ImageMagick
2. Read into Python using PIL, converting the image to B&W
3. Calculate the sums of pixels in the rows and the columns
4. Find peaks in these sums
5. Check the intersections implied by these peaks for fill.
The result of running the program is shown in this image:
You can see the peak plots below and to the right of the image shown in the top left. The lines in the top left image are the positions of the columns and the red dots show the identified scores. The histogram bottom right shows the fill levels of each circle, and the classification line.
The problem with this method is that it requires careful tuning, and is sensitive to differences in scanning settings. Is there a more robust way of recognising the grid, which will require less a-priori information (at the moment I am using knowledge about how many dots there are) and is more robust to people drawing other shapes on the sheets? I believe it may be possible using a 2D Fourier Transform, but I'm not sure how.
I am using the EPD, so I have quite a few libraries at my disposal.
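For reference, the row/column projection part of this strategy boils down to something like the following. This is a stripped-down illustration rather than the actual program: it assumes the image is already a list of lists of 0/1 pixels (1 = ink), the function names are made up for the example, and the peak picker only accepts strict local maxima above a hand-chosen threshold.

```python
def projection_profiles(image):
    """Row and column sums of a 2D list of 0/1 pixels (1 = ink)."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums

def find_peaks(profile, threshold):
    """Indices of strict local maxima of the profile that reach the threshold."""
    peaks = []
    for i, value in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i < len(profile) - 1 else float("-inf")
        if value >= threshold and value > left and value > right:
            peaks.append(i)
    return peaks

# Toy sheet (hypothetical): four single-pixel dots on a 2x2 grid.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
row_sums, col_sums = projection_profiles(image)
grid_rows = find_peaks(row_sums, threshold=1)
grid_cols = find_peaks(col_sums, threshold=1)
```

The threshold is exactly the kind of parameter that needs the careful tuning mentioned above; on real scans the profiles would also need smoothing, since thick dots produce flat-topped peaks that a strict comparison misses.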
First of all, I find your initial method quite sound and I would have probably tried the same way (I especially appreciate the row/column projection followed by histogramming, which is an underrated method that is usually quite efficient in real applications).
However, since you want to go for a more robust processing pipeline, here is a proposal that can probably be fully automated (also removing at the same time the deskewing via ImageMagick):
1. Feature extraction: extract the circles via a generalized Hough transform. As suggested in other answers, you can use OpenCV's Python wrapper for that. The detector may miss some circles but this is not important.
2. Apply a robust alignment detector using the circle centers. You can use Desolneux's parameter-less detector described here. Don't be put off by the math, the procedure is quite simple to implement (and you can find example implementations online).
3. Get rid of diagonal lines by a selection on the orientation.
4. Find the intersections of the lines to get the dots. You can use these coordinates for deskewing by assuming ideal fixed positions for these intersections.
This pipeline may be a bit CPU-intensive (especially step 2 that will proceed to some kind of greedy search), but it should be quite robust and automatic.
The correct way to do this is to use Connected Component analysis on the image, to segment it into "objects". Then you can use higher level algorithms (e.g. a Hough transform on the components' centroids) to detect the grid and also determine for each cell whether it's on/off, by looking at the number of active pixels it contains.
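A bare-bones version of the connected component step might look like this. It is only a sketch under simplifying assumptions: the image is a list of lists of 0/1 pixels, components are 4-connected, and the function name is invented for the example (image processing libraries ship optimized labelling routines that you would use in practice).

```python
from collections import deque

def connected_components(image):
    """Label 4-connected groups of 1-pixels; return (centroid, pixel_count) per group."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r0 in range(rows):
        for c0 in range(cols):
            if image[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Grow a new component from this unvisited foreground pixel.
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and image[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            n = len(pixels)
            centroid = (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)
            components.append((centroid, n))
    return components
```

The centroids are what you would feed to the grid detection, and the pixel counts give the on/off decision for each cell.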
I'd like to determine the position and orientation of a stereo camera relative to its previous position in world coordinates. I'm using a bumblebee XB3 camera and the motion between stereo pairs is on the order of a couple feet.
Would this be on the correct track?
Obtain rectified image for each pair
Detect/match feature points in the rectified images
Compute Fundamental Matrix
Compute Essential Matrix
Thanks for any help!
Well, it sounds like you have a fair understanding of what you want to do! Having a pre-calibrated stereo camera (like the Bumblebee) will then deliver up point-cloud data when you need it - but it also sounds like you basically want to also use the same images to perform visual odometry (certainly the correct term) and provide absolute orientation from a last known GPS position, when the GPS breaks down.
First things first - I wonder if you've had a look at the literature for some more ideas: As ever, it's often just about knowing what to google for. The whole idea of "sensor fusion" for navigation - especially in built up areas where GPS is lost - has prompted a whole body of research. So perhaps the following (intersecting) areas of research might be helpful to you:
Navigation in 'urban canyons'
Structure-from-motion for navigation
SLAM
Ego-motion
Issues you are going to encounter with all these methods include:
Handling static vs. dynamic scenes (i.e. ones that change purely based on the camera motion - c.f. others that change as a result of independent motion occurring in the scene: trees moving, cars driving past, etc.).
Relating amount of visual motion to real-world motion (the other form of "calibration" I referred to - are objects small or far away? This is where the stereo information could prove extremely handy, as we will see...)
Factorisation/optimisation of the problem - especially with handling accumulated error along the path of the camera over time and with outlier features (all the tricks of the trade: bundle adjustment, ransac, etc.)
So, anyway, pragmatically speaking, you want to do this in python (via the OpenCV bindings)?
If you are using OpenCV 2.4 the (combined C/C++ and Python) new API documentation is here.
As a starting point I would suggest looking at the following sample:
/OpenCV-2.4.2/samples/python2/lk_homography.py
Which provides a nice instance of basic ego-motion estimation from optic flow using the function cv2.findHomography.
Of course, this homography H only applies if the points are co-planar (i.e. lying on the same plane under the same projective transform - so it'll work on videos of nice flat roads). BUT - by the same principle we could use the Fundamental matrix F to represent motion in epipolar geometry instead. This can be calculated by the very similar function cv2.findFundamentalMat.
Ultimately, as you correctly specify above in your question, you want the Essential matrix E - since this is the one that operates in actual physical coordinates (not just mapping between pixels along epipoles). I always think of the Fundamental matrix as a generalisation of the Essential matrix by which the (inessential) knowledge of the camera intrinsic calibration (K) is omitted, and vice versa.
Thus, the relationships can be formally expressed as:
E = K'^T F K
So, you'll need to know something of your stereo camera calibration K after all! See the famous Hartley & Zisserman book for more info.
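Just to spell out the formula, here it is in plain Python with no OpenCV involved. The matrices are made-up toy values (not a real calibration), and I am assuming the usual convention where K belongs to the first view and K' to the second.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def essential_from_fundamental(F, K1, K2):
    """E = K2^T F K1, with K1 / K2 the intrinsics of the first / second view."""
    return matmul(matmul(transpose(K2), F), K1)

# Toy values (hypothetical): same intrinsics for both views.
K = [[2, 0, 0],
     [0, 2, 0],
     [0, 0, 1]]
F = [[0, -1, 2],
     [1, 0, -3],
     [-2, 3, 0]]
E = essential_from_fundamental(F, K, K)
```

With identity intrinsics E and F coincide, which is another way of seeing that F is just E with the calibration folded in.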
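You could then decompose the Essential matrix to recover your R orientation and t displacement (the latter only up to scale). Note that cv2.decomposeProjectionMatrix expects a 3x4 projection matrix rather than E, so the usual route is an SVD of E; newer OpenCV releases (3.0 and later) wrap this as cv2.decomposeEssentialMat and cv2.recoverPose.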
Hope this helps! One final word of warning: this is by no means a "solved problem" for the complexities of real world data - hence the ongoing research!