Object Counting on Images using OpenCV/YOLOv4 - python

I've been given an image containing stars and ovals, and have been tasked with detecting which is which and counting how many of each are contained within the image. One such image, containing only ovals, looks like this:
I first tried to solve the problem with OpenCV, following tutorials such as this one and this one.
However, I run into issues with both approaches when bounding the ovals: one results in a count of 1 oval, whereas the other results in a count of 330.
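For reference, the contour-counting approach such tutorials usually take looks roughly like this (a sketch; the file name, threshold value, and inversion are guesses that depend on the actual image):

import cv2

img = cv2.imread("shapes.png")  # hypothetical sample image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# binarize; assumes dark shapes on a light background
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
# external contours only, so nested edges are not double-counted
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print("count:", len(contours))

Counts as extreme as 1 or 330 usually mean the threshold either merged everything into one blob or picked up noise speckles, so the binarization step is worth inspecting first.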
I then tried using YOLOv4 (via cvlib), thinking that it would be more useful when dealing with two different classes (stars and ovals). I used the following code to try drawing bounding boxes on my sample image.
import cv2
import cvlib as cv  # detect_common_objects uses a detector pre-trained on the COCO classes
from cvlib.object_detection import draw_bbox
import matplotlib.pyplot as plt

img = cv2.imread(img_path)  # img_path: path to the sample image
box, label, conf = cv.detect_common_objects(img)  # third return value is confidence scores, not a count
output = draw_bbox(img, box, label, conf)
output = cv2.cvtColor(output, cv2.COLOR_BGR2RGB)  # BGR -> RGB for matplotlib
plt.figure(figsize=(10, 10))
plt.axis("off")
plt.imshow(output)
plt.show()
However, I received IndexError: Invalid index to scalar variable.
Can anyone point me in the right direction on how to proceed?
I first need to be able to do it for one class, and then multiple classes, before expanding into doing it for several images automatically.
Thanks

Related

Displaying several OpenCV images at the same time

I am trying to split an OpenCV frame image that I get from an input video stream
_, frame = cap.read()
into several smaller images and store them in an array. I don't know how many smaller images I will have beforehand; for example, I could split the image into 4 smaller images, or 8, 16, etc.
I want to create a function that allows me to display any arbitrary combination of the smaller images. Currently, it doesn't matter to me if they're being displayed in two separate windows or the same one (even though I would prefer them to be displayed in separate windows).
What I tried obviously doesn't work: looping over the list only displays the last image in the list:
# GridCells is the list that contains all the smaller images
def showCells(self):
    for c in self.GridCells:
        c.showC()
Where showC() is:
def showC(self):
    cv2.imshow('cell', self.image)
As mentioned, I don't know how many smaller images I will have beforehand, hence having arbitrarily many cv2.imshow() statements is not a solution.
Thank you for your time!
Try this to make OpenCV create a new window for each image, where each window has a different name.
You can use the enumerate() function to get a distinct index for each image, and the string formatter format() to build a different window name from the index passed to your showC function.
# GridCells is the list that contains all the smaller images
def showCells(self):
    for i, c in enumerate(self.GridCells):
        c.showC(i)

def showC(self, i):
    cv2.imshow("cell{}".format(i), self.image)
You are only displaying the last image because you are giving all your images the same window name, here:
cv2.imshow('cell',self.image)
If you give each image a different name ('cell1', 'cell2', 'cell3' etc) they should show up at the same time.
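One caveat worth adding: cv2.imshow only paints its windows while OpenCV's event loop runs, so after calling showCells you typically need something like this (a usage sketch):

self.showCells()
cv2.waitKey(0)           # process GUI events; without this the windows may stay blank
cv2.destroyAllWindows()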

Discover transformation required to align images of standardized documents

My question is not too far off from the "Image Alignment (ECC) in OpenCV ( C++ / Python )" article.
I also found the following article about facial alignment to be very interesting, but WAY more complex than my problem.
Wow! I can really go down the rabbit-hole.
My question is WAY more simple.
I have a scanned document that I have treated as a "template". In this template I have manually mapped the pixel regions that I require info from as:
area = (x1,y1,x2,y2)
such that x1<x2, y1<y2.
Now, these regions are, as is likely obvious, a bit too specific to my "template".
All other files that I want to extract data from are mostly shifted by some unknown amount such that their true area for my desired data is:
area = (x1 + ε1, y1 + ε2, x2 + ε1, y2 + ε2)
Where ε1, ε2 are unknown in advance.
But the documents are otherwise HIGHLY similar outside of this shift.
I want to discover, ideally through OpenCV, what translation is required (ignoring full Euclidean transforms for the time being) to "align" these images, so as to discover my ε values, shift my area, and parse my data directly.
I have thought about using tesseract to mine the text from the document and then parse from there, but there are check boxes, either filled or empty, that contain meaningful information for my problem.
The code I currently have for cropping the image is:
from PIL import Image
img = Image.open(img_path)
area = area_lookup['key']
cropped_img = img.crop(area)
cropped_img.show()
My two sample images are attached; we can assume the first image is my "template".
As you can see, the two images are very "similar" but one is moved slightly (human error). There may be cases where the rotation is more extreme, or the image is shifted more.
I would like to transform image 2 to be as aligned to image 1 as possible, and then parse data from it.
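For what it's worth, one way to estimate a pure translation with OpenCV is ECC maximization, along these lines (a sketch; the file names are placeholders, area is the tuple from above, and the sign convention of the recovered shift should be verified on real images):

import cv2
import numpy as np

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
moved = cv2.imread("shifted.png", cv2.IMREAD_GRAYSCALE)

# 2x3 warp matrix initialized to identity; MOTION_TRANSLATION estimates
# only the last column (the two shift terms)
warp = np.eye(2, 3, dtype=np.float32)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 1000, 1e-6)
# inputMask=None and gaussFiltSize=5 are passed positionally because some
# OpenCV 4.x builds require them
_, warp = cv2.findTransformECC(template, moved, warp,
                               cv2.MOTION_TRANSLATION, criteria, None, 5)

eps1, eps2 = warp[0, 2], warp[1, 2]  # the discovered shifts
x1, y1, x2, y2 = area                # the manually mapped template region
shifted_area = (x1 + eps1, y1 + eps2, x2 + eps1, y2 + eps2)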
Any help would be sincerely appreciated.
Thank you very much

cv2.HoughLinesP on a skeletonized image

I am trying to detect lines in a certain image. I run it through a skeletonization process before applying cv2.HoughLinesP. I used the skeletonization code here.
No matter what I try I keep getting results similar to what is described here i.e. 'only fragments of a line..'
As suggested by Jiby, I use the named notation for the parameters and also high rho and theta, but to no avail.
Here is my code:
lines = cv2.HoughLinesP(skel, rho=5, theta=np.deg2rad(10), threshold=0, minLineLength=0, maxLineGap=0)
Prior to this I threshold an RGB image to extract most of my 'blue' hollow rectangle, then convert it to grayscale, which I feed to the skeletonizer.
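Put together, the pipeline described here looks roughly like this (a sketch; the HSV range is a guess, and cv2.ximgproc.thinning from opencv-contrib-python stands in for the linked skeletonization code):

import cv2
import numpy as np

img = cv2.imread("rectangles.png")  # hypothetical input image

# isolate the blue hollow rectangle; the HSV bounds are guesses to tune
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (100, 80, 80), (130, 255, 255))

# skeletonize the binary mask (stand-in for the linked skeletonizer)
skel = cv2.ximgproc.thinning(mask)

lines = cv2.HoughLinesP(skel, rho=5, theta=np.deg2rad(10), threshold=0,
                        minLineLength=0, maxLineGap=0)

Note that threshold=0 accepts every accumulator cell, which on its own tends to produce exactly the kind of fragmented segments described.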
Please advise.

How do I numerically position a barcode in reportlab?

I have a bit of a specific case that I can't seem to figure out a solution to. I'm writing a shipping label template object in ReportLab for Python. I have the following code that creates a barcode as a drawing.
from reportlab.graphics.barcode import createBarcodeDrawing

uspsBarcode = createBarcodeDrawing('USPS_4State', value=self.imbval,
                                   routing=self.zip4.replace('-', ''))
print(uspsBarcode.getBounds())  # prints the drawing's position and scale
Using that code, I later add it to a shape group, and that shape group gets returned, so I need the barcode to be positioned relative to the object. I can't seem to find any way to pass positioning to it, even though I've dug through the inheritance. Yet, as you can see from the print output, positioning is set somewhere.
Just for anyone else who runs across this issue: it turns out that if you put the barcode drawing in a shape group, the shape-group container can be moved around numerically with the shift method.
from reportlab.graphics import shapes
from reportlab.graphics.barcode import createBarcodeDrawing

uspsBarcode = shapes.Group()
bc = createBarcodeDrawing('USPS_4State', value=self.imbVal,
                          routing=self.zip4.replace('-', ''))
uspsBarcode.add(bc)
# shift the whole group into place; s is a scale factor defined elsewhere in the class
uspsBarcode.shift(self.x + (s * 0.2), self.y)
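From there the group can be added to a Drawing and rendered as usual (a usage sketch; the page size and file name are arbitrary):

from reportlab.graphics.shapes import Drawing
from reportlab.graphics import renderPDF

d = Drawing(400, 200)
d.add(uspsBarcode)
renderPDF.drawToFile(d, "label.pdf")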

Categorize different images

I have a number of images from Chinese genealogies, and I would like to be able to programmatically categorize them. Generally speaking, one type of image has primarily line-by-line text, while the other type may be in a grid or chart format.
Example photos
'Desired' type: http://www.flickr.com/photos/63588871@N05/8138563082/
'Other' type: http://www.flickr.com/photos/63588871@N05/8138561342/in/photostream/
Question: Is there a (relatively) simple way to do this? I have experience with Python, but little knowledge of image processing. Direction to other resources is appreciated as well.
Thanks!
Assuming that at least some of the grid lines are exactly or almost exactly vertical, a fairly simple approach might work.
I used PIL to find all the columns in the image where more than half of the pixels were darker than some threshold value.
Code
from PIL import Image, ImageDraw

withlines = Image.open('withgrid.jpg')
nolines = Image.open('nogrid.jpg')

def findlines(image):
    image = image.convert('L')  # ensure a single-band grayscale image for the threshold
    w, h = image.size
    s = w * h
    im = image.point(lambda i: 255 * (i < 60))  # threshold
    d = im.getdata()  # faster than per-pixel operations
    linecolumns = []
    for col in range(w):
        black = sum(d[x] for x in range(col, s, w)) // 255
        if black > 450:
            linecolumns += [col]
    # return an image showing the detected lines
    im2 = image.convert('RGB')
    draw = ImageDraw.Draw(im2)
    for col in linecolumns:
        draw.line((col, 0, col, h - 1), fill='#f00', width=1)
    return im2

findlines(withlines).show()
findlines(nolines).show()
Results
The output images (not reproduced here) show the detected vertical lines in red for illustration.
As you can see, four of the grid lines are detected, and, with some processing to ignore the left and right sides and the center of the book, there should be no false positives on the desired type.
This means that you could use the above code to detect black columns and discard those that are near the edges or the center. If any black columns remain, classify the image as the 'other', undesired type; a minimal classifier along those lines is sketched below.
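Such a classifier might look like this (a sketch; the margin and center-window widths are hypothetical values to tune per scan):

def classify(image, margin=150, center_halfwidth=80):
    # margin and center_halfwidth are guessed exclusion zones
    image = image.convert('L')
    w, h = image.size
    s = w * h
    d = image.point(lambda i: 255 * (i < 60)).getdata()
    linecolumns = [col for col in range(w)
                   if sum(d[x] for x in range(col, s, w)) // 255 > 450]
    # ignore columns near the page edges or the book's center fold
    interior = [c for c in linecolumns
                if margin < c < w - margin and abs(c - w // 2) > center_halfwidth]
    return 'other' if interior else 'desired'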
AFAIK, there is no easy way to solve this. You will need a decent amount of image processing and some basic machine learning to classify these kinds of images (and even then it probably won't be 100% successful).
Another note:
While this can be solved using only machine learning techniques, I would advise you to start by looking into image processing techniques and try to convert your images into a form that shows a decent difference between the two types. A good starting point is the FFT. After that, have a look at some digital image processing techniques, and when you feel you have a decent understanding of these, read up on pattern recognition.
This is only one suggested approach though, there are more ways to achieve this.
