I'm trying to build a face recognition program, but some of the face encodings are longer than others, so I get the error:
ValueError: setting an array element with a sequence.
Here's my code to generate the encodings
class FaceEncoder():
    def __init__(self, files, singleton = False, model_path='./models/lbpcascade_animeface.xml', scale_factor=1.1, min_neighbours=1):
        self.singleton = singleton
        self.files = files
        self.model = model_path
        self.scale_factor = scale_factor
        self.min_neighbours = min_neighbours
    def encode(self, singleton=False):
        if self.singleton == False:
            encodings = []
            labels = []
            for file in self.files:
                cascade = cv2.CascadeClassifier(self.model)
                image = cv2.imread(file)
                rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                faces = cascade.detectMultiScale(rgb, self.scale_factor, self.min_neighbours)
                if len(faces) > 0:
                    print('Found face in '+file)
                    encodings.append(faces.flatten())
                    labels.append(file.split('/')[2])
                else:
                    print('Couldnt find face in '+file)
            return encodings, labels
Here are some of the encodings
[204 96 211 211]
[525 168 680 680]
[205 11 269 269]
[ 165 31 316 316 1098 181 179 179]
[ 113 422 1371 1371]
[ 71 86 183 183]
[209 19 33 33 88 27 60 60 133 80 65 65 68 117 52 52]
[117 77 149 149]
[ 63 77 284 284]
[370 222 490 490]
[433 112 114 114 183 98 358 358]
[ 44 35 48 48 192 34 48 48]
[210 82 229 229]
[429 90 153 153]
[318 50 174 174 118 142 120 120]
You should not put several detected rects into the same list entry.
If several faces are found in an image, put each one on its own row, and add a label per face found (not per image).
Also, what you have now are NOT "encodings", just boxes / rectangles.
Read up on how to get real encodings (facenet, spherenet?), then you need to:
crop the face region from the image
resize it to the NN input size (e.g. 96x96)
run it through the NN to receive the encoding
save that along with a label to a db/list
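The "one row and one label per face" advice, plus the cropping step, could look roughly like this. It is only a sketch: `split_detections` and `crop_face` are hypothetical helper names, `detections` stands in for the arrays that `detectMultiScale` returns per file, the file paths are made up, and the label rule is copied from `file.split('/')[2]` in the question. Resizing and the network that produces real encodings are left out.

```python
import numpy as np

def split_detections(files, detections):
    """Return one (x, y, w, h) row and one label per detected face."""
    boxes, labels = [], []
    for file, faces in zip(files, detections):
        # detectMultiScale gives an (n_faces, 4) array, or an empty tuple
        for box in np.asarray(faces).reshape(-1, 4):
            boxes.append(box)
            labels.append(file.split('/')[2])
    # every row has length 4, so this is a regular (n_total, 4) array
    return np.array(boxes).reshape(-1, 4), labels

def crop_face(image, box):
    """Cut the face rectangle out of an image array (rows are y, columns are x)."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]

# Box values are taken from the printout in the question; the paths are hypothetical.
files = ['./data/alice/1.png', './data/bob/1.png', './data/carol/1.png']
detections = [
    np.array([[204, 96, 211, 211]]),
    np.array([[165, 31, 316, 316], [1098, 181, 179, 179]]),
    (),  # no face found in this file
]
boxes, labels = split_detections(files, detections)
print(boxes.shape)  # (3, 4)
print(labels)       # ['alice', 'bob', 'bob']

dummy_image = np.zeros((400, 500, 3), dtype=np.uint8)
face = crop_face(dummy_image, boxes[0])
print(face.shape)   # (211, 211, 3)
```

Because every row now has exactly four values, the list converts to a regular array and the ValueError from the question no longer appears.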
I have a file f which holds N (unknown) events. Each event carries a number of reconstructed tracks that is unknown and differs from event to event (call it i, j, etc.). Each track has properties like energy E and likelihood lik. So,
>>> print(f.events.tracks.lik)
[[lik1, lik2, ..., likX], [lik1, lik2, ..., likj], ..., [lik1, lik2, ..., likz]]
prints an array holding N subarrays (1 per event), each presenting the lik for all its tracks.
GOAL: call f.events.tracks[:, Inds].E to get the energies for the tracks with max likelihood.
Minimal code example
>>> import numpy as np
>>> lik = np.random.randint(low=0, high=100, size=50).reshape(5, 10)
>>> print(lik)
[[ 3 49 27 3 80 59 96 99 84 34]
[88 62 61 83 90 9 62 30 92 80]
[ 5 21 69 40 2 40 13 63 42 46]
[ 0 55 71 67 63 49 29 7 21 7]
[40 7 68 46 95 34 74 88 79 15]]
>>> energy = np.random.randint(low=100, high=2000, size=50).reshape(5, 10)
>>> print(energy)
[[1324 1812 917 553 185 743 358 877 1041 905]
[1407 663 359 383 339 1403 1511 1964 1797 1096]
[ 315 1431 565 786 544 1370 919 1617 1442 925]
[1710 698 246 1631 1374 1844 595 465 908 953]
[ 305 384 668 952 458 793 303 153 661 791]]
>>> Inds = np.argmax(lik, axis=1)
>>> print(Inds)
[2 1 8 6 7]
PROBLEM:
>>> # call energy[Inds] to get
# [917, 663, 1442, 1844, 153]
What is the correct way of accessing these energies?
You can select the value indexed by Inds in each row using 2D integer-array indexing, pairing Inds with a temporary array of row numbers [0, 1, 2, ...] (generated using np.arange).
Here is an example:
energy[np.arange(len(Inds)), Inds]
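As a runnable check, here is the same indexing applied to the two arrays printed in the question, with `np.take_along_axis` as an equivalent alternative. Note that for those printed arrays `np.argmax` gives [7, 8, 2, 2, 4]; the indices shown in the question seem to come from a different random draw. This assumes rectangular data; if the real tracks are ragged (a different number per event), plain NumPy indexing may not apply directly.

```python
import numpy as np

lik = np.array([[ 3, 49, 27,  3, 80, 59, 96, 99, 84, 34],
                [88, 62, 61, 83, 90,  9, 62, 30, 92, 80],
                [ 5, 21, 69, 40,  2, 40, 13, 63, 42, 46],
                [ 0, 55, 71, 67, 63, 49, 29,  7, 21,  7],
                [40,  7, 68, 46, 95, 34, 74, 88, 79, 15]])
energy = np.array([[1324, 1812,  917,  553,  185,  743,  358,  877, 1041,  905],
                   [1407,  663,  359,  383,  339, 1403, 1511, 1964, 1797, 1096],
                   [ 315, 1431,  565,  786,  544, 1370,  919, 1617, 1442,  925],
                   [1710,  698,  246, 1631, 1374, 1844,  595,  465,  908,  953],
                   [ 305,  384,  668,  952,  458,  793,  303,  153,  661,  791]])

inds = np.argmax(lik, axis=1)          # column of the max likelihood in each row
rows = np.arange(len(inds))            # [0, 1, 2, 3, 4]
best = energy[rows, inds]              # pairs (rows[k], inds[k]) element by element

# Equivalent: gather along axis 1, then drop the extra dimension
same = np.take_along_axis(energy, inds[:, None], axis=1)[:, 0]

print(inds.tolist())  # [7, 8, 2, 2, 4]
print(best.tolist())  # [877, 1797, 565, 246, 458]
```

Plain `energy[inds]` would instead select whole rows, which is why it does not give the desired result.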
I'm trying to read text from an image, using OpenCV and Pytesseract, but with poor results.
The image I want to read the text from is: https://www.lubecreostorepratolapeligna.it/gb/img/logo.png
This is the code I am using:
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\pytesseract\tesseract.exe'
image = cv2.imread(path_to_image)
# converting image into gray scale image
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('grey image', gray_image)
cv2.waitKey(0)
# converting it to a binary image by thresholding
# this step is required for a colored image: if you skip it,
# tesseract won't be able to detect the text correctly and will give an incorrect result
threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# display image
cv2.imshow('threshold image', threshold_img)
# Maintain output window until user presses a key
cv2.waitKey(0)
# Destroying present windows on screen
cv2.destroyAllWindows()
# now feeding image to tesseract
text = pytesseract.image_to_string(threshold_img)
print(text)
The result of the execution is: ["cu"," ","LUBE"," ","STORE","PRATOLA PELIGNA"]
But the result should be these 7 words: ["cucine", "LUBE", "CREO", "kitchens", "STORE", "PRATOLA", "PELIGNA"]
Could anyone help me solve this problem?
Edit, 17.12.2020: With preprocessing, it now recognizes everything except the "O" in CREO. See the stages in ocr8.py. ocr9.py then demonstrates (not automated yet) finding the lines of text from the coordinates returned by pytesseract.image_to_boxes(), estimating the approximate size of the letters and the inter-symbol distance, then extrapolating one step ahead and searching for a single character (--psm 8).
It turned out that Tesseract had actually recognized the "O" in CREO, but read it as ♀, probably confused by the little "k" below, etc.
Since this is a rare and "strange"/unexpected symbol, it can be corrected automatically by replacing it (see the function Correct()).
There is a technical detail: Tesseract returns the ANSI/ASCII symbol 12 (0x0C), while the code in my editor was in Unicode/UTF-8, where the symbol is 9792. So I coded it in the source as chr(12).
The latest version: ocr9.py
You mentioned that PRATOLA and PELIGNA have to be returned separately; just split by " ":
splitted = text.split(" ")
RECOGNIZED
CUCINE
LUBE
STORE
PRATOLA PELIGNA
CRE [+O with correction and extrapolation of the line]
KITCHENS
...
C 39 211 47 221 0
U 62 211 69 221 0
C 84 211 92 221 0
I 107 211 108 221 0
N 123 211 131 221 0
E 146 211 153 221 0
L 39 108 59 166 0
U 63 107 93 166 0
B 98 108 128 166 0
E 133 108 152 166 0
S 440 134 468 173 0
T 470 135 499 173 0
O 500 134 539 174 0
R 544 135 575 173 0
E 580 135 608 173 0
P 287 76 315 114 0
R 319 76 350 114 0
A 352 76 390 114 0
T 387 76 417 114 0
O 417 75 456 115 0
L 461 76 487 114 0
A 489 76 526 114 0
P 543 76 572 114 0
E 576 76 604 114 0
L 609 76 634 114 0
I 639 76 643 114 0
G 649 75 683 115 0
N 690 76 722 114 0
A 726 76 764 114 0
C 21 30 55 65 0
R 62 31 93 64 0
E 99 31 127 64 0
K 47 19 52 25 0
I 61 19 62 25 0
T 71 19 76 25 0
C 84 19 89 25 0
H 96 19 109 25 0
E 113 19 117 25 0
N 127 19 132 25 0
S 141 19 145 22 0
These come from the "boxes" output of pytesseract.image_to_boxes().
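One way to turn such box rows back into lines of text is to group characters whose bottom coordinates are nearly equal and then sort each group from left to right. This is only a sketch under assumptions, not the code from ocr9.py: `group_boxes_into_lines` is a hypothetical helper, `tol` is an assumed tolerance in pixels, each row is taken to be "char left bottom right top page" with the origin at the bottom-left, and gaps between words are not detected.

```python
def group_boxes_into_lines(box_text, tol=3):
    """Group 'char left bottom right top page' rows into lines of text."""
    lines = []  # each entry: [reference_bottom, [(left, char), ...]]
    for row in box_text.strip().splitlines():
        char, left, bottom, _right, _top, _page = row.split()
        left, bottom = int(left), int(bottom)
        for line in lines:
            if abs(line[0] - bottom) <= tol:   # same baseline, same line
                line[1].append((left, char))
                break
        else:
            lines.append([bottom, [(left, char)]])
    # Origin is bottom-left, so a larger bottom value is higher up in the image.
    lines.sort(key=lambda line: -line[0])
    return ["".join(c for _, c in sorted(chars)) for _, chars in lines]

# A subset of the rows listed above
sample = """
L 39 108 59 166 0
U 63 107 93 166 0
B 98 108 128 166 0
E 133 108 152 166 0
C 21 30 55 65 0
R 62 31 93 64 0
E 99 31 127 64 0
S 440 134 468 173 0
T 470 135 499 173 0
O 500 134 539 174 0
R 544 135 575 173 0
E 580 135 608 173 0
"""
print(group_boxes_into_lines(sample))  # ['STORE', 'LUBE', 'CRE']
```

Grouping by baseline rather than by vertical overlap matters here, because the tall LUBE and STORE boxes overlap vertically even though they are separate captions.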
Initial message:
I guess that for the area where "cucine" is, an adaptive threshold may segment it better, or maybe applying some edge detection first.
"Kitchens" seems very small; what about enlarging that area?
For CREO, I guess Tesseract is confused by the big and small sizes of the adjacent captions.
For the "O" in CREO, you may apply dilation in order to close the gap in the "O".
Edit: I played with it a bit, but without Tesseract, and it needs more work. My goal was to make the letters more contrasting. Some of this processing may need to be applied selectively, only on "cucine", maybe by running the recognition in two passes: when you get a partial word like "cu", apply the adaptive threshold etc. (below) and run OCR on a rectangle around "CU...".
Binary Threshold:
Adaptive Threshold, Median blur (to clean noise) and invert:
Dilate connects small gaps, but it also destroys detail.
import cv2
import numpy as np
import pytesseract

#pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\pytesseract\tesseract.exe'
path_to_image = "logo.png"
#path_to_image = "logo1.png"
image = cv2.imread(path_to_image)

# enlarge the image 3 times
h, w, _ = image.shape
image = cv2.resize(image, (w * 3, h * 3), interpolation=cv2.INTER_AREA)

# convert the image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('grey image', gray_image)
cv2.waitKey(0)

# binarize with an adaptive threshold instead of the global Otsu threshold
#threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
threshold_img = cv2.adaptiveThreshold(gray_image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                      cv2.THRESH_BINARY, 13, 3)
cv2.imshow('threshold image', threshold_img)
cv2.waitKey(0)

# median blur to clean the noise
#threshold_img = cv2.GaussianBlur(threshold_img,(3,3),0)
threshold_img = cv2.medianBlur(threshold_img, 5)
cv2.imshow('medianBlur', threshold_img)
cv2.waitKey(0)

# invert
threshold_img = cv2.bitwise_not(threshold_img)
cv2.imshow('Invert', threshold_img)
cv2.waitKey(0)

# optional dilation: connects small gaps, but also destroys detail
#kernel = np.ones((1, 1), np.uint8)
#threshold_img = cv2.dilate(threshold_img, kernel)
#cv2.imshow('Dilate', threshold_img)
#cv2.waitKey(0)

cv2.imshow('threshold image', threshold_img)
# Maintain output window until user presses a key
cv2.waitKey(0)
# Destroying present windows on screen
cv2.destroyAllWindows()

# now feeding image to tesseract
text = pytesseract.image_to_string(threshold_img)
print(text)
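To see why dilation "connects small gaps, but also destroys detail", here is a tiny NumPy-only stand-in for a binary dilation with a 3x3 square kernel. It is an illustration under assumptions, not OpenCV's implementation: `dilate3x3` is a hypothetical helper and the input is a made-up one-pixel-thick stroke with a one-pixel gap. In the script above the corresponding call would be `cv2.dilate` with a kernel larger than 1x1.

```python
import numpy as np

def dilate3x3(binary):
    """Binary dilation with a 3x3 square kernel (NumPy stand-in for cv2.dilate)."""
    h, w = binary.shape
    padded = np.pad(binary.astype(bool), 1)   # pad with background
    out = np.zeros((h, w), dtype=bool)
    # A pixel becomes foreground if any pixel in its 3x3 neighbourhood is foreground.
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out.astype(np.uint8)

# A thin horizontal stroke with a one-pixel gap in the middle
gap = np.array([[0, 0, 0, 0, 0, 0, 0],
                [0, 1, 1, 0, 1, 1, 0],
                [0, 0, 0, 0, 0, 0, 0]], dtype=np.uint8)
closed = dilate3x3(gap)
print(closed[1].tolist())  # [1, 1, 1, 1, 1, 1, 1]  the gap is closed
print(int(closed.sum()))   # 21  the stroke also grew from 1 to 3 pixels thick
```

The same growth that closes the gap in an "O" also thickens every other stroke, so small captions such as "kitchens" can merge into blobs.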
I have this image of a table
I'm trying to parse it using PyTesseract. I've gotten pretty darn close using this code:
from PIL import Image, ImageOps
import pytesseract
og_image = Image.open('og_image.png')
grayscale = ImageOps.grayscale(og_image)
inverted = ImageOps.invert(grayscale.convert('RGB'))
print(pytesseract.image_to_string(inverted))
This seems to be very accurate, except the single-digit numbers in the second-to-last column are blank. Do I need to do something different to pick up on those numbers?
Tesseract has several page segmentation modes, and choosing the right one is necessary to get the best result. See the documentation.
Also, in this case you can restrict Tesseract to a certain character set.
Another thing: Tesseract is sensitive to fonts and image size, and a simple resize can change the results greatly. Here I scale the image horizontally by a factor of 2, and vertically by whatever gave the best result ;)
Combining all of the above, you get:
custom_config = r'--psm 6 -c tessedit_char_whitelist=0123456789.'
# Image.LANCZOS is the same filter as Image.ANTIALIAS, which was removed in Pillow 10
print(pytesseract.image_to_string(inverted.resize((1506, 412), Image.LANCZOS), config=custom_config))
1525 .199 303 82 161 162 7 .241
1464 .290 424 70 139 198 25 .352
1456 .292 425 116 224 224 0 .345
1433 .240 346 81 130 187 15 .275
1390 .273 373 108 217 216 3 .345
1386 .276 383 54 181 154 18 .315
1225 .208 255 68 148 129 1 .242
1218 .238 230 46 128 127 18 .273
1117 .240 268 43 113 1193 1 .308
I have a file containing EncodedPixels masks of different sizes.
I want to convert these EncodedPixels to binary masks, resize them all to 1024, and then convert them back to EncodedPixels.
Explanation:
The file contains image masks in EncodedPixels (run-length encoded) form, and the images have different dimensions (5000x5000, 260x260, etc.). I resized all the images to 1024x1024, and now I want to resize each image mask to match its 1024x1024 image.
The only solution I can think of (there may be others) is to first convert the run-length encoded pixels to a binary mask, which can then be resized easily.
File Link: link here
This code can be used to resize a binary mask:
from PIL import Image
import numpy as np
pil_image = Image.fromarray(binary_mask)
pil_image = pil_image.resize((new_width, new_height), Image.NEAREST)
resized_binary_mask = np.asarray(pil_image)
Encoded Pixels Example
['6068157 7 6073371 20 6078584 34 6083797 48 6089010 62 6094223 72 6099436 76 6104649 80
6109862 85 6115075 89 6120288 93 6125501 98 6130714 102 6135927 106 6141140 111 6146354 114 6151567 118 6156780 123 6161993 127 6167206 131 6172419 136 6177632 140 6182845 144 6188058 149 6193271 153 6198484 157 6203697 162 6208910 166 6214124 169 6219337 174 6224550 178 6229763 182 6234976 187 6240189 191 6245402 195 6250615 200 6255828 204 6261041 208 6266254 213 6271467 218 6276680 224 6281893 229 6287107 233 6292320 238 6297533 244 6302746 249 6307959 254 6313172 259 6318385 265 6323598 270 6328811 275 6334024 280 6339237 286 6344450 291 6349663 296 6354877 300 6360090 306 6365303 311 6370516 316 6375729 322 6380942 327 6386155 332 6391368 337 6396581 343 6401794 348 6407007 353 6412220 358 6417433 364 6422647 368 6427860 373 6433073 378 6438286 384 6443499 389 6448712 394 6453925 399 6459138 405 6464351 410 6469564 415 6474777 420 6479990 426 17204187 78 17208797 227 17209412 56 17214025 203 17214637 34 17219253 179 17219862 11 17224481 155 17229709 131 17234937 107 17240165 83 17245393 60 17250621 36 17255849 12']
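The decode, resize, re-encode round trip described above could be sketched as follows. Several things here are assumptions rather than facts about this file: the string is taken to be "start length" pairs with 1-based starts over a column-major (top to bottom, then left to right) flattening, as in many Kaggle datasets; the original image height and width must be known; and `rle_decode`, `rle_encode` and `resize_nearest` are hypothetical helper names. `resize_nearest` is a NumPy stand-in for the PIL nearest-neighbour resize shown earlier and may pick slightly different pixels.

```python
import numpy as np

def rle_decode(rle, height, width):
    """Decode 'start length ...' (1-based, column-major: an assumption) to a 0/1 mask."""
    values = np.array(list(map(int, rle.split())), dtype=np.int64)
    starts, lengths = values[0::2] - 1, values[1::2]
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape((height, width), order='F')

def rle_encode(mask):
    """Encode a 0/1 mask back into the same 'start length' string format."""
    flat = np.concatenate([[0], mask.flatten(order='F'), [0]])
    changes = np.flatnonzero(flat[1:] != flat[:-1]) + 1  # 1-based run boundaries
    starts, ends = changes[0::2], changes[1::2]
    return ' '.join(f'{s} {e - s}' for s, e in zip(starts, ends))

def resize_nearest(mask, new_height, new_width):
    """Nearest-neighbour resize by index lookup, so the mask stays binary."""
    rows = np.arange(new_height) * mask.shape[0] // new_height
    cols = np.arange(new_width) * mask.shape[1] // new_width
    return mask[rows[:, None], cols]

# Toy 4x4 example; real masks would use the image's own height and width.
mask = rle_decode('1 2 7 1', height=4, width=4)
resized = resize_nearest(mask, 8, 8)      # e.g. (1024, 1024) for the real data
print(rle_encode(mask))                   # 1 2 7 1
print(rle_encode(resized))
```

Nearest-neighbour interpolation is used because smoother filters would introduce values between 0 and 1, which cannot be run-length encoded without thresholding again.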
I am working on an image processing problem.
I wrote a function that applies salt-and-pepper noise to an image.
Here is the function:
import random
import numpy as np

def sp_noise(image,prob):
    res = np.zeros(image.shape,np.uint8)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            rdn = random.random()
            if rdn < prob:
                # noisy pixel: black (pepper) or white (salt) with equal probability
                rdn2 = random.random()
                if rdn2 < 0.5:
                    res[i][j] = 0
                else:
                    res[i][j] = 255
            else:
                res[i][j] = image[i][j]
    return res
Problems happen when I want to display the result.
wood = loadPNGFile('wood.jpeg',rgb=False)
woodSP = sp_noise(wood,0.01)
plt.subplot(1,2,1)
plt.imshow(wood,'gray')
plt.title("Wood")
plt.subplot(1,2,2)
plt.imshow(woodSP,'gray')
plt.title("Wood SP")
I can not post the image directly but here is the link:
The noisy picture is displayed darker. But when I print the pixel values of the two images, they are the same apart from the noisy pixels:
[[ 99 97 96 ... 118 90 70]
[110 110 103 ... 116 115 101]
[ 79 73 65 ... 96 121 121]
...
[ 79 62 46 ... 105 124 113]
[ 86 98 100 ... 114 119 99]
[ 96 95 95 ... 116 111 90]]
[[255 97 96 ... 118 90 70]
[110 110 103 ... 116 115 101]
[ 79 73 65 ... 96 121 121]
...
[ 79 62 46 ... 105 124 113]
[ 86 98 100 ... 114 119 99]
[ 96 95 95 ... 116 111 90]]
I also check the mean value:
117.79877369007804
117.81332616658703
Apparently the problem comes from the display with plt.imshow, but I cannot find a solution.
Looking at the documentation of imshow, there are two optional parameters, vmin and vmax, about which it says:
When using scalar data and no explicit norm, vmin and vmax define the
data range that the colormap covers. By default, the colormap covers
the complete value range of the supplied data. vmin, vmax are ignored
if the norm parameter is used.
So if no values are given for these parameters, the brightness range is based on the actual data: the minimum value is shown as black and the maximum as white. This is useful for visualization, but not for comparisons, as you found out. Just set vmin and vmax to appropriate values (probably 0 and 255).
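In the question's code that would be `plt.imshow(woodSP, 'gray', vmin=0, vmax=255)`, and the same for the original image. The effect of the default range can be reproduced without matplotlib. The sketch below only mimics the linear normalization; `to_display_range` is a hypothetical helper and the tiny arrays are made up from values in the printout.

```python
import numpy as np

def to_display_range(img, vmin=None, vmax=None):
    """Map pixel values linearly to 0..1 brightness, like a colormap range does."""
    img = np.asarray(img, dtype=float)
    vmin = img.min() if vmin is None else vmin   # default: darkest data value
    vmax = img.max() if vmax is None else vmax   # default: brightest data value
    return (img - vmin) / (vmax - vmin)

clean = np.array([[99, 97], [46, 124]], dtype=np.uint8)
noisy = clean.copy()
noisy[0, 0] = 255   # one "salt" pixel
noisy[1, 0] = 0     # one "pepper" pixel

# The pixel with value 97 is identical in both images, but not its brightness:
print(round(to_display_range(clean)[0, 1], 3))   # 0.654  (range 46..124)
print(round(to_display_range(noisy)[0, 1], 3))   # 0.38   (range 0..255)

# With a fixed range, both images map 97 to the same brightness
print(round(to_display_range(clean, 0, 255)[0, 1], 3))  # 0.38
print(round(to_display_range(noisy, 0, 255)[0, 1], 3))  # 0.38
```

A few salt and pepper pixels are enough to stretch the data range to 0..255, which is why the unchanged pixels in the noisy image look darker.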