How can I load the RGB matrix of an image? Basically, if I have a 224x224 grayscale image, I need its RGB matrix, so I want a 224x224 matrix consisting of 3-element tuples. I have tried:
f="/path/to/grayscale/image"
image = Image.open(f)
new_width = 224
new_height = 224
im = image.resize((new_width, new_height), Image.ANTIALIAS)
im=np.array(im)
print(im)
and it prints:
[[195 195 195 ..., 101 104 105]
[195 195 195 ..., 102 105 106]
[194 194 194 ..., 104 109 111]
...,
[137 138 140 ..., 209 207 206]
[133 134 136 ..., 209 207 206]
[132 133 135 ..., 209 207 206]]
After some testing, I realised that it was because of the image being grayscale. How can I load the RGB matrix of a grayscale image?
I am not proficient in PIL, but it looks like there is an image.convert("RGB") method that may or may not work, so give it a try.
However, if your intention is to continue using np.array then the following will work:
im=np.array(im)
imRGB = np.repeat(im[:, :, np.newaxis], 3, axis=2)
Basically, it repeats the input array along a new third axis, 3 times.
imRGB[:,:,0] is the Red channel
imRGB[:,:,1] is the Green channel
imRGB[:,:,2] is the Blue channel
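For completeness, a minimal sketch of the PIL route mentioned above, reusing the placeholder path and size from the question:

from PIL import Image
import numpy as np

# Open the grayscale image and convert it to RGB before resizing.
# PIL duplicates the single luminance channel into R, G and B.
image = Image.open("/path/to/grayscale/image").convert("RGB")
im = image.resize((224, 224), Image.ANTIALIAS)

imRGB = np.array(im)      # shape (224, 224, 3)
print(imRGB.shape)        # each pixel is now a 3-element (R, G, B) triple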
I'm trying to make a face recognition program, but the problem is that the face encoding shapes differ: some encodings are bigger than others, and thus I'm getting the error
ValueError: setting an array element with a sequence.
Here's my code to generate the encodings
import cv2

class FaceEncoder():
    def __init__(self, files, singleton=False, model_path='./models/lbpcascade_animeface.xml',
                 scale_factor=1.1, min_neighbours=1):
        self.singleton = singleton
        self.files = files
        self.model = model_path
        self.scale_factor = scale_factor
        self.min_neighbours = min_neighbours

    def encode(self, singleton=False):
        if self.singleton == False:
            encodings = []
            labels = []
            for file in self.files:
                cascade = cv2.CascadeClassifier(self.model)
                image = cv2.imread(file)
                rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                faces = cascade.detectMultiScale(rgb, self.scale_factor, self.min_neighbours)
                if len(faces) > 0:
                    print('Found face in ' + file)
                    encodings.append(faces.flatten())
                    labels.append(file.split('/')[2])
                else:
                    print('Couldnt find face in ' + file)
            return encodings, labels
Here are some of the encodings
[204 96 211 211]
[525 168 680 680]
[205 11 269 269]
[ 165 31 316 316 1098 181 179 179]
[ 113 422 1371 1371]
[ 71 86 183 183]
[209 19 33 33 88 27 60 60 133 80 65 65 68 117 52 52]
[117 77 149 149]
[ 63 77 284 284]
[370 222 490 490]
[433 112 114 114 183 98 358 358]
[ 44 35 48 48 192 34 48 48]
[210 82 229 229]
[429 90 153 153]
[318 50 174 174 118 142 120 120]
You should not put several found rects into the same list entry.
If many faces are found, put each one on its own row, and add a label per face found (not per image); see the sketch after this list.
Also, what you have now are NOT "encodings", just mere boxes / rectangles.
Read up on how to get real encodings (facenet, spherenet?); then you need to:
crop the face region from the image
resize it to the nn input size (e.g. 96x96)
run it through the nn to receive the encoding
save that along with a label to a db/list
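A rough sketch of the per-face bookkeeping (this only restructures the detection loop from the question; it still stores boxes rather than real encodings, and the label logic is just illustrative):

import cv2

def collect_faces(files, model_path='./models/lbpcascade_animeface.xml',
                  scale_factor=1.1, min_neighbours=1):
    cascade = cv2.CascadeClassifier(model_path)
    boxes = []
    labels = []
    for file in files:
        image = cv2.imread(file)
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        faces = cascade.detectMultiScale(rgb, scale_factor, min_neighbours)
        for (x, y, w, h) in faces:             # one row per detected face
            boxes.append([x, y, w, h])         # every entry has the same length: 4
            labels.append(file.split('/')[2])  # one label per face, not per image
    return boxes, labels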
I'm trying to read text from an image, using OpenCV and Pytesseract, but with poor results.
The image I'm interested in reading the text is: https://www.lubecreostorepratolapeligna.it/gb/img/logo.png
This is the code I am using:
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\pytesseract\tesseract.exe'
image = cv2.imread(path_to_image)
# converting image into gray scale image
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('grey image', gray_image)
cv2.waitKey(0)
# converting it to a binary image by thresholding
# this step is required if you have a colored image, because if you skip this part
# then tesseract won't be able to detect the text correctly and will give an incorrect result
threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# display image
cv2.imshow('threshold image', threshold_img)
# Maintain output window until user presses a key
cv2.waitKey(0)
# Destroying present windows on screen
cv2.destroyAllWindows()
# now feeding image to tesseract
text = pytesseract.image_to_string(threshold_img)
print(text)
The result of the execution is: ["cu"," ","LUBE"," ","STORE","PRATOLA PELIGNA"]
But the result should be these 7 words: ["cucine", "LUBE", "CREO", "kitchens", "STORE", "PRATOLA", "PELIGNA"]
Is there anyone who could help me to solve this problem ?
Edit, 17.12.2020: Using preprocessing, it now recognizes everything but the "O" in CREO. See the stages in ocr8.py. Then ocr9.py demonstrates (not automated yet) finding the lines of text from the coordinates returned by pytesseract.image_to_boxes(), the approximate size of the letters and the inter-symbol distance, then extrapolating one step ahead and searching for a single character (--psm 8).
It happened that Tesseract had actually recognized the "O" in CREO, but it read it as ♀, probably confused by the little "k" below etc.
Since it is a rare and "strange"/unexpected symbol, it could be corrected - replaced automatically (see the function Correct()).
There is a technical detail: Tesseract returns the ANSI/ASCII symbol 12, (0x0C) while the code in my editor was in Unicode/UTF-8 - 9792. So I coded it inside as chr(12).
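A minimal sketch of what such a correction could look like (the real Correct() function lives in ocr9.py and is not shown here, so this is only an assumed, simplified version):

def correct_symbol(text):
    # Tesseract occasionally misreads the "O" in CREO as the female sign,
    # which comes back as the control character chr(12) (0x0C).
    # Since that symbol never appears in valid output, replace it with "O".
    return text.replace(chr(12), "O")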
The latest version: ocr9.py
You mentioned that PRATOLA and PELIGNA have to be given separately - just split by " ":
splitted = text.split(" ")
RECOGNIZED
CUCINE
LUBE
STORE
PRATOLA PELIGNA
CRE [+O with correction and extrapolation of the line]
KITCHENS
...
C 39 211 47 221 0
U 62 211 69 221 0
C 84 211 92 221 0
I 107 211 108 221 0
N 123 211 131 221 0
E 146 211 153 221 0
L 39 108 59 166 0
U 63 107 93 166 0
B 98 108 128 166 0
E 133 108 152 166 0
S 440 134 468 173 0
T 470 135 499 173 0
O 500 134 539 174 0
R 544 135 575 173 0
E 580 135 608 173 0
P 287 76 315 114 0
R 319 76 350 114 0
A 352 76 390 114 0
T 387 76 417 114 0
O 417 75 456 115 0
L 461 76 487 114 0
A 489 76 526 114 0
P 543 76 572 114 0
E 576 76 604 114 0
L 609 76 634 114 0
I 639 76 643 114 0
G 649 75 683 115 0
N 690 76 722 114 0
A 726 76 764 114 0
C 21 30 55 65 0
R 62 31 93 64 0
E 99 31 127 64 0
K 47 19 52 25 0
I 61 19 62 25 0
T 71 19 76 25 0
C 84 19 89 25 0
H 96 19 109 25 0
E 113 19 117 25 0
N 127 19 132 25 0
S 141 19 145 22 0
These are from getting "boxes".
Initial message:
I guess that for the area where "cucine" is, an adaptive threshold may segment it better, or maybe applying some edge detection first.
"kitchens" seems very small; try enlarging that area.
For CREO, I guess it's confused by the big and small sizes of the adjacent captions.
For the "O" in CREO, you may apply dilation in order to close the gap in the "O".
Edit: I played a bit, but without Tesseract, and it needs more work. My goal was to make the letters more contrasting; some of these processing steps may need to be applied selectively, only on the "cucine" area, maybe running the recognition in two passes. When you get those partial words like "Cu", apply the adaptive threshold etc. (below) and OCR on a top rectangle around "CU...".
Binary Threshold:
Adaptive Threshold, Median blur (to clean noise) and invert:
Dilate connects small gaps, but it also destroys detail.
import cv2
import numpy as np
import pytesseract
#pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\pytesseract\tesseract.exe'
path_to_image = "logo.png"
#path_to_image = "logo1.png"
image = cv2.imread(path_to_image)
h, w, _ = image.shape
w*=3; h*=3
w = (int)(w); h = (int) (h)
image = cv2.resize(image, (w,h), interpolation = cv2.INTER_AREA) #Resize 3 times
# converting image into gray scale image
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('grey image', gray_image)
cv2.waitKey(0)
# converting it to binary image by Thresholding
# this step is require if you have colored image because if you skip this part
# then tesseract won't able to detect text correctly and this will give incorrect result
#threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# display image
threshold_img = cv2.adaptiveThreshold(gray_image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY,13,3) #cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11,2)[1]
cv2.imshow('threshold image', threshold_img)
cv2.waitKey(0)
#threshold_img = cv2.GaussianBlur(threshold_img,(3,3),0)
#threshold_img = cv2.GaussianBlur(threshold_img,(3,3),0)
threshold_img = cv2.medianBlur(threshold_img,5)
cv2.imshow('medianBlur', threshold_img)
cv2.waitKey(0)
threshold_img = cv2.bitwise_not(threshold_img)
cv2.imshow('Invert', threshold_img)
cv2.waitKey(0)
#kernel = np.ones((1, 1), np.uint8)
#threshold_img = cv2.dilate(threshold_img, kernel)
#cv2.imshow('Dilate', threshold_img)
#cv2.waitKey(0)
cv2.imshow('threshold image', threshold_img)
# Maintain output window until user presses a key
cv2.waitKey(0)
# Destroying present windows on screen
cv2.destroyAllWindows()
# now feeding image to tesseract
text = pytesseract.image_to_string(threshold_img)
print(text)
I am trying to apply the convolve method below on the cameraman image. The kernel applied to the image is a 3x3 filter populated with -1/9. I print the values of the cameraman image before applying the convolve method, and all I get are positive values. Next, when I apply the 3x3 negative kernel to the image, I still get positive values when I print the values of the cameraman image after convolution.
The convolving function:
import numpy as np

def convolve2d(image, kernel):
    # This function takes an image and a kernel
    # and returns the convolution of them
    # Args:
    #   image: a numpy array of size [image_height, image_width].
    #   kernel: a numpy array of size [kernel_height, kernel_width].
    # Returns:
    #   a numpy array of size [image_height, image_width] (convolution output).
    output = np.zeros_like(image)  # convolution output
    # Add zero padding to the input image
    padding = int(len(kernel) / 2)
    image_padded = np.pad(image, ((padding, padding), (padding, padding)), 'constant')
    for x in range(image.shape[1]):  # Loop over every pixel of the image
        for y in range(image.shape[0]):
            # element-wise multiplication of the kernel and the image
            output[y, x] = (kernel * image_padded[y:y+3, x:x+3]).sum()
    return output
And here is the filter I am applying to the image:
filter2= [[-1/9,-1/9,-1/9],[-1/9,-1/9,-1/9],[-1/9,-1/9,-1/9]]
Finally, these are the initial values of the image, and the values after convolution, respectively:
[[156 159 158 ... 151 152 152]
[160 154 157 ... 154 155 153]
[156 159 158 ... 151 152 152]
...
[114 132 123 ... 135 137 114]
[121 126 130 ... 133 130 113]
[121 126 130 ... 133 130 113]]
After convolution:
[[187 152 152 ... 154 155 188]
[152 99 99 ... 104 104 155]
[152 99 100 ... 103 103 154]
...
[175 133 131 ... 127 130 174]
[174 132 124 ... 125 130 175]
[202 173 164 ... 172 173 202]]
This is how I call the convolve2d method:
convolved_camManImage= convolve2d(camManImage,filter2)
This might be caused by how numpy dtypes work. As numpy.zeros_like's help says:
Return an array of zeros with the same shape and type as a given
array.
Thus your output is probably of dtype uint8, which uses modulo arithmetic (negative results wrap around into the 0..255 range). To check if this is the case, add print(output.dtype) immediately after the output = np.zeros_like(image) line.
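A minimal sketch of the fix, assuming the uint8 output is indeed the culprit: allocate the output with an explicit float dtype instead of inheriting the input's type (everything else is the convolve2d from the question).

import numpy as np

def convolve2d(image, kernel):
    kernel = np.asarray(kernel)
    # Allocate a float output so genuinely negative convolution results
    # are kept as-is instead of being wrapped into the 0..255 range.
    output = np.zeros(image.shape, dtype=np.float64)
    padding = len(kernel) // 2
    image_padded = np.pad(image, ((padding, padding), (padding, padding)), 'constant')
    for x in range(image.shape[1]):
        for y in range(image.shape[0]):
            output[y, x] = (kernel * image_padded[y:y+3, x:x+3]).sum()
    return output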
I have a 3d numpy array like
[[[ 90 220 210]
[241 409 310]
[126 376 201]]
[[280 357 162]
[108 204 248]
[376 259 344]]
[[254 279 216]
[338 376 102]
[310 256 84]]]
I want to iterate over each element and apply this condition: if the element is greater than 255, I want to do integer*(255/integer) and save the result in the same place.
How can I achieve this?
Thanks in advance.
if the element is greater than 255
over255 = arr > 255 # produces 3D boolean array
i want to do integer*(255/integer)
arr[over255] *= 255 / arr[over255]
Although in the end perhaps you can just do one of these:
arr[:] = np.minimum(arr, 255)
arr[:] = np.clip(arr, None, 255)
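For what it's worth, x * (255 / x) is just 255, so clipping gives the same result; a tiny sketch using a slice of the array from the question:

import numpy as np

arr = np.array([[[ 90, 220, 210],
                 [241, 409, 310],
                 [126, 376, 201]]])

# Everything above 255 is replaced by 255, which is exactly what
# element * (255 / element) evaluates to for those elements.
arr = np.clip(arr, None, 255)
print(arr)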
I want to manipulate the RGB bands of a TIFF file and output the grayscale map with matplotlib. So far I have this code, but I couldn't get it in grayscale:
import scipy as N
import gdal
import sys
import matplotlib.pyplot as pyplot
tif = gdal.Open('filename.tif')
band1 = tif.GetRasterBand(1)
band2 = tif.GetRasterBand(2)
band3 = tif.GetRasterBand(3)
red = band1.ReadAsArray()
green = band2.ReadAsArray()
blue = band3.ReadAsArray()
gray = (0.299*red + 0.587*green + 0.114*blue)
pyplot.figure()
pyplot.imshow(gray)
pyplot.show()
And these are the arrays:
[[255 255 255 ..., 237 237 251]
[255 255 255 ..., 237 237 251]
[255 255 255 ..., 237 237 251]
...,
[237 237 237 ..., 237 237 251]
[237 237 237 ..., 237 237 251]
[242 242 242 ..., 242 242 252]]
[[255 255 255 ..., 239 239 251]
[255 255 255 ..., 239 239 251]
[255 255 255 ..., 239 239 251]
...,
[239 239 239 ..., 239 239 251]
[239 239 239 ..., 239 239 251]
[243 243 243 ..., 243 243 252]]
[[255 255 255 ..., 234 234 250]
[255 255 255 ..., 234 234 250]
[255 255 255 ..., 234 234 250]
...,
[234 234 234 ..., 234 234 250]
[234 234 234 ..., 234 234 250]
[239 239 239 ..., 239 239 251]]
Any idea how I can fix this?
I don't have gdal installed, but a similar approach using PIL looks like this:
import numpy as np
from PIL import Image
import matplotlib.pyplot as pyplot
img = Image.open("/Users/travis/Desktop/new_zealand.tif")
img.getdata()
r, g, b = img.split()
ra = np.array(r)
ga = np.array(g)
ba = np.array(b)
gray = (0.299*ra + 0.587*ga + 0.114*ba)
pyplot.figure()
pyplot.imshow(img)
pyplot.figure()
pyplot.imshow(gray)
pyplot.figure()
pyplot.imshow(gray, cmap="gray")
It may be a simple matter of setting the color map to something besides the default ("jet") to get what you want, but I'm not sure what you're seeing.
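Applied back to the original gdal code, a minimal sketch would only change the imshow call (this assumes gdal is importable as in the question; newer installs use `from osgeo import gdal`):

import gdal
import matplotlib.pyplot as pyplot

tif = gdal.Open('filename.tif')
red = tif.GetRasterBand(1).ReadAsArray()
green = tif.GetRasterBand(2).ReadAsArray()
blue = tif.GetRasterBand(3).ReadAsArray()

# Standard luminance weighting, as in the question.
gray = 0.299 * red + 0.587 * green + 0.114 * blue

pyplot.figure()
# Without an explicit colormap, matplotlib renders single-channel data in
# false color (the default was "jet"), so force a gray colormap here.
pyplot.imshow(gray, cmap="gray")
pyplot.show()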
Here are the images that are generated (don't ask me why the original is upside-down -- not sure what causes that):