PPM file data cannot be recognized - python

I'm writing a simple picture editor. It uses PPM files. From what I can tell, my code should work. However, I get this error:
Traceback (most recent call last):
  File "/home/zach/Downloads/piceditor (1).py", line 84, in <module>
    main()
  File "/home/zach/Downloads/piceditor (1).py", line 69, in main
    image = Image(Point(100,100), filename)
  File "/home/zach/Downloads/graphics.py", line 770, in __init__
    self.img = tk.PhotoImage(file=pixmap[0], master=_root)
  File "/usr/lib/python3.1/tkinter/__init__.py", line 3272, in __init__
    Image.__init__(self, 'photo', name, cnf, master, **kw)
  File "/usr/lib/python3.1/tkinter/__init__.py", line 3228, in __init__
    self.tk.call(('image', 'create', imgtype, name,) + options)
_tkinter.TclError: couldn't recognize data in image file "pig.ppm"
My code looks like this:
def main():
    print("Image Editor")
    print()
    filename = input("name of image file: ")
    print()
    with open(filename) as f:
        formatind = f.readline()
        width, height = [int(x) for x in f.readline().split()]
        colordepth = f.readline()
        array = []
        for line in f:
            array.append([int(x) for x in line.split()])
    win = GraphWin("Image Editor!", width, height)
    image = Image(Point(100,100), filename)
    Display(image, array, width, height, win)
    win.getMouse()
    win.close()

main()
And my Display function looks like this:
def Display(image, array, width, height, win):
    for i in range(width):
        for j in range(0, height, 3):
            colors = color_rgb(array[i][j], array[i][j+1], array[i][j+2])
            image.setPixel(i, j, colors)
    image.draw(win)
    return
This is the PPM file I'm using:
P3
6 8
255
249 249 249 255 255 255 250 250 250 255 255 255 250 250 250 250 250 250 254 255 255 251 255 255
249 251 255 253 249 255 255 248 255 255 234 255 255 242 255 255 245 253 255 246 243 255 253 241
255 255 237 255 255 237 252 255 241 249 255 246 249 255 253 254 255 255 255 252 255 255 248 241
255 251 239 254 247 241 252 254 253 252 255 255 251 255 255 242 242 242 255 255 255 241 241 241
0 0 0 0 0 0 4 4 4 20 20 20 236 236 236 252 252 252 254 255 253 248 255 250
0 0 0 0 0 0 4 4 4 20 20 20 236 236 236 252 252 252 254 255 253 248 255 250
I cannot for the life of me figure out why it won't recognize the data in the file.
Any help would be great. Thanks

Why don't you use the PIL library? Its documentation claims that it can work with PPM files, though I am not familiar with working with PPM files in PIL myself.
Example: opening a PPM file and creating an object from the file that can then be used to edit it.
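A minimal sketch of what that could look like with Pillow, reusing the filename from the question; note that only recent Pillow versions can read the plain-text (P3) PPM variant:

from PIL import Image

img = Image.open('pig.ppm')    # recent Pillow reads both plain (P3) and raw (P6) PPM
pixels = img.load()            # pixel access object for reading and writing
r, g, b = pixels[0, 0]         # read one pixel
pixels[0, 0] = (255, 0, 0)     # edit it in place
img.save('pig_edited.ppm')     # written back out as PPM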

Related

Shape of face encodings differ

I'm trying to make a face recognition program, but the problem is that some face encodings have a bigger shape than the others, and thus I'm getting the error
ValueError: setting an array element with a sequence.
Here's my code to generate the encodings:
class FaceEncoder():
    def __init__(self, files, singleton=False, model_path='./models/lbpcascade_animeface.xml', scale_factor=1.1, min_neighbours=1):
        self.singleton = singleton
        self.files = files
        self.model = model_path
        self.scale_factor = scale_factor
        self.min_neighbours = min_neighbours

    def encode(self, singleton=False):
        if self.singleton == False:
            encodings = []
            labels = []
            for file in self.files:
                cascade = cv2.CascadeClassifier(self.model)
                image = cv2.imread(file)
                rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                faces = cascade.detectMultiScale(rgb, self.scale_factor, self.min_neighbours)
                if len(faces) > 0:
                    print('Found face in ' + file)
                    encodings.append(faces.flatten())
                    labels.append(file.split('/')[2])
                else:
                    print("Couldn't find face in " + file)
            return encodings, labels
Here are some of the encodings
[204 96 211 211]
[525 168 680 680]
[205 11 269 269]
[ 165 31 316 316 1098 181 179 179]
[ 113 422 1371 1371]
[ 71 86 183 183]
[209 19 33 33 88 27 60 60 133 80 65 65 68 117 52 52]
[117 77 149 149]
[ 63 77 284 284]
[370 222 490 490]
[433 112 114 114 183 98 358 358]
[ 44 35 48 48 192 34 48 48]
[210 82 229 229]
[429 90 153 153]
[318 50 174 174 118 142 120 120]
You should not put several found rects into the same list entry. If many faces are found, put each one on its own row, and add a label per face found (not per image).
Also, what you have now are NOT "encodings", just mere boxes/rectangles. Read up on how to get real encodings (facenet, spherenet?); then you need to (a rough sketch follows this list):
crop the face region from the image
resize it to the nn input size (e.g. 96x96)
run it through the nn to receive the encoding
save that along with a label to a db/list
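A rough sketch of that per-face loop inside encode(), reusing the variables from the question's code; embedder stands in for a real encoding network (e.g. FaceNet) and is an assumption, not an OpenCV function:

encodings, labels = [], []
cascade = cv2.CascadeClassifier(self.model)
for file in self.files:
    rgb = cv2.cvtColor(cv2.imread(file), cv2.COLOR_BGR2RGB)
    faces = cascade.detectMultiScale(rgb, self.scale_factor, self.min_neighbours)
    for (x, y, w, h) in faces:             # one row and one label per face, not per image
        crop = rgb[y:y + h, x:x + w]       # crop the face region from the image
        crop = cv2.resize(crop, (96, 96))  # resize to the nn input size
        encodings.append(embedder(crop))   # run it through the nn to get the encoding
        labels.append(file.split('/')[2])  # save it along with a label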

How to obtain the best result from pytesseract?

I'm trying to read text from an image, using OpenCV and Pytesseract, but with poor results.
The image I'm interested in reading the text from is: https://www.lubecreostorepratolapeligna.it/gb/img/logo.png
This is the code I am using:
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\pytesseract\tesseract.exe'
image = cv2.imread(path_to_image)
# converting image into gray scale image
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('grey image', gray_image)
cv2.waitKey(0)
# converting it to binary image by Thresholding
# this step is required if you have a colored image: if you skip this part
# then tesseract won't be able to detect text correctly and will give an incorrect result
threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# display image
cv2.imshow('threshold image', threshold_img)
# Maintain output window until user presses a key
cv2.waitKey(0)
# Destroying present windows on screen
cv2.destroyAllWindows()
# now feeding image to tesseract
text = pytesseract.image_to_string(threshold_img)
print(text)
The result of the execution is: ["cu"," ","LUBE"," ","STORE","PRATOLA PELIGNA"]
But the result should be these 7 words: ["cucine", "LUBE", "CREO", "kitchens", "STORE", "PRATOLA", "PELIGNA"]
Is there anyone who could help me solve this problem?
Edit, 17.12.2020: Using preprocessing, it now recognizes everything but the "O" in CREO. See the stages in ocr8.py. Then ocr9.py demonstrates (not automated yet) finding the lines of text from the coordinates returned by pytesseract.image_to_boxes(), the approximate size of the letters and the inter-symbol distance, then extrapolating one step ahead and searching for a single character (--psm 8).
It turned out that Tesseract had actually recognized the "O" in CREO, but it read it as ♀, probably confused by the little "k" below etc.
Since it is a rare and "strange"/unexpected symbol, it could be corrected - replaced automatically (see the function Correct()).
There is a technical detail: Tesseract returns the ANSI/ASCII symbol 12 (0x0C), while in my editor the code appeared as the Unicode/UTF-8 code point 9792. So I encoded it inside as chr(12).
The latest version: ocr9.py
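The function itself lives in ocr9.py and isn't shown here; a hypothetical sketch of what such a correction could look like:

def Correct(text):
    # Tesseract emitted the control character 12 (0x0C) where the "O"
    # in CREO was misread as the female sign; since that symbol is rare
    # and unexpected, just replace it with the intended letter
    return text.replace(chr(12), 'O')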
You mentioned that PRATOLA and PELIGNA have to be given separately - just split by " ":
splitted = text.split(" ")
RECOGNIZED
CUCINE
LUBE
STORE
PRATOLA PELIGNA
CRE [+O with correction and extrapolation of the line]
KITCHENS
...
C 39 211 47 221 0
U 62 211 69 221 0
C 84 211 92 221 0
I 107 211 108 221 0
N 123 211 131 221 0
E 146 211 153 221 0
L 39 108 59 166 0
U 63 107 93 166 0
B 98 108 128 166 0
E 133 108 152 166 0
S 440 134 468 173 0
T 470 135 499 173 0
O 500 134 539 174 0
R 544 135 575 173 0
E 580 135 608 173 0
P 287 76 315 114 0
R 319 76 350 114 0
A 352 76 390 114 0
T 387 76 417 114 0
O 417 75 456 115 0
L 461 76 487 114 0
A 489 76 526 114 0
P 543 76 572 114 0
E 576 76 604 114 0
L 609 76 634 114 0
I 639 76 643 114 0
G 649 75 683 115 0
N 690 76 722 114 0
A 726 76 764 114 0
C 21 30 55 65 0
R 62 31 93 64 0
E 99 31 127 64 0
K 47 19 52 25 0
I 61 19 62 25 0
T 71 19 76 25 0
C 84 19 89 25 0
H 96 19 109 25 0
E 113 19 117 25 0
N 127 19 132 25 0
S 141 19 145 22 0
These are the character boxes returned by image_to_boxes().
Initial message:
I guess that for the area where "cucine" is, an adaptive threshold may segment it better, or maybe applying some edge detection first.
"kitchens" seems very small; what about trying to enlarge that area/distance?
For CREO, I guess it's confused by the big and small sizes of the adjacent captions.
For the "O" in CREO, you may apply dilation in order to close the gap of the "O".
Edit: I played a bit, but without Tesseract, and it needs more work. My goal was to make the letters more contrasting. Some of these processing steps may need to be applied selectively, only on the "cucine" area, maybe running the recognition in two passes: when getting partial words like "Cu", apply adaptive threshold etc. (below) and OCR on a top rectangle around "CU...".
Binary Threshold:
Adaptive Threshold, Median blur (to clean noise) and invert:
Dilate connects small gaps, but it also destroys detail.
import cv2
import numpy as np
import pytesseract

#pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\pytesseract\tesseract.exe'
path_to_image = "logo.png"
#path_to_image = "logo1.png"
image = cv2.imread(path_to_image)
h, w, _ = image.shape
w *= 3; h *= 3
image = cv2.resize(image, (w, h), interpolation=cv2.INTER_AREA)  # Resize 3 times
# converting image into gray scale image
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('grey image', gray_image)
cv2.waitKey(0)
# converting it to binary image by Thresholding
# this step is required if you have a colored image: if you skip this part
# then tesseract won't be able to detect text correctly and will give an incorrect result
#threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
threshold_img = cv2.adaptiveThreshold(gray_image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                      cv2.THRESH_BINARY, 13, 3)
# display image
cv2.imshow('threshold image', threshold_img)
cv2.waitKey(0)
#threshold_img = cv2.GaussianBlur(threshold_img,(3,3),0)
threshold_img = cv2.medianBlur(threshold_img, 5)
cv2.imshow('medianBlur', threshold_img)
cv2.waitKey(0)
threshold_img = cv2.bitwise_not(threshold_img)
cv2.imshow('Invert', threshold_img)
cv2.waitKey(0)
#kernel = np.ones((1, 1), np.uint8)
#threshold_img = cv2.dilate(threshold_img, kernel)
#cv2.imshow('Dilate', threshold_img)
#cv2.waitKey(0)
cv2.imshow('threshold image', threshold_img)
# Maintain output window until user presses a key
cv2.waitKey(0)
# Destroying present windows on screen
cv2.destroyAllWindows()
# now feeding image to tesseract
text = pytesseract.image_to_string(threshold_img)
print(text)
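The two pytesseract calls mentioned above look roughly like this; roi is a hypothetical cropped region where a missing character is expected:

boxes = pytesseract.image_to_boxes(threshold_img)  # one line per character: "char x1 y1 x2 y2 page"
char = pytesseract.image_to_string(roi, config='--psm 8')  # --psm 8: treat the image as a single word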

How to convert image Run length Encoded Pixels mask to binary mask and reshape in python?

I have a file containing EncodedPixels masks of different sizes.
I want to convert these EncodedPixels to binary masks, resize them all to 1024, and then convert them back to EncodedPixels.
Explanation:
The file contains image masks in Encoded Pixels form, and the images have different dimensions (5000x5000, 260x260, etc.). I resized all the images to 1024x1024; now I want to resize each image mask to match, i.e. to 1024x1024.
To my mind there is only one possible solution (there might be more available): to resize a mask, we first need to decode the run-length-encoded pixels into a binary mask, and then we can resize the mask easily.
File Link: link here
This code will be used to resize the binary mask:
from PIL import Image
import numpy as np
pil_image = Image.fromarray(binary_mask)
pil_image = pil_image.resize((new_width, new_height), Image.NEAREST)
resized_binary_mask = np.asarray(pil_image)
Encoded Pixels Example
['6068157 7 6073371 20 6078584 34 6083797 48 6089010 62 6094223 72 6099436 76 6104649 80
6109862 85 6115075 89 6120288 93 6125501 98 6130714 102 6135927 106 6141140 111 6146354 114 6151567 118 6156780 123 6161993 127 6167206 131 6172419 136 6177632 140 6182845 144 6188058 149 6193271 153 6198484 157 6203697 162 6208910 166 6214124 169 6219337 174 6224550 178 6229763 182 6234976 187 6240189 191 6245402 195 6250615 200 6255828 204 6261041 208 6266254 213 6271467 218 6276680 224 6281893 229 6287107 233 6292320 238 6297533 244 6302746 249 6307959 254 6313172 259 6318385 265 6323598 270 6328811 275 6334024 280 6339237 286 6344450 291 6349663 296 6354877 300 6360090 306 6365303 311 6370516 316 6375729 322 6380942 327 6386155 332 6391368 337 6396581 343 6401794 348 6407007 353 6412220 358 6417433 364 6422647 368 6427860 373 6433073 378 6438286 384 6443499 389 6448712 394 6453925 399 6459138 405 6464351 410 6469564 415 6474777 420 6479990 426 17204187 78 17208797 227 17209412 56 17214025 203 17214637 34 17219253 179 17219862 11 17224481 155 17229709 131 17234937 107 17240165 83 17245393 60 17250621 36 17255849 12']
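A minimal sketch of the decode, resize, re-encode pipeline described above, assuming Kaggle-style RLE ("start length" pairs, 1-indexed, column-major order) and an original mask size of 5000x5000; rle_string stands for one entry from the file:

import numpy as np
from PIL import Image

def rle_decode(rle, shape):
    # "start length" pairs, 1-indexed, column-major (Fortran) order
    s = list(map(int, rle.split()))
    starts, lengths = s[0::2], s[1::2]
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        mask[start - 1:start - 1 + length] = 1
    return mask.reshape(shape, order='F')

def rle_encode(mask):
    pixels = np.concatenate([[0], mask.flatten(order='F'), [0]])  # pad to catch border runs
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1  # positions where the value changes
    runs[1::2] -= runs[0::2]  # turn end positions into run lengths
    return ' '.join(str(x) for x in runs)

binary_mask = rle_decode(rle_string, (5000, 5000))  # decode at the original size
pil_image = Image.fromarray(binary_mask).resize((1024, 1024), Image.NEAREST)
new_rle = rle_encode(np.asarray(pil_image))  # back to EncodedPixels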

Parallel processing image analyzer function in Python

I have created a function, imgs_to_df() (which relies on img_to_vec()) that takes a list of URLs that point to a JPG (e.g. https://live.staticflickr.com/65535/48123413937_54bb53e98b_o.jpg), resizes it, and converts the URLs to a dataframe of RGB values, where each row is a different image, and each column is the R, G, or B value of the pixel of the (resized) image.
However, the function is very slow, especially once it gets into lists of hundreds or thousands of links, so I need a way to parallelize it or otherwise make the process much, much faster. I'd also like a way to easily match the URLs back with the RGB vectors after I'm done.
I am very new to parallel processing and everything I have read is just confusing me even more.
from PIL import Image
from io import BytesIO
import urllib.request
import requests
import numpy as np
import pandas as pd
def img_to_vec(jpg_url, resize=True, new_width=300, new_height=300):
    """ Takes a URL of an image, resizes it (optional), and converts it to a
    vector representing RGB values.

    Parameters
    ----------
    jpg_url: String. A URL that points to a JPG image.
    resize: Boolean. Default True. Whether image should be resized before calculating RGB.
    new_width: Int. Default 300. New width to convert image to before calculating RGB.
    new_height: Int. Default 300. New height to convert image to before calculating RGB.

    Returns
    -------
    rgb_vec: Vector of size 3*new_width*new_height for the RGB values in each pixel of the image.
    """
    response = requests.get(jpg_url)  # Create remote image connection
    img = Image.open(BytesIO(response.content))  # Save image connection (NOT actual image)
    if resize:
        img = img.resize((new_width, new_height))
    rgb_img = np.array(img)  # Create matrix of RGB values
    rgb_vec = rgb_img.ravel()  # Flatten 3D matrix of RGB values to a vector
    return rgb_vec

# Consider parallel processing here
def imgs_to_df(jpg_urls, common_width=300, common_height=300):
    """ Takes a list of jpg_urls and converts it to a dataframe of RGB values.

    Parameters
    ----------
    jpg_urls: A list of jpg_urls to be resized and converted to a dataframe of RGB values.
    common_width: Int. Default 300. New width to convert all images to before calculating RGB.
    common_height: Int. Default 300. New height to convert all images to before calculating RGB.

    Returns
    -------
    rgb_df: Pandas dataframe of dimensions len(jpg_urls) rows and common_width*common_height*3
            columns. Each row is a unique jpeg image, and each column is an R/G/B value of
            a particular pixel of the resized image
    """
    assert common_width > 0 and common_height > 0, 'Error: invalid new_width or new_height dimensions'
    for url_idx in range(len(jpg_urls)):
        if url_idx % 100 == 0:
            print('Converting url number {urlnum} of {urltotal} to RGB '.format(urlnum=url_idx, urltotal=len(jpg_urls)))
        try:
            img_i = img_to_vec(jpg_urls[url_idx])
            if url_idx == 0:
                vecs = img_i
            else:
                try:
                    vecs = np.vstack((vecs, img_i))
                except:
                    vecs = np.vstack((vecs, np.array([-1]*common_width*common_height*3)))
                    print('Warning: Error in converting {error_url} to RGB'.format(error_url=jpg_urls[url_idx]))
        except:
            vecs = np.vstack((vecs, np.array([-1]*common_width*common_height*3)))
            print('Warning: Error in converting {error_url} to RGB'.format(error_url=jpg_urls[url_idx]))
    rgb_df = pd.DataFrame(vecs)
    return rgb_df
You can use a ThreadPool, as your task is I/O-bound. I'm using concurrent.futures. Your function needs to be rewritten so that it takes a single URL and turns it into a df.
I added two snippets: one simply uses a loop, and the other uses threading. The second one is much, much faster.
from PIL import Image
from io import BytesIO
import urllib.request
import requests
import numpy as np
import pandas as pd
def img_to_vec(jpg_url, resize=True, new_width=300, new_height=300):
    """ Takes a URL of an image, resizes it (optional), and converts it to a
    vector representing RGB values.

    Parameters
    ----------
    jpg_url: String. A URL that points to a JPG image.
    resize: Boolean. Default True. Whether image should be resized before calculating RGB.
    new_width: Int. Default 300. New width to convert image to before calculating RGB.
    new_height: Int. Default 300. New height to convert image to before calculating RGB.

    Returns
    -------
    rgb_vec: Vector of size 3*new_width*new_height for the RGB values in each pixel of the image.
    """
    response = requests.get(jpg_url)  # Create remote image connection
    img = Image.open(BytesIO(response.content))  # Save image connection (NOT actual image)
    if resize:
        img = img.resize((new_width, new_height))
    rgb_img = np.array(img)  # Create matrix of RGB values
    rgb_vec = rgb_img.ravel()  # Flatten 3D matrix of RGB values to a vector
    return rgb_vec

# Re-written to take a single URL
def imgs_to_df(jpg_url, common_width=300, common_height=300):
    assert common_width > 0 and common_height > 0, 'Error: invalid new_width or new_height dimensions'
    try:
        img_i = img_to_vec(jpg_url)
        vecs = img_i
        try:
            vecs = np.vstack((vecs, img_i))
        except:
            vecs = np.vstack((vecs, np.array([-1]*common_width*common_height*3)))
            print('Warning: Error in converting {error_url} to RGB'.format(error_url=jpg_url))
    except:
        print('failed')
    rgb_df = pd.DataFrame(vecs)
    return rgb_df

img_urls = ['https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Flower_poster_2.jpg/1200px-Flower_poster_2.jpg',
            'https://www.tiltedtulipflorist.com/assets/1/14/DimFeatured/159229xL_HR_fd_3_6_17.jpg?114702&value=217',
            'https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Flower_poster_2.jpg/1200px-Flower_poster_2.jpg',
            'https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Flower_poster_2.jpg/1200px-Flower_poster_2.jpg']

import time

# approach with a plain loop
t1 = time.time()
dfs = []
for iu in img_urls:
    df = imgs_to_df(iu)
    dfs.append(df)
t2 = time.time()
print(t2-t1)
print(dfs)

# approach with multi-threading
import concurrent.futures

t1 = time.time()
with concurrent.futures.ThreadPoolExecutor() as executor:
    dfs = [df for df in executor.map(imgs_to_df, img_urls)]
t2 = time.time()
print(t2-t1)
print(dfs)
Out:
3.540484666824341
[ 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 255 255 255 255 ... 93 155 119 97
1 255 255 255 255 ... 93 155 119 97
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns]]
1.2170848846435547
[ 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 255 255 255 255 ... 93 155 119 97
1 255 255 255 255 ... 93 155 119 97
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns], 0 1 2 3 ... 269996 269997 269998 269999
0 240 240 237 251 ... 247 243 243 243
1 240 240 237 251 ... 247 243 243 243
[2 rows x 270000 columns]]
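Since executor.map yields results in the same order as the input iterable, matching the URLs back to their RGB dataframes afterwards is straightforward:

url_to_df = dict(zip(img_urls, dfs))  # executor.map preserves input order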

Scipy: Convert RGB TIFF to grayscale TIFF and output it on Matplotlib

I want to manipulate RGB bands in a TIFF file and output the grayscale map on matplotlib. So far I have this code, but I couldn't get it in grayscale:
import scipy as N
import gdal
import sys
import matplotlib.pyplot as pyplot
tif = gdal.Open('filename.tif')
band1 = tif.GetRasterBand(1)
band2 = tif.GetRasterBand(2)
band3 = tif.GetRasterBand(3)
red = band1.ReadAsArray()
green = band2.ReadAsArray()
blue = band3.ReadAsArray()
gray = (0.299*red + 0.587*green + 0.114*blue)
pyplot.figure()
pyplot.imshow(gray)
pyplot.show()
And these are the arrays:
[[255 255 255 ..., 237 237 251]
[255 255 255 ..., 237 237 251]
[255 255 255 ..., 237 237 251]
...,
[237 237 237 ..., 237 237 251]
[237 237 237 ..., 237 237 251]
[242 242 242 ..., 242 242 252]]
[[255 255 255 ..., 239 239 251]
[255 255 255 ..., 239 239 251]
[255 255 255 ..., 239 239 251]
...,
[239 239 239 ..., 239 239 251]
[239 239 239 ..., 239 239 251]
[243 243 243 ..., 243 243 252]]
[[255 255 255 ..., 234 234 250]
[255 255 255 ..., 234 234 250]
[255 255 255 ..., 234 234 250]
...,
[234 234 234 ..., 234 234 250]
[234 234 234 ..., 234 234 250]
[239 239 239 ..., 239 239 251]]
Any idea how can I fix this?
I don't have gdal installed, but a similar approach using PIL looks like this:
import numpy as np
from PIL import Image
import matplotlib.pyplot as pyplot
img = Image.open("/Users/travis/Desktop/new_zealand.tif")
img.getdata()
r, g, b = img.split()
ra = np.array(r)
ga = np.array(g)
ba = np.array(b)
gray = (0.299*ra + 0.587*ga + 0.114*ba)
pyplot.figure()
pyplot.imshow(img)
pyplot.figure()
pyplot.imshow(gray)
pyplot.figure()
pyplot.imshow(gray, cmap="gray")
pyplot.show()
It may be a simple matter of setting the color map to something besides the default ("jet") to get what you want, but I'm not sure what you're seeing.
Here are the images that are generated (don't ask me why the original is upside-down - not sure what causes that).
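The same fix should apply to the gdal version in the question: pass an explicit grayscale colormap when displaying the luminance array.

pyplot.figure()
pyplot.imshow(gray, cmap="gray")  # without cmap, matplotlib applies its default colormap
pyplot.show()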
