I am trying to apply the convolve method below to the cameraman image. The kernel applied to the image is a 3x3 filter populated with -1/9. When I print the values of the cameraman image before applying the convolve method, all of the values are positive. Then, when I apply the 3x3 negative kernel to the image, I still get only positive values when I print the image after convolution.
The convolving function:
def convolve2d(image, kernel):
    # This function takes an image and a kernel
    # and returns the convolution of them
    # Args:
    #   image: a numpy array of size [image_height, image_width].
    #   kernel: a numpy array of size [kernel_height, kernel_width].
    # Returns:
    #   a numpy array of size [image_height, image_width] (convolution output).
    output = np.zeros_like(image)  # convolution output
    # Add zero padding to the input image
    padding = int(len(kernel)/2)
    image_padded = np.pad(image, ((padding, padding), (padding, padding)), 'constant')
    for x in range(image.shape[1]):  # Loop over every pixel of the image
        for y in range(image.shape[0]):
            # element-wise multiplication of the kernel and the image
            output[y, x] = (kernel * image_padded[y:y+3, x:x+3]).sum()
    return output
And here is the filter I am applying to the image:
filter2= [[-1/9,-1/9,-1/9],[-1/9,-1/9,-1/9],[-1/9,-1/9,-1/9]]
Finally, these are the initial values of the image, and the values after convolution, respectively:
[[156 159 158 ... 151 152 152]
[160 154 157 ... 154 155 153]
[156 159 158 ... 151 152 152]
...
[114 132 123 ... 135 137 114]
[121 126 130 ... 133 130 113]
[121 126 130 ... 133 130 113]]
After convolution:
[[187 152 152 ... 154 155 188]
[152 99 99 ... 104 104 155]
[152 99 100 ... 103 103 154]
...
[175 133 131 ... 127 130 174]
[174 132 124 ... 125 130 175]
[202 173 164 ... 172 173 202]]
This is how I call the convolve2d method:
convolved_camManImage= convolve2d(camManImage,filter2)
This might be caused by how numpy dtypes work. As numpy.zeros_like's help says:
Return an array of zeros with the same shape and type as a given array.
Thus your output is probably of dtype uint8, which uses modulo arithmetic, so negative results wrap around to positive values. To check whether this is the case, add print(output.dtype) immediately after the output = np.zeros_like(image) line.
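For reference, a minimal sketch of one possible fix, assuming the cameraman image is loaded as uint8: allocate the convolution output as float so negative results are preserved instead of wrapping around (the function name is illustrative).

import numpy as np

def convolve2d_float(image, kernel):
    # Same loop as in the question, but with a float output buffer instead of
    # np.zeros_like(image), which would inherit the uint8 dtype.
    kernel = np.asarray(kernel, dtype=np.float64)
    output = np.zeros(image.shape, dtype=np.float64)
    padding = kernel.shape[0] // 2
    image_padded = np.pad(image, padding, 'constant')
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            output[y, x] = (kernel * image_padded[y:y+kernel.shape[0],
                                                  x:x+kernel.shape[1]]).sum()
    return output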
I'm trying to make a face recognition program, but the problem is that the face encoding shape of some encodings is bigger than the others, and thus I'm getting the error
ValueError: setting an array element with a sequence.
Here's my code to generate the encodings
import cv2

class FaceEncoder():
    def __init__(self, files, singleton=False, model_path='./models/lbpcascade_animeface.xml', scale_factor=1.1, min_neighbours=1):
        self.singleton = singleton
        self.files = files
        self.model = model_path
        self.scale_factor = scale_factor
        self.min_neighbours = min_neighbours

    def encode(self, singleton=False):
        if self.singleton == False:
            encodings = []
            labels = []
            for file in self.files:
                cascade = cv2.CascadeClassifier(self.model)
                image = cv2.imread(file)
                rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                faces = cascade.detectMultiScale(rgb, self.scale_factor, self.min_neighbours)
                if len(faces) > 0:
                    print('Found face in ' + file)
                    encodings.append(faces.flatten())
                    labels.append(file.split('/')[2])
                else:
                    print('Couldnt find face in ' + file)
            return encodings, labels
Here are some of the encodings
[204 96 211 211]
[525 168 680 680]
[205 11 269 269]
[ 165 31 316 316 1098 181 179 179]
[ 113 422 1371 1371]
[ 71 86 183 183]
[209 19 33 33 88 27 60 60 133 80 65 65 68 117 52 52]
[117 77 149 149]
[ 63 77 284 284]
[370 222 490 490]
[433 112 114 114 183 98 358 358]
[ 44 35 48 48 192 34 48 48]
[210 82 229 229]
[429 90 153 153]
[318 50 174 174 118 142 120 120]
You should not put several found rects into the same list entry. If many faces are found, put each one on its own row, and add a label per face found, not per image (see the sketch below).
Also, what you have now are NOT "encodings", just mere boxes / rectangles.
Read up on how to get real encodings (facenet, spherenet?); then you need to:
crop the face region from the image
resize it to the nn input size (e.g. 96x96)
run it through the nn to receive the encoding
save that along with a label to a db/list
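A minimal sketch of that per-face bookkeeping, assuming the same cascade and file layout as in the question (the function name is illustrative): one (x, y, w, h) row and one label per detected face, rather than flattening all rects of an image into a single entry.

import cv2

def collect_face_boxes(files, model_path='./models/lbpcascade_animeface.xml',
                       scale_factor=1.1, min_neighbours=1):
    cascade = cv2.CascadeClassifier(model_path)
    boxes, labels = [], []
    for file in files:
        image = cv2.imread(file)
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        faces = cascade.detectMultiScale(rgb, scale_factor, min_neighbours)
        for (x, y, w, h) in faces:  # one row per face, not per image
            boxes.append((x, y, w, h))
            labels.append(file.split('/')[2])
    return boxes, labels

These boxes can then be used to crop each face region before resizing it and running it through an embedding network, as described above.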
Given a numpy array of shape (64,64) (=an image) and an arbitrary function that takes that image as an input, I want to find the image that minimizes the function. Let's say the function computes the contrast.
Example:
import numpy as np

def contrast(X):
    vmin, vmax = int(np.min(X)), int(np.max(X))
    num = vmax - vmin
    denom = vmax + vmin
    if denom == 0:
        return 0
    else:
        return num / denom

img = np.random.randint(256, size=(64, 64), dtype=np.uint8)
res = contrast(img)
Scipy offers fmin(), but that function would not work with such a large input. Any ideas how to find
the image that minimizes the function?
Run the code in Google Colab.
It is by no means perfect, but you can at least get close to a local minimum¹ with simple gradient descent and automatic differentiation in e.g. autograd. For the automatic gradient to work, you will most likely have to convert the image data to floats, do the optimization, and then round and cast back to ints. This might in principle cause you to miss minima, find wrong ones, or get stuck in local ones.
¹ Note that this in no way guarantees finding a global minimum in cases where one exists; it only finds a minimum.
import autograd.numpy as np
from autograd import elementwise_grad

def michelson_contrast(image):
    vmin, vmax = np.min(image), np.max(image)
    if (vmax + vmin) > 1e-15:
        return (vmax - vmin) / (vmax + vmin)
    return 0
For the function you specified, the Michelson contrast, the optimization converges extremely slowly,
f = michelson_contrast
df = elementwise_grad(f)

img = np.random.randint(256, size=(100, 100)).astype(np.float64)

# Simple gradient descent.
for i in range(1, (max_iterations := 100000) + 1):
    img -= 10**3 * df(img)

# Round and cast the image back to integer values.
img = np.round(img).astype(int)
but a 100 x 100 random test converges on my laptop in about a minute.
iter. function
--------------------------------------
0 1.0000000000
10000 0.6198908490
20000 0.4906918649
30000 0.3968742592
40000 0.3204002330
50000 0.2539835041
60000 0.1942016682
70000 0.1386916909
80000 0.0863448569
90000 0.0361678029
100000 0.0003124169
Rounded back to integers, the answer is an exact minimum with f = 0, though of course many such minima exist (256 of them, to be exact):
[[146 146 146 ... 146 146 146]
[146 146 146 ... 146 146 146]
[146 146 146 ... 146 146 146]
...
[146 146 146 ... 146 146 146]
[146 146 146 ... 146 146 146]
[146 146 146 ... 146 146 146]]
A different example, the RMS contrast, converges much faster (less than a second)
def rms_contrast(image):
    N = image.size
    image_mean = np.mean(image)
    return np.sum((image - image_mean)**2) / N

f = rms_contrast
df = elementwise_grad(f)

img = np.random.randint(256, size=(100, 100)).astype(np.float64)
for i in range(1, (max_iterations := 100) + 1):
    img -= 10**3 * df(img)
img = np.round(img).astype(int)
with
iter. function
--------------------------------------
0 5486.3646543900
10 63.2534779216
20 0.7292629494
30 0.0084078294
40 0.0000969357
50 0.0000011176
60 0.0000000129
70 0.0000000001
80 0.0000000000
90 0.0000000000
100 0.0000000000
and the resulting image (again a perfect minimum after casting back to integers):
[[126 126 126 ... 126 126 126]
[126 126 126 ... 126 126 126]
[126 126 126 ... 126 126 126]
...
[126 126 126 ... 126 126 126]
[126 126 126 ... 126 126 126]
[126 126 126 ... 126 126 126]]
Unless the function is very complicated or computationally expensive, or the input image is enormous, this approach should at least get you somewhat closer to your answer.
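If you would rather stay within scipy, here is a minimal sketch of the same idea using scipy.optimize.minimize with L-BFGS-B, which handles large flat inputs; the gradient again comes from autograd, and the RMS objective and 64x64 size are only illustrative.

import autograd.numpy as np
from autograd import grad
from scipy.optimize import minimize

def rms_contrast_flat(x):
    # RMS contrast evaluated on the flattened image vector.
    return np.mean((x - np.mean(x))**2)

x0 = np.random.randint(256, size=(64, 64)).astype(np.float64).ravel()
res = minimize(rms_contrast_flat, x0, jac=grad(rms_contrast_flat),
               method='L-BFGS-B', bounds=[(0, 255)] * x0.size)
img_min = np.round(res.x).astype(int).reshape(64, 64)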
I have a file containing EncodedPixels masks of different sizes.
I want to convert these EncodedPixels to binary masks, resize them all to 1024, and then convert them back to EncodedPixels.
Explanation:
The file contains image masks in EncodedPixels form, and the images have different dimensions (5000x5000, 260x260, etc.). I resize all images to 1024x1024, and now I want to resize each image mask to match the 1024x1024 image.
In my mind there is only one possible solution (there might be more): to resize a mask, we first need to convert the run-length-encoded pixels to a binary mask, and then we can resize the mask easily.
File Link: link here
This code can be used to resize the binary mask:
from PIL import Image
import numpy as np
pil_image = Image.fromarray(binary_mask)
pil_image = pil_image.resize((new_width, new_height), Image.NEAREST)
resized_binary_mask = np.asarray(pil_image)
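For the conversion itself, a minimal decode/encode sketch, assuming the masks use Kaggle-style run-length encoding (1-indexed start positions, column-major order); adjust the order and indexing if your file uses a different convention.

import numpy as np

def rle_decode(rle_str, shape):
    # "start length start length ..." -> binary mask of the given (height, width)
    s = list(map(int, rle_str.split()))
    starts, lengths = s[0::2], s[1::2]
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        mask[start - 1:start - 1 + length] = 1
    return mask.reshape(shape, order='F')

def rle_encode(mask):
    # binary mask -> "start length start length ..."
    pixels = np.concatenate([[0], mask.flatten(order='F'), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[0::2]
    return ' '.join(str(x) for x in runs)

Decode with the original image shape, resize the binary mask with the snippet above, then re-encode the resized mask.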
Encoded Pixels Example
['6068157 7 6073371 20 6078584 34 6083797 48 6089010 62 6094223 72 6099436 76 6104649 80
6109862 85 6115075 89 6120288 93 6125501 98 6130714 102 6135927 106 6141140 111 6146354 114 6151567 118 6156780 123 6161993 127 6167206 131 6172419 136 6177632 140 6182845 144 6188058 149 6193271 153 6198484 157 6203697 162 6208910 166 6214124 169 6219337 174 6224550 178 6229763 182 6234976 187 6240189 191 6245402 195 6250615 200 6255828 204 6261041 208 6266254 213 6271467 218 6276680 224 6281893 229 6287107 233 6292320 238 6297533 244 6302746 249 6307959 254 6313172 259 6318385 265 6323598 270 6328811 275 6334024 280 6339237 286 6344450 291 6349663 296 6354877 300 6360090 306 6365303 311 6370516 316 6375729 322 6380942 327 6386155 332 6391368 337 6396581 343 6401794 348 6407007 353 6412220 358 6417433 364 6422647 368 6427860 373 6433073 378 6438286 384 6443499 389 6448712 394 6453925 399 6459138 405 6464351 410 6469564 415 6474777 420 6479990 426 17204187 78 17208797 227 17209412 56 17214025 203 17214637 34 17219253 179 17219862 11 17224481 155 17229709 131 17234937 107 17240165 83 17245393 60 17250621 36 17255849 12']
How can I load the RGB matrix of an image? Basically, if I have a 224x224 (grayscale) image, I need its RGB matrix, i.e. a 224x224 matrix consisting of 3-element tuples. I have tried:
f="/path/to/grayscale/image"
image = Image.open(f)
new_width = 224
new_height = 224
im = image.resize((new_width, new_height), Image.ANTIALIAS)
im=np.array(im)
print(im)
and it prints:
[[195 195 195 ..., 101 104 105]
[195 195 195 ..., 102 105 106]
[194 194 194 ..., 104 109 111]
...,
[137 138 140 ..., 209 207 206]
[133 134 136 ..., 209 207 206]
[132 133 135 ..., 209 207 206]]
After some testing, I realised that it was because of the image being grayscale. How can I load the RGB matrix of a grayscale image?
I am not proficient in PIL, but it looks like there is an image.convert("RGB") method that may or may not work, so give it a try.
However, if your intention is to continue using np.array then the following will work:
im=np.array(im)
imRGB = np.repeat(im[:, :, np.newaxis], 3, axis=2)
Basically it repeats the input array along a new 3rd axis, 3 times.
imRGB[:,:,0] is the Red channel
imRGB[:,:,1] is the Green channel
imRGB[:,:,2] is the Blue channel
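A minimal sketch of the PIL route mentioned above, assuming a grayscale source file (the path is illustrative): convert("RGB") simply duplicates the single channel into R, G and B.

from PIL import Image
import numpy as np

image = Image.open("/path/to/grayscale/image")
im_rgb = np.array(image.convert("RGB").resize((224, 224)))
print(im_rgb.shape)  # (224, 224, 3)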
Currently I am iterating over one array, and for each value in this array I am looking for the closest value at the corresponding point in another array, within a region surrounding that corresponding point.
In summary: for any point in an array, how far away from the corresponding point in another array do you need to go to get the same value?
The code seems to work well for small arrays, but I am now working with 1024x768 arrays, leading to a long wait for each run.
Any help or advice would be greatly appreciated, as I have been on this for a while!
Example matrix in the format I'm using: np.array([[1,2],[3,4]])
from math import sqrt
import numpy as np

# Distance to agreement
# Used later to define a region of pixels around a corresponding point
# to iterate over:
DTA = 26
# To account for noise in pixels - doesn't have to find the exact value,
# just one within +/-130 of it.
limit = 130
# Containers for all pixel value matches and also the smallest distance
# to a pixel match
Dist = []
Dist_min = []
# Container matrix for gamma pass/fail values
Dist_to_agree = np.zeros((i_size, j_size))
# i,j indexes the reference matrix (x), ii,jj indexes the measured
# matrix (y). Finds a match within the limits,
# appends the distance to the match into Dist.
# Then find the minimum distance to a match for that pixel and append it
# to Dist_min
for i, k in enumerate(x):
    for j, l in enumerate(k):
        # added 10 padding to the y matrix, so need to shift it by 10 in i & j
        for ii in range((i+10)-DTA, (i+10)+DTA):
            for jj in range((j+10)-DTA, (j+10)+DTA):
                # If the pixel value is within a range to account for noise,
                # let it be "found"
                if (y[ii, jj]-limit) <= x[i, j] <= (y[ii, jj]+limit):
                    # Calculating distance
                    dist_eu = sqrt((i-ii)**2 + (j-jj)**2)
                    Dist.append(dist_eu)
                # If a value cannot be found within the noise range,
                # append 10 = instant fail.
                else:
                    Dist.append(10)
        try:
            Dist_min.append(min(Dist))
            Dist_to_agree[i, j] = min(Dist)
        except ValueError:
            pass
        # Need to reset container or previous values will also be
        # accounted for when finding the minimum
        Dist = []
print(Dist_to_agree)
First, you are getting the elements of x in k and l, but then throwing that away and indexing x again. So in place of x[i,j], you could just use l, which would be much faster (although l isn't a very meaningful name, something like xi and xij might be better).
Second, you are recomputing y[ii,jj]-limit and y[ii,jj]+limit every time. If you have enough memory, you can precompute these: ym = y-limit and yp = y+limit.
Third, for long lists, appending to a list is slower than creating an array and setting the values. You can also skip the entire else clause by pre-setting the default value.
Fourth, you are computing min(Dist) twice, and you may further be using the Python version rather than the numpy version, the latter being faster for arrays (which is another reason to make Dist an array).
However, the biggest speedup comes from vectorizing the inner two loops. Here are my tests, with x = np.random.random((10,10)) and y = np.random.random((100,100)):
Your version takes 623 ms.
Here is my version, which takes 7.6 ms:
dta = 26
limit = 130

dist_to_agree = np.zeros_like(x)
dist_min = []
ym = y - limit
yp = y + limit
for i, xi in enumerate(x):
    irange = (i - np.arange(i+10-dta, i+10+dta))**2
    if not irange.size:
        continue
    ymi = ym[i+10-dta:i+10+dta, :]
    ypi = yp[i+10-dta:i+10+dta, :]
    for j, xij in enumerate(xi):
        jrange = (j - np.arange(j+10-dta, j+10+dta))**2
        if not jrange.size:
            continue
        ymij = ymi[:, j+10-dta:j+10+dta]
        ypij = ypi[:, j+10-dta:j+10+dta]
        imesh, jmesh = np.meshgrid(irange, jrange, indexing='ij')
        dist = np.sqrt(imesh + jmesh)
        dist[(xij < ymij) | (xij > ypij)] = 10  # outside the noise range -> instant fail
        mindist = dist.min()
        dist_min.append(mindist)
        dist_to_agree[i, j] = mindist
print(dist_to_agree)
@Ciaran
Meshgrid is kind of a vectorized equivalent of two nested loops. Below are two equivalent ways of calculating the dist: one with loops and one with meshgrid + numpy vector operations. The second one is six times faster.
from math import sqrt
import numpy as np

DTA = 5
i = 100
j = 200

def func1():
    dist1 = np.zeros((DTA*2, DTA*2))
    for ii in range((i+10)-DTA, (i+10)+DTA):
        for jj in range((j+10)-DTA, (j+10)+DTA):
            dist1[ii-((i+10)-DTA), jj-((j+10)-DTA)] = sqrt((i-ii)**2 + (j-jj)**2)
    return dist1

def func2():
    dist2 = np.zeros((DTA*2, DTA*2))
    ii, jj = np.meshgrid(np.arange((i+10)-DTA, (i+10)+DTA),
                         np.arange((j+10)-DTA, (j+10)+DTA))
    dist2 = np.sqrt((i-ii)**2 + (j-jj)**2)
    return dist2
This is how the ii and jj matrices look after the meshgrid operation:
ii=
[[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]]
jj=
[[205 205 205 205 205 205 205 205 205 205]
[206 206 206 206 206 206 206 206 206 206]
[207 207 207 207 207 207 207 207 207 207]
[208 208 208 208 208 208 208 208 208 208]
[209 209 209 209 209 209 209 209 209 209]
[210 210 210 210 210 210 210 210 210 210]
[211 211 211 211 211 211 211 211 211 211]
[212 212 212 212 212 212 212 212 212 212]
[213 213 213 213 213 213 213 213 213 213]
[214 214 214 214 214 214 214 214 214 214]]
for loops are very slow in pure Python, and you have four nested loops, which will be very slow. Cython does wonders for for-loop speed. You can also try vectorization. While I'm not sure I fully understand what you are trying to do, you may be able to vectorize at least some of the operations, especially the last two loops.
So instead of the two ii, jj cycles over
(y[ii,jj]-limit) <= x[i,j] <= (y[ii,jj]+limit)
you can do something like
ii, jj = np.meshgrid(np.arange((i+10)-DTA, (i+10)+DTA),
                     np.arange((j+10)-DTA, (j+10)+DTA), indexing='ij')
t = (y[(i+10)-DTA:(i+10)+DTA, (j+10)-DTA:(j+10)+DTA] - limit <= x[i, j]) & \
    (x[i, j] <= y[(i+10)-DTA:(i+10)+DTA, (j+10)-DTA:(j+10)+DTA] + limit)
Dist = np.sqrt((i-ii)**2 + (j-jj)**2)
np.min(Dist[t]) will give you the minimum distance for element i, j.
The NumbaPro compiler offers GPU acceleration. Unfortunately it isn't free.
http://docs.continuum.io/numbapro/