I'm trying to get an array with the alpha channel values of a sprite image, using the Pyglet library.
I wrote this code, which actually works:
mask = []
for x in range(image.width):
    mask.append([])
    for y in range(image.height):
        mask[x].append(ord(image.get_region(x, y, 1, 1).get_image_data().get_data("RGBA", 4)[3]))
return mask
The only problem is that it's really, really slow. I guess I'm using the wrong function to retrieve the alpha channel. What can I do to make it faster?
UPDATE
I found a solution with the following code, which is faster:
from struct import unpack

rawimage = image.get_image_data()
fmt = 'RGBA'
pitch = rawimage.width * len(fmt)  # bytes per row
pixels = rawimage.get_data(fmt, pitch)
data = unpack("%iB" % (4 * image.width * image.height), pixels)
mask = data[3::4]  # every 4th byte, starting at index 3, is alpha
return mask
I don't know pyglet, but I'm guessing the performance issue is related to the many queries for a single pixel each. Instead, you want to get the entire image from the GPU in one call, including the colour and alpha values, and then extract just the alpha. I'd also use struct.unpack instead of ord().
Note: This code is untested and purely based on the example in the question. There is probably a better way.
from struct import unpack
...
region = image.get_region(0, 0, image.width, image.height)
packed_data = region.get_image_data().get_data("RGBA", 4 * image.width)  # pitch = bytes per row
data = unpack("%iB" % (4 * image.width * image.height), packed_data)
mask = data[3::4]
I don't think it'd be worth it, but if you really didn't want to drag the colour data back from the GPU, you could explore copying the alpha channel to another texture first (there might even be a way to get GL to unpack it as a format conversion or reinterpretation).
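If numpy is available, here is a sketch that skips struct.unpack entirely: it views the bytes returned by get_data as a uint8 array and slices out the alpha channel, keeping the 2D shape. It assumes the buffer is tightly packed RGBA requested with a pitch of 4 * width, and that, as with pyglet's positive pitch, row 0 is the bottom row. alpha_mask is a hypothetical helper name.

```python
import numpy as np

def alpha_mask(pixels: bytes, width: int, height: int) -> np.ndarray:
    """Return the alpha channel as a (height, width) uint8 array.

    Assumes `pixels` is tightly packed RGBA data (4 bytes per pixel,
    pitch = 4 * width), e.g. from image_data.get_data("RGBA", 4 * width).
    """
    rgba = np.frombuffer(pixels, dtype=np.uint8).reshape(height, width, 4)
    # Channel 3 is alpha; copy so the result does not keep the raw buffer alive
    return rgba[:, :, 3].copy()

# Usage with pyglet (not run here):
# raw = image.get_image_data()
# mask = alpha_mask(raw.get_data("RGBA", 4 * raw.width), raw.width, raw.height)
# mask[y, x] is the alpha of pixel (x, y); mask.T gives the mask[x][y] layout
```

The whole conversion is a single vectorized slice, so its cost no longer depends on Python-level loops.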
Related
I would like to know how to read an HDR image (.hdr) quickly and efficiently in Python, obtaining the pixel values in the RGBE format.
These are some things I tried:
import imageio
img = imageio.imread(hdr_path, format="HDR-FI")
Alternatively:
import cv2
img = cv2.imread(hdr_path, flags=cv2.IMREAD_ANYDEPTH)
Both read the image, but give the values in RGB format.
How do you obtain the fourth channel, the "E" channel, for every pixel, without altering the RGB values?
I would prefer a solution involving only imageio, as I am restricted to using only that module.
If you prefer the RGBE representation over the float representation, you can convert between the two:
import numpy as np

def float_to_rgbe(image, *, channel_axis=-1):
    # ensure channel-last
    image = np.moveaxis(np.asarray(image, dtype=np.float64), channel_axis, -1)
    max_float = np.max(image, axis=-1)
    mantissa, exponent = np.frexp(max_float)
    # scale = mantissa * 256 / max; divide by 1.0 for (near-)black pixels to avoid 0/0
    safe_max = np.where(max_float < 1e-32, 1.0, max_float)
    scale = mantissa * 256.0 / safe_max
    image_rgbe = np.empty((*image.shape[:-1], 4), dtype=np.uint8)
    image_rgbe[..., :3] = image * scale[..., None]  # broadcast scale over R, G, B
    image_rgbe[..., -1] = exponent + 128
    image_rgbe[max_float < 1e-32, :] = 0
    # restore original axis order
    image_rgbe = np.moveaxis(image_rgbe, -1, channel_axis)
    return image_rgbe
(Note: this is based on the RGBE reference implementation and can be further optimized if it actually is the bottleneck.)
In your comment, you mention "If i parse the numpy array manually and split the channels into an E channel, it takes too much time...", but it is hard to tell why that is the case without seeing the code. The above is O(height*width), which seems reasonable for a pixel-level image processing method.
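For the opposite direction, here is a sketch that mirrors the rgbe2float routine of the RGBE reference code, where a pixel decodes to mantissa * 2**(E - (128 + 8)) and E == 0 means black. It assumes the same channel-last layout as float_to_rgbe above; rgbe_to_float is a hypothetical name chosen to match it.

```python
import numpy as np

def rgbe_to_float(image_rgbe, *, channel_axis=-1):
    """Convert RGBE (4 x uint8) pixels back to float RGB."""
    rgbe = np.moveaxis(np.asarray(image_rgbe), channel_axis, -1)
    exponent = rgbe[..., 3].astype(np.int32)
    # 2**(E - 136) per pixel; the extra 8 undoes the * 256 applied when encoding
    factor = np.ldexp(1.0, exponent - (128 + 8))
    rgb = rgbe[..., :3].astype(np.float64) * factor[..., None]
    rgb[exponent == 0] = 0.0  # E == 0 encodes a black pixel
    return np.moveaxis(rgb, -1, channel_axis)
```

Like the forward conversion, this is a handful of whole-array operations, so it stays O(height*width) without a Python loop over pixels.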
[Screenshots: the img values; the original image; the expected output; the output I get]
I'm trying to stretch the grey levels from 0-100 to 50-200 in Python, but the output image is not right.
I drew the straight line representing the linear relationship between the two ranges, and in line 8 of the code below (the newimg[i][j] = ... assignment) I use its equation to get the output.
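For reference, the straight line through (0, 50) and (100, 200) works out as follows. Here r is the input grey level and s the output level, both in 0..255 units, and v = r/255 is the normalized value that the code works with (it multiplies img by 255):

```latex
s = 50 + \frac{200 - 50}{100 - 0}\,(r - 0) = \tfrac{3}{2}\,r + 50, \qquad 0 \le r \le 100

v' = \frac{\tfrac{3}{2}\,(255\,v) + 50}{255} = \tfrac{3}{2}\,v + \tfrac{50}{255}
```

The endpoints check out: r = 0 gives s = 50 and r = 100 gives s = 200, so the equation itself matches the intended mapping.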
What's wrong with my code?
This is my first question, so sorry for any mistakes.
def Contrast_enhancement(img):
    newimg = img
    height = img.shape[0]
    width = img.shape[1]
    for i in range(height):
        for j in range(width):
            if(img[i][j] * 255 >= 0 and img[i][j] * 255 <= 100):
                newimg[i][j] = (((3/2) * (img[i][j] * 255)) + 50)/255
    return newimg
import numpy as np

def Contrast_enhancement(img):
    newimg = np.array(img, copy=True)  # a real copy of img; without it, any change to newimg also changes img
    temp_img = newimg * 3/2 + 50/255    # the straight-line mapping for pixels in the 0..1 range
    newimg = np.where(newimg <= 100/255, temp_img, newimg)  # only grey levels 0..100 are stretched
    return newimg
or shorter:
import numpy as np

def Contrast_enhancement(img):
    newimg = np.array(img, copy=True)  # a real copy of img; without it, any change to newimg also changes img
    newimg = np.where(newimg <= 100/255, newimg*3/2 + 50/255, newimg)
    return newimg
The copy part should solve your problem, and the numpy part just speeds things up. np.where returns the value from temp_img wherever the condition is true (the pixel's grey level is at most 100) and the value from newimg everywhere else.
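The same idea generalizes to any input and output range. Here is a vectorized sketch, assuming float pixels in [0, 1] as in the question; stretch_grey_levels and its parameters are hypothetical names. Pixels whose level falls outside the input range are returned unchanged, as in the original loop.

```python
import numpy as np

def stretch_grey_levels(img, in_range=(0, 100), out_range=(50, 200)):
    """Linearly map grey levels in in_range to out_range (ranges in 0..255 units)."""
    v = np.asarray(img, dtype=np.float64)
    (a, b), (c, d) = in_range, out_range
    inside = (v >= a / 255) & (v <= b / 255)
    # Straight line through (a, c) and (b, d), evaluated in 0..255 units
    mapped = (c + (d - c) / (b - a) * (v * 255 - a)) / 255
    # np.where builds a new array, so img itself is never modified
    return np.where(inside, mapped, v)
```

Comparing against b / 255 in normalized units, rather than multiplying every pixel by 255 first, avoids rounding surprises for pixels that sit exactly on the boundary.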
There are two answers to your question:
The first is strictly technical (the one that @DonQuiKong tries to answer): how to do the stretching you describe more simply or correctly.
The other is implicit and addresses your actual problem of image stretching.
I am focusing on the second case here. Judging from the image samples you provided, you are not taking the correct approach. Let's assume the samples you provided do have all intensity values between 0-100 (when I screen-capture them on my PC they don't, but that is screen-dependent to a degree). Your method seems correct and should work, apart from some minor bugs.
1) One minor bug, for example, is that:
newimg = img
does not do what you think it does. It does not copy the image; it creates an alias of the original variable, so writing to newimg also modifies img. Use:
newimg = img.copy()
instead.
2) If you get an image with a different intensity range, your code breaks: it silently skips every pixel above 100, which I guess is not what you want.
3) In that case, the stretching you want can be applied to the whole image using something like:
newimg -= np.min(newimg)
newimg /= np.max(newimg)
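which stretches your intensities to the full 0-1 range (multiply by 255 afterwards if you need 0-255). Note that newimg has to be a float array for the in-place division to work.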
4) Judging from your sample images, you also need a more radical stretch (which sacrifices a bit of image information to increase contrast). Instead of the above, you can use a lower limit:
newimg -= np.min(newimg)
newimg /= (np.max(newimg) * 0.5)
newimg = np.clip(newimg, 0, 1)  # values above 1 are saturated ("burned")
This effectively "burns" some pixels, but in your case the result looks closer to your desired one. Alternatively, you can apply a non-linear mapping (a logarithmic one, for example) of old intensities to new ones, and you won't get any "burned" pixels.
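A minimal sketch of such a logarithmic mapping, assuming a non-constant float image; log_stretch and its gain parameter are hypothetical choices, not something from the answer. Dark levels get boosted more than bright ones and nothing is clipped.

```python
import numpy as np

def log_stretch(img, gain=10.0):
    """Non-linear stretch: log(1 + gain * x), rescaled so the maximum maps to 1."""
    v = np.asarray(img, dtype=np.float64)
    v = v - v.min()  # start at 0, like the linear version
    # Larger gain lifts the dark end more strongly; the top of the range stays at 1
    return np.log1p(gain * v) / np.log1p(gain * v.max())
```

With gain=10, an input of 0.1 (after shifting to start at 0) maps to about 0.29 instead of 0.1, while the brightest pixel still maps to exactly 1.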
A sample with value 0.5:
I have a problem with an FFT implementation in Python: I'm getting completely strange results.
OK, so I want to open an image, get the RGB value of every pixel, apply an FFT to it, and convert the result back to an image.
My steps:
1) I open the image with the PIL library in Python like this:
from PIL import Image
im = Image.open("test.png")
2) I get the pixels:
pixels = list(im.getdata())
3) I separate every pixel into r, g, b values:
for x in range(width):
    for y in range(height):
        r,g,b = pixels[x*width+y]
        red[x][y] = r
        green[x][y] = g
        blue[x][y] = b
4) Let's assume that the first pixel is (111, 111, 111). I apply an FFT to all the red values like this:
red = np.fft.fft(red)
And then:
print (red[0][0], green[0][0], blue[0][0])
My output is:
(53866+0j) 111 111
I think it's completely wrong. My image is 64x64, and the FFT from GIMP is completely different. In fact, my FFT gives me only arrays with huge values; that's why my output image is black.
Do you have any idea where the problem is?
[EDIT]
As suggested, I've changed it to
red= np.fft.fft2(red)
And after that, I scale it:
scale = 1/(width*height)
red= abs(red* scale)
And still, I'm getting only a black image.
[EDIT2]
OK, so let's take one image.
Assume that I only want to open it and save it as a greyscale image. I do it like this:
def getGray(pixel):
    r,g,b = pixel
    return (r+g+b)/3

im = Image.open("test.png")
im.load()
pixels = list(im.getdata())
width, height = im.size
greyscale = [[0] * height for _ in range(width)]  # indexed as greyscale[x][y]
for x in range(width):
    for y in range(height):
        greyscale[x][y] = getGray(pixels[x*width+y])
data = []
for x in range(width):
    for y in range(height):
        pix = greyscale[x][y]
        data.append(pix)
img = Image.new("L", (width,height), "white")
img.putdata(data)
img.save('out.png')
After this, I get this image, which is OK. Now I want to apply an FFT to my image before saving it as a new one, so I do this:
scale = 1/(width*height)
greyscale = np.fft.fft2(greyscale)
greyscale = abs(greyscale * scale)
after loading it. After saving it to a file, I get this image. Now let's open test.png in GIMP and use the FFT filter plugin. I get this image, which is correct.
How can I handle this?
Great question. I’ve never heard of it, but the Gimp Fourier plugin seems really neat:
A simple plug-in to do fourier transform on you image. The major advantage of this plugin is to be able to work with the transformed image inside GIMP. You can so draw or apply filters in fourier space, and get the modified image with an inverse FFT.
This idea—of doing Gimp-style manipulation on frequency-domain data and transforming back to an image—is very cool! Despite years of working with FFTs, I’ve never thought about doing this. Instead of messing with Gimp plugins and C executables and ugliness, let’s do this in Python!
Caveat. I experimented with a number of ways to do this, attempting to get something close to the output Gimp Fourier image (gray with moiré pattern) from the original input image, but I simply couldn’t. The Gimp image appears to be somewhat symmetric around the middle of the image, but it’s not flipped vertically or horizontally, nor is it transpose-symmetric. I’d expect the plugin to be using a real 2D FFT to transform an H×W image into a H×W array of real-valued data in the frequency domain, in which case there would be no symmetry (it’s just the to-complex FFT that’s conjugate-symmetric for real-valued inputs like images). So I gave up trying to reverse-engineer what the Gimp plugin is doing and looked at how I’d do this from scratch.
The code. Very simple: read an image, apply scipy.fftpack.rfft in the leading two dimensions to get the “frequency-image”, rescale to 0–255, and save.
Note how this is different from the other answers! No grayscaling—the 2D real-to-real FFT happens independently on all three channels. No abs needed: the frequency-domain image can legitimately have negative values, and if you make them positive, you can’t recover your original image. (Also a nice feature: no compromises on image size. The size of the array remains the same before and after the FFT, whether the width/height is even or odd.)
from PIL import Image
import numpy as np
import scipy.fftpack as fp
## Functions to go from image to frequency-image and back
im2freq = lambda data: fp.rfft(fp.rfft(data, axis=0),
axis=1)
freq2im = lambda f: fp.irfft(fp.irfft(f, axis=1),
axis=0)
## Read in data file and transform
data = np.array(Image.open('test.png'))
freq = im2freq(data)
back = freq2im(freq)
# Make sure the forward and backward transforms work!
assert(np.allclose(data, back))
## Helper functions to rescale a frequency-image to [0, 255] and save
remmax = lambda x: x/x.max()
remmin = lambda x: x - np.amin(x, axis=(0,1), keepdims=True)
touint8 = lambda x: (remmax(remmin(x))*(256-1e-4)).astype(np.uint8)
def arr2im(data, fname):
    # data is an H x W x 3 uint8 array; Image.fromarray turns it into an RGB image
    Image.fromarray(data).save(fname)
arr2im(touint8(freq), 'freq.png')
(Aside: FFT-lover geek note. Look at the documentation for rfft for details, but I used Scipy’s FFTPACK module because its rfft interleaves the real and imaginary components of each frequency bin as two adjacent real values, guaranteeing that the output has exactly the same size as the input for any 2D image (even vs odd, width vs height). This is in contrast to Numpy’s numpy.fft.rfft2, which returns complex data of shape height × (width/2+1), forcing you to deal with one extra column and to deinterleave complex-to-real yourself. Who needs that hassle for this application.)
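To make the shape difference concrete, here is a small NumPy-only check (it does not use scipy.fftpack); the 5 x 8 test array is arbitrary. Only the last axis is halved, and the inverse needs the original shape passed explicitly.

```python
import numpy as np

# A 5 x 8 (height x width) real "image"
img = np.arange(40, dtype=float).reshape(5, 8)

spec = np.fft.rfft2(img)
print(spec.shape, spec.dtype)  # (5, 5) complex128: the last axis shrinks to 8 // 2 + 1

# Without s=img.shape, irfft2 cannot tell whether the original width was even or odd
back = np.fft.irfft2(spec, s=img.shape)
print(np.allclose(back, img))  # True
```

So the NumPy route works too; it just hands you a complex array of a different shape that you have to map back to pixels yourself.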
Results. Given input named test.png:
this snippet produces the following output (global min/max have been rescaled and quantized to 0-255):
And upscaled:
In this frequency-image, the DC (0 Hz frequency) component is in the top-left, and frequencies move higher as you go right and down.
Now, let’s see what happens when you manipulate this image in a couple of ways. Instead of this test image, let’s use a cat photo.
I made a few mask images in Gimp that I then load into Python and multiply the frequency-image with to see what effect the mask has on the image.
Here’s the code:
# Make frequency-image of cat photo
freq = im2freq(np.array(Image.open('cat.jpg')))
# Load three frequency-domain masks (DSP "filters")
bpfMask = np.array(Image.open('cat-mask-bpfcorner.png')).astype(float) / 255
hpfMask = np.array(Image.open('cat-mask-hpfcorner.png')).astype(float) / 255
lpfMask = np.array(Image.open('cat-mask-corner.png')).astype(float) / 255
# Apply each filter and save the output
arr2im(touint8(freq2im(freq * bpfMask)), 'cat-bpf.png')
arr2im(touint8(freq2im(freq * hpfMask)), 'cat-hpf.png')
arr2im(touint8(freq2im(freq * lpfMask)), 'cat-lpf.png')
Here’s a low-pass filter mask on the left, and on the right, the result:
In the mask, black = 0.0, white = 1.0. So the lowest frequencies are kept here (white), while the high ones are blocked (black). This blurs the image by attenuating high frequencies. Low-pass filters are used all over the place, including when decimating (“downsampling”) an image (though they will be shaped much more carefully than me drawing in Gimp 😜).
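Instead of drawing masks in Gimp, a simple low-pass mask can also be built in code. This sketch assumes the layout produced by im2freq above (DC in the top-left corner, frequency increasing to the right and downward), so keeping a top-left rectangle keeps the lowest frequencies; corner_lowpass_mask and keep are hypothetical names.

```python
import numpy as np

def corner_lowpass_mask(shape, keep=0.1):
    """Float mask (1.0 = pass, 0.0 = block) keeping the top-left `keep` fraction.

    Returns an (H, W, 1) array so it broadcasts over the RGB channels of a
    frequency-image with shape (H, W, 3).
    """
    h, w = shape[:2]
    mask = np.zeros((h, w, 1))
    mask[: max(1, round(h * keep)), : max(1, round(w * keep))] = 1.0
    return mask

# Usage with the helpers above (not run here):
# blurred = freq2im(freq * corner_lowpass_mask(freq.shape, keep=0.1))
# high_passed = freq2im(freq * (1 - corner_lowpass_mask(freq.shape, keep=0.02)))
```

A hard-edged rectangle like this is the crudest possible filter and tends to cause ringing; smoother masks, like the hand-drawn ones, behave better.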
Here’s a band-pass filter, where the lowest frequencies (see that bit of white in the top-left corner?) and high frequencies are kept, but the middling-frequencies are blocked. Quite bizarre!
Here’s a high-pass filter, where the top-left corner that was left white in the above mask is blacked out:
This is the basic idea behind edge detection.
Postscript. Someone, make a webapp using this technique that lets you draw masks and apply them to an image real-time!!!
There are several issues here.
1) Manual conversion to grayscale isn't good. Use Image.open("test.png").convert('L') instead.
2) Most likely there is an issue with types. You shouldn't pass the np.ndarray from fft2 to a PIL image without making sure their types are compatible. abs(np.fft.fft2(something)) returns a floating-point array (np.float64), whereas a PIL image expects something like an array of type np.uint8.
3) The scaling suggested in the comments looks wrong. You actually need your values to fit into the 0..255 range.
Here's my code that addresses these 3 points:
import numpy as np
from PIL import Image

def fft(channel):
    magnitude = np.absolute(np.fft.fft2(channel))
    peak = magnitude.max()
    # proper scaling into 0..255 range (an all-zero channel stays zero)
    return magnitude * (255.0 / peak) if peak > 0 else magnitude

input_image = Image.open("test.png")
channels = input_image.split()  # splits an image into its bands (e.g. R, G, B)
result_array = np.zeros_like(np.asarray(input_image))  # make sure data types,
# sizes and numbers of channels of input and output numpy arrays are the same
if len(channels) > 1:  # grayscale images have only one channel
    for i, channel in enumerate(channels):
        result_array[..., i] = fft(channel)
else:
    result_array[...] = fft(channels[0])
result_image = Image.fromarray(result_array)
result_image.save('out.png')
I must admit I haven't managed to get results identical to the GIMP FFT plugin. As far as I can see, it does some post-processing. My results are all a very low-contrast mess, and GIMP seems to overcome this by tuning the contrast and scaling down non-informative channels (in your case, all channels except red are just empty). Refer to the image:
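The low contrast comes largely from the DC term, which dwarfs every other coefficient, so a linear rescale to 0..255 leaves almost everything black. A common way to display a spectrum is a centred log-magnitude. This is a sketch of that general technique, not a claim about what the GIMP plugin does internally; log_spectrum is a hypothetical name.

```python
import numpy as np

def log_spectrum(channel):
    """Centred log-magnitude spectrum of one channel, scaled to uint8."""
    data = np.asarray(channel, dtype=float)
    # fftshift moves the DC term from the corner to the centre of the array
    mag = np.abs(np.fft.fftshift(np.fft.fft2(data)))
    log_mag = np.log1p(mag)  # compress the huge dynamic range
    return (255 * (log_mag / log_mag.max())).astype(np.uint8)

# Usage with PIL (not run here):
# from PIL import Image
# grey = Image.open("test.png").convert("L")
# Image.fromarray(log_spectrum(grey)).save("spectrum.png")
```

Because log1p grows slowly, mid-strength frequencies become visible instead of being crushed to 0 next to the DC peak; this display is for viewing only and cannot be inverted back to the image.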
I'm writing a script to chroma key (green screen) and composite some videos using Python and PIL (Pillow). I can key the 720p images, but there's some leftover green spill. That's understandable, and I'm writing a routine to remove that spill; however, I'm struggling with how long it takes. I could probably get better speeds using numpy tricks, but I'm not that familiar with it. Any ideas?
Here's my despill routine. It takes a PIL image and a sensitivity number but I've been leaving that at 1 so far...it's been working well. I'm coming in at just over 4 seconds for a 720p frame to remove this spill. For comparison, the chroma key routine runs in about 2 seconds per frame.
import time
from PIL import Image

def despill(img, sensitivity=1):
    """
    Blue limits green.
    """
    start = time.time()
    print('\t[*] Starting despill')
    width, height = img.size
    num_channels = len(img.getbands())
    out = Image.new("RGBA", img.size, color=0)
    for j in range(height):
        for i in range(width):
            #r,g,b,a = data[j,i]
            r,g,b,a = img.getpixel((i,j))
            if g > (b*sensitivity):
                out_g = (b*sensitivity)
            else:
                out_g = g
            # end if
            out.putpixel((i,j), (r,out_g,b,a))
        # end for
    # end for
    out.show()
    print('\t[+] done.')
    print('\t[!] Took: %0.1f seconds' % (time.time()-start))
    return out
# end despill
Instead of putpixel, I tried to write the output pixel values to a numpy array then convert the array to a PIL image, but that was averaging just over 5 seconds...so this was faster somehow. I know putpixel isn't the snappiest option but I'm at a loss...
putpixel is slow, and loops like that are even slower, since they are run by the Python interpreter, which is slow as hell. The usual solution is to convert the image to a numpy array right away and solve the problem with vectorized operations on it, which run in heavily optimized C code. In your case I would do something like:
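import numpy as np
from PIL import Image

arr = np.array(img)
g = arr[:,:,1].astype(np.float32)
bs = arr[:,:,2].astype(np.float32)*sensitivity  # float avoids uint8 overflow when sensitivity > 1
cond = g>bs
arr[:,:,1] = cond*bs + (~cond)*g
out = Image.fromarray(arr)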
(it may not be correct and I'm sure it can be optimized way better, this is just a sketch)
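The rule "green = min(green, blue * sensitivity)" can also be written with np.minimum, which does the comparison and the selection in one call. A sketch, assuming an H x W x 4 uint8 RGBA array; despill_array is a hypothetical name.

```python
import numpy as np

def despill_array(rgba, sensitivity=1.0):
    """Vectorized "blue limits green" despill on an H x W x 4 uint8 array.

    Returns a new array; the input is not modified.
    """
    out = np.array(rgba, dtype=np.uint8, copy=True)
    blue_limit = out[:, :, 2].astype(np.float32) * sensitivity
    # The result never exceeds the original green value, so writing it
    # back into the uint8 array cannot overflow
    out[:, :, 1] = np.minimum(out[:, :, 1], blue_limit)
    return out

# Usage with PIL (not run here):
# from PIL import Image
# result = Image.fromarray(despill_array(np.array(img.convert("RGBA"))))
```

For a 720p frame this is a few whole-array operations, so it should take a small fraction of the time of the per-pixel loop.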
Let's assume the image is stored as a PNG file and I need to drop every odd line and resize the result horizontally to 50% in order to keep the aspect ratio.
The result must have 50% of the resolution of the original image.
Just recommending an existing image library, like PIL, will not be enough; I would like to see some working code.
UPDATE - Even though the question received a correct answer, I want to warn others that PIL is not in great shape: the project website has not been updated in months, there is no link to a bug tracker, and the mailing list activity is quite low. I was surprised to discover that a simple BMP file saved with Paint was not loaded by PIL.
Is it essential to keep every even line? (In fact, define "even": are you counting from 1 or 0 as the first row of the image?)
If you don't mind which rows are dropped, use PIL:
from PIL import Image
img=Image.open("file.png")
size=list(img.size)
size[0] //= 2  # integer division: resize() needs whole pixel counts
size[1] //= 2
downsized=img.resize(tuple(size), Image.NEAREST) # NEAREST drops the lines
downsized.save("file_small.png")
I recently wanted to deinterlace some stereo images, extracting the images for the left and right eye. For that I wrote:
import math
from PIL import Image

def deinterlace_file(input_file, output_format_str, row_names=('Left', 'Right')):
    print("Deinterlacing {}".format(input_file))
    source = Image.open(input_file)
    source.load()
    dim = source.size
    # ceil: enough rows for odd heights, without an extra blank row for even ones
    scaled_size1 = (dim[0], math.ceil(dim[1]/2))
    scaled_size2 = (math.floor(dim[0]/2), math.ceil(dim[1]/2))
    top = Image.new(source.mode, scaled_size1)
    top_pixels = top.load()
    other = Image.new(source.mode, scaled_size1)
    other_pixels = other.load()
    for row in range(dim[1]):
        for col in range(dim[0]):
            pixel = source.getpixel((col, row))
            row_int = math.floor(row / 2)
            if row % 2:
                top_pixels[col, row_int] = pixel
            else:
                other_pixels[col, row_int] = pixel
    top_final = top.resize(scaled_size2, Image.NEAREST)  # Downsize to maintain aspect ratio
    other_final = other.resize(scaled_size2, Image.NEAREST)  # Downsize to maintain aspect ratio
    top_final.save(output_format_str.format(row_names[0]))
    other_final.save(output_format_str.format(row_names[1]))
output_format_str should be something like: "filename-{}.png" where the {} will be replaced with the row name.
Note that the image ends up at half its original size. If you don't want this, you can tweak the last scaling step.
It's not the fastest approach, since it goes through the image pixel by pixel, but I could not see an easy way to extract rows from an image.
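If the image is converted to a NumPy array first, rows can be extracted directly with slicing, with no per-pixel loop. A sketch, assuming an H x W (x C) array such as np.array(Image.open(...)) and counting rows from 0; deinterlace_rows is a hypothetical name. Taking every second column halves the width to keep the aspect ratio, which is similar in effect to the NEAREST resize above.

```python
import numpy as np

def deinterlace_rows(arr):
    """Split an image array into its even rows (0, 2, 4, ...) and odd rows,
    keeping every second column so the aspect ratio is preserved."""
    even = arr[0::2, 0::2]
    odd = arr[1::2, 0::2]
    return even, odd

# Usage with PIL (not run here):
# from PIL import Image
# even, odd = deinterlace_rows(np.array(Image.open("file.png")))
# Image.fromarray(np.ascontiguousarray(even)).save("file-Left.png")
# Image.fromarray(np.ascontiguousarray(odd)).save("file-Right.png")
```

Slicing only creates views, so the split itself copies no pixel data; np.ascontiguousarray makes a compact copy just before saving.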