I open a TIFF LAB image and return a big numpy array (4928x3264x3 float64) using Python with this function:
def readTIFFLAB(filename):
    """Read a TIFF LAB file and return a float matrix.
    Reads 16 bit (2 byte) samples one at a time without any multiprocessing,
    about 260 sec."""
    import numpy as np
    ....
    ....
    # Data read
    # Matrix creation
    dim = (int(ImageLength), int(ImageWidth), int(SamplePerPixel))
    Image = np.empty(dim, np.float64)
    contatore = 0
    for address in range(0, len(StripOffsets)):
        offset = StripOffsets[address]
        f.seek(offset)
        for lung in range(0, StripByteCounts[address] // SamplePerPixel // 2):
            v = np.array(f.read(2))
            v.dtype = np.uint16
            v1 = np.array(f.read(2))
            v1.dtype = np.int16
            v2 = np.array(f.read(2))
            v2.dtype = np.int16
            v = np.array([v / 65535.0 * 100])
            v1 = np.array([v1 / 32768.0 * 128])
            v2 = np.array([v2 / 32768.0 * 128])
            v = np.append(v, [v1, v2])
            riga = contatore // ImageWidth
            colonna = contatore % ImageWidth
            # print(contatore, riga, colonna)
            Image[riga, colonna, :] = v
            contatore += 1
    return Image
but this routine needs about 270 seconds to do all the work and return a numpy array.
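Just to illustrate where the time goes: every f.read(2) call and per-pixel np.append carries Python overhead, so a hedged sketch of decoding one whole strip per read with np.frombuffer (assuming the same f, StripOffsets, StripByteCounts and L/a/b layout as above, not tested against a real file) would look roughly like this:

for address in range(len(StripOffsets)):
    f.seek(StripOffsets[address])
    raw = f.read(StripByteCounts[address])                    # one read per strip
    pixels = np.frombuffer(raw, dtype=np.uint16).reshape(-1, 3)
    L = pixels[:, 0] / 65535.0 * 100
    signed = pixels.view(np.int16)                            # reinterpret the same bytes as signed for a/b
    a = signed[:, 1] / 32768.0 * 128
    b = signed[:, 2] / 32768.0 * 128
    # the rows of Image would then be filled from L, a, b instead of pixel by pixel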
I tried to use multiprocessing, but it is not possible to share an array or to use a queue to pass it, and sharedmem is not usable on Windows systems (at home I use openSUSE, but at work I must use Windows).
Could someone help me reduce the processing time? I read about threading and about writing some parts in C, but I don't understand which is the best (and easiest) solution... I'm a food technologist, not a real programmer :-)
Thanks
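As an aside on the multiprocessing point above: since Python 3.8 the standard multiprocessing.shared_memory module also works on Windows, so a large numpy array can at least be backed by shared memory instead of being copied between processes. A minimal sketch (sizes taken from the image dimensions above):

import numpy as np
from multiprocessing import shared_memory

# create a shared block large enough for a 4928x3264x3 float64 image
shm = shared_memory.SharedMemory(create=True, size=4928 * 3264 * 3 * 8)
shared_img = np.ndarray((4928, 3264, 3), dtype=np.float64, buffer=shm.buf)
# worker processes attach to the same block with shared_memory.SharedMemory(name=shm.name)
shm.close()
shm.unlink()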
Wow, your method is really slow indeed; try the tifffile library, you can find it here. That library will open your file very fast, and then you just need to do the proper conversion. Here's the basic usage:
import numpy as np
import tifffile
from skimage import color
import time
import matplotlib.pyplot as plt

def convert_to_tifflab(image):
    # divide the color channels
    L = image[:, :, 0]
    a = image[:, :, 1]
    b = image[:, :, 2]
    # correct interpretation of the a/b channels
    a.dtype = np.int16
    b.dtype = np.int16
    # scale the result
    L = L / 65535.0 * 100
    a = a / 32768.0 * 128
    b = b / 32768.0 * 128
    # join the result
    lab = np.dstack([L, a, b])
    # view the image
    start = time.time()
    rgb = color.lab2rgb(lab)
    print("Lab2Rgb: {0}".format(time.time() - start))
    return rgb

if __name__ == "__main__":
    filename = '/home/cilladani1/FERRERO/Immagini Digi Eye/Test Lettura CIELAB/TestLetturaCIELAB (LAB).tif'
    start = time.time()
    I = tifffile.imread(filename)
    end = time.time()
    print("Image fetching: {0}".format(end - start))
    rgb = convert_to_tifflab(I)
    print("Image conversion: {0}".format(time.time() - end))
    plt.imshow(rgb)
    plt.show()
The benchmark gives this data:
Image fetching: 0.0929999351501
Lab2Rgb: 12.9520001411
Image conversion: 13.5920000076
As you can see, the bottleneck in this case is lab2rgb, which converts from XYZ to RGB space. I'd recommend reporting an issue to the author of tifffile requesting support for reading your file format; I'm sure he'll be able to speed up the C code directly.
After doing what BPL suggested, I modify the resulting array as follows:
# divide the color channel
L = I[:, :, 0]
a = I[:, :, 1]
b = I[:, :, 2]
# correct interpretation of a/b channel
a.dtype = np.int16
b.dtype = np.int16
# scale the result
L = L / 65535.0 * 100
a = a / 32768.0 * 128
b = b / 32768.0 * 128
# join the result
lab = np.dstack([L, a, b])
# view the image
from skimage import color
rgb = color.lab2rgb(lab)
plt.imshow(rgb)
So now it is easier to read a TIFF LAB image.
Thanks BPL
Related
I'm trying to improve the speed of my image manipulation as it's been too slow for actual use.
What I need to do is apply a complex transformation to the colour of every pixel in an image. The manipulation is basically applying a vector transform like T(r, g, b, a) => (r * x, g * x, b * y, a), or in layman's terms, multiplying the Red and Green values by a constant, using a different multiplication for Blue, and keeping Alpha. But I also need to manipulate it differently if the RGB colour falls under some specific colours; in those cases it must follow a dictionary/transformation table where RGB => newRGB, again keeping alpha.
The algorithm would be:
for each pixel in image:
    if pixel[r, g, b] in special:
        return special[pixel[r, g, b]] + pixel[a]
    else:
        return T(pixel)
It's simple but speed has been sub-optimal. I believe there's some way using numpy vectors, but I could not find how.
Important details about the implementation:
I don't care about the original buffer/image (manipulation can be in place)
I can use wxPython, Pillow and NumPy
Order or dimension of the array is not important as long as the buffer keeps the length
The buffer is obtained from a wxPython Bitmap, and special and (RG|B)_pal are transformation tables; the end result will become a wxPython Bitmap too. They're obtained like this:
# buffer
bitmap = wx.Bitmap  # it's a valid wxBitmap here, this is just to let you know it exists
buff = bytearray(bitmap.GetWidth() * bitmap.GetHeight() * 4)
bitmap.CopyToBuffer(buff, wx.BitmapBufferFormat_RGBA)

self.RG_mult = 0.75
self.B_mult = 0.83

self.RG_pal = []
self.B_pal = []
for i in range(0, 256):
    self.RG_pal.append(int(i * self.RG_mult))
    self.B_pal.append(int(i * self.B_mult))

self.special = {
    # RGB: new_RGB
    # Implementation specific for the fastest access
    # with the buffer, keys are 24-bit numbers; with PIL, keys are tuples
}
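For illustration only, one way the "24-bit number" keys mentioned in the comment could be built from an (R, G, B) triple (a hypothetical helper, not part of the original code, which in the direct-buffer loop below actually compares raw 3-byte slices):

def rgb_key(r, g, b):
    # pack three 8-bit channels into a single 24-bit integer
    return (r << 16) | (g << 8) | b

self.special[rgb_key(255, 0, 0)] = (191, 0, 0)   # example entry with made-up colours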
Implementations I tried include direct buffer manipulation:
for x in range(0, bitmap.GetWidth() * bitmap.GetHeight()):
    index = x * 4
    r = buf[index]
    g = buf[index + 1]
    b = buf[index + 2]
    rgb = buf[index:index + 3]
    if rgb in self.special:
        special = self.special[rgb]
        buf[index] = special[0]
        buf[index + 1] = special[1]
        buf[index + 2] = special[2]
    else:
        buf[index] = self.RG_pal[r]
        buf[index + 1] = self.RG_pal[g]
        buf[index + 2] = self.B_pal[b]
Use Pillow with getdata():
pil = Image.frombuffer("RGBA", (bitmap.GetWidth(), bitmap.GetHeight()), buf)
pil_buf = []
for colour in pil.getdata():
    colour_idx = colour[0:3]
    if colour_idx in self.special:
        special = self.special[colour_idx]
        pil_buf.append((
            special[0],
            special[1],
            special[2],
            colour[3],
        ))
    else:
        pil_buf.append((
            self.RG_pal[colour[0]],
            self.RG_pal[colour[1]],
            self.B_pal[colour[2]],
            colour[3],
        ))
pil.putdata(pil_buf)
buf = pil.tobytes()
Pillow with point() and getdata() (the fastest I achieved, more than twice as fast as the others):
pil = Image.frombuffer("RGBA", (bitmap.GetWidth(), bitmap.GetHeight()), buf)
r, g, b, a = pil.split()
r = r.point(lambda r: r * self.RG_mult)
g = g.point(lambda g: g * self.RG_mult)
b = b.point(lambda b: b * self.B_mult)
pil = Image.merge("RGBA", (r, g, b, a))

i = 0
for colour in pil.getdata():
    colour_idx = colour[0:3]
    if colour_idx in self.special:
        special = self.special[colour_idx]
        pil.putpixel(
            (i % bitmap.GetWidth(), i // bitmap.GetWidth()),
            (
                special[0],
                special[1],
                special[2],
                colour[3],
            )
        )
    i += 1
buf = pil.tobytes()
I also tried working with numpy.where but I could not get it to work. With numpy.apply_along_axis it worked, but the performance was terrible. In other numpy attempts I could not access the RGB values together, only as separate bands.
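For reference, a hedged sketch of the boolean-mask idea that numpy.where attempts usually aim at (assuming arr is an HxWx4 uint8 RGBA array and special maps RGB tuples to RGB tuples; a sketch, not the original code):

import numpy as np

def transform_numpy(arr, special, rg_mult=0.75, b_mult=0.83):
    out = arr.copy()
    # constant multiplication on R, G and B; alpha is left untouched
    out[..., 0] = (arr[..., 0] * rg_mult).astype(np.uint8)
    out[..., 1] = (arr[..., 1] * rg_mult).astype(np.uint8)
    out[..., 2] = (arr[..., 2] * b_mult).astype(np.uint8)
    # overwrite pixels whose original RGB matches a special colour
    for rgb, new_rgb in special.items():
        mask = np.all(arr[..., :3] == np.array(rgb, dtype=np.uint8), axis=-1)
        out[mask, :3] = np.array(new_rgb, dtype=np.uint8)
    return out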
Pure Numpy Version
This first optimization relies on the fact that one probably has far fewer special colors than pixels. I use numpy to do all the inner loops. This works well with images of up to 1 MP. If you have multiple images, I'd recommend the parallel approach.
Let's define a test case:
import requests
from io import BytesIO
from PIL import Image
import numpy as np
# Load some image, so we have the same
response = requests.get("https://upload.wikimedia.org/wikipedia/commons/4/41/Rick_Astley_Dallas.jpg")
# Make areas of known color
img = Image.open(BytesIO(response.content)).rotate(10, expand=True).rotate(-10,expand=True, fillcolor=(255,255,255)).convert('RGBA')
print("height: %d, width: %d (%.2f MP)"%(img.height, img.width, img.width*img.height/10e6))
height: 5034, width: 5792 (2.92 MP)
Define our special colors
specials = {
    (4, 1, 6): (255, 255, 255),
    (0, 0, 0): (255, 0, 255),
    (255, 255, 255): (0, 255, 0)
}
Algorithm
def transform_map(img, specials, R_factor, G_factor, B_factor):
    # Your transform
    def transform(x, a):
        a *= x
        return a.clip(0, 255).astype(np.uint8)
    # Convert to array
    img_array = np.asarray(img)
    # Extract channels
    R = img_array.T[0]
    G = img_array.T[1]
    B = img_array.T[2]
    A = img_array.T[3]
    # Find special colors
    # First, calculate a unique hash per pixel (cast up to avoid uint8 overflow)
    color_hashes = (R.astype(np.uint32)
                    + 2**8 * G.astype(np.uint32)
                    + 2**16 * B.astype(np.uint32))
    # Find indices of special colors
    special_idxs = []
    for k, v in specials.items():
        key_arr = np.array(list(k))
        val_arr = np.array(list(v))
        spec_hash = key_arr[0] + 2**8 * key_arr[1] + 2**16 * key_arr[2]
        special_idxs.append(
            {
                'mask': np.where(np.isin(color_hashes, spec_hash)),
                'value': val_arr
            }
        )
    # Apply transform to the whole image
    R = transform(R, R_factor)
    G = transform(G, G_factor)
    B = transform(B, B_factor)
    # Replace values where special colors were found
    for idx in special_idxs:
        R[idx['mask']] = idx['value'][0]
        G[idx['mask']] = idx['value'][1]
        B[idx['mask']] = idx['value'][2]
    return Image.fromarray(np.array([R, G, B, A]).T, mode='RGBA')
And finally some benchmarks on an Intel Core i5-6300U @ 2.40 GHz:
import time

times = []
for i in range(10):
    t0 = time.time()
    # Test
    transform_map(img, specials, 1.2, .9, 1.2)
    #
    t1 = time.time()
    times.append(t1 - t0)
np.round(times, 2)
print('average run time: %.2f +/-%.2f'%(np.mean(times), np.std(times)))
average run time: 9.72 +/-0.91
EDIT Parallelization
With the same setup as above, we can get a 2x speed increase on large images. (Small ones are faster without numba)
from numba import njit, prange
from numba.core import types
from numba.typed import Dict

# Map dict of special colors or transform over array of pixel values
@njit(parallel=True, locals={'px_hash': types.uint32})
def check_and_transform(img_array, d, T):
    # Save shape for later
    shape = img_array.shape
    # Flatten image for 1-d iteration
    img_array_flat = img_array.reshape(-1, 3).copy()
    N = img_array_flat.shape[0]
    # Replace or map
    for i in prange(N):
        px_hash = np.uint32(0)
        px_hash += img_array_flat[i, 0]
        px_hash += types.uint32(2**8) * img_array_flat[i, 1]
        px_hash += types.uint32(2**16) * img_array_flat[i, 2]
        try:
            img_array_flat[i] = d[px_hash]
        except Exception:
            img_array_flat[i] = (img_array_flat[i] * T).astype(np.uint8)
    # return image
    return img_array_flat.reshape(shape)
# Wrapper for the function above
def map_or_transform_jit(image: Image, specials: dict, T: np.ndarray):
    # assemble numba typed dict
    d = Dict.empty(
        key_type=types.uint32,
        value_type=types.uint8[:],
    )
    for k, v in specials.items():
        k = types.uint32(k[0] + 2**8 * k[1] + 2**16 * k[2])
        v = np.array(v, dtype=np.uint8)
        d[k] = v
    # get rgb channels
    img_arr = np.array(image)
    rgb = img_arr[:, :, :3].copy()
    img_shape = img_arr.shape
    # apply map
    rgb = check_and_transform(rgb, d, T)
    # set color channels
    img_arr[:, :, :3] = rgb
    return Image.fromarray(img_arr, mode='RGBA')
# Benchmark
import time

times = []
for i in range(10):
    t0 = time.time()
    # Test
    test_img = map_or_transform_jit(img, specials, np.array([1, .5, .5]))
    #
    t1 = time.time()
    times.append(t1 - t0)
np.round(times, 2)
print('average run time: %.2f +/- %.2f' % (np.mean(times), np.std(times)))
test_img
average run time: 3.76 +/- 0.08
n = 3
array = np.ones((n, n)) / (n * n)
n = array.shape[0] * array.shape[1]

while True:
    ret, frame = cap.read()
    if ret is True:
        print("newframe")
        gframe = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        dst = cv2.copyMakeBorder(gframe, 1, 1, 1, 1, borderType, None, None)
        blur = cv2.blur(dst, (3, 3))
        if k == 1:
            lastframe = gframe
            curframe = gframe
            nextframe = gframe
            newFrame = gframe
            k = 0
        else:
            lf = ndimage.convolve(lastframe, array, mode='constant', cval=0.0)
            cf = ndimage.convolve(curframe, array, mode='constant', cval=0.0)
            nf = ndimage.convolve(nextframe, array, mode='constant', cval=0.0)
            lastframe = curframe
            curframe = nextframe
            nextframe = gframe
            b = np.zeros((3, 528, 720))
            b[0] = lf
            b[1] = cf
            b[2] = nf
            result = np.mean(b, axis=0)
            cv2.imshow('frame', result)
            cv2.imshow('frame2', gframe)
I am trying to add all pixel values of a 3x3 neighborhood and then average them. I need to do that for every pixel and every frame and replace the central pixel with the averaged one. However, the way I am trying to do it makes it really slow and not very accurate.
This sounds like a convolution.
import numpy as np
from scipy import ndimage
a = np.random.random((5, 5))
a
[[0.14742615 0.83548453 0.67433445 0.59162829 0.21160044]
[0.1700598 0.89074466 0.84155171 0.65092969 0.3842437 ]
[0.22662423 0.2266929 0.47757456 0.34480112 0.06261333]
[0.89402116 0.00101947 0.90503461 0.93112109 0.44817247]
[0.21788789 0.3338606 0.07323461 0.28944439 0.91217591]]
Convolution operation with window size 3x3
n = 3
k = np.ones((n, n)) / (n * n)
n = k.shape[0] * k.shape[1]
b = ndimage.convolve(a, k, mode='constant', cval=0.0)
b
[[0.22707946 0.39551126 0.49829704 0.3726987 0.2042669 ]
[0.27744803 0.49894366 0.61486021 0.47103081 0.24953517]
[0.26768469 0.51481368 0.58549664 0.56067136 0.31354238]
[0.21112292 0.37288334 0.39808704 0.4937969 0.33203648]
[0.16075435 0.26945093 0.28152386 0.39546479 0.28676821]]
Now you just have to do it for the current frame, and the two prior frames.
-------- EDIT: For three frames -----------
For 3D you could write a convolution function like the one in this post, but it's quite complex as it uses FFTs.
If you just want to average across three frames, you could do:
f1 = np.random.random((5, 5)) # Frame1
f2 = np.random.random((5, 5)) # Frame2
f3 = np.random.random((5, 5)) # Frame3
n = 3
k = np.ones((n, n)) / (n * n)
n = k.shape[0] * k.shape[1]
b0 = ndimage.convolve(f1, k, mode='constant', cval=0.0)
b1 = ndimage.convolve(f2, k, mode='constant', cval=0.0)
b2 = ndimage.convolve(f3, k, mode='constant', cval=0.0)
# Create a 3D matrix, with each frame placed along the first dimension
b = np.zeros((3, 5, 5))
b[0] = b0
b[1] = b1
b[2] = b2
# Take the average across the first dimension (across frames)
result = np.mean(b, axis=0)
There probably is a more elegant solution than this, but it gets the job done.
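One arguably more elegant variant, assuming the goal really is the 3x3 spatial average combined with the 3-frame average, is to stack the frames and let scipy do both in a single call (a sketch, not benchmarked):

stack = np.stack([f1, f2, f3])                                  # shape (3, 5, 5)
# a 3x3x3 box average centred on the middle frame equals the mean of the
# three per-frame 3x3 convolutions computed above
result = ndimage.uniform_filter(stack, size=(3, 3, 3), mode='constant')[1]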
-------- EDIT: For Movies -----------
Based on all the questions in the comments I've decided to attempt to add some more code to help with implementation.
Firstly I'm starting out with these 7 consecutive stills from a movie:
I have not verified that the following code is bug proof or actually returns the correct result.
import cv2
import numpy as np
from scipy import ndimage

# this is a function to do the previous code
def mean_frames(frames, kernel):
    b = np.zeros(frames.shape)
    for i in range(frames.shape[0]):
        b[i] = ndimage.convolve(frames[i], kernel, mode='constant', cval=0.0)
    # note: np.mean already divides by the number of frames, so the extra division
    # below darkens the output (see the remark about dark images further down)
    b = np.mean(b, axis=0) / frames.shape[0]
    return b

mean_N = 3  # frames to average

# read in 1 file to get dimensions
im = cv2.imread(f'{root}1.png', cv2.IMREAD_GRAYSCALE)

# set up a numpy matrix that will hold mean_N frames at a time
frames = np.zeros((mean_N, im.shape[0], im.shape[1]))

avg_frames = []  # list to store our averaged frames
count = 0        # counter to position frames in the 1st dim of the 3D matrix for averaging
k = np.ones((3, 3)) / (3 * 3)  # kernel for the 2D convolution

for j in range(1, 8):  # 7 images
    file_name = root + str(j) + '.png'
    im = cv2.imread(file_name, cv2.IMREAD_GRAYSCALE)
    frames[count, ::] = im  # store in the 3D matrix
    # if we have loaded more than the minimum required for the average, we average
    if j >= mean_N:
        # average and store to the list
        avg_frames.append(mean_frames(frames, k))
    # if the count is mean_N - 1, that means we need to replace
    # the 0th matrix in frames so that we are doing a 'moving avg'
    if count == (mean_N - 1):
        count = 0
    else:
        count += 1  # increase position in the 0th dim of the 3D matrix

# output the averaged frames
for i, f in enumerate(avg_frames):
    cv2.imwrite(f'{path}output{i}.jpg', f)
Then looking at the folder, there are 5 files (as expected if we do a moving average of 3 frames over 7 stills):
looking at before and after:
Image 3:
and averaged image #1:
The image is not only in grayscale (as expected) but also seems quite dark. Perhaps some brightening would make things look better/more apparent.
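If the darkness is bothersome (it is consistent with the extra division noted in mean_frames above), one quick, hedged way to brighten the output is to stretch each averaged frame to the full 0-255 range before writing it:

for i, f in enumerate(avg_frames):
    bright = cv2.normalize(f, None, 0, 255, cv2.NORM_MINMAX)   # min-max stretch to 0-255
    cv2.imwrite(f'{path}output{i}_bright.jpg', bright)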
Your question is very interesting.
I see that you use many loops to perform this operation. Let's analyze the process.
Consider a single frame first.
You want to add all the pixel values of a 3x3 neighborhood, so I think image interpolation is very suitable for this case. In OpenCV we use resize() to interpolate pixels in an image, and INTER_NEAREST is the best choice for this situation.
This is the formula for INTER_NEAREST.
Now you have the upsampled image.
Then you want to do that for every pixel and every frame and replace the original pixel with the averaged one. For that, I think average filtering is the better tool.
The filter operates on every pixel.
Here is the code of a quick example.
Interpolation
img = cv2.resize(img, (img.shape[1] * 3, img.shape[0] * 3), interpolation=cv2.INTER_NEAREST)
Filter
img = cv2.blur(img, (3, 3))
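For the per-frame part of the original question, cv2.blur on its own already computes the 3x3 average for every pixel of a frame, so a hedged sketch for one grayscale frame would simply be:

gframe = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
averaged = cv2.blur(gframe, (3, 3))   # each pixel becomes the mean of its 3x3 neighbourhood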
I'm trying to implement Reinhard's method to use the color distribution of a target image to color normalize a passed in image for a research project. I've gotten the code to work and it outputs correctly but it's pretty slow. It takes about 20 minutes to iterate through 300 images. I'm pretty sure the bottleneck is how I'm handling applying the function to each image. I'm currently iterating through each pixel of the image and applying the functions below to each channel.
def reinhard(target, img):
    # converts image and target from BGR colorspace to l alpha beta
    lAB_img = cv2.cvtColor(img, cv2.COLOR_BGR2Lab)
    lAB_tar = cv2.cvtColor(target, cv2.COLOR_BGR2Lab)
    # finds mean and standard deviation for each color channel across the entire image
    (mean, std) = cv2.meanStdDev(lAB_img)
    (mean_tar, std_tar) = cv2.meanStdDev(lAB_tar)
    # iterates over the image, implementing the formula to map color normalized pixels to the target image
    for y in range(512):
        for x in range(512):
            lAB_tar[x, y, 0] = (lAB_img[x, y, 0] - mean[0]) / std[0] * std_tar[0] + mean_tar[0]
            lAB_tar[x, y, 1] = (lAB_img[x, y, 1] - mean[1]) / std[1] * std_tar[1] + mean_tar[1]
            lAB_tar[x, y, 2] = (lAB_img[x, y, 2] - mean[2]) / std[2] * std_tar[2] + mean_tar[2]
    mapped = cv2.cvtColor(lAB_tar, cv2.COLOR_Lab2BGR)
    return mapped
My supervisor told me that I could try using a matrix to apply the function all at once to improve the runtime but I'm not exactly sure how to go about doing that.
The original and the target:
Color transfer results using Reinhard's method in 5 ms:
I prefer to implement the formula with numpy vectorized operations rather than Python loops.
# implementing the formula
#(Io - mo)/so*st + mt = Io * (st/so) + mt - mo*(st/so)
ratio = (std_tar/std_ori).reshape(-1)
offset = (mean_tar - mean_ori*std_tar/std_ori).reshape(-1)
lab_tar = cv2.convertScaleAbs(lab_ori*ratio + offset)
Here is the code:
# 2019/02/19 by knight-金
# https://stackoverflow.com/a/54757659/3547485
import numpy as np
import cv2

def reinhard(target, original):
    # cvtColor: COLOR_BGR2Lab
    lab_tar = cv2.cvtColor(target, cv2.COLOR_BGR2Lab)
    lab_ori = cv2.cvtColor(original, cv2.COLOR_BGR2Lab)
    # meanStdDev: calculate mean and standard deviation
    mean_tar, std_tar = cv2.meanStdDev(lab_tar)
    mean_ori, std_ori = cv2.meanStdDev(lab_ori)
    # implementing the formula
    # (Io - mo)/so*st + mt = Io*(st/so) + mt - mo*(st/so)
    ratio = (std_tar / std_ori).reshape(-1)
    offset = (mean_tar - mean_ori * std_tar / std_ori).reshape(-1)
    lab_tar = cv2.convertScaleAbs(lab_ori * ratio + offset)
    # convert back
    mapped = cv2.cvtColor(lab_tar, cv2.COLOR_Lab2BGR)
    return mapped

if __name__ == "__main__":
    ori = cv2.imread("ori.png")
    tar = cv2.imread("tar.png")

    mapped = reinhard(tar, ori)
    cv2.imwrite("mapped.png", mapped)

    mapped_inv = reinhard(ori, tar)
    cv2.imwrite("mapped_inv.png", mapped_inv)
I managed to figure it out after looking at the numpy documentation. I just needed to replace my nested for loop with proper array accessing. It took less than a minute to iterate through all 300 images with this.
lAB_tar[:,:,0] = (lAB_img[:,:,0] - mean[0])/std[0] * std_tar[0] + mean_tar[0]
lAB_tar[:,:,1] = (lAB_img[:,:,1] - mean[1])/std[1] * std_tar[1] + mean_tar[1]
lAB_tar[:,:,2] = (lAB_img[:,:,2] - mean[2])/std[2] * std_tar[2] + mean_tar[2]
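One caveat with the slice assignment above: lAB_tar is a uint8 array, so values that fall outside 0-255 are truncated when written back. A hedged variant that does the arithmetic in float and converts once at the end, similar to the convertScaleAbs approach in the previous answer:

lab = lAB_img.astype(np.float64)
lab[:, :, 0] = (lab[:, :, 0] - mean[0]) / std[0] * std_tar[0] + mean_tar[0]
lab[:, :, 1] = (lab[:, :, 1] - mean[1]) / std[1] * std_tar[1] + mean_tar[1]
lab[:, :, 2] = (lab[:, :, 2] - mean[2]) / std[2] * std_tar[2] + mean_tar[2]
mapped = cv2.cvtColor(cv2.convertScaleAbs(lab), cv2.COLOR_Lab2BGR)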
I wrote a Python script which combines images in unique ways for an OpenGL shader. The problem is that I have a large number of very large maps and it takes a long time to process. Is there a way to write this in a quicker fashion?
import numpy as np

map_data = {}
image_data = {}
for map_postfix in names:
    file_name = inputRoot + '-' + map_postfix + resolution + '.png'
    print('Loading ' + file_name)
    image_data[map_postfix] = Image.open(file_name, 'r')
    map_data[map_postfix] = image_data[map_postfix].load()

color = map_data['ColorOnly']
ambient = map_data['AmbientLight']
shine = map_data['Shininess']

width = image_data['ColorOnly'].size[0]
height = image_data['ColorOnly'].size[1]

arr = np.zeros((height, width, 4), dtype=int)

for i in range(width):
    for j in range(height):
        ambient_mod = ambient[i, j][0] / 255.0
        arr[j, i, :] = [color[i, j][0] * ambient_mod,
                        color[i, j][1] * ambient_mod,
                        color[i, j][2] * ambient_mod,
                        shine[i, j][0]]

print('Converting Color Map to image')
return Image.fromarray(arr.astype(np.uint8))
This is just a sample of a large number of batch processes so I am more interested in if there is a faster way to iterate and modify an image file. Almost all the time is being spent on the nested loop vs loading and saving.
Vectorised-code example: test the effect on your code with timeit or zmq.Stopwatch().
Reported speedup: from 22.14 seconds down to 0.1624 seconds!
While your code seems to loop just over RGBA[x,y], let me show a "vectorised" syntax that benefits from numpy's matrix-manipulation utilities (forget the RGB/YUV manipulation, which was originally based on OpenCV rather than PIL, and just re-use the vectorised-syntax approach to avoid for-loops and adapt it to work efficiently for your calculation). The wrong order of operations may more than double your processing time.
And use a test / optimise / re-test loop for speeding things up.
For testing, use the standard Python timeit if [msec] resolution is enough.
Go for zmq.Stopwatch() if you need to go down to [usec] resolution.
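For the [msec] case, a minimal timeit sketch (convert_map here is a hypothetical placeholder for whatever conversion function is being timed):

import timeit

elapsed = timeit.timeit(lambda: convert_map(color, ambient, shine), number=10)
print('average per run: %.3f [sec]' % (elapsed / 10))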
# Vectorised-code example, to see the syntax & principles
# do not mind another order of RGB->BRG layers
# it has been OpenCV traditional convention
# it has no other meaning in this demo of VECTORISED code
def get_YUV_U_Cb_Rec709_BRG_frame( brgFRAME ):   # For the Rec. 709 primaries used in gamma-corrected sRGB, fast, VECTORISED MUL/ADD CODE
    out  = numpy.zeros( brgFRAME.shape[0:2] )
    out -= 0.09991 / 255 * brgFRAME[:,:,1]       # // Red
    out -= 0.33601 / 255 * brgFRAME[:,:,2]       # // Green
    out += 0.436   / 255 * brgFRAME[:,:,0]       # // Blue
    return out
    # normalise to <0.0 - 1.0> before vectorised MUL/ADD, saves [usec] ...
    # on 480x640 [px] faster goes about 2.2 [msec] instead of 5.4 [msec]
In your case, using dtype = numpy.int, I guess it shall be faster to MUL first by ambient[:,:,0] and finally DIV to normalise: arr[:,:,:3] //= 255 (floor division, since arr is an integer array).
# test if this goes even faster once saving the vectorised overhead on matrix DIV
arr[:,:,0] = color[:,:,0] * ambient[:,:,0] / 255 # MUL remains INT, shall precede DIV
arr[:,:,1] = color[:,:,1] * ambient[:,:,0] / 255 #
arr[:,:,2] = color[:,:,2] * ambient[:,:,0] / 255 #
arr[:,:,3] = shine[:,:,0] # STO alpha
So how might it look in your algorithm?
One need not have Peter Jackson's impressive budget and time (he once planned, spanned and executed immense number-crunching over 3 years in a New Zealand hangar, overcrowded by a herd of SGI workstations, while producing the fully-digital mastering assembly-line for "The Lord of the Rings" right down to frame-by-frame pixel manipulation) to realise that milliseconds, microseconds and even nanoseconds in a mass-production pipeline simply do matter.
So, take a deep breath and test and re-test so as to optimise your real-world imagery processing performance to levels that your project needs.
Hope this may help you on this:
# OPTIONAL for performance testing -------------# ||||||||||||||||||||||||||||||||
import zmq                                       # _MICROSECOND_ timer
                                                 # timer-resolution step ~ 21 nsec
                                                 # Yes, NANOSECOND-s
# OPTIONAL for performance testing -------------# ||||||||||||||||||||||||||||||||

arr = np.zeros( ( height, width, 4 ), dtype = int )

aStopWatch = zmq.Stopwatch()                     # ||||||||||||||||||||||||||||||||
# /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\# <<< your original code segment
# aStopWatch.start()                             # |||||||||||||__.start
# for i in range( width ):
#     for j in range( height ):
#         ambient_mod = ambient[i,j][0] / 255.0
#         arr[j, i, :] = [ color[i,j][0] * ambient_mod, \
#                          color[i,j][1] * ambient_mod, \
#                          color[i,j][2] * ambient_mod, \
#                          shine[i,j][0] \
#                          ]
# usec_for = aStopWatch.stop()                   # |||||||||||||__.stop
# print('Converting Color Map to image')
# print(' FOR processing took ', usec_for, ' [usec]')

# /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\# <<< proposed alternative
aStopWatch.start()                               # |||||||||||||__.start
# reduced numpy broadcasting one dimension less  # ref. comments below
arr[:,:, 0]  = color[:,:,0] * ambient[:,:,0]     # MUL ambient[0] * [{R}]
arr[:,:, 1]  = color[:,:,1] * ambient[:,:,0]     # MUL ambient[0] * [{G}]
arr[:,:, 2]  = color[:,:,2] * ambient[:,:,0]     # MUL ambient[0] * [{B}]
arr[:,:,:3] //= 255                              # integer DIV by 255 to normalise
arr[:,:, 3]  = shine[:,:,0]                      # STO shine[ 0] in [3]
usec_Vector = aStopWatch.stop()                  # |||||||||||||__.stop

print('Converting Color Map to image')
print(' Vectorised processing took ', usec_Vector, ' [usec]')

return Image.fromarray( arr.astype( np.uint8 ) )
The following bit of Python takes two images and performs an 'alpha composite' of them, or in other words, sticks one on top of the other, and returns a single image. The code isn't really something I quite grasp, as it came from another Stack Overflow answer.
import numpy as np
import Image

def alpha_composite(src, dst):
    src = np.asarray(src)
    dst = np.asarray(dst)
    out = np.empty(src.shape, dtype='float')
    alpha = np.index_exp[:, :, 3:]
    rgb = np.index_exp[:, :, :3]
    src_a = src[alpha] / 255.0
    dst_a = dst[alpha] / 255.0
    out[alpha] = src_a + dst_a * (1 - src_a)
    old_setting = np.seterr(invalid='ignore')
    out[rgb] = (src[rgb] * src_a + dst[rgb] * dst_a * (1 - src_a)) / out[alpha]
    np.seterr(**old_setting)
    out[alpha] *= 255
    np.clip(out, 0, 255)
    # astype('uint8') maps np.nan (and np.inf) to 0
    out = out.astype('uint8')
    out = Image.fromarray(out, 'RGBA')
    return out
It works great on Windows, but as soon as I move it over to Ubuntu Server, it gives me the following error:
File "ImageStitcher.py", line 21, in alpha_composite
src_a = src[alpha]/255.0
IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index
I'm using the same version of PIL and the same version of numpy on both.
Any idea what might be going on here?
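A hedged way to narrow this down is to check what np.asarray actually produced on each platform; the 0-d error typically means the PIL build on the server did not expose the pixel buffer to numpy, in which case forcing a copy through getdata() works regardless of the PIL version (a debugging sketch, assuming an RGBA source image):

src_arr = np.asarray(src)
print(src_arr.shape, src_arr.dtype)   # expect (height, width, 4) uint8; a 0-d object array is the failure case
# fallback that always yields a real pixel array
src_arr = np.array(src.convert('RGBA').getdata(), dtype=np.uint8).reshape(src.size[1], src.size[0], 4)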