How to find the difference of images in numpy arrays? - python

I'm trying to calculate the difference between 2 images. I'm expecting an integer as my result, but I'm not getting what I expect.
from imageio import imread
#https://raw.githubusercontent.com/glennford49/sampleImages/main/cat1.png
#https://raw.githubusercontent.com/glennford49/sampleImages/main/cat2.png
img1="cat1.png" # 183X276
img2="cat2.png" # 183x276
numpyImg1=[]
numpyImg2=[]
img1=imread(img1)
img2=imread(img2)
numpyImg1.append(img1)
numpyImg2.append(img2)
diff = numpyImg1[0] - numpyImg2[0]
result = sum(abs(diff))
print("difference:",result)
Actual output:
# it prints an array rather than a single integer
Expected output:
difference: <int>

You are using Python's built-in sum function, which only sums along the first dimension of a NumPy array. That is why you get a 2D array as the output instead of the single integer you expect. Use numpy.sum on your result instead; with no axis argument it sums over every element of a multi-dimensional NumPy array. You might as well use numpy.abs for the absolute value too:
import numpy as np
result = np.sum(np.abs(diff))
Using numpy.sum means that you no longer need to reshape your array into a flattened representation prior to using the built-in sum function in your answer. For future development, always use NumPy methods on any arithmetic operations you want to perform on NumPy arrays. It prevents unexpected behaviour such as what you've just seen.
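As a rough illustration of the point above, here is a minimal sketch that uses small made-up uint8 arrays in place of the cat images (the array values and the names img1, img2, wrapped and signed are assumptions for illustration, not the asker's data). It also casts to a signed type before subtracting, since uint8 subtraction wraps around instead of going negative:

```python
import numpy as np

# Hypothetical 2x2 RGB "images" standing in for cat1.png and cat2.png.
img1 = np.array([[[10, 20, 30], [40, 50, 60]],
                 [[70, 80, 90], [100, 110, 120]]], dtype=np.uint8)
img2 = img1.copy()
img2[0, 0, 0] = 15    # img1 is 5 lower here
img2[1, 1, 2] = 100   # img1 is 20 higher here

# uint8 arithmetic wraps modulo 256, so 10 - 15 becomes 251.
wrapped = img1 - img2

# Casting to a signed integer type first keeps negative differences intact.
signed = img1.astype(np.int64) - img2.astype(np.int64)

partial = sum(abs(signed))        # built-in sum: collapses only the first axis
total = np.sum(np.abs(signed))    # numpy.sum: collapses every axis

print(partial.shape)   # (2, 3)
print(total)           # 25
```

If the images are loaded with imread they are typically uint8, so the cast is probably worth keeping whenever one image can be brighter than the other at some pixel.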

A (colored) image is a 3D array, so you can convert both images to NumPy arrays using numpy.array(image) and then take the difference of the two arrays.
The result of the subtraction is itself a 3-dimensional array.

I believe the NumPy array has more than one dimension. You need to apply sum once per dimension of the array to end up with a single value.
[1,2,3]
sum gives: 6
[[1,2,3],[1,2,3]]
sum gives: [2,4,6]
a second sum operation gives: 12 (single value)
You may need to add one more "sum(result)" before printing the data (if the image is 2-dimensional).
e.g.:
numpyImg2.append(img2)
diff = numpyImg1[0] - numpyImg2[0]
result = sum(abs(diff))
result = sum(result)  # repeat once for each remaining dimension
print("difference:",result)
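The repeated-sum idea above can be sketched as a loop that keeps applying the built-in sum until only a scalar is left. The array arr below is a hypothetical 3D stand-in for an image difference, not data from the question:

```python
import numpy as np

# Hypothetical 3D stand-in for an image difference (2 x 3 x 4).
arr = np.arange(24).reshape(2, 3, 4)

result = arr
# Each built-in sum removes the first axis; stop once a scalar is left.
while np.ndim(result) > 0:
    result = sum(result)

print(result)                  # 276
print(result == np.sum(arr))   # True
```

This works for any number of dimensions, although a single numpy.sum call gives the same value and is usually the simpler choice.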

This is my answer for finding the difference between 2 images across the RGB channels.
If an image is subtracted from an identical image,
it prints:
difference per pixel: 0
from numpy import sum
from imageio import imread
#https://github.com/glennford49/sampleImages/blob/main/cat1.png
#https://github.com/glennford49/sampleImages/blob/main/cat2.png
img1="cat1.png"
img2="cat2.png"
numpyImg1=[]
numpyImg2=[]
img1=imread(img1)
img2=imread(img2)
numpyImg1.append(img1)
numpyImg2.append(img2)
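# Cast to a signed type first: uint8 subtraction wraps around (e.g. 3 - 5 gives 254).
diff = numpyImg1[0].astype(int) - numpyImg2[0].astype(int)
# Mean absolute difference over all pixels and channels (sum here is numpy.sum).
result = sum(abs(diff)) / diff.size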
print("difference per pixel:",result)

Related

Are there any limitations of np.dot() function in numpy library?

I have two vectors (arrays) with one million elements each, all positive. I want to find their dot product. When I use Python lists I get a big 20-30 digit answer. When I use NumPy arrays with the np.dot() function I get a negative answer. The code is shown below. Kindly explain your solution.
Code:
import numpy as np
# Python lists
arr1 = list(range(1000000))
arr2 = list(range(1000000, 2000000))
# Numpy arrays
arr1_np = np.array(arr1)
arr2_np = np.array(arr2)
# Dot product using lists
result = 0
for x1,x2 in zip(arr1,arr2):
    result+=x1*x2
print(result)
# Dot product using numpy built in function np.dot()
print(np.dot(arr1_np,arr2_np))
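No answer is included here, so the following is only one possible explanation, offered as a sketch: NumPy integer arrays have a fixed width and wrap around silently on overflow, whereas Python integers grow as needed. If the arrays ended up with a 32-bit integer dtype (an assumption; older NumPy releases on Windows, for example, defaulted to 32-bit integers), the dot product would not fit and could come out negative. The names exact, dot32 and dot64 are introduced here for illustration:

```python
import numpy as np

arr1 = list(range(1000000))
arr2 = list(range(1000000, 2000000))

# Exact reference using Python's arbitrary-precision integers.
exact = sum(x1 * x2 for x1, x2 in zip(arr1, arr2))

# Assumption: the arrays ended up with a 32-bit integer dtype.
dot32 = np.dot(np.array(arr1, dtype=np.int32), np.array(arr2, dtype=np.int32))

# Asking for 64-bit integers explicitly leaves enough room for this input.
dot64 = np.dot(np.array(arr1, dtype=np.int64), np.array(arr2, dtype=np.int64))

print(exact)
print(dot32, dot32.dtype)   # wrapped value, not equal to exact
print(dot64 == exact)
```

Checking arr1_np.dtype on the affected machine would confirm or rule out this explanation.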

Access to short diagonal elements in Numpy 3 dimensional array

I have a 3-dimensional NumPy array and I want to access its short diagonal elements. Let's say i, j, k are the indices along the three dimensions. Is it possible to access the elements where i==j or i==k or j==k, so that I can set them to a specific value?
I tried to solve this by creating a mask of indices. This mask is applied to the final array, where the values at {i=j or i=k or j=k} should be set to specific values. Unfortunately, this code returns the set where {i=j=k}.
import numpy as np
N = 3
maskXY = np.eye(N).reshape(N,N,1)
maskYZ = np.eye(N).reshape(1,N,N)
maskXZ = np.eye(N).reshape(N,1,N)
maskIndices = maskXY * maskYZ*maskXZ
#set the values of final array using above mask
finalArray[maskIndices] = #specific values
Approach #1
We could create open meshes with np.ix_ using the ranged arrays covering the dimensions of the input array and then perform OR-ing among those with a very close syntax to the one described in the question, like so -
i,j,k = np.ix_(*[np.arange(r) for r in finalArray.shape])
mask = (i==j) | (i==k) | (j==k)
finalArray[mask] = # desired values
Approach #2
It seems, we can also follow the posted code in the question and use boolean versions of the masks and then perform OR-ing to get the mask equivalent, like so -
mask = (maskXY==1) | (maskYZ==1) | (maskXZ==1)
But, this involves masks that are 2D (when squeezed) and as such won't be as memory-efficient as the previous approach that dealt with 1D arrays.
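A minimal runnable sketch of both approaches, assuming a small cubic array of zeros as a stand-in for finalArray and -1.0 as the value to assign (both are illustrative choices, not from the question):

```python
import numpy as np

N = 3
final_array = np.zeros((N, N, N))   # hypothetical stand-in for finalArray

# Approach #1: open index grids with shapes (N,1,1), (1,N,1) and (1,1,N).
i, j, k = np.ix_(*[np.arange(r) for r in final_array.shape])
mask = (i == j) | (i == k) | (j == k)

# Approach #2: boolean identity masks, broadcast against each other with OR.
eye = np.eye(N, dtype=bool)
mask_xy = eye.reshape(N, N, 1)
mask_yz = eye.reshape(1, N, N)
mask_xz = eye.reshape(N, 1, N)
mask2 = mask_xy | mask_yz | mask_xz

print(np.array_equal(mask, mask2))   # True
print(mask.sum())                    # 21 of the 27 cells have a repeated index

final_array[mask] = -1.0             # assign the desired value
```

Multiplying the three masks, as in the question, is an AND and keeps only i==j==k; OR-ing them is what selects every cell with at least two equal indices.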

np.bincount for 1 line, vectorized multidimensional averaging

I am trying to vectorize an operation using NumPy. I use it in a Python script that I have profiled, and this operation turned out to be the bottleneck, so it needs to be optimized since I will run it many times.
The operation is on a data set of two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length, Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples,Lmax) with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have computed and which depend on its length and the integer value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in matlab with use of the accumarray function: by using 3 2D arrays of the same size as data, whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincount. However, np.bincount differs from accumarray in two relevant ways:
both arguments must be of same 1D size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is a 1D cell array of 1x3 arrays used as index tuples, while in np.bincount it must be 1D scalars as far as I understand. I expect np.ravel may be useful but am not sure how to use it here to do what I want. I am coming to Python from MATLAB and some things do not translate directly, e.g. the colon operator, which ravels in the opposite order to ravel. So my question is: how might I use np.bincount or any other NumPy method to achieve an efficient Python implementation of this operation?
EDIT: To avoid wasting time: for these multi-dimensional index problems with complicated index manipulation, is the recommended route to just use Cython to implement the loops explicitly?
EDIT2: Alternative Python implementation I just came up with.
Here is a heavy ram solution:
First precalculate:
Using index units for length (i.e., length 1 = 0), make a 4D bool array, size (num_samples,Lmax+1,Lmax+1,maxvalue+1), holding where the conditions are satisfied for each value in Y.
ALLcond=np.zeros((num_samples,Lmax+1,Lmax+1,maxvalue+1),dtype='bool')
for l in range(Lmax+1):
    for i in range(Lmax+1):
        for v in range(maxvalue+1):
            ALLcond[:,l,i,v]=(data[:,i]==v) & (Lvec==l)
Where Lvec=[len(row) for row in data]. Then get the indices for these using np.where and initialize a 4D float array into which you will assign the values of Y:
[ind_Y,ind_len,ind_pos,ind_val]=np.where(ALLcond)
Yval=np.zeros(np.shape(ALLcond),dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y,ind_len,ind_pos,ind_val]=Y[ind_Y]
Y_avg=sum(Yval)/num_samples
This gives a factor of 4 or so speed up over the direct loop implementation. I was expecting more. Perhaps, this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This assumes that data has shape (n, Lmax), that Lvec is a NumPy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:,None] # fill-value independent; defined first because it is used below
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.
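To make the linear-index approach concrete, here is a small self-contained sketch on made-up data, checked against explicit loops. The toy values of data and Y, the way Lvec is derived from the zero padding, and the reference array ref are all assumptions for illustration:

```python
import numpy as np

# Hypothetical toy input: 3 samples, Lmax = 4, values in 1..maxvalue, zero padded.
data = np.array([[1, 2, 0, 0],
                 [2, 2, 1, 0],
                 [3, 1, 2, 2]])
Y = np.array([0.5, 1.5, 2.0])
n, Lmax = data.shape
maxvalue = 3
Lvec = (data > 0).sum(axis=1)   # row lengths; valid only because padding is zero

shape = (Lmax + 1, Lmax + 1, maxvalue + 1)
posvec = np.arange(1, Lmax + 1)
mask = posvec <= Lvec[:, None]  # True for the real (non-padded) entries

len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]

# Collapse the (length, position, value) triple into one flat bin number.
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=np.repeat(Y, Lvec),
                    minlength=int(np.prod(shape))).reshape(shape) / n

# Reference computed with explicit loops, for comparison.
ref = np.zeros(shape)
for r in range(n):
    for p in range(Lvec[r]):
        ref[Lvec[r], p + 1, data[r, p]] += Y[r] / n

print(np.allclose(Y_avg, ref))   # True
```

Here lengths and positions are 1-based, so index 0 along those axes stays empty; shifting to 0-based indices would shrink the output but needs the off-by-one adjustments mentioned above.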

Fill in a numpy array without creating list

I would like to create a numpy array without creating a list first.
At the moment I've got this:
import pandas as pd
import numpy as np
dfa = pd.read_csv('csva.csv')
dfb = pd.read_csv('csvb.csv')
pa = np.array(dfa['location'])
pb = np.array(dfb['location'])
ra = [(pa[i+1] - pa[i]) / float(pa[i]) for i in range(9999)]
rb = [(pb[i+1] - pb[i]) / float(pb[i]) for i in range(9999)]
ra = np.array(ra)
rb = np.array(rb)
Is there an elegant way to fill this NumPy array in one step, without creating the list first?
Thanks
You can calculate with vectors in numpy, without the need of lists:
ra = (pa[1:] - pa[:-1]) / pa[:-1]
rb = (pb[1:] - pb[:-1]) / pb[:-1]
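As a quick check that the sliced expression matches the original list comprehension, here is a minimal sketch on a made-up array (the values of pa are hypothetical, standing in for the 'location' column). It also shows np.diff as an equivalent way to write the numerator:

```python
import numpy as np

# Hypothetical stand-in for dfa['location'].
pa = np.array([10.0, 11.0, 12.1, 9.68])

# Original approach: build a list, then convert it.
ra_loop = np.array([(pa[i + 1] - pa[i]) / float(pa[i]) for i in range(len(pa) - 1)])

# Vectorized approach from the answer.
ra_vec = (pa[1:] - pa[:-1]) / pa[:-1]

# Same thing with np.diff computing the consecutive differences.
ra_diff = np.diff(pa) / pa[:-1]

print(np.allclose(ra_loop, ra_vec))    # True
print(np.allclose(ra_vec, ra_diff))    # True
```

Using len(pa) - 1 instead of the hard-coded 9999 keeps the loop version valid for any input length.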
The title of your question and what you need to do in your specific case are actually two slightly different things.
To create a NumPy array without "casting" a list (or other iterable) you can use one of the several functions defined by NumPy itself that return arrays:
np.empty, np.zeros, np.ones, np.full to create arrays of given size with fixed values
np.random.* (where * can be various distributions, like normal, uniform, exponential ...), to create arrays of given size with random values
In general, read this: Array creation routines
In your case, you already have NumPy arrays (pa and pb) and you don't have to create lists to calculate the new arrays (ra and rb). You can operate directly on the NumPy arrays, which is the entire point of NumPy: operations on whole arrays are far faster than iterating over each element. Copied from @Daniel's answer:
ra = (pa[1:] - pa[:-1]) / pa[:-1]
rb = (pb[1:] - pb[:-1]) / pb[:-1]
This will be much faster than your current implementation, not only because you avoid converting a list to an ndarray, but also because NumPy array operations are typically orders of magnitude faster than element-by-element iteration for mathematical and batch operations.
numpy.zeros
Return a new array of given shape and type, filled with zeros.
or
numpy.ones
Return a new array of given shape and type, filled with ones.
or
numpy.empty
Return a new array of given shape and type, without initializing entries.
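A short sketch of these creation routines, including one way to write a result straight into a pre-allocated array. The sizes, the fill value 7.5 and the sample array pa are arbitrary choices for illustration:

```python
import numpy as np

n = 5
zeros = np.zeros(n)           # five 0.0 values
ones = np.ones((2, 3))        # 2x3 array of 1.0
filled = np.full(n, 7.5)      # five copies of 7.5

# np.empty allocates without initializing, so fill it before reading from it.
out = np.empty(n - 1)
pa = np.array([1.0, 2.0, 4.0, 8.0, 16.0])   # hypothetical input
np.divide(pa[1:] - pa[:-1], pa[:-1], out=out)

print(out)   # [1. 1. 1. 1.]
```

The out= argument is optional; it only matters when reusing an existing buffer is preferable to letting NumPy allocate a new result array.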

How to search in one NumPy array for positions for getting at these position the value from a second NumPy array?

I have two raster files which I have converted into NumPy arrays (arcpy.RasterToNumpyArray) to work with the values in the raster cells with Python.
One of the rasters has two values, True and False. The other raster has values in the range 0 to 1000. Both rasters have exactly the same extent, so both NumPy arrays are built identically (columns and rows), except for the values.
My aim is to identify all positions in NumPy array A that have the value True. These positions should then be used to get the values at the same positions from NumPy array B.
Do you have any idea how I can implement this?
If I understand your description right, you should just be able to do B[A].
You can use the array with True and False values to simply index into the other. Here's a sample:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[True,False,False],[False,True,False],[False,False,True]])
a[b] ## gives array([1, 5, 9])
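Using the question's naming (A holds True/False, B holds the values), a minimal sketch that also recovers the row and column positions with np.nonzero. The array contents are made up for illustration:

```python
import numpy as np

# Hypothetical small rasters: A is the True/False raster, B holds the values.
A = np.array([[True, False],
              [False, True],
              [True, True]])
B = np.array([[10, 20],
              [30, 40],
              [50, 60]])

values = B[A]                  # values of B where A is True, in row-major order
rows, cols = np.nonzero(A)     # the matching positions

print(values)                  # [10 40 50 60]
print(list(zip(rows.tolist(), cols.tolist())))   # [(0, 0), (1, 1), (2, 0), (2, 1)]
```

If the converted raster stores 0/1 integers rather than booleans, indexing with it directly would pick rows by number instead of masking, so converting first with A.astype(bool) is probably needed in that case.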
