Why do these two numpy.divide operations give such different results?

I would like to correct the values in hyperspectral readings from a camera using the formula described here:
the dark reference is subtracted from the captured data, and the result is divided by
the white reference minus the dark reference.
In the original example the task is rather simple: the white and dark references have the same shape as the main data, so the formula is executed as:
corrected_nparr = np.divide(np.subtract(data_nparr, dark_nparr),
                            np.subtract(white_nparr, dark_nparr))
However, in my case the main data is much larger than the references. The shapes are as follows:
>>> white_nparr.shape, dark_nparr.shape, data_nparr.shape
((100, 640, 224), (100, 640, 224), (4300, 640, 224))
That's why I repeat the reference arrays:
white_nparr_rep = white_nparr.repeat(43, axis=0)
dark_nparr_rep = dark_nparr.repeat(43, axis=0)
return np.divide(np.subtract(data_nparr, dark_nparr_rep),
                 np.subtract(white_nparr_rep, dark_nparr_rep))
And it works almost perfectly, as can be seen in the image on the left. But this approach requires an enormous amount of memory, so I decided to traverse the large array and replace the original values with corrected ones on the fly instead:
ref_scale = dark_nparr.shape[0]
data_scale = data_nparr.shape[0]
for i in range(int(data_scale / ref_scale)):
    data_nparr[i*ref_scale:(i+1)*ref_scale] = np.divide(
        np.subtract(data_nparr[i*ref_scale:(i+1)*ref_scale], dark_nparr),
        np.subtract(white_nparr, dark_nparr)
    )
But that traversal approach gives me the ugliest of results, as can be seen on the right. I'd appreciate any idea that would help me fix this.
Note: I apply 20-times co-adding (mean of 20 readings) to obtain the images below.
EDIT: The dtype of each array is as follows:
>>> white_nparr.dtype, dark_nparr.dtype, data_nparr.dtype
(dtype('float32'), dtype('float32'), dtype('float32'))

Your two methods don't agree because in the first method you used
white_nparr_rep = white_nparr.repeat(43, axis=0)
but the second method corresponds to using
white_nparr_rep = np.tile(white_nparr, (43, 1, 1))
If the first method is correct, you'll have to adjust the second method to act accordingly. Perhaps
for i in range(int(data_scale / ref_scale)):
    data_nparr[i*ref_scale:(i+1)*ref_scale] = np.divide(
        np.subtract(data_nparr[i*ref_scale:(i+1)*ref_scale], dark_nparr[i]),
        np.subtract(white_nparr[i], dark_nparr[i])
    )
A simple example with 2-d arrays that shows the difference between repeat and tile:
In [146]: z
Out[146]:
array([[ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15]])
In [147]: np.repeat(z, 3, axis=0)
Out[147]:
array([[ 1,  2,  3,  4,  5],
       [ 1,  2,  3,  4,  5],
       [ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15],
       [11, 12, 13, 14, 15],
       [11, 12, 13, 14, 15]])
In [148]: np.tile(z, (3, 1))
Out[148]:
array([[ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15],
       [ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15],
       [ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15]])
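Incidentally, since the memory cost of the repeated reference arrays is what motivated the loop, you could also avoid materializing them entirely by reshaping the data so that the references broadcast against it. A sketch, assuming the repeat semantics of the first method are the ones you want; the array contents below are placeholders with the shapes from the question:
import numpy as np

white = np.ones((100, 640, 224), dtype=np.float32)   # placeholder white reference
dark = np.zeros((100, 640, 224), dtype=np.float32)   # placeholder dark reference
data = np.ones((4300, 640, 224), dtype=np.float32)   # placeholder captured data

k = data.shape[0] // white.shape[0]  # 43 frames per reference slice

# View the data as (100, 43, 640, 224): each group of 43 frames lines up
# with one reference slice, and a length-1 axis on the references lets
# them broadcast instead of being repeated in memory.
data4 = data.reshape(white.shape[0], k, *data.shape[1:])
corrected = (data4 - dark[:, None]) / (white[:, None] - dark[:, None])
corrected = corrected.reshape(data.shape)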
Off topic postscript: I don't know why the author of the page that you linked to writes NumPy expressions as (for example):
corrected_nparr = np.divide(
np.subtract(data_nparr, dark_nparr),
np.subtract(white_nparr, dark_nparr))
NumPy allows you to write that as
corrected_nparr = (data_nparr - dark_nparr) / (white_nparr - dark_nparr)
which looks much nicer to me.

Related

Numpy filter matrix based on column

I have a matrix with several different values for each row:
arr1 = np.array([[1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18],[19,20,21,22,23,24,25,26,27]])
arr2 = np.array([["A"],["B"],["C"]])
This produces the following matrices:
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18],
       [19, 20, 21, 22, 23, 24, 25, 26, 27]])
array([['A'],
       ['B'],
       ['C']])
A represents the first 3 columns, B represents the next 3 columns, and C represents the last 3 columns. So the result I'd like here is:
array([[ 1,  2,  3],
       [13, 14, 15],
       [25, 26, 27]])
I was thinking about converting arr2 to a mask array, but I'm not even sure how to do that. If it were a 1d array I could do something like this:
arr[0,1,2]
but for a 2d array I'm not sure how to mask like this. I tried this and got errors:
arr[[0,1,2],[3,4,5],[6,7,8]]
What's the best way to do this?
Thanks.
You could use string.ascii_uppercase to map each letter to its index in the alphabet, and reshape arr1 into chunks of 3:
from string import ascii_uppercase
reshaped = np.reshape(arr1, (len(arr1), -1, 3))
reshaped[np.arange(len(arr1)), np.vectorize(ascii_uppercase.index)(arr2).ravel()]
Or just directly map A to 0 and so on...
reshaped = np.reshape(arr1, (len(arr1), -1, 3))
reshaped[np.arange(len(arr1)), np.vectorize(['A', 'B', 'C'].index)(arr2).ravel()]
Both output:
array([[ 1,  2,  3],
       [13, 14, 15],
       [25, 26, 27]])
If the shape of arr1 is fixed as shown above, (3, 9), then it can be done with a single line of code:
arr2 = np.array([arr1[0][0:3],arr1[1][3:6],arr1[2][6:9]])
The output will be as follows:
[[ 1  2  3]
 [13 14 15]
 [25 26 27]]
You can use 'advanced indexing', which indexes the target array with coordinate arrays:
rows = np.array([[0,0,0],[1,1,1],[2,2,2]])
cols = np.array([[0,1,2],[3,4,5],[6,7,8]])
arr1[rows, cols]
>>> array([[ 1,  2,  3],
           [13, 14, 15],
           [25, 26, 27]])
And you can make a function out of it, like:
def diagonal(arr, step):
    rows = np.array([[x] * step for x in range(step)])
    cols = np.array([[y for y in range(x, x + step)] for x in range(0, step**2, step)])
    return arr[rows, cols]
diagonal(arr1, 3)
>>> array([[ 1,  2,  3],
           [13, 14, 15],
           [25, 26, 27]])
reference: https://numpy.org/devdocs/user/basics.indexing.html
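As a footnote to these answers: when the letters happen to be in alphabetical order, as in this example, the selection is just the diagonal over the first two axes of the reshaped array, which einsum can extract directly. A minimal sketch under that assumption:
import numpy as np

arr1 = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                 [10, 11, 12, 13, 14, 15, 16, 17, 18],
                 [19, 20, 21, 22, 23, 24, 25, 26, 27]])

# Reshape to (rows, chunks, 3); the repeated index 'i' picks chunk i of row i.
result = np.einsum('iij->ij', arr1.reshape(3, 3, 3))
# array([[ 1,  2,  3],
#        [13, 14, 15],
#        [25, 26, 27]])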

How to Recurrently Transpose a Series/List/Array

I have an array/list/pandas series:
np.arange(15)
Out[11]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
What I want is:
[[0, 1, 2, 3, 4],
 [1, 2, 3, 4, 5],
 [2, 3, 4, 5, 6],
 ...
 [10, 11, 12, 13, 14]]
That is, recurrently transpose this column into a 5-column matrix.
The reason is that I am doing feature engineering on a column of temperature data. I want to use the previous 5 readings as features and the next reading as the target.
What's the most efficient way to do this? My data is large.
If the array is formatted like this :
arr = np.array([1,2,3,4,5,6,7,8,....])
You could try it like this:
recurr_transpose = np.matrix([arr[i:i+5] for i in range(len(arr)-4)])
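Since the question stresses efficiency on large data, it's also worth noting that NumPy 1.20+ ships a stride-based helper that builds all the windows as a single zero-copy view, with no Python-level loop. A sketch:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(15)
windows = sliding_window_view(arr, 5)  # shape (11, 5), a view, not a copy
# windows[0] is [0, 1, 2, 3, 4]; windows[-1] is [10, 11, 12, 13, 14]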

Numpy Problems with Arrays of poly1d Objects

I'd like to start this out with the fact that it is possible, in numpy, to create an array of poly1d objects:
random_poly = np.frompyfunc(lambda i, j: np.poly1d(np.random.randint(1, 4, 3)), 2, 1)
def random_poly_array(shape):
return np.fromfunction(random_poly, shape)
a1 = random_poly_array((3,3))
This works just fine, and we can even multiply matrices made from this form using np.dot:
a2 = random_poly_array((3,3))
a1_x_a2 = np.dot(a1, a2)
However, most other methods fail to work. For example, you can't take a list of certain poly1d objects and convert it into an array:
np.array([np.poly1d([1,2,3]), np.poly1d([1,2,3])])
As that will raise ValueError: cannot copy sequence with size 2 to array axis with dimension 3. To add to the confusion,
np.array([np.poly1d([1,2]), np.poly1d([1,2])])
will not raise an error, but will instead create a 2x2 array of just 2's. Adding dtype=object has no effect, and numpy will still try to convert the poly1d objects to arrays.
The reason why this is problematic is that one cannot take an array of dimension d and convert it to an array of poly1d objects of dimension d-1. I would have expected
arr = np.arange(1, 10).reshape(3,3)
np.apply_along_axis(np.poly1d, 0, arr)
to return an array of poly1d objects, but instead it returns an unaltered array. Even worse, if arr = np.arange(9).reshape(3,3), it will throw an error, as the first poly1d object created will have a length of 2 instead of 3 due to the zero leading coefficient. Thus, my question is this: is there a feasible method to create poly1d arrays in numpy? If not, why not?
Using the trick of including None to force numpy not to broadcast an object into an array, something brought to my attention by Paul Panzer, I created a function which transforms the last axis into a poly1d object:
def array_to_poly(arr):
    return np.apply_along_axis(lambda poly: [None, np.poly1d(poly)], -1, arr)[..., 1]
However, if we're okay with abusing more than one system in a single function, we can make it apply over arbitrary axes:
def array_to_poly(arr, axis=-1):
    temp_arr = np.apply_along_axis(lambda poly: [None, np.poly1d(poly)], axis, arr)
    n = temp_arr.ndim
    s = tuple(slice(None) if i != axis % n else 1 for i in range(n))
    return temp_arr[s]
Testing it with arr = np.arange(1, 25).reshape(2,3,4), we obtain:
In [ ]: array_to_poly(arr, 0)
Out[ ]:
array([[poly1d([ 1, 13]), poly1d([ 2, 14]), poly1d([ 3, 15]),
        poly1d([ 4, 16])],
       [poly1d([ 5, 17]), poly1d([ 6, 18]), poly1d([ 7, 19]),
        poly1d([ 8, 20])],
       [poly1d([ 9, 21]), poly1d([10, 22]), poly1d([11, 23]),
        poly1d([12, 24])]], dtype=object)
In [ ]: array_to_poly(arr, 1)
Out[ ]:
array([[poly1d([1, 5, 9]), poly1d([ 2,  6, 10]), poly1d([ 3,  7, 11]),
        poly1d([ 4,  8, 12])],
       [poly1d([13, 17, 21]), poly1d([14, 18, 22]), poly1d([15, 19, 23]),
        poly1d([16, 20, 24])]], dtype=object)
In [ ]: array_to_poly(arr, 2)
Out[ ]:
array([[poly1d([1, 2, 3, 4]), poly1d([5, 6, 7, 8]),
        poly1d([ 9, 10, 11, 12])],
       [poly1d([13, 14, 15, 16]), poly1d([17, 18, 19, 20]),
        poly1d([21, 22, 23, 24])]], dtype=object)
as expected.
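For completeness, the broadcasting trouble described in the question can also be sidestepped without any apply_along_axis tricks, by preallocating an object array and filling it element by element; numpy then never attempts to expand the poly1d coefficients. A minimal sketch:
import numpy as np

# An empty object array never triggers coefficient broadcasting,
# so poly1d objects can be assigned one at a time.
polys = np.empty(2, dtype=object)
polys[0] = np.poly1d([1, 2, 3])
polys[1] = np.poly1d([1, 2, 3])
# array([poly1d([1, 2, 3]), poly1d([1, 2, 3])], dtype=object)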

Efficient numpy array random views with dropped dimensions

For computer vision training purposes, random cropping is often used as a data augmentation technique. At each iteration, a batch of random crops is generated and fed to the network being trained. This needs to be efficient, as it is done at each training iteration.
If the data has too many dimensions, random dimension selection might also be needed. Random frames can be selected in a video for example. The data can even have 4 dimensions (3 in space + time), or more.
How can one write an efficient generator of random views of lower dimension?
A very naïve version for getting 2D views from 3D data, and only one by one, could be:
import numpy as np
import numpy.random as nr
def views():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        selector = np.zeros(shape, dtype=bool)
        if drop_dim == 0:
            selector[drop_dim_keep, :, :] = 1
        elif drop_dim == 1:
            selector[:, drop_dim_keep, :] = 1
        else:
            selector[:, :, drop_dim_keep] = 1
        yield np.squeeze(data[selector])
A more elegant solution probably exists, where at least:
there is no ugly if/else on the randomly chosen dimension
views can take a batch_size integer argument and generate several views at once without a loop
the dimension of input/output data is not specified (e.g. can do 3D -> 2D as well as 4D -> 2D)
I tweaked your function to clarify what it's doing:
def views():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        dropshape = list(shape[:])
        dropshape[drop_dim] -= 1
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        print(drop_dim, drop_dim_keep)
        selector = np.ones(shape, dtype=bool)
        if drop_dim == 0:
            selector[drop_dim_keep, :, :] = 0
        elif drop_dim == 1:
            selector[:, drop_dim_keep, :] = 0
        else:
            selector[:, :, drop_dim_keep] = 0
        yield data[selector].reshape(dropshape)
A small sample run:
In [534]: data = np.arange(24).reshape(shape)
In [535]: data
Out[535]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
In [536]: v = views()
In [537]: next(v)
2 1
Out[537]:
array([[[ 0,  2,  3],
        [ 4,  6,  7],
        [ 8, 10, 11]],

       [[12, 14, 15],
        [16, 18, 19],
        [20, 22, 23]]])
In [538]: next(v)
0 0
Out[538]:
array([[[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
So it's picking one of the dimensions, and for that dimension dropping one 'column'.
The main efficiency issue is whether it's returning a view or a copy. In this case it has to return a copy.
You are using a boolean mask to select the return, exactly the same as what np.delete does in this case.
In [544]: np.delete(data,1,2).shape
Out[544]: (2, 3, 3)
In [545]: np.delete(data,0,0).shape
Out[545]: (1, 3, 4)
So you could replace much of your internals with delete, letting it take care of generalizing the dimensions. Look at its code to see how it handles those details (it isn't short and sweet!).
def rand_delete():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        print(drop_dim, drop_dim_keep)
        yield np.delete(data, drop_dim_keep, drop_dim)
In [547]: v1=rand_delete()
In [548]: next(v1)
0 1
Out[548]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]])
In [549]: next(v1)
2 0
Out[549]:
array([[[ 1,  2,  3],
        [ 5,  6,  7],
        [ 9, 10, 11]],

       [[13, 14, 15],
        [17, 18, 19],
        [21, 22, 23]]])
Replace the delete with take:
def rand_take():
    while True:
        take_dim = nr.randint(0, 3)
        take_keep = nr.randint(0, shape[take_dim])
        print(take_dim, take_keep)
        yield np.take(data, take_keep, axis=take_dim)
In [580]: t = rand_take()
In [581]: next(t)
0 0
Out[581]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [582]: next(t)
2 3
Out[582]:
array([[ 3,  7, 11],
       [15, 19, 23]])
np.take returns a copy, but the equivalent slicing does not:
In [601]: data.__array_interface__['data']
Out[601]: (182632568, False)
In [602]: np.take(data,0,1).__array_interface__['data']
Out[602]: (180099120, False)
In [603]: data[:,0,:].__array_interface__['data']
Out[603]: (182632568, False)
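For what it's worth, np.shares_memory answers the same view-versus-copy question more directly than comparing the data pointers:
import numpy as np

data = np.arange(24).reshape(2, 3, 4)
print(np.shares_memory(data, np.take(data, 0, axis=1)))  # False: take copies
print(np.shares_memory(data, data[:, 0, :]))             # True: slicing views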
A slicing tuple can be generated with expressions like
In [604]: idx = [slice(None)]*data.ndim
In [605]: idx[1] = 0
In [606]: data[tuple(idx)]
Out[606]:
array([[ 0,  1,  2,  3],
       [12, 13, 14, 15]])
Various numpy functions that take an axis parameter construct an indexing tuple like this (for example, one or more of the apply... functions).
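Putting those pieces together, a dimension-agnostic generator without the per-axis if/else might look like the sketch below; it assumes one randomly chosen axis is dropped per view, and since it indexes with a slicing tuple, it yields true views rather than copies:
import numpy as np
import numpy.random as nr

def random_views(data):
    # yield (ndim-1)-dimensional views of `data`, one per iteration
    while True:
        axis = nr.randint(0, data.ndim)          # dimension to drop
        keep = nr.randint(0, data.shape[axis])   # index kept along that axis
        idx = [slice(None)] * data.ndim
        idx[axis] = keep                         # an integer index drops the axis
        yield data[tuple(idx)]                   # a view, not a copy

data = np.arange(24).reshape(2, 3, 4)
v = random_views(data)
print(next(v).shape)  # e.g. (3, 4) if axis 0 was dropped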

how to exclude elements from numpy matrix

Suppose we have a matrix:
mat = np.random.randn(5,5)
array([[-1.3979852 , -0.37711369, -1.99509723, -0.6151796 , -0.78780951],
       [ 0.12491113,  0.90526669, -0.18217331,  1.1252506 , -0.31782889],
       [-3.5933008 , -0.17981343,  0.91469733, -0.59719805,  0.12728085],
       [ 0.6906646 ,  0.2316733 , -0.2804641 ,  1.39864598, -0.09113139],
       [-0.38012856, -1.7230821 , -0.5779237 ,  0.30610451, -1.30015299]])
Suppose also that we have an index array:
idx = np.array([0,4,3,1,3])
While we can extract elements from the matrix using the following:
mat[idx, range(len(idx))]
array([-1.3979852 , -1.7230821 , -0.2804641 , 1.1252506 , -0.09113139])
What I want to know is how we can use the index to exclude elements from matrix, i.e. how do I obtain the following result:
array([[ 0.12491113, -0.37711369, -1.99509723, -0.6151796 , -0.78780951],
       [-3.5933008 ,  0.90526669, -0.18217331, -0.59719805, -0.31782889],
       [ 0.6906646 , -0.17981343,  0.91469733,  1.39864598,  0.12728085],
       [-0.38012856,  0.2316733 , -0.5779237 ,  0.30610451, -1.30015299]])
I thought it would be as simple as doing mat[-idx, range(len(idx))], but that doesn't work. I've also tried np.delete(), but that doesn't seem to do it either. Are there any solutions that don't require looping or list comprehensions? I'd appreciate any insight. Thanks.
EDIT: The data must remain in the same columns after processing.
When you say 'delete' does not work, what do you mean? What does it do? That might be diagnostic.
Let's first look at the selection that does work:
In [484]: mat=np.arange(25).reshape(5,5) # I like this better than random
In [485]: mat[idx,range(5)]
Out[485]: array([ 0, 21, 17, 8, 19])
This can also be used on a flattened version of the array:
In [486]: mat.flat[idx*5+np.arange(5)]
Out[486]: array([ 0, 21, 17, 8, 19])
Now try the same with the default flat delete:
In [487]: np.delete(mat,idx*5+np.arange(5)).reshape(5,4)
Out[487]:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  9],
       [10, 11, 12, 13],
       [14, 15, 16, 18],
       [20, 22, 23, 24]])
delete isn't an in-place operation; it returns a new array. And if you specify an axis, delete removes whole rows or columns, not selected items.
mat[-idx, range(len(idx))] isn't going to work, since negative indexes already have a meaning: counting from the end.
This delete ends up doing boolean indexing, thus:
In [498]: mat1=mat.ravel()
In [499]: idx1=idx*5+np.arange(5)
In [500]: ii=np.ones(mat1.shape, bool)
In [501]: ii[idx1]=False
In [502]: mat1[ii]
Out[502]:
array([ 1,  2,  3,  4,  5,  6,  7,  9, 10, 11, 12, 13, 14, 15, 16, 18, 20,
       22, 23, 24])
This sort of indexing/delete works even if you delete a different number of items from each row. Of course, in that case you couldn't count on reshaping the result back into a rectangular matrix.
In general when dealing with different indexes for different rows, the operation ends up acting on the flat or raveled version of the matrix. 'Irregular' operations usually make more sense when dealing with 1d arrays than with 2d.
Looking more carefully at your example, I see that when you remove an item, you move the other column values up to fill the gap. In my version, I moved values along rows. Let's try this with 'F' ordering instead, using idx2 = idx + np.arange(0, 25, 5), the Fortran-order flat indices of the selected elements:
In [523]: mat2 = mat.flatten('F')
In [524]: np.delete(mat2, idx2).reshape(5,4).T
Out[524]:
array([[ 5,  1,  2,  3,  4],
       [10,  6,  7, 13,  9],
       [15, 11, 12, 18, 14],
       [20, 16, 22, 23, 24]])
where I removed a value from each column:
In [525]: mat2[idx2]
Out[525]: array([ 0, 21, 17, 8, 19])
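To tie this back to the original question, the whole column-preserving removal can also be written as a single boolean-mask expression; a sketch, assuming exactly one element is dropped per column:
import numpy as np

mat = np.arange(25).reshape(5, 5)
idx = np.array([0, 4, 3, 1, 3])

# Mask is False at each (idx[j], j); everything else survives.
mask = np.ones(mat.shape, dtype=bool)
mask[idx, np.arange(mat.shape[1])] = False

# Index the transpose so values are gathered column by column,
# then reshape and transpose back: columns keep their identity.
result = mat.T[mask.T].reshape(mat.shape[1], -1).T
# array([[ 5,  1,  2,  3,  4],
#        [10,  6,  7, 13,  9],
#        [15, 11, 12, 18, 14],
#        [20, 16, 22, 23, 24]])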
