Python: Cell arrays comparison using minus function - python

I have 3 cell arrays, and each cell array contains arrays of different sizes. How can I perform subtraction for each possible combination of cell arrays? For example:
import numpy as np
a=np.array([[np.array([[2,2,1,2]]),np.array([[1,3]])]])
b=np.array([[np.array([[4,2,1]])]])
c=np.array([[np.array([[1,2]]),np.array([[4,3]])]])
The possible combinations here are a-b, a-c and b-c. Take a - b:
a=2,2,1,2 and 1,3
b=4,2,1
Because the arrays differ in size, the desired result comes from a shifting window:
(2,2,1)-(4,2,1) ----> -2,0,0
(2,1,2)-(4,2,1) ----> -2,-1,1
(1,3) -(4,2) ----> -3,1,1
(1,3) -(2,1) ----> 4,-1,2
I would like to know how to use Python to create a shifting window that lets me subtract my cell arrays.

You can use the sliding_window() function from the toolz library to build the shifting window:
>>> import numpy as np
>>> import toolz
>>> a = np.array([2,2,1,2])
>>> b = np.array([4, 2, 1])
>>> for chunk in toolz.sliding_window(b.size, a):
...     print(chunk - b)
...
[-2 0 0]
[-2 -1 1]

I think this pair of functions does what you want. The first may need some tweaking to get the pairing of the differences right.
import numpy as np

def diffs(a,b):
    # collect sliding window differences
    # length of window determined by the shorter array
    # if a,b are not arrays, need to replace b[...]-a with
    # a list comprehension
    n,m=len(a),len(b)
    if n>m:
        # ensure a is the shorter
        b,a=a,b # switch
        n,m=len(a),len(b)
        # may need to correct for sign switch
    result=[]
    for i in range(0,1+m-n):
        result.append(b[i:i+n]-a)
    return result

def alldiffs(a,b):
    # collect all the differences for elements of a and b
    # a,b could be lists or arrays of arrays, or 2d arrays
    result=[]
    for aa in a:
        for bb in b:
            result.append(diffs(aa,bb))
    return result

# define the 3 arrays
# each is a list of 1d arrays
a=[np.array([2,2,1,2]),np.array([1,3])]
b=[np.array([4,2,1])]
c=[np.array([1,2]),np.array([4,3])]
# display the differences
print(alldiffs(a,b))
print(alldiffs(a,c))
print(alldiffs(b,c))
producing (with some pretty printing):
1626:~/mypy$ python stack30678737.py
[[array([-2, 0, 0]), array([-2, -1, 1])],
[array([ 3, -1]), array([ 1, -2])]]
[[array([1, 0]), array([ 1, -1]), array([0, 0])],
[array([-2, -1]), array([-2, -2]), array([-3, -1])],
[array([ 0, -1])], [array([3, 0])]]
[[array([3, 0]), array([ 1, -1])],
[array([ 0, -1]), array([-2, -2])]]
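To cover every pairing (a-b, a-c, b-c) without writing each call by hand, the same sliding-window logic can be driven by itertools.combinations. This is a minimal sketch, assuming the cell arrays are kept in a dict keyed by name; `cells`, `window_diffs` and `results` are hypothetical names. It follows the same convention as diffs above: each window of the longer array minus the shorter array.

```python
import itertools
import numpy as np

def window_diffs(x, y):
    """Slide the shorter array along the longer one; return longer_window - shorter."""
    short, long_ = (x, y) if len(x) <= len(y) else (y, x)
    n = len(short)
    return [long_[i:i + n] - short for i in range(len(long_) - n + 1)]

# hypothetical container: one entry per cell array
cells = {
    "a": [np.array([2, 2, 1, 2]), np.array([1, 3])],
    "b": [np.array([4, 2, 1])],
    "c": [np.array([1, 2]), np.array([4, 3])],
}

results = {}
# combinations yields ("a", "b"), ("a", "c"), ("b", "c") in dict insertion order
for (name1, cell1), (name2, cell2) in itertools.combinations(cells.items(), 2):
    # every element of one cell array against every element of the other
    results[name1 + "-" + name2] = [window_diffs(x, y)
                                    for x, y in itertools.product(cell1, cell2)]

for pair, d in results.items():
    print(pair, d)
```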
Comparing my answer to your expected output, I wonder: are you padding your shorter arrays with 0 so that the result is always 3 elements long?
Changing a to a=[np.array([2,2,1,2]),np.array([0,1,3]),np.array([1,3,0])]
produces:
[[array([-2, 0, 0]), array([-2, -1, 1])],
[array([ 4, 1, -2])], [array([ 3, -1, 1])]]
I suppose you could do something fancier with this inner loop:
for i in range(0,1+m-n):
    result.append(b[i:i+n]-a)
But why? The first order of business is to get the problem specification clear. Speed can wait. Besides the sliding window code in image packages, there is a neat striding trick in np.lib.stride_tricks.as_strided. But I doubt that will save time, especially in small examples like this.
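For a vectorized version of that inner loop, NumPy 1.20 and later provide numpy.lib.stride_tricks.sliding_window_view, a safer wrapper around the as_strided trick. A sketch under that version assumption (`window_diffs_vectorized` is a hypothetical name); it returns one row per window position instead of a list of arrays.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

def window_diffs_vectorized(x, y):
    """Sliding-window differences (longer_window - shorter), one row per shift."""
    x, y = np.asarray(x), np.asarray(y)
    short, long_ = (x, y) if x.size <= y.size else (y, x)
    # read-only view of shape (len(long_) - len(short) + 1, len(short)); no data copied
    windows = sliding_window_view(long_, short.size)
    # broadcasting subtracts `short` from every window row at once
    return windows - short

print(window_diffs_vectorized(np.array([2, 2, 1, 2]), np.array([4, 2, 1])))
```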

Related

How do I change values at given indices in an array using Numba?

I have a function in which I do some operations, and I want to speed it up with Numba. In my code, changing the values in an array with advanced indexing does not work; I think the Numba documentation says as much. What is a workaround for something like numpy.put()?
Here is a short example of what I want to do:
#example array
array([[ 0, 1, 2],
[ 0, 2, -1],
[ 0, 3, -1]])
Change the values at the given indices, with any method that works in Numba, to get:
changed values at: [0,0], [1,2], [2,1]
# example array after setting the given indices to one value (10)
array([[ 10, 1, 2],
[ 0, 2, 10],
[ 0, 10, -1]])
Here is what I did in Python; it does not work with Numba.
indexList is a tuple, which works with numpy.take().
This is the working example Python code; the values in the array change to 100.
import numpy as np
x = np.zeros((151,151))
print(x.ndim)
indexList=np.array([[0,1,3],[0,1,2]])
indexList=tuple(indexList)
def change(xx,filter_list):
    xx[filter_list] = 100
    return xx
Z = change(x,indexList)
Now using @jit on the function:
from numba import jit
@jit
def change(xx,filter_list):
    xx[filter_list] = 100
    return xx
Z = change(x,indexList)
Compilation is falling back to object mode WITH looplifting enabled because Function "change" failed type inference due to: No implementation of function Function() found for signature: setitem(array(float64, 2d, C), UniTuple(array(int32, 1d, C) x 2), Literalint)
This error comes up, so I need a workaround. numpy.put() is not supported by Numba.
I would be grateful for any ideas.
Thank you
If it's not a problem for you to keep the indexList as an array, you can use it in conjunction with a for loop in the change function to make it compatible with Numba:
from numba import njit
indexList = np.array([[0,1,3],[0,1,2]]).T
@njit()
def change(xx, filter_list):
    for y, x in filter_list:
        xx[y, x] = 100
    return xx
change(x, indexList)
Note that the indexList has to be transposed so that the y, x coordinates lie along the 1st axis. In other words, it has to have a shape of (n, 2) rather than (2, n) for the n points to be changed. Effectively, it is now a list of coordinates: [[0, 0],[1, 1],[3, 2]]
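If you would rather keep the original (2, n) layout and skip the transpose, one option is to pass the row and column index arrays separately and loop over their positions. This is a sketch: `set_at` is a hypothetical helper name, and the try/except fallback (plain Python when Numba is not installed) is an assumption added only so the example runs anywhere.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # assumption: fall back to plain Python so this sketch still runs without Numba
    def njit(func):
        return func


@njit
def set_at(arr, rows, cols, value):
    """Hypothetical helper: arr[rows[k], cols[k]] = value for every k,
    the same effect as arr[rows, cols] = value, written as a plain loop."""
    for k in range(rows.shape[0]):
        arr[rows[k], cols[k]] = value
    return arr


# original (2, n) layout: first row holds row indices, second row holds column indices
x = np.zeros((151, 151))
index_list = np.array([[0, 1, 3], [0, 1, 2]])
set_at(x, index_list[0], index_list[1], 100.0)

# the 3x3 example from the question: set [0,0], [1,2], [2,1] to 10
small = np.array([[0, 1, 2], [0, 2, -1], [0, 3, -1]])
set_at(small, np.array([0, 1, 2]), np.array([0, 2, 1]), 10)
print(small)
```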
@mandulaj posted the way to go. Here is a slightly different approach I used before mandulaj gave his answer.
With this function I get a deprecation warning, so the best way to go is @mandulaj's; don't forget to transpose the indexList.
@jit
def change_arr(arr,idx,val): # change values in array by np index array to value
    for i,item in enumerate(idx):
        arr[item[0],item[1]]= val
    return arr

Is there any way I can iterate through an array faster than a for loop?

I'm writing code that compares fluxes of pixels on an astronomical map with the corresponding area on another one. Both maps are numpy arrays of data.
To do that, I need to transform pixel indices on the first map (Av) to their equivalent sky coordinates, then transform those sky coordinates to the equivalent pixel indices on the second map (CO). Then I scale the fluxes of the second map to match the values of the first map. After that, I have to keep handling the data.
The problem is that with thousands of pixels on the first map, the code is taking a really long time to finish doing what it's supposed to do, which is a hassle for troubleshooting. I've figured out that the slowest thing on this part of the code is the for loop.
Is there any way I can iterate through a numpy array, being able to work with the indexes and calculate data from every pixel, faster than a for loop? Is there a better way to do this at all?
In pseudocode, my code is something like this:
for pixel i,j in 1st map:
    sky_x1,sky_y1 = pixel_2_skycoord(i,j)
    i2,j2 = skycoord_2_pixel(sky_x1,sky_y1)
    Avmap.append(Avflux[i,j])
    COmap.append(COflux[i2,j2]*scale)
The actual code is:
for i in xrange(0,sAv_y-1):
    for j in xrange(0,sAv_x-1):
        if not np.isnan(Avdata[i,j]):
            y,x=wcs.utils.skycoord_to_pixel(wcs.utils.pixel_to_skycoord(i,j,wAv,0),wcs=wCO)
            x=x.astype(int)+0 #the zero is because i don't understand the problem with numpy but it fixes it anyway
            y=y.astype(int)+0 #i couldn't get the number from an array with 1 value but adding zero resolves it somehow
            COflux=COdata[x,y]
            ylist.append(Avdata[i,j])
            xlist.append(COflux*(AvArea/COArea))
The culprit here is the two for loops. Numpy has many functions that avoid Python-level for loops by running fast compiled code instead. The trick is to vectorize your code.
You can look into numpy's meshgrid function to convert this data into a vectorized form, and then use something like the approach in this SO question to apply an arbitrary function to that vector.
Something along the lines of:
import numpy as np
x_width = 15
y_width = 10
x, y = np.meshgrid(range(x_width), range(y_width))
def translate(x, y, x_o, y_o):
    x_new = x + x_o
    y_new = y + y_o
    return x_new, y_new
x_new, y_new = translate(x, y, 3, 3)
x_new[4,5], y[4,5]
(8, 4)
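Applied to the loop in the question, the idea looks roughly like the sketch below. `av_to_co_pixels` is a hypothetical stand-in for the pixel to sky to pixel conversion (an assumed half-resolution offset mapping, not a real WCS); the point is that the Astropy conversion functions also accept whole arrays, as the next answer shows with SkyCoord.from_pixel. The data arrays are small made-up examples.

```python
import numpy as np

def av_to_co_pixels(rows, cols):
    """HYPOTHETICAL stand-in for the vectorized pixel -> sky -> pixel conversion.
    Assumed mapping: the CO map has half the resolution plus a small offset."""
    return rows * 0.5 + 0.2, cols * 0.5 - 0.1

Avdata = np.array([[1.0, np.nan, 3.0, 4.0],
                   [5.0, 6.0, np.nan, 8.0]])
COdata = np.arange(4.0).reshape(2, 2)
scale = 1.0  # stands in for AvArea / COArea

# one grid of (row, col) indices replaces the two nested loops
rows, cols = np.meshgrid(np.arange(Avdata.shape[0]), np.arange(Avdata.shape[1]),
                         indexing="ij")
valid = ~np.isnan(Avdata)              # replaces the per-pixel isnan check

r2, c2 = av_to_co_pixels(rows[valid], cols[valid])
r2 = np.rint(r2).astype(int)           # round to the nearest CO pixel
c2 = np.rint(c2).astype(int)

ylist = Avdata[valid]                  # all Av fluxes at once
xlist = COdata[r2, c2] * scale         # fancy indexing gathers all CO fluxes at once
```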
You must avoid loops and do the heavy computation in the underlying C code, in Numpy or in Astropy for the sky/pixel conversion. There are several options to do this with astropy.wcs.
The first one is with SkyCoord. Let's first create a grid of values for your pixel indices:
In [30]: xx, yy = np.mgrid[:5, :5]
...: xx, yy
Out[30]:
(array([[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3],
[4, 4, 4, 4, 4]]), array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]]))
Now we can create the SkyCoord object (which behaves like a Numpy array, with a shape and indexing) from the pixel indices, using the wcs:
In [33]: from astropy.coordinates import SkyCoord
...: sky = SkyCoord.from_pixel(xx, yy, wcs)
...: sky
Out[33]:
<SkyCoord (FK5: equinox=2000.0): (ra, dec) in deg
[[(53.17127889, -27.78771333), (53.17127889, -27.78765778),
(53.17127889, -27.78760222), (53.17127889, -27.78754667),
(53.17127889, -27.78749111)],
....
Note that this uses wcs.utils.pixel_to_skycoord. The object also has a method to project back to pixels with a wcs (the counterpart, wcs.utils.skycoord_to_pixel). I will use the same wcs here for practical purposes:
In [34]: sky.to_pixel(wcs)
Out[34]:
(array([[ 0.00000000e+00, -1.11022302e-16, -2.22044605e-16,
-3.33066907e-16, 1.13149046e-10],
...
[ 4.00000000e+00, 4.00000000e+00, 4.00000000e+00,
4.00000000e+00, 4.00000000e+00]]),
array([[-6.31503738e-11, 1.00000000e+00, 2.00000000e+00,
3.00000000e+00, 4.00000000e+00],
...
[-1.11457732e-10, 1.00000000e+00, 2.00000000e+00,
3.00000000e+00, 4.00000000e+00]]))
We get a tuple of float values for the new x and y indices, so you will need to round these values and convert them to int to use them as array indices.
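A sketch of that rounding step, which also drops points that land outside the second map. It assumes Astropy's pixel convention, where x runs along the columns (last NumPy axis) and y along the rows, so the lookup is data[row, col]; `pixels_to_indices` and the sample values are hypothetical.

```python
import numpy as np

def pixels_to_indices(xp, yp, shape):
    """Round float pixel coordinates to int indices and flag those inside `shape`.
    Assumes x indexes columns (last axis) and y indexes rows."""
    col = np.rint(xp).astype(int)   # rint also cleans up values like -1.1e-16
    row = np.rint(yp).astype(int)
    inside = (row >= 0) & (row < shape[0]) & (col >= 0) & (col < shape[1])
    return row, col, inside

# float output similar to to_pixel() above, including tiny round-off errors
xp = np.array([-1.11e-16, 1.0, 2.0000001, 4.6])
yp = np.array([0.0, 3.9999999, 1.0, 0.2])
data = np.arange(20.0).reshape(4, 5)

row, col, inside = pixels_to_indices(xp, yp, data.shape)
values = data[row[inside], col[inside]]   # only in-bounds pixels are looked up
```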
The second option is to use the lower-level functions, e.g. wcs.pixel_to_world_values and wcs.world_to_pixel_values, which take Nx2 arrays and return Nx2 arrays as well:
In [37]: wcs.pixel_to_world_values(np.array([xx.ravel(), yy.ravel()]).T)
Out[37]:
array([[ 53.17127889, -27.78771333],
[ 53.17127889, -27.78765778],
[ 53.17127889, -27.78760222],
[ 53.17127889, -27.78754667],
...

Numpy array slice using tuple

I've read the numpy doc on slicing (especially the bottom, where it discusses variable array indexing)
https://docs.scipy.org/doc/numpy/user/basics.indexing.html
But I'm still not sure how I could do the following: write a method that returns either a 3D or a 4D set of indices, which are then used to access an array. I want to write a method for a base class, but the classes that derive from it access either 3D or 4D arrays depending on which derived class is instantiated.
Example code to illustrate the idea:
import numpy as np
a = np.ones([2,2,2,2])
size = np.shape(a)
print(size)
for i in range(size[0]):
    for j in range(size[1]):
        for k in range(size[2]):
            for p in range(size[3]):
                a[i,j,k,p] = i*size[1]*size[2]*size[3] + j*size[2]*size[3] + k*size[3] + p
print(a)
print('compare')
indices = (0,:,0,0)
print(a[0,:,0,0])
print(a[indices])
In short, I'm trying to get a tuple (or something) that can be used to make both of the following accesses, depending on how I fill the tuple:
a[i, 0, :, 1]
a[i, :, 1]
The slice method looked promising, but it seems to require a range, and I just want a ":" i.e. the whole dimension. What options are out there for variable numpy array dimension access?
In [324]: a = np.arange(8).reshape(2,2,2)
In [325]: a
Out[325]:
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
slicing:
In [326]: a[0,:,0]
Out[326]: array([0, 2])
In [327]: idx = (0,slice(None),0) # interpreter converts : into slice object
In [328]: a[idx]
Out[328]: array([0, 2])
In [331]: idx
Out[331]: (0, slice(None, None, None), 0)
In [332]: np.s_[0,:,0] # indexing trick to generate same
Out[332]: (0, slice(None, None, None), 0)
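Combined with the base-class setup from the question, each subclass can return its own index tuple built with slice(None) or np.s_, and the base class applies it without caring how many dimensions there are. A minimal sketch; the class and method names are hypothetical.

```python
import numpy as np

class Base:
    """Hypothetical base class: subclasses decide which index tuple to use."""
    def index(self, i):
        raise NotImplementedError

    def take(self, arr, i):
        # a tuple of ints and slice objects indexes exactly like a[i, :, 1]
        return arr[self.index(i)]

class Access3D(Base):
    def index(self, i):
        return (i, slice(None), 1)      # same as a[i, :, 1]

class Access4D(Base):
    def index(self, i):
        return np.s_[i, 0, :, 1]        # same as a[i, 0, :, 1]

a3 = np.arange(8).reshape(2, 2, 2)
a4 = np.arange(16).reshape(2, 2, 2, 2)
print(Access3D().take(a3, 0))
print(Access4D().take(a4, 1))
```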
Your code appears to work how you want it using :. The reason the two examples
(a[i, 0, :, 7], a[i, :, 7])
don't work is that the 7 is out of range of the array. If you change the 7 to something in range, like 1, it returns a value, which I believe is what you are looking for.

Form a numpy array from something that may be a numpy array

EDIT
I realized that I did not check my MWE carefully and so asked somewhat the wrong question. The main problem is when the numpy array is passed in as a 2d array instead of 1d (or even when a python list is passed in as 1d instead of 2d). So if we have
x = np.array([[1], [2], [3]])
then obviously, if you try to index this, you will get arrays out (if you use item you do not). The same thing also applies to standard python lists.
Sorry about the confusion.
Original
I am trying to form a new numpy array from something that may be a numpy array or may be a standard python list.
For example:
import numpy as np
x = [2, 3, 1]
y = np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0]])
Now I would like to form a function such that I can make y easily.
def skew(vector):
    """
    this function returns a numpy array with the skew symmetric cross product matrix for vector.
    the skew symmetric cross product matrix is defined such that
    np.cross(a, b) = np.dot(skew(a), b)
    :param vector: An array like vector to create the skew symmetric cross product matrix for
    :return: A numpy array of the skew symmetric cross product vector
    """
    return np.array([[0, -vector[2], vector[1]],
                     [vector[2], 0, -vector[0]],
                     [-vector[1], vector[0], 0]])
This works great and I can now write (assuming the above function is included)
import numpy as np
x=[2, 3, 1]
y = skew(x)
However, I would also like to be able to call skew on existing 1d or 2d numpy arrays. For instance
import numpy as np
x = np.array([2, 3, 1])
y = skew(x)
Unfortunately, doing this returns a numpy array where the elements are also numpy arrays, not python floats as I would like them to be.
Is there an easy way to form a new numpy array like I have done from something that is either a python list or a numpy array and have the result be just a standard numpy array with floats in each element?
Now obviously one solution is to check to see if the input is a numpy array or not:
def skew(vector):
    """
    this function returns a numpy array with the skew symmetric cross product matrix for vector.
    the skew symmetric cross product matrix is defined such that
    np.cross(a, b) = np.dot(skew(a), b)
    :param vector: An array like vector to create the skew symmetric cross product matrix for
    :return: A numpy array of the skew symmetric cross product vector
    """
    if isinstance(vector, np.ndarray):
        return np.array([[0, -vector.item(2), vector.item(1)],
                         [vector.item(2), 0, -vector.item(0)],
                         [-vector.item(1), vector.item(0), 0]])
    else:
        return np.array([[0, -vector[2], vector[1]],
                         [vector[2], 0, -vector[0]],
                         [-vector[1], vector[0], 0]])
However, it gets very tedious having to write these instance checks all over the place.
Another solution would be to cast everything to an array first and then just use the array call
def skew(vector):
    """
    this function returns a numpy array with the skew symmetric cross product matrix for vector.
    the skew symmetric cross product matrix is defined such that
    np.cross(a, b) = np.dot(skew(a), b)
    :param vector: An array like vector to create the skew symmetric cross product matrix for
    :return: A numpy array of the skew symmetric cross product vector
    """
    vector = np.array(vector)
    return np.array([[0, -vector.item(2), vector.item(1)],
                     [vector.item(2), 0, -vector.item(0)],
                     [-vector.item(1), vector.item(0), 0]])
but I feel like this is inefficient as it requires creating a new copy of vector (in this case not a big deal since vector is small but this is just a simple example).
My question is, is there a different way to do this outside of what I've discussed or am I stuck using one of these methods?
Arrays can be indexed just like lists, so you can write your skew function as:
def skew(x):
    return np.array([[0, -x[2], x[1]],
                     [x[2], 0, -x[0]],
                     [-x[1], x[0], 0]])
x = [1,2,3]
y = np.array([1,2,3])
>>> skew(y)
array([[ 0, -3, 2],
[ 3, 0, -1],
[-2, 1, 0]])
>>> skew(x)
array([[ 0, -3, 2],
[ 3, 0, -1],
[-2, 1, 0]])
In any case, with a 2d input your methods end up with first-dimension elements that are numpy arrays containing floats. You will always need to index the second dimension as well to get the floats inside.
Regarding what you told me in the comments, you may add an if condition for 2d arrays:
def skew(x):
    if isinstance(x, np.ndarray) and x.ndim >= 2:
        return np.array([[0, -x[2][0], x[1][0]],
                         [x[2][0], 0, -x[0][0]],
                         [-x[1][0], x[0][0], 0]])
    else:
        return np.array([[0, -x[2], x[1]],
                         [x[2], 0, -x[0]],
                         [-x[1], x[0], 0]])
You can implement the last idea efficiently using numpy.asarray():
vector = np.asarray(vector)
Then, if vector is already a NumPy array, no copying occurs.
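Building on np.asarray, here is a sketch that also handles the 2d column-vector case from the EDIT: ravel() flattens (3,), (3, 1) and (1, 3) inputs to a flat length-3 array, so indexing yields scalars instead of 1-element arrays. The dtype=float argument and the size check are my additions; note that dtype=float makes a copy when the input is an integer array.

```python
import numpy as np

def skew(vector):
    """Skew-symmetric cross-product matrix, so that np.cross(a, b) == skew(a) @ b."""
    # asarray avoids a copy when vector is already a float ndarray;
    # ravel flattens (3,), (3, 1) or (1, 3) inputs to shape (3,)
    v = np.asarray(vector, dtype=float).ravel()
    if v.size != 3:
        raise ValueError("expected 3 elements, got %d" % v.size)
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

print(skew([2, 3, 1]))
print(skew(np.array([[2], [3], [1]])))  # column vector gives the same matrix
```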
You can keep the first version of your function and convert the numpy array to list:
def skew(vector):
    if isinstance(vector, np.ndarray):
        vector = vector.tolist()
    return np.array([[0, -vector[2], vector[1]],
                     [vector[2], 0, -vector[0]],
                     [-vector[1], vector[0], 0]])
In [58]: skew([2, 3, 1])
Out[58]:
array([[ 0, -1, 3],
[ 1, 0, -2],
[-3, 2, 0]])
In [59]: skew(np.array([2, 3, 1]))
Out[59]:
array([[ 0, -1, 3],
[ 1, 0, -2],
[-3, 2, 0]])
This is not an optimal solution but is a very easy one.
You can just convert the vector into list by default.
def skew(vector):
    vector = list(vector)
    return np.array([[0, -vector[2], vector[1]],
                     [vector[2], 0, -vector[0]],
                     [-vector[1], vector[0], 0]])

How to apply a function to all the column of a numpy matrix?

It should be a standard question, but I am not able to find the answer :(
I have a numpy ndarray with n samples (rows) and p variables (observations).
I would like to count how many times each variable is nonzero.
I would use a function like
sum([1 for i in column if i!=0])
but how can I apply this function to all the columns of my matrix?
From this post: How to apply numpy.linalg.norm to each row of a matrix?
If the operation supports axis, use the axis parameter; it's usually faster.
Otherwise, np.apply_along_axis could help.
Here it is with numpy.count_nonzero.
So here is the simple answer:
import numpy as np
arr = np.eye(3)
np.apply_along_axis(np.count_nonzero, 0, arr)
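Since np.count_nonzero itself accepts an axis argument (NumPy 1.12 and later), the apply_along_axis call can be replaced by the axis parameter recommended above. A minimal sketch using the same arrays as in this thread:

```python
import numpy as np

arr = np.eye(3)
per_column = np.count_nonzero(arr, axis=0)   # nonzero count for each column
print(per_column)

a = np.array([[0, 1, 1, 0], [1, 1, 0, 0]])
counts = np.count_nonzero(a, axis=0)         # same counts as np.sum(a != 0, axis=0)
print(counts)
```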
You can use np.sum over a boolean array created from comparing your original array to zero, using the axis keyword argument to indicate whether you want to count over rows or columns. In your case:
>>> a = np.array([[0, 1, 1, 0],[1, 1, 0, 0]])
>>> a
array([[0, 1, 1, 0],
[1, 1, 0, 0]])
>>> np.sum(a != 0, axis=0)
array([1, 2, 1, 0])
