Explanation of boolean indexing behaviors - Python

For the 2D array y:
y = np.arange(20).reshape(5,4)
---
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
Both of the following index operations select the 1st, 3rd, and 5th rows. This is clear.
print(y[
    [0, 2, 4],
    ::
])
print(y[
    [True, False, True, False, True],
    ::
])
---
[[ 0  1  2  3]
 [ 8  9 10 11]
 [16 17 18 19]]
Questions
Please help me understand what rules or mechanisms are at work to produce these results.
Replacing the list with a tuple produces an empty array with shape (0, 5, 4).
y[
(True, False, True, False, True)
]
---
array([], shape=(0, 5, 4), dtype=int64)
Using a single True adds a new axis.
y[True]
---
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]]])
y[True].shape
---
(1, 5, 4)
Adding an additional boolean True produces the same result.
y[True, True]
---
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]]])
y[True, True].shape
---
(1, 5, 4)
However, adding a boolean False produces an empty array again.
y[True, False]
---
array([], shape=(0, 5, 4), dtype=int64)
I'm not sure the documentation explains this behavior.
Boolean array indexing
In general if an index includes a Boolean array, the result will be
identical to inserting obj.nonzero() into the same position and using
the integer array indexing mechanism described above. x[ind_1,
boolean_array, ind_2] is equivalent to x[(ind_1,) +
boolean_array.nonzero() + (ind_2,)].
If there is only one Boolean array and no integer indexing array
present, this is straight forward. Care must only be taken to make
sure that the boolean index has exactly as many dimensions as it is
supposed to work with.
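To make the quoted rule concrete, here is a quick check with the y above (my own illustration):
import numpy as np
y = np.arange(20).reshape(5, 4)
mask = np.array([True, False, True, False, True])
print(mask.nonzero())                              # (array([0, 2, 4]),)
print(np.array_equal(y[mask], y[mask.nonzero()]))  # True -- rows 0, 2, 4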

Boolean scalar indexing is not well-documented, but you can trace how it is handled in the source code. See for example this comment and associated code in the numpy source:
/*
* This can actually be well defined. A new axis is added,
* but at the same time no axis is "used". So if we have True,
* we add a new axis (a bit like with np.newaxis). If it is
* False, we add a new axis, but this axis has 0 entries.
*/
So if an index is a scalar boolean, a new axis is added. If the value is True the size of the axis is 1, and if the value is False, the size of the axis is zero.
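For example (a minimal sketch; the np.newaxis comparison is my own addition):
y[True].shape        # (1, 5, 4) -- new axis of size 1, like y[np.newaxis]
y[np.newaxis].shape  # (1, 5, 4)
y[False].shape       # (0, 5, 4) -- new axis of size 0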
This behavior was introduced in numpy#3798, and the author outlines the motivation in this comment; roughly, the aim was to provide consistency in the output of filtering operations. For example:
x = np.ones((2, 2))
assert x[x > 0].ndim == 1
x = np.ones(2)
assert x[x > 0].ndim == 1
x = np.ones(())
assert x[x > 0].ndim == 1 # scalar boolean here!
The interesting thing is that any subsequent scalar booleans after the first do not add additional dimensions! From an implementation standpoint, this seems to be due to consecutive 0D boolean indices being treated as equivalent to consecutive fancy indices (i.e. HAS_0D_BOOL is treated as HAS_FANCY in some cases) and thus are combined in the same way as fancy indices. From a logical standpoint, this corner-case behavior does not appear to be intentional: for example, I can't find any discussion of it in numpy#3798.
Given that, I would recommend considering this behavior poorly-defined, and avoid it in favor of well-documented indexing approaches.
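If all you need is the extra leading axis, an explicit, well-documented spelling would be (my suggestion, not from the original answer):
y[np.newaxis]         # same result as y[True], but unambiguous
np.expand_dims(y, 0)  # equivalent alternative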


Numpy where() using a condition that changes with the items position in the array

I'm trying to build a grid world using numpy.
The grid is 4*4 and laid out in a square.
The first and last squares (i.e. 1 and 16) are terminal squares.
At each time step you can move one step in any direction: up, down, left or right.
Once you enter one of the terminal squares no further moves are possible and the game terminates.
The first and last columns are the left and right edges of the square whilst the first and last rows represent the top and bottom edges.
If you are on an edge, for example the left one and attempt to move left, instead of moving left you stay in the square you started in. Similarly you remain in the same square if you try and cross any of the other edges.
Although the grid is a square I've implemented it as an array.
states_r calculates the position of the states after a move right. 1 and 16 stay where they are because they are terminal states (note the code uses zero-based counting, so 1 and 16 are 0 and 15 respectively in the code).
The rest of the squares are increased by one. The code for states_r works; however, the squares on the right edge, i.e. (4, 8, 12), should also stay where they are, and the states_r code doesn't do that.
states_l is my attempt to include the edge condition for the left edge of the square. The logic is the same: the terminal states (1, 16) should not move, nor should the squares on the left edge (5, 9, 13). I think the general logic is correct, but it's producing an error.
states = np.arange(16)
states_r = states[np.where((states + 1 <= 15) & (states != 0), states + 1, states)]
states_l = states[np.where((max(1, (states // 4) * 4) <= states - 1) & (states != 15), states - 1, states)]
The first example, states_r, works: it handles the terminal state but does not handle the edge condition.
The second example is my attempt to include the edge condition; however, it is giving me the following error:
"The truth value of an array with more than one element is ambiguous."
Can someone please explain how to fix my code?
Or alternatively, can you suggest another solution? Ideally I want the code to be fast (so I can scale it up), so I want to avoid for loops if possible.
If I understood correctly, you want arrays which indicate, for each state, where the next state is, depending on the move (right, left, up, down).
If so, I guess your implementation of states_r is not quite right. I would suggest switching to a 2D representation of your grid, because a lot of the things you describe are easier and more intuitive to handle if you have x and y directly (at least for me).
import numpy as np
n = 4
states = np.arange(n*n).reshape(n, n)
states_r, states_l, states_u, states_d = (states.copy(), states.copy(),
                                          states.copy(), states.copy())
states_r[:, :n-1] = states[:, 1:]
states_l[:, 1:] = states[:, :n-1]
states_u[1:, :] = states[:n-1, :]
states_d[:n-1, :] = states[1:, :]
#            up              [[ 0,  1,  2,  3],
#   left   state   right      [ 0,  1,  2,  3],
#            down             [ 4,  5,  6,  7],
#                             [ 8,  9, 10, 11]]
#
# [[ 0,  0,  1,  2],   [[ 0,  1,  2,  3],   [[ 1,  2,  3,  3],
#  [ 4,  4,  5,  6],    [ 4,  5,  6,  7],    [ 5,  6,  7,  7],
#  [ 8,  8,  9, 10],    [ 8,  9, 10, 11],    [ 9, 10, 11, 11],
#  [12, 12, 13, 14]]   [12, 13, 14, 15]]    [13, 14, 15, 15]]
#
# [[ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15],
#  [12, 13, 14, 15]]
If you want to exclude the terminal states, you can do something like this:
terminal_states = np.zeros((n, n), dtype=bool)
terminal_states[0, 0] = True
terminal_states[-1, -1] = True
states_r[terminal_states] = states[terminal_states]
states_l[terminal_states] = states[terminal_states]
states_u[terminal_states] = states[terminal_states]
states_d[terminal_states] = states[terminal_states]
If you prefer the 1D approach:
import numpy as np
n = 4
states = np.arange(n*n)
valid_s = np.ones(n*n, dtype=bool)
valid_s[0] = False
valid_s[-1] = False
states_r = np.where(np.logical_and(valid_s, states % n < n-1), states+1, states)
states_l = np.where(np.logical_and(valid_s, states % n > 0), states-1, states)
states_u = np.where(np.logical_and(valid_s, states > n-1), states-n, states)
states_d = np.where(np.logical_and(valid_s, states < n**2-n), states+n, states)
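As a quick sanity check (my own addition), these 1D results match the masked 2D arrays from the previous snippet, flattened:
print(states_r)  # [ 0  2  3  3  5  6  7  7  9 10 11 11 13 14 15 15]
print(states_l)  # [ 0  0  1  2  4  4  5  6  8  8  9 10 12 12 13 15]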
Another way of doing it without preallocating arrays:
states = np.arange(16).reshape(4,4)
states_l = np.hstack((states[:,0][:,None],states[:,:-1],))
states_r = np.hstack((states[:,1:],states[:,-1][:,None]))
states_d = np.vstack((states[1:,:],states[-1,:]))
states_u = np.vstack((states[0,:],states[:-1,:]))
To get them all in 1-D, you can always flatten()/ravel()/reshape(-1) the 2-D arrays.
[[ 0  1  2  3]
 [ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[ 0  0  1  2]   [[ 0  1  2  3]   [[ 1  2  3  3]
 [ 4  4  5  6]    [ 4  5  6  7]    [ 5  6  7  7]
 [ 8  8  9 10]    [ 8  9 10 11]    [ 9 10 11 11]
 [12 12 13 14]]  [12 13 14 15]]   [13 14 15 15]]
[[ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [12 13 14 15]]
And for corners you can do:
states_u[-1,-1] = 15
states_l[-1,-1] = 15

Can you please explain how this tuple (2,0,1) transposes the array? I am not able to find the logic of this transpose [duplicate]

In [28]: arr = np.arange(16).reshape((2, 2, 4))
In [29]: arr
Out[29]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])
In [32]: arr.transpose((1, 0, 2))
Out[32]:
array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
When we pass a tuple of integers to the transpose() function, what happens?
To be specific, this is a 3D array: how does NumPy transform the array when I pass the tuple of axes (1, 0 ,2)? Can you explain which row or column these integers refer to? And what are axis numbers in the context of NumPy?
To transpose an array, NumPy just swaps the shape and stride information for each axis. Here are the strides:
>>> arr.strides
(64, 32, 8)
>>> arr.transpose(1, 0, 2).strides
(32, 64, 8)
Notice that the transpose operation swapped the strides for axis 0 and axis 1. The lengths of these axes were also swapped (both lengths are 2 in this example).
No data needs to be copied for this to happen; NumPy can simply change how it looks at the underlying memory to construct the new array.
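You can verify the no-copy claim directly (a small check of my own):
>>> t = arr.transpose(1, 0, 2)
>>> np.shares_memory(arr, t)
True
>>> t.base is arr
True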
Visualising strides
The stride value represents the number of bytes that must be travelled in memory in order to reach the next value of an axis of an array.
Now, our 3D array arr looks like this (with labelled axes):
This array is stored in a contiguous block of memory; essentially it is one-dimensional. To interpret it as a 3D object, NumPy must jump over a certain constant number of bytes in order to move along one of the three axes:
Since each integer takes up 8 bytes of memory (we're using the int64 dtype), the stride value for each dimension is 8 times the number of values that we need to jump. For instance, to move along axis 1, four values (32 bytes) are jumped, and to move along axis 0, eight values (64 bytes) need to be jumped.
When we write arr.transpose(1, 0, 2) we are swapping axes 0 and 1. The transposed array looks like this:
All that NumPy needs to do is to swap the stride information for axis 0 and axis 1 (axis 2 is unchanged). Now we must jump further to move along axis 1 than axis 0:
This basic concept works for any permutation of an array's axes. The actual code that handles the transpose is written in C and can be found here.
As explained in the documentation:
By default, reverse the dimensions, otherwise permute the axes according to the values given.
So you can pass an optional parameter axes defining the new order of dimensions.
E.g. transposing the first two dimensions of an RGB VGA pixel array:
>>> x = np.ones((480, 640, 3))
>>> np.transpose(x, (1, 0, 2)).shape
(640, 480, 3)
In C notation, your array would be:
int arr[2][2][4]
which is a 3D array containing two 2D arrays. Each of those 2D arrays has two 1D arrays, and each of those 1D arrays has 4 elements.
So you have three dimensions. The axes are 0, 1, 2, with sizes 2, 2, 4. This is exactly how numpy treats the axes of an N-dimensional array.
So, arr.transpose((1, 0, 2)) would take axis 1 and put it in position 0, axis 0 and put it in position 1, and axis 2 and leave it in position 2. You are effectively permuting the axes:
0 -\/-> 0
1 -/\-> 1
2 ----> 2
In other words, 1 -> 0, 0 -> 1, 2 -> 2. The destination axes are always in order, so all you need is to specify the source axes. Read off the tuple in that order: (1, 0, 2).
In this case your new array dimensions are again [2][2][4], only because axes 0 and 1 had the same size (2).
More interesting is a transpose by (2, 1, 0) which gives you an array of [4][2][2].
0 -\ /--> 0
1 --X---> 1
2 -/ \--> 2
In other words, 2 -> 0, 1 -> 1, 0 -> 2. Read off the tuple in that order: (2, 1, 0).
>>> arr.transpose((2,1,0))
array([[[ 0,  8],
        [ 4, 12]],

       [[ 1,  9],
        [ 5, 13]],

       [[ 2, 10],
        [ 6, 14]],

       [[ 3, 11],
        [ 7, 15]]])
You ended up with an int[4][2][2].
You'd probably get a better understanding if all the dimensions were of different sizes, so you could see where each axis went.
Why is the first inner element [0, 8]? Because if you visualize your 3D array as two sheets of paper, 0 and 8 are lined up, one on one paper and one on the other paper, both in the upper left. By transposing (2, 1, 0) you're saying that you want the direction of paper-to-paper to now march along the paper from left to right, and the direction of left to right to now go from paper to paper. You had 4 elements going from left to right, so now you have four pieces of paper instead. And you had 2 papers, so now you have 2 elements going from left to right.
Sorry for the terrible ASCII art. ¯\_(ツ)_/¯
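A tiny sketch (my own addition) of the index rule behind this: with the permutation (2, 1, 0), element t[i, j, k] of the transpose equals arr[k, j, i]:
arr = np.arange(16).reshape((2, 2, 4))
t = arr.transpose((2, 1, 0))
assert t[3, 1, 0] == arr[0, 1, 3]  # both are 7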
It seems the question and the example originates from the book Python for Data Analysis by Wes McKinney. This feature of transpose is mentioned in Chapter 4.1. Transposing Arrays and Swapping Axes.
For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes (for extra mind bending).
Here "permute" means "rearrange", so rearranging the order of axes.
The numbers in .transpose(1, 0, 2) determine how the order of the axes is changed compared to the original. By using .transpose(1, 0, 2), we mean, "Swap the 1st axis with the 2nd." If we use .transpose(0, 1, 2), the array will stay the same because there is nothing to change; it is the default order.
The example in the book with a (2, 2, 4) sized array is not very clear, since the 1st and 2nd axes have the same size. So the end result doesn't seem to change except for the reordering of rows arr[0, 1] and arr[1, 0].
If we try a different example with a 3 dimensional array with each dimension having a different size, the rearrangement part becomes more clear.
In [2]: x = np.arange(24).reshape(2, 3, 4)
In [3]: x
Out[3]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
In [4]: x.transpose(1, 0, 2)
Out[4]:
array([[[ 0,  1,  2,  3],
        [12, 13, 14, 15]],

       [[ 4,  5,  6,  7],
        [16, 17, 18, 19]],

       [[ 8,  9, 10, 11],
        [20, 21, 22, 23]]])
Here, the original array sizes are (2, 3, 4). We changed the 1st and 2nd, so it becomes (3, 2, 4) in size. If we look closer to see exactly how the rearrangement happened, the arrays of numbers seem to have changed in a particular pattern. Using the paper analogy of @RobertB: if we were to take the 2 chunks of numbers and write each one on a sheet, then take one row from each sheet to construct one dimension of the array, we would now have a 3x2x4-sized array, counting from the outermost to the innermost layer.
[ 0, 1, 2, 3] \ [12, 13, 14, 15]
[ 4, 5, 6, 7] \ [16, 17, 18, 19]
[ 8, 9, 10, 11] \ [20, 21, 22, 23]
It could be a good idea to play with different sized arrays, and change different axes to gain a better intuition of how it works.
I ran across this in Python for Data Analysis by Wes McKinney as well.
I will show the simplest way of solving this for a 3-dimensional tensor, then describe the general approach that can be used for n-dimensional tensors.
Simple 3-dimensional tensor example
Suppose you have the (2,2,4)-tensor
[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]]
If we look at the coordinates of each point, they are as follows:
[[[ (0,0,0) (0,0,1) (0,0,2) (0,0,3)]
  [ (0,1,0) (0,1,1) (0,1,2) (0,1,3)]]

 [[ (1,0,0) (1,0,1) (1,0,2) (1,0,3)]
  [ (1,1,0) (1,1,1) (1,1,2) (1,1,3)]]]
Now suppose that the array above is example_array and we want to perform the operation: example_array.transpose(1,2,0)
For the (1,2,0)-transformation, we shuffle the coordinates as follows (note that this particular transformation amounts to a "left-shift"):
(0,0,0) -> (0,0,0)
(0,0,1) -> (0,1,0)
(0,0,2) -> (0,2,0)
(0,0,3) -> (0,3,0)
(0,1,0) -> (1,0,0)
(0,1,1) -> (1,1,0)
(0,1,2) -> (1,2,0)
(0,1,3) -> (1,3,0)
(1,0,0) -> (0,0,1)
(1,0,1) -> (0,1,1)
(1,0,2) -> (0,2,1)
(1,0,3) -> (0,3,1)
(1,1,0) -> (1,0,1)
(1,1,1) -> (1,1,1)
(1,1,2) -> (1,2,1)
(1,1,3) -> (1,3,1)
Now, for each original value, place it into the shifted coordinates in the result matrix.
For instance, the value 10 has coordinates (1, 0, 2) in the original matrix and will have coordinates (0, 2, 1) in the result matrix. It is placed into the first 2d tensor submatrix in the third row of that submatrix, in the second column of that row.
Hence, the resulting matrix is:
array([[[ 0,  8],
        [ 1,  9],
        [ 2, 10],
        [ 3, 11]],

       [[ 4, 12],
        [ 5, 13],
        [ 6, 14],
        [ 7, 15]]])
General n-dimensional tensor approach
For n-dimensional tensors, the algorithm is the same. Consider all of the coordinates of a single value in the original matrix. Shuffle the axes for that individual coordinate. Place the value into the resulting, shuffled coordinates in the result matrix. Repeat for all of the remaining values.
To summarise: a.transpose()[i,j,k] == a[k,j,i]
a = np.array( range(24), int).reshape((2,3,4))
a.shape gives (2,3,4)
a.transpose().shape gives (4,3,2); the shape tuple is reversed.
When a tuple parameter is passed, the axes are permuted according to the tuple.
For example
a = np.array( range(24), int).reshape((2,3,4))
a[i,j,k] equals a.transpose((2,0,1))[k,i,j]
axis 0 takes 2nd place
axis 1 takes 3rd place
axis 2 takes 1st place
Of course, we need to take care that the values in the tuple parameter passed to transpose are unique and in range(number of axes).
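A quick check of that relationship (my own addition):
a = np.array(range(24), int).reshape((2, 3, 4))
t = a.transpose((2, 0, 1))
i, j, k = 1, 2, 3
assert t[k, i, j] == a[i, j, k]  # holds for every valid (i, j, k)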

Indexing of duplicates in aligned time series indices

Say I have two time sequences whose indices are aligned as follows:
import numpy as np
t1_ind = np.array([ 1, 1, 1, 2, 3, 4, 5, 5, 6])
t2_ind = np.array([20, 21, 22, 23, 23, 24, 25, 26, 27])
which means that index 1 of t1 is aligned with indices 20, 21 and 22 of t2 (implying that t1 is faster than t2 in the first three increments) and so on.
The expected output should be:
y = np.array(([ 1,  2,  4,  5,  6],
              [20, 23, 24, 25, 27]))
The logic is to "scan" t1_ind and t2_ind and mark both the onset and offset of every duplicate segment. In this example, the entry 1 in t1_ind is followed by its duplicates, so the onset pair is recorded in y[:,0], and the respective offset pair is y[:,1]. The next duplicate segment in t1_ind starts and ends at y[:,3] and y[:,4], respectively. t2_ind is handled the same way; the resulting pairs are y[:,1] (which won't be recorded twice, though) and y[:,2]. It seems to me similar to a duplicate-removal problem, but I don't know how to proceed.
Sorry it is kinda hard for me to think of a proper title and to explain the logic precisely in short. Thanks for any help.
You can create a boolean slice that you can pass to both arrays, based on the conditions you set up. Since nothing comes before the first elements, we will always keep those. You can check for repeated elements after the first by subtracting slices of the arrays that are shifted by 1. Doing this for both arrays gives you the boolean array to use as the slice.
array_slice = np.concatenate((
    np.array([True]),
    ((t1_ind[1:] - t1_ind[:-1]) != 0) &
    ((t2_ind[1:] - t2_ind[:-1]) != 0)
))
array_slice
# returns:
array([ True, False, False, True, False, True, True, False, True], dtype=bool)
t1_ind[array_slice]
t2_ind[array_slice]
# returns:
array([1, 2, 4, 5, 6])
array([20, 23, 24, 25, 27])
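To get the 2-row array y from the question, you can stack the two filtered arrays; np.diff is also a slightly shorter spelling of the shifted subtraction (both lines are my additions):
y = np.vstack((t1_ind[array_slice], t2_ind[array_slice]))
array_slice = np.concatenate(([True], (np.diff(t1_ind) != 0) & (np.diff(t2_ind) != 0)))  # equivalent mask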

Handling masked numpy array

I have a masked numpy array. While processing each element, I need to first check whether the particular element is masked or not; if it is masked, I need to skip it.
I have tried this:
from netCDF4 import Dataset

data = Dataset('test.nc')
dim_size = len(data.dimensions[nc_dims[0]])
model_dry_tropo_corr = data.variables['model_dry_tropo_corr'][:]
solid_earth_tide = data.variables['solid_earth_tide'][:]
for i in range(0, dim_size):
    try:
        model_dry_tropo_corr[i].mask = True
        continue
    except:
        pass
    try:
        solid_earth_tide[i].mask = True
        continue
    except:
        pass
    correction = model_dry_tropo_corr[i]/2 + solid_earth_tide[i]
Is there another, more efficient way to do this? Please do let me know. Your suggestions or comments are highly appreciated.
Instead of a loop you could use
correction = model_dry_tropo_corr/2 + solid_earth_tide
This will create a new masked array that will have your answers and masks. You could then access the unmasked values from the new array.
I'm puzzled about this code
try:
    model_dry_tropo_corr[i].mask = True
    continue
except:
    pass
I don't have netCDF4 installed, but it appears from the documentation that your variable will look like, and may even be, a numpy.ma masked array.
It would be helpful if you printed all or part of this variable, with attributes like shape and dtype.
I can make a masked array with an expression like:
In [746]: M = np.ma.masked_where(np.arange(10) % 3 == 0, np.arange(10))
In [747]: M
Out[747]:
masked_array(data = [-- 1 2 -- 4 5 -- 7 8 --],
             mask = [ True False False  True False False  True False False  True],
       fill_value = 999999)
I can test whether the mask for a given element is True/False with:
In [748]: M.mask[2]
Out[748]: False
In [749]: M.mask[3]
Out[749]: True
But if I index first,
In [754]: M[2]
Out[754]: 2
In [755]: M[3]
Out[755]: masked
In [756]: M[2].mask=True
...
AttributeError: 'numpy.int32' object has no attribute 'mask'
In [757]: M[3].mask=True
So yes, your try/except will skip the elements that have the mask set True.
But I think it would be clearer to do:
if model_dry_tropo_corr.mask[i]:
    continue
But that is still iterative.
But as @user3404344 showed, you could perform the math with the variables. Masking will carry over. That could, though, be a problem if masked values are 'bad' and cause errors in the calculation.
If I define another masked array
In [764]: N=np.ma.masked_where(np.arange(10)%4==0,np.arange(10))
In [765]: N + M
Out[765]:
masked_array(data = [-- 2 4 -- -- 10 -- 14 -- --],
             mask = [ True False False  True  True False  True False  True  True],
       fill_value = 999999)
You can see how elements that were masked in either M or N are masked in the result.
I can use the compressed method to get only the valid elements:
In [766]: (N+M).compressed()
Out[766]: array([ 2, 4, 10, 14])
Filling can also be handy when doing math with masked arrays:
In [779]: N.filled(0)+M.filled(0)
Out[779]: array([ 0, 2, 4, 3, 4, 10, 6, 14, 8, 9])
I could use filled to neutralize problem calculations, and still mask those values
In [785]: z=np.ma.masked_array(N.filled(0)+M.filled(0),mask=N.mask|M.mask)
In [786]: z
Out[786]:
masked_array(data = [-- 2 4 -- -- 10 -- 14 -- --],
             mask = [ True False False  True  True False  True False  True  True],
       fill_value = 999999)
Oops! I don't need to worry about the masked values messing up the calculation; the masked addition is doing the filling for me:
In [787]: (N+M).data
Out[787]: array([ 0, 2, 4, 3, 4, 10, 6, 14, 8, 9])
In [788]: N.data+M.data # raw unmasked addition
Out[788]: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
In [789]: z.data # same as the (N+M).data
Out[789]: array([ 0, 2, 4, 3, 4, 10, 6, 14, 8, 9])

Apply function to an array of tuples

I have a function that I would like to apply to an array of tuples and I am wondering if there is a clean way to do it.
Normally, I could use np.vectorize to apply the function to each item in the array, however, in this case "each item" is a tuple so numpy interprets the array as a 3d array and applies the function to each item within the tuple.
So I can assume that the incoming array is one of:
tuple
1 dimensional array of tuples
2 dimensional array of tuples
I can probably write some looping logic but it seems like numpy most likely has something that does this more efficiently and I don't want to reinvent the wheel.
This is an example. I am trying to apply the tuple_converter function to each tuple in the array.
array_of_tuples1 = np.array([
    [(1,2,3),(2,3,4),(5,6,7)],
    [(7,2,3),(2,6,4),(5,6,6)],
    [(8,2,3),(2,5,4),(7,6,7)],
])
array_of_tuples2 = np.array([
    (1,2,3),(2,3,4),(5,6,7),
])
plain_tuple = (1,2,3)
# Convert each set of tuples
def tuple_converter(tup):
    return tup[0]**2 + tup[1] + tup[2]
# Vectorizing applies the formula to each integer rather than each tuple
tuple_converter_vectorized = np.vectorize(tuple_converter)
print(tuple_converter_vectorized(array_of_tuples1))
print(tuple_converter_vectorized(array_of_tuples2))
print(tuple_converter_vectorized(plain_tuple))
Desired Output for array_of_tuples1:
[[ 6 11 38]
 [54 14 37]
 [69 13 62]]
Desired Output for array_of_tuples2:
[ 6 11 38]
Desired Output for plain_tuple:
6
But the code above produces this error (because it is trying to apply the function to an integer rather than a tuple.)
<ipython-input-209-fdf78c6f4b13> in tuple_converter(tup)
10
11 def tuple_converter(tup):
---> 12 return tup[0]**2 + tup[1] + tup[2]
13
14
IndexError: invalid index to scalar variable.
array_of_tuples1 and array_of_tuples2 are not actually arrays of tuples, but just 3- and 2-dimensional arrays of integers:
In [1]: array_of_tuples1 = np.array([
...: [(1,2,3),(2,3,4),(5,6,7)],
...: [(7,2,3),(2,6,4),(5,6,6)],
...: [(8,2,3),(2,5,4),(7,6,7)],
...: ])
In [2]: array_of_tuples1
Out[2]:
array([[[1, 2, 3],
        [2, 3, 4],
        [5, 6, 7]],

       [[7, 2, 3],
        [2, 6, 4],
        [5, 6, 6]],

       [[8, 2, 3],
        [2, 5, 4],
        [7, 6, 7]]])
So, instead of vectorizing your function (which would basically for-loop through the individual integers), you should apply it along the suitable axis (the axis of the "tuples"), without caring about the type of the sequence:
In [6]: np.apply_along_axis(tuple_converter, 2, array_of_tuples1)
Out[6]:
array([[ 6, 11, 38],
       [54, 14, 37],
       [69, 13, 62]])
In [9]: np.apply_along_axis(tuple_converter, 1, array_of_tuples2)
Out[9]: array([ 6, 11, 38])
The other answer above is certainly correct, and probably what you're looking for. But I noticed you put the word "clean" into your question, and so I'd like to add this answer as well.
If we can make the assumption that all the tuples are 3 element tuples (or that they have some constant number of elements), then there's a nice little trick you can do so that the same piece of code will work on any single tuple, 1d array of tuples, or 2d array of tuples without an if/else for the 1d/2d cases. I'd argue that avoiding switches is always cleaner (although I suppose this could be contested).
import numpy as np
def map_to_tuples(x):
    x = np.array(x)
    flattened = x.flatten().reshape(-1, 3)
    return np.array([tup[0]**2 + tup[1] + tup[2] for tup in flattened]).reshape(x.shape[:-1])
Outputs the following for your inputs (respectively), as desired:
[[ 6 11 38]
 [54 14 37]
 [69 13 62]]
[ 6 11 38]
6
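Since np.array puts the tuple elements on the last axis, the same trick can also be written fully vectorized, with no Python-level loop (my addition; same assumption of 3-element tuples):
def map_to_tuples_vec(x):
    x = np.asarray(x)
    # index the last axis directly: works for a plain tuple, 1d, or 2d array of tuples
    return x[..., 0]**2 + x[..., 1] + x[..., 2]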
If you are serious about the tuples bit, you could define a structured dtype.
In [535]: dt=np.dtype('int,int,int')
In [536]: x1 = np.array([
     ...:     [(1,2,3),(2,3,4),(5,6,7)],
     ...:     [(7,2,3),(2,6,4),(5,6,6)],
     ...:     [(8,2,3),(2,5,4),(7,6,7)],
     ...: ], dtype=dt)
In [537]: x1
Out[537]:
array([[(1, 2, 3), (2, 3, 4), (5, 6, 7)],
       [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
       [(8, 2, 3), (2, 5, 4), (7, 6, 7)]],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
Note that the display uses tuples. x1 is a 3x3 array with dtype dt. The elements, or records, are displayed as tuples. This is more useful if the tuple elements differ in type - float, integer, string, etc.
Now define a function that works with fields of such an array:
In [538]: def foo(tup):
     ...:     return tup['f0']**2 + tup['f1'] + tup['f2']
It applies neatly to x1.
In [539]: foo(x1)
Out[539]:
array([[ 6, 11, 38],
       [54, 14, 37],
       [69, 13, 62]])
It also applies to a 1d array of the same dtype.
In [540]: x2=np.array([(1,2,3),(2,3,4),(5,6,7) ],dtype=dt)
In [541]: foo(x2)
Out[541]: array([ 6, 11, 38])
And a 0d array of matching type:
In [542]: foo(np.array(plain_tuple,dtype=dt))
Out[542]: 6
But foo(plain_tuple) won't work, since the function is written to work with named fields, not indexed ones.
The function could be modified to cast the input to the correct dtype if needed:
In [545]: def foo1(tup):
     ...:     temp = np.asarray(tup, dtype=dt)
     ...:     return temp['f0']**2 + temp['f1'] + temp['f2']
In [548]: plain_tuple
Out[548]: (1, 2, 3)
In [549]: foo1(plain_tuple)
Out[549]: 6
In [554]: foo1([(1,2,3),(2,3,4),(5,6,7)]) # list of tuples
Out[554]: array([ 6, 11, 38])
