Apply function to an array of tuples - python

I have a function that I would like to apply to an array of tuples and I am wondering if there is a clean way to do it.
Normally, I could use np.vectorize to apply the function to each item in the array, however, in this case "each item" is a tuple so numpy interprets the array as a 3d array and applies the function to each item within the tuple.
So I can assume that the incoming array is one of:
tuple
1 dimensional array of tuples
2 dimensional array of tuples
I can probably write some looping logic but it seems like numpy most likely has something that does this more efficiently and I don't want to reinvent the wheel.
This is an example. I am trying to apply the tuple_converter function to each tuple in the array.
array_of_tuples1 = np.array([
    [(1,2,3),(2,3,4),(5,6,7)],
    [(7,2,3),(2,6,4),(5,6,6)],
    [(8,2,3),(2,5,4),(7,6,7)],
])
array_of_tuples2 = np.array([
    (1,2,3),(2,3,4),(5,6,7),
])
plain_tuple = (1,2,3)
# Convert each set of tuples
def tuple_converter(tup):
    return tup[0]**2 + tup[1] + tup[2]
# Vectorizing applies the formula to each integer rather than each tuple
tuple_converter_vectorized = np.vectorize(tuple_converter)
print(tuple_converter_vectorized(array_of_tuples1))
print(tuple_converter_vectorized(array_of_tuples2))
print(tuple_converter_vectorized(plain_tuple))
Desired Output for array_of_tuples1:
[[ 6 11 38]
[54 14 37]
[69 13 62]]
Desired Output for array_of_tuples2:
[ 6 11 38]
Desired Output for plain_tuple:
6
But the code above produces this error (because it is trying to apply the function to an integer rather than a tuple):
<ipython-input-209-fdf78c6f4b13> in tuple_converter(tup)
10
11 def tuple_converter(tup):
---> 12 return tup[0]**2 + tup[1] + tup[2]
13
14
IndexError: invalid index to scalar variable.

array_of_tuples1 and array_of_tuples2 are not actually arrays of tuples, but just 3- and 2-dimensional arrays of integers:
In [1]: array_of_tuples1 = np.array([
...: [(1,2,3),(2,3,4),(5,6,7)],
...: [(7,2,3),(2,6,4),(5,6,6)],
...: [(8,2,3),(2,5,4),(7,6,7)],
...: ])
In [2]: array_of_tuples1
Out[2]:
array([[[1, 2, 3],
[2, 3, 4],
[5, 6, 7]],
[[7, 2, 3],
[2, 6, 4],
[5, 6, 6]],
[[8, 2, 3],
[2, 5, 4],
[7, 6, 7]]])
So, instead of vectorizing your function, which would effectively loop through the individual elements of the array (the integers), you should apply it along the suitable axis (the axis of the "tuples") and not care about the type of the sequence:
In [6]: np.apply_along_axis(tuple_converter, 2, array_of_tuples1)
Out[6]:
array([[ 6, 11, 38],
[54, 14, 37],
[69, 13, 62]])
In [9]: np.apply_along_axis(tuple_converter, 1, array_of_tuples2)
Out[9]: array([ 6, 11, 38])

The other answer above is certainly correct, and probably what you're looking for. But I noticed you put the word "clean" into your question, and so I'd like to add this answer as well.
If we can make the assumption that all the tuples are 3-element tuples (or that they have some constant number of elements), then there's a nice little trick you can do so that the same piece of code will work on any single tuple, 1d array of tuples, or 2d array of tuples without an if/else for the 1d/2d cases. I'd argue that avoiding switches is always cleaner (although I suppose this could be contested).
import numpy as np

def map_to_tuples(x):
    x = np.array(x)
    flattened = x.flatten().reshape(-1, 3)
    return np.array([tup[0]**2 + tup[1] + tup[2] for tup in flattened]).reshape(x.shape[:-1])
Outputs the following for your inputs (respectively), as desired:
[[ 6 11 38]
[54 14 37]
[69 13 62]]
[ 6 11 38]
6
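If the list comprehension bothers you, the per-tuple arithmetic can also be done with whole-column operations, which removes the Python-level loop entirely (a sketch under the same 3-element assumption):

```python
import numpy as np

def map_to_tuples_vec(x):
    x = np.asarray(x)
    flat = x.reshape(-1, 3)               # one row per tuple
    out = flat[:, 0]**2 + flat[:, 1] + flat[:, 2]
    return out.reshape(x.shape[:-1])      # restore the original "outer" shape

print(map_to_tuples_vec([(1, 2, 3), (2, 3, 4), (5, 6, 7)]))
```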

If you are serious about the tuples bit, you could define a structured dtype.
In [535]: dt=np.dtype('int,int,int')
In [536]: x1 = np.array([
[(1,2,3),(2,3,4),(5,6,7)],
[(7,2,3),(2,6,4),(5,6,6)],
[(8,2,3),(2,5,4),(7,6,7)],
], dtype=dt)
In [537]: x1
Out[537]:
array([[(1, 2, 3), (2, 3, 4), (5, 6, 7)],
[(7, 2, 3), (2, 6, 4), (5, 6, 6)],
[(8, 2, 3), (2, 5, 4), (7, 6, 7)]],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
Note that the display uses tuples. x1 is a 3x3 array of type dt. The elements, or records, are displayed as tuples. This is more useful if the tuple elements have different types - float, integer, string, etc.
Now define a function that works with fields of such an array:
In [538]: def foo(tup):
   .....:     return tup['f0']**2 + tup['f1'] + tup['f2']
It applies neatly to x1.
In [539]: foo(x1)
Out[539]:
array([[ 6, 11, 38],
[54, 14, 37],
[69, 13, 62]])
It also applies to a 1d array of the same dtype.
In [540]: x2=np.array([(1,2,3),(2,3,4),(5,6,7) ],dtype=dt)
In [541]: foo(x2)
Out[541]: array([ 6, 11, 38])
And a 0d array of matching type:
In [542]: foo(np.array(plain_tuple,dtype=dt))
Out[542]: 6
But foo(plain_tuple) won't work, since the function is written to work with named fields, not indexed ones.
The function could be modified to cast the input to the correct dtype if needed:
In [545]: def foo1(tup):
   .....:     temp = np.asarray(tup, dtype=dt)
   .....:     return temp['f0']**2 + temp['f1'] + temp['f2']
In [548]: plain_tuple
Out[548]: (1, 2, 3)
In [549]: foo1(plain_tuple)
Out[549]: 6
In [554]: foo1([(1,2,3),(2,3,4),(5,6,7)]) # list of tuples
Out[554]: array([ 6, 11, 38])
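If the data starts out as a plain integer array rather than a list of tuples, numpy.lib.recfunctions can convert the last axis into named fields (a sketch; unstructured_to_structured is available from NumPy 1.16 on):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype('int,int,int')
plain = np.array([[1, 2, 3], [2, 3, 4], [5, 6, 7]])  # ordinary 2d integer array

# last axis (length 3) becomes the fields f0, f1, f2
structured = rfn.unstructured_to_structured(plain, dtype=dt)
print(structured['f0']**2 + structured['f1'] + structured['f2'])
```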

Related

MATLAB sum function to Python conversion

I am trying to convert this matlab code to python:
#T2 = (sum((log(X(1:m,:)))'));
Here is my code in python:
T2 = sum(np.log(X[0:int(m),:]).T)
where m = 103 and X is a matrix:
f1 = np.float64(135)
f2 = np.float64(351)
X = np.float64(p[:, int(f1):int(f2)])
and p is dictionary (loaded data)
The problem is that Python gives me exactly the same values with the same dimensions (216x103) as MATLAB before applying the sum function on np.log(X[0:int(m), :]).T. However, after applying the sum function it gives me the correct values but the wrong dimension (103x1). The correct dimension is (1x103). I have tried using transpose after getting the sum but it doesn't work. Any suggestions on how to get my desired dimension?
A matrix in MATLAB consists of m rows and n columns, but a "matrix" in NumPy is an array of arrays. Each subarray is a flat vector with a single dimension equal to the number of its elements. MATLAB doesn't have flat vectors at all: a row is a 1xn matrix, a column is an mx1 matrix, and a scalar is a 1x1 matrix.
So, back to the question, when you write T2 = sum(np.log(X[0:int(m),:]).T) in Python, it's neither 103x1 nor 1x103, it's a flat 103 vector. If you specifically want a 1x103 matrix like MATLAB, just reshape(1,-1) and you don't have to transpose since you can sum over the second axis.
import numpy as np
X = np.random.rand(216,103)
m = 103
T2 = np.sum(np.log(X[:m]), axis=1).reshape(1,-1)
T2.shape
# (1, 103)
Let's make a demo 2d array:
In [19]: x = np.arange(12).reshape(3,4)
In [20]: x
Out[20]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
And apply the base Python sum function (which isn't the same as numpy's own):
In [21]: sum(x)
Out[21]: array([12, 15, 18, 21])
The result is a (4,) shape array (not 4x1). Print sum(x).shape if you don't believe me.
The numpy.sum function adds all terms if no axis is given:
In [22]: np.sum(x)
Out[22]: 66
or with axis:
In [23]: np.sum(x, axis=0)
Out[23]: array([12, 15, 18, 21])
In [24]: np.sum(x, axis=1)
Out[24]: array([ 6, 22, 38])
The Python sum treats x as a list of arrays, and adds them together:
In [25]: list(x)
Out[25]: [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8, 9, 10, 11])]
In [28]: x[0]+x[1]+x[2]
Out[28]: array([12, 15, 18, 21])
Transpose, without parameters, switches axes. It does not add any dimensions:
In [29]: x.T # (4,3) shape
Out[29]:
array([[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]])
In [30]: sum(x).T
Out[30]: array([12, 15, 18, 21]) # still (4,) shape
Octave
>> x=reshape(0:11,4,3)'
x =
0 1 2 3
4 5 6 7
8 9 10 11
>> sum(x)
ans =
12 15 18 21
>> sum(x,1)
ans =
12 15 18 21
>> sum(x,2)
ans =
6
22
38
edit
The np.sum function has a keepdims parameter:
In [32]: np.sum(x, axis=0, keepdims=True)
Out[32]: array([[12, 15, 18, 21]]) # (1,4) shape
In [33]: np.sum(x, axis=1, keepdims=True)
Out[33]:
array([[ 6], # (3,1) shape
[22],
[38]])
If I reshape the array to 3d, and sum, the result is 2d - unless I keepdims:
In [34]: np.sum(x.reshape(3,2,2), axis=0).shape
Out[34]: (2, 2)
In [36]: np.sum(x.reshape(3,2,2), axis=0,keepdims=True).shape
Out[36]: (1, 2, 2)
MATLAB/Octave on the other hand keeps the dims by default:
sum(reshape(x,3,2,2)) # (1,2,2)
unless I sum on that last, 3rd:
sum(reshape(x,3,2,2),3) # (3,2)
The key is that in MATLAB everything is 2d, with the option of additional trailing dimensions, which aren't handled the same way. In numpy, every number of dimensions, from 0 on up, is handled the same way.
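The difference is easy to demonstrate directly (a quick illustration of the three shapes involved):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # numpy flat vector: one dimension, shape (3,)
row = v.reshape(1, -1)          # MATLAB-style row: shape (1, 3)
col = v.reshape(-1, 1)          # MATLAB-style column: shape (3, 1)

print(v.shape, row.shape, col.shape)   # (3,) (1, 3) (3, 1)
```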

Explanation of boolean indexing behaviors

For the 2D array y:
y = np.arange(20).reshape(5,4)
---
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
All of the following index operations select the 1st, 3rd, and 5th rows. This is clear.
print(y[
[0, 2, 4],
::
])
print(y[
[True, False, True, False, True],
::
])
---
[[ 0 1 2 3]
[ 8 9 10 11]
[16 17 18 19]]
Questions
Please help me understand what rules or mechanisms are at work to produce the results below.
Replacing the list with a tuple produces an empty array with shape (0, 5, 4).
y[
(True, False, True, False, True)
]
---
array([], shape=(0, 5, 4), dtype=int64)
Using a single True adds a new axis.
y[True]
---
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]]])
y[True].shape
---
(1, 5, 4)
Adding an additional boolean True produces the same result.
y[True, True]
---
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]]])
y[True, True].shape
---
(1, 5, 4)
However, adding a False boolean causes the empty array again.
y[True, False]
---
array([], shape=(0, 5, 4), dtype=int64)
I am not sure the documentation explains this behavior.
Boolean array indexing
In general if an index includes a Boolean array, the result will be
identical to inserting obj.nonzero() into the same position and using
the integer array indexing mechanism described above. x[ind_1,
boolean_array, ind_2] is equivalent to x[(ind_1,) +
boolean_array.nonzero() + (ind_2,)].
If there is only one Boolean array and no integer indexing array
present, this is straight forward. Care must only be taken to make
sure that the boolean index has exactly as many dimensions as it is
supposed to work with.
Boolean scalar indexing is not well-documented, but you can trace how it is handled in the source code. See for example this comment and associated code in the numpy source:
/*
* This can actually be well defined. A new axis is added,
* but at the same time no axis is "used". So if we have True,
* we add a new axis (a bit like with np.newaxis). If it is
* False, we add a new axis, but this axis has 0 entries.
*/
So if an index is a scalar boolean, a new axis is added. If the value is True the size of the axis is 1, and if the value is False, the size of the axis is zero.
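Both cases can be seen side by side, along with the np.newaxis analogy from the source comment (using the y from the question):

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

print(y[True].shape)        # (1, 5, 4): new length-1 axis, like y[np.newaxis]
print(y[False].shape)       # (0, 5, 4): new axis with zero entries
print(y[np.newaxis].shape)  # (1, 5, 4): same shape as y[True]
```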
This behavior was introduced in numpy#3798, and the author outlines the motivation in this comment; roughly, the aim was to provide consistency in the output of filtering operations. For example:
x = np.ones((2, 2))
assert x[x > 0].ndim == 1
x = np.ones(2)
assert x[x > 0].ndim == 1
x = np.ones(())
assert x[x > 0].ndim == 1 # scalar boolean here!
The interesting thing is that any subsequent scalar booleans after the first do not add additional dimensions! From an implementation standpoint, this seems to be due to consecutive 0D boolean indices being treated as equivalent to consecutive fancy indices (i.e. HAS_0D_BOOL is treated as HAS_FANCY in some cases) and thus are combined in the same way as fancy indices. From a logical standpoint, this corner-case behavior does not appear to be intentional: for example, I can't find any discussion of it in numpy#3798.
Given that, I would recommend considering this behavior poorly-defined, and avoid it in favor of well-documented indexing approaches.

numpy.r_ places the 1's in the middle

I am reading the numpy.r_ docs. I gather that I cannot place the 1's at the middle position.
For example ,
a = np.array( [[3,4,5],[ 33,44,55]])
b = np.array( [[-3,-4,-5],[ -33,-44,-55]])
np.r_['0,3,1',a,b]
Actually, first the shape (2,3) of a is upgraded to shape (1,2,3), and the same for b. Then the two shapes are combined: (1,2,3) + (1,2,3) = (2,2,3), the final shape of the result. Note that only the first number is added, because of the '0' in '0,3,1'.
Now the question is: according to the docs, I can upgrade the shape of a to (1,2,3) or (2,3,1), but how can I upgrade it to (2,1,3)?
In [381]: a = np.array( [[3,4,5],[ 33,44,55]])
...:
...: b = np.array( [[-3,-4,-5],[ -33,-44,-55]])
...:
...: np.r_['0,3,1',a,b]
Out[381]:
array([[[ 3, 4, 5],
[ 33, 44, 55]],
[[ -3, -4, -5],
[-33, -44, -55]]])
Your question should have displayed this result. It helps the reader visualize the action, and better understand your question. Not everyone can run your sample (I couldn't when I first read it on my phone).
You can do the same concatenation with stack (or even np.array((a,b))):
In [382]: np.stack((a,b))
Out[382]:
array([[[ 3, 4, 5],
[ 33, 44, 55]],
[[ -3, -4, -5],
[-33, -44, -55]]])
stack with axis produces what you want (again, a good question would display the desired result):
In [383]: np.stack((a,b), axis=1)
Out[383]:
array([[[ 3, 4, 5],
[ -3, -4, -5]],
[[ 33, 44, 55],
[-33, -44, -55]]])
We can add the dimension to a by itself with:
In [384]: np.expand_dims(a,1)
Out[384]:
array([[[ 3, 4, 5]],
[[33, 44, 55]]])
In [385]: _.shape
Out[385]: (2, 1, 3)
a[:,None] and a.reshape(2,1,3) also do it.
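Combining that a[:, None] trick with concatenate reproduces the axis=1 stack, if you want to stay close to the expand-then-join pattern that r_ uses internally (a sketch):

```python
import numpy as np

a = np.array([[3, 4, 5], [33, 44, 55]])
b = np.array([[-3, -4, -5], [-33, -44, -55]])

# expand each (2,3) array to (2,1,3), then join on the new middle axis
joined = np.concatenate((a[:, None], b[:, None]), axis=1)
print(joined.shape)                                       # (2, 2, 3)
print(np.array_equal(joined, np.stack((a, b), axis=1)))   # True
```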
As you found, I can't do the same with np.r_:
In [413]: np.r_['0,3,0',a].shape
Out[413]: (2, 3, 1)
In [414]: np.r_['0,3,1',a].shape
Out[414]: (1, 2, 3)
In [415]: np.r_['0,3,-1',a].shape
Out[415]: (1, 2, 3)
Even looking at the code it is hard to tell how r_ is handling this 3rd parameter. It looks like it uses the ndmin parameter to expand the arrays (which prepends new axes if needed), and then some sort of transpose to move the new axis.
This could be classed as bug in r_, but it's been around so long, I doubt if any one will care. It's more useful for expanding "slices" than for fancy concatenation.
While the syntax of np.r_ may be convenient on occasion, it isn't an essential function. It's just another front end to np.concatenate (with the added arange/linspace functionality).

Numpy Problems with Arrays of poly1d Objects

I'd like to first start this out with the fact that it is possible, in numpy, to create an array of poly1d objects:
random_poly = np.frompyfunc(lambda i, j: np.poly1d(np.random.randint(1, 4, 3)), 2, 1)
def random_poly_array(shape):
    return np.fromfunction(random_poly, shape)
a1 = random_poly_array((3,3))
This works just fine, and we can even multiply matrices made from this form using np.dot:
a2 = random_poly_array((3,3))
a1_x_a2 = np.dot(a1, a2)
However, most other methods fail to work. For example, you can't take a list of certain poly1d objects and convert it into an array:
np.array([np.poly1d([1,2,3]), np.poly1d([1,2,3])])
As that will raise ValueError: cannot copy sequence with size 2 to array axis with dimension 3. To add to the confusion,
np.array([np.poly1d([1,2]), np.poly1d([1,2])])
will not raise an error, but instead create a 2x2 array of just 2's. Adding dtype=object has no effect, and numpy will still try to convert the poly1d objects to arrays.
The reason why this is problematic is that one cannot take an array of dimension d and convert it to an array of poly1d objects of dimension d-1. I would have expected
arr = np.arange(1, 10).reshape(3,3)
np.apply_along_axis(np.poly1d, 0, arr)
to return an array of poly1d objects, but instead it returns an unaltered array. Even worse, if arr = np.arange(9).reshape(3,3), it will throw an error, as the first poly1d object created will have a length of 2 instead of 3 due to the zero coefficient. Thus, my question is this: is there a feasible method to create poly1d arrays in numpy? If not, why not?
Using the trick of a leading None to force numpy not to broadcast an object into an array, something brought to my attention by Paul Panzer, I created a function which transforms the last axis into poly1d objects:
def array_to_poly(arr):
    return np.apply_along_axis(lambda poly: [None, np.poly1d(poly)], -1, arr)[..., 1]
However, if we're okay with abusing more than one system in a single function, we can make it apply over arbitrary axes:
def array_to_poly(arr, axis=-1):
    temp_arr = np.apply_along_axis(lambda poly: [None, np.poly1d(poly)], axis, arr)
    n = temp_arr.ndim
    s = tuple(slice(None) if i != axis % n else 1 for i in range(n))
    return temp_arr[s]  # index with a tuple; indexing with a list is no longer accepted
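If abusing apply_along_axis twice feels too fragile, the same result can be built explicitly with an object array and a plain loop over the non-polynomial axes (a more pedestrian sketch; slower, but it relies on nothing undocumented):

```python
import numpy as np

def array_to_poly_explicit(arr, axis=-1):
    arr = np.asarray(arr)
    moved = np.moveaxis(arr, axis, -1)            # put the coefficient axis last
    out = np.empty(moved.shape[:-1], dtype=object)
    for idx in np.ndindex(out.shape):             # loop over the remaining axes
        out[idx] = np.poly1d(moved[idx])
    return out

polys = array_to_poly_explicit(np.arange(1, 25).reshape(2, 3, 4), axis=2)
print(polys.shape)   # (2, 3)
print(polys[0, 0])   # the poly1d built from coefficients [1, 2, 3, 4]
```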
Testing it with arr = np.arange(1, 25).reshape(2,3,4), we obtain:
In [ ]: array_to_poly(arr, 0)
Out[ ]:
array([[poly1d([ 1, 13]), poly1d([ 2, 14]), poly1d([ 3, 15]),
poly1d([ 4, 16])],
[poly1d([ 5, 17]), poly1d([ 6, 18]), poly1d([ 7, 19]),
poly1d([ 8, 20])],
[poly1d([ 9, 21]), poly1d([10, 22]), poly1d([11, 23]),
poly1d([12, 24])]], dtype=object)
In [ ]: array_to_poly(arr, 1)
Out[ ]:
array([[poly1d([1, 5, 9]), poly1d([ 2, 6, 10]), poly1d([ 3, 7, 11]),
poly1d([ 4, 8, 12])],
[poly1d([13, 17, 21]), poly1d([14, 18, 22]), poly1d([15, 19, 23]),
poly1d([16, 20, 24])]], dtype=object)
In [ ]: array_to_poly(arr, 2)
Out[ ]:
array([[poly1d([1, 2, 3, 4]), poly1d([5, 6, 7, 8]),
poly1d([ 9, 10, 11, 12])],
[poly1d([13, 14, 15, 16]), poly1d([17, 18, 19, 20]),
poly1d([21, 22, 23, 24])]], dtype=object)
as expected.

Iterate over numpy array in a specific order based on values

I want to iterate over a numpy array, starting at the index of the highest value and working through to the lowest value.
import numpy as np #imports numpy package
elevation_array = np.random.rand(5,5) #creates a random array 5 by 5
print elevation_array # prints the array out
ravel_array = np.ravel(elevation_array)
sorted_array_x = np.argsort(ravel_array)
sorted_array_y = np.argsort(sorted_array_x)
sorted_array = sorted_array_y.reshape(elevation_array.shape)
for index, rank in np.ndenumerate(sorted_array):
    print index, rank
I want it to print out:
index of the highest value
index of the next highest value
index of the next highest value etc
If you want numpy doing the heavy lifting, you can do something like this:
>>> a = np.random.rand(100, 100)
>>> sort_idx = np.argsort(a, axis=None)
>>> np.column_stack(np.unravel_index(sort_idx[::-1], a.shape))
array([[13, 62],
[26, 77],
[81, 4],
...,
[83, 40],
[17, 34],
[54, 91]], dtype=int64)
You first get an index that sorts the whole array, and then convert that flat index into pairs of indices with np.unravel_index. The call to np.column_stack simply joins the two arrays of coordinates into a single one, and could be replaced by the Python zip(*np.unravel_index(sort_idx[::-1], a.shape)) to get a list of tuples instead of an array.
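On a small array you can check the ordering by hand (a tiny worked example):

```python
import numpy as np

a = np.array([[5, 1],
              [3, 9]])
sort_idx = np.argsort(a, axis=None)  # ascending order over the flattened array
coords = np.column_stack(np.unravel_index(sort_idx[::-1], a.shape))
print(coords)
# rows are (1,1)=9, (0,0)=5, (1,0)=3, (0,1)=1, i.e. highest value first
```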
Try this:
from operator import itemgetter
>>> a = np.array([[2, 7], [1, 4]])
array([[2, 7],
[1, 4]])
>>> sorted(np.ndenumerate(a), key=itemgetter(1), reverse=True)
[((0, 1), 7),
((1, 1), 4),
((0, 0), 2),
((1, 0), 1)]
you can iterate this list if you so wish. Essentially I am telling the function sorted to order the elements of np.ndenumerate(a) according to the key itemgetter(1). This function itemgetter gets the second (index 1) element from the tuples ((0, 1), 7), ((1, 1), 4), ... (i.e the values) generated by np.ndenumerate(a).
