Numpy: About the second rule of numpy.ufunc.reduceat? - python

From the docs - v1.14 the second rule is
if indices[i] >= indices[i + 1], the i-th generalized “row” is simply a[indices[i]].
So how is this rule used? Is there a real example?
I mean, there must(?) be some real situations that this rule was designed to fit, so what are those situations?

The first 2 examples make use of this rule.
In the 2nd example, the 2d array, it shows explicitly what [0, 3, 1, 2, 0] produces
# [row1 + row2 + row3] 0:3
# [row4] 3
# [row2] 1:2
# [row3] 2
# [row1 + row2 + row3 + row4] 0:end
In the first example, this rule is partly hidden by the [::2] indexing.
Without that:
In [183]: np.add.reduceat(np.arange(8),[0,4, 1,5, 2,6, 3,7])
Out[183]: array([ 6, 4, 10, 5, 14, 6, 18, 7])
That is: sum of [0:4], [4], sum of [1:5], [5], sum of [2:6], [6], sum of [3:7], [7].
Selecting every other result (positions 0, 2, 4, 6), we get 4 range sums:
In [184]: _[::2]
Out[184]: array([ 6, 10, 14, 18])
In [187]: [np.arange(0,4).sum(),np.arange(1,5).sum(),np.arange(2,6).sum()]
Out[187]: [6, 10, 14]
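The two rules can be spelled out as a small pure-Python reference, which may make the purpose of the `>=` case easier to see: it lets arbitrary, possibly overlapping (start, stop) pairs be packed into one flat index list, with the in-between entries discarded afterwards. This is only a sketch for 1-D input; `reduceat_reference` is a hypothetical helper written for illustration, not part of NumPy.

```python
import numpy as np

def reduceat_reference(a, indices):
    """Illustrative re-implementation of np.add.reduceat for a 1-D array."""
    out = []
    n = len(indices)
    for i, start in enumerate(indices):
        if i == n - 1:
            # Last index: reduce from start to the end of the array.
            out.append(a[start:].sum())
        elif start < indices[i + 1]:
            # Rule 1: reduce the slice a[indices[i]:indices[i + 1]].
            out.append(a[start:indices[i + 1]].sum())
        else:
            # Rule 2: indices[i] >= indices[i + 1], so just take a[indices[i]].
            out.append(a[start])
    return np.array(out)

a = np.arange(8)
# Interleaved window starts and stops: (0, 4), (1, 5), (2, 6), (3, 7).
pairs = [0, 4, 1, 5, 2, 6, 3, 7]
full = np.add.reduceat(a, pairs)
window_sums = full[::2]  # keep only the entries produced by rule 1
print(full, window_sums)
```

The entries dropped by `[::2]` are exactly the ones produced by the second rule, so the rule acts as a cheap "reset" between unrelated ranges.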

Related

Merging a 2D array into a list

I'm working on a linear regression with Python for my school project. And I want to merge my 7x12 2D array into a 1D list.
Here's my original array.
y = df["fatalProb"].values.reshape(7,12)
print(y)
[[0.3725 0.4336 0.537 0.392 0.233 0.2892 0.2721 0.2392 0.2281 0.2689
0.2898 0.2825]
[0.3112 0.3936 0.3874 0.2793 0.2416 0.275 0.2802 0.2587 0.2583 0.258
0.2906 0.2927]
[0.3486 0.3278 0.3836 0.3041 0.2477 0.2734 0.276 0.2903 0.2531 0.2659
0.2928 0.2896]
[0.3044 0.4032 0.3665 0.3275 0.2939 0.2882 0.3089 0.2949 0.2547 0.2699
0.2973 0.2869]
[0.3488 0.3651 0.4307 0.3361 0.2833 0.3035 0.3051 0.2898 0.2695 0.271
0.2787 0.3034]
[0.3559 0.3357 0.4075 0.3428 0.2834 0.3156 0.2952 0.2992 0.2795 0.2806
0.2905 0.267 ]
[0.3965 0.3814 0.4735 0.3813 0.3089 0.3105 0.3282 0.3047 0.2834 0.2974
0.2737 0.2986]]
I want y to be a list of length 12.
Each element should be the sum of the values that share the same index in each of the 7 rows.
e.g.:
[[a,b,c], [d,e,f], [g,h,i]] to [a+d+g, b+e+h, c+f+i]
I thought about using list comprehension, but I was not so happy to use:
y = [y[0][j] + y[1][j] + y[2][j] + y[3][j] + y[4][j] + y[5][j] + y[6][j] for j in range(len(y[0]))]
Here's your list comprehension:
[sum(el) for el in zip(*y)]
Result:
[2.4379, 2.6404, 2.9861999999999997, 2.3631, 1.8918, 2.0553999999999997, 2.0656999999999996, 1.9768, 1.8266, 1.9116999999999997, 2.0134, 2.0206999999999997]
Your y, derived from a dataframe, is 2d; a simpler example:
In [25]: y = np.arange(12).reshape(3,4)
In [26]: y
Out[26]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
With the sum method, you can add values vertically or horizontally:
In [27]: y.sum(axis=0)
Out[27]: array([12, 15, 18, 21])
In [28]: y.sum(axis=1)
Out[28]: array([ 6, 22, 38])
Check below code based on your e.g "[[a,b,c],[d,e,f],[g,h,i]] to [a+d+g,b+e+h,c+f+i]"
df = pd.DataFrame({'col':[1,2,3], 'col2':[3,4,5], 'col3':[6,7,8]})
list(df.values.sum(axis=1))
Output:
[10, 13, 16]
All you need to do after getting y is
y = y.sum(axis=0).tolist()
It sums your (7,12) shaped array down the columns (across the 7 rows) and gives you an array of shape (12,), which can be converted to a list. @hpaulj gave a detailed explanation.
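To make the shapes concrete, here is a small sketch using a (3, 4) stand-in for the (7, 12) array from the question. The variable names are illustrative only.

```python
import numpy as np

y = np.arange(12).reshape(3, 4)   # stand-in for the (7, 12) array

col_sums = y.sum(axis=0)          # 1-D result, shape (4,), not (1, 4)
as_list = col_sums.tolist()       # plain Python list of Python ints

# The zip-based comprehension gives the same sums, but each element
# is still a NumPy scalar rather than a Python int.
via_zip = [sum(col) for col in zip(*y)]

print(col_sums.shape, as_list)
```

For large arrays `y.sum(axis=0)` is likely the better choice, since the reduction happens in compiled code rather than in a Python loop.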

matlab sum function to python conversion

I am trying to convert this matlab code to python:
#T2 = (sum((log(X(1:m,:)))'));
Here is my code in python:
T2 = sum(np.log(X[0:int(m),:]).T)
where m = 103 and X is a matrix:
f1 = np.float64(135)
f2 = np.float64(351)
X = np.float64(p[:, int(f1):int(f2)])
and p is dictionary (loaded data)
The problem is that, before applying the sum function, np.log(X[0:int(m), :]).T in Python gives exactly the same values and the same dimensions (216x103) as MATLAB. However, after applying the sum function it gives the correct values but the wrong dimensions (103x1). The correct dimensions are (1x103). I have tried transposing after taking the sum, but it doesn't work. Any suggestions on how to get the desired dimensions?
A matrix in MATLAB consists of m rows and n columns, but a matrix in NumPy is an array of arrays. Each subarray is a flat vector having 1 dimension equal to the number of its elements n. MATLAB doesn't have flat vectors at all, a row is 1xn matrix, a column is mx1 matrix, and a scalar is 1x1 matrix.
So, back to the question, when you write T2 = sum(np.log(X[0:int(m),:]).T) in Python, it's neither 103x1 nor 1x103, it's a flat 103 vector. If you specifically want a 1x103 matrix like MATLAB, just reshape(1,-1) and you don't have to transpose since you can sum over the second axis.
import numpy as np
X = np.random.rand(216,103)
m = 103
T2 = np.sum(np.log(X[:m]), axis=1).reshape(1,-1)
T2.shape
# (1, 103)
Let's make a demo 2d array:
In [19]: x = np.arange(12).reshape(3,4)
In [20]: x
Out[20]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
And apply the base Python sum function (which isn't the same as numpy's own):
In [21]: sum(x)
Out[21]: array([12, 15, 18, 21])
The result is a (4,) shape array (not 4x1). Print sum(x).shape if you don't believe me.
The numpy.sum function adds all terms if no axis is given:
In [22]: np.sum(x)
Out[22]: 66
or with axis:
In [23]: np.sum(x, axis=0)
Out[23]: array([12, 15, 18, 21])
In [24]: np.sum(x, axis=1)
Out[24]: array([ 6, 22, 38])
The Python sum treats x as a list of arrays, and adds them together
In [25]: list(x)
Out[25]: [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8, 9, 10, 11])]
In [28]: x[0]+x[1]+x[2]
Out[28]: array([12, 15, 18, 21])
Transpose, without parameters, reverses the order of the axes. It does not add any dimensions:
In [29]: x.T # (4,3) shape
Out[29]:
array([[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]])
In [30]: sum(x).T
Out[30]: array([12, 15, 18, 21]) # still (4,) shape
Octave
>> x=reshape(0:11,4,3)'
x =
0 1 2 3
4 5 6 7
8 9 10 11
>> sum(x)
ans =
12 15 18 21
>> sum(x,1)
ans =
12 15 18 21
>> sum(x,2)
ans =
6
22
38
edit
The np.sum function has a keepdims parameter:
In [32]: np.sum(x, axis=0, keepdims=True)
Out[32]: array([[12, 15, 18, 21]]) # (1,4) shape
In [33]: np.sum(x, axis=1, keepdims=True)
Out[33]:
array([[ 6], # (3,1) shape
[22],
[38]])
If I reshape the array to 3d, and sum, the result is 2d - unless I keepdims:
In [34]: np.sum(x.reshape(3,2,2), axis=0).shape
Out[34]: (2, 2)
In [36]: np.sum(x.reshape(3,2,2), axis=0,keepdims=True).shape
Out[36]: (1, 2, 2)
MATLAB/Octave on the other hand keeps the dims by default:
sum(reshape(x,3,2,2)) # (1,2,2)
unless I sum on that last, 3rd:
sum(reshape(x,3,2,2),3) # (3,2)
The key is that in MATLAB everything is 2d, with the option of additional trailing dimensions, which aren't handled the same way. In numpy any number of dimensions, from 0 on up, is handled the same way.
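If MATLAB-style shapes are wanted throughout a port, the keepdims behaviour can be wrapped once. The following is a minimal sketch under stated assumptions: `matlab_like_sum` is a hypothetical helper (not a NumPy function), it takes a 1-based `dim` as MATLAB does, and it does not reproduce MATLAB's dropping of trailing singleton dimensions.

```python
import numpy as np

def matlab_like_sum(x, dim=None):
    """Sum roughly like MATLAB's sum(x, dim): keep the summed axis with length 1.

    dim is 1-based, as in MATLAB. If omitted, the first axis whose
    length is not 1 is used.
    """
    x = np.atleast_2d(x)  # MATLAB has no 0-d or 1-d arrays
    if dim is None:
        non_singleton = [ax for ax, n in enumerate(x.shape) if n != 1]
        axis = non_singleton[0] if non_singleton else 0
    else:
        axis = dim - 1
    return np.sum(x, axis=axis, keepdims=True)

x = np.arange(12).reshape(3, 4)
row = matlab_like_sum(x)       # shape (1, 4)
col = matlab_like_sum(x, 2)    # shape (3, 1)
print(row.shape, col.shape)
```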

Numpy giving weird results while multiplying matrices

So I've two matrices:
A = np.array([[93478902, 389555660, 163056852, 208537174],
[256421362, 1068627076, 447283132, 572058098],
[438743250, 1828454948, 765313074, 978809440]])
B = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 12, 5],
[10, 92, 23, 43]])
I multiply them using numpy function A.dot(B).
Although the expected result (as I quickly calculated) is something like this one:
[[5594140610, 23340280292, 9760363552, 12493632310]
[15345685910, 64026781516, 26774387456, 34272361778]
[26256930056, 109551815408, 45811788394, 58641073978]]
Numpy is 100% sure that it should be like this:
[[ 1299173314 1865443812 1170428960 -391269578]
[-1834183274 -397727924 1004583680 -87376590]
[ 487126280 -2117334288 -1432851862 -1488468166]]
And I have no idea where I am making a mistake, and how on the earth it gets a lower than 0 number from the multiplication of greater than 0 numbers? Can anyone help with this?
Looks like overflow to me. Changing to dtype='int64' seems to solve the problem:
>>> A = np.array([[93478902, 389555660, 163056852, 208537174],
... [256421362, 1068627076, 447283132, 572058098],
... [438743250, 1828454948, 765313074, 978809440]], dtype='int64')
>>>
>>> B = np.array([[1, 2, 3, 4],
... [5, 6, 7, 8],
... [9, 10, 12, 5],
... [10, 92, 23, 43]])
>>> A.dot(B)
array([[ 5594140610, 23340280292, 9760363552, 13272743630],
[ 15345685910, 64026781516, 26774387456, 36409615930],
[ 26256930056, 109551815408, 45811788394, 62297983874]],
dtype=int64)
Are you sure that you are really storing those matrices in the variables? The fact that some matrix elements have a minus sign implies that there should be some negative elements in A or B
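Negative entries do not require negative inputs: with 32-bit integers, a sum that exceeds 2**31 - 1 wraps around into the negative range. The sketch below checks this for the top-left entry of the product, assuming the asker's default integer dtype was int32 (as is the case on some platforms). The variable names are illustrative.

```python
import numpy as np

# First row of A and first column of B from the question.
a_row = np.array([[93478902, 389555660, 163056852, 208537174]])
b_col = np.array([[1], [5], [9], [10]])

# Force 32-bit integers to mimic a platform whose default int is int32.
wrapped = a_row.astype(np.int32).dot(b_col.astype(np.int32))

# 64-bit integers have enough range for this product.
exact = a_row.astype(np.int64).dot(b_col.astype(np.int64))

# Reducing the exact value modulo 2**32 into the signed 32-bit range
# reproduces the 32-bit result.
as_signed32 = (int(exact[0, 0]) + 2**31) % 2**32 - 2**31

print(exact[0, 0], wrapped[0, 0], as_signed32)
```

The exact value 5594140610 and the wrapped value 1299173314 are the top-left entries of the two result matrices shown above; they differ by exactly 2**32.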

Explanation of boolean indexing behaviors

For the 2D array y:
y = np.arange(20).reshape(5,4)
---
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
All of the following index expressions select the 1st, 3rd, and 5th rows. This is clear.
print(y[
[0, 2, 4],
::
])
print(y[
[0, 2, 4],
::
])
print(y[
[True, False, True, False, True],
::
])
---
[[ 0 1 2 3]
[ 8 9 10 11]
[16 17 18 19]]
Questions
Please help understand what rules or mechanism are working to produce the results.
Replacing the list [] with a tuple () produces an empty array with shape (0, 5, 4).
y[
(True, False, True, False, True)
]
---
array([], shape=(0, 5, 4), dtype=int64)
Using a single True adds a new axis.
y[True]
---
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]]])
y[True].shape
---
(1, 5, 4)
Adding additional boolean True produces the same.
y[True, True]
---
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]]])
y[True, True].shape
---
(1, 5, 4)
However, adding False boolean causes the empty array again.
y[True, False]
---
array([], shape=(0, 5, 4), dtype=int64)
Not sure the documentation explains this behavior.
Boolean array indexing
In general if an index includes a Boolean array, the result will be
identical to inserting obj.nonzero() into the same position and using
the integer array indexing mechanism described above. x[ind_1,
boolean_array, ind_2] is equivalent to x[(ind_1,) +
boolean_array.nonzero() + (ind_2,)].
If there is only one Boolean array and no integer indexing array
present, this is straight forward. Care must only be taken to make
sure that the boolean index has exactly as many dimensions as it is
supposed to work with.
Boolean scalar indexing is not well-documented, but you can trace how it is handled in the source code. See for example this comment and associated code in the numpy source:
/*
* This can actually be well defined. A new axis is added,
* but at the same time no axis is "used". So if we have True,
* we add a new axis (a bit like with np.newaxis). If it is
* False, we add a new axis, but this axis has 0 entries.
*/
So if an index is a scalar boolean, a new axis is added. If the value is True the size of the axis is 1, and if the value is False, the size of the axis is zero.
This behavior was introduced in numpy#3798, and the author outlines the motivation in this comment; roughly, the aim was to provide consistency in the output of filtering operations. For example:
x = np.ones((2, 2))
assert x[x > 0].ndim == 1
x = np.ones(2)
assert x[x > 0].ndim == 1
x = np.ones(())
assert x[x > 0].ndim == 1 # scalar boolean here!
The interesting thing is that any subsequent scalar booleans after the first do not add additional dimensions! From an implementation standpoint, this seems to be due to consecutive 0D boolean indices being treated as equivalent to consecutive fancy indices (i.e. HAS_0D_BOOL is treated as HAS_FANCY in some cases) and thus are combined in the same way as fancy indices. From a logical standpoint, this corner-case behavior does not appear to be intentional: for example, I can't find any discussion of it in numpy#3798.
Given that, I would recommend treating this behavior as poorly defined and avoiding it in favor of well-documented indexing approaches.
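The documented part of this behavior, and explicit alternatives to the scalar-boolean cases, can be sketched as follows. The variable names are illustrative; the scalar-boolean shapes are the ones reported in the question.

```python
import numpy as np

y = np.arange(20).reshape(5, 4)
mask = np.array([True, False, True, False, True])

# A 1-D boolean array behaves like its nonzero() integer indices.
rows_from_mask = y[mask]
rows_from_ints = y[mask.nonzero()]   # same as y[[0, 2, 4]]

# A scalar boolean adds a leading axis of length 1 (True) or 0 (False).
with_true = y[True]
with_false = y[False]

# Explicit alternatives that do not rely on scalar-boolean indexing.
explicit_new_axis = y[np.newaxis]
explicit_empty = y[np.newaxis][:0]

print(rows_from_mask.shape, with_true.shape, with_false.shape)
```

Writing `y[np.newaxis]` or an explicit empty slice states the intent directly, which avoids depending on the corner cases discussed above.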

Operations on 'N' dimensional numpy arrays

I am attempting to generalize some Python code to operate on arrays of arbitrary dimension. The operations are applied to each vector in the array. So for a 1D array, there is simply one operation, for a 2-D array it would be both row and column-wise (linearly, so order does not matter). For example, a 1D array (a) is simple:
b = operation(a)
where 'operation' is expecting a 1D array. For a 2D array, the operation might proceed as
for ii in range(0,a.shape[0]):
    b[ii,:] = operation(a[ii,:])
for jj in range(0,b.shape[1]):
    c[:,jj] = operation(b[:,jj])
I would like to make this general where I do not need to know the dimension of the array beforehand, and not have a large set of if/elif statements for each possible dimension.
Solutions that are general for 1 or 2 dimensions are ok, though a completely general solution would be preferred. In reality, I do not imagine needing this for any dimension higher than 2, but if I can see a general example I will learn something!
Extra information:
I have a matlab code that uses cells to do something similar, but I do not fully understand how it works. In this example, each vector is rearranged (basically the same function as fftshift in numpy.fft). Not sure if this helps, but it operates on an array of arbitrary dimension.
function aout=foldfft(ain)
nd = ndims(ain);
for k = 1:nd
nx = size(ain,k);
kx = floor(nx/2);
idx{k} = [kx:nx 1:kx-1];
end
aout = ain(idx{:});
In Octave, your MATLAB code does:
octave:19> size(ain)
ans =
2 3 4
octave:20> idx
idx =
{
[1,1] =
1 2
[1,2] =
1 2 3
[1,3] =
2 3 4 1
}
and then it uses the idx cell array to index ain. With these dimensions it 'rolls' the size 4 dimension.
For 5 and 6 the index lists would be:
2 3 4 5 1
3 4 5 6 1 2
The equivalent in numpy is:
In [161]: ain=np.arange(2*3*4).reshape(2,3,4)
In [162]: idx=np.ix_([0,1],[0,1,2],[1,2,3,0])
In [163]: idx
Out[163]:
(array([[[0]],
[[1]]]), array([[[0],
[1],
[2]]]), array([[[1, 2, 3, 0]]]))
In [164]: ain[idx]
Out[164]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
Besides the 0 based indexing, I used np.ix_ to reshape the indexes. MATLAB and numpy use different syntax to index blocks of values.
The next step is to construct [0,1],[0,1,2],[1,2,3,0] with code, a straight forward translation.
I can use np.r_ as a short cut for turning 2 slices into an index array:
In [201]: idx=[]
In [202]: for nx in ain.shape:
   .....:     kx = int(np.floor(nx/2.))
   .....:     kx = kx-1;
   .....:     idx.append(np.r_[kx:nx, 0:kx])
   .....:
In [203]: idx
Out[203]: [array([0, 1]), array([0, 1, 2]), array([1, 2, 3, 0])]
and pass this through np.ix_ to make the appropriate index tuple:
In [204]: ain[np.ix_(*idx)]
Out[204]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
In this case, where 2 dimensions don't roll anything, slice(None) could replace those:
In [210]: idx=(slice(None),slice(None),[1,2,3,0])
In [211]: ain[idx]
======================
np.roll does:
indexes = concatenate((arange(n - shift, n), arange(n - shift)))
res = a.take(indexes, axis)
np.apply_along_axis is another function that constructs an index array (and turns it into a tuple for indexing).
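Since each index list above is just a cyclic shift, the same result can probably be had from np.roll directly, without building index arrays. This is a sketch under assumptions: `foldfft_roll` is a hypothetical name, and the shift of nx // 2 - 1 is read off the 0-based index lists built above (it assumes every axis has length at least 2).

```python
import numpy as np

def foldfft_roll(ain):
    """Roll every axis of length nx left by nx // 2 - 1 positions.

    Mirrors the 0-based index lists built above, e.g. [1, 2, 3, 0]
    for an axis of length 4. Assumes every axis has length >= 2.
    """
    axes = tuple(range(ain.ndim))
    shifts = tuple(-(nx // 2 - 1) for nx in ain.shape)
    return np.roll(ain, shifts, axis=axes)

ain = np.arange(2 * 3 * 4).reshape(2, 3, 4)
rolled = foldfft_roll(ain)

# Same result as the explicit np.ix_ indexing shown earlier.
expected = ain[np.ix_([0, 1], [0, 1, 2], [1, 2, 3, 0])]
print(np.array_equal(rolled, expected))
```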
If you are looking for a programmatic way to index the k-th dimension of an n-dimensional array, then numpy.take might help you.
An implementation of foldfft is given below as an example:
In[1]:
import numpy as np
def foldfft(ain):
    result = ain
    nd = len(ain.shape)
    for k in range(nd):
        nx = ain.shape[k]
        kx = (nx+1)//2
        shifted_index = list(range(kx,nx)) + list(range(kx))
        result = np.take(result, shifted_index, k)
    return result
a = np.indices([3,3])
print("Shape of a = ", a.shape)
print("\nStarting array:\n\n", a)
print("\nFolded array:\n\n", foldfft(a))
Out[1]:
Shape of a = (2, 3, 3)
Starting array:
[[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 1 2]
[0 1 2]
[0 1 2]]]
Folded array:
[[[2 0 1]
[2 0 1]
[2 0 1]]
[[2 2 2]
[0 0 0]
[1 1 1]]]
You could use numpy.ndarray.flat, which allows you to linearly iterate over a n dimensional numpy array. Your code should then look something like this:
b = np.asarray(x)
for i in range(len(x.flat)):
    b.flat[i] = operation(x.flat[i])
The folks above provided multiple appropriate solutions. For completeness, here is my final solution. In this toy example for the case of 3 dimensions, the function 'ops' replaces the first and last element of a vector with 1.
import numpy as np
def ops(s):
    # Set the first and last element of a 1D vector to 1 (modifies s in place).
    s[0] = 1
    s[-1] = 1
    return s
a = np.random.rand(4, 4, 3)
print('------')
print('Array a')
print(a)
print('------')
# Apply ops along every axis in turn.
for ii in np.arange(a.ndim):
    a = np.apply_along_axis(ops, ii, a)
    print('------')
    print(' Axis', str(ii))
    print(a)
    print('------')
    print(' ')
The resulting 3D array has a 1 in every element on the 'border' with the numbers in the middle of the array unchanged. This is of course a toy example; however ops could be any arbitrary function that operates on a 1D vector.
Flattening the vector will also work; I chose not to pursue that simply because the book-keeping is more difficult and apply_along_axis is the simplest approach.
apply_along_axis reference page
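The same idea can be packaged as a reusable function that works for any number of dimensions. This is a minimal sketch, assuming the 1D operation returns a vector of the same length as its input; `apply_over_all_axes` and `set_ends_to_one` are hypothetical names, and the operation copies its input so the original array is left untouched.

```python
import numpy as np

def set_ends_to_one(v):
    """Example 1-D operation: return a copy with first and last element set to 1."""
    out = v.copy()
    out[0] = 1
    out[-1] = 1
    return out

def apply_over_all_axes(a, op):
    """Apply a length-preserving 1-D operation along each axis in turn."""
    result = np.asarray(a)
    for axis in range(result.ndim):
        result = np.apply_along_axis(op, axis, result)
    return result

a = np.zeros((4, 4, 3))
b = apply_over_all_axes(a, set_ends_to_one)

# Only the interior (not touching any face of the array) stays at 0.
print(b[1:-1, 1:-1, 1:-1])
```

Note that np.apply_along_axis still calls the operation once per 1D slice in Python, so for large arrays a vectorized formulation may be considerably faster where one exists.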
