I'm working on a linear regression with Python for my school project. And I want to merge my 7x12 2D array into a 1D list.
Here's my original array.
y = df["fatalProb"].values.reshape(7,12)
print(y)
[[0.3725 0.4336 0.537 0.392 0.233 0.2892 0.2721 0.2392 0.2281 0.2689
0.2898 0.2825]
[0.3112 0.3936 0.3874 0.2793 0.2416 0.275 0.2802 0.2587 0.2583 0.258
0.2906 0.2927]
[0.3486 0.3278 0.3836 0.3041 0.2477 0.2734 0.276 0.2903 0.2531 0.2659
0.2928 0.2896]
[0.3044 0.4032 0.3665 0.3275 0.2939 0.2882 0.3089 0.2949 0.2547 0.2699
0.2973 0.2869]
[0.3488 0.3651 0.4307 0.3361 0.2833 0.3035 0.3051 0.2898 0.2695 0.271
0.2787 0.3034]
[0.3559 0.3357 0.4075 0.3428 0.2834 0.3156 0.2952 0.2992 0.2795 0.2806
0.2905 0.267 ]
[0.3965 0.3814 0.4735 0.3813 0.3089 0.3105 0.3282 0.3047 0.2834 0.2974
0.2737 0.2986]]
I wanted my y to be 12-length list.
It's a list with all the values added that has a same index in each lists.
e.g.:
[[a,b,c], [d,e,f], [g,h,i]] to [a+d+g, b+e+h, c+f+i]
I thought about using list comprehension, but I was not so happy to use:
y = [y[0][j] + y[1][j] + y[2][j] + y[3][j] + y[4][j] + y[5][j] + y[6][j] for j in range(len(y[0]))]
Here's your list comprehension:
[sum(el) for el in zip(*y)]
Result:
[2.4379, 2.6404, 2.9861999999999997, 2.3631, 1.8918, 2.0553999999999997, 2.0656999999999996, 1.9768, 1.8266, 1.9116999999999997, 2.0134, 2.0206999999999997]
Your y, derived from a dataframe, is 2d; a simpler example:
In [25]: y = np.arange(12).reshape(3,4)
In [26]: y
Out[26]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
With the sum method, you can add values vertically or horizontally:
In [27]: y.sum(axis=0)
Out[27]: array([12, 15, 18, 21])
In [28]: y.sum(axis=1)
Out[28]: array([ 6, 22, 38]
Check below code based on your e.g "[[a,b,c],[d,e,f],[g,h,i]] to [a+d+g,b+e+h,c+f+i]"
df = pd.DataFrame({'col':[1,2,3], 'col2':[3,4,5], 'col3':[6,7,8]})
list(df.values.sum(axis=1))
Output:
[10, 13, 16]
All you need to do after getting y is
y = y.sum(axis=0).tolist()
It will sum your (7,12) shaped matrix along the columns and give you a array of shape (1,12) which can be converted to a list. #hpaulj gave a detailed explaination.
Related
I am trying to convert this matlab code to python:
#T2 = (sum((log(X(1:m,:)))'));
Here is my code in python:
T2 = sum(np.log(X[0:int(m),:]).T)
where m = 103 and X is a matrix:
f1 = np.float64(135)
f2 = np.float64(351)
X = np.float64(p[:, int(f1):int(f2)])
and p is dictionary (loaded data)
The problem is python gives me the exact same value with same dimension (216x103) like matlab before applying the sum function on (np.log(X[0:int(m), :]).T). However. after applying the sum function it gives me the correct value but wrong dimension (103x1). The correct dimension is (1x103). I have tried using transpose after getting the sum but it doesnt work. Any suggestions how to get my desired dimension?
A matrix in MATLAB consists of m rows and n columns, but a matrix in NumPy is an array of arrays. Each subarray is a flat vector having 1 dimension equal to the number of its elements n. MATLAB doesn't have flat vectors at all, a row is 1xn matrix, a column is mx1 matrix, and a scalar is 1x1 matrix.
So, back to the question, when you write T2 = sum(np.log(X[0:int(m),:]).T) in Python, it's neither 103x1 nor 1x103, it's a flat 103 vector. If you specifically want a 1x103 matrix like MATLAB, just reshape(1,-1) and you don't have to transpose since you can sum over the second axis.
import numpy as np
X = np.random.rand(216,103)
m = 103
T2 = np.sum(np.log(X[:m]), axis=1).reshape(1,-1)
T2.shape
# (1, 103)
Lets make a demo 2d array:
In [19]: x = np.arange(12).reshape(3,4)
In [20]: x
Out[20]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
And apply the base Python sum function (which isn't the same as numpy's own):
In [21]: sum(x)
Out[21]: array([12, 15, 18, 21])
The result is a (4,) shape array (not 4x1). Print sum(x).shape if you don't believe me.
The numpy.sum function adds all terms if no axis is given:
In [22]: np.sum(x)
Out[22]: 66
or with axis:
In [23]: np.sum(x, axis=0)
Out[23]: array([12, 15, 18, 21])
In [24]: np.sum(x, axis=1)
Out[24]: array([ 6, 22, 38])
The Python sum treats x as a list of arrays, and adds them together
In [25]: list(x)
Out[25]: [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8, 9, 10, 11])]
In [28]: x[0]+x[1]+x[2]
Out[28]: array([12, 15, 18, 21])
Transpose, without parameter, switch axes. It does not add any dimensions:
In [29]: x.T # (4,3) shape
Out[29]:
array([[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]])
In [30]: sum(x).T
Out[30]: array([12, 15, 18, 21]) # still (4,) shape
Octave
>> x=reshape(0:11,4,3)'
x =
0 1 2 3
4 5 6 7
8 9 10 11
>> sum(x)
ans =
12 15 18 21
>> sum(x,1)
ans =
12 15 18 21
>> sum(x,2)
ans =
6
22
38
edit
The np.sum function has a keepdims parmeter:
In [32]: np.sum(x, axis=0, keepdims=True)
Out[32]: array([[12, 15, 18, 21]]) # (1,4) shape
In [33]: np.sum(x, axis=1, keepdims=True)
Out[33]:
array([[ 6], # (3,1) shape
[22],
[38]])
If I reshape the array to 3d, and sum, the result is 2d - unless I keepdims:
In [34]: np.sum(x.reshape(3,2,2), axis=0).shape
Out[34]: (2, 2)
In [36]: np.sum(x.reshape(3,2,2), axis=0,keepdims=True).shape
Out[36]: (1, 2, 2)
MATLAB/Octave on the other hand keeps the dims by default:
sum(reshape(x,3,2,2)) # (1,2,2)
unless I sum on that last, 3rd:
sum(reshape(x,3,2,2),3) # (3,2)
The key is that MATLAB everything is 2d, with the option of additional trailing dimensions, which aren't handled the same way. In numpy every number of dimensions, from 0 on up is handled the same way.
I have a numpy matrix with 130 X 13. Say I want to select a specific set of rows meeting a condition and a subset of columns -
trainx[trainy==label,[0,6]]
The above code does not work and throws an error - IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (43,) (2,).
However if I do it in 2 steps - first subset rows and then columns, it works. Is it something weird or numpy works this way?
temp1 = trainx[trainy==label,:]
temp1 = temp1[:,[0,6]]
you can simply chain the indexing like
trainx[trainy==label][:, [0,6]]
Runable Example
arr = np.random.rand(130,13)
arr[arr[:,0]>0.5][:, [0,6]]
In [154]: x = np.arange(24).reshape(6,4)
In [155]: mask = np.array([1,0,1,0,1,0],bool)
With your two step approach:
In [156]: x[mask] # x[mask, :]
Out[156]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11],
[16, 17, 18, 19]])
In [157]: x[mask][:,[1,3]]
Out[157]:
array([[ 1, 3],
[ 9, 11],
[17, 19]])
Or the two indices could be combined with ix_:
In [158]: np.ix_(mask, [1,3])
Out[158]:
(array([[0],
[2],
[4]]), array([[1, 3]]))
In [159]: x[np.ix_(mask, [1,3])]
Out[159]:
array([[ 1, 3],
[ 9, 11],
[17, 19]])
Note that the first array in Out[158] is np.nonzero(mask)[0][:,None], the nonzero indices in column vector form. That (3,1) indexing array can broadcast with the (2,) column array to select a (3,2) array of elements. Or in your example a (43,2) array.
The boolean mask cannot be turned into a (6,1) array and used to mask x; that would only work if it was turned into a (6,4) mask, matching the shape of x.
So either use the 2 step indexing, or use ix_.
From the docs - v1.14 the second rule is
if indices[i] >= indices[i + 1], the i-th generalized “row” is simply a[indices[i]].
So how will this being used? Is there any real example?
I meant, there must(?) be some real situations that fit what this rule is doing, then we defined the rule to fit them, so what's that situation?
The first 2 examples make use of this rule.
In the 2nd example, the 2d array, it shows explicitly what [0, 3, 1, 2, 0] produces
# [row1 + row2 + row3] 0:3
# [row4] 3
# [row2] 1:2
# [row3] 2
# [row1 + row2 + row3 + row4] 0:end
In the first example, this rule is partly hidden by the [::2] indexing.
Without that:
In [183]: np.add.reduceat(np.arange(8),[0,4, 1,5, 2,6, 3,7])
Out[183]: array([ 6, 4, 10, 5, 14, 6, 18, 7])
There's [0:4]sum, [4], [1,5]sum, [5], [2:6]sum, [6], [3:7]sum, 7
Selecting just the odd results, we get 4 range sums:
In [184]: _[::2]
Out[184]: array([ 6, 10, 14, 18])
In [187]: [np.arange(0,4).sum(),np.arange(1,5).sum(),np.arange(2,6).sum()]
Out[187]: [6, 10, 14]
import numpy as np
m = []
k = []
a = np.array([[1,2,3,4,5,6],[50,51,52,40,20,30],[60,71,82,90,45,35]])
for i in range(len(a)):
m.append(a[i, -1:])
for j in range(len(a[i])-1):
n = abs(m[i] - a[i,j])
k.append(n)
k.append(m[i])
print(k)
Expected Output in k:
[5,4,3,2,1,6],[20,21,22,10,10,30],[25,36,47,55,10,35]
which is also a numpy array.
But the output that I am getting is
[array([5]), array([4]), array([3]), array([2]), array([1]), array([6]), array([20]), array([21]), array([22]), array([10]), array([10]), array([30]), array([25]), array([36]), array([47]), array([55]), array([10]), array([35])]
How can I solve this situation?
You want to subtract the last column of each sub array from themselves. Why don't you use a vectorized approach? You can do all the subtractions at once by subtracting the last column from the rest of the items and then column_stack together with unchanged version of the last column. Also note that you need to change the dimension of the last column inorder to be subtractable from the 2D array. For that sake we can use broadcasting.
In [71]: np.column_stack((abs(a[:, :-1] - a[:, None, -1]), a[:,-1]))
Out[71]:
array([[ 5, 4, 3, 2, 1, 6],
[20, 21, 22, 10, 10, 30],
[25, 36, 47, 55, 10, 35]])
I am attempting to generalize some Python code to operate on arrays of arbitrary dimension. The operations are applied to each vector in the array. So for a 1D array, there is simply one operation, for a 2-D array it would be both row and column-wise (linearly, so order does not matter). For example, a 1D array (a) is simple:
b = operation(a)
where 'operation' is expecting a 1D array. For a 2D array, the operation might proceed as
for ii in range(0,a.shape[0]):
b[ii,:] = operation(a[ii,:])
for jj in range(0,b.shape[1]):
c[:,ii] = operation(b[:,ii])
I would like to make this general where I do not need to know the dimension of the array beforehand, and not have a large set of if/elif statements for each possible dimension.
Solutions that are general for 1 or 2 dimensions are ok, though a completely general solution would be preferred. In reality, I do not imagine needing this for any dimension higher than 2, but if I can see a general example I will learn something!
Extra information:
I have a matlab code that uses cells to do something similar, but I do not fully understand how it works. In this example, each vector is rearranged (basically the same function as fftshift in numpy.fft). Not sure if this helps, but it operates on an array of arbitrary dimension.
function aout=foldfft(ain)
nd = ndims(ain);
for k = 1:nd
nx = size(ain,k);
kx = floor(nx/2);
idx{k} = [kx:nx 1:kx-1];
end
aout = ain(idx{:});
In Octave, your MATLAB code does:
octave:19> size(ain)
ans =
2 3 4
octave:20> idx
idx =
{
[1,1] =
1 2
[1,2] =
1 2 3
[1,3] =
2 3 4 1
}
and then it uses the idx cell array to index ain. With these dimensions it 'rolls' the size 4 dimension.
For 5 and 6 the index lists would be:
2 3 4 5 1
3 4 5 6 1 2
The equivalent in numpy is:
In [161]: ain=np.arange(2*3*4).reshape(2,3,4)
In [162]: idx=np.ix_([0,1],[0,1,2],[1,2,3,0])
In [163]: idx
Out[163]:
(array([[[0]],
[[1]]]), array([[[0],
[1],
[2]]]), array([[[1, 2, 3, 0]]]))
In [164]: ain[idx]
Out[164]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
Besides the 0 based indexing, I used np.ix_ to reshape the indexes. MATLAB and numpy use different syntax to index blocks of values.
The next step is to construct [0,1],[0,1,2],[1,2,3,0] with code, a straight forward translation.
I can use np.r_ as a short cut for turning 2 slices into an index array:
In [201]: idx=[]
In [202]: for nx in ain.shape:
kx = int(np.floor(nx/2.))
kx = kx-1;
idx.append(np.r_[kx:nx, 0:kx])
.....:
In [203]: idx
Out[203]: [array([0, 1]), array([0, 1, 2]), array([1, 2, 3, 0])]
and pass this through np.ix_ to make the appropriate index tuple:
In [204]: ain[np.ix_(*idx)]
Out[204]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
In this case, where 2 dimensions don't roll anything, slice(None) could replace those:
In [210]: idx=(slice(None),slice(None),[1,2,3,0])
In [211]: ain[idx]
======================
np.roll does:
indexes = concatenate((arange(n - shift, n), arange(n - shift)))
res = a.take(indexes, axis)
np.apply_along_axis is another function that constructs an index array (and turns it into a tuple for indexing).
If you are looking for a programmatic way to index the k-th dimension an n-dimensional array, then numpy.take might help you.
An implementation of foldfft is given below as an example:
In[1]:
import numpy as np
def foldfft(ain):
result = ain
nd = len(ain.shape)
for k in range(nd):
nx = ain.shape[k]
kx = (nx+1)//2
shifted_index = list(range(kx,nx)) + list(range(kx))
result = np.take(result, shifted_index, k)
return result
a = np.indices([3,3])
print("Shape of a = ", a.shape)
print("\nStarting array:\n\n", a)
print("\nFolded array:\n\n", foldfft(a))
Out[1]:
Shape of a = (2, 3, 3)
Starting array:
[[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 1 2]
[0 1 2]
[0 1 2]]]
Folded array:
[[[2 0 1]
[2 0 1]
[2 0 1]]
[[2 2 2]
[0 0 0]
[1 1 1]]]
You could use numpy.ndarray.flat, which allows you to linearly iterate over a n dimensional numpy array. Your code should then look something like this:
b = np.asarray(x)
for i in range(len(x.flat)):
b.flat[i] = operation(x.flat[i])
The folks above provided multiple appropriate solutions. For completeness, here is my final solution. In this toy example for the case of 3 dimensions, the function 'ops' replaces the first and last element of a vector with 1.
import numpy as np
def ops(s):
s[0]=1
s[-1]=1
return s
a = np.random.rand(4,4,3)
print '------'
print 'Array a'
print a
print '------'
for ii in np.arange(a.ndim):
a = np.apply_along_axis(ops,ii,a)
print '------'
print ' Axis',str(ii)
print a
print '------'
print ' '
The resulting 3D array has a 1 in every element on the 'border' with the numbers in the middle of the array unchanged. This is of course a toy example; however ops could be any arbitrary function that operates on a 1D vector.
Flattening the vector will also work; I chose not to pursue that simply because the book-keeping is more difficult and apply_along_axis is the simplest approach.
apply_along_axis reference page