I want to do a multidimensional array operation using numpy on three arrays, of which one is an index array, e.g.:
a = numpy.arange(20).reshape((5, 4))
# a = [[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11] [12 13 14 15] [16 17 18 19]]
b = numpy.arange(24).reshape((3, 2, 4))
# b = [[[ 0 1 2 3] [ 4 5 6 7]] [[ 8 9 10 11] [12 13 14 15]] [[16 17 18 19] [20 21 22 23]]]
c = numpy.array([0,0,1,1,2])
# c = [0 0 1 1 2]
Now, what I want is:
d = a * b[&] + b[&&]
where & is the second row along the second dimension of b (e.g. [4 5 6 7]) and && is the first row along that dimension (e.g. [0 1 2 3]), both taken from the i-th block of the first dimension of b, where i comes from array c (e.g. c[0] = 0 selects the first block of b). d has the same shape as a.
Edit: the expected answer for the above example is:
# d = [[0 6 14 24] [16 26 38 52] [104 126 150 176] [152 178 206 236] [336 374 414 456]]
Thanks
>>> a * b[c,1,:] + b[c,0,:]
array([[ 0, 6, 14, 24],
[ 16, 26, 38, 52],
[104, 126, 150, 176],
[152, 178, 206, 236],
[336, 374, 414, 456]])
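Here c acts as an integer index array on the first axis: b[c, 1, :] picks row 1 of block c[i] for every i, which yields an array with the same shape as a. A small check of that reading against an explicit row-by-row loop (the loop is only there for comparison):

```python
import numpy as np

a = np.arange(20).reshape((5, 4))
b = np.arange(24).reshape((3, 2, 4))
c = np.array([0, 0, 1, 1, 2])

# Fancy indexing: row i of b[c, 1, :] is b[c[i], 1, :], likewise for b[c, 0, :]
d = a * b[c, 1, :] + b[c, 0, :]

# The same computation, one row at a time
d_loop = np.array([a[i] * b[c[i], 1] + b[c[i], 0] for i in range(len(c))])

print(b[c, 1, :].shape)           # (5, 4), same as a
print(np.array_equal(d, d_loop))  # True
```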
Does anybody know of a way (preferably using numpy or something similar) to multiply a matrix by a vector of matrices and obtain the desired product? Basically the idea is to follow the normal rules of matrix multiplication of a matrix and a vector, except that the elements of the vector are matrices themselves rather than numbers.
If I understand the question correctly, you can try this:
import numpy as np
A = np.arange(3*3*3).reshape(3, 3, 3)
b = np.arange(9).reshape(3, 3)
print(f"A=\n{A}\n\nb=\n{b}")
It gives:
A=
[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[18 19 20]
[21 22 23]
[24 25 26]]]
b=
[[0 1 2]
[3 4 5]
[6 7 8]]
Then:
out = (b @ A.transpose(2, 0, 1)).transpose(1, 2, 0)
print(out)
which gives:
[[[ 45 48 51]
[ 54 57 60]
[ 63 66 69]]
[[126 138 150]
[162 174 186]
[198 210 222]]
[[207 228 249]
[270 291 312]
[333 354 375]]]
The matrix out[0] is equal to 0*A[0] + 1*A[1] + 2*A[2], out[1] is equal to 3*A[0] + 4*A[1] + 5*A[2] etc.
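The same contraction, out[m] = sum over i of b[m, i] * A[i], can also be written without the two transposes, for example with np.einsum or np.tensordot. A sketch of that alternative formulation, checked against the transpose-based version:

```python
import numpy as np

A = np.arange(3 * 3 * 3).reshape(3, 3, 3)
b = np.arange(9).reshape(3, 3)

# Transpose-based version from the answer above
out = (b @ A.transpose(2, 0, 1)).transpose(1, 2, 0)

# out[m, j, k] = sum_i b[m, i] * A[i, j, k]
out_einsum = np.einsum("mi,ijk->mjk", b, A)

# axes=1 contracts the last axis of b with the first axis of A
out_tensordot = np.tensordot(b, A, axes=1)

print(np.array_equal(out, out_einsum), np.array_equal(out, out_tensordot))  # True True
```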
Is this what you want to calculate:
import numpy as np
# Define two matrices
A = np.arange(9).reshape(3, 3)
B = np.arange(9, 18).reshape(3, 3)
# First calculate the desired result:
rows = []
for i in range(3):
    rows.append([A[i, j] * B for j in range(3)])
result = np.stack(rows).sum(axis=1)
assert result.shape == (3, 3, 3)
print(result)
[[[ 27 30 33]
[ 36 39 42]
[ 45 48 51]]
[[108 120 132]
[144 156 168]
[180 192 204]]
[[189 210 231]
[252 273 294]
[315 336 357]]]
Is this correct?
If so, then here is the same calculation using numpy's einsum function:
C = np.array([B] * 3) # shape (3, 3, 3)
result = np.einsum("ij,jkl->ikl", A, C)
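Because every element of the "vector" in this example is the same matrix B, the sum collapses to result[i] = (A[i, 0] + A[i, 1] + A[i, 2]) * B. A sketch of that shortcut using broadcasting instead of building C; it only holds when all the matrices in the vector are identical:

```python
import numpy as np

A = np.arange(9).reshape(3, 3)
B = np.arange(9, 18).reshape(3, 3)

# General form: contract A with a stack of (here identical) matrices
C = np.array([B] * 3)
result_einsum = np.einsum("ij,jkl->ikl", A, C)

# Shortcut: row sums of A, broadcast against B
row_sums = A.sum(axis=1)                    # shape (3,)
result_bcast = row_sums[:, None, None] * B  # shape (3, 3, 3)

print(np.array_equal(result_einsum, result_bcast))  # True
```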
I have a list of rates containing almost 35040 values, which I have divided into 365 blocks of 96 elements each. Now I want to get the 4 smallest values from each block. To do that, I first sort the blocks in increasing order and then print or insert the first 4 elements of each into a new list.
my approach:
import pandas as pd
inputFile = "inputFile.xlsx"
fileName = inputFile
inputSheetDF = pd.read_excel(fileName, sheet_name='Sheet1')
iexRate = inputSheetDF['IEX Price']
#iexRate = [2.3, 2.4, 3, 4, 3.2, 4.1, 5.......]
testList = []
n = 96
x = [iexRate[i:i + n] for i in range(0, len(iexRate), n)]
x.sort()
but x.sort() gives me this error:
ValueError: Can only compare identically-labeled Series objects
So basically I want testList to contain the 4 smallest elements of each 96-element block.
Here's a proposed solution, which has the advantage of being vectorized. I'm using a much smaller dataset (3 chunks of 4 rows each, keeping the bottom 2 rates from each chunk), but the idea for a larger dataset is of course the same.
import numpy as np
import pandas as pd
df = pd.DataFrame({"rate": np.random.randint(1, 100, 12), "chunk": [1]*4 + [2]*4 + [3]*4 })
print(df)
==>
rate chunk
0 81 1
1 51 1
2 50 1
3 83 1
4 33 2
5 88 2
6 97 2
7 2 2
8 22 3
9 23 3
10 4 3
11 83 3
df.sort_values("rate", inplace=True)
res = df.groupby("chunk").head(2).sort_values(["chunk", "rate"])
print(res)
==>
rate chunk
2 50 1
1 51 1
7 2 2
4 33 2
10 4 3
8 22 3
To get a flat list of all the selected rates, just do:
flat_list = list(res.rate)
==> [50, 51, 2, 33, 4, 22]
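For the original data, the chunk labels do not have to be typed out: integer division of each row's position by the block size gives them. A sketch, assuming the rates are in a pandas Series; it uses SeriesGroupBy.nsmallest in place of the sort/head combination, and random numbers stand in for the real prices:

```python
import numpy as np
import pandas as pd

n = 96          # block size
n_blocks = 365  # number of blocks

rng = np.random.default_rng(0)
rates = pd.Series(rng.random(n * n_blocks), name="rate")  # placeholder data

# Row k belongs to block k // n
chunk = np.arange(len(rates)) // n

# The 4 smallest rates of every block, indexed by (chunk, original row)
smallest = rates.groupby(chunk).nsmallest(4)

testList = smallest.tolist()
print(len(testList))  # 1460, i.e. 4 values for each of the 365 blocks
```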
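Alternatively, sort each block on its own instead of sorting the list of blocks (x.sort() compares whole Series objects with each other, which is what raises the error):
import pandas as pd
iexRate = pd.Series(range(1,100))
n = 15
x = [iexRate[i:i + n] for i in range(0, len(iexRate), n)]
testList = [sorted(block)[:4] for block in x]
print(testList)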
[[1, 2, 3, 4], [16, 17, 18, 19], [31, 32, 33, 34], [46, 47, 48, 49], [61, 62, 63, 64], [76, 77, 78, 79], [91, 92, 93, 94]]
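When the length is an exact multiple of the block size (365 × 96 = 35040), the blocks can also be handled as rows of a 2-D NumPy array, which avoids the Python-level loop. A sketch, assuming iexRate holds exactly 365 full blocks (reshape raises an error otherwise); the random placeholder data stands in for the Excel column:

```python
import numpy as np

n = 96
rng = np.random.default_rng(1)
iexRate = rng.random(365 * n)  # placeholder for inputSheetDF['IEX Price'].to_numpy()

# One block per row: shape (365, 96)
blocks = np.asarray(iexRate).reshape(-1, n)

# Sort each row and keep the first 4 columns: the 4 smallest values per block
smallest4 = np.sort(blocks, axis=1)[:, :4]

# np.partition avoids a full sort, but leaves those 4 values unordered,
# so they are sorted afterwards here
smallest4_fast = np.sort(np.partition(blocks, 3, axis=1)[:, :4], axis=1)

testList = smallest4.tolist()  # 365 lists with 4 values each
```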
I have a dataframe of the following format:
Id Name_prev Weight_prev Name_now Weight_now
1 [1,3,4,5] [10,34,67,37] [1,3,5] [45,76,12]
2 [10,3,40,5] [100,134,627,347] [10,40,5] [34,56,78]
3 [1,30,4,50] [11,22,45,67] [1,30,50] [12,45,78]
4 [1,7,8,9] [32,54,76,98] [7,8,9] [34,12,32]
I want to create two new variables:
Union of Name_prev and Name_now: despite the name, this is the intersection of Name_prev and Name_now; it can be computed with a set operation on the two columns, and I am able to do that.
Ratio of Name_prev and Name_now: this is the ratio of the weights (Weight_prev / Weight_now) corresponding to the names common to Name_prev and Name_now.
Expected Output:
Id Union of Name_prev and Name_now Ratio of Name_prev and Name_now
1 [1,3,5] [10/45, 34/76,37/12]
2 [10,40,5] [100/34,627/56,347/78]
3 [1,30,50] [11/12,22/45,67/78]
4 [7,8,9] [54/34,76/12,98/32]
I am trying to create a dictionary-like structure by combining Name_prev and Weight_prev as key-value pairs, doing the same for Name_now and Weight_now, and then taking the ratio for the common keys, but I am stuck...
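The dictionary idea described above can be finished along these lines. A minimal sketch, assuming names are unique within each list; common_ratios is a hypothetical helper name. Because weights are looked up by name rather than paired by position, it does not rely on the common names appearing in the same order in both lists:

```python
import pandas as pd

def common_ratios(names_prev, weights_prev, names_now, weights_now):
    """Return (common names, Weight_prev / Weight_now for each common name)."""
    prev = dict(zip(names_prev, weights_prev))  # name -> previous weight
    now = dict(zip(names_now, weights_now))     # name -> current weight
    # Keep the order in which names appear in names_prev
    common = [name for name in names_prev if name in now]
    return common, [prev[name] / now[name] for name in common]

df = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'Name_prev': [[1, 3, 4, 5], [10, 3, 40, 5], [1, 30, 4, 50], [1, 7, 8, 9]],
    'Weight_prev': [[10, 34, 67, 37], [100, 134, 627, 347], [11, 22, 45, 67], [32, 54, 76, 98]],
    'Name_now': [[1, 3, 5], [10, 40, 5], [1, 30, 50], [7, 8, 9]],
    'Weight_now': [[45, 76, 12], [34, 56, 78], [12, 45, 78], [34, 12, 32]],
})

pairs = [common_ratios(*row) for row in zip(df['Name_prev'], df['Weight_prev'],
                                            df['Name_now'], df['Weight_now'])]
df['Union of Name_prev and Name_now'] = [names for names, _ in pairs]
df['Ratio of Name_prev and Name_now'] = [ratios for _, ratios in pairs]
print(df[['Id', 'Union of Name_prev and Name_now', 'Ratio of Name_prev and Name_now']])
```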
Use:
a, b = [],[]
for n1, n2, w1, w2 in zip(df['Name_prev'], df['Name_now'],
                          df['Weight_prev'], df['Weight_now']):
    # get intersection of lists
    n = [val for val in n1 if val in n2]
    # get indices by enumerate and select weights
    w3 = [w1[i] for i, val in enumerate(n1) if val in n2]
    w4 = [w2[i] for i, val in enumerate(n2) if val in n1]
    # divide each value in list
    w = [i/j for i, j in zip(w3, w4)]
    a.append(n)
    b.append(w)
df = df.assign(name=a, weight=b)
print (df)
Id Name_prev Weight_prev Name_now Weight_now \
0 1 [1, 3, 4, 5] [10, 34, 67, 37] [1, 3, 5] [45, 76, 12]
1 2 [10, 3, 40, 5] [100, 134, 627, 347] [10, 40, 5] [34, 56, 78]
2 3 [1, 30, 4, 50] [11, 22, 45, 67] [1, 30, 50] [12, 45, 78]
3 4 [1, 7, 8, 9] [32, 54, 76, 98] [7, 8, 9] [34, 12, 32]
name weight
0 [1, 3, 5] [0.2222222222222222, 0.4473684210526316, 3.083...
1 [10, 40, 5] [2.9411764705882355, 11.196428571428571, 4.448...
2 [1, 30, 50] [0.9166666666666666, 0.4888888888888889, 0.858...
3 [7, 8, 9] [1.588235294117647, 6.333333333333333, 3.0625]
If you need to remove the original columns, use DataFrame.pop:
a, b = [],[]
for n1, n2, w1, w2 in zip(df.pop('Name_prev'), df.pop('Name_now'),
                          df.pop('Weight_prev'), df.pop('Weight_now')):
    n = [val for val in n1 if val in n2]
    w3 = [w1[i] for i, val in enumerate(n1) if val in n2]
    w4 = [w2[i] for i, val in enumerate(n2) if val in n1]
    w = [i/j for i, j in zip(w3, w4)]
    a.append(n)
    b.append(w)
df = df.assign(name=a, weight=b)
print (df)
Id name weight
0 1 [1, 3, 5] [0.2222222222222222, 0.4473684210526316, 3.083...
1 2 [10, 40, 5] [2.9411764705882355, 11.196428571428571, 4.448...
2 3 [1, 30, 50] [0.9166666666666666, 0.4888888888888889, 0.858...
3 4 [7, 8, 9] [1.588235294117647, 6.333333333333333, 3.0625]
EDIT:
Working with lists in pandas is never vectorized, so it is better to flatten the lists first, merge, and then aggregate back to lists if necessary:
from itertools import chain
df_prev = pd.DataFrame({
'Name' : list(chain.from_iterable(df['Name_prev'].values.tolist())),
'Weight_prev' : list(chain.from_iterable(df['Weight_prev'].values.tolist())),
'Id' : df['Id'].values.repeat(df['Name_prev'].str.len())
})
print (df_prev)
Name Weight_prev Id
0 1 10 1
1 3 34 1
2 4 67 1
3 5 37 1
4 10 100 2
5 3 134 2
6 40 627 2
7 5 347 2
8 1 11 3
9 30 22 3
10 4 45 3
11 50 67 3
12 1 32 4
13 7 54 4
14 8 76 4
15 9 98 4
df_now = pd.DataFrame({
'Name' : list(chain.from_iterable(df['Name_now'].values.tolist())),
'Weight_now' : list(chain.from_iterable(df['Weight_now'].values.tolist())),
'Id' : df['Id'].values.repeat(df['Name_now'].str.len())
})
print (df_now)
Name Weight_now Id
0 1 45 1
1 3 76 1
2 5 12 1
3 10 34 2
4 40 56 2
5 5 78 2
6 1 12 3
7 30 45 3
8 50 78 3
9 7 34 4
10 8 12 4
11 9 32 4
df = df_prev.merge(df_now, on=['Id','Name'])
df['Weight'] = df['Weight_prev'] / df['Weight_now']
print (df)
Name Weight_prev Id Weight_now Weight
0 1 10 1 45 0.222222
1 3 34 1 76 0.447368
2 5 37 1 12 3.083333
3 10 100 2 34 2.941176
4 40 627 2 56 11.196429
5 5 347 2 78 4.448718
6 1 11 3 12 0.916667
7 30 22 3 45 0.488889
8 50 67 3 78 0.858974
9 7 54 4 34 1.588235
10 8 76 4 12 6.333333
11 9 98 4 32 3.062500
df = df.groupby('Id')[['Name','Weight']].agg(list).reset_index()
print (df)
Id Name Weight
0 1 [1, 3, 5] [0.2222222222222222, 0.4473684210526316, 3.083...
1 2 [10, 40, 5] [2.9411764705882355, 11.196428571428571, 4.448...
2 3 [1, 30, 50] [0.9166666666666666, 0.4888888888888889, 0.858...
3 4 [7, 8, 9] [1.588235294117647, 6.333333333333333, 3.0625]
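On pandas 1.3 or newer, DataFrame.explode accepts a list of columns, which can replace the chain.from_iterable/repeat construction used for flattening. A sketch under that version assumption, using the same example data:

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'Name_prev': [[1, 3, 4, 5], [10, 3, 40, 5], [1, 30, 4, 50], [1, 7, 8, 9]],
    'Weight_prev': [[10, 34, 67, 37], [100, 134, 627, 347], [11, 22, 45, 67], [32, 54, 76, 98]],
    'Name_now': [[1, 3, 5], [10, 40, 5], [1, 30, 50], [7, 8, 9]],
    'Weight_now': [[45, 76, 12], [34, 56, 78], [12, 45, 78], [34, 12, 32]],
})

# One row per (Id, name, weight): the list columns are exploded in lockstep
prev = (df[['Id', 'Name_prev', 'Weight_prev']]
        .explode(['Name_prev', 'Weight_prev'])
        .rename(columns={'Name_prev': 'Name'}))
now = (df[['Id', 'Name_now', 'Weight_now']]
       .explode(['Name_now', 'Weight_now'])
       .rename(columns={'Name_now': 'Name'}))

# explode leaves object dtype, so convert before merging and dividing
prev = prev.astype({'Name': int, 'Weight_prev': float})
now = now.astype({'Name': int, 'Weight_now': float})

m = prev.merge(now, on=['Id', 'Name'])
m['Weight'] = m['Weight_prev'] / m['Weight_now']

out = m.groupby('Id')[['Name', 'Weight']].agg(list).reset_index()
print(out)
```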
I've taken it upon myself to learn how NumPy works, out of my own curiosity.
It seems that the simplest function, sum, is the hardest to translate into code (code is how I understand things). It's easy to hard-code each axis for each case, but I want to find a dynamic algorithm that can sum along any axis of an n-dimensional array.
The documentation on the official website is not helpful (it only shows the result, not the process), and it's hard to navigate through the Python/C code.
Note: I did figure out that when an array is summed, the specified axis is "removed", i.e. summing an array of shape (4, 3, 2) along axis 1 yields an array of shape (4, 2).
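A loop-based sketch of the "dynamic algorithm" described above: it illustrates what sum computes along an arbitrary axis, not how NumPy implements it internally (sum_along_axis is a hypothetical name). It iterates over every index of the output shape, i.e. the input shape with the summed axis removed, and adds up the entries along that axis:

```python
import numpy as np

def sum_along_axis(arr, axis):
    """Sum `arr` along one axis using explicit loops (slow; for illustration)."""
    arr = np.asarray(arr)
    axis = axis % arr.ndim  # turn negative axes such as -1 into positive ones
    # The summed axis is "removed" from the shape
    out_shape = arr.shape[:axis] + arr.shape[axis + 1:]
    out = np.zeros(out_shape, dtype=arr.dtype)
    # Visit every position of the output ...
    for out_idx in np.ndindex(*out_shape):
        total = 0
        # ... and walk along the summed axis, re-inserting its index at `axis`
        for k in range(arr.shape[axis]):
            full_idx = out_idx[:axis] + (k,) + out_idx[axis:]
            total += arr[full_idx]
        out[out_idx] = total
    return out

a = np.arange(24).reshape(4, 3, 2)
print(sum_along_axis(a, 1).shape)                            # (4, 2)
print(np.array_equal(sum_along_axis(a, 1), a.sum(axis=1)))   # True
```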
Setup
consider the numpy array a
a = np.arange(30).reshape(2, 3, 5)
print(a)
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]]
[[15 16 17 18 19]
[20 21 22 23 24]
[25 26 27 28 29]]]
Where are the dimensions?
The dimensions and positions are highlighted by the following
          p  p  p  p  p
          o  o  o  o  o
          s  s  s  s  s
  dim 2   0  1  2  3  4
          |  |  |  |  |
dim 0     ↓  ↓  ↓  ↓  ↓
----> [[[ 0  1  2  3  4]    <---- dim 1, pos 0
pos 0   [ 5  6  7  8  9]    <---- dim 1, pos 1
        [10 11 12 13 14]]   <---- dim 1, pos 2
dim 0
---->  [[15 16 17 18 19]    <---- dim 1, pos 0
pos 1   [20 21 22 23 24]    <---- dim 1, pos 1
        [25 26 27 28 29]]]  <---- dim 1, pos 2
          ↑  ↑  ↑  ↑  ↑
          |  |  |  |  |
  dim 2   p  p  p  p  p
          o  o  o  o  o
          s  s  s  s  s
          0  1  2  3  4
Dimension examples:
This becomes more clear with a few examples
a[0, :, :] # dim 0, pos 0
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]]
a[:, 1, :] # dim 1, pos 1
[[ 5 6 7 8 9]
[20 21 22 23 24]]
a[:, :, 3] # dim 2, pos 3
[[ 3 8 13]
[18 23 28]]
sum
explanation of sum and axis
a.sum(0) is the sum of all slices along dim 0
a.sum(0)
[[15 17 19 21 23]
[25 27 29 31 33]
[35 37 39 41 43]]
same as
a[0, :, :] + \
a[1, :, :]
[[15 17 19 21 23]
[25 27 29 31 33]
[35 37 39 41 43]]
a.sum(1) is the sum of all slices along dim 1
a.sum(1)
[[15 18 21 24 27]
[60 63 66 69 72]]
same as
a[:, 0, :] + \
a[:, 1, :] + \
a[:, 2, :]
[[15 18 21 24 27]
[60 63 66 69 72]]
a.sum(2) is the sum of all slices along dim 2
a.sum(2)
[[ 10 35 60]
[ 85 110 135]]
same as
a[:, :, 0] + \
a[:, :, 1] + \
a[:, :, 2] + \
a[:, :, 3] + \
a[:, :, 4]
[[ 10 35 60]
[ 85 110 135]]
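The "same as" pattern above generalizes to any axis: take the slice at each position along that axis and add the slices up. np.take(a, k, axis=axis) is the axis-agnostic spelling of a[k, :, :], a[:, k, :] and so on. A sketch, assuming the axis has at least one element (sum_of_slices is a hypothetical name):

```python
import numpy as np

def sum_of_slices(a, axis):
    """Add up the slices of `a` taken at every position along `axis`."""
    # np.take returns a new array, so adding to it in place is safe
    total = np.take(a, 0, axis=axis)
    for k in range(1, a.shape[axis]):
        total += np.take(a, k, axis=axis)
    return total

a = np.arange(30).reshape(2, 3, 5)
for axis in range(a.ndim):
    print(axis, np.array_equal(sum_of_slices(a, axis), a.sum(axis)))  # True for each axis
```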
The default axis is None, which means all axes, i.e. sum all the numbers. (axis=-1 would instead mean the last axis.)
a.sum()
435
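To make the difference concrete: axis=None (the default) reduces over every axis, axis=-1 counts from the end and is simply the last axis, and a tuple of axes reduces over several axes at once:

```python
import numpy as np

a = np.arange(30).reshape(2, 3, 5)

print(a.sum())               # 435: axis=None, every element
print(a.sum(axis=-1).shape)  # (2, 3): -1 is the last axis, same as a.sum(2)
print(np.array_equal(a.sum(axis=-1), a.sum(axis=2)))  # True
print(a.sum(axis=(0, 2)))    # [ 95 145 195]: dims 0 and 2 summed together
```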
I'll use nested loops to explain it.
import numpy as np
n = np.array(
[[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[2, 4, 6],
[8, 10, 12],
[14, 16, 18]],
[[1, 3, 5],
[7, 9, 11],
[13, 15, 17]]])
print(n)
print("============ sum axis=None=============")
total = 0
for i in range(3):
    for j in range(3):
        for k in range(3):
            total += n[k][i][j]
print(total) # 216
print('------------------')
print(np.sum(n)) # 216
print("============ sum axis=0 =============")
for i in range(3):
    for j in range(3):
        total = 0
        for axis in range(3):
            total += n[axis][i][j]
        print(total, end=' ')
    print()
print('------------------')
print("sum[0][0] = %d" % (n[0][0][0] + n[1][0][0] + n[2][0][0]))
print("sum[1][1] = %d" % (n[0][1][1] + n[1][1][1] + n[2][1][1]))
print("sum[2][2] = %d" % (n[0][2][2] + n[1][2][2] + n[2][2][2]))
print('------------------')
print(np.sum(n, axis=0))
print("============ sum axis=1 =============")
for i in range(3):
    for j in range(3):
        total = 0
        for axis in range(3):
            total += n[i][axis][j]
        print(total, end=' ')
    print()
print('------------------')
print("sum[0][0] = %d" % (n[0][0][0] + n[0][1][0] + n[0][2][0]))
print("sum[1][1] = %d" % (n[1][0][1] + n[1][1][1] + n[1][2][1]))
print("sum[2][2] = %d" % (n[2][0][2] + n[2][1][2] + n[2][2][2]))
print('------------------')
print(np.sum(n, axis=1))
print("============ sum axis=2 =============")
for i in range(3):
    for j in range(3):
        total = 0
        for axis in range(3):
            total += n[i][j][axis]
        print(total, end=' ')
    print()
print('------------------')
print("sum[0][0] = %d" % (n[0][0][0] + n[0][0][1] + n[0][0][2]))
print("sum[1][1] = %d" % (n[1][1][0] + n[1][1][1] + n[1][1][2]))
print("sum[2][2] = %d" % (n[2][2][0] + n[2][2][1] + n[2][2][2]))
print('------------------')
print(np.sum(n, axis=2))
print("============ sum axis=(0,1)) =============")
for i in range(3):
    total = 0
    for axis1 in range(3):
        for axis2 in range(3):
            total += n[axis1][axis2][i]
    print(total, end=' ')
print()
print('------------------')
print("sum[1] = %d" % (n[0][0][1] + n[0][1][1] + n[0][2][1] +
                      n[1][0][1] + n[1][1][1] + n[1][2][1] +
                      n[2][0][1] + n[2][1][1] + n[2][2][1]))
print('------------------')
print(np.sum(n, axis=(0,1)))
result:
[[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]]
[[ 2 4 6]
[ 8 10 12]
[14 16 18]]
[[ 1 3 5]
[ 7 9 11]
[13 15 17]]]
============ sum axis=None=============
216
------------------
216
============ sum axis=0 =============
4 9 14
19 24 29
34 39 44
------------------
sum[0][0] = 4
sum[1][1] = 24
sum[2][2] = 44
------------------
[[ 4 9 14]
[19 24 29]
[34 39 44]]
============ sum axis=1 =============
12 15 18
24 30 36
21 27 33
------------------
sum[0][0] = 12
sum[1][1] = 30
sum[2][2] = 33
------------------
[[12 15 18]
[24 30 36]
[21 27 33]]
============ sum axis=2 =============
6 15 24
12 30 48
9 27 45
------------------
sum[0][0] = 6
sum[1][1] = 30
sum[2][2] = 45
------------------
[[ 6 15 24]
[12 30 48]
[ 9 27 45]]
============ sum axis=(0,1)) =============
57 72 87
------------------
sum[1] = 72
------------------
[57 72 87]
Let's take a 3D array as an example, or a cube for easier visualization.
I want to select all the faces of that cube, and I would like to generalize this to arbitrary dimensions.
I'd also like to then add/remove faces to/from the cube (cuboid), again generalized to arbitrary dimensions.
I know that for any fixed number of dimensions you can do array[:,:,0], array[-1,:,:], but I'd like to know how to generalize this to arbitrary dimensions and how to easily iterate over all faces.
To get a face:
def get_face(M, dim, front_side):
    if front_side:
        side = 0
    else:
        side = -1
    index = tuple(side if i == dim else slice(None) for i in range(M.ndim))
    return M[index]
To add a face (untested):
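def add_face(M, new_face, dim, front_side):
    # assume sizes match up correctly
    # a face returned by get_face lacks the `dim` axis; restore it as length 1
    if new_face.ndim == M.ndim - 1:
        new_face = np.expand_dims(new_face, dim)
    if front_side:
        return np.concatenate((new_face, M), dim)
    else:
        return np.concatenate((M, new_face), dim)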
To remove a face:
def remove_face(M, dim, front_side):
    if front_side:
        dim_slice = slice(1, None)
    else:
        dim_slice = slice(None, -1)
    index = tuple(dim_slice if i == dim else slice(None) for i in range(M.ndim))
    return M[index]
Iterate over all faces:
def iter_faces(M):
    for dim in range(M.ndim):
        for front_side in (True, False):
            yield get_face(M, dim, front_side)
Some quick tests:
In [18]: M = np.arange(27).reshape((3,3,3))
In [19]: for face in iter_faces(M): print(face)
[[0 1 2]
[3 4 5]
[6 7 8]]
[[18 19 20]
[21 22 23]
[24 25 26]]
[[ 0 1 2]
[ 9 10 11]
[18 19 20]]
[[ 6 7 8]
[15 16 17]
[24 25 26]]
[[ 0 3 6]
[ 9 12 15]
[18 21 24]]
[[ 2 5 8]
[11 14 17]
[20 23 26]]
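For comparison, np.take, np.delete and np.concatenate all accept an axis argument, so the same face operations can be written without building index tuples. A sketch of that alternative; the _take/_delete/_concat function names are hypothetical, chosen to avoid clashing with the functions above:

```python
import numpy as np

def get_face_take(M, dim, front_side):
    # Position 0 is the front face along `dim`, -1 the back face
    return np.take(M, 0 if front_side else -1, axis=dim)

def remove_face_delete(M, dim, front_side):
    # np.delete returns a copy without the chosen position along `dim`
    return np.delete(M, 0 if front_side else -1, axis=dim)

def add_face_concat(M, new_face, dim, front_side):
    # A face has one axis fewer than M; restore it as length 1 before concatenating
    new_face = np.expand_dims(new_face, dim)
    parts = (new_face, M) if front_side else (M, new_face)
    return np.concatenate(parts, axis=dim)

M = np.arange(27).reshape((3, 3, 3))
faces = [get_face_take(M, d, s) for d in range(M.ndim) for s in (True, False)]
print(len(faces))  # 6 faces for a 3-D cube

# Removing a face and adding it back restores the original array
face = get_face_take(M, 1, False)
restored = add_face_concat(remove_face_delete(M, 1, False), face, 1, False)
print(np.array_equal(restored, M))  # True
```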