I have a 3D NumPy array A with shape (k, l, m) and a 2D NumPy array B with shape (k, l) holding indices (between 0 and m-1) into the last axis of A. I want to build a new 2D array C with shape (k, l) from the selected items, like this:
import numpy as np
A = np.random.random((2,3,4))
B = np.array([[0,0,0],[2,2,2]])
C = np.zeros((2,3))
for i in range(2):
    for j in range(3):
        C[i,j] = A[i, j, B[i,j]]
Is there a more efficient way of doing this?
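Use NumPy's built-in fromfunction routine. Pass B.shape as the output shape and dtype=int so that the generated coordinates can be used as indices, and turn your code into
C = np.fromfunction(lambda i, j: A[i, j, B[i,j]], B.shape, dtype=int)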
Setup:
import numpy as np
k,l,m = 2,3,4
a = np.arange(k*l*m).reshape(k,l,m)
b = np.random.randint(0,4,(k,l))
print(a)
print('*'*10)
print(b)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
**********
[[3 0 3]
[2 1 2]]
Use integer array indexing to select the values.
x,y = np.indices(a.shape[:-1])
c = a[x,y,b]
print(c)
[[ 3 4 11]
[14 17 22]]
Using numpy.ix_.
x,y = np.ix_(np.arange(a.shape[0]),np.arange(a.shape[1]))
d = a[x,y,b]
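As a quick check, the sketch below compares the double loop from the question with the index-grid selection shown above. It also tries np.take_along_axis, which is an alternative not mentioned in the answers (it needs NumPy 1.15 or later). The array sizes and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
k, l, m = 2, 3, 4
a = rng.random((k, l, m))
b = rng.integers(0, m, size=(k, l))

# Reference result: the explicit double loop from the question
c_loop = np.zeros((k, l))
for i in range(k):
    for j in range(l):
        c_loop[i, j] = a[i, j, b[i, j]]

# Integer array indexing with index grids for the first two axes
x, y = np.indices(b.shape)
c_index = a[x, y, b]

# Alternative: take_along_axis needs b to have as many dimensions as a,
# so add a trailing axis and drop it again afterwards
c_take = np.take_along_axis(a, b[..., None], axis=-1)[..., 0]

print(np.array_equal(c_loop, c_index), np.array_equal(c_loop, c_take))
```

All three results have shape (k, l); the vectorized versions avoid the Python-level loop.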
How can I write a function named split that accepts three parameters a, b, c and does the following?
Create an n-dimensional array x holding the first a natural numbers (use the np.arange method).
Change the shape of x to (c, b) and assign the result to a new array y.
Split the array y horizontally into two arrays and assign them to i and j.
Display i and j.
I tried the hsplit and array_split methods and assigned the results to i and j, but the output does not match the desired output given below.
import numpy as np
x=np.arange(20)
y = np.array(x)
z= y.reshape(10,2)
#a = np.hsplit(z,2)
(a,b)=np.array_split(z,2,axis=0)
print(a)
print(b)
Actual output:-
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
[[10 11]
[12 13]
[14 15]
[16 17]
[18 19]]
Desired output:-
[[ 0 1 2 3 4]
[10 11 12 13 14]]
[[ 5 6 7 8 9]
[15 16 17 18 19]]
You were right to try hsplit. The only problem is the shape, which needs to be the other way around (2 rows by 10 columns) to get the desired output:
import numpy as np
x=np.arange(20)
y = np.array(x)
z= y.reshape(2,10)
a,b = np.hsplit(z,2)
print(a)
print(b)
output:
[[ 0 1 2 3 4]
[10 11 12 13 14]]
[[ 5 6 7 8 9]
[15 16 17 18 19]]
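Wrapped up as the function the question asks for, this could look like the sketch below. It assumes that a equals c * b (otherwise reshape raises an error) and that b is even (otherwise hsplit cannot split into two equal halves); returning i and j in addition to printing them is my own addition.

```python
import numpy as np

def split(a, b, c):
    x = np.arange(a)        # the first a natural numbers, starting at 0
    y = x.reshape(c, b)     # c rows, b columns; requires a == c * b
    i, j = np.hsplit(y, 2)  # left and right halves; requires b to be even
    print(i)
    print(j)
    return i, j

i, j = split(20, 10, 2)
```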
I need to populate a dataframe with a matrix built from a single list, but the math and python syntax are beyond me. I essentially need to perform some math operations as if the same list were both the rows and the columns.
So it should look something like this....
#Input
list = [1,2,3,4]
create a matrix using some math on the list, like matrix[i,j] = list[i] * list[j]
#output
np.matrix([[1,2,3,4], [2,4,6,8], [3,6,9,12], [4,8,12,16]])
df = pd.dataframe[np.matrix]
Broadcasted multiplication will work here:
arr = np.array([1, 2, 3, 4])
pd.DataFrame(arr * arr[:,None])
0 1 2 3
0 1 2 3 4
1 2 4 6 8
2 3 6 9 12
3 4 8 12 16
Alternatively, most NumPy arithmetic functions are ufuncs, which provide an .outer method:
pd.DataFrame(np.multiply.outer(arr, arr))
0 1 2 3
0 1 2 3 4
1 2 4 6 8
2 3 6 9 12
3 4 8 12 16
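The same pattern carries over to other element-wise operations. The sketch below checks that broadcasting and .outer agree for multiplication and shows np.subtract.outer as a second example; the choice of subtraction is just an illustration and not part of the question.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# matrix[i, j] = arr[i] * arr[j], computed two ways
prod_broadcast = arr[:, None] * arr
prod_outer = np.multiply.outer(arr, arr)

# matrix[i, j] = arr[i] - arr[j]
diff_outer = np.subtract.outer(arr, arr)

print(np.array_equal(prod_broadcast, prod_outer))
print(diff_outer)
```

Any of these 2D arrays can be passed to pd.DataFrame as in the answer above.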
data = [1,2,3,4]
Nested for loops would work:
import numpy as np
a = []
for n in data:
    row = []
    for m in data:
        math = some_operation_on(m,n)
        row.append(math)
    a.append(row)
a = np.array(a)
For simple operations like your example use numpy.meshgrid.
In [21]: a = [1,2,3,4]
In [22]: x,y = np.meshgrid(a,a)
In [23]: x*y
Out[23]:
array([[ 1, 2, 3, 4],
[ 2, 4, 6, 8],
[ 3, 6, 9, 12],
[ 4, 8, 12, 16]])
I am trying to multiply the leading diagonal in a pandas dataframe and I am not sure how to proceed in a computationally reasonable way.
df = [ 3 4 5
6 7 8
9 10 11]
output_df = [231 32 5
60 77 8
9 10 11]
Explanation: I am looking to compute 3 * 7 * 11 for the first element, 4 * 8 for the second element, 7 * 11 for the fifth element, and so on.
Note: The matrix I am working on is not a square matrix, but a rectangular matrix.
Here's one based on NumPy -
def cumprod_upper_diag(a):
    m,n = a.shape
    mask = ~np.tri(m,n, dtype=bool)
    p = np.ones((m,n),dtype=a.dtype)
    p[mask[:,::-1]] = a[mask]
    a[mask] = p[::-1].cumprod(0)[::-1][mask[:,::-1]]
    return a
a = df.to_numpy(copy=False) # For older versions : a = df.values
out = a.copy()
cumprod_upper_diag(out)
cumprod_upper_diag(out.T)
out.ravel()[::a.shape[1]+1] = out.ravel()[::out.shape[1]+1][::-1].cumprod()[::-1]
out_df = pd.DataFrame(out)
You can use a sparse diagonal matrix here with some fiddling. This assumes that every element of your original matrix is non-zero; otherwise it will not work.
from scipy import sparse
a = df.to_numpy()
b = sparse.dia_matrix(a)
c = b.data[:, ::-1]
cp = np.cumprod(np.where(c != 0, c, 1), axis=1)
b.data = cp[:, ::-1]
b.toarray()
array([[231, 32, 5],
[ 60, 77, 8],
[ 9, 10, 11]], dtype=int64)
As Chris mentioned, this is cumprod in reverse order:
# stack for groupby
new_df = df.stack().reset_index()[::-1]
# diagonals meaning col_num - row_num are the same
diags = new_df['level_0']-new_df['level_1']
# groupby diagonals
new_df['out'] = new_df.groupby(diags)[0].cumprod()
# pivot to get the original shape
new_df.pivot(index='level_0', columns='level_1', values='out')
output:
level_1 0 1 2
level_0
0 231 32 5
1 60 77 8
2 9 10 11
Here's a method that operates on the DataFrame in place.
df = pd.DataFrame(data=[[3, 4, 5], [6, 7, 8], [9, 10, 11]])
m, n = df.shape
for i in range(-m + 1, n):
    ri, rj = max(-i, 0), min(m - 1, n - i - 1)
    ci, cj = max( i, 0), min(n - 1, m + i - 1)
    np.fill_diagonal(df.values[ri:rj+1,ci:cj+1],
                     df.values.diagonal(i)[::-1].cumprod()[::-1])
print(df)
Result:
0 1 2
0 231 32 5
1 60 77 8
2 9 10 11
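For comparison, here is a minimal NumPy-only sketch of the same idea: a reverse cumulative product along every diagonal, written with a boolean mask per diagonal. The function name reverse_cumprod_diagonals is made up for this example, and the loop over diagonals is chosen for readability rather than speed. It also accepts rectangular input.

```python
import numpy as np

def reverse_cumprod_diagonals(a):
    """Return a copy of a where each element is the product of itself and
    everything below and to the right of it on the same diagonal."""
    a = np.asarray(a)
    m, n = a.shape
    out = a.copy()
    rows, cols = np.indices((m, n))
    for offset in range(-m + 1, n):
        mask = (cols - rows) == offset
        # Boolean indexing returns elements in row-major order, which for a
        # single diagonal means top-left to bottom-right.
        vals = a[mask]
        out[mask] = vals[::-1].cumprod()[::-1]
    return out

square = np.array([[3, 4, 5], [6, 7, 8], [9, 10, 11]])
print(reverse_cumprod_diagonals(square))

rect = np.array([[1, 2, 3], [4, 5, 6]])
print(reverse_cumprod_diagonals(rect))
```

The result can be wrapped again with pd.DataFrame(...) if a DataFrame is needed.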
I have a 2D array of shape (50,50). I need to subtract a value from each column of this array (skipping the first), which is calculated based on the index of the column. For example, using a for loop it would look something like this:
for idx in range(1, A[0, :].shape[0]):
    A[0, idx] -= idx * (...) # simple calculations with idx
Now, of course this works fine, but it's very slow and performance is critical for my application. I've tried computing the values to be subtracted using np.fromfunction() and then subtracting them from the original array, but the results are different from those obtained by the iterative subtraction in the for loop:
func = lambda i, j: j * (...) #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (1,50))
A[0, 1:] -= subtraction_matrix
What am I doing wrong? Or is there some other method that would be better? Any help is appreciated!
All your code snippets indicate that you require the subtraction to happen only in the first row of A (though you've not explicitly mentioned that). So, I'm proceeding with that understanding.
Regarding your use of np.fromfunction(), you can use the subtraction_matrix as below:
A[0,1:] -= subtraction_matrix[1:]
Testing it out (assuming shape (5,5) instead of (50,50)):
import numpy as np
A = np.arange(25).reshape(5,5)
print (A)
func = lambda j: j * 10 #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (5,), dtype=A.dtype)
A[0,1:] -= subtraction_matrix[1:]
print (A)
Output:
[[ 0 1 2 3 4] # print(A), before subtraction
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
[[ 0 -9 -18 -27 -36] # print(A), after subtraction
[ 5 6 7 8 9]
[ 10 11 12 13 14]
[ 15 16 17 18 19]
[ 20 21 22 23 24]]
If you want the subtraction to happen in all the rows of A, you just need to use the line A[:,1:] -= subtraction_matrix[1:], instead of the line A[0,1:] -= subtraction_matrix[1:]
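When the per-column value is a simple arithmetic expression of the index, np.fromfunction and np.vectorize are not strictly needed. The sketch below builds the index vector with np.arange and relies on broadcasting. The factor 10 stands in for the elided "(...)" calculation, as in the test above, and is an assumption.

```python
import numpy as np

A = np.arange(25).reshape(5, 5)
B = A.copy()

idx = np.arange(A.shape[1])  # column indices 0, 1, ..., n-1
values = idx * 10            # placeholder for the real per-column calculation

# Subtract from the first row only, skipping column 0
A[0, 1:] -= values[1:]

# Subtract from every row, skipping column 0 (broadcast across rows)
B[:, 1:] -= values[1:]

print(A)
print(B)
```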
I have multiple numpy arrays with the same number of rows (axis_0) that I'd like to shuffle in unison. After one shuffle, I'd like to shuffle them again with a different random seed.
Till now, I've used the solution from
Better way to shuffle two numpy arrays in unison :
def shuffle_in_unison(a, b):
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)
However, this doesn't work for multiple unison shuffles, since rng_state is always the same.
I've tried to use RandomState in order to get a different seed for each call, but this doesn't even work for a single unison shuffle:
a = np.array([1,2,3,4,5])
b = np.array([10,20,30,40,50])
def shuffle_in_unison(a, b):
    r = np.random.RandomState() # different state from /dev/urandom for each call
    state = r.get_state()
    np.random.shuffle(a) # array([4, 2, 1, 5, 3])
    np.random.set_state(state)
    np.random.shuffle(b) # array([40, 20, 50, 10, 30])
    # -> doesn't work
    return a,b
for i in range(10):
    a,b = shuffle_in_unison(a,b)
    print(a,b)
What am I doing wrong?
Edit:
For everyone that doesn't have huge arrays like me, just use the solution by Francesco (https://stackoverflow.com/a/47156309/3955022):
def shuffle_in_unison(a, b):
    n_elem = a.shape[0]
    indeces = np.random.permutation(n_elem)
    return a[indeces], b[indeces]
The only drawback is that this is not an in-place operation, which is a pity for large arrays like mine (500G).
I don't know what you are doing wrong in the way you set the state. However, I found an alternative solution: instead of shuffling n arrays, shuffle their indices only once with numpy.random.choice and then reorder all the arrays.
a = np.array([1,2,3,4,5])
b = np.array([10,20,30,40,50])
def shuffle_in_unison(a, b):
    n_elem = a.shape[0]
    indeces = np.random.choice(n_elem, size=n_elem, replace=False)
    return a[indeces], b[indeces]
for i in range(5):
    a, b = shuffle_in_unison(a ,b)
    print(a, b)
I get:
[5 2 4 3 1] [50 20 40 30 10]
[1 3 4 2 5] [10 30 40 20 50]
[1 2 5 4 3] [10 20 50 40 30]
[3 2 1 4 5] [30 20 10 40 50]
[1 2 5 3 4] [10 20 50 30 40]
Edit:
Thanks to @Divakar for the suggestion.
Here is a more readable way to obtain the same result using numpy.random.permutation:
def shuffle_in_unison(a, b):
    n_elem = a.shape[0]
    indeces = np.random.permutation(n_elem)
    return a[indeces], b[indeces]
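As for the RandomState attempt in the question: as far as I can tell, it takes the state from the new RandomState object r but shuffles with the global np.random generator. The first shuffle therefore runs from the global state, and only the second one runs from r's state, so the two permutations differ. A possible in-place fix, sketched below, is to use r for both shuffles and rewind it in between. It assumes a and b have the same length and the same number of dimensions.

```python
import numpy as np

def shuffle_in_unison(a, b):
    r = np.random.RandomState()  # fresh, OS-seeded state on every call
    state = r.get_state()
    r.shuffle(a)                 # shuffle with r itself, not np.random
    r.set_state(state)           # rewind r to the saved state
    r.shuffle(b)                 # draws the same permutation as for a

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
for _ in range(3):
    shuffle_in_unison(a, b)
    print(a, b)
```

Because both arrays are shuffled in place, no copies are returned, which may matter for very large arrays.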
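I don't know exactly what you are doing wrong, but you have not chosen the solution with the most votes on that page, or the one with the second most votes. Try this one:
import numpy as np
from sklearn.utils import shuffle
# example inputs (assumed; the original snippet does not define X and Y)
X = np.array([1, 2, 3, 4, 5])
Y = np.array([10, 20, 30, 40, 50])
for i in range(10):
    X, Y = shuffle(X, Y, random_state=i)
    print ("X - ", X, "Y - ", Y)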
Output:
X - [3 5 1 4 2] Y - [30 50 10 40 20]
X - [1 5 2 3 4] Y - [10 50 20 30 40]
X - [2 4 5 3 1] Y - [20 40 50 30 10]
X - [3 1 4 2 5] Y - [30 10 40 20 50]
X - [3 2 1 5 4] Y - [30 20 10 50 40]
X - [4 3 2 1 5] Y - [40 30 20 10 50]
X - [1 5 4 3 2] Y - [10 50 40 30 20]
X - [1 3 4 5 2] Y - [10 30 40 50 20]
X - [2 4 3 1 5] Y - [20 40 30 10 50]
X - [1 2 4 3 5] Y - [10 20 40 30 50]
I don't normally have to shuffle my data more than once at a time. But this function accommodates any number of input arrays, as well as any number of random shuffles - and it shuffles in-place.
import numpy as np
def shuffle_arrays(arrays, shuffle_quant=1):
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    max_int = 2**(32 - 1) - 1
    for i in range(shuffle_quant):
        seed = np.random.randint(0, max_int)
        for arr in arrays:
            rstate = np.random.RandomState(seed)
            rstate.shuffle(arr)
And can be used like this
a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])
shuffle_arrays([a, b, c], shuffle_quant=5)
A few things to note:
Method uses NumPy and no other packages.
The assert ensures that all input arrays have the same length along their first dimension.
The max_int keeps random seed within int32 range.
Arrays shuffled in-place by their first dimension - nothing returned.
After the shuffle, the data can be split using np.split or referenced using slices - depending on the application.