I am trying to select values from a matrix into pairs, where the values are picked diagonally. My code doesn't work as it should.
You can see the sequence in the example below. The values are selected along the diagonals: it starts with the first value of the penultimate row and pairs it with the second value of the last row, then moves one row up and continues in the same way.
In the first example, the pairs are 21 -> 32, then 11 -> 22, 11 -> 33, 22 -> 33, 12 -> 23, and so on. The same principle applies to the second example.
code:
import numpy as np

a = np.array([[11, 12, 13],
              [21, 22, 23],
              [31, 32, 33]])

w, h = a.shape

for y0 in range(1, h):
    y = h - y0 - 1
    for x in range(h - y - 1):
        print(a[y + x, x], a[y + x + 1, x + 1])

for x in range(1, w - 1):
    for y in range(w - x - 1):
        print(a[y, x + y], a[y + 1, x + y + 1])
my output:
21 32
11 22
22 33
12 23
required output
21 32
11 22
11 33
22 33
12 23
However, if I use this matrix, for example, it will throw me an error.
a = np.array([[11, 12, 13, 14, 15, 16],
              [21, 22, 23, 24, 25, 26],
              [31, 32, 33, 34, 35, 36]])
required output
21 32
11 22
11 33
22 33
12 23
12 34
23 34
13 24
13 35
24 35
14 25
14 36
25 36
15 26
my output
error
File "C:\Users\Pifkoooo\dp\skuska.py", line 24, in <module>
print( a[y+x,x], a[y+x+1,x+1] )
IndexError: index 2 is out of bounds for axis 0 with size 2
Can anyone advise me how to solve this problem and generalize it so that it works on matrices of any shape? Or is there another way to approach this task?
Let's look for patterns (as in the linked question, but simpler)! Say you have an array of shape (M, N), with M=4 and N=5. First, note the linear indices of the elements:
i =
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
Once you have identified the first element in a pair, the linear index of the next element is just i + N + 1.
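A throwaway snippet (not part of the final solution) to sanity-check that relationship:
import numpy as np

M, N = 4, 5
i = np.arange(M * N).reshape(M, N)   # the linear indices shown above
r, c = 1, 2                          # any element outside the last row and column
assert i[r + 1, c + 1] == i[r, c] + N + 1   # down-right neighbour is at i + N + 1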
Now let's try to establish the path of the first element using the example in the linked question. First, look at the column indices and the row indices:
x =
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
y =
0 0 0 0 0
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
Now take the difference, and add a factor to account for the shape:
x - y + 2M - N =
3 4 5 6 7
2 3 4 5 6
1 2 3 4 5
0 1 2 3 4
The first element of each pair follows the order of the diagonals, excluding the bottom row and the rightmost column. If you stably argsort this array (np.argsort supports kind='stable', which uses timsort) and apply the resulting order to the linear indices, you get the path taken by the first element of every pair, for a matrix of any shape at all. The first observation will then yield the second element.
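As a small illustration of that argsort step, here is the trimmed index array (bottom row and rightmost column dropped, as in the code below) and its stable argsort for the 3x3 matrix from the question, i.e. M = N = 3:
import numpy as np

M, N = 3, 3
diag = np.arange(N - 1) - np.arange(M - 1)[:, None] + 2 * M - N
print(diag)                               # [[3 4]
                                          #  [2 3]]
print(diag.argsort(None, kind='stable'))  # [2 0 3 1]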
So it all boils down to this:
M, N = a.shape
path = (np.arange(N - 1) - np.arange(M - 1)[:, None] + 2 * M - N).argsort(None, kind='stable')
indices = np.arange(M * N).reshape(M, N)[:-1, :-1].ravel()[path]
Now you have a couple of different options going forward:
Apply linear indices to the raveled a:
result = a.ravel()[indices[:, None] + [0, N + 1]]
Preserve the shape of a and use np.unravel_index to transform indices and indices + N + 1 into a 2D index:
result = a[np.unravel_index(indices[:, None] + [0, N + 1], a.shape)]
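Either way, the two options pick out exactly the same elements; a quick check, continuing the snippet above (so a, N and indices are assumed to already exist):
r1 = a.ravel()[indices[:, None] + [0, N + 1]]
r2 = a[np.unravel_index(indices[:, None] + [0, N + 1], a.shape)]
assert np.array_equal(r1, r2)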
Moral of the story: this is all black magic!
Probably not the best performance, but it gets the job done if the order does not matter: iterate over all elements and try to access each of their diagonal partners. If a diagonal partner does not exist, catch the raised IndexError and continue with the next element.
def print_diagonal_pairs(a):
    rows, cols = a.shape
    for row in range(rows):
        for col in range(cols):
            max_shift_amount = min(rows, cols) - min(row, col)
            for shift_amount in range(1, max_shift_amount + 1):
                try:
                    print(a[row, col], a[row + shift_amount, col + shift_amount])
                except IndexError:
                    continue

a = np.array([
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
])
print_diagonal_pairs(a)
# Output:
11 22
11 33
12 23
21 32
22 33
b = np.array([
    [11, 12, 13, 14, 15, 16],
    [21, 22, 23, 24, 25, 26],
    [31, 32, 33, 34, 35, 36]
])
print_diagonal_pairs(b)
# Output:
11 22
11 33
12 23
12 34
13 24
13 35
14 25
14 36
15 26
21 32
22 33
23 34
24 35
25 36
Not a complete solution, but I think you can use fancy indexing for this task. In the code snippet below I am selecting the indices x = [[0,1], [0,2], [1,2]] along the first axis. These indices are broadcast against the indices in y along the first dimension.
from itertools import combinations
import numpy as np

a = np.array([[11, 12, 13, 14, 15, 16],
              [21, 22, 23, 24, 25, 26],
              [31, 32, 33, 34, 35, 36]])

x = np.array(list(combinations(range(a.shape[0]), 2)))
y = x + np.arange(a.shape[1] - 2)[:, None, None]
a[x, y].reshape(-1, 2)
Output:
array([[11, 22],
[11, 33],
[22, 33],
[12, 23],
[12, 34],
[23, 34],
[13, 24],
[13, 35],
[24, 35],
[14, 25],
[14, 36],
[25, 36]])
This will select all the correct values except for the start and end values in the second example. There is probably a smart way to include these edge values and select all values in one sweep, but I cannot think of a solution for that at the moment.
I thought the pattern was to select combinations of size 2 along each diagonal, but apparently not - so this solution will not give the correct "middle" values in your first example.
EDIT
You could extend the selection range and modify the two edge values:
x = np.array(list(combinations(range(a.shape[0]), 2)))
y = x + np.arange(-1,a.shape[1]-1)[:,None,None]
# assign edge values
y[0] = y[1][0]
y[-1] = y[-2][-1]
a[x,y].reshape(-1,2)[2:-2]
Output:
array([[21, 32],
[11, 22],
[11, 33],
[22, 33],
[12, 23],
[12, 34],
[23, 34],
[13, 24],
[13, 35],
[24, 35],
[14, 25],
[14, 36],
[25, 36],
[15, 26]])
My original answer addressed the case in the original question, where the pairs slide along the diagonals rather than spread across them with the first point staying anchored. While the solution is not exactly the same, the concept of computing indices in a vectorized manner applies here too.
Start with the matrix of row minus column which gives diagonals as before:
diag = np.arange(1, N) - np.arange(1, M)[:, None] + 2 * M - N
This shows that the second element is given by
second = a[1:, 1:].ravel()[diag.argsort(None, kind='stable')]
The heads of the diagonals are the first column in reverse and the first row. If you index them correctly, you get the first element of each pair:
head = np.r_[a[::-1, 0], a[0, 1:]]
first = head[np.sort(diag, axis=None)]
Now you can just stack the two together to get the result:
result = np.stack((first, second), axis=-1)
See: black magic! And totally vectorized.
I have a 3D numpy array A with shape (k, l, m) and a 2D numpy array B with shape (k, l) containing indices (between 0 and m-1) of particular items along the last axis. I want to create a new 2D array C with shape (k, l), like this:
import numpy as np

A = np.random.random((2, 3, 4))
B = np.array([[0, 0, 0], [2, 2, 2]])
C = np.zeros((2, 3))
for i in range(2):
    for j in range(3):
        C[i, j] = A[i, j, B[i, j]]
Is there a more efficient way of doing this?
Use NumPy's built-in routine np.fromfunction and turn your code into:
C = np.fromfunction(lambda i, j: A[i, j, B[i, j]], B.shape, dtype=int)
Note that np.fromfunction passes whole index arrays (not scalars) to the lambda, so dtype=int is needed for them to be usable as indices, and the shape argument should match B's shape.
Setup:
import numpy as np
k,l,m = 2,3,4
a = np.arange(k*l*m).reshape(k,l,m)
b = np.random.randint(0,4,(k,l))
print(a)
print('*'*10)
print(b)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
**********
[[3 0 3]
[2 1 2]]
Use integer indexing to select the values:
x,y = np.indices(a.shape[:-1])
c = a[x,y,b]
print(c)
[[ 3 4 11]
[14 17 22]]
Alternatively, using numpy.ix_:
x,y = np.ix_(np.arange(a.shape[0]),np.arange(a.shape[1]))
d = a[x,y,b]
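As a quick sanity check (reusing a, b, c, d, k and l from above), both fancy-indexing variants agree with the plain double loop from the question:
C = np.zeros((k, l), dtype=a.dtype)
for i in range(k):
    for j in range(l):
        C[i, j] = a[i, j, b[i, j]]

assert np.array_equal(C, c)
assert np.array_equal(C, d)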
I was trying to do boolean indexing with an "or" condition on a NumPy array, but I cannot find a good way to do it.
The element-wise "and" operator & works properly, like this:
X = np.arange(25).reshape(5, 5)
# We print X
print()
print('Original X = \n', X)
print()
X[(X > 10) & (X < 17)] = -1
# We print X
print()
print('X = \n', X)
print()
Original X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 -1 -1 -1 -1]
[-1 -1 17 18 19]
[20 21 22 23 24]]
But when I try with:
X = np.arange(25).reshape(5, 5)
# We use boolean indexing to set the elements that are less than 10 or greater than 20 to 0
X[ (X < 10) or (X > 20) ] = 0 # No or condition possible!?!
I got the error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Is there any good way to use the "or" logical operator here?
You can use numpy.logical_or for that task in the following way:
import numpy as np
X = np.arange(25).reshape(5,5)
X[np.logical_or(X<10,X>20)] = 0
print(X)
Output:
[[ 0 0 0 0 0]
[ 0 0 0 0 0]
[10 11 12 13 14]
[15 16 17 18 19]
[20 0 0 0 0]]
There are also numpy.logical_and, numpy.logical_xor and numpy.logical_not.
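For what it's worth, the element-wise | operator (the "or" counterpart of the & you already used) gives the same result, as long as each comparison is wrapped in parentheses:
X = np.arange(25).reshape(5, 5)
X[(X < 10) | (X > 20)] = 0
print(X)   # same output as the logical_or version above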
I would use something with np.logical_and and np.where.
For your given example, I believe this would work.
X = np.arange(25).reshape(5, 5)
i = np.where(np.logical_and(X > 10 , X < 17))
X[i] = -1
This is not a very Pythonic answer, but it's pretty clear.
I am trying to multiply values along the diagonals of a pandas DataFrame (each element times the elements down and to the right of it on the same diagonal) and I am not sure how to proceed in a computationally reasonable way.
df = [ 3  4  5
       6  7  8
       9 10 11]

output_df = [231 32  5
              60 77  8
               9 10 11]
Explanation: I am looking for 3 * 7 * 11 for the first element, 4 * 8 for the second element, 7 * 11 for the fifth element, etc.
Note: The matrix I am working on is not a square matrix, but a rectangular matrix.
Here's one based on NumPy -
import numpy as np
import pandas as pd

def cumprod_upper_diag(a):
    # in-place reverse cumulative product along every diagonal strictly above the main one
    m, n = a.shape
    mask = ~np.tri(m, n, dtype=bool)
    p = np.ones((m, n), dtype=a.dtype)
    p[mask[:, ::-1]] = a[mask]
    a[mask] = p[::-1].cumprod(0)[::-1][mask[:, ::-1]]
    return a

a = df.to_numpy(copy=False)  # For older versions: a = df.values
out = a.copy()
cumprod_upper_diag(out)      # diagonals above the main diagonal
cumprod_upper_diag(out.T)    # diagonals below it, via the transposed view
# main diagonal: reverse cumulative product along the diagonal itself
out.ravel()[::a.shape[1]+1] = out.ravel()[::out.shape[1]+1][::-1].cumprod()[::-1]
out_df = pd.DataFrame(out)
You can use a sparse diagonal matrix here with a little finagling. This assumes all elements in your original matrix are non-zero, otherwise it will not work.
import numpy as np
from scipy import sparse

a = df.to_numpy()
b = sparse.dia_matrix(a)     # each diagonal of a becomes one row of b.data (zero-padded)
c = b.data[:, ::-1]          # reverse so the product accumulates from the bottom-right end of each diagonal
cp = np.cumprod(np.where(c != 0, c, 1), axis=1)   # treat the zero padding as 1 so it doesn't kill the product
b.data = cp[:, ::-1]
b.A
array([[231, 32, 5],
[ 60, 77, 8],
[ 9, 10, 11]], dtype=int64)
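If you want a DataFrame back (as in the question), wrapping the dense result is enough; a small optional follow-up, assuming pandas is imported as pd:
import pandas as pd

output_df = pd.DataFrame(b.toarray(), index=df.index, columns=df.columns)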
As Chris mentioned, this is cumprod in reverse order:
# stack for groupby
new_df = df.stack().reset_index()[::-1]
# diagonals meaning col_num - row_num are the same
diags = new_df['level_0']-new_df['level_1']
# groupby diagonals
new_df['out'] = new_df.groupby(diags)[0].cumprod()
# pivot to get the original shape
new_df.pivot('level_0', 'level_1', 'out')
output:
level_1 0 1 2
level_0
0 231 32 5
1 60 77 8
2 9 10 11
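Note that on newer pandas versions pivot may no longer accept positional arguments, in which case the last line needs keywords instead:
new_df.pivot(index='level_0', columns='level_1', values='out')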
Here's a method that operates on the DataFrame in place.
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[3, 4, 5], [6, 7, 8], [9, 10, 11]])
m, n = df.shape
for i in range(-m + 1, n):
    # row/column bounds of the block whose main diagonal is the i-th diagonal of df
    ri, rj = max(-i, 0), min(m - 1, n - i - 1)
    ci, cj = max( i, 0), min(n - 1, m + i - 1)
    np.fill_diagonal(df.values[ri:rj + 1, ci:cj + 1],
                     df.values.diagonal(i)[::-1].cumprod()[::-1])
print(df)
Result:
0 1 2
0 231 32 5
1 60 77 8
2 9 10 11
I have a 2D array of shape (50, 50). I need to subtract a value from each column of this array (skipping the first), where the value is calculated based on the index of the column. For example, using a for loop it would look something like this:
for idx in range(1, A[0, :].shape[0]):
    A[0, idx] -= idx * (...)  # simple calculations with idx
Now, of course this works fine, but it's very slow, and performance is critical for my application. I've tried computing the values to be subtracted using np.fromfunction() and then subtracting them from the original array, but the results are different from those obtained by the iterative for-loop subtraction:
func = lambda i, j: j * (...) #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (1,50))
A[0, 1:] -= subtraction_matrix
What am I doing wrong? Or is there some other method that would be better? Any help is appreciated!
All your code snippets indicate that you require the subtraction to happen only in the first row of A (though you've not explicitly mentioned that). So, I'm proceeding with that understanding.
Referring to your use of fromfunction(), you can use the subtraction_matrix as below:
A[0,1:] -= subtraction_matrix[1:]
Testing it out (assuming shape (5,5) instead of (50,50)):
import numpy as np
A = np.arange(25).reshape(5,5)
print (A)
func = lambda j: j * 10 #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (5,), dtype=A.dtype)
A[0,1:] -= subtraction_matrix[1:]
print (A)
Output:
[[ 0 1 2 3 4] # print(A), before subtraction
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
[[ 0 -9 -18 -27 -36] # print(A), after subtraction
[ 5 6 7 8 9]
[ 10 11 12 13 14]
[ 15 16 17 18 19]
[ 20 21 22 23 24]]
If you want the subtraction to happen in all the rows of A, you just need to use the line A[:,1:] -= subtraction_matrix[1:] instead of the line A[0,1:] -= subtraction_matrix[1:], as sketched below.
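A minimal sketch of that all-rows variant, reusing the subtraction_matrix built above on a fresh copy of A:
A = np.arange(25).reshape(5, 5)
A[:, 1:] -= subtraction_matrix[1:]
print(A)
# [[  0  -9 -18 -27 -36]
#  [  5  -4 -13 -22 -31]
#  [ 10   1  -8 -17 -26]
#  [ 15   6  -3 -12 -21]
#  [ 20  11   2  -7 -16]]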