I'm trying to select values from a matrix into pairs, picking the values diagonally, but my code doesn't work as it should.
You can see the sequence in the example below. The values are selected in a cross pattern: it starts with the first value of the penultimate row and pairs it with the second value of the last row, then moves one row up and continues the same way. In the first example the pairs are 21->32, then 11->22, 11->33, 22->33, 12->23, and so on across the whole matrix. The same applies to the second example.
code:
import numpy as np

a = np.array([[11, 12, 13],
              [21, 22, 23],
              [31, 32, 33]])
w, h = a.shape
for y0 in range(1, h):
    y = h - y0 - 1
    for x in range(h - y - 1):
        print(a[y+x, x], a[y+x+1, x+1])
for x in range(1, w-1):
    for y in range(w-x-1):
        print(a[y, x+y], a[y+1, x+y+1])
my output:
21 32
11 22
22 33
12 23
required output
21 32
11 22
11 33
22 33
12 23
However, if I use this matrix, for example, it throws an error:
a = np.array([[11, 12, 13, 14, 15, 16],
              [21, 22, 23, 24, 25, 26],
              [31, 32, 33, 34, 35, 36]])
required output
21 32
11 22
11 33
22 33
12 23
12 34
23 34
13 24
13 35
24 35
14 25
14 36
25 36
15 26
my output
error
File "C:\Users\Pifkoooo\dp\skuska.py", line 24, in <module>
print( a[y+x,x], a[y+x+1,x+1] )
IndexError: index 2 is out of bounds for axis 0 with size 2
Can anyone advise me how to solve this problem and generalize it to work on all matrices with different shapes? Or if there is another way to approach this task?
Let's look for patterns (like here, but simpler)! Say you have an array of shape (M, N), with M=4 and N=5. First, note the linear indices of the elements:
i =
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
Once you have identified the first element in a pair, the linear index of the next element is just i + N + 1.
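As a quick sanity check of that claim (not part of the solution itself), for M=4 and N=5:

```python
import numpy as np

M, N = 4, 5
i = np.arange(M * N).reshape(M, N)  # linear indices laid out as above

# element 6 sits at (row 1, col 1); its diagonal partner one step
# down-right sits at (row 2, col 2), i.e. linear index 6 + N + 1 = 12
print(i[1, 1], i[2, 2])  # 6 12
```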
Now let's try to establish the path of the first element using the example in the linked question. First, look at the column indices and the row indices:
x =
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
y =
0 0 0 0 0
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
Now take the difference, and add a factor to account for the shape:
x - y + 2M - N =
3 4 5 6 7
2 3 4 5 6
1 2 3 4 5
0 1 2 3 4
The first element follows the index of the diagonals except at the bottom row and rightmost column. If you stably argsort this array (np.argsort supports kind='stable', which uses timsort), then apply that index to the linear indices, you have the path taken by the first element of every pair for any matrix at all. The first observation will then yield the second element.
So it all boils down to this:
M, N = a.shape
path = (np.arange(N - 1) - np.arange(M - 1)[:, None] + 2 * M - N).argsort(None, kind='stable')
indices = np.arange(M * N).reshape(M, N)[:-1, :-1].ravel()[path]
Now you have a couple of different options going forward:
Apply linear indices to the raveled a:
result = a.ravel()[indices[:, None] + [0, N + 1]]
Preserve the shape of a and use np.unravel_index to transform indices and indices + N + 1 into a 2D index:
result = a[np.unravel_index(indices[:, None] + [0, N + 1], a.shape)]
Moral of the story: this is all black magic!
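Put together as a runnable sketch for the 3x3 example (with kind='stable' passed explicitly, since the default quicksort is not guaranteed stable):

```python
import numpy as np

a = np.array([[11, 12, 13],
              [21, 22, 23],
              [31, 32, 33]])
M, N = a.shape

# diagonal labels of the top-left (M-1, N-1) block, stably argsorted
path = (np.arange(N - 1) - np.arange(M - 1)[:, None]
        + 2 * M - N).argsort(None, kind='stable')
# linear indices of each pair's first element, in path order
indices = np.arange(M * N).reshape(M, N)[:-1, :-1].ravel()[path]
result = a.ravel()[indices[:, None] + [0, N + 1]]
print(result)
```

For this input the pairs come out as (21, 32), (11, 22), (22, 33), (12, 23), i.e. the pairs sliding along each diagonal.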
Probably not the best performance, but it gets the job done if order does not matter. Iterate over all elements and try to access all of its diagonal partners. If the diagonal partner does not exist catch the raised IndexError and continue with the next element.
def print_diagonal_pairs(a):
    rows, cols = a.shape
    for row in range(rows):
        for col in range(cols):
            max_shift_amount = min(rows, cols) - min(row, col)
            for shift_amount in range(1, max_shift_amount + 1):
                try:
                    print(a[row, col], a[row+shift_amount, col+shift_amount])
                except IndexError:
                    continue

a = np.array([
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
])
print_diagonal_pairs(a)
# Output:
11 22
11 33
12 23
21 32
22 33
b = np.array([
    [11, 12, 13, 14, 15, 16],
    [21, 22, 23, 24, 25, 26],
    [31, 32, 33, 34, 35, 36],
])
print_diagonal_pairs(b)
# Output:
11 22
11 33
12 23
12 34
13 24
13 35
14 25
14 36
15 26
21 32
22 33
23 34
24 35
25 36
Not a full solution, but I think you can use fancy indexing for this task. In the code snippet below I am selecting the indices x = [[0,1], [0,2], [1,2]] along the first axis. These indices will be broadcast against the indices in y along the first dimension.
from itertools import combinations
import numpy as np

a = np.array([[11, 12, 13, 14, 15, 16],
              [21, 22, 23, 24, 25, 26],
              [31, 32, 33, 34, 35, 36]])
x = np.array(list(combinations(range(a.shape[0]), 2)))
y = x + np.arange(a.shape[1] - 2)[:, None, None]
a[x, y].reshape(-1, 2)
Output:
array([[11, 22],
       [11, 33],
       [22, 33],
       [12, 23],
       [12, 34],
       [23, 34],
       [13, 24],
       [13, 35],
       [24, 35],
       [14, 25],
       [14, 36],
       [25, 36]])
This will select all correct values except for the start and end values for the second example. There is probably a smart way to include these edge values and select all values in one sweep, but I cannot think of a solution for this atm.
I thought the pattern was to select combinations of size 2 along each diagonal, but apparently not - so this solution will not give the correct "middle" values in your first example.
EDIT
You could extend the selection range and modify the two edge values:
x = np.array(list(combinations(range(a.shape[0]), 2)))
y = x + np.arange(-1,a.shape[1]-1)[:,None,None]
# assign edge values
y[0] = y[1][0]
y[-1] = y[-2][-1]
a[x,y].reshape(-1,2)[2:-2]
Output:
array([[21, 32],
       [11, 22],
       [11, 33],
       [22, 33],
       [12, 23],
       [12, 34],
       [23, 34],
       [13, 24],
       [13, 35],
       [24, 35],
       [14, 25],
       [14, 36],
       [25, 36],
       [15, 26]])
My original answer was for the case in the original question where the pairs slid along the diagonals rather than spreading across them with the first point staying anchored. While the solution is not exactly the same, the concept of computing indices in a vectorized manner applies here too.
Start with the matrix of column index minus row index, which labels the diagonals as before:
diag = np.arange(1, N) - np.arange(1, M)[:, None] + 2 * M - N
This shows that the second element is given by
second = a[1:, 1:].ravel()[diag.argsort(None, kind='stable')]
The heads of the diagonals are the first column in reverse and the first row. If you index them correctly, you get the first element of each pair:
head = np.r_[a[::-1, 0], a[0, 1:]]
first = head[np.sort(diag, axis=None) + N - M - 1]  # shift diagonal labels to positions in head
Now you can just concatenate the result:
result = np.stack((first, second), axis=-1)
See: black magic! And totally vectorized.
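Assembled for the 3x3 example, the steps look like the sketch below. Note the index shift of N - M - 1 when indexing head, an adjustment assumed here to map the diagonal labels (which carry the 2M - N offset) onto positions in head:

```python
import numpy as np

a = np.array([[11, 12, 13],
              [21, 22, 23],
              [31, 32, 33]])
M, N = a.shape

diag = np.arange(1, N) - np.arange(1, M)[:, None] + 2 * M - N
second = a[1:, 1:].ravel()[diag.argsort(None, kind='stable')]
# heads of the diagonals: first column reversed, then the first row
head = np.r_[a[::-1, 0], a[0, 1:]]
# shift the diagonal labels so they index positions in head
first = head[np.sort(diag, axis=None) + N - M - 1]
result = np.stack((first, second), axis=-1)
print(result)
```

For this input the pairs come out as (21, 32), (11, 22), (11, 33), (12, 23), the anchored pairs; per the caveat above, the within-diagonal sliding pairs are not covered.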
I'm trying to create a function which looks at previous rows in a DataFrame and sums them over a set number of rows. Here I have used 3, but ideally I would like to scale it up to look back over more rows. My solution works, but it doesn't seem very efficient. The other criterion is that each time a new team appears the count must start again, so the first row for each new team is always 0. The data will be ordered by team, but a solution that also works when the data isn't in team order would be incredible.
Is there a function in Pandas which could help with this?
So far I've tried the code below and tried googling the issue. The closest example I could find is here, but it groups the index and I'm unsure how to apply it when the value has to keep resetting each time a new team appears, since it wouldn't distinguish between teams.
import numpy as np
import pandas as pd

np.random.seed(0)
data = {'team': ['a','a','a','a','a','a','a','a','b','b',
                 'b','b','b','b','b','b','c','c','c','c','c','c','c','c'],
        'teamPoints': np.random.randint(0, 4, 24)}
df = pd.DataFrame.from_dict(data)
df.reset_index(inplace=True)

def find_sum_last_3(x):
    if x == 0:
        return 0
    elif x == 1:
        return df['teamPoints'][x-1]
    elif x == 2:
        return df['teamPoints'][x-1] + df['teamPoints'][x-2]
    elif df['team'][x] != df['team'][x-1]:
        return 0
    elif df['team'][x] != df['team'][x-2]:
        return df['teamPoints'][x-1]
    elif df['team'][x] != df['team'][x-3]:
        return df['teamPoints'][x-1] + df['teamPoints'][x-2]
    else:
        return (df['teamPoints'][x-1] + df['teamPoints'][x-2]
                + df['teamPoints'][x-3])

df['team_form_3games'] = df['index'].apply(lambda x: find_sum_last_3(x))
The first part of the function addresses the edge cases where a sum of 3 isn't possible because there are fewer than 3 elements.
The second part of the function addresses the problem of the 'team' changing. When the team changes the sum needs to start again, so each 'team' is considered separately.
The final part simply looks at the previous 3 elements of the DataFrame and sums them together.
This example works as expected and gives a new column with expected output as follows:
0, 0, 3, 4, 4, 4, 6, 9, 0, 1, 4, 5, 6, 3, 5, 5, 0, 0, 0, 2, 3, 5, 6, 8
The 1st element is 0 as it is an edge case; the 2nd is 0 because the first element is 0. The 3rd is 3, the sum of the 1st and 2nd elements. The 4th is the sum of the 1st, 2nd, and 3rd; the 5th the sum of the 2nd, 3rd, and 4th; the 6th the sum of the 3rd, 4th, and 5th.
However, this approach is very inefficient, which makes it difficult to scale up to a window of 10 or 15. It is also inelegant, and a new function needs to be written for each window length.
I think you are looking for GroupBy.apply + rolling:
r3=df.groupby('team')['teamPoints'].apply(lambda x: x.rolling(3).sum().shift())
r2=df.groupby('team')['teamPoints'].apply(lambda x: x.rolling(2).sum().shift())
r1=df.groupby('team')['teamPoints'].apply(lambda x: x.shift())
df['team_form_3games'] = r3.fillna(r2.fillna(r1).fillna(0))
print(df)
Output:
index team teamPoints team_form_3games
0 0 a 0 0.0
1 1 a 3 0.0
2 2 a 1 3.0
3 3 a 0 4.0
4 4 a 3 4.0
5 5 a 3 4.0
6 6 a 3 6.0
7 7 a 3 9.0
8 8 b 1 0.0
9 9 b 3 1.0
10 10 b 1 4.0
11 11 b 2 5.0
12 12 b 0 6.0
13 13 b 3 3.0
14 14 b 2 5.0
15 15 b 0 5.0
16 16 c 0 0.0
17 17 c 0 0.0
18 18 c 2 0.0
19 19 c 1 2.0
20 20 c 2 3.0
21 21 c 3 5.0
22 22 c 3 6.0
23 23 c 2 8.0
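A possible variant of the same idea (a sketch, with a small made-up dataset) collapses the three rolling calls into one by using min_periods=1, which also makes scaling the window to 10 or 15 a one-number change:

```python
import pandas as pd

df = pd.DataFrame({
    'team':       ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'teamPoints': [0,   3,   1,   0,   1,   3,   1],
})

window = 3
# rolling(..., min_periods=1) sums whatever history exists within the team,
# shift() excludes the current row, fillna(0) covers each team's first row
df['team_form'] = (df.groupby('team')['teamPoints']
                     .transform(lambda s: s.rolling(window, min_periods=1)
                                           .sum().shift())
                     .fillna(0))
print(df['team_form'].tolist())  # [0.0, 0.0, 3.0, 4.0, 0.0, 1.0, 4.0]
```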
I have a 2D array of shape (50, 50). I need to subtract a value from each column of this array (skipping the first), where the value is calculated based on the index of the column. For example, using a for loop it would look something like this:
for idx in range(1, A[0, :].shape[0]):
    A[0, idx] -= idx * (...)  # simple calculations with idx
Now, of course this works fine, but it's very slow, and performance is critical for my application. I've tried computing the values to be subtracted using np.fromfunction() and then subtracting them from the original array, but the results differ from those obtained by the iterative for-loop subtraction:
func = lambda i, j: j * (...) #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (1,50))
A[0, 1:] -= subtraction_matrix
What am I doing wrong? Or is there some other method that would be better? Any help is appreciated!
All your code snippets indicate that you require the subtraction to happen only in the first row of A (though you haven't explicitly mentioned that), so I'm proceeding with that understanding.
Referring to your use of np.fromfunction(), you can use the subtraction_matrix as below:
A[0,1:] -= subtraction_matrix[1:]
Testing it out (assuming shape (5,5) instead of (50,50)):
import numpy as np
A = np.arange(25).reshape(5,5)
print (A)
func = lambda j: j * 10 #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (5,), dtype=A.dtype)
A[0,1:] -= subtraction_matrix[1:]
print (A)
Output:
[[ 0 1 2 3 4] # print(A), before subtraction
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
[[ 0 -9 -18 -27 -36] # print(A), after subtraction
[ 5 6 7 8 9]
[ 10 11 12 13 14]
[ 15 16 17 18 19]
[ 20 21 22 23 24]]
If you want the subtraction to happen in all the rows of A, just use the line A[:, 1:] -= subtraction_matrix[1:] instead of A[0, 1:] -= subtraction_matrix[1:].
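Since the subtracted values here depend only on the column index, a plain np.arange avoids np.vectorize entirely (a sketch using the same j * 10 stand-in calculation as above):

```python
import numpy as np

A = np.arange(25).reshape(5, 5)
j = np.arange(A.shape[1])   # column indices 0..4
A[0, 1:] -= j[1:] * 10      # same stand-in calculation, fully vectorized
# first row is now [0, -9, -18, -27, -36], matching the fromfunction result
print(A[0])
```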
I have a pandas.DataFrame() object like below
start, end
5, 9
6, 11
13, 11
14, 11
15, 17
16, 17
18, 17
19, 17
20, 24
22, 26
"end" has to always be > "start"
So, I need to filter it from when the "end" values becomes < "start" till the next row where they are again are back to normal.
In above example, I need:
1.
13,11
15,17
2.
18,17
20,24
Edit:
Think of these as timestamps in seconds, so I can see that it took 2 seconds to recover in both scenarios.
I can do this by iterating over the data, but does Pandas have a better way?
You could use pandas' boolean indexing to find the rows where start < end. Then, if you reset the index, you can calculate the difference between the original indices, which mark the bounds of each stretch of rows where start > end.
For example you could do something like the following:
import pandas as pd

# A = starts, B = ends
df = pd.DataFrame({'B': [9, 11, 11, 11, 17, 17, 17, 17, 24, 26],
                   'A': [5, 6, 13, 14, 15, 16, 18, 19, 20, 22]})
# use boolean indexing
df = df[df['A'] < df['B']].reset_index()
# calculate the difference of each row's "old" index to determine the delta
diffs = df['index'].diff()
# create a column to show deltas
df['delta'] = diffs
print(diffs)
print(df)
The diffs data frame looks like:
0    NaN
1    1.0
2    3.0
3    1.0
4    3.0
5    1.0
Name: index, dtype: float64
Notice the NaN value: the diff() method subtracts the previous row from the current row, and the first row has no previous row. If the first n rows all had start > end, you would only need to look at the first value of the index column to calculate that delta.
The fully augmented data frame would then look like:
   index   A   B  delta
0      0   5   9    NaN
1      1   6  11    1.0
2      4  15  17    3.0
3      5  16  17    1.0
4      8  20  24    3.0
5      9  22  26    1.0
If you wish to delete any of the extraneous columns, you can use the del statement, e.g. del df['delta'].
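Building on the delta idea, the rows where a "bad" stretch just ended can be pulled out directly (a sketch, using the A/B column names from the snippet above):

```python
import pandas as pd

df = pd.DataFrame({'B': [9, 11, 11, 11, 17, 17, 17, 17, 24, 26],
                   'A': [5, 6, 13, 14, 15, 16, 18, 19, 20, 22]})

ok = df[df['A'] < df['B']].reset_index()
# a gap larger than 1 in the old index means some A >= B rows were skipped,
# so these rows are where things are back to normal
recovery = ok[ok['index'].diff() > 1]
print(recovery['index'].tolist())  # [4, 8] -> rows (15, 17) and (20, 24)
```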
I need to fetch random numbers from a list of values in Python. I tried the random.choice() function, but it sometimes returns the same value consecutively. I want a new random value from the list each time. Is there any function in Python that allows me to do this?
Create a copy of the list, shuffle it, then pop items from that one by one as you need a new random value:
shuffled = origlist[:]
random.shuffle(shuffled)

def produce_random_value():
    return shuffled.pop()
This is guaranteed to not repeat elements. You can, however, run out of numbers to pick, at which point you could copy again and re-shuffle.
To do this continuously, you could make this a generator function:
def produce_randomly_from(items):
    while True:
        shuffled = list(items)
        random.shuffle(shuffled)
        while shuffled:
            yield shuffled.pop()
then use this in a loop or grab a new value with the next() function:
random_items = produce_randomly_from(inputsequence)
# grab one random value from the sequence
random_item = next(random_items)
Here is an example:
>>> random.sample(range(10), 10)
[9, 5, 2, 0, 6, 3, 1, 8, 7, 4]
Just replace the sequence given by range with the one you want to choose from. The second number is how many samples, and should be the length of the input sequence.
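The same works for any sequence, not just a range; random.sample(seq, len(seq)) is effectively a shuffled copy with no repeats:

```python
import random

items = ['apple', 'banana', 'cherry', 'date']
picks = random.sample(items, len(items))  # every element exactly once
print(picks)
```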
If you just want to avoid consecutive random values, you can try this:
import random

def nonrepeating_rand(n):
    '''Generate random numbers in [0, n) such that no two consecutive numbers are equal.'''
    k = random.randrange(n)
    while True:
        yield k
        k2 = random.randrange(n - 1)
        if k2 >= k:  # skip over the previous number
            k2 += 1
        k = k2
Test:
for i, j in zip(range(25), nonrepeating_rand(3)):
    print(i, j)
prints (for example)
0 1
1 0
2 2
3 0
4 2
5 0
6 2
7 1
8 0
9 1
10 0
11 2
12 0
13 1
14 0
15 2
16 1
17 0
18 2
19 1
20 0
21 2
22 1
23 2
24 0
You can use nonrepeating_rand(len(your_list)) to get random indices for your list.
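For example (repeating the generator here so the sketch is self-contained), drawing items from a list so that no two consecutive picks are equal:

```python
import random

def nonrepeating_rand(n):
    '''Yield values in [0, n) with no two consecutive values equal.'''
    k = random.randrange(n)
    while True:
        yield k
        k2 = random.randrange(n - 1)
        if k2 >= k:  # skip over the previous number
            k2 += 1
        k = k2

colors = ['red', 'green', 'blue']
gen = nonrepeating_rand(len(colors))
picks = [colors[next(gen)] for _ in range(10)]
# consecutive picks always differ, by construction
assert all(x != y for x, y in zip(picks, picks[1:]))
```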