How to efficiently subtract values from each column with numpy - python

I have a 2D array of shape (50,50). I need to subtract a value from each column of this array (skipping the first), which is calculated based on the index of the column. For example, using a for loop it would look something like this:
for idx in range(1, A[0, :].shape[0]):
    A[0, idx] -= idx * (...)  # simple calculations with idx
Now, of course this works fine, but it's very slow, and performance is critical for my application. I've tried computing the values to be subtracted using np.fromfunction() and then subtracting them from the original array, but the results are different from those obtained by the iterative for-loop subtraction:
func = lambda i, j: j * (...) #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (1,50))
A[0, 1:] -= subtraction_matrix
What am I doing wrong? Or is there some other method that would be better? Any help is appreciated!

All your code snippets indicate that you require the subtraction to happen only in the first row of A (though you've not explicitly mentioned that). So, I'm proceeding with that understanding.
Referring to your use of np.fromfunction(), you can use the subtraction_matrix as below:
A[0,1:] -= subtraction_matrix[1:]
Testing it out (assuming shape (5,5) instead of (50,50)):
import numpy as np
A = np.arange(25).reshape(5,5)
print (A)
func = lambda j: j * 10 #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (5,), dtype=A.dtype)
A[0,1:] -= subtraction_matrix[1:]
print (A)
Output:
[[ 0  1  2  3  4]     # print(A), before subtraction
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
[[  0  -9 -18 -27 -36]     # print(A), after subtraction
 [  5   6   7   8   9]
 [ 10  11  12  13  14]
 [ 15  16  17  18  19]
 [ 20  21  22  23  24]]
If you want the subtraction to happen in all the rows of A, you just need to use the line A[:,1:] -= subtraction_matrix[1:] instead of the line A[0,1:] -= subtraction_matrix[1:].
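For what it's worth, if the per-column term really is just a simple function of the column index (the examples above use idx * 10 as a stand-in), you can also build the subtraction vector with plain np.arange broadcasting and skip np.vectorize, which still loops in Python under the hood. A sketch under that assumption:
import numpy as np

A = np.arange(25).reshape(5, 5)

cols = np.arange(1, A.shape[1])   # column indices 1..N-1
A[0, 1:] -= cols * 10             # hypothetical per-column term, as in the example above
# to apply it to every row instead: A[:, 1:] -= cols * 10
print(A)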


How to select pairs of values in an array according to a given sequence for all matrix shapes?

We are trying to select values from matrices into pairs, where the values are selected diagonally. My code doesn't work as it should.
You can see the sequence in the example below. The values are selected sequentially in a cross form: it starts with the first value of the penultimate row and pairs it with the second value of the last row. It then moves one row up and continues in the same way.
In the 1st example, the principle is that it takes cross values: 21 -> 32, then 11 -> 22, 11 -> 33, 22 -> 33, 12 -> 23 and so on for all matrices. The same goes for the second example.
code:
import numpy as np

a = np.array([[11,12,13],
              [21,22,23],
              [31,32,33]])
w, h = a.shape

for y0 in range(1, h):
    y = h - y0 - 1
    for x in range(h - y - 1):
        print( a[y+x,x], a[y+x+1,x+1] )

for x in range(1, w-1):
    for y in range(w-x-1):
        print( a[y,x+y], a[y+1,x+y+1] )
my output:
21 32
11 22
22 33
12 23
required output
21 32
11 22
11 33
22 33
12 23
However, if I use this matrix, for example, it will throw me an error.
a = np.array([[11,12,13,14,15,16],
              [21,22,23,24,25,26],
              [31,32,33,34,35,36]])
required output
21 32
11 22
11 33
22 33
12 23
12 34
23 34
13 24
13 35
24 35
14 25
14 36
25 36
15 26
my output
error
File "C:\Users\Pifkoooo\dp\skuska.py", line 24, in <module>
print( a[y+x,x], a[y+x+1,x+1] )
IndexError: index 2 is out of bounds for axis 0 with size 2
Can anyone advise me how to solve this problem and generalize it to work on all matrices with different shapes? Or if there is another way to approach this task?
Let's look for patterns (like here, but simpler)! Let's say that you have an array of shape (M, N), with M=4 and N=5. First, note the linear indices of the elements:
i =
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
Once you have identified the first element in a pair, the linear index of the next element is just i + N + 1.
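For instance, sticking with the hypothetical M=4, N=5 layout:
M, N = 4, 5
i = 7                          # linear index of the element at row 1, col 2
print(divmod(i, N))            # (1, 2)
print(divmod(i + N + 1, N))    # (2, 3): one step down the diagonal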
Now let's try to establish the path of the first element using the example in the linked question. First, look at the column indices and the row indices:
x =
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
y =
0 0 0 0 0
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
Now take the difference, and add a factor to account for the shape:
x - y + 2M - N =
3 4 5 6 7
2 3 4 5 6
1 2 3 4 5
0 1 2 3 4
The first element follows the index of the diagonals except at the bottom row and rightmost column. If you can stably argsort this array (np.argsort has a kind='stable' option that uses timsort) and apply that ordering to the linear indices, you have the path taken by the first element of every pair for any matrix at all. The first observation will then yield the second element.
So it all boils down to this:
M, N = a.shape
path = (np.arange(N - 1) - np.arange(M - 1)[:, None] + 2 * M - N).argsort(None, kind='stable')
indices = np.arange(M * N).reshape(M, N)[:-1, :-1].ravel()[path]
Now you have a couple of different options going forward:
Apply linear indices to the raveled a:
result = a.ravel()[indices[:, None] + [0, N + 1]]
Preserve the shape of a and use np.unravel_index to transform indices and indices + N + 1 into a 2D index:
result = a[np.unravel_index(indices[:, None] + [0, N + 1], a.shape)]
Moral of the story: this is all black magic!
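As a quick check, here is the recipe run on the hypothetical M=4, N=5 layout from above, assuming a = np.arange(20).reshape(4, 5) so every value equals its linear index and the traversal order is easy to read off:
import numpy as np

a = np.arange(20).reshape(4, 5)   # values equal their linear indices
M, N = a.shape

path = (np.arange(N - 1) - np.arange(M - 1)[:, None] + 2 * M - N).argsort(None, kind='stable')
indices = np.arange(M * N).reshape(M, N)[:-1, :-1].ravel()[path]
result = a.ravel()[indices[:, None] + [0, N + 1]]

print(result[:4])
# [[10 16]
#  [ 5 11]
#  [11 17]
#  [ 0  6]]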
Probably not the best performance, but it gets the job done if order does not matter. Iterate over all elements and try to access each of their diagonal partners. If a diagonal partner does not exist, catch the raised IndexError and continue with the next element.
import numpy as np

def print_diagonal_pairs(a):
    rows, cols = a.shape
    for row in range(rows):
        for col in range(cols):
            max_shift_amount = min(rows, cols) - min(row, col)
            for shift_amount in range(1, max_shift_amount+1):
                try:
                    print(a[row, col], a[row+shift_amount, col+shift_amount])
                except IndexError:
                    continue

a = np.array([
    [11,12,13],
    [21,22,23],
    [31,32,33],
])
print_diagonal_pairs(a)
# Output:
11 22
11 33
12 23
21 32
22 33
b = np.array([
    [11,12,13,14,15,16],
    [21,22,23,24,25,26],
    [31,32,33,34,35,36]
])
print_diagonal_pairs(b)
# Output:
11 22
11 33
12 23
12 34
13 24
13 35
14 25
14 36
15 26
21 32
22 33
23 34
24 35
25 36
Not a solution, but I think you can use fancy indexing for this task. In the code snippet below I am selecting the indices x = [[0,1], [0,2], [1,2]] along the first axis. These indices will be broadcast against the indices in y along the first dimension.
import numpy as np
from itertools import combinations

a = np.array([[11,12,13,14,15,16],
              [21,22,23,24,25,26],
              [31,32,33,34,35,36]])

x = np.array(list(combinations(range(a.shape[0]), 2)))
y = x + np.arange(a.shape[1]-2)[:,None,None]
a[x,y].reshape(-1,2)
Output:
array([[11, 22],
[11, 33],
[22, 33],
[12, 23],
[12, 34],
[23, 34],
[13, 24],
[13, 35],
[24, 35],
[14, 25],
[14, 36],
[25, 36]])
This will select all correct values except for the start and end values for the second example. There is probably a smart way to include these edge values and select all values in one sweep, but I cannot think of a solution for this atm.
I thought the pattern was to select combinations of size 2 along each diagonal, but apparently not - so this solution will not give the correct "middle" values in your first example.
EDIT
You could extend the selection range and modify the two edge values:
x = np.array(list(combinations(range(a.shape[0]), 2)))
y = x + np.arange(-1,a.shape[1]-1)[:,None,None]
# assign edge values
y[0] = y[1][0]
y[-1] = y[-2][-1]
a[x,y].reshape(-1,2)[2:-2]
Output:
array([[21, 32],
[11, 22],
[11, 33],
[22, 33],
[12, 23],
[12, 34],
[23, 34],
[13, 24],
[13, 35],
[24, 35],
[14, 25],
[14, 36],
[25, 36],
[15, 26]])
My original answer was for the case in the original question where the pairs slid along the diagonals rather than spreading across them with the first point staying anchored. While the solution is not exactly the same, the concept of computing indices in a vectorized manner applies here too.
Start with the matrix of row minus column which gives diagonals as before:
diag = np.arange(1, N) - np.arange(1, M)[:, None] + 2 * M - N
This shows that the second element is given by
second = a[1:, 1:].ravel()[diag.argsort(None, kind='stable')]
The heads of the diagonals are the first column in reverse and the first row. If you index them correctly, you get the first element of each pair:
head = np.r_[a[::-1, 0], a[0, 1:]]
first = head[np.sort(diag, axis=None)]
Now you can just concatenate the result:
result = np.stack((first, second), axis=-1)
See: black magic! And totally vectorized.

Deleting certain elements from a matrix

I have the following problem:
I have a matrix. Now, I want to delete one entry in each row of the matrix: In rows that contain a certain number (say 4) I want to delete the entry with that number, and in other rows I simply want to delete the last element.
E.g. if I have the matrix
matrix=np.zeros((2,2))
matrix[0,0]=2
matrix[1,0]=4
matrix
which gives
2 0
4 0
after the deletion it should simply be
2
0
thanks for your help!
So, assuming there's at most one 4 per row, what you want to do is:
iterate over all rows, and if a row contains a 4, use roll so that the 4 becomes the last element
delete the last column
In rows that have a 4, this deletes the 4 and shifts left the remaining values that come after it;
in rows that don't have a 4, it deletes the last element.
(I took the liberty of trying with a little bigger matrix just to make sure output is as expected)
try this:
import numpy as np
# Actual solution
def remove_in_rows(mat, num):
    for i, row in enumerate(mat):
        if num in row.tolist():
            index = row.tolist().index(num)
            mat[i][index:] = np.roll(row[index:], -1)
    return np.delete(mat, -1, 1)
# Just some example to demonstrate it works
matrix = np.array([[10 * y + x for x in range(6)] for y in range(6)])
matrix[1, 2] = 4
matrix[3, 3] = 4
matrix[4, 0] = 4
print("BEFORE:")
print(matrix)
matrix = remove_in_rows(matrix, 4)
print("AFTER:")
print(matrix)
Output:
BEFORE:
[[ 0 1 2 3 4 5]
[10 11 4 13 14 15]
[20 21 22 23 24 25]
[30 31 32 4 34 35]
[ 4 41 42 43 44 45]
[50 51 52 53 54 55]]
AFTER:
[[ 0 1 2 3 5]
[10 11 13 14 15]
[20 21 22 23 24]
[30 31 32 34 35]
[41 42 43 44 45]
[50 51 52 53 54]]
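If the Python loop over rows ever becomes a bottleneck, a loop-free variant of the same idea is sketched below (remove_in_rows_vectorized is a hypothetical helper, not part of the answer above): build a boolean mask that drops exactly one element per row, either the first occurrence of num or, failing that, the last column.
import numpy as np

def remove_in_rows_vectorized(mat, num):
    # Sketch: assumes at most one `num` per row, as above.
    rows, cols = mat.shape
    has_num = (mat == num).any(axis=1)
    # Column to drop in each row: the first `num` if present, else the last column.
    drop_col = np.where(has_num, (mat == num).argmax(axis=1), cols - 1)
    keep = np.ones(mat.shape, dtype=bool)
    keep[np.arange(rows), drop_col] = False
    # Boolean indexing flattens row by row, so the reshape keeps row order intact.
    return mat[keep].reshape(rows, cols - 1)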

pyfinance module. rolling OLS - min window needed

I'm trying to do a simple linear regression using the pyfinance package, using PandasRollingOLS to get rolling regression betas (rolling with a min_window option).
It works, but I would like to have a min_window in the function.
I would like to have min_window in the rolling OLS function because, with a window of 90, it does not perform OLS on the first 90 values. I would like to perform an expanding OLS, starting once there are at least 12 observations (min_window) and expanding up to 90 observations, and then a rolling OLS with a window of 90.
I tried to understand the code of the package, but I'm not able to include min_window in it.
I would like this kind of function (this is the __init__ of the PandasRollingOLS class):
def __init__(self, y, x=None, window=None, **min_window=None**, has_const=False, use_const=True):
I think I should update the code of utils.rolling_windows posted below; can someone help me please?
def rolling_windows(a, window):
    """Creates rolling-window 'blocks' of length `window` from `a`.

    Note that the orientation of rows/columns follows that of pandas.

    Example
    -------
    import numpy as np
    onedim = np.arange(20)
    twodim = onedim.reshape((5,4))

    print(twodim)
    [[ 0  1  2  3]
     [ 4  5  6  7]
     [ 8  9 10 11]
     [12 13 14 15]
     [16 17 18 19]]

    print(rwindows(onedim, 3)[:5])
    [[0 1 2]
     [1 2 3]
     [2 3 4]
     [3 4 5]
     [4 5 6]]

    print(rwindows(twodim, 3)[:5])
    [[[ 0  1  2  3]
      [ 4  5  6  7]
      [ 8  9 10 11]]

     [[ 4  5  6  7]
      [ 8  9 10 11]
      [12 13 14 15]]

     [[ 8  9 10 11]
      [12 13 14 15]
      [16 17 18 19]]]
    """

    if window > a.shape[0]:
        raise ValueError('Specified `window` length of {0} exceeds length of'
                         ' `a`, {1}.'.format(window, a.shape[0]))
    if isinstance(a, (Series, DataFrame)):
        a = a.values
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = (a.shape[0] - window + 1, window) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    windows = np.squeeze(np.lib.stride_tricks.as_strided(a, shape=shape,
                                                         strides=strides))
    # In cases where window == len(a), we actually want to "unsqueeze" to 2d.
    # I.e., we still want a "windowed" structure with 1 window.
    if windows.ndim == 1:
        windows = np.atleast_2d(windows)
    return windows
thank you all!
Alessandro
I am struggling with this myself at the moment using PandasRollingOLS. My temporary conclusion is to simply take care of it before the regression, i.e. to drop every column with fewer than min_window valid observations before running the regressions.
min_window = 3
df.loc[:,~(df.rolling(min_window).count() < min_window).all()]
Note that it requires that your dataframe has NaNs (which is why I guess you want to have a min_window):
NaN NaN
0.5 NaN
0.8 NaN
0.7 0.5
0.6 0.4
This might be a temporary (ugly) solution until a Python guru stumbles upon your post.
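For illustration, a minimal sketch of that filtering step on a made-up two-column frame shaped like the table above (note that newer pandas versions apply min_periods to rolling(...).count() as well, so it is pinned to 0 here so every position gets a count):
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the example above: column 'b' never
# accumulates min_window consecutive non-NaN observations.
df = pd.DataFrame({
    'a': [np.nan, 0.5, 0.8, 0.7, 0.6],
    'b': [np.nan, np.nan, np.nan, 0.5, 0.4],
})

min_window = 3
counts = df.rolling(min_window, min_periods=0).count()
filtered = df.loc[:, ~(counts < min_window).all()]
print(list(filtered.columns))   # ['a'] -- 'b' is dropped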

Using generator items selectively

Let's say I have some arrays/lists that contain a lot of values, so that loading several of them into memory at once would result in a memory error. One way to circumvent this is to load these arrays/lists lazily via a generator, and then use them when needed. However, with generators you don't have as much control as with arrays/lists - and that is my problem.
Let me explain.
As an example I have the following code, which produces a generator with some small lists. So yeah, this is not memory intensive at all, just an example:
import numpy as np
np.random.seed(10)
number_of_lists = range(0, 5)
generator_list = (np.random.randint(0, 10, 10) for i in number_of_lists)
If I iterate over this generator I get the following:
for i in generator_list:
    print(i)
>> [9 4 0 1 9 0 1 8 9 0]
>> [8 6 4 3 0 4 6 8 1 8]
>> [4 1 3 6 5 3 9 6 9 1]
>> [9 4 2 6 7 8 8 9 2 0]
>> [6 7 8 1 7 1 4 0 8 5]
What I would like to do is sum element wise for all the lists (axis = 0). So the above should in turn result in:
[36, 22, 17, 17, 28, 16, 28, 31, 29, 14]
To do this I could use the following:
sum = [0]*10
for i in generator_list:
    sum += i
where 10 is the length of one of the lists.
So far so good. I am not sure if there is a better/more optimized way of doing it, but it works.
My problem is that I would like to determine which lists in the generator_list I want to use. For example, what if I wanted to sum two of the first [0] list, one of the third, and 2 of the last, i.e.:
[9 4 0 1 9 0 1 8 9 0]
[9 4 0 1 9 0 1 8 9 0]
[4 1 3 6 5 3 9 6 9 1]
[6 7 8 1 7 1 4 0 8 5]
[6 7 8 1 7 1 4 0 8 5]
>> [34, 23, 19, 10, 35, 5, 19, 22, 43, 11]
How would I go about doing that ?
And before any questions arise why I want to do it this way, the reason is that in my real case, getting the arrays into the generator takes some time. I could then in principle just generate a new generator where I put in the order of lists as seen in the new list, but again, that would mean I would have to wait to get them in a new generator. And if this is to happen thousands of times (as seen with bootstrapping), well, it would take some time. With the first generator I have ALL lists that are available. Now I just wish to use them selectively so I don't have to create a new generator every time I want to mix it up, and sum a new set of arrays/lists.
import numpy as np

np.random.seed(10)
number_of_lists = range(5)
generator_list = (np.random.randint(0, 10, 10) for i in number_of_lists)

indices = [0, 0, 2, 4, 4]
assert sorted(indices) == indices, "only works for sorted list"

# sum_ = [0] * 10
# I prefer this:
sum_ = np.zeros((10,), dtype=int)

generator_index = -1
for index in indices:
    while generator_index < index:
        vector = next(generator_list)
        generator_index += 1
    sum_ += vector
print(sum_)
outputs
[34 23 19 10 37 5 19 22 43 11]
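If the indices are not sorted (or you don't want to rely on that), a sketch of an alternative under the same setup is to walk the generator once and weight each produced array by how often its index was requested; the indices list below is made up, and the trade-off is that the whole generator gets consumed:
from collections import Counter

import numpy as np

np.random.seed(10)
number_of_lists = range(5)
generator_list = (np.random.randint(0, 10, 10) for i in number_of_lists)

indices = [4, 0, 2, 0, 4]        # unsorted, with repeats
counts = Counter(indices)        # multiplicity of each requested list

sum_ = np.zeros(10, dtype=int)
for position, vector in enumerate(generator_list):
    sum_ += counts.get(position, 0) * vector
print(sum_)                      # same totals as the sorted version above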

In Python how do you split a list into evenly sized chunks starting with the last element from the previous chunk?

What would be the most pythonic way to convert a list like:
mylist = [0,1,2,3,4,5,6,7,8]
into chunks of n elements that always start with the last element of the previous chunk.
The last element of the last chunk should be identical to the first element of the first chunk to make the data structure circular.
Like:
[
[0,1,2,3],
[3,4,5,6],
[6,7,8,0],
]
under the assumption that len(mylist) % (n-1) == 0, so that it always works out nicely.
What about the straightforward solution?
splitlists = [mylist[i:i+n] for i in range(0, len(mylist), n-1)]
splitlists[-1].append(splitlists[0][0])
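A quick check with the list and n = 4 from the question:
mylist = [0, 1, 2, 3, 4, 5, 6, 7, 8]
n = 4

splitlists = [mylist[i:i + n] for i in range(0, len(mylist), n - 1)]
splitlists[-1].append(splitlists[0][0])
print(splitlists)   # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 0]]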
A much less straightforward solution involving numpy (for the sake of overkill):
from numpy import arange, roll, column_stack
n = 4
values = arange(10, 26)
# values -> [10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25]
idx = arange(0, values.size, n) # [ 0 4 8 12]
idx = roll(idx, -1) # [ 4 8 12 0]
col = values[idx] # [14 18 22 10]
values = column_stack( (values.reshape(n, -1), col) )
[[10 11 12 13 14]
[14 15 16 17 18]
[18 19 20 21 22]
[22 23 24 25 10]]
