Creating history of data from 2d numpy arrays? - python

Suppose I have a 2-dimensional numpy array of shape n x m (where n is a large number and m >= 1). Each column represents one attribute. An example for n=5, m=3 is provided below:
[[ 1,  2,  3],
 [ 4,  5,  6],
 [ 7,  8,  9],
 [10, 11, 12],
 [13, 14, 15]]
I want to train my model on the history of attributes with history_steps = p (1 < p <= n). For p=2, the output I expect, of shape (n-p+1) x (m*p), is
[[ 1,  4,  2,  5,  3,  6],
 [ 4,  7,  5,  8,  6,  9],
 [ 7, 10,  8, 11,  9, 12],
 [10, 13, 11, 14, 12, 15]]
I tried to implement this in pandas by separating columns and then concatenating outputs.
import pandas as pd

def buff(s, n):
    return pd.concat([s.shift(-i) for i in range(n)], axis=1).dropna().astype(float)
But for my purposes a NumPy-based approach would be better. Also, I would like to avoid splitting and concatenating.
How do I go about doing this?

Here's a NumPy-based approach with focus on performance, using np.lib.stride_tricks.as_strided -
def strided_axis0(a, L=2):
    # INPUTS :
    # a : Input array
    # L : Length along rows to be cut to create each subarray
    # Store shape and strides info
    m, n = a.shape
    s0, s1 = a.strides
    nrows = m - L + 1
    strided = np.lib.stride_tricks.as_strided
    # Finally use strides to get the 3D array view and then reshape
    return strided(a, shape=(nrows, n, L), strides=(s0, s1, s0)).reshape(nrows, -1)
Sample run -
In [27]: a
Out[27]:
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])
In [28]: strided_axis0(a, L=2)
Out[28]:
array([[ 1,  4,  2,  5,  3,  6],
       [ 4,  7,  5,  8,  6,  9],
       [ 7, 10,  8, 11,  9, 12],
       [10, 13, 11, 14, 12, 15]])
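On NumPy 1.20+, np.lib.stride_tricks.sliding_window_view builds the same windowed view without hand-computed strides; a minimal equivalent sketch:

def windowed_axis0(a, L=2):
    # sliding_window_view appends the window axis last: shape (m-L+1, n, L)
    w = np.lib.stride_tricks.sliding_window_view(a, L, axis=0)
    return w.reshape(w.shape[0], -1)

windowed_axis0(a, L=2)   # same output as strided_axis0(a, L=2)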

You can use dstack + reshape:
a = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9],
              [10, 11, 12],
              [13, 14, 15]])
# use `dstack` to stack the two arrays (one with the last row removed, the other with the
# first row removed) along the third axis, and then use reshape to flatten the second and
# third dimensions
np.dstack([a[:-1], a[1:]]).reshape(a.shape[0]-1, -1)
# array([[ 1,  4,  2,  5,  3,  6],
#        [ 4,  7,  5,  8,  6,  9],
#        [ 7, 10,  8, 11,  9, 12],
#        [10, 13, 11, 14, 12, 15]])
To generalize to arbitrary p, use a list comprehension to generate a list of shifted arrays and then do stack+reshape:
n, m = a.shape
p = 3
np.dstack([a[i:(n-p+i+1)] for i in range(p)]).reshape(n-p+1, -1)
# array([[ 1,  4,  7,  2,  5,  8,  3,  6,  9],
#        [ 4,  7, 10,  5,  8, 11,  6,  9, 12],
#        [ 7, 10, 13,  8, 11, 14,  9, 12, 15]])
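For p=2 this agrees with the strided approach from the previous answer, which is easy to sanity-check (assuming strided_axis0 from above is in scope):

np.array_equal(strided_axis0(a, L=2),
               np.dstack([a[:-1], a[1:]]).reshape(a.shape[0] - 1, -1))
# True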

Related

How to find an array with the min values of multiple arrays with the same length

I have a multidimensional gridded array with dimensions of (29, 320, 180), where 29 is the number of arrays, 320 is the latitudinal dimension and 180 the longitudinal one. I want to find the min value at every grid point across all 29 arrays, so that I finally have an array of dimensions 320x180 consisting of the minimum value at each grid point. I should stress that every array has a large number of NaN values. How can I achieve that?
For example two arrays with same dimensions:
a=[[1,2,3],[3,5,8],[4,8,12]]
b=[[3,5,6],[9,12,5],[5,6,14]]
and the wanted output will be an array with the min value at each index, meaning:
c=[[1,2,3],[3,5,5],[4,6,12]]
I wasn't sure if you needed the minimum of each array in terms of columns or rows; you can choose which one you want with the example below.
Let's create an example of several small 2D arrays:
import numpy as np
ex_dict = {}
lat_min = []
lon_min = []
# create fake data: instead of 29 arrays of dimensions 320x180, use 5 arrays of
# dimensions 2x5 (so we can see the output), all stored in a dictionary
for i in range(5):
    ex_dict[i] = np.stack([np.random.choice(range(i, 20), 5, replace=False) for _ in range(2)])
Let's look at our arrays:
ex_dict
{0: array([[19, 18,  5, 13,  6],
           [ 5, 12,  3,  8,  0]]),
 1: array([[10, 13,  2, 19, 15],
           [ 5, 19,  6,  8, 14]]),
 2: array([[ 5, 17, 10, 11,  7],
           [19,  2, 11,  5,  6]]),
 3: array([[14,  3, 17,  4, 11],
           [18, 10,  8,  3,  7]]),
 4: array([[15,  8, 18, 14, 10],
           [ 5, 19, 12, 16, 13]])}
Then let's fill the lists with the minimum values for each array (lat_min contains the minimum of each row and lon_min of each column, through all the arrays):
# for each of the 5 arrays (in this example, stored in the ex_dict dictionary), find the
# minimum in each row (axis=1) and each column (axis=0)
for i in ex_dict:
    lat_min.append(np.nanmin(ex_dict[i], axis=1))
    lon_min.append(np.nanmin(ex_dict[i], axis=0))
Our lists with minimum values:
lat_min
[array([5, 0]), array([2, 5]), array([5, 2]), array([3, 3]), array([8, 5])]
lon_min
[array([ 5, 12,  3,  8,  0]),
 array([ 5, 13,  2,  8, 14]),
 array([ 5,  2, 10,  5,  6]),
 array([14,  3,  8,  3,  7]),
 array([ 5,  8, 12, 14, 10])]
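Incidentally, for the grid-point minimum the question actually asks about (a single 320x180 array holding the NaN-aware minimum across all 29 arrays), stacking along a new first axis and reducing over it is the most direct route; a minimal sketch reusing the ex_dict arrays from above:

# stack the per-array grids into one (n_arrays, n_lat, n_lon) array,
# then take the NaN-aware minimum across the first axis
stacked = np.stack(list(ex_dict.values()))  # here (5, 2, 5); in the question (29, 320, 180)
grid_min = np.nanmin(stacked, axis=0)       # minimum at each grid point, NaNs ignored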

flattening of a numpy array along columns, in the order: lower triangle, diagonal, upper triangle

Here is a problem I'm trying to solve. Let's say we have a square array:
In [10]: arr
Out[10]:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])
What I'd like is to flatten this array in a specific order: first flatten the lower triangle along axis-0, then pick the diagonal, and finally flatten the upper triangle, again along axis-0, which would give the flattened array as:
# | lower triangle |diag.elements| upper triangle |
res = np.array([5, 9, 13, 10, 14, 15, 1, 6, 11, 16, 2, 3, 7, 4, 8, 12])
Here is my partial solution so far, which doesn't give the desired result yet.
In [16]: arr[np.tril(arr, k=-1) != 0]
Out[16]: array([ 5, 9, 10, 13, 14, 15]) # not correct!
In [17]: np.diag(arr)
Out[17]: array([ 1, 6, 11, 16])
In [18]: arr[np.triu(arr, k=1) != 0]
Out[18]: array([ 2, 3, 4, 7, 8, 12]) # not correct!
Finally, to concatenate these 3 intermediate results. How to correctly index to obtain desired result? Alternatively, are there other ways of solving this problem?
Here's one based on masking and concatenating/stacking -
In [50]: r = np.arange(len(arr))
In [51]: mask = r[:,None]<r
In [54]: np.concatenate((arr.T[mask],np.diag(arr),arr.T[mask.T]))
Out[54]: array([ 5, 9, 13, 10, 14, 15, 1, 6, 11, 16, 2, 3, 7, 4, 8, 12])
Another based solely on masking -
n = len(arr)
r = np.arange(n)
mask = r[:,None]<r
diag_mask = r[:,None]==r
comp_mask = np.vstack((mask[None],diag_mask[None],mask.T[None]))
out = np.broadcast_to(arr.T,(3,n,n))[comp_mask]
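Evaluating out on the same arr confirms the masking-only variant reproduces the desired order:

out
# array([ 5,  9, 13, 10, 14, 15,  1,  6, 11, 16,  2,  3,  7,  4,  8, 12])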
Use the transpose (note that the `!= 0` filtering below assumes the triangles contain no zeros):
lower = np.tril(arr, -1).T.ravel()
diag = np.diag(arr)
upper = np.triu(arr, 1).T.ravel()
result = np.concatenate([lower[lower != 0], diag, upper[upper != 0]])
print(result)
Output:
[ 5 9 13 10 14 15 1 6 11 16 2 3 7 4 8 12]
I am using indices to select (NumPy broadcasting):
ary = arr.T
i, c = ary.shape
x = np.arange(i)
y = np.arange(c)
np.concatenate([ary[x[:, None] < y], ary[x[:, None] == y], ary[x[:, None] > y]])
Out[1065]: array([ 5, 9, 13, 10, 14, 15, 1, 6, 11, 16, 2, 3, 7, 4, 8, 12])

how can I reshape a numpy array of (100,) to (250,100)

Imagine that you have created a 1D array of shape (100,), then calculated something and filled this array. For whatever reason you did not create a 2D array. What is wrong with wanting to assign another dimension to this data, with the justification that, for example, 250 samples should carry this calculated data?
I have searched for this but could not find any solution. Maybe I am not searching with the correct keywords!
Actually I want to reshape a numpy array of shape (100,) to (250, 100).
I have read this link and a couple of other links, but they did not help me.
I have also tried this way:
numpyarray = np.zeros((100,))  # a 1D array of shape (100,)
transformed_numpyarray = np.reshape(numpyarray, (100, -1)).T
print(transformed_numpyarray.shape)
which gives me this output:
(1, 100)
But I really do not want 1 as the first dimension of the 2D array.
What I'm trying to do is convert it to something like (250, 100): 250 is a constant number I already know, so I want to say, for example, 250 samples each with 100 dimensions.
Thanks.
I'm still confused about what you are trying to do. So far I can picture two alternatives - reshape and repeat. To illustrate:
In [148]: x = np.arange(16)
In [149]: x
Out[149]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
In [150]: x.reshape(4,4)
Out[150]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
In [151]: np.repeat(x[None,:], 4, axis=0)
Out[151]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15]])
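Applied to the shapes in the question, only the repeat/broadcast route can turn (100,) into (250, 100), since reshape cannot add elements; a minimal sketch (np.broadcast_to returns a read-only view, np.repeat a real copy):

x = np.zeros(100)                             # stand-in for the computed (100,) array
tiled = np.repeat(x[None, :], 250, axis=0)    # shape (250, 100), copies the data
view = np.broadcast_to(x, (250, 100))         # shape (250, 100), read-only view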
NumPy arrays are statically sized; you can't have an array with a variable shape. If you don't know beforehand how many samples you will have, you can gradually add them with vstack:
In [4]: numpyarray.shape
Out[4]: (3, 4)
In [5]: new_sample.shape
Out[5]: (4,)
In [6]: numpyarray = np.vstack([numpyarray, new_sample])
In [7]: numpyarray.shape
Out[7]: (4, 4)
You can also fix the size up front by creating an array full of zeros and then progressively filling it with samples.
numpyarray = np.zeros((250,100))
...
numpyarray[i] = new_sample
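A minimal end-to-end sketch of that prefill pattern, with compute_sample as a hypothetical stand-in for whatever produces each (100,) sample:

def compute_sample(i):
    # hypothetical: replace with the real per-sample computation
    return np.full(100, i, dtype=float)

numpyarray = np.zeros((250, 100))
for i in range(250):
    numpyarray[i] = compute_sample(i)  # each sample fills one row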

Stack slices of numpy array from given indices

I'm struggling to perform the below operation on a numpy vector.
I want to take the previous_n samples from vector, finishing at each of the positions in indices. It's like an np.take that slices out the previous_n preceding samples at every index.
Example:
import numpy as np
vector = np.array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
# number of previous samples
previous_n = 3
indices = np.array([ 5, 7, 12])
Expected result:
array([[ 3,  4,  5],
       [ 5,  6,  7],
       [10, 11, 12]])
Ok, this seems to do what I want. Found here
def stack_slices(arr, previous_n, indices):
    all_idx = indices[:, None] + np.arange(previous_n) - (previous_n - 1)
    return arr[all_idx]
>>> stack_slices(vector, 3, indices)
array([[ 3,  4,  5],
       [ 5,  6,  7],
       [10, 11, 12]])
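One caveat: if any index is smaller than previous_n - 1, all_idx contains negative values and NumPy silently wraps around to the end of the array. A guarded variant (a sketch, not part of the original answer):

def stack_slices_checked(arr, previous_n, indices):
    # reject windows that would start before position 0
    if (indices < previous_n - 1).any():
        raise ValueError("some indices have fewer than previous_n - 1 preceding samples")
    all_idx = indices[:, None] + np.arange(previous_n) - (previous_n - 1)
    return arr[all_idx]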

sum groups rows of numpy matrix using list of lists of indices

I want to slice a numpy array using lists of row indices and apply a function to each group. Is it possible to vectorize this (or is there a non-vectorized way to do it)? A vectorized approach would be ideal for large matrices.
import numpy as np
index = [[1, 3], [2, 4, 5]]
a = np.array([[ 3,  4,  6,  3],
              [ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11],
              [12, 13, 14, 15],
              [ 1,  1,  4,  5]])
summing by the groups of row indices in index, giving:
np.array([[ 8, 10, 12, 14],
          [17, 19, 24, 27]])
Approach #1 : Here's an almost* vectorized approach -
def sumrowsby_index(a, index):
    index_arr = np.concatenate(index)
    lens = np.array([len(i) for i in index])
    cut_idx = np.concatenate(([0], lens[:-1].cumsum()))
    return np.add.reduceat(a[index_arr], cut_idx)
*Almost, because of the step that computes lens with a list comprehension; but since we are simply getting the lengths and no heavy computation is involved there, that step won't sway the timings in any big way.
Sample run -
In [716]: a
Out[716]:
array([[ 3, 4, 6, 3],
[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[ 1, 1, 4, 5]])
In [717]: index
Out[717]: [[1, 3], [2, 4, 5]]
In [718]: sumrowsby_index(a, index)
Out[718]:
array([[ 8, 10, 12, 14],
[17, 19, 24, 27]])
Approach #2 : We could leverage fast matrix-multiplication with numpy.dot to perform those sum-reductions, giving us another method as listed below -
def sumrowsby_index_v2(a, index):
    lens = np.array([len(i) for i in index])
    id_ar = np.zeros((len(lens), a.shape[0]))
    c = np.concatenate(index)
    r = np.repeat(np.arange(len(index)), lens)
    id_ar[r, c] = 1
    return id_ar.dot(a)
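A quick check on the same a and index shows it matches Approach #1 (note the float dtype, since id_ar is a float matrix):

sumrowsby_index_v2(a, index)
# array([[ 8., 10., 12., 14.],
#        [17., 19., 24., 27.]])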
Using a list comprehension...
For each index list in index, build the list of rows of a at those indexes. From here we have a list of numpy arrays that we can pass to the built-in sum(), which adds the arrays elementwise and gives you what you want:
np.array([sum([a[r] for r in i]) for i in index])
giving:
array([[ 8, 10, 12, 14],
       [17, 19, 24, 27]])
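A slightly more idiomatic spelling of the same idea uses fancy indexing to grab each group of rows and sums along axis 0:

np.array([a[i].sum(axis=0) for i in index])
# array([[ 8, 10, 12, 14],
#        [17, 19, 24, 27]])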
