Reshape a six-dimensional array into a 2d dataframe - python

I have a six-dimensional numeric array A, and I want to reshape it into a two-dimensional array. The rows of the resulting matrix should be multi-indexed by the first three dimensions of A, and the columns should be multi-indexed by the last three dimensions of A. What is the best way to achieve this using pandas or numpy?

Here is a handy function to do just this.
import numpy as np
import pandas as pd

def make2d(a):
    shape = a.shape
    n = len(shape)
    col_lvls = n // 2
    idx_lvls = n - col_lvls
    # Row index from the leading dimensions, column index from the trailing ones
    midx = pd.MultiIndex.from_product(
        [range(i) for i in shape[:idx_lvls]],
        names=['d-{}'.format(d) for d in range(1, idx_lvls + 1)])
    mcol = pd.MultiIndex.from_product(
        [range(i) for i in shape[idx_lvls:]],
        names=['d-{}'.format(d) for d in range(idx_lvls + 1, idx_lvls + col_lvls + 1)])
    # Use idx_lvls rather than a hard-coded 3 so other dimensionalities also work
    return pd.DataFrame(
        a.reshape(np.array(shape[:idx_lvls]).prod(), -1),
        midx, mcol
    )
Demonstration:
a = np.arange(216).reshape(2, 3, 2, 3, 2, 3)
make2d(a)
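As a quick sanity check, the sketch below confirms that the cell at row (i, j, k) and column (l, m, n) holds a[i, j, k, l, m, n]. It is self-contained, so it restates the function in a compact form without level names; the name df and the index tuples picked are only illustrative.

```python
import numpy as np
import pandas as pd

def make2d(a):
    # Compact restatement: leading axes become row levels, trailing axes column levels.
    shape = a.shape
    idx_lvls = len(shape) - len(shape) // 2
    midx = pd.MultiIndex.from_product([range(i) for i in shape[:idx_lvls]])
    mcol = pd.MultiIndex.from_product([range(i) for i in shape[idx_lvls:]])
    n_rows = int(np.prod(shape[:idx_lvls]))
    return pd.DataFrame(a.reshape(n_rows, -1), index=midx, columns=mcol)

a = np.arange(216).reshape(2, 3, 2, 3, 2, 3)
df = make2d(a)

print(df.shape)  # (12, 18)
# Look up one cell through both multi-indexes and compare with the array element.
print(df.loc[(1, 2, 0), (2, 1, 1)] == a[1, 2, 0, 2, 1, 1])  # True
```

This works because reshape keeps C order, which is the same order MultiIndex.from_product enumerates its tuples in.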

Related

Adding a multidimensional numpy array into one array

I have a multidimensional numpy array with shape (5, 6192, 1), so essentially 5 arrays of length 6192 packed into one array.
How can I add the elements of all the arrays into one array of length 6192, in the following way?
For example if the 5 arrays look like
ar1 = [1,2,3...]
ar2 = [1,2,3...]
ar3 = [1,2,3...]
ar4 = [1,2,3...]
ar5 = [1,2,3...]
I want my final array to look like:
ar = [5,10,15,...]
So each position in the final array should hold the sum of the values at that position across all the inner arrays.
The shape should be, I guess, (1, 6192, 1).
IIUC, simply use numpy.sum:
import numpy as np
ar1 = [1,2,3]
ar2 = [1,2,3]
ar3 = [1,2,3]
ar4 = [1,2,3]
ar5 = [1,2,3]
arrays = [ar1, ar2, ar3, ar4, ar5]
ar = np.sum(arrays, axis=0)
Output: array([ 5, 10, 15])
If the shapes really are as you describe:
arr = np.array(arrays).reshape((5, 3, 1))
print(arr.shape)
# (5, 3, 1)
ar = np.sum(arr, axis=0)[None,:]
print(ar.shape)
# (1, 3, 1)
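As a possible alternative to indexing with None afterwards, sum accepts keepdims=True, which leaves the reduced axis in place with length 1. A minimal sketch, using a small (5, 3, 1) array as a stand-in for the (5, 6192, 1) one (the name total is just illustrative):

```python
import numpy as np

arr = np.arange(15).reshape(5, 3, 1)    # stand-in for the (5, 6192, 1) array
total = arr.sum(axis=0, keepdims=True)  # sum over the first axis, keep it as length 1

print(total.shape)     # (1, 3, 1)
print(total[0, :, 0])  # [30 35 40]
```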

NumPy: Concatenating 1D array to 3D array

Suppose I have a 5x10x3 array, which I interpret as 5 'sub-arrays', each consisting of 10 rows and 3 columns. I also have a separate 1D array of length 5, which I call b.
I am trying to insert a new column into each sub-array, where the column inserted into the ith (i=0,1,2,3,4) sub-array is a 10x1 vector where each element is equal to b[i].
For example:
import numpy as np
np.random.seed(777)
A = np.random.rand(5,10,3)
b = np.array([2,4,6,8,10])
A[0] should look like:
A[1] should look like:
And similarly for the other 'sub-arrays'.
(Notice b[0]=2 and b[1]=4)
What about this?
# Make an array B with the same number of dimensions as A
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0) # shape: (5, 10, 1)
# Concatenate both
np.concatenate([A, B], axis=-1) # shape: (5, 10, 4)
One method would be np.pad:
np.pad(A, ((0,0),(0,0),(0,1)), 'constant', constant_values=[[[],[]],[[],[]],[[],b[:, None,None]]])
# array([[[9.36513084e-01, 5.33199169e-01, 1.66763960e-02, 2.00000000e+00],
# [9.79060284e-02, 2.17614285e-02, 4.72452812e-01, 2.00000000e+00],
# etc.
Or (more typing but probably faster):
i,j,k = A.shape
res = np.empty((i,j,k+1), np.result_type(A, b))
res[...,:-1] = A
res[...,-1] = b[:, None]
Or dstack after broadcast_to:
np.dstack([A, np.broadcast_to(b[:, None], A.shape[:2])])
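To check what these approaches are meant to produce, here is a small self-contained sketch. It assumes A and b shaped as in the question (the random values differ, since it uses default_rng rather than the seeded legacy generator) and combines broadcast_to with concatenate; the names B and out are illustrative.

```python
import numpy as np

rng = np.random.default_rng(777)
A = rng.random((5, 10, 3))
b = np.array([2, 4, 6, 8, 10])

# Broadcast b to one value per (sub-array, row), then append it as a new last column.
B = np.broadcast_to(b[:, None, None], (5, 10, 1))
out = np.concatenate([A, B], axis=-1)

print(out.shape)                       # (5, 10, 4)
print(np.all(out[1, :, -1] == b[1]))   # True: the new column of A[1] is all 4s
```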

Alter a 3D ndarray at the positions represented by a 2d ndarray

This is my first nontrivial use of numpy, and I'm having some trouble in one spot.
So, I have colors, a (xsize + 2, ysize + 2, 3) ndarray, and newlife, a (xsize + 2, ysize + 2) ndarray of booleans. I want to add a random value between -5 and 5 to all three values in colors at all positions where newlife is true. In other words newlife maps 2D vectors to whether or not I want to add a random value to the color in colors at that position.
I've tried a million variations on this:
colors[np.nonzero(newlife)] += (np.random.random_sample((xsize + 2,ysize + 2, 3)) * 10 - 5)
but I keep getting stuff like
ValueError: operands could not be broadcast together with shapes (589,3) (130,42,3) (589,3)
How do I do this?
I think this does what you want:
# example data
colors = np.random.randint(0, 100, (5,4,3))
newlife = np.random.randint(0, 2, (5,4), bool)
# create values to add, then mask with newlife
to_add = np.random.randint(-5,6, (5,4,3))
to_add[~newlife] = 0
# modify in place
colors += to_add
This changes colors in place and assumes a uint8 dtype. Neither assumption is essential:
import numpy as np
n_x, n_y = 2, 2
colors = np.random.randint(5, 251, (n_x+2, n_y+2, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (n_x+2, n_y+2), dtype=bool)
n_change = np.count_nonzero(mask)
print(colors)
print(mask)
colors[mask] += np.random.randint(-5, 6, (n_change, 3), dtype=np.int8).view(np.uint8)
print(colors)
The easiest way of understanding this is to look at the shape of colors[mask].
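A short sketch of that shape, using made-up data: indexing a (X, Y, 3) array with a 2-D boolean mask collapses the first two axes, giving one row of three channels per True entry. That is why the values to add need shape (n_change, 3) rather than the full (X, Y, 3) shape from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
colors = rng.integers(5, 251, (4, 4, 3), dtype=np.uint8)
mask = rng.integers(0, 2, (4, 4)).astype(bool)

n_change = np.count_nonzero(mask)
# One row of 3 colour channels for every True entry in the mask.
print(colors[mask].shape == (n_change, 3))  # True
```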

Creating matrix by multiply columns

I want to create a matrix as shown in the picture:
[image: Creating new matrices by multiplying columns of a matrix elementwise]
Is it possible to create it without using three for loops?
It is a bit hard to verify without actual input and desired output data, but you can use NumPy reshaping and broadcasting to do the operation without any for loops:
import numpy
a = numpy.arange(3 * 6).reshape(3, 6)
b = numpy.arange(3 * 3).reshape(3, 3)
c = numpy.arange(3 * 2).reshape(3, 2)
x = a.reshape(3, 3, 2).transpose(1, 0, 2) * b[..., None]
y = a.reshape(3, 3, 2).transpose(0, 2, 1) * c[..., None]
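To make explicit what the broadcast expression for x computes, the sketch below compares it with a triple loop. Whether this matches the picture in the question cannot be confirmed here; the loop only spells out the indexing of the one-liner, and x_loop is an illustrative name.

```python
import numpy as np

a = np.arange(3 * 6).reshape(3, 6)
b = np.arange(3 * 3).reshape(3, 3)

x = a.reshape(3, 3, 2).transpose(1, 0, 2) * b[..., None]

# Explicit-loop version of the same computation, for comparison only.
x_loop = np.empty_like(x)
for j in range(3):
    for i in range(3):
        for k in range(2):
            x_loop[j, i, k] = a[i, 2 * j + k] * b[j, i]

print(np.array_equal(x, x_loop))  # True
```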

Compute weighted sums on rolling window with pandas dataframes of different length

I have a large dataframe (> 5,000,000 rows) on which I am performing a rolling calculation.
df = pd.DataFrame(np.random.randn(10000,1), columns = ['rand'])
sum_abs = df.rolling(5).sum()
I would like to do the same calculations but add in a weighted sum.
df2 = pd.DataFrame(pd.Series([1,2,3,4,5], name='weight'))
df3 = df.mul(df2.set_index(df.index)).rolling(5).sum()
However, I am getting a "Length Mismatch: Expected axis has 5 elements" error.
I know I could do something like [a * b for a, b in zip(L, weight)] if I converted everything to a list, but I would like to keep it in a dataframe if possible. Is there a way to multiply frames of different sizes, or do I need to repeat the set of weights to the length of the dataset I'm multiplying against?
An easy way to do this is:
w = np.arange(1, 6)
df.rolling(5).apply(lambda x: (x * w).sum())
A less easy way, using strides:
from numpy.lib.stride_tricks import as_strided as strided
v = df.values
n, m = v.shape
s1, s2 = v.strides
k = 5
w = np.arange(1, 6).reshape(1, 1, k)
pd.DataFrame(
    (strided(v, (n - k + 1, m, k), (s1, s2, s1)) * w).sum(-1),
    df.index[k - 1:], df.columns)
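On NumPy 1.20 or later, sliding_window_view builds the same windowed view without computing strides by hand. This is a swapped-in technique, not part of the answer above. The sketch uses a small 20-row frame and checks the result against the rolling-apply version; the names windows, fast and slow are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((20, 1)), columns=['rand'])

k = 5
w = np.arange(1, k + 1)

# Read-only view of shape (n - k + 1, m, k): one length-k window per row and column.
windows = sliding_window_view(df.to_numpy(), k, axis=0)
fast = pd.DataFrame(windows @ w, index=df.index[k - 1:], columns=df.columns)

# Reference result from the rolling-apply approach, with the leading NaN rows dropped.
slow = df.rolling(k).apply(lambda x: (x * w).sum(), raw=True).dropna()
print(np.allclose(fast.to_numpy(), slow.to_numpy()))  # True
```

Unlike the rolling version, the result here has no leading NaN rows: it starts at index k - 1.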
