Pythonic method for stacking np.array's of different row length - python

Assume I have following multiple numpy np.array with different number of rows but same number of columns:
a = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])
b = np.array([[1, 2, 3],
              [4, 5, 6]])
I want to combine them to have following:
result = np.array([[10, 20, 30],
                   [40, 50, 60],
                   [70, 80, 90],
                   [1, 2, 3],
                   [4, 5, 6]])
Here's what I do using a for loop, but I don't like it. Is there a more Pythonic way to do this?
c = [a, b]
num_row = sum(x.shape[0] for x in c)
num_col = a.shape[1]  # or b.shape[1]
result = np.zeros((num_row, num_col))
k = 0
for s in c:
    for i in s:
        result[k] = i
        k += 1
result =
array([[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90],
       [ 1,  2,  3],
       [ 4,  5,  6]])

Use numpy.concatenate(); this is its exact purpose.
import numpy as np
a = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])
b = np.array([[1, 2, 3],
              [4, 5, 6]])
result = np.concatenate((a, b), axis=0)
In my opinion, the most "Pythonic" way is to use a builtin or a package rather than writing a bunch of code yourself. Writing everything from scratch is for C developers.
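np.vstack does the same job with slightly less typing; it is a thin convenience wrapper that concatenates along axis 0. A minimal sketch:

```python
import numpy as np

a = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])
b = np.array([[1, 2, 3],
              [4, 5, 6]])

# stacks row-wise; accepts any number of arrays whose column
# counts match, regardless of their row counts
result = np.vstack((a, b))
print(result.shape)  # (5, 3)
```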

Related

python pandas exploding multiple varying length dataframe columns containing arrays

I have a dataframe that contains a single nested array per column cell that I need to explode into multiple rows per column. The arrays are nested (twice) and of varying length (6-12 arrays contained in each).
I'm trying to either:
explode the top level of each nested array, or
explode everything and have a multi-index.
I have tried varying methods found here on SO using the built-in explode function, but haven't been able to produce anything that works.
My example df:
df = pd.DataFrame({"k_6_cluster": [[[1,2,3],[4,5,6],[7,8,9]],
                                   [[1,2,3],[4,5,6],[7,8,9]]],
                   "k_7_cluster": [[[10,20,30],[40,50,60],[70,80,90]],
                                   [[10,20,30],[40,50,60],[70,80,90]]]})
print(df)
                         k_6_cluster                                  k_7_cluster
0  [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
1  [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
The following lines of code explode the top level of each nested array:
list_cols = df.columns
exploded = [df[col].explode() for col in list_cols]
out_df = pd.DataFrame(dict(zip(list_cols, exploded)))
print(out_df)
  k_6_cluster   k_7_cluster
0   [1, 2, 3]  [10, 20, 30]
0   [4, 5, 6]  [40, 50, 60]
0   [7, 8, 9]  [70, 80, 90]
1   [1, 2, 3]  [10, 20, 30]
1   [4, 5, 6]  [40, 50, 60]
1   [7, 8, 9]  [70, 80, 90]
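For the "explode everything" case, the same per-column explode can simply be applied a second time; a sketch using the example df from above (pd.Series.explode has existed since pandas 0.25):

```python
import pandas as pd

df = pd.DataFrame({
    "k_6_cluster": [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]] * 2,
    "k_7_cluster": [[[10, 20, 30], [40, 50, 60], [70, 80, 90]]] * 2,
})

# first pass: one row per inner list (same result as the zip/dict loop)
out_df = df.apply(pd.Series.explode)
# second pass: one row per scalar element; the duplicated index keeps
# track of which original row each value came from
flat_df = out_df.apply(pd.Series.explode)
print(flat_df.shape)  # (18, 2)
```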

Python numpy function for matrix math

I have two np arrays:
a = np.array([[1, 2],
              [2, 3],
              [3, 4],
              [5, 6]])
b = np.array([[2, 4],
              [6, 8],
              [10, 11]])
I want to multiply each row of a against each row of b, so that array c is created with dimensions of a-rows x b-rows (as columns):
c = np.array([[2, 8], [6, 16], [10, 22],
              [4, 12], [12, 24], [20, 33],
              ...])
There are other options for doing this, but I would really like to leverage the speed of numpy's ufuncs...if possible.
Any and all help is appreciated.
Does this do what you want?
>>> a
array([[1, 2],
       [2, 3],
       [3, 4],
       [5, 6]])
>>> b
array([[ 2,  4],
       [ 6,  8],
       [10, 11]])
>>> a[:,None,:]*b
array([[[ 2,  8],
        [ 6, 16],
        [10, 22]],

       [[ 4, 12],
        [12, 24],
        [20, 33]],

       [[ 6, 16],
        [18, 32],
        [30, 44]],

       [[10, 24],
        [30, 48],
        [50, 66]]])
>>> _.shape
(4, 3, 2)
Or if that doesn't have the right shape, you can reshape it:
>>> (a[:,None,:]*b).reshape((a.shape[0]*b.shape[0], 2))
array([[ 2,  8],
       [ 6, 16],
       [10, 22],
       [ 4, 12],
       [12, 24],
       [20, 33],
       [ 6, 16],
       [18, 32],
       [30, 44],
       [10, 24],
       [30, 48],
       [50, 66]])
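The same pairing can also be spelled with np.einsum, which makes the axis bookkeeping explicit; a sketch equivalent to the broadcasting version above:

```python
import numpy as np

a = np.array([[1, 2], [2, 3], [3, 4], [5, 6]])
b = np.array([[2, 4], [6, 8], [10, 11]])

# out[i, k, j] = a[i, j] * b[k, j]: the shared last axis j is kept,
# and the two row axes i and k are crossed
c = np.einsum('ij,kj->ikj', a, b)
print(c.shape)  # (4, 3, 2)
print(np.array_equal(c, a[:, None, :] * b))  # True
```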

Replace python list comprehension generating 3D array with numpy functions

I have two matrices (numpy arrays), mu and nu. From these I would like to create a third array as follows:
new_array_{j, k, l} = mu_{l, k} nu_{j, k}
I can do it naively using list comprehensions:
[[[mu[l, k] * nu[j, k] for k in np.arange(N)] for l in np.arange(N)] for j in np.arange(N)]
but it quickly becomes slow.
How can I create new_array using numpy functions which should be faster?
Two quick solutions (without my usual proofs and explanations):
res = np.einsum('lk,jk->jkl', mu, nu)
res = mu.T[None,:,:] * nu[:,:,None] # axes in same order as result
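As a quick sanity check (not part of the original answer), both one-liners can be compared against a direct evaluation of the formula new_array_{j, k, l} = mu_{l, k} nu_{j, k} on small random inputs:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 4
mu = rng.integers(0, 10, size=(N, N))
nu = rng.integers(0, 10, size=(N, N))

# build the reference directly from the index formula, in (j, k, l) order
ref = np.array([[[mu[l, k] * nu[j, k] for l in range(N)]
                 for k in range(N)] for j in range(N)])

res1 = np.einsum('lk,jk->jkl', mu, nu)
res2 = mu.T[None, :, :] * nu[:, :, None]
print(np.array_equal(ref, res1), np.array_equal(ref, res2))  # True True
```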
#!/usr/bin/env python
import numpy as np
# example data
mu = np.arange(10).reshape(2,5)
nu = np.arange(15).reshape(3,5) + 20
# get array sizes
nl, nk = mu.shape
nj, nk_ = nu.shape
assert(nk == nk_)
# get arrays with dimensions (nj, nk, nl)
# in the case of mu3d, we need to add a slowest varying dimension
# so (after transposing) this can be done by cycling through the data
# nj times along the slowest existing axis and then reshaping
mu3d = np.concatenate((mu.transpose(),) * nj).reshape(nj, nk, nl)
# in the case of nu3d, we need to add a new fastest varying dimension
# so this can be done by repeating each element nl times, and again it
# needs reshaping
nu3d = nu.repeat(nl).reshape(nj, nk, nl)
# now just multiply element by element
new_array = mu3d * nu3d
print(new_array)
Gives:
>>> mu
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> nu
array([[20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34]])
>>> nj, nk, nl
(3, 5, 2)
>>> mu3d
array([[[0, 5],
        [1, 6],
        [2, 7],
        [3, 8],
        [4, 9]],

       [[0, 5],
        [1, 6],
        [2, 7],
        [3, 8],
        [4, 9]],

       [[0, 5],
        [1, 6],
        [2, 7],
        [3, 8],
        [4, 9]]])
>>> nu3d
array([[[20, 20],
        [21, 21],
        [22, 22],
        [23, 23],
        [24, 24]],

       [[25, 25],
        [26, 26],
        [27, 27],
        [28, 28],
        [29, 29]],

       [[30, 30],
        [31, 31],
        [32, 32],
        [33, 33],
        [34, 34]]])
>>> new_array
array([[[  0, 100],
        [ 21, 126],
        [ 44, 154],
        [ 69, 184],
        [ 96, 216]],

       [[  0, 125],
        [ 26, 156],
        [ 54, 189],
        [ 84, 224],
        [116, 261]],

       [[  0, 150],
        [ 31, 186],
        [ 64, 224],
        [ 99, 264],
        [136, 306]]])

Numpy: Combine list of arrays by another array (np.choose alternative)

I have a list of numpy arrays, each of the same shape. Let's say:
a = [np.array([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]),
     np.array([[11, 12, 13],
               [14, 15, 16],
               [17, 18, 19]]),
     np.array([[99, 98, 97],
               [96, 95, 94],
               [93, 92, 91]])]
And I have another array of the same shape that gives the list indices I want to take the elements from:
b = np.array([[0, 0, 1],
              [2, 1, 0],
              [2, 1, 2]])
What I want to get is the following:
np.array([[ 1,  2, 13],
          [96, 15,  6],
          [93, 18, 91]])
There was a simple solution that worked fine:
np.choose(b, a)
But this is limited to 32 arrays at most, and in my case I have to combine more than 100. So I need another way to do it.
I guess it has to be something with advanced indexing or maybe the np.take method. So probably the first step is a = np.array(a) and then something like a[np.arange(a.shape[0]), b]. But I cannot get it working.
Can somebody help? :)
You can try using np.ogrid (based on this answer). Of course, you will have to convert a to a NumPy array first:
i, j = np.ogrid[0:3, 0:3]
print (a[b, i, j])
# array([[ 1, 2, 13],
# [96, 15, 6],
# [93, 18, 91]])
In [129]: a = [np.array([[1, 2, 3],
     ...:                [4, 5, 6],
     ...:                [7, 8, 9]]),
     ...:      np.array([[11, 12, 13],
     ...:                [14, 15, 16],
     ...:                [17, 18, 19]]),
     ...:      np.array([[99, 98, 97],
     ...:                [96, 95, 94],
     ...:                [93, 92, 91]])]
In [130]: b = np.array([[0, 0, 1],
     ...:               [2, 1, 0],
     ...:               [2, 1, 2]])
In [131]: A = np.array(a)
In [132]: A.shape
Out[132]: (3, 3, 3)
You want to use b to index the first dimension. For the other dimensions you need indices that broadcast with b, i.e. a column vector and a row vector:
In [133]: A[b, np.arange(3)[:,None], np.arange(3)]
Out[133]:
array([[ 1,  2, 13],
       [96, 15,  6],
       [93, 18, 91]])
There are various convenience functions for creating these index arrays, e.g.
In [134]: np.ix_(range(3), range(3))
Out[134]:
(array([[0],
        [1],
        [2]]),
 array([[0, 1, 2]]))
and ogrid as mentioned in the other answer.
Here's a relatively new function that also does the job:
In [138]: np.take_along_axis(A, b[None,:,:], axis=0)
Out[138]:
array([[[ 1,  2, 13],
        [96, 15,  6],
        [93, 18, 91]]])
I had to think a bit before I got the adjustment to b right.
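The extra leading axis appears because take_along_axis requires the index array to have the same number of dimensions as A, so the result needs one squeeze at the end; a sketch of the full pattern with the same data:

```python
import numpy as np

a = [np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
     np.array([[11, 12, 13], [14, 15, 16], [17, 18, 19]]),
     np.array([[99, 98, 97], [96, 95, 94], [93, 92, 91]])]
b = np.array([[0, 0, 1],
              [2, 1, 0],
              [2, 1, 2]])

A = np.array(a)                       # shape (3, 3, 3)
# b[None] lifts the index array to 3 dimensions to match A,
# and [0] drops the resulting length-1 leading axis again
out = np.take_along_axis(A, b[None, :, :], axis=0)[0]
print(out)
```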

Understanding ndarray shapes

I'm new to numpy and am having trouble understanding how shapes of arrays are decided.
An array of the form
[[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4,3]]
has a shape of (2,) while one of the form
[[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4]]
has a shape of (2,3). Moreover,
[[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [[1,2,4], [3,4,2]]]
has a shape of (2,) but adding another vector as
[[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [[1,2,4], [3,4,2], [1,2,4]]]
changes the shape to (2,3,3). Intuitively, I feel that all the arrays should be 3-dimensional. Could anyone help me understand what's happening exactly?
The underlying idea is that np.array tries to create as high a dimensional array as it can. When the sublists have matching numbers of elements the result is easy to see. When they mix lists of differing lengths the result can be confusing.
In your first case you have 2 sublists, one of length 3, the other of length 4. So it makes a 2 element object array, and doesn't try to parse the sublists of the 1st sublist.
In [1]: arr = np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4,3]])
In [2]: arr
Out[2]: array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]],
               [1, 2, 4, 3]], dtype=object)  # adjusted format
In [3]: arr.dtype
Out[3]: dtype('O')
In [4]: arr.shape
Out[4]: (2,)
In [5]: arr[0]
Out[5]: [[5, 10, 15], [20, 25, 30], [35, 40, 45]] # 3 element list of lists
In [6]: arr[1]
Out[6]: [1, 2, 4, 3] # 4 element list of numbers
In the 2nd case you have two sublists, both of length 3. So it makes a 2x3 array. But one sublist contains lists and the other numbers, so the result is again an object array:
In [7]: arr = np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4]] )
In [8]: arr
Out[8]:
array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]],
       [1, 2, 4]], dtype=object)
In [9]: arr.shape
Out[9]: (2, 3)
In [10]: arr[0,0]
Out[10]: [5, 10, 15]
Finally, 2 lists, each with 3 elements, each of which is also a 3 element list - a 3d array.
In [11]: arr = np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [[1,2,4], [3,4,2], [1,2,4]]] )
In [12]: arr
Out[12]:
array([[[ 5, 10, 15],
        [20, 25, 30],
        [35, 40, 45]],

       [[ 1,  2,  4],
        [ 3,  4,  2],
        [ 1,  2,  4]]])
In [13]: arr.shape
Out[13]: (2, 3, 3)
There are also mixes of sublist lengths that can raise an error.
In general, don't mix sublists of differing size and content type casually. np.array behaves most predictably when given lists that will produce a nice multidimensional array. Mixing list lengths leads to confusion.
Updated numpy:
In [1]: np.__version__
Out[1]: '1.13.1'
In [2]: np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4,3]])
Out[2]: array([list([[5, 10, 15], [20, 25, 30], [35, 40, 45]]), list([1, 2, 4, 3])], dtype=object)
In [3]: np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4]] )
Out[3]:
array([[list([5, 10, 15]), list([20, 25, 30]), list([35, 40, 45])],
       [1, 2, 4]], dtype=object)
It now identifies the list elements.
This last example is still a (2,3) object array. As such, each of those 6 elements could be a different Python type, e.g.:
In [11]: np.array([[[5, 10, 15], np.array([20, 25, 30]), (35, 40, 45)], [None,2,'astr']] )
Out[11]:
array([[list([5, 10, 15]), array([20, 25, 30]), (35, 40, 45)],
       [None, 2, 'astr']], dtype=object)
In [12]: [type(x) for x in _.flat]
Out[12]: [list, numpy.ndarray, tuple, NoneType, int, str]
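One caveat worth adding (not from the answer above): newer NumPy releases no longer build these ragged object arrays implicitly. Since NumPy 1.24, a ragged input raises a ValueError unless dtype=object is requested explicitly; a sketch:

```python
import numpy as np

ragged = [[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1, 2, 4, 3]]

# the explicit object dtype reproduces the old (2,) behaviour;
# without it, modern NumPy refuses to guess
arr = np.array(ragged, dtype=object)
print(arr.shape, arr.dtype)  # (2,) object
```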
