So numpy has some convenience functions for combining several arrays into one, e.g. hstack and vstack. I'm wondering if there's something similar but for stacking the component arrays diagonally?
Say I have N arrays of shape (n_i, m_i), and I want to combine them into a single array of shape (n_1 + ... + n_N, m_1 + ... + m_N) such that the component arrays form blocks on the diagonal of the result array.
And yes, I know how to solve it manually, e.g. with the approach described in How to "embed" a small numpy array into a predefined block of a large numpy array?. Just wondering if there's an easier way.
Ah, How can I transform blocks into a blockdiagonal matrix (NumPy) mentions that scipy.linalg.block_diag() is the solution, except that the version of scipy installed on my workstation is so old it doesn't have it. Any other ideas?
It does seem that block_diag does exactly what you want, so if for some reason you can't update scipy, here is the source from v0.8.0 so you can simply define it yourself:
import numpy as np

def block_diag(*arrs):
    """Create a block diagonal matrix from the provided arrays.

    Given the inputs `A`, `B` and `C`, the output will have these
    arrays arranged on the diagonal::

        [[A, 0, 0],
         [0, B, 0],
         [0, 0, C]]

    If all the input arrays are square, the output is known as a
    block diagonal matrix.

    Parameters
    ----------
    A, B, C, ... : array-like, up to 2D
        Input arrays. A 1D array or array-like sequence with length n is
        treated as a 2D array with shape (1,n).

    Returns
    -------
    D : ndarray
        Array with `A`, `B`, `C`, ... on the diagonal. `D` has the
        same dtype as `A`.

    References
    ----------
    .. [1] Wikipedia, "Block matrix",
           http://en.wikipedia.org/wiki/Block_diagonal_matrix

    Examples
    --------
    >>> A = [[1, 0],
    ...      [0, 1]]
    >>> B = [[3, 4, 5],
    ...      [6, 7, 8]]
    >>> C = [[7]]
    >>> print(block_diag(A, B, C))
    [[1 0 0 0 0 0]
     [0 1 0 0 0 0]
     [0 0 3 4 5 0]
     [0 0 6 7 8 0]
     [0 0 0 0 0 7]]
    >>> block_diag(1.0, [2, 3], [[4, 5], [6, 7]])
    array([[ 1.,  0.,  0.,  0.,  0.],
           [ 0.,  2.,  3.,  0.,  0.],
           [ 0.,  0.,  0.,  4.,  5.],
           [ 0.,  0.,  0.,  6.,  7.]])
    """
    if arrs == ():
        arrs = ([],)
    arrs = [np.atleast_2d(a) for a in arrs]
    bad_args = [k for k in range(len(arrs)) if arrs[k].ndim > 2]
    if bad_args:
        raise ValueError("arguments in the following positions have dimension "
                         "greater than 2: %s" % bad_args)
    shapes = np.array([a.shape for a in arrs])
    out = np.zeros(np.sum(shapes, axis=0), dtype=arrs[0].dtype)
    r, c = 0, 0
    for i, (rr, cc) in enumerate(shapes):
        out[r:r + rr, c:c + cc] = arrs[i]
        r += rr
        c += cc
    return out
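As a quick check that the copied function behaves as desired for rectangular blocks of shape (n_i, m_i) (example values are mine):

a = np.ones((2, 3), dtype=int)
b = 2 * np.ones((1, 2), dtype=int)
print(block_diag(a, b))
# [[1 1 1 0 0]
#  [1 1 1 0 0]
#  [0 0 0 2 2]]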
I have a 1D array of Boolean "True" counts that I want to map to a 2D array.
#Array of boolean True counts
b = [1,3,2,5]
#want this 2D array:
[1,1,1,1]
[0,1,1,1]
[0,1,0,1]
[0,0,0,1]
[0,0,0,1]
The faster the implementation (NumPy/SciPy) the better.
Thank you
Pure numpy method, using np.tri and advanced indexing:
b = np.array([1,3,2,5])
k = b.max()
np.tri(k+1,k,-1,dtype=int)[b].T
# array([[1, 1, 1, 1],
# [0, 1, 1, 1],
# [0, 1, 0, 1],
# [0, 0, 0, 1],
# [0, 0, 0, 1]])
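For intuition (this illustration is mine): np.tri(k+1, k, -1) has exactly i leading ones in row i, so indexing its rows with b picks, per count, a row with that many ones, and .T turns those rows into the columns of the result:

np.tri(6, 5, -1, dtype=int)
# array([[0, 0, 0, 0, 0],
#        [1, 0, 0, 0, 0],
#        [1, 1, 0, 0, 0],
#        [1, 1, 1, 0, 0],
#        [1, 1, 1, 1, 0],
#        [1, 1, 1, 1, 1]])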
UPDATE:
Two solutions that should work better if k >> len(b): m5 and m6 in the benchmarks.
Benchmark code borrowed and extended from @Ehsan, second condition. Changes: added m5 and m6, reduced the largest test size from 1000 to 200, and changed the output dtype from int to int8.
Interesting observation: my original solution m2 performs significantly worse on my (low-RAM) computer than on @Ehsan's.
Code (new functions only):
##Paul's solution 2
def m5(b):
    k = b.max()
    n = b.size
    # alternating runs of ones and zeros, repeated b[i] and k-b[i] times,
    # then reshaped column by column
    return (np.arange(1, 2*n + 1, dtype=np.int8) & 1).repeat(
        np.ravel([b, k - b], order="F")).reshape(k, n, order="F")

##Paul's solution 3
def m6(b):
    k = b.max()
    mytri = np.array([1, 0], dtype=np.int8).repeat(k)
    # view shifted windows of the [1...1 0...0] template as the columns of tri
    mytri = np.lib.stride_tricks.as_strided(
        mytri[k:], (k, k + 1), (mytri.strides[0], -mytri.strides[0]))
    return mytri[:, b]
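A quick sanity check (mine, not part of the benchmark), assuming m5 and m6 from above are in scope:

b = np.array([1, 3, 2, 5])
expected = np.array([[1, 1, 1, 1],
                     [0, 1, 1, 1],
                     [0, 1, 0, 1],
                     [0, 0, 0, 1],
                     [0, 0, 0, 1]], dtype=np.int8)
assert np.array_equal(m5(b), expected)
assert np.array_equal(m6(b), expected)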
Try (requires pandas):

import pandas as pd
pd.DataFrame([[1]*x for x in [1,3,2,5]]).T.fillna(0).values
output:
array([[1., 1., 1., 1.],
[0., 1., 1., 1.],
[0., 1., 0., 1.],
[0., 0., 0., 1.],
[0., 0., 0., 1.]])
You can create an array of zeros of the required shape:
arr = np.zeros((np.max(b), len(b)))
Then you can create a temporary array x = np.indices(arr.shape)[0] which is:
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4]])
And fill arr with ones like so:
arr[np.where(x<b)] = 1
A NumPy approach that avoids building tri, in case b.max() is large:
b = np.array([1,3,2,5])
r, c = b.size, b.max()
a = np.zeros((c,r), dtype=int)
a[np.arange(c)[:,None]<b] = 1
output:
[[1 1 1 1]
[0 1 1 1]
[0 1 0 1]
[0 0 0 1]
[0 0 0 1]]
Comparison using benchit:
##Ehsan's solution
def m1(b):
    r, c = b.size, b.max()
    a = np.zeros((c, r), dtype=int)
    a[np.arange(c)[:, None] < b] = 1
    return a

##Paul's solution
def m2(b):
    k = b.max()
    return np.tri(k+1, k, -1, dtype=int)[b].T

##Binyamin's solution
def m3(b):
    return pd.DataFrame([[1]*x for x in b]).T.fillna(0).values

##mathfux's solution
def m4(b):
    arr = np.zeros((np.max(b), len(b)), dtype=int)
    x = np.indices(arr.shape)[0]
    arr[np.where(x < b)] = 1
    return arr
For different inputs:
in_ = [np.random.randint(100, size=n) for n in [10,100,1000,10000]]
in_ = [np.random.randint(n, size=n) for n in [10,100,1000,10000]]
So what you pick depends on your b.max() value vs. b.size. For larger b.max() values (compared to b.size), m1 is faster and for smaller b.max() (compared to b.size), m2 seems to be faster.
UPDATE: Adding a new solution and a comparison with @Paul's new solutions:

##Ehsan's solution 2
def m7(b):
    return np.less.outer(np.arange(b.max()), b) + 0

Or, nearly equivalently:

def m8(b):
    return (np.arange(b.max()) < b[:, None]).T + 0
comparison:
in_ = [np.random.randint(10, size=n) for n in [10,100,1000]]
in_ = [np.random.randint(10000, size=n) for n in [10,100,1000,10000]]
including m8:
in_ = [np.random.randint(10000, size=n) for n in [10,100,1000]]
Given a 2D array, I'm looking for a pythonic way to get an array of same shape, with only the maximum element per each row.
See the max_row_filter function below:

def max_row_filter(mat2d):
    m = np.zeros(mat2d.shape)
    for r in range(mat2d.shape[0]):
        c = np.argmax(mat2d[r])
        m[r, c] = mat2d[r, c]
    return m

p = np.array([[1, 2, 3], [5, 4, 3], [9, 10, 3]])
max_row_filter(p)
Out: array([[ 0.,  0.,  3.],
            [ 5.,  0.,  0.],
            [ 0., 10.,  0.]])
I'm looking for an efficient way to do this, suitable to be done on big arrays.
Alternative answer (this will keep duplicates):
p * (p==p.max(axis=1, keepdims=True))
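For example, with a duplicated row maximum (example values are mine), the mask keeps both entries:

import numpy as np

p = np.array([[3, 1, 3],
              [2, 5, 4]])
print(p * (p == p.max(axis=1, keepdims=True)))
# [[3 0 3]
#  [0 5 0]]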
If there are no duplicates, you could use numpy.argmax:
import numpy as np

p = np.array([[1, 2, 3],
              [5, 4, 3],
              [9, 10, 3]])
result = np.zeros_like(p)
rows, cols = zip(*enumerate(np.argmax(p, axis=1)))
result[rows, cols] = p[rows, cols]
print(result)
Output
[[ 0  0  3]
 [ 5  0  0]
 [ 0 10  0]]
Note that, for multiple occurrences of the maximum, argmax returns the first occurrence.
I'm porting a MATLAB code in Python with numpy and scipy and I need to use numpy/scipy equivalent of the sparse function in MATLAB.
Here's the usage of the sparse function in MATLAB,
sparse([3; 2], [2; 4], [3; 0])
gives:
Trial>> m = sparse([3; 2], [2; 4], [3; 0])
m =
(3,2) 3
Trial>> full(m)
ans =
0 0 0 0
0 0 0 0
0 3 0 0
I have tried these, but they don't give what the MATLAB version does:
sps.csr_matrix([3, 2], [2, 4], [3, 0])
sps.csr_matrix(np.array([[3], [2]]), np.array([[2], [4]]), np.array([[3], [0]]))
sps.csr_matrix([[3], [2]], [[2], [4]], [[3], [0]])
Any ideas?
Thanks.
You're using the sparse(I, J, SV) form [note: link goes to documentation for GNU Octave, not Matlab]. The scipy.sparse equivalent is csr_matrix((SV, (I, J))) -- yes, a single argument which is a 2-tuple containing a vector and a 2-tuple of vectors. You also have to correct the index vectors because Python consistently uses 0-based indexing.
>>> m = sps.csr_matrix(([3,0], ([2,1], [1,3]))); m
<3x4 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Row format>
>>> m.todense()
matrix([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 3, 0, 0]], dtype=int64)
Note that scipy, unlike Matlab, does not automatically discard explicit zeroes, and will use integer storage for matrices containing only integers. To perfectly match the matrix you got in Matlab, you must explicitly ask for floating-point storage and you must call eliminate_zeros() on the result:
>>> m2 = sps.csr_matrix(([3,0], ([2,1], [1,3])), dtype=float)
>>> m2.eliminate_zeros()
>>> m2
<3x4 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
>>> m2.todense()
matrix([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 3., 0., 0.]])
You could also change [3,0] to [3., 0.] but I recommend an explicit dtype= argument because that will prevent surprises when you are feeding in real data.
(I don't know what Matlab's internal sparse matrix representation is, but Octave appears to default to compressed sparse column representation. The difference between CSC and CSR should only affect performance. If your NumPy code winds up being slower than your Matlab code, try using sps.csc_matrix instead of csr_matrix, as well as all the usual NumPy performance tips.)
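For what it's worth, csc_matrix accepts the same (SV, (I, J)) constructor, so switching formats is a one-word change (this example is mine):

>>> mc = sps.csc_matrix(([3, 0], ([2, 1], [1, 3])), dtype=float)
>>> mc.todense()
matrix([[ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  3.,  0.,  0.]])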
(You probably need to read NumPy for Matlab users if you haven't already.)
Here is a conversion I made. It works for the five-argument version of sparse.
def sparse(i, j, v, m, n):
    """
    Create and compress a matrix that has many zeros.

    Parameters:
        i: 1-D array of row indices (size n1)
        j: 1-D array of column indices (size n1)
        v: 1-D array of values (size n1)
        m: integer, number of rows of the matrix
        n: integer, number of columns of the matrix
    Returns:
        s: 2-D sparse matrix, all zeros except the values v at indexes (i, j)
    """
    return scipy.sparse.csr_matrix((v, (i, j)), shape=(m, n))
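A quick usage example (mine), reproducing the matrix from the question with 0-based indices and an explicit 3x4 shape:

import scipy.sparse

m = sparse([2, 1], [1, 3], [3, 0], 3, 4)
print(m.toarray())
# [[0 0 0 0]
#  [0 0 0 0]
#  [0 3 0 0]]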
I'm trying to multiply two arrays, x of shape (41,6) and y of shape (41,), expecting broadcasting to apply the 1D array to every row of the 2D one. I want to do it as:

x*y

but I get this error:

ValueError: operands could not be broadcast together with shapes (41,6) (41,)

Is there anything I'm missing here to make that possible?
Broadcasting involves 2 steps:

1. give all arrays the same number of dimensions
2. expand the size-1 dimensions to match the other arrays

With your inputs

(41,6) (41,)

one is 2d, the other 1d; broadcasting can change the 1d shape to (1,41), but it does not automatically expand it in the other direction, to (41,1). That leaves

(41,6) (1,41)

where the trailing dimensions, 6 and 41, don't match, so neither shape can be broadcast to the other.
So you need to change your y to (41,1), or your x to (6,41):
x.T*y
x*y[:,None]
I'm assuming, of course, that you want element by element multiplication, not the np.dot matrix product.
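A small runnable demonstration of both fixes, using shapes (4,3) and (4,) for brevity (values are mine):

import numpy as np

x = np.arange(12).reshape(4, 3)   # shape (4, 3)
y = np.array([1, 10, 100, 1000])  # shape (4,)

# scale each row of x by the corresponding element of y
print(x * y[:, None])
print((x.T * y).T)  # same result via the transpose route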
You can try this out, it works!
>>> import numpy as np
>>> x = np.array([[1, 2], [1, 2], [1, 2]])
>>> y = np.array([1, 2, 3])
>>> np.dot(y,x)
array([ 6, 12])
Not exactly sure what you are trying to achieve. Maybe you could give an example of your input and your expected output. One possibility is:
import numpy as np
x = np.array([[1, 2], [1, 2], [1, 2]])
y = np.array([1, 2, 3])
res = x * np.transpose(np.array([y,]*2))
This will multiply each column of x with y, so the result of the above example is:
array([[1, 2],
[2, 4],
[3, 6]])
The multiplication of an N-D array (say A) with a 1-D one (B) is performed on the last axis by default, which means that the multiplication A * B is only valid if

A.shape[-1] == len(B)

A manipulation of A or B is needed to multiply A with B on an axis other than -1:
Method 1: swapaxes
Swap the axes of A so that the axis to multiply with B appears in the last position:
C = (A.swapaxes(axis, -1) * B).swapaxes(axis, -1)
Example:

A = np.arange(2 * 3 * 4).reshape((2, 3, 4))
B = np.array([0., 1., 2.])
print(A)
print(B)

which prints:

(A)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

(B)
[0. 1. 2.]
A * B raises:

ValueError: operands could not be broadcast together with shapes (2,3,4) (3,)

Now multiply A with B on axis 1:

axis = 1
C = (A.swapaxes(axis, -1) * B).swapaxes(axis, -1)

which gives C:
array([[[ 0.,  0.,  0.,  0.],
        [ 4.,  5.,  6.,  7.],
        [16., 18., 20., 22.]],

       [[ 0.,  0.,  0.,  0.],
        [16., 17., 18., 19.],
        [40., 42., 44., 46.]]])
Note that the first rows of A have been multiplied by 0 and the last rows by 2.
Method 2: reshape B
Make B have the same number of dimensions as A, placing the items of B on the dimension to be multiplied with A:
A * B.reshape((1, len(B), 1))
or equivalently, using the convenient numpy.newaxis syntax:
A * B[np.newaxis, :, np.newaxis]
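A small helper (mine, not from the answer) that generalizes method 2 to any axis by building the reshape tuple on the fly:

import numpy as np

def multiply_along_axis(A, B, axis):
    # put len(B) on `axis` and 1 everywhere else, then let broadcasting work
    shape = [1] * A.ndim
    shape[axis] = len(B)
    return A * B.reshape(shape)

A = np.arange(2 * 3 * 4).reshape((2, 3, 4))
B = np.array([0., 1., 2.])
C = multiply_along_axis(A, B, axis=1)  # same result as the swapaxes method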
Depends on what you're expecting. One simple solution would be:
y*x
That should give you a matrix of dimensions (1,6).
If you wish to multiply X of shape (n,) by Y of shape (n,m), you may consider the answers from this post.
Tips can be found on Wikipedia as well:
In Python with the numpy numerical library or the sympy symbolic library, multiplication of array objects as a1*a2 produces the Hadamard product, but with matrix objects m1*m2 will produce a matrix product.
Simply speaking, slice it into arrays and perform x*y, or use other routes to fit the requirement.
So, if x has shape (41,6) and y (41,), I'd use np.expand_dims() to add an empty, second dimension (index 1) to y, i.e.,
x * np.expand_dims(y, 1)
This will automatically yield a result with shape (41,6).
Are there good ways to "expand" a numpy ndarray? Say I have an ndarray like this:
[[1 2]
[3 4]]
And I want each row to contains more elements by filling zeros:
[[1 2 0 0 0]
[3 4 0 0 0]]
I know there must be some brute-force ways to do so (say construct a bigger array with zeros then copy elements from old smaller arrays), just wondering are there pythonic ways to do so. Tried numpy.reshape but didn't work:
import numpy as np
a = np.array([[1, 2], [3, 4]])
np.reshape(a, (2, 5))
Numpy complains that: ValueError: total size of new array must be unchanged
You can use numpy.pad, as follows:
>>> import numpy as np
>>> a=[[1,2],[3,4]]
>>> np.pad(a, ((0,0),(0,3)), mode='constant', constant_values=0)
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
Here np.pad says, "Take the array a and add 0 rows above it, 0 rows below it, 0 columns to the left of it, and 3 columns to the right of it. Fill these columns with a constant specified by constant_values".
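The same call generalizes to both dimensions; for instance, to also add one row of zeros below (the pad widths here are mine):

>>> np.pad(a, ((0, 1), (0, 3)), mode='constant', constant_values=0)
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0],
       [0, 0, 0, 0, 0]])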
There are the index tricks r_ and c_.
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> z = np.zeros((2, 3), dtype=a.dtype)
>>> np.c_[a, z]
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
If this is performance critical code, you might prefer to use the equivalent np.concatenate rather than the index tricks.
>>> np.concatenate((a,z), axis=1)
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
There are also np.resize and np.ndarray.resize, but they have some limitations (due to the way numpy lays out data in memory) so read the docstring on those ones. You will probably find that simply concatenating is better.
By the way, when I've needed to do this I usually just do it the basic way you've already mentioned (create an array of zeros and assign the smaller array inside it), I don't see anything wrong with that!
Just to be clear: there's no "good" way to extend a NumPy array, as NumPy arrays are not expandable. Once the array is defined, the space it occupies in memory, a combination of the number of its elements and the size of each element, is fixed and cannot be changed. The only thing you can do is to create a new array and replace some of its elements by the elements of the original array.
A lot of functions are available for convenience (the np.concatenate function and its np.*stack shortcuts, np.column_stack, the index routines np.r_ and np.c_...), but they are just that: convenience functions. Some of them are optimized at the C level (np.concatenate and others, I think), some are not.
Note that there's nothing wrong at all with your initial suggestion of creating a large array 'by hand' (possibly filled with zeros) and filling it yourself with your initial array. It might be more readable than more complicated solutions.
A simple way:
# what you want to expand
x = np.ones((3, 3))
# expand to what shape
target = np.zeros((6, 6))
# do expand
target[:x.shape[0], :x.shape[1]] = x
# print target
array([[ 1., 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.]])
Functional way:
Borrowed from https://stackoverflow.com/a/35751427/1637673, with a little modification.
def pad(array, reference_shape, offsets=None):
    """
    array: Array to be padded
    reference_shape: tuple of size of ndarray to create
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    will throw a ValueError if offsets is too big and the reference_shape cannot handle the offsets
    """
    if not offsets:
        offsets = np.zeros(array.ndim, dtype=np.int32)
    # Create an array of zeros with the reference shape
    result = np.zeros(reference_shape, dtype=np.float32)
    # Create a list of slices from offset to offset + shape in each dimension
    insertHere = [slice(offsets[dim], offsets[dim] + array.shape[dim])
                  for dim in range(array.ndim)]
    # Insert the array in the result at the specified offsets
    # (a tuple is required for multi-dimensional indexing in modern NumPy)
    result[tuple(insertHere)] = array
    return result
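A quick usage example (mine), reproducing the zero-extension from the question and showing a non-zero offset:

a = np.array([[1, 2], [3, 4]])
print(pad(a, (2, 5)))
# [[1. 2. 0. 0. 0.]
#  [3. 4. 0. 0. 0.]]
print(pad(a, (3, 4), offsets=[1, 1]))
# [[0. 0. 0. 0.]
#  [0. 1. 2. 0.]
#  [0. 0. 3. 4.]]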
You should use np.column_stack or np.append:
import numpy as np
p = np.array([[1, 2], [3, 4]])
p = np.column_stack([p, [0, 0], [0, 0]])
p
Out[277]:
array([[1, 2, 0, 0],
       [3, 4, 0, 0]])
Append seems to be faster though:
timeit np.column_stack( [ p , [ 0 , 0 ],[0,0] ] )
10000 loops, best of 3: 61.8 us per loop
timeit np.append(p, [[0,0],[0,0]],1)
10000 loops, best of 3: 48 us per loop
And a comparison with np.c_ and np.hstack [append still seems to be the fastest]:
In [295]: z = np.zeros((2, 2), dtype=p.dtype)
In [296]: timeit np.c_[p, z]
10000 loops, best of 3: 47.2 us per loop
In [297]: timeit np.append(p, z, 1)
100000 loops, best of 3: 13.1 us per loop
In [305]: timeit np.hstack((p, z))
10000 loops, best of 3: 20.8 us per loop
and np.concatenate [which is even a bit faster than append]:
In [307]: timeit np.concatenate((p, z), axis=1)
100000 loops, best of 3: 11.6 us per loop
There are also similar methods like np.vstack, np.hstack, and np.dstack. I like these over np.concatenate as they make it clear what dimension is being "expanded".
temp = np.array([[1, 2], [3, 4]])
np.hstack((temp, np.zeros((2, 3))))
It's easy to remember because numpy's first axis is vertical, so vstack expands the first axis, and the second axis is horizontal, so hstack.
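And the vertical counterpart, following the same mnemonic (values are mine):

np.vstack((temp, np.zeros((3, 2))))
# array([[1., 2.],
#        [3., 4.],
#        [0., 0.],
#        [0., 0.],
#        [0., 0.]])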