takes two 2D NumPy matrices as arguments
returns one 2D NumPy array: the product of the two input matrices
This function will perform the operation Z=X×Y, where X and Y are the function arguments. Remember how to perform matrix-matrix multiplication:
First, you need to make sure the matrix dimensions line up. For computing X×Y, this means the number of columns of X (first matrix) should match the number of rows of Y (second matrix). These are referred to as the "inner dimensions"--matrix dimensions are usually cited as "rows by columns", so the second dimension of the first operand X is on the "inside" of the operation; same with the first dimension of the second operand, Y. If the operation were instead Y×X, you would need to make sure that the number of columns of Y matches the number of rows of X. If these numbers don't match, you should return None.
Second, you'll need to create your output matrix, Z. The dimensions of this matrix will be the "outer dimensions" of the two operands: if we're computing X×Y, then Z's dimensions will have the same number of rows as X (the first matrix), and the same number of columns as Y (the second matrix).
Third, you'll need to compute pairwise dot products. If the operation is X×Y, then these dot products will be between the ith row of X with the jth column of Y. This resulting dot product will then go in Z[i][j]. So first, you'll find the dot product of row 0 of X with column 0 of Y, and put that in Z[0][0]. Then you'll find the dot product of row 0 of X with column 1 of Y, and put that in Z[0][1]. And so on, until all rows and columns of X and Y have been dot-product-ed with each other.
You can use numpy.array, but no functions associated with computing matrix products (and definitely not the @ operator).
You CAN use numpy.dot, but ONLY to multiply vectors, since it can also be used to multiply full matrices.
import numpy as np
def mm_multiply(X, Y):
    X = [[1,2], [2,1]]
    Y = [[2,3], [3,3]]
    X = np.array(X)
    Y = np.array(Y)
    [I,J] = X.shape
    [K,H] = Y.shape
    Z = np.zeros((I,H))
    for i in range(I):
        for j in range(H):
            for k in range(K):
                Z[i,j] += X[i,k]*Y[k,j]
    print(Z)
________________
Any help will be much appreciated! Thanks in advance!
One error on submission is that mv_multiply is not defined, but that belongs to the previous problem and might go away once that code is correct. The other error I'm getting is:
TypeError Traceback (most recent call last)
<ipython-input-38-1b8bf5d47d82> in <module>
4 A = np.random.random((48, 683))
5 B = np.random.random((683, 58))
----> 6 np.testing.assert_allclose(mm_multiply(A, B), A @ B)
/opt/conda/lib/python3.7/site-packages/numpy/testing/_private/utils.py in assert_allclose(actual, desired, rtol, atol, equal_nan, err_msg, verbose)
1491 header = 'Not equal to tolerance rtol=%g, atol=%g' % (rtol, atol)
1492 assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
-> 1493 verbose=verbose, header=header, equal_nan=equal_nan)
1494
1495
/opt/conda/lib/python3.7/site-packages/numpy/testing/_private/utils.py in assert_array_compare(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf)
779 return
780
--> 781 val = comparison(x, y)
782
783 if isinstance(val, bool):
/opt/conda/lib/python3.7/site-packages/numpy/testing/_private/utils.py in compare(x, y)
1486 def compare(x, y):
1487 return np.core.numeric.isclose(x, y, rtol=rtol, atol=atol,
-> 1488 equal_nan=equal_nan)
1489
1490 actual, desired = np.asanyarray(actual), np.asanyarray(desired)
/opt/conda/lib/python3.7/site-packages/numpy/core/numeric.py in isclose(a, b, rtol, atol, equal_nan)
2519 y = array(y, dtype=dt, copy=False, subok=True)
2520
-> 2521 xfin = isfinite(x)
2522 yfin = isfinite(y)
2523 if all(xfin) and all(yfin):
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
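For what it's worth, this traceback is consistent with mm_multiply returning None: the posted code prints Z and has no return statement, so assert_allclose ends up comparing None against an array. The posted code also assigns fixed test lists to X and Y, which would overwrite the arguments if those lines sit inside the function. Below is a minimal sketch of the three steps from the assignment text, assuming the inputs are 2D arrays. It uses numpy.dot only on 1D vectors, and the local variable names are my own.

```python
import numpy as np

def mm_multiply(X, Y):
    """Sketch of Z = X x Y built from row-by-column dot products."""
    n_rows, inner_x = X.shape
    inner_y, n_cols = Y.shape
    # Step 1: the inner dimensions must match, otherwise return None
    if inner_x != inner_y:
        return None
    # Step 2: the output takes the outer dimensions of the operands
    Z = np.zeros((n_rows, n_cols))
    # Step 3: Z[i][j] is the dot product of row i of X and column j of Y
    for i in range(n_rows):
        for j in range(n_cols):
            Z[i, j] = np.dot(X[i, :], Y[:, j])  # vector-vector product only
    return Z

A = np.random.random((4, 5))
B = np.random.random((5, 3))
print(mm_multiply(A, B).shape)
print(mm_multiply(A, A))  # inner dimensions 5 and 4 differ
```

The important difference from the posted version is the final return Z; printing the result inside the function leaves the caller with None.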
Related
I have a NumPy array of X * Y elements, represented as a flattened array (arr = np.array(x * y)).
Given the following values:
X = 832
Y = 961
I need to access elements of the array in the following sequence:
arr[0:832:2]
arr[1:832:2]
arr[832:1664:2]
arr[833:1664:2]
...
arr[((Y-1) * X):(X * Y):2]
I'm not sure, mathematically, how to achieve the start and stop for each iteration in a loop.
This should do the trick
import numpy as np
Y = 961
X = 832
all_ = np.random.rand(X * Y)
# Iterate over the Y rows of length X in the flattened array
for i in range(Y):
    # Each row yields two slices: the even offsets, then the odd offsets
    for offset in (0, 1):
        # X*i + offset = start, X*(i+1) = end, 2 is the step
        indices = np.arange(X * i + offset, X * (i + 1), 2)
        # np.take gathers the values at the indices prepared above
        required_array = np.take(all_, indices)
To anybody interested in this solution (per iteration, not incrementing the shift each iteration):
for i in range(2 * Y):  # two slices (even and odd start) per row of length X
    shift = X * (i // 2)
    begin = (i % 2) + shift
    end = X + shift
    print(f'{begin}:{end}:2')
Let's say you have an array of shape (x * y,) that you want to process in chunks of x. You can simply reshape your array to shape (y, x) and process the rows:
>>> x = 832
>>> y = 961
>>> arr = np.arange(x * y)
Now reshape, and process in bulk. In the following example, I take the mean of each row. You can apply whatever functions you want to the entire array this way:
>>> arr = arr.reshape(y, x)
>>> np.mean(arr[:, ::2], axis=1)
>>> np.mean(arr[:, 1::2], axis=1)
The reshape operation does not alter the data in your array. The buffer it points to is the same as the original. You can invert the reshape by calling ravel on the array.
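As a quick sanity check of the reshape approach, the sketch below compares row slices of the reshaped array against the start:stop:2 slices from the question. The names flat, rows and the sample row index i are chosen here for illustration only.

```python
import numpy as np

x, y = 832, 961
flat = np.arange(x * y)        # stand-in for the flattened data
rows = flat.reshape(y, x)      # one row per chunk of x elements

i = 5                          # any row index in range(y)
even = rows[i, ::2]            # same elements as flat[i*x:(i+1)*x:2]
odd = rows[i, 1::2]            # same elements as flat[i*x+1:(i+1)*x:2]

print(np.array_equal(even, flat[i * x:(i + 1) * x:2]))
print(np.array_equal(odd, flat[i * x + 1:(i + 1) * x:2]))
print(np.shares_memory(rows, flat))  # the reshape is a view, not a copy
```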
I want to parallelize the numpy.bincount function using the apply_ufunc API of xarray and the following code is what I've tried:
import numpy as np
import xarray as xr
da = xr.DataArray(np.random.rand(2,16,32),
                  dims=['time', 'y', 'x'],
                  coords={'time': np.array(['2019-04-18', '2019-04-19'],
                                           dtype='datetime64'),
                          'y': np.arange(16), 'x': np.arange(32)})
f = xr.DataArray(da.data.reshape((2,512)),dims=['time','idx'])
x = da.x.values
y = da.y.values
r = np.sqrt(x[np.newaxis,:]**2 + y[:,np.newaxis]**2)
nbins = 4
if x.max() > y.max():
    ri = np.linspace(0., y.max(), nbins)
else:
    ri = np.linspace(0., x.max(), nbins)
ridx = np.digitize(np.ravel(r), ri)
func = lambda a, b: np.bincount(a, weights=b)
xr.apply_ufunc(func, xr.DataArray(ridx,dims=['idx']), f)
but I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-203-974a8f0a89e8> in <module>()
12
13 func = lambda a, b: np.bincount(a, weights=b)
---> 14 xr.apply_ufunc(func, xr.DataArray(ridx,dims=['idx']), f)
~/anaconda/envs/uptodate/lib/python3.6/site-packages/xarray/core/computation.py in apply_ufunc(func, *args, **kwargs)
979 signature=signature,
980 join=join,
--> 981 exclude_dims=exclude_dims)
982 elif any(isinstance(a, Variable) for a in args):
983 return variables_ufunc(*args)
~/anaconda/envs/uptodate/lib/python3.6/site-packages/xarray/core/computation.py in apply_dataarray_ufunc(func, *args, **kwargs)
208
209 data_vars = [getattr(a, 'variable', a) for a in args]
--> 210 result_var = func(*data_vars)
211
212 if signature.num_outputs > 1:
~/anaconda/envs/uptodate/lib/python3.6/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, *args, **kwargs)
558 raise ValueError('unknown setting for dask array handling in '
559 'apply_ufunc: {}'.format(dask))
--> 560 result_data = func(*input_data)
561
562 if signature.num_outputs == 1:
<ipython-input-203-974a8f0a89e8> in <lambda>(a, b)
11 ridx = np.digitize(np.ravel(r), ri)
12
---> 13 func = lambda a, b: np.bincount(a, weights=b)
14 xr.apply_ufunc(func, xr.DataArray(ridx,dims=['idx']), f)
ValueError: object too deep for desired array
I am kind of lost where the error is stemming from and help would be greatly appreciated...
The issue is that apply_along_axis iterates over 1D slices of the first argument to the applied function and not any of the others. If I understand your use-case correctly, you actually want to iterate over 1D slices of the weights (weights in the np.bincount signature), not the integer array (x in the np.bincount signature).
One way to work around this is to write a thin wrapper function around np.bincount that simply switches the order of the arguments:
def wrapped_bincount(weights, x):
    return np.bincount(x, weights=weights)
We can then use np.apply_along_axis with this function for your use-case:
def apply_bincount_along_axis(x, weights, axis=-1):
    return np.apply_along_axis(wrapped_bincount, axis, weights, x)
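To see what these two helpers do before xarray gets involved, here is a small NumPy-only check. The toy values of x and weights are made up for illustration; each row of weights is binned separately using the same integer labels.

```python
import numpy as np

def wrapped_bincount(weights, x):
    # Same as np.bincount, with the 1D slice to iterate over placed first
    return np.bincount(x, weights=weights)

def apply_bincount_along_axis(x, weights, axis=-1):
    # apply_along_axis slices `weights`; `x` is passed through unchanged
    return np.apply_along_axis(wrapped_bincount, axis, weights, x)

x = np.array([0, 1, 1, 2])                       # bin label per column
weights = np.array([[1.0, 2.0, 3.0, 4.0],
                    [10.0, 20.0, 30.0, 40.0]])   # two sets of weights

out = apply_bincount_along_axis(x, weights)
print(out)  # rows are [1, 5, 4] and [10, 50, 40]
```

Calling np.bincount(x, weights=weights) directly with the 2D weights is what triggers the "object too deep for desired array" error, since bincount only accepts 1D weights.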
Finally, we can wrap this new function for use with xarray using apply_ufunc, noting that it can be automatically parallelized with dask (also note that we do not need to provide an axis argument, because xarray will automatically move the input core dimension dim to the last position in the weights array before applying the function):
def xbincount(x, weights):
    if len(x.dims) != 1:
        raise ValueError('x must be one-dimensional')
    dim, = x.dims
    nbins = x.max() + 1
    # output_dtypes uses the builtin float; the np.float alias was removed in NumPy 1.24
    return xr.apply_ufunc(apply_bincount_along_axis, x, weights,
                          input_core_dims=[[dim], [dim]],
                          output_core_dims=[['bin']], dask='parallelized',
                          output_dtypes=[float], output_sizes={'bin': nbins})
Applying this function to your example then looks like:
xbincount(ridx, f)
<xarray.DataArray (time: 2, bin: 5)>
array([[ 0. , 7.934821, 34.066872, 51.118065, 152.769169],
[ 0. , 11.692989, 33.262936, 44.993856, 157.642972]])
Dimensions without coordinates: time, bin
As desired it also works with dask arrays:
xbincount(ridx, f.chunk({'time': 1}))
<xarray.DataArray (time: 2, bin: 5)>
dask.array<shape=(2, 5), dtype=float64, chunksize=(1, 5)>
Dimensions without coordinates: time, bin
I know this is a bit late, but here is an alternative for computing bincount with multiple sets of weights. Please refer to @spencerkclark's answer for information about parallelizing the function.
There is a WARNING before using this: the function bincount_2d_SLOW is used to demonstrate the idea! Do not use this function in your code directly, it is very slow!
I will explain at the end why the idea of the function can greatly speed up your code relative to the solution posted by @spencerkclark, but only if you are computing the bincount multiple times using the same set of groups.
The idea of the code is that while we can't use np.bincount with 2d weights, we can convert 2d weights into 1d data that is directly usable by np.bincount.
The way we do this is:
We repeat our grouping column by the 2nd dimension of the weights, so the grouping and weights have the same shape.
We adjust the grouping values along the 2nd dimension, so that each set of weights has its own unique grouping values. This way, we are grouping along each set of weights separately.
We flatten the data, so it's 1d. Now we can run np.bincount.
Finally, reshape the data.
def bincount_2d_SLOW(x, weights=None):
    if weights is None:
        return np.bincount(x)
    if len(weights.shape) == 1:
        return np.bincount(x, weights=weights)
    n_groups = x.max() + 1
    n_dims = weights.shape[1]
    # Expand x to the same number of dimensions as weights
    repeated_x = np.tile(x, (n_dims, 1)).T
    # Offset the group ids in column j by j * n_groups, so bincount works separately along each dimension
    repeated_x = repeated_x + n_groups * np.arange(n_dims)
    # Flatten
    repeated_x = repeated_x.flatten()
    # Compute bincount
    return np.bincount(repeated_x, weights=weights.flatten()).reshape((n_dims, n_groups)).T
Here is why the idea of this function can speed up your code: if you are computing the bincount many times using the same set of groups, you can pre-compute the tiled-and-flattened groupings, and suddenly the code is incredibly fast. Here is an alternative function (I also added an option to specify n_groups, which can speed up the code even more):
def bincount_2d(x, weights=None, n_groups=None):
    # For 2D weights, x must already be the pre-computed tiled-and-flattened groupings
    if weights is None:
        return np.bincount(x)
    if len(weights.shape) == 1:
        return np.bincount(x, weights=weights)
    n_dims = weights.shape[1]
    if n_groups is None:
        n_groups = (x.max() + 1) // n_dims
    return np.bincount(x, weights=weights.flatten()).reshape((n_dims, n_groups)).T
In some testing I did, bincount_2d_SLOW is about 1/3 slower than apply_bincount_along_axis. But bincount_2d was about 2x faster than apply_bincount_along_axis when I didn't specify n_groups, and when I did specify n_groups, it was about 3x faster.
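To make the pre-computation step concrete, here is a rough sketch of how the tiled-and-flattened groupings could be built once and then reused. The helper name tile_groups and the toy data are hypothetical, not part of the answer above, and the result is checked against a plain per-column np.bincount.

```python
import numpy as np

def tile_groups(x, n_dims, n_groups):
    """Hypothetical helper: offset group ids per weight column, then flatten."""
    # Shape (n_obs, n_dims); column j holds x + j * n_groups
    return (x[:, None] + n_groups * np.arange(n_dims)).ravel()

rng = np.random.default_rng(0)
x = rng.integers(0, 4, size=50)       # group labels in 0..3
weights = rng.random((50, 3))         # three sets of weights
n_groups, n_dims = 4, weights.shape[1]

flat_x = tile_groups(x, n_dims, n_groups)   # compute once, reuse many times

# minlength guards against the highest group being absent from x
counts = np.bincount(flat_x, weights=weights.ravel(),
                     minlength=n_groups * n_dims)
result = counts.reshape((n_dims, n_groups)).T   # shape (n_groups, n_dims)

# Reference: one ordinary bincount per weight column
expected = np.column_stack([
    np.bincount(x, weights=weights[:, j], minlength=n_groups)
    for j in range(n_dims)
])
print(np.allclose(result, expected))
```

Only the np.bincount call and the reshape need to be repeated for each new set of weights, which is where the speed-up described above would come from.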
I'd like to write an extrapolated spline function for a 2D matrix. What I have now is an extrapolated spline function for 1D arrays as below. scipy.interpolate.InterpolatedUnivariateSpline() is used.
import numpy as np
import scipy as sp
import scipy.interpolate  # make sure the sp.interpolate submodule is loaded
def extrapolated_spline_1D(x0,y0):
    x0 = np.array(x0)
    y0 = np.array(y0)
    assert x0.shape == y0.shape
    spline = sp.interpolate.InterpolatedUnivariateSpline(x0,y0)
    def f(x, spline=spline):
        return np.select(
            [(x<x0[0]), (x>x0[-1]), np.ones_like(x,dtype='bool')],
            [np.zeros_like(x)+y0[0], np.zeros_like(x)+y0[-1], spline(x)])
    return f
It takes x0, the points where the function is defined, and y0, the corresponding values. When x < x0[0], y = y0[0]; and when x > x0[-1], y = y0[-1]. This assumes x0 is in ascending order.
I want to have a similar extrapolated spline function for dealing with 2D matrices using np.select() as in extrapolated_spline_1D. I thought scipy.interpolate.RectBivariateSpline() might help, but I'm not sure how to do it.
For reference, my current version of the extrapolated_spline_2D is very slow.
The basic idea is:
(1) first, given 1D arrays x0, y0 and 2D array z2d0 as input, make nx0 extrapolated_spline_1D functions, y0_spls, each of which stands for a layer of z2d0 defined on y0;
(2) second, for a point (x,y) not on the grid, calculate nx0 values, each equal to y0_spls[i](y);
(3) third, fit (x0, y0_spls[i](y)) with extrapolated_spline_1D to get x_spl and return x_spl(x) as the final result.
def extrapolated_spline_2D(x0,y0,z2d0):
    '''
    x0,y0 : array_like, 1-D arrays of coordinates in strictly monotonic order.
    z2d0 : array_like, 2-D array of data with shape (x.size,y.size).
    '''
    nx0 = x0.shape[0]
    ny0 = y0.shape[0]
    assert z2d0.shape == (nx0,ny0)
    # make nx0 splines, each of which stands for a layer of z2d0 on y0
    y0_spls = [extrapolated_spline_1D(y0,z2d0[i,:]) for i in range(nx0)]
    def f(x, y):
        '''
        f takes 2 arguments at the same time --> x, y have the same dimension
        Return: a numpy ndarray object with the same shape of x and y
        '''
        x = np.array(x,dtype='f4')
        y = np.array(y,dtype='f4')
        assert x.shape == y.shape
        ndim = x.ndim
        if ndim == 0:
            '''
            Given a point on the xy-plane.
            Make ny = 1 splines, each of which stands for a layer of new_xs on x0
            '''
            new_xs = np.array([y0_spls[i](y) for i in range(nx0)])
            x_spl = extrapolated_spline_1D(x0,new_xs)
            result = x_spl(x)
        elif ndim == 1:
            '''
            Given a 1-D array of points on the xy-plane.
            '''
            ny = len(y)
            new_xs = np.array([y0_spls[i](y) for i in range(nx0)]) # new_xs.shape = (nx0,ny)
            x_spls = [extrapolated_spline_1D(x0,new_xs[:,i]) for i in range(ny)]
            result = np.array([x_spls[i](x[i]) for i in range(ny)])
        else:
            '''
            Given a multiple dimensional array of points on the xy-plane.
            '''
            x_flatten = x.flatten()
            y_flatten = y.flatten()
            ny = len(y_flatten)
            new_xs = np.array([y0_spls[i](y_flatten) for i in range(nx0)])
            x_spls = [extrapolated_spline_1D(x0,new_xs[:,i]) for i in range(ny)]
            result = np.array([x_spls[i](x_flatten[i]) for i in range(ny)]).reshape(y.shape)
        return result
    return f
I've done similar work in a class called GlobalSpline2D here, and it works perfectly with linear, cubic, or quintic splines.
Basically, it inherits from interp2d and extends its usage to 2D extrapolation by means of InterpolatedUnivariateSpline. Both of them are provided by SciPy.
For its usage, refer to the documentation as well as the call method of interp2d.
I think I've come up with an answer myself, which utilizes scipy.interpolate.RectBivariateSpline() and is over 10 times faster than my old one.
Here is the function extrapolated_spline_2D_new.
import numpy as np
import scipy.interpolate

def extrapolated_spline_2D_new(x0,y0,z2d0):
    '''
    x0,y0 : array_like,1-D arrays of coordinates in strictly ascending order.
    z2d0 : array_like,2-D array of data with shape (x.size,y.size).
    '''
    assert z2d0.shape == (x0.shape[0],y0.shape[0])
    spline = scipy.interpolate.RectBivariateSpline(x0,y0,z2d0,kx=3,ky=3)
    '''
    scipy.interpolate.RectBivariateSpline
    x,y : array_like, 1-D arrays of coordinates in strictly ascending order.
    z : array_like, 2-D array of data with shape (x.size,y.size).
    '''
    def f(x,y,spline=spline):
        '''
        x and y have the same shape with the output.
        '''
        x = np.array(x,dtype='f4')
        y = np.array(y,dtype='f4')
        assert x.shape == y.shape
        ndim = x.ndim
        # We want the output to have the same dimension as the input,
        # and when ndim == 0 or 1, spline(x,y) is always 2D.
        if ndim == 0: result = spline(x,y)[0][0]
        elif ndim == 1:
            result = np.array([spline(x[i],y[i])[0][0] for i in range(len(x))])
        else:
            result = np.array([spline(x.flatten()[i],y.flatten()[i])[0][0] for i in range(len(x.flatten()))]).reshape(x.shape)
        return result
    return f
Note:
In the above version, I calculate the values one by one instead of using the code below.
def f(x,y,spline=spline):
    '''
    x and y have the same shape with the output.
    '''
    x = np.array(x,dtype='f4')
    y = np.array(y,dtype='f4')
    assert x.shape == y.shape
    ndim = x.ndim
    if ndim == 0: result = spline(x,y)[0][0]
    elif ndim == 1:
        result = spline(x,y).diagonal()
    else:
        result = spline(x.flatten(),y.flatten()).diagonal().reshape(x.shape)
    return result
That is because when I tried to do the calculation with that code, it sometimes gave this error message:
<ipython-input-65-33285fd2319d> in f(x, y, spline)
29 if ndim == 0: result = spline(x,y)[0][0]
30 elif ndim == 1:
---> 31 result = spline(x,y).diagonal()
32 else:
33 result = spline(x.flatten(),y.flatten()).diagonal().reshape(x.shape)
/usr/local/lib/python2.7/site-packages/scipy/interpolate/fitpack2.pyc in __call__(self, x, y, mth, dx, dy, grid)
826 z,ier = dfitpack.bispev(tx,ty,c,kx,ky,x,y)
827 if not ier == 0:
--> 828 raise ValueError("Error code returned by bispev: %s" % ier)
829 else:
830 # standard Numpy broadcasting
ValueError: Error code returned by bispev: 10
I don't know what it means.
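A possible explanation, offered as an assumption rather than a confirmed diagnosis: in the default grid mode the spline is evaluated on the grid spanned by x and y, and the underlying routine expects both coordinate arrays to be sorted, so unsorted query points may be what triggers the error code. RectBivariateSpline also has an ev method that evaluates point by point (equivalent to passing grid=False, the grid parameter visible in the __call__ signature in the traceback), which avoids both the diagonal trick and the Python loop. The sketch below combines ev with np.clip to get the constant extrapolation described in the question; the function name clamped_spline_2d and the test data are made up here.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

def clamped_spline_2d(x0, y0, z2d0):
    """Sketch: bicubic spline with constant extrapolation outside the grid."""
    x0 = np.asarray(x0, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    spline = RectBivariateSpline(x0, y0, z2d0, kx=3, ky=3)

    def f(x, y):
        # Clamp query points to the grid range, then evaluate pointwise
        xc = np.clip(np.asarray(x, dtype=float), x0[0], x0[-1])
        yc = np.clip(np.asarray(y, dtype=float), y0[0], y0[-1])
        return spline.ev(xc, yc)

    return f

x0 = np.linspace(0.0, 1.0, 6)
y0 = np.linspace(0.0, 2.0, 8)
z = x0[:, None] + 2.0 * y0[None, :]       # z = x + 2*y on the grid
f = clamped_spline_2d(x0, y0, z)

xs = np.array([0.9, 0.1, 5.0, -3.0])      # deliberately unsorted
ys = np.array([0.2, 1.7, 1.0, 9.0])
vals = f(xs, ys)
print(vals)
```

Clamping the coordinates reproduces the 1D behaviour (the boundary value is held constant outside the range) along each axis; whether that matches the nested-spline version in every corner case is not something I have verified.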
The lines in question are:
# Make efficient matrix that can be built
K = sparse.lil_matrix((N, N))
# Calculate K matrix (<i|pHp|j> in the LGL-nodes basis)
for i in range(Ne):
    idx_s, idx_e = i*(Np-1), i*(Np-1)+Np
    print(shape(K[idx_s:idx_e, idx_s:idx_e]))
    print(shape(dmat.T.dot(sparse.spdiags(w*peq[idx_s:idx_e], 0, Np, Np)).dot(dmat)))
    K[idx_s:idx_e, idx_s:idx_e] += dmat.T.dot(sparse.spdiags(w*peq[idx_s:idx_e], 0, Np, Np)).dot(dmat)
But, currently, Numpy is yielding the error
(8, 8)
(8, 8)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-62-cc7cc21f07e5> in <module>()
22
23 for _ in range(N):
---> 24 ll, q = getLL(Ne, Np, x_d, w_d, dmat_d, x, w, dL, peq*peq, data)
25 peq = (peq*q)
26
<ipython-input-61-a52c13d48b87> in getLL(Ne, Np, x_d, w_d, dmat_d, x, w, dmat, peq, data)
15 print(shape(K[idx_s:idx_e, idx_s:idx_e]))
16 print(shape(dmat.T.dot(sparse.spdiags(w*peq[idx_s:idx_e], 0, Np, Np)).dot(dmat)))
---> 17 K[idx_s:idx_e, idx_s:idx_e] += dmat.T.dot(sparse.spdiags(w*peq[idx_s:idx_e], 0, Np, Np)).dot(dmat)
18
19 # Re-make matrix for efficient vector products
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/scipy/sparse/lil.py in __iadd__(self, other)
157
158 def __iadd__(self,other):
--> 159 self[:,:] = self + other
160 return self
161
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/scipy/sparse/lil.py in __setitem__(self, index, x)
307
308 # Make x and i into the same shape
--> 309 x = np.asarray(x, dtype=self.dtype)
310 x, _ = np.broadcast_arrays(x, i)
311
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
460
461 """
--> 462 return array(a, dtype, copy=False, order=order)
463
464 def asanyarray(a, dtype=None, order=None):
ValueError: setting an array element with a sequence.
This is a little cryptic as it seems that the error is happening somewhere inside of the Numpy library---not in my code. But I'm not terribly familiar with numpy, per se, so perhaps I'm indirectly causing the error.
Both slices are of the same shape, so that doesn't seem to be the actual error.
The problem is that
dmat.T.dot(sparse.spdiags(w*peq[idx_s:idx_e], 0, Np, Np)).dot(dmat)
is not a simple array. It has the right shape, but the elements are sparse matrices (the 'sequence' in the error message).
Turning the inner sparse matrix into a dense array should solve the problem:
dmat.T.dot(sparse.spdiags(w*peq[idx_s:idx_e], 0, Np, Np).A).dot(dmat)
The np.dot method is not aware of sparse matrices, at least not in your version of numpy (1.8?), so it treats the sparse matrix as a sequence. Newer versions are 'sparse' aware.
Another solution is to use the sparse matrix product (dot or *).
sparse.spdiags(...).dot(dmat etc)
I had to play around to get reasonable values for N,Np,Ns, dmat,peq. You really should have given us small samples. It makes testing ideas much easier.
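For reference, a small self-contained sample along these lines might look like the sketch below. The sizes Ne and Np, the random dmat, w and peq, and the formula for N are all made-up stand-ins, since the question does not give them. The product is formed with the sparse matrix on the left of a dense array, which returns a plain dense array, and the block is then added to the slice of K explicitly.

```python
import numpy as np
from scipy import sparse

Ne, Np = 3, 4                    # made-up element count and nodes per element
N = Ne * (Np - 1) + 1            # assumed size so the last block fits
rng = np.random.default_rng(0)
dmat = rng.random((Np, Np))      # stand-in for the derivative matrix
w = rng.random(Np)               # stand-in weights
peq = rng.random(N)

K = sparse.lil_matrix((N, N))
for i in range(Ne):
    s, e = i * (Np - 1), i * (Np - 1) + Np
    diag = sparse.spdiags(w * peq[s:e], 0, Np, Np)
    # sparse @ dense gives a dense ndarray, so the outer product stays dense
    block = dmat.T @ (diag @ dmat)
    # Add the dense block to the current contents of the slice
    K[s:e, s:e] = K[s:e, s:e].toarray() + block

print(K.shape, type(block))
```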
I have numeric data stored in two DataFrames x and y. The inner product from numpy works but the dot product from pandas does not.
In [63]: x.shape
Out[63]: (1062, 36)
In [64]: y.shape
Out[64]: (36, 36)
In [65]: np.inner(x, y).shape
Out[65]: (1062L, 36L)
In [66]: x.dot(y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-66-76c015be254b> in <module>()
----> 1 x.dot(y)
C:\Programs\WinPython-64bit-2.7.3.3\python-2.7.3.amd64\lib\site-packages\pandas\core\frame.pyc in dot(self, other)
888 if (len(common) > len(self.columns) or
889 len(common) > len(other.index)):
--> 890 raise ValueError('matrices are not aligned')
891
892 left = self.reindex(columns=common, copy=False)
ValueError: matrices are not aligned
Is this a bug or am I using pandas wrong?
Not only must the shapes of x and y be correct, but also
the column names of x must match the index names of y. Otherwise
this code in pandas/core/frame.py will raise a ValueError:
if isinstance(other, (Series, DataFrame)):
    common = self.columns.union(other.index)
    if (len(common) > len(self.columns) or
        len(common) > len(other.index)):
        raise ValueError('matrices are not aligned')
If you just want to compute the matrix product without making the column names of x match the index names of y, then use the NumPy dot function:
np.dot(x, y)
The column names of x must match the index names of y because the pandas dot method reindexes x and y: if the column order of x and the index order of y do not naturally match, they are made to match before the matrix product is performed:
left = self.reindex(columns=common, copy=False)
right = other.reindex(index=common, copy=False)
The NumPy dot function does no such thing. It will just compute the matrix product based on the values in the underlying arrays.
Here is an example which reproduces the error:
import pandas as pd
import numpy as np
columns = ['col{}'.format(i) for i in range(36)]
x = pd.DataFrame(np.random.random((1062, 36)), columns=columns)
y = pd.DataFrame(np.random.random((36, 36)))
print(np.dot(x, y).shape)
# (1062, 36)
print(x.dot(y).shape)
# ValueError: matrices are not aligned
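If you would rather stay within pandas, here is a short sketch of two ways around the error, using smaller made-up frames than the ones above: either pass the raw values of y so that no label alignment takes place, or give y an index that matches the columns of x. The last part illustrates that alignment is by label, so reordering the rows of y does not change the product.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
columns = ['col{}'.format(i) for i in range(4)]
x = pd.DataFrame(rng.random((6, 4)), columns=columns)
y = pd.DataFrame(rng.random((4, 4)))      # default index 0..3, no match

# Option 1: hand over the underlying array, which skips label alignment
prod_values = x.dot(y.to_numpy())

# Option 2: relabel y's index so it matches x's columns
y_labeled = y.copy()
y_labeled.index = x.columns
prod_labeled = x.dot(y_labeled)

# Rows of y in a different order are realigned by label before multiplying
y_shuffled = y_labeled.iloc[[2, 0, 3, 1]]
prod_shuffled = x.dot(y_shuffled)

print(prod_values.shape, prod_labeled.shape, prod_shuffled.shape)
```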