I have to translate a Matlab script to Python. It transforms some complicated data into an array, and I don't know how to translate this part of the code:
accumarray([j2,i2],iq,[],[],NaN)
That is in Matlab, the shapes of j2, i2 and iq are (1362730 x 1). But the shape of [j2, i2] would be (1362730 x 2).
I found this Python function that emulates accumarray:
import numpy as np
from itertools import product

def accum(accmap, a, func=None, size=None, fill_value=0, dtype=None):
    """
An accumulation function similar to Matlab's `accumarray` function.
Parameters
----------
accmap : ndarray
This is the "accumulation map". It maps input (i.e. indices into
`a`) to their destination in the output array. The first `a.ndim`
dimensions of `accmap` must be the same as `a.shape`. That is,
`accmap.shape[:a.ndim]` must equal `a.shape`. For example, if `a`
has shape (15,4), then `accmap.shape[:2]` must equal (15,4). In this
case `accmap[i,j]` gives the index into the output array where
element (i,j) of `a` is to be accumulated. If the output is, say,
a 2D, then `accmap` must have shape (15,4,2). The value in the
last dimension give indices into the output array. If the output is
1D, then the shape of `accmap` can be either (15,4) or (15,4,1)
a : ndarray
The input data to be accumulated.
func : callable or None
The accumulation function. The function will be passed a list
of values from `a` to be accumulated.
If None, numpy.sum is assumed.
size : ndarray or None
The size of the output array. If None, the size will be determined
from `accmap`.
fill_value : scalar
The default value for elements of the output array.
dtype : numpy data type, or None
The data type of the output array. If None, the data type of
`a` is used.
Returns
-------
out : ndarray
The accumulated results.
The shape of `out` is `size` if `size` is given. Otherwise the
shape is determined by the (lexicographically) largest indices of
the output found in `accmap`.
Examples
--------
>>> from numpy import array, prod
>>> a = array([[1,2,3],[4,-1,6],[-1,8,9]])
>>> a
array([[ 1, 2, 3],
[ 4, -1, 6],
[-1, 8, 9]])
>>> # Sum the diagonals.
>>> accmap = array([[0,1,2],[2,0,1],[1,2,0]])
>>> s = accum(accmap, a)
array([9, 7, 15])
>>> # A 2D output, from sub-arrays with shapes and positions like this:
>>> # [ (2,2) (2,1)]
>>> # [ (1,2) (1,1)]
>>> accmap = array([
[[0,0],[0,0],[0,1]],
[[0,0],[0,0],[0,1]],
[[1,0],[1,0],[1,1]],
])
>>> # Accumulate using a product.
>>> accum(accmap, a, func=prod, dtype=float)
array([[ -8., 18.],
[ -8., 9.]])
>>> # Same accmap, but create an array of lists of values.
>>> accum(accmap, a, func=lambda x: x, dtype='O')
array([[[1, 2, 4, -1], [3, 6]],
[[-1, 8], [9]]], dtype=object)
"""
    # Check for bad arguments and handle the defaults.
    if accmap.shape[:a.ndim] != a.shape:
        raise ValueError("The initial dimensions of accmap must be the same as a.shape")
    if func is None:
        func = np.sum
    if dtype is None:
        dtype = a.dtype
    if accmap.shape == a.shape:
        accmap = np.expand_dims(accmap, -1)
    adims = tuple(range(a.ndim))
    if size is None:
        size = 1 + np.squeeze(np.apply_over_axes(np.max, accmap, axes=adims))
    size = np.atleast_1d(size)
    # Create an array of python lists of values.
    vals = np.empty(size, dtype='O')
    for s in product(*[range(k) for k in size]):
        vals[s] = []
    for s in product(*[range(k) for k in a.shape]):
        indx = tuple(accmap[s])
        val = a[s]
        vals[indx].append(val)
    # Create the output array.
    out = np.empty(size, dtype=dtype)
    for s in product(*[range(k) for k in size]):
        if vals[s] == []:
            out[s] = fill_value
        else:
            out[s] = func(vals[s])
    return out
But it doesn't work when the shapes of accmap and a are different, which is my case: accmap would be [j2, i2] with shape (1362730 x 2) and a would be iq with shape (1362730 x 1). I don't quite understand what Matlab does when the inputs have different sizes. Is there a way to modify the Python function to handle that, or another way to translate that line to Python?
I had a project in Matlab where I used accumarray(). I recently ported it to Python using numpy.histogramdd() as its closest replacement.
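For the specific call in the question, accumarray([j2,i2],iq,[],[],NaN), another possible translation is sketched below with np.add.at rather than histogramdd. It assumes the default accumulation (a sum), that j2 and i2 hold 1-based Matlab subscripts, and that cells which receive no value should be NaN. The helper name accumarray_sum and the small input arrays are made up for illustration.

```python
import numpy as np

def accumarray_sum(rows, cols, vals, fill_value=np.nan):
    """Hypothetical helper: sum `vals` into a 2D array at 1-based (rows, cols)."""
    # Matlab subscripts start at 1, NumPy indices start at 0.
    rows = np.asarray(rows).ravel().astype(np.intp) - 1
    cols = np.asarray(cols).ravel().astype(np.intp) - 1
    vals = np.asarray(vals, dtype=float).ravel()

    # Like accumarray with sz = [], size the output from the largest subscripts.
    shape = (rows.max() + 1, cols.max() + 1)
    out = np.zeros(shape)
    # Unbuffered add: repeated (row, col) pairs are all accumulated.
    np.add.at(out, (rows, cols), vals)

    # Cells that were never addressed get the fill value instead of 0.
    touched = np.zeros(shape, dtype=bool)
    touched[rows, cols] = True
    out[~touched] = fill_value
    return out

# Column vectors, as in the question (N x 1 each).
j2 = np.array([[1], [2], [1], [3]])
i2 = np.array([[1], [2], [1], [1]])
iq = np.array([[10.0], [20.0], [30.0], [40.0]])
result = accumarray_sum(j2, i2, iq)
print(result)
```

The two index columns play the role of the two columns of [j2, i2], so no (N x 2) accmap has to be built by hand.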
When running the following code:
from platform import python_version
print(python_version())
import numpy as np
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
x[1,:] = x[1,:] / 5
print(x)
y = np.array([1,2,3])
y = y / 5
print(y)
I get the following output:
3.8.6
[[1 2 3]
[0 1 1]
[7 8 9]]
[0.2 0.4 0.6]
Why does numpy / python use integer division when dividing a row in a matrix by a scalar, but regular division when dividing a single row? I thought " / " division in Python 3 was always regular?
Why does numpy / python use integer division when dividing a row in a matrix by a scalar
It doesn't - the symptom you are seeing is due to the assignment.
>>> x = np.array([[1,2,3],[4,5,6],[7,8,9]])
Dividing by an integer produces an array of floats,
>>> z = x[1,:] / 5
>>> z
array([0.8, 1. , 1.2])
But assigning that array to a slice of an integer array causes the dtype conversion.
>>> x[1,:] = z
>>> x
array([[1, 2, 3],
[0, 1, 1],
[7, 8, 9]])
>>> z.dtype
dtype('float64')
>>> x.dtype
dtype('int32')
>>>
This is mentioned in the documentation - Assigning values to indexed arrays
Note that assignments may result in changes if assigning higher types to lower types (like floats to ints) or even exceptions (assigning complex to floats or ints):
The trick here is in line:
x[1,:] = x[1,:] / 5
According to numpy documentation of dtype: https://numpy.org/doc/stable/reference/generated/numpy.dtype.html
A numpy array is homogeneous, and contains elements described by a dtype object
So when manually assigning the row, it's taking dtype of x matrix into account, which is of type dtype('int64').
The same will happen to you if you tried to manually assign an element to the y array:
y = np.array([1,2,3])
y[1] = 0.5
print(y)
# this will print [1 0 3]
Why does numpy / python use integer division when dividing a row in a matrix by a scalar while dividing a single row using regular division?
So it's about enforcing the homogeneous dtype of the np.array itself rather than dividing a row in a matrix, as shown in the line below:
x[1] / 5
>>> array([0.8, 1. , 1.2])
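If the goal is to keep the fractional values, one option is to give the array a float dtype before assigning into it. This is only a sketch of that idea; the name xf is made up for the example.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Make a float copy first, so the assignment no longer casts back to int.
xf = x.astype(float)
xf[1, :] = xf[1, :] / 5

print(x.dtype)   # still an integer dtype, x itself is unchanged
print(xf.dtype)  # float64
print(xf)
```

Creating the array with floats from the start, for example np.array([[1., 2., 3.], ...]), has the same effect.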
I have a NumPy array A with shape (m,n) and want to run all the elements through some function f. For a non-constant function such as for example f(x) = x or f(x) = x**2 broadcasting works perfectly fine and returns the expected result. For f(x) = 1, applying the function to my array A however just returns the scalar 1.
Is there a way to force broadcasting to keep the shape, i.e. in this case to return an array of 1s?
f(x) = 1 on its own is not a function. You need to create one with def or lambda that returns 1, then use np.vectorize to apply it to your array.
>>> import numpy as np
>>> f = lambda x: 1
>>>
>>> f = np.vectorize(f)
>>>
>>> f(np.arange(10).reshape(2, 5))
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
This sounds like a job for np.ones_like, or np.full_like in the general case:
def f(x):
    result = np.full_like(x, 1)  # or np.full_like(x, 1, dtype=int) if you don't want to
                                 # inherit the dtype of x
    if result.shape == ():
        # Return a scalar instead of a 0D array.
        return result[()]
    else:
        return result
Use x.fill(1). Note that fill modifies x in place and returns None, so return x itself rather than the result of the call.
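A small sketch of the fill approach that leaves the caller's array untouched. The helper name constant_like is made up, and filling a fresh array instead of x itself is an assumption about what is wanted here.

```python
import numpy as np

def constant_like(x, value=1):
    """Hypothetical helper: array shaped like x, filled with `value`."""
    out = np.empty_like(np.asarray(x))
    out.fill(value)  # works in place and returns None
    return out       # so return the array, not the result of fill()

A = np.arange(6.0).reshape(2, 3)
ones = constant_like(A)
returned = np.zeros(3).fill(1)  # what fill() itself hands back

print(ones)
print(returned)
```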
I have two numpy arrays
import numpy as np
x = np.linspace(1e10, 1e12, num=50) # 50 values
y = np.linspace(1e5, 1e7, num=50) # 50 values
x.shape # output is (50,)
y.shape # output is (50,)
I would like to create a function which returns an array shaped (50,50) such that the first x value x0 is evaluated for all y values, etc.
The current function I am using is fairly complicated, so let's use an easier example. Let's say the function is
def func(x,y):
    return x**2 + y**2
How do I shape this to be a (50,50) array? At the moment, it will output 50 values. Would you use a for loop inside an array?
Something like:
np.array([[func(x,y) for i in x] for j in y])
but without using two for loops. This takes forever to run.
EDIT: It has been requested I share my "complicated" function. Here it goes:
There is a data vector which is a 1D numpy array of 4000 measurements. There is also a "normalized_matrix", which is shaped (4000,4000). It is nothing special, just a matrix whose entries are floats between 0 and 1, e.g. 0.5567878. These are the two "given" inputs.
My function returns the matrix multiplication product of transpose(datavector) * matrix * datavector, which is a single value.
Now, as you can see in the code, I have initialized two arrays, x and y, which pass through a series of "x parameters" and "y parameters". That is, what does func(x,y) return for value x1 and value y1, i.e. func(x1,y1)?
The shape of matrix1 is (50, 4000, 4000). The shape of matrix2 is (50, 4000, 4000). Ditto for total_matrix.
normalized_matrix is shape (4000,4000) and id_mat is shaped (4000,4000).
normalized_matrix
print(normalized_matrix.shape)  # output (4000,4000)
data_vector = datarr
print(datarr.shape)  # output (4000,)
def func(x, y):
    matrix1 = x[:, None, None] * normalized_matrix[None, :, :]
    matrix2 = y[:, None, None] * id_mat[None, :, :]
    total_matrix = matrix1 + matrix2
    # transpose(datavector) * matrix * datavector
    # by matrix multiplication, equals single value
    return np.array([np.dot(datarr.T, np.dot(total_matrix, datarr))])
If I try to use np.meshgrid(), that is, if I try
x = np.linspace(1e10, 1e12, num=50) # 50 values
y = np.linspace(1e5, 1e7, num=50) # 50 values
X, Y = np.meshgrid(x,y)
z = func(X, Y)
I get the following value error: ValueError: operands could not be broadcast together with shapes (50,1,1,50) (1,4000,4000).
reshape in numpy has a different meaning. When you start with a (100,) array and change it to a (5,20) or (10,10) 2d array, that is a 'reshape'. There is a numpy function to do that.
You want to take 2 1d array, and use those to generate a 2d array from a function. This is like taking an outer product of the 2, passing all combinations of their values through your function.
Some sort of double loop is one way of doing this, whether it is with an explicit loop, or list comprehension. But speeding this up depends on that function.
For that x**2+y**2 example, it can be 'vectorized' quite easily:
In [40]: x=np.linspace(1e10,1e12,num=10)
In [45]: y=np.linspace(1e5,1e7,num=5)
In [46]: z = x[:,None]**2 + y[None,:]**2
In [47]: z.shape
Out[47]: (10, 5)
This takes advantage of numpy broadcasting. With the None, x is reshaped to (10,1) and y to (1,5), and the + takes an outer sum.
X,Y=np.meshgrid(x,y,indexing='ij') produces two (10,5) arrays that can be used the same way. Look at its doc for other parameters.
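To make the comparison concrete, here is a short check that the meshgrid route and the broadcasting route give the same (10,5) result for this example. The names z_mesh and z_bcast are only for illustration.

```python
import numpy as np

x = np.linspace(1e10, 1e12, num=10)
y = np.linspace(1e5, 1e7, num=5)

# Route 1: materialise two full (10, 5) coordinate arrays.
X, Y = np.meshgrid(x, y, indexing='ij')
z_mesh = X**2 + Y**2

# Route 2: let broadcasting expand (10, 1) and (1, 5) on the fly.
z_bcast = x[:, None]**2 + y[None, :]**2

print(X.shape, Y.shape, z_mesh.shape, z_bcast.shape)
print(np.allclose(z_mesh, z_bcast))
```

With the default indexing='xy' the two coordinate arrays would come out as (5,10) instead, which is why 'ij' is used here.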
So if your more complex function can be written in a way that takes 2d arrays like this, it is easy to 'vectorize'.
But if that function must take 2 scalars, and return another scalar, then you are stuck with some sort of double loop.
A list comprehension form of the double loop is:
np.array([[x1**2+y1**2 for y1 in y] for x1 in x])
Another is:
z=np.empty((10,5))
for i in range(10):
    for j in range(5):
        z[i,j] = x[i]**2 + y[j]**2
This double loop can be sped up somewhat by using np.vectorize. This takes a user defined function, and returns one that can take broadcastable arrays:
In [65]: vprod=np.vectorize(lambda x,y: x**2+y**2)
In [66]: vprod(x[:,None],y[None,:]).shape
Out[66]: (10, 5)
Tests that I've done in the past show that vectorize can improve on the list comprehension route by something like 20%, but the improvement is nothing like writing your function to work with 2d arrays in the first place.
By the way, this sort of 'vectorization' question has been asked many times on SO numpy. Beyond these broad examples, we can't help you without knowing more about that more complicated function. As long as it is a black box that takes scalars, the best we can help you with is np.vectorize. And you still need to understand broadcasting (with or without meshgrid help).
I think there is a better way, it is right on the tip of my tongue, but as an interim measure:
You are operating on 1x2 windows of a meshgrid. You can use as_strided from numpy.lib.stride_tricks to rearrange the meshgrid into two-element windows, then apply your function to the resultant array. I like to use a generic nd solution, sliding_windows (http://www.johnvinyard.com/blog/?p=268) (Not mine) to transform the array.
import numpy as np
a = np.array([1,2,3])
b = np.array([.1, .2, .3])
z= np.array(np.meshgrid(a,b))
def foo(xy):
    # Python 3 no longer allows tuple parameters, so unpack inside the function.
    x, y = xy
    return x+y
>>> z.shape
(2, 3, 3)
>>> t = sliding_window(z, (2,1,1))
>>> t
array([[ 1. , 0.1],
[ 2. , 0.1],
[ 3. , 0.1],
[ 1. , 0.2],
[ 2. , 0.2],
[ 3. , 0.2],
[ 1. , 0.3],
[ 2. , 0.3],
[ 3. , 0.3]])
>>> v = np.apply_along_axis(foo, 1, t)
>>> v
array([ 1.1, 2.1, 3.1, 1.2, 2.2, 3.2, 1.3, 2.3, 3.3])
>>> v.reshape((len(a), len(b)))
array([[ 1.1, 2.1, 3.1],
[ 1.2, 2.2, 3.2],
[ 1.3, 2.3, 3.3]])
>>>
This should be somewhat faster.
You may need to modify your function's argument signature.
If the link to the johnvinyard.com blog breaks, I've posted the sliding_window implementation in other SO answers - https://stackoverflow.com/a/22749434/2823755
Search around and you'll find many other tricky as_strided solutions.
In response to your edited question:
normalized_matrix
print(normalized_matrix.shape)  # output (4000,4000)
data_vector = datarr
print(datarr.shape)  # output (4000,)
def func(x, y):
    matrix1 = x[:, None, None] * normalized_matrix[None, :, :]
    matrix2 = y[:, None, None] * id_mat[None, :, :]
    total_matrix = matrix1 + matrix2
    # transpose(datavector) * matrix * datavector
    # by matrix multiplication, equals single value
    # return np.array([ np.dot(datarr.T, np.dot(total_matrix, datarr))])
    return np.einsum('j,ijk,k->i',datarr,total_matrix,datarr)
Since datarr is shape (4000,), transpose does nothing. I believe you want the result of the 2 dots to be shape (50,). I'm suggesting using einsum. But it can be done with tensordot, or I think even np.dot(np.dot(total_matrix, datarr),datarr). Test the expression with smaller arrays, focusing on getting the shapes right.
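As a way of testing with smaller arrays, the sketch below compares the einsum expression with the double np.dot and with an explicit loop. The sizes (n = 4, m = 3), the random inputs, and the use of an identity matrix for id_mat are assumptions made only for the test.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                           # small stand-ins for 4000 and 50
normalized_matrix = rng.random((n, n))
id_mat = np.eye(n)                    # assumed to be an identity matrix
datarr = rng.random(n)
x = np.linspace(1.0, 2.0, m)
y = np.linspace(3.0, 4.0, m)

# Same construction as in func: one (n, n) matrix per (x[i], y[i]) pair.
total_matrix = (x[:, None, None] * normalized_matrix[None, :, :]
                + y[:, None, None] * id_mat[None, :, :])

via_einsum = np.einsum('j,ijk,k->i', datarr, total_matrix, datarr)
via_dot = np.dot(np.dot(total_matrix, datarr), datarr)
via_loop = np.array([datarr @ t @ datarr for t in total_matrix])

print(total_matrix.shape, via_einsum.shape)
print(np.allclose(via_einsum, via_dot), np.allclose(via_einsum, via_loop))
```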
x = np.linspace(1e10, 1e12, num=50) # 50 values
y = np.linspace(1e5, 1e7, num=50) # 50 values
z = func(x,y)
# X, Y = np.meshgrid(x,y)
# z = func(X, Y)
X,Y is wrong. func takes x and y that are 1d. Notice how you expand the dimensions with [:, None, None]. Also you aren't creating a 2d array from an outer combination of x and y. None of your arrays in func is (50,50) or (50,50,...). The higher dimensions are provided by normalized_matrix and id_mat.
When showing us the ValueError you should also indicate where in your code that occurred. Otherwise we have to guess, or recreate the code ourselves.
In fact when I run my edited func(X,Y), I get this error:
----> 2 matrix1 = x [:, None, None] * normalized_matrix[None, :, :]
3 matrix2 = y[:, None, None] * id_mat[None, :, :]
4 total_matrix = matrix1 + matrix2
5 # transpose(datavector) * matrix * datavector
ValueError: operands could not be broadcast together with shapes (50,1,1,50) (1,400,400)
See, the error occurs right at the start. normalized_matrix is expanded to (1,400,400) [I'm using smaller examples]. The (50,50) X is expanded to (50,1,1,50). x expands to (50,1,1), which broadcasts just fine.
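If what is wanted in the end is still a full (len(x), len(y)) grid, one observation that may help: the quadratic form is linear in the matrix, so d.T (x*N + y*I) d equals x*(d.T N d) + y*(d.T I d). Under that reading of the function, the grid is just an outer sum of two scalars scaled by x and y, and the 3d stack never has to be built. The sketch below checks this on small random inputs; the sizes and the names a, b, grid and check are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6                                  # small stand-in for 4000
normalized_matrix = rng.random((n, n))
id_mat = np.eye(n)                     # assumed to be an identity matrix
datarr = rng.random(n)
x = np.linspace(1.0, 2.0, 5)
y = np.linspace(3.0, 4.0, 4)

# Two scalars, each computed once.
a = datarr @ normalized_matrix @ datarr
b = datarr @ id_mat @ datarr

# Outer combination of every x[i] with every y[j] via broadcasting.
grid = x[:, None] * a + y[None, :] * b

# Brute-force double loop over scalars, for comparison only.
check = np.array([[datarr @ (xi * normalized_matrix + yj * id_mat) @ datarr
                   for yj in y] for xi in x])

print(grid.shape, np.allclose(grid, check))
```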
To address the edit and the broadcasting error in the edit:
Inside your function you are adding dimensions to arrays to try to get them to broadcast.
matrix1 = x [:, None, None] * normalized_matrix[None, :, :]
This expression looks like you want to broadcast a 1d array with a 2d array.
The results of your meshgrid are two 2d arrays:
X,Y = np.meshgrid(x,y)
>>> X.shape, Y.shape
((50, 50), (50, 50))
>>>
When you try to use X in your broadcasting expression the dimensions don't line up, that is what causes the ValueError - refer to the General Broadcasting Rules:
>>> x1 = X[:, np.newaxis, np.newaxis]
>>> nm = normalized_matrix[np.newaxis, :, :]
>>> x1.shape
(50, 1, 1, 50)
>>> nm.shape
(1, 4000, 4000)
>>>
You're on the right track with your list comprehension, you just need to add in an extra level of iteration:
np.array([[func(i,j) for i in x] for j in y])
Numpy's meshgrid is very useful for converting two vectors to a coordinate grid. What is the easiest way to extend this to three dimensions? So given three vectors x, y, and z, construct 3x3D arrays (instead of 2x2D arrays) which can be used as coordinates.
Numpy (as of 1.8 I think) now supports higher than 2D generation of position grids with meshgrid. One important addition which really helped me is the ability to choose the indexing order (either xy or ij for Cartesian or matrix indexing respectively), which I verified with the following example:
import numpy as np
x_ = np.linspace(0., 1., 10)
y_ = np.linspace(1., 2., 20)
z_ = np.linspace(3., 4., 30)
x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
assert np.all(x[:,0,0] == x_)
assert np.all(y[0,:,0] == y_)
assert np.all(z[0,0,:] == z_)
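For contrast, a quick sketch of what the two indexing orders do to the output shapes in the 3D case, reusing the same three vectors. The variable names with _ij and _xy suffixes are only for illustration.

```python
import numpy as np

x_ = np.linspace(0., 1., 10)
y_ = np.linspace(1., 2., 20)
z_ = np.linspace(3., 4., 30)

# Matrix indexing: axis order follows the argument order.
x_ij, y_ij, z_ij = np.meshgrid(x_, y_, z_, indexing='ij')

# Cartesian indexing (the default): the first two axes are swapped.
x_xy, y_xy, z_xy = np.meshgrid(x_, y_, z_, indexing='xy')

print(x_ij.shape)  # (10, 20, 30)
print(x_xy.shape)  # (20, 10, 30)
```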
Here is the source code of meshgrid:
def meshgrid(x,y):
    """
Return coordinate matrices from two coordinate vectors.
Parameters
----------
x, y : ndarray
Two 1-D arrays representing the x and y coordinates of a grid.
Returns
-------
X, Y : ndarray
For vectors `x`, `y` with lengths ``Nx=len(x)`` and ``Ny=len(y)``,
return `X`, `Y` where `X` and `Y` are ``(Ny, Nx)`` shaped arrays
with the elements of `x` and y repeated to fill the matrix along
the first dimension for `x`, the second for `y`.
See Also
--------
index_tricks.mgrid : Construct a multi-dimensional "meshgrid"
using indexing notation.
index_tricks.ogrid : Construct an open multi-dimensional "meshgrid"
using indexing notation.
Examples
--------
>>> X, Y = np.meshgrid([1,2,3], [4,5,6,7])
>>> X
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
>>> Y
array([[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7]])
`meshgrid` is very useful to evaluate functions on a grid.
>>> x = np.arange(-5, 5, 0.1)
>>> y = np.arange(-5, 5, 0.1)
>>> xx, yy = np.meshgrid(x, y)
>>> z = np.sin(xx**2+yy**2)/(xx**2+yy**2)
"""
    x = asarray(x)
    y = asarray(y)
    numRows, numCols = len(y), len(x)  # yes, reversed
    x = x.reshape(1,numCols)
    X = x.repeat(numRows, axis=0)
    y = y.reshape(numRows,1)
    Y = y.repeat(numCols, axis=1)
    return X, Y
It is fairly simple to understand. I extended the pattern to an arbitrary number of dimensions, but this code is by no means optimized (and not thoroughly error-checked either), but you get what you pay for. Hope it helps:
from numpy import asarray

def meshgrid2(*arrs):
    arrs = tuple(reversed(arrs))  #edit
    # A list rather than a bare map object, so it can be indexed and
    # iterated more than once under Python 3.
    lens = list(map(len, arrs))
    dim = len(arrs)
    sz = 1
    for s in lens:
        sz*=s
    ans = []
    for i, arr in enumerate(arrs):
        slc = [1]*dim
        slc[i] = lens[i]
        arr2 = asarray(arr).reshape(slc)
        for j, sz in enumerate(lens):
            if j!=i:
                arr2 = arr2.repeat(sz, axis=j)
        ans.append(arr2)
    return tuple(ans)
Can you show us how you are using np.meshgrid? There is a very good chance that you really don't need meshgrid because numpy broadcasting can do the same thing without generating a repetitive array.
For example,
import numpy as np
x=np.arange(2)
y=np.arange(3)
[X,Y] = np.meshgrid(x,y)
S=X+Y
print(S.shape)
# (3, 2)
# Note that meshgrid associates y with the 0-axis, and x with the 1-axis.
print(S)
# [[0 1]
# [1 2]
# [2 3]]
s=np.empty((3,2))
print(s.shape)
# (3, 2)
# x.shape is (2,).
# y.shape is (3,).
# x's shape is broadcasted to (3,2)
# y varies along the 0-axis, so to get its shape broadcasted, we first upgrade it to
# have shape (3,1), using np.newaxis. Arrays of shape (3,1) can be broadcasted to
# arrays of shape (3,2).
s=x+y[:,np.newaxis]
print(s)
# [[0 1]
# [1 2]
# [2 3]]
The point is that S=X+Y can and should be replaced by s=x+y[:,np.newaxis] because
the latter does not require (possibly large) repetitive arrays to be formed. It also generalizes to higher dimensions (more axes) easily. You just add np.newaxis where needed to effect broadcasting as necessary.
See http://www.scipy.org/EricsBroadcastingDoc for more on numpy broadcasting.
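A sketch of how the same idea extends to three axes, with a comparison against the meshgrid result. The third vector z and the array names are added here only for illustration.

```python
import numpy as np

x = np.arange(2)
y = np.arange(3)
z = np.arange(4)

# Each vector gets its own axis; broadcasting fills in the rest.
s = x[:, np.newaxis, np.newaxis] + y[np.newaxis, :, np.newaxis] + z[np.newaxis, np.newaxis, :]

# The meshgrid equivalent builds three full (2, 3, 4) arrays first.
X, Y, Z = np.meshgrid(x, y, z, indexing='ij')
S = X + Y + Z

print(s.shape)
print(np.array_equal(s, S))
```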
I think what you want is
X, Y, Z = numpy.mgrid[-10:10:100j, -10:10:100j, -10:10:100j]
for example.
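A smaller version of the same call, to show what comes back. With an imaginary step such as 5j, mgrid treats the number as a point count and includes the end point, like linspace. The 5-point size and the comparison against meshgrid are choices made for this sketch.

```python
import numpy as np

X, Y, Z = np.mgrid[-10:10:5j, -10:10:5j, -10:10:5j]

# The same grid built from an explicit linspace and matrix indexing.
pts = np.linspace(-10, 10, 5)
Xm, Ym, Zm = np.meshgrid(pts, pts, pts, indexing='ij')

print(X.shape)
print(X[:, 0, 0])
print(np.allclose(X, Xm), np.allclose(Y, Ym), np.allclose(Z, Zm))
```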
Here is a multidimensional version of meshgrid that I wrote:
def ndmesh(*args):
    args = map(np.asarray,args)
    return np.broadcast_arrays(*[x[(slice(None),)+(None,)*i] for i, x in enumerate(args)])
Note that the returned arrays are views of the original array data, so changing the original arrays will affect the coordinate arrays.
Instead of writing a new function, numpy.ix_ should do what you want.
Here is an example from the documentation:
>>> ixgrid = np.ix_([0,1], [2,4])
>>> ixgrid
(array([[0],
[1]]), array([[2, 4]]))
>>> ixgrid[0].shape, ixgrid[1].shape
((2, 1), (1, 2))
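For the three-vector case from the question, a sketch of how np.ix_ could be used. It returns open (broadcastable) arrays rather than full 3D ones, so the full grid only appears once they are combined. The vectors and the sample expression are made up for the example.

```python
import numpy as np

x = np.array([0.0, 1.0])
y = np.array([10.0, 20.0, 30.0])
z = np.array([100.0, 200.0, 300.0, 400.0])

gx, gy, gz = np.ix_(x, y, z)
print(gx.shape, gy.shape, gz.shape)  # (2, 1, 1) (1, 3, 1) (1, 1, 4)

# Any elementwise expression broadcasts to the full (2, 3, 4) grid.
vals = gx + gy + gz
print(vals.shape)
```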
You can achieve that by changing the order:
import numpy as np
xx = np.array([1,2,3,4])
yy = np.array([5,6,7])
zz = np.array([9,10])
y, z, x = np.meshgrid(yy, zz, xx)
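A quick check of what that reordered call produces, relying on the default indexing='xy'. The shape and axis comments below follow from that default and are worth verifying on your own data.

```python
import numpy as np

xx = np.array([1, 2, 3, 4])
yy = np.array([5, 6, 7])
zz = np.array([9, 10])

y, z, x = np.meshgrid(yy, zz, xx)

print(x.shape)     # (2, 3, 4): z along axis 0, y along axis 1, x along axis 2
print(z[:, 0, 0])  # varies with zz
print(y[0, :, 0])  # varies with yy
print(x[0, 0, :])  # varies with xx
```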