I have a very large array, but I'll use a smaller one to explain.
Given source array X
X = [ [1,1,1,1],
[2,2,2,2],
[3,3,3,3]]
A target array with the same size Y
Y = [ [-1,-1,-1,-1],
[-2,-2,-2,-2],
[-3,-3,-3,-3]]
And an assignment array IDX:
IDX = [ [1,0,0,0],
[0,0,1,0],
[0,1,0,1]]
I want to copy values from Y into X according to IDX, assigning only where IDX==1.
In this case, something like:
X[IDX] = Y[IDX]
will result in:
X = [ [-1,1,1,1],
[2,2,-2,2],
[3,-3,3,-3]]
How can this be done efficiently (not a for-loop) in numpy/pandas?
Thx
If IDX is a NumPy array of Boolean type, and X and Y are NumPy arrays then your intuition works:
import numpy as np
X = np.array(X)
Y = np.array(Y)
IDX = np.array(IDX).astype(bool)
X[IDX] = Y[IDX]
This changes X in place.
If you don't want to do all this type casting, or don't want to overwrite X, then np.where() does what you want in one go and returns a new array:
np.where(IDX==1, Y, X)
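As a quick check, here is a minimal sketch that runs both approaches on the small arrays from the question. The names result and X_inplace are only illustrative, and IDX is assumed to hold 0/1 integers as in the question.

```python
import numpy as np

X = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])
Y = -X
IDX = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]])

# np.where builds a new array: Y where IDX == 1, X everywhere else.
result = np.where(IDX == 1, Y, X)
print(result)
# [[-1  1  1  1]
#  [ 2  2 -2  2]
#  [ 3 -3  3 -3]]

# Boolean-mask assignment modifies its target in place,
# so work on a copy if the original X must be kept.
mask = IDX.astype(bool)
X_inplace = X.copy()
X_inplace[mask] = Y[mask]
```

Both give the same values; the difference is only whether X itself is overwritten.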
I need to write a list comprehension to create a vector twice the square of the middle column of a matrix. (My matrix x = [[1,2,3],[4,5,6],[7,8,9]].) Problem is, I know how to extract the middle column BUT I don't know how to square it or double the square. Any help would be greatly appreciated (...still learning but trying my best)!
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(x)
z = [b[1] for b in x]
print(z)
To create a vector twice the square of the column:
import numpy as np
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(x)
With a list comprehension (not recommended):
z = [2*b[1]**2 for b in x]
print(z)
The output is a python list:
[8, 50, 128]
Using NumPy indexing (recommended):
z = 2 * x[:,1] ** 2
print(z)
The output is a numpy array:
[ 8 50 128]
I have an array with m rows whose entries are themselves arrays of column indices, each index bounded by a large number n.
E.g:
Y = [[1,34,203,2032],...,[2984]]
Now I want an efficient way to initialize a sparse numpy matrix X with dimensions m,n and values corresponding to Y (X[i,j] = 1, if j is in Y[i], = 0 otherwise).
Your data are already close to csr format, so I suggest using that:
import numpy as np
from scipy import sparse
from itertools import chain
# create an example
m, n = 20, 10
X = np.random.random((m, n)) < 0.1
Y = [list(np.where(y)[0]) for y in X]
# construct the sparse matrix
indptr = np.fromiter(chain((0,), map(len, Y)), int, len(Y) + 1).cumsum()
indices = np.fromiter(chain.from_iterable(Y), int, indptr[-1])
data = np.ones_like(indices)
S = sparse.csr_matrix((data, indices, indptr), (m, n))
# or
S = sparse.csr_matrix((data, indices, indptr))
# check
assert np.all(S==X)
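To make the role of indptr and indices concrete, here is a small hand-checkable sketch. The three-row Y below is a made-up example, and n = 4 is assumed; row i of the matrix owns the slice indices[indptr[i]:indptr[i+1]].

```python
import numpy as np
from scipy import sparse

# Hypothetical input: column indices of the ones in each row (row 1 is empty).
Y = [[1, 3], [], [0, 2, 3]]
m, n = len(Y), 4

# indptr[i]:indptr[i+1] delimits the entries of row i inside `indices`.
indptr = np.cumsum([0] + [len(row) for row in Y])
indices = np.concatenate([np.asarray(row, dtype=int) for row in Y])
data = np.ones(len(indices), dtype=np.int8)

S = sparse.csr_matrix((data, indices, indptr), shape=(m, n))
print(indptr)       # [0 2 2 5]
print(S.toarray())
# [[0 1 0 1]
#  [0 0 0 0]
#  [1 0 1 1]]
```

Passing shape explicitly matters when the last columns happen to be empty, since otherwise the number of columns is inferred from the largest index present.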
Right now I have a 2-D numpy array that represents the coordinates of pixels in an image
points = [[-1,-2,0,1,2,3,5,8], [-3,-4,0,-3,5,9,2,1]]
Each column represents a coordinate in the image, e.g:
array[0] = [-1,-3] means x = -1 and y = -3
Now I want to remove every column in which x or y is less than 0 or greater than 5.
I know how to remove elements of a certain value
# remove x values that are less than 0 or more than 5
x = points[0,:]
x = x[np.logical_and(x>=0, x<=5)]
# remove y values that are less than 0 or more than 5
y = points[1,:]
y = y[np.logical_and(y>=0,y<=5)]
Is there a way to remove the y that shares the same index with the x that is deleted?(in other words, remove columns when either the condition for x deletion or y deletion is satisfied)
You can convert the list to an ndarray, build a single Boolean mask, and use it to index both x and y. The nested logical_and calls combine x>=0, x<=5, y>=0 and y<=5 into one mask, so whenever x[i] is removed, y[i] is removed as well.
import numpy as np
points = np.array([[-1,-2,0,1,2,3,5,8], [-3,-4,0,-3,5,9,2,1]])
x = points[0,:]
y = points[1,:]
mask = np.logical_and(np.logical_and(x>=0, x<=5), np.logical_and(y>=0, y<=5))
# mask = array([False, False, True, False, True, False, True, False])
x = x[mask] # x = array([0, 2, 5])
y = y[mask] # y = array([0, 5, 2])
You can use np.compress along axis=1 to keep only the columns you need:
np.compress((x>=0) * (x<=5) * (y>=0) * (y<=5), points, axis=1)
array([[0, 2, 5],
[0, 5, 2]])
where I have assumed that x, y and points are numpy arrays.
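The same filtering can also be done without splitting the array into x and y first. The sketch below assumes points is a 2 x N integer array as in the question; keep and filtered are illustrative names.

```python
import numpy as np

points = np.array([[-1, -2, 0, 1, 2, 3, 5, 8],
                   [-3, -4, 0, -3, 5, 9, 2, 1]])

# A column survives only if every coordinate in it lies in [0, 5].
keep = np.all((points >= 0) & (points <= 5), axis=0)
filtered = points[:, keep]
print(filtered)
# [[0 2 5]
#  [0 5 2]]
```

Because the mask is computed per column, x and y stay aligned automatically.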
I want to remove nans from two arrays if there is a nan in the same position in either of them. The arrays are of same length. Here is what I am doing:
y = numpy.delete(y, numpy.where(numpy.isnan(x)))
numpy.delete(y, numpy.where(numpy.isnan(x)))
However, this only works if x is the one with nan's. How do I make it work if either x or y have nan?
You have to keep track of the indices to remove from both arrays. You don't need where since numpy supports boolean indexing (masks). Also, you don't need delete since you can just get a subset of the array.
mask = ~np.isnan(x)
x = x[mask]
y = y[mask]
mask = ~np.isnan(y)
x = x[mask]
y = y[mask]
Or more compactly:
mask = ~np.isnan(x) & ~np.isnan(y)
x = x[mask]
y = y[mask]
The first implementation only has an advantage if the arrays are enormous and computing the mask for y from a smaller array has a performance benefit. In general, I would recommend the second approach.
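A short worked example of the compact version, using made-up arrays with a nan in a different position in each:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, 4.0, 5.0])
y = np.array([10.0, 20.0, np.nan, 40.0, 50.0])

# True where both x and y hold a real number.
mask = ~np.isnan(x) & ~np.isnan(y)
x_clean = x[mask]
y_clean = y[mask]
print(x_clean)  # [1. 4. 5.]
print(y_clean)  # [10. 40. 50.]
```

Positions 1 and 2 are dropped from both arrays, so the remaining pairs still line up.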
import numpy as np
import numpy.ma as ma
# In a masked array, mask=True marks the entries to hide, so mask the nan positions.
y = ma.masked_array(y, mask=np.isnan(x))
y = y.compressed() # y without the entries where x is nan
or, after the comments:
mask = np.isnan(x) | np.isnan(y)
y = ma.masked_array(y, mask=mask)
y = y.compressed() # y without the entries where x or y is nan
x = ma.masked_array(x, mask=mask)
x = x.compressed() # x without the entries where x or y is nan
or without a masked array (here the mask marks the entries to keep):
mask = ~np.isnan(x) & ~np.isnan(y)
y = y[mask]
x = x[mask]
Numpy's meshgrid is very useful for converting two vectors to a coordinate grid. What is the easiest way to extend this to three dimensions? So given three vectors x, y, and z, construct 3x3D arrays (instead of 2x2D arrays) which can be used as coordinates.
Numpy (as of 1.8 I think) now supports higher than 2D generation of position grids with meshgrid. One important addition which really helped me is the ability to choose the indexing order (either xy or ij for Cartesian or matrix indexing respectively), which I verified with the following example:
import numpy as np
x_ = np.linspace(0., 1., 10)
y_ = np.linspace(1., 2., 20)
z_ = np.linspace(3., 4., 30)
x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
assert np.all(x[:,0,0] == x_)
assert np.all(y[0,:,0] == y_)
assert np.all(z[0,0,:] == z_)
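To see what the indexing argument changes, the sketch below compares the output shapes for 'ij' and the default 'xy', and also shows sparse=True, which returns broadcastable arrays instead of full grids. The variable names are illustrative only.

```python
import numpy as np

x_ = np.linspace(0., 1., 10)
y_ = np.linspace(1., 2., 20)
z_ = np.linspace(3., 4., 30)

# Matrix indexing: axis order follows the argument order.
xi, yi, zi = np.meshgrid(x_, y_, z_, indexing='ij')
print(xi.shape)  # (10, 20, 30)

# Cartesian indexing (the default): the first two axes are swapped.
xc, yc, zc = np.meshgrid(x_, y_, z_, indexing='xy')
print(xc.shape)  # (20, 10, 30)

# sparse=True keeps each output one-dimensional in effect,
# leaving the expansion to broadcasting.
xs, ys, zs = np.meshgrid(x_, y_, z_, indexing='ij', sparse=True)
print(xs.shape, ys.shape, zs.shape)  # (10, 1, 1) (1, 20, 1) (1, 1, 30)
```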
Here is the source code of meshgrid:
def meshgrid(x,y):
    """
    Return coordinate matrices from two coordinate vectors.

    Parameters
    ----------
    x, y : ndarray
        Two 1-D arrays representing the x and y coordinates of a grid.

    Returns
    -------
    X, Y : ndarray
        For vectors `x`, `y` with lengths ``Nx=len(x)`` and ``Ny=len(y)``,
        return `X`, `Y` where `X` and `Y` are ``(Ny, Nx)`` shaped arrays
        with the elements of `x` and y repeated to fill the matrix along
        the first dimension for `x`, the second for `y`.

    See Also
    --------
    index_tricks.mgrid : Construct a multi-dimensional "meshgrid"
                         using indexing notation.
    index_tricks.ogrid : Construct an open multi-dimensional "meshgrid"
                         using indexing notation.

    Examples
    --------
    >>> X, Y = np.meshgrid([1,2,3], [4,5,6,7])
    >>> X
    array([[1, 2, 3],
           [1, 2, 3],
           [1, 2, 3],
           [1, 2, 3]])
    >>> Y
    array([[4, 4, 4],
           [5, 5, 5],
           [6, 6, 6],
           [7, 7, 7]])

    `meshgrid` is very useful to evaluate functions on a grid.

    >>> x = np.arange(-5, 5, 0.1)
    >>> y = np.arange(-5, 5, 0.1)
    >>> xx, yy = np.meshgrid(x, y)
    >>> z = np.sin(xx**2+yy**2)/(xx**2+yy**2)

    """
    x = asarray(x)
    y = asarray(y)
    numRows, numCols = len(y), len(x)  # yes, reversed
    x = x.reshape(1,numCols)
    X = x.repeat(numRows, axis=0)

    y = y.reshape(numRows,1)
    Y = y.repeat(numCols, axis=1)
    return X, Y
It is fairly simple to understand. I extended the pattern to an arbitrary number of dimensions. This code is by no means optimized (and not thoroughly error-checked either), but you get what you pay for. Hope it helps:
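import numpy as np

def meshgrid2(*arrs):
    arrs = tuple(reversed(arrs))  # edit
    # A list rather than a bare map object, so it can be indexed and reused in Python 3.
    lens = list(map(len, arrs))
    dim = len(arrs)
    ans = []
    for i, arr in enumerate(arrs):
        # Reshape so that arr runs along axis i and every other axis has length 1.
        slc = [1]*dim
        slc[i] = lens[i]
        arr2 = np.asarray(arr).reshape(slc)
        # Repeat along every other axis to fill the full grid.
        for j, sz in enumerate(lens):
            if j != i:
                arr2 = arr2.repeat(sz, axis=j)
        ans.append(arr2)
    return tuple(ans)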
Can you show us how you are using np.meshgrid? There is a very good chance that you really don't need meshgrid because numpy broadcasting can do the same thing without generating a repetitive array.
For example,
import numpy as np
x=np.arange(2)
y=np.arange(3)
[X,Y] = np.meshgrid(x,y)
S=X+Y
print(S.shape)
# (3, 2)
# Note that meshgrid associates y with the 0-axis, and x with the 1-axis.
print(S)
# [[0 1]
# [1 2]
# [2 3]]
s=np.empty((3,2))
print(s.shape)
# (3, 2)
# x.shape is (2,).
# y.shape is (3,).
# x's shape is broadcasted to (3,2)
# y varies along the 0-axis, so to get its shape broadcasted, we first upgrade it to
# have shape (3,1), using np.newaxis. Arrays of shape (3,1) can be broadcasted to
# arrays of shape (3,2).
s=x+y[:,np.newaxis]
print(s)
# [[0 1]
# [1 2]
# [2 3]]
The point is that S=X+Y can and should be replaced by s=x+y[:,np.newaxis] because
the latter does not require (possibly large) repetitive arrays to be formed. It also generalizes to higher dimensions (more axes) easily. You just add np.newaxis where needed to effect broadcasting as necessary.
See http://www.scipy.org/EricsBroadcastingDoc for more on numpy broadcasting.
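For the three-dimensional case asked about, the same idea might look like the sketch below, which compares the broadcast sum against the one built from full meshgrid arrays. The small arange inputs are just an assumption for illustration.

```python
import numpy as np

x = np.arange(2)
y = np.arange(3)
z = np.arange(4)

# Give each vector its own axis; broadcasting expands the rest on demand.
s = (x[:, np.newaxis, np.newaxis]
     + y[np.newaxis, :, np.newaxis]
     + z[np.newaxis, np.newaxis, :])
print(s.shape)  # (2, 3, 4)

# Same result as with explicit coordinate grids, but without
# materializing three full (2, 3, 4) arrays first.
X, Y, Z = np.meshgrid(x, y, z, indexing='ij')
print(np.array_equal(s, X + Y + Z))  # True
```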
I think what you want is
X, Y, Z = numpy.mgrid[-10:10:100j, -10:10:100j, -10:10:100j]
for example.
Here is a multidimensional version of meshgrid that I wrote:
def ndmesh(*args):
    args = list(map(np.asarray, args))
    return np.broadcast_arrays(*[x[(slice(None),)+(None,)*i] for i, x in enumerate(args)])
Note that the returned arrays are views of the original array data, so changing the original arrays will affect the coordinate arrays.
Instead of writing a new function, numpy.ix_ should do what you want.
Here is an example from the documentation:
>>> ixgrid = np.ix_([0,1], [2,4])
>>> ixgrid
(array([[0],
[1]]), array([[2, 4]]))
>>> ixgrid[0].shape, ixgrid[1].shape
((2, 1), (1, 2))
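Extended to three vectors, np.ix_ returns one broadcastable array per axis, which is often enough for evaluating a function on a 3-D grid. A minimal sketch, assuming integer 1-D inputs (the values are arbitrary):

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7])
z = np.array([9, 10])

# Each output has length 1 along every axis except its own.
gx, gy, gz = np.ix_(x, y, z)
print(gx.shape, gy.shape, gz.shape)  # (4, 1, 1) (1, 3, 1) (1, 1, 2)

# Arithmetic on them broadcasts to the full (4, 3, 2) grid.
total = gx + gy + gz
print(total.shape)     # (4, 3, 2)
print(total[3, 2, 1])  # 21, i.e. 4 + 7 + 10
```

If full coordinate arrays are really needed, np.broadcast_arrays(gx, gy, gz) expands them.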
You can achieve that by changing the order:
import numpy as np
xx = np.array([1,2,3,4])
yy = np.array([5,6,7])
zz = np.array([9,10])
y, z, x = np.meshgrid(yy, zz, xx)
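It may help to check which axis each output varies along after this reordering. With the default indexing='xy', the first two axes are swapped relative to the argument order, so the sketch below (same inputs as above) ends up with axes ordered (z, y, x):

```python
import numpy as np

xx = np.array([1, 2, 3, 4])
yy = np.array([5, 6, 7])
zz = np.array([9, 10])

y, z, x = np.meshgrid(yy, zz, xx)

# All three grids share the shape (len(zz), len(yy), len(xx)).
print(x.shape)  # (2, 3, 4)

# z varies along axis 0, y along axis 1, x along axis 2.
assert np.all(z[:, 0, 0] == zz)
assert np.all(y[0, :, 0] == yy)
assert np.all(x[0, 0, :] == xx)
```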