I have two arrays x=[1,2,3,4] and y=[1,0,0,1] describing 2D points (x,y). I want to know how many elements have x>2 and y==1. What is the simplest way to do this (without any loops)?
Is it possible to do something like x[x>2], but for two conditions?
Assuming these are NumPy arrays, since your x[x>2] is NumPy syntax, you just need the element-wise and (&) operator. Wrap each comparison in parentheses, because & binds more tightly than > and ==:
meet_cond = (x > 2) & (y == 1)
how_many = meet_cond.sum()
which_x = x[meet_cond]
which_y = y[meet_cond]
If x and y belong together as points, you might want to pack them into a 2D NumPy array:
>>> import numpy as np
>>> x = np.array([1, 2, 3, 4])
>>> y = np.array([1, 0, 0, 1])
>>> xy = np.array([x, y]).T
>>> xy[(x > 2) & (y == 1)]
array([[4, 1]])
>>> xy[(xy[:, 0] > 2) & (xy[:, 1] == 1)]
array([[4, 1]])
>>> np.count_nonzero((xy[:, 0] > 2) & (xy[:, 1] == 1))
1
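With more than two conditions, chaining & still works, but np.logical_and.reduce may read more cleanly. A minimal sketch reusing the arrays above; the third condition (x is even) is a made-up extra for illustration and is not part of the question:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([1, 0, 0, 1])

# Each comparison yields a boolean array; reduce ANDs them element-wise.
# The third condition (x even) is hypothetical, added only for illustration.
conditions = [x > 2, y == 1, x % 2 == 0]
mask = np.logical_and.reduce(conditions)

count = np.count_nonzero(mask)  # number of points meeting every condition
selected_x = x[mask]            # the x values of those points
```

Here only the point (4, 1) satisfies all three conditions, so count is 1.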
Related
I have two 1D NumPy arrays x = [x[0], x[1], ..., x[n-1]] and y = [y[0], y[1], ..., y[n-1]]. The array x is known, and I need to determine the values for array y. For every index in np.arange(n), the value of y[index] depends on x[index] and on x[index + 1: ]. My code is this:
import numpy as np
n = 5
q = 0.5
x = np.array([1, 2, 0, 1, 0])
y = np.empty(n, dtype=int)
for index in np.arange(n):
    if (x[index] != 0) and (np.any(x[index + 1:] == 0)):
        # draw a single scalar: 1 with probability q, 0 otherwise
        y[index] = np.random.choice([0, 1], p=(1-q, q))
    else:
        y[index] = 0
print(y)
The problem with the for loop is that the size of n in my experiment can become very large. Is there any vectorized way to do this?
Randomly generate the array y with the full shape.
Generate a bool array indicating where to set zeros.
Use np.where to set zeros.
Try this,
import numpy as np
n = 5
q = 0.5
x = np.array([1, 2, 0, 1, 0])
y = np.random.choice([0, 1], n, p=(1-q, q))
condition = (x != 0) & (x[::-1].cumprod() == 0)[::-1] # equivalent to the posted one
y = np.where(condition, y, 0)
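One caveat with the cumprod trick: fixed-width integer products wrap silently on overflow, so with large values in x a running product could land on zero without any element being zero. A possible alternative, sketched here under the assumption that only "is there a zero at or after this index" matters, is a reverse running OR. The names is_zero, zero_from_here and candidates, and the seed, are mine:

```python
import numpy as np

x = np.array([1, 2, 0, 1, 0])
q = 0.5

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility
candidates = rng.choice([0, 1], size=x.size, p=(1 - q, q))

is_zero = x == 0
# zero_from_here[i] is True when any of x[i:] is zero (reverse running OR).
zero_from_here = np.logical_or.accumulate(is_zero[::-1])[::-1]
# Where x[i] != 0, "a zero in x[i:]" is the same as "a zero in x[i+1:]".
condition = (x != 0) & zero_from_here

y = np.where(condition, candidates, 0)
```

For the sample x this gives the same condition as the loop in the question: True at indices 0, 1 and 3.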
I would like to compute the sum of minimums of two arrays efficiently with NumPy. For example:
X=np.array([[1,2,3],[1,2,0]])
Y=np.array([[0,2,0],[1,3,1]])
My result should be:
result = array([[2, 4],[2, 3]])
The calculation for the first cell:
result[0,0] = min(X[0,0],Y[0,0])+ min(X[0,1],Y[0,1])+min(X[0,2],Y[0,2])
In general, the result should be:
res[i,j] = sum(np.minimum(X[i, :], Y[j, :]))
but I am looking for the fastest way.
dot is the equivalent of taking outer products, and summing on the appropriate axis.
The equivalent in your case is:
In [291]: np.minimum(X[:,None,:], Y[None,:,:])
Out[291]:
array([[[0, 2, 0],
        [1, 2, 1]],

       [[0, 2, 0],
        [1, 2, 0]]])
In [292]: np.sum(np.minimum(X[:,None,:], Y[None,:,:]),axis=-1)
Out[292]:
array([[2, 4],
       [2, 3]])
Best I could do:
import numpy as np
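def sum_mins(x, y):
    # True where x holds the smaller element
    mask = (x - y) < 0
    # take x where mask is True and y elsewhere, then sum everything
    return np.sum(x*mask + y*np.logical_not(mask))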
X=np.array([1,2,3])
Y=np.array([0,2,0])
print(sum_mins(X, Y))
One naive approach close to definition:
result = np.array([[np.sum(np.minimum(v_x, v_y)) for v_y in Y] for v_x in X])
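A possible middle ground between the double comprehension and full 3D broadcasting is to loop over the rows of X only and let broadcasting handle Y. This is a sketch, not a benchmarked recommendation; it keeps the temporary array at the size of Y rather than len(X) * len(Y) * d:

```python
import numpy as np

X = np.array([[1, 2, 3], [1, 2, 0]])
Y = np.array([[0, 2, 0], [1, 3, 1]])

# For each row of X, broadcast against all rows of Y: (d,) vs (m, d) -> (m, d),
# then sum over the last axis to get one value per row of Y.
result = np.array([np.minimum(row, Y).sum(axis=1) for row in X])
print(result)
# [[2 4]
#  [2 3]]
```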
A combination of hpaulj's and my former answer (deleted) that works in case you run out of memory otherwise:
import numpy as np
# maximum number of float32s in memory - determining a max. chunk size
MAX_CHUNK_MEM_SIZE = 1000 * 1024 * 1024 / 4
def _fast_small(x, y):
    """Process a case with small size of x and y."""
    # see answer of @hpaulj
    return np.sum(np.minimum(x[:, None, :], y[None, :, :]), axis=-1)
def fast(x, y):
    """Process a case with potentially large size of x and y."""
    assert len(x.shape) == len(y.shape) == 2
    assert x.shape[1] == y.shape[1]
    # the broadcast intermediate holds x.shape[0] * y.shape[0] * x.shape[1] elements
    num_chunks = int(np.ceil(x.shape[0] * y.shape[0] * x.shape[1] / MAX_CHUNK_MEM_SIZE))
    result_blocks = []
    for x_block in np.array_split(x, num_chunks):
        result_blocks_row = []
        for y_block in np.array_split(y, num_chunks):
            result_blocks_row.append(_fast_small(x_block, y_block))
        result_blocks.append(result_blocks_row)
    return np.block(result_blocks)
How can I select specific rows that are equal to the corresponding rows in another, parallel array?
In other words, can I vectorize this code? Here p and y are ndarrays with the same shape:
pprob = []
for inx, val in enumerate(p):
    if val == y[inx]:
        pprob.append(1)
    else:
        pprob.append(0)
I just ran this in a python shell in Python 3.9.4
import numpy as np
x = np.array([1,2,3,4,5])
y = np.array([1,1,3,3,5])
matching_idx = np.where(x == y) # (array([0, 2, 4]),)
x[matching_idx] # array([1, 3, 5])
Seems like x[matching_idx] is what you want
The key to this is np.where(), explained here
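If the goal is the list of 1/0 flags that the loop in the question builds, rather than the matching values, the comparison result can be cast directly. A minimal sketch for the 1-D case, with sample data borrowed from the snippet above and renamed to the question's p and y:

```python
import numpy as np

p = np.array([1, 2, 3, 4, 5])
y = np.array([1, 1, 3, 3, 5])

# Element-wise comparison gives booleans; cast to int for the 1/0 flags.
pprob = (p == y).astype(int)
print(pprob.tolist())  # [1, 0, 1, 0, 1]
```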
import numpy as np
a = np.random.normal(size=(10, 5))
b = np.random.normal(size=(10, 5))
a[1] = b[1]
a[7] = b[7]
rows = np.all(a == b, axis=1).astype(np.int32)
rows = rows.tolist() # if you really want the result to be a list
print(rows)
Result, as expected:
[0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
If you could be dealing with more than two dimensions, change the following (works for 2d as well):
# this
np.all(a == b, axis=1)
# to this
np.all(a == b, axis=tuple(range(len(a.shape)))[1:])
How do I transform a Boolean array into an iterable of indexes?
E.g.,
import numpy as np
import itertools as it
x = np.array([1,0,1,1,0,0])
y = x > 0
retval = [i for i, y_i in enumerate(y) if y_i]
Is there a nicer way?
Try np.where or np.nonzero.
x = np.array([1, 0, 1, 1, 0, 0])
np.where(x)[0] # returns a tuple hence the [0], see help(np.where)
# array([0, 2, 3])
x.nonzero()[0] # in this case, the same as above.
See help(np.where) and help(np.nonzero).
Possibly worth noting that in the np.where page it mentions that for 1D x it's basically equivalent to your longform in the question.
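For the 1-D case, np.flatnonzero is another option that returns the index array directly, without the tuple wrapper. A short sketch using the arrays from the question:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 0])
y = x > 0

idx = np.flatnonzero(y)  # 1-D array of indices where y is True
print(idx)               # [0 2 3]
```

The result is an ordinary array, so it can be iterated over or converted with idx.tolist() if a plain list is needed.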
Numpy's meshgrid is very useful for converting two vectors to a coordinate grid. What is the easiest way to extend this to three dimensions? So given three vectors x, y, and z, construct 3x3D arrays (instead of 2x2D arrays) which can be used as coordinates.
NumPy (as of 1.8 I think) now supports generating position grids in more than two dimensions with meshgrid. One important addition which really helped me is the ability to choose the indexing order (either xy or ij, for Cartesian or matrix indexing respectively), which I verified with the following example:
import numpy as np
x_ = np.linspace(0., 1., 10)
y_ = np.linspace(1., 2., 20)
z_ = np.linspace(3., 4., 30)
x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
assert np.all(x[:,0,0] == x_)
assert np.all(y[0,:,0] == y_)
assert np.all(z[0,0,:] == z_)
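To make the difference between the two indexing orders concrete, the sketch below builds both grids from the same vectors and compares shapes. The variable names xi, xc and so on are mine:

```python
import numpy as np

x_ = np.linspace(0., 1., 10)
y_ = np.linspace(1., 2., 20)
z_ = np.linspace(3., 4., 30)

xi, yi, zi = np.meshgrid(x_, y_, z_, indexing='ij')
xc, yc, zc = np.meshgrid(x_, y_, z_, indexing='xy')  # 'xy' is the default

print(xi.shape)  # (10, 20, 30)
print(xc.shape)  # (20, 10, 30)

# 'xy' swaps only the first two axes relative to 'ij'.
assert np.array_equal(xc, xi.transpose(1, 0, 2))
```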
Here is the source code of meshgrid:
def meshgrid(x,y):
    """
    Return coordinate matrices from two coordinate vectors.

    Parameters
    ----------
    x, y : ndarray
        Two 1-D arrays representing the x and y coordinates of a grid.

    Returns
    -------
    X, Y : ndarray
        For vectors `x`, `y` with lengths ``Nx=len(x)`` and ``Ny=len(y)``,
        return `X`, `Y` where `X` and `Y` are ``(Ny, Nx)`` shaped arrays
        with the elements of `x` and y repeated to fill the matrix along
        the first dimension for `x`, the second for `y`.

    See Also
    --------
    index_tricks.mgrid : Construct a multi-dimensional "meshgrid"
                         using indexing notation.
    index_tricks.ogrid : Construct an open multi-dimensional "meshgrid"
                         using indexing notation.

    Examples
    --------
    >>> X, Y = np.meshgrid([1,2,3], [4,5,6,7])
    >>> X
    array([[1, 2, 3],
           [1, 2, 3],
           [1, 2, 3],
           [1, 2, 3]])
    >>> Y
    array([[4, 4, 4],
           [5, 5, 5],
           [6, 6, 6],
           [7, 7, 7]])

    `meshgrid` is very useful to evaluate functions on a grid.

    >>> x = np.arange(-5, 5, 0.1)
    >>> y = np.arange(-5, 5, 0.1)
    >>> xx, yy = np.meshgrid(x, y)
    >>> z = np.sin(xx**2+yy**2)/(xx**2+yy**2)

    """
    x = asarray(x)
    y = asarray(y)
    numRows, numCols = len(y), len(x)  # yes, reversed
    x = x.reshape(1,numCols)
    X = x.repeat(numRows, axis=0)

    y = y.reshape(numRows,1)
    Y = y.repeat(numCols, axis=1)
    return X, Y
It is fairly simple to understand. I extended the pattern to an arbitrary number of dimensions. This code is by no means optimized, nor thoroughly error-checked, but you get what you pay for. Hope it helps:
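import numpy as np
def meshgrid2(*arrs):
    arrs = tuple(reversed(arrs))  # edit
    lens = list(map(len, arrs))  # list() so that lens[i] works on Python 3
    dim = len(arrs)
    sz = 1
    for s in lens:
        sz *= s
    ans = []
    for i, arr in enumerate(arrs):
        # shape with the array's own length on axis i and 1 everywhere else
        slc = [1]*dim
        slc[i] = lens[i]
        arr2 = np.asarray(arr).reshape(slc)
        # repeat along every other axis to fill the full grid
        for j, sz in enumerate(lens):
            if j != i:
                arr2 = arr2.repeat(sz, axis=j)
        ans.append(arr2)
    return tuple(ans)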
Can you show us how you are using np.meshgrid? There is a very good chance that you really don't need meshgrid because numpy broadcasting can do the same thing without generating a repetitive array.
For example,
import numpy as np
x=np.arange(2)
y=np.arange(3)
[X,Y] = np.meshgrid(x,y)
S=X+Y
print(S.shape)
# (3, 2)
# Note that meshgrid associates y with the 0-axis, and x with the 1-axis.
print(S)
# [[0 1]
# [1 2]
# [2 3]]
s=np.empty((3,2))
print(s.shape)
# (3, 2)
# x.shape is (2,).
# y.shape is (3,).
# x's shape is broadcasted to (3,2)
# y varies along the 0-axis, so to get its shape broadcasted, we first upgrade it to
# have shape (3,1), using np.newaxis. Arrays of shape (3,1) can be broadcasted to
# arrays of shape (3,2).
s=x+y[:,np.newaxis]
print(s)
# [[0 1]
# [1 2]
# [2 3]]
The point is that S=X+Y can and should be replaced by s=x+y[:,np.newaxis], because the latter does not require (possibly large) repetitive arrays to be formed. It also generalizes easily to higher dimensions (more axes): you just add np.newaxis where needed to effect the broadcasting.
See http://www.scipy.org/EricsBroadcastingDoc for more on numpy broadcasting.
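As a sketch of how the same idea extends to three vectors (the z array here is my addition, not part of the example above), each vector gets its own axis via np.newaxis and the sum broadcasts to the full 3D shape:

```python
import numpy as np

x = np.arange(2)
y = np.arange(3)
z = np.arange(4)

# Shapes (2,1,1), (1,3,1) and (4,) broadcast together to (2,3,4).
s = x[:, np.newaxis, np.newaxis] + y[np.newaxis, :, np.newaxis] + z

# Same result via explicit (and larger) coordinate arrays, for comparison.
X, Y, Z = np.meshgrid(x, y, z, indexing='ij')
S = X + Y + Z

print(s.shape)               # (2, 3, 4)
print(np.array_equal(s, S))  # True
```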
I think what you want is
X, Y, Z = numpy.mgrid[-10:10:100j, -10:10:100j, -10:10:100j]
for example.
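To illustrate what the complex step does, here is a small sketch with 5 points per axis instead of 100. As far as I can tell, the result matches np.linspace combined with meshgrid in ij order:

```python
import numpy as np

# A complex "step" such as 5j means "5 points, endpoint included".
X, Y, Z = np.mgrid[-10:10:5j, -10:10:5j, -10:10:5j]

# Equivalent construction from an explicit vector, for comparison.
v = np.linspace(-10, 10, 5)
Xm, Ym, Zm = np.meshgrid(v, v, v, indexing='ij')

print(X.shape)              # (5, 5, 5)
print(np.allclose(X, Xm))   # True
```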
Here is a multidimensional version of meshgrid that I wrote:
def ndmesh(*args):
    args = map(np.asarray,args)
    return np.broadcast_arrays(*[x[(slice(None),)+(None,)*i] for i, x in enumerate(args)])
Note that the returned arrays are views of the original array data, so changing the original arrays will affect the coordinate arrays.
Instead of writing a new function, numpy.ix_ should do what you want.
Here is an example from the documentation:
>>> ixgrid = np.ix_([0,1], [2,4])
>>> ixgrid
(array([[0],
       [1]]), array([[2, 4]]))
>>> ixgrid[0].shape, ixgrid[1].shape
((2, 1), (1, 2))
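The same call accepts three sequences, giving one open-grid index array per axis. A minimal sketch; the array a and the chosen indices are arbitrary examples of mine:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# np.ix_ returns one index array per axis, shaped (2,1,1), (1,2,1), (1,1,2),
# which broadcast together to select a (2,2,2) sub-block of a.
grid = np.ix_([0, 1], [0, 2], [1, 3])
sub = a[grid]

print([g.shape for g in grid])  # [(2, 1, 1), (1, 2, 1), (1, 1, 2)]
print(sub.shape)                # (2, 2, 2)
```

Note that np.ix_ is documented for integer or boolean sequences, so it suits index grids rather than floating-point coordinate vectors.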
You can achieve that by changing the order:
import numpy as np
xx = np.array([1,2,3,4])
yy = np.array([5,6,7])
zz = np.array([9,10])
y, z, x = np.meshgrid(yy, zz, xx)