first order differences along a given axis in NumPy array - python

#compute first differences of a 1d array
from numpy import *

x = arange(10)
y = zeros(len(x))
for i in range(1, len(x)):
    y[i] = x[i] - x[i-1]
print(y)
The above code works but there must be at least one easy, pythonesque way to do this without having to use a for loop. Any suggestions?

What about:
diff(x)
# array([1, 1, 1, 1, 1, 1, 1, 1, 1])
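np.diff also takes an axis argument, which covers the "along a given axis" part of the title directly; for example, on a 2-D array:
import numpy
a = numpy.array([[1, 3, 6], [1, 4, 9]])
numpy.diff(a, axis=0)   # array([[0, 1, 3]]) -- row-to-row differences
numpy.diff(a, axis=1)   # array([[2, 3], [3, 5]]) -- differences within each row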

Yes, this is exactly the kind of loop that NumPy's elementwise operations are designed to replace. You just need to learn to take the right slices of the arrays.
import numpy

x = numpy.arange(10)
y = numpy.zeros(x.shape)
y[1:] = x[1:] - x[:-1]
print(y)

Several NumPy builtins will do the job--in particular, diff, ediff1d, and gradient.
I suspect ediff1d is the better choice for the specific case described in the OP--unlike the other two, ediff1d is actually directed/limited to this specific use case--i.e., first-order differences along a single axis (or the only axis of a 1D array).
>>> import numpy as NP
>>> x = NP.random.randint(1, 10, 10)
>>> x
array([4, 6, 6, 8, 1, 2, 1, 1, 5, 4])
>>> NP.ediff1d(x)
array([ 2, 0, 2, -7, 1, -1, 0, 4, -1])
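For comparison, on the same x, np.diff returns identical first-order differences, while np.gradient estimates the derivative using central differences and returns an array the same length as the input:
>>> NP.diff(x)
array([ 2,  0,  2, -7,  1, -1,  0,  4, -1])
>>> NP.gradient(x)
array([ 2. ,  1. ,  1. , -2.5, -3. ,  0. , -0.5,  2. ,  1.5, -1. ])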

Here's a pattern I used a lot for a while (in Python 2 this needed itertools.izip; in Python 3 the built-in zip is already lazy):
d = [a - b for a, b in zip(x[1:], x[:-1])]

y = [item - x[i] for i, item in enumerate(x[1:])]  # i is the index of the element just before item
If you need to access the index of an item while looping over it, enumerate() is the Pythonic way. A list comprehension is, in this case, also more readable.
Moreover, you should never use wildcard imports (from numpy import *): they always import more than you need and lead to unnecessary ambiguity. Rather, just import numpy or import only what you need, e.g.
from numpy import arange, zeros

Related

What is the best way to intersect multiple arrays with a numpy array?

Suppose I have a numpy array:
import numpy as np
X = np.array([2,5,0,4,3,1])
And I also have a list of arrays, like:
A = [np.array([-2,0,2]), np.array([0,1,2,3,4,5]), np.array([2,5,4,6])]
I want to keep only those items of each array that are also in X, ideally done in the most efficient/common way.
Solution I have tried so far:
Sort X using X.sort().
Find the locations of the items of each array in X using:
locations = [np.searchsorted(X, n) for n in A]
Keep only the matching ones:
masks = [X[locations[i]] == A[i] for i in range(len(A))]
result = [A[i][masks[i]] for i in range(len(A))]
But it doesn't work, because the locations for the third array are out of bounds:
locations = [array([0, 0, 2], dtype=int64), array([0, 1, 2, 3, 4, 5], dtype=int64), array([2, 5, 4, 6], dtype=int64)]
How to solve this issue?
Update
I ended up with the idx[idx==len(Xs)] = 0 solution. I've also noticed two different approaches among the answers: transforming X into a set vs. np.sort. Both have pluses and minuses: set operations rely on Python-level iteration, which is quite slow compared with vectorized numpy methods; on the other hand, np.searchsorted's cost grows logarithmically with the array size, whereas set lookups are effectively constant time. That's why I decided to compare performance on data of huge sizes, specifically 1 million items each for X, A[0], A[1], A[2].
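A minimal sketch of how such a comparison can be set up (the sizes and the two candidate implementations below are my assumptions, not the OP's actual benchmark):
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
X = rng.integers(0, 10**7, 10**6)
A = [rng.integers(0, 10**7, 10**6) for _ in range(3)]

def with_set(X, A):
    # set-based membership test, one Python-level pass per array
    X_set = set(X.tolist())
    return [np.array([a for a in arr.tolist() if a in X_set]) for arr in A]

def with_searchsorted(X, A):
    # vectorized membership test via binary search on a sorted copy of X
    Xs = np.sort(X)
    out = []
    for arr in A:
        idx = np.searchsorted(Xs, arr)
        idx[idx == len(Xs)] = 0          # clip out-of-bounds indices
        out.append(arr[Xs[idx] == arr])
    return out

print(timeit(lambda: with_set(X, A), number=3))
print(timeit(lambda: with_searchsorted(X, A), number=3))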
One idea would be to use less compute and do minimal work while looping. So, here's one with those in mind -
a = np.concatenate(A)                    # all arrays flattened into one
m = np.isin(a, X)                        # mask of elements that appear in X
l = np.array(list(map(len, A)))          # length of each input array
a_m = a[m]                               # kept elements, original order preserved
cut_idx = np.r_[0, l.cumsum()]           # start index of each array within a
l_m = np.add.reduceat(m, cut_idx[:-1])   # number of kept elements per array
cl_m = np.r_[0, l_m.cumsum()]            # start index of each array within a_m
out = [a_m[i:j] for (i, j) in zip(cl_m[:-1], cl_m[1:])]
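With the question's sample X and A, this gives
[array([0, 2]), array([0, 1, 2, 3, 4, 5]), array([2, 5, 4])]
Note that, unlike np.intersect1d further down, the original order within each array is preserved.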
Alternative #1 :
We can also use np.searchsorted to get the isin mask m for the splitting code above, like so -
Xs = np.sort(X)                  # searchsorted requires a sorted array
idx = np.searchsorted(Xs, a)
idx[idx == len(Xs)] = 0          # clip out-of-bounds indices; such values exceed Xs.max(), so the comparison below stays False
m = Xs[idx] == a                 # True where the element is actually present
Another way with np.intersect1d
If you are looking for the most common/elegant one, I think it would be with np.intersect1d (note that it returns sorted, unique values) -
In [43]: [np.intersect1d(X,A_i) for A_i in A]
Out[43]: [array([0, 2]), array([0, 1, 2, 3, 4, 5]), array([2, 4, 5])]
Solving your issue
You can also solve your out-of-bounds issue with a simple fix -
for l in locations:
    l[l == len(X)] = 0
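With that fix applied, the masks/result steps from the question produce the expected output (recall that X was sorted in place first):
masks = [X[locations[i]] == A[i] for i in range(len(A))]
result = [A[i][masks[i]] for i in range(len(A))]
# [array([0, 2]), array([0, 1, 2, 3, 4, 5]), array([2, 5, 4])]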
How about this? Very simple and efficient:
import numpy as np
X = np.array([2,5,0,4,3,1])
A = [np.array([-2,0,2]), np.array([0,1,2,3,4,5]), np.array([2,5,4,6])]
X_set = set(X)
A = [np.array([a for a in arr if a in X_set]) for arr in A]
#[array([0, 2]), array([0, 1, 2, 3, 4, 5]), array([2, 5, 4])]
According to the docs, set membership tests have O(1) average complexity, so the overall approach is O(N).

Numpy array multiple mask

Trying to slice and average a numpy array multiple times, based on an integer mask array:
i.e.
import numpy as np
data = np.arange(11)
mask = np.array([0, 1, 1, 1, 0, 2, 2, 3, 3, 3, 3])
results = list()
for maskid in range(1, 4):
    result = np.average(data[mask == maskid])
    results.append(result)
output = np.array(results)
Is there a way to do this faster, aka without the "for" loop?
One approach using np.bincount -
np.bincount(mask, data)/np.bincount(mask)
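Note that np.bincount also produces an entry for mask id 0; slice it off to match the loop's output for ids 1-3:
out = np.bincount(mask, data) / np.bincount(mask)
out[1:]  # array([2. , 5.5, 8.5])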
Another one with np.unique for a generic case when the elements in mask aren't necessarily sequential starting from 0 -
_, ids, count = np.unique(mask, return_inverse=True, return_counts=True)
out = np.bincount(ids, data)/count
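For instance, with non-sequential mask ids (a made-up example, just to illustrate):
mask = np.array([10, 30, 30, 10, 20])
data = np.array([1., 2., 3., 4., 5.])
_, ids, count = np.unique(mask, return_inverse=True, return_counts=True)
np.bincount(ids, data) / count  # array([2.5, 5. , 2.5]) for ids 10, 20, 30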

Comparing two numpy arrays and removing elements

I have been going through several solutions, but I am not able to find the one I need.
I have two numpy arrays. Let's take a small example here.
x = [1,2,3,4,5,6,7,8,9]
y = [3,4,5]
I want to compare x and y, and remove those values of x that are in y.
So I expect my final_x to be
final_x = [1,2,6,7,8,9]
I found out that np.in1d returns a boolean array the same length as x that is True where an element of x is in y and False otherwise. But how do I use it (or any other method) to get my final_x?
If you really do have numpy arrays then you can use numpy.setdiff1d as below
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9])
y = np.array([3,4,5])
z = np.setdiff1d(x, y)
# array([1, 2, 6, 7, 8, 9])
Simply pass the negated version of the boolean array returned by np.in1d to the array x:
>>> x = np.array([1,2,3,4,5,6,7,8,9])
>>> y = [3,4,5]
>>> x[~np.in1d(x, y)]
array([1, 2, 6, 7, 8, 9])
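np.in1d is the older flat version of this test; for 1-D arrays np.isin behaves identically and is the recommended spelling in recent NumPy releases:
x[~np.isin(x, y)]  # array([1, 2, 6, 7, 8, 9])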
You can use built-in sets: convert both arrays to sets and subtract the second from the first:
final_x = set(x) - set(y)
You can convert final_x to a list or numpy.array if you feel so inclined (note that sets are unordered, so the original order of x is not preserved).

Python numpy index set from Boolean array

How do I transform a Boolean array into an iterable of indexes?
E.g.,
import numpy as np
import itertools as it
x = np.array([1,0,1,1,0,0])
y = x > 0
retval = [i for i, y_i in enumerate(y) if y_i]
Is there a nicer way?
Try np.where or np.nonzero.
x = np.array([1, 0, 1, 1, 0, 0])
np.where(x)[0] # returns a tuple hence the [0], see help(np.where)
# array([0, 2, 3])
x.nonzero()[0] # in this case, the same as above.
See help(np.where) and help(np.nonzero).
It's possibly worth noting that the np.where documentation mentions that, for a 1-D x, it's basically equivalent to your long form in the question.
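As a related shorthand, np.flatnonzero wraps exactly this pattern and skips the tuple indexing:
np.flatnonzero(x)  # array([0, 2, 3])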

A 3-D grid of regularly spaced points

I want to create a list containing the 3-D coordinates of a grid of regularly spaced points, each as a 3-element tuple. I'm looking for advice on the most efficient way to do this.
In C++, for instance, I would simply use three nested loops, one for each coordinate. In Matlab, I would probably use the meshgrid function (which would do it in one command). I've read about meshgrid and mgrid in Python, and I've also read that using NumPy's broadcasting rules is more efficient. It seems to me that using the zip function in combination with the numpy broadcasting rules might be the most efficient way, but zip doesn't seem to be overloaded in numpy.
Use ndindex:
import numpy as np
ind = np.ndindex(3, 3, 2)
for i in ind:
    print(i)
# (0, 0, 0)
# (0, 0, 1)
# (0, 1, 0)
# (0, 1, 1)
# (0, 2, 0)
# (0, 2, 1)
# (1, 0, 0)
# (1, 0, 1)
# (1, 1, 0)
# (1, 1, 1)
# (1, 2, 0)
# (1, 2, 1)
# (2, 0, 0)
# (2, 0, 1)
# (2, 1, 0)
# (2, 1, 1)
# (2, 2, 0)
# (2, 2, 1)
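Note that ndindex yields integer index tuples; for a grid of real-valued points you can scale them by the spacing (the 0.5 here is just an assumed example):
step = 0.5
coords = [(step*i, step*j, step*k) for i, j, k in np.ndindex(3, 3, 2)]
# [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.5, 0.0), ...]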
Instead of meshgrid and mgrid, you can use ogrid, which is a "sparse" version of mgrid. That is, only the dimension along which the values change is filled in; the others are simply broadcast. This uses much less memory for large grids than the non-sparse alternatives.
For example:
>>> import numpy as np
>>> x, y = np.ogrid[-1:2, -2:3]
>>> x
array([[-1],
[ 0],
[ 1]])
>>> y
array([[-2, -1, 0, 1, 2]])
>>> x**2 + y**2
array([[5, 2, 1, 2, 5],
[4, 1, 0, 1, 4],
[5, 2, 1, 2, 5]])
I would say go with meshgrid or mgrid, in particular if you need non-integer coordinates. I'm surprised that Numpy's broadcasting rules would be more efficient, as meshgrid was designed especially for the problem that you want to solve.
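A minimal sketch of the meshgrid route (the axis values are assumptions for illustration): build one 1-D array per axis, let meshgrid expand them, then stack and reshape into one (x, y, z) tuple per point:
import numpy as np

xs = np.arange(0.0, 1.5, 0.5)   # assumed spacing of 0.5 on each axis
ys = np.arange(0.0, 1.5, 0.5)
zs = np.arange(0.0, 1.0, 0.5)
gx, gy, gz = np.meshgrid(xs, ys, zs, indexing='ij')
points = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
coords = [tuple(p) for p in points]  # 3*3*2 = 18 coordinate tuples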
For multi-d (greater than 2) meshgrids, use numpy.lib.index_tricks.nd_grid like so:
import numpy
grid = numpy.lib.index_tricks.nd_grid()
g1 = grid[:3,:3,:3]
g2 = grid[0:1:0.5, 0:1, 0:2]
g3 = grid[0:1:3j, 0:1:2j, 0:2:2j]
where g1 has x values of [0, 1, 2],
g2 has x values of [0, 0.5],
and g3 has x values of [0.0, 0.5, 1.0] (a complex step like 3j defines the number of points instead of the step increment; see the documentation for more details).
Here's an efficient option similar to your C++ solution, which I've used for exactly the same purpose:
import numpy, itertools, collections
def grid(xmin, xmax, xstep, ymin, ymax, ystep, zmin, zmax, zstep):
    "return nested tuples of grid-sampled coordinates that include maxima"
    return collections.deque(itertools.product(
        numpy.arange(xmin, xmax + xstep, xstep).tolist(),
        numpy.arange(ymin, ymax + ystep, ystep).tolist(),
        numpy.arange(zmin, zmax + zstep, zstep).tolist()))
Performance is best (in my tests) when using a.tolist(), as shown above, but you can use a.flat instead and drop the deque() to get an iterator that will sip memory. Of course, you can also use a plain old tuple() or list() instead of deque() for a slight performance penalty (again, in my tests).
