Say I have a Numpy vector,
A = zeros(100)
and I divide it into subvectors by a list of breakpoints which index into A, for instance,
breaks = linspace(0, 100, 11, dtype=int)
So the i-th subvector lies between the indices breaks[i] (inclusive) and breaks[i+1] (exclusive).
The breaks are not necessarily equispaced, this is only an example.
However, they will always be strictly increasing.
Now I want to operate on these subvectors. For instance, if I want to set all elements of the i-th subvector to i, I might do:
for i in range(len(breaks) - 1):
    A[breaks[i] : breaks[i+1]] = i
Or I might want to compute the subvector means:
b = empty(len(breaks) - 1)
for i in range(len(breaks) - 1):
    b[i] = A[breaks[i] : breaks[i+1]].mean()
And so on.
How can I avoid using for loops and instead vectorize these operations?
You can use a simple np.cumsum -
import numpy as np
# Form zeros array of same size as input array and
# place ones at positions where intervals change
A1 = np.zeros_like(A)
A1[breaks[1:-1]] = 1
# Perform cumsum along it to create a staircase-like array of subvector
# labels; this already solves the "set the i-th subvector to i" task
out = A1.cumsum()
Sample run -
In [115]: A
Out[115]: array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
In [116]: breaks
Out[116]: array([ 0, 4, 9, 11, 18, 20])
In [142]: out
Out[142]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4])
If you want the mean values of those subvectors of A, you can use np.bincount -
mean_vals = np.bincount(out, weights=A)/np.bincount(out)
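For the sample run above this evaluates to:

In [143]: np.bincount(out, weights=A) / np.bincount(out)
Out[143]: array([ 3.75      ,  4.        ,  5.5       ,  6.14285714,  3.5       ])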
If you are looking to extend this functionality and use a custom function instead, you might want to look into numpy_groupies, a Python/NumPy equivalent of MATLAB's accumarray; its source code is available on GitHub.
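A rough sketch, assuming numpy_groupies' aggregate API (group labels first, then values; func can be a string or an arbitrary callable applied per group):

import numpy_groupies as npg

# "out" holds the subvector label of every element of A, as built above;
# here a custom per-group function computes the peak-to-peak range
ranges = npg.aggregate(out, A, func=lambda x: x.max() - x.min())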
There really isn't a single answer to your question, but several techniques that you can use as building blocks. Another one you may find helpful:
All numpy ufuncs have a .reduceat method, which you can use to your advantage for some of your calculations:
>>> a = np.arange(100)
>>> breaks = np.linspace(0, 100, 11, dtype=np.intp)
>>> counts = np.diff(breaks)
>>> counts
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
>>> sums = np.add.reduceat(a, breaks[:-1], dtype=float)
>>> sums
array([ 45., 145., 245., 345., 445., 545., 645., 745., 845., 945.])
>>> sums / counts # i.e. the mean
array([ 4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5])
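The same pattern works for any binary ufunc; for example, per-subvector maxima with the same a and breaks:

>>> np.maximum.reduceat(a, breaks[:-1])
array([ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99])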
You could use np.repeat:
In [35]: np.repeat(np.arange(0, len(breaks)-1), np.diff(breaks))
Out[35]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9])
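A minimal sketch of how to use these labels for the operations in the question, assuming breaks spans all of A (breaks[0] == 0 and breaks[-1] == len(A)):

labels = np.repeat(np.arange(len(breaks) - 1), np.diff(breaks))
means = np.bincount(labels, weights=A) / np.bincount(labels)  # subvector means
A[:] = labels  # set all elements of the i-th subvector to i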
To compute arbitrary binned statistics you could use scipy.stats.binned_statistic:
import numpy as np
import scipy.stats as stats
breaks = np.linspace(0, 100, 11, dtype=int)
A = np.random.random(100)
means, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic='mean', bins=breaks)
stats.binned_statistic can compute means, medians, counts and sums; or,
to compute an arbitrary statistic for each bin, you can pass a callable to the statistic parameter:
def func(values):
    return values.mean()

funcmeans, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic=func, bins=breaks)
assert np.allclose(means, funcmeans)
I am trying to implement message passing in graph neural nets. In each graph there are edges and nodes, and a node-to-edge update is implemented as follows:

e_ij^(t+1) = [a_i^(t), e_ij^(t), b_j^(t)]

where the square brackets denote the concatenation operation, subscripts are indexes and the superscripts are time indexes.
So I am trying to concatenate 3 matrices of dimensions AxN, AxBxM, and BxN, and the resulting concatenation is of dimension AxBx(2N+M). Every (i,j) of the resulting matrix is a concatenation of the i-th row of the first matrix, the j-th row of the third matrix, and the (i,j)-th element of the second matrix. I managed to implement this with a double for loop as follows:
edge_in = torch.zeros(a, b, m + 2 * n)
edge_in = edge_in.cuda()
for i in range(a):
    for j in range(b):
        edge_in[i,j] = torch.cat((nodes_a_embeds[i], edge_embeds[i,j], nodes_b_embeds[j]))
However, this is excruciatingly slow. Is this in any way vectorizable? I tried to come up with a solution and then I looked for a solution online but couldn't manage to vectorize it. Thanks.
Edit: numeric example as requested:
First matrix: 5x3
Second matrix: 5x4x2
Third matrix: 4x3
Output should be 5x4x8 then. Let's call our output matrix R.
Then R(1,2) = concatenate(First(1),Second(1,2),Third(2)).
Would this be the correct implementation of your code?
import numpy as np
A = 2
B = 3
M = 4
N = 5
first = np.arange(A*N).reshape((A, N))
first = np.tile(first[:, np.newaxis, :], (1, B, 1))
second = np.arange(A*B*M).reshape((A, B, M))
third = np.arange(B*N).reshape((B, N))
third = np.tile(third[np.newaxis, :, :], (A, 1, 1))
result = np.concatenate((first, second, third), axis=2)
Output:
array([[[ 0, 1, 2, 3, 4, 0, 1, 2, 3, 0, 1, 2, 3, 4],
[ 0, 1, 2, 3, 4, 4, 5, 6, 7, 5, 6, 7, 8, 9],
[ 0, 1, 2, 3, 4, 8, 9, 10, 11, 10, 11, 12, 13, 14]],
[[ 5, 6, 7, 8, 9, 12, 13, 14, 15, 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9, 16, 17, 18, 19, 5, 6, 7, 8, 9],
[ 5, 6, 7, 8, 9, 20, 21, 22, 23, 10, 11, 12, 13, 14]]])
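For completeness, the same construction can be done directly in PyTorch (a sketch with made-up sizes; expand broadcasts the two 2-D tensors to A x B x N views without copying, and torch.cat then builds the A x B x (2N + M) result):

import torch

a, b, n, m = 5, 4, 3, 2
nodes_a_embeds = torch.randn(a, n)   # A x N
edge_embeds = torch.randn(a, b, m)   # A x B x M
nodes_b_embeds = torch.randn(b, n)   # B x N

first = nodes_a_embeds.unsqueeze(1).expand(a, b, n)      # A x B x N
third = nodes_b_embeds.unsqueeze(0).expand(a, b, n)      # A x B x N
edge_in = torch.cat((first, edge_embeds, third), dim=2)  # A x B x (2N + M)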
Given any N-tuple of slices (a.k.a. an N-D slice) in NumPy, how can I convert it to the corresponding indexes of an N-D array, represented as a tuple of 1-D arrays (indexes along each axis)? E.g., if we had an np.nd_slice_to_indexes function, the following code:
import numpy as np
print(np.nd_slice_to_indexes(np.s_[1 : 3]))
print(np.nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2]))
should print
(array([1, 2]),)
(array([1, 1, 1, 2, 2, 2]), array([5, 7, 9, 5, 7, 9]))
It is common for NumPy to represent indexes of an N-D array as an N-tuple of 1-D arrays of the same length (the k-th array in the tuple holds the indexes along the k-th dimension). E.g. np.nonzero returns such an N-tuple:
print(np.nonzero([[0, 1, 1], [1, 1, 0]])) # Non-zero elements in 2D array.
# (array([0, 0, 1, 1], dtype=int64), array([1, 2, 0, 1], dtype=int64))
The same behavior should be achieved as in the pure-Python function below, but in a more efficient (performant) way:
import numpy as np
def nd_slice_to_indexes(nd_slice):
    assert type(nd_slice) in [tuple, slice], type(nd_slice)
    if type(nd_slice) is not tuple:
        nd_slice = (nd_slice,)
    def iter_slices(slices):
        if len(slices) == 0:
            yield ()
        else:
            for i in range(slices[0].start, slices[0].stop, slices[0].step or 1):
                for r in iter_slices(slices[1:]):
                    yield (i,) + r
    *res, = np.vstack(list(iter_slices(nd_slice))).T
    return tuple(res)
print(nd_slice_to_indexes(np.s_[1 : 3]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2, 8 : 14 : 3]))
# (array([1, 2]),)
# (array([1, 1, 1, 2, 2, 2]), array([5, 7, 9, 5, 7, 9]))
# (array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]), array([5, 5, 7, 7, 9, 9, 5, 5, 7, 7, 9, 9]), array([ 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11]))
Thanks to a suggestion from @hpaulj, the task can be solved efficiently using np.mgrid.
import numpy as np
def nd_slice_to_indexes(nd_slice):
    grid = np.mgrid[{tuple: nd_slice, slice: (nd_slice,)}[type(nd_slice)]]
    return tuple(grid[i].ravel() for i in range(grid.shape[0]))
print(nd_slice_to_indexes(np.s_[1 : 3]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2, 8 : 14 : 3]))
# (array([1, 2]),)
# (array([1, 1, 1, 2, 2, 2]), array([5, 7, 9, 5, 7, 9]))
# (array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]), array([5, 5, 7, 7, 9, 9, 5, 5, 7, 7, 9, 9]), array([ 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11]))
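As a quick check, the returned tuple can be used for fancy indexing and matches direct slicing (a sketch using the mgrid version above on a made-up 10x10 array):

x = np.arange(100).reshape(10, 10)
idx = nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2])
assert (x[idx] == x[1 : 3, 5 : 11 : 2].ravel()).all()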
I am trying to get the indices of unique elements of a numpy array (long vector of 3628621 elements).
However, I must do something wrong, because when I try to select the unique elements I am still finding duplicates:
Vector
Out[165]: array([712450, 714390, 718560, ..., 384390, 992041, 94852])
Loc = np.where(np.unique(Vector)) # Find indices of unique elements
Vector_New = Vector[Loc] # Create new vector with all unique elements
np.where(Vector_New == 173020) # See how often/where '173020' exists
Out[166]: (array([ 7098, 11581], dtype=int64),)
So the integer '173020' still exists twice in the new vector, although I expected all elements to be unique. The new vector is 11594 elements long.
Thanks for the help!
Regards,
Timen
np.unique has several parameters that can be activated and will give you the needed information. Its calling signature is:
np.unique(ar, return_index=False, return_inverse=False, return_counts=False)
Read the docs. Note that np.where(np.unique(Vector)) does not find the indices of the unique elements: np.unique(Vector) returns the sorted unique values themselves, and np.where then treats those values as a boolean mask, so you get the positions of the nonzero unique values rather than indices into Vector.
In [50]: keys
Out[50]:
array([1, 3, 5, 2, 0, 7, 4, 7, 7, 2, 7, 5, 5, 3, 6, 2, 3, 5, 5, 5, 6, 9, 6,
5, 2, 1, 6, 6, 5, 9, 9, 6, 5, 5, 9, 9, 6, 3, 7, 0, 5, 1, 7, 6, 2, 4,
1, 0, 6, 5, 4, 8, 8, 4, 2, 1, 8, 3, 1, 9, 8, 4, 4, 2, 4, 7, 2, 6, 8,
6, 5, 2, 4, 9, 1, 5, 3, 1, 5, 6, 2, 2, 8, 4, 0, 4, 9, 0, 8, 1, 5, 3,
1, 3, 7, 1, 5, 8, 5, 8])
In [51]: np.unique(keys, return_counts=True, return_index=True)
Out[51]:
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([ 4, 0, 3, 1, 6, 2, 14, 5, 51, 21], dtype=int32),
array([ 5, 11, 11, 8, 10, 18, 12, 8, 9, 8]))
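A minimal sketch of the fix, using a made-up short vector: np.unique already returns the unique values, and return_index gives the position of each value's first occurrence in the original array:

import numpy as np

Vector = np.array([712450, 714390, 712450, 173020, 173020, 94852])
vals, first_idx = np.unique(Vector, return_index=True)
Vector_New = Vector[first_idx]   # identical to vals; every element unique
assert np.array_equal(Vector_New, vals)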
Can I use numpy to generate repeating patterns of indices? For example:
0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 0, 11, 12, 13, 14, 15
or
0,1,2,1,2,3,4,5,6,5,6,7
Is there a method in numpy I can use to generate these lists for a given range?
Currently I am doing this using lists in Python, but I was curious whether I could use numpy to speed things up.
I am not sure what methods to even look into other than numpy.arange.
Just to further clarify, I am generating indices to triangles in OpenGL in various patterns.
So for triangles in a circle I have some code like this:
for fan_set in range(0, len(self.vertices) // vertex_length, triangle_count):
    for i in range(fan_set + 1, fan_set + 8):
        self.indices.append(fan_set)
        self.indices.append(i)
        self.indices.append(i + 1)
Your first example can be produced via numpy methods as:
In [860]: np.concatenate((np.zeros((3,1),int),np.arange(1,16).reshape(3,5)),axis=1).ravel()
Out[860]:
array([ 0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 0, 11, 12, 13, 14,
15])
That's because I see this 2D repeated pattern:
array([[ 0, 1, 2, 3, 4, 5],
[ 0, 6, 7, 8, 9, 10],
[ 0, 11, 12, 13, 14, 15]])
The second pattern can be produced by ravel of this 2d array (produced by broadcasting 2 arrays):
In [863]: np.array([0,1,4,5])[:,None]+np.arange(3)
Out[863]:
array([[0, 1, 2],
[1, 2, 3],
[4, 5, 6],
[5, 6, 7]])
I can produce the 1st pattern with a variation on the 2nd (the initial column of 0s disrupts the pattern):
I=np.array([0,5,10])[:,None]+np.arange(0,6)
I[:,0]=0
I think your double loop can be expressed as a list comprehension:
In [872]: np.array([ [k,i,i+1] for k in range(0,1,1) for i in range(k+1,k+8)]).ravel()
Out[872]: array([0, 1, 2, 0, 2, 3, 0, 3, 4, 0, 4, 5, 0, 5, 6, 0, 6, 7, 0, 7, 8])
or without the ravel:
array([[0, 1, 2],
[0, 2, 3],
[0, 3, 4],
[0, 4, 5],
[0, 5, 6],
[0, 6, 7],
[0, 7, 8]])
though I don't know what parameters produce your examples.
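For what it's worth, the 7-triangle fan above can also be built fully vectorized; a sketch that reproduces the same output:

fans = np.arange(1, 8)
tri = np.column_stack((np.zeros(7, dtype=int), fans, fans + 1))
tri.ravel()
# array([0, 1, 2, 0, 2, 3, 0, 3, 4, 0, 4, 5, 0, 5, 6, 0, 6, 7, 0, 7, 8])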
I'm not sure I understand exactly what you mean, but the following is what I use to generate unique indices for 3D points:
def indexate(points):
    """
    Convert a numpy array of points into a list of indices and an array of
    unique points.

    Arguments:
        points: A numpy array of shape (N, 3).

    Returns:
        An array of indices and an (M, 3) array of unique points.
    """
    pd = {}
    indices = [pd.setdefault(tuple(p), len(pd)) for p in points]
    pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
    unique = np.array([i[1] for i in pt])
    return np.array(indices, np.uint16), unique
You can find this code in my stltools package on github.
It works like this:
In [1]: import numpy as np
In [2]: points = np.array([[1,0,0], [0,0,1], [1,0,0], [0,1,0]])
In [3]: pd = {}
In [4]: indices = [pd.setdefault(tuple(p), len(pd)) for p in points]
In [5]: indices
Out[5]: [0, 1, 0, 2]
In [6]: pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
In [7]: pt
Out[7]: [(0, (1, 0, 0)), (1, (0, 0, 1)), (2, (0, 1, 0))]
In [8]: unique = np.array([i[1] for i in pt])
In [9]: unique
Out[9]:
array([[1, 0, 0],
[0, 0, 1],
[0, 1, 0]])
The key point (if you'll pardon the pun) is to use a tuple of the point (because a tuple is immutable and thus hashable) as the key in a dictionary with the setdefault method, while the length of the dict is the value. In effect, the value is the first time this exact point was seen.
I am not 100% certain this is what you're after, but I think you can achieve it by building a pair of ranges for each group, shifting each group by i*3 (the gap between groups), and then using numpy.concatenate to build the final array, like this:
import numpy as np
def gen_list(n):
    return np.concatenate([np.array(list(range(i, i + 3)) + list(range(i + 1, i + 4))) + i * 3
                           for i in range(n)])
Usage:
gen_list(2)
Out[16]: array([0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7])
gen_list(3)
Out[17]:
array([ 0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7, 8, 9, 10, 9, 10,
11])
list(gen_list(2))
Out[18]: [0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7]
In my sample I only use n as how many groups you want to generate, you may change this to suit your triangle-ish requirements.
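The same sequence can also be produced without the Python-level loop, by broadcasting the base pattern [0, 1, 2, 1, 2, 3] against per-group offsets (a sketch; each group is the base pattern shifted by 4):

import numpy as np

def gen_list_broadcast(n):
    # base pattern for one group, plus an offset of 4 per group
    return (np.array([0, 1, 2, 1, 2, 3]) + 4 * np.arange(n)[:, None]).ravel()

gen_list_broadcast(2)
# array([0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7])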
I have a numpy array, for example
a = np.arange(10)
how can I move the first n elements to the end of the array?
I found this roll function but it seems like it only does the opposite, which shifts the last n elements to the beginning.
Why not just roll with a negative number?
>>> import numpy as np
>>> a = np.arange(10)
>>> np.roll(a,2)
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
>>> np.roll(a,-2)
array([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])
You can use a negative shift:
a = np.arange(10)
print(np.roll(a, 3))
print(np.roll(a, -3))
returns
[7 8 9 0 1 2 3 4 5 6]
[3 4 5 6 7 8 9 0 1 2]
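Equivalently, moving the first n elements to the end is just a concatenation of two slices, if you prefer to avoid roll (a small sketch):

import numpy as np

a = np.arange(10)
n = 3
np.concatenate((a[n:], a[:n]))  # same result as np.roll(a, -n)
# array([3, 4, 5, 6, 7, 8, 9, 0, 1, 2])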