I am trying to implement message passing in graph neural networks. Each graph has edges and nodes, and the node-to-edge update is defined as follows:
where square brackets denote concatenation, subscripts are indices, and superscripts are time steps.
So I am trying to concatenate three matrices of dimensions AxN, AxBxM, and BxN, and the result should have dimensions AxBx(2N+M). Each entry (i,j) of the result is the concatenation of the ith row of the first matrix, the (i,j)th element of the second matrix, and the jth row of the third matrix. I managed to implement this with a double for loop as follows:
import torch
edge_in = torch.zeros(a, b, m + 2 * n)
edge_in = edge_in.cuda()
for i in range(a):
    for j in range(b):
        edge_in[i,j] = torch.cat((nodes_a_embeds[i], edge_embeds[i,j], nodes_b_embeds[j]))
However, this is excruciatingly slow. Is this in any way vectorizable? I tried to come up with a solution and then I looked for a solution online but couldn't manage to vectorize it. Thanks.
Edit: numerical example, as requested:
First matrix: 5x3
Second matrix: 5x4x2
Third matrix: 4x3
Output should be 5x4x8 then. Let's call our output matrix R.
Then R(1,2) = concatenate(First(1),Second(1,2),Third(2)).
Would this be a correct implementation of what you describe?
import numpy as np
A = 2
B = 3
M = 4
N = 5
first = np.arange(A*N).reshape((A, N))
first = np.tile(first[:, np.newaxis, :], (1, B, 1))
second = np.arange(A*B*M).reshape((A, B, M))
third = np.arange(B*N).reshape((B, N))
third = np.tile(third[np.newaxis, :, :], (A, 1, 1))
result = np.concatenate((first, second, third), axis=2)
Output:
array([[[ 0, 1, 2, 3, 4, 0, 1, 2, 3, 0, 1, 2, 3, 4],
[ 0, 1, 2, 3, 4, 4, 5, 6, 7, 5, 6, 7, 8, 9],
[ 0, 1, 2, 3, 4, 8, 9, 10, 11, 10, 11, 12, 13, 14]],
[[ 5, 6, 7, 8, 9, 12, 13, 14, 15, 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9, 16, 17, 18, 19, 5, 6, 7, 8, 9],
[ 5, 6, 7, 8, 9, 20, 21, 22, 23, 10, 11, 12, 13, 14]]])
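For larger inputs, here is a minimal NumPy sketch of the same idea that avoids np.tile's intermediate copies: np.broadcast_to gives read-only (A, B, ...) views of the first and third arrays, and a single np.concatenate materializes the result. The helper name pairwise_concat and the random test data are illustrative only; the double loop is there just to check the result against the original approach.

```python
import numpy as np

def pairwise_concat(first, second, third):
    """Return R with R[i, j] = concat(first[i], second[i, j], third[j]).

    first: (A, N), second: (A, B, M), third: (B, N) -> R: (A, B, 2N + M)
    """
    A, N = first.shape
    B = third.shape[0]
    # Broadcast views (no data copied yet) so all three operands are (A, B, ...)
    first_b = np.broadcast_to(first[:, None, :], (A, B, N))
    third_b = np.broadcast_to(third[None, :, :], (A, B, third.shape[1]))
    # One concatenate along the last axis builds the output array
    return np.concatenate((first_b, second, third_b), axis=2)

# Shapes from the numerical example: 5x3, 5x4x2, 4x3 -> 5x4x8
rng = np.random.default_rng(0)
first = rng.standard_normal((5, 3))
second = rng.standard_normal((5, 4, 2))
third = rng.standard_normal((4, 3))

R = pairwise_concat(first, second, third)

# Reference: the original double loop
R_loop = np.empty((5, 4, 8))
for i in range(5):
    for j in range(4):
        R_loop[i, j] = np.concatenate((first[i], second[i, j], third[j]))

print(R.shape)                    # (5, 4, 8)
print(np.array_equal(R, R_loop))  # True
```

In PyTorch the analogous (untested here) one-liner would be torch.cat((nodes_a_embeds.unsqueeze(1).expand(-1, b, -1), edge_embeds, nodes_b_embeds.unsqueeze(0).expand(a, -1, -1)), dim=2), since Tensor.expand, like np.broadcast_to, returns a view without copying.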
Here is a problem I'm trying to solve. Let's say we have a square array:
In [10]: arr
Out[10]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])
What I'd like is to flatten this array in a specific order: first flatten the lower triangle along axis 0 (column by column), then take the diagonal, and finally flatten the upper triangle, again along axis 0. That would give the flattened array:
# | lower triangle |diag.elements| upper triangle |
res = np.array([5, 9, 13, 10, 14, 15, 1, 6, 11, 16, 2, 3, 7, 4, 8, 12])
Here is my partial solution so far, which doesn't give the desired result yet.
In [16]: arr[np.tril(arr, k=-1) != 0]
Out[16]: array([ 5, 9, 10, 13, 14, 15]) # not correct!
In [17]: np.diag(arr)
Out[17]: array([ 1, 6, 11, 16])
In [18]: arr[np.triu(arr, k=1) != 0]
Out[18]: array([ 2, 3, 4, 7, 8, 12]) # not correct!
Finally, I would concatenate these three intermediate results. How do I index correctly to obtain the desired result? Alternatively, are there other ways of solving this problem?
Here's one based on masking and concatenating/stacking -
In [50]: r = np.arange(len(arr))
In [51]: mask = r[:,None]<r
In [54]: np.concatenate((arr.T[mask],np.diag(arr),arr.T[mask.T]))
Out[54]: array([ 5, 9, 13, 10, 14, 15, 1, 6, 11, 16, 2, 3, 7, 4, 8, 12])
Another based solely on masking -
n = len(arr)
r = np.arange(n)
mask = r[:,None]<r
diag_mask = r[:,None]==r
comp_mask = np.vstack((mask[None],diag_mask[None],mask.T[None]))
out = np.broadcast_to(arr.T,(3,n,n))[comp_mask]
Use the transpose (note that the != 0 filters assume the triangles contain no zeros):
lower = np.tril(a, -1).T.ravel()
diag = np.diag(a)
upper = np.triu(a, 1).T.ravel()
result = np.concatenate([lower[lower != 0], diag, upper[upper != 0]])
print(result)
Output:
[ 5 9 13 10 14 15 1 6 11 16 2 3 7 4 8 12]
I am using boolean index masks built with NumPy broadcasting:
ary=ary.T
i,c=ary.shape
x=np.arange(i)
y=np.arange(c)
np.concatenate([ary[x[:,None]<y],ary[x[:,None]==y],ary[x[:,None]>y]])
Out[1065]: array([ 5, 9, 13, 10, 14, 15, 1, 6, 11, 16, 2, 3, 7, 4, 8, 12])
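A variant that does not depend on filtering out zeros can be sketched with np.triu_indices and np.tril_indices, which return index pairs in row-major order. Walking arr.T row by row is the same as walking arr column by column, which gives the axis-0 ordering asked for. The function name flatten_tri_order is hypothetical.

```python
import numpy as np

def flatten_tri_order(arr):
    """Lower triangle (column by column), then diagonal, then upper triangle (column by column)."""
    n = arr.shape[0]
    at = arr.T  # row-major traversal of arr.T == column-major traversal of arr
    lower = at[np.triu_indices(n, k=1)]   # at[i, j] with j > i is arr[j, i], below the diagonal
    diag = np.diagonal(arr)
    upper = at[np.tril_indices(n, k=-1)]  # at[i, j] with j < i is arr[j, i], above the diagonal
    return np.concatenate((lower, diag, upper))

arr = np.arange(1, 17).reshape(4, 4)
print(flatten_tri_order(arr))
# [ 5  9 13 10 14 15  1  6 11 16  2  3  7  4  8 12]
```

Because it selects by position rather than by value, genuine zeros in either triangle are kept.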
Imagine that you have created a 1-D array of 100 values (100 feature dimensions), computed something, and filled the array with the results. For whatever reason you did not create a 2D array. What is wrong with wanting to add another dimension to this data, on the grounds that, for example, 250 samples should each hold this computed data?
I have searched for this but could not find a solution. Maybe I am not using the right keywords.
Actually, I want to reshape a NumPy array of shape (100,) to (250,100).
I have read this link and a couple of other links, but they did not help me.
I have also tried this:
numpyarray = np.zeros(100)  # an array of shape (100,)
transformed_numpyarray = np.reshape(numpyarray,(100,-1)).T
which gives me an array of shape:
(1, 100)
but I really do not want 1 as the first dimension of the 2D array.
What I'm trying to do is either convert it to (,100) or at least to something like (250,100). 250 is a constant I already know, so I want to express, for example, 250 samples with 100 dimensions.
Thanks.
I'm still confused about what you are trying to do. So far I can picture two alternatives - reshape and repeat. To illustrate:
In [148]: x = np.arange(16)
In [149]: x
Out[149]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
In [150]: x.reshape(4,4)
Out[150]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [151]: np.repeat(x[None,:], 4, axis=0)
Out[151]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]])
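Note that (,100) is not valid shape syntax; the closest NumPy shape is (100,) itself. For the concrete (100,) to (250,100) case, a small sketch of the repeat idea plus two related options, using placeholder data (features, n_samples) as stand-ins for the asker's array and constant:

```python
import numpy as np

features = np.arange(100, dtype=float)  # placeholder for the computed (100,) array
n_samples = 250

# Option 1: a real (250, 100) copy; rows can later be modified independently
# (equivalent to np.repeat(features[None, :], n_samples, axis=0))
tiled = np.tile(features, (n_samples, 1))

# Option 2: a read-only (250, 100) view that reuses the same 100 values (no copy)
view = np.broadcast_to(features, (n_samples, features.size))

# Option 3: a leading axis of length 1, shape (1, 100), which broadcasts
# against any (250, 100) array in arithmetic
row = features[None, :]

print(tiled.shape, view.shape, row.shape)  # (250, 100) (250, 100) (1, 100)
```

np.reshape alone cannot do this, because reshaping never changes the number of elements (100 vs 25000).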
NumPy arrays have a fixed size; you can't have an array with a variable shape. If you don't know beforehand how many samples you will have, you can add them gradually with vstack:
In [4]: numpyarray.shape
Out[4]: (3, 4)
In [5]: new_sample.shape
Out[5]: (4,)
In [6]: numpyarray = np.vstack([numpyarray, new_sample])
In [7]: numpyarray.shape
Out[7]: (4, 4)
You can also fix the size up front by creating an array of zeros and then fill it progressively with samples:
numpyarray = np.zeros((250,100))
...
numpyarray[i] = new_sample
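A self-contained sketch of the preallocate-and-fill pattern. compute_sample is a hypothetical placeholder for whatever produces one (100,) sample. The second half shows a list-then-stack alternative for when the count is unknown: calling np.vstack inside a loop copies the whole array on every call, whereas stacking once at the end copies only once.

```python
import numpy as np

n_samples, n_features = 250, 100

def compute_sample(i):
    # Hypothetical placeholder: returns one sample of shape (n_features,)
    return np.full(n_features, float(i))

# Preallocate once, then fill row by row (no reallocation inside the loop)
data = np.zeros((n_samples, n_features))
for i in range(n_samples):
    data[i] = compute_sample(i)

# Unknown number of samples: collect rows in a list, stack once at the end
rows = [compute_sample(i) for i in range(n_samples)]
data2 = np.vstack(rows)

print(data.shape, np.array_equal(data, data2))  # (250, 100) True
```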
I'm struggling to perform the operation below on a NumPy vector.
For each index in indices, I want to take the previous_n samples of vector ending at (and including) that index.
It's like I want to perform a np.take with slicing of the previous_n samples.
Example:
import numpy as np
vector = np.array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
# number of previous samples
previous_n = 3
indices = np.array([ 5, 7, 12])
Expected result:
array([[ 3, 4, 5],
[ 5, 6, 7],
[10, 11, 12]])
Ok, this seems to do what I want. Found here
def stack_slices(arr, previous_n, indices):
    all_idx = indices[:, None] + np.arange(previous_n) - (previous_n - 1)
    return arr[all_idx]
>>> stack_slices(vector, 3, indices)
array([[ 3, 4, 5],
[ 5, 6, 7],
[10, 11, 12]])
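One caveat with stack_slices: an index smaller than previous_n - 1 yields negative positions, which fancy indexing silently wraps around to the end of the vector. A sketch of a guarded variant using np.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later); the name previous_windows is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def previous_windows(vector, previous_n, indices):
    """Row k is vector[indices[k] - previous_n + 1 : indices[k] + 1]."""
    indices = np.asarray(indices)
    # Reject indices that would need samples before the start (or past the end)
    if np.any(indices < previous_n - 1) or np.any(indices >= len(vector)):
        raise ValueError("indices must lie in [previous_n - 1, len(vector) - 1]")
    windows = sliding_window_view(vector, previous_n)  # windows[k] == vector[k:k + previous_n]
    return windows[indices - (previous_n - 1)]

vector = np.arange(15)
print(previous_windows(vector, 3, np.array([5, 7, 12])))
# [[ 3  4  5]
#  [ 5  6  7]
#  [10 11 12]]
```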
Suppose I have a 2-dimensional NumPy array of shape n x m (where n is a large number and m >= 1). Each column represents one attribute. An example for n=5, m=3 is provided below:
[[1,2,3],
[4,5,6],
[7,8,9],
[10,11,12],
[13,14,15]]
I want to train my model on the history of attributes with history_steps = p (1 < p <= n). For p=2, the output I expect (of shape (n-p+1) x (m*p)) is
[[1,4,2,5,3,6],
[4,7,5,8,6,9],
[7,10,8,11,9,12],
[10,13,11,14,12,15]]
I tried to implement this in pandas by separating columns and then concatenating outputs.
import pandas as pd
def buff(s, n):
    return (pd.concat([s.shift(-i) for i in range(n)], axis=1).dropna().astype(float))
But for my purposes a NumPy-based approach would be better. I would also like to avoid splitting and concatenating.
How do I go about doing this?
Here's a NumPy based approach with focus on performance using np.lib.stride_tricks.as_strided -
def strided_axis0(a, L = 2):
    # INPUTS :
    # a : Input array
    # L : Length along rows to be cut to create per subarray
    # Store shape and strides info
    m,n = a.shape
    s0,s1 = a.strides
    nrows = m - L + 1
    strided = np.lib.stride_tricks.as_strided
    # Finally use strides to get the 3D array view and then reshape
    return strided(a, shape=(nrows,n,L), strides=(s0,s1,s0)).reshape(nrows,-1)
Sample run -
In [27]: a
Out[27]:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])
In [28]: strided_axis0(a, L=2)
Out[28]:
array([[ 1, 4, 2, 5, 3, 6],
[ 4, 7, 5, 8, 6, 9],
[ 7, 10, 8, 11, 9, 12],
[10, 13, 11, 14, 12, 15]])
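as_strided builds a view from raw shape and stride values without bounds checking, so a wrong shape silently reads memory outside the array. On NumPy 1.20 or later, sliding_window_view can produce the same (nrows, m, p) window view with the bounds handled for you. A sketch under that version assumption; the name history_features is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def history_features(a, p):
    """Row r is a[r, 0], ..., a[r+p-1, 0], a[r, 1], ..., a[r+p-1, 1], ..."""
    n, m = a.shape
    # Shape (n - p + 1, m, p): element [r, c, k] is a[r + k, c]
    windows = sliding_window_view(a, p, axis=0)
    # Merging the last two axes interleaves the p history steps per column;
    # reshape copies here because the window view is not contiguous
    return windows.reshape(n - p + 1, m * p)

a = np.arange(1, 16).reshape(5, 3)
print(history_features(a, 2))
```

The p=2 call should reproduce the strided_axis0(a, L=2) output, and the same function handles any p in 1 < p <= n.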
You can use dstack + reshape:
a = np.array([[1,2,3],
[4,5,6],
[7,8,9],
[10,11,12],
[13,14,15]])
# use `dstack` to stack the two arrays (one with the last row removed, the other with the first
# row removed) along the third axis, and then use reshape to flatten the second and third
# dimensions
np.dstack([a[:-1], a[1:]]).reshape(a.shape[0]-1, -1)
#array([[ 1, 4, 2, 5, 3, 6],
# [ 4, 7, 5, 8, 6, 9],
# [ 7, 10, 8, 11, 9, 12],
# [10, 13, 11, 14, 12, 15]])
To generalize to arbitrary p, use a list comprehension to generate a list of shifted arrays and then do stack+reshape:
n, m = a.shape
p = 3
np.dstack([a[i:(n-p+i+1)] for i in range(p)]).reshape(n-p+1, -1)
#array([[ 1, 4, 7, 2, 5, 8, 3, 6, 9],
# [ 4, 7, 10, 5, 8, 11, 6, 9, 12],
# [ 7, 10, 13, 8, 11, 14, 9, 12, 15]])
Say I have a NumPy vector,
A = zeros(100)
and I divide it into subvectors by a list of breakpoints which index into A, for instance,
breaks = linspace(0, 100, 11, dtype=int)
So the i-th subvector lies between the indices breaks[i] (inclusive) and breaks[i+1] (exclusive).
The breaks are not necessarily equispaced, this is only an example.
However, they will always be strictly increasing.
Now I want to operate on these subvectors. For instance, if I want to set all elements of the i-th subvector to i, I might do:
for i in range(len(breaks) - 1):
    A[breaks[i] : breaks[i+1]] = i
Or I might want to compute the subvector means:
b = empty(len(breaks) - 1)
for i in range(len(breaks) - 1):
    b[i] = A[breaks[i] : breaks[i+1]].mean()
And so on.
How can I avoid using for loops and instead vectorize these operations?
You can use simple np.cumsum -
import numpy as np
# Form an integer zeros array of the same length as the input array and
# place ones at positions where intervals change
# (an integer dtype is needed if the result is passed to np.bincount)
A1 = np.zeros(len(A), dtype=int)
A1[breaks[1:-1]] = 1
# Perform cumsum along it to create a staircase-like array, as the final output
out = A1.cumsum()
Sample run -
In [115]: A
Out[115]: array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
In [116]: breaks
Out[116]: array([ 0, 4, 9, 11, 18, 20])
In [142]: out
Out[142]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4])
If you want to have mean values of those subvectors from A, you can use np.bincount -
mean_vals = np.bincount(out, weights=A)/np.bincount(out)
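Putting the cumsum labels and np.bincount together, a sketch that covers both loops from the question on the sample data above, assuming breaks[0] == 0 and breaks[-1] == len(A). The labels use an integer dtype because np.bincount raises a TypeError for a float array, which is what np.zeros_like(A) would produce for a float A such as zeros(100).

```python
import numpy as np

# Sample data from the run above; assumes breaks[0] == 0 and breaks[-1] == len(A)
A = np.array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6], dtype=float)
breaks = np.array([0, 4, 9, 11, 18, 20])

# Segment label for every element, kept as integers for np.bincount
labels = np.zeros(len(A), dtype=np.intp)
labels[breaks[1:-1]] = 1
labels = labels.cumsum()

# First loop from the question: set all elements of the i-th subvector to i
A_filled = labels.astype(float)

# Second loop from the question: per-subvector means
means = np.bincount(labels, weights=A) / np.bincount(labels)

# Reference result from the original Python loop
loop_means = np.array([A[breaks[i]:breaks[i + 1]].mean() for i in range(len(breaks) - 1)])
print(np.allclose(means, loop_means))  # True
```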
If you are looking to extend this functionality and use a custom function instead, you might want to look into numpy_groupies, a Python/NumPy equivalent of MATLAB's accumarray, whose source code is available here.
There really isn't a single answer to your question, but several techniques that you can use as building blocks. Another one you may find helpful:
All binary NumPy ufuncs (np.add, np.multiply, np.maximum, ...) have a .reduceat method, which you can use to your advantage for some of your calculations:
>>> a = np.arange(100)
>>> breaks = np.linspace(0, 100, 11, dtype=np.intp)
>>> counts = np.diff(breaks)
>>> counts
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
>>> sums = np.add.reduceat(a, breaks[:-1], dtype=float)
>>> sums
array([ 45., 145., 245., 345., 445., 545., 645., 745., 845., 945.])
>>> sums / counts # i.e. the mean
array([ 4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5])
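reduceat is not limited to np.add; any binary ufunc can reduce each segment. A small sketch on made-up data, assuming breaks begins at 0 and ends at len(a). reduceat relies on the strictly increasing breaks the question guarantees: where indices[i] >= indices[i+1], the i-th result is simply a[indices[i]] rather than a reduction.

```python
import numpy as np

# Assumption: breaks starts at 0 and ends at len(a), as in the question
a = np.array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7], dtype=float)
breaks = np.array([0, 4, 7, 10])
starts = breaks[:-1]  # reduceat takes segment starts; the last segment runs to the end

seg_sum = np.add.reduceat(a, starts)       # [15., 18., 9.]
seg_max = np.maximum.reduceat(a, starts)   # [8., 8., 7.]
seg_min = np.minimum.reduceat(a, starts)   # [0., 4., 0.]
seg_mean = seg_sum / np.diff(breaks)       # [3.75, 6., 3.]
```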
You could use np.repeat:
In [35]: np.repeat(np.arange(0, len(breaks)-1), np.diff(breaks))
Out[35]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9])
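np.repeat also works in the other direction: after computing one value per subvector, repeating it by the segment lengths expands it back to one value per element. A sketch with uneven breaks that subtracts each subvector's mean from its own elements. The data and the centering operation are illustrative assumptions, not part of the question.

```python
import numpy as np

A = np.arange(20, dtype=float)
breaks = np.array([0, 4, 9, 11, 18, 20])  # uneven, strictly increasing, spans all of A
counts = np.diff(breaks)                  # subvector lengths

# Segment label for every element, as in the np.repeat call in this answer
labels = np.repeat(np.arange(len(counts)), counts)

# Per-subvector means, then expanded back to one value per element
means = np.add.reduceat(A, breaks[:-1]) / counts
A_centered = A - np.repeat(means, counts)
```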
To compute arbitrary binned statistics you could use scipy.stats.binned_statistic:
import numpy as np
import scipy.stats as stats
breaks = np.linspace(0, 100, 11, dtype=int)
A = np.random.random(100)
means, bin_edges, binnumber = stats.binned_statistic(
x=np.arange(len(A)), values=A, statistic='mean', bins=breaks)
stats.binned_statistic can compute means, medians, counts, and sums; or,
to compute an arbitrary statistic for each bin, you can pass a callable to the statistic parameter:
def func(values):
    return values.mean()
funcmeans, bin_edges, binnumber = stats.binned_statistic(
x=np.arange(len(A)), values=A, statistic=func, bins=breaks)
assert np.allclose(means, funcmeans)