Pytorch Geometric sparse adjacency matrix to edge index tensor

My data object has the data.adj_t parameter, giving me the sparse adjacency matrix. How can I get the edge_index tensor of size [2, num_edges] from this?

As you can see in the docs:
Since this feature is still experimental, some operations, e.g., graph pooling methods, may still require you to input the edge_index format. You can convert adj_t back to (edge_index, edge_attr) via:
row, col, edge_attr = adj_t.t().coo()
edge_index = torch.stack([row, col], dim=0)
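
To see what this conversion produces without installing torch_sparse, here is a small NumPy-only sketch. It is an illustration, not the PyTorch Geometric API: the dense array `adj_t_dense` is a hypothetical stand-in for adj_t, assumed to hold the transposed adjacency matrix. Transposing back and reading off the nonzero coordinates gives the [2, num_edges] layout.

```python
import numpy as np

# Hypothetical directed graph with edges 0->1, 1->2, 2->0
# (first row: source nodes, second row: target nodes).
edge_index = np.array([[0, 1, 2],
                       [1, 2, 0]])
num_nodes = 3

# Dense adjacency with adj[src, dst] = 1; adj_t_dense is its transpose.
adj = np.zeros((num_nodes, num_nodes), dtype=np.int64)
adj[edge_index[0], edge_index[1]] = 1
adj_t_dense = adj.T

# Transpose back, then read the coordinates of the nonzero entries (COO style).
row, col = np.nonzero(adj_t_dense.T)
recovered = np.stack([row, col], axis=0)
print(recovered)
```

Skipping the transpose would swap the source and target rows, which is why the snippet from the docs calls .t() before .coo().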

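If your adjacency matrix is a SciPy sparse matrix (rather than the SparseTensor stored in adj_t), you can use torch_geometric.utils.convert.from_scipy_sparse_matrix.
>>> import torch
>>> from torch_geometric.utils.convert import from_scipy_sparse_matrix, to_scipy_sparse_matrix
>>> edge_index = torch.tensor([
... [0, 1, 1, 2, 2, 3],
... [1, 0, 2, 1, 3, 2],
... ])
>>> adj = to_scipy_sparse_matrix(edge_index)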
>>> # `edge_index` and `edge_weight` are both returned
>>> from_scipy_sparse_matrix(adj)
(tensor([[0, 1, 1, 2, 2, 3],
[1, 0, 2, 1, 3, 2]]),
tensor([1., 1., 1., 1., 1., 1.]))

Related

using tf.where() to select 3d tensor by 2d conditions & replacing elements in a 2d indices with keys and values

There are 2 questions in the title. I am confused by both questions because tensorflow is such a static programming language (I really want to go back to either pytorch or chainer).
Here are two examples. Please answer with TensorFlow code or links to the relevant functions.
1) tf.where()
data0 = tf.zeros([2, 3, 4], dtype = tf.float32)
data1 = tf.ones([2, 3, 4], dtype = tf.float32)
cond = tf.constant([[0, 1, 1], [1, 0, 0]])
# cond.shape == (2, 3)
# tf.where() works for 1d condition with 2d data,
# but not for 2d indices with 3d tensor
# currently, what I am doing is:
# cond = tf.stack([cond] * 4, 2)
data = tf.where(cond > 0, data1, data0)
# data should be [[0., 1., 1.], [1., 0., 0.]]
(I don't know how to broadcast cond to 3d tensor)
2) change element in 2d tensor
# all dtype == tf.int64
t2d = tf.Variable([[0, 1, 2], [3, 4, 5]])
k, v = tf.constant([[0, 2], [1, 0]]), tf.constant([-2, -3])
# TODO: change values at positions k to v
# I cannot do [t2d.copy()[i] = j for i, j in k, v]
t3d == [[[0, 1, -2], [3, 4, 5]],
[[0, 1, 2], [-3, 4, 5]]]
Thank you so much in advance. XD

These are two quite different questions, and they should probably have been posted as such, but anyway.
1)
Yes, in TensorFlow 1.x you need to manually broadcast all the inputs to [tf.where](https://www.tensorflow.org/api_docs/python/tf/where) if their shapes differ. For what it's worth, there is an (old) open issue about it, but implicit broadcasting has not been implemented there (tf.where in TensorFlow 2.x does broadcast its arguments). You can use tf.stack as you suggest, although tf.tile would probably be more obvious (and may save memory, although I'm not sure how it is implemented):
cond = tf.tile(tf.expand_dims(cond, -1), (1, 1, 4))
Or simply with tf.broadcast_to:
cond = tf.broadcast_to(tf.expand_dims(cond, -1), tf.shape(data1))
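
The shape mechanics can be checked quickly in NumPy, whose expand_dims and broadcast_to mirror the TensorFlow functions of the same name. This is a NumPy analog for illustration, not TensorFlow code; `cond3d` and `cond_full` are names introduced here.

```python
import numpy as np

data0 = np.zeros((2, 3, 4), dtype=np.float32)
data1 = np.ones((2, 3, 4), dtype=np.float32)
cond = np.array([[0, 1, 1], [1, 0, 0]])

# Add a trailing axis of length 1 so cond has shape (2, 3, 1) ...
cond3d = np.expand_dims(cond, -1)
# ... then expand it explicitly to the data shape (2, 3, 4).
cond_full = np.broadcast_to(cond3d, data1.shape)

data = np.where(cond_full > 0, data1, data0)
print(data.shape)
print(data[:, :, 0])  # every slice along the last axis follows cond
```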
2)
This is one way to do that:
import tensorflow as tf
t2d = tf.constant([[0, 1, 2], [3, 4, 5]])
k, v = tf.constant([[0, 2], [1, 0]]), tf.constant([-2, -3])
# Tile t2d
n = tf.shape(k)[0]
t2d_tile = tf.tile(tf.expand_dims(t2d, 0), (n, 1, 1))
# Add additional coordinate to index
idx = tf.concat([tf.expand_dims(tf.range(n), 1), k], axis=1)
# Make updates tensor
s = tf.shape(t2d_tile)
t2d_upd = tf.scatter_nd(idx, v, s)
# Make updates mask
upd_mask = tf.scatter_nd(idx, tf.ones_like(v, dtype=tf.bool), s)
# Make final tensor
t3d = tf.where(upd_mask, t2d_upd, t2d_tile)
# Test
with tf.Session() as sess:
    print(sess.run(t3d))
Output:
[[[ 0 1 -2]
[ 3 4 5]]
[[ 0 1 2]
[-3 4 5]]]
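
For comparison, the same tile-then-write idea can be sketched in NumPy, where in-place assignment with integer index arrays replaces the scatter_nd and mask steps. This is a NumPy analog under that assumption, not TensorFlow code.

```python
import numpy as np

t2d = np.array([[0, 1, 2], [3, 4, 5]])
k = np.array([[0, 2], [1, 0]])   # one (row, col) position per copy
v = np.array([-2, -3])           # value to write at each position

n = k.shape[0]
# One independent copy of t2d per update.
t3d = np.tile(t2d[None, :, :], (n, 1, 1))
# Pair each copy index with its (row, col) so every update hits its own copy.
t3d[np.arange(n), k[:, 0], k[:, 1]] = v
print(t3d)
```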

Numpy to assess letters as integers

In 2D arrays, I know I can transform integers with the following:
import numpy as np
x = np.array([[0,1,1,2], [0, 5, 0, 0], [2, 0,3,3]], np.int32)
x[x==0] = 99
print (x[0:])
[[99 1 1 2]
[99 5 99 99]
[ 2 99 3 3]]
Is there a way to input strings into matrices? For example something like this?
import numpy as np
x = np.array([[0,1,1,2], [0, 5, 0, 0], [2, 0, 3, 3]], np.int32)
x[x==0] = int('x') ## This might be something like str('x), but I want it to
## equal 0
print (x[0:])
[[x 1 1 2]
[x 5 x x]
[2 x 3 3]]

What you're asking for can be done, but only by changing from an array of numbers to an array of objects, that is, values of any type at all:
>>> x = np.array([[0,1,1,2], [0, 5, 0, 0], [2, 0,3,3]], dtype=object)
>>> x[x==0] = 'x'
>>> x
array([['x', 1, 1, 2],
['x', 5, 'x', 'x'],
[2, 'x', 3, 3]], dtype=object)
But this is probably not what you want. Most of the speed and space savings of numpy come from the fact that arrays have a specific data type; if you use the generic object, they're no smaller, and not much faster, than just using a list of lists. If you're only using numpy for syntactic convenience rather than space or speed benefits, that may be fine, but it's definitely something to think about before you do it.
Also, an array of objects follows the usual Python rules for any arithmetic on those objects. For example:
>>> x+2
TypeError: must be str, not int
That doesn't seem very useful.
If you're looking for a special "marker value" that prevents you from accidentally thinking some value is meaningful when it actually wasn't, you can do that with floats, although not with ints, by using nan:
>>> x = np.array([[0,1,1,2], [0, 5, 0, 0], [2, 0,3,3]], dtype=np.float64)
>>> x[x==0] = np.nan
>>> x
array([[nan, 1., 1., 2.],
[nan, 5., nan, nan],
[ 2., nan, 3., 3.]])
A nan can be stored in a float64 slot, so you still have all the space and speed benefits of a fixed-type array. And (by default), operations on nan don't raise an exception, they just return nan. So:
>>> x+2
array([[nan, 3., 3., 4.],
[nan, 7., nan, nan],
[ 4., nan, 5., 5.]])
In some cases, it may be even better to just leave the array alone and operate on the array with a mask:
>>> x = np.array([[0,1,1,2], [0, 5, 0, 0], [2, 0,3,3]], dtype=np.int64)
>>> x[x!=0] += 2
>>> x
array([[0, 3, 3, 4],
[0, 7, 0, 0],
[4, 0, 5, 5]])
Or, if your indices don't matter, only your values, you can even do this (the simplest version, ignoring even axes):
>>> x = np.array([[0,1,1,2], [0, 5, 0, 0], [2, 0,3,3]], dtype=np.int64)
>>> y = x[x!=0]
>>> y+2
array([3, 3, 4, 7, 4, 5, 5])
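
Another option, not used in the answer above, is NumPy's masked-array module np.ma, which keeps the integer dtype while treating chosen entries as missing. A minimal sketch, assuming 0 is the marker value:

```python
import numpy as np

x = np.array([[0, 1, 1, 2], [0, 5, 0, 0], [2, 0, 3, 3]], dtype=np.int64)

# Mark every 0 as "missing" without changing the integer dtype.
mx = np.ma.masked_equal(x, 0)

# Arithmetic leaves masked entries masked; filled(0) puts the zeros back.
shifted = (mx + 2).filled(0)
print(shifted)

# Reductions only see the unmasked values.
total = mx.sum()
print(total)
```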

Duplicate array dimension with numpy (without np.repeat)

I'd like to duplicate a numpy array dimension, but in a way that the sum of the original and the duplicated dimension array are still the same. For instance consider a n x m shape array (a) which I'd like to convert to a n x n x m (b) array, so that a[i,j] == b[i,i,j]. Unfortunately np.repeat and np.resize are not suitable for this job. Is there another numpy function I could use or is this possible with some creative indexing?
>>> import numpy as np
>>> a = np.asarray([1, 2, 3])
>>> a
array([1, 2, 3])
>>> a.shape
(3,)
# This is not what I want...
>>> np.resize(a, (3, 3))
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
In the above example, I would like to get this result:
array([[1, 0, 0],
[0, 2, 0],
[0, 0, 3]])

To go from a 1D to a 2D array, you can use np.diagflat, which creates a two-dimensional array with the flattened input as its diagonal:
import numpy as np
a = np.asarray([1, 2, 3])
np.diagflat(a)
#array([[1, 0, 0],
# [0, 2, 0],
# [0, 0, 3]])
More generally, you can create a zeros array and assign values in place with advanced indexing:
a = np.asarray([[1, 2, 3], [4, 5, 6]])
result = np.zeros((a.shape[0],) + a.shape)
idx = np.arange(a.shape[0])
result[idx, idx, :] = a
result
#array([[[ 1., 2., 3.],
# [ 0., 0., 0.]],
# [[ 0., 0., 0.],
# [ 4., 5., 6.]]])
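
The indexing approach can be wrapped in a small helper and checked against the requirements from the question (a[i, j] == b[i, i, j] and an unchanged sum). The name `diag_expand` is made up for this sketch; unlike the snippet above, it also keeps the input dtype.

```python
import numpy as np

def diag_expand(a):
    """Return b with b[i, i, ...] == a[i, ...] and zeros elsewhere."""
    n = a.shape[0]
    b = np.zeros((n,) + a.shape, dtype=a.dtype)
    idx = np.arange(n)
    b[idx, idx] = a   # writes a[i] into b[i, i] for every i
    return b

a = np.array([[1, 2, 3], [4, 5, 6]])
b = diag_expand(a)
print(b.shape)
print(b.sum() == a.sum())
```

For a 1D input the helper gives the same result as np.diagflat.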

How do I add an extra column to a NumPy array?

Given the following 2D array:
a = np.array([
[1, 2, 3],
[2, 3, 4],
])
I want to add a column of zeros along the second axis to get:
b = np.array([
[1, 2, 3, 0],
[2, 3, 4, 0],
])

np.r_[ ... ] and np.c_[ ... ]
are useful alternatives to vstack and hstack,
with square brackets [] instead of round ().
A couple of examples:
: import numpy as np
: N = 3
: A = np.eye(N)
: np.c_[ A, np.ones(N) ] # add a column
array([[ 1., 0., 0., 1.],
[ 0., 1., 0., 1.],
[ 0., 0., 1., 1.]])
: np.c_[ np.ones(N), A, np.ones(N) ] # or two
array([[ 1., 1., 0., 0., 1.],
[ 1., 0., 1., 0., 1.],
[ 1., 0., 0., 1., 1.]])
: np.r_[ A, [A[1]] ] # add a row
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.]])
: # not np.r_[ A, A[1] ]
: np.r_[ A[0], 1, 2, 3, A[1] ] # mix vecs and scalars
array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.])
: np.r_[ A[0], [1, 2, 3], A[1] ] # lists
array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.])
: np.r_[ A[0], (1, 2, 3), A[1] ] # tuples
array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.])
: np.r_[ A[0], 1:4, A[1] ] # same, 1:4 == arange(1,4) == 1,2,3
array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.])
(The reason for square brackets [] instead of round () is that slice syntax such as 1:4 is only valid inside square brackets, where Python hands it to the object as a slice; np.r_ and np.c_ overload indexing to expand it.)
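
Applied to the array from the question, a minimal sketch with np.c_ could look like this. Creating the zeros with the dtype of a is a choice made here so the result stays integer.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# A 1-D array passed to np.c_ is treated as a column.
zeros = np.zeros(a.shape[0], dtype=a.dtype)
b = np.c_[a, zeros]
print(b)
```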

I think a more straightforward solution and faster to boot is to do the following:
import numpy as np
N = 10
a = np.random.rand(N,N)
b = np.zeros((N,N+1))
b[:,:-1] = a
And timings:
In [23]: N = 10
In [24]: a = np.random.rand(N,N)
In [25]: %timeit b = np.hstack((a,np.zeros((a.shape[0],1))))
10000 loops, best of 3: 19.6 us per loop
In [27]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 5.62 us per loop

Use numpy.append:
>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
[2, 3, 4]])
>>> z = np.zeros((2,1), dtype=np.int64)
>>> z
array([[0],
[0]])
>>> np.append(a, z, axis=1)
array([[1, 2, 3, 0],
[2, 3, 4, 0]])

One way, using hstack, is:
b = np.hstack((a, np.zeros((a.shape[0], 1), dtype=a.dtype)))

I was also interested in this question and compared the speed of
numpy.c_[a, a]
numpy.stack([a, a]).T
numpy.vstack([a, a]).T
numpy.ascontiguousarray(numpy.stack([a, a]).T)
numpy.ascontiguousarray(numpy.vstack([a, a]).T)
numpy.column_stack([a, a])
numpy.concatenate([a[:,None], a[:,None]], axis=1)
numpy.concatenate([a[None], a[None]], axis=0).T
which all do the same thing for any input vector a. Timings for growing a (the plot is not reproduced here):
Note that all non-contiguous variants (in particular stack/vstack) are eventually faster than all contiguous variants. column_stack (for its clarity and speed) appears to be a good option if you require contiguity.
Code to reproduce the plot:
import numpy as np
import perfplot
b = perfplot.bench(
setup=np.random.rand,
kernels=[
lambda a: np.c_[a, a],
lambda a: np.ascontiguousarray(np.stack([a, a]).T),
lambda a: np.ascontiguousarray(np.vstack([a, a]).T),
lambda a: np.column_stack([a, a]),
lambda a: np.concatenate([a[:, None], a[:, None]], axis=1),
lambda a: np.ascontiguousarray(np.concatenate([a[None], a[None]], axis=0).T),
lambda a: np.stack([a, a]).T,
lambda a: np.vstack([a, a]).T,
lambda a: np.concatenate([a[None], a[None]], axis=0).T,
],
labels=[
"c_",
"ascont(stack)",
"ascont(vstack)",
"column_stack",
"concat",
"ascont(concat)",
"stack (non-cont)",
"vstack (non-cont)",
"concat (non-cont)",
],
n_range=[2 ** k for k in range(23)],
xlabel="len(a)",
)
b.save("out.png")
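
What "contiguous" means here can be checked directly through the array flags. The following sketch only inspects memory layout and makes no claim about timings.

```python
import numpy as np

a = np.random.rand(5)

# .T returns a view with swapped strides; no data is copied.
transposed = np.stack([a, a]).T
# ascontiguousarray copies the data into C (row-major) order.
fixed = np.ascontiguousarray(transposed)

print(transposed.shape, transposed.flags["C_CONTIGUOUS"])
print(fixed.shape, fixed.flags["C_CONTIGUOUS"])
print(np.array_equal(transposed, fixed))
```

The transposed variants skip that copy, which is consistent with them being the faster ones in the comparison.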

I find the following most elegant:
b = np.insert(a, 3, values=0, axis=1) # Insert values before column 3
An advantage of insert is that it also allows you to insert columns (or rows) at other places inside the array. Also instead of inserting a single value you can easily insert a whole vector, for instance duplicate the last column:
b = np.insert(a, insert_index, values=a[:,2], axis=1)
Which leads to:
array([[1, 2, 3, 3],
[2, 3, 4, 4]])
For the timing, insert might be slower than JoshAdel's solution:
In [1]: N = 10
In [2]: a = np.random.rand(N,N)
In [3]: %timeit b = np.hstack((a, np.zeros((a.shape[0], 1))))
100000 loops, best of 3: 7.5 µs per loop
In [4]: %timeit b = np.zeros((a.shape[0], a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 2.17 µs per loop
In [5]: %timeit b = np.insert(a, 3, values=0, axis=1)
100000 loops, best of 3: 10.2 µs per loop

I think:
np.column_stack((a, np.zeros(np.shape(a)[0])))
is more elegant.

Assuming M is a (100,3) ndarray and y is a (100,) ndarray append can be used as follows:
M=numpy.append(M,y[:,None],1)
The trick is to use
y[:, None]
This converts y to a (100, 1) 2D array.
M.shape
now gives
(100, 4)
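
A runnable version of the same steps, with placeholder data standing in for M and y (the values are arbitrary):

```python
import numpy as np

M = np.zeros((100, 3))
y = np.arange(100, dtype=float)

# y[:, None] is the same as y[:, np.newaxis] or y.reshape(-1, 1).
col = y[:, None]
M2 = np.append(M, col, axis=1)
print(y.shape, col.shape, M2.shape)
```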

np.concatenate also works
>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
[2, 3, 4]])
>>> z = np.zeros((2,1))
>>> z
array([[ 0.],
[ 0.]])
>>> np.concatenate((a, z), axis=1)
array([[ 1., 2., 3., 0.],
[ 2., 3., 4., 0.]])
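
Note that the result above became float because np.zeros defaults to float64. A short sketch of both cases, where passing dtype=a.dtype is one way to keep the original integer type:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# np.zeros defaults to float64, so the result is promoted to float.
promoted = np.concatenate((a, np.zeros((2, 1))), axis=1)

# Matching the dtype keeps the result integer.
z = np.zeros((a.shape[0], 1), dtype=a.dtype)
kept = np.concatenate((a, z), axis=1)

print(promoted.dtype, kept.dtype)
```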

Add an extra column to a numpy array:
Numpy's np.append method takes three parameters, the first two are 2D numpy arrays and the 3rd is an axis parameter instructing along which axis to append:
import numpy as np
x = np.array([[1,2,3], [4,5,6]])
print("Original x:")
print(x)
y = np.array([[1], [1]])
print("Original y:")
print(y)
print("y appended to x on axis of 1:")
print(np.append(x, y, axis=1))
Prints:
Original x:
[[1 2 3]
[4 5 6]]
Original y:
[[1]
[1]]
y appended to x on axis of 1:
[[1 2 3 1]
[4 5 6 1]]

np.insert also serves the purpose.
matA = np.array([[1,2,3],
[2,3,4]])
idx = 3
new_col = np.array([0, 0])
np.insert(matA, idx, new_col, axis=1)
array([[1, 2, 3, 0],
[2, 3, 4, 0]])
It inserts values, here new_col, before a given index, here idx, along one axis. In other words, the inserted values occupy column idx, and whatever was originally at or after idx shifts one position to the right.

I like JoshAdel's answer because of the focus on performance. A minor performance improvement is to avoid the overhead of initializing with zeros, only to have most of them overwritten. The difference is measurable when N is large: use empty instead of zeros, and write the column of zeros as a separate step:
In [1]: import numpy as np
In [2]: N = 10000
In [3]: a = np.ones((N,N))
In [4]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
1 loops, best of 3: 492 ms per loop
In [5]: %timeit b = np.empty((a.shape[0],a.shape[1]+1)); b[:,:-1] = a; b[:,-1] = np.zeros((a.shape[0],))
1 loops, best of 3: 407 ms per loop
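
The same idea as a reusable helper; the name `append_zero_column` is made up for this sketch, and it additionally preserves the input dtype. No timing claims are made here.

```python
import numpy as np

def append_zero_column(a):
    """Return a copy of 2-D array `a` with one extra column of zeros."""
    rows, cols = a.shape
    b = np.empty((rows, cols + 1), dtype=a.dtype)  # uninitialized memory
    b[:, :-1] = a   # copy the original data
    b[:, -1] = 0    # every element is now written exactly once
    return b

a = np.array([[1, 2, 3], [2, 3, 4]])
b = append_zero_column(a)
print(b)
```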

A bit late to the party, but nobody posted this answer yet, so for the sake of completeness: you can do this with list comprehensions, on a plain Python list:
source = a.tolist()
result = [row + [0] for row in source]
b = np.array(result)

For me, the next way looks pretty intuitive and simple.
zeros = np.zeros((2,1)) #2 is a number of rows in your array.
b = np.hstack((a, zeros))

In my case, I had to add a column of ones to a NumPy array
X = array([ 6.1101, 5.5277, ... ])
X.shape => (97,)
m = X.shape[0]  # number of samples, 97 here
X = np.concatenate((np.ones((m, 1), dtype=int), X.reshape(m, 1)), axis=1)  # np.int was removed in NumPy 1.24; use the builtin int
After
X.shape => (97, 2)
array([[ 1. , 6.1101],
[ 1. , 5.5277],
...

There is a function specifically for this. It is called numpy.pad
a = np.array([[1,2,3], [2,3,4]])
b = np.pad(a, ((0, 0), (0, 1)), mode='constant', constant_values=0)
print(b)
[[1 2 3 0]
 [2 3 4 0]]
Here is what it says in the docstring:
Pads an array.
Parameters
----------
array : array_like of rank N
Input array
pad_width : {sequence, array_like, int}
Number of values padded to the edges of each axis.
((before_1, after_1), ... (before_N, after_N)) unique pad widths
for each axis.
((before, after),) yields same before and after pad for each axis.
(pad,) or int is a shortcut for before = after = pad width for all
axes.
mode : str or function
One of the following string values or a user supplied function.
'constant'
Pads with a constant value.
'edge'
Pads with the edge values of array.
'linear_ramp'
Pads with the linear ramp between end_value and the
array edge value.
'maximum'
Pads with the maximum value of all or part of the
vector along each axis.
'mean'
Pads with the mean value of all or part of the
vector along each axis.
'median'
Pads with the median value of all or part of the
vector along each axis.
'minimum'
Pads with the minimum value of all or part of the
vector along each axis.
'reflect'
Pads with the reflection of the vector mirrored on
the first and last values of the vector along each
axis.
'symmetric'
Pads with the reflection of the vector mirrored
along the edge of the array.
'wrap'
Pads with the wrap of the vector along the axis.
The first values are used to pad the end and the
end values are used to pad the beginning.
<function>
Padding function, see Notes.
stat_length : sequence or int, optional
Used in 'maximum', 'mean', 'median', and 'minimum'. Number of
values at edge of each axis used to calculate the statistic value.
((before_1, after_1), ... (before_N, after_N)) unique statistic
lengths for each axis.
((before, after),) yields same before and after statistic lengths
for each axis.
(stat_length,) or int is a shortcut for before = after = statistic
length for all axes.
Default is ``None``, to use the entire axis.
constant_values : sequence or int, optional
Used in 'constant'. The values to set the padded values for each
axis.
((before_1, after_1), ... (before_N, after_N)) unique pad constants
for each axis.
((before, after),) yields same before and after constants for each
axis.
(constant,) or int is a shortcut for before = after = constant for
all axes.
Default is 0.
end_values : sequence or int, optional
Used in 'linear_ramp'. The values used for the ending value of the
linear_ramp and that will form the edge of the padded array.
((before_1, after_1), ... (before_N, after_N)) unique end values
for each axis.
((before, after),) yields same before and after end values for each
axis.
(constant,) or int is a shortcut for before = after = end value for
all axes.
Default is 0.
reflect_type : {'even', 'odd'}, optional
Used in 'reflect', and 'symmetric'. The 'even' style is the
default with an unaltered reflection around the edge value. For
the 'odd' style, the extended part of the array is created by
subtracting the reflected values from two times the edge value.
Returns
-------
pad : ndarray
Padded array of rank equal to `array` with shape increased
according to `pad_width`.
Notes
-----
.. versionadded:: 1.7.0
For an array with rank greater than 1, some of the padding of later
axes is calculated from padding of previous axes. This is easiest to
think about with a rank 2 array where the corners of the padded array
are calculated by using padded values from the first axis.
The padding function, if used, should return a rank 1 array equal in
length to the vector argument with padded values replaced. It has the
following signature::
padding_func(vector, iaxis_pad_width, iaxis, kwargs)
where
vector : ndarray
A rank 1 array already padded with zeros. Padded values are
vector[:pad_tuple[0]] and vector[-pad_tuple[1]:].
iaxis_pad_width : tuple
A 2-tuple of ints, iaxis_pad_width[0] represents the number of
values padded at the beginning of vector where
iaxis_pad_width[1] represents the number of values padded at
the end of vector.
iaxis : int
The axis currently being calculated.
kwargs : dict
Any keyword arguments the function requires.
Examples
--------
>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2,3), 'constant', constant_values=(4, 6))
array([4, 4, 1, 2, 3, 4, 5, 6, 6, 6])
>>> np.pad(a, (2, 3), 'edge')
array([1, 1, 1, 2, 3, 4, 5, 5, 5, 5])
>>> np.pad(a, (2, 3), 'linear_ramp', end_values=(5, -4))
array([ 5, 3, 1, 2, 3, 4, 5, 2, -1, -4])
>>> np.pad(a, (2,), 'maximum')
array([5, 5, 1, 2, 3, 4, 5, 5, 5])
>>> np.pad(a, (2,), 'mean')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])
>>> np.pad(a, (2,), 'median')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])
>>> a = [[1, 2], [3, 4]]
>>> np.pad(a, ((3, 2), (2, 3)), 'minimum')
array([[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1],
[3, 3, 3, 4, 3, 3, 3],
[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1]])
>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2, 3), 'reflect')
array([3, 2, 1, 2, 3, 4, 5, 4, 3, 2])
>>> np.pad(a, (2, 3), 'reflect', reflect_type='odd')
array([-1, 0, 1, 2, 3, 4, 5, 6, 7, 8])
>>> np.pad(a, (2, 3), 'symmetric')
array([2, 1, 1, 2, 3, 4, 5, 5, 4, 3])
>>> np.pad(a, (2, 3), 'symmetric', reflect_type='odd')
array([0, 1, 1, 2, 3, 4, 5, 5, 6, 7])
>>> np.pad(a, (2, 3), 'wrap')
array([4, 5, 1, 2, 3, 4, 5, 1, 2, 3])
>>> def pad_with(vector, pad_width, iaxis, kwargs):
...     pad_value = kwargs.get('padder', 10)
...     vector[:pad_width[0]] = pad_value
...     vector[-pad_width[1]:] = pad_value
...     return vector
>>> a = np.arange(6)
>>> a = a.reshape((2, 3))
>>> np.pad(a, 2, pad_with)
array([[10, 10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10, 10],
[10, 10, 0, 1, 2, 10, 10],
[10, 10, 3, 4, 5, 10, 10],
[10, 10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10, 10]])
>>> np.pad(a, 2, pad_with, padder=100)
array([[100, 100, 100, 100, 100, 100, 100],
[100, 100, 100, 100, 100, 100, 100],
[100, 100, 0, 1, 2, 100, 100],
[100, 100, 3, 4, 5, 100, 100],
[100, 100, 100, 100, 100, 100, 100],
[100, 100, 100, 100, 100, 100, 100]])

I liked this:
new_column = np.zeros((len(a), 1))
b = np.block([a, new_column])

How to add items into a numpy array

I need to accomplish the following task:
from:
a = array([[1,3,4],[1,2,3]...[1,2,1]])
(add one element to each row) to:
a = array([[1,3,4,x],[1,2,3,x]...[1,2,1,x]])
I have tried doing stuff like a[n] = array([1,3,4,x])
but numpy complained of shape mismatch. I tried iterating through a and appending element x to each item, but the changes are not reflected.
Any ideas on how I can accomplish this?

Appending data to an existing array is a natural thing to want to do for anyone with python experience. However, if you find yourself regularly appending to large arrays, you'll quickly discover that NumPy doesn't easily or efficiently do this the way a python list will. You'll find that every "append" action requires re-allocation of the array memory and short-term doubling of memory requirements. So, the more general solution to the problem is to try to allocate arrays to be as large as the final output of your algorithm. Then perform all your operations on sub-sets (slices) of that array. Array creation and destruction should ideally be minimized.
That said, it's often unavoidable, and the functions that do this are:
for 2-D arrays:
np.hstack
np.vstack
np.column_stack
np.row_stack
for 3-D arrays (the above plus):
np.dstack
for N-D arrays:
np.concatenate
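
The preallocation advice above can be sketched as follows. The chunk sizes and the np.full stand-in data are assumptions made for illustration; in real code each chunk would come from your own computation.

```python
import numpy as np

n_chunks, chunk_rows, n_cols = 4, 3, 2

# Allocate the final array once, at its full size.
out = np.empty((n_chunks * chunk_rows, n_cols))

for i in range(n_chunks):
    chunk = np.full((chunk_rows, n_cols), float(i))  # stand-in for real data
    start = i * chunk_rows
    out[start:start + chunk_rows] = chunk            # write into a slice, no reallocation

print(out.shape)
```

Compared with calling np.vstack inside the loop, this copies each chunk once instead of re-copying everything accumulated so far on every iteration.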

import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
b = np.array([10,20,30])
c = np.hstack((a, np.atleast_2d(b).T))
returns c:
array([[ 1, 3, 4, 10],
[ 1, 2, 3, 20],
[ 1, 2, 1, 30]])

One way to do it (may not be the best) is to create another array with the new elements and do column_stack. i.e.
>>>a = array([[1,3,4],[1,2,3]...[1,2,1]])
[[1 3 4]
[1 2 3]
[1 2 1]]
>>>b = array([1,2,3])
>>>column_stack((a,b))
array([[1, 3, 4, 1],
[1, 2, 3, 2],
[1, 2, 1, 3]])

Appending a single scalar could be done a bit easier as already shown (and also without converting to float) by expanding the scalar to a python-list-type:
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
x = 10
b = np.hstack ((a, [[x]] * len (a) ))
returns b as:
array([[ 1, 3, 4, 10],
[ 1, 2, 3, 10],
[ 1, 2, 1, 10]])
Appending a row could be done by:
c = np.vstack ((a, [x] * len (a[0]) ))
returns c as:
array([[ 1, 3, 4],
[ 1, 2, 3],
[ 1, 2, 1],
[10, 10, 10]])

np.insert can also be used for the purpose
import numpy as np
a = np.array([[1, 3, 4],
[1, 2, 3],
[1, 2, 1]])
x = 5
index = 3 # the position for x to be inserted before
np.insert(a, index, x, axis=1)
array([[1, 3, 4, 5],
[1, 2, 3, 5],
[1, 2, 1, 5]])
index can also be a list/tuple
>>> index = [1, 1, 3] # equivalently (1, 1, 3)
>>> np.insert(a, index, x, axis=1)
array([[1, 5, 5, 3, 4, 5],
[1, 5, 5, 2, 3, 5],
[1, 5, 5, 2, 1, 5]])
or a slice
>>> index = slice(0, 3)
>>> np.insert(a, index, x, axis=1)
array([[5, 1, 5, 3, 5, 4],
[5, 1, 5, 2, 5, 3],
[5, 1, 5, 2, 5, 1]])

If x is just a single scalar value, you could try something like this to ensure the correct shape of the array that is being appended/concatenated to the rightmost column of a:
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
x = 10
b = np.hstack((a,x*np.ones((a.shape[0],1))))
returns b as:
array([[ 1., 3., 4., 10.],
[ 1., 2., 3., 10.],
[ 1., 2., 1., 10.]])

import numpy as np

def append_to_rows(a, x):
    target = []
    for line in a.tolist():
        line.append(x)  # list.append modifies the list in place and returns None
        target.append(line)
    return np.array(target)
