derive path from adjacency matrix with numpy operations - python

I need to derive a path from an adjacency matrix in a fast way (I have 40000 points).
If a is the adjacency matrix:
a = array([[0., 0., 1., 0., 1.],
[0., 0., 1., 1., 0.],
[1., 1., 0., 0., 0.],
[0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0.]])
then I want to get:
path(a) = [0, 2, 1, 3, 4]
For now, I am using a while loop to get the path, but it is slow:
def create_path_from_joins(joins):
    # not assuming the path is connected
    i = 0
    path = [i]
    elems = np.where(joins[i] == 1)
    elems = elems[0].tolist()
    join_to = set(elems) - set(path)
    while len(join_to) > 0:
        # choose the one that is not already in the path
        elem = list(join_to)[0]
        path.append(elem)
        i = elem
        elems = np.where(np.array(joins[i]) == 1)
        elems = elems[0].tolist()
        join_to = set(elems) - set(path)
    return path
So I wanted to know if this can be done with matrix operations somehow in order to make it faster.
Thanks.

You could remove each visited connection from the adjacency matrix as you go. The details depend, however, on the constraints you place on how the path is built.
import numpy as np

A = np.array([[0., 0., 1., 0., 1.],
              [0., 0., 1., 1., 0.],
              [1., 1., 0., 0., 0.],
              [0., 1., 0., 0., 1.],
              [1., 0., 0., 1., 0.]], dtype=int)  # int, so later we can test whether A == 0
B = A.copy()  # auxiliary matrix to work with
mx, my = B.shape
# istart = np.random.randint(mx)  # choose a random starting point
istart = 0  # OR choose zero to start with
print("starting point is", istart)
path = []
path.append(istart)
ended = False
while not ended:
    # find the next point as the first nonzero element in the row
    inew = (B[istart] != 0).argmax(axis=0)
    B[istart, inew] = 0
    B[inew, istart] = 0  # remove this path element since it has been visited now
    ended = np.all(B == 0)  # check whether B == 0
    if ended:
        break
    path.append(inew)  # insert the visited point into the path list
    istart = inew  # prepare next step
print('path', path)
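For a large number of points, scanning a full matrix row (or the whole matrix) at every step dominates the run time. A minimal sketch of one way around this, assuming the same greedy rule as in the question (always step to the first unvisited neighbour): extract all neighbour indices once with np.nonzero and then walk those index lists. The function name path_from_adjacency and its start parameter are illustrative, not part of the original code.

```python
import numpy as np

def path_from_adjacency(adj, start=0):
    """Greedy walk: repeatedly step to the first unvisited neighbour."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    # One pass over the matrix. np.nonzero returns indices in row-major
    # order, so `rows` is sorted and each node's neighbours are contiguous.
    rows, cols = np.nonzero(adj)
    offsets = np.searchsorted(rows, np.arange(n + 1))
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    path = [start]
    current = start
    while True:
        neighbours = cols[offsets[current]:offsets[current + 1]]
        unvisited = neighbours[~visited[neighbours]]
        if unvisited.size == 0:
            break  # dead end, or every neighbour is already in the path
        current = int(unvisited[0])
        visited[current] = True
        path.append(current)
    return path

a = np.array([[0., 0., 1., 0., 1.],
              [0., 0., 1., 1., 0.],
              [1., 1., 0., 0., 0.],
              [0., 1., 0., 0., 1.],
              [1., 0., 0., 1., 0.]])
print(path_from_adjacency(a))  # [0, 2, 1, 3, 4]
```

The loop itself is still sequential, but each step only touches the handful of neighbours of the current node rather than a full row of the matrix.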

Assuming you want the path between 0 and the last node pointing back to 0, you can use networkx.
import numpy as np
import networkx as nx

a = np.array([[0., 0., 1., 0., 1.],
              [0., 0., 1., 1., 0.],
              [1., 1., 0., 0., 0.],
              [0., 1., 0., 0., 1.],
              [1., 0., 0., 1., 0.]])
# convert adjacency matrix to graph
# (from_numpy_matrix was removed in networkx 3.0; from_numpy_array replaces it)
G = nx.from_numpy_array(a)
# or
#G = nx.from_edgelist(zip(*np.nonzero(a)))
# get last node pointing back to 0
last_node = np.nonzero(a[0])[0][-1] # last_node = 4
# add greater weight to the edge between the last node and 0
G[0][last_node]['weight'] = 2
# get minimum spanning tree
T = nx.minimum_spanning_tree(G)
# get path between 0 and last node
path = next(nx.all_simple_paths(T, source=0, target=last_node))
Output:
[0, 2, 1, 3, 4]
[Figure: graph before and after conversion to minimum spanning tree]
Alternative with an arbitrary start/end (here, the first edge found):
import networkx as nx
# convert adjacency matrix to graph
# (from_numpy_matrix was removed in networkx 3.0; from_numpy_array replaces it)
G = nx.from_numpy_array(a)
# take the first edge to define start/end
start, end = next(zip(*np.nonzero(a)))
# remove it
G.remove_edge(start, end)
# get path between start and end
path = next(nx.all_simple_paths(G, source=start, target=end))
Output: [0, 4, 3, 1, 2]

Related

Permutation sign for batched vectors - Python

I would like to find the permutation parity sign for a given batch of vectors (in Python/JAX).
n = jnp.array([[[0., 0., 1., 1.],
[0., 0., 1., 1.],
[1., 1., 0., 0.],
[1., 1., 0., 0.]],
[[1., 0., 1., 0.],
[0., 1., 0., 1.],
[1., 0., 1., 0.],
[0., 1., 0., 1.]],
[[0., 1., 1., 0.],
[1., 0., 0., 1.],
[1., 0., 0., 1.],
[0., 1., 1., 0.]]])
sorted_index = jax.vmap(sorted_idx)(n)
sorted_perms = jax.vmap(jax.vmap(sorted_perm, in_axes=(0, 0)), in_axes=(0,0))(n, sorted_index)
parities = jax.vmap(parities)(sorted_index)
I expect the following solution:
sorted_elements= [[[0., 0., 1., 1.],
[0., 0., 1., 1.],
[0., 0., 1., 1.],
[0., 0., 1., 1.]],
[[0., 0., 1., 1.],
[0., 0., 1., 1.],
[0., 0., 1., 1.],
[0., 0., 1., 1.]],
[[0., 0., 1., 1.],
[0., 0., 1., 1.],
[0., 0., 1., 1.],
[0., 0., 1., 1.]]]
parities = [[1, 1, 1, 1],
[-1, -1, -1, -1],
[1, 1, 1, 1]]
I tried the following:
# sort the array and return the arg_sort indices
def sorted_idx(permutations):
    sort_idx = jnp.argsort(permutations)
    return sort_idx

# sort the permutations (vectors) given the sorted_indices
def sorted_perm(permutations, sort_idx):
    perm = permutations[sort_idx]
    return perm

# Calculate the permutation cycle, from which we compute the permutation parity
#jax.vmap
def parities(sort_idx):
    length = len(sort_idx)
    elements_seen = jnp.zeros(length)
    cycles = 0
    for index in range(length):
        if elements_seen[index] == True:
            continue
        cycles += 1
        current = index
        if elements_seen[current] == False:
            elements_seen.at[current].set(True)
            current = sort_idx[current]
    is_even = (length - cycles) % 2 == 0
    return +1 if is_even else -1
But I get the following: parities= [[1 1 1 1], [1 1 1 1], [1 1 1 1]]
I get for each permutation vector a parity factor of 1, which is wrong....
Your routine doesn't work because you're attempting to vmap over Python control flow, and this must be done very carefully (see JAX Sharp Bits: Control Flow). I suspect it would be a bit complicated to try to construct your iterative parity approach in terms of jax.lax control flow operators, but there might be another way.
The parity of a permutation equals the determinant of its permutation matrix, and the Jacobian of a sort happens to be equivalent to that permutation matrix, so you could (ab)use JAX's automatic differentiation of the sort operator to compute the parities very concisely:
import jax
import jax.numpy as jnp

def compute_parity(p):
    return jnp.linalg.det(jax.jacobian(jnp.sort)(p.astype(float))).astype(int)

sorted_index = jnp.argsort(n, axis=-1)
parities = jax.vmap(jax.vmap(compute_parity))(sorted_index)
print(parities)
# [[ 1 1 1 1]
# [-1 -1 -1 -1]
# [ 1 1 1 1]]
This does end up being O(N^3), where N is the length of the permutations, but due to the nature of XLA computations, particularly on accelerators like GPU, the vectorized approach will likely be more efficient than an iterative approach for reasonably sized N.
Also note that there's no reason to compute the sorted_index with this implementation; you could call compute_parity directly on your array n instead.
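As a cross-check that avoids both the Python loop and the determinant, the parity can also be obtained by counting inversions: a permutation is even exactly when the number of index pairs i < j with p[i] > p[j] is even. The following is a minimal NumPy sketch of that idea, costing O(N^2) per permutation; the function name parity_by_inversions is illustrative and not from the original post. A stable sort is requested explicitly so that ties in the input are resolved deterministically.

```python
import numpy as np

def parity_by_inversions(perms):
    """Sign (+1 or -1) of each permutation along the last axis."""
    perms = np.asarray(perms)
    length = perms.shape[-1]
    i, j = np.triu_indices(length, k=1)  # all index pairs with i < j
    inversions = np.sum(perms[..., i] > perms[..., j], axis=-1)
    return 1 - 2 * (inversions % 2)  # even count -> +1, odd count -> -1

n = np.array([[[0., 0., 1., 1.],
               [0., 0., 1., 1.],
               [1., 1., 0., 0.],
               [1., 1., 0., 0.]],
              [[1., 0., 1., 0.],
               [0., 1., 0., 1.],
               [1., 0., 1., 0.],
               [0., 1., 0., 1.]],
              [[0., 1., 1., 0.],
               [1., 0., 0., 1.],
               [1., 0., 0., 1.],
               [0., 1., 1., 0.]]])
sorted_index = np.argsort(n, axis=-1, kind="stable")
print(parity_by_inversions(sorted_index))
# [[ 1  1  1  1]
#  [-1 -1 -1 -1]
#  [ 1  1  1  1]]
```

The function uses only indexing, comparison, and a sum, all of which have jax.numpy counterparts, so it should carry over to JAX arrays without explicit control flow.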

Is there a way to get "white cells"/"black cells" views on a square board in Numpy?

Is there a way to build separate "black cells" and "white cells" views on a square board array in NumPy?
The board could itself also be a view on another array.
We assume the side length of the board is even (like the classical 8x8 chess board), since the task is easy on a board with an odd side length.
I think it is not fully possible though I have a close match with the following idea:
a = np.zeros((81,))
board = a.reshape((9,9))[:8,:8]
black = a[::2]
white = a[1::2]
black += 1
white += 2
print(board)
This almost does what is required: board is an 8x8 view on an internal array, and you can initialize the black and white cells separately through two other views. But the solution is not perfect, since the black and white views also contain useless hidden cells.
Is there a better solution for this question?
This is a mere theoretical challenge between colleagues (and not a "what are you trying to achieve?" question from some production context).
If black and white needn't be 1D it can be done:
board = np.zeros((18,12))[::3,::2]
# non-contiguous to make it a bit more interesting
m,n = board.shape
v4d = board.reshape(m//2,2,n//2,2)
black = np.einsum("ijkj->ijk",v4d)
white = np.einsum("ijkj->ijk",v4d[...,::-1])
board
# array([[0., 0., 0., 0., 0., 0.],
# [0., 0., 0., 0., 0., 0.],
# [0., 0., 0., 0., 0., 0.],
# [0., 0., 0., 0., 0., 0.],
# [0., 0., 0., 0., 0., 0.],
# [0., 0., 0., 0., 0., 0.]])
black += 1
board
# array([[1., 0., 1., 0., 1., 0.],
# [0., 1., 0., 1., 0., 1.],
# [1., 0., 1., 0., 1., 0.],
# [0., 1., 0., 1., 0., 1.],
# [1., 0., 1., 0., 1., 0.],
# [0., 1., 0., 1., 0., 1.]])
white += 2
board
# array([[1., 2., 1., 2., 1., 2.],
# [2., 1., 2., 1., 2., 1.],
# [1., 2., 1., 2., 1., 2.],
# [2., 1., 2., 1., 2., 1.],
# [1., 2., 1., 2., 1., 2.],
# [2., 1., 2., 1., 2., 1.]])
I think your hunch that it is impossible is correct.
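If your board is not itself a view, it is easy to get a view of every other element that you can operate on. (Note that on a board with an even number of columns, ravel()[::2] selects alternating columns rather than a checkerboard pattern, so the example below only illustrates the view relationship.)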
>>> board = np.zeros((8, 8), int)
>>> black = board.ravel()[::2]
>>> black.base is board # is black a view of board?
True
If board is a view of a, the memory won't be aligned in a way that you can get the views you want, so ravel will instead create copies:
>>> a = np.zeros(81)
>>> board = a.reshape(9, 9)[:8, :8]
>>> board.base is a # is board a view of a?
True
>>> black = board.ravel()[::2]
>>> black.base in (a, board) # is black a view of a or board?
False
One workaround I can think of is to split each view into two:
>>> black0 = board[::2, ::2]
>>> black1 = board[1::2, 1::2]
>>> black0.base is a and black1.base is a # are both views of a?
True
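The split-view workaround extends to both colours: each colour is covered by two strided views, so four views in total paint the whole board. A minimal sketch, assuming the 8x8 board cut out of an 81-element array from the question; the tuple names black_views and white_views are illustrative.

```python
import numpy as np

a = np.zeros(81)
board = a.reshape(9, 9)[:8, :8]  # 8x8 view on the 81-element array

# Cells where row + column is even ("black") and odd ("white").
black_views = (board[::2, ::2], board[1::2, 1::2])
white_views = (board[::2, 1::2], board[1::2, ::2])

for view in black_views:
    view += 1  # writes through to `a`
for view in white_views:
    view += 2

print(board)
```

Unlike the a[::2] / a[1::2] approach, none of these views touches the hidden ninth row or column of a.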

Opposite of binary_dilation

Is there a function that does the opposite of binary_dilation? I'm looking to remove 'islands' from an array of 0's and 1's. That is, if a value of 1 in a 2D array doesn't have at least 1 adjacent neighbor that is also 1, its value gets set to 0 (rather than have its neighbor's values set equal to 1 as in binary_dilation). So for example:
test = np.zeros((5,5))
test[1,1] = test[1,2] = test[3,3] = test[4,3] = test[0,3] = test[3,1] = 1
test
array([[0., 0., 0., 1., 0.],
[0., 1., 1., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 1., 0., 1., 0.],
[0., 0., 0., 1., 0.]])
And the function I'm seeking would return:
array([[0., 0., 0., 0., 0.],
[0., 1., 1., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 1., 0.]])
Note the values changed in locations [0,3] and [3,1] from 1 to 0 because they have no adjacent neighbors with a value equal 1 (diagonal doesn't count as a neighbor).
You can create a mask of the cells to check and do a 2D convolution with test to identify the cells that have 1s adjacent to them. The logical AND of the convolution and test should produce the desired output.
First define your mask. Since you are only looking for up/down and left/right adjacency, you want the following:
mask = np.ones((3, 3))
mask[1,1] = mask[0, 0] = mask[0, 2] = mask[2, 0] = mask[2, 2] = 0
print(mask)
#array([[0., 1., 0.],
# [1., 0., 1.],
# [0., 1., 0.]])
If you wanted to include diagonal elements, you'd simply update mask to include 1s in the corners.
Now apply a 2d convolution of test with mask. This will multiply and add the values from the two matrices. With this mask, this will have the effect of returning the sum of all adjacent values for each cell.
from scipy.signal import convolve2d
print(convolve2d(test, mask, mode='same'))
#array([[0., 1., 2., 0., 1.],
# [1., 1., 1., 2., 0.],
# [0., 2., 1., 1., 0.],
# [1., 0., 2., 1., 1.],
# [0., 1., 1., 1., 1.]])
You have to specify mode='same' so the result is the same size as the first input (test). Notice that the two cells that you wanted to remove from test are 0 in the convolution output.
Finally, do an element-wise AND of this output with test to find the desired cells:
res = np.logical_and(convolve2d(test, mask, mode='same'), test).astype(int)
print(res)
#array([[0, 0, 0, 0, 0],
# [0, 1, 1, 0, 0],
# [0, 0, 0, 0, 0],
# [0, 0, 0, 1, 0],
# [0, 0, 0, 1, 0]])
Update
For the last step, you could also just clip the values in the convolution to between 0 and 1 and do an element-wise multiplication.
res = convolve2d(test, mask, mode='same').clip(0, 1)*test
#array([[0., 0., 0., 0., 0.],
# [0., 1., 1., 0., 0.],
# [0., 0., 0., 0., 0.],
# [0., 0., 0., 1., 0.],
# [0., 0., 0., 1., 0.]])
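The neighbour sum that the convolution computes can also be sketched with plain NumPy slicing, which may be handy when SciPy is not available. This is an illustrative alternative rather than part of the original answer; the function name remove_islands is made up for the example. It pads the array with a border of zeros and adds the four shifted copies (up, down, left, right).

```python
import numpy as np

def remove_islands(grid):
    """Zero out cells that have no nonzero up/down/left/right neighbour."""
    grid = np.asarray(grid)
    p = np.pad(grid, 1)  # zero border, so edge cells see zeros outside
    neighbours = (p[:-2, 1:-1] + p[2:, 1:-1]      # above + below
                  + p[1:-1, :-2] + p[1:-1, 2:])   # left + right
    return np.where(neighbours > 0, grid, 0)

test = np.zeros((5, 5))
test[1, 1] = test[1, 2] = test[3, 3] = test[4, 3] = test[0, 3] = test[3, 1] = 1
print(remove_islands(test))
```

For the example array this clears the isolated cells at [0,3] and [3,1] and leaves the two adjacent pairs untouched.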

SMILES from graph

Is there a method or package that converts a graph (or adjacency matrix) into a SMILES string?
For instance, I know the atoms are [6 6 7 6 6 6 6 8] ([C C N C C C C O]), and the adjacency matrix is
[[ 0., 1., 0., 0., 0., 0., 0., 0.],
[ 1., 0., 2., 0., 0., 0., 0., 1.],
[ 0., 2., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 1., 1.],
[ 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 1., 0., 0., 0., 1., 0., 0.]]
I need some function to output 'CC1=NCCC(C)O1'.
It would also work if some function could output the corresponding "mol" object. RDKit has a 'MolFromSmiles' function. I wonder if there is something like 'MolFromGraphs'.
Here is a simple solution. To my knowledge, there is no built-in function for this in RDKit.
from rdkit import Chem

def MolFromGraphs(node_list, adjacency_matrix):
    # create empty editable mol object
    mol = Chem.RWMol()
    # add atoms to mol and keep track of index
    node_to_idx = {}
    for i in range(len(node_list)):
        a = Chem.Atom(node_list[i])
        molIdx = mol.AddAtom(a)
        node_to_idx[i] = molIdx
    # add bonds between adjacent atoms
    for ix, row in enumerate(adjacency_matrix):
        for iy, bond in enumerate(row):
            # only traverse half the matrix
            if iy <= ix:
                continue
            # add relevant bond type (there are many more of these)
            if bond == 0:
                continue
            elif bond == 1:
                bond_type = Chem.rdchem.BondType.SINGLE
                mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
            elif bond == 2:
                bond_type = Chem.rdchem.BondType.DOUBLE
                mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
    # Convert RWMol to Mol object
    mol = mol.GetMol()
    return mol

Chem.MolToSmiles(MolFromGraphs(nodes, a))
Out:
'CC1=NCCC(C)O1'
This solution is a simplified version of https://github.com/dakoner/keras-molecules/blob/dbbb790e74e406faa70b13e8be8104d9e938eba2/convert_rdkit_to_networkx.py
There are many other atom properties (such as Chirality or Protonation state) and bond types (Triple, Dative...) that may need to be set. It is better to keep track of these explicitly in your graph if possible (as in the link above), but this function can also be extended to incorporate these if required.
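The "only traverse half the matrix" step relies on the adjacency matrix being symmetric, so each bond appears once above the diagonal. A small NumPy-only sketch, using the matrix from the question, lists the bonds that the nested loop would add; the variable names adjacency and bonds are illustrative, and no RDKit calls are involved.

```python
import numpy as np

adjacency = np.array([[0., 1., 0., 0., 0., 0., 0., 0.],
                      [1., 0., 2., 0., 0., 0., 0., 1.],
                      [0., 2., 0., 1., 0., 0., 0., 0.],
                      [0., 0., 1., 0., 1., 0., 0., 0.],
                      [0., 0., 0., 1., 0., 1., 0., 0.],
                      [0., 0., 0., 0., 1., 0., 1., 1.],
                      [0., 0., 0., 0., 0., 1., 0., 0.],
                      [0., 1., 0., 0., 0., 1., 0., 0.]])

# An undirected molecular graph should give a symmetric matrix.
assert np.array_equal(adjacency, adjacency.T)

# Keep only entries strictly above the diagonal: one entry per bond.
i_idx, j_idx = np.nonzero(np.triu(adjacency, k=1))
bonds = [(int(i), int(j), int(adjacency[i, j])) for i, j in zip(i_idx, j_idx)]
print(bonds)
# [(0, 1, 1), (1, 2, 2), (1, 7, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (5, 7, 1)]
```

Eight atoms joined by eight bonds means the graph contains exactly one ring, which is consistent with the single ring closure in 'CC1=NCCC(C)O1'.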

change a subset of elements' values in theano matrix

I want to create a mask matrix dynamically, for example, in numpy
mask = numpy.zeros((5,5))
row = numpy.arange(5)
col = [0, 2, 3, 0, 1]
mask[row, col] += 1 # that is setting some values to `1`
Here is what I tried in theano,
mask = tensor.zeros((5,5))
row = tensor.ivector('row')
col = tensor.ivector('col')
mask = tensor.set_subtensor(mask[row, col], 1)
The Theano code above failed with the error message "not supported".
Is there another way to do this?
This works for me on 0.6.0. I used your code and created a function from it to check the output. Try copying and pasting this:
import numpy as np
import theano
from theano import tensor
mask = tensor.zeros((5,5))
row = tensor.ivector('row')
col = tensor.ivector('col')
mask = tensor.set_subtensor(mask[row, col], 1)
f = theano.function([row, col], mask)
print(f(np.array([0, 1, 2]).astype(np.int32), np.array([1, 2, 3]).astype(np.int32)))
This yields
array([[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
