I'm working on a very simple example of random walk simulations using numpy. My professor insists that we use numpy's broadcast functionality instead of for loops as much as we can, and I want to know if it's possible to broadcast dictionary definitions.
e.g. I have the array ['E', 'W', 'N', 'S']. Mapping that array through the dictionary should result in [[1, 0], [-1, 0], [0, 1], [0, -1]].
import numpy as np
import matplotlib.pyplot as plt

def random_path(origin, nsteps, choices, choice_probs, choice_map):
    # draw nsteps directions according to the given probabilities
    directions = np.random.choice(choices, size=(nsteps,), p=choice_probs)
    print(directions)

def main():
    directions = ['N', 'S', 'E', 'W']
    dir_probabilities = [.2, .3, .45, .05]
    dir_map = {'N': [0, 1], 'S': [0, -1], 'E': [1, 0], 'W': [-1, 0]}
    origin = [0, 0]
    np.random.seed(12345)
    path = random_path(origin, 15, directions, dir_probabilities, dir_map)

main()
Why not ignore the directional labels entirely and just store the directions as a (4, 2) shaped numpy array? Then you can index into that array directly.
def random_path(origin, nsteps, choices, choice_probs, choice_map):
    directions = np.random.choice(choices, size=(nsteps,), p=choice_probs)
    return directions

dir_map = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])
# Everything else is the same as defined by the OP
path_directions = random_path(origin, 15, np.arange(4), dir_probabilities, dir_map)
path = dir_map[path_directions]
Now path is a (15,2) shaped numpy array containing the sequence of moves from the dir_map.
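If you also want the positions visited along the walk (a small sketch, assuming the origin of [0, 0] from the question), a cumulative sum over the moves gives them directly:

positions = origin + np.cumsum(path, axis=0)   # (15, 2): position after each step
plt.plot(positions[:, 0], positions[:, 1], marker='o')
plt.show()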
According to the description of louvain in scanpy, it should be possible to pass a specific adjacency matrix. However, when I tried the minimal example below, I got the error "Length of values (4) does not match length of index (6)". Is this mistake due to misuse of the sparse matrix?
Code:
import scanpy as sc
import torch
import numpy as np
import networkx as nx
nodes = [[0, 0, 0, 1], [0, 0, 0, 2], [0, 10, 0, 0], [0, 11, 0, 0], [1, 0, 0, 0], [2, 0, 0, 0]]
features = torch.tensor(nodes)
print(features.shape)
edgelist = [(0,1), (1,2), (2,3)]
G = nx.Graph(edgelist)
G_adj = nx.convert_matrix.to_scipy_sparse_matrix(G) # transform to scipy sparse matrix
adata = sc.AnnData(features.numpy())
sc.pp.neighbors(adata, n_neighbors=2, use_rep='X')
sc.tl.louvain(adata, resolution=0.01, adjacency=G_adj) # pass the adj here
y_pred = adata.obs['louvain'].astype(int).to_numpy()
n_clusters = len(np.unique(y_pred))
Could you point out what is wrong and provide an example of how to explicitly pass an adjacency matrix when using scanpy.tl.louvain? Thanks!
G is a graph with only 4 nodes (the ones that appear in edgelist), so G_adj is a (4, 4) sparse matrix.
adata is an AnnData object with 6 observations and 4 variables. The scanpy louvain algorithm clusters observations, so it expects an adjacency matrix of shape (6, 6).
I'm not sure which of the two you intended:
If you truly have 6 nodes, alter the code that builds the graph:
print(features.shape)
edgelist = [(0,1), (1,2), (2,3)]
G = nx.Graph()
G.add_nodes_from(range(6))  # include nodes 4 and 5 explicitly, even though they have no edges
G.add_edges_from(edgelist)
G_adj = nx.convert_matrix.to_scipy_sparse_matrix(G) # transform to scipy sparse matrix
adata = sc.AnnData(features.numpy())
If you have 4 nodes, alter the adata creation line:
adata = sc.AnnData(features.numpy().T)  # (4, 6): 4 observations (one per node), 6 variables
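For reference, here is a minimal end-to-end sketch of the 6-node variant (same data and parameters as in the question; the only change is building the graph with all 6 nodes so the adjacency is (6, 6)):

import scanpy as sc
import numpy as np
import networkx as nx

features = np.array([[0, 0, 0, 1], [0, 0, 0, 2], [0, 10, 0, 0],
                     [0, 11, 0, 0], [1, 0, 0, 0], [2, 0, 0, 0]])

G = nx.Graph()
G.add_nodes_from(range(6))                               # 6 nodes to match the 6 observations
G.add_edges_from([(0, 1), (1, 2), (2, 3)])
G_adj = nx.convert_matrix.to_scipy_sparse_matrix(G)      # (6, 6) sparse adjacency

adata = sc.AnnData(features)
sc.pp.neighbors(adata, n_neighbors=2, use_rep='X')       # as in the question
sc.tl.louvain(adata, resolution=0.01, adjacency=G_adj)   # louvain uses the passed adjacency
print(adata.obs['louvain'])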
I have a numpy array that is built by accumulating tuples of arrays in a for loop, like res in the code below. (Variable names and contents are simplified from the actual code.)
Looking at it more closely, the for loop runs once per row of arr_2 and list.extend() is called on every iteration, and the processing becomes extremely slow when arr_2 gets long.
Is there a way to speed this up by constructing the array more efficiently?
# -*- coding: utf-8 -*-
import numpy as np

arr_1 = np.array([[0, 0, 1], [0, 0.5, -1], [-1, 0, -1], [0, -0.5, -1], [1, 0, -1]])
arr_2 = np.array([[0, 1, 2], [0, 1, 2]])

all_arr = []
for p in arr_2:
    # pair a fixed pattern of rows from arr_1 with the current row p of arr_2
    new_arr = [
        (arr_1[0], p), (arr_1[1], p), (arr_1[2], p),
        (arr_1[0], p), (arr_1[1], p), (arr_1[4], p),
        (arr_1[0], p), (arr_1[2], p), (arr_1[3], p),
        (arr_1[0], p), (arr_1[3], p), (arr_1[4], p),
        (arr_1[1], p), (arr_1[2], p), (arr_1[4], p),
        (arr_1[2], p), (arr_1[3], p), (arr_1[4], p)]
    all_arr.extend(new_arr)

vtype = [('type_a', np.float32, 3), ('type_b', np.float32, 3)]
res = np.array(all_arr, dtype=vtype)
print(res)
I couldn't figure out why you used this particular indexing for arr_1, so I just copied it:
import numpy as np
arr_1 = np.array([[0, 0, 1], [0, 0.5, -1], [-1, 0, -1], [0, -0.5, -1], [1, 0, -1]])
arr_2 = np.array([[0, 1, 2], [0, 1, 2]])
weird_idx = np.array([0,1,2,0,1,4,0,2,3,0,3,4,1,2,4,2,3,4])
weird_arr1 = arr_1[weird_idx]
all_arr = [(weird_arr1[i], arr_2[j]) for j in range(len(arr_2)) for i in range(len(weird_arr1))]
vtype = [('type_a', np.float32, 3), ('type_b', np.float32, 3)]
res = np.array(all_arr, dtype=vtype)
You can also repeat the arrays and assign the structured fields directly:
arr1_rep = np.tile(weird_arr1.T, 2).T                 # two copies of weird_arr1 stacked along axis 0 -> (36, 3)
arr2_rep = np.repeat(arr_2, weird_arr1.shape[0], 0)   # each row of arr_2 repeated 18 times -> (36, 3)
res = np.empty(arr1_rep.shape[0], dtype=vtype)
res['type_a'] = arr1_rep
res['type_b'] = arr2_rep
Often with structured arrays it is faster to assign by field instead of the list of tuples approach:
In [388]: idx = [0,1,2,0,1,4,0,2,3,0,3,4,1,2,4,2,3,4]
In [400]: res1 = np.zeros(36, dtype=vtype)
In [401]: res1['type_a'][:18] = arr_1[idx]
In [402]: res1['type_a'][18:] = arr_1[idx]
In [403]: res1['type_b'][:18] = arr_2[0]
In [404]: res1['type_b'][18:] = arr_2[1]
In [405]: np.allclose(res['type_a'], res1['type_a'])
Out[405]: True
In [406]: np.allclose(res['type_b'], res1['type_b'])
Out[406]: True
I would like to know if there is an efficient method to get sub-arrays from a larger numpy array.
What I have is an application of np.where: I iterate 'manually' over x and y offsets and apply np.where with a kernel to each appropriately sized rectangle extracted from the larger array.
But is there a more direct approach in numpy's collection of methods?
import numpy as np
example = np.arange(20).reshape((5, 4))
# e.g. a cross kernel
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
np.where(a_kernel, example[1:4, 1:4], 0)
# returns
# array([[ 0, 6, 0],
# [ 9, 10, 11],
# [ 0, 14, 0]])
def arrays_from_kernel(a, a_kernel):
    width, height = a_kernel.shape
    y_max, x_max = a.shape
    return [np.where(a_kernel, a[y:(y + height), x:(x + width)], 0)
            for y in range(y_max - height + 1)
            for x in range(x_max - width + 1)]

sub_arrays = arrays_from_kernel(example, a_kernel)
This returns the arrays I need for further processing.
# [array([[0, 1, 0],
# [4, 5, 6],
# [0, 9, 0]]),
# array([[ 0, 2, 0],
# [ 5, 6, 7],
# [ 0, 10, 0]]),
# ...
# array([[ 0, 9, 0],
# [12, 13, 14],
# [ 0, 17, 0]]),
# array([[ 0, 10, 0],
# [13, 14, 15],
# [ 0, 18, 0]])]
The context: similar to 2D convolution I would like to apply a custom function on each of the subarrays (e.g. product of squared numbers).
At the moment, you're manually advancing a sliding window over the data - stride tricks to the rescue! (And no, I didn't just make that up - there's actually a submodule called stride_tricks in numpy!) Instead of manually building windows into the data, and calling np.where() on them, if you had the windows in an array, you could call np.where() just once. Stride tricks allow you to create such an array without even having to copy the data.
Let me explain. Normal slices in numpy create views into the original data instead of copies. This is done by referring to the original data but changing the strides used to access it (i.e. how much to jump between two elements or two rows, and so on). Stride tricks allow you to modify those strides more freely than slicing and reshaping do, so you can e.g. iterate over the same data more than once, which is useful here.
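For instance (a small illustration using the (5, 4) example array from the question; the exact byte counts assume the default 64-bit integer dtype):

example = np.arange(20).reshape((5, 4))
print(example.strides)    # (32, 8): 32 bytes to the next row, 8 bytes to the next column
view = example[1:4, 1:4]  # a view into the same buffer, no copy
print(view.strides)       # still (32, 8); only the offset and shape changed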
Let me demonstrate:
import numpy as np

example = np.arange(20).reshape((5, 4))
a_kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])

def sliding_window(data, win_shape, **kwargs):
    assert data.ndim == len(win_shape)
    shape = tuple(dn - wn + 1 for dn, wn in zip(data.shape, win_shape)) + win_shape
    strides = data.strides * 2
    return np.lib.stride_tricks.as_strided(data, shape=shape, strides=strides, **kwargs)

def arrays_from_kernel(a, a_kernel):
    windows = sliding_window(a, a_kernel.shape)
    return np.where(a_kernel, windows, 0)

sub_arrays = arrays_from_kernel(example, a_kernel)
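As a side note, if your numpy is recent enough (1.20+), np.lib.stride_tricks.sliding_window_view builds the same window array for you:

from numpy.lib.stride_tricks import sliding_window_view

windows = sliding_window_view(example, a_kernel.shape)  # shape (3, 2, 3, 3)
sub_arrays = np.where(a_kernel, windows, 0)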
The scipy.ndimage module offers a number of filters -- one of which might meet your needs. If none of those filters do what you want, you could use ndimage.generic_filter
to call a custom function on each subarray. ndimage.generic_filter is not as fast as the other ndimage filters, however.
For example,
import numpy as np
example = np.arange(20).reshape((5, 4))
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
# def arrays_from_kernel(a, a_kernel):
# width, height = a_kernel.shape
# y_max, x_max = a.shape
# return [np.where(a_kernel, a[y:(y + height), x:(x + width)], 0)
# for y in range(y_max - height + 1)
# for x in range(x_max - width + 1)]
# sub_arrays = arrays_from_kernel(example, a_kernel)
# for arr in sub_arrays:
# print(arr)
# print('-'*80)
import scipy.ndimage as ndimage
def func(x):
    # reject subarrays that extend beyond the border of the `example` array
    if not np.isnan(x).any():
        y = np.zeros_like(a_kernel, dtype=example.dtype)
        np.put(y, np.flatnonzero(a_kernel), x)
        print(y)
    # Instead of returning 0, you can perform your desired computation on the subarray here.
    # Note that you may not need the 2D array y; often, you only need the values in the 1D array x.
    return 0

result = ndimage.generic_filter(example, func, footprint=a_kernel, mode='constant', cval=np.nan)
For the particular problem of computing the product of squares for each subarray, you could convert the product into a sum by taking advantage of the fact that A * B = exp(log(A) + log(B)). This lets you express the computation as an ordinary convolution, and using ndimage.convolve then improves performance a lot. The amount of improvement depends on the size of example:
import numpy as np
import scipy.ndimage as ndimage
import perfplot

a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])

def orig(example, a_kernel=a_kernel):
    def arrays_from_kernel(a, a_kernel):
        width, height = a_kernel.shape
        y_max, x_max = a.shape
        return [
            np.where(a_kernel, a[y : (y + height), x : (x + width)], 1)
            for y in range(y_max - height + 1)
            for x in range(x_max - width + 1)
        ]
    return [np.prod(x) ** 2 for x in arrays_from_kernel(example, a_kernel)]

def alt(example, a_kernel=a_kernel):
    logged = np.log(example)
    result = ndimage.convolve(logged, a_kernel, mode="constant", cval=0)[1:-1, 1:-1]
    return (np.exp(result) ** 2).ravel()

def make_example(N):
    return np.random.random(size=(N, N))

def check(A, B):
    return np.allclose(A, B)

perfplot.show(
    setup=make_example,
    kernels=[orig, alt],
    n_range=[2 ** k for k in range(2, 11)],
    logx=True,
    logy=True,
    xlabel="len(example)",
    equality_check=check,
)
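As a quick sanity check of the equivalence (this shifts the question's example array by 1, a hypothetical tweak to keep every value positive so the log trick is well defined):

example = np.arange(1, 21).reshape((5, 4)).astype(float)
print(np.allclose(orig(example), alt(example)))  # expected: True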
I want to generate an array A defined as
A = [[1, 2, 0], [3, 1, 2], [3, 3, 1]]
I have tried to construct it as
A = np.diag(np.ones(3), k=0)
What should I do to construct the diagonals above and below the main diagonal? If I change k in np.diag, it does not give me the result I expect.
You can use np.fill_diagonal together with 2-D slicing to fill diagonals other than the main one.
Try this:
import numpy as np

A = np.diag(np.ones(3), k=0)                 # 3x3 identity: the main diagonal
np.fill_diagonal(A[1:, :], np.ones(3) * 3)   # fill the diagonal below the main one
np.fill_diagonal(A[:, 1:], np.ones(3) * 2)   # fill the diagonal above the main one
print(A)
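Alternatively, the same band structure can be built by summing np.diag calls (a small sketch; note the off-diagonal vectors need length 2 for a 3x3 result):

import numpy as np

A = (np.diag(np.ones(3), k=0)          # main diagonal
     + 2 * np.diag(np.ones(2), k=1)    # diagonal above
     + 3 * np.diag(np.ones(2), k=-1))  # diagonal below
print(A)
# [[1. 2. 0.]
#  [3. 1. 2.]
#  [0. 3. 1.]]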
I am trying to use numpy.vectorize to iterate over a (2x5) matrix containing two vectors that hold the x- and y-values of five coordinates. Each coordinate (an x- and y-value pair) is to be fed to a function returning a (1x1) vector per iteration, so that in the end the result is a (1x5) vector. My problem is that instead of iterating over each element individually, I want the algorithm to step through both vectors simultaneously, so it picks up each coordinate's x- and y-value together and feeds them to the function.
data = np.transpose(np.array([[1, 2], [1, 3], [2, 1], [1, -1], [2, -1]]))
th_ = np.array([[1, 1]])
th0_ = -2

def positive(x, th=th_, th0=th0_):
    if signed_dist(x, th, th0)[0][0] > 0:
        return np.array([[1]])
    elif signed_dist(x, th, th0)[0][0] == 0:
        return np.array([[0]])
    else:
        return np.array([[-1]])

positive_numpy = np.vectorize(positive)
results = positive_numpy(data)
Reading the numpy documentation did not really help and I want to avoid large workarounds in favor of computation timing. Thankful for any suggestion!
This is a bit of a guess, but looks like your code can be simplified to
data = np.array([[1, 2], [1, 3], [2, 1], [1, -1], [2, -1]]) # (5,2) array
th_ = np.array([[1, 1]])
th0_ = -2
alist = [signed_dist(x, th_, th0_) for x in data]
arr = np.array(alist) # (5,?,?) array
arr = arr[:,0,0] # (5,) array
arr[arr>0] = 1
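If signed_dist is the usual linear signed distance, i.e. proportional to th · x + th0 (an assumption here, since signed_dist isn't shown in the question), the loop collapses to one fully vectorized expression:

import numpy as np

data = np.array([[1, 2], [1, 3], [2, 1], [1, -1], [2, -1]])  # (5, 2): one coordinate per row
th_ = np.array([[1, 1]])
th0_ = -2

# sign of th . x + th0 for every coordinate at once -> (5,) array of -1, 0 or 1
results = np.sign(data @ th_.ravel() + th0_)
print(results)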