I have written code to find paths of connected spheres using the NetworkX library in Python. To do so, I need to find the distances between the spheres before building the graph. This part of the code (the calculation section, i.e. the numba function that finds distances and connections) led to memory leaks when using arrays in a parallel scheme with numba (I had the same problem when using np.linalg or scipy.spatial.distance.cdist). So I wrote a non-parallel numba version that uses lists instead. It is now memory-friendly but takes a long time to calculate these distances (it uses just ~10-20% of 16 GB of memory and ~30-40% of each core of my 4-core CPU). For example, with ~12000 spheres, the calculation section and the NetworkX graph creation each took less than one second; with ~550000 spheres, the calculation section (numba part) took around 25 minutes, and the graph creation plus getting the output list took 7 seconds.
import numpy as np
import numba as nb
import networkx as nx

radii = np.load('rad_dist_12000.npy')
poss = np.load('pos_dist_12000.npy')

@nb.njit("(Tuple([float64[:, ::1], float64[:, ::1]]))(float64[::1], float64[:, ::1])", parallel=True)
def distances_numba_parallel(radii, poss):
    radii_arr = np.zeros((radii.shape[0], radii.shape[0]), dtype=np.float64)
    poss_arr = np.zeros((poss.shape[0], poss.shape[0]), dtype=np.float64)
    for i in nb.prange(radii.shape[0] - 1):
        for j in range(i+1, radii.shape[0]):
            radii_arr[i, j] = radii[i] + radii[j]
            poss_arr[i, j] = ((poss[i, 0] - poss[j, 0]) ** 2 + (poss[i, 1] - poss[j, 1]) ** 2 + (poss[i, 2] - poss[j, 2]) ** 2) ** 0.5
    return radii_arr, poss_arr

@nb.njit("(List(UniTuple(int64, 2)))(float64[::1], float64[:, ::1])")
def distances_numba_non_parallel(radii, poss):
    connections = []
    for i in range(radii.shape[0] - 1):
        connections.append((i, i))
        for j in range(i+1, radii.shape[0]):
            radii_arr_ij = radii[i] + radii[j]
            poss_arr_ij = ((poss[i, 0] - poss[j, 0]) ** 2 + (poss[i, 1] - poss[j, 1]) ** 2 + (poss[i, 2] - poss[j, 2]) ** 2) ** 0.5
            if poss_arr_ij <= radii_arr_ij:
                connections.append((i, j))
    return connections

def connected_spheres_path(radii, poss):
    # in parallel mode
    # maximum_distances, distances = distances_numba_parallel(radii, poss)
    # connections = distances <= maximum_distances
    # connections[np.tril_indices_from(connections, -1)] = False
    # in non-parallel mode
    connections = distances_numba_non_parallel(radii, poss)
    G = nx.Graph(connections)
    return list(nx.connected_components(G))
My datasets will contain at most 10 million spheres (the data are positions and radii), and mostly up to 1 million. As mentioned above, most of the time is spent in the calculation section. I have little experience with graphs and don't know if (and how) this can be made much faster using all CPU cores or the available RAM (max 12 GB), or whether it can be done internally (I doubt that the connected spheres need to be found separately before using graphs) by other Python libraries such as graph-tool, igraph, and NetworKit, which do all the processing in C or C++ efficiently.
I would be grateful for any answer that makes my code faster for large data volumes (performance is the first priority; if a lot of memory is needed for large data volumes, mentioning the amounts, with some benchmarks, would be helpful).
Update:
Since using trees alone will not improve the performance enough, I have written an optimized version that speeds up the calculation section by combining tree-based algorithms and numba jitting.
Now, I am curious whether it can be calculated internally (the calculation section is an integral part of, and a basic need for, such graphing) by other Python libraries such as graph-tool, igraph, and NetworKit, which do all the processing in C or C++ efficiently.
Data
radii: 12000, 50000, 550000
poss: 12000, 50000, 550000
If you are computing the pairwise distance between all points, that's N^2 calculations, which will take a very long time for sufficiently many data points.
If you can place an upper bound on the distance you need to consider for any two points, then there are some nice data structures for finding pairs of neighbors in a set of points. If you already have scipy installed, then the most convenient structure to reach for is the KDTree (or the optimized version, cKDTree). (Read more here.)
The basic recipe is:
1. Load your point set into the KDTree.
2. Ask the KDTree for all pairs of points which are within some maximum distance from each other.
3. Calculate the actual distances between each of the returned points.
4. Compare those distances with the summed radii associated with the point pair. Drop the pairs whose distances are too large.
Finally, you need to determine the clusters of spheres. Your question mentions "paths", but in your example code you're only concerned with connected components. Of course you can use networkx or graph-tool for that, but maybe that's overkill.
If connected components are all you need, then you don't even need a proper graph data structure. You just need a way to find the groups of linked nodes, without maintaining the specific connections that linked them. Again, scipy has a nice tool: DisjointSet. (Read more here.)
Here is a complete example. The execution time depends on not only the number of points, but how "dense" they are. I tried some reasonable (I think) test data with 1M points, which took 24 seconds to process on my laptop.
Your example data (the largest of the sets provided above) takes longer: about 45 seconds. The KDTree finds 312M pairs of points to consider, of which fewer than 1M are actually valid connections.
import numpy as np
from scipy.spatial import cKDTree
from scipy.cluster.hierarchy import DisjointSet
## Example data (2D)
# N = 1000
# D = 2
# max_point = 1000
# min_radius = 10
# max_radius = 20
# points = np.random.randint(0, max_point, size=(N, D))
# radii = np.random.randint(min_radius, max_radius+1, size=N)
## Example data (3D)
# N = 1_000_000
# D = 3
# max_point = 3000
# min_radius = 10
# max_radius = 20
# points = np.random.randint(0, max_point, size=(N, D))
# radii = np.random.randint(min_radius, max_radius+1, size=N)
# Question data (3D)
points = np.load('b (556024).npy')
radii = np.load('a (556024).npy')
N = len(points)
# Load into a KD tree and extract all pairs which could possibly be linked
# (using the maximum radius as the upper bound of the search distance.)
kd = cKDTree(points)
pairs = kd.query_pairs(2 * radii.max(), output_type='ndarray')
def filter_pairs(pairs):
    # Calculate the distance between each pair of points
    vectors = points[pairs[:, 1]] - points[pairs[:, 0]]
    distances = np.linalg.norm(vectors, axis=1)

    # Drop the pairs whose summed radii aren't large enough
    # to span the distance between the points.
    thresholds = radii[pairs].sum(axis=1)
    return pairs[distances <= thresholds]

# We could do this in one big step
# ...but that might require lots of RAM.
# It's cheaper to do it in big chunks, in a loop.
fp = []
CHUNK = 1_000_000
for i in range(0, len(pairs), CHUNK):
    fp.append(filter_pairs(pairs[i:i+CHUNK]))
filtered_pairs = np.concatenate(fp)

# Load the pairs into a DisjointSet (a.k.a. UnionFind)
# data structure and extract the groups.
ds = DisjointSet(range(N))
for u, v in filtered_pairs:
    ds.merge(u, v)
connected_sets = list(ds.subsets())
print(f"Found {len(connected_sets)} sets of circles/spheres")
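The Python-level loop over filtered_pairs may become the slow part when many pairs survive the filter. As a possible alternative (not part of the example above), the pairs can be loaded into a sparse adjacency matrix and labelled with scipy.sparse.csgraph.connected_components. This is a minimal sketch; the helper name label_components is made up for illustration, and it assumes the pairs are an (M, 2) integer array like the one returned by query_pairs.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def label_components(pairs, n_nodes):
    """Hypothetical helper: return (n_components, labels) for an undirected edge list."""
    pairs = np.asarray(pairs, dtype=np.int64).reshape(-1, 2)
    # One "True" entry per edge; the matrix is only used for its sparsity pattern.
    data = np.ones(len(pairs), dtype=bool)
    adjacency = coo_matrix((data, (pairs[:, 0], pairs[:, 1])),
                           shape=(n_nodes, n_nodes))
    # directed=False treats every stored entry as a two-way link.
    return connected_components(adjacency, directed=False)

# Tiny example: spheres 0-1-2 are chained, 3-4 touch, 5 is isolated.
n_components, labels = label_components([[0, 1], [1, 2], [3, 4]], n_nodes=6)
print(n_components, labels)
```

labels[i] is the component index of sphere i, so grouping the spheres is a matter of sorting or bucketing by label rather than merging one pair at a time.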
Just for fun, here's a visualization of the 2D test data:
from bokeh.plotting import output_notebook, figure, show
output_notebook()
p = figure()
p.circle(*points.T, radius=radii, fill_alpha=0.25)
p.segment(*points[filtered_pairs[:, 0]].T,
          *points[filtered_pairs[:, 1]].T,
          line_color='red')
show(p)
"to find connected spheres using NetworkX library in Python. For doing so, I need to find distances between the spheres"
Are you calculating the distance between every pair of spheres?
If all you need is to know the pairs of spheres that touch, or maybe that overlap, then you do NOT need to calculate the distance between every pair of spheres, only between ones that are in reasonable proximity to each other. The standard way of handling this is to use an octree https://en.wikipedia.org/wiki/Octree
This takes some time to set up, but once you have it, you can quickly find all the spheres that are close and none that are too far away. A reasonable distance would be twice the radius of the largest sphere. For large datasets the improvement in performance can be spectacular.
( For more details about this test https://github.com/JamesBremner/quadtree )
So, the complete algorithm to find the paths through the connected spheres can be broken out into four conceptual steps:
1. Find the connected spheres, using an octree to optimize finding them. Instead of searching through every pair of spheres, loop over the spheres and search through the spheres in the same octree cell. For more details on how to make this work you might want to look at the C++ code at https://github.com/JamesBremner/quadtree
2. Create the adjacency matrix of connected spheres. Conceptually this is a separate step; however, you will probably want to do it as you search for connected spheres in the first step. Construct an empty N by N adjacency matrix, where N is the number of spheres. Each time you find a pair of connected spheres, fill in the matrix.
3. Load the matrix into a graph library. It may be more efficient to simply add the link between two connected spheres directly into the library and let it build the adjacency matrix.
4. Use the graph library methods to find the path.
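The first step can be sketched in Python as follows. Note that this swaps the octree for a simpler uniform grid of cells (a "cell list"), which is not what the linked C++ code does, and that it also checks the neighbouring cells, since two touching spheres may sit on either side of a cell boundary. The function name touching_pairs is made up for illustration, and the sketch assumes all radii are positive.

```python
from collections import defaultdict
from itertools import product

import numpy as np

def touching_pairs(points, radii):
    """Return sorted (i, j) pairs, i < j, whose spheres touch or overlap."""
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # No touching pair can be farther apart than twice the largest radius,
    # so with this cell size only adjacent cells need to be searched.
    cell_size = 2.0 * radii.max()
    cell_ids = np.floor(points / cell_size).astype(np.int64)

    # Bucket the sphere indices by the cell that contains their centre.
    cells = defaultdict(list)
    for idx, cell in enumerate(map(tuple, cell_ids)):
        cells[cell].append(idx)

    offsets = list(product((-1, 0, 1), repeat=points.shape[1]))
    pairs = []
    for cell, members in cells.items():
        for off in offsets:
            neighbour = tuple(c + o for c, o in zip(cell, off))
            if neighbour not in cells:
                continue
            for i in members:
                for j in cells[neighbour]:
                    # i < j reports each pair exactly once.
                    if i < j and np.linalg.norm(points[i] - points[j]) <= radii[i] + radii[j]:
                        pairs.append((i, j))
    return sorted(pairs)

centres = [[0, 0, 0], [1.5, 0, 0], [10, 0, 0], [10, 0, 1.9]]
print(touching_pairs(centres, [1, 1, 1, 1]))
```

The returned pairs can be added directly as edges to a graph library, as suggested in step 3, without ever building the N by N matrix.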
Related
I was using the Google Maps Distance Matrix API in Python to calculate bicycling distances between two points, using latitude and longitude. I was using a loop to process almost 300,000 rows of data for a student project (I am studying Data Science with Python). I added a debug line to output the row number and distance every 10,000 rows, but after it hummed away for a while with no results, I stopped the kernel and changed it to every 1000 rows. With that, after about 5 minutes it finally got to row 1000. After over an hour, it was only on row 70,000. Unbelievable. I stopped execution and later that day got an email from Google saying I had used up my free trial. So not only did it work incredibly slowly, I can't even use it at all anymore for a student project without incurring enormous fees.
So I rewrote the code to use geometry and just calculate "as the crow flies" distance. Not really what I want, but short of any alternatives, that's my only option.
Does anyone know of another (open-source, free) way to calculate distance to get what I want, or how to use the google distance matrix API more efficiently?
thanks,
Here is some more information, as suggested. I am trying to calculate distances between "stations", and am given lats and longs for about 300K pairs. I was going to set up a function and then apply that function to the dataframe (bear with me, I'm still new at Python and dataframes), but for now I was using a loop to go through all the pairs. Here is my code:
i = 0
while i < len(trip):
    from_coords = str(result.loc[i, 'from_lat']) + " " + str(result.loc[i, 'from_long'])
    to_coords = str(result.loc[i, 'to_lat']) + " " + str(result.loc[i, 'to_long'])

    # now to get distances!!!
    distance = gmaps.distance_matrix([from_coords],  # origin lat & long, formatted for gmaps
                                     [to_coords],  # destination lat & long, formatted for gmaps
                                     mode='bicycling')['rows'][0]['elements'][0]  # mode=bicycling to use streets for cycling
    # store the value for this row only (assigning to result['distance'] would overwrite the whole column)
    result.loc[i, 'distance'] = distance['distance']['value']

    # added this bit to see how quickly/slowly the code is running
    # ... and btw it's running very slowly. had the debug line at 10000 and changed it to 1000
    # ... and i am running on a with i9-9900K with 48GB ram
    # ... why so slow?
    if i % 1000 == 0:
        print(distance['distance']['value'])
    i += 1
You could approximate the distance in km with the haversine distance.
Here I have my coordinates as lat/long pairs in random_distances, a numpy array with shape (300000, 2):
import numpy as np
from sklearn.neighbors import DistanceMetric
dist = DistanceMetric.get_metric('haversine')
random_distances = np.random.random( (300000,2) )
Then we can approximate the distances with
distances = np.zeros( random_distances.shape[0] - 2 )
for idx in range(random_distances.shape[0]-2):
    distances[idx] = dist.pairwise(np.radians(random_distances[idx:idx+2]), np.radians(random_distances[idx:idx+2]) )[0][1]
distances *= 6371000/1000 # to get output as km
distances now contains the distances.
The speed is all right, but it can be improved. We could get rid of the for loop, for instance; also, a 2x2 distance matrix is returned each time and only one entry is used.
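One way to drop the loop is to write the haversine formula directly on whole numpy arrays. The following is a minimal sketch, not the sklearn implementation; the function name haversine_km and the mean Earth radius of 6371 km are choices made here for illustration, and the inputs are assumed to be in degrees.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, same value as used above

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, scalars or arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# All 300,000 origin/destination pairs are handled in one vectorized call.
rng = np.random.default_rng(0)
from_lat, to_lat = rng.uniform(-60, 60, size=(2, 300_000))
from_lon, to_lon = rng.uniform(-180, 180, size=(2, 300_000))
distances = haversine_km(from_lat, from_lon, to_lat, to_lon)
print(distances.shape)
```

With a dataframe like the one in the question, the four arguments would be the from_lat, from_long, to_lat and to_long columns.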
The haversine distance is a good approximation, but it is not exact, which I imagine the API is:
From sklearn:
As the Earth is nearly spherical, the haversine formula provides a good approximation of the distance between two points of the Earth surface, with a less than 1% error on average.
I have several points (x,y,z coordinates) in a 3D box with associated masses. I want to draw an histogram of the mass-density that is found in spheres of a given radius R.
I have written code that, provided I did not make any errors (which I think I may have), works in the following way:
1. My "real" data is huge, so I wrote a little code to randomly generate non-overlapping points with arbitrary masses in a box.
2. I compute a 3D histogram (weighted by mass) with a bin size about 10 times smaller than the radius of my spheres.
3. I take the FFT of my histogram, compute the wave-modes (kx, ky and kz) and use them to multiply my histogram in Fourier space by the analytic expression of the 3D top-hat window (sphere filtering) function in Fourier space.
4. I inverse FFT my newly computed grid.
Thus drawing a 1D histogram of the values in each bin would give me what I want.
My issue is the following: given what I do, there should not be any negative values in my inverted FFT grid (step 4), but I get some, with values much higher than the numerical error.
If I run my code on a small box (300x300x300 cm3, with the points separated by at least 1 cm) I do not get the issue. I do get it for 600x600x600 cm3 though.
If I set all the masses to 0, thus working on an empty grid, I do get back my 0 without any noted issues.
I here give my code in a full block so that it is easily copied.
import numpy as np
import matplotlib.pyplot as plt
import random
from numba import njit
# 1. Generate a bunch of points with masses from 1 to 3 separated by a radius of 1 cm
radius = 1
rangeX = (0, 100)
rangeY = (0, 100)
rangeZ = (0, 100)
rangem = (1,3)
qty = 20000 # or however many points you want
# Generate a set of all points within 1 of the origin, to be used as offsets later
deltas = set()
for x in range(-radius, radius+1):
    for y in range(-radius, radius+1):
        for z in range(-radius, radius+1):
            if x*x + y*y + z*z<= radius*radius:
                deltas.add((x,y,z))
X = []
Y = []
Z = []
M = []
excluded = set()
for i in range(qty):
    x = random.randrange(*rangeX)
    y = random.randrange(*rangeY)
    z = random.randrange(*rangeZ)
    m = random.uniform(*rangem)
    if (x,y,z) in excluded: continue
    X.append(x)
    Y.append(y)
    Z.append(z)
    M.append(m)
    excluded.update((x+dx, y+dy, z+dz) for (dx,dy,dz) in deltas)
print("There is ",len(X)," points in the box")
# Compute the 3D histogram
a = np.vstack((X, Y, Z)).T
b = 200
H, edges = np.histogramdd(a, weights=M, bins = b)
# Compute the FFT of the grid
Fh = np.fft.fftn(H, axes=(-3,-2, -1))
# Compute the different wave-modes
kx = 2*np.pi*np.fft.fftfreq(len(edges[0][:-1]))*len(edges[0][:-1])/(np.amax(X)-np.amin(X))
ky = 2*np.pi*np.fft.fftfreq(len(edges[1][:-1]))*len(edges[1][:-1])/(np.amax(Y)-np.amin(Y))
kz = 2*np.pi*np.fft.fftfreq(len(edges[2][:-1]))*len(edges[2][:-1])/(np.amax(Z)-np.amin(Z))
# I create a matrix containing the values of the filter in each point of the grid in Fourier space
R = 5
Kh = np.empty((len(kx),len(ky),len(kz)))
@njit(parallel=True)
def func_njit(kx, ky, kz, Kh):
    for i in range(len(kx)):
        for j in range(len(ky)):
            for k in range(len(kz)):
                if np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2) != 0:
                    Kh[i][j][k] = (np.sin((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)-(np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R*np.cos((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R))*3/((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)**3
                else:
                    Kh[i][j][k] = 1
    return Kh
Kh = func_njit(kx, ky, kz, Kh)
# I multiply each point of my grid by the associated value of the filter (multiplication in Fourier space = convolution in real space)
Gh = np.multiply(Fh, Kh)
# I take the inverse FFT of my filtered grid. I take the real part to get back floats but there should only be zeros for the imaginary part.
Density = np.real(np.fft.ifftn(Gh,axes=(-3,-2, -1)))
# Here it shows if there are negative values the magnitude of the error
print(np.min(Density))
D = Density.flatten()
N = np.mean(D)
# I then compute the histogram I want
hist, bins = np.histogram(D/N, bins='auto', density=True)
bin_centers = (bins[1:]+bins[:-1])*0.5
plt.plot(bin_centers, hist)
plt.xlabel('rho/rhom')
plt.ylabel('P(rho)')
plt.show()
Do you know why I'm getting these negative values? Do you think there is a simpler way to proceed?
Sorry if this is a very long post, I tried to make it very clear and will edit it with your comments, thanks a lot!
-EDIT-
A follow-up question on the issue can be found here.
The filter you create in the frequency domain is only an approximation to the filter you want to create. The problem is that we are dealing with the DFT here, not the continuous-domain FT (with its infinite frequencies). The Fourier transform of a ball is indeed the function you describe, however this function is infinitely large -- it is not band-limited!
By sampling this function only within a window, you are effectively multiplying it with an ideal low-pass filter (the rectangle of the domain). This low-pass filter, in the spatial domain, has negative values. Therefore, the filter you create also has negative values in the spatial domain.
This is a slice through the origin of the inverse transform of Kh (after I applied fftshift to move the origin to the middle of the image, for better display):
As you can tell here, there is some ringing that leads to negative values.
One way to overcome this ringing is to apply a windowing function in the frequency domain. Another option is to generate a ball in the spatial domain, and compute its Fourier transform. This second option would be the simplest to achieve. Do remember that the kernel in the spatial domain must also have the origin at the top-left pixel to obtain a correct FFT.
A windowing function is typically applied in the spatial domain to avoid issues with the image border when computing the FFT. Here, I propose to apply such a window in the frequency domain to avoid similar issues when computing the IFFT. Note, however, that this will always further reduce the bandwidth of the kernel (the windowing function would work as a low-pass filter after all), and therefore yield a smoother transition of foreground to background in the spatial domain (i.e. the spatial domain kernel will not have as sharp a transition as you might like). The best known windowing functions are Hamming and Hann windows, but there are many others worth trying out.
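The second option (building the ball in the spatial domain and taking its FFT) could look roughly like the sketch below. The function names ball_kernel_fft and smooth_with_ball are made up for illustration, the radius is assumed to be given in grid cells (pixels) and to be smaller than half the grid size, and the kernel is normalized to sum to 1, so the result is a local mean rather than a local sum.

```python
import numpy as np

def ball_kernel_fft(shape, radius_px):
    """FFT of a normalized ball whose centre sits at index (0, 0, 0)."""
    # Signed integer offsets 0, 1, ..., -2, -1 along each axis, so the ball
    # wraps around the array edges and its origin is the "top-left" cell.
    offsets = [np.rint(np.fft.fftfreq(n) * n) for n in shape]
    grids = np.meshgrid(*offsets, indexing="ij")
    dist_sq = sum(g ** 2 for g in grids)
    ball = (dist_sq <= radius_px ** 2).astype(float)
    ball /= ball.sum()
    return np.fft.fftn(ball)

def smooth_with_ball(grid, radius_px):
    """Circular convolution of grid with the ball kernel, via the FFT."""
    kernel_f = ball_kernel_fft(grid.shape, radius_px)
    return np.real(np.fft.ifftn(np.fft.fftn(grid) * kernel_f))

# A single mass of 5 is spread evenly over the 7 cells of a radius-1 ball.
grid = np.zeros((16, 16, 16))
grid[2, 3, 4] = 5.0
density = smooth_with_ball(grid, radius_px=1)
print(density[2, 3, 4], density.min())
```

Because the spatial kernel is non-negative everywhere, any negative values left in the output should be at the level of floating-point rounding.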
Unsolicited advice:
I simplified your code to compute Kh to the following:
kr = np.sqrt(kx[:,None,None]**2 + ky[None,:,None]**2 + kz[None,None,:]**2)
kr *= R
Kh = (np.sin(kr)-kr*np.cos(kr))*3/(kr)**3
Kh[0,0,0] = 1
I find this easier to read than the nested loops. It should also be significantly faster, and avoid the need for njit. Note that you were computing the same distance (what I call kr here) 5 times. Factoring out such computation is not only faster, but yields more readable code.
Just a guess:
Where do you get the idea that the imaginary part MUST be zero? Have you ever tried to take the absolute values (sqrt(re^2 + im^2)) and forget about the phase instead of just taking the real part? Just something that came to my mind.
I'm currently searching for an efficient algorithm that takes in a set of points from three dimensional spaces and groups them into classes (maybe represented by a list). A point should belong to a class if it is close to one or more other points from the class. Two classes are then the same if they share any point.
Because I'm working with large data sets, I don't want to use recursive methods. Also, using something like a distance matrix with O(n^2) performance is what I try to avoid.
I tried to check for some algorithms online, but most of them don't fit this specific purpose (e.g. k-d tree or other cluster algorithms). I thought about partitioning space into smaller parts, but that (potentially) gives an inexact result.
I tried to write something myself, but it turned out to be flawed. I would sort my points by distance, append the distance as a fourth coordinate, and then repeat the following code segment:
def grouping_presorted(lst, distance):
    positions = [0]
    x = []
    while positions:
        curr_el = lst[ positions[-1] ]
        nn_i = HasNeighbor(lst, distance, positions[-1])
        if nn_i is None:
            x.append(lst.pop(positions[-1]) )
            positions.pop(-1)
        else:
            positions.append(nn_i)
    return x

def HasNeighbor(lst,distance,index):
    i = index+1
    while lst[i][3]- lst[index][3] < distance:
        dist = (lst[i][0]-lst[index][0])**2 + (lst[i][1]-lst[index][1])**2 + (lst[i][2]-lst[index][2])**2
        if dist < distance:
            return i
        i+=1
    return None
Aside from a (probably easy to fix) overflow error, there's a bigger flaw in the logic of linking the points. If you think of my points as describing lines in space, the algorithm only works for lines that strictly point outward from the origin, but not for circles or similar structures.
Does anybody know of a prewritten code for this or have an idea what I could try?
Thanks in advance.
Edit: It seems my spelling, and maybe my confusion of some terms, has sparked some misunderstandings. I hope that this (badly-made) sketch helps. In this example, I marked my reference distance as d and circled in red the two containers I want to end up with.
You could try https://en.wikipedia.org/wiki/OPTICS_algorithm. When you index the points first (e.g, with an R-Tree) this should be possible in O(n log n).
Edit:
If you already know your epsilon and how many points are minimally in a cluster (minpoints) then DBSCAN could be the better choice.
What I ended up doing
After following all the suggestions in your comments, getting help from cs.stackexchange and doing some research, I was able to write down two different methods for solving this problem. In case someone is interested, I decided to share them here. Again, the problem is to write a program that takes in a set of coordinate tuples and groups them into clusters. Two points x, y belong to the same cluster if there is a sequence of elements x=x_1,..,y=x_N such that d(x_i,x_i+1) < r for every consecutive pair.
DBSCAN: By fixing euclidean metric, minPts = 2 and grouping distance epsilon = r.
scikit-learn provides a nice implementation of this algorithm. A minimal code snippet for the task would be:
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
import networkx as nx
import scipy.spatial as sp

def cluster(data, epsilon, min_pts=2): #DBSCAN, euclidean distance
    db = DBSCAN(eps=epsilon, min_samples=min_pts).fit(data)
    labels = db.labels_ #labels of the found clusters
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0) #number of clusters
    clusters = [data[labels == i] for i in range(n_clusters)] #list of clusters
    return clusters, n_clusters

N = 20000      #number of sample points
epsilon = 0.1  #grouping distance
centers = [[1, 1,1], [-1, -1,1], [1, -1,1]]
X,_ = make_blobs(n_samples=N, centers=centers, cluster_std=0.4,
                 random_state=0)
cluster(X, epsilon)
On my machine, N=20000 for this clustering variation with an epsilon of epsilon = 0.1 takes just 290ms, so this seems really quick to me.
Graph components: One can think of this problem as follows: the coordinates define nodes of a graph, and two nodes are adjacent if their distance is smaller than epsilon/r. A cluster is then given as a connected component of this graph. At first I had problems implementing this graph, but there are many ways to write a linear time algorithm to do this. The easiest and fastest way for me, however, was to use scipy.spatial's cKDTree data structure and the corresponding query_pairs() method, which returns the index tuples of all points that lie within a given distance of each other. One could for example write it like this:
class IGraph:
    def __init__(self, nodelst=[], radius = 1):
        self.igraph = nx.Graph()
        self.radii = radius
        self.nodelst = nodelst #nodelst is array of coordinate tuples, graph contains indices as nodes
        self.__make_edges__()

    def __make_edges__(self):
        self.igraph.add_edges_from( sp.cKDTree(self.nodelst).query_pairs(r=self.radii) )

    def get_conn_comp(self):
        ind = [list(x) for x in nx.connected_components(self.igraph) if len(x)>1]
        return [self.nodelst[indlist] for indlist in ind]

def graph_cluster(data, epsilon):
    graph = IGraph(nodelst = data, radius = epsilon)
    clusters = graph.get_conn_comp()
    return clusters, len(clusters)
For the same dataset mentioned above, this method takes 420ms to find the connected components. However, for smaller clusters, e.g. N=700, this snippet runs faster. It also seems to have an advantage for finding smaller clusters (that is being given smaller epsilon values) and a vast disadvantage in the other direction (all on this specific dataset of course). I think, depending on the given situation, both methods are worth considering.
Hope this is of use for somebody.
Edit: Theoretically, DBSCAN has computational complexity O(n log n) when properly implemented (according to wikipedia...), while constructing the graph as well as finding its connected components runs linear in time. I'm not sure how well these statements hold for the given implementations though.
Adapt a path-finding algorithm, such as Dijkstra's or A*, or alternatively adapt the breadth-first or depth-first search of a graph. Start at any point in the set of unvisited points, and proceed with whichever algorithm you've picked, with the caveat that a point is considered to be connected only to those points whose distance from it is less than the threshold. When you've finished off one class (i.e. when you can discover no more new nodes), pick any node from the set of unvisited nodes and repeat.
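A breadth-first version of this idea might look like the sketch below. It is iterative (no recursion) and, as an addition not mentioned in the answer above, uses scipy's cKDTree to look up the neighbours of each point instead of scanning all points. The function name bfs_clusters is made up for illustration. Note that query_ball_point treats points at exactly the threshold distance as connected (distance <= threshold).

```python
from collections import deque

import numpy as np
from scipy.spatial import cKDTree

def bfs_clusters(points, threshold):
    """Label each point with a cluster index using breadth-first search."""
    points = np.asarray(points, dtype=float)
    tree = cKDTree(points)
    labels = np.full(len(points), -1, dtype=np.int64)  # -1 means unvisited
    n_clusters = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue  # already absorbed into an earlier cluster
        labels[start] = n_clusters
        queue = deque([start])
        while queue:
            current = queue.popleft()
            # Every unvisited point within the threshold joins the cluster.
            for nb in tree.query_ball_point(points[current], r=threshold):
                if labels[nb] == -1:
                    labels[nb] = n_clusters
                    queue.append(nb)
        n_clusters += 1
    return labels, n_clusters

pts = [[0, 0, 0], [0.5, 0, 0], [1.0, 0, 0], [5, 5, 5]]
labels, n_clusters = bfs_clusters(pts, threshold=0.6)
print(labels, n_clusters)
```

The first three points form a chain (each step is 0.5 apart), so they end up in one class even though the end points are farther apart than the threshold.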
I'm relatively new to Python coding (I'm switching from R mostly due to running time speed) and I'm trying to figure out how to code a proximity graph.
That is, suppose I have an array of evenly spaced points in d-dimensional Euclidean space; these will be my nodes. I want to make these into an undirected graph by connecting two points if and only if they lie within e of each other. How can I encode this as a function with the parameters:
n: spacing between two points on the same axis
d: dimension of R^d
e: maximum distance allowed for an edge to exist.
The graph-tool library has much of the functionality you need. So you could do something like this, assuming you have numpy and graph-tool:
import numpy
import graph_tool.generation

coords = numpy.meshgrid(*(numpy.linspace(0, (n-1)*delta, n) for i in range(d)))
# coords is a Python list of numpy arrays
coords = [c.flatten() for c in coords]
# now coords is a Python list of 1-d numpy arrays
coords = numpy.array(coords).transpose()
# now coords is a numpy array, one row per point
# geometric_graph returns the graph together with a vertex position map
g, pos = graph_tool.generation.geometric_graph(coords, e*(1+1e-9))
The silly e*(1+1e-9) thing is because your criterion is "distance <= e" and geometric_graph's criterion is "distance < e".
There's a parameter called delta that you didn't mention because I think your description of parameter n is doing duty for two params (spacing between points, and number of points).
This bit of code should work, although it certainly isn't the most efficient. It will go through each node and check its distance to all the other nodes (that haven't already compared to it). If that distance is less than your value e then the corresponding value in the connected matrix is set to one. Zero indicates two nodes are not connected.
In this code I'm assuming that your nodeList is a list of Cartesian coordinates of the form nodeList = [[x1,y1,...],[x2,y2,...],...[xN,yN,...]]. I also assume you have some function called calcDistance which returns the Euclidean distance between two Cartesian coordinates. This is basic enough to implement that I haven't written the code for that, and in any case using a function allows for future generalization and modifiability.
numNodes = len(nodeList)
connected = np.zeros([numNodes,numNodes])
for i, n1 in enumerate(nodeList):
    # start=i keeps j as an index into the full nodeList, not into the slice
    for j, n2 in enumerate(nodeList[i:], start=i):
        dist = calcDistance(n1, n2)
        if dist < e:
            connected[i,j] = 1
            connected[j,i] = 1
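For completeness, here is a self-contained sketch that fills in the pieces left out above. The helper names calc_distance, grid_nodes and connection_matrix are made up for illustration; the sketch assumes n is the number of points per axis with a separate spacing parameter, uses "distance <= e" as the edge criterion, and skips the comparison of each node with itself. Like the loop above, it is quadratic in the number of nodes, so it is only meant for small grids.

```python
import math
from itertools import product

import numpy as np

def calc_distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.dist(p, q)

def grid_nodes(n, d, spacing=1.0):
    """n evenly spaced points per axis in d dimensions, as coordinate tuples."""
    return [tuple(spacing * k for k in idx) for idx in product(range(n), repeat=d)]

def connection_matrix(node_list, e):
    """Symmetric 0/1 matrix with a 1 wherever two distinct nodes are within e."""
    num_nodes = len(node_list)
    connected = np.zeros((num_nodes, num_nodes), dtype=np.uint8)
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if calc_distance(node_list[i], node_list[j]) <= e:
                connected[i, j] = connected[j, i] = 1
    return connected

nodes = grid_nodes(n=3, d=2)              # 3 x 3 grid in the plane
connected = connection_matrix(nodes, e=1.0)
print(int(connected.sum()) // 2, "edges")
```

On the 3 x 3 grid with unit spacing and e = 1, only horizontal and vertical neighbours are linked; raising e to about 1.5 would add the diagonals.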
I'd like to modify a Python script of mine operating on a square lattice (it's an agent based model for biology), to work in a hexagonal universe.
This is how I create and initialize the 2D matrix in the square model: basically, N is the size of the lattice and R gives the radius of the part of the matrix where I need to change value at the beginning of the algorithm:
a = np.zeros(shape=(N,N))
center = N // 2
for i in range(N):
    for j in range(N):
        if( ( pow((i-center),2) + pow((j-center),2) ) < pow(R,2) ):
            a[i,j] = 1
I then let the matrix evolve according to certain rules and finally save it by creating a pickle file:
name = "{0}-{1}-{2}-{3}-{4}.pickle".format(R, A1, A2, B1, B2)
with open(name, "wb") as f:
    pickle.dump(a, f)
Now, I'd like to do exactly the same but on a hexagonal lattice. I read this interesting StackOverflow question which clarified how to represent the positions on a hexagonal lattice with three coordinates, but a couple of things remain unclear to me, i.e.
(a) how should I deal with the three axes in Python, considering that what I want is not equivalent to a 3D matrix, due to the constraints on the coordinates, and
(b) how to plot it?
As for (a), this is what I was trying to do:
a = np.zeros(shape=(N,N,N))
for i in range(N//2-R, N//2+R+1):
    for j in range(N//2-R, N//2+R+1):
        for k in range(N//2-R, N//2+R+1):
            if((abs(i)+abs(j)+abs(k))//2 <= 3*N//4+R//2):
                a[i,j,k] = 1
It seems to me pretty convoluted to initialize a NxNxN matrix like that and then find a way to print a subset of it according to the constraints over the coordinates. I'm looking for a simpler way and, more importantly, for understanding how to plot the hexagonal lattice resulting from the algorithm (no clue on that, I haven't tried anything for the moment).
I agree that trying to shoehorn a hexagonal lattice into a cubic one is problematic. My suggestion is to use a general scheme: represent the neighboring sites as a graph. This works very well with Python's dictionary object, and it is trivial to implement the "axial coordinate scheme" from one of the links you provided. Here is an example that creates and draws the "lattice" using networkx.
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()
G.add_node((0,0))

for n in range(4):
    # iterate over a snapshot, because adding edges changes the node set
    for (q,r) in list(G.nodes()):
        G.add_edge((q,r),(q,r-1))
        G.add_edge((q,r),(q-1,r))
        G.add_edge((q,r),(q-1,r+1))
        G.add_edge((q,r),(q,r+1))
        G.add_edge((q,r),(q+1,r-1))
        G.add_edge((q,r),(q+1,r))

# graphviz_layout needs Graphviz and pygraphviz to be installed
pos = nx.nx_agraph.graphviz_layout(G,prog="neato")
nx.draw(G,pos,alpha=.75)

plt.axis('equal')
plt.show()
This isn't the most efficient implementation, but it can generate arbitrarily large lattices.
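The same axial coordinate scheme can also be used without a graph library, by keeping the cell values in a plain dictionary keyed by (q, r). The sketch below is one possible way to do it; the names hex_disk, neighbours and axial_to_xy are made up for illustration, and the conversion to x/y assumes "pointy-top" hexagons with the given size (centre-to-corner distance). The x/y positions can then be passed to any scatter or polygon plotting routine.

```python
import math

# The six axial-coordinate steps to the neighbouring hexagons.
AXIAL_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_disk(radius):
    """All axial (q, r) cells within `radius` steps of the origin."""
    cells = []
    for q in range(-radius, radius + 1):
        for r in range(max(-radius, -q - radius), min(radius, -q + radius) + 1):
            cells.append((q, r))
    return cells

def neighbours(cell, lattice):
    """Neighbouring cells of `cell` that exist in the lattice dictionary."""
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in AXIAL_DIRECTIONS if (q + dq, r + dr) in lattice]

def axial_to_xy(q, r, size=1.0):
    """Centre of a pointy-top hexagon in Cartesian coordinates, for plotting."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r

# A lattice of "radius" 2 stored as {cell: value}, with the centre switched on.
lattice = {cell: 0 for cell in hex_disk(2)}
lattice[(0, 0)] = 1
print(len(lattice), len(neighbours((0, 0), lattice)), axial_to_xy(1, 0))
```

A disk of radius R contains 3*R*(R+1) + 1 cells, which is far fewer than the N x N x N entries of a cubic array.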