I'd like to modify a Python script of mine that operates on a square lattice (it's an agent-based model for biology) to work in a hexagonal universe.
This is how I create and initialize the 2D matrix in the square model: N is the size of the lattice, and R is the radius of the region at the center of the matrix whose values I need to change at the start of the algorithm:
import numpy as np

a = np.zeros(shape=(N, N))
center = N // 2  # integer division so the center is a valid index
for i in range(N):
    for j in range(N):
        if (i - center) ** 2 + (j - center) ** 2 < R ** 2:
            a[i, j] = 1
I then let the matrix evolve according to certain rules and finally save it by creating a pickle file:
name = "{0}-{1}-{2}-{3}-{4}.pickle".format(R, A1, A2, B1, B2)
pickle.dump(a, open(name,"w"))
Now, I'd like to do exactly the same, but on a hexagonal lattice. I read an interesting StackOverflow question which clarified how to represent positions on a hexagonal lattice with three coordinates, but a couple of things remain obscure to me, i.e.
(a) how should I deal with the three axes in Python, considering that what I want is not equivalent to a 3D matrix, due to the constraints on the coordinates, and
(b) how to plot it?
As for (a), this is what I was trying to do:
a = np.zeros(shape=(N, N, N))
for i in range(N // 2 - R, N // 2 + R + 1):
    for j in range(N // 2 - R, N // 2 + R + 1):
        for k in range(N // 2 - R, N // 2 + R + 1):
            if (abs(i) + abs(j) + abs(k)) // 2 <= 3 * N // 4 + R // 2:
                a[i, j, k] = 1
It seems pretty convoluted to initialize an N×N×N matrix like that and then find a way to print a subset of it according to the constraints on the coordinates. I'm looking for a simpler way and, more importantly, to understand how to plot the hexagonal lattice resulting from the algorithm (no clue on that; I haven't tried anything for the moment).
I agree that trying to shoehorn a hexagonal lattice into a cubic one is problematic. My suggestion is to use a more general scheme: represent the neighboring sites as a graph. This works very well with Python's dictionary object, and it is trivial to implement the "axial coordinate scheme" from one of the links you provided. Here is an example that creates and draws the "lattice" using networkx.
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()  # nx.Graph is undirected by default
G.add_node((0, 0))

for n in range(4):
    # Take a snapshot of the current nodes, since add_edge grows the graph.
    for (q, r) in list(G.nodes()):
        # The six axial-coordinate neighbors of (q, r).
        G.add_edge((q, r), (q, r - 1))
        G.add_edge((q, r), (q - 1, r))
        G.add_edge((q, r), (q - 1, r + 1))
        G.add_edge((q, r), (q, r + 1))
        G.add_edge((q, r), (q + 1, r - 1))
        G.add_edge((q, r), (q + 1, r))

# Requires pygraphviz; graphviz_layout moved to nx.nx_agraph in newer networkx.
pos = nx.nx_agraph.graphviz_layout(G, prog="neato")
nx.draw(G, pos, alpha=.75)
plt.axis('equal')
plt.show()
This isn't the most optimal implementation, but it can generate arbitrarily large lattices.
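If you also want to carry the model state on this lattice (the analogue of a[i,j] = 1 inside radius R in your square code), a minimal sketch along these lines could work; hex_distance and the "state" attribute name are my own inventions, not part of the code above:

def hex_distance(q, r):
    # Standard hexagonal lattice distance from the origin in axial coordinates.
    return (abs(q) + abs(r) + abs(q + r)) // 2

R = 2  # radius of the initial "disc", as in the square model
for (q, r) in G.nodes():
    # "state" plays the role of a[i, j] in the square-lattice model.
    G.nodes[(q, r)]["state"] = 1 if hex_distance(q, r) < R else 0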
The Problem:
Using NumPy, I have created an array of random points within a range.
import numpy as np
min_square = 5
container_radius = 100  # an integer, as described below; 100 is just an example value
positions = (np.random.random(size=(100, 2)) - 0.5) * 2 * container_radius
Here container_radius and min_square are both integers.
Following that, using matplotlib, I plot the points on a graph.
import matplotlib.pyplot as plt
plt.plot(positions[:, 0], positions[:, 1], 'r.')
plt.show()
This graph shows me the distribution of the points in relation to each other.
What I am looking for is a method, implementing something similar to or exactly a k-d tree, to draw a rectangle over the densest area of the scatter plot, with a defined minimum size.
This would be done using plt.gca().add_patch(plt.Rectangle((x, y), width=square_side, height=square_side, fill=None)), where square_side is determined by the density function and is at least min_square in size.
Attempts to Solve the Problem:
So far, I have created my own sort of density function that is within my understanding of Python and easy enough to code without lagging my computer too hard.
My approach introduces an additional predefined integer variable, intervals.
Using what I had so far, I define a function to calculate the densities by checking whether the points fall within a range of floats.
# clb stands for calculate_lower_bound
def clb(x):
    return -1 * container_radius + (x * 2 * container_radius - min_square) / (intervals - 1)

# crd stands for calculate_regional_density
def crd(x, y):
    in_x = np.logical_and(positions[:, 0] >= clb(x), positions[:, 0] < clb(x) + min_square)
    in_y = np.logical_and(positions[:, 1] >= clb(y), positions[:, 1] < clb(y) + min_square)
    return np.where(np.logical_and(in_x, in_y))[0].shape[0]
Then, I create a NumPy array of size=(intervals, intervals) and pass the indices of the array (I have another question about this, as I am currently using a quite inefficient method) as inputs into crd(x, y), storing the values in another array called densities. I then find the maximum value in my densities array and draw the rectangle using some pretty straightforward code that I do not think is necessary to include here, as it is not the problem.
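For reference, that step can be written compactly like this (a sketch reusing the clb/crd helpers and the intervals variable from above):

# Evaluate crd at every (x, y) index pair to build the densities grid.
densities = np.array([[crd(x, y) for y in range(intervals)]
                      for x in range(intervals)])

# unravel_index turns the flat argmax back into (x, y) grid indices.
best_x, best_y = np.unravel_index(np.argmax(densities), densities.shape)

# Draw a min_square-sized rectangle at the densest window's lower-left corner.
plt.gca().add_patch(plt.Rectangle((clb(best_x), clb(best_y)),
                                  width=min_square, height=min_square,
                                  fill=False))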
What I Am Looking For:
I am looking for some function, f(x), that computes the dimensions and coordinates of a square encompassing the densest region on a scatterplot graph. The function would have access to all the variables it needs such as positions, min_square, etc. If you could use informative variable names or explain what each variable means, that would be a great help as well.
Other (Potentially) Important Notes:
I am looking for something that gets the job done in a reasonable time. In most scenarios, I am going to be working with around 10000 points and I need to calculate the densest region around 100 times so the function needs to be efficient enough so that the task completes within around 10-20 seconds.
As such, approximations using formulas like the example I have shown are completely valid as long as they implement well and are able to grow the dimensions of the square larger if necessary.
Thanks!
I'm currently searching for an efficient algorithm that takes in a set of points from three-dimensional space and groups them into classes (perhaps represented by lists). A point should belong to a class if it is close to one or more other points in the class. Two classes are the same if they share any point.
Because I'm working with large data sets, I don't want to use recursive methods. Also, I'm trying to avoid anything like a distance matrix with O(n^2) performance.
I tried to check for some algorithms online, but most of them don't fit this specific purpose (e.g. k-d trees or other cluster algorithms). I thought about partitioning space into smaller cells, but that (potentially) gives an inexact result.
I tried to write something myself, but it turned out to be flawed. I sort my points by distance from the origin, append that distance as a fourth coordinate, and then repeatedly run the following code segment:
def grouping_presorted(lst, distance):
    positions = [0]
    x = []
    while positions:
        curr_el = lst[positions[-1]]
        nn_i = HasNeighbor(lst, distance, positions[-1])
        if nn_i is None:
            x.append(lst.pop(positions[-1]))
            positions.pop(-1)
        else:
            positions.append(nn_i)
    return x

def HasNeighbor(lst, distance, index):
    i = index + 1
    # The bounds check on i fixes the overflow (IndexError) mentioned below.
    while i < len(lst) and lst[i][3] - lst[index][3] < distance:
        dist = (lst[i][0] - lst[index][0])**2 + (lst[i][1] - lst[index][1])**2 + (lst[i][2] - lst[index][2])**2
        if dist < distance**2:  # compare squared distances consistently
            return i
        i += 1
    return None
Aside from a (probably easy to fix) overflow error, there's a bigger flaw in the logic of linking the points. If you think of my points as describing lines in space, the algorithm only works for lines that point strictly outward from the origin, not for circles or similar structures.
Does anybody know of a prewritten code for this or have an idea what I could try?
Thanks in advance.
Edit: It seems my spelling and perhaps my confusion of some terms have sparked some misunderstandings. I hope that this (badly made) sketch helps. In this example, I marked my reference distance as d and circled in red the two clusters I want to end up with.
You could try the OPTICS algorithm (https://en.wikipedia.org/wiki/OPTICS_algorithm). If you index the points first (e.g., with an R-tree), this should be possible in O(n log n).
Edit:
If you already know your epsilon and the minimum number of points per cluster (minpoints), then DBSCAN could be the better choice.
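A minimal sklearn sketch of the OPTICS suggestion, in case it helps (the point set and parameter values below are placeholders):

import numpy as np
from sklearn.cluster import OPTICS

points = np.random.rand(1000, 3)   # placeholder 3D point set
opt = OPTICS(min_samples=2, max_eps=0.1).fit(points)
labels = opt.labels_               # -1 marks noise; values >= 0 are cluster ids
clusters = [points[labels == k] for k in set(labels) if k != -1]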
What I ended up doing
After following all the suggestions in your comments, help from cs.stackexchange, and some research, I was able to write down two different methods for solving this problem. In case someone might be interested, I decided to share them here. Again, the problem is to write a program that takes in a set of coordinate tuples and groups them into clusters. Two points x, y belong to the same cluster if there is a sequence of elements x = x_1, ..., x_N = y such that d(x_i, x_{i+1}) < r for all i.
DBSCAN: fix the Euclidean metric, minPts = 2, and grouping distance epsilon = r.
scikit-learn provides a nice implementation of this algorithm. A minimal code snippet for the task would be:
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs  # was sklearn.datasets.samples_generator in older versions
import networkx as nx
import scipy.spatial as sp

def cluster(data, epsilon, N):  # DBSCAN, Euclidean distance
    db = DBSCAN(eps=epsilon, min_samples=N).fit(data)
    labels = db.labels_  # labels of the found clusters
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # number of clusters
    clusters = [data[labels == i] for i in range(n_clusters)]  # list of clusters
    return clusters, n_clusters

centers = [[1, 1, 1], [-1, -1, 1], [1, -1, 1]]
X, _ = make_blobs(n_samples=20000, centers=centers, cluster_std=0.4,
                  random_state=0)
cluster(X, epsilon=0.1, N=2)  # minPts = 2, as fixed above
On my machine, this clustering variation with N = 20000 points and epsilon = 0.1 takes just 290 ms, so this seems really quick to me.
Graph components: One can think of this problem as follows: the coordinates define the nodes of a graph, and two nodes are adjacent if their distance is smaller than epsilon/r. A cluster is then given by a connected component of this graph. At first I had problems implementing the graph, but there are many ways to write a linear-time algorithm for it. The easiest and fastest way for me, however, was to use scipy.spatial's cKDTree data structure and its query_pairs() method, which returns a list of index tuples of points that lie within a given distance of each other. One could, for example, write it like this:
class IGraph:
    def __init__(self, nodelst=None, radius=1):  # avoid a mutable default argument
        self.igraph = nx.Graph()
        self.radii = radius
        # nodelst is an array of coordinate tuples; the graph stores indices as nodes
        self.nodelst = nodelst
        self.__make_edges__()

    def __make_edges__(self):
        self.igraph.add_edges_from(sp.cKDTree(self.nodelst).query_pairs(r=self.radii))

    def get_conn_comp(self):
        ind = [list(x) for x in nx.connected_components(self.igraph) if len(x) > 1]
        return [self.nodelst[indlist] for indlist in ind]

def graph_cluster(data, epsilon):
    graph = IGraph(nodelst=data, radius=epsilon)
    clusters = graph.get_conn_comp()
    return clusters, len(clusters)
For the same dataset mentioned above, this method takes 420 ms to find the connected components. However, for smaller datasets, e.g. N = 700, this snippet runs faster. It also seems to have an advantage for finding smaller clusters (that is, when given smaller epsilon values) and a vast disadvantage in the other direction (all on this specific dataset, of course). I think that, depending on the given situation, both methods are worth considering.
Hope this is of use for somebody.
Edit: Theoretically, DBSCAN has computational complexity O(n log n) when properly implemented (according to Wikipedia...), while constructing the graph and finding its connected components runs in linear time. I'm not sure how well these statements hold for the given implementations, though.
Adapt a path-finding algorithm, such as Dijkstra's or A*, or alternatively adapt breadth-first or depth-first search of a graph. Start at any point in the set of unvisited points, and proceed with whichever algorithm you've picked, with the caveat that a point is considered connected only to the points whose distance to it is less than the threshold. When you've finished off one class (i.e. when you can discover no more new nodes), pick any node from the set of unvisited nodes and repeat.
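A sketch of that idea, using a breadth-first flood fill with scipy's cKDTree answering the "all points within the threshold" neighbor queries (the function name grow_clusters and the variable threshold are mine):

from collections import deque

import numpy as np
from scipy.spatial import cKDTree

def grow_clusters(points, threshold):
    # Group points into classes by flood-filling the "closer than threshold" relation.
    points = np.asarray(points)
    tree = cKDTree(points)
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()  # start a new class from any unvisited point
        queue = deque([seed])
        members = [seed]
        while queue:
            node = queue.popleft()
            # Neighbors are all points within the distance threshold.
            for nb in tree.query_ball_point(points[node], threshold):
                if nb in unvisited:
                    unvisited.remove(nb)
                    queue.append(nb)
                    members.append(nb)
        clusters.append(points[members])
    return clusters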
I'm relatively new to Python coding (I'm switching from R, mostly for running speed) and I'm trying to figure out how to code a proximity graph.
That is, suppose I have an array of evenly-spaced points in d-dimensional Euclidean space; these will be my nodes. I want to make these into an undirected graph by connecting two points if and only if they lie within distance e of each other. How can I encode this as a function with parameters:
n: spacing between two points on the same axis
d: dimension of R^d
e: maximum distance allowed for an edge to exist.
The graph-tool library has much of the functionality you need. So you could do something like this, assuming you have numpy and graph-tool:
import numpy
import graph_tool.generation

# Example parameter values (my assumptions; see the note on delta below).
n, d, e = 10, 2, 1.5
delta = 1.0  # spacing between adjacent points on an axis

coords = numpy.meshgrid(*(numpy.linspace(0, (n-1)*delta, n) for i in range(d)))
# coords is a Python list of numpy arrays
coords = [c.flatten() for c in coords]
# now coords is a Python list of 1-d numpy arrays
coords = numpy.array(coords).transpose()
# now coords is a numpy array, one row per point
g = graph_tool.generation.geometric_graph(coords, e*(1+1e-9))
The silly e*(1+1e-9) factor is there because your criterion is "distance <= e" while geometric_graph's criterion is "distance < e".
There's a parameter called delta that you didn't mention, because I think your description of parameter n is doing duty for two parameters (the spacing between points and the number of points).
This bit of code should work, although it certainly isn't the most efficient. It goes through each node and checks its distance to all the other nodes (that haven't already been compared to it). If that distance is less than your value e, the corresponding entry in the connected matrix is set to one; zero indicates that two nodes are not connected.
In this code I'm assuming that your nodeList is a list of cartesian coordinates of the form nodeList = [[x1,y1,...],[x2,y2,...],...,[xN,yN,...]]. I also assume some function calcDistance which returns the Euclidean distance between two cartesian coordinates; this is basic enough that a one-line implementation is included below, and in any case using a function allows for future generalizing and modifiability.
import numpy as np

def calcDistance(p1, p2):
    # Euclidean distance between two cartesian coordinates.
    return np.linalg.norm(np.asarray(p1) - np.asarray(p2))

numNodes = len(nodeList)
connected = np.zeros([numNodes, numNodes])
for i, n1 in enumerate(nodeList):
    # start=i+1 keeps the indices aligned and skips pairs already compared
    for j, n2 in enumerate(nodeList[i+1:], start=i+1):
        if calcDistance(n1, n2) < e:
            connected[i, j] = 1
            connected[j, i] = 1
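If the double loop becomes a bottleneck, the same connectivity matrix can be built in one shot with scipy (a sketch, assuming nodeList is a numpy array of shape (numNodes, d)):

import numpy as np
from scipy.spatial.distance import cdist

# Pairwise Euclidean distances between all nodes, then threshold at e.
dists = cdist(nodeList, nodeList)
connected = (dists < e).astype(int)
np.fill_diagonal(connected, 0)  # no self-connections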
I am working with an algorithm that, for each iteration, needs to find which regions of a Voronoi diagram a set of arbitrary coordinates belong to; that is, which region each coordinate is located within. (We can assume that all coordinates will belong to a region, if that makes any difference.)
I don't have any code that works in Python yet, but the pseudocode looks something like this:
# We are in two dimensions, with 0 < x < 1 and 0 < y < 1.
for i in range(1000):
    XY = get_random_points_in_domain()
    XY_candidates = get_random_points_in_domain()
    vor = Voronoi(XY)  # for instance scipy.spatial.Voronoi
    regions = get_regions_of_candidates(vor, XY_candidates)  # this is the function I need
    # use regions for something
I know that scipy.spatial.Delaunay has a method called find_simplex which does pretty much what I want for simplices in a Delaunay triangulation, but I need the Voronoi diagram, and constructing both is something I wish to avoid.
Questions:
1. Is there a library of some sort that will let me do this easily?
2. If not, is there a good algorithm I could look at that will let me do this efficiently?
Update
Jamie's solution is exactly what I wanted. I'm a little embarrassed that I didn't think of it myself though ...
You don't need to actually calculate the Voronoi regions for this. By definition the Voronoi region around a point in your set is made up of all points that are closer to that point than to any other point in the set. So you only need to calculate distances and find nearest neighbors. Using scipy's cKDTree you could do:
import numpy as np
from scipy.spatial import cKDTree
n_voronoi, n_test = 100, 1000
voronoi_points = np.random.rand(n_voronoi, 2)
test_points = np.random.rand(n_test, 2)
voronoi_kdtree = cKDTree(voronoi_points)
test_point_dist, test_point_regions = voronoi_kdtree.query(test_points, k=1)
test_point_regions now holds an array of shape (n_test,) with the indices of the points in voronoi_points closest to each of your test_points.
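As a quick usage example (a small sketch building on the arrays above), you can tally how many test points fall in each region:

# Number of test points that landed in each Voronoi region.
region_counts = np.bincount(test_point_regions, minlength=n_voronoi)
assert region_counts.sum() == n_test  # each test point sits in exactly one region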
So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in the SO answer inverse-distance-weighted-idw-interpolation-with-python. Kd-trees work nicely in 2d, 3d, ...; inverse-distance weighting is smooth and local; and the k = number of nearest neighbours can be varied to trade off speed against accuracy.
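A compact sketch of that combination (k-nearest-neighbour IDW built on cKDTree; the function name idw_knn and the defaults k=8, power=2 are my choices, not prescribed by the linked answer):

import numpy as np
from scipy.spatial import cKDTree

def idw_knn(known_xy, known_values, query_xy, k=8, power=2, eps=1e-12):
    # Inverse-distance weighting over the k nearest neighbours of each query point.
    known_values = np.asarray(known_values)
    dist, idx = cKDTree(known_xy).query(query_xy, k=k)
    weights = 1.0 / (dist + eps) ** power        # eps guards against exact hits
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * known_values[idx]).sum(axis=1)

# e.g. for the question's setup:
# t_interp = idw_knn(np.column_stack([LAT.ravel(), LON.ravel()]), T.ravel(),
#                    np.column_stack([lat1, lon1]))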
There is a nice inverse-distance example by Roger Veciana i Rovira, along with some code using GDAL to write to GeoTIFF, if you're into that.
This is of course interpolation to a regular grid, but it applies provided you first project the data to a pixel grid with pyproj or something similar, all the while being careful about which projection your data uses.
A copy of his algorithm and example script:
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x, y, power, smoothing, xv, yv, values):
    numerator = 0
    denominator = 0
    for i in range(0, len(values)):
        dist = sqrt((x - xv[i])*(x - xv[i]) + (y - yv[i])*(y - yv[i]) + smoothing*smoothing)
        # If the point is really close to one of the data points,
        # return the data point value to avoid singularities.
        if dist < 0.0000000001:
            return values[i]
        numerator = numerator + (values[i] / pow(dist, power))
        denominator = denominator + (1 / pow(dist, power))
    # Return NODATA if the denominator is zero.
    if denominator > 0:
        value = numerator / denominator
    else:
        value = -9999
    return value

def invDist(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
    valuesGrid = np.zeros((ysize, xsize))
    for x in range(0, xsize):
        for y in range(0, ysize):
            valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)
    return valuesGrid

if __name__ == "__main__":
    power = 1
    smoothing = 20

    # Creating some data, with each coordinate and the values stored in separate lists
    xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
    yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
    values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]

    # Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)

    # Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv, yv, values, 100, 100, power, smoothing)

    # Plotting the result
    n = plt.Normalize(0.0, 100.0)  # plt.normalize was removed in newer matplotlib
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There are a bunch of options here; which one is best will depend on your data... However, I don't know of an out-of-the-box solution for you.
You say your input data comes from tripolar data. There are three main cases for how this data could be structured:
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT, LON data.
3. Unstructured data in tripolar space, projected into 2d LAT, LON data.
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that map from tripolar space to cover your sample point. (You can use a BSP or grid-type structure to speed up this search.) Pick one of the cells and interpolate inside it.
Finally, there's a heap of unstructured interpolation options... but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay-triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
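For the Delaunay option mentioned above, scipy ships it ready-made; a sketch, assuming LAT, LON, T are the question's gridded arrays and lat1, lon1 the target points:

import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Flatten the (possibly irregular) grid into scattered known points.
known_points = np.column_stack([LON.ravel(), LAT.ravel()])
interp = LinearNDInterpolator(known_points, T.ravel())  # Delaunay-based

# Evaluate at the ~10,000 target positions.
t_at_targets = interp(np.column_stack([lon1, lat1]))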
I suggest taking a look at the interpolation features of GRASS, an open-source GIS package (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in Python, but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap? (OpenGL will do simple interpolation of colours for you with the right options configured, and you could render the data as triangles, which should be fairly fast.) You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR which is very suitable for this problem. With a few modifications you can make it usable in Python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.