I'm using Python 3.7. I have a set of points, and I generate a Delaunay triangulation through them:
import numpy as np
points = np.array([[0, 0], [0, 1.1], [1, 0], [1, 1], [1.5, 0.6], [1.2, 0.5], [1.7, 0.9], [1.1, 0.1]])
from scipy.spatial import Delaunay
tri = Delaunay(points)
How can I remove edges whose length exceeds some threshold, and how can I plot the new triangulation after those edges are removed?
My idea is to build an index of edges from the points, something like:
[e1, (0,0),(1,0),e1_length], [e2, (0,0),(1,1),e2_length], ...
We need three operations: convert the triangles from the Delaunay object into a set of edges (removing duplicates), calculate the length of each edge, and select the edges that meet the criterion.
Creating the set of edges and calculating their lengths:
def less_first(a, b):
    return [a, b] if a < b else [b, a]

def delaunay2edges(tri):
    list_of_edges = []
    for triangle in tri.simplices:
        for e1, e2 in [[0, 1], [1, 2], [2, 0]]:  # for all edges of the triangle
            list_of_edges.append(less_first(triangle[e1], triangle[e2]))  # always lesser index first
    array_of_edges = np.unique(list_of_edges, axis=0)  # remove duplicates
    list_of_lengths = []
    for p1, p2 in array_of_edges:
        x1, y1 = tri.points[p1]
        x2, y2 = tri.points[p2]
        list_of_lengths.append((x1 - x2)**2 + (y1 - y2)**2)
    array_of_lengths = np.sqrt(np.array(list_of_lengths))
    return array_of_edges, array_of_lengths

edges, lengths = delaunay2edges(tri)
Selecting edges by criterion (length > 0.5 for example):
criterion = np.argwhere(lengths > 0.5).flatten()
selected_edges = edges[criterion]
print('Removed', len(edges) - len(selected_edges), 'edges')
Plotting:
import matplotlib.pyplot as plt
plt.triplot(tri.points[:, 0], tri.points[:, 1], tri.simplices, color='red')
for p1, p2 in selected_edges:
    x1, y1 = points[p1]
    x2, y2 = points[p2]
    plt.plot([x1, x2], [y1, y2], color='blue')
plt.scatter(points[:, 0], points[:, 1])
plt.show()
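A possible alternative, not part of the answer above: if it is acceptable to drop whole triangles rather than individual edges, matplotlib can mask every triangle whose longest edge exceeds the threshold via matplotlib.tri.Triangulation.set_mask. A minimal sketch, reusing tri and the 0.5 threshold from above:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as mtri

triangles = tri.simplices
pts = tri.points[triangles]             # shape (ntri, 3, 2): the corners of each triangle
side = np.roll(pts, -1, axis=1) - pts   # the three edge vectors of each triangle
longest = np.sqrt((side**2).sum(axis=2)).max(axis=1)  # longest edge per triangle

triang = mtri.Triangulation(tri.points[:, 0], tri.points[:, 1], triangles)
triang.set_mask(longest > 0.5)          # hide triangles with any edge longer than 0.5
plt.triplot(triang, color='blue')
plt.scatter(tri.points[:, 0], tri.points[:, 1])
plt.show()
Note this removes more than the edge-filtering approach does, since a long edge takes its whole triangle with it.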
For a mixture of Gaussian distributions whose parameters we set ourselves (as in the code below), how do we figure out which component a new sample most likely belongs to?
I understand MATLAB has functions that compute this directly; is there anything similar in Python? I haven't found an answer so far.
import matplotlib.pyplot as plt
import numpy as np
import random

# Bivariate example
dim = 2

# Settings
n = 500
NumberOfMixtures = 3

# Mixture weights (non-negative, sum to 1)
w = [0.5, 0.25, 0.25]

# Mean vectors and covariance matrices
MeanVectors = [[0, 0], [-5, 5], [5, 5]]
CovarianceMatrices = [[[1, 0], [0, 1]], [[1, .8], [.8, 1]], [[1, -.8], [-.8, 1]]]

# Initialize arrays
samples = np.empty((n, dim)); samples[:] = np.nan
componentlist = np.empty((n, 1)); componentlist[:] = np.nan

# Generate samples
for i in range(n):
    # Select a mixture component with probability according to the mixture weights
    DrawComponent = random.choices(range(NumberOfMixtures), weights=w, k=1)[0]
    # Draw a sample from the selected mixture component
    DrawSample = np.random.multivariate_normal(MeanVectors[DrawComponent], CovarianceMatrices[DrawComponent], 1)
    # Store results
    componentlist[i] = DrawComponent
    samples[i, :] = DrawSample

# Report fractions
print('Fraction of mixture component 0:', np.sum(componentlist == 0) / n)
print('Fraction of mixture component 1:', np.sum(componentlist == 1) / n)
print('Fraction of mixture component 2:', np.sum(componentlist == 2) / n)

# Visualize result
plt.plot(samples[:, 0], samples[:, 1], '.', alpha=0.5)
plt.grid()
plt.show()
The problem has been solved; the answer is in this link:
https://stackoverflow.com/questions/42971126/multivariate-gaussian-distribution-scipy
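For completeness, a minimal sketch of the idea behind the linked answer: with the mixture parameters known (w, MeanVectors, CovarianceMatrices from the code above), the posterior probability of each component given a new sample follows from Bayes' rule, using scipy.stats.multivariate_normal; x_new is a made-up sample, not from the question.
import numpy as np
from scipy.stats import multivariate_normal

def component_posteriors(x, weights, means, covs):
    # Likelihood of x under each component density.
    likelihoods = np.array([multivariate_normal.pdf(x, mean=m, cov=c)
                            for m, c in zip(means, covs)])
    joint = np.asarray(weights) * likelihoods  # prior * likelihood
    return joint / joint.sum()                 # normalize over components

x_new = [4.5, 4.8]  # hypothetical new sample
post = component_posteriors(x_new, w, MeanVectors, CovarianceMatrices)
print('Posterior per component:', post, '-> most likely:', np.argmax(post))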
I am trying to sort the vertices of a polygon in either clockwise or anti-clockwise order. My approach is to compute an average point [x_avg, y_avg] inside the polygon and calculate the angle of each vertex as seen from that average point, but my code gives wrong angles. I am using the formula atan((m1-m2)/(1+m1*m2)) to calculate the relative angle between the average point and each vertex. Could you tell me what is wrong with the code, or what algorithm I could use to obtain the ordered vertices? Here is the code:
import math

def rounding_polygon(polygon):
    x, y = zip(*polygon)
    print(x, y)
    x_avg = sum(x) / len(x)
    y_avg = sum(y) / len(y)
    angles = []
    print('x_a = ', x_avg, 'ya =', y_avg)
    x1, y1 = polygon[0][0], polygon[0][1]
    m_com = (y_avg - y1) / (x_avg - x1)
    for v in polygon:
        x2, y2 = v[0], v[1]
        m_curr = (y_avg - y2) / (x_avg - x2)
        slope = (m_com - m_curr) / (1 + (m_com * m_curr))
        curr_angle = math.degrees(math.atan(slope))
        angles.append([curr_angle, v])
    angles = sorted(angles)
    vertices = [x[1] for x in angles]
    print('angles = ', angles)
    print('vertices = ', vertices)
    return vertices

polygon = [[1, 5], [4, 1], [7, 8], [7, 1], [1.8, 5.4]]
vertices = rounding_polygon(polygon)
print(vertices)
The atan function gives results in a limited range (half a circle). To get the full angle range -π..π, you should use the atan2 function, which takes two arguments: the y-difference and the x-difference.
This example uses the lowest point (the leftmost one if two points share the same y) as the base for sorting.
import math
polygon = [[1, 5], [4, 1], [7, 8], [7, 1], [1.8, 5.4]]
lowest = min(polygon, key = lambda x: (x[1], x[0]))
vertices = sorted(polygon, key=lambda x: math.atan2(x[1]-lowest[1], x[0]-lowest[0]) + 2 * math.pi)
print(vertices)
>>[[4, 1], [7, 1], [7, 8], [1.8, 5.4], [1, 5]]
Why is the lowest point chosen? To avoid non-transitivity when comparing point direction angles (where A > B and B > C but C > A).
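For comparison, here is a minimal sketch (not from the answer above) of the asker's average-point idea, fixed by replacing the slope-difference formula with atan2 on the coordinate differences. Note that ordering around an interior point is only guaranteed for polygons that are star-shaped with respect to that point (convex polygons in particular):
import math

def sort_vertices_ccw(polygon):
    # Sort vertices by their angle around the average point, measured with atan2.
    cx = sum(p[0] for p in polygon) / len(polygon)
    cy = sum(p[1] for p in polygon) / len(polygon)
    return sorted(polygon, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

polygon = [[1, 5], [4, 1], [7, 8], [7, 1], [1.8, 5.4]]
print(sort_vertices_ccw(polygon))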
I have two copies of the same molecule as .xyz files, meaning each atom has X, Y and Z coordinates. However, one copy can be rotated, giving different coordinates for each atom even though the relative positions and the molecule itself are unchanged. I want to align the two molecules using three atoms as reference points, but I am struggling to align them completely.
Firstly, I align both molecules by translating them so that a single atom coincides. Then I apply two subsequent rotations using rotation matrices, as explained elsewhere. For some reason, I need to take the negative of the cross product of both vectors and use a sine instead of a cosine to get both structures perfectly aligned (I discovered this after a lot of trial and error).
For the second rotation, I project the two vectors I want to align onto a plane defined by the rotation vector. This is necessary because I don't want to rotate around the cross product of the two vectors to align, since that would misalign the rest of the molecule; instead, I rotate around the already aligned vectors. The projection lets me find the in-plane angle between the two vectors, and thus the rotation needed.
However, this code does not properly align the two molecules.
"group1[0]" contains the XYZ coordinates of the three atoms to align in a list. Likewise for "group2[0]" and the structure 2.
import numpy as np

#Point 1: align the functional groups to the origin
O1 = np.array(coords1[group1[0][0]])
O2 = np.array(coords2[group2[0][0]])
mat_2 = np.zeros((len(atoms2), 3))
for ind, c in enumerate(coords1):
    coords1[ind] = np.array(c) - O1
for ind, c in enumerate(coords2):
    coords2[ind] = np.array(c) - O2
    mat_2[ind] = coords2[ind]

#Point 2: align according to a first vector
v1 = np.array(coords1[group1[0][1]])  # since atom 1 is the origin, the coordinates are the vector already
v2 = np.array(coords2[group2[0][1]])  # since atom 1 is the origin, the coordinates are the vector already
v1 = v1 / np.linalg.norm(v1)
v2 = v2 / np.linalg.norm(v2)

#Let v be the axis of rotation
v = -np.cross(v1, v2)  # why do I need a minus here?
if np.linalg.norm(v) != 0:
    a = np.arccos(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    #c = np.dot(v1, v2)*np.cos(a)
    c = np.dot(v1, v2) * np.sin(a)  # the internet says cos, but this works perfectly
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    rot_mat = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) + vx + vx.dot(vx) * (1 - c) / (1 - c**2)
    mat_2 = np.array(mat_2)
    R_mat_rot = np.matmul(rot_mat, mat_2.T).T
else:
    exit(0)
coords3 = R_mat_rot.copy()
#I get exactly what I want up until here

#Point 3: rotate around atom2-atom1 (v1) to align the third atom
v = -v1.copy()
v2 = np.array(coords3[group2[0][2]]) - np.array(coords3[group2[0][0]])  # since atom 1 is the origin, the coordinates are the vector already
v2 = v2 / np.linalg.norm(v2)
v1 = np.array(coords1[group1[0][2]]) - np.array(coords1[group1[0][0]])
v1 = v1 / np.linalg.norm(v1)
if np.linalg.norm(v) != 0:
    #consider v to be the vector normal to a plane
    #we want the projection of v1 and v2 onto that plane
    vp1 = np.cross(v, np.cross(v1, v)) - np.array(coords1[group1[0][0]])
    vp1 = vp1 / np.linalg.norm(vp1)
    vp2 = np.cross(v, np.cross(v2, v)) - np.array(coords3[group2[0][0]])
    vp2 = vp2 / np.linalg.norm(vp2)
    #we find the angle between those vectors on the plane
    a = np.arccos(np.dot(vp1, vp2)) / (np.linalg.norm(vp1) * np.linalg.norm(vp2))
    #rotation of that amount
    c = np.dot(v1, v2) * np.cos(a)
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    rot_mat = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) + vx + np.dot(vx, vx) * (1 - c) / (1 - c**2)
    R_mat_rot = np.matmul(rot_mat, coords3.T).T
coords4 = R_mat_rot.copy()  # final coordinates
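For reference, a standard way to solve this kind of alignment in one step is the Kabsch algorithm: the least-squares rotation obtained from the SVD of the covariance of the matched reference atoms. This is a minimal sketch, not the asker's method; ref1 and ref2 are assumed to be 3x3 arrays holding the coordinates of the three reference atoms, and coords2_all the full N x 3 coordinate array of molecule 2.
import numpy as np

def kabsch(P, Q):
    """Rotation R and translation t that map points P onto points Q (least squares)."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)               # covariance of the centered points
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Qc - R @ Pc
    return R, t

# Hypothetical usage: align molecule 2 onto molecule 1 via the three reference atoms.
# R, t = kabsch(ref2, ref1)
# aligned2 = coords2_all @ R.T + t
This avoids the axis and angle bookkeeping entirely, since the rotation is fitted to all three reference atoms at once.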
Is there any fast way to merge two numpy histograms with different bin ranges and bin numbers?
For example:
x = [1,2,2,3]
y = [4,5,5,6]
a = np.histogram(x, bins=10)
# a[0] = [1, 0, 0, 0, 0, 2, 0, 0, 0, 1]
# a[1] = [ 1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. ]
b = np.histogram(y, bins=5)
# b[0] = [1, 0, 2, 0, 1]
# b[1] = [ 4. , 4.4, 4.8, 5.2, 5.6, 6. ]
Now I want to have some function like this:
def merge(a, b):
    # some actions here #
    return merged_a_b_values, merged_a_b_bins
Actually I don't have x and y; only a and b are known.
But the result of merge(a, b) must equal np.histogram(x + y, bins=10):
m = merge(a, b)
# m[0] = [1, 0, 2, 0, 1, 0, 1, 0, 2, 1]
# m[1] = [ 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ]
I'd actually have added a comment to dangom's answer, but I lack the reputation required.
I'm a little confused by your example. You're plotting the histogram of the histogram bins if I'm not mistaken. It should rather be this, right?
plt.figure()
plt.plot(a[1][:-1], a[0], marker='.', label='a')
plt.plot(b[1][:-1], b[0], marker='.', label='b')
plt.plot(c[1][:-1], c[0], marker='.', label='c')
plt.legend()
plt.show()
Also a note on your suggestion for combining the histograms: you are of course right that there is no unique solution, since you simply don't know where the samples would have been within the finer grid you use for the combination. When the two histograms have significantly different bin widths, the suggested merging function may produce a sparse and artificial-looking histogram.
I tried combining the histograms by interpolation instead (assuming the samples within each bin were distributed uniformly across the original bin, which is of course also only an assumption).
This leads, however, to a more natural-looking result, at least for data sampled from the distributions I typically encounter.
import numpy as np

def merge_hist(a, b):
    edgesa = a[1]
    edgesb = b[1]
    da = edgesa[1] - edgesa[0]
    db = edgesb[1] - edgesb[0]
    dint = np.min([da, db])  # use the finer of the two bin widths
    mn = np.min(np.hstack([edgesa, edgesb]))
    mx = np.max(np.hstack([edgesa, edgesb]))
    edgesc = np.arange(mn, mx, dint)  # common grid spanning both histograms

    def interpolate_hist(edgesint, edges, hist):
        # Interpolate the cumulative counts, then differentiate back to bin counts.
        cumhist = np.hstack([0, np.cumsum(hist)])
        cumhistint = np.interp(edgesint, edges, cumhist)
        histint = np.diff(cumhistint)
        return histint

    histaint = interpolate_hist(edgesc, edgesa, a[0])
    histbint = interpolate_hist(edgesc, edgesb, b[0])
    c = histaint + histbint
    return c, edgesc
An example for two Gaussian distributions:
import numpy as np
import matplotlib.pyplot as plt

a = 5 + 1 * np.random.randn(100)
b = 10 + 2 * np.random.randn(100)
hista, edgesa = np.histogram(a, bins=10)
histb, edgesb = np.histogram(b, bins=5)
histc, edgesc = merge_hist([hista, edgesa], [histb, edgesb])
plt.figure()
width = edgesa[1]-edgesa[0]
plt.bar(edgesa[:-1], hista, width=width)
width = edgesb[1]-edgesb[0]
plt.bar(edgesb[:-1], histb, width=width)
plt.figure()
width = edgesc[1]-edgesc[0]
plt.bar(edgesc[:-1], histc, width=width)
plt.show()
I, however, am no statistician, so please let me know if the suggested approach is viable.
There is no unique solution to the problem of merging two different histograms. I propose here a simple and quick solution based on two design assumptions needed to deal with the loss of information inherent in binning sequences:
1. Recovered values are represented by the start of the bin they belong to.
2. The merge shall keep the highest bin resolution, to avoid further loss of information, and shall completely encompass the intervals of the child histograms.
Here's the code:
import numpy as np

def merge(a, b):
    def extract_vals(hist):
        # Recover values based on assumption 1.
        values = [[y] * x for x, y in zip(hist[0], hist[1])]
        # Return flattened list.
        return [z for s in values for z in s]

    def extract_bin_resolution(hist):
        return hist[1][1] - hist[1][0]

    def generate_num_bins(minval, maxval, bin_resolution):
        # Generate the number of bins necessary to satisfy assumption 2.
        return int(np.ceil((maxval - minval) / bin_resolution))

    vals = extract_vals(a) + extract_vals(b)
    bin_resolution = min(map(extract_bin_resolution, [a, b]))
    num_bins = generate_num_bins(min(vals), max(vals), bin_resolution)
    return np.histogram(vals, bins=num_bins)
Here's the example code:
import matplotlib.pyplot as plt
x = [1,2,2,3]
y = [4,5,5,6]
a = np.histogram(x, bins=10)
# a[0] = [1, 0, 0, 0, 0, 2, 0, 0, 0, 1]
# a[1] = [ 1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. ]
b = np.histogram(y, bins=5)
# b[0] = [1, 0, 2, 0, 1]
# b[1] = [ 4. , 4.4, 4.8, 5.2, 5.6, 6. ]
# Merge and plot results
c = merge(a, b)
c_num_bins = c[1].size - 1
plt.hist(a[0], bins=5, label='a')
plt.hist(b[0], bins=10, label='b')
plt.hist(c[0], bins=c_num_bins, label='c')
plt.legend()
plt.show()
I have a large list of x and y coordinates, stored in a numpy array.
Coordinates = [[ 60037633 289492298]
[ 60782468 289401668]
[ 60057234 289419794]]
...
...
What I want is to find all nearest neighbors within a specific distance (let's say 3 meters) and store the result so that I can later do some further analysis on it.
For most packages I found, it is necessary to decide how many NNs should be found, but I just want all neighbors within the set distance.
How can I achieve this, and what is the fastest way to do so for a large dataset (some millions of points)?
You could use a scipy.spatial.cKDTree:
import numpy as np
import scipy.spatial as spatial
points = np.array([(1, 2), (3, 4), (4, 5)])
point_tree = spatial.cKDTree(points)
# This finds the index of all points within distance 1 of [1.5,2.5].
print(point_tree.query_ball_point([1.5, 2.5], 1))
# [0]
# This gives the point in the KDTree which is within 1 unit of [1.5, 2.5]
print(point_tree.data[point_tree.query_ball_point([1.5, 2.5], 1)])
# [[1 2]]
# More than one point is within 3 units of [1.5, 1.6].
print(point_tree.data[point_tree.query_ball_point([1.5, 1.6], 3)])
# [[1 2]
# [3 4]]
Here is an example showing how you can find all the nearest neighbors to an array of points with one call to point_tree.query_ball_point:
import numpy as np
import scipy.spatial as spatial
import matplotlib.pyplot as plt

np.random.seed(2015)

centers = [(1, 2), (3, 4), (4, 5)]
points = np.concatenate([pt + np.random.random((10, 2)) * 0.5
                         for pt in centers])

point_tree = spatial.cKDTree(points)

cmap = plt.get_cmap('copper')
colors = cmap(np.linspace(0, 1, len(centers)))
for center, group, color in zip(centers, point_tree.query_ball_point(centers, 0.5), colors):
    cluster = point_tree.data[group]
    x, y = cluster[:, 0], cluster[:, 1]
    plt.scatter(x, y, c=color, s=200)
plt.show()
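For the "some millions of points" case in the question, where you want the neighbors of every point rather than of a few query centers, a sketch (under the assumption that index pairs or per-point neighbor lists are what you want to store) that queries the tree against itself:
import numpy as np
import scipy.spatial as spatial

points = np.random.random((1000, 2)) * 100  # stand-in for the real coordinate array
tree = spatial.cKDTree(points)

# All unordered index pairs (i, j) whose points lie within 3 units of each other.
pairs = tree.query_pairs(3.0)

# Alternatively: neighbors[i] is the list of indices within 3 units of points[i]
# (each point's own index is included).
neighbors = tree.query_ball_tree(tree, 3.0)

print(len(pairs), 'pairs;', len(neighbors[0]), 'neighbors of point 0')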