I'd like to calculate the mean square displacement at all times of a 1d trajectory. I'm a novice programmer, so I've attempted it from a simulated random walk.
import numpy as np
import matplotlib.pyplot as plt

# generate a 1D random walk
path = np.zeros(60)
position = 0
for i in range(1, path.size):
    move = np.random.randint(0, 2)
    if move == 0:
        position -= 1
    else:
        position += 1
    path[i] = position

# returns a vector of MSDs from a given path
def calcMSD(data):
    msd = np.zeros(data.size - 1)
    for i in range(1, data.size):
        squareddisplacements = (data[i:] - data[:data.size - i])**2
        msd[i - 1] = squareddisplacements.mean()
    return msd

msd = calcMSD(path)
plt.plot(msd)
plt.show()
Have I implemented this correctly? Any and all advice is appreciated, thanks.
Mean Squared Displacement
Definition of mean squared displacement
To make sure we agree on the definition: the mean squared displacement (MSD) is the mean of the squared distance from the origin of a collection of particles (in our case, walkers) at a specific step.
Since we are in 1D, the distance is simply the absolute position: distance at time t = |path[t]|.
To get the MSD at step t, we take all the squared positions at step t, then take the mean of the resulting array.
That means: MSD at step t == (path[:, t]**2).mean() (the element-wise squaring makes taking absolute values unnecessary).
Here path[:, t] means: the position of each walker at step t.
Generating paths
We must edit your code a bit to have several independent paths.
walkers = 50  # example for 50 walkers
path = []
steps = 60
for walker in range(walkers):
    walker_path = [0]
    position = 0
    for i in range(1, steps):
        move = np.random.randint(0, 2)
        if move == 0:
            position -= 1
        else:
            position += 1
        walker_path.append(position)
    path.append(walker_path)
path = np.array(path)
As a side note, you could also use np.random.choice to pick -1 or 1 directly, like so:
for i in range(1, steps):
    position += np.random.choice([-1, 1])
    walker_path.append(position)
From now on, path is a 2D array: one row per walker.
Getting MSD at each step
To get the MSD at every step, we vary t from 1 to the number of steps.
You can do it with a list comprehension in Python (getting rid of your calcMSD function):
msd = [(path[:, i]**2).mean() for i in range(1, steps)]
# you could also put range(1, len(path[0]))
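If you prefer to avoid the Python loop entirely, the same quantity can be computed in a single vectorised expression (a sketch, equivalent to the comprehension above):
    msd = (path[:, 1:]**2).mean(axis=0)  # mean over walkers, one value per step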
Result
Your function was not adapted to several walkers.
The proposed solution will give you something like this:
What you want
I hope I understood what you truly wanted. I don't know what this quantity is called, but it can also be calculated in a list comprehension. Considering the path of a single walker:
not_msd = [(np.array([path[start + sep] - path[start] for start in range(steps - sep)])**2).mean() for sep in range(1, steps)]
For each separation sep, we take the difference between the positions at start and start + sep, i.e. two positions that are sep steps apart, repeating this for every start from 0 to the end of the list. Then we square the obtained list and take the mean.
We do that for every separation size: from 1 to the whole length.
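As a side note, the inner comprehension can also be vectorised with slicing, which computes the same quantity (a sketch, assuming path here is the 1D numpy array of a single walker):
    not_msd = [((path[sep:] - path[:-sep])**2).mean() for sep in range(1, steps)]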
Result:
Related
I have some random test data in a 2D array of shape (500,2) as such:
xy = np.random.randint(low=0, high=1000, size=[500, 2])
From this array, I first select 10 random samples. To select the 11th sample, I would like to pick the sample that is furthest away from the original 10 selected samples collectively; I am using the Euclidean distance to do this. I need to keep doing this until a certain number have been picked. Here is my attempt at doing this.
# Function to get the distance between samples
def get_dist(a, b):
    return np.sqrt(np.sum(np.square(a - b)))

# Set up variables and empty lists for the selected samples and starting samples
n_xy_to_select = 120
selected_xy = []
starting = []

# This selects 10 random samples and appends them to selected_xy
for i in range(10):
    idx = np.random.randint(len(xy))
    starting_10 = xy[idx, :]
    selected_xy.append(starting_10)
    starting.append(starting_10)
    xy = np.delete(xy, idx, axis=0)
starting = np.asarray(starting)
# This performs the selection based on the distances
for i in range(n_xy_to_select - 1):
    # Set up an empty array dists
    dists = np.zeros(len(xy))
    for selected_xy_ in selected_xy:
        # Get the distance between each already selected sample, and every other unselected sample
        dists_ = np.array([get_dist(selected_xy_, xy_) for xy_ in xy])
        # Apply some kind of penalty function - this is the key
        dists_[dists_ < 90] -= 25000
        # Sum dists_ onto dists
        dists += dists_
    # Select the largest one
    dist_max_idx = np.argmax(dists)
    selected_xy.append(xy[dist_max_idx])
    xy = np.delete(xy, dist_max_idx, axis=0)
The key to this is the penalty function on this line:
dists_[dists_ < 90] -= 25000
This penalty function exists to prevent the code from just picking a ring of samples at the edge of the space, by artificially shortening distances between samples that are close together.
However, this eventually breaks down and the selection starts clustering, as shown in the image. You can clearly see that there are much better selections the code could make before any kind of clustering is necessary. I feel that a kind of decaying exponential function would be best for this, but I do not know how to implement it.
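Something like this is what I have in mind, as a rough sketch with made-up constants, in place of the hard threshold:
    dists_ -= 25000 * np.exp(-dists_ / 90)  # heavy penalty for close pairs, decaying smoothly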
So my question is: how would I change the current penalty function to get what I'm looking for?
From your question, I understand that what you are looking for are periodic boundary conditions (PBC), meaning that a point at the left edge of your space is just next to one at the right edge. Thus, the maximal distance you can get along one axis is given by half of the box (i.e. between the edge and the center).
To take the PBC into account, you compute the difference on each axis and wrap it around whenever it exceeds half of the box.
For example, if you have a point with x1 = 100 and a second one with x2 = 900, using the PBC they are 200 units apart: 1000 - |x1 - x2|. In the general case, given two coordinates and the box size, you end up with: delta = min(|x1 - x2|, box_size - |x1 - x2|).
In your case this becomes, element-wise on an array of differences:
delta_x[delta_x > 500] = 1000 - delta_x[delta_x > 500]
To wrap it up, I rewrote your code using a new distance function (note that I removed some unnecessary for loops):
import numpy as np

def distance(p, arr, box_size=1000):
    half = box_size / 2
    delta_x = np.abs(p[0] - arr[:, 0])
    delta_y = np.abs(p[1] - arr[:, 1])
    # wrap around where the straight difference exceeds half the box
    delta_x[delta_x > half] = box_size - delta_x[delta_x > half]
    delta_y[delta_y > half] = box_size - delta_y[delta_y > half]
    return np.sqrt(delta_x**2 + delta_y**2)
xy = np.random.randint(low=0, high=1000, size=[500, 2])
idx = np.random.randint(500, size=10)
selected_xy = list(xy[idx])
_initial_selected = xy[idx]
xy = np.delete(xy, idx, axis=0)
n_xy_to_select = 120

for i in range(n_xy_to_select - 1):
    # Set up an empty array dists
    dists = np.zeros(len(xy))
    for selected_xy_ in selected_xy:
        # Compute the distance taking into account the PBC
        dists_ = distance(selected_xy_, xy)
        dists += dists_
    # Select the largest one
    dist_max_idx = np.argmax(dists)
    selected_xy.append(xy[dist_max_idx])
    xy = np.delete(xy, dist_max_idx, axis=0)
And indeed it creates clusters; this is normal, as you will tend to create clusters of points that are at the maximal distance from each other. More than that, due to the boundary conditions, the maximal distance between 2 points along one axis is 500. The maximal distance between two clusters is thus also 500! And as you can see in the image, that is the case.
Moreover, picking more points will start to draw lines connecting the different clusters, starting from the central one, as you can see here:
What I was looking for is called 'furthest point sampling'. I have done some more research into the solution, and the Python code used to perform it can be found here: https://minibatchai.com/ai/2021/08/07/FPS.html
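For reference, here is a minimal numpy sketch of the idea (my own condensed version of the usual FPS loop, not the code from that link):

import numpy as np

def furthest_point_sampling(xy, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(xy)))]  # start from one random point
    min_dists = np.full(len(xy), np.inf)
    for _ in range(n_samples - 1):
        # distance of every point to the most recently selected point
        d = np.linalg.norm(xy - xy[selected[-1]], axis=1)
        # track each point's distance to its nearest selected point
        min_dists = np.minimum(min_dists, d)
        # pick the point furthest from the whole selected set
        selected.append(int(np.argmax(min_dists)))
    return xy[selected]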
Given a 10x10 grid (a 2D array) filled randomly with the numbers 0, 1, or 2, how can I find the Euclidean distance (the l2-norm of the distance vector) between two given points, considering periodic boundaries?
Let us consider an arbitrary grid point called centre. Now, I want to find the nearest grid point containing the same value as centre. I need to take periodic boundaries into account, such that the matrix/grid can be seen as a torus rather than a flat plane. In that case, say the centre = matrix[0,2], and we find that the same number is in matrix[9,2], which is at the southern boundary of the matrix. The Euclidean distance computed with my code would be, for this example, np.sqrt(0**2 + 9**2) = 9.0. However, because of periodic boundaries, the distance should actually be 1, because matrix[9,2] is the northern neighbour of matrix[0,2]. Hence, if periodic boundary values are implemented correctly, distances of magnitude above 8 should not exist.
So, I would be interested in how to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus, applying a wrap-around at the boundaries.
import numpy as np

matrix = np.random.randint(0, 3, (10, 10))
centre = matrix[0, 2]

# rewrite the centre to be the number 5 (to exclude itself as shortest distance)
matrix[0, 2] = 5

# find the points where entries are the same as the centre
same = np.where(matrix == centre)
idx_row, idx_col = same

# find distances from the centre to all cells of the same value
dist = np.zeros(len(same[0]))
for i in range(0, len(same[0])):
    delta_row = same[0][i] - 0  # row coord of centre
    delta_col = same[1][i] - 2  # col coord of centre
    dist[i] = np.sqrt(delta_row**2 + delta_col**2)

# retrieve the index of the smallest distance
idx = dist.argmin()
print('Centre value: %i. The nearest cell with same value is at (%i,%i)'
      % (centre, same[0][idx], same[1][idx]))
For each axis, you can check whether the distance is shorter when you wrap around or when you don't. Consider the row axis, with rows i and j.
When not wrapping around, the difference is abs(i - j).
When wrapping around, the difference is "flipped", as in 10 - abs(i - j). In your example with i == 0 and j == 9 you can check that this correctly produces a distance of 1.
Then simply take whichever is smaller:
delta_row = abs(same[0][i] - 0)  # row coord of centre
delta_row = min(delta_row, 10 - delta_row)
And similarly for delta_col.
The final dist[i] calculation needs no changes.
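Putting the whole thing together, a small helper might look like this (a sketch; grid_size is the side length of the grid):

def torus_distance(p, q, grid_size=10):
    # per-axis difference with wrap-around (take the shorter way round)
    delta_row = abs(p[0] - q[0])
    delta_col = abs(p[1] - q[1])
    delta_row = min(delta_row, grid_size - delta_row)
    delta_col = min(delta_col, grid_size - delta_col)
    return (delta_row**2 + delta_col**2)**0.5

For the example above, torus_distance((0, 2), (9, 2)) gives 1.0 as expected.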
I have a working 'sketch' of how this could work. In short, I calculate the distance 9 times: once for the normal distance, and 8 more times with shifted copies, to possibly correct for a closer 'torus' distance.
As n gets larger, the calculation cost can go sky high. But the torus effect is probably not needed, as there is usually a point nearby without wrap-around.
You can easily test this, because for a grid of size 1, if a point is found at distance 1/2 or closer, you know there is no closer torus point (right?)
import numpy as np
n=10000
np.random.seed(1)
A = np.random.randint(low=0, high=10, size=(n,n))
I create a 10000x10000 grid of random digits, and store the locations of the 1's in ONES.
ONES = np.argwhere(A == 1)
Now I define my torus distance, which tests which of the 9 mirror images is the closest.
from sklearn.neighbors import BallTree

def distance_on_torus(point=[500, 500]):
    # the 8 mirror images: shifts along the row axis, the column axis, and both
    index_diff = [[1], [1], [0], [0], [0, 1], [0, 1], [0, 1], [0, 1]]
    coord_diff = [[-1], [1], [-1], [1], [-1, -1], [-1, 1], [1, -1], [1, 1]]
    tree = BallTree(ONES, leaf_size=5*n, metric='euclidean')
    dist, indi = tree.query([point], k=1, return_distance=True)
    distances = [dist[0]]
    for indici_to_shift, coord_direction in zip(index_diff, coord_diff):
        MIRROR = ONES.copy()
        for i, shift in zip(indici_to_shift, coord_direction):
            MIRROR[:, i] = MIRROR[:, i] + (shift * n)
        tree = BallTree(MIRROR, leaf_size=5*n, metric='euclidean')
        dist, indi = tree.query([point], k=1, return_distance=True)
        distances.append(dist[0])
    return np.min(distances)
%%time
distance_on_torus([2,3])
It is slow: the above takes about 15 minutes. For n = 1000 it takes less than a second.
An optimisation would be to first consider the non-torus distance and, only if the minimum found might not be the smallest, calculate with the minimal set of extra mirror 'blocks' around. This would greatly increase speed.
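A sketch of that shortcut, reusing the names from the code above (hypothetical, untested):

def distance_on_torus_fast(point):
    # plain (non-torus) nearest neighbour first
    tree = BallTree(ONES, metric='euclidean')
    d_plain = tree.query([point], k=1, return_distance=True)[0][0][0]
    # every mirrored copy lies outside the original box, so it can never be
    # closer than the nearest box edge; if the plain distance already beats
    # that, the mirrors cannot improve on it
    d_edge = min(point[0], n - point[0], point[1], n - point[1])
    if d_plain <= d_edge:
        return d_plain
    return distance_on_torus(point)  # fall back to checking all mirrors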
I have this distribution of points (allPoints, which is a list of lists: [[x1,y1], [x2,y2], [x3,y3], [x4,y4], ..., [xn,yn]]), from which I'd like to select points at random.
In Python I would do something like:
from random import *
point = choice(allPoints)
Except, I need the random pick not to be biased by the existing density. For instance, here, choice would tend to pick a point in the upper-left part of the plot.
How can I, in Python, get rid of this bias?
I've tried dividing the space into portions of size div and then sampling within one portion, but in many cases no points exist in it at all and the while loop never finds a solution:
def column(matrix, i):
    return [row[i] for row in matrix]

div = 10
min_x, max_x = min(column(allPoints, 0)), max(column(allPoints, 0))
min_y, max_y = min(column(allPoints, 1)), max(column(allPoints, 1))
zone_x_min = randint(1, div - 1) * (max_x - min_x) / div + min_x
zone_x_max = zone_x_min + (max_x - min_x) / div
zone_y_min = randint(1, div - 1) * (max_y - min_y) / div + min_y
zone_y_max = zone_y_min + (max_y - min_y) / div
p = choice(allPoints)
cont = True
while cont == True:
    if (p[0] > zone_x_min and p[0] < zone_x_max) and (p[1] > zone_y_min and p[1] < zone_y_max):
        cont = False
    else:
        p = choice(allPoints)
What would be a correct, inexpensive (if possible) solution to this problem?
If it weren't ridiculous, I think something like this would work for me, in theory:
p = [uniform(min_x, max_x), uniform(min_y, max_y)]
while p not in allPoints:
    p = [uniform(min_x, max_x), uniform(min_y, max_y)]
The question is a little ill-formed, but here's a stab.
The idea is to use a gaussian kernel density estimate, then sample from your data with weights equal to the inverse of the pdf at each point.
This is not statistically justifiable in any real sense.
import numpy as np
from scipy import stats
#random data
x = np.random.normal(size = 200)
y = np.random.normal(size = 200)
#estimate the density
kernel = stats.gaussian_kde(np.vstack([x,y]))
#calculate the inverse of pdf for each point, and normalise to sum to 1
pdf = kernel.pdf(np.vstack([x, y]))
pvector = (1 / pdf) / np.sum(1 / pdf)
#get a vector of indices based on your weights
np.random.choice(range(len(x)), size = 10, replace = True, p = pvector)
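These are indices into your data; to get the actual points, something like this (a usage sketch):
idx = np.random.choice(range(len(x)), size=10, replace=True, p=pvector)
sample = np.column_stack([x, y])[idx]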
I believe you want to randomly select a datum point from your graph. That is, one of the little black dots.
Compute a centroid, or pick a point like (1.0, 70).
Compute the distance from each point to the centroid and let that distance weight the probability of choosing that point.
That is, if distance(P, C) is 100 and distance(Q, C) is 1, then let P be 100x more likely to be chosen. All points are eligible to win, but the crowded ones are individually less likely (they make it up in volume).
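A sketch of that weighting with numpy (assuming allPoints is the list of [x, y] pairs from the question):

import numpy as np

pts = np.asarray(allPoints)
centroid = pts.mean(axis=0)              # or a hand-picked point like (1.0, 70)
d = np.linalg.norm(pts - centroid, axis=1)
weights = d / d.sum()                    # probability proportional to distance
pick = pts[np.random.choice(len(pts), p=weights)]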
If I understand your initial attempt correctly, I believe there is a simple adjustment you can make to make this work.
Randomly generate an x value (0,4.5), and a y value (0,70).
Then loop through allPoints to find the closest dot.
This has the downside that large empty areas all converge to a single point. A way to mitigate (not remove) this problem would be to give your random point a range: if no dot exists within that range, randomly generate a new point.
Assuming you want your selected points to be visually spread, I can think of at least one "efficient/easy" method, sketched after this list.
Choose a random point (with random.choice, for example);
remove from your initial set any point that is "close"*;
repeat until there is no point left in your set.
*This requires that you know from the beginning how dense you want your sample to be.
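A sketch of that loop (min_dist encodes how dense you want the sample to be):

from random import choice

def spread_sample(points, min_dist):
    remaining = list(points)
    picked = []
    while remaining:
        p = choice(remaining)
        picked.append(p)
        # drop every point within min_dist of the chosen one (including itself)
        remaining = [q for q in remaining
                     if (q[0] - p[0])**2 + (q[1] - p[1])**2 > min_dist**2]
    return picked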
I am trying to get the neighbors of a cell in an n-dimensional space, something like 8-connected or 26-connected cells, but at any dimensionality provided an n-tuple.
Neighbors that are directly adjacent are easy enough, just +1/-1 in any dimension. The part I am having difficulty with are the diagonals, where I can have any quantity of coordinates differing by 1.
I wrote a function that recurs for each sub-dimension, and generates all +/- combinations:
def point_neighbors_recursive(point):
    neighbors = []
    # 1-dimension
    if len(point) == 1:
        neighbors.append([point[0] - 1])  # left
        neighbors.append([point[0]])      # current
        neighbors.append([point[0] + 1])  # right
        return neighbors
    # n-dimensional
    for sub_dimension in point_neighbors_recursive(point[1:]):
        neighbors.append([point[0] - 1] + sub_dimension)  # left
        neighbors.append([point[0]] + sub_dimension)      # center
        neighbors.append([point[0] + 1] + sub_dimension)  # right
    return neighbors
However this returns a lot of redundant neighbors.
Are there any better solutions?
I'll bet that all you need is in the itertools package, especially the product function. What you're looking for is the Cartesian product of your current location's coordinates, each perturbed by 1 in each direction. Thus, you'll have a list of triples derived from your current point:
diag_coord = [(x-1, x, x+1) for x in point]
Now, you take the product of all those triples, recombine each set, and you have your diagonals.
Is that what you needed?
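In code, a sketch of that idea (the point itself is excluded at the end):

from itertools import product

def point_neighbors(point):
    # Cartesian product of (x-1, x, x+1) across all coordinates
    diag_coord = [(x - 1, x, x + 1) for x in point]
    return [combo for combo in product(*diag_coord) if combo != tuple(point)]

For example, point_neighbors((1, 1)) yields the 8 neighbours of (1, 1), and it works for any dimensionality.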
This question is somewhat similar to this one. I've gone a bit farther than that OP, though, and I'm in Python 2 (I'm not sure what they were using).
I have a Python function that can determine the distance from a point inside a convex polygon to regularly-defined intervals along the polygon's perimeter. The problem is that it returns "extra" distances that I need to eliminate. (Please note: I suspect this will not work for rectangles yet. I'm not finished with it.)
First, the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# t1.py
#
# Copyright 2015 FRED <fred#matthew24-25>
#
# THIS IS TESTING CODE ONLY. IT WILL BE MOVED INTO THE CORRECT MODULE
# UPON COMPLETION.
#
from __future__ import division
import math
import matplotlib.pyplot as plt

def Dist(center_point, Pairs, deg_Increment):
    # I want to set empty lists to store the values of m_lnsgmnt and b_lnsgmnts
    # for every iteration of the for loop.
    m_linesegments = []
    b_linesegments = []
    # Scream and die if Pairs[0] is the same as the last element of Pairs--i.e.
    # it has already been run once.
    #if Pairs[0] == Pairs[len(Pairs)-1]:
    #    #print "The vertices contain duplicate points!"
    ## Creates a new list containing the original list plus the first element. I did this because, due
    ## to the way the for loop is set up, the last iteration of the loop subtracts the value of the
    ## last value of Pairs from the first value. I therefore duplicated the first value.
    #elif:
    new_Pairs = Pairs + [Pairs[0]]
    # This will calculate the slopes and y-intercepts of the line segments of the polygon.
    for a in range(len(Pairs)):
        # This calculates the slope of each line segment and appends it to m_linesegments.
        m_lnsgmnt = (new_Pairs[a+1][1] - new_Pairs[a][1]) / (new_Pairs[a+1][0] - new_Pairs[a][0])
        m_linesegments.append(m_lnsgmnt)
        # This calculates the y-intercept of each line segment and appends it to b_linesegments.
        b_lnsgmnt = (Pairs[a][1]) - (m_lnsgmnt * Pairs[a][0])
        b_linesegments.append(b_lnsgmnt)
    # These are temporary testing codes.
    print "m_linesegments =", m_linesegments
    print "b_linesegments =", b_linesegments
    # I want to set empty lists to store the value of m_rys and b_rys for every
    # iteration of the for loop.
    m_rays = []
    b_rays = []
    # I need to set a range of degrees the intercepts will be calculated for.
    theta = range(0, 360, deg_Increment)
    # Temporary testing line.
    print "theta =", theta
    # Calculate the slopes and y-intercepts of the rays radiating from the center_point.
    for b in range(len(theta)):
        m_rys = math.tan(math.radians(theta[b]))
        m_rays.append(m_rys)
        b_rys = center_point[1] - (m_rys * center_point[0])
        b_rays.append(b_rys)
    # Temporary testing lines.
    print "m_rays =", m_rays
    print "b_rays =", b_rays
    # Set up empty lists for Intercepts and angles.
    Intercepts = []
    angle = []
    # Calculate the intersections of the rays with the line segments.
    for c in range((360 // deg_Increment)):
        for d in range(len(Pairs)):
            # Calculate the x-coordinate and the y-coordinate of each
            # intersection.
            x_Int = (b_rays[c] - b_linesegments[d]) / (m_linesegments[d] - m_rays[c])
            y_Int = ((m_linesegments[d] * x_Int) + b_linesegments[d])
            Intercepts.append((x_Int, y_Int))
            # Calculates the angle of the ray. Rounding is necessary to
            # compensate for binary-decimal errors.
            a_ngle = round(math.degrees(math.atan2((y_Int - center_point[1]), (x_Int - center_point[0]))))
            # Substitutes the positive equivalent for every negative angle,
            # i.e. -270 degrees equals 90 degrees.
            if a_ngle < 0:
                a_ngle = a_ngle + 360
            # Selects the angles that correspond to theta.
            if a_ngle == theta[c]:
                angle.append(a_ngle)
    print "INT1=", Intercepts
    print "angle=", angle
    dist = []
    # Calculates distances.
    for e in range(len(Intercepts) - 1):
        distA = math.sqrt(((Intercepts[e][0] - center_point[0])**2) + ((Intercepts[e][1] - center_point[1])**2))
        dist.append(distA)
    print "dist=", dist

if __name__ == "__main__":
    main()
Now, as to how it works:
The code takes 3 inputs: center_point (a point contained in the polygon, given in (x,y) coordinates), Pairs (the vertices of the polygon, also given in (x,y) coordinates), and deg_Increment (which defines how often to calculate a distance).
Let's assume that center_point = (4,5), Pairs = [(1, 4), (3, 8), (7, 2)], and deg_Increment = 20. This means that a polygon is created (sort of) whose vertices are Pairs, and center_point is a point contained inside the polygon.
Now rays are set to radiate from center_point every 20 degrees (which is deg_Increment). The intersection points of the rays with the perimeter of the polygon are determined, and the distance is calculated using the distance formula.
The only problem is that I'm getting too many distances. :( In my example above, the correct distances are
1.00000 0.85638 0.83712 0.92820 1.20455 2.07086 2.67949 2.29898 2.25083 2.50000 3.05227 2.22683 1.93669 1.91811 2.15767 2.85976 2.96279 1.40513
But my code is returning
dist= [2.5, 1.0, 6.000000000000001, 3.2523178818773006, 0.8563799085248148, 3.0522653889161626, 5.622391569468206, 0.8371216462519347, 2.226834844885431, 37.320508075688686, 0.9282032302755089, 1.9366857335569072, 7.8429970322236064, 1.2045483557883576, 1.9181147622136665, 3.753460385470896, 2.070863609380179, 2.157671808913309, 2.6794919243112276, 12.92820323027545, 2.85976265663383, 2.298981118867903, 2.962792920643178, 5.162096782237789, 2.250827351906659, 1.4051274947736863, 69.47032761621092, 2.4999999999999996, 1.0, 6.000000000000004, 3.2523178818773006, 0.8563799085248148, 3.0522653889161626, 5.622391569468206, 0.8371216462519347, 2.226834844885431, 37.32050807568848, 0.9282032302755087, 1.9366857335569074, 7.842997032223602, 1.2045483557883576, 1.9181147622136665, 3.7534603854708997, 2.0708636093801767, 2.1576718089133085, 2.679491924311227, 12.928203230275532, 2.85976265663383, 2.298981118867903, 2.9627929206431776, 5.162096782237789, 2.250827351906659, 1.4051274947736847]
If anyone can help me get only the correct distances, I'd greatly appreciate it.
Thanks!
And just for reference, here's what my example looks like with the correct distances only:
You're getting too many values in Intercepts because it is appended to inside the second for loop [for d in range(len(Pairs))].
You only want one value in Intercepts per step through the outer for loop [for c in range((360//deg_Increment))], so the append to Intercepts needs to happen once per iteration of that loop.
I'm not sure what you're doing with the inner loop, but you seem to be calculating a separate intercept for each of the lines that make up the polygon sides. But you only want the one that you're going to hit "first" when going in that direction.
You'll have to add some code to figure out which of the 3 (in this case) sides of the polygon you're actually going to encounter first.
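A sketch of that filtering, reusing the names from the question (hypothetical; per ray, it keeps only the nearest intersection lying in the ray's own direction):

for c in range((360 // deg_Increment)):
    best = None
    for d in range(len(Pairs)):
        x_Int = (b_rays[c] - b_linesegments[d]) / (m_linesegments[d] - m_rays[c])
        y_Int = (m_linesegments[d] * x_Int) + b_linesegments[d]
        # reject intersections that lie behind the centre point (wrong direction)
        a_ngle = round(math.degrees(math.atan2(y_Int - center_point[1],
                                               x_Int - center_point[0]))) % 360
        if a_ngle != theta[c]:
            continue
        r = math.hypot(x_Int - center_point[0], y_Int - center_point[1])
        # keep the closest intersection: that is the side the ray hits first
        if best is None or r < best[0]:
            best = (r, x_Int, y_Int)
    if best is not None:
        Intercepts.append((best[1], best[2]))
        dist.append(best[0])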