I have an array of coordinates. I calculated the distances between all points. Now I only want to show the coordinates that have a distance above a certain threshold. How can I do this in Python?
import numpy as np
from itertools import combinations
from scipy.spatial.distance import pdist
import matplotlib.pylab as plt
dx = np.array([b-a for a,b in combinations(x,2)])
dy = np.array([b-a for a,b in combinations(y,2)])
all_distances = pdist(np.array(list(zip(x,y))))
all_distances
df3=all_distances[~(all_distances<=35)]
df4=all_distances[~(all_distances<=40)]
df5=all_distances[~(all_distances<=45)]
fig, ax = plt.subplots()
plt.scatter(df3)
plt.ylabel('dy')
plt.xlabel('dx')
plt.show()
Below you see the points with all distances, but now I want a scatterplot with only the points that are above a threshold of 35.
[image: scatterplot]
Maybe you are looking for something like this:
import numpy as np
from scipy.spatial.distance import pdist
combinations = np.array([(1,2), (3,4), (5,8), (10,12)])
all_distances = pdist( np.array(combinations))
print(all_distances)
print(all_distances[all_distances>3])
You can apply the same mask to other arrays too, so something like plt.scatter(dx[all_distances>35], dy[all_distances>35]) probably solves your problem.
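The masking idea can be sketched end to end as follows. The random x and y values are made up for illustration (the question does not show its data). The sketch relies on pdist returning distances in the same pair order as itertools.combinations, so one boolean mask lines up with dx, dy and all_distances.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical coordinates; the original data is not shown in the question
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=20)
y = rng.uniform(0, 100, size=20)

# Pairwise differences, in the same pair order that pdist uses
dx = np.array([b - a for a, b in combinations(x, 2)])
dy = np.array([b - a for a, b in combinations(y, 2)])
all_distances = pdist(np.column_stack((x, y)))

# One boolean mask selects the pairs that are farther apart than the threshold
threshold = 35
mask = all_distances > threshold
dx_far, dy_far = dx[mask], dy[mask]
```

Passing dx_far and dy_far to plt.scatter then draws only the pairs above the threshold.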
I have created a list of Shannon entropy values for a pair of multiple-sequence-aligned sequences. When I plot the values I get a simple line plot. I want to plot a smooth curve over the lines. Can anyone suggest the right way to do this? Basically, I want a smooth curve that touches the tip of every bar and goes to zero where the y-axis value is zero.
Link to image: https://i.stack.imgur.com/SY3jH.png
#importing the relevant packages
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.interpolate import make_interp_spline
from Bio import AlignIO
import warnings
warnings.filterwarnings("ignore")
#function to calculate the Shannon Entropy of a MSA
# H = -sum[p(x).log2(px)]
def shannon_entropy(list_input):
    unique_aa = set(list_input)
    M = len(list_input)
    entropy_list = []
    # Number of residues in column
    for aa in unique_aa:
        n_i = list_input.count(aa)
        P_i = n_i/float(M)
        entropy_i = P_i*(math.log(P_i,2))
        entropy_list.append(entropy_i)
    sh_entropy = -(sum(entropy_list))
    #print(sh_entropy)
    return sh_entropy
#importing the MSA file
#importing the clustal file
align_clustal1 =AlignIO.read("/home/clustal.aln", "clustal")
def shannon_entropy_list_msa(alignment_file):
    shannon_entropy_list = []
    for col_no in range(len(list(alignment_file[0]))):
        list_input = list(alignment_file[:, col_no])
        shannon_entropy_list.append(shannon_entropy(list_input))
    return shannon_entropy_list
clustal_omega1 = shannon_entropy_list_msa(align_clustal1)
# Plotting the data
plt.figure(figsize=(18,10))
plt.plot(clustal_omega1, 'r')
plt.xlabel('Residue', fontsize=16)
plt.ylabel("Shannon's entropy", fontsize=16)
plt.show()
Edit 1:
Here is what my graph looks like after implementing the "pchip" method. link for the pchip output: https://i.stack.imgur.com/hA3KW.png
One approach would be to use PCHIP interpolation. It passes through every data point and is monotonic between neighbouring points, so it does not overshoot and stays at zero wherever consecutive y values are zero.
We can't run your exact code example on our machines because you point to a local Clustal file in your 'home' directory.
Here's a simple working example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import pchip
mylist = [10,0,0,0,0,9,9,0,0,0,11,11,11,0,0]
mylist_np = np.array(mylist)
samples = np.array(range(len(mylist)))
xnew = np.linspace(samples.min(), samples.max(), 100)
plt.plot(xnew,pchip(samples, mylist_np )(xnew))
plt.show()
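To check the claimed behaviour on something shaped like an entropy list, here is a small sketch. The helper name smooth_curve and the entropy values are made up for illustration. It uses PchipInterpolator, the class behind the pchip alias used above.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def smooth_curve(values, points_per_step=20):
    """Return (x_fine, y_fine) for a PCHIP curve through `values` (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    x = np.arange(len(values))
    # Dense grid that still contains every original sample position
    x_fine = np.linspace(0, len(values) - 1, (len(values) - 1) * points_per_step + 1)
    return x_fine, PchipInterpolator(x, values)(x_fine)


# Made-up entropy values, including runs of zeros
entropy = [0.0, 0.9, 0.0, 0.0, 1.5, 1.2, 0.0, 0.0, 0.6]
x_fine, y_fine = smooth_curve(entropy)
```

Plotting plt.plot(x_fine, y_fine) on top of the original values should give a curve that touches every peak and lies flat on zero between neighbouring zero values.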
Can someone explain why I get this strange output when running this code:
import matplotlib.pyplot as plt
import numpy as np
def x_y():
    return np.random.randint(9999, size=1000), np.random.randint(9999, size=1000)
plt.plot(x_y())
plt.show()
The output:
Your data is a tuple of two 1000 length arrays.
def x_y():
    return np.random.randint(9999, size=1000), np.random.randint(9999, size=1000)
xy = x_y()
print(len(xy))
# > 2
print(xy[0].shape)
# > (1000,)
Let's read pyplot's documentation:
plot(y) # plot y using x as index array 0..N-1
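Because you pass a single argument, pyplot treats the tuple as one 2D array of shape (2, 1000) and draws each column as its own line. Thus pyplot plots 1000 two-point lines, one between (0, xy[0][i]) and (1, xy[1][i]) for each i in range(1000).
You probably meant to do this:
plt.plot(*x_y())
This time, it plots 1000 points joined by a single line: (xy[0][i], xy[1][i]) for i in range(1000).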
Yet, the lines don't represent anything here. Therefore you probably want to see individual points:
plt.scatter(*x_y())
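The difference between the two calls can be checked by counting the line objects that plot returns. This is a minimal sketch: the Agg backend and the seeded generator are assumptions made so that it runs without a display and gives repeatable data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
xy = (rng.integers(9999, size=1000), rng.integers(9999, size=1000))

fig, (ax1, ax2) = plt.subplots(1, 2)
lines_single_arg = ax1.plot(xy)   # the tuple is treated as one (2, 1000) array
lines_unpacked = ax2.plot(*xy)    # x and y are passed as separate arguments

print(np.asarray(xy).shape, len(lines_single_arg), len(lines_unpacked))
```

The first call should report one line per column of the array, the second a single line through all the points.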
Your function x_y returns a tuple; assigning each element to its own variable gives the correct output.
import matplotlib.pyplot as plt
import numpy as np
def x_y():
    return np.random.randint(9999, size=1000), np.random.randint(9999, size=1000)
x, y = x_y()
plt.plot(x, y)
plt.show()
I have a vector of velocities from a time series, for example:
u=[100,120,150,115,130,115,105,103,108,132,135,121]
Now I need to calculate Δu and then draw a scatter plot, something like the picture below.
How can I do that?
import numpy as np
import matplotlib.pyplot as plt
u = np.array([100,120,150,115,130,115,105,103,108,132,135,121])
du = u[1:] - u[:-1] # the difference between the current and the prior velocity
plt.scatter(u[1:],du)
plt.show()
Assuming Δu = u[i] - u[i-1], the formula can be applied to every element of u[1:].
So simply use these lines of code:
from numpy import array
import matplotlib.pyplot as plt
u = array(u)  # u is the list from the question
del_u = u[1:] - u[:-1] # This line applies the formula to every element except the first one.
plt.scatter(u[1:], del_u)
plt.show()
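As a side note, NumPy's np.diff computes the same first difference in one call. A short sketch using the velocities from the question:

```python
import numpy as np

u = np.array([100, 120, 150, 115, 130, 115, 105, 103, 108, 132, 135, 121])

# np.diff(u)[i] is u[i+1] - u[i], i.e. the same values as u[1:] - u[:-1]
du = np.diff(u)
print(du)
```

The result has one element fewer than u, which is why it is plotted against u[1:].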
I have a couple hundred coordinates in 3D space. I need to merge the points closer than a given radius and replace them with the neighbors' average.
It sounds like a pretty standard problem but I haven't been able to find a solution so far. The dataset is small enough to be able to compute pairwise distances for all the points.
Don't know, maybe some kind of graph analysis / connected components labelling on the sparse distance matrix?
I don't really need the averaging part, just the clustering (is clustering the correct term here?)
A toy dataset could be coords = np.random.random(size=(100,2))
Here's what I tried so far using scipy.cluster.hierarchy. It seems to work fine, but I'm open to more suggestions (DBSCAN maybe?)
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import fclusterdata
from scipy.spatial.distance import pdist
np.random.seed(0)
fig = plt.figure(figsize=(10,5))
gs = mpl.gridspec.GridSpec(1,2)
gs.update(wspace=0.01, hspace= 0.05)
coords = np.random.randint(30, size=(200,2))
img = np.zeros((30,30))
img[tuple(coords.T)] = 1  # index with a tuple of (rows, cols); a list of lists is not treated as a multi-axis index in recent NumPy
ax = plt.subplot(gs[0])
ax.imshow(img, cmap="nipy_spectral")
clusters = fclusterdata(coords, 2, criterion="distance", metric="euclidean")
print(len(np.unique(clusters)))
img[tuple(coords.T)] = clusters
ax = plt.subplot(gs[1])
ax.imshow(img, cmap="nipy_spectral")
plt.show()
Here is a method that uses KDTree to query neighbors and networkx module to gather connected components.
from scipy import spatial
import networkx as nx
cutoff = 2
components = nx.connected_components(
    nx.from_edgelist(
        (i, j) for i, js in enumerate(
            spatial.KDTree(coords).query_ball_point(coords, cutoff)
        )
        for j in js
    )
)
clusters = {j: i for i, js in enumerate(components) for j in js}
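For the averaging step the question also mentions, here is a sketch of the same connected-components idea that includes the per-cluster mean. It swaps networkx for scipy.sparse.csgraph, and the helper name merge_close_points and the toy points are made up for illustration. Note that the grouping is transitive: points linked through a chain of close neighbours end up in one cluster even if the ends of the chain are farther apart than the cutoff.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def merge_close_points(coords, cutoff):
    """Label chains of points within `cutoff` and average each group (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # All index pairs (i, j) with i < j whose distance is at most `cutoff`
    pairs = cKDTree(coords).query_pairs(r=cutoff, output_type="ndarray")
    graph = coo_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n))
    n_clusters, labels = connected_components(graph, directed=False)
    # Sum the coordinates per label, then divide by the group sizes
    sums = np.zeros((n_clusters, coords.shape[1]))
    np.add.at(sums, labels, coords)
    counts = np.bincount(labels, minlength=n_clusters)
    return labels, sums / counts[:, None]


# Toy data: two points close together and one far away
coords = np.array([[0, 0, 0], [1, 0, 0], [10, 10, 10]])
labels, centroids = merge_close_points(coords, cutoff=2)
```

Here labels[i] is the cluster index of point i and centroids[k] is the mean position of cluster k.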
I was wondering if there's a way to find tangents to a curve from discrete data.
For example:
x = np.linspace(-100,100,100001)
y = np.sin(x)
So here x is only sampled at discrete points, but what if we want to find the tangent at a value between samples, such as x = 67.875?
I've been trying to figure out if numpy.interp would work, but so far no luck.
I also found a couple of similar examples, such as this one, but haven't been able to apply the techniques to my case :(
I'm new to Python and don't entirely know how everything works yet, so any help would be appreciated...
this is what I get:
from scipy import interpolate
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-100,100,10000)
y = np.sin(x)
tck, u = interpolate.splprep([y])
ti = np.linspace(-100,100,10000)
dydx = interpolate.splev(ti,tck,der=1)
plt.plot(x,y)
plt.plot(ti,dydx[0])
plt.show()
There is a comment in this answer, which tells you that there is a difference between splrep and splprep. For the 1D case you have here, splrep is completely sufficient.
You may also want to limit your curve a bit to be able to see the oscillations.
from scipy import interpolate
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-15,15,1000)
y = np.sin(x)
tck = interpolate.splrep(x,y)
dydx = interpolate.splev(x,tck,der=1)
plt.plot(x,y)
plt.plot(x,dydx, label="derivative")
plt.legend()
plt.show()
While this is how the code above would be made runnable, it does not provide a tangent. For the tangent you only need the derivative at a single point x0, plus the equation of the tangent line, y = y0 + dydx*(x - x0), which you then actually have to evaluate; so this is more of a math question.
from scipy import interpolate
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-15,15,1000)
y = np.sin(x)
tck = interpolate.splrep(x,y)
x0 = 7.3
y0 = interpolate.splev(x0,tck)
dydx = interpolate.splev(x0,tck,der=1)
tngnt = lambda x: dydx*x + (y0-dydx*x0)
plt.plot(x,y)
plt.plot(x0,y0, "or")
plt.plot(x,tngnt(x), label="tangent")
plt.legend()
plt.show()
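The spline approach can be wrapped in a small function and checked against the known derivative of sin. This is only a sketch: the name tangent_line is made up, s=0 is passed explicitly to request an interpolating spline, and the grid and x0 = 67.875 are taken from the question with fewer samples.

```python
import numpy as np
from scipy import interpolate


def tangent_line(x, y, x0):
    """Return (slope, intercept) of the tangent at x0, from a cubic interpolating spline."""
    tck = interpolate.splrep(x, y, s=0)
    y0 = float(interpolate.splev(x0, tck))
    slope = float(interpolate.splev(x0, tck, der=1))
    # Tangent: y = y0 + slope*(x - x0) = slope*x + (y0 - slope*x0)
    return slope, y0 - slope * x0


x = np.linspace(-100, 100, 10001)
y = np.sin(x)
x0 = 67.875  # lies between two samples of x
slope, intercept = tangent_line(x, y, x0)
print(slope, np.cos(x0))  # the exact derivative of sin is cos
```

For this smooth, densely sampled curve the two printed values should agree closely.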
It should be noted that you do not need to use splines at all if the points you have are dense enough. In that case obtaining the derivative is just taking the differences between the nearest points.
from scipy import interpolate
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-15,15,1000)
y = np.sin(x)
x0 = 7.3
i0 = np.argmin(np.abs(x-x0))
x1 = x[i0:i0+2]
y1 = y[i0:i0+2]
dydx, = np.diff(y1)/np.diff(x1)
tngnt = lambda x: dydx*x + (y1[0]-dydx*x1[0])
plt.plot(x,y)
plt.plot(x1[0],y1[0], "or")
plt.plot(x,tngnt(x), label="tangent")
plt.legend()
plt.show()
The result will be visually identical to the one above.
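A variation on the finite-difference idea, offered as a sketch rather than as part of the answer above: np.gradient computes central differences at every sample, and np.interp can then read off the slope and the curve value at an x0 that falls between samples.

```python
import numpy as np

x = np.linspace(-15, 15, 1000)
y = np.sin(x)
x0 = 7.3

dydx_all = np.gradient(y, x)        # central differences at interior points
slope = np.interp(x0, x, dydx_all)  # slope linearly interpolated at x0
y0 = np.interp(x0, x, y)            # curve value linearly interpolated at x0


def tangent(t):
    """Tangent line through (x0, y0) with the estimated slope."""
    return y0 + slope * (t - x0)
```

Central differences are usually somewhat more accurate than the one-sided difference between two neighbouring points, at no extra cost for data that is already on a grid.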