plotting conditional distribution in python

plotting conditional distribution in python - python

I'm new to python and trying to plot a gaussian distribution having the function defined as
I plotted normal distribution P(x,y) and it's giving correct output. code and output are below.
Code :
Output :
Now I need to plot a conditional distribution and the output should like . to do this I need to define a boundary condition for the equation. I tried to define a boundary condition but it's not working. the code which I tried is but it's giving wrong output
please help me how to plot the same.
Thanks,

You used the boundary condition on the wrong parameter, try to do it after creating the grid points.
R = np.arange(-4, 4, 0.1)
X, Y = np.meshgrid(R, R)
then validate X and Y based on the condition
valid_xy = np.sqrt(X**2+Y**2) >= 1
X = X[valid_xy]
Y = Y[valid_xy]
Then continue with the rest of the code.
Update
If you want just to reset values around the peak to zero, you can use the following code:
import numpy as np
import matplotlib.pyplot as plt
R = np.arange(-4, 4, 0.1)
X, Y = np.meshgrid(R, R)
Z = np.sum(np.exp(-0.5*(X**2+Y**2)))
P = (1/Z)*np.exp(-0.5*(X**2+Y**2))
# reset the peak
invalid_xy = (X**2+Y**2)<1
P[invalid_xy] = 0
# plot the result
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, P, s=0.5, alpha=0.5)
plt.show()

You can't use np.meshgrid anymore because it will output a matrix where the coordinates of X and Y form a grid (hence its name) and not a custom shape (a grid minus a disc like you want):
However you can create your custom grid the following way:
R = np.arange(-,4,0.1)
xy_coord = np.array(((x,y) for x in R for y in R if (x*x + y*y) > 1))
X,Y = xy_coord.transpose()
X
# array([ 0. , 0. , 0. , ..., 3.9, 3.9, 3.9])
Y
# array([ 1.1, 1.2, 1.3, ..., 3.7, 3.8, 3.9])

Related

How to plot 2 trendlines on a single scatterplot? (python)

I want to plot 2 trendlines for one scatterplot using Matplotlib in Python but I don't know how. The graph should be similar to this target plot (from here, fig.2).
I managed to plot 1 trendline on a scatterplot here but can't figure out how to plot another trend.
Underneath is what I tried until now:
This proved ok for other parameters that I plotted, but not for this case, which led me to the conclusion that it's not too correct.
X = vO2.reshape(-1, 1)
Y = ve.reshape(-1, 1)
linear_regressor = LinearRegression()
linear_regressor.fit(X, Y)
y_pred = linear_regressor.predict(X)
x_pred = linear_regressor.predict(Y)
plt.scatter(X, Y)
plt.plot(X, y_pred, '-*',label="O2")
plt.plot(x_pred, Y, '-*',label="vent")
plt.xlabel("VO2 (L/min)")
plt.ylabel("VE (L/min)")
plt.show()
and also
z1 = np.polyfit(vO2, ve, 1)
p1 = np.poly1d(z1)
z2 = np.polyfit(ve, vO2, 1)
p2 = np.poly1d(z2)
plt.scatter(vO2, ref_vent, label='original')
plt.plot(vO2, p1(vO2), label='trendline')
plt.plot(ve, p2(ve), label='trendline')
plt.show()
which also didn't look similar to the target plot.
I don't know how to continue. Thanks in advance!
example dataset:
vo2 = [1.673925 1.9015125 1.981775 2.112875 2.1112625 2.086375 2.13475
2.1777 2.176975 2.1857125 2.258925 2.2718375 2.3381 2.3330875
2.353725 2.4879625 2.448275 2.4829875 2.5084375 2.511275 2.5511
2.5678375 2.5844625 2.6101875 2.6457375 2.6602125 2.6939875 2.7210625
2.720475 2.767025 2.751375 2.7771875 2.776025 2.7319875 2.564
2.3977625 2.4459125 2.42965 2.401275 2.387175 2.3544375]
ve = [ 3.93125 7.1975 9.04375 14.06125 14.11875 13.24375
14.6625 15.3625 15.2 15.035 17.7625 17.955
19.2675 19.875 21.1575 22.9825 23.75625 23.30875
25.9925 25.6775 27.33875 27.7775 27.9625 29.35
31.86125 32.2425 33.7575 34.69125 36.20125 38.6325
39.4425 42.085 45.17 47.18 42.295 37.5125
38.84375 37.4775 34.20375 33.18 32.67708333]

OK, so you need to find the point, where slope of line changes. I tried 2nd derivative, but it was noisy and I coulnd't find the right spot.
Another way is to try all possible points, calculate left and right regression lines and find pair with best fit (r2 coeff). Give this code a try. It is not complete. I do not know, how to force regression lines to go through point in the middle. And it might be better to work with interpolated data, if there are not enough datapoints.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
vo2 = [1.673925,1.9015125,1.981775,2.112875,2.1112625,2.086375,2.13475,2.1777,2.176975,2.1857125,2.258925,2.2718375,2.3381,2.3330875,2.353725,2.4879625,2.448275,2.4829875,2.5084375,2.511275,2.5511,2.5678375,2.5844625,2.6101875,2.6457375,2.6602125,2.6939875,2.7210625,2.720475,2.767025,2.751375,2.7771875,2.776025,2.7319875,2.564,2.3977625,2.4459125,2.42965,2.401275,2.387175,2.3544375]
ve = [ 3.93125,7.1975,9.04375,14.06125,14.11875,13.24375,14.6625,15.3625,15.2,15.035,17.7625,17.955,19.2675,19.875,21.1575,22.9825,23.75625,23.30875,25.9925,25.6775,27.33875,27.7775,27.9625,29.35,31.86125,32.2425,33.7575,34.69125,36.20125,38.6325,39.4425,42.085,45.17,47.18,42.295,37.5125,38.84375,37.4775,34.20375,33.18,32.67708333]
x = np.array(vo2)
y = np.array(ve)
sort_idx = x.argsort()
x = x[sort_idx]
y = y[sort_idx]
assert len(x) == len(y)
def fit(x,y):
p = np.polyfit(x, y, 1)
f = np.poly1d(p)
r2 = r2_score(y, f(x))
return p, f, r2
skip = 5 # minimal length of split data
r2 = [0] * len(x)
funcs = {}
for i in range(len(x)):
if i < skip or i > len(x) - skip:
continue
_, f_left, r2_left = fit(x[:i], y[:i])
_, f_right, r2_right = fit(x[i:], y[i:])
r2[i] = r2_left * r2_right
funcs[i] = (f_left, f_right)
split_ix = np.argmax(r2) # index of split
f_left,f_right = funcs[split_ix]
print(f"split point index: {split_ix}, x: {x[split_ix]}, y: {y[split_ix]}")
xd = np.linspace(min(x), max(x), 100)
plt.plot(x, y, "o")
plt.plot(xd, f_left(xd))
plt.plot(xd, f_right(xd))
plt.plot(x[split_ix], y[split_ix], "x")
plt.show()

how make a density plot of the eigenvalues of a symbolic matrix in python

I want to make a colour plot of the difference of the two first eigenvalues of that matrix. In order to do this, first I have defined a symbolic matrix with two parametters "x" and "y". Then I obtain the eigenvectors and eigenvalues (shorted) and compute the gap beetwen the two first eigenvalues . Finally (and I think that here is the problem...) I make a grid of points X and Y in order to evaluate it with the function "energy_gap(x,y)" storing the result in Z and then using this in order to do the plot, but it doesn't work....Any idea why?
import numpy as np
import numpy
import matplotlib.pyplot as plt
from sympy.utilities.lambdify import lambdify
from sympy import symbols
x = symbols("x")
y = symbols("y")
matrix = [[x+2, x,y],[y**2,x,3],[y+4,2,1]]
simbolic_matrix = lambdify((x,y), matrix,'numpy')
def eigen_system(x,y):
values, vectors = numpy.linalg.eig(np.array(simbolic_matrix(x,y)))
values_short = np.sort(values)
vectors_short = vectors[:,values.argsort()]
return values_short , vectors_short
def energy_gap(x,y):
values , vectors = eigen_system(x,y)
gap = abs(values[1])-abs(values[0])
return gap
def plot_energy_gap():
x = np.arange(1.1, 3.0, 0.1)
y = np.arange(1.1, 3.0, 0.1)
X, Y = np.meshgrid(x, y)
Z = energy_gap(X,Y)
im = plt.imshow(Z, cmap=plt.cm.RdBu,extent=(1.1,3,1.1,3))
plt.colorbar(im)
plt.show()
plot_energy_gap()

Ok, after some extensive testing, I'm afraid I've come to the conclusion that numpys Eigen stuff calculator can operate on a mesh of matrices like you're trying. The best solution I could get was creating the mesh manuaslly:
def plot_energy_gap()
Z = []
for x in np.arange(1.1, 3.0, 0.1):
Z.append([])
for y in np.arange(1.1, 3.0, 0.1):
Z[-1].append(energy_gap(x, y))
im = plt.imshow(Z, cmap=plt.cm.RdBu,extent=(1.1,3,1.1,3))
plt.colorbar(im)
Maybe someone else can vectorize this. EDIT The one line version (forgot it):
Z = [[energy_gap(x, y) for y in np.arange(1.1, 3.0, 0.1)] for x in np.arange(1.1, 3.0, 0.1)]]

3-D interpolation using LinearNDInterpolator (Python)

I want to interpolate some 3-d data using the scipy LinearNDInterpolator function (Python 2.7). I can't quite figure out how to use it, though: below is my attempt. I'm getting the error ValueError: different number of values and points. This leads me to believe that the shape of "coords" is not appropriate for these data, but it looks in the documentation like the shape is okay.
Note that in the data I really want to use (instead of this example) the spacing of my grid is irregular, so something like RegularGridInterpolator will not do the trick.
Thanks very much for your help!
def f(x,y,z):
return 2 * x**3 + 3 * y**2 - z
x = np.linspace(1,2,2)
y = np.linspace(1,2,2)
z = np.linspace(1,2,2)
data = f(*np.meshgrid(x, y, z, indexing='ij', sparse=True))
coords = np.zeros((len(x),len(y),len(z),3))
coords[...,0] = x.reshape((len(x),1,1))
coords[...,1] = y.reshape((1,len(y),1))
coords[...,2] = z.reshape((1,1,len(z)))
coords = coords.reshape(data.size,3)
my_interpolating_function = LinearNDInterpolator(coords,data)
pts = np.array([[2.1, 6.2, 8.3], [3.3, 5.2, 7.1]])
print(my_interpolating_function(pts))

2d density contour plot with matplotlib

I'm attempting to plot my dataset, x and y (generated from a csv file via numpy.genfromtxt('/Users/.../somedata.csv', delimiter=',', unpack=True)) as a simple density plot. To ensure this is self containing I will define them here:
x = [ 0.2933215 0.2336305 0.2898058 0.2563835 0.1539951 0.1790058
0.1957057 0.5048573 0.3302402 0.2896122 0.4154893 0.4948401
0.4688092 0.4404935 0.2901995 0.3793949 0.6343423 0.6786809
0.5126349 0.4326627 0.2318232 0.538646 0.1351541 0.2044524
0.3063099 0.2760263 0.1577156 0.2980986 0.2507897 0.1445099
0.2279241 0.4229934 0.1657194 0.321832 0.2290785 0.2676585
0.2478505 0.3810182 0.2535708 0.157562 0.1618909 0.2194217
0.1888698 0.2614876 0.1894155 0.4802076 0.1059326 0.3837571
0.3609228 0.2827142 0.2705508 0.6498625 0.2392224 0.1541462
0.4540277 0.1624592 0.160438 0.109423 0.146836 0.4896905
0.2052707 0.2668798 0.2506224 0.5041728 0.201774 0.14907
0.21835 0.1609169 0.1609169 0.205676 0.4500787 0.2504743
0.1906289 0.3447547 0.1223678 0.112275 0.2269951 0.1616036
0.1532181 0.1940938 0.1457424 0.1094261 0.1636615 0.1622345
0.705272 0.3158471 0.1416916 0.1290324 0.3139713 0.2422002
0.1593835 0.08493619 0.08358301 0.09691083 0.2580497 0.1805554 ]
y = [ 1.395807 1.31553 1.333902 1.253527 1.292779 1.10401 1.42933
1.525589 1.274508 1.16183 1.403394 1.588711 1.346775 1.606438
1.296017 1.767366 1.460237 1.401834 1.172348 1.341594 1.3845
1.479691 1.484053 1.468544 1.405156 1.653604 1.648146 1.417261
1.311939 1.200763 1.647532 1.610222 1.355913 1.538724 1.319192
1.265142 1.494068 1.268721 1.411822 1.580606 1.622305 1.40986
1.529142 1.33644 1.37585 1.589704 1.563133 1.753167 1.382264
1.771445 1.425574 1.374936 1.147079 1.626975 1.351203 1.356176
1.534271 1.405485 1.266821 1.647927 1.28254 1.529214 1.586097
1.357731 1.530607 1.307063 1.432288 1.525117 1.525117 1.510123
1.653006 1.37388 1.247077 1.752948 1.396821 1.578571 1.546904
1.483029 1.441626 1.750374 1.498266 1.571477 1.659957 1.640285
1.599326 1.743292 1.225557 1.664379 1.787492 1.364079 1.53362
1.294213 1.831521 1.19443 1.726312 1.84324 ]
Now, I have used many attempts to plot my contours using variations on:
delta = 0.025
OII_OIII_sAGN_sorted = numpy.arange(numpy.min(OII_OIII_sAGN), numpy.max(OII_OIII_sAGN), delta)
Dn4000_sAGN_sorted = numpy.arange(numpy.min(Dn4000_sAGN), numpy.max(Dn4000_sAGN), delta)
OII_OIII_sAGN_X, Dn4000_sAGN_Y = np.meshgrid(OII_OIII_sAGN_sorted, Dn4000_sAGN_sorted)
Z1 = matplotlib.mlab.bivariate_normal(OII_OIII_sAGN_X, Dn4000_sAGN_Y, 1.0, 1.0, 0.0, 0.0)
Z2 = matplotlib.mlab.bivariate_normal(OII_OIII_sAGN_X, Dn4000_sAGN_Y, 0.5, 1.5, 1, 1)
# difference of Gaussians
Z = 0.2 * (Z2 - Z1)
pyplot_middle.contour(OII_OIII_sAGN_X, Dn4000_sAGN_Y, Z, 12, colors='k')
This doesn't seem to give the desired output.I have also tried:
H, xedges, yedges = np.histogram2d(OII_OIII_sAGN,Dn4000_sAGN)
extent = [xedges[0],xedges[-1],yedges[0],yedges[-1]]
ax.contour(H, extent=extent)
Not quite working as I wanted either. Essentially, I am looking for something similar to this:
If anyone could help me with this I would be very grateful, either by suggesting a totally new method or modifying my existing code. Please also attach images of your output if you have some useful techniques or ideas.

seaborn does density plots right out of the box:
import seaborn as sns
import matplotlib.pyplot as plt
sns.kdeplot(x, y)
plt.show()

It seems that histogram2d takes some fiddling to plot the contour in the right place. I took the transpose of the histogram matrix and also took the mean values of the elements in xedges and yedges instead of just removing one from the end.
from matplotlib import pyplot as plt
import numpy as np
fig = plt.figure()
h, xedges, yedges = np.histogram2d(x, y, bins=9)
xbins = xedges[:-1] + (xedges[1] - xedges[0]) / 2
ybins = yedges[:-1] + (yedges[1] - yedges[0]) / 2
h = h.T
CS = plt.contour(xbins, ybins, h)
plt.scatter(x, y)
plt.show()

Plot periodic trajectories

I have some data of a particle moving in a corridor with closed boundary conditions.
Plotting the trajectory leads to a zig-zag trajectory.
I would like to know how to prevent plot() from connecting the points where the particle comes back to the start. Some thing like in the upper part of the pic, but without "."
The first idea I had was to find the index where the numpy array a[:-1]-a[1:] becomes positive and then plot from 0 to that index. But how would I get the index of the first occurrence of a positive element of a[:-1]-a[1:]?
Maybe there are some other ideas.

I'd go a different approach. First, I'd determine the jump points not by looking at the sign of the derivative, as probably the movement might go up or down, or even have some periodicity in it. I'd look at those points with the biggest derivative.
Second, an elegant approach to have breaks in a plot line is to mask one value on each jump. Then matplotlib will make segments automatically. My code is:
import pylab as plt
import numpy as np
xs = np.linspace(0., 100., 1000.)
data = (xs*0.03 + np.sin(xs) * 0.1) % 1
plt.subplot(2,1,1)
plt.plot(xs, data, "r-")
#Make a masked array with jump points masked
abs_d_data = np.abs(np.diff(data))
mask = np.hstack([ abs_d_data > abs_d_data.mean()+3*abs_d_data.std(), [False]])
masked_data = np.ma.MaskedArray(data, mask)
plt.subplot(2,1,2)
plt.plot(xs, masked_data, "b-")
plt.show()
And gives us as result:
The disadvantage of course is that you lose one point at each break - but with the sampling rate you seem to have I guess you can trade this in for simpler code.

To find where the particle has crossed the upper boundary, you can do something like this:
>>> import numpy as np
>>> a = np.linspace(0, 10, 50) % 5
>>> a = np.linspace(0, 10, 50) % 5 # some sample data
>>> np.nonzero(np.diff(a) < 0)[0] + 1
array([25, 49])
>>> a[24:27]
array([ 4.89795918, 0.10204082, 0.30612245])
>>> a[48:]
array([ 4.79591837, 0. ])
>>>
np.diff(a) calculates the discrete difference of a, while np.nonzero finds where the condition np.diff(a) < 0 is negative, i.e., the particle has moved downward.

To avoid the connecting line you will have to plot by segments.
Here's a quick way to plot by segments when the derivative of a changes sign:
import numpy as np
a = np.linspace(0, 20, 50) % 5 # similar to Micheal's sample data
x = np.arange(50) # x scale
indices = np.where(np.diff(a) < 0)[0] + 1 # the same as Micheal's np.nonzero
for n, i in enumerate(indices):
if n == 0:
plot(x[:i], a[:i], 'b-')
else:
plot(x[indices[n - 1]:i], a[indices[n - 1]:i], 'b-')

Based on Thorsten Kranz answer a version which adds points to the original data when the 'y' crosses the period. This is important if the density of data-points isn't very high, e.g. np.linspace(0., 100., 100) vs. the original np.linspace(0., 100., 1000). The x position of the curve transitions are linear interpolated. Wrapped up in a function its:
import numpy as np
def periodic2plot(x, y, period=np.pi*2.):
indexes = np.argwhere(np.abs(np.diff(y))>.5*period).flatten()
index_shift = 0
for i in indexes:
i += index_shift
index_shift += 3 # in every loop it adds 3 elements
if y[i] > .5*period:
x_transit = np.interp(period, np.unwrap(y[i:i+2], period=period), x[i:i+2])
add = np.ma.array([ period, 0., 0.], mask=[0,1,0])
else:
# interpolate needs sorted xp = np.unwrap(y[i:i+2], period=period)
x_transit = np.interp(0, np.unwrap(y[i:i+2], period=period)[::-1], x[i:i+2][::-1])
add = np.ma.array([ 0., 0., period], mask=[0,1,0])
x_add = np.ma.array([x_transit]*3, mask=[0,1,0])
x = np.ma.hstack((x[:i+1], x_add, x[i+1:]))
y = np.ma.hstack((y[:i+1], add, y[i+1:]))
return x, y
The code for comparison to the original answer of Thorsten Kranz with lower data-points density.
import matplotlib.pyplot as plt
x = np.linspace(0., 100., 100)
y = (x*0.03 + np.sin(x) * 0.1) % 1
#Thorsten Kranz: Make a masked array with jump points masked
abs_d_data = np.abs(np.diff(y))
mask = np.hstack([np.abs(np.diff(y))>.5, [False]])
masked_y = np.ma.MaskedArray(y, mask)
# Plot
plt.figure()
plt.plot(*periodic2plot(x, y, period=1), label='This answer')
plt.plot(x, masked_y, label='Thorsten Kranz')
plt.autoscale(enable=True, axis='both', tight=True)
plt.legend(loc=1)
plt.tight_layout()

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

plotting conditional distribution in python - python

Related

How to plot 2 trendlines on a single scatterplot? (python)

how make a density plot of the eigenvalues of a symbolic matrix in python

3-D interpolation using LinearNDInterpolator (Python)

2d density contour plot with matplotlib

Plot periodic trajectories

Categories

Resources