My aim:
I have x, y and z values as arrays. For example:
x=np.array([10,2,-4,12,3,6,8,14])
y=np.array([5,5,-6,8,20,10,2,2])
z=np.array([4,6,10,40,22,14,20,8])
I want to plot a heatmap where the z-values will act as the intensity or 'weight' for every pair of (x,y) and the axes will be x and y values. So, my plot will be on a x-y plane. I want to lay a 'grid' on top of my plot by dividing my x-y plane into bins and then calculate the mean of the z-values within every bin and use that mean value as my color or intensity for that bin. I also want to make another plot but there I want to plot the variance of z-values as the intensity within the bins.
What I have done:
I coded it the following way, but I think I am misinterpreting things. I don't think I understand bins etc. well (I am new to programming).
import numpy as np
import matplotlib.pyplot as plt
x=np.array([10,2,-4,12,3,6,8,14])
y=np.array([5,5,-6,8,20,10,2,2])
z=np.array([4,6,10,40,22,14,-20,8])
# Bin the data onto a 2x2 grid
# Have to reverse x & y due to row-first indexing
zi, yi, xi = np.histogram2d(y, x, bins=(2,2), weights=z)
counts, _, _ = np.histogram2d(y, x, bins=(2,2))
#to get mean divide by counts
zi = zi / counts
print(zi)
zi = np.ma.masked_invalid(zi)
fig, ax = plt.subplots()
sc=ax.pcolormesh(xi, yi, zi, edgecolors='black')
sct = ax.scatter(x, y, c=z, s=200) #shows the points in the bins
fig.colorbar(sc)
ax.margins(0.05)
plt.show()
Where I am stuck:
I am not even sure if the above code is doing the right thing. So, feel free to forget it and advise me on any other standard way of solving this problem.
With the above code I get a plot where the axes limits are determined by the given dataset automatically but I want to keep my axes constant at xmin=-20,xmax=20,ymin=-20,ymax=20.
Also, I am not sure how to manipulate the z-values within the bins to calculate other statistical quantities like variance or standard deviation etc.
EDIT: I now have better code that computes the mean z value in each bin with np.histogram2d, and I can set the axes to my liking. However, np.histogram2d with weights returns H as the sum of the z-values in each bin; I can get the mean from that, but not other statistical quantities such as the variance. I want access to the individual values in each bin so that I can calculate their variance and use that result as the weight/intensity of the heatmap.
I am attaching the plot for mean z in bins.
import numpy as np
import matplotlib.pyplot as plt
x=np.array([10,2,4,12,3,6,8,14])
y=np.array([5,5,6,8,20,10,2,2])
z=np.array([4,6,10,40,22,14,20,8])
x_bins = np.linspace(0, 20, 3)
y_bins = np.linspace(0, 20, 3)
H, xedges, yedges = np.histogram2d(x, y, bins = [x_bins, y_bins], weights = z)
H_counts, xedges, yedges = np.histogram2d(x, y, bins = [x_bins, y_bins])
print(H)
H1 = H/H_counts
print(H1)
plt.xlabel("x")
plt.ylabel("y")
plt.imshow(H1.T, origin='lower', cmap='RdBu',
extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar().set_label('mean z', rotation=270)
EDIT 2: When I use stats for the standard deviation, I get the following plot:
The deep red bin on the top right is actually empty and has no z values, so I want its standard deviation to be NaN instead of being assigned a value of 0. How can I do that?
My code for this plot is:
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
x=np.array([10,2,4,12,3,6,8,14])
y=np.array([5,5,6,8,20,10,2,2])
z=np.array([4,6,10,40,22,14,20,8])
x_bins = np.linspace(0, 20, 3)
y_bins = np.linspace(0, 20, 3)
H, xedges, yedges = np.histogram2d(x, y, bins = [x_bins, y_bins], weights = z)
#mean = stats.binned_statistic_2d(x,y,z,statistic='',bins=[x_bins,y_bins])
#mean.statistic
std = stats.binned_statistic_2d(x,y,z,statistic='std',bins=[x_bins,y_bins])
#std.statistic
#print(std.statistic)
plt.xlabel("x")
plt.ylabel("y")
plt.imshow(std.statistic.T, origin='lower', cmap='RdBu',
extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
#plt.clim(0, 20)
plt.colorbar().set_label('std z', rotation=270)
Your data needs to be interpolated onto a regular grid, since your computer does not know what the z value is where there is no data point. Luckily, there is already a function for that: scipy.interpolate.griddata.
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
# Dummy data
x=np.array([10,2,-4,12,3,6,8,14])
y=np.array([5,5,-6,8,20,10,2,2])
z=np.array([4,6,10,40,22,14,-20,8])
# Create a regular grid along x and y axis
grid_x, grid_y = np.mgrid[x.min():x.max()+1, y.min():y.max()+1]
# Linear interpolation
# But you could also use a cubic interpolation or whatever you want/need
z_interpolated = griddata((x,y), z, (grid_x, grid_y), method='linear')
# Plot the result:
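# Transpose so x runs horizontally, and put the origin at the bottom-left
plt.imshow(z_interpolated.T, origin='lower', extent=[x.min(), x.max(), y.min(), y.max()], cmap='plasma')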
And we obtain:
Notice that there is no value at the image boundary: your spatial domain is not defined beyond the values contained in x and y, so with linear interpolation your computer cannot guess what the value should be beyond those points. The heatmap is therefore restricted to the convex hull formed by your points; anything outside it would be extrapolation.
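A small sketch of the convex hull point, using the same dummy data. It assumes that filling the outside region with the nearest data value is acceptable for your purpose, which is a form of extrapolation; the names z_linear, z_nearest and z_filled are only illustrative.

```python
import numpy as np
from scipy.interpolate import griddata

x = np.array([10, 2, -4, 12, 3, 6, 8, 14])
y = np.array([5, 5, -6, 8, 20, 10, 2, 2])
z = np.array([4, 6, 10, 40, 22, 14, -20, 8])

grid_x, grid_y = np.mgrid[x.min():x.max() + 1, y.min():y.max() + 1]

# Linear interpolation: grid nodes outside the convex hull come back as NaN.
z_linear = griddata((x, y), z, (grid_x, grid_y), method='linear')
outside_hull = np.isnan(z_linear)

# Nearest-neighbour lookup is defined everywhere, so it can fill the gaps.
z_nearest = griddata((x, y), z, (grid_x, grid_y), method='nearest')
z_filled = np.where(outside_hull, z_nearest, z_linear)

print(outside_hull.sum(), "of", outside_hull.size, "grid nodes lie outside the hull")
```

If you would rather leave the outside region blank, keep z_linear as it is; imshow simply does not draw NaN cells.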
Edit:
If you need to compute a two-dimensional binned statistic, you can use:
scipy.stats.binned_statistic_2d()
In your case, to compute the standard deviation (square it to get the variance) and the mean:
from scipy import stats
std = stats.binned_statistic_2d(x,y,z,statistic='std',bins=[x_bins,y_bins])
mean = stats.binned_statistic_2d(x,y,z,statistic='mean',bins=[x_bins,y_bins])
Here mean.statistic is equivalent to your H/H_counts.
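Putting this together, here is a minimal sketch that plots both the mean and the variance per bin on fixed axes from -20 to 20. The bin width of 10 and the helper name binned are my own assumptions, not something from the question. Empty bins are set to NaN explicitly from the bin counts, so the result does not depend on what value the statistic itself reports for an empty bin.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([10, 2, 4, 12, 3, 6, 8, 14])
y = np.array([5, 5, 6, 8, 20, 10, 2, 2])
z = np.array([4, 6, 10, 40, 22, 14, 20, 8])

# Fixed edges from -20 to 20 (bin width 10) keep the axes constant.
x_bins = np.linspace(-20, 20, 5)
y_bins = np.linspace(-20, 20, 5)

def binned(statistic):
    """Return the 2-D array of `statistic` of z over the (x, y) bins."""
    return stats.binned_statistic_2d(
        x, y, z, statistic=statistic, bins=[x_bins, y_bins]).statistic

count = binned('count')
empty = count == 0

# Force NaN in empty bins, whatever value the statistic reports there.
mean_z = np.where(empty, np.nan, binned('mean'))
var_z = np.where(empty, np.nan, binned('std') ** 2)  # population variance (ddof=0)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, stat, label in zip(axes, (mean_z, var_z), ('mean z', 'variance of z')):
    # Transpose because the first index of the statistic runs along x.
    im = ax.imshow(stat.T, origin='lower', cmap='RdBu',
                   extent=[x_bins[0], x_bins[-1], y_bins[0], y_bins[-1]])
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    fig.colorbar(im, ax=ax).set_label(label)
plt.show()
```

NaN cells are left blank by imshow, so empty bins are easy to tell apart from bins whose variance really is 0 (for example, a bin holding a single point).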
I have a few integral equations that I need to convert into Python. The problem is that when I plot a graph from the equations, parts of the plot do not match the original one.
The first equation is the error probability of authentication in normal operation:
The second equation is the error probability of authentication under MIM attack:
The error probability can be calculated by:
It is noted that:
The original graph is supposed to look like this:
Pe^normal = blue lines
Pe^MIM = red lines
Differences between two error probabilities = green lines
I tried to code it in Python, and this is my full code:
import matplotlib.pyplot as plt
import math
import numpy as np
from scipy.special import iv,modstruve
x=np.arange(0.1,21,1)
x = np.array(x)
t = 0.9
y = (np.exp(t*x/2)*(iv(0, t*x/2) - modstruve(0,t*x/2))-1)/(np.exp(t*x)-1)
z = (np.exp((1-t**2)*x/2)*(iv(0, (1-t**2)*x/2) - modstruve(0,(1-t**2)*x/2))-1)/(np.exp((1-t**2)*x)-1)
z2= y+z
plt.plot(x, y,'o', color='red',label='Normal')
plt.plot(x, z2, '-', color='black', label='MIM')
plt.plot(x, z, marker='s', linestyle='--', color='g', label='DIFF')
plt.xlabel('Mean photon number N')
plt.ylabel('Error probability')
plt.scatter(x,y)
plt.text(10, 0.4, 't=0.9', size=12, ha='center', va='center')
plt.ylim([0, 0.5])
plt.xlim([0, 20])
plt.legend()
plt.show()
The graph produced by the code is:
My plot does not match the original one at N=0 for Pe^MIM (red line) and for the difference between the two error probabilities (green line).
I hope someone can help me solve this problem.
Thank you.
I'm trying to create a piecewise linear interpolation routine and I'm pretty new to all of this so I'm very uncertain of what needs to be done.
I've generated a set of data points in 3D that varies in all 3 directions. I want to interpolate between these data points and plot the result in 3D.
The current data set is much smaller than the final one will be. Linear interpolation is important.
Here's the current code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import scipy.interpolate as interp
x = np.linspace(-1.3,1.3,10)
y1 = np.linspace(.5,0.,5)
y2 = np.linspace(0.,.5,5)
y = np.hstack((y1,y2))
z1 = np.linspace(.1,0.,5)
z2 = np.linspace(0.,.1,5)
z = np.hstack((z1,z2))
data = np.dstack([x,y,z])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
f = interp.interp2d(x, y, z, kind='linear')
xnew = np.linspace(-1.3,1.3,100)
y1new = np.linspace(.5,0.,50)
y2new = np.linspace(0.,.5,50)
ynew = np.hstack((y1new,y2new))
znew = f(xnew,ynew)
ax.plot(x,y,znew, 'b-')
ax.scatter(x,y,z,'ro')
plt.show()
As I said, the dataset is just there to add variation. The real set will be much bigger but have less variation. I don't really understand the interpolation tool, and the scipy documentation isn't very clear.
I would appreciate any suggestions.
2D ok. Please help with 3D
What I'm trying to do is build something that takes data points for the deflections of a beam and interpolates between them. I wanted to do this in 3D and get a 3D plot showing the deflection along the x-axis in both the y and z directions at the same time. As a stopgap, I've used the code below to show the deflection in the y and z directions separately. Note that the data set is randomly generated for the moment. Some choices might look strange for now, but that's to roughly stick to the range the final data set will use. The code below works for a 2D system, so it may be helpful to someone. I'd still really appreciate it if someone could help me do this in 3D.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline
u=10
x = np.linspace(-1.3,1.3,u) #regular x-data
y = np.random.random_sample(u)/4 #random y data
z = np.random.random_sample(u)/10 # random zdata
ynone = np.ones(u)*0.1 #no deflection dataset
znone = np.ones(u)*0.05
xspace = np.linspace(-1.3, 1.3, u*100)
ydefl = CubicSpline(x, y) #creating cubic spline function for original data
zdefl = CubicSpline(x, z)
plt.subplot(2, 1, 1)
plt.plot(x, ynone, '-',label='y - no deflection')
plt.plot(x, y, 'go',label='y-deflection data')
plt.plot(xspace, ydefl(xspace), label='spline') #plot xspace vs spline function of xspace
plt.title('X [m]s')
plt.ylabel('Y [m]')
plt.legend(loc='best', ncol=3)
plt.subplot(2, 1, 2)
plt.plot(x, znone, '-',label='z - no deflection')
plt.plot(x, z, 'go',label='z-deflection data')
plt.plot(xspace, zdefl(xspace),label='spline')
plt.xlabel('X [m]')
plt.ylabel('Z [m]')
plt.legend(loc='best', ncol=3)
plt.show()
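For the 3D case, one possible sketch is to treat the beam as a single curve parameterised by x, interpolate y(x) and z(x) separately, and draw the result as one line in a 3D axes. This assumes the data really is a curve along x rather than a surface z(x, y), which is why np.interp is used here in place of a 2D interpolator; np.interp gives the piecewise linear interpolation asked for above, and CubicSpline could be swapped back in for a smooth curve. The seeded random generator and the names y_lin and z_lin are only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

u = 10
rng = np.random.default_rng(0)          # seeded so the sketch is repeatable
x = np.linspace(-1.3, 1.3, u)           # regular x data (must be increasing for np.interp)
y = rng.random(u) / 4                   # random y deflection, as in the 2D code
z = rng.random(u) / 10                  # random z deflection

# Piecewise linear interpolation of each deflection component along x.
xspace = np.linspace(-1.3, 1.3, u * 100)
y_lin = np.interp(xspace, x, y)
z_lin = np.interp(xspace, x, z)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot(xspace, y_lin, z_lin, 'b-', label='piecewise linear')
ax.scatter(x, y, z, color='r', label='data')
ax.set_xlabel('X [m]')
ax.set_ylabel('Y [m]')
ax.set_zlabel('Z [m]')
ax.legend()
plt.show()
```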
I have a correlation plot for two variables, the predictor variable (temperature) on the x-axis, and the response variable (density) on the y-axis. My best fit least squares regression line is a 2nd order polynomial. I would like to also plot confidence and prediction intervals. The method described in this answer seems perfect. However, my dataset (n=2340) has repeated entries for many (x,y) pairs. My resulting plot looks like this:
Here is my relevant code (slightly modified from linked answer above):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.sandbox.regression.predstd import wls_prediction_std
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import summary_table
d = {'temp': x, 'dens': y}
df = pd.DataFrame(data=d)
x = df.temp
y = df.dens
plt.figure(figsize=(6 * 1.618, 6))
plt.scatter(x,y, s=10, alpha=0.3)
plt.xlabel('temp')
plt.ylabel('density')
# points linearly spaced for predictor variable
x1 = pd.DataFrame({'temp': np.linspace(df.temp.min(), df.temp.max(), 100)})
# 2nd order polynomial
poly_2 = smf.ols(formula='dens ~ 1 + temp + I(temp ** 2.0)', data=df).fit()
# this correctly plots my single 2nd-order poly best-fit line:
plt.plot(x1.temp, poly_2.predict(x1), 'g-', label='Poly n=2 $R^2$=%.2f' % poly_2.rsquared,
alpha=0.9)
prstd, iv_l, iv_u = wls_prediction_std(poly_2)
st, data, ss2 = summary_table(poly_2, alpha=0.05)
fittedvalues = data[:,2]
predict_mean_se = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T
# check we got the right things
print(np.max(np.abs(poly_2.fittedvalues - fittedvalues)))
print(np.max(np.abs(iv_l - predict_ci_low)))
print(np.max(np.abs(iv_u - predict_ci_upp)))
plt.plot(x, y, 'o')
plt.plot(x, fittedvalues, '-', lw=2)
plt.plot(x, predict_ci_low, 'r--', lw=2)
plt.plot(x, predict_ci_upp, 'r--', lw=2)
plt.plot(x, predict_mean_ci_low, 'r--', lw=2)
plt.plot(x, predict_mean_ci_upp, 'r--', lw=2)
The print statements evaluate to 0.0, as expected.
However, I need single lines for the polynomial best fit line, and the confidence and prediction intervals (rather than the multiple lines I currently have in my plot). Any ideas?
Update:
Following the first answer from @kpie, I ordered my confidence and prediction interval arrays according to temperature:
data_intervals = {'temp': x, 'predict_low': predict_ci_low, 'predict_upp': predict_ci_upp, 'conf_low': predict_mean_ci_low, 'conf_high': predict_mean_ci_upp}
df_intervals = pd.DataFrame(data=data_intervals)
df_intervals_sort = df_intervals.sort_values(by='temp')
This achieved the desired result:
You need to order your predicted values by temperature, I think.
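A minimal sketch of that ordering step, on synthetic data standing in for the temperature and density values (the numbers are made up). The idea is to compute one sort index from the temperatures and apply it to every array that has one entry per observation, so a line plot visits the points from left to right instead of jumping back and forth.

```python
import numpy as np

# Synthetic stand-in for the temperature/density data (hypothetical values),
# deliberately unsorted and with repeated x entries.
rng = np.random.default_rng(0)
temp = rng.choice(np.arange(0.0, 30.0, 0.5), size=200)
dens = 1.0 + 0.02 * temp - 0.001 * temp**2 + rng.normal(0.0, 0.01, size=200)

# Second-order least-squares fit; coefficients come back lowest order first.
coeffs = np.polynomial.polynomial.polyfit(temp, dens, 2)
fitted = np.polynomial.polynomial.polyval(temp, coeffs)

# Sort every per-observation array by temperature with one shared index.
order = np.argsort(temp)
temp_sorted = temp[order]
fitted_sorted = fitted[order]
```

Plotting temp_sorted against fitted_sorted then gives a single curve; the same index can be applied to the interval arrays from the question.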
So, to get nice smooth curves, use numpy.polynomial.polynomial.polyfit, which returns an array of coefficients (lowest order first). You will have to split the x and y data into two separate lists to pass them to the function.
You can then plot this function with:
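import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial import polynomial as P

def strPolynomialFromArray(coeffs):
    # Human-readable form of the fit, e.g. "c0*x**0+c1*x**1+c2*x**2"
    return "+".join(str(k) + "*x**" + str(n) for n, k in enumerate(coeffs))

# xs, ys: your x and y data as two separate sequences; degree: polynomial order (2 here)
coeffs = P.polyfit(xs, ys, degree)  # coefficients, lowest order first
x = np.linspace(-15, 45, 300)  # your smooth line will be made of 300 short segments
y = P.polyval(x, coeffs)  # evaluate the fitted polynomial on the grid
plt.plot(x, y, label=strPolynomialFromArray(coeffs))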
You can look more into plotting smooth lines here; just remember that every plotted line is really a linear spline, because a curve is drawn as a finite number of straight segments.
I believe that the polynomial fitting is done with least squares fitting (process described here).
Good Luck!