Seaborn jointplot and marginal histograms misaligned - python

I'm trying to generate a jointplot for data with linear x and log y. The ranges are -22 to -13 for x and 1e-3 to 1 for y. The joint plot seems OK; however, the marginal histograms are not correct, at least the one for the x data.
Here's my code:
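import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# compMags, haloData, dir, z, magBins and fracBins are defined elsewhere in the script
# Convert observed magnitude to Absolute ...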
absMag, pop3Mag, nmAbsMag = compMags(dir,z)
pop3Fraction = haloData[dir][z]['1500A_P3']/haloData[dir][z]['1500A']
pop3Fraction[pop3Fraction < 1e-3] = 1e-3 # Map Pop 3 flux < 1e-3 to 1e-3
data = np.array((absMag,pop3Fraction)).T # data is list of (x,y) pairs...
df = pd.DataFrame(data, columns=["M", "f"])
x, y = data.T
# g = sns.jointplot(x="x", y="y", data=df)
g = sns.JointGrid(x='M', y='f', data=df, xlim=[-22,-13],ylim=[0.001,1])
g.plot_joint(plt.scatter)
g.ax_marg_x.set_xscale('linear')
g.ax_marg_y.set_yscale('log')
x_h = g.ax_marg_x.hist(df['M'], color='b', edgecolor='k', bins=magBins)
y_h = g.ax_marg_y.hist(df['f'], orientation="horizontal", color='r', edgecolor='k', bins=fracBins, log=True)
ax = g.ax_joint
ax.set_xscale('linear')
ax.set_yscale('log')
ax.set_xlim([-22,-13])
ax.set_xticks([-21,-19,-17,-15,-13,-11])
ax.set_ylim([1e-3,1])
I'm not sure why the top histogram is not aligned with the data.

Never mind: on closer inspection, there really are more points near -13 than anywhere else. I really need a 2D histogram here to show these nuances.
If someone has a suggestion for how to make that plot clearly with seaborn, I'd appreciate it.
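In seaborn 0.11 or later, sns.jointplot(..., kind="hist") draws a bivariate histogram in the joint axes (kind="hex" gives hexagonal bins). As a version-independent alternative, here is a minimal sketch with plain NumPy and matplotlib: linear bins along x and log-spaced bins along y, so every y bin has the same height on the log axis. The synthetic M and f arrays are stand-ins for the question's columns, not real data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Linear bins along x, log-spaced bins along y (equal height on a log axis)
x_edges = np.linspace(-22, -13, 37)
y_edges = np.geomspace(1e-3, 1, 31)

# Synthetic stand-ins for the question's M (linear) and f (log) columns
rng = np.random.default_rng(0)
n = 5000
M = rng.uniform(-22, -13, n)
# Floor at the lowest bin edge (1e-3), mirroring the question's clipping step
f = np.maximum(10 ** rng.uniform(-4, 0, n), y_edges[0])

H, x_edges, y_edges = np.histogram2d(M, f, bins=[x_edges, y_edges])

fig, ax = plt.subplots()
# H is indexed H[x_bin, y_bin]; pcolormesh wants rows along y, hence the transpose
mesh = ax.pcolormesh(x_edges, y_edges, H.T, cmap="viridis")
ax.set_yscale("log")
ax.set_xlabel("M")
ax.set_ylabel("f")
fig.colorbar(mesh, ax=ax, label="count")
```

Because the y edges are geometric, the cells look evenly sized on the log axis, which is what makes the density visible in both the sparse and the crowded regions.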

Related

Python matplotlib polar coordinate is not plotting as it is supposed to be

I am plotting from a CSV file that contains Cartesian coordinates, and I want to convert them to polar coordinates and then plot them in polar form.
Here is the code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.read_csv('test_for_plotting.csv',index_col = 0)
x_temp = df['x'].values
y_temp = df['y'].values
df['radius'] = np.sqrt( np.power(x_temp,2) + np.power(y_temp,2) )
df['theta'] = np.arctan2(y_temp,x_temp)
df['degrees'] = np.degrees(df['theta'].values)
df['radians'] = np.radians(df['degrees'].values)
ax = plt.axes(polar = True)
ax.set_aspect('equal')
ax.axis("off")
sns.set(rc={'axes.facecolor':'white', 'figure.facecolor':'white','figure.figsize':(10,10)})
# sns.scatterplot(data = df, x = 'x',y = 'y', s= 1,alpha = 0.1, color = 'black',ax = ax)
sns.scatterplot(data = df, x = 'radians',y = 'radius', s= 1,alpha = 0.1, color = 'black',ax = ax)
plt.tight_layout()
plt.show()
Here is the dataset
If you run this code with polar = False and plot with sns.scatterplot(data = df, x = 'x',y = 'y', s= 1,alpha = 0.1, color = 'black',ax = ax), you get this picture:
After setting polar = True and plotting with sns.scatterplot(data = df, x = 'radians',y = 'radius', s= 1,alpha = 0.1, color = 'black',ax = ax), it is supposed to give you this:
But it does not work: if you run the actual code, the shape in polar form is the same as the Cartesian one, which does not make sense and does not match the polar picture I showed you. (If you are wondering where I got the second picture from, I plotted it using R.)
I would appreciate your help and insights and thanks in advance!
For a polar plot, the "x-axis" represents the angle in radians. So you need to switch x and y and convert the angles to radians (I also added ax=ax, as the axes were created explicitly):
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
data = {'radius': [0, 0.5, 1, 1.5, 2, 2.5], 'degrees': [0, 25, 75, 155, 245, 335]}
df_temp = pd.DataFrame(data)
ax = plt.axes(polar=True)
sns.scatterplot(x=np.radians(df_temp['degrees']), y=df_temp['radius'].to_numpy(),
s=100, alpha=1, color='black', ax=ax)
for deg, y in zip(df_temp['degrees'], df_temp['radius']):
    x = np.radians(deg)
    ax.axvline(x, color='skyblue', ls=':')
    ax.text(x, y, f' {deg}', color='crimson')
ax.set_rlabel_position(-15) # Move radial labels away from plotted dots
plt.tight_layout()
plt.show()
About your new question: if you have an xy plot, convert these xy values to polar coordinates, and then plot them on a polar plot, you'll get the same plot again.
After some more testing with the data, I decided to create the plot directly with matplotlib, as seaborn makes some changes whose effects differ between seaborn and matplotlib versions.
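A minimal sketch of that round trip, using hypothetical random points in place of the CSV data: theta = arctan2(y, x) and r = hypot(x, y) are drawn on polar axes (theta first, since the polar "x" is the angle), and converting back with r*cos(theta), r*sin(theta) recovers the original coordinates, which is why the picture does not change.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical Cartesian points standing in for the CSV's x and y columns
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)

# Cartesian -> polar: theta in radians (the polar axes' "x"), r as the radius
theta = np.arctan2(y, x)
r = np.hypot(x, y)

fig = plt.figure(figsize=(8, 4))
ax_cart = fig.add_subplot(1, 2, 1)
ax_cart.scatter(x, y, s=4)
ax_cart.set_aspect("equal")
ax_polar = fig.add_subplot(1, 2, 2, projection="polar")
ax_polar.scatter(theta, r, s=4)  # same picture, drawn on polar axes

# Polar -> Cartesian recovers the original points
x_back = r * np.cos(theta)
y_back = r * np.sin(theta)
```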
What seems to be happening in R:
The angles (given by "x") are spread out to fill the range (0, 2 pi). This requires either rescaling x or changing how the x values are mapped to angles. One way to do this is to subtract the minimum, divide the result by its new maximum, and multiply by 2 pi.
The 0 of the angles is at the top, and the angles go clockwise.
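The rescaling in the first point can be written as a small helper; x_to_theta is a hypothetical name, and the sketch assumes x is not constant (otherwise the division by the span is undefined).

```python
import numpy as np

def x_to_theta(x):
    """Map x linearly onto [0, 2*pi]: min(x) -> 0, max(x) -> 2*pi."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("x is constant; cannot spread it over [0, 2*pi]")
    # Subtract the minimum, divide by the new maximum, multiply by 2*pi
    return (x - x.min()) / span * 2 * np.pi
```

For example, x_to_theta([3, 5, 7]) gives [0, pi, 2*pi]. Note that the smallest and largest x land on the same direction, since 0 and 2 pi coincide on a circle.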
The following code should create the plot with Python. You might want to experiment with alpha and s in the scatter options. (By default, the scatter dots get an outline, which often isn't desired with very small dots; it can be removed with lw=0.)
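import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test_for_plotting.csv', index_col=0)  # the dataset from the question
ax = plt.axes(polar=True)
ax.set_aspect('equal')
ax.axis('off')
x_temp = df['x'].to_numpy()
y_temp = df['y'].to_numpy()
# Rescale x to angles in [0, 2*pi]; no in-place -=, since to_numpy() may return a view into df
x_temp = x_temp - x_temp.min()
x_temp = x_temp / x_temp.max() * 2 * np.pi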
ax.scatter(x=x_temp, y=y_temp, s=0.05, alpha=1, color='black', lw=0)
ax.set_rlim(y_temp.min(), y_temp.max())
ax.set_theta_zero_location("N") # set zero at the north (top)
ax.set_theta_direction(-1) # go clockwise
plt.show()
On the left is the resulting image; on the right, the y values are used for coloring (ax.scatter(..., c=y_temp, s=0.05, alpha=1, cmap='plasma_r', lw=0)):

how to change the scale of the y axis to see better in a heatmap

I currently have a scale problem with a heatmap:
As you can see, there is a temperature variation at the beginning and at the end, but because it happens over a very small distance and the scale is large, we can't see anything at all.
So is there a way or a function to fix this problem and automatically apply a better scale? That is, to use a fine scale where there is variation and a coarse scale where there is not?
Here is the code to generate this image :
x = np.linspace(0,L,Nx+1) #array for y-axis
t = np.linspace(0.0, t_fin,Nt+1) #array to plot the time in the title
x = np.round(x,2) #change decimals
t = np.round(t,5)
y = np.arange(T[Nt,:].shape[0]) #T[Nt,:] is an array that contains the temperature
my_yticks = x #change the number of points in the y-axis
frequency = 100
data = np.vstack(T[Nt,:]) #to use in imshow
df = pd.DataFrame(data)
fig = plt.figure(figsize=(3,9)) #plotting
titre = f"Température à {t[Nt]} s"
plt.ylabel('Profondeur en m')
plt.yticks(y[::frequency], my_yticks[::frequency])
im = plt.imshow(df, cmap='jet', aspect ='auto', interpolation='bilinear')
ax = plt.gca()
ax.get_xaxis().set_visible(False)
cb = plt.colorbar()
cb.set_label('Température en °C')
plt.title(titre)
If you have any questions, don't hesitate to ask.
Thank you!
You could use a logit scale on the y-axis. This won't work with imshow, however, and the values on the y-axis must lie strictly between 0 and 1, so you could use pcolormesh instead.
import matplotlib.pyplot as plt
import numpy as np
M,N = 100,20
a = np.array(M*[15])
a[:3] = [0,5,10]
a[-3:] = [20,25,30]
a = np.outer(a, np.ones(N))
fig, (axl,axr) = plt.subplots(ncols=2, figsize=(3,6))
axl.imshow(a, cmap='jet', aspect ='auto', interpolation='bilinear', extent=(N,0,M,0))
axl.yaxis.set_ticklabels([f'{t/M*10:.3g}' for t in axl.yaxis.get_ticklocs()])
axl.get_xaxis().set_visible(False)
axl.set_title('linear')
eps = 1e-3
X, Y = np.meshgrid(np.linspace(0, 1, N), np.linspace(1-eps, eps, M))
cm = axr.pcolormesh(X, Y, a, cmap='jet', shading='gouraud')
axr.set_yscale('logit')
axr.yaxis.set_ticklabels([f'{10*(1-t):.3g}' for t in axr.yaxis.get_ticklocs()])
axr.get_xaxis().set_visible(False)
axr.set_title('logit')
cb = plt.colorbar(cm, pad=0.2)
plt.show()
Warning: setting the y labels as fixed values is only useful if you don't want to pan/zoom your image.
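Why the logit scale helps: axis positions follow the logit transform log(p / (1 - p)) (the base of the logarithm only rescales the axis), which stretches both ends of (0, 1) symmetrically. This sketch only computes positions, with a hypothetical logit helper and the same eps as above; it does not plot.

```python
import numpy as np

def logit(p):
    """Logit transform log(p / (1 - p)); defined only for 0 < p < 1."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))

eps = 1e-3
# A few fractions of the total depth, from the top edge to the bottom edge
fractions = np.array([eps, 0.01, 0.1, 0.5, 0.9, 0.99, 1 - eps])
positions = logit(fractions)
```

On this scale the first 1 % of the depth (0.001 to 0.01) gets roughly as much axis length as the whole stretch from 0.1 to 0.5, so thin boundary layers at the top and bottom become visible while the uniform middle is compressed.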
You can set the y-axis limits with ax.set_ylim([bottom, top]).

plotting an mXnXk matrix as a 3d model in python

I have a matrix generated by parsing a file. The NumPy array has shape 101 x 101 x 41, and each entry holds the magnitude at that point.
Now I want to plot it in 3D, with the fourth dimension represented by color, so that I can see the shape of the data points (which represent molecular orbitals) and infer the magnitude at each point.
If I plot each slice of the data, I get the desired outcome, but in 2D, with the third dimension shown as color.
Is there a way to plot this model in Python using Matplotlib or an equivalent library?
Thanks
EDIT:
I'm trying to make the question clearer.
I've tried the suggested solution, but I got the following plot:
As you can see, because the mesh has lots of zeros in it, they "hide" the 3D orbitals. The following plot shows a single slice of the data:
So, as you can see, there is a certain structure I want to show in the plot.
My question is: is there a way to plot only the structure and ignore the zeros, so that they won't "hide" it?
The code I used to generate the plots:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1,101,101)
y = np.linspace(1,101,101)
z = np.linspace(1,101,101)
xx,yy,zz = np.meshgrid(x,y,z)
fig=plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xx, yy, zz, c=cube.calc_data.flatten())
plt.show()
plt.imshow(cube.calc_data[:,:,11],cmap='jet')
plt.show()
I hope the question is much clearer now, and that you appreciate it enough to upvote.
Thanks.
You can do the following:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm, colors
# `data` is the 101x101x41 array from the question
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
epsilon = 2.5e-2 # threshold: values with |value| < epsilon are not drawn
height, width, depth = data.shape
cmap = plt.get_cmap('jet')
# One normalization for the whole volume, so point colors match the colorbar
global_min = data.min()
global_max = data.max()
norm = colors.Normalize(vmin=global_min, vmax=global_max, clip=True)
mapper = cm.ScalarMappable(norm=norm, cmap=cmap)
mapper.set_array([])
for d in range(depth):
    layer = data[:, :, d]  # one z-slice (named to avoid shadowing the built-in `slice`)
    # Positive lobe: points at or above +epsilon
    points_gt_epsilon = np.where(layer >= epsilon)
    ax.scatter(points_gt_epsilon[0], points_gt_epsilon[1], d,
               c=mapper.to_rgba(layer[points_gt_epsilon]), alpha=0.015)
    # Negative lobe: points at or below -epsilon
    points_lt_epsilon = np.where(layer <= -epsilon)
    ax.scatter(points_lt_epsilon[0], points_lt_epsilon[1], d,
               c=mapper.to_rgba(layer[points_lt_epsilon]), alpha=0.015)
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
plt.title('Electron Density Prob.')
fig.colorbar(mapper, ax=ax)
plt.savefig('test.png')
plt.clf()
This code goes through the data matrix slice by slice and, for each slice, scatter-plots only the desired points (those with |value| >= epsilon).
That way you avoid plotting the many zeros that "hide" your model, to use your words.
Hope this helps
You can adjust the color and size of the markers in the scatter plot. For example, you can filter out all markers below a certain threshold by setting their size to 0. You can also make the marker size adapt to the field strength.
As an example:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
f = lambda x,y,z: np.exp(-(x-3)**2-(y-3)**2-(z-1)**2) - \
np.exp(-(x+3)**2-(y+3)**2-(z+1)**2)
t1 = np.linspace(-6,6,101)
t2 = np.linspace(-3,3,41)
# Data of shape 101,101,41
data = f(*np.meshgrid(t1,t1,t2))
print(data.shape)
# Coordinates
x = np.linspace(1,101,101)
y = np.linspace(1,101,101)
z = np.linspace(1,101,41)
xx,yy,zz = np.meshgrid(x,y,z)
fig=plt.figure()
ax = fig.add_subplot(111, projection='3d')
s = np.abs(data/data.max())**2*25
s[np.abs(data) < 0.05] = 0
ax.scatter(xx, yy, zz, s=s, c=data.flatten(), linewidth=0, cmap="jet", alpha=.5)
plt.show()
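A variant of the same idea drops the sub-threshold points with a boolean mask instead of giving them zero size, so they are never passed to scatter at all, which also makes drawing faster. This is a sketch on the same synthetic two-lobe field as above; the threshold of 0.05 and integer grid coordinates are illustrative choices.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older matplotlib)

# Same synthetic field as above: two Gaussian lobes of opposite sign
f = lambda x, y, z: (np.exp(-(x-3)**2 - (y-3)**2 - (z-1)**2)
                     - np.exp(-(x+3)**2 - (y+3)**2 - (z+1)**2))
t1 = np.linspace(-6, 6, 101)
t2 = np.linspace(-3, 3, 41)
data = f(*np.meshgrid(t1, t1, t2))                      # shape (101, 101, 41)
xx, yy, zz = np.meshgrid(np.arange(101), np.arange(101), np.arange(41))

threshold = 0.05
keep = np.abs(data) >= threshold  # boolean mask, same shape as data

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
# Only the masked points are drawn; the near-zero background is skipped entirely
sc = ax.scatter(xx[keep], yy[keep], zz[keep], c=data[keep], cmap="jet",
                vmin=-1, vmax=1, linewidth=0, alpha=0.5)
fig.colorbar(sc, ax=ax)
```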

Density scatter plot for huge dataset in matplotlib

I wrote some code a while ago that used gaussian kde to make simple density scatter plots. However, for datasets larger than about 100,000 points, it just ran 'forever' (I killed it after a few days). A friend gave me some code in R that could create such a density plot in seconds (plot_fun.R), and it seems like matplotlib should be able to do the same thing.
I think the right place to look is 2D histograms, but I am struggling to get the density to be 'right'. I modified code I found at this question to accomplish this, but the density is not showing; it looks like only the densest possible points are getting any color.
Here is approximately the code I am using:
import numpy as np
import matplotlib.pyplot as plt
# initial data
x = -np.log10(np.random.random_sample(10000))
y = -np.log10(np.random.random_sample(10000))
#histogram definition
bins = [1000, 1000] # number of bins
thresh = 3 #density threshold
#data definition
mn = min(x.min(), y.min())
mx = max(x.max(), y.max())
mn = mn-(mn*.1)
mx = mx+(mx*.1)
xyrange = [[mn, mx], [mn, mx]]
# histogram the data
hh, locx, locy = np.histogram2d(x, y, range=xyrange, bins=bins)
posx = np.digitize(x, locx)
posy = np.digitize(y, locy)
#select points within the histogram
ind = (posx > 0) & (posx <= bins[0]) & (posy > 0) & (posy <= bins[1])
hhsub = hh[posx[ind] - 1, posy[ind] - 1] # values of the histogram where the points are
xdat1 = x[ind][hhsub < thresh] # low density points
ydat1 = y[ind][hhsub < thresh]
hh[hh < thresh] = np.nan # fill the areas with low density by NaNs
f, a = plt.subplots(figsize=(12,12))
c = a.imshow(
np.flipud(hh.T), cmap='jet',
extent=np.array(xyrange).flatten(), interpolation='none',
origin='upper'
)
f.colorbar(c, ax=a, orientation='vertical', shrink=0.75, pad=0.05)
s = a.scatter(
xdat1, ydat1, color='darkblue', edgecolor='none', label=None,
picker=True, zorder=2
)
That produces this plot:
The KDE code is here:
import scipy.stats as sts
f, a = plt.subplots(figsize=(12,12))
xy = np.vstack([x, y])
z = sts.gaussian_kde(xy)(xy)
# Sort the points by density, so that the densest points are
# plotted last
idx = z.argsort()
x2, y2, z = x[idx], y[idx], z[idx]
s = a.scatter(
x2, y2, c=z, s=50, cmap='jet',
edgecolor='none', label=None, picker=True, zorder=2
)
That produces this plot:
The problem is, of course, that this code is unusable on large data sets.
My question is: how can I use the 2D histogram to produce a scatter plot like that? ax.hist2d does not produce useful output, because it colors the whole plot. All my efforts to get the 2D histogram data above to color the dense regions of the plot correctly have failed: I always end up with either no coloring or only a tiny percentage of the densest points colored. Clearly I just don't understand the code very well.
Your histogram code assigns a single color (color='darkblue') to all the points, so what are you expecting?
I think you are also overcomplicating things. This much simpler code works fine:
import numpy as np
import matplotlib.pyplot as plt
x, y = -np.log10(np.random.random_sample((2,10**6)))
#histogram definition
bins = [1000, 1000] # number of bins
# histogram the data
hh, locx, locy = np.histogram2d(x, y, bins=bins)
# Sort the points by density, so that the densest points are plotted last
z = np.array([hh[np.argmax(a<=locx[1:]),np.argmax(b<=locy[1:])] for a,b in zip(x,y)])
idx = z.argsort()
x2, y2, z2 = x[idx], y[idx], z[idx]
plt.figure(1,figsize=(8,8)).clf()
s = plt.scatter(x2, y2, c=z2, cmap='jet', marker='.')
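The list comprehension above loops over every point in Python, which gets slow at 10**6 points. A vectorized sketch of the same per-point lookup uses np.searchsorted on the bin edges returned by np.histogram2d; hist2d_density is a hypothetical helper name, and the random data is illustrative. The index arithmetic follows NumPy's binning rule that every bin is half-open except the last, which also includes its right edge.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def hist2d_density(x, y, bins=1000):
    """Return, for each point, the count of the 2D histogram bin it falls in."""
    hh, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Bin index per point; clip so values equal to the last edge map to the last bin
    ix = np.clip(np.searchsorted(xedges, x, side="right") - 1, 0, hh.shape[0] - 1)
    iy = np.clip(np.searchsorted(yedges, y, side="right") - 1, 0, hh.shape[1] - 1)
    return hh[ix, iy]

# Usage mirroring the answer above
rng = np.random.default_rng(0)
x, y = -np.log10(1 - rng.random((2, 10**5)))  # 1 - random() lies in (0, 1], avoiding log10(0)
z = hist2d_density(x, y, bins=1000)
idx = z.argsort()  # densest points are drawn last
fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(x[idx], y[idx], c=z[idx], cmap="jet", marker=".")
```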

Large Dataset Polynomial Fitting Using Numpy

I'm trying to fit a second order polynomial to raw data and output the results using Matplotlib. There are about a million points in the data set that I'm trying to fit. It is supposed to be simple, with many examples available around the web. However for some reason I cannot get it right.
I get the following warning message:
RankWarning: Polyfit may be poorly conditioned
This is my output:
This is output using Excel:
See below for my code. What am I missing??
xData = df['X']
yData = df['Y']
xTitle = 'X'
yTitle = 'Y'
title = ''
minX = 100
maxX = 300
minY = 500
maxY = 2200
title_font = {'fontname':'Arial', 'size':'30', 'color':'black', 'weight':'normal',
'verticalalignment':'bottom'} # Bottom vertical alignment for more space
axis_font = {'fontname':'Arial', 'size':'18'}
#Poly fit
# calculate polynomial
z = np.polyfit(xData, yData, 2)
f = np.poly1d(z)
print(f)
# calculate new x's and y's
x_new = xData
y_new = f(x_new)
#Plot
plt.scatter(xData, yData,c='#002776',edgecolors='none')
plt.plot(x_new,y_new,c='#C60C30')
plt.ylim([minY,maxY])
plt.xlim([minX,maxX])
plt.xlabel(xTitle,**axis_font)
plt.ylabel(yTitle,**axis_font)
plt.title(title,**title_font)
plt.show()
The x values passed to plt.plot must be sorted, because plot connects the points in the order given. Here is a comparison between plotting a sorted and an unsorted array. The plot in the unsorted case looks completely distorted; however, the fitted function is of course the same.
Printed fit: -3.496 x^2 + 2.18 x + 17.26
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
x = (np.random.normal(size=300)+1)
fo = lambda x: -3*x**2+ 1.*x +20.
f = lambda x: fo(x) + (np.random.normal(size=len(x))-0.5)*4
y = f(x)
fig, (ax, ax2) = plt.subplots(1,2, figsize=(6,3))
ax.scatter(x,y)
ax2.scatter(x,y)
def fit(ax, x,y, sort=True):
    z = np.polyfit(x, y, 2)
    fit = np.poly1d(z)
    print(fit)
    ax.set_title("unsorted")
    if sort:
        x = np.sort(x)
        ax.set_title("sorted")
    ax.plot(x, fo(x), label="original func", color="k", alpha=0.6)
    ax.plot(x, fit(x), label="fit func", color="C3", alpha=1, lw=2.5 )
    ax.legend()
fit(ax, x,y, sort=False)
fit(ax2, x,y, sort=True)
plt.show()
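Instead of sorting the data, you can also evaluate the fitted polynomial on an evenly spaced grid spanning the data, which is already in order and needs far fewer points than a million-sample dataset. A minimal sketch with synthetic data similar to the example above:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=300) + 1  # unsorted sample, as in the example above
y = -3 * x**2 + x + 20 + rng.normal(size=300) * 4

poly = np.poly1d(np.polyfit(x, y, 2))
x_line = np.linspace(x.min(), x.max(), 200)  # evenly spaced and already sorted

fig, ax = plt.subplots()
ax.scatter(x, y, s=8)
ax.plot(x_line, poly(x_line), color="C3", lw=2.5)  # smooth curve, no zig-zag
```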
The problem is probably the use of a power basis for data that lies some distance from zero along the x-axis. If you use the Polynomial class from numpy.polynomial, it will scale and shift the data before the fit, which helps, and it also keeps track of the scale and shift used. Note that if you want the coefficients in the normal (power-basis) form, you will need to convert to that form.
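A minimal sketch of that approach on hypothetical, exactly quadratic data in the question's x range (100 to 300): Polynomial.fit maps the x values onto [-1, 1] internally, and convert() turns the result back into ordinary power-basis coefficients. Note that these come lowest degree first, the reverse of np.polyfit's order.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical data in the question's range, exactly y = 1000 - 2x + 0.02x^2
x = np.linspace(100, 300, 1000)
y = 1000 - 2 * x + 0.02 * x**2

p = Polynomial.fit(x, y, 2)  # fit in a scaled/shifted variable; p.domain stores the mapping
coef = p.convert().coef      # ordinary power-basis coefficients, lowest degree first
y_fit = p(x)                 # evaluating p applies the scaling automatically
```

Here coef comes out as approximately [1000, -2, 0.02]; np.polyfit would report the same values as [0.02, -2, 1000].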
