Seaborn jointplot with defined axes limits - python

I am trying to generate some figures using Seaborn's jointplot function. I have a list of tuples called "data" containing the (x, y) coordinates to be plotted. The x-coordinates are in the range (200, 1400) and the y-coordinates are in the range (300, 900). However, I need to show the entire region I am working in, to demonstrate how the points are concentrated: the x-axis should span (0, 3000) and the y-axis (0, 1200), and I am failing to do so.
Here is my code:
import seaborn as sns
import numpy as np
np.shape(y)
xx = np.linspace(0,1080,np.shape(y))
yy = np.linspace(0,1920,np.shape(y))
sns.jointplot(xx="x", yy="y", data=data, kind="kde")
It returns the error: "TypeError: jointplot() missing 2 required positional arguments: 'x' and 'y'".
If I don't use the xx and yy variables, it gives the plot with the axes limited automatically.
How can I set the axes to the ranges I need?

You can plot your data and modify the plot's axis limits later:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# generate some random data
x = np.random.normal(loc=650, scale=100, size=1000)
y = np.random.normal(loc=600, scale=200, size=1000)
# recent seaborn versions require x and y to be passed as keywords
plot = sns.jointplot(x=x, y=y, kind="kde")
plot.ax_marg_x.set_xlim(0, 3000)
plot.ax_marg_y.set_ylim(0, 1200)
plt.show()

Related

How do I cluster values of y axis against x axis in scatterplot?

Let's say I have 2 arrays:
x = [1,2,3,4,5,6,7]
y = [1,2,2,2,3,4,5]
Its scatter plot looks like this:
What I want is for the x-axis in the plot to look like this:
0,4,8
so that the values of y within each segment of x are drawn closer together.
I've seen similar behavior in bar plots, where it is called clustering. How do I do the same for a scatter plot, or is there another plot I should be using?
I hope my question is clear and understandable.
All help is appreciated.
With your plot, try this before you display the plot:
plt.xticks([0,4,8])
or
import numpy as np
plt.xticks(np.arange(0, 8+1, step=4))
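Then, to change the scale, you can try something like this (set the figure size before the figure is created, because rcParams only affects figures created afterwards):
plt.rcParams["figure.figsize"] = (10,5)
plt.xticks([0,4,8])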
I got this with my example,
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 30)
y = np.sin(x)
# set the figure size first: rcParams only affects figures created afterwards
plt.rcParams["figure.figsize"] = (7,3)
plt.plot(x, y, 'o', color='black')
plt.xticks([0,4,8])
plt.show()
output
I think what you are looking for is close to swarmplots and stripplots in Seaborn. However, Seaborn's swarmplot and stripplot are purely categorical on one of the axes, which means that they wouldn't preserve the relative x-axis order of your elements inside each category.
One way to do what you want would be to increase the space in your x-axis between categories ([0,4,8]) and modify your xticks accordingly.
Below is an example of this where I assign the data to 3 different categories: [-2, 2), [2, 6) and [6, 10). Each category is shifted dil_k away from its direct neighbour.
import matplotlib.pyplot as plt
import numpy as np
#Generating data
x= np.random.choice(8,size=(100))
y= np.random.choice(8,size=(100))
dil_k=20
#Creating the spacing between categories
x[np.logical_and(x<6, x>=2)]+=dil_k
x[np.logical_and(x<10, x>=6)]+=2*dil_k
#Plotting
ax=plt.scatter(x,y)
#Modifying axes accordingly
plt.xticks([0,2,22,24,26,46,48,50],[0,2,2,4,6,6,8,10])
plt.show()
And the output gives:
Alternatively, if you don't care about keeping the order of your elements along the x-axis inside each category, then you can use swarmplot directly.
The code can be seen below:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
#Generating data
x= np.random.choice(8,size=(100))
y= np.random.choice(8,size=(100))
#Collapsing each category onto a single x value
x[np.logical_and(x<2,x>=-2)]=0
x[np.logical_and(x<6, x>=2)]=4
x[np.logical_and(x<10, x>=6)]=8
#Plotting
sns.swarmplot(x=x,y=y)
plt.show()
And the output gives:
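Returning to the spacing approach, the per-category shifts can also be computed in one step rather than with one masked assignment per category. The sketch below is a hypothetical helper (spread_by_bin is not a library function) that uses np.digitize to find the category of each value and shifts it by a multiple of the gap.

```python
import numpy as np

def spread_by_bin(x, edges, gap):
    """Shift each value by gap * (index of its bin) to open space between bins.

    edges are the interior bin boundaries; values below edges[0] are not shifted.
    """
    x = np.asarray(x, dtype=float)
    # 0 for x < edges[0], 1 for edges[0] <= x < edges[1], and so on
    bin_index = np.digitize(x, edges)
    return x + gap * bin_index

x = np.array([1, 2, 3, 4, 5, 6, 7])
shifted = spread_by_bin(x, edges=[2, 6], gap=20)
print(shifted)
```

The shifted values can then be passed to plt.scatter, with the tick positions and labels chosen as in the spacing example.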

How to plot histograms on a 3D plot?

I have collected data on an experiment, where I am looking at property A over time, and then making a histogram of property A at a given condition B. Now the deal is that A is collected over an array of B values.
So I have a histogram that corresponds to B=B1, B=B2, ..., B=Bn. What I want to do, is construct a 3D plot, with the z axis being for property B, and the x axis being property A, and y axis being counts.
As an example, I want the plot to look like this (B corresponds to Temperature, A corresponds to Rg):
How do I pull this off in Python?
The python library joypy can plot graphs like this. But I'm not sure if you also want these molecules within your graph.
Here is an example:
import joypy
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import cm
# Jupyter/IPython magic; remove this line in a plain script
%matplotlib inline
temp = pd.read_csv("data/daily_temp.csv",comment="%")
labels=[y if y%10==0 else None for y in list(temp.Year.unique())]
fig, axes = joypy.joyplot(temp, by="Year", column="Anomaly", labels=labels, range_style='own',
grid="y", linewidth=1, legend=False, figsize=(6,5),
title="Global daily temperature 1880-2014 \n(°C above 1950-80 average)",
colormap=cm.autumn_r)
Output:
See this thread as reference.
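If a true 3D view is preferred over a ridge plot, one possible sketch with plain Matplotlib is to draw each histogram as a set of bars in its own plane. The data, the B values and the bin range below are made up for illustration. Note that in this orientation B runs along the y-axis and the counts along the z-axis, which differs from the axis naming in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
b_values = [300, 350, 400]             # hypothetical values of condition B
bins = np.linspace(0, 10, 21)          # shared bin edges for property A
centers = 0.5 * (bins[:-1] + bins[1:])
width = bins[1] - bins[0]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

all_counts = []
for k, b in enumerate(b_values):
    a = rng.normal(loc=4 + k, scale=1.0, size=500)  # synthetic samples of A at this B
    counts, _ = np.histogram(a, bins=bins)
    all_counts.append(counts)
    # zdir="y" places this histogram in the plane y = b; bar heights run along z
    ax.bar(centers, counts, zs=b, zdir="y", width=width, alpha=0.7)

ax.set_xlabel("A")
ax.set_ylabel("B")
ax.set_zlabel("counts")
```

Using the same bin edges for every histogram keeps the bars aligned along the A axis, which makes the distributions easier to compare.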

2D histogram where one axis is cumulative and the other is not

Let's say I have instances of two random variables that can be treated as paired.
import numpy as np
x = np.random.normal(size=1000)
y = np.random.normal(size=1000)
Using matplotlib it is pretty easy to make a 2D histogram.
import matplotlib.pyplot as plt
plt.hist2d(x,y)
In 1D, matplotlib has an option to make a histogram cumulative.
plt.hist(x,cumulative=True)
What I would like incorporates elements of both classes. I would like to construct a 2D histogram such that the horizontal axis is cumulative and the vertical axis is not cumulative.
Is there a way to do this with Python/Matplotlib?
You can take advantage of np.cumsum to create your cumulative histogram. First save the output from hist2d, then apply to your data when plotting.
import matplotlib.pyplot as plt
import numpy as np
#Some random data
x = np.random.normal(size=1000)
y = np.random.normal(size=1000)
#create a figure
plt.figure(figsize=(16,8))
ax1 = plt.subplot(121) #Left plot original
ax2 = plt.subplot(122) #right plot the cumulative distribution along axis
#Plot the ordinary 2D histogram on the left and keep the counts and bin edges
h, xedge, yedge, image = ax1.hist2d(x,y)
#h has shape (nx, ny); transpose it so rows are y bins, then accumulate along x (axis=1)
ax2.pcolormesh(xedge,yedge,np.cumsum(h.T,axis=1))
plt.show()
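The same idea can be sketched without drawing anything, using np.histogram2d to get the counts directly. This is a minimal illustration on random data; it also checks that after accumulating along x, the last x slice equals the ordinary 1D histogram of y.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# counts has shape (nx, ny): axis 0 follows x, axis 1 follows y
counts, xedges, yedges = np.histogram2d(x, y, bins=10)

# Accumulate along x only; each y bin keeps its own running total
cumulative_x = np.cumsum(counts, axis=0)

# The last x slice should match the non-cumulative 1D histogram of y
y_hist, _ = np.histogram(y, bins=yedges)
print(np.array_equal(cumulative_x[-1], y_hist))
```

To draw the result, pass the transposed array to pcolormesh, for example plt.pcolormesh(xedges, yedges, cumulative_x.T), since pcolormesh expects rows to follow y.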

Colors in contour map not mapped to levels but to some other parameter [python]

I am making a contour plot from multiple data sets, all drawn at the same level. So I want the colors from the cmap to be mapped not to the levels but to the mass values, which vary between data sets. How can I do that?
My code structure is given below:
import numpy as np
import pylab as plt
from pandas import DataFrame as df
import matplotlib.colors as colors
import scipy.interpolate as interpolate
#data is imported from files, made into columns using DataFrame and put in the array name 'data'
xi = np.linspace(1,10,1000)
yi = np.linspace(-1,1,1000)
X, Y = np.meshgrid(xi,yi)
for i in range(9):
    Z = interpolate.griddata((data[i]['q'], np.cos(data[i]['iota'])), data[i]['snr1'], (X,Y))
    cs = plt.contour(X,Y,Z,levels=[20])
    cs.collections[0].set_label(str(int(data[i]['mass'][0])))
plt.legend(loc=5, title='mass')
The resulting plot is:
How can I use cmap to map the various contours according to the mass values?
You need to create a colormap object and a norm. The colormap object converts an input value between 0 and 1 to a color value. The norm is a function that converts values between a minimum and a maximum to the range 0,1.
Note that the colors= parameter of plt.contour needs an extra pair of square brackets, because otherwise Matplotlib cannot distinguish a single RGBA value from an array of colors.
This is what your code could look like:
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import scipy.interpolate as interpolate
#data is imported from files, made into columns using DataFrame and put in the array name 'data'
xi = np.linspace(1,10,1000)
yi = np.linspace(-1,1,1000)
X, Y = np.meshgrid(xi,yi)
cmap = plt.get_cmap('magma')
norm = plt.Normalize(0, max([data[i]['mass'][0] for i in range(9)]))
for i in range(9):
    Z = interpolate.griddata((data[i]['q'], np.cos(data[i]['iota'])), data[i]['snr1'], (X, Y))
    mass = data[i]['mass'][0]
    cs = plt.contour(X, Y, Z, levels=[20], colors=[cmap(norm(mass))])
    # cs.collections is deprecated since Matplotlib 3.8; on newer versions consider a proxy artist for the legend
    cs.collections[0].set_label(f'{mass:.0f}')
plt.legend(loc=5, title='mass')
plt.tight_layout()
plt.show()
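Since the data files are not available, here is a self-contained sketch of the same colormap-plus-norm idea on a toy surface. The masses and the formula for Z are made up for illustration. It builds the legend from proxy Line2D artists instead of cs.collections, which is an assumption about what suits newer Matplotlib versions rather than part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize
from matplotlib.lines import Line2D

# Synthetic stand-in for the interpolated grids; the masses are made-up values
masses = [20, 30, 40, 50, 60]
xi = np.linspace(1, 10, 100)
yi = np.linspace(-1, 1, 100)
X, Y = np.meshgrid(xi, yi)

cmap = plt.get_cmap("magma")
norm = Normalize(vmin=0, vmax=max(masses))

fig, ax = plt.subplots()
handles = []
for mass in masses:
    Z = mass * (1 + Y**2) / X  # toy surface that crosses the level 20
    color = cmap(norm(mass))
    ax.contour(X, Y, Z, levels=[20], colors=[color])
    # Proxy artist for the legend, independent of how the ContourSet stores its lines
    handles.append(Line2D([], [], color=color, label=f"{mass:.0f}"))

ax.legend(handles=handles, loc=5, title="mass")
```

Because the color is computed once per data set, the same value is reused for both the contour and its legend entry, so the two cannot drift apart.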

Get actual numbers instead of normalized value in seaborn KDE plots

I have three dataframes and I plot their KDEs using the seaborn module in Python. The issue is that these plots make the area under each curve equal to 1 (which is how they are intended to behave), so the heights in the plots are normalized. Is there any way to show the actual values instead of the normalized ones? Also, is there any way to find the points of intersection of the curves?
Note: I do not want to use the curve_fit method of scipy as I am not sure about the distribution I will get for each dataframe, it can be multimodal also.
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure()
sns.distplot(data_1['gap'],kde=True,hist=False,label='1')
sns.distplot(data_2['gap'],kde=True,hist=False,label='2')
sns.distplot(data_3['gap'],kde=True,hist=False,label='3')
plt.legend(loc='best')
plt.show()
The output of the code is attached in the link, as I can't post images: plot_link
You can just grab the line and rescale its y-values with set_data:
import numpy as np
import matplotlib.lines
import matplotlib.pyplot as plt
import seaborn as sns
# create some data
n = 1000
x = np.random.rand(n)
# plot stuff
fig, ax = plt.subplots(1,1)
# distplot is deprecated in recent seaborn; sns.kdeplot(x, ax=ax) is the current equivalent
ax = sns.distplot(x, kde=True, hist=False, ax=ax)
# find the line and rescale y-values
children = ax.get_children()
for child in children:
    if isinstance(child, matplotlib.lines.Line2D):
        xs, ys = child.get_data()
        ys = ys * n
        child.set_data(xs, ys)
# update y-limits (not done automatically)
ax.set_ylim(ys.min(), ys.max())
fig.canvas.draw()
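The second part of the question, finding where the curves intersect, can be sketched by evaluating the KDEs yourself on a common grid. The example below uses scipy.stats.gaussian_kde on synthetic data, which is a swapped-in technique and may not use the same bandwidth as seaborn. Each density is multiplied by its sample size, so the area under each curve equals the number of points; to compare with histogram counts you would also multiply by the bin width.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=500)  # synthetic data
sample_b = rng.normal(loc=3.0, scale=1.0, size=500)

grid = np.linspace(-5.0, 8.0, 1301)
dx = grid[1] - grid[0]

# Density (area 1) times sample size gives a curve whose area is the number of points
curve_a = gaussian_kde(sample_a)(grid) * len(sample_a)
curve_b = gaussian_kde(sample_b)(grid) * len(sample_b)

# A crossing lies wherever the sign of the difference changes between grid points
diff = curve_a - curve_b
idx = np.where(np.diff(np.sign(diff)) != 0)[0]
# Linear interpolation between the two grid points that bracket each crossing
crossings = grid[idx] - diff[idx] * dx / (diff[idx + 1] - diff[idx])

print("area under curve_a:", curve_a.sum() * dx)
print("crossings:", crossings)
```

This makes no assumption about the shape of the distributions, so it also works for multimodal data, where several crossings may be reported.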
