Turn off marginal distribution axes on jointplot using seaborn package - python

I like this particular plot and the ability to pass a function to the stat_func keyword to quickly plot up and visualize relationships between variables, but there's one thing. How do I 'turn off' or not plot the marginal distribution axes?
It looks nice, but sometimes I don't want this feature.
For example using this code:
import numpy as np
import seaborn as sns
x = np.arange(100) + np.random.randn(100)*20
y = np.arange(100) + np.random.randn(100)*20
sns.jointplot(x=x, y=y, kind='reg')
How can I remove the kde subplots on the top and right hand side of the main axes?

You could use JointGrid directly:
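import seaborn as sns
from scipy import stats
# x and y as defined in the question
g = sns.JointGrid(x=x, y=y, ratio=100)  # a large ratio leaves almost no room for the marginals
g.plot_joint(sns.regplot)
# JointGrid.annotate() was deprecated in seaborn 0.11 and removed later, so add the Pearson r text by hand
r, p = stats.pearsonr(x, y)
g.ax_joint.annotate(f"pearsonr = {r:.2f}; p = {p:.2g}", xy=(0.05, 0.95), xycoords="axes fraction", va="top")
g.ax_marg_x.set_axis_off()
g.ax_marg_y.set_axis_off()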

Related

How to plot histograms on a 3D plot?

I have collected data on an experiment, where I am looking at property A over time, and then making a histogram of property A at a given condition B. Now the deal is that A is collected over an array of B values.
So I have a histogram for each of B=B1, B=B2, ..., B=Bn. What I want to do is construct a 3D plot, with the z axis being property B, the x axis being property A, and the y axis being counts.
As an example, I want the plot to look like this (B corresponds to Temperature, A corresponds to Rg):
How do I pull this off in Python?
The Python library joypy can plot graphs like this, but I'm not sure whether you also want the molecules from your example image in your graph.
Here is an example:
import joypy
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import cm
# Jupyter notebook magic; remove this line in a plain Python script
%matplotlib inline
temp = pd.read_csv("data/daily_temp.csv", comment="%")
# Label only every 10th year so the y axis stays readable
labels = [y if y % 10 == 0 else None for y in list(temp.Year.unique())]
fig, axes = joypy.joyplot(temp, by="Year", column="Anomaly", labels=labels, range_style='own',
                          grid="y", linewidth=1, legend=False, figsize=(6,5),
                          title="Global daily temperature 1880-2014 \n(°C above 1950-80 average)",
                          colormap=cm.autumn_r)
Output:
See this thread as reference.
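If you want an actual 3D figure rather than a ridgeline plot, matplotlib's 3D axes can draw one 2D histogram per B value using Axes3D.bar(..., zs=..., zdir='y'). A sketch under stated assumptions: the samples are synthetic stand-ins for your A (Rg) measurements, the B values are made up, and the counts go on the vertical (z) axis because that is the "up" direction in matplotlib's 3D axes, with A on x and B on y:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical data: samples of property A measured at several values of B
b_values = [280, 300, 320, 340]
samples = {b: rng.normal(loc=1.0 + b / 1000, scale=0.1, size=500) for b in b_values}

# Use the same bins for every B so the histograms line up
bins = np.linspace(0.8, 1.7, 31)
centers = 0.5 * (bins[:-1] + bins[1:])
width = bins[1] - bins[0]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for b in b_values:
    counts, _ = np.histogram(samples[b], bins=bins)
    # Draw this histogram as flat bars lying in the plane y = b
    ax.bar(centers, counts, zs=b, zdir="y", width=width, alpha=0.7)

ax.set_xlabel("A")
ax.set_ylabel("B")
ax.set_zlabel("counts")
plt.show()
```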

Is there a way to adjust the axes limits of pairplot(), but not as individual plots?

Is there maybe a setting that produces better axes limits?
I would like the axes of the plots to cover a bigger range. The axes currently show all of the data, but the view is too 'zoomed in'.
My code is:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, hue='species', diag_kind='hist', palette='Dark2', plot_kws={"s": 20})
The link to my plot, and to what I would like the plot to look like, is here:
pairplot
To change the subplots, g.map(func, <parameters>) can be used. A small problem is that func needs to accept color as a parameter, and plt.margins() gives an error when color is passed. Moreover, map uses x and y for the row and column variables. You could write a dummy function that simply calls plt.margins(), for example g.map(lambda *args, **kwargs: plt.margins(x=0.2, y=0.3)).
An alternative is to loop through g.axes.flat and call ax.margins() on each of them. Note that many axes are shared in x and/or y direction. The diagonal is treated differently; for some reason ax.margins needs to be called a second time on the diagonal.
To have the histogram for the different colors stacked instead of overlapping, diag_kws={"multiple": "stack"} can be set.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, hue='species', diag_kind='hist', palette='Dark2',
plot_kws={"s": 20}, diag_kws={"multiple": "stack"})
# g.map(plt.margins, x=0.2, y=0.2) # gives an error
for ax in g.axes.flat:
ax.margins(x=0.2, y=0.2)
for ax in g.diag_axes:
ax.margins(y=0.2)
plt.show()
PS: yet another option is to change the rcParams, which affects all plots created later in the code:
import matplotlib as mpl
mpl.rcParams['axes.xmargin'] = 0.2
mpl.rcParams['axes.ymargin'] = 0.2

How to stop numpy trendline from going below 0 on matplotlib graph

I am creating several scatter plot graphs in matplotlib. For these I want to plot trend lines for the scatter plots. I am using the numpy polyfit and poly1d methods to create the trendline.
My problem is as follows: there are only positive y values in my dataset (I have also removed all 0 values), but my trendlines go below 0. I think this happens because some very large outliers skew the trendline.
Is there a way to prevent the trendlines from going below 0 without removing data points? Perhaps a method, or a parameter of a method, in numpy or matplotlib?
Removing outliers helps some of the trendlines, but not all of them across the multiple graphs I'm making.
Graph example with scatter points: https://imgur.com/a/bwIFJw7
Graph example without scatter points (same data as above graph): https://imgur.com/a/k5TyNjt
Changing the degree of the trendline doesn't solve the issue.
Code to reproduce the problem:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
plt.figure(figsize=(20,150))
loc = mdates.AutoDateLocator()
dataset = {'time':['4/5/2014','4/10/2014','4/21/2014','5/3/2014','5/8/2014','5/19/2014','6/7/2014','6/12/2014','6/16/2014','12/6/2014','12/11/2014','12/15/2014','2/7/2015','2/12/2015','2/16/2015','7/20/2015','8/1/2015','8/13/2015','8/17/2015','9/5/2015','9/10/2015','9/21/2015','10/3/2015','12/10/2015','1/18/2016','8/6/2016','8/11/2016','8/15/2016','9/3/2016','9/8/2016','9/19/2016','10/1/2016','10/13/2016','10/17/2016','11/10/2016','11/5/2016','8/10/2017','9/14/2017','9/18/2017','10/7/2017','2/8/2018','2/19/2018','3/3/2018','3/8/2018','3/19/2018','4/12/2018','4/7/2018','4/16/2018','5/5/2018','5/10/2018','5/21/2018','11/3/2018','11/8/2018','11/19/2018','12/1/2018','12/13/2018','12/17/2018','1/5/2019','1/10/2019','1/21/2019','2/2/2019','2/14/2019','2/18/2019','3/2/2019','3/14/2019','3/18/2019','4/6/2019','4/11/2019','4/15/2019'],
           'yval':[1714.6,996.32,1638.4,1293.47,744.73,1843.2,1009.97,2168.47,819.2,2949.12,2730.67,2106.51,14745.6,3880.42,73728,792.77,538.16,585.14,571.53,580.54,933.27,460.8,646.74,4336.94,36864,190.51,206.89,199.02,197.54,219.84,210.27,223.75,201.96,212.23,223.6,211.48,1568.68,418.91,837.82,5671.38,217.18,189.74,192.59,192.04,196.74,197.8,196.47,200.69,193.69,210.79,349.42,222.5,209.17,191.37,192.91,197.57,207.23,192.48,189.7,199.44,187.57,186.85,187.99,189.19,196.34,196.11,192.61,196.39,190.05]}
dataset['time'] = pd.to_datetime(dataset['time'])
dataset['yval'] = pd.to_numeric(dataset['yval'])
x = mdates.date2num(dataset['time'])
y = dataset['yval']
z = np.polyfit(x,y,3)
p = np.poly1d(z)
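series_name = 'yval'  # hypothetical series name; `type` in the original snippet refers to Python's built-in
plt.plot(x, p(x), '#00FFFF', label=series_name)
plt.title(series_name)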
plt.xlabel('Time')
plt.ylabel('Weight')
#comment out the next line to see plot without scatter points
plt.scatter(x,y)
plt.gca().xaxis.set_major_locator(loc)
plt.gca().xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
plt.grid(which='major',axis='both')
plt.show()
Desired output: a graph whose trendline does not go below the horizontal y = 0 axis.
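One option not covered in the question (a swapped-in modelling technique, not a numpy or matplotlib setting) is to fit the polynomial to log(y) instead of y and exponentiate the result: exp() is always positive, so the trend can never cross 0, and the log also damps the pull of very large outliers. The helper name positive_trend is hypothetical. If you only want the drawn line to stop at 0 while keeping the ordinary fit, np.clip(p(x), 0, None) does that instead.

```python
import numpy as np

def positive_trend(x, y, deg=3):
    """Fit log(y) with a polynomial and return a function computing exp(poly(x)).

    Because exp(...) is always positive, the returned trend never drops below 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("a log-space fit needs strictly positive y values")
    # Centre x first: numbers from mdates.date2num are large, which makes
    # a higher-degree polyfit badly conditioned
    x0 = x.mean()
    log_poly = np.poly1d(np.polyfit(x - x0, np.log(y), deg))

    def trend(xs):
        return np.exp(log_poly(np.asarray(xs, dtype=float) - x0))

    return trend

# Usage with the question's variables (x from date2num, y = dataset['yval']):
# trend = positive_trend(x, y, deg=3)
# plt.plot(x, trend(x), '#00FFFF')
```

Note the design trade-off: this fits the trend of log(y), so the curve follows the typical (geometric) level of the data rather than its arithmetic mean.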

Formatting style for matplotlib: scatterplot histogram hybrid

In an old standalone plotting package (sm), there was a style available for scatter plots that I found more appealing than the usual one. Each point looks almost like a histogram bar that stretches to the next point.
An example of a scatter plot using this style:
Matplotlib does have this style for histograms, but I'm wondering if there's a way to cheat the system to allow the style to work for scatter plots.
I think some of the confusion comes from the fact that the desired plot is not a scatter plot. It's a line plot with lines in form of a step-like function.
You may plot step functions with pyplot.step(x,y) or plot(x,y, drawstyle="steps").
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
x = np.linspace(0,1)
y = np.random.rand(len(x))
fig, ax = plt.subplots()
ax.step(x,y)
# or
# ax.plot(x,y, drawstyle="steps")
plt.show()
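The where parameter of step() controls how each step is aligned with its data point: 'pre' (the default), 'mid' or 'post'. where='mid' centres each step on its point, which is probably closest to the histogram-like look in the question. A small comparison sketch with random data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = np.linspace(0, 1, 20)
y = rng.random(len(x))

fig, axs = plt.subplots(3, 1, sharex=True)
for ax, where in zip(axs, ["pre", "mid", "post"]):
    ax.step(x, y, where=where, label=f"where='{where}'")
    ax.plot(x, y, "o", ms=3)  # mark the actual data points
    ax.legend(loc="upper right")
plt.show()
```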

ploting a bar plot for large amount of data

I have 752 data points that I need to plot.
I have plotted the data as a bar plot using the seaborn library in Python, but the graph I get is very unclear and I cannot analyze anything from it. Is there any way I can view this graph more clearly, with all data points fitting and their labels clearly visible?

The code I wrote is the following:
import seaborn as sns
sns.set_style("whitegrid")
ax = sns.barplot(x="Events", y = "Count" , data = Unique_Complaints)
It is always difficult to visualise so many points. Nihal has rightly pointed out that it is best to use pandas and statistical analysis to extract information from your data. Having said this, IDEs like Spyder and PyCharm, and packages like Bokeh, allow interactive plots where you can zoom into different parts of the plot. Here is an example with PyCharm:
Code:
# Import libraries
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Exponential decay function
x = np.arange(1,10, 0.1)
A = 7000
y = A*np.exp(-x)
# Plot the exponential function
sns.barplot(x = x, y = y)
plt.show()
Figure without magnification
Magnified figure
To see a large amount of data, you can enlarge the figure with figure() from matplotlib.pyplot, like this:
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.pyplot import figure
figure(num=None, figsize=(20,18), dpi=80, facecolor='w', edgecolor='r')
sns.barplot(x="Events", y="Count", data=Unique_Complaints)
plt.show()
I am using this to see a graph with 49 variables and the result is:
My code is:
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.pyplot import figure
figure(num=None, figsize=(20,18), dpi=256, facecolor='w', edgecolor='r')
plt.title("Missing Value Percentage")
sns.barplot(x=miss_val_per, y=df.columns)
plt.show()
Data I am using is:
https://www.kaggle.com/sobhanmoosavi/us-accidents
Just swap x and y, so the bars run horizontally and the category labels sit on the y axis, and try increasing the figure size.
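A sketch combining the suggestions above (horizontal bars via swapped x/y, and a larger figure), plus one extra assumption: with 752 categories, plotting only the largest counts keeps the labels readable. The DataFrame is a hypothetical stand-in for Unique_Complaints with the same column names, and the cut-off of 30 bars is arbitrary:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for Unique_Complaints: 752 event names with counts
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Events": [f"event_{i}" for i in range(752)],
    "Count": rng.integers(1, 500, size=752),
})

# Keep the 30 most frequent events, already sorted in descending order
top = df.nlargest(30, "Count")

fig, ax = plt.subplots(figsize=(8, 10))
# Numeric x and categorical y give horizontal bars with readable labels
sns.barplot(data=top, x="Count", y="Events", ax=ax)
plt.tight_layout()
plt.show()
```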
