Seaborn Adjusting Markers - python

As you can see, the X axis labels are quite unreadable, and this happens regardless of how I adjust the figure size. I'm trying to figure out how to adjust the labeling so that it only shows certain points. The X axis values are all numerical, between -1 and 1, and I think it would be more viewer friendly to have labels only at -1, -0.5, 0, 0.5 and 1.
Is there a way to do this? Thank you!
Here's my code:
sns.set(rc={'figure.figsize':(20,8)})
ax = sns.countplot(musi['Positivity'])
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha='right')
plt.tight_layout()
plt.show()

Seaborn is essentially a wrapper around matplotlib, so you can use matplotlib's ticker module to do the job. See the example below.
First, let's place a tick at every 1 unit:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
sns.set_theme(style="whitegrid")
x = [0,5,9,10,15]
y = [0,1,2,3,4]
tick_spacing = 1
fig, ax = plt.subplots(1,1)
# pass x and y by keyword; recent seaborn versions no longer accept them positionally
sns.lineplot(x=x, y=y, ax=ax)
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
plt.show()
Now let's place a tick every 5 units:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
sns.set_theme(style="whitegrid")
x = [0,5,9,10,15]
y = [0,1,2,3,4]
tick_spacing = 5
fig, ax = plt.subplots(1,1)
sns.lineplot(x=x, y=y, ax=ax)
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
plt.show()
P.S.: This solution gives you explicit control of the tick spacing via the number passed to ticker.MultipleLocator(), still lets the axis limits be determined automatically, and is easy to read later.
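For the data in the question, which lie between -1 and 1, the same locator can be used with a spacing of 0.5. The sketch below is only an illustration under assumptions: the random `positivity` array stands in for `musi['Positivity']`, and a plain matplotlib histogram replaces the countplot so that the x axis is numeric (a countplot treats every distinct value as a category, so a spacing in data units only makes sense on a numeric axis). Seaborn's axes-level functions return an ordinary matplotlib Axes, so the locator call is the same there.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

# Assumption: random values in [-1, 1] standing in for musi['Positivity']
rng = np.random.default_rng(0)
positivity = rng.uniform(-1, 1, size=500)

fig, ax = plt.subplots(figsize=(10, 4))
ax.hist(positivity, bins=40, range=(-1, 1))
ax.set_xlim(-1, 1)

# One major tick every 0.5 data units
ax.xaxis.set_major_locator(ticker.MultipleLocator(0.5))
fig.canvas.draw()

# The locator may also propose ticks just outside the view; keep the visible ones
visible_ticks = [t for t in ax.get_xticks() if -1 - 1e-9 <= t <= 1 + 1e-9]
print(visible_ticks)
```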

Related

seaborn: How to add a second level of labels on the X axis

I need to plot a time series, with dates on the X axis and values on the Y axis, but I also need to show the day of week on the X axis.
ax = sns.lineplot(x='date', y='value', data=df)
I expect to be able to add day of week (another column from df) on the X axis.
example with Excel
You can try to do this by adding a second x-axis. Below is some code that you will need to adapt to your problem. There may be better ways to do this, but it should work.
from matplotlib.ticker import MultipleLocator
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
x = np.arange(1000)
x2 = np.arange(1000)*2
y = np.sin(x/100.)
fig = plt.figure()
ax = plt.subplot(111)
sns.lineplot(x=x, y=y, ax=ax)
plt.xlim(0, 1000)
ax.xaxis.set_major_locator(MultipleLocator(200))
ax2 = ax.twiny()
sns.lineplot(x=x2, y=y, ax=ax2, visible=False)
plt.xlim(0, 2000)
ax2.xaxis.set_major_locator(MultipleLocator(400))
ax2.spines['top'].set_position(('axes', -0.15))
ax2.spines['top'].set_visible(False)
plt.tick_params(which='both', top=False)
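Another option, sketched here with made-up dates and values, is to put the day of week on a second line of each tick label rather than on a second axis. This is an alternative to the approach above, not part of it. It assumes daily data and uses matplotlib's `DayLocator` and `DateFormatter`; `%a` is the strftime code for the abbreviated weekday name, so its exact spelling depends on the locale.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Assumption: one made-up value per day for a single week
dates = [dt.date(2024, 1, 1) + dt.timedelta(days=i) for i in range(7)]
values = [3, 5, 4, 6, 8, 7, 9]

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(dates, values, marker="o")

# One tick per day; the label holds the date on line 1 and the weekday on line 2
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d\n%a"))

fig.tight_layout()
fig.canvas.draw()
day_labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(day_labels)
```

With a DataFrame, the same two formatter lines can be applied to the Axes returned by `sns.lineplot(x='date', y='value', data=df)`, provided the `date` column holds real datetime values.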

Seaborn gives wrong values on x-axis ticks?

In the code below Matplotlib gives the correct range of 5.0 to 10.0, why is Seaborn different?
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib import ticker
sns.set()
fig, (ax1, ax2) = plt.subplots(2)
x = np.linspace(5, 10)
y = x ** 2
sns.barplot(x=x, y=y, ax=ax1)
ax1.xaxis.set_major_locator(ticker.MultipleLocator(5))
ax1.xaxis.set_major_formatter(ticker.FormatStrFormatter('%.2f'))
ax2.bar(x, y, width = 0.1)
ax2.xaxis.set_major_locator(ticker.MultipleLocator(5))
ax2.xaxis.set_major_formatter(ticker.FormatStrFormatter('%.2f'))
plt.show()
Seaborn's barplot is a categorical plot. This means it places the bars at successive integer positions (0,1,...N-1). Hence, if you have N bars, the axis will range from -0.5 to N-0.5.
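In the seaborn versions this answer was written for, there is no way to tell seaborn to place the bars at different positions (seaborn 0.13 later added a native_scale option to its categorical plots). You can, however, fake the labels so that the axis appears numeric. For example, to label every 5th bar with the value from x: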
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib import ticker
sns.set()
fig, ax = plt.subplots()
x = np.linspace(5, 10)
y = x ** 2
sns.barplot(x=x, y=y, ax=ax)
ax.xaxis.set_major_locator(ticker.FixedLocator(np.arange(0, len(x), 5)))
ax.xaxis.set_major_formatter(ticker.FixedFormatter(x[::5]))
ax.tick_params(axis="x", rotation=90)
plt.tight_layout()
plt.show()
Conversely, it is possible to draw categorical plots with matplotlib. To do so, plot strings instead of numbers:
ax.bar(x.astype(str), y)
ax.xaxis.set_major_locator(ticker.FixedLocator(np.arange(0, len(x), 5)))
ax.xaxis.set_major_formatter(ticker.FixedFormatter(x[::5]))
ax.tick_params(axis="x", rotation=90)
If you want a numerical bar plot, i.e. a plot where each bar is at the axis position of x, you would need to use matplotlib. This is the default case also shown in the question, where the bars range between 5 and 10. One should make sure to have the width of the bars smaller than the difference between successive x positions in this case.
ax.bar(x, y, width=np.diff(x).mean()*0.8)
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%.2f'))
ax.tick_params(axis="x", rotation=90)
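The point about categorical positions can be checked directly. The following sketch uses matplotlib's string-based categorical bars, as in the snippet above, confirms that the bars are centred on 0, 1, ..., N-1, and then labels every 5th bar with the corresponding x value rounded to two decimals. The helper name `position_to_label` is made up for this illustration, and a `FuncFormatter` is used here in place of the `FixedFormatter` from the answer so that the labels can be formatted.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker

x = np.linspace(5, 10)  # 50 values between 5 and 10
y = x ** 2

fig, ax = plt.subplots()
ax.bar(x.astype(str), y)  # string categories: bars are placed at 0..N-1

# Centre of each bar in axis coordinates
bar_centers = [p.get_x() + p.get_width() / 2 for p in ax.patches]


def position_to_label(pos, _):
    """Hypothetical helper: map an integer bar position back to its x value."""
    index = int(round(pos))
    return f"{x[index]:.2f}" if 0 <= index < len(x) else ""


# Tick every 5th bar and label it with the underlying x value
ax.xaxis.set_major_locator(ticker.FixedLocator(np.arange(0, len(x), 5)))
ax.xaxis.set_major_formatter(ticker.FuncFormatter(position_to_label))
ax.tick_params(axis="x", rotation=90)

fig.tight_layout()
fig.canvas.draw()
tick_labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(tick_labels)
```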

Use Seaborn to plot 1D time series as a line with marginal histogram along y-axis

I'm trying to recreate the broad features of the following figure:
(from E.M. Ozbudak, M. Thattai, I. Kurtser, A.D. Grossman, and A. van Oudenaarden, Nat Genet 31, 69 (2002))
seaborn.jointplot does most of what I need, but it seemingly can't use a line plot, and there's no obvious way to hide the histogram along the x-axis. Is there a way to get jointplot to do what I need? Barring that, is there some other reasonably simple way to create this kind of plot using Seaborn?
Here is a way to create roughly the same plot as shown in the question. You can share the axes between the two subplots and make the width-ratio asymmetric.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
x = np.linspace(0,8, 300)
y = np.tanh(x)+np.random.randn(len(x))*0.08
fig, (ax, axhist) = plt.subplots(ncols=2, sharey=True,
gridspec_kw={"width_ratios" : [3,1], "wspace" : 0})
ax.plot(x,y, color="k")
ax.plot(x,np.tanh(x), color="k")
axhist.hist(y, bins=32, ec="k", fc="none", orientation="horizontal")
axhist.tick_params(axis="y", left=False)
plt.show()
It turns out that you can produce a modified jointplot with the needed characteristics by working directly with the underlying JointGrid object:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
x = np.linspace(0,8, 300)
y = (1 - np.exp(-x*5))*.5
ynoise= y + np.random.randn(len(x))*0.08
grid = sns.JointGrid(x=x, y=ynoise, ratio=3)
grid.plot_joint(plt.plot)
grid.ax_joint.plot(x, y, c='C0')
# sns.distplot is deprecated in recent seaborn; histplot with y= draws the horizontal histogram
sns.histplot(y=ynoise, ax=grid.ax_marg_y)
# override a bunch of the default JointGrid style options
grid.fig.set_size_inches(10,6)
grid.ax_marg_x.remove()
grid.ax_joint.spines['top'].set_visible(True)
You can use ax_marg_x.patches to affect the outcome.
Here, I use it to turn the x-axis plot white so that it cannot be seen (although the margin for it remains):
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="white", color_codes=True)
x, y = np.random.multivariate_normal([2, 3], [[0.3, 0], [0, 0.5]], 1000).T
# stat_func was removed from jointplot in later seaborn releases, so it is not passed here
g = sns.jointplot(x=x, y=y, kind="hex", marginal_kws={'color': 'green'})
plt.setp(g.ax_marg_x.patches, color="w", )
plt.show()

Statsmodel Probplot Tick customization

I've created a cumulative probability plot with StatsModels in Python, but there are way too many ticks on the axis.
I want there to be tick marks only at 0.1, 10, 50, 90, 99, and 99.9. Does anyone know how to make this work? I tried the code below, but it only gives me the first n ticks, which makes it pretty useless (see figure below).
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as ticker
import statsmodels.api as sm
csv = pd.read_csv('cumProbMaxData.csv')
data = csv.values.tolist()
flat_list = [item for sublist in data for item in sublist]
fig,ax = plt.subplots()
x = np.array(flat_list)
pp_x = sm.ProbPlot(x, fit=True)
figure = pp_x.probplot(exceed=False, ax=ax, marker='.', color='k', markersize=12)
plt.xlabel('Cumulative Probability (%)')
plt.ylabel('Maximum CO$_2$ Flux (g m$^-$$^2$ d$^-$$^1$)')
tick_spacing=5
ax.xaxis.set_major_locator(ticker.MaxNLocator(tick_spacing))
plt.tight_layout()
plt.show()
Statsmodels' ProbPlot plots the data in their real units; only the axis ticks are then relabeled to show percentage values. This is in general bad style, but you have to live with it if you want to use ProbPlot.
One way to show fewer ticks on such a plot, which uses a FixedLocator and a FixedFormatter, is to subsample the existing ticks. The tick labels you want to show are at indices locs = [0,3,6,10,14,17,20], i.e. the labels at positions 0, 3, 6, and so on in the list of existing ticks.
You can use this list to keep only those ticks, as shown below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
x = np.random.randn(200)
fig,ax = plt.subplots()
pp_x = sm.ProbPlot(x, fit=True)
pp_x.probplot(exceed=False, ax=ax, marker='.', color='k', markersize=12)
locs = [0,3,6,10,14,17,20]
ax.set_xticklabels(np.array(ax.xaxis.get_major_formatter().seq)[locs])
ax.set_xticks(ax.get_xticks()[locs])
plt.tight_layout()
plt.show()
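If you build the probability axis yourself, the tick positions can also be computed instead of picked by index. The sketch below is a hand-rolled stand-in, not statsmodels' implementation: it assumes a normal reference distribution, assumes the x axis is in standard-normal quantile units, and uses the simple plotting positions (i - 0.5) / n. Under those assumptions, the tick for a cumulative probability of p percent sits at the inverse CDF of p / 100.

```python
from statistics import NormalDist

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Assumption: random normal data standing in for the values read from the CSV
rng = np.random.default_rng(0)
data = np.sort(rng.normal(size=200))

# Assumption: simple (i - 0.5) / n plotting positions on a standard normal axis
n = len(data)
probabilities = (np.arange(1, n + 1) - 0.5) / n
std_normal = NormalDist()
quantiles = [std_normal.inv_cdf(float(p)) for p in probabilities]

fig, ax = plt.subplots()
ax.plot(quantiles, data, ".", color="k", markersize=12)

# Place ticks exactly at the requested cumulative probabilities (in percent)
percent_ticks = [0.1, 10, 50, 90, 99, 99.9]
tick_positions = [std_normal.inv_cdf(p / 100) for p in percent_ticks]
ax.set_xticks(tick_positions)
ax.set_xticklabels([f"{p:g}" for p in percent_ticks])
ax.set_xlim(tick_positions[0] - 0.2, tick_positions[-1] + 0.2)
ax.set_xlabel("Cumulative Probability (%)")

fig.tight_layout()
fig.canvas.draw()
tick_labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(tick_labels)
```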

Plotting x-axis in log scale spacing but not labeling it in exponential form

I would like to plot, say, x = [0, 10, 20, 50, 100] against y = [1, 2, 3, 10, 100] using pylab. I want to keep the x-axis spacing logarithmic, but I want ticks at the values of x, i.e. at 10, 20, 50 and 100, printed as they are rather than in the form 10e1 or 10e2. I am doing it as follows:
import matplotlib.pylab as plt
plt.xscale('log')
plt.plot(x, y)
plt.xticks(x)
plt.grid()
But it keeps the values in the form of 10e1, 10e2.
Could you please help me out?
I think what you want is to change the major_formatter of the x axis?
import matplotlib.pylab as plt
import numpy as np
from matplotlib.ticker import ScalarFormatter
x = [0, 10,20,50,100]
y=[1,2,3,10,100]
plt.plot(x, y)
plt.xscale('log')
plt.grid()
ax = plt.gca()
ax.set_xticks(x[1:]) # note that with a log axis, you can't have x = 0 so that value isn't plotted.
ax.xaxis.set_major_formatter(ScalarFormatter())
plt.show()
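The following also works, by setting the tick positions and labels explicitly (the zero is dropped, since it cannot be placed on a log axis):
import matplotlib.pyplot as plt
x = [0,10,20,50,100]
y = [1,2,3,10,100]
f,ax = plt.subplots()
ax.plot(x,y)
ax.set_xscale('log')
ax.set_xticks(x[1:])
ax.set_xticklabels(x[1:])
ax.set_xlim([10,100])
plt.show()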
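A further variant, offered as a sketch rather than as the only way to do it, formats the labels with a `FuncFormatter` so that the output does not depend on how `ScalarFormatter` chooses its precision. It also switches off the minor tick labels, which a log axis can otherwise print in power-of-ten notation when the axis spans a narrow range. The zero is left out of the data here, since it has no position on a log axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt
from matplotlib import ticker

# The x = 0 point from the question is omitted: log(0) is undefined
x = [10, 20, 50, 100]
y = [2, 3, 10, 100]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
ax.set_xscale("log")  # set the scale first; it resets locators and formatters

# Ticks at the data values, printed as plain numbers
ax.set_xticks(x)
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda value, _: f"{value:g}"))
# Hide any labels on the minor (in-between) log ticks
ax.xaxis.set_minor_formatter(ticker.NullFormatter())
ax.grid(True)

fig.canvas.draw()
tick_labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(tick_labels)
```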
