How to decrease the density of x-ticks in seaborn - python

I have some data, based on which I am trying to build a countplot in seaborn. So I do something like this:
data = np.hstack((np.random.normal(10, 5, 10000), np.random.normal(30, 8, 10000))).astype(int)
plot_ = sns.countplot(data)
and get my countplot:
The problem is that ticks on the x-axis are too dense (which makes them useless). I tried to decrease the density with plot_.xticks=np.arange(0, 40, 10) but it didn't help.
Also is there a way to make the plot in one color?

Tick frequency
There seem to be multiple issues here:
1. You are assigning to xticks with the = operator. xticks is meant to be called as a function (but not here; read point 2 first)!
2. seaborn's countplot returns an Axes object, not a figure.
3. You need to use the axes-level approach of changing x-ticks (which is not plt.xticks()).
Try this:
for ind, label in enumerate(plot_.get_xticklabels()):
    if ind % 10 == 0:  # every 10th label is kept
        label.set_visible(True)
    else:
        label.set_visible(False)
Colors
I think the data setup is not optimal here for this type of plot. Seaborn will interpret each unique value as a new category and introduce a new color. If I'm right, the number of colors and of x-ticks both equal len(np.unique(data)).
Compare your data to seaborn's examples (which are all based on data that can be imported to check).
I also think working with seaborn is much easier using pandas dataframes rather than numpy arrays (I often prepare my data in the wrong way and subset selection needs preprocessing; dataframes offer more). I think most of seaborn's examples use this kind of data input.
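To illustrate the point about data setup, here is a minimal sketch that swaps seaborn's countplot for a plain matplotlib bar chart. This is not the asker's or seaborn's method; it assumes the values are integers that can be counted with np.unique and placed on a numeric x-axis. On a numeric axis matplotlib picks a sparse set of ticks by itself, and a single color argument draws every bar the same. The seeded generator and the color name are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(123)  # seed chosen arbitrarily for reproducibility
data = np.hstack((rng.normal(10, 5, 10000), rng.normal(30, 8, 10000))).astype(int)

# Count how often each integer occurs instead of treating values as categories
values, counts = np.unique(data, return_counts=True)

fig, ax = plt.subplots()
# Numeric x positions let matplotlib choose tick spacing automatically;
# one `color` value gives every bar the same colour
bars = ax.bar(values, counts, width=0.9, color="steelblue")
ax.set_xlabel("value")
ax.set_ylabel("count")
fig.canvas.draw()
```

If you stay with countplot, passing a single color argument to it should have the same effect on the bar colors.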

Even though this was answered a while ago, I'm adding another, perhaps simpler, alternative that is more flexible.
You can use a matplotlib axis tick locator to control which ticks will be shown.
In this example you can use LinearLocator to achieve the same thing:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.ticker as ticker
data = np.hstack((np.random.normal(10, 5, 10000), np.random.normal(30, 8, 10000))).astype(int)
plot_ = sns.countplot(data)
plot_.xaxis.set_major_locator(ticker.LinearLocator(10))
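For readers choosing between locator classes, a small sketch of how two of them place ticks on a purely numeric axis may help. It only calls each locator's tick_values method on an assumed view range of 3 to 47 (numbers picked for illustration); it does not involve seaborn's categorical axis, where tick positions are category indices rather than data values.

```python
import numpy as np
import matplotlib.ticker as ticker

vmin, vmax = 3, 47  # assumed axis limits, chosen only for illustration

# LinearLocator spreads a fixed number of ticks evenly between the limits,
# so the tick values are not necessarily round numbers
linear_ticks = ticker.LinearLocator(5).tick_values(vmin, vmax)

# MultipleLocator places ticks at integer multiples of the given base
multiple_ticks = ticker.MultipleLocator(10).tick_values(vmin, vmax)

print(linear_ticks)
print(multiple_ticks)
```

LinearLocator fixes the number of ticks, while MultipleLocator fixes their spacing; which one reads better depends on the data.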

Since you have tagged matplotlib, an alternative to toggling label visibility is to draw only every nth label, as follows:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(123)
fig = plt.figure()
data = np.hstack((np.random.normal(10, 5, 10000), np.random.normal(30, 8, 10000))).astype(int)
plot_ = sns.countplot(data)
fig.canvas.draw()
new_ticks = [i.get_text() for i in plot_.get_xticklabels()]
plt.xticks(range(0, len(new_ticks), 10), new_ticks[::10])

As a slight modification of the accepted answer, you will typically want to select labels based on their value (and not their index). For example, to display only values that are divisible by 10, this would work:
for label in plot_.get_xticklabels():
    # np.int was removed in NumPy 1.24; the built-in int does the same job
    if int(label.get_text()) % 10 == 0:
        label.set_visible(True)
    else:
        label.set_visible(False)
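The value-based selection above can be wrapped in a small helper. The sketch below is an illustration under assumptions: show_labels_divisible_by is a hypothetical name, and the labels are attached to a plain matplotlib bar chart with explicitly set tick labels to mimic the fixed string labels that countplot produces. Labels that cannot be parsed as numbers are hidden.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt


def show_labels_divisible_by(ax, step):
    """Hide every x tick label whose numeric text is not a multiple of `step`.

    Hypothetical helper for illustration; not part of matplotlib or seaborn.
    """
    for label in ax.get_xticklabels():
        try:
            keep = int(float(label.get_text())) % step == 0
        except ValueError:
            keep = False  # non-numeric or empty label text
        label.set_visible(keep)


# Stand-in for a countplot: one bar per category, labelled with its value
values = list(range(3, 48))
fig, ax = plt.subplots()
ax.bar(range(len(values)), [1] * len(values))
ax.set_xticks(range(len(values)))
ax.set_xticklabels([str(v) for v in values])

show_labels_divisible_by(ax, 10)
visible = [lab.get_text() for lab in ax.get_xticklabels() if lab.get_visible()]
print(visible)
```

With a seaborn plot, the same helper would be called as show_labels_divisible_by(plot_, 10) after the plot has been created.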

Related

Format tick labels in scatter plot to % in matplotlib - python [duplicate]

I have a line chart based on a simple list of numbers. By default the x-axis is just an increment of 1 for each value plotted. I would like it to be a percentage instead but can't figure out how. So instead of having an x-axis from 0 to 5, it would go from 0% to 100% (but keeping reasonably spaced tick marks). Code below. Thanks!
from matplotlib import pyplot as plt
from mpl_toolkits.axes_grid.axislines import Subplot
data=[8,12,15,17,18,18.5]
fig=plt.figure(1,(7,4))
ax=Subplot(fig,111)
fig.add_subplot(ax)
plt.plot(data)
The code below will give you a simplified, percentage-based x-axis. It assumes that your values are spaced equally between 0% and 100%.
It creates a perc array which holds evenly-spaced percentages that can be used to plot with. It then adjusts the formatting for the x-axis so it includes a percentage sign using matplotlib.ticker.FormatStrFormatter. Unfortunately this uses the old-style string formatting, as opposed to the new style, the old style docs can be found here.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you can do as follows to set the axis:
import matplotlib.ticker as mtick
# Actual plotting code omitted
ax.xaxis.set_major_formatter(mtick.PercentFormatter(5.0))
This will display values from 0 to 5 on a scale of 0% to 100%. The formatter is similar in concept to what @Ffisegydd suggests doing, except that it can take any arbitrary existing ticks into account.
PercentFormatter() accepts three arguments: xmax, decimals, and symbol. xmax allows you to set the value that corresponds to 100% on the axis (in your example, 5).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.
Note that this formatter will use whatever ticks would normally be generated if you just plotted your data. It does not modify anything besides the strings that are output to the tick marks.
Update
PercentFormatter was accepted into Matplotlib in version 2.1.0.
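A minimal end-to-end sketch with the question's data, assuming Matplotlib 2.1 or later. In released versions the keyword for the 100% value is xmax; decimals=0 is chosen here only to get labels without a fractional part.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

data = [8, 12, 15, 17, 18, 18.5]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(data)  # x values are the list indices 0..5

# xmax is the data value that should be displayed as 100%
formatter = mtick.PercentFormatter(xmax=len(data) - 1, decimals=0)
ax.xaxis.set_major_formatter(formatter)
fig.canvas.draw()

# format_pct shows how a single tick value would be rendered
print(formatter.format_pct(2.5, 5.0))
```

The tick positions stay wherever the default locator puts them; only the label strings change.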
Totally late in the day, but I wrote this and thought it could be of use:
import pandas as pd

def transformColToPercents(x, rnd, navalue):
    # Returns a pandas Series that can be put in a new dataframe column,
    # with all values scaled to the range 0-100%
    # rnd: number of decimal places to round to
    # navalue: value substituted for NaN results
    hv = x.max(axis=0)
    lv = x.min(axis=0)
    pp = pd.Series(((x - lv) * 100) / (hv - lv)).round(rnd)
    return pp.fillna(navalue)

df['new column'] = transformColToPercents(df['a'], 2, 0)

Understanding matplotlib and how to format matplotlib axis with a single digit?

I am having a very hard time getting the tick labels of a seaborn heatmap to show only single integers (i.e. no floating-point numbers). I have two lists that form the axes of a data frame that I plot using seaborn.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
x = np.linspace(0, 15, 151)
y = np.linspace(0, 15, 151)
#substitute random data for my_data
df_map = pd.DataFrame(my_data, index = y, columns = x)
plt.figure()
ax = sns.heatmap(df_map, square = True, xticklabels = 20, yticklabels = 20)
ax.invert_yaxis()
I've reviewed many answers and the documents. My biggest problem is I have little experience and a very poor understanding of matplotlib and the docs feel like a separate language... Here are the things I've tried.
ATTEMPT 1: A slightly modified version of the solution to this question:
fmtr = tkr.StrMethodFormatter('{x:.0f}')
plt.gca().xaxis.set_major_formatter(fmtr)
I'm pretty sure tkr.StrMethodFormatter() is displaying every 20th index of the value it encounters in my axis string, which is probably due to my settings in sns.heatmap(). I tried different string inputs to tkr.StrMethodFormatter() without success. I looked at two other questions and tried different combinations of tkr classes that were used in answers for here and here.
ATTEMPT 2:
fmtr = tkr.StrMethodFormatter("{x:.0f}")
locator = tkr.MultipleLocator(50)
fstrform = tkr.FormatStrFormatter('%.0f')
plt.gca().xaxis.set_major_formatter(fmtr)
plt.gca().xaxis.set_major_locator(locator)
#plt.gca().xaxis.set_major_formatter(fstrform)
And now i'm at a complete loss. I've found out locator changes which nth indices to plot, and both fmtr and fstrform change the number of decimals being displayed, but i cannot for the life of me get the axes to display the integer values that exist in the axes lists!
Please help! I've been struggling for hours. It's probably something simple, and thank you!
As an aside:
Could someone please elaborate on the documentation excerpt in that question, specifically:
...and the field used for the position must be labeled pos.
Also, could someone please explain the differences between tkr.StrMethodFormatter("{x:.0f}") and tkr.FormatStrFormatter('%.0f')? I find it annoying there are two ways, each with their own syntax, to produce the same result.
UPDATE:
It took me a while to get around to implementing the solution provided by @ImportanceOfBeingErnest. I took an extra precaution and rounded the numbers in the x,y arrays. I'm not sure if this is necessary, but I've produced the result I wanted:
x = np.linspace(0, 15, 151)
y = np.linspace(0, 15, 151)
# round float numbers in axes arrays
x_rounded = [round(i,3) for i in x]
y_rounded = [round(i,3) for i in y]
#substitute random data for my_data
df_map = pd.DataFrame(my_data, index = y_rounded , columns = x_rounded)
plt.figure()
ax0 = sns.heatmap(df_map, square = True, xticklabels = 20)
ax0.invert_yaxis()
labels = [label.get_text() for label in ax0.get_xticklabels()]
ax0.set_xticklabels(map(lambda x: "{:g}".format(float(x)), labels))
Although I'm still not entirely sure why this worked; check the comments between me and them for clarification.
The sad thing is, you didn't do anything wrong. The problem is just that seaborn has a very peculiar way of setting up its heatmap.
The ticks on the heatmap are at fixed positions and they have fixed labels. So to change them, those fixed labels need to be changed. An option to do so is to collect the labels, convert them back to numbers, and then set them back.
labels = [label.get_text() for label in ax.get_xticklabels()]
ax.set_xticklabels(map(lambda x: "{:g}".format(float(x)), labels))
labels = [label.get_text() for label in ax.get_yticklabels()]
ax.set_yticklabels(map(lambda x: "{:g}".format(float(x)), labels))
A word of caution: One should in principle never set the tick labels without setting the locations as well, but here seaborn is responsible for setting the positions. We just trust it to do so correctly.
If you want numeric axes with numeric labels that can be formatted as attempted in the question, one may directly use a matplotlib plot.
import numpy as np
import seaborn as sns # seaborn only imported to get its rocket cmap
import matplotlib.pyplot as plt
my_data = np.random.rand(150,150)
x = (np.linspace(0, my_data.shape[1], my_data.shape[1]+1)-0.5)/10  # cell edges along the columns
y = (np.linspace(0, my_data.shape[0], my_data.shape[0]+1)-0.5)/10  # cell edges along the rows
fig, ax = plt.subplots()
pc = ax.pcolormesh(x, y, my_data, cmap="rocket")
fig.colorbar(pc)
ax.set_aspect("equal")
plt.show()
While this already works out of the box, you may still use locators and formatters as attempted in the question.
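On the aside in the question about the two formatter classes: as far as I can tell they differ only in the format syntax they accept. The sketch below calls both directly with a tick value and a position, which is what the axis does internally; the sample numbers are made up. It also shows what "the field used for the position must be labeled pos" refers to: StrMethodFormatter fills in two named fields, x for the value and pos for the position.

```python
import matplotlib.ticker as tkr

# Both formatters are callables that take the tick value x and the tick position pos
new_style = tkr.StrMethodFormatter("{x:.0f}")  # str.format syntax, named fields x and pos
old_style = tkr.FormatStrFormatter("%.0f")     # printf-style syntax, receives only x

print(new_style(3.7, 0))
print(old_style(3.7, 0))

# The position is available to StrMethodFormatter under the field name pos
with_pos = tkr.StrMethodFormatter("tick {pos}: {x:.1f}")
print(with_pos(2.0, 3))
```

Neither formatter changes a seaborn heatmap's labels on its own, because, as explained above, those labels are fixed strings rather than formatted numbers.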

Python, Seaborn: Logarithmic Swarmplot has unexpected gaps in the swarm

Let's look at a swarmplot, made with Python 3.5 and Seaborn on some data (which is stored in a pandas dataframe df with column labels stored in another class; this does not matter for now, just look at the plot):
ax = sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df)
Now the data is more readable if plotted in log scale on the y-axis because it goes over some decades.
So let's change the scaling to logarithmic:
ax.set_yscale("log")
ax.set_ylim(bottom = 5*10**-10)
Well, I have a problem with the gaps in the swarms. I guess they are there because the dot positions were calculated when the plot was created with a linear axis in mind, where the dots should not overlap. But now they look kind of strange, and there is enough space to form 4 equal-looking swarms.
My question is: How can I force seaborn to recalculate the position of the dots to create better looking swarms?
mwaskom hinted to me in the comments how to solve this.
It is even stated in the swarmplot documentation:
Note that arranging the points properly requires an accurate transformation between data and point coordinates. This means that non-default axis limits should be set before drawing the swarm plot.
Set an existing axis to log scale and use it for the plot:
fig = plt.figure() # create figure
rect = 0,0,1,1 # create a rectangle for the new axis
log_ax = fig.add_axes(rect) # create a new axis (or use an existing one)
log_ax.set_yscale("log") # log first
sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df, ax = log_ax)
This yields the correct and desired plotting behaviour:

Pyplot ticks at values divisible by automatic interval?

Is there a way to force pyplot (matplotlib) to place ticks at values divisible by the automatically chosen tick interval?
I really like that pyplot adjusts the tick interval automatically based on the data, so I don't have to care about it. But I would really like it to use values divisible by that interval.
For example, if it decides that the interval is 5, it should use values 5, 10, 15, 20... and not 4, 9, 14, 19 like in the example below. How can I easily fix it?
You can locate your ticks anywhere you want using matplotlib.ticker.Locator classes. Specifically in your case I guess you'd like to use MultipleLocator. Just add in your program
from matplotlib.ticker import MultipleLocator
ax = plt.gca()
ax.get_xaxis().set_major_locator(MultipleLocator(base=5))
and you'll be all set.
UPDATE:
To get the base, you can check the default AutoLocator tick positions (after the call to plt.plot) and get the difference between any of them lying next to each other:
ticks = ax.get_xticks()
base = ticks[1] - ticks[0]
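Putting both parts of this answer together, a runnable sketch might look like the following. The plotted data are invented for illustration; the only assumption about the figure is that the automatic locator produces at least two ticks, so that a spacing can be read off.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator

x = np.arange(4, 60)  # made-up data for illustration
fig, ax = plt.subplots()
ax.plot(x, np.sqrt(x))

# Read the interval that the automatic locator chose for this data
auto_ticks = ax.get_xticks()
base = auto_ticks[1] - auto_ticks[0]

# Reuse that interval, but force ticks onto multiples of it
ax.xaxis.set_major_locator(MultipleLocator(base=base))
ticks = ax.get_xticks()
print(base, ticks)
```

Note that the base is fixed at the moment it is read, so it will not adapt if the axis limits change later.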

How do I change the density of x-ticks of a pandas time series plot?

I am trying to generate a smaller figure visualising a pandas time series. The automatically-generated x-ticks, however, do not adapt to the new size and result in overlapping ticks. I am wondering how can I adapt the frequency of the x-ticks? E.g. for this example:
figsize(4, 2)
num = 3000
X = linspace(0, 100, num=num)
dense_ts = pd.DataFrame(sin(X) + 0.1 * np.random.randn(num),
pd.date_range('2014-01-1', periods=num, freq='min'))
dense_ts.plot()
The figure that I get is:
I am able to work around this problem using the Matplotlib date plotting, but it is not a very elegant solution - the code requires me to specify all the output formatting on a per-case basis.
figsize(4, 2)
from matplotlib import dates
fig, ax = plt.subplots()
ax.plot_date(dense_ts.index.to_pydatetime(), dense_ts, 'b-')
ax.xaxis.set_minor_locator(dates.HourLocator(byhour=range(24),
interval=12))
ax.xaxis.set_minor_formatter(dates.DateFormatter('%H:%M'))  # %M is minutes; %m would print the month
ax.xaxis.set_major_locator(dates.WeekdayLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('\n\n%a\n%Y'))
plt.show()
I'm wondering if there is a way to solve this issue using the pandas plotting module or maybe by setting some axes object properties? I tried playing with the ax.freq object, but couldn't really achieve anything.
You can pass a list of the x-axis values you want displayed to dense_ts.plot():
dense_ts.plot(xticks=['10:01','22:01'...])
Another example for clarity
df = pd.DataFrame(np.random.randn(10,3))
Plot without specifying xticks
df.plot(legend=False)
Plot with xticks argument
df.plot(xticks=[2,4,6,8],legend=False)
