tick frequency when using seaborn/matplotlib boxplot - python

I am plotting with seaborn a series of boxplots with
sns.boxplot(full_array)
where full_array contains 200 arrays.
Therefore, I have 200 boxplots and ticks on the x-axis from 0 to 200.
The xticks are too close to each other and I would like to show only some of them, for instance, a labeled xtick every 20, or so.
I tried several solutions as those mentioned here but they did not work.
Every time I sample the xticks, I get wrong labels for the ticks, as they get numbered from 0 to N, with unit spacing.
For instance, with the line ax.xaxis.set_major_locator(ticker.MultipleLocator(20))
I get a labelled xtick every 20 but the labels are 1, 2, 3, 4 instead of 20, 40, 60, 80...
Thanks to anyone who's so kind to help.

The seaborn boxplot uses a FixedLocator and a FixedFormatter, i.e.
print(ax.xaxis.get_major_locator())
print(ax.xaxis.get_major_formatter())
prints
<matplotlib.ticker.FixedLocator object at 0x000000001FE0D668>
<matplotlib.ticker.FixedFormatter object at 0x000000001FD67B00>
It's therefore not sufficient to set the locator to a MultipleLocator since the ticks' values would still be set by the fixed formatter.
Instead you would want to set a ScalarFormatter, which sets the ticklabels to correspond to the numbers at their position.
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
import numpy as np
ax = sns.boxplot(data = np.random.rand(20,30))
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
ax.xaxis.set_major_formatter(ticker.ScalarFormatter())
plt.show()
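To see why the labels come out as 1, 2, 3, ..., it may help to call the formatters directly. The following is a minimal sketch that uses plain matplotlib rather than seaborn; the 200 string labels and the line plot are stand-ins chosen for illustration, not the data from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# A FixedFormatter holds a list of strings and picks one by tick *index* (pos),
# ignoring the tick's actual position x.
fixed = ticker.FixedFormatter([str(i) for i in range(200)])
tick_positions = [0, 20, 40, 60]
fixed_labels = [fixed(x, pos) for pos, x in enumerate(tick_positions)]
print(fixed_labels)  # labels follow the index, not the position

# With a ScalarFormatter the label is derived from the position itself.
fig, ax = plt.subplots()
ax.plot(range(200))
ax.xaxis.set_major_locator(ticker.MultipleLocator(20))
ax.xaxis.set_major_formatter(ticker.ScalarFormatter())
fig.canvas.draw()  # forces the tick labels to be computed
scalar_labels = [t.get_text() for t in ax.get_xticklabels()]
print(scalar_labels)
```

The first list shows the symptom from the question (consecutive numbers at every 20th position); the second shows labels that match the tick positions.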

Related

Format tick labels in scatter plot to % in matplotlib - python [duplicate]

I have a line chart based on a simple list of numbers. By default the x-axis is just an increment of 1 for each value plotted. I would like it to be a percentage instead but can't figure out how. So instead of having an x-axis from 0 to 5, it would go from 0% to 100% (but keeping reasonably spaced tick marks). Code below. Thanks!
from matplotlib import pyplot as plt
from mpl_toolkits.axes_grid.axislines import Subplot
data=[8,12,15,17,18,18.5]
fig=plt.figure(1,(7,4))
ax=Subplot(fig,111)
fig.add_subplot(ax)
plt.plot(data)
The code below will give you a simplified x-axis which is percentage based. It assumes that your values are spaced equally between 0% and 100%.
It creates a perc array which holds evenly spaced percentages that can be used to plot with. It then adjusts the formatting for the x-axis so it includes a percentage sign using matplotlib.ticker.FormatStrFormatter. Unfortunately this uses old-style (printf-style) string formatting rather than the newer str.format style.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
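If you prefer new-style format strings, matplotlib.ticker also provides StrMethodFormatter, which takes a str.format template with the fields x and pos. A minimal sketch comparing the two by calling the formatters directly on an example value (40 is just an illustrative tick position):

```python
import matplotlib.ticker as mtick

old_style = mtick.FormatStrFormatter('%.0f%%')    # printf-style template
new_style = mtick.StrMethodFormatter('{x:.0f}%')  # str.format-style template

# Formatters are callables taking the tick value x and its index pos.
old_label = old_style(40, 0)
new_label = new_style(40, 0)
print(old_label, new_label)
```

Either object can be passed to ax.xaxis.set_major_formatter() in the same way as xticks in the code above.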
This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you can do as follows to set the axis:
import matplotlib.ticker as mtick
# Actual plotting code omitted
ax.xaxis.set_major_formatter(mtick.PercentFormatter(5.0))
This will display values from 0 to 5 on a scale of 0% to 100%. The formatter is similar in concept to what @Ffisegydd suggests doing except that it can take any arbitrary existing ticks into account.
PercentFormatter() accepts three main arguments, xmax, decimals, and symbol. xmax allows you to set the value that corresponds to 100% on the axis (in your example, 5).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.
Note that this formatter will use whatever ticks would normally be generated if you just plotted your data. It does not modify anything besides the strings that are output to the tick marks.
Update
PercentFormatter was accepted into Matplotlib in version 2.1.0.
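As a rough illustration of the mapping, the formatter can be exercised without a plot. This sketch assumes Matplotlib 2.1 or later, where the first parameter is named xmax; the values 2.5 and 5 are only examples, and decimals is fixed at 0 here so that the automatic choice does not come into play.

```python
import matplotlib.ticker as mtick

# xmax=5 means that a data value of 5 is shown as 100%.
formatter = mtick.PercentFormatter(xmax=5, decimals=0)

half_way = formatter.convert_to_pct(2.5)  # data value -> percentage as a number
label = formatter.format_pct(2.5, 5)      # data value -> tick label string
print(half_way, label)
```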
Totally late in the day, but I wrote this and thought it could be of use:
import pandas as pd

def transformColToPercents(x, rnd, navalue):
    # Returns a pandas Series, suitable for a new DataFrame column,
    # with all values rescaled to the range 0-100%.
    # rnd: number of decimal places to round to
    # navalue: value substituted for NaN results
    hv = x.max(axis=0)
    lv = x.min(axis=0)
    pp = pd.Series(((x-lv)*100)/(hv-lv)).round(rnd)
    return pp.fillna(navalue)
df['new column'] = transformColToPercents(df['a'], 2, 0)
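A small usage sketch on made-up data; the DataFrame and its column names are hypothetical. It also shows the role of navalue: a constant column has its maximum equal to its minimum, so the division yields NaN, which is then replaced.

```python
import pandas as pd

def transformColToPercents(x, rnd, navalue):
    # Rescale a Series to 0-100, round to rnd decimals, replace NaN by navalue.
    hv = x.max(axis=0)
    lv = x.min(axis=0)
    pp = pd.Series(((x - lv) * 100) / (hv - lv)).round(rnd)
    return pp.fillna(navalue)

# Hypothetical example data
df = pd.DataFrame({'a': [2.0, 4.0, 6.0], 'const': [7.0, 7.0, 7.0]})
df['a_pct'] = transformColToPercents(df['a'], 2, 0)
df['const_pct'] = transformColToPercents(df['const'], 2, 0)
print(df)
```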

How to make y-axis in pyplot chart display two measurements of same value (counts AND percents)?

I want to build a bar chart that shows the utilization of some resources. Let's say characters in a text:
from collections import Counter
import matplotlib.pyplot as plt
raw_data = 'data to make example bar chart'
counts = Counter(raw_data)
keys, values = zip(*counts.most_common())
plt.bar(keys, values);
This produces the following chart, with absolute counts of characters:
If I transform values before plotting using for example
values = [v/len(raw_data) * 100.0 for v in values]
I would get exactly the same graph, but the value for a would be 20.0 (%).
Question is, could I somehow show two values on y axis?
I saw recipes on how to show two different functions of the same value and have scale to the left and right, but here I have one function, just different units of measurement. Could I somehow show two scales without plotting two bar charts?
https://matplotlib.org/gallery/api/two_scales.html
You could create a right y-axis via ax.twinx(), give it exactly the same limits as the left y-axis and format the ticks as percentages. The PercentFormatter() gets a parameter telling which value corresponds to 100%. In this case, 100% would be all the data in raw_data.
from collections import Counter
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
raw_data = np.random.choice([*'abcdefghijklmnopqrst'], 200)
counts = Counter(raw_data)
keys, values = zip(*counts.most_common())
fig, ax = plt.subplots()
ax.bar(keys, values, color='turquoise')
ax.margins(x=0.02) # less wasted space left and right
ax.grid(axis='y')
ax2 = ax.twinx()
ax2.set_ylim(*ax.get_ylim())
ax2.yaxis.set_major_formatter(PercentFormatter(len(raw_data)))
plt.show()
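An alternative that avoids copying the limits by hand is Axes.secondary_yaxis (available since Matplotlib 3.1). This is a different technique from the twinx() approach above: the right-hand axis is derived from the left one through a pair of conversion functions. A sketch using the string from the question; the function names counts_to_percent and percent_to_counts are made up for this example.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

raw_data = 'data to make example bar chart'
counts = Counter(raw_data)
keys, values = zip(*counts.most_common())
total = len(raw_data)

def counts_to_percent(c):
    # forward transform: absolute count -> share of all characters in percent
    return c / total * 100.0

def percent_to_counts(p):
    # inverse transform, required by secondary_yaxis
    return p * total / 100.0

fig, ax = plt.subplots()
ax.bar(keys, values, color='turquoise')
ax.set_ylabel('count')

# The secondary axis follows the primary one, so zooming keeps both in sync.
secax = ax.secondary_yaxis('right', functions=(counts_to_percent, percent_to_counts))
secax.yaxis.set_major_formatter(PercentFormatter())  # values are already percentages
secax.set_ylabel('share of all characters')
```

Because the right-hand ticks are computed in percent, they land on round percentages rather than mirroring the count ticks; whether that or the twinx() version reads better depends on the chart.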

Multiple x labels on Pyplot

Below is my code for a line graph. I would like another x label under the current one (so I can show the days of the week).
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set()
data = pd.read_csv("123.csv")
data['DAY']=["01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31"]
plt.figure(figsize=(15,8))
plt.plot('DAY','SWST',data=data,linewidth=2,color="k")
plt.plot('DAY','WMID',data=data,linewidth=2,color="m")
plt.xlabel('DAY', fontsize=20)
plt.ylabel('VOLUME', fontsize=20)
plt.legend()
EDIT: After following the documentation, I have 2 issues. The scale has changed from 31 to 16, and the days of the week do not line up with the day number.
data['DAY']=["01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31"]
tick_labels=['1','\n\nThu','2','\n\nFri','3','\n\nSat','4','\n\nSun','5','\n\nMon','6','\n\nTue','7','\n\nWed','8','\n\nThu','9','\n\nFri','10','\n\nSat','11','\n\nSun','12','\n\nMon','13','\n\nTue','14','\n\nWed','15','\n\nThu','16','\n\nFri','17','\n\nSat','18','\n\nSun','19','\n\nMon','20','\n\nTue','21','\n\nWed','22','\n\nThu','23','\n\nFri','24','\n\nSat','25','\n\nSun','26','\n\nMon','27','\n\nTue','28','\n\nWed','29','\n\nThu','30','\n\nFri','31','\n\nSat']
tick_locations = np.arange(31)
plt.figure(figsize=(15,8))
plt.xticks(tick_locations, tick_labels)
plt.plot('DAY','SWST',data=data,linewidth=2,color="k")
plt.plot('DAY','WMID',data=data,linewidth=2,color="m")
plt.xlabel('DAY', fontsize=20)
plt.ylabel('VOLUME', fontsize=20)
plt.legend()
plt.show()
The pyplot function you are looking for is plt.xticks(). This is essentially a combination of ax.set_xticks() and ax.set_xticklabels().
From the documentation:
Parameters:
ticks : array_like
    A list of positions at which ticks should be placed. You can pass an empty list to disable xticks.
labels : array_like, optional
    A list of explicit labels to place at the given locs.
You would want something like the below code. Note you should probably explicitly set the tick locations as well as the labels to avoid setting labels in the wrong positions:
tick_labels = ['1','\n\nThu','2',..., '31','\n\nSat']
plt.xticks(tick_locations, tick_labels)
Note that the object-orientated API (i.e. using ax.) allows for more customisable plots.
Update
After the edit, I see that the labels you want to go below are part of the same list. Therefore your label list actually has a length of 62. So you need to join every 2 elements of your list together:
tick_labels=['1','\n\nThu','2','\n\nFri','3','\n\nSat','4','\n\nSun','5','\n\nMon','6','\n\nTue','7','\n\nWed','8',
'\n\nThu','9','\n\nFri','10','\n\nSat','11','\n\nSun','12','\n\nMon','13','\n\nTue','14','\n\nWed','15',
'\n\nThu','16','\n\nFri','17','\n\nSat','18','\n\nSun','19','\n\nMon','20','\n\nTue','21','\n\nWed','22',
'\n\nThu','23','\n\nFri','24','\n\nSat','25','\n\nSun','26','\n\nMon','27','\n\nTue','28','\n\nWed','29',
'\n\nThu','30','\n\nFri','31','\n\nSat']
tick_locations = np.arange(31)
new_labels = [ ''.join(x) for x in zip(tick_labels[0::2], tick_labels[1::2]) ]
plt.figure(figsize=(15, 8))
plt.xticks(tick_locations, new_labels)
plt.show()
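Writing 62 strings by hand is error-prone, so the combined labels could also be generated. A minimal sketch, assuming (as in the list above) that day 1 falls on a Thursday and the month has 31 days:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumption: day 1 is a Thursday, matching the hand-written list.
weekdays = ['Thu', 'Fri', 'Sat', 'Sun', 'Mon', 'Tue', 'Wed']
days = range(1, 32)

# One label per tick: day number, blank line, weekday name.
new_labels = [f"{day}\n\n{weekdays[(day - 1) % 7]}" for day in days]
tick_locations = np.arange(31)

plt.figure(figsize=(15, 8))
plt.xticks(tick_locations, new_labels)
```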
Never use ax.set_xticklabels without setting the locations of the ticks as well. This can be done via ax.set_xticks.
ax.set_xticks(...)
ax.set_xticklabels(...)
Of course you may do the same with pyplot
ax = plt.gca()
ax.set_xticks(...)
ax.set_xticklabels(...)
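Spelled out with concrete values, the pattern might look like this; the positions, labels and plotted numbers are arbitrary examples.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

positions = [0, 1, 2, 3]
labels = ['Mon', 'Tue', 'Wed', 'Thu']

fig, ax = plt.subplots()
ax.plot(positions, [3, 1, 4, 1])
ax.set_xticks(positions)    # where the ticks go
ax.set_xticklabels(labels)  # what they say, in the same order

shown = [t.get_text() for t in ax.get_xticklabels()]
print(shown)
```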

Only color some tick labels [duplicate]

Using matplotlib, is there an option to change the color of specific tick labels on the axis?
I have a simple plot that shows some values by day, and I need to mark some days as 'special', so I want to give those ticks a different color, but only those specific ticks, not all of them.
You can get a list of tick labels using ax.get_xticklabels(). This is actually a list of text objects. As a result, you can use set_color() on an element of that list to change the color:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(5,4))
ax.plot([1,2,3])
ax.get_xticklabels()[3].set_color("red")
plt.show()
Alternatively, you can get the current axes using plt.gca(). The below code will give the same result
import matplotlib.pyplot as plt
plt.figure(figsize=(5,4))
plt.plot([1, 2, 3])
plt.gca().get_xticklabels()[3].set_color("red")
plt.show()
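Indexing by position works when you know which tick is which. If the special days are known by name, one option is to match on the label text instead. A sketch with invented day labels and values; the set special_days is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
values = [5, 3, 6, 2, 7, 9, 8]
special_days = {'Sat', 'Sun'}  # hypothetical choice of days to highlight

fig, ax = plt.subplots(figsize=(5, 4))
positions = list(range(len(days)))
ax.plot(positions, values)
ax.set_xticks(positions)
ax.set_xticklabels(days)

# Each tick label is a Text object; recolor only the ones we care about.
for label in ax.get_xticklabels():
    if label.get_text() in special_days:
        label.set_color("red")

colors = {label.get_text(): label.get_color() for label in ax.get_xticklabels()}
print(colors)
```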

Overlapping y-axis tick label and x-axis tick label in matplotlib

If I create a plot with matplotlib using the following code:
import numpy as np
from matplotlib import pyplot as plt
xx = np.arange(0,5, .5)
yy = np.random.random( len(xx) )
plt.plot(xx,yy)
plt.show()
I get a result that looks like the attached image. The problem is the
bottom-most y-tick label overlaps the left-most x-tick label. This
looks unprofessional. I was wondering if there was an automatic
way to delete the bottom-most y-tick label, so I don't have
the overlap problem. The fewer lines of code, the better.
In the ticker module there is a class called MaxNLocator that can take a prune kwarg.
Using that you can remove the first tick:
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import numpy as np
xx = np.arange(0,5, .5)
yy = np.random.random( len(xx) )
plt.plot(xx,yy)
plt.gca().xaxis.set_major_locator(MaxNLocator(prune='lower'))
plt.show()
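Since the question asks about the bottom-most y-tick label, the same locator can be attached to the y-axis instead. The effect of prune can also be checked without drawing anything, by asking the locator for its tick values over an example range (0 to 1 here, chosen arbitrarily):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import numpy as np

# Compare the tick values with and without pruning.
full = MaxNLocator().tick_values(0, 1)
pruned = MaxNLocator(prune='lower').tick_values(0, 1)
print(full)
print(pruned)  # same ticks without the lowest one

xx = np.arange(0, 5, .5)
yy = np.random.random(len(xx))
fig, ax = plt.subplots()
ax.plot(xx, yy)
ax.yaxis.set_major_locator(MaxNLocator(prune='lower'))  # drop the bottom-most y tick
```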
You can pad the ticks on the x-axis:
ax.tick_params(axis='x', pad=15)
Replace ax with plt.gca() if you haven't stored the variable ax for the current figure.
You can also pad both axes by omitting the axis parameter.
A very elegant way to fix the overlapping problem is to increase the padding of the x- and y-tick labels (i.e. the distance to the axis). Leaving out the corner-most label might not always be wanted. In my opinion, it generally looks nice if the labels are a little bit farther from the axis than in the default configuration.
The padding can be changed via the matplotlibrc file or in your plot script by using the commands
import matplotlib as mpl
mpl.rcParams['xtick.major.pad'] = 8
mpl.rcParams['ytick.major.pad'] = 8
Most times, a padding of 6 is also sufficient.
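Setting mpl.rcParams changes the padding for every later figure in the session. If the change should apply to a single plot only, matplotlib's rc_context can scope it. A minimal sketch (the value 8 is taken from above, the plotted data is arbitrary):

```python
import matplotlib as mpl
mpl.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

default_pad = mpl.rcParams['xtick.major.pad']

with mpl.rc_context({'xtick.major.pad': 8, 'ytick.major.pad': 8}):
    pad_inside = mpl.rcParams['xtick.major.pad']
    fig, ax = plt.subplots()  # axes created here pick up the larger padding
    ax.plot([0, 1, 2], [0, 1, 0])

pad_after = mpl.rcParams['xtick.major.pad']  # restored to the previous value
print(default_pad, pad_inside, pad_after)
```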
This is answered in detail here. Basically, you use something like this:
plt.xticks([list of tick locations], [list of tick labels])
