Format tick labels in scatter plot to % in matplotlib [duplicate]

I have a line chart based on a simple list of numbers. By default the x-axis is just an increment of 1 for each value plotted. I would like it to be a percentage instead, but can't figure out how. So instead of having an x-axis from 0 to 5, it would go from 0% to 100% (but keeping reasonably spaced tick marks). Code below. Thanks!
from matplotlib import pyplot as plt
from mpl_toolkits.axes_grid.axislines import Subplot
data=[8,12,15,17,18,18.5]
fig=plt.figure(1,(7,4))
ax=Subplot(fig,111)
fig.add_subplot(ax)
plt.plot(data)

The code below will give you a simplified x-axis which is percentage based. It assumes that your values are spaced equally between 0% and 100%.
It creates a perc array which holds evenly spaced percentages that can be used to plot with. It then adjusts the formatting for the x-axis so it includes a percentage sign using matplotlib.ticker.FormatStrFormatter. Unfortunately this uses old-style (printf-style) string formatting, as opposed to the new style; the old style is described in the Python documentation on printf-style string formatting.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
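As a side note on the old-style versus new-style point above, here is a minimal sketch comparing the two formatter classes in matplotlib.ticker. It assumes a Matplotlib version that ships StrMethodFormatter; the variable names are only illustrative.

```python
import matplotlib.ticker as mtick

# printf-style format string, as used in the answer above
old_style = mtick.FormatStrFormatter('%.0f%%')
# str.format-style format string; the tick value is available as the field "x"
new_style = mtick.StrMethodFormatter('{x:.0f}%')

# Formatters are callables taking the tick value (and optionally a position)
old_label = old_style(40.0)
new_label = new_style(40.0)
print(old_label, new_label)
```

Either object can be passed to ax.xaxis.set_major_formatter(...) in the code above.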

This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you can do as follows to set the axis:
import matplotlib.ticker as mtick
# Actual plotting code omitted
ax.xaxis.set_major_formatter(mtick.PercentFormatter(5.0))
This will display values from 0 to 5 on a scale of 0% to 100%. The formatter is similar in concept to what @Ffisegydd suggests doing, except that it can take any arbitrary existing ticks into account.
PercentFormatter() accepts three arguments: xmax, decimals, and symbol. xmax allows you to set the value that corresponds to 100% on the axis (in your example, 5).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.
Note that this formatter will use whatever ticks would normally be generated if you just plotted your data. It does not modify anything besides the strings that are output to the tick marks.
Update
PercentFormatter was accepted into Matplotlib in version 2.1.0.
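For completeness, a minimal end-to-end sketch of the PercentFormatter approach applied to the data from the question. It assumes Matplotlib 2.1.0 or later; the Agg backend and the explicit draw call are only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

data = [8, 12, 15, 17, 18, 18.5]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(data)  # x values are the list indices 0..5

# xmax is the x value shown as 100%; decimals=0 drops the fractional part
formatter = mtick.PercentFormatter(xmax=len(data) - 1, decimals=0)
ax.xaxis.set_major_formatter(formatter)

fig.canvas.draw()  # forces the tick labels to be computed
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

The tick positions are still chosen by the default locator; only the label strings change.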

Totally late in the day, but I wrote this and thought it could be of use:
import pandas as pd
def transformColToPercents(x, rnd, navalue):
    # Returns a pandas Series that can be put in a new DataFrame column,
    # where all values are scaled (min-max) to the range 0-100%.
    # rnd = number of decimal places to round to
    # navalue = value used to replace NaN
    hv = x.max(axis=0)
    lv = x.min(axis=0)
    pp = pd.Series(((x-lv)*100)/(hv-lv)).round(rnd)
    return pp.fillna(navalue)
# df is assumed to be an existing DataFrame with a numeric column 'a'
df['new column'] = transformColToPercents(df['a'], 2, 0)

Related

How to plot x int date values from array matplotlib correctly?

I am having an issue when trying to plot some of the date values into a matplotlib side by side bar graph.
I first define my Series x = new_df['month'] which contains the following values:
0,2021-01-01
1,2021-02-01
2,2021-03-01
3,2021-04-01
4,2021-05-01
5,2021-06-01
6,2021-07-01
7,2021-08-01
8,2021-09-01
9,2021-10-01
10,2021-11-01
11,2021-12-01
12,2022-01-01
13,2022-02-01
14,2022-03-01
15,2022-04-01
16,2022-05-01
17,2022-06-01
18,2022-07-01
19,2022-08-01
20,2022-09-01
21,2022-10-01
22,2022-11-01
After this I define the function to plot my graph:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
import numpy as np
def side_by_side_bar_chart(x, y, labels, file_name):
    width = 0.25 # set bar width
    ind = np.arange(len(x)) # Get the number of x labels
    fig, ax = plt.subplots(figsize=(10, 8))
    # Get average number in order to set labels formatting
    ymax = int(max([mean(x) for x in y]))
    plt.xticks(ind, x) # sets x labels with values in x list (months)
    # These two lines format ax labels
    dtFmt = mdates.DateFormatter('%b-%y') # define the formatting
    plt.gca().xaxis.set_major_formatter(dtFmt)
    plt.savefig("charts/"+ file_name + ".png", dpi = 300)
However, my x values are plotted as Jan 70 for all xticks:
(image: wrongly labeled x ticks)
I suspect that this has something to do with formatting. The same thing causes similar issues in a different part of the script, where I use twin(x) for a side-by-side chart with a trendline on top and my values are plotted wrongly in the graph:
(image: wrongly plotted graph)
Does anybody have an idea how to fix these bugs? Thank you for your help in advance!
What I expect: the dates in the x array are used on the axis and all values are plotted correspondingly in the graphs.
The thing is that your "x" is not a date. It is obviously a string, so the formatter can't interpret it correctly.
Let's try to reproduce your problem (this is the kind of minimal reproducible example I was mentioning earlier) :
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np # just to generate something to plot
# Generate a dummy set of 20 dates, starting from Mar 1 2020
dt=datetime.timedelta(days=31)
x0=[datetime.date(2020,3,1) + k*dt for k in range(20)]
x=[d.strftime("%Y-%m-%d") for d in x0] # This looks like your x: 20 strings
# And some y to have something to plot
y=np.cumsum(np.random.normal(0,1,20)) # Don't overthink it, it is just 20 numbers :)
# Plot y vs x (x being the strings)
plt.plot(x,y)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b-%y'))
plt.show()
(image: result)
Now, the solution for that is very simple: x must contain dates, not strings.
From my example, I could just plt.plot(x0,y) instead of x, since x0 is the list of dates from which I computed x. But if, as it appears, you only have the strings available, you can parse them, for example using [datetime.date.fromisoformat(d) for d in x].
Or, since you already have pandas at hand: pd.to_datetime(x) (it is not exactly the same datetime type, but both are understood by matplotlib)
import pandas as pd
xx=pd.to_datetime(x)
plt.plot(xx,y)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b-%y'))
plt.show()
Note that, without any action from me, it also stopped printing all labels. That is because in the first case, matplotlib wasn't aware of any logical progression of x values. From its point of view, those were all just labels. And you can't, a priori, skip a label, since the reader could not guess what is between two labels separated by a gap (it seems obvious to us, since we know they are dates. But matplotlib doesn't know that. It is just as if x contained ['red', 'green', 'yellow', 'purple', 'black', 'blue', ...]. You would not expect every other label to be just arbitrarily skipped).
Whereas, now that we have passed real dates to matplotlib, it is as if x were numerical: there is a logical progression of its values. Matplotlib knows it, and, more importantly, knows that we know it. So it is acceptable to just skip some to make the figure more readable: everybody knows what is between "Mar 20" and "May 20".
So, short answer: convert your strings to dates.
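Applied to month strings like the ones in the question, the conversion could be sketched as follows. The months series is a hypothetical stand-in for new_df['month'], the y values are placeholders, and the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for new_df['month']: month starts stored as strings
months = pd.Series(
    pd.date_range("2021-01-01", periods=23, freq="MS").strftime("%Y-%m-%d")
)
values = np.arange(len(months))  # placeholder y values

dates = pd.to_datetime(months)  # strings -> datetime64 values

fig, ax = plt.subplots(figsize=(10, 8))
ax.plot(dates, values, marker="o")
formatter = mdates.DateFormatter('%b-%y')
ax.xaxis.set_major_formatter(formatter)
fig.autofmt_xdate()  # rotate the date labels so they do not overlap

# The formatter now receives real date numbers instead of category indices
first_label = formatter(mdates.date2num(dates.iloc[0]))
print(first_label)
```

With the string version, the axis positions are just 0, 1, 2, ..., which a date formatter reads as days right at the start of its epoch; that is consistent with every tick showing Jan 70.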

How to remove scientific notations from this bar plot? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to remove relative shift in matplotlib axis
I'm plotting numbers with five digits (210.10, 210.25, 211.35, etc) against dates and I'd like to have the y-axis ticks show all digits ('214.20' rather than '0.20 + 2.14e2') and have not been able to figure this out. I've attempted to set the ticklabel format to plain, but it appears to have no effect.
plt.ticklabel_format(style='plain', axis='y')
Any hints on the obvious I'm missing?
The axis numbers are defined according to a given Formatter. Unfortunately (AFAIK), matplotlib does not expose a way to control the threshold to go from the numbers to a smaller number + offset. A brute force approach would be setting all the xtick strings:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(100, 100.1, 100)
y = np.arange(100)
fig = plt.figure()
plt.plot(x, y)
plt.show() # original problem
# setting the xticks to have 3 decimal places
locs, labels = plt.xticks() # plt.xticks() returns (locations, labels)
ll = ['%.3f' % a for a in locs]
plt.xticks(locs, ll)
plt.show()
This is actually the same as setting a FixedFormatter with the strings:
from matplotlib.ticker import FixedFormatter
plt.gca().xaxis.set_major_formatter(FixedFormatter(ll))
However, the problem of this approach is that the labels are fixed. If you want to resize/pan the plot, you have to start over again. A more flexible approach is using the FuncFormatter:
from matplotlib.ticker import FuncFormatter
def form3(x, pos):
    """Return x as a string with 3 decimal places."""
    return '%.3f' % x
formatter = FuncFormatter(form3)
plt.gca().xaxis.set_major_formatter(formatter)
And now you can move the plot and still maintain the same precision. But sometimes this is not ideal. One doesn't always want a fixed precision. One would like to preserve the default Formatter behaviour, just increase the threshold to when it starts adding an offset. There is no exposed mechanism for this, so what I end up doing is to change the source code. It's pretty easy, just change one character in one line in ticker.py. If you look at that github version, it's on line 497:
if np.absolute(ave_oom - range_oom) >= 3: # four sig-figs
I usually change it to:
if np.absolute(ave_oom - range_oom) >= 5: # four sig-figs
and find that it works fine for my uses. Change that file in your matplotlib installation, and then remember to restart python before it takes effect.
You can also just turn the offset off: (almost exact copy of How to remove relative shift in matplotlib axis)
import matplotlib.pyplot as plt
plt.plot([1000, 1001, 1002], [1, 2, 3])
plt.gca().get_xaxis().get_major_formatter().set_useOffset(False)
plt.draw()
This grabs the current axes, gets the x-axis object, then its major formatter, and sets useOffset to False.
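For the y-axis case from the question, the same idea can be sketched with ticklabel_format, which accepts a useOffset argument alongside style. The x values here are placeholders (the question plots against dates), and the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3]                 # placeholder x values
y = [210.10, 210.25, 211.35]  # sample values from the question

fig, ax = plt.subplots()
ax.plot(x, y)
# Plain (non-scientific) labels and no additive offset on the y-axis
ax.ticklabel_format(axis='y', style='plain', useOffset=False)

fig.canvas.draw()  # forces the formatter to compute its labels
offset_text = ax.yaxis.get_major_formatter().get_offset()
print(repr(offset_text))
```

An empty offset string means the tick labels carry the full values rather than a small number plus a shared offset.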

matplotlib: manually change yaxis values to differ from the actual value (NOT: change ticks!) [duplicate]

I am trying to plot a data and function with matplotlib 2.0 under python 2.7.
The x values of the function are evolving with time: x first decreases to a certain value, then increases again.
If the function is plotted against time, it looks like this: (image: plot of data against time)
I need the same x-axis evolution for plotting against the real x values. Unfortunately, as the x values are the same for both parts, before and after the turning point, both sets of values are mixed together. This gives me the wrong data plot.
In this example it means I need the x-axis to start at 2.4, decrease to 1.0, then increase again to 2.4. I swear I found before that this is possible, but unfortunately I can't find a trace of that again.
A matplotlib axis is by default linearly increasing. More importantly, there must be an injective mapping of the number line to the axis units. So changing the data range is not really an option (at least when the aim is to keep things simple).
It would hence be good to keep the original numbers and only change the ticks and ticklabels on the axis. E.g. you could use a FuncFormatter to map the original numbers to
np.abs(x-tp)+tp
where tp would be the turning point.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
x = np.linspace(-10,20,151)
y = np.exp(-(x-5)**2/19.)
plt.plot(x,y)
tp = 5
fmt = lambda x,pos:"{:g}".format(np.abs(x-tp)+tp)
plt.gca().xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(fmt))
plt.show()
One option would be to use two axes and plot your two timespans separately, one on each.
For instance, if you have the following data:
import numpy as np
import matplotlib.pyplot as plt
myX = np.linspace(1,2.4,100)
myY1 = -1*myX
myY2 = -0.5*myX-0.5
plt.plot(myX,myY1, c='b')
plt.plot(myX,myY2, c='g')
you can instead create two subplots with a shared y-axis and no space between the two axes, plot each time span independently, and finally adjust the limits of one of your x-axes to reverse the order of the points:
fig, (ax1,ax2) = plt.subplots(1,2, gridspec_kw={'wspace':0}, sharey=True)
ax1.plot(myX,myY1, c='b')
ax2.plot(myX,myY2, c='g')
ax1.set_xlim((2.4,1))
ax2.set_xlim((1,2.4))

Matplotlib: How to get same "base" and "offset" parameters for axis ticks and axis tick labels

I want to plot a series of values against a date range in matplotlib. I changed the tick base parameter to 7, to get one tick at the beginning of every week (plticker.IndexLocator, base = 7). The problem is that the set_xticklabels function does not accept a base parameter. As a result, the second tick (representing day 8 on the beginning of week 2) is labelled with day 2 from my date range list, and not with day 8 as it should be (see picture).
How can I give set_xticklabels a base parameter?
Here is the code:
my_data = pd.read_csv("%r_filename_%s_%s_%d_%d.csv" % (num1, num2, num3, num4, num5), dayfirst=True)
my_data.plot(ax=ax1, color='r', lw=2.)
loc = plticker.IndexLocator(base=7, offset = 0) # this locator puts ticks at regular intervals
ax1.set_xticklabels(my_data.Date, rotation=45, rotation_mode='anchor', ha='right') # this defines the tick labels
ax1.xaxis.set_major_locator(loc)
Here is the plot:
(image: plot)
Many thanks, your solution works perfectly. In case other people run into the same issue in the future: I have implemented the above-mentioned solution but also added some code so that the tick labels keep the desired rotation and also align (with their left end) to the respective tick. It may not be pythonic or best practice, but it works:
x_fmt = mpl.ticker.IndexFormatter(x)
ax.set_xticklabels(my_data.Date, rotation=-45)
ax.tick_params(axis='x', pad=10)
ax.xaxis.set_major_formatter(x_fmt)
labels = my_data.Date
for tick in ax.xaxis.get_majorticklabels():
    tick.set_horizontalalignment("left")
The reason your ticklabels went bad is that setting manual ticklabels decouples the labels from your data. The proper approach is to use a Formatter according to your needs. Since you have a list of ticklabels for each data point, you can use an IndexFormatter. It seems to be undocumented online, but it has a help:
class IndexFormatter(Formatter)
| format the position x to the nearest i-th label where i=int(x+0.5)
| ...
| __init__(self, labels)
| ...
So you just have to pass your list of dates to IndexFormatter. With a minimal, pandas-independent example (with numpy only for generating dummy data):
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
# create dummy data
x = ['str{}'.format(k) for k in range(20)]
y = np.random.rand(len(x))
# create an IndexFormatter with labels x
x_fmt = mpl.ticker.IndexFormatter(x)
fig,ax = plt.subplots()
ax.plot(y)
# set our IndexFormatter to be responsible for major ticks
ax.xaxis.set_major_formatter(x_fmt)
This should keep your data and labels paired even when tick positions change.
I noticed you also set the rotation of the ticklabels in the call to set_xticklabels; you would lose this now. I suggest using fig.autofmt_xdate to do this instead. It seems to be designed exactly for this purpose, without messing with your ticklabel data.
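IndexFormatter was deprecated in later Matplotlib releases and is not available in recent ones. A minimal sketch of the same idea with FuncFormatter is shown below; the helper index_label is a hypothetical name, the MultipleLocator(7) stands in for the weekly ticks from the question, and the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter, MultipleLocator

# create dummy data: one label per data point
labels = ['str{}'.format(k) for k in range(20)]
y = np.random.rand(len(labels))

def index_label(x, pos):
    """Return the label of the data point nearest to tick position x."""
    i = int(round(x))
    return labels[i] if 0 <= i < len(labels) else ''

fig, ax = plt.subplots()
ax.plot(y)
ax.xaxis.set_major_locator(MultipleLocator(7))            # one tick every 7 points
ax.xaxis.set_major_formatter(FuncFormatter(index_label))  # label looked up by index
fig.autofmt_xdate(rotation=45)                            # rotate without fixing label text
```

Because the label is looked up from the tick position each time, the tick at position 7 shows the eighth label regardless of which locator is in use.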

Dates in the xaxis for a matplotlib plot with imshow

So I am new to programming with matplotlib. I have created a color plot using imshow() and an array. At first the axes were just the row and column numbers of my array. I used extent = (xmin,xmax,ymin,ymax) to get the x-axis in unix time and the y-axis in altitude.
I want to change the x-axis from unix time (982376726,982377321) to UT(02:25:26, 02:35:21). I have created a list of the time range in HH:MM:SS. I am not sure how to replace my current x-axis with these new numbers, without changing the color plot (or making it disappear).
I was looking at datetime.time but I got confused with it.
Any help would be greatly appreciated!
I have put together some example code which should help you with your problem.
The code first generates some randomised data using numpy.random. It then calculates your x-limits and y-limits where the x-limits will be based off of two unix timestamps given in your question and the y-limits are just generic numbers.
The code then plots the randomised data and uses pyplot methods to convert the x-axis formatting to nicely represented strings (rather than unix timestamps or array numbers).
The code is well commented and should explain everything you need, if not please comment and ask for clarification.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
# Generate some random data for imshow
N = 10
arr = np.random.random((N, N))
# Create your x-limits. Using two of your unix timestamps you first
# create a list of datetime.datetime objects using map.
x_lims = list(map(dt.datetime.fromtimestamp, [982376726, 982377321]))
# You can then convert these datetime.datetime objects to the correct
# format for matplotlib to work with.
x_lims = mdates.date2num(x_lims)
# Set some generic y-limits.
y_lims = [0, 100]
fig, ax = plt.subplots()
# Using ax.imshow we set two keyword arguments. The first is extent.
# We give extent the values from x_lims and y_lims above.
# We also set the aspect to "auto" which should set the plot up nicely.
ax.imshow(arr, extent = [x_lims[0], x_lims[1], y_lims[0], y_lims[1]],
          aspect='auto')
# We tell Matplotlib that the x-axis is filled with datetime data,
# this converts it from a float (which is the output of date2num)
# into a nice datetime string.
ax.xaxis_date()
# We can use a DateFormatter to choose how this datetime string will look.
# I have chosen HH:MM:SS though you could add DD/MM/YY if you had data
# over different days.
date_format = mdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(date_format)
# This simply sets the x-axis data to diagonal so it fits better.
fig.autofmt_xdate()
plt.show()
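One caveat: datetime.fromtimestamp without a timezone returns local time, so the labels above depend on the machine's timezone. Since the question asks for UT, a minimal sketch of the conversion with an explicit UTC timezone could look like this; the names stamps, utc_times and labels are only illustrative.

```python
import datetime as dt
import matplotlib.dates as mdates

stamps = [982376726, 982377321]  # the two unix timestamps from the question

# Interpret the timestamps as UTC rather than local time
utc_times = [dt.datetime.fromtimestamp(t, tz=dt.timezone.utc) for t in stamps]
x_lims = mdates.date2num(utc_times)

# Format in UTC as well, independent of any local timezone setting
date_format = mdates.DateFormatter('%H:%M:%S', tz=dt.timezone.utc)
labels = [date_format(v) for v in x_lims]
print(labels)
```

These x_lims and date_format can be dropped into the example above in place of the local-time versions.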
