I have a dataframe like this:
data_ = list(range(106))
index_ = pd.period_range('3/1/2004', '12/1/2012', freq='M')
df2_ = pd.DataFrame(data = data_, index = index_, columns = ['data'])
I want to plot this dataframe. Currently, I am using:
df2_.plot()
Now I would like to control the labels (and possibly the ticks) on the x axis. In particular, I would like to have monthly ticks on the axis and a label at every other month, or quarterly labels. I would also like to have vertical grid lines.
I started looking at this example but I am already failing at constructing the timedelta.
With regard to constructing the timedelta: datetime.timedelta() doesn't have a parameter to specify months, so it's probably more convenient to stick to pd.date_range(). However, I found that objects of type pandas.tslib.Timestamp don't play nicely with matplotlib ticks, so you could convert them to datetime.date objects like so:
index_ = [pd.to_datetime(date, format='%Y-%m-%d').date()
          for date in pd.date_range('2004-03-01', '2012-12-01', freq="M")]
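As a side note on the month arithmetic mentioned above, here is a minimal sketch showing that datetime.timedelta rejects a months argument, while pd.DateOffset can step by calendar months. The variable names (start, next_month, next_quarter) are only for illustration and are not part of the original question.

```python
import datetime

import pandas as pd

start = pd.Timestamp("2004-03-01")

# datetime.timedelta has no notion of months, so this raises TypeError
try:
    datetime.timedelta(months=1)
except TypeError as exc:
    print("timedelta cannot express months:", exc)

# pd.DateOffset steps by calendar months instead of a fixed duration
next_month = start + pd.DateOffset(months=1)
next_quarter = start + pd.DateOffset(months=3)
print(next_month.date(), next_quarter.date())
```

If you only need a regular sequence of dates, pd.date_range() as used above is usually enough; DateOffset is mainly useful when you need to shift a single timestamp.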
It’s possible to add gridlines and customise axes labels by first defining a matplotlib axes object, and then passing this to DataFrame.plot()
ax = plt.axes()
df2_.plot(ax=ax)
Now you can add vertical gridlines to your plot
ax.xaxis.grid(True)
And specify quarterly xtick labels by using matplotlib.dates.MonthLocator and setting the interval to 3
ax.xaxis.set_major_locator(dates.MonthLocator(interval=3))
And finally, I found the ticks to be very crowded, so I formatted them to get a nicer fit
ax.xaxis.set_major_formatter(dates.DateFormatter('%b %y'))
labels = ax.get_xticklabels()
plt.setp(labels, rotation=85, fontsize=8)
To produce the following:
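Putting the steps above together, a self-contained sketch could look like the following. It is not the exact code behind the figure: it assumes a month-start DatetimeIndex (freq="MS") instead of the PeriodIndex from the question, plots with matplotlib directly so that the x axis uses matplotlib date units, and adds a minor MonthLocator for the monthly ticks that the question asks for.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Month-start timestamps from March 2004 to December 2012 (106 months)
index_ = pd.date_range("2004-03-01", "2012-12-01", freq="MS")
df2_ = pd.DataFrame({"data": range(len(index_))}, index=index_)

fig, ax = plt.subplots(figsize=(10, 4))
# Plot with matplotlib directly so the x axis is in matplotlib date units
ax.plot(df2_.index.to_pydatetime(), df2_["data"])

ax.xaxis.set_minor_locator(mdates.MonthLocator())            # a minor tick every month
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))  # a labelled tick every third month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %y"))  # e.g. month abbreviation and 2-digit year
ax.xaxis.grid(True)                                          # vertical grid lines at major ticks
plt.setp(ax.get_xticklabels(), rotation=85, fontsize=8)
fig.canvas.draw()  # render; use plt.show() or fig.savefig(...) as needed
```

Changing interval=3 to interval=2 should give a label every other month instead of quarterly labels.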
I'm doing a time series analysis with seaborn in Python, using the following code:
df['Crop'] = pd.Categorical(df['Crop'], categories=df['Crop'].dropna().unique())
pd.to_datetime(df['DisplayDate'], format='%y-%m-%d')
cols = df['Gr']
fig, axes = plt.subplots(len(cols), figsize=(6,3))
sns.lineplot(data=df, x= df['DisplayDate'], y= df['Crop'], hue= df['Gr'])
plt.show()
the graph comes out with scribbles in the x-axis like this:
My thought is that it might be due to the length of the date strings. Is there a way to change the formatting of the x-axis so that, instead of showing the entire date, it only shows the month?
The title probably does not make sense, but I will try to explain.
I am plotting chemical concentrations over time. The x axis should be hours since midnight local time (i.e., 0, 4, 8, 12, 16, 20). However, when I do this, all of the xticks get squashed together to the left.
xticks = range(0,24,4)
ozoneest["mean"].plot(ax=ax, xticks=xticks,)
Results in:
xticks is only accepting arrays of datetime variables, which have values: 00:00, 04:00, 08:00, 12:00, 16:00, 20:00.
xticks = pd.date_range("2000/01/01", end="2000/01/02", freq="4H").time
ozoneest["mean"].plot(ax=ax, xticks=xticks,)
results in:
This is close to what I want, but I want just the number of the hour
Thanks!
I assume that your data is stored in a pandas dataframe with a DatetimeIndex that has an "Hour" frequency. I cannot exactly reproduce your problem seeing as you have not shared the code generating the ax object. Whether it is created with matplotlib or pandas, the problem is that the x-axis unit is based on the number of time periods (based on the DatetimeIndex frequency in pandas, days in matplotlib) that have passed since 1970-01-01. So the xticks = range(0,24,4) land far to the left relative to your datetimes. You can check the x-axis values of the default xticks with ax.get_xticks().
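To see this for yourself, here is a small sketch that prints the x-axis limits and default xticks of a pandas plot. It assumes an hourly DatetimeIndex like the one described above; the data values are made up, and a Timedelta is passed as freq only to avoid differences in frequency aliases between pandas versions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 24 hourly timestamps starting at midnight on 2000-01-01
time = pd.date_range("2000-01-01", periods=24, freq=pd.Timedelta(hours=1))
ozoneest = pd.DataFrame({"mean": np.arange(24.0)}, index=time)

fig, ax = plt.subplots()
ozoneest["mean"].plot(ax=ax)
fig.canvas.draw()

# The axis values are counted from 1970-01-01, so they are far larger than 0..20
print("x-axis limits:   ", ax.get_xlim())
print("default xticks:  ", ax.get_xticks())
print("requested xticks:", list(range(0, 24, 4)))
```

The printed limits should be several orders of magnitude larger than 20, which is why ticks placed at 0, 4, ..., 20 end up bunched at the far left.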
Here are two ways of formatting the xticks and labels as you want. I suggest that you do not create a new DatetimeIndex for the hours, as this makes the code harder to reuse; instead, use the DatetimeIndex of the dataframe, as shown in the second solution.
Create sample dataframe
import numpy as np # v 1.20.2
import pandas as pd # v 1.2.5
rng = np.random.default_rng(seed=123) # random number generator
time = pd.date_range(start="2000/01/01", end="2000/01/02", freq="H")[:-1]
mean = rng.normal(size=len(time))
ozoneest = pd.DataFrame(dict(mean=mean), index=time)
ozoneest.head()
Pandas plot with default xticks
ozoneest["mean"].plot()
Simple solution: do not use the DatetimeIndex as the x-axis
xticks = range(0,24,4)
ax = ozoneest["mean"].plot(use_index=False, xticks=xticks)
General solution: select xticks from DatetimeIndex and create labels with strftime
xticks = ozoneest.index[::4]
xticklabels = xticks.strftime("%H")
ax = ozoneest["mean"].plot()
ax.set_xticks(xticks)
ax.set_xticks([], minor=True)
ax.set_xticklabels(xticklabels)
This solution is more general because you do not need to manually adjust the xticks if the range of time of your dataset changes and the tick labels can be easily customized in many ways.
If you want to remove the leading zeros, you can use the following list comprehension:
xticklabels = [tick[1:] if tick[0] == "0" else tick for tick in xticks.strftime("%H")]
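A possible alternative to stripping the zeros from strings, sketched here under the assumption that the index is an hourly DatetimeIndex as in the sample dataframe: the .hour attribute of a DatetimeIndex returns the hours as integers, which can then be converted to strings without leading zeros.

```python
import pandas as pd

# Same hourly range as the sample dataframe (24 timestamps on 2000-01-01)
time = pd.date_range("2000-01-01", periods=24, freq=pd.Timedelta(hours=1))

xticks = time[::4]                                  # every fourth hour
xticklabels = [str(hour) for hour in xticks.hour]   # integer hours as strings
print(xticklabels)
```

These labels can be passed to ax.set_xticklabels in the same way as the strftime-based labels.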
I am making a plot in which the x axis represents dates and the y axis represents total COVID cases. Because the dataset is large, there are many dates on the x axis, so the xtick labels overlap and I cannot clearly see the number of cases at a particular date. How can I make the graph clearer? Any other suggestions for making the graph more readable are also welcome.
My code and plot are below. Thanks.
Ensure your dates are dates, not strings
Use matplotlib date formatters
I've used data from the UK, as you did not provide a sample
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
x = countries["date"]
y = countries["total_cases"]
fig, ax = plt.subplots(figsize=(10, 6))
locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
formatter = mdates.ConciseDateFormatter(locator)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
ax.plot(x, y)
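Since the data behind the snippet above is not shown, here is a self-contained sketch of the same two steps. The countries dataframe below is a hypothetical stand-in with made-up values, built only so that the code runs; the column names follow the snippet above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 400 daily rows with the dates stored as strings
days = pd.date_range("2020-03-01", periods=400, freq="D")
countries = pd.DataFrame({
    "date": days.strftime("%Y-%m-%d"),
    "total_cases": np.arange(400) ** 2,
})

# Step 1: make sure the dates are real datetimes, not strings
countries["date"] = pd.to_datetime(countries["date"], format="%Y-%m-%d")

# Step 2: let matplotlib choose a small number of ticks and format them concisely
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(countries["date"], countries["total_cases"])
locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
fig.canvas.draw()  # render; use plt.show() or fig.savefig(...) as needed
```

If the date column is left as strings, matplotlib treats every row as a separate category and draws one tick per row, which is the usual cause of overlapping labels.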
I would like to remove the flat lines from my graph while keeping the x labels.
I have this code which gives me a picture
dates = df_stock.loc[start_date:end_date].index.values
x_values = np.array([datetime.datetime.strptime(d, "%Y-%m-%d %H:%M:%S") for d in dates])
fig, ax = plt.subplots(figsize=(15,9))
# y values
y_values = np.array(df_stock.loc[start_date:end_date, 'Bid'])
# plotting
_ = ax.plot(x_values, y_values, label='Bid')
# formatting
formatter = mdates.DateFormatter('%m-%d %H:%M')
ax.xaxis.set_major_formatter(formatter)
The flat lines correspond to periods for which no data exists. I would like to know whether it is possible not to display them while keeping the spacing of the x labels.
Thank you so much.
You want to have time on the x-axis and time is equidistant -- independent whether you have data or not.
You now have several options:
1. don't use time on the x-axis but samples/index
2. do as in 1. but change the ticks & labels to draw time again (but this time not equidistantly)
3. make the value-vector equidistant and use NaNs to fill the gaps
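The option of plotting against the sample index and then relabelling the ticks with times could be sketched as follows. The data is hypothetical (two sessions of hourly values with an overnight gap), and the names times, bid and positions are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two sessions of 8 hourly samples with an overnight gap
step = pd.Timedelta(hours=1)
day1 = pd.date_range("2021-01-04 09:00", periods=8, freq=step)
day2 = pd.date_range("2021-01-05 09:00", periods=8, freq=step)
times = day1.append(day2)
bid = np.linspace(100.0, 101.5, len(times))

fig, ax = plt.subplots(figsize=(10, 4))
positions = np.arange(len(times))      # x axis is the sample index, not time
ax.plot(positions, bid, label="Bid")

# Label every fourth sample with its timestamp; the gap takes up no space
tick_positions = positions[::4]
tick_labels = list(times[tick_positions].strftime("%m-%d %H:%M"))
ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels, rotation=45, ha="right")
fig.canvas.draw()
```

The trade-off is that the x axis is no longer a true time axis: equal distances on the plot no longer correspond to equal time spans.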
Why is this so?
By default, matplotlib produces a line plot, which connects the points with lines in the order in which they are presented. In contrast, a scatter plot draws only the individual points, without suggesting any underlying order. You get the same result from a line plot drawn with markers only, i.e. without the connecting lines.
In general, you have 3-4 options
1. use the plot command but only plot markers (add linestyle='')
2. use the scatter command
3. use NaNs: plot does not know what to draw for them and draws nothing (so it also won't connect non-existing points with lines)
4. use a loop and plot connected sections as separate lines in the same axes
Options 1 and 2 are the easiest if you want to make almost no changes to your code. Option 3 is the most proper, and option 4 mimics its result.
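The NaN-based option could be sketched like this. It assumes hypothetical hourly data with an overnight gap; reindexing onto an equidistant time grid inserts NaN wherever no sample exists, and plot then leaves those stretches empty instead of drawing a flat connecting line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two sessions of 8 hourly samples with an overnight gap
step = pd.Timedelta(hours=1)
day1 = pd.date_range("2021-01-04 09:00", periods=8, freq=step)
day2 = pd.date_range("2021-01-05 09:00", periods=8, freq=step)
bid = pd.Series(np.linspace(100.0, 101.5, 16), index=day1.append(day2))

# Reindex onto an equidistant hourly grid; hours without data become NaN
full_grid = pd.date_range(bid.index[0], bid.index[-1], freq=step)
bid_regular = bid.reindex(full_grid)

fig, ax = plt.subplots(figsize=(10, 4))
# The line is broken wherever the values are NaN
ax.plot(bid_regular.index.to_pydatetime(), bid_regular.to_numpy(), label="Bid")
fig.canvas.draw()
```

Here the x axis stays a true, equidistant time axis, so the gap is still visible as empty space.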
I want to plot a series of values against a date range in matplotlib. I changed the tick base parameter to 7, to get one tick at the beginning of every week (plticker.IndexLocator, base = 7). The problem is that the set_xticklabels function does not accept a base parameter. As a result, the second tick (representing day 8 on the beginning of week 2) is labelled with day 2 from my date range list, and not with day 8 as it should be (see picture).
How can I give set_xticklabels a base parameter?
Here is the code:
my_data = pd.read_csv("%r_filename_%s_%s_%d_%d.csv" % (num1, num2, num3, num4, num5), dayfirst=True)
my_data.plot(ax=ax1, color='r', lw=2.)
loc = plticker.IndexLocator(base=7, offset = 0) # this locator puts ticks at regular intervals
ax1.set_xticklabels(my_data.Date, rotation=45, rotation_mode='anchor', ha='right') # this defines the tick labels
ax1.xaxis.set_major_locator(loc)
Here is the plot:
Many thanks, your solution works perfectly. In case other people run into the same issue in the future: I implemented the solution mentioned above, and also added some code so that the tick labels keep the desired rotation and align (with their left end) to the respective tick. It may not be Pythonic or best practice, but it works:
x_fmt = mpl.ticker.IndexFormatter(x)
ax.set_xticklabels(my_data.Date, rotation=-45)
ax.tick_params(axis='x', pad=10)
ax.xaxis.set_major_formatter(x_fmt)
labels = my_data.Date
for tick in ax.xaxis.get_majorticklabels():
    tick.set_horizontalalignment("left")
The reason your ticklabels went bad is that setting manual ticklabels decouples the labels from your data. The proper approach is to use a Formatter according to your needs. Since you have a list of ticklabels for each data point, you can use an IndexFormatter. It seems to be undocumented online, but it has a help:
class IndexFormatter(Formatter)
| format the position x to the nearest i-th label where i=int(x+0.5)
| ...
| __init__(self, labels)
| ...
So you just have to pass your list of dates to IndexFormatter. With a minimal, pandas-independent example (with numpy only for generating dummy data):
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
# create dummy data
x = ['str{}'.format(k) for k in range(20)]
y = np.random.rand(len(x))
# create an IndexFormatter with labels x
x_fmt = mpl.ticker.IndexFormatter(x)
fig,ax = plt.subplots()
ax.plot(y)
# set our IndexFormatter to be responsible for major ticks
ax.xaxis.set_major_formatter(x_fmt)
This should keep your data and labels paired even when tick positions change:
I noticed you also set the rotation of the ticklabels in the call to set_xticklabels; you would lose this now. I suggest using fig.autofmt_xdate instead, which seems to be designed exactly for this purpose and does not interfere with your ticklabel data.
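For readers on newer Matplotlib versions: as far as I know, IndexFormatter was deprecated in Matplotlib 3.3 and removed in later releases, so the snippet above may fail with an AttributeError. A sketch of the same idea using ticker.FuncFormatter, combined with fig.autofmt_xdate for the rotation, is shown below. The helper label_for_position is hypothetical (not a Matplotlib function), and MultipleLocator(7) is just one way to get a tick every 7 points.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np

# Dummy data: one label per data point
labels = ["str{}".format(k) for k in range(20)]
y = np.random.rand(len(labels))

def label_for_position(x, pos=None):
    """Return the label nearest to position x, or '' outside the data range."""
    i = int(round(x))
    return labels[i] if 0 <= i < len(labels) else ""

fig, ax = plt.subplots()
ax.plot(y)
ax.xaxis.set_major_locator(mticker.MultipleLocator(7))  # one tick every 7 points
ax.xaxis.set_major_formatter(mticker.FuncFormatter(label_for_position))
fig.autofmt_xdate(rotation=45, ha="right")              # rotate and align the labels
fig.canvas.draw()
```

Because the label is looked up from the tick position each time, the labels stay paired with the data when the locator or the zoom level changes.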