How to change formatting of Line chart axis - python

I'm doing a time series analysis with seaborn in Python, using the following code:
df['Crop'] = pd.Categorical(df['Crop'], categories=df['Crop'].dropna().unique())
pd.to_datetime(df['DisplayDate'], format='%y-%m-%d')
cols = df['Gr']
fig, axes = plt.subplots(len(cols), figsize=(6,3))
sns.lineplot(data=df, x= df['DisplayDate'], y= df['Crop'], hue= df['Gr'])
plt.show()
The graph comes out with scribbles on the x-axis.
My thought is that it might be due to the size of the dates. Is there a way to change the formatting of the x-axis so that instead of showing the entire date, it only shows the month?
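One possible approach, sketched on invented data: the column names `DisplayDate` and `Gr` follow the question, while `Value` is a hypothetical numeric column and all values are made up. Note that `pd.to_datetime` returns a new Series, so in the code above its result is discarded; the sketch assigns it back and then uses matplotlib's `MonthLocator` and `DateFormatter('%b')` so that each tick shows only the month. Since `sns.lineplot` draws on a regular matplotlib Axes, the same two `ax.xaxis` calls should apply to a seaborn plot as well.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample data; only the column names DisplayDate and Gr follow the question.
df = pd.DataFrame({
    "DisplayDate": ["2021-01-15", "2021-02-15", "2021-03-15", "2021-04-15"] * 2,
    "Gr": ["A"] * 4 + ["B"] * 4,
    "Value": [1, 3, 2, 5, 2, 2, 4, 3],  # hypothetical numeric column
})

# to_datetime returns a new Series, so the result has to be assigned back.
df["DisplayDate"] = pd.to_datetime(df["DisplayDate"], format="%Y-%m-%d")

fig, ax = plt.subplots(figsize=(6, 3))
for name, group in df.groupby("Gr"):
    ax.plot(group["DisplayDate"], group["Value"], label=name)

ax.xaxis.set_major_locator(mdates.MonthLocator())         # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))  # abbreviated month name only
ax.legend(title="Gr")

# Render once so the tick labels are populated, then collect them.
fig.canvas.draw()
month_labels = [t.get_text() for t in ax.get_xticklabels()]
```

In an interactive session, `plt.show()` would display the figure; `month_labels` is only collected here to make the result easy to inspect.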

Related

From a series of datetimes, how to downsample and select fewer dates to put in the x labels and xticks?

I am making a plot in which the x-axis represents dates and the y-axis represents total COVID cases. Because the dataset is large, there are many dates on the x-axis, so the xtick labels overlap and I cannot clearly see the number of cases on a particular date. How can I make the graph clearer? Suggestions for other ways to make the graph more readable are also welcome.
My code and plot are given below. Thanks.
Ensure your dates are dates, not strings.
Use matplotlib date formatters.
I've used data from the UK, as you did not provide a sample.
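import matplotlib.dates as mdates
import matplotlib.pyplot as plt
# countries is the DataFrame holding the case data, with a datetime "date" column
x = countries["date"]
y = countries["total_cases"]
fig, ax = plt.subplots(figsize=(10, 6))
# let matplotlib pick between 3 and 7 ticks and label them concisely
locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
formatter = mdates.ConciseDateFormatter(locator)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
ax.plot(x, y)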
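The first point of the answer (dates must be dates, not strings) can be checked before plotting. The following is a minimal sketch on invented numbers; the `countries` name and its two columns follow the answer, while the `before`/`after` variables are only there for illustration.

```python
import pandas as pd

# Invented sample: dates stored as text, as they often are after reading a CSV.
countries = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-03"],
    "total_cases": [36, 40, 51],
})

# False while the column still holds strings
before = pd.api.types.is_datetime64_any_dtype(countries["date"])

# Convert and assign back; the format string matches the sample above.
countries["date"] = pd.to_datetime(countries["date"], format="%Y-%m-%d")

# True once the column holds real datetimes
after = pd.api.types.is_datetime64_any_dtype(countries["date"])
print(before, after)
```

With string dates, matplotlib treats every value as a separate category and draws one tick per row, which is one common cause of overlapping labels.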

plot x-axis not displaying correctly for rolling mean

I'm obviously making a very basic mistake in adding a rolling mean plot to my figure.
The basic plot of close prices works fine, but as soon as I add the rolling mean to the plot, the x-axis dates get screwed up and I can't see what it's trying to do.
Here's the code:
import pandas as pd
import matplotlib.pyplot as plot
df = pd.read_csv('historical_price_data.csv')
df['Date'] = pd.to_datetime(df.Date, infer_datetime_format=True)
df.sort_index(inplace=True)
ax = df[['Date', 'Close']].plot(figsize=(14, 7), x='Date', color='black')
rolling_mean = df.Close.rolling(window=7).mean()
plot.plot(rolling_mean, color='blue', label='Rolling Mean')
plot.show()
With this sample data set I am getting this figure:
Given the simplicity of this code, I'm obviously making a very basic mistake, I just can't see what it is.
EDIT: Interesting, although @AndreyPortnoy's suggestion to set the index to Date results in the odd error that Date is not in the index, when I use the built-ins per his suggestion, the figure is no longer a complete mess, but for some reason the x-axis is reversed, and the ticks are no longer dates, but apparently ints (?) even though df.dtypes shows Date is datetime64[ns]
@Sandipan Dey: Here's what the dataset looks like. Per the code above I'm using pd.to_datetime() to convert to datetime64, and have tried df[::-1] to fix the problem where it is reversed when the 2nd plot (mov_avg) is added to the figure (but not reversed when the figure only has the 1 plot.)
The fact that your dates for the moving averages start at 1970 suggests that an integer range index is used. It was generated by default when you read in the csv file. Try inserting
df.set_index('Date', inplace=True)
before
df.sort_index(inplace=True)
Then you can do
ax = df['Close'].plot(figsize=(14, 7), color='black')
rolling_mean = df.Close.rolling(window=7).mean()
plot.plot(rolling_mean, color='blue', label='Rolling Mean')
Note that I'm not passing x explicitly, letting pandas and matplotlib infer it.
You can simplify your code by using the built-in plotting facilities like so:
df['mov_avg'] = df['Close'].rolling(window=7).mean()
df[['Close', 'mov_avg']].plot(figsize=(14, 7))
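For reference, here is a self-contained sketch of the same idea on invented prices (the CSV from the question is not available, so the `Close` values below are made up). It sets `Date` as the index first, so that the rolling mean inherits the same DatetimeIndex as the prices and both lines share one date axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented prices standing in for historical_price_data.csv: 100, 101, ..., 129.
dates = pd.date_range("2020-01-01", periods=30, freq="D")
df = pd.DataFrame({"Date": dates, "Close": np.linspace(100.0, 129.0, 30)})

# Index by date before computing anything, then sort chronologically.
df = df.set_index("Date").sort_index()

# The rolling mean keeps the DatetimeIndex; the first 6 values are NaN
# because a 7-day window is not yet full.
df["mov_avg"] = df["Close"].rolling(window=7).mean()

# Both columns are plotted against the shared date index.
ax = df[["Close", "mov_avg"]].plot(figsize=(14, 7))
```

If the rolling mean is instead computed on a frame with the default integer index and plotted onto a date axis, its x values are 0, 1, 2, ... and land at the very start of the epoch, which matches the 1970 dates described above.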

Add evenly spaced ticks using matplotlib plot_date [duplicate]

I have a dataframe like this:
data_ = list(range(106))
index_ = pd.period_range('3/1/2004', '12/1/2012', freq='M')
df2_ = pd.DataFrame(data = data_, index = index_, columns = ['data'])
I want to plot this dataframe. Currently, I am using:
df2_.plot()
Now I would like to control the labels (and possibly ticks) on the x-axis. In particular, I would like to have monthly ticks on the axis and possibly a label at every other month, or quarterly labels. I would also like to have vertical grid lines.
I started looking at this example but I am already failing at constructing the timedelta.
With regard to constructing the timedelta, datetime.timedelta() doesn’t have a parameter to specify months, so it’s probably convenient to stick to pd.date_range(). However, I found that objects of type pandas.tslib.Timestamp don’t play nice with matplotlib ticks, so you could convert them to datetime.date objects like so
index_ = [pd.to_datetime(date, format='%Y-%m-%d').date()
          for date in pd.date_range('2004-03-01', '2012-12-01', freq="M")]
It’s possible to add gridlines and customise axes labels by first defining a matplotlib axes object, and then passing this to DataFrame.plot()
ax = plt.axes()
df2_.plot(ax=ax)
Now you can add vertical gridlines to your plot
ax.xaxis.grid(True)
And specify quarterly xticks labels by using matplotlib.dates.MonthLocator and setting the interval to 3
ax.xaxis.set_major_locator(dates.MonthLocator(interval=3))
And finally, I found the ticks to be very crowded, so I formatted them to get a nicer fit
ax.xaxis.set_major_formatter(dates.DateFormatter('%b %y'))
labels = ax.get_xticklabels()
plt.setp(labels, rotation=85, fontsize=8)
To produce the following:
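The steps above can be combined into one runnable sketch. It makes two assumptions that differ from the answer: the index is built with `pd.date_range(..., freq="MS")` (month starts) rather than converted to `datetime.date` objects, and the line is drawn with `ax.plot` instead of `DataFrame.plot`, so that the x-axis stays in matplotlib's own date units and the `matplotlib.dates` locator and formatter apply directly.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Same shape as the question's data: 106 monthly values starting March 2004.
index_ = pd.date_range("2004-03-01", periods=106, freq="MS")
df2_ = pd.DataFrame({"data": range(106)}, index=index_)

fig, ax = plt.subplots()
ax.plot(df2_.index, df2_["data"])  # plain matplotlib call, matplotlib date units

ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))  # every third month
formatter = mdates.DateFormatter("%b %y")                    # e.g. "Mar 04"
ax.xaxis.set_major_formatter(formatter)
ax.xaxis.grid(True)                                          # vertical grid lines
plt.setp(ax.get_xticklabels(), rotation=85, fontsize=8)

# What the formatter produces for the first date in the index.
sample_label = formatter(mdates.date2num(datetime.date(2004, 3, 1)))
```

Monthly ticks with labels only on some of them could be approached the same way by adding `MonthLocator()` as the minor locator, which is not shown here.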

Axis interval spacing when plotting with pandas timedelta

I'm trying to plot some columns in a dataframe that has pandas timedelta values as its index. When I plot it, all the points are evenly spaced along the x axis even if there's a variable time between.
time = [pd.Timestamp('9/3/2016')-pd.Timestamp('9/1/2016'),
        pd.Timestamp('9/8/2016')-pd.Timestamp('9/1/2016'),
        pd.Timestamp('9/29/2016')-pd.Timestamp('9/1/2016')]
df = pd.DataFrame(index=time, columns=['y'],data=[5,0,10])
df.plot()
plt.show()
Wrong spacing
If I use dates instead of timedeltas, I get the proper spacing on the x-axis:
time = [pd.Timestamp('9/3/2016'),pd.Timestamp('9/5/2016'),pd.Timestamp('9/20/2016')]
df = pd.DataFrame(index=time, columns=['y'],data=[5,0,10])
df.plot()
plt.show()
Right spacing
Is there a way to get this to display correctly?
At the moment, it's not fully supported yet in pandas. Please see this issue on Github for more info.
For a quick workaround, you can use:
import matplotlib.pyplot as plt
plt.plot(df.index, df.values)
Here's an example of how you could play with the ticks to make them readable (rather than just a very large number)
import matplotlib as mpl
import datetime
fig, ax = plt.subplots()
ax.plot(df.index, df.values)
plt.xticks([t.value for t in df.index], df.index, rotation=45)
plt.show()
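Another workaround, which is not part of the answer above and is offered only as a sketch, is to convert the timedelta index to plain numbers (for example elapsed days) before plotting. The x-axis is then an ordinary numeric axis, so the spacing is proportional to elapsed time and the tick labels are small, readable numbers. The `elapsed_days` name is made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Same data as in the question: offsets of 2, 7 and 28 days from 1 Sep 2016.
start = pd.Timestamp("2016-09-01")
time = pd.TimedeltaIndex([pd.Timestamp("2016-09-03") - start,
                          pd.Timestamp("2016-09-08") - start,
                          pd.Timestamp("2016-09-29") - start])
df = pd.DataFrame(index=time, columns=["y"], data=[5, 0, 10])

# Convert each timedelta to a float number of days.
elapsed_days = df.index.total_seconds() / 86400.0

fig, ax = plt.subplots()
ax.plot(elapsed_days, df["y"].to_numpy(), marker="o")
ax.set_xlabel("days since 2016-09-01")
```

Dividing by 3600 instead of 86400 would give hours, which may suit shorter time spans better.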

python pandas timeseries plots, how to set xlim and xticks outside ts.plot()?

fig = plt.figure()
ax = fig.gca()
ts.plot(ax=ax)
I know I can set xlim inside pandas plotting routine: ts.plot(xlim = ...), but how to change it after pandas plotting is done?
ax.set_xlim((t0.toordinal(), t1.toordinal()))
works sometimes, but if pandas is formatting the xaxis as months from epoch, not days, this will fail hard.
Is there any way to know how pandas has converted the dates to x-axis values, and then convert my xlim in the same way?
Thanks.
It works for me (with pandas 0.16.2) if I set the x-axis limits using pd.Timestamp values.
Example:
import numpy as np
import pandas as pd
# Create a random time series with values over 100 days
# starting from 1st March.
N = 100
dates = pd.date_range(start='2015-03-01', periods=N, freq='D')
ts = pd.DataFrame({'date': dates,
                   'values': np.random.randn(N)}).set_index('date')
# Create the plot and adjust x/y limits. The new x-axis
# ranges from mid-February till 1st July.
ax = ts.plot()
ax.set_xlim(pd.Timestamp('2015-02-15'), pd.Timestamp('2015-07-01'))
ax.set_ylim(-5, 5)
Result:
Note that if you plot multiple time series in the same figure then make sure to set xlim/ylim after the last ts.plot() command, otherwise pandas will automatically reset the limits to match the contents.
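The note about multiple series can be sketched as follows. The series names `ts_a` and `ts_b` are invented for this example; the dates and limits follow the answer. The limits are set only after the last pandas plotting call, and `ax.get_xlim()` is then used to read back the limits as plain numbers in whatever units the axis ended up using.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 100
dates = pd.date_range(start="2015-03-01", periods=N, freq="D")
rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
ts_a = pd.Series(rng.standard_normal(N), index=dates, name="a")
ts_b = pd.Series(rng.standard_normal(N), index=dates, name="b")

fig, ax = plt.subplots()
ts_a.plot(ax=ax)
ts_b.plot(ax=ax)  # last pandas plotting call

# Limits go last so that no later plot call can reset them.
ax.set_xlim(pd.Timestamp("2015-02-15"), pd.Timestamp("2015-07-01"))
ax.set_ylim(-5, 5)

# The limits as numbers, in the units the axis actually uses.
x_low, x_high = ax.get_xlim()
```

Passing `pd.Timestamp` values means the conversion to axis units is left to pandas and matplotlib, so there is no need to work out by hand whether the axis counts days or months.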
