I would like to plot a graph with the most recent dates on the left instead of the right of the x-axis.
Is there a way to do this in pandas and matplotlib and still get the date axis?
Invert an axis in a matplotlib grafic
shows how to do this for the y-axis using invert_yaxis(). However, this is not available for xaxis.
Use xlim() from pyplot. Take this example:
period = pd.period_range("1.1.2013","12.1.2013",freq="M")
data = np.arange(12)
s = pd.Series(data=data,index=period)
#Output
2013-01 0
2013-02 1
2013-03 2
2013-04 3
2013-05 4
2013-06 5
2013-07 6
2013-08 7
2013-09 8
2013-10 9
2013-11 10
2013-12 11
Set the first value of xlim to the last index of the series and the second value to the first index, like this:
s.plot()
plt.xlim(s.index[-1],s.index[0])
plt.show()
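As an alternative, matplotlib Axes objects also provide invert_xaxis(), the counterpart of invert_yaxis(). The following is a minimal sketch, assuming the same monthly series as above; the Agg backend line is an addition so that the snippet runs without a display and can be dropped for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same monthly series as in the example above
period = pd.period_range("2013-01", "2013-12", freq="M")
s = pd.Series(np.arange(12), index=period)

ax = s.plot()       # pandas returns the matplotlib Axes it drew on
ax.invert_xaxis()   # flip the x-axis so the most recent period is on the left
```

Calling plt.show() afterwards should display the line with December 2013 on the left and January 2013 on the right, while keeping the date axis.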
I have a table that I'm currently trying to display on a bar chart. It is annual data, with entries from 1 January of one year until 31 December of the same year.
DATE COUNT
0 2019-01-01 42
1 2019-02-01 3
2 2019-03-01 31
3 2019-04-01 13
4 2019-05-01 1
...
When I plot this with 'DATE' as the x-axis, plotly automatically bins the x-axis into weeks, so that I have 52 bars instead of 365.
fig = px.histogram(df, x="DATE", y="COUNT", title="title")
fig.update_layout(bargap=0.30)
fig
I've tried updating the ticks with various formats, but this just changes the x-axis labels, not the number of bars.
I'm not sure how to change the x-axis from weekly to daily bars.
I'm new to Python. I hope you can help me.
I have a dataframe with two columns. The first column is called dates and the second column is filled with numbers. The dataframe has 351 rows.
dates numbers
01.03.2019 5
02.03.2019 8
...
20.02.2020 3
21.02.2020 2
I want the whole first column to be on the x axis. I tried to plot it like this:
graph = FinalDataframe.plot(figsize=(12, 8))
graph.legend(loc='upper center', bbox_to_anchor=(0.5, -0.075), ncol=4)
graph.set_xticklabels(FinalDataframe['dates'])
plt.show()
But the x axis shows only the first few values from the column instead of the whole column. Furthermore, they do not line up with the data from the second column.
Any suggestions?
Thank you in advance!
Your issue is that x ticks are generated automatically, and spaced out to be readable. However, you then tell matplotlib to use all the labels, so only the first few labels are assigned to the handful of existing ticks. The simple fix is to tell it to use one tick per entry, but that’s going to make your x-axis unreadable:
graph.set_xticks(range(len(FinalDataframe['dates'])))
Now you could space them out manually:
graph.set_xticks(range(0, len(FinalDataframe['dates']), 61))
graph.set_xticklabels(FinalDataframe['dates'][::61])
However, the best way to plot dates on the x-axis is still to use pandas’ built-in date objects. We can do this with pd.to_datetime.
This also lets pandas know where to place points on the x-axis, once you specify that the x-axis should be the dates. That way, if dates are unsorted or missing, the gaps are handled properly and each point is drawn above the abscissa of the right date.
I’m first recreating a dataframe that looks like what you posted:
>>> df = pd.DataFrame({'dates': pd.date_range('20190301', '20200221', freq='D').strftime('%d.%m.%Y'), 'numbers': np.random.randint(0, 10, 358)})
>>> df
dates numbers
0 01.03.2019 2
1 02.03.2019 2
2 03.03.2019 5
3 04.03.2019 4
4 05.03.2019 3
.. ... ...
353 17.02.2020 2
354 18.02.2020 1
355 19.02.2020 2
356 20.02.2020 3
357 21.02.2020 1
(This should be the same as FinalDataframe, or if your dates are the index, then it’s the same as FinalDataframe.reset_index())
Now I’m converting the dates:
>>> df['dates'] = pd.to_datetime(df['dates'], format='%d.%m.%Y')
>>> df
dates numbers
0 2019-03-01 2
1 2019-03-02 2
2 2019-03-03 5
3 2019-03-04 4
4 2019-03-05 3
.. ... ...
353 2020-02-17 2
354 2020-02-18 1
355 2020-02-19 2
356 2020-02-20 3
357 2020-02-21 1
You can check that your columns contain dates and not string representations of dates by looking at their dtypes:
>>> df.dtypes
dates datetime64[ns]
numbers int64
Finally plotting:
>>> ax = df.plot(x='dates', y='numbers', figsize=(12, 8))
>>> ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.075), ncol=4)
<matplotlib.legend.Legend object at 0x7fc8c24fd4f0>
>>> plt.show()
Legends are taken care of automatically.
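Once the column holds real datetimes, tick spacing and label format can also be controlled explicitly. The sketch below is one possible way to do that, assuming the recreated dataframe from above and plotting with matplotlib directly; the two-month interval and the "%b %Y" label format are arbitrary choices for illustration, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Recreate the dataframe with dates stored as strings, then convert them
dates = pd.date_range("2019-03-01", "2020-02-21", freq="D")
df = pd.DataFrame({
    "dates": dates.strftime("%d.%m.%Y"),
    "numbers": np.random.randint(0, 10, len(dates)),
})
df["dates"] = pd.to_datetime(df["dates"], format="%d.%m.%Y")

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(df["dates"], df["numbers"], label="numbers")

# One major tick every second month, labelled like "Mar 2019"
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.075), ncol=4)
```

Plotting through matplotlib rather than df.plot keeps the axis in matplotlib's own date units, which is what the locators in matplotlib.dates expect.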
I need to display months on the x axis of a plot instead of the index of a data frame, which runs from 1 to 365 and represents the day of the year. So instead of an x-axis that goes from 1 to 365, I want to display it as "Jan", "Feb" and so on, without losing the structure of the plot.
Here is the main structure of my data frame:
Month Day Max_Data Min_Data MonthDay
1 1 1 1.1 -13.3 1-1
2 1 2 3.9 -12.2 1-2
3 1 3 3.9 -6.7 1-3
4 1 4 4.4 -8.8 1-4
5 1 5 2.8 -15.5 1-5
I am currently plotting using:
plt.scatter(data_2015.index, data_2015['Max_Data'], marker='^', color='green',s=40, alpha=1.0)
If I change data_2015.index to Month, the graph plots completely wrong values, because there are 28, 30 or 31 rows for each month.
So how can I convert the index into months and display them on the x axis of a plot?
I found a solution by doing simply the following:
# day of year on which each month starts (non-leap year, index starting at 1)
month_starts = [1,32,60,91,121,152,182,213,244,274,305,335]
month_names = ['Jan','Feb','Mar','Apr','May','Jun',
'Jul','Aug','Sep','Oct','Nov','Dec']
plt.gca().set_xticks(month_starts)
plt.gca().set_xticklabels(month_names)
This comes from a post on Stack Overflow.
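Rather than hard-coding the offsets, the tick positions can be derived from the calendar. This is a small sketch, assuming the data covers 2015 (as the name data_2015 suggests) and that the index is the 1-based day of the year.

```python
import pandas as pd

# First day of each month in 2015 ("MS" = month start frequency)
month_firsts = pd.date_range("2015-01-01", periods=12, freq="MS")

# Day-of-year of each month start, 1-based to match an index running 1..365
month_starts = month_firsts.dayofyear.tolist()

# Abbreviated month names for the tick labels (locale-dependent)
month_names = month_firsts.strftime("%b").tolist()

print(month_starts)
```

The two lists can then be passed to plt.gca().set_xticks(month_starts) and plt.gca().set_xticklabels(month_names) as before. For a leap year, changing the start date is enough to shift the offsets from March onwards.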
I have a Series with more than 100,000 rows that I want to plot. I have a problem with the x-axis of my figure: since it is made up of many dates, the labels are unreadable if all of them are shown.
How can I show only one out of every n labels on the x-axis?
Here is an example of code that produces a plot with an ugly x-axis:
sr = pd.Series(np.array(range(15)))
sr.index = [ '2018-06-' + str(x).zfill(2) for x in range(1,16)]
Out :
2018-06-01 0
2018-06-02 1
2018-06-03 2
2018-06-04 3
2018-06-05 4
2018-06-06 5
2018-06-07 6
2018-06-08 7
2018-06-09 8
2018-06-10 9
2018-06-11 10
2018-06-12 11
2018-06-13 12
2018-06-14 13
2018-06-15 14
fig = plt.plot(sr)
plt.xlabel('Date')
plt.ylabel('Sales')
Using xticks you can achieve the desired effect:
In your example:
sr = pd.Series(np.array(range(15)))
sr.index = [ '2018-06-' + str(x).zfill(2) for x in range(1,16)]
fig = plt.plot(sr)
plt.xlabel('Date')
plt.xticks(sr.index[::4]) #Show one in every four dates
plt.ylabel('Sales')
Alternatively, if you want to set the number of ticks instead, you can use locator_params:
sr.plot(xticks=sr.reset_index().index)
plt.locator_params(axis='x', nbins=5) #Show five dates
plt.ylabel('Sales')
plt.xlabel('Date')
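For a series as long as the one described in the question, another option is to store the index as real datetimes and let matplotlib choose the tick positions. The sketch below assumes a synthetic series of 100,000 minute-level rows (the real data is not shown in the question) and uses AutoDateLocator with ConciseDateFormatter from matplotlib.dates; the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the long series: one value per minute
index = pd.date_range("2018-06-01", periods=100_000, freq="min")
sr = pd.Series(np.arange(len(index)), index=index)

fig, ax = plt.subplots()
ax.plot(sr.index, sr.values)

# Let matplotlib pick a readable number of date ticks and compact labels
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
```

If the index arrives as strings, as in the 15-row example, pd.to_datetime(sr.index) converts it first. With a datetime index the ticks are placed on a time scale instead of one per label.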
I've got two pandas series, one with a 7 day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7 day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph shows up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the series look like:
These are the first 15 values of the 7 day rolling mean series; it has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!
The problem is that pandas bar plots are categorical (bars are placed at successive integer positions 0, 1, 2, ...). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical (date) coordinates as well. This is not possible with the pandas bar plot, but it is the default behaviour of matplotlib's own bar function.
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
import pandas as pd
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1)))+14, index=t1)
s1[:6] = np.nan
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2))*15+5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()
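To see the categorical placement that causes the original problem, one can inspect where a pandas bar plot actually puts its bars. This is a small illustrative check, assuming the same synthetic monthly series as above; the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2)) * 15 + 5, index=t2)

ax = s2.plot(kind="bar")

# Centre of each bar in data coordinates: integers 0..11, not dates
bar_positions = [p.get_x() + p.get_width() / 2 for p in ax.patches]
print(bar_positions)
```

The bars sit at positions 0 to 11, whereas a line drawn from a datetime index is placed at date values far away from that range, so the two cannot share an x-axis without converting one of them.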
The problem might be that the two series' indices are on very different scales. You can use ax.twiny to plot the line on a second x-axis that shares the same y-axis:
ax = series_one.plot(kind="bar", figsize=(20,2))
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()