I have a dataframe that shows monthly revenue. There is an additional column that shows the number of locations opened in that month.
    Date        Order Amount  Locations Opened
16  2016-05-31  126443.17     2.0
17  2016-06-30  178144.27     0.0
18  2016-07-31  230331.96     1.0
19  2016-08-31  231960.04     0.0
20  2016-09-30  208445.26     0.0
I'm using seaborn to plot the revenue by month
sns.lineplot(x="Date", y="Order Amount",
data=total_monthly_rev).set_title("Total Monthly Revenue")
I've been trying, unsuccessfully, to use the third column, Locations Opened, to add supporting text to the lineplot so I can show the number of locations opened in a month, where Locations Opened > 0.
IIUC, use text:
plt.figure(figsize=(12, 5))
sns.lineplot(x="Date", y="Order Amount", data=total_monthly_rev).set_title("Total Monthly Revenue")

# Vertical offset (in y-axis units) that controls how far above/below the line the text appears
slider = 1000

for i in range(total_monthly_rev.shape[0]):
    # Only label months in which at least one location opened
    if total_monthly_rev['Locations Opened'].iloc[i] > 0:
        plt.text(total_monthly_rev.Date.iloc[i],
                 total_monthly_rev['Order Amount'].iloc[i] + slider,
                 total_monthly_rev['Locations Opened'].iloc[i])
plt.show()
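The same idea can be sketched as a self-contained script built from the five rows shown in the question. This is only an illustration under a few assumptions: it uses plain matplotlib instead of seaborn (sns.lineplot returns a matplotlib Axes, so the annotation step should carry over), it selects the non-interactive "Agg" backend so it runs without a display, and it uses ax.annotate with an offset in points rather than a fixed y-offset.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown in the question
total_monthly_rev = pd.DataFrame({
    "Date": pd.to_datetime(["2016-05-31", "2016-06-30", "2016-07-31",
                            "2016-08-31", "2016-09-30"]),
    "Order Amount": [126443.17, 178144.27, 230331.96, 231960.04, 208445.26],
    "Locations Opened": [2.0, 0.0, 1.0, 0.0, 0.0],
})

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(total_monthly_rev["Date"], total_monthly_rev["Order Amount"])
ax.set_title("Total Monthly Revenue")

# Keep only the months in which at least one location opened
opened = total_monthly_rev[total_monthly_rev["Locations Opened"] > 0]

for date, amount, n in zip(opened["Date"], opened["Order Amount"],
                           opened["Locations Opened"]):
    # xytext is an offset in points, so the gap does not depend on the y scale
    ax.annotate(f"{int(n)}", xy=(date, amount), xytext=(0, 8),
                textcoords="offset points", ha="center")
```

An offset in points keeps the label the same distance from the line regardless of the revenue scale, which a hardcoded value such as 1000 does not.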
I have two dataframes, df_iter_actuals and df_iter_preds that look like this:
cohort FGCI 0.0 2017 FGCI 1.0 2020 FGCI 1.0 2021
Month_Next
2022-04-01 1.207528 10.381440 4.332759
2022-05-01 1.529364 1.649636 5.007991
2022-06-01 21.715032 7.491215 6.096792
2022-07-01 0.958635 12.460808 5.759696
2022-08-01 25.637608 0.961132 4.635855
2022-09-01 0.997071 0.721632 3.799172
2022-10-01 27.006847 0.811228 3.586541
cohort FGCI 0.0 2017 FGCI 1.0 2020 FGCI 1.0 2021
Month_Next
2022-04-01 16.804628 5.143954 3.296097
2022-05-01 16.804628 5.143954 3.193598
2022-06-01 16.804628 5.143954 3.066248
2022-07-01 16.804628 5.143954 2.907984
2022-08-01 16.804628 5.143954 2.711235
2022-09-01 16.804628 5.143954 2.466544
2022-10-01 16.804628 5.143954 2.162079
One is the actual values and the other is the predicted values for a certain time series data set. I'd like to plot the shared columns between the two dataframes on a single plot, but creating a new plot for each column. For example, I'd like the data for FGCI 0.0 2017 to be shown on the same plot as line graphs for both dfs, and then for the next plot showing the data for FGCI 1.0 2020. I was able to accomplish this with just one dataframe with
for i in df_iter_actuals.columns:
    plt.figure(figsize = (10,8))
    plt.plot(df_iter[i])
    plt.title(f'Pct Error of CPR Predictions for {i}')
But I don't know how to do it with two dataframes.
Since the format and column names are the same for both dataframes, you can plot both inside the same loop:
for i in df_iter_actuals.columns:
    plt.figure(figsize = (10,8))
    plt.plot(df_iter_actuals[i])
    plt.plot(df_iter_preds[i])
    plt.title(f'Pct Error of CPR Predictions for {i}')
# Iterate over columns in dataframes
for col in df_iter_actuals.columns:
    # Create a new figure and axes for each column
    fig, ax = plt.subplots(figsize=(10, 8))
    # Use the `df.plot()` function to plot the column from each dataframe
    df_iter_actuals[col].plot(ax=ax, label="Actual")
    df_iter_preds[col].plot(ax=ax, label="Predicted")
    # Set the title of the plot to the column name or to whatever you need
    ax.set_title(f"Pct Error of CPR Predictions for {col}")
    ax.legend()
# Show the plots
plt.show()
Just iterate over the columns of your two dataframes and use the df.plot() function to plot each column from both dataframes on the same figure, one figure per column.
You can concat and use groupby.plot:
(pd.concat([df_iter_actuals, df_iter_preds], keys=['actual', 'pred'], axis=1)
.groupby(level=1, axis=1).plot()
)
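If the goal is one panel per shared column inside a single figure, a subplot grid is another option. The sketch below is an illustration only: it uses the first three rows of each table from the question, selects the "Agg" backend so it runs without a display, and relies on columns.intersection so that only columns present in both dataframes are plotted.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

idx = pd.DatetimeIndex(["2022-04-01", "2022-05-01", "2022-06-01"],
                       name="Month_Next")

# First three rows of the actual and predicted tables from the question
df_iter_actuals = pd.DataFrame({
    "FGCI 0.0 2017": [1.207528, 1.529364, 21.715032],
    "FGCI 1.0 2020": [10.381440, 1.649636, 7.491215],
    "FGCI 1.0 2021": [4.332759, 5.007991, 6.096792],
}, index=idx)
df_iter_preds = pd.DataFrame({
    "FGCI 0.0 2017": [16.804628, 16.804628, 16.804628],
    "FGCI 1.0 2020": [5.143954, 5.143954, 5.143954],
    "FGCI 1.0 2021": [3.296097, 3.193598, 3.066248],
}, index=idx)

# Only plot columns that exist in both dataframes
shared = df_iter_actuals.columns.intersection(df_iter_preds.columns)

# One row of axes per shared column; squeeze=False keeps `axes` 2-D
fig, axes = plt.subplots(len(shared), 1, figsize=(10, 4 * len(shared)),
                         sharex=True, squeeze=False)

for ax, col in zip(axes[:, 0], shared):
    ax.plot(df_iter_actuals.index, df_iter_actuals[col], label="Actual")
    ax.plot(df_iter_preds.index, df_iter_preds[col], label="Predicted")
    ax.set_title(col)
    ax.legend()

fig.tight_layout()
```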
I need to display months on the x-axis of a plot instead of the data frame's index, which runs from 1 to 365 and represents the day of the year. So instead of an x-axis that goes from 1 to 365, I want it labelled "Jan", "Feb" and so on, without losing the structure of the plot.
Here is the main structure of my data frame:
Month Day Max_Data Min_Data MonthDay
1 1 1 1.1 -13.3 1-1
2 1 2 3.9 -12.2 1-2
3 1 3 3.9 -6.7 1-3
4 1 4 4.4 -8.8 1-4
5 1 5 2.8 -15.5 1-5
I am currently plotting using:
plt.scatter(data_2015.index, data_2015['Max_Data'], marker='^', color='green',s=40, alpha=1.0)
If I change data_2015.index to Month, the graph plots completely wrong values, because there are 28, 30 or 31 rows for each month.
So how can I convert the index into months and display them on the x-axis of the plot?
I found a solution by simply doing the following:
# Day-of-year of the first day of each month (1-based, non-leap year such as 2015)
month_starts = [1,32,60,91,121,152,182,213,244,274,305,335]
month_names = ['Jan','Feb','Mar','Apr','May','Jun',
               'Jul','Aug','Sep','Oct','Nov','Dec']
plt.gca().set_xticks(month_starts)
plt.gca().set_xticklabels(month_names)
This comes from a post on Stack Overflow.
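Rather than hardcoding the offsets, the tick positions can be derived from the calendar. The sketch below assumes the index is the 1-based day of the year for 2015 (a non-leap year), as described in the question, and that the month abbreviations are wanted in the current locale (English in a default setup).

```python
import pandas as pd

# First day of every month in 2015 ("MS" = month start frequency)
first_days = pd.date_range("2015-01-01", periods=12, freq="MS")

# 1-based day-of-year for each month start, matching an index that runs 1..365
month_starts = first_days.dayofyear.tolist()

# Abbreviated month names; "%b" follows the current locale
month_names = first_days.strftime("%b").tolist()

print(month_starts)
```

The two lists can then be passed to set_xticks and set_xticklabels exactly as in the snippet above.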
I have a dataset with a multi-index and I would like to graph based on one index and one of the columns.
I tried referencing the data, '% Smokers', based on the index. The two indexes are Age Group and Year.
I want the graph to have 4 lines, for each age group, with Year as the x-axis.
The tail of my dataset looks like:
% Smokers Cigs per Day Smoker Count Total Count
Age Group Year
4.0 2003 9.221673 14.947439 86486.103843 9.378570e+05
1.0 2002 23.668647 7.832528 185319.850343 7.829761e+05
2.0 2002 24.130250 10.379573 616136.073633 2.553376e+06
3.0 2002 23.300126 13.569244 389576.705723 1.671994e+06
4.0 2002 9.892616 12.739635 89247.050214 9.021583e+05
I tried the following code:
fig, ax = plt.subplots(1,2, figsize = (20,10))
ax[0].plot(part1_df["% Smokers"].loc[1.0])
ax[0].plot(part1_df["% Smokers"].loc[2.0])
ax[0].plot(part1_df["% Smokers"].loc[3.0])
ax[0].plot(part1_df["% Smokers"].loc[4.0])
I'm getting a KeyError: '% Smokers'
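One way to get one line per age group is to unstack the Age Group level so that each group becomes a column indexed by Year. The sketch below is an illustration only, built from the five rows shown above. The cause of the KeyError cannot be determined from the question; the sketch assumes the column label may contain stray whitespace and strips it first, which is a guess rather than a diagnosis. It also selects the "Agg" backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown in the question (only the "% Smokers" column)
index = pd.MultiIndex.from_tuples(
    [(4.0, 2003), (1.0, 2002), (2.0, 2002), (3.0, 2002), (4.0, 2002)],
    names=["Age Group", "Year"],
)
part1_df = pd.DataFrame(
    {"% Smokers": [9.221673, 23.668647, 24.130250, 23.300126, 9.892616]},
    index=index,
)

# Assumption: guard against stray whitespace in the column labels
part1_df.columns = part1_df.columns.str.strip()

# Rows become Year, columns become Age Group
wide = part1_df["% Smokers"].unstack("Age Group")

fig, ax = plt.subplots(figsize=(10, 5))
# Markers keep groups with a single data point visible
wide.plot(ax=ax, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("% Smokers")
```

Printing part1_df.columns.tolist() on the real dataframe shows the exact labels and is a quick way to check whether the column name matches.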
I've got two pandas series, one with a 7-day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7-day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph is showing up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the series look like:
These are the first 15 values of the 7-day rolling mean series; it has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!
The problem is that pandas bar plots are categorical (bars are placed at successive integer positions). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical coordinates as well. This is not possible with pandas, but it is the default behaviour with matplotlib.
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
import pandas as pd
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1)))+14, index=t1)
s1[:6] = np.nan
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2))*15+5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()
The problem might be the two series' indices are of very different scales. You can use ax.twiny to plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()
I've successfully created the code to generate a bunch of charts. However, the x axis labels are slightly offset (to the left) from the x axis tick marks.
Dataframe
stationId date variable value prefix uom
0 site 1 2016-04-07 pH 6.90 NaN pH
1 site 1 2016-07-11 pH 6.80 NaN pH
2 site 1 2017-10-09 pH 6.80 NaN pH
3 site 1 2017-10-09 pH 6.80 NaN pH
4 site 1 2016-06-29 pH 6.79 NaN pH
I can't see anything in the code that would cause this.
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib import rcParams

# plot
for line, group in linedf.groupby(['variable']):
    x = group['date']
    ax1 = group.plot(x='date', figsize=(8.2, 4.5), linestyle='--',
                     linewidth=0.75, rot=0, marker='o', markersize=3)
    # set axis labels and chart title
    plt.title("chartTitle", fontsize=12)
    ax1.set_xlabel('Date', fontsize=10)
    ax1.set_ylabel('GWL (mAHD)', fontsize=10)
    # set text font
    rcParams['font.family'] = 'serif'
    rcParams['font.serif'] = ['Cambria']
    # set dates for x tick labels (defined here but not applied in this snippet)
    years = mdates.YearLocator()    # every year
    months = mdates.MonthLocator()  # every month
    yearsFmt = mdates.DateFormatter('%Y')
    lgd = plt.legend(bbox_to_anchor=(0.0, -0.13, 1.0, -0.03),
                     loc=2, ncol=6, mode="expand", borderaxespad=0.0, shadow=True)
    plt.show()
Without seeing the dataframe you are using (or at least a chunk of it) I have to speculate a bit, but it should suffice to simply adjust the alignment of the tick labels manually using
for tick in ax1.xaxis.get_major_ticks():
    tick.label1.set_horizontalalignment('center')
Without the dataframe I can't test to ensure this works in your case, but from the plot in the question it appears the alignment of the x-tick labels has been set to 'right' and setting them to 'center' will align them how you desire.
Drawn from the centered ticklabels example in the matplotlib docs.
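For anyone who wants to try the alignment fix in isolation, here is a minimal sketch. It is an illustration only: it uses four distinct date/value pairs taken from the sample rows in the question, plots them through pandas with rot=0 as the question does, and then centers the tick labels with plt.setp, which is an alternative spelling of the loop above. The "Agg" backend is selected so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Distinct date/value pairs from the sample rows in the question
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-04-07", "2016-06-29",
                            "2016-07-11", "2017-10-09"]),
    "value": [6.90, 6.79, 6.80, 6.80],
})

ax1 = df.plot(x="date", y="value", rot=0, linestyle="--", marker="o")

# Center every x tick label under its tick mark
plt.setp(ax1.get_xticklabels(), horizontalalignment="center")
```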