I would like to use seaborn barplot() to create a bar chart from a multi-indexed Series. I have grouped my dataset by two variables:
module_7_a_df = module_7_df.groupby(by=['Reported Race "MONRACE"', 'Hispanic Origin "HISPORIG"'])['SENTENCE CAP "SENSPCAP"'].count()
Grouping the dataframe creates a Series. This is what the resulting Series looks like:
When I try to create a barplot, I keep getting an error stating 'isna is not defined for MultiIndex.' The code for the barplot is:
sns.barplot(x=module_7_a_df.values, y=module_7_a_df.index)
This code works for Series created where the data has only been grouped by one column.
Can someone help me understand how to deal with this error?
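Remove all NaN values from the columns you group by before you group them. Note that dropna has to be called on the original DataFrame, not on the grouped Series, and the result has to be assigned back:
module_7_df = module_7_df.dropna(subset=['Reported Race "MONRACE"', 'Hispanic Origin "HISPORIG"'])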
When you have a MultiIndex, you need to call reset_index() and then use hue= to enable the grouping. Using an example dataset:
import pandas as pd
import seaborn as sns
df = sns.load_dataset("tips")
counts = df.groupby(['time','day']).size()
counts
time day
Lunch Thur 61
Fri 7
Sat 0
Sun 0
Dinner Thur 1
Fri 12
Sat 87
Sun 76
dtype: int64
Then with the following:
counts = counts.to_frame('counts').reset_index()
sns.barplot(data = counts, x = "time",y="counts",hue="day")
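If you would rather keep the original x=values / y=index style of call, another option is to collapse the two index levels into a single string label per row, so the index is no longer a MultiIndex. A minimal sketch, assuming a small made-up dataset (the column names race, hisp and sentence are placeholders, not the real columns):

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's dataset
toy = pd.DataFrame({
    "race": ["A", "A", "B", "B", "B"],
    "hisp": ["yes", "no", "no", "no", "yes"],
    "sentence": [1, 2, 3, 4, 5],
})

# Grouping by two columns yields a Series with a two-level MultiIndex
counts = toy.groupby(["race", "hisp"])["sentence"].count()

# Join each index tuple into one string so the index becomes a flat Index
flat = counts.copy()
flat.index = [" / ".join(map(str, key)) for key in counts.index]
print(flat)
```

flat.values and flat.index should then be usable the same way as a Series grouped by a single column.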
I have this dataset:
df = pd.DataFrame()
df['year'] = [2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011]
df['month'] = [1,2,3,4,5,6,1,2,3,4,5,6]
df['after'] = [0,0,0,1,1,1,0,0,0,1,1,1]
df['campaign'] = [0,0,0,0,0,0,1,1,1,1,1,1]
df['sales'] = [10000,11000,12000,10500,10000,9500,7000,8000,5000,6000,6000,7000]
df['date_m'] = pd.to_datetime(df.year.astype(str) + '-' + df.month.astype(str))
And I want to make a line plot grouped by month and campaign, so I have tried this code:
df['sales'].groupby(df['date_m','campaign']).mean().plot.line()
But I get this error message KeyError: ('date_m', 'campaign'). Please, any help will be greatly appreciated.
Plotting is typically dependent upon the shape of the DataFrame.
.groupby creates a long format DataFrame, which is great for seaborn
.pivot_table creates a wide format DataFrame, which easily works with pandas.DataFrame.plot
.groupby the DataFrame
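df['sales'].groupby(...) is incorrect here, because df['sales'] selects one column of the dataframe, so the other columns cannot be referenced by name. The KeyError itself comes from df['date_m','campaign'], which looks for a single column labeled with the tuple ('date_m', 'campaign') rather than selecting two columns.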
.groupby converts the DataFrame into a long format, which is great for plotting with seaborn.lineplot.
Specify the hue parameter to separate by 'campaign'.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# perform groupby and reset the index
dfg = df.groupby(['date_m','campaign'])['sales'].mean().reset_index()
# display(dfg.head())
date_m campaign sales
0 2011-01-01 0 10000
1 2011-01-01 1 7000
2 2011-02-01 0 11000
3 2011-02-01 1 8000
4 2011-03-01 0 12000
# plot with seaborn
sns.lineplot(data=dfg, x='date_m', y='sales', hue='campaign')
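To make the cause of the original KeyError concrete, here is a small sketch on a four-row subset of the data. It shows that the tuple-style lookup fails, and that both grouping by column names and grouping a single column by a list of Series give the same means:

```python
import pandas as pd

df = pd.DataFrame({
    "date_m": pd.to_datetime(["2011-01-01", "2011-01-01", "2011-02-01", "2011-02-01"]),
    "campaign": [0, 1, 0, 1],
    "sales": [10000, 7000, 11000, 8000],
})

# df['date_m', 'campaign'] asks for ONE column whose label is the tuple
try:
    df["date_m", "campaign"]
    raised = False
except KeyError:
    raised = True

# Option 1: group the whole frame by column names, then select 'sales'
by_name = df.groupby(["date_m", "campaign"])["sales"].mean()

# Option 2: keep the Series-first style, passing the grouping Series in a list
by_series = df["sales"].groupby([df["date_m"], df["campaign"]]).mean()

print(raised)
print(by_name)
```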
.pivot_table the DataFrame
.pivot_table shapes the DataFrame correctly for plotting with pandas.DataFrame.plot, and it has an aggregation parameter.
The DataFrame is shaped into a wide format.
# pivot the dataframe into the correct shape for plotting
dfp = df.pivot_table(index='date_m', columns='campaign', values='sales', aggfunc='mean')
# display(dfp.head())
campaign 0 1
date_m
2011-01-01 10000 7000
2011-02-01 11000 8000
2011-03-01 12000 5000
2011-04-01 10500 6000
2011-05-01 10000 6000
# plot the dataframe
dfp.plot()
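The long and wide shapes are two views of the same aggregation. As a rough illustration on a four-row subset of the data, unstacking the grouped result on 'campaign' should give the same values as the pivot table:

```python
import pandas as pd

df = pd.DataFrame({
    "date_m": pd.to_datetime(["2011-01-01", "2011-01-01", "2011-02-01", "2011-02-01"]),
    "campaign": [0, 1, 0, 1],
    "sales": [10000, 7000, 11000, 8000],
})

# Long format: one row per (date_m, campaign) pair
long_means = df.groupby(["date_m", "campaign"])["sales"].mean()

# Move the 'campaign' index level into columns to get the wide format
wide_from_long = long_means.unstack("campaign")

# The same wide shape built directly with pivot_table
wide = df.pivot_table(index="date_m", columns="campaign", values="sales", aggfunc="mean")

print(wide_from_long)
print(wide)
```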
Plotting with matplotlib directly
fig, ax = plt.subplots(figsize=(8, 6))
for v in df.campaign.unique():
    # select the data based on the campaign
    data = df[df.campaign.eq(v)]
    # this is only necessary if there is more than one value per date
    data = data.groupby(['date_m','campaign'])['sales'].mean().reset_index()
    ax.plot('date_m', 'sales', data=data, label=f'{v}')
plt.legend(title='campaign')
plt.show()
Notes
Package versions:
pandas v1.2.4
seaborn v0.11.1
matplotlib v3.3.4
I want to create a line chart using Plotly. I have 3 variables (date, shift, runt). I want to plot runt against date, and I also want to display the shift.
Dataframe:
What I want is to plot a line chart using both date and shift on the x-axis.
This is what I got from Excel. I want to plot the same graph in Python.
But I can't pass two columns for the x-axis. I tried concatenating the date and shift into one column, but the plot shows all the Day values first and then the Night values.
import plotly.express as px
fig = px.line(df, x="Day-Shift", y="RUNT", title='Yo',template="plotly_dark")
fig.show()
Is there any way to turn off this ordering? What I want is shown in the Excel graph above.
I've created a column that combines the date and the shift and specified it on the x-axis. Does this meet the intent of your question?
import pandas as pd
import numpy as np
import io
data = '''
Date Shift RUNT
0 June-16 Day 350
1 June-16 Night 20
2 June-17 Day 350
3 June-17 Night 20
4 June-18 Day 350
5 June-18 Night 20
6 June-19 Day 350
7 June-19 Night 20
8 June-20 Day 350
9 June-20 Night 20
10 June-21 Day 350
11 June-21 Night 20
'''
# raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
df['Day-Shift'] = df['Date'].str.cat(df['Shift'], sep='-')
import plotly.express as px
fig = px.line(df, x="Day-Shift", y="RUNT", title='Yo',template="plotly_dark")
fig.show()
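If the rows in your own data are stored with all Day shifts first and all Night shifts after them, the combined labels will appear in that order too. A minimal sketch of sorting by date and then by shift before building the label, assuming made-up RUNT values and dates written like "June-16" (the helper column _date is my own addition):

```python
import pandas as pd

# Hypothetical rows stored shift by shift rather than day by day
df = pd.DataFrame({
    "Date": ["June-16", "June-17", "June-16", "June-17"],
    "Shift": ["Day", "Day", "Night", "Night"],
    "RUNT": [350, 340, 20, 25],
})

# Parse the month-day text so sorting is chronological (year defaults to 1900)
df["_date"] = pd.to_datetime(df["Date"], format="%B-%d")

# Ordered categorical so that Day sorts before Night explicitly
df["Shift"] = pd.Categorical(df["Shift"], categories=["Day", "Night"], ordered=True)

df = df.sort_values(["_date", "Shift"]).reset_index(drop=True)
df["Day-Shift"] = df["Date"].str.cat(df["Shift"].astype(str), sep="-")
print(df[["Day-Shift", "RUNT"]])
```

Passing the sorted frame to px.line with x="Day-Shift" should then draw the points in day-by-day order.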
I want to search a column for each month of the year. The column is formatted like "01-Jan-2018". I want to find how many times "Jan-2018" appears in the column, that is, count it and plot it on a bar graph. The graph should show the quantities for "Jan-2018", "Feb-2018", etc., so there should be 12 bars, maybe using count or sum. I am pulling the data from a CSV using pandas and Python.
I have tried printing it to the console with some success, but I am confused about the correct way to search on a portion of the date.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import csv
import seaborn as sns
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile1.csv', error_bad_lines=False, encoding="ISO-8859-1", skiprows=6)
cols = data.columns
cols = cols.map(lambda x: x.replace(' ', '_') if isinstance(x, (str)) else x)
data.columns = cols
print(data.groupby('Case_Date').mean().plot(kind='bar'))
I am expecting a bar graph that shows the total quantity for each month, so there should be 12 bars. But I am not sure how to search the column 12 times, each time looking only at the data for one month, ignoring the day and matching only on the month and year.
IIUC, this is what you need.
Let's work with the below dataframe as input dataframe.
date
0 1/31/2018
1 2/28/2018
2 2/28/2018
3 3/31/2018
4 4/30/2018
5 5/31/2018
6 6/30/2018
7 6/30/2018
8 7/31/2018
9 8/31/2018
10 9/30/2018
11 9/30/2018
12 9/30/2018
13 9/30/2018
14 10/31/2018
15 11/30/2018
16 12/31/2018
The lines of code below plot the count for each month as a bar graph. When a column holds datetime objects, many operations become much easier and the contents of the column are much more flexible. With that, you don't need to search for the month name as a string.
df['date'] = pd.to_datetime(df['date'])
df['my']=df.date.dt.strftime('%b-%Y')
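# count the rows per month label; sort=False keeps the months in order of first appearance
counts = df.groupby('my', sort=False).size()
# the bar labels come from the index of counts, so there is one label per bar
ax = counts.plot(kind='bar', rot=90)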
Output
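If some months have no rows at all, the approach above draws fewer than 12 bars. A minimal sketch that forces one bar per month by counting monthly periods and reindexing against the full year. The sample dates are made up and the year 2018 is assumed; the Case_Date column from the question would take their place:

```python
import pandas as pd

# Hypothetical sample in the question's "01-Jan-2018" format
dates = pd.Series(
    pd.to_datetime(["01-Jan-2018", "15-Jan-2018", "03-Mar-2018"], format="%d-%b-%Y")
)

# Reduce each date to its month, dropping the day
months = dates.dt.to_period("M")

# Count per month, then reindex so months without data show up as 0
all_months = pd.period_range("2018-01", "2018-12", freq="M")
counts = months.value_counts().reindex(all_months, fill_value=0)

# Readable labels such as "Jan-2018" for the x-axis
counts.index = counts.index.strftime("%b-%Y")
print(counts)
```

counts.plot(kind='bar') should then give 12 bars in calendar order.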
I create a pandas dataframe with a DatetimeIndex like so:
import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create datetime index and random data column
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=14, freq='D')
data = np.random.randint(1, 10, size=14)
columns = ['A']
df = pd.DataFrame(data, index=index, columns=columns)
# initialize new weekend column, then set all values to 'yes' where the index corresponds to a weekend day
df['weekend'] = 'no'
df.loc[(df.index.weekday == 5) | (df.index.weekday == 6), 'weekend'] = 'yes'
print(df)
Which gives
A weekend
2014-10-13 7 no
2014-10-14 6 no
2014-10-15 7 no
2014-10-16 9 no
2014-10-17 4 no
2014-10-18 6 yes
2014-10-19 4 yes
2014-10-20 7 no
2014-10-21 8 no
2014-10-22 8 no
2014-10-23 1 no
2014-10-24 4 no
2014-10-25 3 yes
2014-10-26 8 yes
I can easily plot the A column with pandas by doing:
df.plot()
plt.show()
which plots a line of the A column but leaves out the weekend column as it does not hold numerical data.
How can I put a "marker" on each spot of the A column where the weekend column has the value yes?
Meanwhile I found out that it is as simple as using boolean indexing in pandas. I do the plot directly with pyplot instead of pandas' own plot wrapper (which is more convenient for me):
plt.plot(df.index, df.A)
plt.plot(df[df.weekend=='yes'].index, df[df.weekend=='yes'].A, 'ro')
Now the red dots mark all weekend days, i.e. the rows where df.weekend == 'yes'.
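A variation on the same idea is to draw the line once and let matplotlib's markevery argument place markers only at the weekend positions. This is a sketch with a fixed date range and dummy values in place of the random data, and it derives the weekend positions from the index instead of the weekend column:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit when working interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fixed two-week range starting on a Monday, with dummy values for column A
index = pd.date_range("2014-10-13", periods=14, freq="D")
df = pd.DataFrame({"A": np.arange(1, 15)}, index=index)

# Integer positions of Saturdays (weekday 5) and Sundays (weekday 6)
weekend_pos = np.flatnonzero(df.index.weekday >= 5).tolist()

fig, ax = plt.subplots()
# One line; red circle markers are drawn only at the listed positions
(line,) = ax.plot(
    df.index, df["A"],
    marker="o", markerfacecolor="red", markeredgecolor="red",
    markevery=weekend_pos,
)
```

Call plt.show() afterwards when running with an interactive backend.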