How can I create multiple pie charts using matplotlib - python

I have a Pandas DataFrame that looks like this:
Year EventCode CityName EventCount
2015 10 Jakarta 12
2015 10 Yogjakarta 15
2015 10 Padang 27
...
2015 13 Jayapura 34
2015 14 Jakarta 24
2015 14 Yogjakarta 15
...
2019 14 Jayapura 12
I want to visualize the top 5 cities with the biggest EventCount as pie charts, grouped by EventCode, for every year.
How can I do that?

This could be achieved by restructuring your data with pivot_table, filtering on top cities using sort_values and the DataFrame.plot.pie method with subplots parameter:
# Pivot your data
df_piv = df.pivot_table(index='EventCode', columns='CityName',
                        values='EventCount', aggfunc='sum', fill_value=0)
# Get top 5 cities by total EventCount
plot_cities = df_piv.sum().sort_values(ascending=False).head(5).index
# Plot
df_piv.reindex(columns=plot_cities).plot.pie(subplots=True,
                                             figsize=(10, 7),
                                             layout=(-1, 3))

Pandas can plot each column into its own subplot automatically. So you want to use CityName as the index, make EventCode the columns, and plot.
(df.sort_values('EventCount', ascending=False)  # sort descending by `EventCount`
   .groupby('EventCode', as_index=False)
   .head(5)                                     # keep the 5 largest counts within each `EventCode`
   .pivot(index='CityName',                     # pivot for plot.pie
          columns='EventCode',
          values='EventCount'
         )
   .plot.pie(subplots=True,                     # plot with some options
             figsize=(10,6),
             layout=(2,3))
)
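Neither answer above splits the charts by year, which the question also asks for. A minimal sketch of one way to do that follows, assuming the column names from the question. The sample rows and the helper name top_cities are hypothetical and only there for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data in the same shape as the question's DataFrame
df = pd.DataFrame({
    "Year":       [2015, 2015, 2015, 2015, 2016, 2016, 2016],
    "EventCode":  [10, 10, 14, 14, 10, 10, 14],
    "CityName":   ["Jakarta", "Padang", "Jakarta", "Jayapura",
                   "Jakarta", "Padang", "Jayapura"],
    "EventCount": [12, 27, 24, 34, 5, 8, 12],
})


def top_cities(frame, n=5):
    """Return the n largest-EventCount rows for each (Year, EventCode) pair."""
    return (frame.sort_values("EventCount", ascending=False)
                 .groupby(["Year", "EventCode"])
                 .head(n))


top = top_cities(df)

# One figure per year, one pie per event code within that year
for year, year_df in top.groupby("Year"):
    codes = sorted(year_df["EventCode"].unique())
    fig, axes = plt.subplots(1, len(codes), figsize=(5 * len(codes), 5),
                             squeeze=False)
    for ax, code in zip(axes[0], codes):
        part = year_df[year_df["EventCode"] == code]
        ax.pie(part["EventCount"], labels=part["CityName"], autopct="%1.0f%%")
        ax.set_title(f"EventCode {code}")
    fig.suptitle(str(year))
```

Call plt.show() afterwards to display the figures. If a city can appear more than once per (Year, EventCode) pair, sum EventCount per city first, for example with a groupby over Year, EventCode and CityName.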

Related

How do you graph multiple items in a dataframe on one graph using pandas and matplotlib.pyplot?

The dataframe I am trying to graph is below. I want to plot each fieldname as the legend item with x=year and y=value
The name of the dataframe is my_gross
fieldName thisType value year
0 diluted_shares_outstanding unit 9.637900e+07 2015
1 diluted_shares_outstanding unit 8.777500e+07 2016
2 diluted_shares_outstanding unit 8.556200e+07 2017
3 diluted_shares_outstanding unit 8.353000e+07 2018
4 diluted_shares_outstanding unit 7.771000e+07 2019
5 diluted_shares_outstanding unit 7.292900e+07 2020
6 eps gross 7.360470e+08 2015
7 eps gross 7.285207e+08 2016
8 eps gross 8.944702e+08 2017
9 eps gross 1.298734e+09 2018
10 eps gross 1.451550e+09 2019
11 eps gross 1.259110e+09 2020
18 sales_revenue gross 5.817000e+09 2015
19 sales_revenue gross 5.762000e+09 2016
20 sales_revenue gross 6.641000e+09 2017
21 sales_revenue gross 8.047000e+09 2018
22 sales_revenue gross 9.351000e+09 2019
23 sales_revenue gross 8.530000e+09 2020
The following code is what I ran to create a graph, but I get undesired results.
for item in my_gross['fieldName']:
    plt.plot(my_gross['year'], my_gross['value'],label=item)
plt.legend()
plt.xticks(rotation=45)
plt.show()
Results
undesired graph
The result I am trying to get is similar to this graph
desired graph
Do I need to create a dictionary for unique values and do some sort of count and then loop through that dictionary instead of the df itself?
The standard pandas and matplotlib approach is to pivot to wide-form and plot:
import pandas as pd
from matplotlib import pyplot as plt
plot_df = df.pivot(index='year',
columns='fieldName',
values='value')
plot_df.plot()
plt.tight_layout()
plt.show()
plot_df:
fieldName diluted_shares_outstanding eps sales_revenue
year
2015 96379000.0 7.360470e+08 5.817000e+09
2016 87775000.0 7.285207e+08 5.762000e+09
2017 85562000.0 8.944702e+08 6.641000e+09
2018 83530000.0 1.298734e+09 8.047000e+09
2019 77710000.0 1.451550e+09 9.351000e+09
2020 72929000.0 1.259110e+09 8.530000e+09
seaborn.lineplot has built-in functionality with hue without needing to reshape:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
sns.lineplot(data=df, x='year', y='value', hue='fieldName')
plt.tight_layout()
plt.show()
There are several ways to do it, depending on the libraries available.
Using just pandas (which uses matplotlib as its plotting backend):
Loop over the unique values in your 'fieldName' column, filter the DataFrame to only include that value, set the index to year (this will be your x-axis), select the column you intend to plot (the 'value' Series), then plot it.
for fieldname in df['fieldName'].unique():
    df[df['fieldName'] == fieldname].set_index('year')['value'].plot(label=fieldname)
plt.legend()
EDIT:
Seems like a relatively simple groupby works (no loops needed):
df.set_index('year').groupby('fieldName')['value'].plot()
plt.legend()
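The loop in the question draws every row of the DataFrame on each pass, which is why all the lines overlap. As a sketch of the smallest change to that loop, iterating over groups instead of rows gives each line only its own data. The DataFrame below is a small hypothetical subset of my_gross.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small hypothetical subset of the question's my_gross DataFrame
my_gross = pd.DataFrame({
    "fieldName": ["eps", "eps", "eps",
                  "sales_revenue", "sales_revenue", "sales_revenue"],
    "value": [7.36e8, 7.29e8, 8.94e8, 5.817e9, 5.762e9, 6.641e9],
    "year": [2015, 2016, 2017, 2015, 2016, 2017],
})

fig, ax = plt.subplots()
# groupby yields (name, sub-frame) pairs, so each line only sees its own rows
for name, group in my_gross.groupby("fieldName"):
    ax.plot(group["year"], group["value"], label=name)
ax.legend()
ax.set_xticks(sorted(my_gross["year"].unique()))
ax.tick_params(axis="x", labelrotation=45)
```

No dictionary of unique values is needed; groupby does that bookkeeping.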

Attempting to create bar charts from a Pandas DataFrame, with one chart per month

Dataframe contains essentially three things.
Date, Count, and Company.
I want to create a program that makes bar charts with count on the y axis and company on the x axis, but there should be multiple charts for different months. E.g. there should be a May chart containing all the companies' counts from that month only.
I've tried using groupby to organise them by company and using .sum() to add up the counts per company for the whole database, but I am not able to also make it specific to a month.
I can group them by company but I want to create individual graphs per company and also by month
Metric Count Date
Apple 97 16/01/2019
Samsung 84 06/01/2019
Linux 100 03/02/2019
Microsoft 61 29/01/2019
Blackberry 17 24/02/2019
LG 98 23/02/2019
Panasonic 20 22/02/2019
Apple 100 19/03/2019
Samsung 43 02/01/2019
Linux 21 06/01/2019
Microsoft 72 05/03/2019
Blackberry 75 24/03/2019
LG 82 19/03/2019
Panasonic 42 25/02/2019
Apple 50 12/01/2019
Samsung 74 15/02/2019
Linux 41 09/03/2019
Microsoft 97 12/03/2019
Blackberry 15 28/03/2019
df = pd.read_csv('values.csv', delimiter = ',')
df.head(1)
df = df.query('Metric == "Company"')
df = df.groupby('Company').sum().Count
print(df)
df = df.plot(kind='bar', align='center', title ="entity",figsize=(15,10),legend=True, fontsize=5)
df.set_ylabel("Count",fontsize=12)
df.set_xlabel("Company",fontsize=12)
Try this
data['Date'] = pd.to_datetime(data['Date'], format='%d/%m/%Y')
data['Month'] = data['Date'].dt.strftime('%b')
# Sum only Count; the datetime Date column cannot be summed
df = data.groupby(['Month', 'Metric'])[['Count']].sum()
df.plot(kind='bar')
It gives the output as below.
One plot for each month could be plotted with the code below
data['Date'] = pd.to_datetime(data['Date'], format='%d/%m/%Y')
data['Month'] = data['Date'].dt.strftime('%b')
# Sum only Count; the datetime Date column cannot be summed
df = data.groupby(['Month', 'Metric'])[['Count']].sum()
months = df.index.levels[0]
for month in months:
    # Rows for this month only, indexed by Metric
    month_df = df.loc[month]
    month_df.plot(kind='bar', align='center', title=str(month), legend=True)
IIUC:
# Assumes df['Date'] is already datetime; newer pandas versions prefer freq='ME'
new_df = (df.groupby([pd.Grouper(key='Date', freq='M'), 'Metric'])
            .Count.sum()
         )
(new_df.reset_index()
       .groupby('Date')
       .plot.bar(x='Metric', y='Count', subplots=True)
)
You could add a 'Month' column and group by month and metric:
import datetime
# New month column (assumes df['Date'] has already been parsed with pd.to_datetime)
month_key = lambda x: datetime.date(x.year, x.month, 1)
df['Month'] = df['Date'].apply(month_key)
# Group by month and metric, summing only the Count column
df = df.groupby(['Month', 'Metric'])[['Count']].sum()
# One plot for each month
months = df.index.levels[0]
for month in months:
    data = df.loc[month]
    data.plot(kind='bar', align='center', title=str(month), legend=True)
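The month-by-month idea in these answers can also be drawn as one row of subplots instead of separate figures. The following is only a sketch: it uses a few rows copied from the question's table, and grouping by a monthly Period (rather than a month-name string) is my own choice so that the panels come out in calendar order.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A few rows copied from the question's table
df = pd.DataFrame({
    "Metric": ["Apple", "Samsung", "Linux", "LG", "Apple", "Microsoft"],
    "Count": [97, 84, 100, 98, 100, 72],
    "Date": ["16/01/2019", "06/01/2019", "03/02/2019",
             "23/02/2019", "19/03/2019", "05/03/2019"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
# A monthly Period sorts chronologically and keeps different years apart
df["Month"] = df["Date"].dt.to_period("M")

# Rows: months, columns: companies, values: summed counts (NaN where absent)
monthly = df.groupby(["Month", "Metric"])["Count"].sum().unstack("Metric")

months = monthly.index
fig, axes = plt.subplots(1, len(months), figsize=(4 * len(months), 4),
                         sharey=True, squeeze=False)
for ax, month in zip(axes[0], months):
    # Drop companies with no entries in this month before plotting
    monthly.loc[month].dropna().plot.bar(ax=ax, title=str(month))
    ax.set_xlabel("Company")
axes[0][0].set_ylabel("Count")
```

Month-name strings such as 'Jan' sort alphabetically, so with the strftime('%b') approach April would be plotted before January.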

Creating labels based on year in column Python

I've made a dataframe that has dates and 2 values that looks like:
Date Year Level Price
2008-01-01 2008 56 11
2008-01-03 2008 10 12
2008-01-05 2008 52 13
2008-02-01 2008 66 14
2008-05-01 2008 20 10
..
2009-01-01 2009 12 11
2009-02-01 2009 70 11
2009-02-05 2009 56 12
..
2018-01-01 2018 56 10
2018-01-11 2018 10 17
..
I'm able to plot these by colors on their year by creating a column on their years with df['Year'] = df['Date'].dt.year but I want to also have labels on each Year in the legend.
My code right now for plotting by year looks like:
colors = ['turquoise','orange','red','mediumblue', 'orchid', 'limegreen']
fig = plt.figure(figsize=(15,10))
ax = fig.add_subplot(111)
ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
plt.title('Title', fontsize=16)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
How can I adjust the labels in the legend to show the year? The way I've done it is just using the Year column but that obviously just gives me results like this:
When you scatter your points, make sure you are accessing a column that exists in your dataframe. In your code, you are trying to access a column called 'Year', which doesn't exist. See below for the problem:
ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
In this line of code, the color argument (c) refers to a column that doesn't exist, and the label argument you pass in has the same problem. To solve this, you need to create a column that contains the year:
1. Extract all the dates
2. Grab just the year from each date
3. Add this to your dataframe
Below is some code to implement these steps:
# Create a list of all the dates as 'YYYY-MM-DD' strings
dates = df['Date'].astype(str).tolist()
# Create a list of all of the years using a list comprehension
years = [int(d.split('-')[0]) for d in dates]
# Add this column to your dataframe
df['Year'] = years
As well I would direct you to this course to learn more about plotting in python!
https://exlskills.com/learn-en/courses/python-data-modeling-intro-for-machine-learning-python_modeling_for_machine_learning/content
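For the legend part of the question, a common pattern is to call scatter once per year so that each call carries its own label. The sketch below assumes the 'Year' column is derived from 'Date' as the question describes. The sample rows are hypothetical and the color list is taken from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows shaped like the question's data
df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-01-01", "2008-01-03", "2009-01-01",
                            "2009-02-01", "2018-01-01"]),
    "Level": [56, 10, 12, 70, 56],
    "Price": [11, 12, 11, 11, 10],
})
df["Year"] = df["Date"].dt.year

colors = ["turquoise", "orange", "red", "mediumblue", "orchid", "limegreen"]
fig, ax = plt.subplots(figsize=(15, 10))
# One scatter call per year, so each year gets its own legend entry
for color, (year, group) in zip(colors, df.groupby("Year")):
    ax.scatter(group["Price"], group["Level"], s=10, color=color,
               label=str(year))
ax.set_xlabel("Price", fontsize=14)
ax.set_ylabel("Level", fontsize=14)
ax.legend(loc="upper left", prop={"size": 12})
```

Note that zip stops at the shorter input, so the color list needs at least as many entries as there are distinct years.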

Groupby and plot bar graph

I want to plot a bar graph of sales per year, with 'year' on the x-axis and the sum of weekly sales per year on the y-axis. While plotting I am getting KeyError: 'year'. I guess it's because 'year' became the index during the groupby.
Below is the sample content from csv file:
Store year Weekly_Sales
1 2014 24924.5
1 2010 46039.49
1 2015 41595.55
1 2010 19403.54
1 2015 21827.9
1 2010 21043.39
1 2014 22136.64
1 2010 26229.21
1 2014 57258.43
1 2010 42960.91
Below is the code I used to group by
storeDetail_df = pd.read_csv('Details.csv')
result_group_year= storeDetail_df.groupby(['year'])
total_by_year = result_group_year['Weekly_Sales'].agg([np.sum])
total_by_year.plot(kind='bar' ,x='year',y='sum',rot=0)
Updated the Code and below is the output:
DataFrame output:
year sum
0 2010 42843534.38
1 2011 45349314.40
2 2012 35445927.76
3 2013 0.00
Below is the graph I am getting:
When reading your CSV file, you need to treat whitespace as the delimiter (delim_whitespace=True) and then reset the index after summing up Weekly_Sales. Below is the working code:
storeDetail_df = pd.read_csv('Details.csv', delim_whitespace=True)
result_group_year= storeDetail_df.groupby(['year'])
total_by_year = result_group_year['Weekly_Sales'].agg([np.sum]).reset_index()
total_by_year.plot(kind='bar' ,x='year',y='sum',rot=0, legend=False)
If the groupby made year your index, you need to turn it back into a column before plotting.
Try
total_by_year = total_by_year.reset_index(drop=False)
You might want to try this
storeDetail_df = pd.read_csv('Details.csv')
result_group_year= storeDetail_df.groupby(['year'])['Weekly_Sales'].sum()
result_group_year = result_group_year.reset_index(drop=False)
result_group_year.plot.bar(x='year', y='Weekly_Sales')
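The KeyError can also be avoided by never letting 'year' become the index in the first place. A minimal self-contained sketch, using five rows copied from the question in place of Details.csv:

```python
import pandas as pd

# Five rows copied from the question's sample, standing in for Details.csv
rows = [
    (1, 2014, 24924.5),
    (1, 2010, 46039.49),
    (1, 2015, 41595.55),
    (1, 2010, 19403.54),
    (1, 2015, 21827.9),
]
store_df = pd.DataFrame(rows, columns=["Store", "year", "Weekly_Sales"])

# as_index=False keeps 'year' as a regular column, so x='year' resolves
total_by_year = store_df.groupby("year", as_index=False)["Weekly_Sales"].sum()

ax = total_by_year.plot(kind="bar", x="year", y="Weekly_Sales",
                        rot=0, legend=False)
ax.set_ylabel("Total weekly sales")
```

With as_index=False the result has plain 'year' and 'Weekly_Sales' columns, so no reset_index call is needed.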

How to plot by category over time

I have two columns, categorical and year, that I am trying to plot. I am trying to take the sum total of each categorical per year to create a multi-class time series plot.
ax = data[data.categorical=="cat1"]["categorical"].plot(label='cat1')
data[data.categorical=="cat2"]["categorical"].plot(ax=ax, label='cat2')
data[data.categorical=="cat3"]["categorical"].plot(ax=ax, label='cat3')
plt.xlabel("Year")
plt.ylabel("Number per category")
sns.despine()
But I am getting an error stating there is no numeric data to plot. I am looking for something similar to the above, perhaps with data[data.categorical=="cat3"]["categorical"].lambda x : (1 for x in data.categorical)
I will use the following lists as examples.
categorical = ["cat1","cat1","cat2","cat3","cat2","cat1","cat3","cat2","cat1","cat3","cat3","cat3","cat2","cat1","cat2","cat3","cat2","cat2","cat3","cat1","cat1","cat1","cat3"]
year = [2013,2014,2013,2015,2014,2014,2013,2014,2014,2015,2015,2013,2014,2014,2013,2014,2015,2015,2015,2013,2014,2015,2013]
My goal is to obtain something similar to the following picture
I'm hesitant to call this a "solution", as it's basically just a summary of basic Pandas functionality, which is explained in the same documentation where you found the time series plot you've placed in your post. But seeing as there's some confusion around groupby and plotting, a demo may help clear things up.
We can use two calls to groupby().
The first groupby() gets a count of category appearances per year, using the count aggregation.
The second groupby() is used to plot the time series for each category.
To start, generate a sample data frame:
import pandas as pd
categorical = ["cat1","cat1","cat2","cat3","cat2","cat1","cat3","cat2",
"cat1","cat3","cat3","cat3","cat2","cat1","cat2","cat3",
"cat2","cat2","cat3","cat1","cat1","cat1","cat3"]
year = [2013,2014,2013,2015,2014,2014,2013,2014,2014,2015,2015,2013,
2014,2014,2013,2014,2015,2015,2015,2013,2014,2015,2013]
df = pd.DataFrame({'categorical':categorical,
'year':year})
categorical year
0 cat1 2013
1 cat1 2014
...
21 cat1 2015
22 cat3 2013
Now get counts per category, per year:
# reset_index() gives a column for counting, after groupby uses year and category
ctdf = (df.reset_index()
.groupby(['year','categorical'], as_index=False)
.count()
# rename isn't strictly necessary here, it's just for readability
.rename(columns={'index':'ct'})
)
year categorical ct
0 2013 cat1 2
1 2013 cat2 2
2 2013 cat3 3
3 2014 cat1 5
4 2014 cat2 3
5 2014 cat3 1
6 2015 cat1 1
7 2015 cat2 2
8 2015 cat3 4
Finally, plot time series for each category, keyed by color:
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
# key gives the group name (i.e. category), data gives the actual values
for key, data in ctdf.groupby('categorical'):
    data.plot(x='year', y='ct', ax=ax, label=key)
Have you tried groupby?
df.groupby(["year","categorical"]).count()
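As an alternative sketch (not taken from either answer), pd.crosstab builds the year-by-category count table in one call, and its wide shape plots as one line per category. The lists are the ones given in the question.

```python
import pandas as pd

categorical = ["cat1", "cat1", "cat2", "cat3", "cat2", "cat1", "cat3", "cat2",
               "cat1", "cat3", "cat3", "cat3", "cat2", "cat1", "cat2", "cat3",
               "cat2", "cat2", "cat3", "cat1", "cat1", "cat1", "cat3"]
year = [2013, 2014, 2013, 2015, 2014, 2014, 2013, 2014, 2014, 2015, 2015, 2013,
        2014, 2014, 2013, 2014, 2015, 2015, 2015, 2013, 2014, 2015, 2013]
df = pd.DataFrame({"categorical": categorical, "year": year})

# Rows: years, columns: categories, values: number of occurrences
counts = pd.crosstab(df["year"], df["categorical"])

# Each column becomes its own line, labelled with the category name
ax = counts.plot()
ax.set_xticks(counts.index)
ax.set_xlabel("Year")
ax.set_ylabel("Number per category")
```

The counts table holds the same numbers as the ct column in the answer above, just in wide form.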
