Multiple plots in 1 figure using for loop - python

I've got data in the below format, and what I'm trying to do is to:
1) loop over each value in Region
2) For each region, plot a time series of the aggregated (across Category) sales number.
Date |Region |Category | Sales
01/01/2016| USA| Furniture|1
01/01/2016| USA| Clothes |0
01/01/2016| Europe| Furniture|2
01/01/2016| Europe| Clothes |0
01/02/2016| USA| Furniture|3
01/02/2016| USA|Clothes|0
01/02/2016| Europe| Furniture|4
01/02/2016| Europe| Clothes|0 ...
The plot should look like the attached (done in excel).
However, if I try to do it in Python using the below, I get multiple charts when I really want all the lines to show up in one figure.
Python code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(r'C:\Users\wusm\Desktop\Book7.csv')
plt.legend()
for index, group in df.groupby(["Region"]):
group.plot(x='Date',y='Sales',title=str(index))
plt.show()
Short of reformatting the data, could anyone advise on how to get the graphs in one figure please?

You can use pivot_table:
df = df.pivot_table(index='Date', columns='Region', values='Sales', aggfunc='sum')
print (df)
Region Europe USA
Date
01/01/2016 2 1
01/02/2016 4 3
or groupby + sum + unstack:
df = df.groupby(['Date', 'Region'])['Sales'].sum().unstack(fill_value=0)
print (df)
Region Europe USA
Date
01/01/2016 2 1
01/02/2016 4 3
and then DataFrame.plot
df.plot()

Related

visualization with pandas in python

I have the following problem
I want to plot following table:
I want to compare the new_cases from germany and france per week how can i visualise this?
I already tried multiple plots but I'm not happy with the results:
for example:
pivot_df['France'].plot(kind='bar')
plt.figure(figsize=(15,5))
pivot_df['France'].plot(kind='bar')
plt.figure(figsize=(15,5))`
But it only shows me france
I think you're trying to get a timeseries plot. For that you'll need to convert year_week to a datetime object. Subsequently you can groupby the country, and unstack and plot the timeseries:
import datetime
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('https://opendata.ecdc.europa.eu/covid19/testing/csv/data.csv')
df = df[df['country'].isin(['France', 'Germany'])]
df = df[df['level'] == 'national'].reset_index()
df['datetime'] = df['year_week'].apply(lambda x: datetime.datetime.strptime(x + '-1', '%G-W%V-%u')) #https://stackoverflow.com/a/54033252/11380795
df.set_index('datetime', inplace=True)
grouped_df = df.groupby('country').resample('1W').sum()['new_cases']
ax = grouped_df.unstack().T.plot(figsize=(10,5))
ax.ticklabel_format(style='plain', axis='y')
result:
Here you go:
sample df:
Country week new_cases
0 FRANCE 9 210
1 GERMANY 9 300
2 FRANCE 10 410
3 GERMANY 10 200
4 FRANCE 11 910
5 GERMANY 9 500
Code:
df.groupby(['week','Country'])['new_cases'].sum().unstack().plot.bar()
plt.ylabel('New cases')
Output:

Show time series and mean for grouped data

I have some data of different products and its corresponding sales with Datetime index. I managed to group them by product using:
grouped_df = data.loc[:, ['ProductID', 'Sales']].groupby('ProductID')
for key, item in grouped_df:
print(grouped_df.get_group(key), "\n\n")
And the output I got is:
ProductID Sales
Datetime
2014-03-31 1 2475.03
2014-09-27 1 10033.06
2015-02-03 1 5329.33
ProductID Sales
Datetime
2014-12-17 2 1960.0
2015-06-17 2 1400.0
2016-08-29 2 230.0
.
.
.
I would like to be able to plot each grouped data on the same graph to show a time-series of sales.
I also want to get the monthly mean sales of each productID and plot them on a bar chart. I have tried re-sampling but it did not work out for me.
How do I go about doing the above?
Help will be greatly appreciated!
you can use seaborn.lineplot here
you can use
import seaborn as sns
ax = sns.lineplot(x=data.index, y="Sales", hue = 'ProductID ', data=data)
you don't even need to group them

Search column for specific phrases and count the amount of times they appear in the column and plot to bar graph

Search column for each month of the year. Column is organized like this "01-Jan-2018". I want to find how many times "Jan-2018" appears in the column. Basically count it and plot it on a bar graph. I want it to show all the quantities for "Jan-2018" , "Feb-2018", etc. Should be 12 bars on the graph. Maybe using count or sum. I am pulling the data from a CSV using pandas and python.
I have tried to printing it out onto the console with some success. But I am getting confused as correct way to search a portion of the date.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import csv
import seaborn as sns
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile1.csv', error_bad_lines=False, encoding="ISO-8859-1", skiprows=6)
cols = data.columns
cols = cols.map(lambda x: x.replace(' ', '_') if isinstance(x, (str)) else x)
data.columns = cols
print(data.groupby('Case_Date').mean().plot(kind='bar'))
I am expecting the a bar graph that will show the total quantity for each month. So there should be 12 bar graphs. But I am not sure how to search the column 12 times and each time only looking for the data of each month. While excluding the date, only searching for the month and year.
IIUC, this is what you need.
Let's work with the below dataframe as input dataframe.
date
0 1/31/2018
1 2/28/2018
2 2/28/2018
3 3/31/2018
4 4/30/2018
5 5/31/2018
6 6/30/2018
7 6/30/2018
8 7/31/2018
9 8/31/2018
10 9/30/2018
11 9/30/2018
12 9/30/2018
13 9/30/2018
14 10/31/2018
15 11/30/2018
16 12/31/2018
The below mentioned lines of code will get the number of count for each month as a bar graph. When you have a column as as datetime object, a lot of function are much easy & the contents of the column are much more flexible. With that, you don't need search string of the name of the month.
df['date'] = pd.to_datetime(df['date'])
df['my']=df.date.dt.strftime('%b-%Y')
ax = df.groupby('my', sort=False)['my'].value_counts().plot(kind='bar')
ax.set_xticklabels(df.my, rotation=90);
Output

How to plot by category over time

I have two columns, categorical and year, that I am trying to plot. I am trying to take the sum total of each categorical per year to create a multi-class time series plot.
ax = data[data.categorical=="cat1"]["categorical"].plot(label='cat1')
data[data.categorical=="cat2"]["categorical"].plot(ax=ax, label='cat3')
data[data.categorical=="cat3"]["categorical"].plot(ax=ax, label='cat3')
plt.xlabel("Year")
plt.ylabel("Number per category")
sns.despine()
But am getting an error stating no numeric data to plot. I am looking for something similar to the above, perhaps with data[data.categorical=="cat3"]["categorical"].lambda x : (1 for x in data.categorical)
I will use the following lists as examples.
categorical = ["cat1","cat1","cat2","cat3","cat2","cat1","cat3","cat2","cat1","cat3","cat3","cat3","cat2","cat1","cat2","cat3","cat2","cat2","cat3","cat1","cat1","cat1","cat3"]
year = [2013,2014,2013,2015,2014,2014,2013,2014,2014,2015,2015,2013,2014,2014,2013,2014,2015,2015,2015,2013,2014,2015,2013]
My goal is to obtain something similar to the following picture
I'm hesitant to call this a "solution", as it's basically just a summary of basic Pandas functionality, which is explained in the same documentation where you found the time series plot you've placed in your post. But seeing as there's some confusion around groupby and plotting, a demo may help clear things up.
We can use two calls to groupby().
The first groupby() gets a count of category appearances per year, using the count aggregation.
The second groupby() is used to plot the time series for each category.
To start, generate a sample data frame:
import pandas as pd
categorical = ["cat1","cat1","cat2","cat3","cat2","cat1","cat3","cat2",
"cat1","cat3","cat3","cat3","cat2","cat1","cat2","cat3",
"cat2","cat2","cat3","cat1","cat1","cat1","cat3"]
year = [2013,2014,2013,2015,2014,2014,2013,2014,2014,2015,2015,2013,
2014,2014,2013,2014,2015,2015,2015,2013,2014,2015,2013]
df = pd.DataFrame({'categorical':categorical,
'year':year})
categorical year
0 cat1 2013
1 cat1 2014
...
21 cat1 2015
22 cat3 2013
Now get counts per category, per year:
# reset_index() gives a column for counting, after groupby uses year and category
ctdf = (df.reset_index()
.groupby(['year','categorical'], as_index=False)
.count()
# rename isn't strictly necessary here, it's just for readability
.rename(columns={'index':'ct'})
)
year categorical ct
0 2013 cat1 2
1 2013 cat2 2
2 2013 cat3 3
3 2014 cat1 5
4 2014 cat2 3
5 2014 cat3 1
6 2015 cat1 1
7 2015 cat2 2
8 2015 cat3 4
Finally, plot time series for each category, keyed by color:
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
# key gives the group name (i.e. category), data gives the actual values
for key, data in ctdf.groupby('categorical'):
data.plot(x='year', y='ct', ax=ax, label=key)
Have you tried groupby?
df.groupby(["year","categorical"]).count()

How to plot a stacked bar chart using pandas python

I have 3 dataframes for yearly data (one for 2014, 2015 and 2016), each having 3 columns named, 'PRACTICE', 'BNF NAME', 'ITEMS'.
BNF NAME refers to drug names and I am picking out 3 Ampicillin, Amoxicillin and Co-Amoxiclav. This column has different strengths/dosages (e.g Co-Amoxiclav 200mg or Co-Amoxiclav 300mg etc etc) that I want to ignore, so I have used str.contains() to select these 3 drugs.
ITEMS is the total number of prescriptions written for each drug.
I want to create a stacked bar chart with the x axis being year (2014, 2014, 2015) and the y axis being total number of prescriptions, and each of the 3 bars to be split up into 3 for each drug name.
I am assuming I need to use df.groupby() and select a partial string maybe, however I am unsure how to combine the yearly data and then how to group the data to create the stacked bar chart.
Any guidance would be much appreciated.
This is the line of code I am using to select the rows for the 3 drug names only.
frame=frame[frame['BNF NAME'].str.contains('Ampicillin' and 'Amoxicillin' and 'Co-Amoxiclav')]
This is what each of the dataframes resembles:
PRACTICE | BNF NAME | ITEMS
Y00327 | Co-Amoxiclav_Tab 250mg/125mg | 23
Y00327 | Co-Amoxiclav_Susp 125mg/31mg/5ml S/F | 10
Y00327 | Co-Amoxiclav_Susp 250mg/62mg/5ml S/F | 6
Y00327 | Co-Amoxiclav_Susp 250mg/62mg/5ml | 1
Y00327 | Co-Amoxiclav_Tab 500mg/125mg | 50
There are likely going to be a few different ways in which you could accomplish this. Here's how I would do it. I'm using a jupyter notebook, so your matplotlib imports may be different.
import pandas as pd
%matplotlib
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
df = pd.DataFrame({'PRACTICE': ['Y00327', 'Y00327', 'Y00327', 'Y00327', 'Y00327'],
'BNF NAME': ['Co-Amoxiclav_Tab 250mg/125mg', 'Co-Amoxiclav_Susp 125mg/31mg/5ml S/F',
'Co-Amoxiclav_Susp 250mg/62mg/5ml S/F', 'Ampicillin 250mg/62mg/5ml',
'Amoxicillin_Tab 500mg/125mg'],
'ITEMS': [23, 10, 6, 1, 50]})
Out[52]:
BNF NAME ITEMS PRACTICE
0 Co-Amoxiclav_Tab 250mg/125mg 23 Y00327
1 Co-Amoxiclav_Susp 125mg/31mg/5ml S/F 10 Y00327
2 Co-Amoxiclav_Susp 250mg/62mg/5ml S/F 6 Y00327
3 Ampicillin 250mg/62mg/5ml 1 Y00327
4 Amoxicillin_Tab 500mg/125mg 50 Y00327
To simulate your three dataframes:
df1 = df.copy()
df2 = df.copy()
df3 = df.copy()
Set a column indicating what year the dataframe represents.
df1['YEAR'] = 2014
df2['YEAR'] = 2015
df3['YEAR'] = 2016
Combining the three dataframes:
combined_df = pd.concat([df1, df2, df3], ignore_index=True)
To set what drug each row represents:
combined_df['parsed_drug_name'] = "" # creates a blank column
amp_bool = combined_df['BNF NAME'].str.contains('Ampicillin', case=False)
combined_df.loc[amp_bool, 'parsed_drug_name'] = 'Ampicillin' # sets the row to amplicillin, if BNF NAME contains 'ampicillin.'
amox_bool = combined_df['BNF NAME'].str.contains('Amoxicillin', case=False)
combined_df.loc[amox_bool, 'parsed_drug_name'] = 'Amoxicillin'
co_amox_bool = combined_df['BNF NAME'].str.contains('Co-Amoxiclav', case=False)
combined_df.loc[co_amox_bool, 'parsed_drug_name'] = 'Co-Amoxiclav'
Finally, perform a pivot on the data, and plot the results:
combined_df.pivot_table(index='YEAR', columns='parsed_drug_name', values='ITEMS', aggfunc='sum').plot.bar(rot=0, stacked=True)

Categories

Resources