Plotting a bar chart comparing years in pandas - python

I have the following dataframe
Date_x BNF Chapter_x VTM_NM Occurance_x Date_y BNF Chapter_y Occurance_y
0 2016-12-01 1 Not Specified 2994 2015-12-01 1 3212
1 2016-12-01 1 Mesalazine 2543 2015-12-01 1 2397
2 2016-12-01 1 Omeprazole 2307 2015-12-01 1 2370
3 2016-12-01 1 Esomeprazole 1535 2015-12-01 1 1516
4 2016-12-01 1 Lansoprazole 1511 2015-12-01 1 1547
I have plotted a bar chart with two bars per drug, one representing 2015 and the other 2016, using this code:
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
width = 0.4
df.Occurance_x.plot(kind='bar', color='red', ax=ax, width=width, position=1)
df.Occurance_y.plot(kind='bar', color='blue', ax=ax, width=width, position=0)
ax.set_ylabel('Occurance')
plt.legend(['Date_x', 'Date_y'], loc='upper right')
ax.set_title('BNF Chapter 1 Top 5 drugs prescribed')
plt.show()
However, the x-axis shows the index (0 1 2 3 4); I want it to show the drug names.
How would I go about doing this?

You can use this as a starting point: pass the drug-name column as x. Note that in the question's data Occurance_x holds the 2016 counts and Occurance_y the 2015 counts.
import pandas as pd
df = pd.DataFrame({"date_x": [2016]*5,
                   "Occurance_x": [2994, 2543, 2307, 1535, 1511],
                   "VTM_NM": ["Not Specified", "Mesalazine", "Omeprazole",
                              "Esomeprazole", "Lansoprazole"],
                   "date_y": [2015]*5,
                   "Occurance_y": [3212, 2397, 2370, 1516, 1547]})
# plot the 2015 column first so that the legend order is chronological
ax = df[["VTM_NM", "Occurance_y", "Occurance_x"]].plot(x='VTM_NM',
                                                       kind='bar',
                                                       color=["g", "b"],
                                                       rot=45)
ax.legend(["2015", "2016"]);
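If you would rather keep the two separate plot calls from the question, one option is to relabel the ticks yourself. This is only a sketch: it rebuilds a small frame with the question's column names and values, and the explicit x-limits and rotation are my own choices rather than anything from the original code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the question: Occurance_x is 2016, Occurance_y is 2015
df = pd.DataFrame({
    "VTM_NM": ["Not Specified", "Mesalazine", "Omeprazole",
               "Esomeprazole", "Lansoprazole"],
    "Occurance_x": [2994, 2543, 2307, 1535, 1511],
    "Occurance_y": [3212, 2397, 2370, 1516, 1547],
})

fig, ax = plt.subplots()
width = 0.4
df.Occurance_x.plot(kind="bar", color="red", ax=ax, width=width, position=1)
df.Occurance_y.plot(kind="bar", color="blue", ax=ax, width=width, position=0)

# The bars sit at integer positions 0..4; put one tick on each and
# replace the index labels with the drug names
ax.set_xticks(range(len(df)))
ax.set_xticklabels(df["VTM_NM"], rotation=45, ha="right")
ax.set_xlim(-0.5, len(df) - 0.5)  # keep both bars of every pair in view

ax.set_ylabel("Occurance")
ax.legend(["2016", "2015"], loc="upper right")
# call plt.show() in an interactive session
```

The single `df.plot(x='VTM_NM', ...)` call is shorter; this variant is mainly useful when the two series need different styling.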

Related

Able to plot 2 graphs in a row but not 3. get ValueError: values must be a 1D array

This is the dataframe:
Data for last 8 months
date close volume change% obv compare close_trend
6 2022-06-30 00:00:00+05:30 18760.40 358433 5.52 1358338 True 18482.242046
7 2022-07-31 00:00:00+05:30 20015.10 252637 6.27 1610975 True 18905.447351
8 2022-08-31 00:00:00+05:30 18739.75 317107 -6.81 1293868 False 19328.826505
9 2022-09-30 00:00:00+05:30 19139.15 561137 2.09 1855005 True 19753.246889
10 2022-10-31 00:00:00+05:30 19246.95 243999 0.56 2099004 True 20179.207712
11 2022-11-30 00:00:00+05:30 20237.80 311138 4.90 2410142 True 20606.824373
12 2022-12-31 00:00:00+05:30 21367.20 386070 5.29 2796212 True 21035.629608
13 2023-01-31 00:00:00+05:30 22250.00 101527 3.97 2897739 True 21464.925515
I am able to plot 2 graphs in a row using matplotlib in jupyter notebook.
fig = plt.figure(figsize=(7,2))
plt.subplot(1,2,1)
plt.plot(df['date'], df['close'], color='red', figure=fig)
plt.subplot(1,2,2)
plt.plot(df[['close','close_trend']],figure=fig)
plt.tight_layout()
plt.show()
I get:
But when I try to plot 3 graphs like this, I get ValueError: values must be a 1D array
fig = plt.figure(figsize=(7,2))
plt.subplot(1,3,1)
plt.plot(df['date'], df['close'], color='red', figure=fig)
plt.subplot(1,3,2)
plt.plot(df[['close','close_trend']],figure=fig)
plt.subplot(1,3,3)
plt.plot(df.index, df['obv'],color='blue', figure=fig)
plt.tight_layout()
plt.show()
How do I get 3 plots in a row?
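The full traceback is not shown, so I can't say for sure which call raises the ValueError. As a sketch only, one layout that tends to avoid this kind of problem is to create all three axes with `plt.subplots` and hand matplotlib plain NumPy arrays instead of pandas objects. The frame below reuses a few rows from the question; the timezone is dropped for brevity, which is my simplification.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the question (dates kept timezone-naive here)
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-06-30", "2022-07-31",
                                "2022-08-31", "2022-09-30"]),
        "close": [18760.40, 20015.10, 18739.75, 19139.15],
        "obv": [1358338, 1610975, 1293868, 1855005],
        "close_trend": [18482.242046, 18905.447351,
                        19328.826505, 19753.246889],
    },
    index=[6, 7, 8, 9],
)

# One row, three columns; axes is an array of three Axes objects
fig, axes = plt.subplots(1, 3, figsize=(10, 2.5))

axes[0].plot(df["date"].to_numpy(), df["close"].to_numpy(), color="red")
# A 2-D array draws one line per column
axes[1].plot(df.index.to_numpy(), df[["close", "close_trend"]].to_numpy())
axes[2].plot(df.index.to_numpy(), df["obv"].to_numpy(), color="blue")

fig.tight_layout()
# call plt.show() in an interactive session
```

Plotting on explicit `Axes` objects also removes the need for the `figure=fig` keyword used in the question.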

Show Count and percentage labels for grouped bar chart python

I would like to add count and percentage labels to a grouped bar chart, but I haven't been able to figure it out.
I've seen examples for count or percentage for single bars, but not for grouped bars.
The data looks something like this (not the real numbers):
age_group Mis surv unk death total surv_pct death_pct
0 0-9 1 2 0 3 6 100.0 0.0
1 10-19 2 1 0 1 4 99.9 0.0
2 20-29 0 3 0 1 4 99.9 0.0
3 30-39 0 7 1 2 10 100.0 0.0
4 40-49 0 5 0 1 6 99.7 0.3
5 50-59 0 6 0 4 10 99.3 0.3
6 60-69 0 7 1 4 12 98.0 2.0
7 70-79 1 8 2 5 16 92.0 8.0
8 80+ 0 10 0 7 17 81.0 19.0
And the chart looks something like this:
I created the chart with this code:
ax = df.plot(y=['death', 'surv'],
kind='barh',
figsize=(20,9),
rot=0,
title= '\n\n surv and deaths by age group')
ax.legend(['Deaths', 'Survivals']);
ax.set_xlabel('\nCount');
ax.set_ylabel('Age Group\n');
How could I add count and percentage labels to the grouped bars? I would like it to look something like this chart
Since nobody else has suggested anything, here is one way to approach it with your dataframe structure.
from matplotlib import pyplot as plt
import pandas as pd
# read the whitespace-separated table from the question
df = pd.read_csv("test.txt", sep=r"\s+")
cat = ['death', 'surv']
ax = df.plot(y=cat,
             kind='barh',
             figsize=(20, 9),
             rot=0,
             title='\n\n surv and deaths by age group')
# making space for the annotation
xmin, xmax = ax.get_xlim()
ax.set_xlim(xmin, 1.05 * xmax)
# connecting bar series with df columns
for cont, col in zip(ax.containers, cat):
    # connecting each bar of the series with its absolute and relative values
    for rect, vals, perc in zip(cont.patches, df[col], df[col+"_pct"]):
        # annotating each bar just to the right of its end
        ax.annotate(f"{vals} ({perc:.1f}%)",
                    (rect.get_width(), rect.get_y() + rect.get_height() / 2.),
                    ha='left', va='center', fontsize=10, color='black',
                    xytext=(3, 0), textcoords='offset points')
ax.set_yticklabels(df.age_group)
ax.set_xlabel('\nCount')
ax.set_ylabel('Age Group\n')
ax.legend(['Deaths', 'Survivals'], loc="lower right")
plt.show()
Sample output:
If the percentages per category add up, one could also calculate the percentages on the fly. This would then not necessitate that the percentage columns have exactly the same name structure. Another problem is that the font size of the annotation, the scaling to make space for labeling the largest bar, and the distance between bar and annotation are not interactive and may need fine-tuning.
However, I am not fond of this mixing of pandas and matplotlib plotting functions. I had cases where the axis definition by pandas interfered with matplotlib, and datetime objects ... well, let's not talk about that.
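For the on-the-fly variant mentioned above, here is a rough sketch. It assumes matplotlib 3.4 or newer, where `Axes.bar_label` is available, and it assumes that the percentage you want is each category's share within its own age group. The numbers are made up and only follow the layout of the question's table.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts in the same layout as the question's table
df = pd.DataFrame({
    "age_group": ["0-9", "10-19", "20-29"],
    "surv": [2, 1, 3],
    "death": [3, 1, 1],
})
cat = ["death", "surv"]

# Share of each category within its age group, computed on the fly
pct = df[cat].div(df[cat].sum(axis=1), axis=0) * 100

ax = df.plot(x="age_group", y=cat, kind="barh", figsize=(8, 4))

# pandas creates one bar container per plotted column, in the order of cat
for container, col in zip(ax.containers, cat):
    labels = [f"{n} ({p:.1f}%)" for n, p in zip(df[col], pct[col])]
    ax.bar_label(container, labels=labels, padding=3)

ax.margins(x=0.15)  # extra room so the longest bar's label is not clipped
ax.legend(["Deaths", "Survivals"], loc="lower right")
# call plt.show() in an interactive session
```

With `bar_label` the position arithmetic from the `annotate` version goes away, and the percentage columns no longer need a particular naming scheme.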

Grid line with date and time data in x axis in matplotlib

I created a line graph of the location of a tropical cyclone over time, on a 6-hourly basis. The plot has all the labels I need, but the major and minor gridlines are not right. The major gridlines appear at 12-hour intervals instead of 6-hour intervals. My goal is for the first major gridline to start at 0 on the x-axis, not a few mm east of 0. I also cannot place a minor gridline midway between two major gridlines to represent the 6-hourly data, or alternatively create major gridlines at 6-hour intervals.
The image below shows the result of my code.
And this is my code.
import numpy as np  # needed for np.arange below
import matplotlib.pyplot as plt
from matplotlib.ticker import (AutoMinorLocator, MultipleLocator)
from matplotlib.dates import HourLocator, MonthLocator, YearLocator
fig, ax = plt.subplots()
ax.plot(df.time,df.Distance, color='r',marker = 'o', linestyle ='--')
ax.set_xlabel('Date and Time')
ax.set_ylabel('Distance (km)')
ax.set_title('The expected distance of Tropical cyclone')
plt.grid(True)
ax.minorticks_on()
plt.grid(which='major',axis ='y', linewidth='1', color='black')
plt.grid(which='minor', linestyle=':', linewidth='0.5', color='black')
ax.tick_params(which='both', # Options for both major and minor ticks
top='off', # turn off top ticks
left='off', # turn off left ticks
right='off', # turn off right ticks
bottom='off') # turn off bottom ticks
hloc = HourLocator(1)
ax.xaxis.set_minor_locator(hloc)
ax.yaxis.set_minor_locator(MultipleLocator(50))
m = np.arange(0,round(max(df.Distance+200),100),100)
ax.set_yticks(m)
plt.xticks(rotation=45)
plt.ylim(0,1500)
plt.show()
My data for the x-axis is:
0 2019-09-24 04:00:00
1 2019-09-24 10:00:00
2 2019-09-24 16:00:00
3 2019-09-24 22:00:00
4 2019-09-25 04:00:00
5 2019-09-25 10:00:00
6 2019-09-25 16:00:00
7 2019-09-25 22:00:00
8 2019-09-26 04:00:00
9 2019-09-26 10:00:00
10 2019-09-26 16:00:00
11 2019-09-26 22:00:00
12 2019-09-27 04:00:00
13 2019-09-27 10:00:00
14 2019-09-27 16:00:00
15 2019-09-27 22:00:00
16 2019-09-28 04:00:00
and for the y-axis:
0 1385
1 1315
2 1245
3 1175
4 1105
5 1050
6 995
7 935
8 880
9 835
10 790
11 745
12 485
13 435
14 390
15 350
16 315
Revised to 6-hour intervals. I wasn't sure of the intent of the grid, so I posted the details and no grid.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import (AutoMinorLocator, MultipleLocator)
from matplotlib.dates import HourLocator, MonthLocator, YearLocator
# make sure the time column holds datetimes, not strings
df['time'] = pd.to_datetime(df['time'])
fig, ax = plt.subplots(figsize=(20,12))
ax.plot(df.time, df.Distance, color='r', marker='o', linestyle='--')
ax.set_xlabel('Date and Time')
ax.set_ylabel('Distance (km)')
ax.set_title('The expected distance of Tropical cyclone')
plt.grid(True)
ax.minorticks_on()
plt.grid(which='major', axis='y', linewidth=1, color='black')
plt.grid(which='minor', linestyle=':', linewidth=0.5, color='black')
# current matplotlib expects booleans here; the string 'off' is truthy
ax.tick_params(which='both', # Options for both major and minor ticks
               top=False, # turn off top ticks
               left=False, # turn off left ticks
               right=False, # turn off right ticks
               bottom=False) # turn off bottom ticks
# hloc = HourLocator(1)
# ax.xaxis.set_minor_locator(hloc)
# ax.yaxis.set_minor_locator(MultipleLocator(50))
ax.xaxis.set_minor_locator(HourLocator(byhour=None, interval=3, tz=None))
ax.xaxis.set_major_locator(HourLocator(byhour=None, interval=6, tz=None))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d %H"))
m = np.arange(0,round(max(df.Distance+200),100),100)
ax.set_yticks(m)
plt.xticks(rotation=45)
plt.ylim(0,1500)
plt.xlim(df['time'].min(), df['time'].max())
plt.show()
To view the minor gridlines, you should run
plt.minorticks_on()
To limit the x-axis of the chart, do:
plt.xlim(df.time.min(), df.time.max())
The result is below. As you can see, there's a major x-gridline every 6 hours and a minor one every hour.
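If the aim is for the major gridlines to fall exactly on the 6-hourly observations (04:00, 10:00, 16:00, 22:00 in the question's data) with a minor gridline halfway between them, one possibility is to list the hours explicitly through the `byhour` argument of `HourLocator`. This is a sketch under that assumption; the data are rebuilt from the values listed in the question.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# 17 timestamps, 6 hours apart, starting at the first one in the question
time = pd.date_range("2019-09-24 04:00", periods=17,
                     freq=pd.Timedelta(hours=6))
distance = [1385, 1315, 1245, 1175, 1105, 1050, 995, 935, 880,
            835, 790, 745, 485, 435, 390, 350, 315]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(time, distance, color="r", marker="o", linestyle="--")

# Major ticks on the observation hours, minor ticks midway between them
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[4, 10, 16, 22]))
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=[1, 7, 13, 19]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d %H:%M"))

# Start the axis on the first observation so the first gridline sits at the origin
ax.set_xlim(time.min(), time.max())
ax.set_ylim(0, 1500)

ax.grid(which="major", color="black", linewidth=1)
ax.grid(which="minor", color="black", linewidth=0.5, linestyle=":")
ax.tick_params(axis="x", labelrotation=45)
# call plt.show() in an interactive session
```

By contrast, `HourLocator(interval=6)` with the default `byhour` places ticks at 00:00, 06:00, 12:00 and 18:00, which does not line up with observations taken at 04:00, 10:00, 16:00 and 22:00.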

Matplotlib x-axis limited range

I've been trying to plot some data, but the x-axis is limited to two years. My question is simple: can someone explain why the x-axis only covers 2015Q1 to 2017Q1 when the available data runs from 2015Q1 to 2020Q1? Is something missing or incorrect in my code?
dd2
qtr median count
0 2015Q1 1290000.0 27
1 2015Q2 1330000.0 43
2 2015Q3 1570000.0 21
3 2015Q4 1371000.0 20
4 2016Q1 1386500.0 20
5 2016Q2 1767500.0 22
6 2016Q3 1427500.0 32
7 2016Q4 1501000.0 31
8 2017Q1 1700000.0 29
9 2017Q2 1630000.0 15
10 2017Q3 1687500.0 24
11 2017Q4 1450000.0 15
12 2018Q1 1505000.0 13
13 2018Q2 1494000.0 14
14 2018Q3 1415000.0 21
15 2018Q4 1150000.0 15
16 2019Q1 1228000.0 15
17 2019Q2 1352500.0 12
18 2019Q3 1237500.0 12
19 2019Q4 1455000.0 26
20 2020Q1 1468000.0 9
code
x = dd2['qtr']
y1 = dd2['count']
y2 = dd2['median']
fig, ax = plt.subplots(figsize=(40,10))
ax = plt.subplot(111)
ax2 = ax.twinx()
y1_plot = y1.plot(ax=ax2, color='green', legend=True, marker='*', label="median")
y2_plot = y2.plot(ax=ax, color='red', legend=True, linestyle='--', marker='x', label="count")
plt.title('Price trend analysis')
ax.set_xticklabels(x, rotation='vertical',color='k', size=20)
ax.set_xlabel('year')
ax.set_ylabel('sold price')
ax2.set_ylabel('number of sales')
y1_patch = mpatches.Patch(color='red', label='median sold price')
y2_patch = mpatches.Patch(color='green', label='count')
plt.legend(handles=[y2_patch,y1_patch],loc='upper right')
plt.savefig('chart.png', dpi=300,bbox_inches ='tight')
plt.show()
Use matplotlib.ticker (imported as mtick) to place a tick at every data point, so that every x-axis label is shown:
import matplotlib.ticker as mtick
ax.xaxis.set_major_locator(mtick.IndexLocator(base=1, offset=0))
Instead of going through Pandas' Series plotting methods, I'd use pyplot to plot your x and y data together, like this:
# everything is the same up to 'ax2 = ax.twinx()'
# plot on your axes, save a reference to the line
# y2 is the median price (left axis), y1 is the count (right axis)
line1 = ax.plot(x, y2, color="red", label="median sold price", marker='x', linestyle='--')
line2 = ax2.plot(x, y1, color="green", label="count", marker='*')
# no need for messing with patches
lines = line1 + line2
labels = [l.get_label() for l in lines]
ax.legend(lines, labels, loc='upper right')
# this is the same as before again
plt.title('Price trend analysis')
ax.xaxis.set_tick_params(rotation=90, color='k', size=20)
ax.set_xlabel('year')
ax.set_ylabel('sold price')
ax2.set_ylabel('number of sales')
plt.savefig('chart.png', dpi=300,bbox_inches ='tight')
plt.show()
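A likely explanation for the truncated labels in the question, as far as I can tell: `Series.plot` draws against the integer index 0 to 20, the automatic locator only places a handful of ticks, and `set_xticklabels` then hands out the first few entries of `qtr` to those ticks in order. A sketch of one fix is to pin a tick to every position before setting the labels. The frame below rebuilds `qtr` and `count` from the question; the figure size is arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Quarter labels 2015Q1 .. 2020Q1 and the counts from the question
qtr = [f"{y}Q{q}" for y in range(2015, 2020) for q in range(1, 5)] + ["2020Q1"]
count = [27, 43, 21, 20, 20, 22, 32, 31, 29, 15, 24,
         15, 13, 14, 21, 15, 15, 12, 12, 26, 9]
dd2 = pd.DataFrame({"qtr": qtr, "count": count})

fig, ax = plt.subplots(figsize=(10, 3))
dd2["count"].plot(ax=ax, color="green", marker="*")  # x positions are 0..20

# One tick per row first, then one label per tick
ax.set_xticks(range(len(dd2)))
ax.set_xticklabels(dd2["qtr"], rotation="vertical")
# call plt.show() in an interactive session
```

The `IndexLocator` suggestion has the same effect on tick placement; setting the ticks explicitly just makes the one-to-one pairing with the labels visible in the code.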

Matplotlib bar chart from dataframe ValueError: incompatible sizes: argument 'height' must be length

I'm trying to make a simple bar chart from a data frame. My data frame looks like this:
start_date AvgPrice
0 2018-03-17 3146.278673
1 2018-12-08 3146.625048
2 2018-11-10 3148.762809
3 2018-11-17 3151.926036
4 2018-11-03 3153.965413
5 2018-02-03 3155.831255
6 2018-11-24 3161.057180
7 2018-01-27 3162.143680
8 2018-03-10 3162.239096
9 2018-01-20 3166.450869
.. ... ...
337 2018-07-13 8786.797679
338 2018-07-20 8969.859386
My code basically looks like this:
x = df.start_date
x = x.to_string() #convert datetime objects to strings
y = df.AvgPrice
# Plot data
fig, ax = plt.subplots()
ax.bar(x, y, width=30)
fig.savefig('week price bar chart')
plt.close(fig)
However I get the following error:
File "PriceHistogram.py", line 68, in plot
ax.bar(x, y, width=30)
File "/home/rune/env3/lib/python3.6/site-packages/matplotlib/__init__.py", line 1898, in inner
return func(ax, *args, **kwargs)
File "/home/rune/env3/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 2079, in bar
"must be length %d or scalar" % nbars)
ValueError: incompatible sizes: argument 'height' must be length 6101 or scalar
Not really sure if this is what you want. It was modified from the example in matplotlib's documentation.
import io
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
s = """start_date AvgPrice
2018-03-17 3146.278673
2018-12-08 3146.625048
2018-11-10 3148.762809
2018-11-17 3151.926036
2018-11-03 3153.965413
2018-02-03 3155.831255
2018-11-24 3161.057180
2018-01-27 3162.143680
2018-03-10 3162.239096
2018-01-20 3166.450869"""
df = pd.read_table(io.StringIO(s), sep=' ', header=0)
fig, ax = plt.subplots()
width = 0.8
b = ax.bar(np.arange(len(df.AvgPrice)), df.AvgPrice, width=width)
ax.set_xticks(np.arange(len(df.AvgPrice)))  # bars are centred on these positions by default
ax.set_xticklabels(df.start_date, rotation=90)
plt.show()
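As for the error itself, a probable cause is the `x.to_string()` call in the question: `Series.to_string()` returns one single string containing the whole printed Series, not one string per row, and a string of 6101 characters would match the length in the error message. A sketch of an element-wise conversion, assuming `start_date` is a datetime column (three rows copied from the question):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "start_date": pd.to_datetime(["2018-03-17", "2018-12-08", "2018-11-10"]),
    "AvgPrice": [3146.278673, 3146.625048, 3148.762809],
})

as_one_string = df.start_date.to_string()       # a single str for the whole column
labels = df.start_date.dt.strftime("%Y-%m-%d")  # one string per row

fig, ax = plt.subplots()
ax.bar(labels, df.AvgPrice)  # string x values are treated as categories
ax.tick_params(axis="x", labelrotation=90)
# call plt.show() in an interactive session
```

With string categories the bars are one unit apart, so `width=30` from the question would make them overlap; the default width is used here.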
