Bar chart using dictionaries in Python using matplotlib

I have two Dictionaries:
A = {2018: 23, 2019: 30}
B = {2018: 26, 2019:35}
Now I want to plot the trend for 2018/2019 for A and B. However, when I plot the bar graph, I get the following result: the years expand to fill the space and B hides A completely. Please suggest how to plot the graph.
The original data have average marks for maths, science, and total, which I want to plot on the same bar graph for two years to show the trend.

You can align the bars of a bar graph by their left or right edge (pass a negative width to align using the right edge). In this way you can get side-by-side bars. Alternatively, you can stack the bars.
Here is the code with the output:
import matplotlib.pyplot as plt
A = {2018: 23, 2019:30}
B = {2018: 26, 2019:35}
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
ax1.bar(A.keys(), A.values(), width=0.2, align='edge', label='A')
ax1.bar(B.keys(), B.values(), width=-0.2, align='edge', label='B')
ax1.set_xticks([2018, 2019])
ax1.set_xlabel('YEAR')
ax1.legend()
ax2.bar(A.keys(), A.values(), width=0.4, align='center', label='A')
ax2.bar(B.keys(), B.values(), bottom=[A[i] for i in B.keys()], width=0.4, align='center', label='B')
ax2.set_xticks([2018, 2019])
ax2.set_xlabel('YEAR')
ax2.legend()
fig.show()
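With more than two series, hand-tuning width and align for every call gets fiddly. Below is a minimal sketch of the same side-by-side idea, assuming all dictionaries share the same keys. The names grouped_bar_offsets and plot_grouped are hypothetical helpers, not matplotlib functions.

```python
def grouped_bar_offsets(n_series, group_width=0.8):
    """Return (bar_width, offsets) so n_series bars sit side by side,
    centered as a group on each tick."""
    bar_width = group_width / n_series
    offsets = [(i - (n_series - 1) / 2) * bar_width for i in range(n_series)]
    return bar_width, offsets


def plot_grouped(series, ax):
    """Draw one group of bars per key.

    `series` maps a legend label to a {x: height} dict; `ax` is a
    matplotlib Axes. Assumes every dict has the same keys.
    """
    bar_width, offsets = grouped_bar_offsets(len(series))
    xs = sorted(next(iter(series.values())))
    for (label, data), offset in zip(series.items(), offsets):
        ax.bar([x + offset for x in xs], [data[x] for x in xs],
               width=bar_width, align='center', label=label)
    ax.set_xticks(xs)
    ax.legend()
    return ax
```

Calling plot_grouped({'A': A, 'B': B}, ax) on an Axes from plt.subplots() gives side-by-side bars centered on each year.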
EDIT: If you start to deal with more data it makes sense to use a package that can handle data more easily. Pandas is a great package that will do this for you.
Here is an example with 4 sets of time-series data:
import matplotlib.pyplot as plt
import pandas as pd
A = {2018: 23, 2019:30}
B = {2018: 26, 2019:35}
C = {2018: 30, 2019:40}
D = {2018: 20, 2019:50}
df = pd.DataFrame([A,B,C,D], index=['A','B','C','D']).transpose()
fig, ax= plt.subplots(1,1, figsize=(6,5))
df.plot.bar(ax=ax)
ax.set_xlabel('YEAR')
fig.tight_layout()
fig.show()
The output is this figure:

Related

Same scale for twinx() combo plot with seaborn

Let's use the classic example of weekly precipitation:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from random import randint
data = {
'Week': [i for i in range(1,9)],
'Weekly Precipitation': [randint(1,10) for i in range(1,9)]
}
df = pd.DataFrame(data)
Let's also add a column with the cumulative precipitation:
df['Cumulative'] = df['Weekly Precipitation'].expanding(min_periods=2).sum()
Now, let's say I want a single chart with a barplot for the weekly precipitation, and a lineplot with the cumulative precipitation. So I do this:
fig, ax1 = plt.subplots(figsize=(10,5))
sns.barplot(x='Week', y='Weekly Precipitation', data=df, ax=ax1)
ax2 = ax1.twinx()
sns.lineplot(x='Week', y='Cumulative', data=df, ax=ax2)
Which yields this plot:
And you can see the problem: while both series are commensurate, the two y axes use different scales, which distorts the visualization, as the line should always be higher than the bars.
So, instead of twin axes, I'm trying to put both plots on the same axis:
fig, ax1 = plt.subplots(figsize=(10,5))
ax1.set_facecolor('white')
sns.barplot(x='Week', y='Weekly Precipitation', data=df, ax=ax1)
sns.lineplot(x='Week', y='Cumulative', data=df, ax=ax1)
ax1.set_ylabel('Precipitation')
Now, of course, the scale is right (although I have to make do with a single y label), but... the second plot is shifted to the right by one tick!
How does that even make sense?!
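The shift most likely comes from sns.barplot treating Week as categorical: the bars are drawn at positions 0, 1, 2, ... and only labelled 1 to 8, whereas sns.lineplot places its points at the numeric values 1 to 8. Below is a minimal sketch of two workarounds, using hypothetical helper names. categorical_positions converts x values to the slots a categorical plot uses (assuming numeric categories are laid out in sorted order), and shared_ylim computes common limits for anyone who prefers to keep the twin axes.

```python
import math


def categorical_positions(values):
    """Map each x value to the 0-based slot a categorical bar plot puts it in.

    Assumes categories are laid out in sorted order of their unique values.
    """
    slots = {v: i for i, v in enumerate(sorted(set(values)))}
    return [slots[v] for v in values]


def shared_ylim(*series, pad=0.05):
    """Return (low, high) limits that cover every series, skipping NaN values."""
    finite = [float(v) for s in series for v in s if not math.isnan(float(v))]
    low = min(0.0, min(finite))
    high = max(finite)
    return low, high + (high - low) * pad


weeks = [1, 2, 3, 4, 5, 6, 7, 8]
print(categorical_positions(weeks))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

On the shared axis, ax1.plot(categorical_positions(df['Week']), df['Cumulative']) lines the points up with the bars. With twin axes, compute limits = shared_ylim(df['Weekly Precipitation'], df['Cumulative']) and pass them to both ax1.set_ylim(*limits) and ax2.set_ylim(*limits).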

How to properly plot a line over bars?

This one used to work fine, but somehow it stopped working (I must have changed something mistakenly but I can't find the issue).
I'm plotting a set of 3 bars per date, plus a line that shows the accumulated value of one of them. But only one or the other (either the bars or the line) is plotted properly. If I leave the code for the bars last, only the bars are plotted. If I leave the code for the line last, only the line is plotted.
fig, ax = plt.subplots(figsize = (15,8))
df.groupby("date")["result"].sum().cumsum().plot(
ax=ax,
marker='D',
lw=2,
color="purple")
df.groupby("date")[selected_columns].sum().plot(
ax=ax,
kind="bar",
color=["blue", "red", "gold"])
ax.legend(["LINE", "X", "Y", "Z"])
Appreciate the help!
Pandas draws bar plots with a categorical x-axis: the bars are internally positioned at 0, 1, 2, ... and the tick labels are set afterwards. The line plot uses the dates as its x-axis. To combine them, both need to be categorical. The easiest way is to drop the index from the line plot. Make sure the line plot is drawn first, so that the bar plot can set the labels correctly.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'date': pd.date_range('20210101', periods=10),
'earnings': np.random.randint(100, 600, 10),
'costs': np.random.randint(0, 200, 10)})
df['result'] = df['earnings'] - df['costs']
fig, ax = plt.subplots(figsize=(15, 8))
df.groupby("date")["result"].sum().cumsum().reset_index(drop=True).plot(
ax=ax,
marker='D',
lw=2,
color="purple")
df.groupby("date")[['earnings', 'costs', 'result']].sum().plot(
ax=ax,
kind="bar",
rot=0,
width=0.8,
color=["blue", "red", "gold"])
ax.legend(['Cumul.result', 'earnings', 'costs', 'result'])
# shorten the tick labels to only the date
ax.set_xticklabels([tick.get_text()[:10] for tick in ax.get_xticklabels()])
ax.set_ylim(ymin=0) # bar plots are nicer when bars start at zero
plt.tight_layout()
plt.show()
Here I post the solution:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
a=[11.3,222,22, 63.8,9]
b=[0.12,-1.0,1.82,16.67,6.67]
l=[i for i in range(5)]
plt.rcParams['font.sans-serif']=['SimHei']
fmt='%.1f%%'
yticks = mtick.FormatStrFormatter(fmt)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.plot(l, b,'og-',label=u'A')
ax1.yaxis.set_major_formatter(yticks)
for i,(_x,_y) in enumerate(zip(l,b)):
    plt.text(_x,_y,b[i],color='black',fontsize=8,)
ax1.legend(loc=1)
ax1.set_ylim([-20, 30])
ax1.set_ylabel('ylabel')
plt.legend(prop={'family':'SimHei','size':8})
ax2 = ax1.twinx()
plt.bar(l,a,alpha=0.1,color='blue',label=u'label')
ax2.legend(loc=2)
plt.legend(prop={'family':'SimHei','size':8},loc="upper left")
plt.show()
The key to this is the command
ax2 = ax1.twinx()
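With twinx(), each Axes keeps its own legend entries, so the code above ends up with one legend per axis. Below is a minimal sketch of merging them into a single legend, using a hypothetical helper combined_legend. It relies only on get_legend_handles_labels() and legend(), which both Axes provide.

```python
def combined_legend(ax1, ax2, **legend_kwargs):
    """Draw one legend on ax2 that lists the labelled artists of both axes."""
    handles1, labels1 = ax1.get_legend_handles_labels()
    handles2, labels2 = ax2.get_legend_handles_labels()
    return ax2.legend(handles1 + handles2, labels1 + labels2, **legend_kwargs)
```

For example, call combined_legend(ax1, ax2, loc='upper left') once both plots are drawn, in place of the separate legend calls.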

Overlapping legend for pandas plot with a pie chart

I am plotting a pie chart with the pandas plot function and matplotlib, using the following code:
plt.figure(figsize=(16,8))
# plot chart
ax1 = plt.subplot(121, aspect='equal')
dfhelp.plot(kind='pie', y = 'Prozentuale Gesamt', ax=ax1, autopct='%1.1f%%',
startangle=90, shadow=False, labels=dfhelp['Anzahl Geschäfte in der Gruppe'], legend = False, fontsize=14)
plt.show()
The output looks like this:
The problem is that the percentages and legend are overlapping. Do you have any idea how to fix that? For the plotting I used this question.
This is an easier and more readable version of this answer in my opinion (but credits to that answer for making it possible).
import matplotlib.pyplot as plt
import pandas as pd
d = {'col1': ['Tesla', 'GM', 'Ford', 'Nissan', 'Other'],
'col2': [117, 95, 54, 10, 7]}
df = pd.DataFrame(data=d)
print(df)
# Calculate the percentage for each row
percent = 100.*df.col2/df.col2.sum()
# Write label in the format "Manufacturer - Percentage %"
labels = ['{0} - {1:1.2f} %'.format(i,j) for i,j in zip(df.col1, percent)]
ax = df.col2.plot(kind='pie', labels=None) # the pie plot
ax.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle
ax.yaxis.label.set_visible(False) # disable y-axis label
# add the legend
ax.legend(labels, loc='best', bbox_to_anchor=(-0.1, 1.), fontsize=8)
plt.show()

Adjust y-axis in Seaborn multiplot

I'm plotting a CSV file from my simulation results. The plot has three graphs in the same figure fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(24, 6)).
However, for comparison purposes I want the y-axis in all graphs to start at zero and end at a specific value. I tried the solution mentioned here from the Seaborn author. I don't get any errors, but the solution also does not work for me.
Here's my script:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
fname = 'results/filename.csv'
def plot_file():
    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(24, 6))
    df = pd.read_csv(fname, sep='\t')
    profits = \
        df.groupby(['providerId', 'periods'], as_index=False)['profits'].sum()
    # y-axis needs to start at zero and end at 10
    g = sns.lineplot(x='periods',
                     y='profits',
                     data=profits,
                     hue='providerId',
                     legend='full',
                     ax=axes[0])
    # y-axis need to start at zero and end at one
    g = sns.scatterplot(x='periods',
                        y='price',
                        hue='providerId',
                        style='providerId',
                        data=df,
                        legend=False,
                        ax=axes[1])
    # y-axis need to start at zero and end at one
    g = sns.scatterplot(x='periods',
                        y='quality',
                        hue='providerId',
                        style='providerId',
                        data=df,
                        legend=False,
                        ax=axes[2])
    g.set(ylim=(0, None))
    plt.show()
    print(g)  # -> AxesSubplot(0.672059,0.11;0.227941x0.77)
The resulting figure is as follows:
How can I adjust each individual plot?
Based on the way you've written your code, you can refer to each subplot's axes with g.axes and use g.axes.set_ylim(low, high). (A difference compared to the linked answer is that your graphs are not being plotted on a seaborn FacetGrid.)
An example using dummy data and different axis ranges to illustrate:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.uniform(0,10,(100,2)), columns=['a','b'])
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(8,4))
g = sns.lineplot(x='a',
y='b',
data=df.sample(10),
ax=axes[0])
g.axes.set_ylim(0,25)
g = sns.scatterplot(x='a',
y='b',
data=df.sample(10),
ax=axes[1])
g.axes.set_ylim(0,3.5)
g = sns.scatterplot(x='a',
y='b',
data=df.sample(10),
ax=axes[2])
g.axes.set_ylim(0,0.3)
plt.tight_layout()
plt.show()
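Applied to the script in the question, the same idea could look like the sketch below. apply_ylims is a hypothetical helper, and the limits (0 to 10 for profits, 0 to 1 for price and quality) are taken from the comments in the question's code.

```python
def apply_ylims(axes, limits):
    """Set (low, high) y-limits on each Axes; one pair per subplot."""
    for ax, (low, high) in zip(axes, limits):
        ax.set_ylim(low, high)


# Limits taken from the comments in the question's script.
QUESTION_LIMITS = [(0, 10), (0, 1), (0, 1)]
```

Call apply_ylims(axes, QUESTION_LIMITS) after the three seaborn calls and before plt.show(), replacing the single g.set(ylim=(0, None)) that only affects the last subplot.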

Seaborn: Overlaying a box plot or mean with error bars on a histogram

I am creating a histogram in Seaborn of my data in a pretty standard way, i.e.:
rc = {'font.size': 32, 'axes.labelsize': 28.5, 'legend.fontsize': 32.0,
'axes.titlesize': 32, 'xtick.labelsize': 31, 'ytick.labelsize': 12}
sns.set(style="ticks", color_codes=True, rc = rc)
plt.figure(figsize=(25,20),dpi=300)
ax = sns.distplot(synData['SYNERGY_SCORE'])
print (np.mean(synData['SYNERGY_SCORE']), np.std(synData['SYNERGY_SCORE']))
# ax = sns.boxplot(synData['SYNERGY_SCORE'], orient = 'h')
ax.set(xlabel = 'Synergy Score', ylabel = 'Frequency', title = 'Aggregate Synergy Score Distribution')
This produces the following output:
I also want to visualize the mean + standard deviation of this dataset on the same plot, ideally by having a point for the mean on the x-axis (or right above the x-axis) and notched error bars showing the standard deviation. Another option is a boxplot hugging the x-axis. I tried just adding the line which is commented out (sns.boxplot()), but it looks super ugly and not at all what I'm looking for. Any suggestions?
The boxplot is drawn on a categorical axis and won't coexist nicely with the density axis of the histogram, but it's possible to do it with a twin x axis plot:
import numpy as np
import seaborn as sns
x = np.random.randn(300)
ax = sns.distplot(x)
ax2 = ax.twinx()
sns.boxplot(x=x, ax=ax2)
ax2.set(ylim=(-.5, 10))
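For the other option mentioned in the question, a point at the mean with error bars for the standard deviation, here is a minimal sketch using matplotlib's Axes.errorbar. mean_std_marker is a hypothetical helper. It uses the population standard deviation to match the np.std default, and the y height of the marker is an arbitrary choice.

```python
import statistics


def mean_std_marker(ax, data, y=0.0):
    """Draw the mean of `data` as a point at height `y` on `ax`,
    with horizontal error bars of +/- one standard deviation."""
    values = [float(v) for v in data]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std, like np.std's default
    ax.errorbar(mean, y, xerr=std, fmt='o', color='black', capsize=5)
    return mean, std
```

For example, mean_std_marker(ax, x, y=0.02) after ax = sns.distplot(x) places the marker just above the x-axis of the density plot.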
