I am plotting a pie chart with the pandas plot function (which uses matplotlib), using the following code:
plt.figure(figsize=(16,8))
# plot chart
ax1 = plt.subplot(121, aspect='equal')
dfhelp.plot(kind='pie', y='Prozentuale Gesamt', ax=ax1, autopct='%1.1f%%',
            startangle=90, shadow=False, labels=dfhelp['Anzahl Geschäfte in der Gruppe'],
            legend=False, fontsize=14)
plt.show()
The output looks like:
The problem is that the percentages and the labels overlap. Do you have any idea how to fix that? I based the plotting code on this question.
In my opinion, this is an easier and more readable version of this answer (credit to that answer for making it possible).
import matplotlib.pyplot as plt
import pandas as pd
d = {'col1': ['Tesla', 'GM', 'Ford', 'Nissan', 'Other'],
     'col2': [117, 95, 54, 10, 7]}
df = pd.DataFrame(data=d)
print(df)
# Calculate percentage points
percent = 100.*df.col2/df.col2.sum()
# Write label in the format "Manufacturer - Percentage %"
labels = ['{0} - {1:1.2f} %'.format(i,j) for i,j in zip(df.col1, percent)]
ax = df.col2.plot(kind='pie', labels=None) # the pie plot
ax.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle
ax.yaxis.label.set_visible(False) # disable y-axis label
# add the legend
ax.legend(labels, loc='best', bbox_to_anchor=(-0.1, 1.), fontsize=8)
plt.show()
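If you would rather keep the percentages on the chart itself, another option is to pull them inside the wedges and push the labels outward. This is a minimal sketch using the same example data as above. pctdistance and labeldistance are parameters of matplotlib's pie(); the values 0.75 and 1.15 are arbitrary assumptions that will likely need tuning, and very small wedges may still crowd each other:

```python
import matplotlib.pyplot as plt

# Example data from the answer above
names = ['Tesla', 'GM', 'Ford', 'Nissan', 'Other']
values = [117, 95, 54, 10, 7]

fig, ax = plt.subplots(figsize=(8, 8))
# pctdistance < 1 places the percentage text inside the wedge;
# labeldistance > 1 places the label outside the circle.
wedges, texts, autotexts = ax.pie(
    values,
    labels=names,
    autopct='%1.1f%%',
    pctdistance=0.75,    # assumed value, tune as needed
    labeldistance=1.15,  # assumed value, tune as needed
    startangle=90,
)
ax.axis('equal')  # draw the pie as a circle
# plt.show()  # uncomment to display the figure
```
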
Related
I have a pie chart showing each product's share of total revenue. The product names are too long, so they are abbreviated around the diagram, but the legend should show the full names. Basically I just need to replace the labels in the legend. However, when creating the pie chart I pass a list of labels for the wedges, and changing the labels in the legend afterwards has no effect.
data_names - full names
data_names2 - abbreviated names
Passing labels=data_names to plt.legend does not work.
ax.pie(data_values, labels=data_names2, colors=colors, radius=5, center=(0, 0),
       wedgeprops={"linewidth": 0.4, "edgecolor": "white"}, frame=True, rotatelabels=True)
plt.legend(loc='best', bbox_to_anchor=(1, 0.92), labels=data_names)
I found this link that might be helpful:
Legend Labels
Using their example, I created a simple pie chart.
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True
labels = ['Walking', 'Talking', 'Sleeping', 'Working']
abbrevs = ['Walk', 'Talk', 'Sleep', 'Work'] # Abbreviations for the legend
sizes = [23, 45, 12, 20]
colors = ['red', 'blue', 'green', 'yellow']
patches, texts = plt.pie(sizes, colors=colors, shadow=True, labels = labels, startangle=90)
plt.legend(patches, abbrevs, loc="best")
plt.axis('equal')
plt.show()
That produced a pie chart whose legend shows the custom labels.
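For the case in the question (abbreviations around the pie, full names in the legend), the same handles-plus-labels call can be used the other way round. This is only a sketch: the product names and values are made up for illustration, and the variable names follow the question.

```python
import matplotlib.pyplot as plt

# Hypothetical data: full names for the legend, short names around the pie
data_names = ['Premium Wireless Headphones', 'Ergonomic Office Chair', 'Stainless Steel Bottle']
data_names2 = ['Headph.', 'Chair', 'Bottle']
data_values = [50, 30, 20]

fig, ax = plt.subplots()
# ax.pie returns the wedge patches first; they serve as legend handles
wedges, texts = ax.pie(data_values, labels=data_names2, startangle=90)
# Passing handles and labels explicitly overrides the labels given to pie()
legend = ax.legend(wedges, data_names, loc='center left', bbox_to_anchor=(1, 0.5))
ax.axis('equal')
# plt.show()  # uncomment to display the figure
```
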
Hope that helps.
Regards.
I have made a Seaborn stripplot on top of a barplot, with experience group on the x-axis and the bars grouped by two different conditions (target present or target not present), from a dataframe using the following code:
IZ_colors = ['#E1F3DC','#56B567']
ax1 = sns.barplot(data=IZ_df, x='Group', y='Time in IZ (%)', hue='Condition',
                  order=['Std_Ctrl','ELS_Ctrl','Std_CSDS','ELS_CSDS'], hue_order=['Empty','Aggressor'],
                  palette=IZ_colors)
hatches = ['','//']
# Loop over the bars
for bars, hatch in zip(ax1.containers, hatches):
    # Set a different hatch for each group of bars
    for bar in bars:
        bar.set_hatch(hatch)
sns.stripplot(data=IZ_df, x='Group', y='Time in IZ (%)', hue='Condition', dodge=True,
              order=['Std_Ctrl','ELS_Ctrl','Std_CSDS','ELS_CSDS'], hue_order=['Empty','Aggressor'],
              palette=IZ_colors, marker='o', size=7, edgecolor='#373737', linewidth=1, color='black')
plt.legend(bbox_to_anchor=(1.35, 0.7))
However, I would like the markers of the stripplot to be colored by sex (not by condition like how they are now), which is another column in the dataframe. I would still like them to be grouped by hue='Condition'. Is this possible?
You could create two stripplots, one for each sex, and draw them at the same spot. The duplicate legend entries can be removed by calling get_legend_handles_labels() and taking a subset of the handles and the labels.
Here is an example using the titanic dataset:
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset('titanic')
IZ_colors = ['#E1F3DC', '#56B567']
ax1 = sns.barplot(data=titanic, x='class', y='age', hue='alive',
                  order=['First', 'Second', 'Third'], hue_order=['no', 'yes'],
                  palette=IZ_colors)
hatches = ['', '//']
for bars, hatch in zip(ax1.containers, hatches):
    for bar in bars:
        bar.set_hatch(hatch)
for sex, color in zip(['male', 'female'], ['orange', 'turquoise']):
    df_per_sex = titanic[titanic['sex'] == sex]
    sns.stripplot(data=df_per_sex, x='class', y='age', hue='alive',
                  order=['First', 'Second', 'Third'], hue_order=['no', 'yes'],
                  dodge=True, palette=[color] * 2,
                  marker='o', size=4, edgecolor='#373737', linewidth=1)
handles, labels = ax1.get_legend_handles_labels()
handles = [handles[0], handles[2]] + handles[4:]
labels = ['Male', 'Female'] + labels[4:]
ax1.legend(handles, labels, bbox_to_anchor=(1.01, 0.7), loc='upper left')
plt.tight_layout()
plt.show()
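The indices used to pick handles above rely on the order in which the artists were added to the axes. An alternative sketch builds the sex entries as proxy artists with matplotlib's Line2D, so no index arithmetic is needed. The helper name sex_legend_handles is made up for this illustration; in the full example you would put these handles in front of the two bar handles before calling ax1.legend().

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D


def sex_legend_handles(colors_by_sex):
    """Build one round-marker legend handle per sex (proxy artists, not drawn on the axes)."""
    return [
        Line2D([], [], linestyle='', marker='o', markersize=6,
               markerfacecolor=color, markeredgecolor='#373737', label=sex)
        for sex, color in colors_by_sex.items()
    ]


# Same colors as in the example above
handles = sex_legend_handles({'Male': 'orange', 'Female': 'turquoise'})

fig, ax = plt.subplots()
legend = ax.legend(handles=handles, bbox_to_anchor=(1.01, 0.7), loc='upper left')
# plt.show()  # uncomment to display the figure
```
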
I'm looking for tips on getting my data from a pandas dataframe into a matplotlib chart that looks like this:
Is it even possible without too much effort?
Thanks in advance for any advice!
The following functionality can be used:
Generate a standard bar plot using pandas' df.plot.bar()
Loop through the generated bars to change their color and alpha. Also use the bar's dimensions to place a text with the height.
Remove all spines except the bottom spine.
Change the linewidth of the bottom spine.
Use grid() to place horizontal grid lines.
Use tick_params() to remove the tick marks and change the tick label color and size. The y-ticks cannot be removed, as they are needed to position the grid lines.
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame({'Values': [40, 55, 60, 94, 79, 49]},
index=['January', 'February', 'March', 'April', 'May', 'June'])
colors = plt.cm.tab10.colors[:len(df)]
ax = df.plot.bar(width=0.9, legend=False)
for p, color in zip(ax.patches, colors):
    p.set_color(color)
    p.set_alpha(0.6)
    ax.text(p.get_x() + p.get_width() / 2, p.get_y() + p.get_height() / 2, f'{p.get_height():.0f}', ha='center',
            va='center', fontsize=20)
ax.grid(axis='y')
for where in ['left', 'right', 'top']:
    ax.spines[where].set_visible(False)
ax.spines['bottom'].set_linewidth(3)
ax.tick_params(axis='y', length=0, labelcolor='none')
ax.tick_params(axis='x', length=0, rotation=0, labelsize=14)
plt.tight_layout()
plt.show()
PS: If you need more grey-like colors, you could first make them darker:
colors = [(r*0.6, g*0.6, b*0.6) for r, g, b in colors]
Then still apply an alpha (0.5, perhaps) to make them lighter again.
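To see why darkening followed by alpha gives a greyer tone, the arithmetic can be sketched in plain Python. This assumes a white background and ordinary "source over" alpha blending; the function names darken and blend_over_white are made up for the illustration.

```python
def darken(rgb, factor=0.6):
    """Scale each RGB channel (0..1) toward black."""
    return tuple(channel * factor for channel in rgb)


def blend_over_white(rgb, alpha):
    """Color seen when `rgb` is drawn with opacity `alpha` on a white background."""
    return tuple(alpha * channel + (1 - alpha) for channel in rgb)


# First tab10 color (blue), rounded from matplotlib's '#1f77b4'
blue = (0.122, 0.467, 0.706)
dark_blue = darken(blue)                        # darker version of the color
shown = blend_over_white(dark_blue, alpha=0.5)  # what ends up on a white canvas
print([round(c, 3) for c in shown])
```

Darkening lowers all channels, and the blend then lifts them all toward 1, so the channels end up closer together than in the original color, which reads as greyer.
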
After initializing your data frame, you can get a bar chart with df.plot.bar() or df.plot(kind='bar').
For each column, you can set the color, either with a hexadecimal value or a color name.
Finally, to place labels on the bars, you need the position and dimensions of each bar.
Below is an example:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
data = {'T1-Hotshots': [40],
        'Type_2_IA': [55],
        'Engines': [60],
        'Helicopters': [94],
        'Dozers': [79],
        'Patrols': [49]}
df = pd.DataFrame(data, columns=['T1-Hotshots',
                                 'Type_2_IA',
                                 'Engines',
                                 'Helicopters',
                                 'Dozers',
                                 'Patrols'])
# Pass figsize to df.plot; a separate plt.figure() call would only create an extra empty figure
ax = df.plot(kind='bar', figsize=(12, 8),
             color=['blue', 'green', 'yellow', 'orange', 'purple', 'red'])
# One label per bar: the single value stored in each column
labels = [df[key].values[0] for key in df]
for rect, label in zip(ax.patches, labels):
    height = rect.get_height()
    # Place the label inside the bar, 25 units below its top edge
    ax.text(rect.get_x() + rect.get_width() / 2, height - 25, label,
            ha='center', va='bottom')
plt.show()
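Matplotlib 3.4 and later also provide Axes.bar_label, which reads the bar heights itself, so the manual text placement is not needed. The sketch below uses plain ax.bar instead of df.plot (a swapped-in approach, not what the answer above does) with the same values:

```python
import matplotlib.pyplot as plt

# Same values as in the example above
data = {'T1-Hotshots': 40, 'Type_2_IA': 55, 'Engines': 60,
        'Helicopters': 94, 'Dozers': 79, 'Patrols': 49}
colors = ['blue', 'green', 'yellow', 'orange', 'purple', 'red']

fig, ax = plt.subplots(figsize=(12, 8))
bars = ax.bar(list(data.keys()), list(data.values()), color=colors)
# bar_label reads each bar's height and centers the text inside the bar
texts = ax.bar_label(bars, label_type='center')
# plt.show()  # uncomment to display the figure
```
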
I have two dictionaries:
A = {2018: 23, 2019: 30}
B = {2018: 26, 2019:35}
Now I want to plot the trend for 2018/2019 for A and B. However, when plotting the bar graph I get the following result: the years expand to fill the space and B hides A completely. Please suggest how to plot the graph.
The original data contain average marks for maths, science, and total, which I want to plot on the same bar graph for two years to show the trend.
You can align the bars of a bar graph by their left or right edge (pass a negative width to align by the right edge). In this way you can get side-by-side bars. Alternatively, you can stack the bars.
Here is the code with the output:
import matplotlib.pyplot as plt
A = {2018: 23, 2019:30}
B = {2018: 26, 2019:35}
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
ax1.bar(A.keys(), A.values(), width=0.2, align='edge', label='A')
ax1.bar(B.keys(), B.values(), width=-0.2, align='edge', label='B')
ax1.set_xticks([2018, 2019])
ax1.set_xlabel('YEAR')
ax1.legend()
ax2.bar(A.keys(), A.values(), width=0.4, align='center', label='A')
ax2.bar(B.keys(), B.values(), bottom=[A[i] for i in B.keys()], width=0.4, align='center', label='B')
ax2.set_xticks([2018, 2019])
ax2.set_xlabel('YEAR')
ax2.legend()
plt.show()  # plt.show() runs the GUI event loop, so the window also stays open in a plain script
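The side-by-side placement generalizes to more than two series by shifting each series by a computed offset around the year tick. This is a minimal sketch; the helper name group_offsets and the bar width of 0.2 are assumptions made for the illustration.

```python
import matplotlib.pyplot as plt


def group_offsets(n_series, bar_width):
    """Horizontal shift of each series so the group is centered on its tick."""
    return [(i - (n_series - 1) / 2) * bar_width for i in range(n_series)]


series = {'A': {2018: 23, 2019: 30}, 'B': {2018: 26, 2019: 35}}
years = [2018, 2019]
bar_width = 0.2

fig, ax = plt.subplots()
for (name, data), offset in zip(series.items(), group_offsets(len(series), bar_width)):
    # Shift every bar of this series by the same offset around the year tick
    ax.bar([year + offset for year in years], [data[year] for year in years],
           width=bar_width, label=name)
ax.set_xticks(years)
ax.set_xlabel('YEAR')
ax.legend()
# plt.show()  # uncomment to display the figure
```

Adding a third dictionary to series needs no other change, since the offsets are recomputed from the number of series.
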
EDIT: If you start to deal with more data it makes sense to use a package that can handle data more easily. Pandas is a great package that will do this for you.
Here is an example with 4 sets of time-series data:
import matplotlib.pyplot as plt
import pandas as pd
A = {2018: 23, 2019:30}
B = {2018: 26, 2019:35}
C = {2018: 30, 2019:40}
D = {2018: 20, 2019:50}
df = pd.DataFrame([A,B,C,D], index=['A','B','C','D']).transpose()
fig, ax= plt.subplots(1,1, figsize=(6,5))
df.plot.bar(ax=ax)
ax.set_xlabel('YEAR')
fig.tight_layout()
plt.show()  # plt.show() runs the GUI event loop, so the window also stays open in a plain script
The output is this figure:
I am creating a histogram in Seaborn of my data in a pretty standard way, i.e.:
rc = {'font.size': 32, 'axes.labelsize': 28.5, 'legend.fontsize': 32.0,
      'axes.titlesize': 32, 'xtick.labelsize': 31, 'ytick.labelsize': 12}
sns.set(style="ticks", color_codes=True, rc = rc)
plt.figure(figsize=(25,20),dpi=300)
ax = sns.distplot(synData['SYNERGY_SCORE'])
print (np.mean(synData['SYNERGY_SCORE']), np.std(synData['SYNERGY_SCORE']))
# ax = sns.boxplot(synData['SYNERGY_SCORE'], orient = 'h')
ax.set(xlabel = 'Synergy Score', ylabel = 'Frequency', title = 'Aggregate Synergy Score Distribution')
This produces the following output:
I also want to visualize the mean + standard deviation of this dataset on the same plot, ideally by having a point for the mean on the x-axis (or right above the x-axis) and notched error bars showing the standard deviation. Another option is a boxplot hugging the x-axis. I tried just adding the line which is commented out (sns.boxplot()), but it looks super ugly and not at all what I'm looking for. Any suggestions?
The boxplot is drawn on a categorical axis and won't coexist nicely with the density axis of the histogram, but it's possible to do it with a twin x axis plot:
import numpy as np
import seaborn as sns
x = np.random.randn(300)
ax = sns.distplot(x)
ax2 = ax.twinx()
sns.boxplot(x=x, ax=ax2)
ax2.set(ylim=(-.5, 10))
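For the other option mentioned in the question (a point for the mean with capped bars for the standard deviation), matplotlib's errorbar can draw directly on the histogram axes. This is a sketch under stated assumptions: it uses a plain matplotlib histogram rather than seaborn, random data as a stand-in for synData['SYNERGY_SCORE'], and an arbitrary marker height of 2% of the y-range.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # seeded only to make the sketch reproducible
x = rng.normal(size=300)         # stand-in for synData['SYNERGY_SCORE']

mean, std = np.mean(x), np.std(x)  # np.std defaults to the population std (ddof=0)

fig, ax = plt.subplots()
ax.hist(x, bins=30, density=True, alpha=0.5)
# Draw the mean as a point slightly above the x-axis,
# with +/- one standard deviation as a capped horizontal bar
y_marker = 0.02 * ax.get_ylim()[1]
err = ax.errorbar(mean, y_marker, xerr=std, fmt='o', color='black', capsize=6)
ax.set(xlabel='Synergy Score', ylabel='Density')
# plt.show()  # uncomment to display the figure
```
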