I'm looking for tips on getting my data from a pandas dataframe into a matplotlib chart that looks like this:
Is it even possible without too much effort?
Thanks in advance for any advice!
The following functionality can be used:
Generate a standard bar plot using pandas' df.plot.bar()
Loop through the generated bars to change their color and alpha. Also use the bar's dimensions to place a text with the height.
Remove all spines except the bottom spine.
Change the linewidth of the bottom spine.
Use grid() to place horizontal grid lines.
Use tick_params() to remove the tick marks and change tick label color and size. The y-ticks can not be removed as they are needed to position the grid lines.
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame({'Values': [40, 55, 60, 94, 79, 49]},
index=['January', 'February', 'March', 'April', 'May', 'June'])
colors = plt.cm.tab10.colors[:len(df)]
ax = df.plot.bar(width=0.9, legend=False)
for p, color in zip(ax.patches, colors):
    p.set_color(color)
    p.set_alpha(0.6)
    ax.text(p.get_x() + p.get_width() / 2, p.get_y() + p.get_height() / 2, f'{p.get_height():.0f}', ha='center',
            va='center', fontsize=20)
ax.grid(axis='y')
for where in ['left', 'right', 'top']:
    ax.spines[where].set_visible(False)
ax.spines['bottom'].set_linewidth(3)
ax.tick_params(axis='y', length=0, labelcolor='none')
ax.tick_params(axis='x', length=0, rotation=0, labelsize=14)
plt.tight_layout()
plt.show()
PS: If you need the more grey-like colors, you could first make them darker:
colors = [(r*0.6, g*0.6, b*0.6) for r, g, b in colors]
And still use the alpha (0.5?) to make them whiter again.
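To get a feeling for what darkening followed by alpha does, the visible color can be estimated by blending over the background. This is only a rough sketch: it assumes a white axes background and plain "over" compositing, and the helper names `darken` and `blend_over_white` are made up for illustration.

```python
def darken(rgb, factor=0.6):
    """Scale each RGB channel (values in 0..1) towards black."""
    return tuple(c * factor for c in rgb)


def blend_over_white(rgb, alpha):
    """Color seen when `rgb` is drawn with opacity `alpha` on a white background."""
    return tuple(alpha * c + (1 - alpha) * 1.0 for c in rgb)


# First tab10 color (#1f77b4), written as RGB fractions.
blue = (31 / 255, 119 / 255, 180 / 255)

dark_blue = darken(blue, 0.6)               # greyer, darker version
visible = blend_over_white(dark_blue, 0.5)  # whitened again by the alpha
print([round(c, 3) for c in visible])
```

Trying a few `factor` and `alpha` pairs this way is quicker than re-rendering the whole chart each time.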
After initializing your data frame, you can create a bar chart with df.plot.bar() or df.plot(kind='bar').
For each column, you can set the color, either as a hexadecimal value or as a color name.
Finally, to label the bars, you need the position and dimensions of each bar.
Below is an example:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
data = {'T1-Hotshots': [40],
'Type_2_IA': [55],
'Engines': [60],
'Helicopters': [94],
'Dozers': [79],
'Patrols': [49]}
df = pd.DataFrame(data, columns=['T1-Hotshots',
'Type_2_IA',
'Engines',
'Helicopters',
'Dozers',
'Patrols'])
# Pass figsize to df.plot; a separate plt.figure() call would open an extra, empty figure
ax = df.plot(kind='bar', figsize=(12, 8),
             color=['blue', 'green', 'yellow', 'orange', 'purple', 'red'])
rects = ax.patches
labels = [df[key].values[0] for key in df]
for rect, label in zip(rects, labels):
    height = rect.get_height()
    # Place each label 25 data units below the top of its bar
    ax.text(rect.get_x() + rect.get_width() / 2, height - 25, label,
            ha='center', va='bottom')
plt.show()
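Note that the example above stores every category in its own column, so pandas draws one group of six bars and names them in a legend. If you would rather have the names on the x-axis, one option is to start from a Series. A minimal sketch, assuming the same six values; the variable name `values` and the label placement at half the bar height are my own choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

values = pd.Series({'T1-Hotshots': 40, 'Type_2_IA': 55, 'Engines': 60,
                    'Helicopters': 94, 'Dozers': 79, 'Patrols': 49})
colors = ['blue', 'green', 'yellow', 'orange', 'purple', 'red']

# For a Series, a list of colors is applied bar by bar.
ax = values.plot.bar(figsize=(12, 8), color=colors, rot=0)

# Write each value in the middle of its bar.
for rect, value in zip(ax.patches, values):
    ax.text(rect.get_x() + rect.get_width() / 2, rect.get_height() / 2,
            str(value), ha='center', va='center')
```

Centering the text at half the bar height avoids the hard-coded offset, which would push labels below the axis for short bars.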
I created a scatter plot using matplotlib, but I am somehow unable to get the labels centered in the boxes of the colorbar.
This is the code I have so far:
cMap = ListedColormap(['Orange', 'Purple', 'Blue','Red','Green'])
fig, ax = plt.subplots()
plt.figure(figsize=(12,12),dpi = 80)
#data
dist = np.random.rand(1900,1900)
#legend
cbar = plt.colorbar(scatter)
cbar.ax.get_yaxis().set_ticks([])
for j, lab in enumerate(['$Training$','$None$','$GS$','$ML$','$Both$']):
    cbar.ax.text( .5, j - .985, lab, ha='left', va='center', rotation = 270)
cbar.ax.get_yaxis().labelpad = 15
cbar.ax.set_ylabel('Outliers', rotation=270)
indices = np.where(outlier_label != -2)[0]
plt.scatter(dist[indices, 0], dist[indices, 1], c=outlier_label[indices], cmap=cMap, s=20)
plt.gca().set_aspect('equal', 'datalim')
plt.title('Projection of the data', fontsize=24)
Thanks!
In the line cbar.ax.text( .5, j - .985, lab, ha='left', va='center', rotation = 270) you have to adjust the value .985 by trial and error to get better results.
You can extract the y limits of the colorbar to know its top and bottom. Dividing that range into 11 equally spaced positions puts the 5 centers at the odd positions of that list. Similarly, you can extract the x limits to find the horizontal center.
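The "odd positions" idea can be checked with plain numpy. A small sketch, where the limits 0 and 4 are assumed stand-ins for whatever cbar.ax.get_ylim() returns in your figure:

```python
import numpy as np

num_colors = 5
ymin, ymax = 0.0, 4.0  # assumed colorbar limits, for illustration only

# 2 * 5 + 1 = 11 equally spaced positions: box edges and box centers alternate.
positions = np.linspace(ymin, ymax, 2 * num_colors + 1)

# The odd indices (1, 3, 5, 7, 9) are the centers of the 5 color boxes.
centers = positions[1::2]
print(centers)
```

The even indices are the box edges, so the same array could also be used to draw separator lines if needed.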
Some remarks:
If you already called plt.subplots(), then plt.figure() will create a new figure, leaving the first plot empty. You can set the figsize directly via plt.subplots(figsize=...)
You are mixing matplotlib's "object-oriented interface" with the pyplot interface. This can lead to a lot of confusion. It is best to stick to one or the other. (The object-oriented interface is preferred, especially when you are creating non-trivial plots.)
You set dist = np.random.rand(1900,1900) of dimensions 1900x1900 while you are only using dimensions 1900x2.
Neither the code nor the text gives an indication of the values inside outlier_label. The code below assumes they are 5 equally-spaced numbers, and that both the lowest and the highest value are present in the data.
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np
colors = ['Orange', 'Purple', 'Blue', 'Red', 'Green']
cmap = ListedColormap(colors)
fig, ax = plt.subplots(figsize=(12, 12), dpi=80)
# data
dist = np.random.randn(1900, 2).cumsum(axis=0)
outlier_label = np.repeat(np.arange(5), 1900 // 5)
indices = outlier_label != -2
scatter = ax.scatter(dist[indices, 0], dist[indices, 1], c=outlier_label[indices], cmap=cmap, s=20)
# legend
cbar = plt.colorbar(scatter, ax=ax)
cbar.ax.get_yaxis().set_ticks([])
cb_xmin, cb_xmax = cbar.ax.get_xlim()
cb_ymin, cb_ymax = cbar.ax.get_ylim()
num_colors = len(colors)
for j, lab in zip(np.linspace(cb_ymin, cb_ymax, 2 * num_colors + 1)[1::2],
                  ['$Training$', '$None$', '$GS$', '$ML$', '$Both$']):
    cbar.ax.text((cb_xmin + cb_xmax) / 2, j, lab, ha='center', va='center', rotation=270, color='white', fontsize=16)
cbar.ax.get_yaxis().labelpad = 25
cbar.ax.set_ylabel('Outliers', rotation=270, fontsize=18)
ax.set_aspect('equal', 'datalim')
ax.set_title('Projection of the data', fontsize=24)
plt.show()
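If labels next to the colorbar (instead of inside the boxes) are acceptable, a different approach is to put regular ticks at the box centers. This is only a sketch of that alternative, not the method used above. It assumes the labels are the integers 0..4; the random `points` and `labels` arrays are placeholders for the real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the window with plt.show()
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np

names = ['Training', 'None', 'GS', 'ML', 'Both']
cmap = ListedColormap(['Orange', 'Purple', 'Blue', 'Red', 'Green'])

rng = np.random.default_rng(0)
points = rng.normal(size=(500, 2))               # placeholder data
labels = rng.integers(0, len(names), size=500)   # assumed integer labels 0..4

fig, ax = plt.subplots(figsize=(8, 8))
# Setting vmin/vmax half a step outside the label range makes every color
# band exactly 1 unit tall and centered on an integer.
scatter = ax.scatter(points[:, 0], points[:, 1], c=labels, cmap=cmap,
                     vmin=-0.5, vmax=len(names) - 0.5, s=20)

# One tick per integer label, i.e. one tick at the center of each band.
cbar = fig.colorbar(scatter, ax=ax, ticks=np.arange(len(names)))
cbar.set_ticklabels(names)
cbar.ax.tick_params(length=0)  # hide the tick marks, keep the labels
```

The advantage is that no text coordinates need to be computed; the drawback is that the labels sit beside the color boxes rather than on top of them.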
I am trying to plot a histogram of exponential distribution ranging from 0 to 20 with mean value 2.2 and bin width 0.05. However, the bar color became white as I am plotting it. The following is my code:
bins = np.linspace(0, 20, 401)
x = np.random.exponential(2.2, 3000)
counts, _ = np.histogram(x, bins)
df = pd.DataFrame({'bin': bins[:-1], 'count': counts})
p = sns.catplot(data = df, x = 'bin', y = 'count', yerr = [i**(1/2) for i in counts], kind = 'bar', height = 4, aspect = 2, palette = 'Dark2_r')
p.set(xlabel = 'Muon decay times ($\mu s$)', ylabel = 'Count', title = 'Distribution for muon decay times')
for ax in p.axes.flat:
    labels = ax.get_xticklabels()
    for i,l in enumerate(labels):
        if (i%40 != 0):
            labels[i] = ""
    ax.set_xticklabels(labels, rotation=30)
I believe that this is caused by the number of bins. If the first line of the code is changed to bins = np.linspace(0, 20, 11), the plot looks like this:
But I have no idea how to resolve this.
As @JohanC points out, if you're trying to draw elements that are close to or smaller than the resolution of your raster graphic, you have to expect some artifacts. But it also seems like you'd have an easier time making this plot directly in matplotlib, since catplot is not designed to make histograms:
f, ax = plt.subplots(figsize=(8, 4), dpi=96)
ax.bar(
    bins[:-1], counts,
    yerr=[i**(1/2) for i in counts],
    width=(bins[1] - bins[0]), align="edge",
    linewidth=0, error_kw=dict(linewidth=1),
)
ax.set(
    xmargin=.01,
    xlabel=r'Muon decay times ($\mu s$)',  # raw string, so \m is not read as an escape sequence
    ylabel='Count',
    title='Distribution for muon decay times'
)
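Another option that avoids drawing 400 separate rectangles is ax.stairs, available from matplotlib 3.4 on, which draws precomputed histogram counts as one filled step shape. A sketch under that version assumption; the error bars at the bin centers are my own addition to mirror the yerr above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the window with plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
bins = np.linspace(0, 20, 401)
x = rng.exponential(2.2, 3000)
counts, edges = np.histogram(x, bins)

fig, ax = plt.subplots(figsize=(8, 4), dpi=96)

# One filled outline for the whole histogram, so there are no gaps between bars.
ax.stairs(counts, edges, fill=True)

# Poisson-style error bars (sqrt of the count), drawn at the bin centers.
centers = (edges[:-1] + edges[1:]) / 2
ax.errorbar(centers, counts, yerr=np.sqrt(counts), fmt='none',
            ecolor='black', elinewidth=1)

ax.set(xlabel=r'Muon decay times ($\mu s$)', ylabel='Count',
       title='Distribution for muon decay times')
```

Because the shape is a single patch, the rendering does not depend on how each thin bar happens to align with the pixel grid.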
Matplotlib doesn't have a good way to deal with bars that are thinner than one pixel. If you save to an image file, you can increase the dpi and/or the figsize.
Some white space is due to the bars being 0.8 wide, leaving a gap of 0.2. Seaborn's barplot doesn't let you set the bar widths, but you could iterate through the generated bars and change their width (also updating their x-value to keep them centered around the tick position).
The edges of the bars get a fixed color (default 'none', or fully transparent). While iterating through the generated bars, you could set the edge color equal to the face color.
from matplotlib import pyplot as plt
from matplotlib.ticker import MultipleLocator
import seaborn as sns
import pandas as pd
import numpy as np
bins = np.linspace(0, 20, 401)
x = np.random.exponential(2.2, 3000)
counts, _ = np.histogram(x, bins)
df = pd.DataFrame({'bin': bins[:-1], 'count': counts})
g = sns.catplot(data=df, x='bin', y='count', yerr=[i ** (1 / 2) for i in counts], kind='bar',
height=4, aspect=2, palette='Dark2_r', lw=0.5)
g.set(xlabel='Muon decay times ($\mu s$)', ylabel='Count', title='Distribution for muon decay times')
for ax in g.axes.flat:
    ax.xaxis.set_major_locator(MultipleLocator(40))
    ax.tick_params(axis='x', labelrotation=30)
    for bar in ax.patches:
        bar.set_edgecolor(bar.get_facecolor())
        bar.set_x(bar.get_x() - (1 - bar.get_width()) / 2)
        bar.set_width(1)
plt.tight_layout()
plt.show()
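The remark about bars thinner than one pixel can be checked with some quick arithmetic. The numbers below are assumptions for illustration: the figure width follows from height=4 and aspect=2, while the dpi and the share of the width taken by the axes depend on your backend and layout.

```python
fig_width_in = 4 * 2   # catplot: height=4 inches, aspect=2 -> 8 inches wide
dpi = 100              # assumed; depends on the backend or savefig settings
axes_fraction = 0.85   # assumed share of the figure width used by the axes
n_bars = 400

# Horizontal space available to each bar slot, in pixels.
pixels_per_bar = fig_width_in * dpi * axes_fraction / n_bars
print(f'{pixels_per_bar:.2f} pixels per bar slot')

# With the default bar width of 0.8, the colored part is thinner still.
print(f'{0.8 * pixels_per_bar:.2f} pixels of color per bar')

# dpi needed to give each slot at least 3 pixels at the same figure size.
needed_dpi = 3 * n_bars / (fig_width_in * axes_fraction)
print(f'about {needed_dpi:.0f} dpi for 3 pixels per slot')
```

With fewer than two pixels per slot, whether a bar or its gap wins a given pixel is mostly down to rounding, which explains the washed-out look.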
I am trying to make a bar plot that looks like this: vertical stacked barplot with a horizontal bar underneath
I made the actual vertical bar plot in python using this code:
fig, ax = plt.subplots(figsize=[15, 5])
width = 0.75
ax.bar(labels, my_order["relapse"], width, label=SAMPLE[0], color = "r")
ax.bar(labels, my_order['remission'], width, bottom=my_order['relapse'], label=SAMPLE[1], color = "orange")
ax.set_title('Ratio of cells by patient in each cluster')
ax.legend(bbox_to_anchor=(1.01,0.5), loc='center left')
my_order is a dataframe that contains a column with the numbers for relapse and a column for the numbers for remission. The part I cannot create is the horizontal bar underneath. This would be colored based on other properties of the different bars (each bar represents a cluster and in this case I would want to color the horizontal plot blue if the cluster has one property and yellow if it has another). Does anyone know if this is possible to do in python? Or if I have to do this manually? In the full dataset there are ~50 bars in the entire plot so it would be awesome to find a way to not do this manually.
You could loop through each bar position, and create a colored rectangle with the same width as the bar, but placed slightly below the plot.
The rectangle has the following parameters:
(x, y), width, height
transform=ax.get_xaxis_transform(): in the x-direction the values are measured in "data coordinates", here being 0, 1, 2, ... for the bar positions; in the y-direction "axes coordinates" are used, going from 1 at the top to 0 at the bottom of the plot area and using negative values for positions below it
clip_on=False: normally, when something is placed outside the plot area it is clipped away; clip_on=False overrides that behavior
facecolor=...: the interior color
edgecolor='black': a black border
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
labels = [*'abcdefghijklmnopqrst']
my_order = pd.DataFrame({'relapse': np.random.randint(11, 51, 20),
'remission': np.random.randint(11, 51, 20),
'cluster': np.random.randint(1, 3, 20)})
fig, ax = plt.subplots(figsize=[15, 5])
width = 0.75
ax.bar(labels, my_order['relapse'], width, label='SAMPLE[0]', color='crimson')
ax.bar(labels, my_order['remission'], width, bottom=my_order['relapse'], label='SAMPLE[1]', color='orange')
ax.set_title('Ratio of cells by patient in each cluster')
legend1 = ax.legend(title='Samples', bbox_to_anchor=(1.01, 0.5), loc='center left')
ax.margins(x=0.01)
color_for_cluster = {1: 'skyblue', 2: 'yellow'}
for i, cluster in enumerate(my_order['cluster']):
    ax.add_patch(plt.Rectangle((i - width / 2, -0.1), width, 0.02,
                               facecolor=color_for_cluster[cluster], edgecolor='black',
                               transform=ax.get_xaxis_transform(), clip_on=False))
handles = [plt.Rectangle((0, 0), 0, 0, facecolor=color_for_cluster[cluster], edgecolor='black', label=cluster)
for cluster in color_for_cluster]
legend2 = ax.legend(handles=handles, title='Clusters', bbox_to_anchor=(1.01, -0.01), loc='lower left')
ax.add_artist(legend1) # add the legend again, because the second call to ax.legend removes the first legend
fig.tight_layout()
plt.show()
You also could use plt.Rectangle((i - 1/2, -0.1), 1, 0.02, ...) to make the rectangles occupy the full width.
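To see what the blended transform from ax.get_xaxis_transform() actually does, you can convert a point to pixel coordinates and compare it with the two pure transforms. A small sketch with made-up bar heights:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the window with plt.show()
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.bar(range(5), [3, 1, 4, 1, 5])  # made-up data

trans = ax.get_xaxis_transform()

# x = 2 is read as a data coordinate, y = 0 as the bottom of the axes.
x_px, y_px = trans.transform((2, 0.0))

# The same pixel, assembled from the pure data and pure axes transforms.
x_from_data = ax.transData.transform((2, 0))[0]
y_from_axes = ax.transAxes.transform((0, 0.0))[1]
print(np.isclose(x_px, x_from_data), np.isclose(y_px, y_from_axes))

# A negative y lands below the plot area, which is where the strip is drawn.
y_below = trans.transform((2, -0.1))[1]
print(y_below < y_px)
```

This is why the strip stays glued to the bars horizontally while keeping a fixed distance below the axes, whatever the y-limits are.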
I am plotting a pie chart with pandas' plot function and matplotlib, using the following code:
plt.figure(figsize=(16,8))
# plot chart
ax1 = plt.subplot(121, aspect='equal')
dfhelp.plot(kind='pie', y = 'Prozentuale Gesamt', ax=ax1, autopct='%1.1f%%',
startangle=90, shadow=False, labels=dfhelp['Anzahl Geschäfte in der Gruppe'], legend = False, fontsize=14)
plt.show()
The output looks like:
The problem is that the percentages and the legend overlap. Do you have any idea how to fix that? For the plotting I used this question.
This is an easier and more readable version of this answer in my opinion (but credits to that answer for making it possible).
import matplotlib.pyplot as plt
import pandas as pd
d = {'col1': ['Tesla', 'GM', 'Ford', 'Nissan', 'Other'],
'col2': [117, 95, 54, 10, 7]}
df = pd.DataFrame(data=d)
print(df)
# Calculate the percentage for each row
percent = 100.*df.col2/df.col2.sum()
# Write label in the format "Manufacturer - Percentage %"
labels = ['{0} - {1:1.2f} %'.format(i,j) for i,j in zip(df.col1, percent)]
ax = df.col2.plot(kind='pie', labels=None) # the pie plot
ax.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle
ax.yaxis.label.set_visible(False) # disable y-axis label
# add the legend
ax.legend(labels, loc='best', bbox_to_anchor=(-0.1, 1.), fontsize=8)
plt.show()
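If you want to keep the percentages on the wedges, another possibility is to move them inward with pctdistance and put the legend outside the pie. A sketch using matplotlib's ax.pie directly with the same example numbers; the values 0.75 and (1.0, 0.5) are just starting points to tune:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the window with plt.show()
import matplotlib.pyplot as plt

names = ['Tesla', 'GM', 'Ford', 'Nissan', 'Other']
values = [117, 95, 54, 10, 7]

fig, ax = plt.subplots(figsize=(8, 5))

# No wedge labels; percentages are drawn at 75% of the radius.
wedges, texts, autotexts = ax.pie(values, autopct='%1.1f%%',
                                  pctdistance=0.75, startangle=90)
ax.axis('equal')  # keep the pie circular

# Anchor the legend's left edge to the right border of the axes.
ax.legend(wedges, names, loc='center left', bbox_to_anchor=(1.0, 0.5))
fig.tight_layout()
```

Very small neighbouring slices can still have percentages that touch each other; for those, the legend-only approach above is the safer choice.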
How to add labels to a horizontal bar chart in matplotlib?
Hi everyone, I'm a matplotlib and python newbie and I wanted to ask this question again to get a bit of help as to if there are easier ways to add labels for the count represented by each bar than the current solutions I've found.
Here is the code I have written:
from matplotlib.pyplot import figure
figure(num=None, figsize=(8, 24), dpi=80, facecolor='w', edgecolor='k')
df['Name'].value_counts()[:80].plot(kind='barh')
It works just fine, except for the showing labels next to the bars bit...
I looked on here for how to add the labels, and so I changed my code to this:
x = df['Name']
y = df['Name'].value_counts(ascending=True)
fig, ax = plt.subplots(figsize=(18,20))
width = 0.75 # the width of the bars
ind = np.arange(len(y)) # the x locations for the groups
ax.barh(ind, y, width, color="blue")
ax.set_yticks(ind+width/2)
ax.set_yticklabels(y, minor=False)
plt.title('Count of supplies')
plt.xlabel('Count')
plt.ylabel('ylabel')
for i, v in enumerate(y):
    ax.text(v + 100, i + 0, str(v), color='black', fontweight='bold')
However, now the names aren't associated with the bars and simply appear in the order in which they occur in the dataframe. Is there a way to simply change the first code, or to make the names match their bars in the second attempt (each name grouped with the bar it labels)?
Image sorta explaining my issue:
Using the index of y as the index of the barh plot should put the y-labels on the correct spot, next to the corresponding bar. There's no need to manipulate the y-ticklabels. The bar labels can be left aligned and vertically centered. The right x-limit may be moved a bit to have room for the label of the longest bar.
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame({'Name': np.random.choice(list('AABBBBBCCCCCDEEF'), 20000)})
y = df['Name'].value_counts(ascending=False)
fig, ax = plt.subplots(figsize=(12,5))
ax.barh(y.index, y, height=0.75, color="slateblue")
plt.title('Count of supplies')
plt.xlabel('Count')
plt.ylabel('ylabel')
_, xmax = plt.xlim()
plt.xlim(0, xmax+300)
for i, v in enumerate(y):
    ax.text(v + 100, i, str(v), color='black', fontweight='bold', fontsize=14, ha='left', va='center')
plt.show()
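With matplotlib 3.4 or newer, ax.bar_label can place these count labels without a manual loop. A minimal sketch, assuming that version is available; the sample data is random, and the padding and margin values are guesses to adjust:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the window with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'Name': rng.choice(list('AABBBBBCCCCCDEEF'), 2000)})
y = df['Name'].value_counts(ascending=True)

fig, ax = plt.subplots(figsize=(12, 5))

# Using y.index as the bar positions keeps each name next to its own bar.
bars = ax.barh(y.index, y.values, height=0.75, color='slateblue')

# One label per bar, placed just past the bar end (padding is in points).
texts = ax.bar_label(bars, padding=4, fontweight='bold')

ax.margins(x=0.1)  # extra room on the right for the longest bar's label
ax.set_xlabel('Count')
ax.set_title('Count of supplies')
```

Since the padding is given in points rather than data units, the label offset no longer needs to be retuned when the counts change scale.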