I have a plot where the 5th bar is incorrectly placed right next to the 4th bar. What should I change?
My small_ax_0 pandas dataframe looks like this:
INDEX 0
0 1 5.0
1 10001 4.0
2 20001 5.0
3 30001 5.0
4 40001 5.0
5 50001 4.0
6 60001 1.0
7 70001 4.0
8 80001 0.0
9 90001 4.0
Here is my code:
plt.hist(small_ax_0[0])
plt.tick_params(axis='both', which='major', labelsize=100)
plt.tick_params(axis='both', which='minor', labelsize=100)
plt.xlabel('Position', fontsize=100)
plt.ylabel('Frequency', fontsize=100)
plt.title('My Plot', fontsize = 150) ##
plt.grid(b=True, which='major', color='grey', linestyle='dotted')
plt.xticks( rotation = 45)
plt.show()
With pandas plotting, count each value and draw a bar chart:
small_ax_0[0].value_counts().sort_index().plot(kind='bar')
By default, hist uses 10 bins, equally spaced over the range of your data. Here the data ranges from 0 to 5, so each bin is 0.5 wide, and the values 4 and 5 fall into the last two bins, which are adjacent. If you just want to plot the number of occurrences of each number, I suggest counting them with np.unique() and drawing a bar plot:
import numpy as np
nums, freq = np.unique(small_ax_0[0], return_counts=True)
plt.bar(nums, freq)
This gives a figure where the bars are centered on each number.
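To see why the bars for 4 and 5 end up touching, here is a small sketch using the values from the question (retyped by hand as a NumPy array, which is an assumption about how the column is stored). np.histogram is used here because it applies the same default of 10 bins as plt.hist:

```python
import numpy as np

# Values of column 0 from the question, retyped by hand.
values = np.array([5.0, 4.0, 5.0, 5.0, 5.0, 4.0, 1.0, 4.0, 0.0, 4.0])

# Default binning: 10 equal-width bins between the minimum and the maximum.
counts, edges = np.histogram(values)
print(edges)   # bin edges from 0.0 to 5.0 in steps of 0.5
print(counts)  # 4.0 lands in [4.0, 4.5) and 5.0 in [4.5, 5.0], two neighbouring bins

# Counting distinct values instead keeps each bar at its own x position.
nums, freq = np.unique(values, return_counts=True)
print(nums, freq)
```

Passing nums and freq to plt.bar then puts one bar at 0, 1, 4 and 5, with visible gaps at 2 and 3.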
I have a dataframe with positive and negative values from three kinds of variables.
labels variable value
0 -10e5 nat -38
1 2e5 nat 50
2 10e5 nat 16
3 -10e5 agr -24
4 2e5 agr 35
5 10e5 agr 26
6 -10e5 art -11
7 2e5 art 43
8 10e5 art 20
When values are negative, I want the barplot to follow this color sequence:
n_palette = ["#ff0000","#ff0000","#00ff00"]
When they are positive, I want it to reverse the palette:
p_palette = ["#00ff00","#00ff00","#ff0000"]
I've tried this:
palette = ["#ff0000","#ff0000","#00ff00",
           "#00ff00","#00ff00","#ff0000",
           "#00ff00","#00ff00","#ff0000"]
ax = sns.barplot(x=melted['labels'], y=melted['value'], hue = melted['variable'],
linewidth=1,
palette=palette)
But I get the following output:
What I'd like is for the first two bars of the group to become green and the last one red when values are positive.
You seem to want to do the coloring depending on a criterion on two columns. It seems suitable to add a new column which uniquely labels that criterion.
Further, seaborn allows the palette to be a dictionary telling exactly which hue label gets which color. Adding barplot(..., order=[...]) would define a fixed order.
Here is some example code:
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from io import StringIO
data_str = ''' labels variable value
0 -10e5 nat -38
1 2e5 nat 50
2 10e5 nat 16
3 -10e5 agr -24
4 2e5 agr 35
5 10e5 agr 26
6 -10e5 art -11
7 2e5 art 43
8 10e5 art 20
'''
melted = pd.read_csv(StringIO(data_str), sep=r'\s+', dtype={'labels': str})
melted['legend'] = np.where(melted['value'] < 0, '-', '+')
melted['legend'] = melted['variable'] + melted['legend']
palette = {'nat-': "#ff0000", 'agr-': "#ff0000", 'art-': "#00ff00",
'nat+': "#00ff00", 'agr+': "#00ff00", 'art+': "#ff0000"}
ax = sns.barplot(x=melted['labels'], y=melted['value'], hue=melted['legend'],
linewidth=1, palette=palette)
ax.axhline(0, color='black')
plt.show()
PS: To remove the legend: ax.legend_.remove(). Or to have a legend with multiple columns: ax.legend(ncol=3).
A different approach, directly with the original dataframe, is to create two bar plots: one for the negative values and one for the positive. For this to work well, it is necessary that the 'labels' column (the x=) is explicitly made categorical. Also adding pd.Categorical(..., categories=['nat', 'agr', 'art']) for the 'variable' column could fix an order.
This will generate a legend with the labels twice with different colors. Depending on what you want, you can remove it or create a more custom legend.
An idea is to add the labels under the positive and on top of the negative bars:
sns.set()
melted = pd.read_csv(StringIO(data_str), sep=r'\s+', dtype={'labels': str})
palette_pos = {'nat': "#00ff00", 'agr': "#00ff00", 'art': "#ff0000"}
palette_neg = {'nat': "#ff0000", 'agr': "#ff0000", 'art': "#00ff00"}
melted['labels'] = pd.Categorical(melted['labels'])
ax = sns.barplot(data=melted[melted['value'] < 0], x='labels', y='value', hue='variable',
linewidth=1, palette=palette_neg)
sns.barplot(data=melted[melted['value'] >= 0], x='labels', y='value', hue='variable',
linewidth=1, palette=palette_pos, ax=ax)
ax.legend_.remove()
ax.axhline(0, color='black')
ax.set_xlabel('')
ax.set_ylabel('')
for bar_container in ax.containers:
    label = bar_container.get_label()
    for p in bar_container:
        x = p.get_x() + p.get_width() / 2
        h = p.get_height()
        if not np.isnan(h):
            ax.text(x, 0, label + '\n\n' if h < 0 else '\n\n' + label, ha='center', va='center')
plt.show()
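The remark about fixing an order with pd.Categorical can be illustrated without any plotting. This is only a sketch: the label and variable lists are retyped from the sample data, and the chosen category order is an assumption about what is wanted on the x axis. Without explicit categories, pandas sorts the unique strings lexically, which puts '10e5' before '2e5':

```python
import pandas as pd

# Column contents retyped from the sample data.
labels = ['-10e5', '2e5', '10e5'] * 3
variables = ['nat'] * 3 + ['agr'] * 3 + ['art'] * 3

# Without explicit categories, the unique strings are sorted lexically.
default_order = list(pd.Categorical(labels).categories)
print(default_order)

# Passing categories pins the order explicitly.
label_cat = pd.Categorical(labels, categories=['-10e5', '2e5', '10e5'], ordered=True)
variable_cat = pd.Categorical(variables, categories=['nat', 'agr', 'art'], ordered=True)
print(list(label_cat.categories), list(variable_cat.categories))
```

Assigning label_cat and variable_cat back to the 'labels' and 'variable' columns should then give seaborn a fixed order for the x positions and the hue levels.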
Still another option involves sns.catplot() which could be clearer when a lot of data is involved:
sns.set()
melted = pd.read_csv(StringIO(data_str), sep=r'\s+', dtype={'labels': str})
melted['legend'] = np.where(melted['value'] < 0, '-', '+')
melted['legend'] = melted['variable'] + melted['legend']
palette = {'nat-': "#ff0000", 'agr-': "#ff0000", 'art-': "#00ff00",
'nat+': "#00ff00", 'agr+': "#00ff00", 'art+': "#ff0000"}
g = sns.catplot(kind='bar', data=melted, col='labels', y='value', x='legend',
linewidth=1, palette=palette, sharex=False, sharey=True)
for ax in g.axes.flat:
    ax.axhline(0, color='black')
    ax.set_xlabel('')
    ax.set_ylabel('')
plt.show()
I would like to add count and percentage labels to a grouped bar chart, but I haven't been able to figure it out.
I've seen examples for count or percentage for single bars, but not for grouped bars.
the data looks something like this (not the real numbers):
age_group Mis surv unk death total surv_pct death_pct
0 0-9 1 2 0 3 6 100.0 0.0
1 10-19 2 1 0 1 4 99.9 0.0
2 20-29 0 3 0 1 4 99.9 0.0
3 30-39 0 7 1 2 10 100.0 0.0
4 40-49 0 5 0 1 6 99.7 0.3
5 50-59 0 6 0 4 10 99.3 0.3
6 60-69 0 7 1 4 12 98.0 2.0
7 70-79 1 8 2 5 16 92.0 8.0
8 80+ 0 10 0 7 17 81.0 19.0
And the chart looks something like this:
I created the chart with this code:
ax = df.plot(y=['deaths', 'surv'],
kind='barh',
figsize=(20,9),
rot=0,
title= '\n\n surv and deaths by age group')
ax.legend(['Deaths', 'Survivals']);
ax.set_xlabel('\nCount');
ax.set_ylabel('Age Group\n');
How could I add count and percentage labels to the grouped bars? I would like it to look something like this chart
Since nobody else has suggested anything, here is one way to approach it with your dataframe structure.
from matplotlib import pyplot as plt
import pandas as pd
df = pd.read_csv("test.txt", sep=r'\s+')
cat = ['death', 'surv']
ax = df.plot(y=cat,
kind='barh',
figsize=(20, 9),
rot=0,
title= '\n\n surv and deaths by age group')
# making space for the annotation
xmin, xmax = ax.get_xlim()
ax.set_xlim(xmin, 1.05 * xmax)
# connecting bar series with df columns
for cont, col in zip(ax.containers, cat):
    # connecting each bar of the series with its absolute and relative values
    for rect, vals, perc in zip(cont.patches, df[col], df[col+"_pct"]):
        # annotating each bar
        ax.annotate(f"{vals} ({perc:.1f}%)", (rect.get_width(), rect.get_y() + rect.get_height() / 2.),
                    ha='left', va='center', fontsize=10, color='black', xytext=(3, 0),
                    textcoords='offset points')
ax.set_yticklabels(df.age_group)
ax.set_xlabel('\nCount')
ax.set_ylabel('Age Group\n')
ax.legend(['Deaths', 'Survivals'], loc="lower right")
plt.show()
Sample output:
If the percentages per category add up, one could also calculate the percentages on the fly. This would then not necessitate that the percentage columns have exactly the same name structure. Another problem is that the font size of the annotation, the scaling to make space for labeling the largest bar, and the distance between bar and annotation are not interactive and may need fine-tuning.
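Calculating the percentages on the fly could look roughly like this. It is a sketch under assumptions: only the first three rows of the sample data are used, retyped by hand, and the percentage is taken as each category's share of death + surv within its age group, which may not be the definition behind the original *_pct columns:

```python
import pandas as pd

# First three rows of the sample data, retyped by hand.
df = pd.DataFrame({
    "age_group": ["0-9", "10-19", "20-29"],
    "death": [3, 1, 1],
    "surv": [2, 1, 3],
})
cat = ["death", "surv"]

# Share of each category within its age group, in percent (assumed definition).
row_totals = df[cat].sum(axis=1)
pct = df[cat].div(row_totals, axis=0) * 100

# One annotation string per bar, keyed by column name.
bar_labels = {
    col: [f"{count} ({share:.1f}%)" for count, share in zip(df[col], pct[col])]
    for col in cat
}
print(bar_labels["death"])
print(bar_labels["surv"])
```

In the annotation loop, zipping cont.patches with bar_labels[col] would then replace the lookup of df[col+"_pct"].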
However, I am not fond of this mixing of pandas and matplotlib plotting functions. I had cases where the axis definition by pandas interfered with matplotlib, and datetime objects ... well, let's not talk about that.
When comparing two different Y variables, there is no real way of knowing which chart type belongs to which Y-Axis. I need a legend that says which chart type belongs to which data set.
With help from this site itself I've been able to plot different categorized factors using different chart types, but as you can see there is no way to tell which chart type belongs to which factor/variable
This is the data table(tm_daily_df), and the current code
report_date shift UTL_R Head_Count
0 2019-03-17 A 0.669107 39
1 2019-03-18 A 0.602197 69
2 2019-03-19 A 0.568741 72
3 2019-03-20 A 0.552013 78
4 2019-03-21 A 0.585469 57
5 2019-03-22 A 0.635652 61
6 2019-03-23 A 0.602197 51
7 2019-03-17 1 0.828020 16
8 2019-03-17 2 0.585469 8
9 2019-03-17 3 0.526922 15
10 2019-03-18 1 0.618924 30
11 2019-03-18 2 0.610560 20
12 2019-03-18 3 0.577105 19
13 2019-03-19 1 0.610560 28
14 2019-03-19 2 0.602197 26
15 2019-03-19 3 0.468375 18
16 2019-03-20 1 0.543650 33
17 2019-03-20 2 0.552013 26
18 2019-03-20 3 0.552013 19
19 2019-03-21 1 0.577105 22
20 2019-03-21 2 0.585469 19
21 2019-03-21 3 0.602197 16
22 2019-03-22 1 0.593833 26
23 2019-03-22 2 0.685835 20
24 2019-03-22 3 0.635652 15
25 2019-03-23 1 0.577105 23
26 2019-03-23 2 0.627288 16
27 2019-03-23 3 0.602197 12
fig, ax = plt.subplots(figsize=(15,6))
g = sns.lineplot(x='report_date', y='UTL_R', data=tm_daily_df, ax=ax, hue = 'shift', legend = None,
marker='o', markersize=10)
ax2 = ax.twinx()
g = sns.barplot(x='report_date', y='Head_Count', data=tm_daily_df, ax=ax2, hue='shift',alpha=.5)
ax.set_title('Utilization Ratio vs HeadCount')
plt.show()
I want to have a legend that says which chart type belongs to which data set. In this case, there would be a secondary legend that shows a line next to the word "UTL_R" and a square (or whatever would represent a bar graph) next to the word "Head_Count".
I'm also open to any other ideas that can define the applied chart types. Keep in mind this graph is one of many from a huge set of variables, it's not a single instance.
Is there maybe a way I can just put an image/small table into the figure if this is not possible?
tl;dr at the bottom
I recently needed to implement two legends on a project as well. My code is something like:
import matplotlib as mpl
import matplotlib.pyplot as plt

def plot_my_data(ax, local_zerog, local_oneg, local_maxg):
    # local_zerog list looks like: [local_zerog_dcmdcl_names, local_zerog_dcmdcl_values, local_zerog_time2double_names, local_zerog_time2double_values]
    # the others are structured the same way as well
    mpl.rcParams["lines.markersize"] = 7
    dcmdcl = ax.scatter(local_zerog[0], local_zerog[1], label='Zero G', facecolors='none', edgecolors='b') #dcmdcl
    ax.scatter(local_oneg[0], local_oneg[1], label="One G", facecolors='none', edgecolors='g')
    ax.scatter(local_maxg[0], local_maxg[1], label="Max G", facecolors='none', edgecolors='r')
    ax.tick_params(axis="x", direction="in", top=False, labeltop=False, labelbottom=True)
    ax.tick_params(axis="y", direction="in", right=True)
    labels = ax.get_xticklabels()
    plt.setp(labels, rotation=90, horizontalalignment='center')
    legend1 = ax.legend(loc=1)  # first legend, built from the labels above
    time2double = ax.scatter(local_zerog[2], local_zerog[3], label='Zero G', marker='s', color='b') #time2double
    ax.scatter(local_oneg[2], local_oneg[3], label="One G", marker='s', color='g')
    ax.scatter(local_maxg[2], local_maxg[3], label="Max G", marker='s', color='r')
    ax.plot(local_oneg[0], [0 for _ in local_oneg[0]], color='k') # line at 0
    ax.plot(local_oneg[2], [0 for _ in local_oneg[2]], color='k')
    ax.legend([dcmdcl, time2double], ["dcmdcl [%]", "time2double [s]"], loc=2)  # second legend
    plt.gca().add_artist(legend1)  # add the first legend back
Where I had basically 6 sets of data: 3 for dcmdcl and 3 for time2double. Each has a different color/shape so basically I plotted all of one shape in the lines
dcmdcl = ax.scatter(local_zerog[0], local_zerog[1], label='Zero G', facecolors='none', edgecolors='b') #dcmdcl
ax.scatter(local_oneg[0], local_oneg[1], label="One G", facecolors='none', edgecolors='g')
ax.scatter(local_maxg[0], local_maxg[1], label="Max G", facecolors='none', edgecolors='r')
ax.tick_params(axis="x", direction="in", top=False, labeltop=False, labelbottom=True)
ax.tick_params(axis="y", direction="in", right=True)
labels = ax.get_xticklabels()
plt.setp(labels, rotation=90, horizontalalignment='center')
legend1 = ax.legend(loc=1)
where the last line generates a legend based off the various labels I've assigned. Now to differentiate between the shapes I took one dcmdcl and one time2double and made another legend. The relevant code is:
dcmdcl = ax.scatter(local_zerog[0], local_zerog[1], label='Zero G', facecolors='none', edgecolors='b') #dcmdcl
time2double = ax.scatter(local_zerog[2], local_zerog[3], label='Zero G', marker='s', color='b') #time2double
ax.legend([dcmdcl, time2double], ["dcmdcl [%]", "time2double [s]"], loc=2)
where I basically feed it two specific instances and tell it to create another legend from this information and place it at another location.
tl;dr
It looks like you already have the legend you want for one of the data sets, so now you basically need to run:
legend1 = ax.legend(['put a series of items you want to describe here'], ['put how you would like to title them (needs to be in same order as previous list)'], loc=2)
plt.gca().add_artist(legend1)
The order matters here: each call to ax.legend() replaces the legend currently attached to the axes, so the first legend has to be saved and added back afterwards. My order is:
1. plot some stuff
2. legend1 = ax.legend(loc=1) to make the first legend and keep a reference to it
3. plot more stuff
4. ax.legend([dcmdcl, time2double], ["dcmdcl [%]", "time2double [s]"], loc=2) (note this is not assigned to a variable this time); this replaces legend1 on the axes
5. plt.gca().add_artist(legend1) to add the saved first legend back via add_artist()
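Applied to the line-plus-bar case from the question, the same pattern might look like the sketch below. It uses plain matplotlib rather than seaborn, three hand-picked values for shift A, and proxy artists (Line2D and Patch) for the second legend; the colors, positions and the choice to put both legends on the twin axes are all assumptions made for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Patch

fig, ax = plt.subplots()
ax2 = ax.twinx()
days = [0, 1, 2]
ax.plot(days, [0.669, 0.602, 0.569], marker="o", color="C0")
ax2.bar(days, [39, 69, 72], alpha=0.5, color="C0", label="A")

# First legend: one entry per shift, built from the bar labels.
shift_legend = ax2.legend(loc="upper right", title="shift")

# Second legend: proxy artists that only describe the chart types.
handles = [Line2D([], [], color="k", marker="o"), Patch(facecolor="grey", alpha=0.5)]
type_legend = ax2.legend(handles, ["UTL_R", "Head_Count"], loc="upper left")

# The second legend() call replaced the first one, so add the first back.
ax2.add_artist(shift_legend)
fig.savefig("two_legends.png")
```

Because the proxy artists are not tied to any plotted data, the same two handles can be reused for every figure in a larger set.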
My code to generate each ax that is passed into my function above:
fig = plt.figure(figsize=(15, 15))
ax = fig.add_subplot(1, 3, 1)
zerog, oneg, maxg = build_plot_data(lower_mach)
plot_my_data(ax, zerog, oneg, maxg)
ax.set_title("Mach < .7")
In a seaborn horizontal barplot with two sets of bars, where one set is drawn on top of the other, how can the axes of each set be controlled independently? I want to adjust the thickness of the bars based on the frequency of occurrence of some entity.
Currently both the barplots are plotted with axes stored in ax1 and ax2. But I am able to adjust the thickness of the bar only for ax1 (lightblue in colour), but not for ax2 (dark blue. All bars have uniform thickness). I am not able to figure out how the assignment of ax2 needs to be done so as to adjust the bar thickness for the second set of bars as well.
How can varying length bars for both the barplots be obtained?
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid")
f, ax = plt.subplots(figsize=(15, 45))
crashes = plotie.groupby('target_wcount').mean()
sns.set_color_codes("pastel")
ax1 = sns.barplot(x="uno", y="indie", orient='h', data=crashes,
label="uno", color="b")
sns.set_color_codes("muted")
ax2 = sns.barplot(x="miss", y="indie", orient='h', data=crashes,
label="miss", color="b")
for bar, newwidth in zip(ax1.patches, summa):
    bar.set_height(3*newwidth)
for bar, newwidth in zip(ax2.patches, summa):
    bar.set_height(3*newwidth)
sns.despine(left=True, bottom=True)
f.savefig('filea')
Sample data:
output_wcount missing_count match_count uni indie uno miss
target_wcount
49 49.0 39.440000 9.560000 1.0 49 1.0 0.804898
48 48.0 36.730000 11.270000 1.0 48 1.0 0.765208
46 46.0 34.400000 11.600000 1.0 46 1.0 0.747826
45 45.0 33.940000 11.060000 1.0 45 1.0 0.754222
44 44.0 34.630000 9.370000 1.0 44 1.0 0.787045
43 43.0 31.420000 11.580000 1.0 43 1.0 0.730698
42 42.0 31.455000 10.545000 1.0 42 1.0 0.748929
41 41.0 29.630000 11.370000 1.0 41 1.0 0.722683
40 40.0 28.430000 11.570000 1.0 40 1.0 0.710750
39 39.0 27.935556 11.064444 1.0 39 1.0 0.716296
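This can be solved with the twinx function. Without an explicit ax= argument, both sns.barplot calls draw on the current axes, so ax1 and ax2 are the same Axes object and both loops walk the same patches list from the start. Creating a second axes with twinx and passing it via ax= keeps the two sets of bars separate: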
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid")
f, ax = plt.subplots(figsize=(15, 45))
crashes = plotie.groupby('target_wcount').mean()
sns.set_color_codes("pastel")
ax1 = sns.barplot(x="uno", y="indie", orient='h', data=crashes,
label="uno", color="b")
sns.set_color_codes("muted")
ax2 = ax.twinx()
sns.barplot(x="miss", y="indie", orient='h', data=crashes,
label="miss", color="b",ax=ax2)
for bar, newwidth in zip(ax1.patches, summa):
    bar.set_height(3*newwidth)
for bar, newwidth in zip(ax2.patches, summa):
    bar.set_height(3*newwidth)
sns.despine(left=True, bottom=True)
f.savefig('filea')
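A different route, which is not what the answer above does, is to stay on a single Axes and keep the two bar containers apart. The sketch below uses plain matplotlib barh instead of seaborn, and the values and the thickness list are hypothetical stand-ins for the real data and for summa:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
y = [0, 1, 2]
thickness = [0.2, 0.5, 0.8]  # hypothetical per-bar thicknesses

wide = ax.barh(y, [1.0, 1.0, 1.0], color="lightblue")
narrow = ax.barh(y, [0.80, 0.77, 0.75], color="navy")

# Both sets live on the same Axes, so ax.patches holds all six rectangles.
print(len(ax.patches))

# Resize each set through its own container instead of through ax.patches.
for container in (wide, narrow):
    for bar, new_height in zip(container, thickness):
        center = bar.get_y() + bar.get_height() / 2
        bar.set_height(new_height)
        bar.set_y(center - new_height / 2)  # keep the bar centred on its tick
fig.savefig("bar_thickness.png")
```

Moving the bar with set_y after set_height keeps it centred on its tick, which set_height alone does not do.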
Here is the sample data:
Datetime Price Data1 Data2 ShiftedPrice
0 2017-11-05 09:20:01.134 2123.0 12.23 34.12 300.0
1 2017-11-05 09:20:01.789 2133.0 32.43 45.62 330.0
2 2017-11-05 09:20:02.238 2423.0 35.43 55.62 NaN
3 2017-11-05 09:20:02.567 3423.0 65.43 56.62 NaN
4 2017-11-05 09:20:02.948 2463.0 45.43 58.62 NaN
I am trying to draw a plot between Datetime and Shiftedprice columns and horizontal lines for mean, confidence intervals of the ShiftedPrice column.
Have a look at the code below:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
df1 = df.dropna(subset=['ShiftedPrice'])
df1
fig = plt.figure(figsize=(20,10))
ax = fig.add_subplot(121)
ax = df1.plot(x='Datetime',y='ShiftedPrice')
# Plotting the mean
ax.axhline(y=df1['ShiftedPrice'].mean(), color='r', linestyle='--', lw=2)
plt.show()
# Plotting Confidence Intervals
ax.axhline(y=df1['ShiftedPrice'].mean() + 1.96*np.std(df1['ShiftedPrice'],ddof=1), color='g', linestyle=':', lw=2)
ax.axhline(y=df1['ShiftedPrice'].mean() - 1.96*np.std(df1['ShiftedPrice'],ddof=1), color='g', linestyle=':', lw=2)
plt.show()
My problem is that horizontal lines are not appearing. Instead, I get the following message
ax.axhline(y=df1['ShiftedPrice'].mean(), color='r', linestyle='--', lw=2)
Out[22]: <matplotlib.lines.Line2D at 0xccc5c18>