Matplotlib grouped data clustered bar chart in Python

I have a dictionary of values (drug) as follows:
{0: {0: 100.0, 1: 0.41249706379061035, 2: 5.144449764434768, 3: 31.078456871927678}, 1: {0: 100.0, 1: 0.6688801420346955, 2: 77.32360971119694, 3: 78.15132480853421}, 2: {0: 100.0, 1: 136.01949766418852, 2: 163.4967732211563, 3: 146.7726208999281}}
It contains 3 drug types, each mapped to the efficacy of that drug at 4 different concentrations.
I'm trying to make a clustered bar chart that compares the 3 drugs against each other.
Currently, my code is the following:
fig, ax = plt.subplots()
width = 0.35
ind = np.arange(3)
for x in range(3):
    ax.bar(ind + (width * x), drug[x].values(), width, bottom=0)
ax.set_title('Drug efficacy')
ax.set_xticks(ind + width / 2)
ax.set_xticklabels(list(string.ascii_uppercase[0:drugCount]))
ax.autoscale_view()
plt.show()
I have adapted the code from this guide, but am having multiple problems.
I think the main reason is that the data used in the example is such that values in one group correspond to the same colour rather than the same cluster.
How can I adapt this code such that it will plot the efficacy of each drug in the 4 different concentrations in isolation compared to the other drugs?

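IIUC you want to normalize your values by column, which can be done using sklearn:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
df = pd.DataFrame(drug)  # columns are drugs, rows are concentrations
scaler = preprocessing.MinMaxScaler()  # rescales each column to [0, 1]
df = pd.DataFrame(scaler.fit_transform(df))
df.T.plot(kind="bar")  # transpose so that each drug becomes one cluster of bars
plt.show()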
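
If normalization is not wanted, the question's own matplotlib loop can be adapted instead. The following is only a sketch: it assumes every drug has the same number of concentrations, and the tick labels "A", "B", "C" and the legend text are placeholders, not names from the question. The idea is to loop over concentrations rather than drugs, so that each call to ax.bar draws one bar in every drug's cluster.

```python
import matplotlib.pyplot as plt
import numpy as np

# Data from the question: drug index -> {concentration index -> efficacy}
drug = {
    0: {0: 100.0, 1: 0.41249706379061035, 2: 5.144449764434768, 3: 31.078456871927678},
    1: {0: 100.0, 1: 0.6688801420346955, 2: 77.32360971119694, 3: 78.15132480853421},
    2: {0: 100.0, 1: 136.01949766418852, 2: 163.4967732211563, 3: 146.7726208999281},
}

n_drugs = len(drug)
n_conc = len(drug[0])         # assumes all drugs share the same concentrations
width = 0.8 / n_conc          # bar width, so that each cluster spans 0.8 x-units
ind = np.arange(n_drugs)      # one cluster position per drug

fig, ax = plt.subplots()
for c in range(n_conc):
    # Heights of concentration c for every drug, in drug order
    heights = [drug[d][c] for d in range(n_drugs)]
    ax.bar(ind + c * width, heights, width, label=f"concentration {c}")

ax.set_title("Drug efficacy")
# Centre each tick under its cluster of bars
ax.set_xticks(ind + width * (n_conc - 1) / 2)
ax.set_xticklabels(["A", "B", "C"])  # placeholder drug labels
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Each colour then stands for one concentration and each cluster for one drug.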

Related

How to generate two legends for a scatterplot

I want to generate two different legends for these five points according to their size. The corresponding labels are written in my code, but so far I only get one, incorrect legend. How can I correct my code?
By the way, is there a better way to draw the connecting lines with the same logic as in my code? I have checked a lot of material and can only generate a picture like this, so I would appreciate help optimizing the code. Thanks in advance!
Edit:
I made changes with reference to "matplotlib: 2 different legends on same graph", but the legend I get is still incorrect. I want the legends to show only dots, not lines. I also tried some other methods, but all failed. Can you give me some suggestions?
import numpy as np
import matplotlib.pyplot as plt
X_new = np.random.randint(1,20,(3,2))
x1 = X_new[0,:]
x2 = X_new[1,:]
x3 = X_new[2,:]
p = np.random.randint(1,20,(1,2))
p1 = np.random.randint(1,20,(1,2))
# plt.style.use('ggplot')
color_map = {0: 'blue', 1:'green', 2: 'darkred', 3: 'black', 4:'red'}
legend1_label = {0: 'trn1', 1: 'trn2', 2: 'trn3'}
legend2_label = {0: 'p', 1: 'p1'}
plt.plot()
for idx, cl in enumerate(legend1_label):
    scatter = plt.scatter(x=X_new[idx, 0], y=X_new[idx, 1], c=color_map[cl], marker='.',
                          s=100)
plt.plot([x1[0],p[0,0]], [x1[1],p[0,1]], color='k',linestyle='--',linewidth=1)
plt.plot([x2[0],p[0,0]], [x2[1],p[0,1]], color='k',linestyle='--',linewidth=1)
plt.plot([x3[0],p[0,0]], [x3[1],p[0,1]], color='k',linestyle='--',linewidth=1)
plt.plot([x1[0],p1[0,0]], [x1[1],p1[0,1]], color='r',linestyle='--',linewidth=1)
plt.plot([x2[0],p1[0,0]], [x2[1],p1[0,1]], color='r',linestyle='--',linewidth=1)
plt.plot([x3[0],p1[0,0]], [x3[1],p1[0,1]], color='r',linestyle='--',linewidth=1)
plt.scatter(x=p[0,0], y=p[0,1],c='red',marker='.',s=200)
plt.scatter(x=p1[0,0], y=p1[0,1],c='black',marker='.',s=200)
legend1 = plt.legend(labels=["trn1","trn2","trn3"],loc=4, title="legend1")
plt.legend(labels=["p","p1"], title="legend2")
plt.gca().add_artist(legend1)
plt.show()
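
No answer to this question is included here. One possible approach, offered only as a sketch, is to keep the handles returned by scatter and pass them explicitly to each legend, so that the dashed lines never become legend entries. The seeded random data and the variable names (P, trn_handles, p_handles) are assumptions made for illustration; the colours follow the question's code.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, illustrative data only
X_new = rng.integers(1, 20, (3, 2))     # the three "trn" points
P = rng.integers(1, 20, (2, 2))         # the two points p and p1

fig, ax = plt.subplots()
trn_colors = ['blue', 'green', 'darkred']
p_colors = ['red', 'black']
line_colors = ['k', 'r']

# Keep the scatter handles so that each legend is built from dots only
trn_handles = [ax.scatter(x, y, c=col, marker='.', s=100, label=f'trn{i + 1}')
               for i, ((x, y), col) in enumerate(zip(X_new, trn_colors))]
p_handles = [ax.scatter(x, y, c=col, marker='.', s=200, label=name)
             for (x, y), col, name in zip(P, p_colors, ['p', 'p1'])]

# Dashed connectors from every trn point to p and to p1 (two nested loops
# replace the six separate plot calls); they are never passed to a legend
for (px, py), lc in zip(P, line_colors):
    for tx, ty in X_new:
        ax.plot([tx, px], [ty, py], color=lc, linestyle='--', linewidth=1)

legend1 = ax.legend(handles=trn_handles, loc='lower right', title='legend1')
ax.add_artist(legend1)   # keep the first legend when the second one is created
legend2 = ax.legend(handles=p_handles, loc='upper right', title='legend2')
```

Passing handles= explicitly is what avoids the mismatch: a bare labels=[...] list is matched against the artists on the axes in order, which here would include the dashed lines.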

Plotting Multiple Series of Lines on the Same Plot

I am attempting to graph battery cycling data similar to this. Each cycle's worth of data points should be one line on the graph. At first my code treated the whole dataframe as one continuous series. I then added a for loop that should plot one line for cycle 1, move on to cycle 2, and so on, but currently it fails and does not show any graph. Debugging seems to show an issue once it loops past cycle 1. The cycles do not all have the same number of data points.
EDIT: I now suspect that the headers of the data cause an issue when looping. I think making a dictionary would solve this.
df2 = pd.read_excel(r'C:\Users\####\- ##### - ####\2-7_7.xlsx',
                    sheet_name='record', usecols="A:N")
df2['Capacity(mAh)'] = df2['Capacity(mAh)'].apply(lambda x: x*1000) #A fix for unit error in the data
df2.set_index('Cycle ID',inplace = True) #Set the index to the Cycle number
for cycle in df2.index:
    chosen_cyclex = df2.loc[cycle, 'Capacity(mAh)']
    chosen_cycley = df2.loc[cycle,'Voltage(V)']
    plt.plot(chosen_cyclex.iloc[1],chosen_cycley.iloc[1])
    #print(chosen_cyclex[1],chosen_cycley[1])
plt.show()

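I ended up using this method, where the rows belonging to each cycle are selected.
cyclearray = df2.index.unique()  # assumption: cyclearray holds the distinct cycle IDs
for cycle in cyclearray:
    # the original line was cut off after "cycle"; passing it as the label is an assumption
    plt.plot(df2[df2.index == cycle]['Capacity(mAh)'], df2[df2.index == cycle]['Voltage(V)'], label=cycle)
For other battery testers who show up here, if you need to 'cut' the voltage curves up, use
plt.xlim([xmin, xmax])
plt.ylim([ymin + 0.1, ymax - 0.1])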

You need to specify an ax when plotting. Here are some examples:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# reproducible (but unimaginative) setup
n = 100
cycles = 4
df2 = pd.DataFrame({
    'ID': np.repeat(np.arange(cycles), n),
    'Capacity(mAh)': np.tile(np.arange(n), cycles),
    'Voltage(V)': (np.arange(n)**0.8 * np.linspace(5, 3, cycles)[:, None]).ravel(),
})
Example 1: using groupby.plot, then fiddle around to adjust labels
fig, ax = plt.subplots()
df2.groupby('ID').plot(x='Capacity(mAh)', y='Voltage(V)', ax=ax)
# now customize the labels
lines, labels = ax.get_legend_handles_labels()
for ith, line in zip('1st 2nd 3rd 4th'.split(), lines):
    line.set_label(f'{ith} discharge')
ax.legend()
Example 2: groupby used as an iterator
fig, ax = plt.subplots()
ld = {1: 'st', 2: 'nd', 3: 'rd'}
for cycle, g in df2.groupby('ID'):
    label = f'{cycle + 1}{ld.get(cycle + 1, "th")} discharge'
    g.plot(x='Capacity(mAh)', y='Voltage(V)', label=label, ax=ax)
Same plot as above.
Example 3: using ax.plot instead of df.plot or similar
fig, ax = plt.subplots()
ld = {1: 'st', 2: 'nd', 3: 'rd'}
for cycle, g in df2.groupby('ID'):
    label = f'{cycle + 1}{ld.get(cycle + 1, "th")} discharge'
    ax.plot(g['Capacity(mAh)'], g['Voltage(V)'], label=label)
ax.legend()

Creating box plots by looping multiple columns

I am trying to create multiple box plot charts for about 5 columns in my dataframe (df_summ):
columns = ['dimension_a','dimension_b']
for i in columns:
    sns.set(style = "ticks", palette = "pastel")
    box_plot = sns.boxplot(y="measure", x=i,
                           palette=["m","g"],
                           data=df_summ_1500_delta)
    sns.despine(offset=10, trim=True)
    medians = df_summ_1500_delta.groupby([i])['measure'].median()
    vertical_offset=df_summ_1500_delta['measure'].median()*-0.5
    for xtick in box_plot.get_xticks():
        box_plot.text(xtick,medians[xtick] + vertical_offset,medians[xtick],
                      horizontalalignment='center',size='small',color='blue',weight='semibold')
My only issue is that they aren't separated into different facets but are drawn on top of each other.
How can I put each one on its own chart, with 'dimension_a' on the x-axis of the first chart and 'dimension_b' on the x-axis of the second?

To draw two boxplots next to each other at each x-position, you can use a hue that distinguishes dimension_a from dimension_b. These two columns need to be transformed to "long form" with pd.melt().
Here is some example code starting from generated test data. Note that the order of both the x-values and the hue-values needs to be enforced to be sure of their exact positions. The individual box plots are distributed over a width of 0.8.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
df = pd.DataFrame({'dimension_a': np.random.choice(['hot', 'cold'], 100),
                   'dimension_b': np.random.choice(['hot', 'cold'], 100),
                   'measure': np.random.uniform(100, 500, 100)})
df.loc[df['dimension_a'] == 'hot', 'measure'] += 100
df.loc[df['dimension_a'] == 'cold', 'measure'] -= 100
x_order = ['hot', 'cold']
columns = ['dimension_a', 'dimension_b']
df1 = df.melt(value_vars=columns, var_name='dimension', value_name='value', id_vars='measure')
sns.set(style="ticks", palette="pastel")
ax = sns.boxplot(data=df1, x='value', order=x_order, y='measure',
                 hue='dimension', hue_order=columns, palette=["m", "g"], dodge=True)
ax.set_xlabel('')
sns.despine(offset=10, trim=True)
# one dodge offset per hue level (i.e. per column), spread over the total width of 0.8
for col, dodge_dist in zip(columns, np.linspace(-0.4, 0.4, 2 * len(columns) + 1)[1::2]):
    medians = df.groupby([col])['measure'].median()
    vertical_offset = df['measure'].median() * -0.5
    for x_ind, xtick in enumerate(x_order):
        ax.text(x_ind + dodge_dist, medians[xtick] + vertical_offset, f'{medians[xtick]:.2f}',
                horizontalalignment='center', size='small', color='blue', weight='semibold')
plt.show()
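
If the goal is instead one separate chart per column, as the question asks, a possible sketch is to create one subplot per column and pass it to seaborn through the ax parameter. The seeded test data, the figure size, the fill colour and the placement of each median label directly above its median line are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Generated test data with the same columns as above (not the asker's real data)
rng = np.random.default_rng(0)
df = pd.DataFrame({'dimension_a': rng.choice(['hot', 'cold'], 100),
                   'dimension_b': rng.choice(['hot', 'cold'], 100),
                   'measure': rng.uniform(100, 500, 100)})

columns = ['dimension_a', 'dimension_b']
x_order = ['hot', 'cold']

# One subplot per column; sharey keeps the two charts comparable
fig, axes = plt.subplots(1, len(columns), figsize=(10, 4), sharey=True)
for ax, col in zip(axes, columns):
    sns.boxplot(data=df, x=col, y='measure', order=x_order, color='lightblue', ax=ax)
    medians = df.groupby(col)['measure'].median()
    # Write each median just above its median line
    for x_ind, level in enumerate(x_order):
        ax.text(x_ind, medians[level], f'{medians[level]:.2f}',
                horizontalalignment='center', verticalalignment='bottom',
                size='small', color='blue', weight='semibold')
fig.tight_layout()
```

Because each boxplot call receives its own ax, the plots no longer land on the same axes, which is what caused the original loop to draw them on top of each other.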

python seaborn: customize line plot and scatterplot together (also legend)

I have the following dataframe:
df = pd.DataFrame({
    'id': {0: -3, 1: 2, 2: -3, 3: 1},
    'val': {0: 0.4, 1: 0.03, 2: 0.88, 3: 1.3},
    'indicator': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
    'count': {0: 40000, 1: 5779, 2: 3000, 3: 31090}
})
df
and I hope to get a plot like the following:
I know that with the following code I can get a close plot, but I also want the line width to vary with the "count" variable. When I tried adding size = 'count', I did not get a meaningful plot. As for the legend, I want only one legend, for "indicator", rather than two:
plt.figure()
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df)
sns.scatterplot(x = 'id', y = 'val', hue = 'indicator', size = 'count', data = df)

To answer the second part of your question - you can disable the lineplot legend like so:
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df, legend=False)
This will leave you with two legend groups - one for colours and one for sizes. This is the easiest way, but you can also tinker with plt.legend() and build your own from scratch.
As for making the lines vary their thickness dynamically from one point to another, I don't think you can do it using seaborn. For something like that you'd need a lower-level library such as bokeh, or you could use matplotlib directly to draw connecting lines between the markers, adjusting for their varying size.
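
As a rough sketch of the matplotlib route, each connecting line can be cut into many short pieces and drawn as a LineCollection whose linewidths are interpolated from "count". The number of pieces, the scaling of widths and marker sizes, and the colours are arbitrary choices made for illustration, not something taken from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

# Same data as in the question
df = pd.DataFrame({
    'id': [-3, 2, -3, 1],
    'val': [0.4, 0.03, 0.88, 1.3],
    'indicator': ['A', 'A', 'B', 'B'],
    'count': [40000, 5779, 3000, 31090],
})

fig, ax = plt.subplots()
colors = {'A': 'tab:blue', 'B': 'tab:orange'}   # arbitrary colour choice
steps = 50                                      # short pieces per connecting line
max_count = df['count'].max()

for name, g in df.groupby('indicator'):
    g = g.sort_values('id')
    idx = np.arange(len(g))
    # Interpolate x, y and count between consecutive points so the width tapers
    t = np.linspace(0, len(g) - 1, (len(g) - 1) * steps + 1)
    x = np.interp(t, idx, g['id'])
    y = np.interp(t, idx, g['val'])
    w = np.interp(t, idx, g['count']) / max_count * 10   # widths up to 10 pt
    pts = np.column_stack([x, y])
    segments = np.stack([pts[:-1], pts[1:]], axis=1)     # shape (n_pieces, 2, 2)
    ax.add_collection(LineCollection(segments, linewidths=w[:-1],
                                     colors=colors[name], label=name))
    # Markers sized by count; left unlabeled so the legend has one entry per indicator
    ax.scatter(g['id'], g['val'], s=g['count'] / max_count * 300, color=colors[name])

ax.autoscale_view()
ax.legend(title='indicator')
```

Only the line collections carry labels, so this also gives a single legend for "indicator".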

Plotting multiple bar charts

I want to create a bar chart with a focus on two cities. My data set is similar to this:
city     rate        Bedrooms
Houston  132.768382  0
Dallas   151.981043  1
Dallas   112.897727  3
Houston  132.332665  1
Houston  232.611185  2
Dallas   93.530662   4
I've split it into two dataframes, one for Dallas and one for Houston, and plotted each like this:
dal.groupby('Bedrooms')['rate'].mean().plot(kind='bar')
and
hou.groupby('Bedrooms')['rate'].mean().plot(kind='bar')
How would I go about making a single bar chart that shows the average rate of listings by number of bedrooms, with the cities as labels? Something similar to the image I found in "Python matplotlib multiple bars".
I'd appreciate any help!

Seaborn is your friend in this case. First create a grouped dataframe with the average rate per city and number of bedrooms, then plot it with seaborn:
import seaborn as sns
# df is the full dataframe containing both cities
df_group = df.groupby(['city', 'Bedrooms']).agg({'rate': 'mean'}).reset_index()
sns.barplot(data=df_group, x='Bedrooms', y='rate', hue='city')
With the data above, this produces a grouped bar chart with one bar per city at each bedroom count.

Here's a basic way to do it in matplotlib:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(5) # if the max. number of bedrooms is 4
# reindex so both series have one value per bedroom count (0 where a city has no listings)
data_dallas = dal.groupby('Bedrooms')['rate'].mean().reindex(x, fill_value=0)
data_houston = hou.groupby('Bedrooms')['rate'].mean().reindex(x, fill_value=0)
fig, ax = plt.subplots()
width = 0.35 # width of one bar
dal_bars = ax.bar(x, data_dallas, width)
hou_bars = ax.bar(x + width, data_houston, width)
ax.set_xticks(x + width / 2)
ax.set_xticklabels(x)
ax.legend((dal_bars[0], hou_bars[0]), ('Dallas', 'Houston'))
plt.show()

There is an easy solution using only one line of pandas (as long as you rearrange the data first) or using plotly.
Data
import pandas as pd
df = pd.DataFrame({'city': {0: 'Houston',
                            1: 'Dallas',
                            2: 'Dallas',
                            3: 'Houston',
                            4: 'Houston',
                            5: 'Dallas'},
                   'rate': {0: 132.768382,
                            1: 151.981043,
                            2: 112.897727,
                            3: 132.332665,
                            4: 232.611185,
                            5: 93.530662},
                   'Bedrooms': {0: 0, 1: 1, 2: 3, 3: 1, 4: 2, 5: 4}})
# groupby
df = df.groupby(["city", "Bedrooms"])["rate"].mean().reset_index()
Pandas - Matplotlib
With pivot_table we can rearrange our data
pv = pd.pivot_table(df,
                    index="Bedrooms",
                    columns="city",
                    values="rate")
city          Dallas     Houston
Bedrooms
0                NaN  132.768382
1         151.981043  132.332665
2                NaN  232.611185
3         112.897727         NaN
4          93.530662         NaN
And then plot in one line only.
pv.plot(kind="bar");
Using Plotly
import plotly.express as px
px.bar(df, x="Bedrooms", y="rate", color="city",barmode='group')

You can read more here: https://pythonspot.com/matplotlib-bar-chart/
import numpy as np
import matplotlib.pyplot as plt
# data to plot
bedrooms = [0, 1, 2, 3]  # whatever the bedroom counts in your dataset might be
n_groups = len(bedrooms)
# average rate per bedroom count for each city (0 where there are no listings);
# dal and hou are the per-city dataframes from the question
mean_rates_houston = hou.groupby('Bedrooms')['rate'].mean().reindex(bedrooms, fill_value=0)
mean_rates_dallas = dal.groupby('Bedrooms')['rate'].mean().reindex(bedrooms, fill_value=0)
# create plot
fig, ax = plt.subplots()
index = np.arange(n_groups)
bar_width = 0.35
opacity = 0.8
rects1 = plt.bar(index, mean_rates_dallas, bar_width,
                 alpha=opacity,
                 color='b',
                 label='Dallas')
rects2 = plt.bar(index + bar_width, mean_rates_houston, bar_width,
                 alpha=opacity,
                 color='g',
                 label='Houston')
plt.xlabel('Bedrooms')
plt.ylabel('Rates')
plt.title('Bedroom Rates per City')
# centre each tick between the two bars of its group
plt.xticks(index + bar_width / 2, [str(b) for b in bedrooms])
plt.legend()
plt.tight_layout()
plt.show()
