Seaborn: Hue dependent on two values

I have the following two dataframes that I would like to plot together. The first one (data) contains the complete data of different groups for several repeated experiments (replicates), with the values for the individual cells within each experiment. The second one (avgs) holds the mean of each replicate experiment for every group. I basically want to plot my data in the way suggested here.
data.head()
cell replicate value group
0 1 1 0.029723 GROUP_A
1 1 2 0.019136 GROUP_A
2 2 2 0.020216 GROUP_A
3 3 1 0.032020 GROUP_B
4 3 2 0.044815 GROUP_B
avgs.head()
replicate value group
0 1 0.019709 GROUP_A
1 2 0.018937 GROUP_A
2 1 0.358437 GROUP_B
3 2 0.269602 GROUP_B
4 3 0.303252 GROUP_B
My aim is to achieve either of the plots shown in B or C, where the hue depends on both the group and the replicate.
import matplotlib.pyplot as plt
import seaborn as sns
sns.swarmplot(x="group", y="value", data=data, hue="replicate")
sns.swarmplot(x="group", y="value", data=avgs,size=8,hue="replicate", edgecolor="k", linewidth=2)
This basically gives me the plot shown in A, with the hue corresponding to the replicate.
Is there a way to do this with a different color palette for each group, so that each group has its own color and each replicate has a different shade of that color (example B, made in Affinity Designer)?
An alternative that would work for me is to plot the single-cell values of data with a grey palette. But how can I then add the replicate means from avgs so that each group has a different color and each replicate mean has the corresponding shade of that color (example C)?
Is it possible to pass a palette dictionary to seaborn/matplotlib, e.g. something like:
gray = sns.dark_palette("gray", n_colors=5)
red = sns.dark_palette("red", n_colors=5)
blue = sns.dark_palette("blue", n_colors=5)
my_palette={"GROUP_A": gray, "GROUP_B": red, "GROUP_C": blue}
Thanks!

The groups can be plotted separately, each with its own palette. To make sure the x-positions are respected, the order= keyword needs to be set with all the desired x-labels.
Seaborn automatically adds legend entries for each call, so the legend can get very large. You can either suppress the legend, or limit it to the first few entries.
from matplotlib import pyplot as plt
import matplotlib
import numpy as np
import pandas as pd
import seaborn as sns
N = 500
data = pd.DataFrame({'replicate': np.random.choice(range(1, 4), N),
                     'value': 2 + np.random.uniform(-0.5, 0.5, (N, 5)).sum(axis=1),
                     'group': np.random.choice([f'GROUP_{g}' for g in 'ABCD'], N)})
groups = np.unique(data.group)
for g in groups:
    data.loc[data.group == g, 'value'] += np.random.uniform(0, 3)
avgs = data.groupby(['replicate', 'group']).mean()
avgs.reset_index(inplace=True)
my_palette = {"GROUP_A": 'Greys', "GROUP_B": 'Reds', "GROUP_C": 'Blues', "GROUP_D": 'Greens'}
for ind, g in enumerate(groups):
    sns.swarmplot(x="group", y="value", data=data[data.group == g], order=groups,
                  palette=my_palette[g], hue="replicate")
    sns.swarmplot(x="group", y="value", data=avgs[avgs.group == g], order=groups,
                  size=8, palette=my_palette[g], hue="replicate", edgecolor="k", linewidth=2)
# plt.gca().legend_.remove()  # optionally suppress the legend
handles, labels = plt.gca().get_legend_handles_labels()
plt.legend(handles=handles[:3], title='replicate')
plt.tight_layout()
plt.show()
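As a possible alternative to one call per group, the group and replicate can be combined into a single hue column, with a palette dictionary keyed by that combined value. This is only a sketch under assumptions: the column name group_rep, the base_cmaps mapping, and the synthetic data are made up for illustration, and only two groups are used to keep it short.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the question's `data` frame (values are made up)
rng = np.random.default_rng(0)
N = 120
data = pd.DataFrame({
    "replicate": rng.integers(1, 4, N),
    "group": rng.choice(["GROUP_A", "GROUP_B"], N),
    "value": rng.normal(2, 0.3, N),
})

# Hypothetical mapping from group to a sequential colormap name
base_cmaps = {"GROUP_A": "Greys", "GROUP_B": "Reds"}

# One hue level per (group, replicate) pair, e.g. "GROUP_A_1"
data["group_rep"] = data["group"] + "_" + data["replicate"].astype(str)
replicates = sorted(data["replicate"].unique())

# Build a flat palette dict: each group gets shades of its own colormap
palette = {}
for g, cmap_name in base_cmaps.items():
    shades = sns.color_palette(cmap_name, n_colors=len(replicates))
    for r, shade in zip(replicates, shades):
        palette[f"{g}_{r}"] = shade

ax = sns.swarmplot(data=data, x="group", y="value", hue="group_rep",
                   palette=palette, order=list(base_cmaps))
if ax.get_legend() is not None:
    ax.get_legend().remove()  # the combined legend is long; drop it here
plt.tight_layout()
```

The replicate means could be overlaid the same way by adding the same group_rep column to avgs and reusing palette in a second swarmplot call.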

Related

plotting area plot as a subplot [duplicate]

I have a few Pandas DataFrames sharing the same value scale, but having different columns and indices. When invoking df.plot(), I get separate plot images. What I really want is to have them all in the same plot as subplots, but I'm unfortunately failing to come up with a solution and would highly appreciate some help.

You can manually create the subplots with matplotlib, and then plot the dataframes on a specific subplot using the ax keyword. For example for 4 subplots (2x2):
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=2, ncols=2)
df1.plot(ax=axes[0,0])
df2.plot(ax=axes[0,1])
...
Here axes is an array which holds the different subplot axes, and you can access one just by indexing axes.
If you want a shared x-axis, then you can provide sharex=True to plt.subplots.
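A minimal runnable version of this idea might look as follows. The two dataframes are hypothetical stand-ins with different columns; note that with a single row of subplots, axes is a 1D array, so it is indexed with one index rather than two.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical frames with different columns but a common value scale
df1 = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])
df2 = pd.DataFrame(rng.random((10, 3)), columns=["C", "D", "E"])

# One row, two columns; share both axes because the value scale is the same
fig, axes = plt.subplots(nrows=1, ncols=2, sharex=True, sharey=True, figsize=(8, 3))
df1.plot(ax=axes[0], title="df1")
df2.plot(ax=axes[1], title="df2")
fig.tight_layout()
```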

The documentation has examples demonstrating joris's answer. Also per the documentation, you can set subplots=True and layout=(rows, cols) within the pandas plot function:
df.plot(subplots=True, layout=(1,2))
You could also use fig.add_subplot() which takes subplot grid parameters such as 221, 222, 223, 224, etc. as described in the post here. Nice examples of plot on pandas data frame, including subplots, can be seen in this ipython notebook.
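For the subplots=True route, a small sketch (with a made-up two-column dataframe) shows what the call returns: one Axes per column, arranged in an array shaped like layout.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])  # hypothetical data

# One subplot per column, laid out as 1 row x 2 columns
axes = df.plot(subplots=True, layout=(1, 2), figsize=(8, 3), sharey=True)

# Each element is a regular matplotlib Axes and can be customised further
axes[0, 0].set_title("column A")
axes[0, 1].set_title("column B")
```

This splits the columns of a single dataframe across subplots; for several separate dataframes, the ax keyword approach is the one that applies.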

You can plot multiple pandas data frames as subplots with matplotlib by putting all the data frames in a list and then looping over it to fill the subplots.
Working code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# dataframe sample data
df1 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df2 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df3 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df4 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df5 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df6 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
#define number of rows and columns for subplots
nrow=3
ncol=2
# make a list of all dataframes
df_list = [df1 ,df2, df3, df4, df5, df6]
fig, axes = plt.subplots(nrow, ncol)
# plot counter
count=0
for r in range(nrow):
    for c in range(ncol):
        df_list[count].plot(ax=axes[r,c])
        count+=1
Using this code you can plot subplots in any configuration. You need to define the number of rows nrow and the number of columns ncol, and make a list df_list of the data frames you want to plot.
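The counter-based loop above assumes the list fills the grid exactly. A variation, sketched here with five made-up dataframes on a 3x2 grid, iterates over axes.flat with zip and hides any axes left over.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Five hypothetical dataframes: one fewer than the 3x2 grid holds
df_list = [pd.DataFrame(rng.random((10, 2)) * 100, columns=["A", "B"]) for _ in range(5)]

nrow, ncol = 3, 2
fig, axes = plt.subplots(nrow, ncol, figsize=(8, 8))

# zip stops at the shorter sequence, so no manual counter is needed
for ax, df in zip(axes.flat, df_list):
    df.plot(ax=ax)

# Hide the axes that did not receive a dataframe
for ax in axes.flat[len(df_list):]:
    ax.set_visible(False)

fig.tight_layout()
```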

You can use the familiar Matplotlib style calling a figure and subplot, but you simply need to specify the current axis using plt.gca(). An example:
plt.figure(1)
plt.subplot(2,2,1)
df.A.plot() #no need to specify for first axis
plt.subplot(2,2,2)
df.B.plot(ax=plt.gca())
plt.subplot(2,2,3)
df.C.plot(ax=plt.gca())
etc...

You can use this:
fig = plt.figure()
ax = fig.add_subplot(221)
plt.plot(x,y)
ax = fig.add_subplot(222)
plt.plot(x,z)
...
plt.show()

You may not need to use Pandas at all. Here's a matplotlib plot of cat frequencies:
x = np.linspace(0, 2*np.pi, 400)
y = np.sin(x**2)
f, axes = plt.subplots(2, 1)
for c, i in enumerate(axes):
    axes[c].plot(x, y)
    axes[c].set_title('cats')
plt.tight_layout()

Option 1: Create subplots from a dictionary of dataframes with long (tidy) data
Assumptions:
There is a dictionary of multiple dataframes of tidy data that are either:
Created by reading in from files
Created by separating a single dataframe into multiple dataframes
The categories, cat, may be overlapping, but all dataframes don't necessarily contain all values of cat
hue='cat'
This example uses a dict of dataframes, but a list of dataframes would be similar.
If the dataframes are wide, use pandas.DataFrame.melt to convert them to long form.
Because dataframes are being iterated through, there's no guarantee that colors will be mapped the same for each plot
A custom color map needs to be created from the unique 'cat' values for all the dataframes
Since the colors will be the same, place one legend to the side of the plots, instead of a legend in every plot
Tested in python 3.10, pandas 1.4.3, matplotlib 3.5.1, seaborn 0.11.2
Imports and Test Data
import pandas as pd
import numpy as np # used for random data
import matplotlib.pyplot as plt
from matplotlib.patches import Patch # for custom legend - square patches
from matplotlib.lines import Line2D # for custom legend - round markers
import seaborn as sns
import math  # math.ceil determines the number of subplot rows
# synthetic data
df_dict = dict()
for i in range(1, 7):
    np.random.seed(i)  # for repeatable sample data
    data_length = 100
    data = {'cat': np.random.choice(['A', 'B', 'C'], size=data_length),
            'x': np.random.rand(data_length), 'y': np.random.rand(data_length)}
    df_dict[i] = pd.DataFrame(data)
# display(df_dict[1].head())
cat x y
0 B 0.944595 0.606329
1 A 0.586555 0.568851
2 A 0.903402 0.317362
3 B 0.137475 0.988616
4 B 0.139276 0.579745
# display(df_dict[6].tail())
cat x y
95 B 0.881222 0.263168
96 A 0.193668 0.636758
97 A 0.824001 0.638832
98 C 0.323998 0.505060
99 C 0.693124 0.737582
Create color mappings and plot
# create color mapping based on all unique values of cat
unique_cat = {cat for v in df_dict.values() for cat in v.cat.unique()} # get unique cats
colors = sns.color_palette('tab10', n_colors=len(unique_cat)) # get a number of colors
cmap = dict(zip(unique_cat, colors)) # zip values to colors
col_nums = 3 # how many plots per row
row_nums = math.ceil(len(df_dict) / col_nums) # how many rows of plots
# create the figure and axes
fig, axes = plt.subplots(row_nums, col_nums, figsize=(9, 6), sharex=True, sharey=True)
# convert to 1D array for easy iteration
axes = axes.flat
# iterate through dictionary and plot
for ax, (k, v) in zip(axes, df_dict.items()):
    sns.scatterplot(data=v, x='x', y='y', hue='cat', palette=cmap, ax=ax)
    sns.despine(top=True, right=True)
    ax.legend_.remove()  # remove the individual plot legends
    ax.set_title(f'dataset = {k}', fontsize=11)
fig.tight_layout()
# create legend from cmap
# patches = [Patch(color=v, label=k) for k, v in cmap.items()] # square patches
patches = [Line2D([0], [0], marker='o', color='w', markerfacecolor=v, label=k, markersize=8) for k, v in cmap.items()] # round markers
# place legend outside of plot; change the right bbox value to move the legend up or down
plt.legend(title='cat', handles=patches, bbox_to_anchor=(1.06, 1.2), loc='center left', borderaxespad=0, frameon=False)
plt.show()
Option 2: Create subplots from a single dataframe with multiple separate datasets
The dataframes must be in a long form with the same column names.
This option uses pd.concat to combine multiple dataframes into a single dataframe, and .assign to add a new column.
See Import multiple csv files into pandas and concatenate into one DataFrame for creating a single dataframe from a list of files.
This option is easier because it doesn't require manually mapping colors to 'cat'
Combine DataFrames
# using df_dict, with dataframes as values, from the top
# combine all the dataframes in df_dict to a single dataframe with an identifier column
df = pd.concat((v.assign(dataset=k) for k, v in df_dict.items()), ignore_index=True)
# display(df.head())
cat x y dataset
0 B 0.944595 0.606329 1
1 A 0.586555 0.568851 1
2 A 0.903402 0.317362 1
3 B 0.137475 0.988616 1
4 B 0.139276 0.579745 1
# display(df.tail())
cat x y dataset
595 B 0.881222 0.263168 6
596 A 0.193668 0.636758 6
597 A 0.824001 0.638832 6
598 C 0.323998 0.505060 6
599 C 0.693124 0.737582 6
Plot a FacetGrid with seaborn.relplot
sns.relplot(kind='scatter', data=df, x='x', y='y', hue='cat', col='dataset', col_wrap=3, height=3)
Both options create the same result, but it's less complicated to combine all the dataframes and draw a figure-level plot with sns.relplot.

Building on @joris's response above, if you have already established a reference to the subplot, you can use that reference as well. For example,
ax1 = plt.subplot2grid((50,100), (0, 0), colspan=20, rowspan=10)
...
df.plot.barh(ax=ax1, stacked=True)

Here is a working pandas subplot example, where modes holds the column names of the dataframe.
dpi=200
figure_size=(20, 10)
fig, ax = plt.subplots(len(modes), 1, sharex="all", sharey="all", dpi=dpi)
for i in range(len(modes)):
    ax[i] = pivot_df.loc[:, modes[i]].plot.bar(figsize=(figure_size[0], figure_size[1]*len(modes)),
                                               ax=ax[i], title=modes[i], color=my_colors[i])
    ax[i].legend()
fig.suptitle(name)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2,2)
df = pd.DataFrame({'A':np.random.randint(1,100,10),
                   'B': np.random.randint(100,1000,10),
                   'C':np.random.randint(100,200,10)})
# plot the same dataframe on every subplot
for ax in axes.flatten():
    df.plot(ax=ax)

Python: Plotting comma separated values within two different columns of a single row (Pandas)

Say I have a dataframe structured like so:
Name x y
Joe 0,1,5 0,3,8
Sue 0,2,8 1,9,5
...
Harold 0,5,6 0,7,2
I'd like to plot the values in x and y as a line plot, one per row. In reality, there are many x and y values, but there is always one x value for every y value in these columns. The name of the plot would be the value in "name".
I've tried to do this by first converting x and y to lists in their own separate columns like so:
df['xval'] = df['x'].str.split(',')
df['yval'] = df['y'].str.split(',')
And then passing them to seaborn:
ax = sns.lineplot(x=df['xval'], y=df['yval'], data=df)
However, this does not work because 1) I receive an error, which I presume is due to attempting to pass a list from a dataframe:
TypeError: unhashable type: 'list'
And 2) I cannot specify the value of df['name'] for the specific line plot. What's the best way to go about solving this problem?

Data and imports:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
df = pd.DataFrame({
'name': ['joe', 'sue', 'mike'],
'x': ['0,1,5', '0,2,8', '0,4'],
'y': ['0,3,8', '1,9,5', '1,6']
})
We should convert df into a usable format for plotting, which makes all plotting easier. We can take advantage of the fact that x and y have a 1-to-1 relationship. Notice I've added a third name with 2 xy values instead of 3, to show that this method works for varying numbers of x and y per name, as long as each row has equal numbers of x and y values.
Creating the plot_df:
# Grab Name Column to Start Plot DF with
plot_df = df.loc[:, ['name']]
# Split X column
plot_df['x'] = df['x'].str.split(',')
# Explode X into Rows
plot_df = plot_df.explode('x').reset_index(drop=True)
# Split and Series Explode y in one step
# This works IF AND ONLY IF a 1-to-1 relationship for x and y
plot_df['y'] = df['y'].str.split(',').explode().reset_index(drop=True)
# These need to be numeric to plot correctly
# (assign whole columns so the dtype actually changes from object to int)
plot_df[['x', 'y']] = plot_df[['x', 'y']].astype(int)
plot_df:
name x y
0 joe 0 0
1 joe 1 3
2 joe 5 8
3 sue 0 1
4 sue 2 9
5 sue 8 5
6 mike 0 1
7 mike 4 6
References to the methods used in creating plot_df:
DataFrame.loc to subset the dataframe
Series.str.split to split the comma separated values into a list
DataFrame.explode to upscale the DataFrame based on the iterable in x
DataFrame.reset_index to make index unique again after exploding
Series.explode to upscale the lists in the Series y.
Series.reset_index to make index unique again after exploding
DataFrame.astype since the values are initially strings just splitting and exploding is not enough. Will need to convert to a numeric type for them to plot correctly
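As a possible shortcut, newer pandas versions (1.3 and later, to my knowledge) accept a list of columns in DataFrame.explode, which explodes x and y together and raises an error if a row has mismatched element counts. The sketch below rebuilds the same plot_df under that assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["joe", "sue", "mike"],
    "x": ["0,1,5", "0,2,8", "0,4"],
    "y": ["0,3,8", "1,9,5", "1,6"],
})

plot_df = (
    df.assign(x=df["x"].str.split(","), y=df["y"].str.split(","))  # strings -> lists
      .explode(["x", "y"], ignore_index=True)                      # one row per (x, y) pair
      .astype({"x": int, "y": int})                                # numeric for plotting
)
print(plot_df)
```

Because both columns are exploded in one step, this does not rely on the two exploded indexes lining up after separate reset_index calls.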
Plotting (Option 1)
# Plot with hue set to name.
sns.lineplot(data=plot_df, x='x', y='y', hue='name')
plt.show()
References for plotting separate lines:
sns.lineplot to plot. Note the hue argument to create separate lines based on name.
pyplot.show to display.
Plotting (Option 2.a) Subplots:
sns.relplot(data=plot_df, x='x', y='y', col='name', kind='line')
plt.tight_layout()
plt.show()
Plotting (Option 2.b) Subplots:
# Use Grouper From plot_df
grouper = plot_df.groupby('name')
# Create Subplots based on the number of groups (ngroups)
fig, axes = plt.subplots(nrows=grouper.ngroups)
# Iterate over axes and groups
for ax, (grp_name, grp) in zip(axes, grouper):
    # Plot from each grp DataFrame on ax from axes
    sns.lineplot(data=grp, x='x', y='y', ax=ax, label=grp_name)
plt.show()
References for plotting subplots:
2.a
relplot the row or col parameter can be used to create subplots in a similar way to how hue creates multiple lines. This will return a seaborn.FacetGrid so post processing will be different than lineplot which returns matplotlib.axes.Axes
2.b
groupby to create iterable that can be used to plot subplots.
pyplot.subplots to create subplots to plot on.
groupby.ngroups to count number of groups.
zip to iterate over axes and groups simultaneously.
sns.lineplot to plot. Note label is needed to have legends. grp_name contains the current key that is common in the current grp DataFrame.
pyplot.show to display.
Plotting option 3 (separate plots):
# Plot from each grp DataFrame in its own plot
for grp_name, grp in plot_df.groupby('name'):
    fig, ax = plt.subplots()
    sns.lineplot(data=grp, x='x', y='y', ax=ax)
    ax.set_title(grp_name)
    fig.show()
joe plot
mike plot
sue plot
References for plotting separate plots:
groupby to create iterable that can be used to plot each name separately.
pyplot.subplots to create separate plot to plot on.
sns.lineplot to plot. Here ax.set_title labels each plot with grp_name, the key shared by all rows of the current grp DataFrame.
Figure.show to display.

From what I understood this is what you want.
df = pd.DataFrame()
df['name'] = ['joe', 'sue']
df['x'] = ['0,1,5', '0,2,8']
df['y'] = ['0,3,8', '1,9,5']
df['newx'] = df['x'].str.split(',')
df['newy'] = df['y'].str.split(',')
for i in range(len(df)):
    sns.lineplot(x=df.loc[i, 'newx'], y=df.loc[i, 'newy'])
plt.legend(df['name'])

How to make stackedbarplot with percent description and identical height of columns divided by target in Python Pandas?

I have Data Frame like below (for reference):
target |product
---------|--------
1 |EHZ
1 |GBK
0 |EHZ
0 |AKP
1 |AKP
So I have a target variable "target" and a nominal variable "product", and I would like to plot a graph like the one below based on my df. How can I do that? I only know that it is a stacked bar chart, and that:
each column needs a percentage label for both 0 and 1
the columns have identical height and are divided into 1 and 0
Everything should be in Python Pandas / Matplotlib. Could you show me example code that makes an identical plot based on my data frame?
I used code created by Rob Raymond like below:
fig, ax = plt.subplots(figsize=(10,3))
# prepare dataframe for plotting
dfp = pd.crosstab(index=df["product"], columns=df["target"]).apply(lambda r: r/r.sum(), axis=1)
# simple stacked plot
ax = dfp.plot(kind="barh", stacked=True, ax=ax)
for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{w*100:.0f}%' if (w := v.get_width()) > 0 else '' for v in c ]
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center')
ax.set_xlabel("procent")
ax.set_title("tytul")
and I get an error like the one below:

From comments
first generate percent totals for each product
then it's a simple case of a horizontal stacked bar
labels in bars stack bar plot in matplotlib and add label to each section
use matplotlib API to set any additional titles and labels as desired
import io
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(io.StringIO("""target |product
1 |EHZ
1 |GBK
0 |EHZ
0 |AKP
1 |AKP"""), sep=r"\s+\|", engine="python")
fig, ax = plt.subplots(figsize=(10,3))
# prepare dataframe for plotting
dfp = pd.crosstab(index=df["product"], columns=df["target"]).apply(lambda r: r/r.sum(), axis=1)
# simple stacked plot
ax = dfp.plot(kind="barh", stacked=True, ax=ax)
for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{w*100:.0f}%' if (w := v.get_width()) > 0 else '' for v in c ]
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center')
ax.set_xlabel("procent")
ax.set_title("tytul")
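The "percent totals for each product" step can also be written with the normalize argument of pd.crosstab instead of the apply/lambda. This sketch only covers the data preparation, using the five rows from the question, and prints the resulting table.

```python
import pandas as pd

df = pd.DataFrame({
    "target": [1, 1, 0, 0, 1],
    "product": ["EHZ", "GBK", "EHZ", "AKP", "AKP"],
})

# normalize="index" divides each row by its row total, so every product sums to 1
dfp = pd.crosstab(index=df["product"], columns=df["target"], normalize="index")
print(dfp)
```

Because every row sums to 1, the stacked horizontal bars drawn from dfp all have identical length, which is what gives the equal-height columns asked for in the question.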

How to plot a simple dataframe with different variables with different colors

Good morning everyone, my problem is the graphic representation of a dataframe. My data frame is similar to this one shown below
Country Year average_man average_woman
0 I1 2015 9.500000 3.663500
1 I1 2016 8.000000 4.810500
2 I2 2015 12.181818 3.514545
3 I2 2016 14.727273 2.815000
I would like to represent all the information reported in a single graph but I wouldn't know how to assign more variables to the same axis.
Now I have tried to plot average_men and country but I cannot assign a different color to each point for each year.
For example blue for 2015 and red for 2016.
My plot:
My code:
plt.scatter(df['average_man'], df['average_woman'], cmap= df['Year'])
plt.show()
Expected output

You can try this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Method 1, to plot individual columns
# a scatter plot
df.plot(kind='scatter',x='average_man',y='average_woman',color='red')
plt.show()
# Method 2, To plot all columns separately
df.plot(subplots=True)
plt.tight_layout()
plt.show()
#Method 3, preferred
data = np.random.rand(10,4)
data[:,0]= np.arange(10)
df = pd.DataFrame(data, columns=["X", "A", "B", "C"])
axis = df.plot(x="X", y="A", kind="bar")
df.plot(x="X", y="B", kind="bar", ax=axis, color="C2")
df.plot(x="X", y="C", kind="bar", ax=axis, color="C3")
plt.show()

I've made a dictionary that associates colors to countries:
import matplotlib.pyplot as plt
import pandas as pd
# read csv
df = pd.read_csv('test2.txt', sep=r'\s+')  # whitespace-delimited file
# find all unique countries, which shall correspond to a color
countries = df['Country'].unique()
custom_colors = ['r','b','g','orange']
# create a dictionary associating a color to a country
col_dict = {country:custom_colors[i] for i, country in enumerate(countries)}
# extend dataframe by new column with country colors
df['country_colors'] = [col_dict[country] for country in df['Country']]
# plot scatterplot while using c= as a container for colors.
# This is what makes scatter special: color argument can be a container
# of many different colors
fig, ax = plt.subplots(figsize=(7,4))
X,Y,col = df['average_man'], df['Country'], df['country_colors']
ax.scatter(X,Y,c=col)
The comments of my code should explain everything. But the general idea is to find all unique countries, associate a color to all unique countries and then append a new column to the DataFrame with the right colors at the right positions: e.g. all rows with 'I1' have the color 'r' in the dataframe.
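The same dictionary idea can be keyed on Year instead of Country, which is closer to the blue-for-2015, red-for-2016 coloring described in the question. The sketch below rebuilds the four rows shown in the question by hand; the year_colors name is illustrative, and one scatter call per year is used so that each year gets its own legend entry.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# The four rows shown in the question
df = pd.DataFrame({
    "Country": ["I1", "I1", "I2", "I2"],
    "Year": [2015, 2016, 2015, 2016],
    "average_man": [9.5, 8.0, 12.181818, 14.727273],
    "average_woman": [3.6635, 4.8105, 3.514545, 2.815],
})

# Illustrative mapping from year to color
year_colors = {2015: "blue", 2016: "red"}

fig, ax = plt.subplots(figsize=(7, 4))
# One scatter call per year so each year appears once in the legend
for year, grp in df.groupby("Year"):
    ax.scatter(grp["average_man"], grp["average_woman"],
               color=year_colors[year], label=str(year))
ax.set_xlabel("average_man")
ax.set_ylabel("average_woman")
ax.legend(title="Year")
```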
