I need to create a somewhat unusual bar plot in matplotlib and the standard functionality does not seem to offer what I need.
I have clustered some documents and want to show the 5 most important keywords per cluster. The first problem is that I have one group per cluster which consists of 5 individual bars. The second problem is that the labels of these individual bars are important, not the same across groups and not unique either.
I have a makeshift prototype that looks like this:
I just plotted all the individual bars in the right order and separated them by empty entries. The biggest problem (aside from being ugly) is that the only way to identify the cluster is by counting the groups. It would help a lot if the clusters could be identified either by color or something else, but I cannot figure out how to do this.
Edit: Here is some requested toy data as well as the code used to produce the plot I already have.
Toy data:
The following two pandas dataframes are included in an array. The two code blocks include the results from df_list[i].to_csv(). I hope this helps, but for the context of this problem the actual data does not really matter, so you can also just create your own dataframes.
,features,score
0,knowledg,0.09862235117497174
1,manag,0.07812351138840486
2,innov,0.06502084705448799
3,organ,0.0561819290497529
4,km,0.05580332888282127
and
,features,score
0,knowledg,0.04217018718591911
1,develop,0.03423580137595049
2,manag,0.032239226503136
3,system,0.031064303713788467
4,sustain,0.029628875636649198
Code:
The approach for the current solution is to combine all the individual dataframes into one dataframe, add empty entries where necessary, and plot the result.
def plot_all_clusters_words(dfs):
    # target structure: word as non unique column, value as other non unique column
    df_dict_list = []
    for df in dfs:
        for index, row in df.iterrows():
            df_dict_list.append({"word": row.features, "value": row.score})
        df_dict_list.append({"word": "", "value": 0})
    df_dict_list = df_dict_list[:-1]
    new_df = pd.DataFrame(df_dict_list)
    new_df.plot.bar(x="word")
    plt.show()
    return new_df
Note:
I just need a way to easily identify the groups, if you know a different approach than the ones I suggested above, feel free to do so.
Calling plt.bar once for each of the dataframes, each with its own label and color, would create the following plot:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from io import StringIO
df1_str = '''features,score
0,knowledg,0.09862235117497174
1,manag,0.07812351138840486
2,innov,0.06502084705448799
3,organ,0.0561819290497529
4,km,0.05580332888282127'''
df2_str = '''features,score
0,knowledg,0.04217018718591911
1,develop,0.03423580137595049
2,manag,0.032239226503136
3,system,0.031064303713788467
4,sustain,0.029628875636649198'''
df1 = pd.read_csv(StringIO(df1_str))
df2 = pd.read_csv(StringIO(df2_str))
dfs = [df1, df2]
cluster_names = [f'cluster {i}' for i in range(1, len(dfs) + 1)]
colors = plt.cm.rainbow(np.linspace(0, 1, len(dfs)))
bar_width = 0.8 # width of individual bars
cluster_gap = 0.2 # extra distance between clusters
starts = np.append(0, np.array([len(df) + cluster_gap for df in dfs]).cumsum())
all_tickpos = [s + np.arange(len(df)) for df, s in zip(dfs, starts)]
for df, name, color, tickpos in zip(dfs, cluster_names, colors, all_tickpos):
    plt.bar(tickpos, df['score'], width=bar_width, color=color, label=name)
plt.xticks(np.concatenate(all_tickpos), [f for df in dfs for f in df['features']], rotation=90)
plt.legend()
plt.tight_layout()
plt.show()
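To see what the position arithmetic above produces, here is a minimal pure-Python sketch of the same computation. The function name cluster_tick_positions is hypothetical and not part of the answer's code; it assumes the bars within a cluster sit one unit apart and that each cluster is shifted by its size plus cluster_gap.

```python
from itertools import accumulate


def cluster_tick_positions(sizes, cluster_gap=0.2):
    """Return one list of x positions per cluster (hypothetical helper).

    sizes: number of bars in each cluster.
    cluster_gap: extra distance inserted between consecutive clusters.
    """
    # start of each cluster: running sum of (cluster size + gap)
    starts = [0.0] + list(accumulate(n + cluster_gap for n in sizes))
    # bars inside a cluster are spaced one unit apart
    return [[start + i for i in range(n)] for n, start in zip(sizes, starts)]


positions = cluster_tick_positions([5, 5])
```

With two clusters of five bars, the first cluster occupies 0 to 4 and the second starts at 5.2, which is what leaves the visible gap between the groups.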
I want to make line charts for data with several categorical variables: one chart per country, with weekly values on the x-axis. I was able to draft line plots with seaborn, but I find it awkward to customize things like labels, the legend, and the color palette. Is there an easy way to reshape a dataframe with multiple categorical columns and render clear line charts? My first attempt used seaborn.relplot, but its parameters are hard to tune and the resulting plot is hard to customize. Any thoughts?
reproducible data & my attempt:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
url = 'https://gist.githubusercontent.com/adamFlyn/cb0553e009933574ac7ec3109ffb5140/raw/a277bc00dc08e526a7d5b7ead5425905f7206bfa/export.csv'
dff = pd.read_csv(url, parse_dates=['weekly'])
dff.drop('Unnamed: 0', axis=1, inplace=True)
df2_bf = dff.groupby(['destination', 'weekly'])['FCF_Beef'].sum().unstack()
df2_bf = df2_bf.fillna(0)
mm = df2_bf.T
mm.columns.name = None
mm = mm[~(mm.isna().sum(1)/mm.shape[1]).gt(0.9)].fillna(0)
#Total sum per column:
mm.loc['Total',:]= mm.sum(axis=0)
mm1 = mm.T
mm1 = mm1.nlargest(6, columns=['Total'])
mm1.drop('Total', axis=1, inplace=True)
mm2 = mm1.T
mm2.reset_index(inplace=True)
mm2['weekly'] = pd.to_datetime(mm2['weekly'])
mm2['year'] = mm2['weekly'].dt.year
mm2['week'] = mm2['weekly'].dt.isocalendar().week
df = mm2.melt(id_vars=['weekly','week','year'], var_name='country')
df_ = df.groupby(['country', 'year', 'week'], as_index=False)['value'].sum()
sns.relplot(data=df_, x='week', y='value', hue='year', row='country', kind='line', height=6, aspect=2, facet_kws={'sharey': False, 'sharex': False}, sizes=(20, 10))
current plot
This is one of the current plots that I made with seaborn.relplot.
The structure of the plot is okay for me, but seaborn.relplot is hard to tune and not as flexible as matplotlib. I also realized that the way I aggregate my data is not very efficient. I think there might be a shortcut to make the above code snippet more efficient, something like:
plt_data = []
for i in dff.loc[:, ['FCF_Beef','FCF_Beef']]:
...
But doing it this way, I faced a couple of issues in making the right plot. Can anyone point out how to make this simple and efficient, so that I get the expected line chart with matplotlib? Does anyone know a better way of doing this? Thanks
desired output
In my desired plot, I iterate over a list of countries, and each country gets one subplot. In each subplot, the x-axis shows the 52 weeks and the y-axis shows the weekly export amount, with one line per year. Here is a draft plot that I made with seaborn.relplot.
Note that I don't like the output from seaborn.relplot, so I am wondering how to do the above more efficiently with matplotlib. Any ideas?
As requested by the OP, the following is an iterative way to plot the data.
The following example plots each year, for a given 'destination', in a single figure.
This is similar to the answer for this question.
import pandas as pd
import matplotlib.pyplot as plt
# load the data
url = 'https://gist.githubusercontent.com/adamFlyn/cb0553e009933574ac7ec3109ffb5140/raw/a277bc00dc08e526a7d5b7ead5425905f7206bfa/export.csv'
df = pd.read_csv(url, parse_dates=['weekly'], usecols=range(1, 6))
# groupby destination and iterate through for plotting
for g, d in df.groupby('destination'):
    # create the figure
    fig, ax = plt.subplots(figsize=(7, 4))
    # add lines for specific years
    for year in d.weekly.dt.year.unique():
        data = d[d.weekly.dt.year == year].copy()  # select the data from d, by year
        data['week'] = data.weekly.dt.isocalendar().week  # create a week column
        data.sort_values('weekly', inplace=True)
        display(data.head())  # display is for jupyter; if it causes an error, use print
        data.plot(x='week', y='FCF_Beef', ax=ax, label=year)
    plt.show()
Single sample plot
If we look at the tail of one of the dataframes, data.weekly.dt.isocalendar().week is putting the last day of the year in week 1, so a line is drawn back to the last data point, which is placed at week 1.
This rests on datetime.datetime(2018, 12, 31).isocalendar() and is the expected behavior of the datetime module, as per this closed pandas bug.
Removing the last row with .iloc[:-1, :] is a workaround:
data.iloc[:-1, :].plot(x='week', y='FCF_Beef', ax=ax, label=year)
Alternatively, replace data['week'] = data.weekly.dt.isocalendar().week with data['week'] = data.weekly.dt.strftime('%W').astype('int')
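The difference between the two week numberings can be checked with the standard library alone. This is a small sketch for the date mentioned above; the variable names are only for illustration.

```python
from datetime import date

last_day = date(2018, 12, 31)  # a Monday

# ISO 8601: this Monday already belongs to week 1 of ISO year 2019
iso_year, iso_week, iso_weekday = last_day.isocalendar()

# %W counts weeks of the calendar year, with Monday as the first day of the week
percent_w_week = int(last_day.strftime('%W'))

print(iso_year, iso_week, percent_w_week)  # prints: 2019 1 53
```

Because %W stays within the calendar year, the last data point of 2018 lands at the right end of the x-axis instead of wrapping around to week 1.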
Updated with all code from OP
# load the data
url = 'https://gist.githubusercontent.com/adamFlyn/cb0553e009933574ac7ec3109ffb5140/raw/a277bc00dc08e526a7d5b7ead5425905f7206bfa/export.csv'
dff = pd.read_csv(url, parse_dates=['weekly'], usecols=range(1, 6))
df2_bf = dff.groupby(['destination', 'weekly'])['FCF_Beef'].sum().unstack()
df2_bf = df2_bf.fillna(0)
mm = df2_bf.T
mm.columns.name = None
mm = mm[~(mm.isna().sum(1)/mm.shape[1]).gt(0.9)].fillna(0)
#Total sum per column:
mm.loc['Total',:]= mm.sum(axis=0)
mm1 = mm.T
mm1 = mm1.nlargest(6, columns=['Total'])
mm1.drop('Total', axis=1, inplace=True)
mm2 = mm1.T
mm2.reset_index(inplace=True)
mm2['weekly'] = pd.to_datetime(mm2['weekly'])
mm2['year'] = mm2['weekly'].dt.year
mm2['week'] = mm2['weekly'].dt.strftime('%W').astype('int')
df = mm2.melt(id_vars=['weekly','week','year'], var_name='country')
# groupby country and iterate through for plotting
for g, d in df.groupby('country'):
    # create the figure
    fig, ax = plt.subplots(figsize=(7, 4))
    # add lines for specific years
    for year in d.weekly.dt.year.unique():
        data = d[d.weekly.dt.year == year].copy()  # select the data from d, by year
        data.sort_values('weekly', inplace=True)
        display(data.head())  # display is for jupyter; if it causes an error, use print
        data.plot(x='week', y='value', ax=ax, label=year, title=g)
    plt.show()
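On the question of a shorter reshape: one option, sketched here on made-up toy data rather than the OP's file, is to pivot so that each year becomes a column indexed by week. The toy frame and its values are assumptions for illustration; only the column names weekly, destination and FCF_Beef come from the question.

```python
import pandas as pd

# toy stand-in for the real export data (values are made up)
toy = pd.DataFrame({
    'weekly': pd.to_datetime(['2018-01-01', '2018-01-08', '2019-01-07', '2019-01-14']),
    'destination': ['A'] * 4,
    'FCF_Beef': [1.0, 2.0, 3.0, 4.0],
})
toy['year'] = toy['weekly'].dt.year
toy['week'] = toy['weekly'].dt.strftime('%W').astype(int)

# one row per week, one column per year
wide = toy.pivot_table(index='week', columns='year', values='FCF_Beef', aggfunc='sum')
```

Calling wide.plot() on such a frame draws one line per year against the week number, so for the real data the pivot could be done per country inside a groupby loop like the one above.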
I have a pandas dataframe:
import pandas as pd
data1 = {'Date':['03-19-2019'],
'Total':[35],
'Solved':[19],
'Arrived':[23],
}
df1 = pd.DataFrame(data1)
and I want to plot a bar plot like this:
with
df1.plot(kind='barh',x='Date',y='Total', ax=ax0, color='#C0C0C0',
width=0.5)
df1.plot(kind='barh',x='Date',y='Arrived', ax=ax0, color='#C0FFFF',
width=0.5)
df1.plot(kind='barh',x='Date',y='Solved', ax=ax0, color='#C0C0FF',
width=0.5)
However, to keep one bar from hiding another, I have to draw the columns in order of decreasing value (Total is greater than Arrived, which is greater than Solved).
How can I avoid doing this by hand and automate the process easily?
There must be a simpler, more straightforward approach in pandas, but I came up with this quick workaround. The idea is as follows:
Leave out the first column, Date, and sort the remaining columns by value.
Use the sorted indices to plot the columns in descending order, so the largest bar is drawn first and the smaller bars are drawn on top of it.
To keep the colors consistent, use a dictionary so that the plotting order doesn't affect which color a column gets.
import numpy as np
import matplotlib.pyplot as plt
fig, ax0 = plt.subplots()
ids = np.argsort(df1.values[0][1:])[::-1]
colors = {'Total': '#C0C0C0', 'Arrived': '#C0FFFF', 'Solved':'#C0C0FF'}
for col in np.array(df1.columns[1:].tolist())[ids]:
    df1.plot(kind='barh',x='Date',y=col, ax=ax0, color=colors[col], width=0.1)
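The drawing order can also be derived with pandas alone. This is a minimal sketch assuming a single-row frame like the one in the question; the name draw_order is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['03-19-2019'], 'Total': [35], 'Solved': [19], 'Arrived': [23]})

# take the only row, drop the label column, and sort the values from largest to smallest
draw_order = df1.drop(columns='Date').iloc[0].sort_values(ascending=False).index.tolist()

print(draw_order)  # prints: ['Total', 'Arrived', 'Solved']
```

Iterating over draw_order in the plotting loop gives the same largest-first behavior as the argsort version.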
A stacked bar graph can be produced in pandas via the stacked=True option. To use this you need to make the "Date" the index first.
import matplotlib.pyplot as plt
import pandas as pd
data1 = {'Date':['03-19-2019'],
'Total':[35],
'Solved':[19],
'Arrived':[23],
}
df = pd.DataFrame(data1)
df.set_index("Date").plot(kind="barh", stacked=True)
plt.show()
I am VERY new to the world of python/pandas/matplotlib, but I have been using it recently to create box and whisker plots. I was curious how to create a box and whisker plot for each sheet using a specific column of data, i.e. I have 17 sheets, and I have columns called HMB and DV on each sheet. I want to plot the 17 data sets as box and whiskers on one HMB plot, and another 17 data sets on a DV plot. Below is what I have so far.
I can open the file, and get all the sheets into list_dfs, but then don't know where to go from there. I was going to try and manually slice each set (as I started below before coming here for help), but when I have more data in the future, I don't want to have to do that by hand. Any help would be greatly appreciated!
import pandas as pd
import numpy as np
import xlrd
import matplotlib.pyplot as plt
%matplotlib inline
from pandas import ExcelWriter
from pandas import ExcelFile
from pandas import DataFrame
excel_file = 'Project File Merger.xlsm'
list_dfs = []
xls = xlrd.open_workbook(excel_file,on_demand=True)
for sheet_name in xls.sheet_names():
    df = pd.read_excel(excel_file,sheet_name)
    list_dfs.append(df)
d_psppm = {}
for i, sheet_name in enumerate(xls.sheet_names()):
    df = pd.read_excel(excel_file,sheet_name)
    d_psppm["PSPPM" + str(i)] = df.loc[:,['PSPPM']]
values_list = list(d_psppm.values())
print(values_list[:])
A sample output looks like below, for 17 list entries, but with different number of rows for each.
PSPPM
0 0.246769
1 0.599589
2 0.082420
3 0.250000
4 0.205140
5 0.850000,
PSPPM
0 0.500887
1 0.475255
2 0.472711
3 0.412953
4 0.415883
5 0.703716,...
The next thing I want to do is create a box and whisker plot, 1 plot with 17 box and whiskers. I am not sure how to get the dictionary to plot with the values and indices as the name. I have tried to dig, and figure out how to convert the dictionary to a list and then plot each element in the list, but have had no luck.
Thanks for the help!
I agree with @Alex that forming your columns into a new DataFrame and then plotting from that would be a good approach. However, if you're going to use the dict, then it should look something like this. Depending on the version of Python you're using, the dictionary may be unordered, so if the ordering on the plot is important to you, you might want to create a list of dictionary keys in the order you want and iterate over that instead.
import matplotlib.pyplot as plt
import numpy as np
#colours = []#list of colours here, if you want
#markers = []#list of markers here, if you want
fig, ax = plt.subplots()
for idx, k in enumerate(d_psppm, 1):
    data = d_psppm[k]
    jitter = np.random.normal(0, 0.1, data.shape[0]) + idx
    ax.scatter(jitter,
               data,
               s=25,#size of the marker
               c="r",#colour, could be from colours
               alpha=0.35,#opacity, 1 being solid
               marker="^",#or ref. to markers, e.g. markers[idx]
               edgecolors="none"#removes black border
               )
As per Alex's suggestion, you could use the data to create a seaborn boxplot and overlay a swarmplot to show the data (depends on how many rows each has whether this is practical).
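The "new DataFrame" route mentioned above could look roughly like this. It is a sketch on toy data standing in for d_psppm; the helper plot_boxes is hypothetical, and it assumes that pandas' boxplot ignores the NaN padding that appears when the sheets have different numbers of rows.

```python
import pandas as pd

# toy stand-in for d_psppm: one single-column DataFrame per sheet, with unequal lengths
d_psppm = {
    'PSPPM0': pd.DataFrame({'PSPPM': [0.246769, 0.599589, 0.082420, 0.250000, 0.205140, 0.850000]}),
    'PSPPM1': pd.DataFrame({'PSPPM': [0.500887, 0.475255, 0.472711, 0.412953]}),
}

# the dict keys become column names; shorter columns are padded with NaN
combined = pd.concat({name: frame['PSPPM'] for name, frame in d_psppm.items()}, axis=1)


def plot_boxes(frame):
    """Draw one box per column of frame (hypothetical helper)."""
    import matplotlib.pyplot as plt
    ax = frame.boxplot()
    ax.set_ylabel('PSPPM')
    plt.show()
```

Calling plot_boxes(combined) should then give one figure with a box per sheet, labelled with the dictionary keys.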
EDIT: this question arose back in 2013 with pandas ~0.13 and was obsoleted by direct support for boxplot somewhere between version 0.15-0.18 (as per @Cireo's late answer; also pandas greatly improved support for categorical since this was asked.)
I can get a boxplot of a salary column in a pandas DataFrame...
train.boxplot(column='Salary', by='Category', sym='')
...however I can't figure out how to define the index-order used on column 'Category' - I want to supply my own custom order, according to another criterion:
category_order_by_mean_salary = train.groupby('Category')['Salary'].mean().order().keys()
How can I apply my custom column order to the boxplot columns? (other than ugly kludging the column names with a prefix to force ordering)
'Category' is a string (really, should be a categorical, but this was back in 0.13, where categorical was a third-class citizen) column taking 27 distinct values: ['Accounting & Finance Jobs','Admin Jobs',...,'Travel Jobs']. So it can be easily factorized with pd.Categorical.from_array()
On inspection, the limitation is inside pandas.tools.plotting.py:boxplot(), which converts the column object without allowing ordering:
pandas.core.frame.py.boxplot() is a passthrough to
pandas.tools.plotting.py:boxplot()
which instantiates ...
matplotlib.pyplot.py:boxplot() which instantiates ...
matplotlib.axes.py:boxplot()
I suppose I could either hack up a custom version of pandas boxplot(), or reach into the internals of the object, and also file an enhancement request.
Hard to say how to do this without a working example. My first guess would be to just add an integer column with the orders that you want.
A simple, brute-force way would be to add each boxplot one at a time.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(37,4), columns=list('ABCD'))
columns_my_order = ['C', 'A', 'D', 'B']
fig, ax = plt.subplots()
for position, column in enumerate(columns_my_order):
    ax.boxplot(df[column], positions=[position])
ax.set_xticks(range(position+1))
ax.set_xticklabels(columns_my_order)
ax.set_xlim(xmin=-0.5)
plt.show()
EDIT: this is the right answer after direct support was added somewhere between version 0.15-0.18
tl;dr: for recent pandas - use positions argument to boxplot.
Adding a separate answer, which perhaps could be another question - feedback appreciated.
I wanted to add a custom column order within a groupby, which posed many problems for me. In the end, I had to avoid trying to use boxplot from a groupby object, and instead go through each subplot myself to provide explicit positions.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame()
df['GroupBy'] = ['g1', 'g2', 'g3', 'g4'] * 6
df['PlotBy'] = [chr(ord('A') + i) for i in range(24)]
df['SortBy'] = list(reversed(range(24)))
df['Data'] = [i * 10 for i in range(24)]
# Note that this has no effect on the boxplot
df = df.sort_values(['GroupBy', 'SortBy'])
for group, info in df.groupby('GroupBy'):
    print('Group: %r\n%s\n' % (group, info))
# With the below, cannot use
# - sort data beforehand (not preserved, can't access in groupby)
# - categorical (not all present in every chart)
# - positional (different lengths and sort orders per group)
# df.groupby('GroupBy').boxplot(layout=(1, 5), column=['Data'], by=['PlotBy'])
fig, axes = plt.subplots(1, df.GroupBy.nunique(), sharey=True)
for ax, (g, d) in zip(axes, df.groupby('GroupBy')):
    d.boxplot(column=['Data'], by=['PlotBy'], ax=ax, positions=d.index.values)
plt.show()
Within my final code, it was even slightly more involved to determine positions because I had multiple data points for each sortby value, and I ended up having to do the below:
to_plot = data.sort_values([sort_col]).groupby(group_col)
for ax, (group, group_data) in zip(axes, to_plot):
    # Use existing sorting
    ordering = enumerate(group_data[sort_col].unique())
    positions = [ind for val, ind in sorted((v, i) for (i, v) in ordering)]
    ax = group_data.boxplot(column=[col], by=[plot_by], ax=ax, positions=positions)
Actually, I got stuck on the same question, and I solved it by making a map and resetting the xticklabels, with code as follows:
df = pd.DataFrame({"A":["d","c","d","c",'d','c','a','c','a','c','a','c']})
df['val']=(np.random.rand(12))
df['B']=df['A'].replace({'d':'0','c':'1','a':'2'})
ax=df.boxplot(column='val',by='B')
ax.set_xticklabels(list('dca'))
Note that pandas can now create categorical columns. If you don't mind having all the columns present in your graph, or trimming them appropriately, you can do something like the below:
http://pandas.pydata.org/pandas-docs/stable/categorical.html
df['Category'] = df['Category'].astype(pd.CategoricalDtype(ordered=True))
Recent pandas also appears to allow positions to pass all the way through from frame to axes.
https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py
https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/_core.py
https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/pyplot.py
https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py
It might sound kind of silly, but many of the plotting functions allow you to set the order directly. For example:
Library & dataset
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('iris')
Specific order
p1=sns.boxplot(x='species', y='sepal_length', data=df, order=["virginica", "versicolor", "setosa"])
plt.show()
If you're not happy with the default column order in your boxplot, you can change it to a specific order by setting the column parameter in the boxplot function.
Check the two examples below:
np.random.seed(0)
df = pd.DataFrame(np.random.rand(37,4), columns=list('ABCD'))
##
plt.figure()
df.boxplot()
plt.title("default column order")
##
plt.figure()
df.boxplot(column=['C','A', 'D', 'B'])
plt.title("Specified column order")
Use the new positions= argument:
df.boxplot(column=['Data'], by=['PlotBy'], positions=df.index.values)
This can be resolved by applying a categorical order. You can decide on the ranking yourself. I'll give an example with days of week.
Provide categorical order to weekday
#List categorical variables in correct order
weekday = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
#Assign the above list to category ranking
wDays = pd.api.types.CategoricalDtype(ordered=True, categories=weekday)
#Apply this to the specific column in DataFrame
df['Weekday'] = df['Weekday'].astype(wDays)
# Then generate your plot
plt.figure(figsize = [15, 10])
sns.boxplot(data = df, x = 'Weekday', y = 'Y Axis Variable', color = colour)  # colour: a color of your choice, defined elsewhere
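Applied to the original question, the category order could be derived from the mean salary instead of being typed by hand. This is a sketch on made-up toy data; only the column names Category and Salary come from the question, and the resulting box order is an expectation rather than something the source confirms.

```python
import pandas as pd

# toy stand-in for the train DataFrame (values are made up)
train = pd.DataFrame({
    'Category': ['Admin Jobs', 'Travel Jobs', 'Admin Jobs', 'IT Jobs', 'Travel Jobs', 'IT Jobs'],
    'Salary': [20000, 30000, 22000, 50000, 34000, 46000],
})

# categories sorted by their mean salary, lowest first
order = train.groupby('Category')['Salary'].mean().sort_values().index.tolist()

# make the column an ordered categorical that carries this order
train['Category'] = train['Category'].astype(
    pd.CategoricalDtype(categories=order, ordered=True)
)
```

A subsequent train.boxplot(column='Salary', by='Category') should then lay the boxes out in that category order, and the same order list can be passed to seaborn's order= parameter.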
Given the following:
import numpy as np
import pandas as pd
import seaborn as sns
np.random.seed(365)
x1 = np.random.randn(50)
y1 = np.random.randn(50) * 100
x2 = np.random.randn(50)
y2 = np.random.randn(50) * 100
df1 = pd.DataFrame({'x1':x1, 'y1': y1})
df2 = pd.DataFrame({'x2':x2, 'y2': y2})
sns.lmplot(x='x1', y='y1', data=df1, fit_reg=True, ci=None)
sns.lmplot(x='x2', y='y2', data=df2, fit_reg=True, ci=None)
This will create 2 separate plots. How can I add the data from df2 onto the SAME graph? All the seaborn examples I have found online seem to focus on how you can create adjacent graphs (say, via the 'hue' and 'col_wrap' options). Also, I prefer not to use the dataset examples where an additional column might be present as this does not have a natural meaning in the project I am working on.
If there is a mixture of matplotlib/seaborn functions that are required to achieve this, I would be grateful if someone could help illustrate.
You could use seaborn's FacetGrid class to get desired result.
You would need to replace your plotting calls with these lines:
import matplotlib.pyplot as plt  # needed for plt.scatter below
# sns.lmplot('x1', 'y1', df1, fit_reg=True, ci = None)
# sns.lmplot('x2', 'y2', df2, fit_reg=True, ci = None)
df = pd.concat([df1.rename(columns={'x1':'x','y1':'y'})
                   .join(pd.Series(['df1']*len(df1), name='df')),
                df2.rename(columns={'x2':'x','y2':'y'})
                   .join(pd.Series(['df2']*len(df2), name='df'))],
               ignore_index=True)
pal = dict(df1="red", df2="blue")
g = sns.FacetGrid(df, hue='df', palette=pal, height=5);  # height was called size before seaborn 0.9
g.map(plt.scatter, "x", "y", s=50, alpha=.7, linewidth=.5, edgecolor="white")
g.map(sns.regplot, "x", "y", ci=None, robust=1)
g.add_legend();
This will yield this plot:
If I understand correctly, this is what you need.
Note that you will need to pay attention to the .regplot parameters and may want to change the values I have put in as an example.
The ; at the end of a line suppresses the output of the command (I use ipython notebook, where it's visible).
The docs give some explanation of the .map() method. In essence, it does just that: it maps a plotting command onto the data. However, it works with 'low-level' plotting commands like regplot, and not with lmplot, which actually calls regplot behind the scenes.
Normally plt.scatter would take the parameters c='none', edgecolor='r' to make non-filled markers. But seaborn interferes with the process and enforces a color on the markers, so I don't see an easy, straightforward way to fix this other than manipulating the ax elements after seaborn has produced the plot, which is best addressed as part of a different question.
Option 1: sns.regplot
In this case, the easiest to implement solution is to use sns.regplot, which is an axes-level function, because this will not require combining df1 and df2.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# create the figure and axes
fig, ax = plt.subplots(figsize=(6, 6))
# add the plots for each dataframe
sns.regplot(x='x1', y='y1', data=df1, fit_reg=True, ci=None, ax=ax, label='df1')
sns.regplot(x='x2', y='y2', data=df2, fit_reg=True, ci=None, ax=ax, label='df2')
ax.set(ylabel='y', xlabel='x')
ax.legend()
plt.show()
Option 2: sns.lmplot
As per the sns.FacetGrid documentation, it is better to use figure-level functions than to use FacetGrid directly.
Combine df1 and df2 into a long format, and then use sns.lmplot with the hue parameter.
When working with seaborn, it is almost always necessary for the data to be in a long format.
It's customary to use pandas.DataFrame.stack or pandas.melt to convert DataFrames from wide to long.
For this reason, df1 and df2 must have the columns renamed, and have an additional identifying column. This allows them to be concatenated on axis=0 (the default long format), instead of axis=1 (a wide format).
There are a number of ways to combine the DataFrames:
The combination method in the answer from Primer is fine if combining a few DataFrames.
However, a function, as shown below, is better for combining many DataFrames.
def fix_df(data: pd.DataFrame, name: str) -> pd.DataFrame:
    """rename columns and add a column"""
    # rename columns to a common name
    data.columns = ['x', 'y']
    # add an identifying value to use with hue
    data['df'] = name
    return data
# create a list of the dataframes
df_list = [df1, df2]
# update the dataframes by calling the function in a list comprehension
df_update_list = [fix_df(v, f'df{i}') for i, v in enumerate(df_list, 1)]
# combine the dataframes
df = pd.concat(df_update_list).reset_index(drop=True)
# plot the dataframe
sns.lmplot(data=df, x='x', y='y', hue='df', ci=None)
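Note that fix_df renames the columns of df1 and df2 in place. If the original frames should stay untouched, a non-mutating variant could look like the sketch below. The name to_long and the toy values are assumptions for illustration, not part of the answer above.

```python
import pandas as pd


def to_long(data: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return a renamed copy with an identifying column, leaving data untouched (hypothetical helper)."""
    out = data.copy()
    out.columns = ['x', 'y']
    return out.assign(df=name)


# toy frames standing in for df1 and df2
df1 = pd.DataFrame({'x1': [0.1, 0.2], 'y1': [1.0, 2.0]})
df2 = pd.DataFrame({'x2': [0.3, 0.4], 'y2': [3.0, 4.0]})

long_df = pd.concat(
    [to_long(d, f'df{i}') for i, d in enumerate([df1, df2], 1)],
    ignore_index=True,
)
```

The resulting long_df has the columns x, y and df, so it can be passed to sns.lmplot with hue='df' in the same way as above, while df1 and df2 keep their original column names.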
Notes
Package versions used for this answer:
pandas v1.2.4
seaborn v0.11.1
matplotlib v3.3.4