I have two dataframes
df1 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],['1.2','3.5','44','77','3.4','24','11','12','13'], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df1.columns = ['QID', 'score', 'DocID']
df2 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],['21.2','13.5','12','77.6','3.9','29','17','41','32'], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df2.columns = ['QID', 'score', 'DocID']
Currently, I'm plotting the scores in df1 and df2 with bokeh as two separate graphs:
D1_BarDocID = Bar(df1, 'QID', values='score', stack='DocID', title="D1: QID Stacked by DocID on Score")
D2_BarDocID = Bar(df2, 'QID', values='score', stack='DocID', title="D2: QID Stacked by DocID on Score")
grid = gridplot([[D1_BarDocID, D2_BarDocID]])
show(grid)
However, I want to plot both DataFrames in a single figure so that, for each QID, the bars for df1 and df2 sit side by side. That way I can visualise the difference in score between the two DataFrames, using bokeh.
Here is a complete example using the newer vbar_stack and stable bokeh.plotting API. It could probably be made simpler but my Pandas knowledge is limited:
import pandas as pd
from bokeh.core.properties import value
from bokeh.io import output_file
from bokeh.models import FactorRange
from bokeh.palettes import Spectral8
from bokeh.plotting import figure, show
df1 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],[1.2, 3.5, 44, 77, 3.4, 24, 11, 12, 13], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df1.columns = ['QID','score', 'DocID']
df1 = df1.pivot(index='QID', columns='DocID', values='score').fillna(0)
df1.index = [(x, 'df1') for x in df1.index]
df2 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],[21.2, 13.5, 12, 77.6, 3.9, 29, 17, 41, 32], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df2.columns = ['QID','score', 'DocID']
df2 = df2.pivot(index='QID', columns='DocID', values='score').fillna(0)
df2.index = [(x,'df2') for x in df2.index]
df = pd.concat([df1, df2])
p = figure(plot_width=800, x_range=FactorRange(*df.index))
p.vbar_stack(df.columns, x='index', width=0.8, fill_color=Spectral8,
             line_color=None, source=df, legend=[value(x) for x in df.columns])
p.legend.location = "top_left"
output_file('foo.html')
show(p)
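To see what the reshaping step hands to FactorRange and vbar_stack, here is a pandas-only sketch on a cut-down version of the data. The helper name to_wide and the shortened DocID values 'a' and 'b' are made up for illustration; they are not part of the answer above.

```python
import pandas as pd

def to_wide(scores, label):
    """Pivot long (QID, DocID, score) rows to one row per QID, tagged with a source label."""
    wide = scores.pivot(index='QID', columns='DocID', values='score').fillna(0)
    # Each row key becomes a (QID, label) tuple, which bokeh treats as a nested factor
    wide.index = [(qid, label) for qid in wide.index]
    return wide

# Cut-down, illustrative data (not the full frames from the question)
df1 = pd.DataFrame({'QID': ['1', '1', '2'], 'DocID': ['a', 'b', 'a'], 'score': [1.2, 3.5, 77.0]})
df2 = pd.DataFrame({'QID': ['1', '1', '2'], 'DocID': ['a', 'b', 'a'], 'score': [21.2, 13.5, 77.6]})

combined = pd.concat([to_wide(df1, 'df1'), to_wide(df2, 'df2')])
factors = list(combined.index)  # what FactorRange(*df.index) receives
print(factors)
print(combined)
```

Each (QID, label) tuple becomes one bar position, and each DocID column becomes one layer of the stack.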
Consider the following code:
from plotly import graph_objs as go
import pandas as pd
mtds = ['2022-03', '2022-04', '2022-05', '2022-06']
values = [28, 24, 20, 18]
data1 = []
for j in range(4):
    data1.append([mtds[j], values[j]])
df1 = pd.DataFrame(data1, columns=['month', 'counts'])
fig = go.Figure()
fig.add_trace(go.Scatter(
    x = df1['month'],
    y = df1['counts'],
    name = 'counts history'
))
fig.show()
The output is
However, this is not what I was expecting. I would like to amend the code so that the string values in the mtds list ('2022-03', '2022-04', '2022-05', '2022-06') are shown on the x-axis instead of the dates. Could you please assist with this?
Thank you.
As per the plotly documentation on time series, you can use the update_xaxes method to change the frequency and format of the x-axis labels:
fig = go.Figure()
fig.add_trace(go.Scatter(x=df1["month"], y=df1["counts"], name="counts history"))
fig.update_xaxes(dtick="M1", tickformat="%Y-%m")
fig.show()
I have a df which represents three states (S1, S2, S3) at 3 timepoints (1hr, 2hr and 3hr). I would like to show a stacked bar plot of the states, but the stacks are discontinuous, or at least not cumulative. How can I fix this in Seaborn? It is important that time is on the y-axis and the state counts are on the x-axis.
Below is some code.
data = [[3, 2, 18],[4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns = ['S1', 'S2', 'S3'])
df = df.reset_index().rename(columns = {'index':'Time'})
melt = pd.melt(df, id_vars = 'Time')
plt.figure()
sns.histplot(data = melt,x = 'value', y = 'Time', bins = 3, hue = 'variable', multiple="stack")
EDIT:
This is somewhat what I am looking for, I hope this gives you an idea. Please ignore the difference in the scales between boxes...
If I understand correctly, I think you want to use value as a weight:
sns.histplot(
    data=melt, y='Time', hue='variable', weights='value',
    multiple='stack', shrink=0.8, discrete=True,
)
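To check the numbers behind the stacked plot: with value used as the weight, each Time row should show one segment per state whose length is that state's count, each starting where the previous one ends. A small pandas sketch of those offsets (the names left and right are illustrative):

```python
import pandas as pd

data = [[3, 2, 18], [4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns=['S1', 'S2', 'S3'])

right = df.cumsum(axis=1)  # where each state's segment ends along the x-axis
left = right - df          # where each state's segment starts
print(left)
print(right)
```

The last column of right is the total length of each bar, which is a quick way to confirm that a plot is truly cumulative.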
This is pretty tough in seaborn as it doesn't natively support stacked bars. You can use either the builtin plot from pandas, or try plotly express.
data = [[3, 2, 18],[4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns = ['S1', 'S2', 'S3'])
df = df.reset_index().rename(columns = {'index':'Time'})
# so your y starts at 1
df.Time+=1
melt = pd.melt(df, id_vars = 'Time')
# so y isn't treated as continuous
melt.Time = melt.Time.astype('str')
Pandas can do it, but getting the value labels in there is a bit of a pain; you will need to look up how to add them.
df.set_index('Time').plot(kind='barh', stacked=True)
Plotly makes it easier:
import plotly.express as px
px.bar(melt, x='value', y='Time', color='variable', orientation='h', text='value')
I am a beginner at coding in Python and I have a question.
The following code works well for creating a chart for one column:
The Main DF is:
1- Removing Outliers:
def remove_outliers(df_in, col):
    q1 = df_in[col].quantile(0.25)
    q3 = df_in[col].quantile(0.75)
    iqr = q3-q1
    lower_bound = q1-1.5*iqr
    upper_bound = q3+1.5*iqr
    df_out = df_in.loc[(df_in[col] > lower_bound) & (df_in[col] < upper_bound)]
    return df_out
2- Define the Format of the Lineplot
rc={'axes.labelsize': 20, 'font.size': 20, 'legend.fontsize':20,'axes.titlesize':20,'xtick.labelsize': 14,'ytick.labelsize': 14, 'lines.linewidth':1, 'lines.markersize':7, 'xtick.major.pad':10}
sns.set(rc=rc)
3- Create a Lineplot with seaborn:
df1_DH001= remove_outliers(main_df, 'DH 001')[['DH 001','Datum']]
df1_DH001_chart= sns.scatterplot(x='Datum', y='DH 001', data=df1_DH001)
df1_DH001_chart= sns.lineplot(x='Datum', y='DH 001', data=df1_DH001, lw=3, color="b")
df1_DH001_chart.set(xlim=('1995','2019'), ylim=(0, 220) ,title='DH 001', ylabel='Nitrat mg/L', xlabel="Jahr")
df1_DH001_chart.xaxis.set_major_locator(mdates.YearLocator(1))
df1_DH001_chart.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
df1_DH001_chart
So I got this:
Now I would like to write a for loop that creates the same plot, with the same x-axis (Datum), for each of the other columns (there are 22 columns).
Could someone help me?
Import the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Create a sample DF:
data = {'day': ['Mon','Tue','Wed','Thu'],
        'col1': [22000,25000,27000,35000],
        'col2': [2200,2500,2700,3500],
        }
df = pd.DataFrame(data)
Select only numeric columns from your DF or alternatively select the columns that you want to consider in the loop:
df1 = df.select_dtypes(include='number')  # np.int and np.float were removed in NumPy 1.24
Iterate through the columns and print a line plot with seaborn:
for i, col in enumerate(df1.columns):
    plt.figure(i)
    sns.lineplot(x='day',y=col, data=df)
Then the following pictures will be shown:
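Applied to the setup in the question, the loop can call remove_outliers for each measurement column and draw one figure per column. This is a sketch under assumptions: main_df below is a made-up stand-in with a 'Datum' column and two value columns, and plain matplotlib is used for drawing (sns.lineplot(..., ax=ax) could be used in its place).

```python
import matplotlib.pyplot as plt
import pandas as pd

def remove_outliers(df_in, col):
    # Keep rows strictly inside the 1.5 * IQR fences, as in the question
    q1 = df_in[col].quantile(0.25)
    q3 = df_in[col].quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    return df_in.loc[(df_in[col] > lower_bound) & (df_in[col] < upper_bound)]

# Hypothetical stand-in for main_df; the real frame has 22 value columns
main_df = pd.DataFrame({
    'Datum': pd.date_range('2000-01-01', periods=6, freq='YS'),
    'DH 001': [10.0, 12.0, 11.0, 13.0, 12.0, 500.0],
    'DH 002': [20.0, 22.0, 21.0, 23.0, 22.0, 24.0],
})

cleaned = {}
for col in main_df.columns.drop('Datum'):
    # Outliers are removed per column, so each plot gets its own filtered frame
    cleaned[col] = remove_outliers(main_df, col)[[col, 'Datum']]
    fig, ax = plt.subplots()
    ax.plot(cleaned[col]['Datum'], cleaned[col][col], marker='o', color='b', lw=3)
    ax.set(title=col, ylabel='Nitrat mg/L', xlabel='Jahr')
```

Calling plt.show() afterwards displays one figure per column; the axis limits and date formatters from the question can be added inside the loop in the same way.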
I have two bar plots, one positive and the other negative. I want to overlay them on the same x-axis in plotly. How can I do this? Here is a simple example of two bar plots:
import plotly.express as px
import pandas as pd
df1 = pd.DataFrame({'x1':[1,2,3], 'y1':[1,1,1], 'col':['A','A','B']})
df2 = pd.DataFrame({'x2':[1,2,3], 'y2':[-1,-1,-1], 'col':['A','A','B']})
fig1 = px.bar(df1, x="x1", y="y1", color="col")
fig2 = px.bar(df2, x="x2", y="y2", color="col")
If you rename your columns so that they have the same name (like 'x1' and 'y1') you can concatenate the dataframes. Plotly stacks them automatically:
df1 = pd.DataFrame({'x1':[1,2,3], 'y1':[1,1,1], 'col':['A','A','B']})
df2 = pd.DataFrame({'x1':[1,2,3], 'y1':[-1,-1,-1], 'col':['A','A','B']})
df = pd.concat((df1, df2))
px.bar(df, x='x1', y='y1', color='col')
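If the frames already exist with different column names, as in the question, they can be renamed before concatenating instead of being redefined. A minimal sketch (the variable names df2_renamed and combined are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'x1': [1, 2, 3], 'y1': [1, 1, 1], 'col': ['A', 'A', 'B']})
df2 = pd.DataFrame({'x2': [1, 2, 3], 'y2': [-1, -1, -1], 'col': ['A', 'A', 'B']})

# Align df2's column names with df1's so the rows line up after concat
df2_renamed = df2.rename(columns={'x2': 'x1', 'y2': 'y1'})
combined = pd.concat([df1, df2_renamed], ignore_index=True)
print(combined)
```

combined can then be passed to px.bar(combined, x='x1', y='y1', color='col') in the same way as df above.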
I'm trying to print actual values in pies instead of percentages. For a one-dimensional series, this helps:
Matplotlib pie-chart: How to replace auto-labelled relative values by absolute values
But when I try to create multiple pies it won't work.
d = {'Yes':pd.Series([825, 56], index=["Total", "Last 2 Month"]), 'No':pd.Series([725, 73], index=["Total", "Last 2 Month"])}
df = pd.DataFrame(d)
df = df.T
def absolute_value(val):
    a = np.round(val/100.*df.values, 0)
    return a
df.plot.pie(subplots=True, figsize=(12, 6),autopct=absolute_value)
plt.show()
How can I make this right?
Thanks.
A hacky solution would be to index the dataframe within the absolute_value function, considering that this function is called exactly once per value in that dataframe.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
d = {'Yes':pd.Series([825, 56], index=["Total", "Last 2 Month"]),
'No':pd.Series([725, 73], index=["Total", "Last 2 Month"])}
df = pd.DataFrame(d)
df = df.T
i = [0]
def absolute_value(val):
    a = df.iloc[i[0]%len(df),i[0]//len(df)]
    i[0] += 1
    return a
df.plot.pie(subplots=True, figsize=(12, 6),autopct=absolute_value)
plt.show()
The other option is to plot the pie charts individually by looping over the columns.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
d = {'Yes':pd.Series([825, 56], index=["Total", "Last 2 Month"]),
'No':pd.Series([725, 73], index=["Total", "Last 2 Month"])}
df = pd.DataFrame(d)
df = df.T
print(df.iloc[:,0].sum())
def absolute_value(val, summ):
    a = np.round(val/100.*summ,0)
    return a
fig, axes = plt.subplots(ncols=len(df.columns))
for i,ax in enumerate(axes):
    df.iloc[:,i].plot.pie(ax=ax,autopct=lambda x: absolute_value(x,df.iloc[:,i].sum()))
plt.show()
In both cases the output would look similar to this
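A variation on the second option is to build the formatter with a small factory function, so that each pie gets a formatter bound to its own column total. A sketch, where make_autopct is a made-up helper name:

```python
def make_autopct(total):
    """Return an autopct callback that turns a percentage back into a count."""
    def autopct(pct):
        # pct is the wedge's share in percent, so pct / 100 * total is the count
        return '{:.0f}'.format(pct / 100.0 * total)
    return autopct

# The "Total" column from the example: Yes=825, No=725
total = 825 + 725
fmt = make_autopct(total)
print(fmt(825 / total * 100))
print(fmt(725 / total * 100))
```

In the loop above this would be used as df.iloc[:, i].plot.pie(ax=ax, autopct=make_autopct(df.iloc[:, i].sum())), which also returns a string with no trailing ".0".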