Pandas 2 dataframes into one graph

I have 2 separate dataframes that look the same but contain different numbers:
df = pd.DataFrame({'clip emotes':[79,223,435,291,188,99,153,50,55,78,83,48,43,73]}, index=['roohappy','rooblank','lul','omegalul','pog','pogchamp','roovv','roowut','roopog','pepehands','biblethumb','roocry','rooree','rooblind'])
df
and
df = pd.DataFrame({'vod emotes':[3963,7286,5560,4390,3386,3111,2639,2612,2422,1999,1948,1691,1654,1573,1308,1090,1024,1019,1019,974,945,912,893,856,790,771,731,677,658,652]}, index=['rood','roovv','pepega','lul','clap','rookek','roocult','rooblank','pog','rooree','rooaww','roohappy','omegaroll','rooduck','rooh','rareroo','roocry','pepehand','lulw','rooderp','roopog','hyperclap','roospy','rooayaya','omegalul','roolove','roowut','roonya','monkas','roo4'])
df
Then I do df.plot(kind='bar') for each of them separately. I can't figure out how to put these two datasets into one graph, one over the other, so that bars with the same name sit on top of each other in different colours.

You can do it by joining them:
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame({'clip emotes':[79,223,435,291,188,99,153,50,55,78,83,48,43,73]}, index=['roohappy','rooblank','lul','omegalul','pog','pogchamp','roovv','roowut','roopog','pepehands','biblethumb','roocry','rooree','rooblind'])
df2 = pd.DataFrame({'vod emotes':[3963,7286,5560,4390,3386,3111,2639,2612,2422,1999,1948,1691,1654,1573,1308,1090,1024,1019,1019,974,945,912,893,856,790,771,731,677,658,652]}, index=['rood','roovv','pepega','lul','clap','rookek','roocult','rooblank','pog','rooree','rooaww','roohappy','omegaroll','rooduck','rooh','rareroo','roocry','pepehand','lulw','rooderp','roopog','hyperclap','roospy','rooayaya','omegalul','roolove','roowut','roonya','monkas','roo4'])
df3 = df2.join(df1)
df3.plot(kind='bar', stacked=True)
plt.tight_layout()
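Note that df2.join(df1) is a left join on df2's index, so emotes that appear only in df1 ('pogchamp', 'pepehands', 'biblethumb' and 'rooblind') are dropped from the plot. A minimal sketch that keeps every emote uses how='outer' instead; labels missing from one frame become NaN, which are filled with 0 here so the stacked bars still draw. The sort by total is only cosmetic.

```python
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'clip emotes':[79,223,435,291,188,99,153,50,55,78,83,48,43,73]}, index=['roohappy','rooblank','lul','omegalul','pog','pogchamp','roovv','roowut','roopog','pepehands','biblethumb','roocry','rooree','rooblind'])
df2 = pd.DataFrame({'vod emotes':[3963,7286,5560,4390,3386,3111,2639,2612,2422,1999,1948,1691,1654,1573,1308,1090,1024,1019,1019,974,945,912,893,856,790,771,731,677,658,652]}, index=['rood','roovv','pepega','lul','clap','rookek','roocult','rooblank','pog','rooree','rooaww','roohappy','omegaroll','rooduck','rooh','rareroo','roocry','pepehand','lulw','rooderp','roopog','hyperclap','roospy','rooayaya','omegalul','roolove','roowut','roonya','monkas','roo4'])

# Outer join keeps index labels from both frames; missing counts become NaN, then 0
df3 = df2.join(df1, how='outer').fillna(0)
# Order the bars by their stacked total, largest first (purely cosmetic)
df3 = df3.loc[df3.sum(axis=1).sort_values(ascending=False).index]

ax = df3.plot(kind='bar', stacked=True, figsize=(12, 5))
ax.set_ylabel('count')
plt.tight_layout()
plt.show()
```

Passing stacked=False instead draws the two counts side by side for each emote.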

Related

Plot grid of histograms based on group variable using plotly

I have a data frame that contains multiple variables where each variable is logically connected to a factor level of an additional group variable. I would like to plot a histogram of each variable in such a way that it is possible to show a grid of multiple histograms 'group-wise'.
Here's an example data frame df_melt (the variables var_1,var_2,var_3,var_4 are logically connected to the factor level 'foo', the variables var_5,var_6,var_7 belong to factor level 'bar'):
import numpy as np
import pandas as pd
# simulate data and create plot-ready dataframe
np.random.seed(42)
var_values = np.random.randint(low=1,high=100,size=(100,7))
var_names = ['var_1','var_2','var_3','var_4','var_5','var_6','var_7']
group_names = ['foo','foo','foo','foo','bar','bar','bar']
df = pd.DataFrame(var_values,columns=var_names)
multi_index = pd.MultiIndex.from_arrays([df.columns,group_names],names=['variable','group'])
df.columns = multi_index
df_melt = pd.melt(df)
The output should look like this:
These Stack Overflow posts might help to provide an answer, but I was not able to come up with a solution on my own:
Plotting a grouped pandas data in plotly
Plotly equivalent for pd.DataFrame.hist

The best I came up with is the following. Sadly, it is not in the nicely plotted format you wanted: it draws a separate figure for each group, with one histogram per variable side by side. I hope you can start from this.
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# simulate data and create plot-ready dataframe
np.random.seed(42)
var_values = np.random.randint(low=1,high=100,size=(100,7))
var_names = ['var_1','var_2','var_3','var_4','var_5','var_6','var_7']
group_names = ['foo','foo','foo','foo','bar','bar','bar']
df = pd.DataFrame(var_values,columns=var_names)
multi_index = pd.MultiIndex.from_arrays([df.columns,group_names],names=['variable','group'])
df.columns = multi_index
df_melt = pd.melt(df)
# one figure per group, in order of first appearance (a set has no stable order)
uniq_cols = df_melt['group'].unique()
for col in uniq_cols:
    # the variables belonging to this group become the columns of the figure
    rows = df_melt[df_melt['group'] == col]['variable'].unique()
    fig = make_subplots(rows=1, cols=len(rows), column_titles=list(rows))
    for i, row in enumerate(rows):
        values = df_melt[(df_melt['group'] == col) & (df_melt['variable'] == row)]['value']
        fig.add_trace(go.Histogram(x=values), row=1, col=i + 1)
    fig.show()

Create a graph of a pivot_table in Python

I created a pivot table and I want to create a bar graph from it. This is my pivot_table:
I don't know how to extract the values of the column 1970 and use them to make a bar graph.
Thanks!!

Convert the dataframe column names to str; then you can select the data for the year 1970 with df['1970'] and use the pandas built-in plot.bar method to make a bar plot. Try this:
import pandas as pd
import matplotlib.pyplot as plt
#converting column names to string
df.columns = df.columns.astype(str)
#plotting a bar plot
df['1970'].plot.bar()
plt.show()
Example based on @AlanDyke's DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[1970,'a',1],
[1970,'b',2],
[1971,'a',2],
[1971,'b',3]],
columns=['year','location', 'value'])
df = pd.pivot_table(df, values='value', index='location', columns='year')
df.columns = df.columns.astype(str)
df['1970'].plot.bar()
plt.show()
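Converting the column names is optional: pivot_table keeps the year values as integers, so the 1970 column can also be selected with the integer label df[1970]. A small sketch on the same example DataFrame, which also shows that calling plot.bar() on the whole pivot table draws one group of bars per location with one bar per year:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame([[1970, 'a', 1], [1970, 'b', 2], [1971, 'a', 2], [1971, 'b', 3]],
                  columns=['year', 'location', 'value'])
df = pd.pivot_table(df, values='value', index='location', columns='year')

# the column labels are the integer years, so no str conversion is needed
one_year = df[1970]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
one_year.plot.bar(ax=ax1, title='1970 only')
# whole table: one group per location, one bar per year
df.plot.bar(ax=ax2, title='all years')
plt.tight_layout()
plt.show()
```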

You can also use plt.bar and slice the dataframe:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[1970,'a',1],
[1970,'b',2],
[1971,'a',2],
[1971,'b',3]],
columns=['year','location', 'value'])
df = pd.pivot_table(df, values='value', index='location', columns='year')
# the bar labels are the pivot table's index (the locations)
plt.bar(list(df.index), height=df[1970])
plt.show()

Bar plot and coloured categorical variable

I have a dataframe with 3 variables:
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Approved"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Approved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
I want a barplot grouped by the Period column, showing all the values contained in the Observations column and colored with the Result column.
How can I do this?
I tried sns.barplot, but it aggregated the values in the Observations column into just one bar (the mean of the values).
sns.barplot(x='Period',y='Observations',hue='Result',data=df,ci=None)
Plot output

Assuming that you want one bar for each row, you can do as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
result_cat = df["Result"].astype("category")
result_codes = result_cat.cat.codes.values
cmap = plt.cm.Dark2(range(df["Result"].unique().shape[0]))
patches = []
for code in result_cat.cat.codes.unique():
cat = result_cat.cat.categories[code]
patches.append(mpatches.Patch(color=cmap[code], label=cat))
df.plot.bar(x='Period',
y='Observations',
color=cmap[result_codes],
legend=False)
plt.ylabel("Observations")
plt.legend(handles=patches)
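A minimal sketch of the same one-bar-per-row idea drawn directly with matplotlib's ax.bar, using an explicit dict from each Result value to a colour (the tab10 palette is an arbitrary choice). It assumes "Aproved" in the question's data is a typo for "Approved". The bars are placed at integer positions because repeated string labels such as "2019/oct" would otherwise be mapped to the same x position and drawn on top of each other.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

data = [["2019/oct", 10, "Approved"], ["2019/oct", 20, "Approved"], ["2019/oct", 30, "Approved"],
        ["2019/oct", 40, "Approved"], ["2019/nov", 20, "Under evaluation"], ["2019/dec", 30, "Approved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])

# one colour per Result category, in order of first appearance
categories = list(df['Result'].unique())
palette = plt.cm.tab10.colors
color_map = {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}

fig, ax = plt.subplots()
# one bar per row at integer positions, coloured by that row's Result
positions = list(range(len(df)))
ax.bar(positions, df['Observations'], color=[color_map[r] for r in df['Result']])
ax.set_xticks(positions)
ax.set_xticklabels(df['Period'], rotation=45)
ax.set_ylabel('Observations')
ax.legend(handles=[Patch(color=c, label=cat) for cat, c in color_map.items()])
plt.tight_layout()
plt.show()
```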

If you would like the bars grouped by month and then stacked, use the following (note that I changed your data so that one month has more than one status). I'm not sure I understood your question completely, though:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Under evaluation"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Approved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result').plot(kind='bar', stacked=True)

Plotly two bar charts overlaid

I have two bar plots, one with positive values and the other with negative values. I want to overlay them on the same x-axis in plotly. How can I do this? Here is a simple example of the two bar plots:
import plotly.express as px
import pandas as pd
df1 = pd.DataFrame({'x1':[1,2,3], 'y1':[1,1,1], 'col':['A','A','B']})
df2 = pd.DataFrame({'x2':[1,2,3], 'y2':[-1,-1,-1], 'col':['A','A','B']})
fig1 = px.bar(df1, x="x1", y="y1", color="col")
fig2 = px.bar(df2, x="x2", y="y2", color="col")

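If you rename the columns so that both dataframes use the same names (for example 'x1' and 'y1'), you can concatenate them. px.bar then stacks the bars automatically, because its default barmode, 'relative', puts positive values above zero and negative values below it: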
df1 = pd.DataFrame({'x1':[1,2,3], 'y1':[1,1,1], 'col':['A','A','B']})
df2 = pd.DataFrame({'x1':[1,2,3], 'y1':[-1,-1,-1], 'col':['A','A','B']})
df = pd.concat((df1, df2))
px.bar(df, x='x1', y='y1', color='col')

Box Plot of a many Pandas Dataframes

I have three dataframes, each containing 17 rows of data for groups A, B, and C, as shown in the following code snippet:
import pandas as pd
import numpy as np
data1 = pd.DataFrame(np.random.rand(17,3), columns=['A','B','C'])
data2 = pd.DataFrame(np.random.rand(17,3)+0.2, columns=['A','B','C'])
data3 = pd.DataFrame(np.random.rand(17,3)+0.4, columns=['A','B','C'])
I would like to plot a box plot to compare the three groups, as shown in the figure below.
I am trying to make the plot using seaborn's box plot as follows:
import seaborn as sns
sns.boxplot(data1, groupby='A','B','C')
but obviously this does not work. Can someone please help?

Consider assigning an indicator like Location to distinguish your three sets of data. Then concatenate all three and melt the data to get one value column, one Letter categorical column, and one Location column, which are exactly the inputs sns.boxplot needs:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data1 = pd.DataFrame(np.random.rand(17,3), columns=['A','B','C']).assign(Location=1)
data2 = pd.DataFrame(np.random.rand(17,3)+0.2, columns=['A','B','C']).assign(Location=2)
data3 = pd.DataFrame(np.random.rand(17,3)+0.4, columns=['A','B','C']).assign(Location=3)
cdf = pd.concat([data1, data2, data3])
mdf = pd.melt(cdf, id_vars=['Location'], var_name='Letter')
print(mdf.head())
# Location Letter value
# 0 1 A 0.223565
# 1 1 A 0.515797
# 2 1 A 0.377588
# 3 1 A 0.687614
# 4 1 A 0.094116
ax = sns.boxplot(x="Location", y="value", hue="Letter", data=mdf)
plt.show()
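As an alternative to calling assign on each frame, pd.concat can create the Location indicator itself through its keys argument. A minimal sketch, assuming the labels 1, 2, 3 are wanted as in the answer above; the random seed is only there to make the example reproducible.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(0)  # only so the example is reproducible
data1 = pd.DataFrame(np.random.rand(17, 3), columns=['A', 'B', 'C'])
data2 = pd.DataFrame(np.random.rand(17, 3) + 0.2, columns=['A', 'B', 'C'])
data3 = pd.DataFrame(np.random.rand(17, 3) + 0.4, columns=['A', 'B', 'C'])

# keys= labels each frame; the labels become an outer index level named 'Location'
cdf = pd.concat([data1, data2, data3], keys=[1, 2, 3], names=['Location', None])
# move 'Location' from the index into an ordinary column
cdf = cdf.reset_index(level='Location')
mdf = pd.melt(cdf, id_vars=['Location'], var_name='Letter')

ax = sns.boxplot(x='Location', y='value', hue='Letter', data=mdf)
plt.show()
```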
