How can I plot this data frame using seaborn to show the KPI per model?
allFrame = pd.DataFrame({'modelName':['first','second', 'third'],
'kpi_1':[1,2,3],
'kpi_2':[2,4,3]})
Not like sns.barplot(x="kpi_2", y="kpi_1", hue="modelName", data=allFrame)
But rather like this per KPI
Try melting the dataframe first, and then you can plot using seaborn:
import pandas as pd
import seaborn as sns
allFrame = pd.DataFrame({'modelName':['first','second', 'third'],
'kpi_1':[1,2,3],
'kpi_2':[2,4,3]})
allFrame2 = pd.melt(frame=allFrame,
id_vars=['modelName'],
value_vars=["kpi_1","kpi_2"],
value_name="Values", var_name="kpis")
sns.barplot(x="kpis", y="Values", hue="modelName", data=allFrame2)
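To see why melting helps, it can be useful to print the reshaped frame. The sketch below reuses the frame from the question and only shows the wide-to-long step: each KPI column becomes a set of rows, giving one row per (model, KPI) pair, which is the layout the x/hue grouping in barplot works with. The name long_frame is just an illustrative choice.

```python
import pandas as pd

allFrame = pd.DataFrame({'modelName': ['first', 'second', 'third'],
                         'kpi_1': [1, 2, 3],
                         'kpi_2': [2, 4, 3]})

# Wide -> long: one row per (modelName, kpi) pair
long_frame = allFrame.melt(id_vars='modelName',
                           var_name='kpis',
                           value_name='Values')
print(long_frame)
```

When value_vars is omitted, melt uses every column that is not listed in id_vars, so the result matches the explicit call above.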
Thanks!
Using Altair charting, how can I create a chart of value_counts() of multiple columns? This is easily done by matplotlib. How can the identical chart be created using Altair?
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Col1':[0,1,2,3],
'Col2':[0,1,2,2],
'Col3':[2,3,3,3]})
pd.DataFrame({col:df[col].value_counts(normalize=True) for col in df}).plot(kind='bar')
You could do this:
import pandas as pd
import altair as alt
df = pd.DataFrame({
'Col1':[0,1,2,3],
'Col2':[0,1,2,2],
'Col3':[2,3,3,3]
}).melt(var_name='column')
alt.Chart(df).mark_bar().encode(
x='column',
y='count()',
column='value:O',
color='column'
)
In the next major release of Altair you can use the offset channels instead of faceting as in this example.
I have the following pandas Dataframe. alfa_value and beta_value are random, ndcg shall be the parameter deciding the color.
The question is: how do I do a heatmap of the pandas Dataframe?
You can use the code below to generate a heatmap. You have to adjust the bins to group your data (analyze the mean, the std, ...)
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
rng = np.random.default_rng(2022)
df = pd.DataFrame({'alfa_value': rng.integers(1000, 10000, 1000),
'beta_value': rng.random(1000),
'ndcg': rng.random(1000)})
out = df.pivot_table('ndcg', pd.cut(df['alfa_value'], bins=10),
pd.cut(df['beta_value'], bins=10), aggfunc='mean')
sns.heatmap(out)
plt.tight_layout()
plt.show()
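As one possible way of adjusting the bins, the sketch below swaps the equal-width bins from pd.cut for quantile bins from pd.qcut, so that each bin holds roughly the same number of rows. The choice of 5 bins per axis is an assumption made only for illustration; the resulting table can be passed to sns.heatmap in the same way as out above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2022)
df = pd.DataFrame({'alfa_value': rng.integers(1000, 10000, 1000),
                   'beta_value': rng.random(1000),
                   'ndcg': rng.random(1000)})

# Assumption: 5 quantile bins per axis, chosen only for illustration
alfa_bins = pd.qcut(df['alfa_value'], q=5)
beta_bins = pd.qcut(df['beta_value'], q=5)

# Mean ndcg per (alfa bin, beta bin) cell; observed is set explicitly
# so the result does not rely on the pandas default
out = df.pivot_table(values='ndcg', index=alfa_bins, columns=beta_bins,
                     aggfunc='mean', observed=False)
print(out.round(2))
```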
In general, Seaborn's heatmap function is a nice way to color pandas' DataFrames based on their values. Good examples and descriptions can be found here.
Since you seem to want to color the row based on a different column, you are probably looking for something more like these answers.
I am trying to plot a correlation matrix (already calculated) in Python. The table looks like this:
And I would like it to look like this:
I am using the following code in Python:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
df = pd.DataFrame(data)
print (df)
corrMatrix = data.corr()
print (corrMatrix)
sn.heatmap(corrMatrix, annot=True)
plt.show()
Note that the matrix is ready and I don't want to calculate the correlation again, but I have not managed to do that. Any suggestions?
You are recalculating the correlation with the following line:
corrMatrix = data.corr()
You then go on to utilize this recalculated variable in the heatmap here:
sn.heatmap(corrMatrix, annot=True)
plt.show()
To resolve this, instead of passing corrMatrix (the recalculated value), pass the data read from Excel directly: either data or df (df is just a copy of data). All the code you should need is:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
sn.heatmap(data, annot=True)
plt.show()
Note, however, that this assumes your data IS ready for the heatmap, as you suggest. Since we do not have access to your data, we cannot confirm that.
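If the sheet stores the row names in its first column, one option is to move that column into the index so that only numeric cells reach the heatmap. The sketch below uses a small made-up frame as a stand-in for the Excel file; the names A, B, C and the correlation values are hypothetical.

```python
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

# Hypothetical stand-in for the Excel sheet: the first column holds the
# names, the remaining columns hold correlations that are already computed.
data = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'A': [1.0, 0.8, 0.1],
    'B': [0.8, 1.0, 0.3],
    'C': [0.1, 0.3, 1.0],
})

# Move the names into the index so they become the row labels
# (with read_excel, index_col=0 would have the same effect).
corr = data.set_index('name')

ax = sn.heatmap(corr, annot=True, vmin=-1, vmax=1)
plt.tight_layout()
# plt.show()  # uncomment to display the figure
```

Fixing vmin and vmax to -1 and 1 is a design choice that keeps the color scale comparable across different matrices; it can be dropped if the default scaling is preferred.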
I deleted the first column (names) and added the names back later as tick labels, so the code is as below:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Users/yousefalbuhaisi/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
fig, ax = plt.subplots(dpi=150)
y_axis_labels = ['CLC','GIEMS','GLWD','LPX_BERN','LPJ_WSL','LPJ_WHyME','SDGVM','DLEM','ORCHIDEE','CLM4ME']
sn.heatmap(data,yticklabels=y_axis_labels, annot=True)
plt.show()
and the results are:
I have a dataframe with a list of items and associated values. Which metric and method is best for performing the clustering?
I want to create a seaborn clustermap (dendrogram plus heatmap) from the list, clustering on rows only. The plotting is done (as shown in the code), but how can I get the list of items for each cluster, or each protein with its cluster information? (Similar to "Extract rows of clusters in hierarchical clustering using seaborn clustermap", but based only on rows and not columns.)
How do I determine which "method" and "metric" is best for my data?
data.csv example:
item,v1,v2,v3,v4,v5
A1,1,2,3,4,5
B1,2,4,6,8,10
C1,3,6,9,12,15
A1,2,3,4,5,6
B2,3,5,7,9,11
C2,4,7,10,13,16
My code for creating the clustermap:
import pandas as pd
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.cluster.hierarchy as sch
df = pd.read_csv('data.csv', index_col=0)
sns.clustermap(df, col_cluster=False, cmap="coolwarm", method='ward', metric='euclidean', figsize=(40,40))
plt.savefig('plot.pdf', dpi=300)
I just hacked this together. Is this what you want?
import pandas as pd
import numpy as np
import seaborn as sns
cars = {'item': ['A1','B1','C1','A1','B1','C1'],
'v1': [1.0,2.0,3.0,2.0,3.0,4.0],
'v2': [2.0,4.0,6.0,3.0,5.0,7.0],
'v3': [3.0,6.0,9.0,4.0,7.0,10.0],
'v4': [4.0,8.0,12.0,5.0,9.0,13.0],
'v5': [5.0,10.0,15.0,6.0,11.0,16.0]
}
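df = pd.DataFrame(cars)
df
# Average the value columns per item (rows sharing an item name are collapsed)
heatmap_data = pd.pivot_table(df, values=['v1','v2','v3','v4','v5'],
index=['item'])
heatmap_data.head()
# Clustermap of the per-item averages
sns.clustermap(heatmap_data)
# Alternatively, cluster the raw rows after dropping the label column
df = df.drop(['item'], axis=1)
g = sns.clustermap(df)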
Also, check out links below for more info on this topic.
https://seaborn.pydata.org/generated/seaborn.clustermap.html
https://kite.com/python/docs/seaborn.clustermap
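The question also asks how to get the items belonging to each cluster. One possible approach, sketched below, is to compute the row linkage directly with scipy, using the same method and metric that the question passes to clustermap, and then cut the tree into flat clusters with fcluster. The number of clusters (3) is an assumption made for illustration, and the rows are typed in from the data.csv example rather than read from disk.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Rows from the data.csv example in the question
df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [3, 6, 9, 12, 15],
     [2, 3, 4, 5, 6], [3, 5, 7, 9, 11], [4, 7, 10, 13, 16]],
    index=['A1', 'B1', 'C1', 'A1', 'B2', 'C2'],
    columns=['v1', 'v2', 'v3', 'v4', 'v5'],
)

# Same method/metric as passed to sns.clustermap in the question
row_linkage = linkage(df.values, method='ward', metric='euclidean')

# Assumption: cut the tree into at most 3 flat clusters
n_clusters = 3
labels = fcluster(row_linkage, t=n_clusters, criterion='maxclust')

# One row per item with its cluster id
clusters = pd.DataFrame({'item': df.index, 'cluster': labels})
print(clusters.sort_values('cluster'))
```

Passing row_linkage=row_linkage to sns.clustermap should keep the plotted dendrogram consistent with these labels. Which method and metric work best depends on the data, so it is probably worth comparing a few combinations rather than treating ward/euclidean as the only option.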
I have the following dataframe:
I'm trying to plot a bar chart, with x as 'config names', y as 'value', and one bar per month (one bin per month). I'm not sure how to do this, any ideas?
If you have your data in a pandas DataFrame (let's say df), it's rather easy:
import seaborn as sns
sns.barplot(x='config names', y='value', data=df)
I'm not sure what you mean by one bin per month. The bins here are your x axis.
If you mean you want to split different months into different bins then you should just add them to the hue parameter.
import seaborn as sns
sns.barplot(x='config names', y='value', data=df, hue='month')
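A small self-contained sketch of the hue variant follows. The data values are made up; only the column names follow the question. Note that data takes the DataFrame object itself, not its name as a string.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data; the column names follow the question
df = pd.DataFrame({
    'config names': ['cfg_a', 'cfg_a', 'cfg_b', 'cfg_b'],
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'value': [10, 12, 7, 9],
})

# hue splits each config into one bar per month
ax = sns.barplot(x='config names', y='value', hue='month', data=df)
```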
I may not have understood what you are asking, but it looks like this.
So I suggest you build a pivot table from your dataframe.
Assuming your dataframe variable is named df, can you try this:
import pandas as pd
# Sum 'value' per month, then draw one bar per month
ax = pd.pivot_table(
df,
values=['value'],
columns=['month'],
aggfunc='sum'
).plot(kind='bar')
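If the goal is to keep 'config names' on the x axis with one bar per month, a variant worth trying is to also pass index to the pivot. This is a sketch on made-up data, with column names taken from the question; the name wide is only illustrative.

```python
import pandas as pd

# Hypothetical data; the column names follow the question
df = pd.DataFrame({
    'config names': ['cfg_a', 'cfg_a', 'cfg_b', 'cfg_b'],
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'value': [10, 12, 7, 9],
})

# Rows: config names, columns: months, cells: summed value
wide = df.pivot_table(index='config names', columns='month',
                      values='value', aggfunc='sum')
print(wide)

# wide.plot(kind='bar') would then draw one group per config
# and one bar per month within each group
```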