Rearranging the columns of my heatmap using python's seaborn - python

I'm trying to visualize the following .csv data:
Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,Q11,Q12,Q13,Q14,Q15,Q16,Q17,Q18,Q19,Q20
4,4,2,2,4,2,3,5,3,4,2,5,2,1,4,4,2,1,5,2
2,2,4,4,4,2,2,2,4,4,2,4,2,2,3,2,2,4,5,2
4,5,4,1,4,2,2,4,4,3,2,2,2,1,2,4,4,2,5,4
3,4,2,4,4,2,2,2,4,3,2,4,4,3,3,4,2,4,5,1
4,4,3,2,4,3,4,5,4,3,1,5,3,2,4,2,2,3,4,2
4,5,2,3,5,1,3,4,3,3,1,2,4,4,5,4,1,4,5,4
5,5,5,2,4,3,2,4,4,2,2,4,4,2,4,2,2,4,4,5
4,4,3,1,5,3,2,4,2,2,1,4,4,2,4,1,2,5,5,3
1,3,5,2,4,4,3,1,4,4,2,3,1,4,3,4,3,3,4,1
3,3,5,2,4,2,4,4,3,4,1,5,4,2,1,2,2,4,5,2
Here's my code:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
map = sns.clustermap(df, annot=True, linewidths=2, linecolor='yellow', metric="correlation", method="single")
plt.show()
Which returns:
I want to rearrange my heatmap and order it column-wise by the frequency of the most common response in each column. For example, column Q5 has the value 4 repeated 8 times (more than any other column), so it should be the first column. Columns Q17 and Q19 each have a value that is repeated 7 times, so they should come second and third (their exact order doesn't matter). How can I do this?

You can compute the order and reindex before using the data in clustermap:
import seaborn as sns
order = (df.apply(pd.Series.value_counts)
           .max()
           .sort_values(ascending=False)
           .index
        )
cm = sns.clustermap(df[order], col_cluster=False, annot=True, linewidths=2, linecolor='yellow', metric="correlation", method="single")
Output:
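To see what the ordering step does, here is a minimal sketch on a small made-up frame (the column names and values below are hypothetical, not the survey data above). Each intermediate result is kept in its own variable so it can be inspected.

```python
import pandas as pd

# Hypothetical three-question frame, made up for illustration only
df = pd.DataFrame({
    "Q1": [1, 2, 3, 4],   # every response appears once
    "Q2": [4, 4, 4, 4],   # the value 4 appears four times
    "Q3": [2, 2, 2, 5],   # the value 2 appears three times
})

# Rows are response values, columns are questions, cells are counts
# (NaN where a value never occurs in that column)
counts = df.apply(pd.Series.value_counts)

# Highest count of any single response per column; max() skips NaN
top_counts = counts.max()

# Column labels sorted by that highest count, largest first
order = top_counts.sort_values(ascending=False).index

reordered = df[order]
print(list(reordered.columns))  # ['Q2', 'Q3', 'Q1']
```

Passing `col_cluster=False` to `clustermap`, as in the answer, is what keeps this column order from being rearranged again by the column clustering.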

Related

Pandas, Seaborn: plot a boxplot with 2 columns and a 3rd as hue

In a pandas DataFrame with 3 variables, I want to plot 2 columns as 2 separate boxes, with the 3rd column as hue, using seaborn.
I can get the first step working with pd.melt, but I can't add the hue and make it work.
This is what I have:
df=pd.DataFrame({'A':['a','a','b','a','b'],'B':[1,3,5,4,7],'C':[2,3,4,1,3]})
df2=df[['B','C']].copy()
sb.boxplot(data=pd.melt(df2), x="variable", y="value",palette= 'Blues')
I want to do this with the first DataFrame, using column 'A' as hue.
Can you help me?
Thank you
IIUC, you can achieve this as follows:
Apply df.melt, using column A for id_vars, and ['B','C'] for value_vars.
Next, inside sns.boxplot, feed the melted df to the data parameter, and add hue='A'.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['a','a','b','a','b'], 'B':[1,3,5,4,7], 'C':[2,3,4,1,3]})
sns.boxplot(data=df.melt(id_vars='A', value_vars=['B','C']),
            x='variable', y='value', hue='A', palette='Blues')
plt.show()
Result
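It may help to look at the melted frame on its own before plotting. The sketch below only reshapes the data from the answer and prints it; the name `long_df` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'a', 'b'],
                   'B': [1, 3, 5, 4, 7],
                   'C': [2, 3, 4, 1, 3]})

# Keep 'A' as an identifier column; stack 'B' and 'C' into
# a 'variable' column (old column name) and a 'value' column
long_df = df.melt(id_vars='A', value_vars=['B', 'C'])

print(long_df)
```

Each original row now appears twice, once for `B` and once for `C`, and `A` is repeated alongside, which is why `hue='A'` can be used in `sns.boxplot`.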

Multi Index Seaborn Line Plot

I have a multi index dataframe, with the two indices being Sample and Lithology
Sample 20EC-P 20EC-8 20EC-10-1 ... 20EC-43 20EC-45 20EC-54
Lithology Pd Di-Grd Gb ... Hbl Plag Pd Di-Grd Gb
Rb 7.401575 39.055118 6.456693 ... 0.629921 56.535433 11.653543
Ba 24.610102 43.067678 10.716841 ... 1.073115 58.520532 56.946630
Th 3.176471 19.647059 3.647059 ... 0.823529 29.647059 5.294118
I am trying to put it into a seaborn lineplot as such.
spider = sns.lineplot(data = data, hue = data.columns.get_level_values("Lithology"),
style = data.columns.get_level_values("Sample"),
dashes = False, palette = "deep")
The lineplot comes out as follows: [image 1]
I have two issues. First, I want to format hues by lithology and style by sample. Outside of the lineplot function, I can successfully access sample and lithology using data.columns.get_level_values, but in the lineplot they don't seem to do anything and I haven't figured out another way to access these values. Also, the lineplot reorganizes the x-axis by alphabetical order. I want to force it to keep the same order as the dataframe, but I don't see any way to do this in the documentation.
To use hue= and style=, seaborn prefers its dataframes in long form. pd.melt() will combine all columns into a single column of values and create new columns holding the old column names. The index also needs to be converted to a regular column (with .reset_index()).
Most seaborn functions accept order= to set an order on the x-values, but with lineplot the only way is to make the column categorical with a fixed category order.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
column_tuples = [('20EC-P', 'Pd'), ('20EC-8', 'Di-Grd'), ('20EC-10-1', 'Gb'),
                 ('20EC-43', 'Hbl Plag Pd'), ('20EC-45', 'Di-Grd'), ('20EC-54', 'Gb')]
col_index = pd.MultiIndex.from_tuples(column_tuples, names=["Sample", "Lithology"])
data = pd.DataFrame(np.random.uniform(0, 50, size=(3, len(col_index))), columns=col_index, index=['Rb', 'Ba', 'Th'])
data_long = data.melt(ignore_index=False).reset_index()
data_long['index'] = pd.Categorical(data_long['index'], data.index) # make categorical, use order of the original dataframe
ax = sns.lineplot(data=data_long, x='index', y='value',
                  hue="Lithology", style="Sample", dashes=False, markers=True, palette="deep")
ax.set_xlabel('')
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.02))
plt.tight_layout() # fit legend and labels into the figure
plt.show()
The long dataframe looks like:
  index  Sample Lithology      value
0    Rb  20EC-P        Pd   6.135005
1    Ba  20EC-P        Pd   6.924961
2    Th  20EC-P        Pd  44.270570
...
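The effect of the categorical conversion can be checked without plotting. The following is a minimal sketch with made-up element labels; `ordered=True` is an addition made here so the ordering is explicit, whereas the answer's code passes only the categories.

```python
import pandas as pd

# Labels in arbitrary order, as they might appear in a long-form column
elements = pd.Series(['Th', 'Rb', 'Ba', 'Rb'])

# Plain strings sort alphabetically
alphabetical = elements.sort_values().tolist()

# A categorical sorts by the position of each label in `categories`
desired_order = ['Rb', 'Ba', 'Th']
as_categorical = pd.Series(
    pd.Categorical(elements, categories=desired_order, ordered=True)
)
by_category = as_categorical.sort_values().tolist()

print(alphabetical)  # ['Ba', 'Rb', 'Rb', 'Th']
print(by_category)   # ['Rb', 'Rb', 'Ba', 'Th']
```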

Python Pandas - Plotting multiple Bar plots by category from dataframe

I have a dataframe that looks like this:
df = pd.DataFrame(data={'ID':[1,1,1,2,2,2], 'Value':[13, 12, 15, 4, 2, 3]})
Index ID Value
0 1 13
1 1 12
2 1 15
3 2 4
4 2 2
5 2 3
and I want to plot it by the IDs (categories) so that each category would have different bar plot,
so in this case I would have two figures,
one figure with bar plot of ID=1,
and second separate figure bar plot of ID=2.
Can I do it (preferably without loops) with something like df.plot(y='Value', kind='bar')?
Two options are possible: one using matplotlib and the other using seaborn, which you should absolutely know, as it works well with pandas.
Pandas with matplotlib
You have to create a grid of subplots with the number of rows and columns you choose. plt.subplots returns the axes as a 1-D array if exactly one of nrows or ncols is 1, or as a 2-D array if both are greater than 1. You then pass each Axes object to the pandas plot method through the ax argument.
If the number of categories is unknown or large, you need to use a loop.
import pandas as pd
import matplotlib.pyplot as plt
fig, axes = plt.subplots( nrows=1, ncols=2, sharey=True )
df.loc[ df["ID"] == 1, 'Value' ].plot.bar( ax=axes[0] )
df.loc[ df["ID"] == 2, 'Value' ].plot.bar( ax=axes[1] )
plt.show()
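For the case where the number of categories is not known in advance, the loop could look like the sketch below. It assumes one subplot per ID laid out in a single row; `squeeze=False` is used so that `axes` is always 2-D, even when there is only one category.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2], 'Value': [13, 12, 15, 4, 2, 3]})

# One group of values per distinct ID
groups = df.groupby('ID')['Value']

# One column of subplots per group; squeeze=False keeps axes 2-D
fig, axes = plt.subplots(nrows=1, ncols=groups.ngroups,
                         sharey=True, squeeze=False)

# Pair each Axes in the single row with one (ID, values) group
for ax, (key, values) in zip(axes[0], groups):
    values.plot.bar(ax=ax, title=f'ID = {key}')
```

Call `plt.show()` afterwards to display the figure, as in the example above.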
Pandas with seaborn
Seaborn is the most amazing graphical tool that I know. Its catplot function draws a series of plots, one for each value of the column you pass as col. You select the type of plot with kind.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
df['index'] = [1,2,3] * 2
sns.catplot(kind='bar', data=df, x='index', y='Value', col='ID')
plt.show()
I added an index column so that the result can be compared with the df.plot.bar version. If you don't want it, remove x='index' and each facet will show a single bar with an error bar.
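The hard-coded `[1,2,3] * 2` only fits when every ID has exactly three rows. A possible generalisation, assuming the position of each row within its ID group is what should go on the x-axis, is `groupby(...).cumcount()`. The extra seventh row below is hypothetical and added only to make the groups uneven.

```python
import pandas as pd

# Original data plus one made-up extra row for ID 2
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 2],
                   'Value': [13, 12, 15, 4, 2, 3, 8]})

# cumcount() numbers the rows within each ID group starting at 0;
# adding 1 gives 1-based positions like the hard-coded list
df['index'] = df.groupby('ID').cumcount() + 1

print(df['index'].tolist())  # [1, 2, 3, 1, 2, 3, 4]
```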

Bar plot and coloured categorical variable

I have a dataframe with 3 variables:
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Approved"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
I want a barplot grouped by the Period column, showing all the values contained in the Observations column and colored by the Result column.
How can I do this?
I tried sns.barplot, but it merged the values of the Observations column into a single bar per group (the mean of the values).
sns.barplot(x='Period',y='Observations',hue='Result',data=df,ci=None)
Plot output
Assuming that you want one bar for each row, you can do as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
result_cat = df["Result"].astype("category")
result_codes = result_cat.cat.codes.values
cmap = plt.cm.Dark2(range(df["Result"].unique().shape[0]))
patches = []
for code in result_cat.cat.codes.unique():
    cat = result_cat.cat.categories[code]
    patches.append(mpatches.Patch(color=cmap[code], label=cat))
df.plot.bar(x='Period',
            y='Observations',
            color=cmap[result_codes],
            legend=False)
plt.ylabel("Observations")
plt.legend(handles=patches)
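The colour lookup above relies on the integer codes that pandas assigns to each category. As a rough illustration, using the data from the question unchanged, the codes can be inspected like this:

```python
import pandas as pd

data = [["2019/oct", 10, "Approved"], ["2019/oct", 20, "Approved"],
        ["2019/oct", 30, "Approved"], ["2019/oct", 40, "Approved"],
        ["2019/nov", 20, "Under evaluation"], ["2019/dec", 30, "Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])

result_cat = df["Result"].astype("category")

# Categories are the sorted distinct labels
categories = list(result_cat.cat.categories)

# Each row gets the position of its label in `categories`
codes = result_cat.cat.codes.tolist()

print(categories)  # ['Approved', 'Aproved', 'Under evaluation']
print(codes)       # [0, 0, 0, 0, 2, 1]
```

Note that the label 'Aproved' in the sample data is spelled differently from 'Approved', so it is treated as a third category and gets its own colour and legend entry.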
If you would like the bars grouped by month and then stacked, use the following. Note that I changed your data so that one month has more than one status. I'm not sure I understood your question completely.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Under evaluation"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result').plot(kind='bar', stacked=True)
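To see what is being plotted, the intermediate table can be printed before calling `.plot`. This sketch reuses the modified data from this answer; the name `table` is introduced here for illustration.

```python
import pandas as pd

data = [["2019/oct", 10, "Approved"], ["2019/oct", 20, "Approved"],
        ["2019/oct", 30, "Approved"], ["2019/oct", 40, "Under evaluation"],
        ["2019/nov", 20, "Under evaluation"], ["2019/dec", 30, "Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])

# Sum Observations per (Period, Result) pair, then pivot Result
# into columns: one row per period, one column per result label
table = df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result')

print(table)
```

Each row becomes one stacked bar and each column one coloured segment; combinations that never occur are NaN. The rows come out in alphabetical order of the period strings (dec, nov, oct), not in calendar order.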

Can we create scatter plot with a single data line

I have sample data in dataframe as below
Header=['Date','EmpCount','DeptCount']
2009-01-01,100,200
print(df)
Date EmpCount DeptCount
0 2009-01-01 100 200
Can we generate a scatter plot (or a line chart, etc.) with only this one record?
I tried multiple approaches, but I keep getting:
TypeError: no numeric data to plot
On the x-axis: dates
On the y-axis: two dots, one for EmpCount and one for DeptCount
Starting from @the-cauchy-criterion's answer, try this:
import pandas as pd
import matplotlib.pyplot as plt
header=['Date','EmpCount','DeptCount']
df = pd.DataFrame([['2009-01-01',100,200]],columns=header)
b=df.set_index('Date')
lines = plt.plot(b, linewidth=3, markersize=10, marker='.')  # plt.plot returns a list of Line2D objects
plt.show()
What are you using to plot the scatter plot?
Here's how to do it with pyplot.
import pandas as pd
import matplotlib.pyplot as plt
header=['Date','EmpCount','DeptCount']
df = pd.DataFrame([['2009-01-01',100,200]],columns=header)
plt.scatter(*df.iloc[0][1:])
plt.show()
iloc[0] gets the first entry, [1:] takes all the columns except the first and the * operator unpacks the arguments.
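pandas raises "no numeric data to plot" when none of the columns it is asked to plot has a numeric dtype. One possible cause, which is only an assumption about the asker's data, is that the counts were read in as strings. A sketch of converting the columns and reshaping the single record into two points, with the hypothetical names `Metric`, `Count` and `long_df`:

```python
import pandas as pd

# Assumption: the counts arrived as strings rather than numbers
df = pd.DataFrame([['2009-01-01', '100', '200']],
                  columns=['Date', 'EmpCount', 'DeptCount'])

# Convert the date and the two count columns to proper dtypes
df['Date'] = pd.to_datetime(df['Date'])
for col in ['EmpCount', 'DeptCount']:
    df[col] = pd.to_numeric(df[col])

# One row per (Date, Metric) pair: two points sharing the same date
long_df = df.melt(id_vars='Date', var_name='Metric', value_name='Count')

print(long_df)
```

The two rows of `long_df` could then be drawn with `plt.scatter(long_df['Date'], long_df['Count'])`, which puts the date on the x-axis and one dot per count on the y-axis.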
