Rearranging the columns of my heatmap using Python's seaborn
I'm trying to visualize the following .csv data:
Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,Q11,Q12,Q13,Q14,Q15,Q16,Q17,Q18,Q19,Q20
4,4,2,2,4,2,3,5,3,4,2,5,2,1,4,4,2,1,5,2
2,2,4,4,4,2,2,2,4,4,2,4,2,2,3,2,2,4,5,2
4,5,4,1,4,2,2,4,4,3,2,2,2,1,2,4,4,2,5,4
3,4,2,4,4,2,2,2,4,3,2,4,4,3,3,4,2,4,5,1
4,4,3,2,4,3,4,5,4,3,1,5,3,2,4,2,2,3,4,2
4,5,2,3,5,1,3,4,3,3,1,2,4,4,5,4,1,4,5,4
5,5,5,2,4,3,2,4,4,2,2,4,4,2,4,2,2,4,4,5
4,4,3,1,5,3,2,4,2,2,1,4,4,2,4,1,2,5,5,3
1,3,5,2,4,4,3,1,4,4,2,3,1,4,3,4,3,3,4,1
3,3,5,2,4,2,4,4,3,4,1,5,4,2,1,2,2,4,5,2
Here's my code:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
cm = sns.clustermap(df, annot=True, linewidths=2, linecolor='yellow', metric="correlation", method="single")
plt.show()
Which returns:
I want to rearrange my heatmap and order it column-wise by the frequency of each response. For example, the column Q5 has the value 4 repeated 8 times (more than any other column), so it should be the first column. Columns Q17 and Q19 each have a value that is repeated 7 times, so they should come second and third (the exact order between them doesn't matter). How can I do this?
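You can compute the column order and reindex the DataFrame before passing it to clustermap. Set col_cluster=False, otherwise clustermap reorders the columns again by clustering them: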
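import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Count how often each value occurs per column, keep each column's highest count,
# and sort the column names by that count (most frequent response first)
order = (df.apply(pd.Series.value_counts)
           .max()
           .sort_values(ascending=False)
           .index
        )
# col_cluster=False keeps the columns in the computed order instead of re-clustering them
cm = sns.clustermap(df[order], col_cluster=False, annot=True, linewidths=2, linecolor='yellow', metric="correlation", method="single")
plt.show()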
Output:
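The ordering step can be checked on its own, without drawing anything. The sketch below embeds the question's CSV as a string so it is self-contained (in practice you would keep pd.read_csv('data.csv')) and prints each column's highest response count. The kind="stable" argument is an addition not in the answer above; it only makes tied columns (such as Q17 and Q19) keep their original left-to-right order.

```python
import io
import pandas as pd

# The question's data, embedded as a string so the example runs on its own
CSV = """Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,Q11,Q12,Q13,Q14,Q15,Q16,Q17,Q18,Q19,Q20
4,4,2,2,4,2,3,5,3,4,2,5,2,1,4,4,2,1,5,2
2,2,4,4,4,2,2,2,4,4,2,4,2,2,3,2,2,4,5,2
4,5,4,1,4,2,2,4,4,3,2,2,2,1,2,4,4,2,5,4
3,4,2,4,4,2,2,2,4,3,2,4,4,3,3,4,2,4,5,1
4,4,3,2,4,3,4,5,4,3,1,5,3,2,4,2,2,3,4,2
4,5,2,3,5,1,3,4,3,3,1,2,4,4,5,4,1,4,5,4
5,5,5,2,4,3,2,4,4,2,2,4,4,2,4,2,2,4,4,5
4,4,3,1,5,3,2,4,2,2,1,4,4,2,4,1,2,5,5,3
1,3,5,2,4,4,3,1,4,4,2,3,1,4,3,4,3,3,4,1
3,3,5,2,4,2,4,4,3,4,1,5,4,2,1,2,2,4,5,2
"""

df = pd.read_csv(io.StringIO(CSV))

# value_counts per column: rows are response values, NaN where a value never occurs
counts = df.apply(pd.Series.value_counts)

# Highest count in each column, i.e. how often its most common response appears
max_counts = counts.max()

# Most frequent first; the stable sort keeps tied columns in their original order
order = max_counts.sort_values(ascending=False, kind="stable").index

print(max_counts[order].head(5))
```

The resulting order can be passed straight to df[order] as in the clustermap call above.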
Related
Pandas, Seaborn: plot a boxplot with 2 columns and a 3rd column as hue
In a pandas DataFrame with 3 variables, I want to plot 2 columns as 2 different boxes and use the 3rd column as hue with seaborn. I can do the first step with pd.melt, but I can't add the hue and make it work. This is what I have:
df=pd.DataFrame({'A':['a','a','b','a','b'],'B':[1,3,5,4,7],'C':[2,3,4,1,3]})
df2=df[['B','C']].copy()
sb.boxplot(data=pd.melt(df2), x="variable", y="value",palette= 'Blues')
I want to do this on the first DataFrame, setting variable 'A' as hue. Can you help me? Thank you
IIUC, you can achieve this as follows: apply df.melt, using column A for id_vars and ['B','C'] for value_vars. Next, inside sns.boxplot, feed the melted df to the data parameter and add hue='A'.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['a','a','b','a','b'], 'B':[1,3,5,4,7], 'C':[2,3,4,1,3]})
sns.boxplot(data=df.melt(id_vars='A', value_vars=['B','C']), x='variable', y='value', hue='A', palette='Blues')
plt.show()
Result:
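To see why hue='A' works, it helps to look at the long-form frame that df.melt produces: each original row becomes one row per value column, and column A is repeated alongside, so seaborn can group the boxes by it. A minimal check without plotting:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'a', 'b'], 'B': [1, 3, 5, 4, 7], 'C': [2, 3, 4, 1, 3]})

# id_vars columns are kept as identifiers; value_vars are stacked into 'variable'/'value'
long_df = df.melt(id_vars='A', value_vars=['B', 'C'])

print(long_df.columns.tolist())  # ['A', 'variable', 'value']
print(len(long_df))              # 5 rows x 2 value columns = 10
print(long_df.head(6))
```

The 'variable' column becomes the x axis (one box per original column), and 'A' splits each box by hue.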
Multi Index Seaborn Line Plot
I have a MultiIndex DataFrame, with the two column levels being Sample and Lithology:
Sample      20EC-P     20EC-8  20EC-10-1  ...      20EC-43    20EC-45    20EC-54
Lithology       Pd     Di-Grd         Gb  ...  Hbl Plag Pd     Di-Grd         Gb
Rb        7.401575  39.055118   6.456693  ...     0.629921  56.535433  11.653543
Ba       24.610102  43.067678  10.716841  ...     1.073115  58.520532  56.946630
Th        3.176471  19.647059   3.647059  ...     0.823529  29.647059   5.294118
I am trying to put it into a seaborn lineplot as such:
spider = sns.lineplot(data = data, hue = data.columns.get_level_values("Lithology"), style = data.columns.get_level_values("Sample"), dashes = False, palette = "deep")
The lineplot comes out as follows:
I have two issues. First, I want to format hues by lithology and style by sample. Outside of the lineplot function, I can successfully access sample and lithology using data.columns.get_level_values, but in the lineplot they don't seem to do anything, and I haven't figured out another way to access these values. Second, the lineplot reorganizes the x-axis in alphabetical order. I want to force it to keep the same order as the DataFrame, but I don't see any way to do this in the documentation.
To use hue= and style=, seaborn prefers its DataFrames in long form. pd.melt() combines all columns into rows: it creates new columns holding the old column names (here Sample and Lithology) and a column for the values. The index also needs to be converted to a regular column (with .reset_index()). Most seaborn functions use order= to set the order of the x-values, but lineplot has no order= parameter; instead, you can make the column categorical with a fixed order.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
column_tuples = [('20EC-P', 'Pd'), ('20EC-8', 'Di-Grd'), ('20EC-10-1', 'Gb'), ('20EC-43', 'Hbl Plag Pd'), ('20EC-45', 'Di-Grd'), ('20EC-54', 'Gb')]
col_index = pd.MultiIndex.from_tuples(column_tuples, names=["Sample", "Lithology"])
data = pd.DataFrame(np.random.uniform(0, 50, size=(3, len(col_index))), columns=col_index, index=['Rb', 'Ba', 'Th'])
data_long = data.melt(ignore_index=False).reset_index()
data_long['index'] = pd.Categorical(data_long['index'], data.index)  # make categorical, use order of the original dataframe
ax = sns.lineplot(data=data_long, x='index', y='value', hue="Lithology", style="Sample", dashes=False, markers=True, palette="deep")
ax.set_xlabel('')
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.02))
plt.tight_layout()  # fit legend and labels into the figure
plt.show()
The long dataframe looks like:
  index  Sample Lithology      value
0    Rb  20EC-P        Pd   6.135005
1    Ba  20EC-P        Pd   6.924961
2    Th  20EC-P        Pd  44.270570
...
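The effect of the pd.Categorical step can be seen without plotting: sorting the plain string column gives alphabetical order, while sorting the categorical column follows the category order taken from the original index. The frame below is a made-up stand-in for data_long; only the index labels matter.

```python
import pandas as pd

# Hypothetical long-form frame with element names in the original row order
data_long = pd.DataFrame({'index': ['Rb', 'Ba', 'Th', 'Rb', 'Ba', 'Th'],
                          'value': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Plain strings sort alphabetically
alphabetical = data_long.sort_values('index')['index'].unique().tolist()

# A categorical with explicit categories sorts by category position instead
data_long['index'] = pd.Categorical(data_long['index'], categories=['Rb', 'Ba', 'Th'])
by_category = data_long.sort_values('index')['index'].unique().tolist()

print(alphabetical)  # ['Ba', 'Rb', 'Th']
print(by_category)   # ['Rb', 'Ba', 'Th']
```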
Python Pandas - Plotting multiple Bar plots by category from dataframe
I have a dataframe that looks like this:
df = pd.DataFrame(data={'ID':[1,1,1,2,2,2], 'Value':[13, 12, 15, 4, 2, 3]})
Index  ID  Value
0       1     13
1       1     12
2       1     15
3       2      4
4       2      2
5       2      3
and I want to plot it by ID (category) so that each category gets a different bar plot. In this case I would have two figures: one with a bar plot of ID=1 and a second, separate figure with a bar plot of ID=2. Can I do it (preferably without loops) with something like df.plot(y='Value', kind='bar')?
Two options are possible: one using matplotlib and the other using seaborn, which you should absolutely know, as it works well with pandas.
Pandas with matplotlib
You have to create subplots with the number of rows and columns you choose. plt.subplots returns a 1-D array of axes if one of nrows or ncols is 1 (and the other is larger), or a 2-D array otherwise. Then you pass each axes object to the pandas plot method. If the number of categories is unknown or large, you need to use a loop.
import pandas as pd
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=1, ncols=2, sharey=True)
df.loc[df["ID"] == 1, 'Value'].plot.bar(ax=axes[0])
df.loc[df["ID"] == 2, 'Value'].plot.bar(ax=axes[1])
plt.show()
Pandas with seaborn
Seaborn is the most amazing graphical tool that I know. The function catplot draws a series of plots, one per value of a column, when you set the col argument. You select the type of plot with kind.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
df['index'] = [1,2,3] * 2
sns.catplot(kind='bar', data=df, x='index', y='Value', col='ID')
plt.show()
I added an index column in order to compare with df.plot.bar. If you don't want it, remove x='index' and catplot will display a single bar per ID (the mean of Value) with an error bar.
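For the case mentioned above where the number of categories is unknown or large, the loop can be driven by groupby. A minimal sketch; the helper name plot_by_id is hypothetical, and squeeze=False is used so that axes is always a 2-D array, even when there is only one ID:

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_by_id(df, id_col="ID", value_col="Value"):
    """Hypothetical helper: one bar-plot panel per distinct value of id_col."""
    groups = list(df.groupby(id_col))
    # squeeze=False always returns a 2-D array of axes, even for a single group
    fig, axes = plt.subplots(nrows=1, ncols=len(groups), sharey=True, squeeze=False)
    for ax, (key, group) in zip(axes[0], groups):
        group[value_col].plot.bar(ax=ax)
        ax.set_title(f"{id_col} = {key}")
    return fig, axes[0]

df = pd.DataFrame(data={'ID': [1, 1, 1, 2, 2, 2], 'Value': [13, 12, 15, 4, 2, 3]})
fig, axes = plot_by_id(df)
```

Call plt.show() afterwards to display the figure. If you want one separate figure per ID, as the question asks, call plt.subplots() once inside the loop instead of once before it.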
Bar plot and coloured categorical variable
I have a dataframe with 3 variables:
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Approved"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
I want a barplot grouped by the Period column, showing all the values contained in the Observations column and colored by the Result column. How can I do this? I tried sns.barplot, but it joined the values in the Observations column into just one bar (the mean of the values).
sns.barplot(x='Period',y='Observations',hue='Result',data=df,ci=None)
Plot output:
Assuming that you want one bar for each row, you can do as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
result_cat = df["Result"].astype("category")
result_codes = result_cat.cat.codes.values
cmap = plt.cm.Dark2(range(df["Result"].unique().shape[0]))
patches = []
for code in result_cat.cat.codes.unique():
    cat = result_cat.cat.categories[code]
    patches.append(mpatches.Patch(color=cmap[code], label=cat))
df.plot.bar(x='Period', y='Observations', color=cmap[result_codes], legend=False)
plt.ylabel("Observations")
plt.legend(handles=patches)
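A variant of the one-bar-per-row approach above draws the bars with matplotlib's ax.bar directly, so each bar's color comes from an explicit per-bar list and the legend is built from a category-to-color mapping. This is a sketch, not the answerer's code; the name color_of is illustrative. Note that the question's data contains both "Approved" and "Aproved", so they appear as separate legend entries.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

data = [["2019/oct", 10, "Approved"], ["2019/oct", 20, "Approved"], ["2019/oct", 30, "Approved"],
        ["2019/oct", 40, "Approved"], ["2019/nov", 20, "Under evaluation"], ["2019/dec", 30, "Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])

# Map each Result category to one color of the Dark2 palette
results = df["Result"].astype("category")
palette = plt.cm.Dark2.colors
color_of = {cat: palette[i % len(palette)] for i, cat in enumerate(results.cat.categories)}
bar_colors = [color_of[r] for r in df["Result"]]

fig, ax = plt.subplots()
positions = list(range(len(df)))
# One bar per row, colored by that row's Result
ax.bar(positions, df["Observations"], color=bar_colors)
ax.set_xticks(positions)
ax.set_xticklabels(df["Period"].tolist(), rotation=90)
ax.set_ylabel("Observations")
# Legend proxies: one colored patch per category
ax.legend(handles=[Patch(color=c, label=cat) for cat, c in color_of.items()])
```

Call plt.show() to display it. Passing the colors to ax.bar as a list makes the per-bar coloring explicit instead of relying on how pandas distributes a color array across bars.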
If you would like it grouped by month and then stacked, use the following (note that I updated your data so that one month has more than one status). I'm not sure I understood your question correctly, though:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Under evaluation"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result').plot(kind='bar', stacked=True)
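The stacked plot draws one bar per Period and one segment per column of the unstacked table, so it can help to print that table first. Two additions to the answer's code are assumptions of this sketch: fill_value=0 shows missing Period/Result combinations as 0 instead of NaN, and the reindex restores the months' order of first appearance, because groupby sorts Period as strings (dec, nov, oct).

```python
import pandas as pd

data = [["2019/oct", 10, "Approved"], ["2019/oct", 20, "Approved"], ["2019/oct", 30, "Approved"],
        ["2019/oct", 40, "Under evaluation"], ["2019/nov", 20, "Under evaluation"], ["2019/dec", 30, "Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])

# One row per Period, one column per Result; each cell is the summed Observations
table = df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result', fill_value=0)

# groupby sorted the periods alphabetically; put them back in order of first appearance
table = table.reindex(df['Period'].unique())
print(table)
```

table.plot(kind='bar', stacked=True) then produces the same stacked chart with the months in oct, nov, dec order.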
Can we create a scatter plot with a single row of data
I have sample data in a DataFrame as below:
Header=['Date','EmpCount','DeptCount']
2009-01-01,100,200
print(df)
         Date  EmpCount  DeptCount
0  2009-01-01       100        200
Can we generate a scatter plot (or any line chart, etc.) with only this one record? I tried multiple approaches, but I am getting TypeError: no numeric data to plot.
X axis: dates
Y axis: two dots, one for EmpCount and the other for DeptCount
Starting from #the-cauchy-criterion, try this:
import pandas as pd
import matplotlib.pyplot as plt
header=['Date','EmpCount','DeptCount']
df = pd.DataFrame([['2009-01-01',100,200]],columns=header)
b = df.set_index('Date')
# plt.plot returns a list of Line2D objects (one per column), not an Axes
lines = plt.plot(b, linewidth=3, markersize=10, marker='.')
plt.show()
What are you using to plot the scatter plot? Here's how to do it with pyplot:
import pandas as pd
import matplotlib.pyplot as plt
header=['Date','EmpCount','DeptCount']
df = pd.DataFrame([['2009-01-01',100,200]],columns=header)
plt.scatter(*df.iloc[0][1:])
plt.show()
iloc[0] gets the first row, [1:] takes all the columns except the first, and the * operator unpacks them as the arguments, so this draws a single point at x=EmpCount (100), y=DeptCount (200).
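The question asks for dates on the x axis and two dots, one for EmpCount and one for DeptCount. A minimal sketch of that with pyplot, assuming the Date column is first converted with pd.to_datetime; each scatter call adds one dot at the same date, and the labels feed the legend:

```python
import pandas as pd
import matplotlib.pyplot as plt

header = ['Date', 'EmpCount', 'DeptCount']
df = pd.DataFrame([['2009-01-01', 100, 200]], columns=header)
# Real datetimes instead of strings, so the x axis is a date axis
df['Date'] = pd.to_datetime(df['Date'])

fig, ax = plt.subplots()
# One scatter call per count column: same x (the date), different y
for col in ['EmpCount', 'DeptCount']:
    ax.scatter(df['Date'], df[col], label=col)
ax.set_xlabel('Date')
ax.legend()
```

Call plt.show() to display it. With more rows in df, the same loop plots one dot per date for each column.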