I am trying to plot a correlation matrix (already calculated) in Python. The table looks like this:
And I would like it to look like this:
I am using the following Python code:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
df = pd.DataFrame(data)
print (df)
corrMatrix = data.corr()
print (corrMatrix)
sn.heatmap(corrMatrix, annot=True)
plt.show()
Note that the matrix is already computed and I don't want to calculate the correlation again, but I could not get that to work. Any suggestions?
You are recalculating the correlation with the following line:
corrMatrix = data.corr()
You then go on to utilize this recalculated variable in the heatmap here:
sn.heatmap(corrMatrix, annot=True)
plt.show()
To resolve this, pass the Excel data itself (data, or df, which is just data wrapped in a second DataFrame) instead of the recalculated corrMatrix. All the code you should need is:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
sn.heatmap(data, annot=True)
plt.show()
Note that this assumes your data really is ready for the heatmap, as you suggest. Since we do not have access to your data, we cannot confirm that.
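If the first column of the sheet holds the row names, one option is to load it as the index so that only numeric cells reach the heatmap. Below is a minimal sketch of that idea. The CSV text and its values are a made-up stand-in for the spreadsheet, and it assumes the file has one name column followed by the matrix; read_excel accepts the same index_col=0 argument.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the spreadsheet: a precomputed 3x3 correlation matrix
csv_text = """name,CLC,GIEMS,GLWD
CLC,1.0,0.62,0.48
GIEMS,0.62,1.0,0.55
GLWD,0.48,0.55,1.0
"""

# index_col=0 turns the name column into row labels instead of data
corr = pd.read_csv(io.StringIO(csv_text), index_col=0)

fig, ax = plt.subplots()
# No .corr() call: the values are plotted exactly as stored
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, ax=ax)
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Both axes pick up the names from the index and the column headers, so no manual tick labels are needed.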
I deleted the first column (the names) and added the names back as tick labels, so the code is now:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Users/yousefalbuhaisi/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
fig, ax = plt.subplots(dpi=150)
y_axis_labels = ['CLC','GIEMS','GLWD','LPX_BERN','LPJ_WSL','LPJ_WHyME','SDGVM','DLEM','ORCHIDEE','CLM4ME']
sn.heatmap(data,yticklabels=y_axis_labels, annot=True)
plt.show()
and the results are:
I am plotting a seaborn heatmap and would like to annotate only the specific cells with custom text.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO
data = StringIO(u'''75,83,41,47,19
51,24,100,0,58
12,94,63,91,7
34,13,86,41,77''')
labels = StringIO(u'''7,8,4,,1
5,2,,2,8
1,,6,,7
3,1,,4,7''')
data = pd.read_csv(data, header=None)
data = data.apply(pd.to_numeric)
labels = pd.read_csv(labels, header=None)
#labels = np.ma.masked_invalid(labels)
fig, ax = plt.subplots()
sns.heatmap(data, annot=labels, ax=ax, vmin=0, vmax=100)
plt.show()
The above code generates the following heatmap:
and the commented line generates the following heatmap:
I would like to show only the non-nan (or non-zero) text on the cells. How can that be achieved?
Use a string array for annot instead of a masked array:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO
data = StringIO(u'''75,83,41,47,19
51,24,100,0,58
12,94,63,91,7
34,13,86,41,77''')
labels = StringIO(u'''7,8,4,,1
5,2,,2,8
1,,6,,7
3,1,,4,7''')
data = pd.read_csv(data, header=None)
data = data.apply(pd.to_numeric)
labels = pd.read_csv(labels, header=None)
#labels = np.ma.masked_invalid(labels)
# Convert everything to strings:
annotations = labels.astype(str)
annotations[np.isnan(labels)] = ""
fig, ax = plt.subplots()
sns.heatmap(data, annot=annotations, fmt="s", ax=ax, vmin=0, vmax=100)
plt.show()
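One side effect of astype(str) is that columns containing NaN are stored as floats, so their labels come out as "8.0" rather than "8". If whole-number labels are wanted, a small variation is sketched below. It assumes every label is an integer, and to_annotation is a hypothetical helper name, not part of pandas or seaborn.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

data = pd.read_csv(
    io.StringIO("75,83,41,47,19\n51,24,100,0,58\n12,94,63,91,7\n34,13,86,41,77"),
    header=None,
)
labels = pd.read_csv(
    io.StringIO("7,8,4,,1\n5,2,,2,8\n1,,6,,7\n3,1,,4,7"),
    header=None,
)


def to_annotation(value):
    """Return '' for NaN, otherwise the label as a whole number (assumed integer)."""
    return "" if pd.isna(value) else str(int(value))


# Build a string array with the same shape as the data
annotations = np.array(
    [[to_annotation(v) for v in row] for row in labels.to_numpy()]
)

fig, ax = plt.subplots()
# fmt="" tells seaborn to print the strings unchanged
sns.heatmap(data, annot=annotations, fmt="", ax=ax, vmin=0, vmax=100)
```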
To complement the answer by @mrzo, you can pass na_filter=False to read_csv() so that empty fields are kept as empty strings instead of NaN, then use pandas.DataFrame.astype() to convert the remaining values to strings:
# ...
labels = pd.read_csv(labels, header=None, na_filter=False).astype(str)
sns.heatmap(data, annot=labels, fmt='s', ax=ax, vmin=0, vmax=100)
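To see what that read produces without plotting anything, here is a small self-contained check using the label text from the question. Nothing here is new API; it only inspects the result.

```python
import io

import pandas as pd

label_text = "7,8,4,,1\n5,2,,2,8\n1,,6,,7\n3,1,,4,7"

# na_filter=False keeps empty fields as '' instead of converting them to NaN
labels = pd.read_csv(
    io.StringIO(label_text), header=None, na_filter=False
).astype(str)

first_row = labels.iloc[0].tolist()
print(first_row)
```

Because the empty fields never become NaN, the affected columns are not promoted to float, so the labels stay as "8" rather than "8.0".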
Just going to add this as it has taken me some time to work out how to do something similar programmatically for a slightly different application: I wanted to suppress 0-values from the annotation, but because the values were arising as the result of a crosstab operation I couldn't use William Miller's nice approach without writing the crosstab out and then reading it back in which seemed... inelegant.
There may be a yet more elegant way to do this, but for me running it through numpy was ridiculously fast and quite easy.
import numpy as np
import pandas as pd
import seaborn as sns
from io import StringIO
data = StringIO(u'''75,83,41,47,19
51,24,100,0,58
12,94,63,91,7
34,13,86,41,77''')
data = pd.read_csv(data, header=None)
data = data.apply(pd.to_numeric)
# For more complex functions you could write a def instead
# of using this simple lambda function
an = np.vectorize(lambda x: '' if x<50 else str(round(x,-1)))(data.to_numpy())
sns.heatmap(
    data=data.to_numpy(), # Note this is now numpy too
    cmap='BuPu',
    annot=an, # The matching ndarray of annotations
    fmt = '', # Formats annotations as strings (i.e. no formatting)
    cbar=False, # Seems overkill if you've got annotations
    vmin=0,
    vmax=data.max().max()
)
This can make labelling the axes a little more difficult, though it's straightforward enough: ax.set_xticklabels(df.columns.values). And if you had axis labels in, say, the first column, you'd need to use iloc (data.iloc[:,1:]) in your to_numpy call. Combined with a custom colormap (e.g. 0==white), you can create heatmaps that are a lot easier to look at.
Obviously the crude rounding is confusing (why does 80 have different shades?), but you get the point.
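A possible alternative that avoids the NumPy round trip, and so keeps the axis labels, is to build the annotations as a DataFrame with DataFrame.where. The sketch below suppresses zeros in a crosstab; the fruit and shop columns are invented purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical observations; the crosstab counts fruit per shop
df = pd.DataFrame({
    "fruit": ["apple", "apple", "pear", "kiwi"],
    "shop": ["a", "b", "a", "a"],
})
ct = pd.crosstab(df["fruit"], df["shop"])

# Keep the count as text where it is non-zero, otherwise use an empty string
annot = ct.astype(str).where(ct != 0, "")

fig, ax = plt.subplots()
# Passing the DataFrame itself lets seaborn take tick labels from it
sns.heatmap(ct, annot=annot, fmt="", cmap="BuPu", cbar=False, ax=ax)
```

The annotation frame has the same index and columns as the crosstab, which is what seaborn needs in order to match each label to its cell.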
How can I achieve that using matplotlib?
Here is my code using the data you provided. Since there are no classes (the values are all different, even though the first example in your question does have classes), I assigned colors based on the numbers. You should be able to take it from here toward whatever result you want. You just need pandas, seaborn and matplotlib:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import xls
df=pd.read_excel('data.xlsx')
# exclude Ranking values
df1 = df.iloc[:,1:-1]
# for each element it takes the value of the xls cell
df2=df1.applymap(lambda x: float(x.split('\n')[1]))
# now plot it
df_heatmap = df2
fig, ax = plt.subplots(figsize=(15,15))
sns.heatmap(df_heatmap, square=True, ax=ax, annot=True, fmt="1.3f")
plt.yticks(rotation=0,fontsize=16)
plt.xticks(fontsize=12)
plt.tight_layout()
plt.savefig('dfcolorgraph.png')
Which produces the following picture.
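The parsing step can also be written with the pandas string accessor, which avoids applymap (deprecated in recent pandas releases in favor of DataFrame.map). This is only a sketch: it assumes each cell holds two lines of text with the number on the second line, which is what the split('\n')[1] call above implies, and the sample cells are invented.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet cells: "<label>\n<score>"
df1 = pd.DataFrame({
    "A": ["alpha\n0.125", "beta\n0.500"],
    "B": ["gamma\n0.750", "delta\n0.250"],
})


def second_line_as_float(column):
    """Take the text after the first newline in each cell and convert it to float."""
    return column.str.split("\n").str[1].astype(float)


# apply() runs the helper once per column and reassembles a DataFrame
df2 = df1.apply(second_line_as_float)
print(df2)
```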
I am trying to plot multiple lines with different attributes (color, line type, etc.) from a pandas groupby data set. My code plots every source as a blue line.
How can I apply line attributes to each group?
My code is below.
from pandas import Series, DataFrame
import pandas as pd
import matplotlib.pyplot as plt
xls_file = pd.ExcelFile(r'E:\SAT_DATA.xlsx')
glider_data = xls_file.parse('Yosup (4)', parse_dates=[0])
each_glider = glider_data.groupby('Vehicle')
fig, ax = plt.subplots(1,1)
glider_data.groupby("Vehicle").plot(x="TimeStamp", y="Temperature(degC)", ax=ax)
plt.legend(glider_data['Vehicle'], loc='best')
plt.xlabel("Time")
plt.ylabel("Temp")
plt.show()
I think you need to loop over the groups from groupby. Something like:
for i,group in glider_data.groupby('Vehicle'):
    group.plot(x='TimeStamp', y='Temperature(degC)', ax=ax, label=i)
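To give each group its own look, the loop can pull keyword arguments from a lookup table. Here is a self-contained sketch of that idea. The vehicle names, readings and the styles dictionary are all made up for illustration; only the column names come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the spreadsheet data
glider_data = pd.DataFrame({
    "Vehicle": ["g1", "g1", "g2", "g2"],
    "TimeStamp": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"]
    ),
    "Temperature(degC)": [10.0, 11.5, 9.0, 9.8],
})

# One set of line attributes per vehicle (assumed names)
styles = {
    "g1": {"color": "tab:red", "linestyle": "-"},
    "g2": {"color": "tab:blue", "linestyle": "--"},
}

fig, ax = plt.subplots()
for vehicle, group in glider_data.groupby("Vehicle"):
    # label names the legend entry; **styles[...] sets color and line type
    group.plot(x="TimeStamp", y="Temperature(degC)", ax=ax,
               label=vehicle, **styles[vehicle])

ax.set_xlabel("Time")
ax.set_ylabel("Temp")
ax.legend(loc="best")
```

With many vehicles, the dictionary could be generated instead of typed out, for example by zipping the group names with a list of colors.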
I have a CSV file with two columns: the first is the fruit name and the second is the count. I need to plot a histogram using this CSV as input to the code below. How can I do that? Out of the 100 lines in the file, I only need to show the first 20 entries, with fruit names on the x axis and counts on the y axis.
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', header = None ,quoting=2)
data.hist(bins=10)
plt.xlim([0,100])
plt.ylim([50,500])
plt.title("Data")
plt.xlabel("fruits")
plt.ylabel("Frequency")
plt.show()
I edited the above program to plot a bar chart:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None)
data.values
print data
plt.bar(data[:,0], data[:,1], color='g')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
but this gives me an 'unhashable type' error. Can anyone help with this?
You can use pandas' built-in plotting, although you need to specify that the first column is the index:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None, index_col =0)
data.plot(kind='bar')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
If you need to use matplotlib directly, it may be easier to convert the DataFrame to a dictionary with data.to_dict() and extract the values into NumPy arrays.
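For the plain matplotlib route, note that data[:,0] is NumPy-style indexing, which a DataFrame does not support; columns are selected by name, or by position with iloc. A minimal sketch limited to the first 20 rows follows. The generated CSV text and the column names fruit and count are assumptions standing in for data.csv.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for data.csv: 25 rows of "name,count" with no header
csv_text = "\n".join(f"fruit{i},{100 + i}" for i in range(25))

# names= gives the two unnamed columns readable labels
data = pd.read_csv(io.StringIO(csv_text), header=None, names=["fruit", "count"])

# Keep only the first 20 entries
top = data.head(20)

fig, ax = plt.subplots()
ax.bar(top["fruit"], top["count"], color="g")
ax.set_xlabel("fruits")
ax.set_ylabel("Frequency")
ax.set_title("Data")
ax.tick_params(axis="x", rotation=90)  # keep the 20 names readable
```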