Heatmap Fill empty spaces with black - python

Suppose this is the data at hand:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import seaborn as sns
data = {'trajectory': [101,102,102,102,102,102,102,102,104,104,104,104,104,104,104,107,107,107,107,
107,107,107,107,107,108,108,108,108,108,108,108,109,109,109,109,109,109,112,
112,112,112,112,113,113,113,113,114,114,114,114],
'segment': [1,1,1,1,2,2,3,3,1,1,2,2,2,3,3,1,1,2,2,2,2,3,3,3,1,1,1,
2,2,2,2,1,1,1,2,2,2,1,1,2,2,2,1,2,2,3,1,2,2,2],
'prediction': [3,0,0,1,3,3,2,2,0,0,4,4,2,0,0,0,0,2,2,2,3,0,0,2,0,0,1,1,
1,1,0,1,2,1,3,3,3,1,1,4,4,2,1,4,4,3,0,3,3,2]}
df = pd.DataFrame(data)
df.head(2)
   trajectory  segment  prediction
0         101        1           3
1         102        1           0
And this is plotted like so:
plot_data = (df.value_counts()
.sort_values(ascending=False)
.reset_index()
.drop_duplicates(['trajectory', 'segment'])
.pivot_table(index='trajectory', columns='segment', values='prediction',))
cmap = mcolors.ListedColormap(['c', 'b', 'g', 'y','m', ])
fig, ax = plt.subplots(figsize=(10,6))
sns.heatmap(plot_data,vmin=-0.5, vmax=4.5,cmap=cmap, annot=True)
Giving:
I want to turn all the white (empty) cells black. My attempt was to replace all NaN values in plot_data with a placeholder value, say 99, and add the black color code k to cmap.
plot_data = (df.value_counts()
.sort_values(ascending=False)
.reset_index()
.drop_duplicates(['trajectory', 'segment'])
.pivot_table(index='trajectory', columns='segment', values='prediction',
fill_value=99))
cmap = mcolors.ListedColormap(['c', 'b', 'g', 'y','m', 'k'])
fig, ax = plt.subplots(figsize=(10,6))
sns.heatmap(plot_data,vmin=-0.5, vmax=4.5,cmap=cmap, annot=True)
And plot again, giving:
Confusion: 4 is now coloured k (black), the same as 99, instead of m (magenta). Also, I do not want the empty cells annotated with 99; it is only there as a placeholder, since I cannot plot when the NaN values are replaced with a character such as -.
Intended results:
something like the following

You can use set_bad to set the color that your colormap uses for masked values to opaque black:
cmap = mcolors.ListedColormap(['c', 'b', 'g', 'y','m',])
cmap.set_bad('k')
(In your colormap definition the color for masked values is transparent black, which is why you can see the Axes patch through the empty cells in the first place.)
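To see why the six-colour attempt in the question recoloured the 4s, and what set_bad changes, here is a minimal sketch that calls the colormaps directly with the same vmin and vmax. The sample values and the variable names are illustrative, and it uses plain matplotlib rather than seaborn:

```python
import numpy as np
import matplotlib.colors as mcolors

norm = mcolors.Normalize(vmin=-0.5, vmax=4.5)
values = np.ma.masked_invalid([0.0, 4.0, np.nan])  # NaN becomes a masked entry

# Six colours over [-0.5, 4.5]: the range is cut into six equal bins,
# so 4 lands in the last bin and is drawn with 'k'.
six = mcolors.ListedColormap(['c', 'b', 'g', 'y', 'm', 'k'])
six_rgba = six(norm(values))

# Five colours plus a "bad" colour: 0..4 each keep their own bin,
# and only masked (NaN) cells are drawn in opaque black.
five = mcolors.ListedColormap(['c', 'b', 'g', 'y', 'm'])
five.set_bad('k')
five_rgba = five(norm(values))

print(six_rgba[1], five_rgba[1], five_rgba[2])
```

With five colours the bins line up with the integers 0 to 4, so no placeholder value is needed and the NaN cells simply stay masked.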

Ah, all I needed was to set the Axes background to black before adding the heatmap, like so:
ax.set_facecolor('black')
sns.heatmap(plot_data, vmin=-0.5, vmax=4.5, cmap=cmap, annot=True)
And that's it.
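This works because the NaN cells are left unpainted, so the Axes face colour shows through them. A small self-contained sketch of the same idea, assuming a hypothetical 3x3 grid in place of plot_data and using matplotlib's pcolormesh directly (which, as far as I know, is what seaborn's heatmap draws with):

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Small stand-in for plot_data: NaN marks a trajectory/segment pair with no data
grid = np.array([[3.0, np.nan, np.nan],
                 [0.0, 3.0, 2.0],
                 [0.0, 4.0, 0.0]])
cmap = mcolors.ListedColormap(['c', 'b', 'g', 'y', 'm'])

fig, ax = plt.subplots(figsize=(4, 3))
ax.set_facecolor('black')  # shows through wherever no cell is painted
mesh = ax.pcolormesh(np.ma.masked_invalid(grid), cmap=cmap, vmin=-0.5, vmax=4.5)
ax.invert_yaxis()  # first row on top, as in a heatmap
```

The colormap itself is untouched here, so the five colours still map to the values 0 to 4.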

Related

Select the color of the bar in histogram plot based on its value

I have thousands of values that I want to plot as a histogram, with each bar coloured according to its value. My values are between 0 and 10, and the bar colour should run from red to green: red when the value is close to 0 and green when it is close to 10, like the image I attached. In the following example, I want the bar for row h to be close to green and the bar for b to be close to red. Here is a simple example with multiple bars and values.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
rating = [8, 4, 5,6]
objects = ('h', 'b', 'c','a')
y_pos = np.arange(len(objects))
plt.barh(y_pos, rating, align='center', alpha=0.5)
plt.yticks(y_pos, objects)
plt.show()
Could you please help me with this? Thank you.
To apply colors depending on values, matplotlib uses a colormap combined with a norm. The colormap maps values between 0 and 1 to a color, for example 0 to green, 0.5 to yellow and 1 to red. A norm maps values from a given range to the range 0 to 1, for example, the minimum value to 0 and the maximum value to 1. Applying the colormap to the norm of the given values then gives the desired colors.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
rating = [8, 4, 5, 6]
objects = ('h', 'b', 'c', 'a')
y_pos = np.arange(len(objects))
cmap = plt.get_cmap('RdYlGn_r')
norm = plt.Normalize(vmin=min(rating), vmax=max(rating))
plt.barh(y_pos, rating, align='center', color=cmap(norm(np.array(rating))))
plt.yticks(y_pos, objects)
plt.show()
Alternatively, seaborn offers a slightly simpler approach:
import seaborn as sns
rating = [8, 4, 5, 6]
objects = ['h', 'b', 'c', 'a']
ax = sns.barplot(x=rating, y=objects, hue=rating, palette='RdYlGn_r', dodge=False)
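Note that 'RdYlGn_r' runs from green (low) to red (high), and the norm above is scaled to the smallest and largest rating. If the goal is the mapping described in the question (red near 0, green near 10), one option is the non-reversed 'RdYlGn' with a norm fixed to the range 0 to 10. A minimal sketch, assuming that reading of the question:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

rating = [8, 4, 5, 6]
objects = ('h', 'b', 'c', 'a')
y_pos = np.arange(len(objects))

cmap = plt.get_cmap('RdYlGn')       # 0 -> red, 0.5 -> yellow, 1 -> green
norm = Normalize(vmin=0, vmax=10)   # fixed scale: 0 is fully red, 10 is fully green
colors = cmap(norm(np.array(rating)))  # one RGBA row per bar

fig, ax = plt.subplots()
ax.barh(y_pos, rating, align='center', color=colors)
ax.set_yticks(y_pos)
ax.set_yticklabels(objects)
```

With a fixed norm, a rating of 8 gets the same colour in every plot, regardless of which other ratings are shown next to it.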

How to plot a graph over map of a country?

The code I tried is,
df = pd.DataFrame({'x':phi_pp, 'y':lambda_pp})
df.plot('x', 'y', kind='line',legend=None)
plt.xlabel('IPP Longitude')
ax=plt.ylabel('IPP Latitude')
im = plt.imread("Map_of_India.jpg")
fig, ax = plt.subplots()
plt.show()
The sample df is:
x y
0 15.121270 4246.948356
1 12.103705 4248.927074
2 8.583936 4247.596317
3 18.173364 4244.749973
4 14.175727 4290.142397
The two plots are not drawn on the same axes.
Any image of Indian map can be used as a sample image.
You can first plot the image using imshow. Then make a second axes at the same position as the image axes and set its background transparent. Now you can plot your dataframe on this new axes and it will overlay the image.
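import urllib.request
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
# plt.imread no longer accepts a URL in recent matplotlib versions,
# so open the URL and decode the image with Pillow instead
url = 'https://static01.nyt.com/newsgraphics/2021/coronavirus-tracking/images/maps/IND/hotspots.png'
im = np.array(Image.open(urllib.request.urlopen(url)))
df = pd.DataFrame({'x': [15.12127, 12.103705, 8.583936, 18.173364, 14.175727],
                   'y': [4246.948356, 4248.927074, 4247.596317, 4244.749973, 4290.142397]})
fig, ax_im = plt.subplots()
ax_im.imshow(im)  # draw the map
ax_im.set_axis_off()
# second axes at the same position as the image, with a transparent background
ax_df = fig.add_axes(ax_im.get_position())
ax_df.patch.set_alpha(0)
df.plot('x', 'y', legend=None, xlabel='IPP Longitude', ylabel='IPP Latitude', ax=ax_df)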
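A different technique from the two-axes overlay above is to draw the image and the line on a single axes, using the extent argument of imshow to place the image in data coordinates. The sketch below is only an illustration: the random array stands in for the map image, and the extent is taken from the data's min/max, whereas for a real map it would have to be the longitude/latitude bounds that the image actually covers.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the map image (assumption: any H x W x 3 array works the same way)
im = np.random.default_rng(0).random((60, 80, 3))

df = pd.DataFrame({'x': [15.12127, 12.103705, 8.583936, 18.173364, 14.175727],
                   'y': [4246.948356, 4248.927074, 4247.596317, 4244.749973, 4290.142397]})

# (left, right, bottom, top) in data units; hypothetical bounds for illustration
extent = (df['x'].min(), df['x'].max(), df['y'].min(), df['y'].max())

fig, ax = plt.subplots()
ax.imshow(im, extent=extent, aspect='auto')  # image stretched over the data range
ax.plot(df['x'], df['y'], color='k')         # line drawn on the same axes
ax.set_xlabel('IPP Longitude')
ax.set_ylabel('IPP Latitude')
```

The advantage of a single axes is that zooming and panning keep the line and the image aligned; the cost is that the image bounds must be known.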

Skip bars in Seaborn bar plot for which no data exists [duplicate]

I have a grouped barplot. It works well, but I am trying to remove the empty bars, which take up too much space.
I have already tried:
%matplotlib inline
import matplotlib as mpl
from matplotlib.gridspec import GridSpec
import matplotlib.pyplot as plt
import sys
import os
import glob
import seaborn as sns
import pandas as pd
import ggplot
from ggplot import aes
sns.set(style= "whitegrid", palette="pastel", color_codes=True )
tab_folder = 'myData'
out_folder ='myData/plots'
tab = glob.glob('%s/R*.tab'%(tab_folder))
#is reading all my data
for i, tab_file in enumerate(tab):
    folder,file_name=os.path.split(tab_file)
    s_id=file_name[:-4].replace('DD','')
    df=pd.DataFrame.from_csv(tab_file, sep='\t')
    df_2 = df.groupby(['name','ab']).size().reset_index(name='count')
    df_2 = df_2[df_2['count'] != 0]
    table = pd.pivot_table(df_2, index='name',columns='ab', values='count' )
    table.plot(kind='barh', width = 0.9, color = ['b', 'g', 'r'], ax = ax)
    for label in (ax.get_xticklabels() + ax.get_yticklabels()):
        label.set_fontsize(4)
    ax.set_title(s_id).update({'color':'black', 'size':5, 'family':'monospace'})
    ax.set_xlabel('')
    ax.set_ylabel('')
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(handles[::-1], labels[::-1], bbox_to_anchor=(1, 1.05),prop= {'size': 4} )
    png_t = '%s/%s.b.png'%(out_folder,s_id)
    plt.savefig(png_t, dpi = 500)
But it's not working. The bars are still the same.
Is there any other method to remove empty bars?
Your question is not complete. I don't know exactly what you're trying to accomplish, but from what you've said I'd guess that you don't want empty pivot pairs to be displayed.
This is not possible by standard means in pandas. A grouped plot has to reserve a slot for every group, including NaNs, which are drawn as "empty bars".
Furthermore, after groupby every group has a size of at least one, so the condition in df_2[df_2['count'] != 0] is always true and the filter removes nothing.
For example
df = pd.DataFrame([['nameA', 'abA'], ['nameB', 'abA'],['nameA','abB'],['nameD', 'abD']], columns=['names', 'ab'])
df_2 = df.groupby(['names', 'ab']).size().reset_index(name='count')
df_2 = df_2[df_2['count'] != 0] # this line has no effect
table = pd.pivot_table(df_2, index='names',columns='ab', values='count' )
table
gives
ab      abA   abB   abD
names
nameA  1.00  1.00   NaN
nameB  1.00   NaN   NaN
nameD   NaN   NaN  1.00
and
table.plot(kind='barh', width = 0.9, color = ['b', 'g', 'r'])
shows
And that's the way it is: the plot needs to show all groups after the pivot.
EDIT
You can also use a stacked plot to get rid of the gaps:
table.plot(kind='barh', width = 0.9, color = ['b', 'g', 'r'], stacked=True)
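If stacking is not wanted, another possible workaround (not a pandas feature, just a sketch) is to skip the pivot and draw one bar per (name, ab) pair that actually exists, directly with matplotlib. The labels and the colour mapping below are illustrative choices:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame([['nameA', 'abA'], ['nameB', 'abA'], ['nameA', 'abB'], ['nameD', 'abD']],
                  columns=['names', 'ab'])
counts = df.groupby(['names', 'ab']).size().reset_index(name='count')

# One colour per 'ab' category, matching the ['b', 'g', 'r'] used above
palette = dict(zip(sorted(counts['ab'].unique()), ['b', 'g', 'r']))

fig, ax = plt.subplots()
positions = list(range(len(counts)))
# Only existing pairs are rows of `counts`, so no slot is reserved for NaN cells
ax.barh(positions, counts['count'].tolist(),
        color=[palette[ab] for ab in counts['ab']])
ax.set_yticks(positions)
ax.set_yticklabels((counts['names'] + ' / ' + counts['ab']).tolist())
```

The trade-off is that the automatic legend and the per-name grouping are lost, so the tick labels have to carry both the name and the category.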

Line color as a function of column values in pandas dataframe

I am trying to plot two columns of a pandas dataframe against each other, grouped by the values in a third column. The color of each line should be determined by that third column, i.e. one color per group.
For example:
import pandas as pd
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({'x': [0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3],'y':[1,2,3,2,3,4,4,3,2], 'colors':[0.3,0.3,0.3,0.7,0.7,0.7,1.3,1.3,1.3]})
df.groupby('colors').plot('x','y',ax=ax)
If I do it this way, I end up with three different lines plotting x against y, with each line a different color. I now want to determine the color by the values in 'colors'. How do I do this using a gradient colormap?
It looks like seaborn applies the color intensity automatically, based on the value passed to hue:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'x': [0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3],'y':[1,2,3,2,3,4,4,3,2,3,4,2], 'colors':[0.3,0.3,0.3,0.7,0.7,0.7,1.3,1.3,1.3,1.5,1.5,1.5]})
import seaborn as sns
sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors')
Gives:
You can change the colors by adding the palette argument as below:
import seaborn as sns
sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors', palette = 'mako')
#more combinations : viridis, mako, flare, etc.
gives:
Edit (for colormap):
Based on the answers at "Make seaborn show a colorbar instead of a legend when using hue in a bar plot?":
import seaborn as sns
ax = sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors', palette = 'mako')  # lineplot returns an Axes
norm = plt.Normalize(vmin = df['colors'].min(), vmax = df['colors'].max())
sm = plt.cm.ScalarMappable(cmap="mako", norm = norm)
sm.set_array([])  # needed by older matplotlib versions
ax.figure.colorbar(sm, ax=ax)  # recent matplotlib must be told which Axes the colorbar belongs to
ax.get_legend().remove()
plt.show()
Gives:
Hope that helps.
Complementing Prateek's very good answer: once you have assigned the colors based on the intensity of the palette you choose (for example mako):
plots = sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors',palette='mako')
You can add a colorbar with matplotlib's plt.colorbar() and give it the palette you used, together with a norm so that the colorbar shows the range of the data rather than 0 to 1:
norm = plt.Normalize(df['colors'].min(), df['colors'].max())
sm = plt.cm.ScalarMappable(cmap='mako', norm=norm)
plt.colorbar(sm, ax=plots)
After plt.show(), we get the combined output:
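For completeness, the groupby approach from the question can also be kept and coloured with a gradient colormap in plain matplotlib, by combining a colormap with a norm. This is a minimal sketch; the choice of 'viridis' and the variable names are assumptions, not part of the answers above:

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

df = pd.DataFrame({'x': [0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
                   'y': [1, 2, 3, 2, 3, 4, 4, 3, 2],
                   'colors': [0.3, 0.3, 0.3, 0.7, 0.7, 0.7, 1.3, 1.3, 1.3]})

cmap = plt.get_cmap('viridis')
norm = Normalize(vmin=df['colors'].min(), vmax=df['colors'].max())

fig, ax = plt.subplots()
# groupby iterates over the groups in ascending order of the key
for value, group in df.groupby('colors'):
    ax.plot(group['x'], group['y'], color=cmap(float(norm(value))))

# Colorbar that uses the same colormap and norm as the lines
sm = ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])
fig.colorbar(sm, ax=ax, label='colors')
```

Because the norm is built from the data, the smallest group value gets the first colour of the colormap and the largest gets the last.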
