The code below:
import pandas as pd
import matplotlib.pyplot as plt
data = [['Apple',10],['Banana',15],['Kiwi',11],['Orange',17]]
df = pd.DataFrame(data,columns=['Fruit','Quantity'])
df.set_index('Fruit', inplace=True)
df.plot.bar(color='gray',rot=0)
plt.show()
produces a bar chart with all four bars in gray.
I would like to color the bars of the two fruits with the highest quantity (i.e., Orange and Banana) red. How can I do that? Rather than hard-coding a threshold value for the color change, I'd like the plot to identify the top two bars by itself.
There might be a simpler way, but I came up with the following solution, which should work in principle for any number n of top values. The idea is:
First, get the top n elements (n=2 in the example below) from the DataFrame using nlargest.
Then, loop over the x-tick labels and, whenever a label belongs to one of the largest values, use its index to change the color of the corresponding patch (bar). This is why we create an axes instance ax: it gives us access to the patches whose colors we set.
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
data = [['Apple',10],['Banana',15],['Kiwi',11],['Orange',17]]
df = pd.DataFrame(data,columns=['Fruit','Quantity'])
df.set_index('Fruit', inplace=True)
df.plot.bar(color='gray',rot=0, ax=ax)
top = df['Quantity'].nlargest(2).keys() # Top 2 values here
for i, tick in enumerate(ax.get_xticklabels()):
    if tick.get_text() in top:
        ax.patches[i].set_color('r')
plt.show()
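The loop above can also be packaged as a small reusable function. The sketch below is one way to do it under stated assumptions: the helper name highlight_top_n is hypothetical, it assumes a single-column bar plot (where pandas draws one patch per row, in row order, so the i-th patch belongs to df.index[i]), and it uses the Agg backend only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to get a window
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import pandas as pd


def highlight_top_n(df, column, n, ax, color="r"):
    """Hypothetical helper: recolor the bars of the n largest values in `column`."""
    top = df[column].nlargest(n).index
    # A single-column bar plot draws one patch per row, in row order,
    # so patch i corresponds to the row label df.index[i].
    for patch, label in zip(ax.patches, df.index):
        if label in top:
            patch.set_color(color)
    return top


data = [['Apple', 10], ['Banana', 15], ['Kiwi', 11], ['Orange', 17]]
df = pd.DataFrame(data, columns=['Fruit', 'Quantity']).set_index('Fruit')

fig, ax = plt.subplots()
df.plot.bar(color='gray', rot=0, ax=ax)
top = highlight_top_n(df, 'Quantity', 2, ax)

print([to_hex(p.get_facecolor()) for p in ax.patches])
# ['#808080', '#ff0000', '#808080', '#ff0000']
```

Matching patches to row labels instead of reading tick text avoids depending on how the tick labels happen to be formatted.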
Plotting a colored bar plot
The problem is that pandas bar plots apply the color argument column-wise, and here you have a single column. Hence the canonical attempt to color individual bars does not work:
pd.DataFrame([12,14]).plot.bar(color=["red", "green"])
A workaround is to create a diagonal matrix instead of a single column and plot it with the stacked=True option.
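import numpy as np
import pandas as pd
df = pd.DataFrame([12,14])
# one column per row: the value sits on the diagonal, zeros elsewhere
df = pd.DataFrame(np.diag(df[0].values), index=df.index, columns=df.index)
df.plot.bar(color=["red", "green"], stacked=True)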
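To see why stacking a diagonal matrix colors each bar individually, it may help to inspect the intermediate frame. In this minimal check (pandas and NumPy only), each row has a single nonzero entry in its own column, so each stacked bar has one visible segment, drawn in that column's color, while its total height stays equal to the original value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([12, 14])
diag = pd.DataFrame(np.diag(df[0].values), index=df.index, columns=df.index)

# Each row has exactly one nonzero entry, in its own column
print(diag.values.tolist())       # [[12, 0], [0, 14]]

# Stacking only adds zeros, so the bar heights are unchanged
print(diag.sum(axis=1).tolist())  # [12, 14]
```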
Another option is to use matplotlib instead.
import matplotlib.pyplot as plt
df = pd.DataFrame([12,14])
# matplotlib's bar accepts one color per bar
plt.bar(df.index, df[0].values, color=["red", "green"])
Choosing the colors according to values
Now the question remains how to create the list of colors to use in either of the two solutions above. Given a dataframe df, you can create an array of the same length as the frame, fill it with the default color, and then set the entries of the two highest values to another color:
color = np.array(["gray"]*len(df))
color[np.argsort(df["Quantity"])[-2:]] = "red"
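The two lines above can be wrapped into a helper for any n. This is a sketch; the function name top_n_colors is hypothetical. It also guards against two pitfalls of the snippet as written: np.array(["gray"]*len(df)) has the fixed-width dtype '<U4', so a longer color name such as "orange" would be silently truncated to "oran", and the slice [-0:] selects the whole array, so n=0 would highlight every bar.

```python
import numpy as np
import pandas as pd

data = [['Apple', 10], ['Banana', 15], ['Kiwi', 11], ['Orange', 17]]
df = pd.DataFrame(data, columns=['Fruit', 'Quantity']).set_index('Fruit')


def top_n_colors(values, n, default="gray", highlight="red"):
    """Hypothetical helper: one color per value, `highlight` for the n largest."""
    # dtype=object stores full strings; a plain string array would be fixed-width
    colors = np.full(len(values), default, dtype=object)
    if n > 0:  # the slice [-0:] would select every position
        # argsort is ascending, so the last n positions belong to the n largest values
        colors[np.argsort(np.asarray(values))[-n:]] = highlight
    return colors.tolist()


print(top_n_colors(df["Quantity"], 2))                      # ['gray', 'red', 'gray', 'red']
print(top_n_colors(df["Quantity"], 1, highlight="orange"))  # ['gray', 'gray', 'gray', 'orange']
```

With tied values, which of them gets highlighted depends on the order argsort returns them in.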
Solution:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = [['Apple',10],['Banana',15],['Kiwi',11],['Orange',17]]
df = pd.DataFrame(data,columns=['Fruit','Quantity'])
df.set_index('Fruit', inplace=True)
color = np.array(["gray"]*len(df))
color[np.argsort(df["Quantity"])[-2:]] = "red"
plt.bar(df.index, df["Quantity"], color=color)  # pass a 1-D column; df.values is 2-D
plt.show()
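A variant that stays within pandas, assuming the pandas behavior I'm aware of (worth checking against your version): when the data is a Series rather than a one-column DataFrame, plot.bar hands the whole color list to matplotlib, so each bar gets its own color. It also swaps argsort for Series.rank(method='first'), which marks exactly two bars even when values tie.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import numpy as np
import pandas as pd

data = [['Apple', 10], ['Banana', 15], ['Kiwi', 11], ['Orange', 17]]
df = pd.DataFrame(data, columns=['Fruit', 'Quantity']).set_index('Fruit')

q = df['Quantity']  # a Series, not a one-column DataFrame
# rank 1 = largest; method='first' breaks ties by position, so exactly 2 bars are red
color = np.where(q.rank(method='first', ascending=False) <= 2, 'red', 'gray')
ax = q.plot.bar(color=color.tolist(), rot=0)

print([to_hex(p.get_facecolor()) for p in ax.patches])
```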
I want to specify the color of the area surrounding a plot created using the df.plot() in Pandas/Python.
Using .set_facecolor as in the code below only changes the area inside the axes; I want to change the color outside too.
import pandas as pd
import numpy as np
df = pd.DataFrame(components, columns=['PC1','PC2'])
df.plot('PC1','PC2','scatter').set_facecolor('green')
Replacing the last line with these two lines produces the same graph.
ax = df.plot('PC1','PC2','scatter')
ax.set_facecolor('green')
If I understand correctly, you can use fig.set_facecolor:
fig, ax = plt.subplots()
df.plot('PC1','PC2','scatter', ax=ax).set_facecolor('green')
fig.set_facecolor('green')
plt.show()
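A self-contained version of this answer, for trying it out without the original data. The components array here is a hypothetical stand-in (random numbers) for the question's undefined components, and the Agg backend is only there so the sketch runs headless. ax.set_facecolor colors the area inside the axes; fig.set_facecolor colors the figure background around it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `components` (two columns of scores)
rng = np.random.default_rng(0)
components = rng.normal(size=(50, 2))
df = pd.DataFrame(components, columns=['PC1', 'PC2'])

fig, ax = plt.subplots()
df.plot.scatter(x='PC1', y='PC2', ax=ax)
ax.set_facecolor('green')   # area inside the axes
fig.set_facecolor('green')  # figure background outside the axes

print(to_hex(fig.get_facecolor()), to_hex(ax.get_facecolor()))  # #008000 #008000
```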
I want to create a vertical bar plot using seaborn in which each x-axis label has n bars (2 in the example) of different colors, but each bar is floating; in other words, it uses the bottom parameter of matplotlib's bar.
This works without the bottom part, as follows, but fails with it:
import pandas as pd
import seaborn as sns
d = {'month':['202001','202002','202003','202001','202002','202003'],
'range' : [0.94,4.47,0.97,4.70,0.98,1.23],
'bottom' : [8.59,17.05,8.35,17.78,8.32,5.67],
'group' : ['a','a','a','b','b','b']
}
df = pd.DataFrame(data=d)
sns.barplot(data=df,x = "month", y = "range",hue='group')
(Sorry, I can't upload the picture for some reason; I think the service is blocked at my work, but running the code will display it.)
But when I add the bottom parameter, it fails:
sns.barplot(data=df,x = "month", y = "range",hue='group',bottom='bottom')
I'd appreciate the help, and perhaps an explanation of why it fails, as logically it should work.
The bars indicate a range of forecasts for a measure, and I want to show each range as a rectangle.
seaborn itself doesn't handle bottom, so it passes the keyword on to plt.bar. But plt.bar requires bottom to have the same shape/size as x and y, which is not the case when the data is passed by seaborn.
Let's try a workaround with the pandas plot function:
to_plot = df.pivot(index='month',columns='group')
fig,ax = plt.subplots()
to_plot['range'].add(to_plot['bottom']).plot.bar(ax=ax)
# cover the bars up to `bottom`
# replace `w` with background color of your choice
to_plot['bottom'].plot.bar(ax=ax, color='w', legend=None)
For another approach that allows a specific style:
import numpy as np
import matplotlib.pyplot as plt
# set sns plot style
sns.set()
to_plot = df.pivot(index='month',columns='group')
fig,ax = plt.subplots()
for i,(label,r) in enumerate(to_plot.iterrows()):
    # one pair of floating bars per month: group a at i-0.1, group b at i+0.1
    plt.bar([i-0.1,i+0.1],r['range'],
            bottom=r['bottom'],
            color=['C0','C1'],
            width=0.2)
plt.xticks(np.arange(len(to_plot)), to_plot.index);
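Following the explanation above that plt.bar needs bottom to match x and the bar heights in shape, a further sketch calls ax.bar once per group instead of once per month, so the x positions, heights, and bottoms are all arrays of the same length. The group offsets and the 0.8 total width are my own choices, not from either answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

d = {'month': ['202001', '202002', '202003', '202001', '202002', '202003'],
     'range': [0.94, 4.47, 0.97, 4.70, 0.98, 1.23],
     'bottom': [8.59, 17.05, 8.35, 17.78, 8.32, 5.67],
     'group': ['a', 'a', 'a', 'b', 'b', 'b']}
df = pd.DataFrame(data=d)

to_plot = df.pivot(index='month', columns='group')
groups = to_plot['range'].columns   # the hue levels: 'a', 'b'
x = np.arange(len(to_plot))         # one slot per month
width = 0.8 / len(groups)

fig, ax = plt.subplots()
for k, g in enumerate(groups):
    # center the groups around each slot
    offset = (k - (len(groups) - 1) / 2) * width
    # x, height and bottom all have one entry per month, so their shapes match
    ax.bar(x + offset, to_plot['range'][g], bottom=to_plot['bottom'][g],
           width=width, label=g)
ax.set_xticks(x)
ax.set_xticklabels(to_plot.index)
ax.legend(title='group')
```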
I want to plot column number 0 vs all the other columns with a colour map.
I have written the following with a for loop. However, all the plots appear separately instead of on the same graph. Below is the code I wrote:
csv_file1 = pd.read_csv(r'file path')
j = [i for i in range(1,175)]
for i in j:
    csv_file1.plot.scatter(0,i,c=i,colormap='viridis')
    plt.hold()
How can I get all the plots on the same graph?
With pandas, you need to draw all the plots on the same ax. Suppose you want to use the first column as x and all the other columns as y, with everything colored by the y value. You need to suppress the colorbar, as pandas otherwise adds an individual colorbar for each column:
from matplotlib import pyplot as plt
import pandas as pd
csv_file1 = pd.read_csv(r'file path')
ax = plt.gca()
columns = csv_file1.columns
for col in columns[1:]:
    csv_file1.plot.scatter(x=columns[0], y=col, c=col, colormap='viridis',
                           ax=ax, colorbar=False)
plt.show()
Alternatively, if you want to give each column its own color, you could calculate a list of colors (pandas seems to like them in a list of one color):
colors = [[plt.cm.viridis(i / len(columns))] for i in range(len(columns) - 1)]
for col, color in zip(columns[1:], colors):
    csv_file1.plot.scatter(x=columns[0], y=col, c=color, ax=ax, colorbar=False)
plt.show()
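If a single shared colorbar is enough, another option (an alternative technique, not either answer's method) is to reshape the frame to long format with DataFrame.melt and draw every point with one ax.scatter call. The random frame below is a hypothetical stand-in for pd.read_csv(r'file path'), with its first column used as x.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv(r'file path'); column 'x' plays column 0
rng = np.random.default_rng(0)
csv_file1 = pd.DataFrame(rng.normal(size=(20, 5)), columns=list('xabcd'))

x_col = csv_file1.columns[0]
# Long format: one row per (x, source column, y) triple
long = csv_file1.melt(id_vars=x_col, var_name='column', value_name='y')

fig, ax = plt.subplots()
# One call draws all columns on the same axes, colored by the y value
sc = ax.scatter(long[x_col], long['y'], c=long['y'], cmap='viridis')
fig.colorbar(sc, ax=ax, label='y value')  # a single shared colorbar
```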
I can't speak for pandas.plot, but you can control when the canvas is cleared by using plt.scatter and placing plt.show() outside the loop:
import pandas as pd
import matplotlib.pyplot as plt
csv_file1 = pd.read_csv(r'file path')
j = csv_file1.columns.tolist()
for i in j:
    # plt.scatter takes `cmap` (not `colormap`) and needs `c` values to map
    plt.scatter(csv_file1.index, csv_file1[i], c=csv_file1[i], cmap='viridis')
plt.show()
I am passing a pandas dataframe to be plotted with sns.scatterplot and want to use the 'bright' color palette. The color is to be determined by the values of an integer Series that I pass as hue to the plotting function.
The problem is that this only works when the hue Series has exactly two distinct values. When it has only one or more than two distinct values, the plot falls back to a beige-to-purple color palette.
When I set the color palette using sns.set_palette('bright'), everything happens as described above. But when I pass palette='bright' inside the plotting function call (and n_classes != 2), an explicit ValueError is thrown:
ValueError: Palette {} not understood
Here is the code for reproducing:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_palette('bright') # first method
n_classes = 3
a = np.arange(10)
b = np.random.randn(10)
c = np.random.randint(n_classes, size=10)
s = pd.DataFrame({'A': a, 'B':b, 'C': c})
sns.scatterplot(data=s, x='A', y='B', hue='C')
plt.show()
For the second method simply change the scatterplot call to
sns.scatterplot(data=s, x='A', y='B', hue='C', palette='bright')
Is there a way to get multiple hue levels in the palette I want? Am I doing anything wrong or is this a bug?
You need to pass the number of colors, something like this:
sns.scatterplot(data=s,
x='A',
y='B',
hue='C',
palette=sns.color_palette('bright', s.C.unique().shape[0])
)
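Alternatively, a palette can be passed as a dict mapping each hue level to a color, which makes the assignment explicit. A sketch, assuming one 'bright' color per distinct value of C; the random data only mimics the question's example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

n_classes = 3
rng = np.random.default_rng(0)
s = pd.DataFrame({'A': np.arange(10),
                  'B': rng.normal(size=10),
                  'C': rng.integers(n_classes, size=10)})

levels = sorted(s['C'].unique())
# One 'bright' color per distinct hue value, keyed by the value itself
palette = dict(zip(levels, sns.color_palette('bright', len(levels))))
ax = sns.scatterplot(data=s, x='A', y='B', hue='C', palette=palette)
```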
I am trying to create a single image with heatmaps representing the correlation of features of data points for each label separately. With seaborn, I can create a heatmap for a single class like so:
grouped = df.groupby('target')
sns.heatmap(grouped.get_group('Class_1').corr())
And I get a heatmap that makes sense.
But then I try to make one heatmap per label like so:
g = sns.FacetGrid(df, col='target')
g.map(lambda grp: sns.heatmap(grp.corr()))
And sadly, I get a result that makes no sense to me.
It turns out you can do this pretty concisely with just seaborn if you use map_dataframe instead of map:
g = sns.FacetGrid(df, col='target')
g.map_dataframe(lambda data, color: sns.heatmap(data.corr(), linewidths=0))
@mwaskom points out in his comment that it might be a good idea to explicitly set the limits of the colormap so that the different facets can be compared more directly. The documentation describes the relevant heatmap parameters:
vmin, vmax : floats, optional
Values to anchor the colormap, otherwise they are inferred from the data and other keyword arguments.
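Putting both suggestions together, a sketch with fixed color limits: since correlations lie in [-1, 1], vmin=-1 and vmax=1 give every facet the same color scale. The frame with features f1 to f4 is hypothetical; select_dtypes('number') drops the string target column, which newer pandas versions would otherwise refuse to correlate.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: four numeric features plus a 'target' label column
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(90, 4)), columns=['f1', 'f2', 'f3', 'f4'])
df['target'] = np.repeat(['Class_1', 'Class_2', 'Class_3'], 30)

g = sns.FacetGrid(df, col='target')
# map_dataframe passes each facet's subset as `data` (plus a `color` keyword);
# fixed vmin/vmax make the facets directly comparable
g.map_dataframe(lambda data, color: sns.heatmap(
    data.select_dtypes('number').corr(), vmin=-1, vmax=1, cbar=False))
```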
Without FacetGrid, but making a corr heatmap for each group in a column:
import pandas as pd
import seaborn as sns
from numpy.random import randint
import matplotlib.pyplot as plt
df = pd.DataFrame(randint(0,10,(200,12)),columns=list('abcdefghijkl'))
grouped = df.groupby('a')
rowlength = (grouped.ngroups + 1) // 2  # integer column count; rounds up for an odd number of groups
fig, axs = plt.subplots(figsize=(9,4), nrows=2, ncols=rowlength)
targets = zip(grouped.groups.keys(), axs.flatten())
for i, (key, ax) in enumerate(targets):
    sns.heatmap(grouped.get_group(key).corr(), ax=ax,
                xticklabels=(i >= rowlength),
                yticklabels=(i%rowlength==0),
                cbar=False) # use cbar_ax to draw a single colorbar in a side axis
    ax.set_title('a=%d'%key)
plt.show()
Maybe there's a way to set up a lambda to correctly pass the data from the g.facet_data() generator through corr before going to heatmap.
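The comment in the code above hints at drawing one shared colorbar via cbar_ax. A sketch of that idea, assuming fixed limits vmin=-1 and vmax=1 so that one colorbar is valid for every panel; the side-axis rectangle passed to fig.add_axes is an arbitrary choice, and the random data mirrors the answer's example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, (200, 12)), columns=list('abcdefghijkl'))
grouped = df.groupby('a')

ncols = (grouped.ngroups + 1) // 2  # two rows, rounding up for an odd count
fig, axs = plt.subplots(figsize=(10, 4), nrows=2, ncols=ncols)
fig.subplots_adjust(right=0.9)                 # leave room on the right
cax = fig.add_axes([0.92, 0.15, 0.02, 0.7])    # side axis for the shared colorbar

for i, (key, ax) in enumerate(zip(grouped.groups.keys(), axs.flatten())):
    # only the first heatmap draws the colorbar, into cax; limits are shared
    sns.heatmap(grouped.get_group(key).corr(), ax=ax, vmin=-1, vmax=1,
                cbar=(i == 0), cbar_ax=cax if i == 0 else None,
                xticklabels=False, yticklabels=False)
    ax.set_title(f'a={key}')
```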