I am new to python and i am trying to make a correlation heatmap on seaborn. Could anyone tell me how to customise the default values on the right of the heatmap with my own correlation cutoffs? I get something like the one in the picture but i want to customise with my own cutoffs and three values instead of four.
import pandas as pd
import seaborn as sns
new_df = df[['A', 'B', 'C', 'D', 'G','E']]
sns.heatmap(new_df.corr(), annot = False,square=True)
If I understand your question correctly, you want to change the tick marks that display on the colorbar.
If so the heatmap function has an attribute cbar_kws which accepts a dictionary as input. A potential solution is:
sns.heatmap(
new_df.corr(),
annot = False,
square=True,
cbar_kws={'ticks': [0.25, .5, 1]}
)
Seaborn heatmap documentation
Matplotlib Colorbar documentation (for the key words)
Shortly ago, I posted this question:
seaborn barplot: vary color with x and hue
My sample code from that question produces a barplot, which looks like this:
As you can see, there is a very tiny space between the bars.
The code is this:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(columns=["model", "time", "value"])
df["model"] = ["on"]*2 + ["off"]*2
df["time"] = ["short", "long"] * 2
df["value"] = [1, 10, 2, 4]
sns.barplot(data=df, x="model", hue="time", y="value")
plt.show()
Now I executed the code on a different machine and it produced a different image:
The different colors are no concern, I'll specify my own palette in any case. But an important difference for me is: The bars touch each other now, there is no longer a white border between them.
How can I reproduce the original behaviour, how can I explicitly set the padding of barplots bars in seaborn ?
My current seaborn version is 0.9.0. Unfortunately, I don't know with which version the original image was created.
I have made a colourmap from my chosen colours, however I'd like to convert it to a palette which can be used to 'hue' a seaborn plot. Is this possible, and if so how so?
I have used...
cmap = pl.colors.LinearSegmentedColormap.from_list("", ["red","white"],gamma=0.5,N=len(hue))
...in order to make my own colourmap, which I know works because it can be applied to a standard matplotlib.pyplot.scatter plot successfully, as shown below.
plt.scatter(x=[1,2,3,4,5],y=[1,2,3,4,5],c=[5,4,3,2,1],cmap=cmap)
click here to see the output as it won't let me embed an image
However, I am trying to use seaborn's swarmplot function and pass in my colourmap as the hue parameter. Obviously this doesn't work as this parameter requires a palette - hence my question!
I'm not quite sure where to start! Any help would be appreciated!
A seaborn palette is a simple list of colors. You may obtain the colors via
cmap(np.linspace(0,1,cmap.N))
Complete example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import seaborn as sns
df =pd.DataFrame({"x" : np.random.randint(0,4, size=200),
"y" : np.random.randn(200),
"hue" : np.random.randint(0,4, size=200)})
u = np.unique(df["hue"].values)
cmap = mcolors.LinearSegmentedColormap.from_list("", ["indigo","gold"],gamma=0.5,N=len(u))
sns.swarmplot("x", "y", hue="hue", data=df, palette=cmap(np.linspace(0,1,cmap.N)))
plt.show()
I am plotting several graphs with matplotlib for a publication and I need that all have the same style. Some graphs have more than 6 categories and I have noticed that, by default, it does not plot more than 6 different colours. 7 or more and I start to have repeated colours.
e.g.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
plt.style.use('seaborn-muted')
df2= pd.DataFrame(np.random.rand(10,8))
df2.plot(kind='bar',stacked=True)
plt.legend(fontsize=13,loc=1)
plt.show()
There is probably a cognitive reason not to include more than 6 different colours, but if I need to, How can I do it? I have tried different stylesheets (seaborn, ggplot, classic) and all have seem to have the same "limitation".
Do I need to change the colormap/stylesheet? Ideally, I would like to use a qualitative colormap (there is no order in the categories that I am plotting) and use a pre-existing one... I am not very good choosing colours.
thanks!
By default, matplotlib will cycle through a series of six colors. If you want to change the default colors (or number of colors), you can use cycler to loop through the colors you want instead of the defaults.
from cycler import cycler
% Change the default cycle colors to be red, green, blue, and yellow
plt.rc('axes', prop_cycle=(cycler('color', ['r', 'g', 'b', 'y']))
demo here
A better way is just to manually specify plot colors when you create your plot so that not every plot you make has to use the same colors.
plt.plot([1,2,3], 'r')
plt.plot([4,5,6], 'g')
plt.plot([7,8,9], 'b')
plt.plot([10,11,12], 'y')
Or you can change the color after creation
h = plt.plot([1,2,3])
h.set_color('r')
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
colors = plt.cm.jet(np.linspace(0, 1, 10))
df2= pd.DataFrame(np.random.rand(10,8))
df2.plot(kind='bar',color=colors, stacked=True)
plt.legend(fontsize=13,loc=1)
this is basically copied from Plotting with more colors in matplotlib. check the document of colormap and its example page.
I have this data frame diamonds which is composed of variables like (carat, price, color), and I want to draw a scatter plot of price to carat for each color, which means different color has different color in the plot.
This is easy in R with ggplot:
ggplot(aes(x=carat, y=price, color=color), #by setting color=color, ggplot automatically draw in different colors
data=diamonds) + geom_point(stat='summary', fun.y=median)
I wonder how could this be done in Python using matplotlib ?
PS:
I know about auxiliary plotting packages, such as seaborn and ggplot for python, and I don't prefer them, just want to find out if it is possible to do the job using matplotlib alone, ;P
Imports and Sample DataFrame
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns # for sample data
from matplotlib.lines import Line2D # for legend handle
# DataFrame used for all options
df = sns.load_dataset('diamonds')
carat cut color clarity depth table price x y z
0 0.23 Ideal E SI2 61.5 55.0 326 3.95 3.98 2.43
1 0.21 Premium E SI1 59.8 61.0 326 3.89 3.84 2.31
2 0.23 Good E VS1 56.9 65.0 327 4.05 4.07 2.31
With matplotlib
You can pass plt.scatter a c argument, which allows you to select the colors. The following code defines a colors dictionary to map the diamond colors to the plotting colors.
fig, ax = plt.subplots(figsize=(6, 6))
colors = {'D':'tab:blue', 'E':'tab:orange', 'F':'tab:green', 'G':'tab:red', 'H':'tab:purple', 'I':'tab:brown', 'J':'tab:pink'}
ax.scatter(df['carat'], df['price'], c=df['color'].map(colors))
# add a legend
handles = [Line2D([0], [0], marker='o', color='w', markerfacecolor=v, label=k, markersize=8) for k, v in colors.items()]
ax.legend(title='color', handles=handles, bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
df['color'].map(colors) effectively maps the colors from "diamond" to "plotting".
(Forgive me for not putting another example image up, I think 2 is enough :P)
With seaborn
You can use seaborn which is a wrapper around matplotlib that makes it look prettier by default (rather opinion-based, I know :P) but also adds some plotting functions.
For this you could use seaborn.lmplot with fit_reg=False (which prevents it from automatically doing some regression).
sns.scatterplot(x='carat', y='price', data=df, hue='color', ec=None) also does the same thing.
Selecting hue='color' tells seaborn to split and plot the data based on the unique values in the 'color' column.
sns.lmplot(x='carat', y='price', data=df, hue='color', fit_reg=False)
With pandas.DataFrame.groupby & pandas.DataFrame.plot
If you don't want to use seaborn, use pandas.groupby to get the colors alone, and then plot them using just matplotlib, but you'll have to manually assign colors as you go, I've added an example below:
fig, ax = plt.subplots(figsize=(6, 6))
grouped = df.groupby('color')
for key, group in grouped:
group.plot(ax=ax, kind='scatter', x='carat', y='price', label=key, color=colors[key])
plt.show()
This code assumes the same DataFrame as above, and then groups it based on color. It then iterates over these groups, plotting for each one. To select a color, I've created a colors dictionary, which can map the diamond color (for instance D) to a real color (for instance tab:blue).
Here's a succinct and generic solution to use a seaborn color palette.
First find a color palette you like and optionally visualize it:
sns.palplot(sns.color_palette("Set2", 8))
Then you can use it with matplotlib doing this:
# Unique category labels: 'D', 'F', 'G', ...
color_labels = df['color'].unique()
# List of RGB triplets
rgb_values = sns.color_palette("Set2", 8)
# Map label to RGB
color_map = dict(zip(color_labels, rgb_values))
# Finally use the mapped values
plt.scatter(df['carat'], df['price'], c=df['color'].map(color_map))
I had the same question, and have spent all day trying out different packages.
I had originally used matlibplot: and was not happy with either mapping categories to predefined colors; or grouping/aggregating then iterating through the groups (and still having to map colors). I just felt it was poor package implementation.
Seaborn wouldn't work on my case, and Altair ONLY works inside of a Jupyter Notebook.
The best solution for me was PlotNine, which "is an implementation of a grammar of graphics in Python, and based on ggplot2".
Below is the plotnine code to replicate your R example in Python:
from plotnine import *
from plotnine.data import diamonds
g = ggplot(diamonds, aes(x='carat', y='price', color='color')) + geom_point(stat='summary')
print(g)
So clean and simple :)
Here a combination of markers and colors from a qualitative colormap in matplotlib:
import itertools
import numpy as np
from matplotlib import markers
import matplotlib.pyplot as plt
m_styles = markers.MarkerStyle.markers
N = 60
colormap = plt.cm.Dark2.colors # Qualitative colormap
for i, (marker, color) in zip(range(N), itertools.product(m_styles, colormap)):
plt.scatter(*np.random.random(2), color=color, marker=marker, label=i)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., ncol=4);
Using Altair.
from altair import *
import pandas as pd
df = datasets.load_dataset('iris')
Chart(df).mark_point().encode(x='petalLength',y='sepalLength', color='species')
The easiest way is to simply pass an array of integer category levels to the plt.scatter() color parameter.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv')
plt.scatter(df['carat'], df['price'], c=pd.factorize(df['color'])[0],)
plt.gca().set(xlabel='Carat', ylabel='Price', title='Carat vs. Price')
This creates a plot without a legend, using the default "viridis" colormap. In this case "viridis" is not a good default choice because the colors appear to imply a sequential order rather than purely nominal categories.
To choose your own colormap and add a legend, the simplest approach is this:
import matplotlib.patches
levels, categories = pd.factorize(df['color'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
plt.scatter(df['carat'], df['price'], c=colors)
plt.gca().set(xlabel='Carat', ylabel='Price', title='Carat vs. Price')
plt.legend(handles=handles, title='Color')
I chose the "tab10" discrete (aka qualitative) colormap here, which does a better job at signaling the color factor is a nominal categorical variable.
Extra credit:
In the first plot, the default colors are chosen by passing min-max scaled values from the array of category level ints pd.factorize(iris['species'])[0] to the call method of the plt.cm.viridis colormap object.
With df.plot()
Normally when quickly plotting a DataFrame, I use pd.DataFrame.plot(). This takes the index as the x value, the value as the y value and plots each column separately with a different color.
A DataFrame in this form can be achieved by using set_index and unstack.
import matplotlib.pyplot as plt
import pandas as pd
carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color =['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G',]
df = pd.DataFrame(dict(carat=carat, price=price, color=color))
df.set_index(['color', 'carat']).unstack('color')['price'].plot(style='o')
plt.ylabel('price')
With this method you do not have to manually specify the colors.
This procedure may make more sense for other data series. In my case I have timeseries data, so the MultiIndex consists of datetime and categories. It is also possible to use this approach for more than one column to color by, but the legend is getting a mess.
You can convert the categorical column into a numerical one by using the commands:
#we converting it into categorical data
cat_col = df['column_name'].astype('category')
#we are getting codes for it
cat_col = cat_col.cat.codes
# we are using c parameter to change the color.
plt.scatter(df['column1'],df['column2'], c=cat_col)