I had a look at Kaggle's univariate-plotting-with-pandas tutorial. There's this line which generates a bar graph:
reviews['province'].value_counts().head(10).plot.bar()
I don't see any color scheme defined specifically.
I tried plotting it in a Jupyter notebook, but I could see only one color instead of the multiple colors shown on Kaggle.
I tried reading the documentation and online help, but couldn't find any way to generate these colors with just the line above.
How do we do that? Is there a config to set this randomness by default?
It seems like the multicoloured bars were the default behaviour in one of the former pandas versions and Kaggle must have used that one for their tutorial (you can read more here).
You can easily recreate the plot by defining a list of standard colours and passing it as the color argument to bar():
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
'#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
reviews['province'].value_counts().head(10).plot.bar(color=colors)
Tested on pandas 0.24.1 and matplotlib 2.2.2.
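A self-contained sketch of the snippet above. The reviews DataFrame here is a hypothetical stand-in for the Kaggle wine-reviews data (province names and counts are made up), and the colour list is sliced to the number of bars so the two lengths match:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Kaggle reviews data: only a 'province' column.
reviews = pd.DataFrame({
    "province": ["California"] * 5 + ["Washington"] * 3 + ["Oregon"] * 2 + ["Tuscany"]
})

# The ten default matplotlib "tab10" colours, one per bar.
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
          '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']

counts = reviews['province'].value_counts().head(10)
# For a Series, a list passed as color is applied bar by bar.
ax = counts.plot.bar(color=colors[:len(counts)])
plt.show()
```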
In seaborn this is not a problem:
import seaborn as sns
sns.countplot(x='province', data=reviews)
In matplotlib it is also possible by converting the values to a one-row DataFrame, although the bars then have no spaces between them:
reviews['province'].value_counts().head(10).to_frame(0).T.plot.bar()
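A small sketch of why the one-row trick works, again with a made-up reviews frame: after .to_frame(0).T every province becomes its own column, and pandas gives each column a different colour from the default cycle. All bars belong to a single group at one x position, which is why there are no gaps between them.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Kaggle reviews data.
reviews = pd.DataFrame({"province": ["California"] * 3 + ["Oregon"] * 2 + ["Tuscany"]})

# Series of counts -> one-column frame (column named 0) -> transpose to one row.
wide = reviews['province'].value_counts().head(10).to_frame(0).T
print(wide.shape)  # one row, one column per province

# One bar per column, each in its own colour, plus a legend with the province names.
ax = wide.plot.bar()
plt.show()
```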
Or use a qualitative colormap:
import matplotlib.pyplot as plt
import numpy as np
N = 10
reviews['province'].value_counts().head(N).plot.bar(color=plt.cm.Paired(np.arange(N)))
reviews['province'].value_counts().head(N).plot.bar(color=plt.cm.Pastel1(np.arange(N)))
The colorful plot has been produced with an earlier version of pandas (<= 0.23). Since then, pandas has decided to make bar plots monochrome, because the color of the bars is pretty meaningless. If you still want to produce a bar chart with the default colors from the "tab10" colormap in pandas >= 0.24, and hence recreate the previous behaviour, it would look like
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
N = 13
df = pd.Series(np.random.randint(10,50,N), index=np.arange(1,N+1))
cmap = plt.cm.tab10
colors = cmap(np.arange(len(df)) % cmap.N)
df.plot.bar(color=colors)
plt.show()
Related
I am looking for a way to combine a bar and a line plot, without the bar plot shifting when the line plot is added.
The following code is used to generate the bar plot:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.DataFrame([[4,30,0,3,2,2,], [5,24,0,3,1,1,], [6,34,0,4,2,1], [7,18,0,2,1,1], [8,34,0,3,3,2]], columns=['t', 'Cost', 0,1,2,3])
data[[1,2,3]].plot(kind='bar')
Thus, the data looks as follows
and the following plot is generated:
Next, I add the cost information using:
data['Cost'].plot(style='o--', c='black', secondary_y=True)
Running it all together returns the following graph:
The issue is that the outer bars are no longer visible. I tried changing the range of the x-axis with xlim, but that did not help; it only made things worse. There is probably an easy fix, but I have not been able to find one anywhere online.
I don't have this issue when running your code:
That said, an easy fix is to run ax.set_xlim(-0.5, 4.5), where ax is the axes returned by the bar plot.
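A sketch putting the question's code and the fix together. Passing ax=ax to the second call keeps the line on the same x-axis as the bars, and resetting the x-limits after both calls brings the outer bars back into view. The limits are computed from the number of rows, assuming one bar group per row at positions 0 to len(data) - 1 (here 0 to 4):

```python
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame([[4, 30, 0, 3, 2, 2], [5, 24, 0, 3, 1, 1], [6, 34, 0, 4, 2, 1],
                     [7, 18, 0, 2, 1, 1], [8, 34, 0, 3, 3, 2]],
                    columns=['t', 'Cost', 0, 1, 2, 3])

# Grouped bars, one group per row, at x positions 0..4.
ax = data[[1, 2, 3]].plot(kind='bar')
# Cost line on a secondary y-axis that shares the bars' x-axis.
data['Cost'].plot(style='o--', color='black', secondary_y=True, ax=ax)

# Half a bar-group of padding on each side so the outer bars are fully visible.
ax.set_xlim(-0.5, len(data) - 0.5)
plt.show()
```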
I want to color-code the circles in my scatter plot and associate them with a legend.
I have made various attempts at a solution referencing matplotlib.org and this website. All to no avail.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
hysector = pd.read_csv('LF98TRUU_Lvl3_sector.csv',index_col=0)
hysector.plot.scatter(x='OAD',y='OAS',s=hysector['Wgt']*35,label='Sector');
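One common approach, sketched with a small made-up frame in place of LF98TRUU_Lvl3_sector.csv: call scatter once per sector with its own label, so each sector picks up the next colour in the cycle and gets its own legend entry. That the sector names live in the index (as index_col=0 suggests) is an assumption, and the numbers are invented.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the CSV: sector names as the index, as with index_col=0.
hysector = pd.DataFrame(
    {"OAD": [3.1, 4.2, 2.8, 5.0], "OAS": [350, 420, 300, 510], "Wgt": [1.5, 3.0, 0.8, 2.2]},
    index=["Energy", "Banking", "Utility", "Technology"],
)

fig, ax = plt.subplots()
# One scatter call per sector: each call gets its own colour and legend label.
for sector, row in hysector.iterrows():
    ax.scatter(row["OAD"], row["OAS"], s=row["Wgt"] * 35, label=sector)

ax.set_xlabel("OAD")
ax.set_ylabel("OAS")
ax.legend()
plt.show()
```

With many sectors, grouping rows by a sector column and calling scatter once per group follows the same pattern.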
I am new to matplotlib and seaborn and am currently practicing with the two libraries using the classic Titanic dataset. This might be elementary, but I'm trying to plot two factorplots side by side by passing the argument ax = <matplotlib axis>, as shown in the code below:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,4))
sns.factorplot(x='Pclass',data=titanic_df,kind='count',hue='Survived',ax=axis1)
sns.factorplot(x='SibSp',data=titanic_df,kind='count',hue='Survived',ax=axis2)
I was expecting the two factorplots side by side, but instead I also ended up with two extra blank subplots, as shown above.
Any call to sns.factorplot() actually creates a new figure, although the contents are drawn to the existing axes (axis1, axis2). Those extra figures are shown together with the original fig.
I guess the easiest way to prevent those unused figures from showing up is to close them, using plt.close(<figure number>).
Here is a solution for a notebook:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
%matplotlib inline
titanic_df = pd.read_csv(r"https://github.com/pcsanwald/kaggle-titanic/raw/master/train.csv")
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,4))
sns.factorplot(x='pclass',data=titanic_df,kind='count',hue='survived',ax=axis1)
sns.factorplot(x='sibsp',data=titanic_df,kind='count',hue='survived',ax=axis2)
plt.close(2)
plt.close(3)
(For normal console plotting, remove the %matplotlib inline command and add plt.show() at the end.)
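An alternative that avoids the extra figures altogether is seaborn's axes-level countplot: axes-level functions draw onto the ax you pass and do not create a new figure. A sketch with a tiny hypothetical frame that uses the same lowercase column names as the CSV above (the values are made up):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical mini version of the Titanic data (same column names as the CSV above).
titanic_df = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3, 3, 2, 1],
    "sibsp": [0, 1, 0, 0, 2, 1, 0, 0],
    "survived": [1, 1, 0, 0, 0, 1, 1, 0],
})

fig, (axis1, axis2) = plt.subplots(1, 2, figsize=(15, 4))
# countplot is axes-level: it draws on the given axes instead of making a new figure.
sns.countplot(x='pclass', data=titanic_df, hue='survived', ax=axis1)
sns.countplot(x='sibsp', data=titanic_df, hue='survived', ax=axis2)
plt.show()
```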
Explanation:
I am making a plot with the stripplot method using the hue argument, but some markers in the resulting image don't have the face colors I intended; instead they are black, gray, or white.
The simplified code below produces an image similar to mine. There are only 3 records to plot, and their markers lose their face colors.
I am new to Python, so I might be missing something. If this post needs more info, please tell me.
Question:
Is there any workaround so that these 3 records keep their face colors?
Environment:
MacOS 10.11.1
Python 2.7.10 (Homebrew install)
seaborn 0.6.0 (pip install)
matplotlib 1.5.0 (pip install)
Code:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
data = sns.load_dataset("tips")
data = data[data.sex == "Female"][0:3]
plot = sns.stripplot(
x="total_bill",
y="sex",
data=data
)
plt.show()
This problem has already been reported on GitHub.
See github.com/mwaskom/seaborn/issues/753
How can I get seaborn colors when doing a scatter plot?
import matplotlib.pyplot as plt
import seaborn as sns
fig = plt.figure()
ax = fig.add_subplot(111)
for f in files:
ax.scatter(args) # all datasets end up same colour
#plt.plot(args) # cycles through palette correctly
You have to tell matplotlib which color to use. To use, for example, seaborn's default color palette:
import matplotlib.pyplot as plt
import seaborn as sns
import itertools
fig = plt.figure()
ax = fig.add_subplot(111)
palette = itertools.cycle(sns.color_palette())
for f in files:
ax.scatter(args, color=next(palette))
The itertools.cycle makes sure we don't run out of colors and start with the first one again after using the last one.
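A runnable sketch of the same idea. The question leaves files and args unspecified, so three small random datasets stand in for them here (hypothetical data):

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the per-file data: a list of (x, y) arrays.
datasets = [(rng.random(20), rng.random(20)) for _ in range(3)]

fig = plt.figure()
ax = fig.add_subplot(111)
# Cycle through seaborn's current default palette; wraps around if it runs out.
palette = itertools.cycle(sns.color_palette())
for x, y in datasets:
    ax.scatter(x, y, color=next(palette))
plt.show()
```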
Update:
As per @Iceflower's comment, creating a custom color palette via
palette = sns.color_palette(None, len(files))
might be a better solution. The difference is that my original answer iterates through the default colors as often as it has to, whereas this solution creates a palette with as many hues as there are files. That means that no color is repeated, but the difference between colors might be very subtle.
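If you need guaranteed-distinct colours for any number of files, one option (a swap, not the exact call from the comment) is an evenly spaced hue palette such as "husl". As far as I can tell from seaborn's implementation, color_palette(None, n) takes its colours from the current colour cycle and wraps around once n exceeds the cycle length, whereas "husl" spaces n hues around the colour wheel. n_files below is a hypothetical count:

```python
import seaborn as sns

n_files = 14  # hypothetical number of datasets

# n evenly spaced hues: all distinct, but neighbours get close together when n is large.
distinct = sns.color_palette("husl", n_files)
print(len(distinct), len(set(distinct)))
```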
To build on Carsten's answer, if you have a large number of categories to assign colours to, you might wish to zip the categories with a very large seaborn palette, for example the xkcd_palette or crayon_palette. Note that this practice is usually a chartjunk anti-pattern: using more than 5-6 colours is usually overkill, and you might want to consider changing your chart type.
import matplotlib.pyplot as plt
import seaborn as sns
palette = dict(zip(df['category'].unique(), sns.crayons.values()))
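A sketch of how such a category-to-colour mapping might then be used. The df with x, y and category columns is hypothetical, and dict(...) is used so the mapping can be looked up repeatedly (a bare zip is a one-shot iterator in Python 3):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data with a categorical column.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 1, 5, 3, 6],
    "category": ["a", "b", "c", "a", "b", "c"],
})

# Map each category to a crayon colour (sns.crayons maps crayon names to hex strings).
palette = dict(zip(df['category'].unique(), sns.crayons.values()))

fig, ax = plt.subplots()
# One scatter call per category, coloured via the lookup table.
for cat, group in df.groupby('category'):
    ax.scatter(group['x'], group['y'], color=palette[cat], label=cat)
ax.legend()
plt.show()
```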