Avoiding overlapping plots in seaborn bar plot - python

I have the following code where I am trying to plot a bar plot in seaborn. (This is sample data; both the x and y variables are continuous.)
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
xvar = [1,2,2,3,4,5,6,8]
yvar = [3,6,-4,4,2,0.5,-1,0.5]
year = [2010,2011,2012,2010,2011,2012,2010,2011]
df = pd.DataFrame()
df['xvar'] = xvar
df['yvar']=yvar
df['year']=year
df
sns.set_style('whitegrid')
fig,ax=plt.subplots()
fig.set_size_inches(10,5)
sns.barplot(data=df,x='xvar',y='yvar',hue='year',lw=0,dodge=False)
It results in the following plot:
Two questions here:
I want the two bars at x = 2 to appear side by side rather than overlapping the way they do now.
For the x-labels, the original data has a lot of them. Is there a way to set the xticks to a specific frequency? For instance, in the chart above I only want to see 1, 3 and 6 as x-labels.
Note: If I set dodge=True, the bars become very thin with the original data.

For the first question, get the patches of the bar chart and halve the width of the two overlapping ones, then shift the x-position of one of them by that half width so the two sit side by side.
For the second question, build the list of tick labels either by slicing or by hand, using empty strings for the ticks you want to hide.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
xvar = [1,2,2,3,4,5,6,8]
yvar = [3,6,-4,4,2,0.5,-1,0.5]
year = [2010,2011,2012,2010,2011,2012,2010,2011]
df = pd.DataFrame({'xvar':xvar,'yvar':yvar,'year':year})
fig,ax = plt.subplots(figsize=(10,5))
sns.set_style('whitegrid')
g = sns.barplot(data=df, x='xvar', y='yvar', hue='year', lw=0, dodge=False)
for idx,patch in enumerate(ax.patches):
    current_width = patch.get_width()
    current_pos = patch.get_x()
    if idx == 8 or idx == 15:
        patch.set_width(current_width/2)
        if idx == 15:
            patch.set_x(current_pos+(current_width/2))
ax.set_xticklabels([1,'',3,'','',6,''])
plt.show()
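The indices 8 and 15 in the loop are not explained in the answer. One plausible reading is that they are the two bars at x = 2, for 2011 and 2012. This assumes seaborn draws one patch per (hue level, x category) pair and works through the hue levels in order, as older seaborn releases appear to do; newer ones may differ. The helpers below use hypothetical names and are not seaborn API. They sketch that arithmetic and build the blanked-out tick labels without typing them by hand.

```python
def patch_index(hue_pos, cat_pos, n_categories):
    """Position in ax.patches of the bar for a given hue level and x category.

    Assumes patches are laid out hue level by hue level, one patch per
    category within each level (an assumption about seaborn's draw order).
    """
    return hue_pos * n_categories + cat_pos


def sparse_labels(categories, keep):
    """One label per category; blank unless the category is listed in `keep`."""
    keep = set(keep)
    return [str(c) if c in keep else "" for c in categories]


xvar = [1, 2, 2, 3, 4, 5, 6, 8]
year = [2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011]

categories = sorted(set(xvar))   # [1, 2, 3, 4, 5, 6, 8]
hue_levels = sorted(set(year))   # [2010, 2011, 2012]

# The two bars that share x = 2 belong to 2011 and 2012.
overlapping = [
    patch_index(hue_levels.index(y), categories.index(2), len(categories))
    for y in (2011, 2012)
]
print(overlapping)   # [8, 15]

labels = sparse_labels(categories, keep=[1, 3, 6])
print(labels)        # ['1', '', '3', '', '', '6', '']

# With the Axes from the answer: ax.set_xticklabels(labels)
```

If the computed indices do not match the bars you see, print `len(ax.patches)` first to check how many patches your seaborn version actually creates.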

Related

For loop charts - changing xtick frequency dynamically for each chart

I'm trying to increase the number of xticks for each chart in the dataframe.
for c in df:
    fig = plt.figure(figsize=[10,5]);
    ax = df[c].plot(kind='hist', color=(0.2,0.4,0.6,0.6), bins=30);
I've tried:
ax.xticks(np.arange(min(c),max(x)+1,1));
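This results in an AttributeError.
Is there a way to increase the number of xticks dynamically, without specifying the ticks explicitly, so that it works for all the charts?
An Axes object has no xticks method, which is where the AttributeError comes from; use plt.xticks (or ax.set_xticks) instead. Also, min(c) cannot work because c is only the column name (and I guess max(x) was meant to be max(c) too); pass the column data df[c].
It works this way: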
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(figsize=[10,5])
for c in df:
    ax = df[c].plot(kind='hist', color=(0.2,0.4,0.6,0.6), bins=30)
    plt.xticks(np.arange(min(df[c]),max(df[c]), step = 1))
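Two details of the np.arange call are worth knowing: np.arange leaves out its stop value, so the largest value may not get a tick, and for non-integer data the ticks start at the raw minimum. A minimal sketch of a helper (hypothetical name `tick_positions`) that rounds the range outward and includes both ends:

```python
import math


def tick_positions(values, step=1):
    """Evenly spaced ticks covering all of `values`, both ends included."""
    lo = math.floor(min(values))
    hi = math.ceil(max(values))
    n_steps = math.ceil((hi - lo) / step)
    return [lo + k * step for k in range(n_steps + 1)]


print(tick_positions([0.5, 3.2, 7.9]))          # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(tick_positions([0.5, 3.2, 7.9], step=2))  # [0, 2, 4, 6, 8]

# Inside the loop this would replace the np.arange call:
# plt.xticks(tick_positions(df[c]))
```

If the spacing should not depend on the data at all, matplotlib's `matplotlib.ticker.MultipleLocator` can be set on `ax.xaxis` with `set_major_locator` instead.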

How can I plot slice of certain DataFrame for each row with different color?

I would like to plot certain slices of my pandas DataFrame, with each row (identified by its row index) in a different color.
My data look like the following:
I already tried with the help of this tutorial to find a way but I couldn't - probably due to a lack of skills.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(r"D:\SOF10.csv", header=None)
df.head()
# Slice the data of interest
C = df.iloc[:, 2::3]
# Plot temperature against row index, in color
C.apply(lambda x: plt.scatter(x.index, x, c='g'))
plt.show()
Following is my expected plot:
I was also wondering whether I could display the mean of each row of the sliced data (each row contains 480 values) somewhere in the plot or in the legend beside it. Is it feasible, as in the following picture, to calculate the mean and display it in the legend, or in a small font next to its own data in the graph?
Data sample: data
This gives the plot without a legend:
C = df.iloc[:,2::3].stack().reset_index()
C.columns = ['level_0', 'level_1', 'Temperature']
fig, ax = plt.subplots(1,1)
C.plot('level_0', 'Temperature',
       ax=ax, kind='scatter',
       c='level_0', colormap='tab20',
       colorbar=False, legend=True)
ax.set_xlabel('Cycles')
plt.show()
Edit to reflect modified question:
stack() transforms your (sliced) DataFrame into a Series indexed by (row, col).
reset_index() turns that two-level index into the columns level_0 (row) and level_1 (col).
set_xlabel() sets the label of the x-axis to what you want.
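To make the reshaping concrete, here is a plain-Python analogue of what stack() followed by reset_index() produces. It is only an illustration with made-up numbers, not pandas code: every selected cell of the wide table becomes one (row, column, value) record.

```python
# A tiny "wide" table: 2 rows, 6 columns (made-up values).
wide = [
    [0, 1, 20.5, 3, 4, 21.0],
    [0, 1, 19.8, 3, 4, 20.2],
]

# Same column selection as df.iloc[:, 2::3]: every third column from index 2.
selected_cols = list(range(len(wide[0])))[2::3]   # [2, 5]

# One record per cell, like stack().reset_index() with the renamed columns.
long_records = [
    {"level_0": r, "level_1": c, "Temperature": wide[r][c]}
    for r in range(len(wide))
    for c in selected_cols
]

for rec in long_records:
    print(rec)
```

Because the row number ends up in its own column (level_0), it can serve both as the x-coordinate and as the value that drives the color.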
Edit 2: The following produces a scatter plot with a legend:
CC = df.iloc[:,2::3]
fig, ax = plt.subplots(1,1, figsize=(16,9))
labels = CC.mean(axis=1)
for i in CC.index:
    ax.scatter([i]*len(CC.columns[1:]), CC.iloc[i,1:], label=labels[i])
ax.legend()
ax.set_xlabel('Cycles')
ax.set_ylabel('Temperature')
plt.show()
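The legend produced this way shows each row mean as a raw float. A small sketch, assuming you would rather see a short readable label; `mean_label` is a hypothetical helper and the numbers are made up:

```python
from statistics import mean


def mean_label(row_id, values, digits=2):
    """Legend text such as 'row 0: mean 20.75'."""
    return f"row {row_id}: mean {mean(values):.{digits}f}"


rows = {0: [20.5, 21.0], 1: [19.75, 20.25]}
legend_labels = [mean_label(i, vals) for i, vals in rows.items()]
print(legend_labels)   # ['row 0: mean 20.75', 'row 1: mean 20.00']

# In the plotting loop, the label could then be built per row, for example:
# ax.scatter(..., label=mean_label(i, CC.iloc[i, 1:].tolist()))
```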
This may be an approximate answer. The c= and cmap= arguments of scatter() can be used to get the desired coloring.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import itertools
df = pd.DataFrame({'a':[34,22,1,34]})
fig, subplot_axes = plt.subplots(1, 1, figsize=(20, 10)) # width, height
colors = ['red','green','blue','purple']
cmap=matplotlib.colors.ListedColormap(colors)
for col in df.columns:
    subplot_axes.scatter(df.index, df[col].values, c=df.index, cmap=cmap, alpha=.9)

Matplotlib stacked histogram using `scatter_matrix` on pandas dataframe

Currently I have the following code
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
df= pd.read_csv(file, sep=',')
colors = list('r' if i==1 else 'b' for i in df['class']) # class is either 1 or 0
plt.figure()
scatter_matrix(df, color=colors )
plt.show()
It shows the following output
But on the diagonal of this plot, instead of a simple histogram, I want to show a stacked histogram like the following, with class '1' in red and class '0' in blue.
How can I do this?
Seaborn is probably the more convenient tool for a scatter-matrix kind of plot. However, I do not know how to easily plot a stacked histogram into the diagonal of a PairGrid in seaborn.
Since the question asks for matplotlib anyway, the following is a solution using pandas and matplotlib. Unfortunately it requires doing a lot of things by hand. The following is an example (note that seaborn is only imported to get some data, since the question did not provide any).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# seaborn import just needed to get some data
import seaborn as sns
df = sns.load_dataset("iris")
n_hist = 10
category = "species"
columns = ["sepal_length","sepal_width","petal_length","petal_width"]
mi = df[columns].values.min()
ma = df[columns].values.max()
hist_bins = np.linspace(mi, ma, n_hist)
fig, axes = plt.subplots(nrows=len(columns), ncols=len(columns),
                         sharex="col")
for i,row in enumerate(columns):
    for j,col in enumerate(columns):
        ax= axes[i,j]
        if i == j:
            # diagonal
            mi = df[col].values.min()
            ma = df[col].values.max()
            hist_bins = np.linspace(mi, ma, n_hist)
            def hist(x):
                h, e = np.histogram(x.dropna()[col], bins=hist_bins)
                return pd.Series(h, e[:-1])
            b = df[[col,category]].groupby(category).apply(hist).T
            # cumulative counts give the bottom of each stacked layer
            values = np.cumsum(b.values, axis=1)
            for k in range(len(b.columns)):
                if k == 0:
                    ax.bar(b.index, values[:,k], width=np.diff(hist_bins)[0])
                else:
                    # height is this category's own count, not the cumulative one
                    ax.bar(b.index, b.values[:,k], width=np.diff(hist_bins)[0],
                           bottom=values[:,k-1])
        else:
            # offdiagonal
            for (n,cat) in df.groupby(category):
                ax.scatter(cat[col],cat[row], s = 5,label=n, )
        ax.set_xlabel(col)
        ax.set_ylabel(row)
        #ax.legend()
plt.tight_layout()
plt.show()
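The core of a stacked histogram is independent of the plotting library: each category's bars start where the running total of the previous categories ends. A minimal sketch with made-up counts (the function name `stack_bottoms` is hypothetical):

```python
def stack_bottoms(counts_by_category):
    """For each category, the bar bottoms that stack it on the previous ones.

    `counts_by_category` maps a category name to its per-bin counts;
    all lists must have the same length.
    """
    n_bins = len(next(iter(counts_by_category.values())))
    running = [0] * n_bins
    bottoms = {}
    for name, counts in counts_by_category.items():
        bottoms[name] = list(running)
        running = [r + c for r, c in zip(running, counts)]
    return bottoms


counts = {"setosa": [3, 5, 0], "versicolor": [1, 2, 4], "virginica": [0, 1, 6]}
bottoms = stack_bottoms(counts)
print(bottoms["versicolor"])  # [3, 5, 0]
print(bottoms["virginica"])   # [4, 7, 4]

# Each category is then drawn with its own counts as heights, for example:
# ax.bar(bin_left_edges, counts[name], bottom=bottoms[name],
#        width=bin_width, align="edge")
```

Note that the bar heights are the per-category counts; only the bottoms are cumulative.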
Sample code
import seaborn as sns
sns.set(style="ticks")
df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")

Python plotting by different dataframe columns (using Seaborn?)

I'm trying to create a scatterplot of a dataset with point coloring based on different categorical columns. Seaborn works well here for one plot:
fg = sns.FacetGrid(data=plot_data, hue='col_1')
fg.map(plt.scatter, 'x_data', 'y_data', **kws).add_legend()
plt.show()
I then want to display the same data, but with hue='col_2' and hue='col_3'. It works fine if I just make 3 plots, but I'm really hoping to find a way to have them all appear as subplots in one figure. Unfortunately, I haven't found any way to change the hue from one plot to the next. I know there are plotting APIs that allow for an axis keyword, thereby letting you pop it into a matplotlib figure, but I haven't found one that simultaneously allows you to set 'ax=' and 'hue='. Any ideas?
Thanks in advance!
Edit:
Here's some sample code to illustrate the idea
xx = np.random.rand(10,2)
cat1 = np.array(['cat','dog','dog','dog','cat','hamster','cat','cat','hamster','dog'])
cat2 = np.array(['blond','brown','brown','black','black','blond','blond','blond','brown','blond'])
d = {'x':xx[:,0], 'y':xx[:,1], 'pet':cat1, 'hair':cat2}
df = pd.DataFrame(data=d)
sns.set(style='ticks')
# `height` replaced the older `size` argument in seaborn 0.9
fg = sns.FacetGrid(data=df, hue='pet', height=5)
fg.map(plt.scatter, 'x', 'y').add_legend()
fg = sns.FacetGrid(data=df, hue='hair', height=5)
fg.map(plt.scatter, 'x', 'y').add_legend()
plt.show()
This plots what I want, but in two windows. The color scheme is set in the first plot by grouping by 'pet', and in the second plot by 'hair'. Is there any way to do this on one plot?
In order to plot 3 scatterplots with different colors for each, you may create 3 axes in matplotlib and plot a scatter to each axes.
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(10,5),
                  columns=["x", "y", "col1", "col2", "col3"])
fig, axes = plt.subplots(nrows=3)
for ax, col in zip(axes, df.columns[2:]):
    ax.scatter(df.x, df.y, c=df[col])
plt.show()
For categorical data it is often easier to plot several scatter plots, one per category.
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns
xx = np.random.rand(10,2)
cat1 = np.array(['cat','dog','dog','dog','cat','hamster','cat','cat','hamster','dog'])
cat2 = np.array(['blond','brown','brown','black','black','blond','blond','blond','brown','blond'])
d = {'x':xx[:,0], 'y':xx[:,1], 'pet':cat1, 'hair':cat2}
df = pd.DataFrame(data=d)
cols = ['pet',"hair"]
fig, axes = plt.subplots(nrows=len(cols))
for ax,col in zip(axes,cols):
    for n, group in df.groupby(col):
        ax.scatter(group.x,group.y, label=n)
    ax.legend()
plt.show()
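An alternative to one scatter call per category is to translate the categories into colors first and make a single call. A minimal sketch, assuming matplotlib's 'C0', 'C1', ... color-cycle names; `color_map` is a hypothetical helper:

```python
def color_map(categories):
    """Assign one color-cycle name ('C0', 'C1', ...) per distinct category."""
    distinct = sorted(set(categories))
    return {cat: f"C{i}" for i, cat in enumerate(distinct)}


pets = ['cat', 'dog', 'dog', 'dog', 'cat', 'hamster', 'cat', 'cat', 'hamster', 'dog']
colors_for = color_map(pets)
point_colors = [colors_for[p] for p in pets]
print(colors_for)        # {'cat': 'C0', 'dog': 'C1', 'hamster': 'C2'}
print(point_colors[:3])  # ['C0', 'C1', 'C1']

# A single call then colors every point by its category:
# ax.scatter(df.x, df.y, c=point_colors)
```

A single call like this produces no per-category legend entries by itself, which is one reason the loop over groups is often the easier route.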
You can certainly use a FacetGrid if you really want to, but that requires reshaping the DataFrame into long format first.
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns
xx = np.random.rand(10,2)
cat1 = np.array(['cat','dog','dog','dog','cat','hamster','cat','cat','hamster','dog'])
cat2 = np.array(['blond','brown','brown','black','black','blond','blond','blond','brown','blond'])
d = {'x':xx[:,0], 'y':xx[:,1], 'pet':cat1, 'hair':cat2}
df = pd.DataFrame(data=d)
df2 = pd.melt(df, id_vars=['x','y'], value_name='category', var_name="kind")
# `height` replaced the older `size` argument in seaborn 0.9
fg = sns.FacetGrid(data=df2, row="kind", hue='category', height=3)
fg.map(plt.scatter, 'x', 'y').add_legend()
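The reshaping that pd.melt performs here can be pictured with plain Python: each original row is repeated once per category column, with the column name stored under `kind` and its value under `category`. This is an illustration of the resulting layout with made-up rows and a hypothetical helper name, not a replacement for pd.melt:

```python
def melt_records(records, id_vars, var_name, value_name):
    """Long-format copy of `records`: one output row per non-id column."""
    value_vars = [k for k in records[0] if k not in id_vars]
    long = []
    for var in value_vars:            # column by column, like pd.melt
        for rec in records:
            row = {k: rec[k] for k in id_vars}
            row[var_name] = var
            row[value_name] = rec[var]
            long.append(row)
    return long


wide = [
    {"x": 0.1, "y": 0.9, "pet": "cat", "hair": "blond"},
    {"x": 0.4, "y": 0.2, "pet": "dog", "hair": "brown"},
]
for row in melt_records(wide, ["x", "y"], var_name="kind", value_name="category"):
    print(row)
```

With the data in this shape, `row="kind"` gives one facet per original column, and `hue="category"` colors the points within each facet.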

Set Seaborn PairGrid x-axis with 2 different value ranges

[The resolution is described below.]
I'm trying to create a PairGrid. The x-axes cover at least 2 different value ranges, and even when 'cvar' below is plotted by itself, its x-axis tick labels overlap.
My question: is there a way to rotate the x-axis labels to be vertical, or to show fewer x-axis labels, so they don't overlap? Is there another way to solve this issue?
====================
import seaborn as sns
import matplotlib.pylab as plt
import pandas as pd
import numpy as np
columns = ['avar', 'bvar', 'cvar']
index = np.arange(10)
df = pd.DataFrame(columns=columns, index = index)
myarray = np.random.random((10, 3))
for val, item in enumerate(myarray):
    # .ix was removed from pandas; .loc does the same label-based assignment here
    df.loc[val] = item
df['cvar'] = [400,450,43567,23000,19030,35607,38900,30202,24332,22322]
fig1 = sns.PairGrid(df, y_vars=['avar'],
                    x_vars=['bvar', 'cvar'],
                    palette="GnBu_d")
fig1.map(plt.scatter, s=40, edgecolor="white")
# The fix: Add the following to rotate the x axis.
plt.xticks( rotation= -45 )
=====================
The code above produces this image
Thanks!
I finally figured it out. I added "plt.xticks( rotation= -45 )" to the original code above. More can be found on the Matplotlib site.
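One caveat: plt.xticks acts on the current Axes only, so in a grid with several columns it may rotate just one panel's labels. A hedged sketch that applies the rotation to every Axes it is given, using the `labelrotation` option of `Axes.tick_params`. The helper name is hypothetical, and passing `fig1.axes.flat` assumes the PairGrid exposes its Axes as an array, as seaborn grids normally do:

```python
def rotate_all_xticks(axes, angle):
    """Rotate the x tick labels of every Axes in `axes` by `angle` degrees."""
    for ax in axes:
        ax.tick_params(axis="x", labelrotation=angle)


# Usage with the grid from the question (not run here):
# rotate_all_xticks(fig1.axes.flat, -45)
```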
