Set Seaborn PairGrid x-axis with 2 different value ranges - python

[The resolution is described below.]
I'm trying to create a PairGrid. The X-axis has at least 2 different value ranges, although even when 'cvar' below is plotted by itself the x-axis overwrites itself.
My question: is there a way to tilt the x-axis labels to be vertical or have fewer x-axis labels so they don't overlap? Is there another way to solve this issue?
====================
import seaborn as sns
import matplotlib.pylab as plt
import pandas as pd
import numpy as np
columns = ['avar', 'bvar', 'cvar']
index = np.arange(10)
df = pd.DataFrame(columns=columns, index = index)
myarray = np.random.random((10, 3))
for val, item in enumerate(myarray):
df.ix[val] = item
df['cvar'] = [400,450,43567,23000,19030,35607,38900,30202,24332,22322]
fig1 = sns.PairGrid(df, y_vars=['avar'],
x_vars=['bvar', 'cvar'],
palette="GnBu_d")
fig1.map(plt.scatter, s=40, edgecolor="white")
# The fix: Add the following to rotate the x axis.
plt.xticks( rotation= -45 )
=====================
The code above produces this image
Thanks!

I finally figured it out. I added "plt.xticks( rotation= -45 )" to the original code above. More can be fund on the MatPlotLib site here.

Related

Avoiding overlapping plots in seaborn bar plot

I have the following code where I am trying to plot a bar plot in seaborn. (This is a sample data and both x and y variables are continuous variables).
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
xvar = [1,2,2,3,4,5,6,8]
yvar = [3,6,-4,4,2,0.5,-1,0.5]
year = [2010,2011,2012,2010,2011,2012,2010,2011]
df = pd.DataFrame()
df['xvar'] = xvar
df['yvar']=yvar
df['year']=year
df
sns.set_style('whitegrid')
fig,ax=plt.subplots()
fig.set_size_inches(10,5)
sns.barplot(data=df,x='xvar',y='yvar',hue='year',lw=0,dodge=False)
It results in the following plot:
Two questions here:
I want to be able to plot the two bars on 2 side by side and not overlapped the way they are now.
For the x-labels, in the original data, I have alot of them. Is there a way I can set xticks to a specific frequency? for instance, in the chart above only I only want to see 1,3 and 6 for x-labels.
Note: If I set dodge = True then the lines become very thin with the original data.
For the first question, get the patches in the bar chart and modify the width of the target patch. It also shifts the position of the x-axis to represent the alignment.
The second question can be done by using slices to set up a list or a manually created list in a specific order.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
xvar = [1,2,2,3,4,5,6,8]
yvar = [3,6,-4,4,2,0.5,-1,0.5]
year = [2010,2011,2012,2010,2011,2012,2010,2011]
df = pd.DataFrame({'xvar':xvar,'yvar':yvar,'year':year})
fig,ax = plt.subplots(figsize=(10,5))
sns.set_style('whitegrid')
g = sns.barplot(data=df, x='xvar', y='yvar', hue='year', lw=0, dodge=False)
for idx,patch in enumerate(ax.patches):
current_width = patch.get_width()
current_pos = patch.get_x()
if idx == 8 or idx == 15:
patch.set_width(current_width/2)
if idx == 15:
patch.set_x(current_pos+(current_width/2))
ax.set_xticklabels([1,'',3,'','',6,''])
plt.show()

For loop charts - changing xtick frequency dynamically for each chart

I'm trying to increase the number of xticks for each chart in the dataframe.
for c in df:
fig = plt.figure(figsize=[10,5]);
ax = df[c].plot(kind='hist', color=(0.2,0.4,0.6,0.6), bins=30);
I've tried:
ax.xticks(np.arange(min(c),max(x)+1,1));
Results in an AttributeError.
Thus are there any methods to increase the number of xticks without specifying the ticks explicitly but rather dynamically so it works for all the charts?
the function doesn't understand the c in min (and I guess it is max(c) too.
it works this way:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(figsize=[10,5])
for c in df:
ax = df[c].plot(kind='hist', color=(0.2,0.4,0.6,0.6), bins=30)
plt.xticks(np.arange(min(df[c]),max(df[c]), step = 1))

Reproduce simple pandas plot

I have a situation with my data. I like the behaviour of .plot() over a data frame. But sometimes it doesn't work, because the frequency of the time index is not an integer.
But reproducing the plot in matplotlib is OK. Just ugly.
The part that bother me the most is the settings of the x axis. The tick frequency and the limits. Is there any easy way that I can reproduce this behaviour in matplotlib?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Create Data
f = lambda x: np.sin(0.1*x) + 0.1*np.random.randn(1,x.shape[0])
x = np.arange(0,217,0.001)
y = f(x)
# Create DataFrame
data = pd.DataFrame(y.transpose(), columns=['dp'], index=None)
data['t'] = pd.date_range('2021-01-01 14:32:09', periods=len(data['dp']),freq='ms')
data.set_index('t', inplace=True)
# Pandas plot()
data.plot()
# Matplotlib plot (ugly x-axis)
plt.plot(data.index,data['dp'])
EDIT: Basically, what I want to achieve is a similar spacing in the xtics labels, and the tight margin adjust of the values. Legends and axis title, I can do them
Pandas output
Matplotlib output
Thanks
You can use some matplotlib date utilities:
Figure.autofmt_xdate() to unrotate and center the date labels
Axis.set_major_locator() to change the interval to 1 min
Axis.set_major_formatter() to reformat as %H:%M
fig, ax = plt.subplots()
ax.plot(data.index, data['dp'])
import matplotlib.dates as mdates
fig.autofmt_xdate(rotation=0, ha='center')
ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
# uncomment to remove the first `xtick`
# ax.set_xticks(ax.get_xticks()[1:])

How can I plot slice of certain DataFrame for each row with different color?

I would like to plot certain slices of my Pandas Dataframe for each rows (based on row indexes) with different colors.
My data look like the following:
I already tried with the help of this tutorial to find a way but I couldn't - probably due to a lack of skills.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("D:\SOF10.csv" , header=None)
df.head()
#Slice interested data
C = df.iloc[:, 2::3]
#Plot Temp base on row index colorfully
C.apply(lambda x: plt.scatter(x.index, x, c='g'))
plt.show()
Following is my expected plot:
I was also wondering if I could displace the mean of each row of the sliced data which contains 480 values somewhere in the plot or in the legend beside of plot! Is it feasible (like the following picture) to calculate the mean and displaced somewhere in the legend or by using small font size displace next to its own data in graph ?
Data sample: data
This gives the plot without legend
C = df.iloc[:,2::3].stack().reset_index()
C.columns = ['level_0', 'level_1', 'Temperature']
fig, ax = plt.subplots(1,1)
C.plot('level_0', 'Temperature',
ax=ax, kind='scatter',
c='level_0', colormap='tab20',
colorbar=False, legend=True)
ax.set_xlabel('Cycles')
plt.show()
Edit to reflect modified question:
stack() transform your (sliced) dataframe to a series with index (row, col)
reset_index() reset the double-level index above to level_0 (row), level_1 (col).
set_xlabel sets the label of x-axis to what you want.
Edit 2: The following produces scatter with legend:
CC = df.iloc[:,2::3]
fig, ax = plt.subplots(1,1, figsize=(16,9))
labels = CC.mean(axis=1)
for i in CC.index:
ax.scatter([i]*len(CC.columns[1:]), CC.iloc[i,1:], label=labels[i])
ax.legend()
ax.set_xlabel('Cycles')
ax.set_ylabel('Temperature')
plt.show()
This may be an approximate answer. scatter(c=, cmap= can be used for desired coloring.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import itertools
df = pd.DataFrame({'a':[34,22,1,34]})
fig, subplot_axes = plt.subplots(1, 1, figsize=(20, 10)) # width, height
colors = ['red','green','blue','purple']
cmap=matplotlib.colors.ListedColormap(colors)
for col in df.columns:
subplot_axes.scatter(df.index, df[col].values, c=df.index, cmap=cmap, alpha=.9)

Plotting data with categorical x and y axes in python

I have a list of case and control samples along with the information about what characteristics are present or absent in each of them. A dataframe including the information can be generated by Pandas:
import pandas as pd
df={'Patient':[True,True,False],'Control':[False,True,False]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
I need to visualize this data as a dotplot/scatterplot in the way that both of the x and y axis to be categorical and presence/absence to be coded by different shapes. Something like following:
Patient| x x -
Control| - x -
__________________
GeneA GeneB GeneC
I am new to Matplotlib/seaborn and I can plot simple line plots and scatter plots. But searching online I could not find any instructions or plot similar to what I need here.
A quick way would be:
import pandas as pd
import matplotlib.pyplot as plt
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
heatmap = plt.imshow(df)
plt.xticks(range(len(df.columns.values)), df.columns.values)
plt.yticks(range(len(df.index)), df.index)
cbar = plt.colorbar(mappable=heatmap, ticks=[0, 1], orientation='vertical')
# vertically oriented colorbar
cbar.ax.set_yticklabels(['Absent', 'Present'])
Thanks to #DEEPAK SURANA for adding labels to the colorbar.
I searched the pyplot documentation and could not find a scatter or dot plot exactly like you described. Here is my take on creating a plot that illustrates what you want. The True records are blue and the False records are red.
# creating dataframe and extra column because index is not numeric
import pandas as pd
df={'Patient':[True,True,False],
'Control':[False,True,False]}
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
df['level'] = [i for i in range(0, len(df))]
print(df)
# plotting the data
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,6))
for idx, gene in enumerate(df.columns[:-1]):
df_gene = df[[gene, 'level']]
cList = ['blue' if x == True else 'red' for x in df[gene]]
for inr_idx, lv in enumerate(df['level']):
ax.scatter(x=idx, y=lv, c=cList[inr_idx], s=20)
fig.tight_layout()
plt.yticks([i for i in range(len(df.index))], list(df.index))
plt.xticks([i for i in range(len(df.columns)-1)], list(df.columns[:-1]))
plt.show()
Something like this might work
import pandas as pd
import numpy as np
from matplotlib.ticker import FixedLocator
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
plot = df.T.plot()
loc = FixedLocator([0,1,2])
plot.xaxis.set_major_locator(loc)
plot.xaxis.set_ticklabels(df.columns)
look at https://matplotlib.org/examples/pylab_examples/major_minor_demo1.html
and https://matplotlib.org/api/ticker_api.html
I think you have to convert the boolean values to zeros and ones to make it work. Someting like df.astype(int)

Categories

Resources