I want to automate an imshow gradient figure with Python 3. I would like to pass in a data frame and have it plotted no matter how many columns it has.
I tried this:
vmin = 3.5
vmax = 6
fig, axes = plt.subplots(len(list(df.columns)),1)
for i,j in zip(list(df.columns),range(1,len(list(df.columns))+1)):
    df = df.sort_values([i], ascending = False)
    y = df[i].tolist()
    gradient = [y,y]
    plt.imshow(gradient, aspect='auto', cmap=plt.get_cmap('hot_r'), vmin=vmin, vmax=vmax)
    axes = plt.subplot(len(list(df.columns)),1,j)
sm = plt.cm.ScalarMappable(cmap=plt.get_cmap('hot_r'),norm=plt.Normalize(vmin,vmax))
sm._A = []
My problem is that the first set of data (the first column of the df) is never shown. Also, the colormap is not where I want it to be. This is exactly what I get:
But this is what I want:
You shouldn't use plt.subplot if you already have created your subplots via plt.subplots.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
f = lambda x, s: x*np.exp(-x**2/s)/2
df = pd.DataFrame({"A" : f(np.linspace(0,50,600),70)+3.5,
                   "B" : f(np.linspace(0,50,600),110)+3.5,
                   "C" : f(np.linspace(0,50,600),150)+3.5,})
vmin = 3.5
vmax = 6
fig, axes = plt.subplots(len(list(df.columns)),1)
for col, ax in zip(df.columns,axes.flat):
    df = df.sort_values([col], ascending = False)
    y = df[col].values
    gradient = [y,y]
    im = ax.imshow(gradient, aspect='auto',
                   cmap=plt.get_cmap('hot_r'), vmin=vmin, vmax=vmax)
# Since all images have the same vmin/vmax, we can take any of them for the colorbar
fig.colorbar(im, ax=axes)
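To cover the "no matter how many columns" part of the question, the loop can be wrapped in a function. The following is only a minimal sketch and not part of the original answer: the name `plot_gradients` is hypothetical, and it assumes that `squeeze=False` is acceptable so that `axes` is always a 2D array. With a single column, `plt.subplots` would otherwise return one Axes object and `axes.flat` would fail.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_gradients(df, vmin=3.5, vmax=6, cmap="hot_r"):
    """Draw one sorted colour strip per column of df (hypothetical helper)."""
    n = len(df.columns)
    # squeeze=False: axes is always a 2D array, even for a single column
    fig, axes = plt.subplots(n, 1, squeeze=False)
    for col, ax in zip(df.columns, axes.flat):
        # Sort each column on its own, largest value first
        y = np.sort(df[col].to_numpy())[::-1]
        im = ax.imshow([y, y], aspect="auto", cmap=cmap, vmin=vmin, vmax=vmax)
        ax.set_yticks([])
        ax.set_ylabel(col, rotation=0, ha="right", va="center")
    # All strips share vmin/vmax, so any image can drive the shared colorbar
    fig.colorbar(im, ax=axes.ravel().tolist())
    return fig, axes


x = np.linspace(0, 50, 600)
demo = pd.DataFrame({"A": x * np.exp(-x**2 / 70) / 2 + 3.5})
fig, axes = plot_gradients(demo)  # one column works as well as many
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the result. The sketch assumes the frame has at least one column.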
I modified the "plots for df2 columns" code block, because I think that is where the change has to go, but I could not produce the desired output.
A sample of the plot I want is this:
This is how I modified it:
f, axes = plt.subplots(nrows=len(signals.columns)+1, sharex=True, )
i = 0
for col in df2.columns:
    fig, axs = plt.subplots()
    sns.regplot(x='', y='', data=df2, ax=axs[0])
    df2[col].plot(ax=axes[i], color='grey')
I have seen that it's wrong.
I tried this out, and it seems like some headway :)
How do I modify this to get what I want?
f, axes = plt.subplots(nrows=len(signals.columns)+1, sharex=True, )
# plots for df2 columns
i = 0
for col in df2.columns:
    df2[col].plot(ax=axes[i], color='grey')
    axes[i].set_ylim(0, 1)
You have several options to make this graph. df1 and df2 are as defined in your previous question.
The version with matplotlib.pyplot.scatter is faster to draw, but less faithful to the example. The version with seaborn.rugplot looks identical to the example, but takes longer to draw. I highlighted the important part of the code between comment lines ########
using matplotlib.pyplot.scatter
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
f, axes = plt.subplots(nrows=len(df2.columns)+1, sharex=True,
                       gridspec_kw={'height_ratios':np.append(np.repeat(1, len(df2.columns)), 3)})
####### variable part below #######
# plots for df2 columns
i = 0
for col in df2.columns:
    axes[i].scatter(x=df2.index, y=np.repeat(0, len(df2)), c=df2[col], marker='|', cmap='Greys')
    axes[i].set_ylim(-0.5, 0.5)
    i += 1  # advance to the next row of axes
## code to plot annotations
axes[-1].set_xlabel('Genomic position')
axes[-1].set_ylim(-0.5, 1.5)
axes[-1].set_yticks([0, 1])
axes[-1].set_yticklabels(['−', '+'])
for _, r in df1.iterrows():
marker = '|'
if r['type'] == 'exon':
y = 1 if r['strand'] == '+' else 0
axes[-1].plot((r['start'], r['stop']), (y, y),
marker=marker, lw=lw,
# remove space between plots
axes[-1].set_xlim(0, len(df2))
f.set_size_inches(6, 2)
using seaborn.rugplot
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
f, axes = plt.subplots(nrows=len(df2.columns)+1, sharex=True,
                       gridspec_kw={'height_ratios':np.append(np.repeat(1, len(df2.columns)), 3)})
####### variable part below #######
import matplotlib
import matplotlib.cm as cm
# Map values in [0, 1] to shades of grey
norm = matplotlib.colors.Normalize(vmin=0, vmax=1, clip=True)
mapper = cm.ScalarMappable(norm=norm, cmap=cm.Greys)
# plots for df2 columns
i = 0
for col in df2.columns:
    sns.rugplot(x=df2.index, color=list(map(mapper.to_rgba, df2[col])), height=1, ax=axes[i])
    i += 1  # advance to the next row of axes
## code to plot annotations
axes[-1].set_xlabel('Genomic position')
axes[-1].set_ylim(-0.5, 1.5)
axes[-1].set_yticks([0, 1])
axes[-1].set_yticklabels(['−', '+'])
for _, r in df1.iterrows():
marker = '|'
if r['type'] == 'exon':
y = 1 if r['strand'] == '+' else 0
axes[-1].plot((r['start'], r['stop']), (y, y),
marker=marker, lw=lw,
# remove space between plots
axes[-1].set_xlim(0, len(df2))
f.set_size_inches(6, 2)
I created a dataframe with random columns and values. Now I am trying to iterate with a loop over a "time window" (maybe there is a more elegant solution than mine). I want to plot the calculated correlations in a heatmap, then iterate further and show the next result in the same figure. Like this:
The current code plots a new figure for each correlation...
Thanks for ideas and help!
Create the dataframe
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import time
import seaborn as sns
num_days = 250  # not defined in the original snippet; 250 matches the 250 rows generated below
index = pd.date_range('01/01/2010',periods=num_days, freq='D')
data_KW = pd.DataFrame(np.random.randint(0,250,size=(250, 10)), columns=list('ABCDEFGHIJ'), index=index)
Iterate and plot (wrong :))
# Calculate the length of the Dataframe
end = 10 #len(data_KW.index)
# is the variable for the rolling window
var_start = 0
var_end = 5
#Set up the matplotlib figure
f, ax = plt.subplots(figsize=(5, 5))
while var_end <= end:
    window = data_KW.iloc[var_start : var_end]
    # Compute the correlation matrix
    corr = window.corr()
    # Generate a mask for the upper triangle
    # (np.bool was removed from recent NumPy releases; the builtin bool works everywhere)
    mask = np.zeros_like(corr, dtype=bool)
    mask[np.triu_indices_from(mask)] = True
    # Generate a custom diverging colormap
    cmap = sns.diverging_palette(220, 10, as_cmap=True)
    # Draw the heatmap with the mask and correct aspect ratio
    sns.heatmap(corr, mask=mask, cmap=cmap, vmax=1, center=0,
                square=True, linewidths=.5, cbar_kws={"shrink": .5})
    var_start = var_start + 1
    var_end = var_end + 1
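One possible way to keep everything in a single figure is to create the image and its colorbar once and then only swap the data for each window. The following is a minimal sketch under stated assumptions, not a definitive solution: it swaps `sns.heatmap` for plain matplotlib `imshow` (so the look differs from the seaborn plot), the helper name `lower_triangle_corr` is hypothetical, and the `"coolwarm"` colormap is a stand-in for the diverging palette.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
index = pd.date_range("2010-01-01", periods=250, freq="D")
data_KW = pd.DataFrame(rng.integers(0, 250, size=(250, 10)),
                       columns=list("ABCDEFGHIJ"), index=index)


def lower_triangle_corr(window):
    """Correlation matrix of the window with the upper triangle masked out."""
    corr = window.corr().to_numpy()
    mask = np.triu(np.ones_like(corr, dtype=bool))
    return np.ma.array(corr, mask=mask)


window_size = 5
last_start = 5  # same six windows as the loop above (rows 0-4 up to rows 5-9)

fig, ax = plt.subplots(figsize=(5, 5))
# Create the image and the colorbar once ...
im = ax.imshow(lower_triangle_corr(data_KW.iloc[:window_size]),
               cmap="coolwarm", vmin=-1, vmax=1)
fig.colorbar(im, ax=ax, shrink=0.5)
ax.set_xticks(range(len(data_KW.columns)))
ax.set_xticklabels(data_KW.columns)
ax.set_yticks(range(len(data_KW.columns)))
ax.set_yticklabels(data_KW.columns)

# ... then only replace the data for every further window
for start in range(1, last_start + 1):
    window = data_KW.iloc[start:start + window_size]
    im.set_data(lower_triangle_corr(window))
    ax.set_title(f"rows {start} to {start + window_size - 1}")
    fig.canvas.draw_idle()  # in an interactive session, plt.pause(0.5) shows each step
```

Because `vmin` and `vmax` are fixed at -1 and 1, the colorbar stays valid for every window and does not need to be redrawn.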
I would like to add a cross (X) on heatmap cells (depending on the significance level, but the question is about adding the X),
as in R (sig.level = XXX).
See the Python and R code used and the corresponding output images.
Thank you for your help.
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, center=0, vmin=-1, vmax=1, square=True, linewidths=0.5, fmt=".2f",
cbar_kws={"shrink": .65, "orientation": "horizontal", "ticks":np.arange(-1, 1+1, 0.2)},
annot = True, annot_kws={"weight": 'bold', "size":15})
corrplot(cor(subset (wqw, select =
# compute the p matrix
p.mat = cor.mtest(subset
(wqw, select = c(fixed.acidity:quality,ratio.sulfur.dioxide))),
# significance level 0.01
sig.level = 0.01,
# Method to display : color (could be circle, ...)
method = "color",
# color palette
col = colorRampPalette(c("#BB4444", "#EE9988",
"#FFFFFF", "#77AADD", "#4477AA"))(200),
The easy solution is to add a scatter plot with an X-shaped marker to cross out the unwanted cells.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
data = np.random.rand(10,10)
mask = np.zeros_like(data)
mask[np.triu_indices_from(mask)] = True
data_masked = np.ma.array(data, mask=mask)
fig, ax = plt.subplots()
im = ax.imshow(data_masked, cmap="YlGnBu", origin="upper")
ax.scatter(*np.argwhere(data_masked.T < 0.4).T, marker="x", color="black", s=100)
The drawback of this is that the marker size (s) is independent of the number of cells and needs to be adjusted for different figure sizes.
An alternative is hence to draw lines (an X is just two crossed lines) at the respective positions. Here we create a function crossout(points, ax=None, scale=1, **kwargs), where scale is the fraction of each cell that the lines span.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
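def crossout(points, ax=None, scale=1, **kwargs):
    ax = ax or plt.gca()
    # The two diagonals of a cell, scaled and centred on the origin
    l = np.array([[[1,1],[-1,-1]]])*scale/2.
    r = np.array([[[-1,1],[1,-1]]])*scale/2.
    # One centre point per cell to cross out, shaped (N, 1, 2) for broadcasting
    p = np.atleast_3d(points).transpose(0,2,1)
    c = LineCollection(np.concatenate((l+p,r+p), axis=0), **kwargs)
    # Attach the lines to the axes; without this nothing is drawn
    ax.add_collection(c)
    return c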
data = np.random.rand(10,10)
mask = np.zeros_like(data)
mask[np.triu_indices_from(mask)] = True
data_masked = np.ma.array(data, mask=mask)
fig, ax = plt.subplots()
im = ax.imshow(data_masked, cmap="YlGnBu", origin="upper")
crossout(np.argwhere(data_masked.T < 0.4), ax=ax, scale=0.8, color="black")
For scale=0.8 this looks like
Note that for a pcolormesh plot or a seaborn heatmap (which uses pcolormesh internally), one would need to add 0.5 to the coordinates, because the cell centers sit at half-integer positions there, i.e.
np.argwhere(data_masked.T < 0.4)+0.5
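The half-cell shift can be seen by drawing the same data with both functions. This is a minimal sketch for illustration only: the helper `cross_positions` and its `cell_offset` parameter are hypothetical names, and it uses the simple scatter approach rather than the `crossout` function.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = rng.random((4, 6))  # 4 rows, 6 columns


def cross_positions(values, threshold, cell_offset=0.0):
    """(x, y) coordinates of all cells below threshold (hypothetical helper).

    cell_offset=0.0 suits imshow, cell_offset=0.5 suits pcolormesh.
    """
    rows, cols = np.nonzero(values < threshold)
    return cols + cell_offset, rows + cell_offset


fig, (ax_img, ax_mesh) = plt.subplots(1, 2, figsize=(8, 3))

# imshow: the centre of cell (row, col) is at x=col, y=row
ax_img.imshow(data, cmap="YlGnBu", origin="upper")
ax_img.scatter(*cross_positions(data, 0.4), marker="x", color="black")

# pcolormesh: cell (row, col) spans [col, col+1] in x and [row, row+1] in y
ax_mesh.pcolormesh(data, cmap="YlGnBu")
ax_mesh.invert_yaxis()  # put row 0 at the top, as imshow and seaborn do
ax_mesh.scatter(*cross_positions(data, 0.4, cell_offset=0.5),
                marker="x", color="black")
```

With `imshow` the image extent runs from -0.5 to N-0.5, which is why integer coordinates already hit the cell centers there.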
We are building our reports on matplotlib. Each page has multiple charts and some text.
In the report data there are over 100 locations, and each location has a density. The idea is to plot the points on a map where the color (a shade of red) represents the density of the location.
However, I do not understand the connection between the kwargs c and cmap in the ax.scatter call, nor do I understand the role of colors.Normalize in this application.
import pandas as pd
import matplotlib
import numpy as np
from pandas import Series, DataFrame
import csv
from scipy import stats
import matplotlib.pyplot as plt
import random
import matplotlib.colors as colors
# Get the data and transform
data = pd.read_csv('logHistThis.csv')
data.drop('Unnamed: 0', axis=1, inplace=True)
dataMean = data['Density'].mean()
data = list(data['Density'])
# I was under the impression that the data for the colormap
# had to be between 1 and 0 so did this:
aColorScale = []
def myColorScale(theData):
    aColorScale = []
    for x in theData:
        this = x/100
        aColorScale.append(this)
    return aColorScale
aColorScale = myColorScale(data)
estimated_mu, estimated_sigma = stats.norm.fit(data)
xmin = min(data)
xmax = max(data)
x = np.linspace(xmin, xmax, 100)
pdf = stats.norm.pdf(x, loc=estimated_mu, scale=estimated_sigma)
thisRangeMin = np.log(27)
thisRangeMax = np.log(35)
q = [np.random.choice(data, 40)]
z = [ np.random.randint(1, 50, size=40)]
s = 100 *q
colormap = 'Reds'
normalize =matplotlib.colors.Normalize(vmin=xmin, vmax=xmax)
#plt.scatter(x,y,z,s=5, cmap=colormap, norm=normalize, marker='*')
fig = plt.figure(figsize=(10, 5), frameon=False, edgecolor='000000', linewidth = 1)
rect0 = .05, .05, .4, .9
rect1 = .5, .05, .4, .9
# This works great
ax1 = fig.add_axes(rect0)#<-----------x2TopTenSummary
ax1.hist(data, bins=13, density=True, color='c', alpha=0.05)  # density replaces the removed normed argument
#ax1.fill_between(x, pdf, where=(), alpha=.2)
ax1.fill_between(x, pdf, where=((x < thisRangeMax) & ( x > thisRangeMin)), alpha=.2, label='City Range')
ax1.vlines(dataMean, 0, stats.norm.pdf(dataMean, loc=estimated_mu, scale=estimated_sigma), color='r')
ax1.plot(x, pdf, 'k')
# This does not work :
# It just gives blue dots
ax2= fig.add_axes(rect1)
ax2.scatter(q,z, s=200, cmap= 'Reds',norm=matplotlib.colors.Normalize(vmin=min(aColorScale) , vmax=max(aColorScale)))
# Tried to set the color map in a variety of ways:
# When kwarg 'c' is set to the variable 'aColorScale' i get the error
So my question is: how do we incorporate the colormap in an application of this sort? The constraints are:
Multiple axes on a figure with a predetermined size (A4 or letter).
The color is determined by a third variable z (not x or y).
The color determinant is a float where 0 < z < 8.
The call is on ax, not plt.
The description of the application in the docs is unclear to me:
the doc for axes.scatter
the doc for colors.Normalize
I have seen plenty of examples where there is only one ax in the figure and the call is to plt.scatter, for example here.
In our case x, y will be longitude and latitude, and the variable is 'data', a list or array of floats between 0 and 8.
Okay, the answer came from PyCon Israel 2017, in this document by Tamir Lousky.
The normalization of the data and the mapping to the colormap happen in this block of code:
aColorScale = data
aColorScale = np.array(aColorScale)
norm = (aColorScale - aColorScale.min())/(aColorScale.max() - aColorScale.min())
cmap= plt.get_cmap('Reds')
colors = [cmap(tl) for tl in norm]#<---- thisRightHere
Then colors gets fed into ax2:
ax2= fig.add_axes(rect1)
ax2.scatter(q,z, s=200, color = colors)
I wish those who downvoted my question would say why, there was hours of searching and trying to find this.
Anyway here is the final image:
While I do have problems understanding the issue itself, I can tell you that the solution you have in your answer can be simplified to the usual way to plot scatters:
ax2= fig.add_axes(rect1)
ax2.scatter(q,z, c=aColorScale, s=200, cmap='Reds')
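The relation between `c`, `cmap` and `norm` can be checked directly: `scatter` first passes the values in `c` through `norm` (to get numbers in [0, 1]) and then through `cmap` (to get RGBA colors). If `c` is not given, `cmap` and `norm` have nothing to act on, which would explain the default blue dots in the question. The sketch below uses made-up coordinates and densities (all hypothetical data) and compares the manual route with the usual one.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

rng = np.random.default_rng(1)
lon = rng.uniform(-10, 10, 40)     # made-up longitudes
lat = rng.uniform(40, 50, 40)      # made-up latitudes
density = rng.uniform(0, 8, 40)    # made-up third variable, 0 < z < 8

norm = Normalize(vmin=0, vmax=8)   # maps density linearly onto [0, 1]
cmap = plt.get_cmap("Reds")        # maps [0, 1] onto RGBA colours

fig, (ax_manual, ax_auto) = plt.subplots(1, 2, figsize=(10, 5))

# Manual route: compute the RGBA colours yourself and pass them as `color`
manual_colors = cmap(norm(density))
ax_manual.scatter(lon, lat, s=200, color=manual_colors)

# Usual route: pass the raw values as `c`; scatter applies norm, then cmap
points = ax_auto.scatter(lon, lat, s=200, c=density, cmap=cmap, norm=norm)
fig.colorbar(points, ax=ax_auto, label="Density")
```

A practical advantage of the usual route is that the returned collection knows its colormap and normalization, so `fig.colorbar` can build a matching colorbar from it.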
I would like to show a pseudocolor image (such as produced by pcolor, pcolormesh or imshow) overlaid with contour lines. It appears that those three plot functions can be one data point off. Here's an example:
import numpy
from matplotlib import pyplot,cm
f = pyplot.figure(figsize=(3,2))
ax = f.add_subplot(111)
data = numpy.ones((10,10))
data[5,5] = 2.0
data[0,:] = data[-1,:] = 0
data[:,0] = data[:,-1] = 0
This produces (with the TkAgg backend GUI):
Substituting the imshow() method with
In both cases, the contour lines don't match the underlying image.
import numpy
from matplotlib import pyplot,cm
f = pyplot.figure()
ax = f.add_subplot(111)
data = numpy.ones((10,10))
data[5,5] = 2.0
data[0,:] = data[-1,:] = 0
data[:,0] = data[:,-1] = 0
ax.imshow(data, interpolation='nearest')