Matplotlib - Plot content vanishes using plt.yscale('log') [duplicate] - python

I am currently using a log scale to have more flexibility in plotting my data. However, my data also contains zero values. I know that these cannot be shown on a log scale because log(0) is undefined.
So e.g.,
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot([0,1,2],[10,10,100],marker='o',linestyle='-')
ax.set_yscale('log')
ax.set_xscale('log')
completely omits the zero value. Is this behavior acceptable? At least there should be some kind of warning. I only noticed it by accident. Is there a way to plot zero-valued data on a log scale?
Thanks!
P.S.: I hope this fits on Stack Overflow. I could not find a matplotlib mailing list.

It's easiest to use a "symlog" plot for this purpose. The interval near 0 will be on a linear scale, so 0 can be displayed.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([0,1,2],[10,10,100],marker='o',linestyle='-')
ax.set_yscale('symlog')
ax.set_xscale('symlog')
plt.show()
Symlog sets a small interval near zero (both above and below) to use a linear scale. This allows things to cross 0 without causing log(x) to explode (or go to -inf, rather).
There's a nice visual comparison as an SO answer here: https://stackoverflow.com/a/3513150/325565
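The width of the linear region can be tuned. A minimal sketch, assuming Matplotlib 3.3 or newer, where the keyword is `linthresh` (older releases used `linthreshx`/`linthreshy`); the threshold of 0.5 is only an illustrative choice:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1, 2], [10, 10, 100], marker='o', linestyle='-')

# Values with |x| < linthresh are drawn on a linear scale, the rest on a log scale.
ax.set_xscale('symlog', linthresh=0.5)
ax.set_yscale('log')  # y contains no zeros, so a plain log scale is fine here

fig.canvas.draw()  # render once; call plt.show() in an interactive session
```

A smaller `linthresh` keeps more of the axis logarithmic but squeezes the points near zero closer together.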

Related

Seaborn: Violinplot experiences difficulty with too many variables?

I wanted to use seaborn to visualize my entire Pandas dataframe with violinplots, and I thought I had made the necessary corrections to generate a large graph for the sizable number of 270 variables my dataframe possessed.
However, no matter what I do, the violinplots only display their inner mini-boxplots (as another question here describes) for each variable, and not their kde's:
fig, ax = plt.subplots(figsize=(50,5))
ax.set_ylim(-6, 6)
a = sns.violinplot(x='variable', y='value', data=pd.melt(train_norm), ax=ax)
a.set_xticklabels(a.get_xticklabels(), rotation=90);
plt.savefig('massive_violinplot.png', dpi=220)  # figsize is set in plt.subplots, savefig does not take it
(apologies for the cropped graph, the whole thing is too big to post)
Whereas the following code, using the same pd.Dataframe, but only showing the first six variables, displays correctly:
fig, ax = plt.subplots(figsize=(10,5))
ax.set_ylim(-6, 6)
a = sns.violinplot(x='variable', y='value', data=pd.melt(train_norm.iloc[:,:6]), ax=ax)
a.set_xticklabels(a.get_xticklabels(), rotation=90);
plt.savefig('massive_violinplot.png', dpi=220)  # figsize is set in plt.subplots, savefig does not take it
How could I get a graph like the above for all the variables, filled with proper violinplots showing their kde's?
This is not related to the number of variables or the plot size but to the huge differences in the distributions of the variables. I can't access your data right now, so I will illustrate it with a made-up dataset. You can follow along with your own dataset by selecting the three variables with the most dispersion and the three with the least. As a dispersion measure you can use the variance or even the data range (if you don't have crazy long tails) or something different; I am not sure what would work better.
rs = np.random.RandomState(42)
data = rs.randn(100, 6)
data[:, :3] *= 20
df = pd.DataFrame(data)
See what happens if we plot the density with common axes so they are directly comparable.
df.plot(kind='kde', subplots=True, layout=(3, 2), sharex=True, sharey=True)
plt.tight_layout()
This is more or less the same you can see in the seaborn violin plot but of course transposed.
sns.violinplot(x='variable', y='value', data=pd.melt(df))
This is usually great for comparing the variables because you can read the differences in width as differences in density. Unfortunately, the violins for the variables with more dispersion are so narrow that you can't see the width at all and you lose any sense of the shape. On the other hand, the variables with less dispersion appear too short (in your dataset some of them are actually just horizontal lines).
For the first problem you can make the violins use all the available horizontal space by using scale='width' but then you no longer can compare the density across variables. The width is the same at the peaks but the density is not.
sns.violinplot(x='variable', y='value', data=pd.melt(df), scale='width')
By the way, this is what matplotlib's violin plot does by default.
plt.violinplot(df.T)
For the second problem I think your only option is to normalize or standardize the variables in some way.
sns.violinplot(x='variable', y='value', data=pd.melt((df - df.mean()) / df.std()))
Now you have a clearer view of each variable separately (how many modes they have, how skewed they are, how long the tails are...) but you can compare neither the scale nor the dispersion across variables.
The moral of the story is that you can't see everything at once, you have to pick and choose depending on what you are looking for in the data.
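The selection step suggested at the start of this answer (picking the three most and the three least dispersed variables) might look like the sketch below. It reuses the made-up dataset from above; `spread`, `widest`, `narrowest` and `subset` are illustrative names, and the variance is just one possible dispersion measure:

```python
import numpy as np
import pandas as pd

# Same made-up data as above: the first three columns are 20x wider.
rs = np.random.RandomState(42)
data = rs.randn(100, 6)
data[:, :3] *= 20
df = pd.DataFrame(data)

# Rank the columns by variance, smallest first.
spread = df.var().sort_values()
narrowest = list(spread.index[:3])
widest = list(spread.index[-3:])

# Long-format subset that can be passed to
# sns.violinplot(x='variable', y='value', data=subset).
subset = pd.melt(df[narrowest + widest])
```

With a real dataframe such as `train_norm`, the same ranking would replace the hand-picked `iloc[:, :6]` slice.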

matplotlib: manually change yaxis values to differ from the actual value (NOT: change ticks!) [duplicate]

I am trying to plot data and a function with matplotlib 2.0 under Python 2.7.
The x values of the function evolve with time: x first decreases to a certain value, then increases again.
Plotted against time, the function looks like this (plot of data against time).
I need the same x-axis progression when plotting against the actual x values. Unfortunately, because the x values are the same before and after the turning point, the two branches are drawn on top of each other. This gives me the wrong data plot:
In this example it means I need the x-axis to start at 2.4, decrease to 1.0, and then increase to 2.4 again. I am sure I once read that this is possible, but unfortunately I can't find any trace of it again.
A matplotlib axis is by default linearly increasing. More importantly, there must be an injective mapping of the number line to the axis units. So changing the data range is not really an option (at least when the aim is to keep things simple).
It would hence be good to keep the original numbers and only change the ticks and ticklabels on the axis. E.g. you could use a FuncFormatter to map the original numbers to
np.abs(x-tp)+tp
where tp would be the turning point.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
x = np.linspace(-10,20,151)
y = np.exp(-(x-5)**2/19.)
plt.plot(x,y)
tp = 5
fmt = lambda x,pos:"{:g}".format(np.abs(x-tp)+tp)
plt.gca().xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(fmt))
plt.show()
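Applied to the numbers in the question (x runs from 2.4 down to 1.0 and back up to 2.4), the same idea could be sketched as follows. The data here is invented for illustration; `u_down` and `u_up` are hypothetical "unfolded" coordinates that keep increasing, and only the tick labels show the real x values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop for interactive use
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

tp = 1.0  # turning point of x

# Invented example data: x decreases from 2.4 to 1.0, then increases again.
x_down = np.linspace(2.4, 1.0, 50)
x_up = np.linspace(1.0, 2.4, 50)
y_down = -1.0 * x_down
y_up = -0.5 * x_up - 0.5

# Mirror the decreasing branch around tp so the plotted coordinate increases.
u_down = 2 * tp - x_down  # runs from -0.4 to 1.0
u_up = x_up               # runs from 1.0 to 2.4

def fmt(u, pos):
    # Map the unfolded coordinate back to the real x value for the tick label.
    return "{:g}".format(abs(u - tp) + tp)

fig, ax = plt.subplots()
ax.plot(u_down, y_down, c='b')
ax.plot(u_up, y_up, c='g')
ax.xaxis.set_major_formatter(FuncFormatter(fmt))
fig.canvas.draw()  # render once; call plt.show() in an interactive session
```

The axis itself stays monotonic, which is what matplotlib requires; the labels read 2.4 ... 1 ... 2.4 from left to right.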
One option would be to use two axes and plot each of your two time spans on its own axes.
For instance, if you have the following data:
import numpy as np
import matplotlib.pyplot as plt
myX = np.linspace(1,2.4,100)
myY1 = -1*myX
myY2 = -0.5*myX-0.5
plt.plot(myX,myY1, c='b')
plt.plot(myX,myY2, c='g')
you can instead create two subplots with a shared y-axis and no space between the two axes, plot each time span independently, and finally adjust the limits of one of the x-axes to reverse the order of the points:
fig, (ax1,ax2) = plt.subplots(1,2, gridspec_kw={'wspace':0}, sharey=True)
ax1.plot(myX,myY1, c='b')
ax2.plot(myX,myY2, c='g')
ax1.set_xlim((2.4,1))
ax2.set_xlim((1,2.4))

Scale colormap for contour and contourf

I'm trying to plot the contour map of a given function f(x,y), but since the function's output grows really fast, I'm losing a lot of information for lower values of x and y. I found a suggestion on the forums to handle this with vmax=vmax. It worked, but only for specific x and y limits and a specific number of colormap levels.
Say I have this plot:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
u = np.linspace(-2,2,1000)
x,y = np.meshgrid(u,u)
z = (1-x)**2+100*(y-x**2)**2
cont = plt.contour(x,y,z,500,colors='black',linewidths=.3)
cont = plt.contourf(x,y,z,500,cmap="jet",vmax=100)
plt.colorbar(cont)
plt.show()
I want to uncover what's beyond the axis limits while keeping the same scale, but if I change the x and y limits to -3 and 3 I get:
See how I lost most of my levels, since the maximum of the function within these limits is much higher. A workaround is to increase the number of levels to 1000, but that takes a lot of computation time.
Is there a way to plot only the contour levels that I need? That is, between 0 and 100.
An example of a desired output would be:
With the white space being the continuation of the plot without resizing the levels.
The code I'm using is the one given after the first image.
There are a few possible ideas here. The one I very much prefer is a logarithmic representation of the data. An example would be
from matplotlib import ticker
fig = plt.figure(1)
cont1 = plt.contourf(x,y,z,cmap="jet",locator=ticker.LogLocator(numticks=10))
plt.colorbar(cont1)
plt.show()
fig = plt.figure(2)
cont2 = plt.contourf(x,y,np.log10(z),100,cmap="jet")
plt.colorbar(cont2)
plt.show()
The first example uses matplotlib's LogLocator. The second one directly computes the logarithm of the data and plots that normally.
The third example just caps all data above 100.
fig = plt.figure(3)
zcapped = z.copy()
zcapped[zcapped>100]=100
cont3 = plt.contourf(x,y,zcapped,100,cmap="jet")
cbar = plt.colorbar(cont3)
plt.show()
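Another way to restrict the plot to the range of interest, not taken from the answer above, is to pass explicit levels to contourf. A minimal sketch, assuming that leaving everything above 100 unfilled (white) is acceptable; the grid size and the number of levels are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop for interactive use
import numpy as np
import matplotlib.pyplot as plt

u = np.linspace(-3, 3, 300)
x, y = np.meshgrid(u, u)
z = (1 - x)**2 + 100 * (y - x**2)**2

# 51 boundaries give 50 filled bands between 0 and 100. Values above the last
# level are not filled because extend defaults to 'neither'.
levels = np.linspace(0, 100, 51)

fig, ax = plt.subplots()
cont = ax.contourf(x, y, z, levels=levels, cmap="jet")
fig.colorbar(cont)
fig.canvas.draw()  # render once; call plt.show() in an interactive session
```

Passing `extend='max'` would instead fill the region above 100 with the colormap's "over" color.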

Matplotlib - Boxplot calculated on log10 values but shown in logarithmic scale

I think this is a simple question, but I just still can't seem to think of a simple solution. I have a set of data of molecular abundances, with values ranging many orders of magnitude. I want to represent these abundances with boxplots (box-and-whiskers plots), and I want the boxes to be calculated on log scale because of the wide range of values.
I know I can just calculate the log10 of the data and send it to matplotlib's boxplot, but this does not retain the logarithmic scale in plots later.
So my question is basically this:
When I have calculated a boxplot based on the log10 of my values, how do I convert the plot afterward to be shown on a logarithmic scale instead of linear with the log10 values?
I can change tick labels to partly fix this, but I have no clue how I get logarithmic scales back to the plot.
Or is there another, more direct way of plotting this? Maybe a different package that already has this option included?
Many thanks for the help.
I'd advise against doing the boxplot on the raw values and setting the y-axis to logarithmic, because the boxplot function is not designed to work across orders of magnitude and you may get too many outliers (depending on your data, of course).
Instead, you can plot the logarithm of the data and manually adjust the y-labels.
Here is a very crude example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter
np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)
fig = plt.figure(figsize=(9, 3))
ax = plt.subplot(1, 3, 1)
ax.boxplot(np.log10(values))
ax.set_yticks(np.arange(-3, 4))
ax.set_yticklabels(10.0**np.arange(-3, 4))
ax.set_title('log')
ax = plt.subplot(1, 3, 2)
ax.boxplot(values)
ax.set_yscale('log')
ax.set_title('raw')
ax = plt.subplot(1, 3, 3)
ax.boxplot(values, whis=[5, 95])
ax.set_yscale('log')
ax.set_title('5%')
plt.show()
The middle panel ('raw') shows the box plot on the raw values. This leads to many outliers, because the maximum whisker length is computed as a multiple (default: 1.5) of the interquartile range (the box height), which does not scale across orders of magnitude.
Alternatively, you could specify to draw the whiskers for a given percentile range:
ax.boxplot(values, whis=[5, 95])
In this case you get a fixed fraction of outliers (5%) above and below, as in the right panel ('5%').
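The effect of the 1.5 times IQR rule can be checked numerically. The sketch below counts the points outside the whisker fences for the raw and for the log-transformed values. `count_outliers` is a hypothetical helper built on plain NumPy percentiles; it is meant to mirror the default whisker rule, not to reproduce matplotlib's code exactly:

```python
import numpy as np

def count_outliers(data, whis=1.5):
    # Points beyond whis * IQR from the quartiles are treated as outliers.
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return int(np.sum((data < lower) | (data > upper)))

np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)

n_raw = count_outliers(values)            # fences computed on the raw values
n_log = count_outliers(np.log10(values))  # fences computed in log space
print(n_raw, n_log)
```

On the raw values the upper fence sits only a few multiples above the third quartile, so everything in the top decades is flagged; in log space the fences lie outside the data range.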
You can use plt.yscale:
plt.boxplot(data); plt.yscale('log')

Symmetrical Log color scale in matplotlib contourf plot

How do I create a contour plot with a symlog (symmetrical log) scale for the contours, i.e. a log scale that shows both negative and positive values?
One possibility would be to work off of this example:
http://matplotlib.org/examples/pylab_examples/contourf_log.html
Which gives this recipe for a log scale:
from matplotlib import pyplot, ticker
cs = pyplot.contourf(X, Y, z, locator=ticker.LogLocator())
However, this doesn't allow for negative values. There is a ticker.SymmetricalLogLocator(), which may be the solution, but it doesn't seem to have much documentation.
EDIT:
To clarify (since requesting negative values on a log scale may sound nonsensical), what I want is the same as the "symlog" scale provided on matplotlib axes. The plot below, (taken from another stack exchange post), shows symlog on the x-axis. It is a "log" scale, but handles negative values in a way that is clear to the viewer.
I want the same sort of scaling, but for the colorscale on contour or contourf.
I stumbled across this thread trying to do the same thing, i.e. plotting a large range of values in both the positive and negative direction. In addition, I wanted a granularity as fine as in imshow.
It turns out you can get that using "ticker.MaxNLocator(nbins)", where nbins can be set high for fine granularity, e.g. nbins = 100.
I also wanted nice LaTeX-style tick formatting, for which I found a solution on Stack Overflow a while ago.
I will just post this code snippet from one of the classes it is part of, so that anyone who wants to can get the basic idea of how it works. I use this solution to generate multiple plots as shown in the image below.
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
# function for nice LaTeX-style tick formatting
# copied from
# http://stackoverflow.com/questions/25983218/
# scientific-notation-colorbar-in-matplotlib
# output formatting for colorbar in 2D plots
def fmt(x, pos):
    a, b = '{:.2e}'.format(x).split('e')
    b = int(b)
    return r'${} \times 10^{{{}}}$'.format(a, b)

# A contourf function I use inside one of my classes
# mainly interesting are the "plot" and "cbar" lines
def Make2DSubPlot(self, posIdent, timeIdx, typeIdx):
    plt.subplot(posIdent)
    y = self.radPos
    x = self.axPos
    z = self.fieldList[timeIdx][typeIdx]
    plot = plt.contourf(x, y, z, locator=ticker.MaxNLocator(100),
                        aspect='auto', origin='lower')
    cbar = plt.colorbar(plot, orientation='vertical',
                        format=ticker.FuncFormatter(fmt))
    cbar.ax.set_ylabel(self.labelList[typeIdx])
    plt.xlabel(self.labelList[self.iax])
    plt.ylabel(self.labelList[self.iax])
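For the symlog color scale the question asks about, one possible approach (not part of the answer above) is to combine explicit, symmetrically spaced levels with matplotlib.colors.SymLogNorm. A hedged sketch with an invented test function; the `linthresh` value, the level boundaries and the `base` keyword (available in newer Matplotlib releases) are assumptions to adapt:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop for interactive use
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import SymLogNorm

# Invented data with both signs, roughly in the range -43..43.
u = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(u, u)
Z = 100 * X * np.exp(-X**2 - Y**2)

# Log-spaced boundaries on both sides of zero.
pos = np.logspace(-1, 2, 7)  # 0.1 ... 100
levels = np.concatenate([-pos[::-1], [0.0], pos])

# Colors are assigned on a symlog scale: linear within +-linthresh, log outside.
norm = SymLogNorm(linthresh=0.1, vmin=-100, vmax=100, base=10)

fig, ax = plt.subplots()
cs = ax.contourf(X, Y, Z, levels=levels, norm=norm, cmap="RdBu_r")
fig.colorbar(cs)
fig.canvas.draw()  # render once; call plt.show() in an interactive session
```

Because the limits are symmetric, zero maps to the middle of the colormap, which suits a diverging map such as "RdBu_r".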
