Errorbars with logscale in matplotlib - python

Let us assume I have data x with an error Sx which I want to plot with the method errorbar. Now I am wondering what happens if I rescale to a logarithmic scale: does it handle the error correctly? The error propagation should go
f(x) = log(x)
=> Sf = |Sx / x|
I could imagine that matplotlib just does
Sf = log Sx
which would be totally wrong. So, what is matplotlib actually doing?

Indeed, it puzzles me as well. Imagine that I have a file containing a set of data:
xi, yi(xi), sigma(yi) ; i = 1, 2, ..., N
where sigma(yi) is the one-standard-deviation error of yi(xi). Now, suppose I plot this data using matplotlib with both the x-scale and y-scale linear. The error bar marks on the y axis will be at yi(xi) - sigma(yi) and yi(xi) + sigma(yi), so they are separated by 2 * sigma(yi).
The question is, if I set
ax.set_yscale("log")
then will I see marks on the log10(y) axis at log10( yi(xi) - sigma(yi) ) and log10( yi(xi) + sigma(yi) )?
However, the above would not be the correctly propagated error, since the error of log10(yi(xi)) is certainly not simply log10( sigma(yi) ); instead, error propagation has to be done via
sigma( log10(yi) )= log10(e) * | sigma(yi)/yi |
So, does anyone know whether error propagation is done when plotting data with y error bars on a log y scale?

The way errorbar works is (more or less): at each point where you want an error bar drawn, it puts marks at y + err_p and y - err_n in data coordinates. The log scale is applied as part of the transformation from data space -> screen space.
This is rather unambiguously the right thing for a plotting library to do. What you seem to want is to propagate error through some computation, which requires knowing what the computation is (so you can get all the partial derivatives) and is not the business mpl is in. Maybe take a look at sympy.
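To see this concretely, here is a minimal sketch (with made-up data) comparing what matplotlib draws on a log axis against the same errors propagated by hand through log10 and drawn on a linear axis:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 6)
y = np.array([10., 30., 100., 300., 1000.])
sy = 0.2 * y  # made-up 1-sigma errors

fig, (ax1, ax2) = plt.subplots(1, 2)

# Left: errorbar places the bar ends at y - sy and y + sy in data
# coordinates; the log scale then maps them to screen space, so the
# bars come out asymmetric but correctly placed.
ax1.errorbar(x, y, yerr=sy, fmt='o')
ax1.set_yscale('log')

# Right: the same information propagated by hand through f = log10(y),
# using sigma(log10(y)) = log10(e) * sy / y, on a linear axis.
ax2.errorbar(x, np.log10(y), yerr=np.log10(np.e) * sy / y, fmt='o')

plt.show()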

Related

Contour line error with plt.contour in python 3

I am plotting a contour plot in python 3 with matplotlib, and I am getting a strange result. At first, I was using plt.contourf, and noticed there was a strange north-south linear artifact in the data that I knew shouldn't be there (I used simulated data). So I changed plt.contourf to plt.contour, and the problem seems to be that some of the edge contours are deformed for some reason (see picture).
Unfortunately, it is hard for me to paste a simple version of my code because this is part of a large GUI-based app. Here is what I am doing though.
#grid the x,y,z data so it can be used in the contouring
#(this is matplotlib's griddata, not scipy.interpolate.griddata)
self.beta_zi = griddata(self.output_df['x'].values,
                        self.output_df['y'].values,
                        self.output_df['Beta'].values,
                        self.cont_grid_x,
                        self.cont_grid_y,
                        interp='linear')
#call to the contour itself
self.beta_contour = self.beta_cont_ax.contour(self.cont_grid_x,
                                              self.cont_grid_y,
                                              self.beta_zi,
                                              levels=np.linspace(start=0, stop=1, num=11, endpoint=True),
                                              cmap=cm.get_cmap(self.user_beta_cmap.get()))
This seems like a simple problem related to the edges. Has anyone seen this before and can help? I am using the Tk backend, which works better with the tkinter-based GUI I wrote.
UPDATE: I also tried changing to scipy.interpolate.griddata, because matplotlib's griddata is deprecated, but the problem persists, so it must be in the actual contour plotting function.
I found that the problem had to do with how I was interpreting the inputs of contour and griddata.
plt.contour and matplotlib's griddata take
x = x locations of the sample data
y = y locations of the sample data
z = z values (heights) of the sample data
xi = x coordinates of the grid's tick marks
yi = y coordinates of the grid's tick marks
I had assumed xi and yi were the locations of every grid node, which is what I was supplying, but in this case you only need the unique tick marks on each axis.
Thanks to this post I figured it out.
Matplotlib contour from xyz data: griddata invalid index
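For illustration, here is a minimal sketch of that call pattern using scipy.interpolate.griddata and plt.contour (the sample data is made up): the scattered samples go in as x, y, z, and contour only needs the unique axis coordinates xi and yi, not the full grid of node locations.

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# made-up scattered samples: x, y locations and z values
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = rng.uniform(0, 1, 200)
z = np.sin(3*x) * np.cos(3*y)

# the unique axis coordinates define the grid...
xi = np.linspace(0, 1, 50)
yi = np.linspace(0, 1, 50)
# ...and are expanded to full node coordinates only for interpolation
Xi, Yi = np.meshgrid(xi, yi)
zi = griddata((x, y), z, (Xi, Yi), method='linear')

# contour takes the 1-D axis coordinates, not the full node arrays
plt.contour(xi, yi, zi)
plt.show()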

Python fastKDE beyond limits of data points

I'm trying to use the fastKDE package (https://pypi.python.org/pypi/fastkde/1.0.8) to find the KDE of a point in a 2D plot. However, I want to know the KDE beyond the limits of the data points, and cannot figure out how to do this.
Using the code listed on the site linked above:
import numpy as np
from fastkde import fastKDE
import pylab as PP

#Generate two random variables dataset (representing 2e5 pairs of datapoints)
N = int(2e5)
var1 = 50*np.random.normal(size=N) + 0.1
var2 = 0.01*np.random.normal(size=N) - 300

#Do the self-consistent density estimate
myPDF,axes = fastKDE.pdf(var1,var2)

#Extract the axes from the axis list
v1,v2 = axes

#Plot contours of the PDF; these should be a set of concentric ellipsoids
#centered on (0.1, -300). Comparatively, the y axis range should be tiny
#and the x axis range should be large.
PP.contour(v1,v2,myPDF)
PP.show()
I'm able to find the KDE for any point within the limits of the data, but how do I find the KDE for, say, the point (0, 300), without having to include it in var1 and var2? I don't want the KDE to be calculated with this data point; I want to know the KDE at that point.
I guess what I really want to be able to do is give fastKDE a histogram of the data, so that I can set its axes myself. I just don't know if this is possible.
Cheers
I, too, have been experimenting with this code and have run into the same issues. What I've done (in lieu of a good N-D extrapolator) is to build a KDTree (with scipy.spatial) from the grid points that fastKDE returns and find the nearest grid point to the point I want to evaluate. I then look up the corresponding pdf value at that point (it should be small near the edge of the pdf grid, if not identically zero) and assign that value accordingly.
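A rough sketch of that approach, assuming myPDF and the axes v1, v2 from the fastKDE.pdf call in the question:

import numpy as np
from scipy.spatial import cKDTree

# Build a KD-tree over the fastKDE grid nodes. np.meshgrid(v1, v2)
# yields arrays of shape (len(v2), len(v1)), matching myPDF's layout.
V1, V2 = np.meshgrid(v1, v2)
tree = cKDTree(np.column_stack([V1.ravel(), V2.ravel()]))

# Look up the pdf value at the grid node nearest to the query point.
_, idx = tree.query([(0.0, 300.0)])
pdf_at_query = myPDF.ravel()[idx]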
I came across this post while searching for a solution to this problem. Similar to building a KDTree, you could just calculate your step size in every grid dimension and then get the index of your query point: subtract the start of the axis from the point value, divide by the step size of that dimension, round, and convert to an integer. For example, in 1D:
import numpy as np
from fastkde import fastKDE

def fastkde_test(test_x):
    #num_p is assumed to be defined elsewhere: the number of KDE grid points
    kde, axes = fastKDE.pdf(test_x, numPoints=num_p)
    x_step = (max(axes) - min(axes)) / len(axes)
    x_ind = np.int32(np.round((test_x - min(axes)) / x_step))
    return kde[x_ind]
where test_x in this case is both the set used to build the KDE and the query set. Doing it this way was roughly a factor of 10 faster in my case (at least in 1D; higher dimensions not yet tested) and does basically the same thing as the KDTree query.
I hope this helps anyone coming across this problem in the future, as I just did.
Edit: if you're querying points outside of the range over which the KDE was calculated, this method can of course only give you the same result as the KDTree query, namely the value at the corresponding border of your KDE grid. You would, however, have to handle this yourself by clamping the resulting x_ind to the highest valid index, i.e. len(axes) - 1.
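Putting that together, a sketch of the function above with the clamping added (num_p again assumed to be defined elsewhere):

import numpy as np
from fastkde import fastKDE

def fastkde_test_clamped(test_x):
    kde, axes = fastKDE.pdf(test_x, numPoints=num_p)
    x_step = (max(axes) - min(axes)) / len(axes)
    x_ind = np.int32(np.round((test_x - min(axes)) / x_step))
    # out-of-range queries are clamped to the border of the KDE grid
    x_ind = np.clip(x_ind, 0, len(axes) - 1)
    return kde[x_ind]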

How do I limit the interpolation region in the InterpolatedUnivariateSpline in Python when given non-uniform samples?

I'm trying to get a nice upsampler in Python when I have non-uniformly spaced inputs. Any suggestions would be helpful. I've tried a number of interp functions. Here's an example:
from scipy.interpolate import InterpolatedUnivariateSpline
from numpy import linspace, arange, append
from matplotlib.pyplot import plot, show

F=[0, 1000,1500,2000,2500,3000,3500,4000,4500,5000,5500,22050]
M=[0.,2.85,2.49,1.65,1.55,1.81,1.35,1.00,1.13,1.58,1.21,0.]
ff=linspace(F[0],F[1],10)
for i in arange(2, len(F)):
    ff=append(ff,linspace(F[i-1],F[i], 10))
aa=InterpolatedUnivariateSpline(x=F,y=M,k=2)
mm=aa(ff)
plot(F,M,'r-o'); plot(ff,mm,'bo'); show()
This is the plot I get:
I need to get interpolated values that don't go below 0. Note that the blue dots go below zero. The red line represents the original F vs. M data. If I use k=1 (piecewise linear interpolation) then I get good values, as shown here:
aa=InterpolatedUnivariateSpline(x=F,y=M,k=1)
mm=aa(ff); plot(F,M,'r-o');plot(ff,mm,'bo'); show()
The problem is that I need a "smooth" interpolation, not the piecewise-linear one. Does anyone know if the bbox argument in InterpolatedUnivariateSpline helps fix that? I can't find any documentation on what bbox does. Is there another, easier way to accomplish this?
Thanks in advance for any help.
Positivity-preserving interpolation is hard (if it weren't, there wouldn't be a bunch of papers written about it). Splines of low degree (2, 3) usually do pretty well in this regard, but your data has a large gap in it, and that gap happens to be at the end of the data range, which makes things worse.
One solution is to do the interpolation in two steps: first upsample the data by piecewise linear interpolation, then interpolate the new data with a smooth spline (I'll use a cubic spline below, though quadratic also works).
The gap_size array records how large each gap is relative to the smallest one. In the subsequent loop, uniformly spaced points are inserted into the large gaps (those at least twice the size of the smallest one). The result is F_new, a better, nearly uniform grid that still includes the original points. The corresponding M values are generated by a piecewise linear spline.
Subsequent cubic interpolation produces a smooth curve that stays positive.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline

F = [0, 1000,1500,2000,2500,3000,3500,4000,4500,5000,5500,22050]
M = [0.,2.85,2.49,1.65,1.55,1.81,1.35,1.00,1.13,1.58,1.21,0.]

# relative gap sizes; a gap of the minimum size contributes one point
gap_size = np.diff(F) // np.diff(F).min()
F_new = []
for i in range(len(F)-1):
    F_new.extend(np.linspace(F[i], F[i+1], gap_size[i], endpoint=False))
F_new.append(F[-1])

# step 1: piecewise linear upsampling onto the nearly uniform grid
pl_spline = InterpolatedUnivariateSpline(F, M, k=1)
M_new = pl_spline(F_new)

# step 2: smooth cubic interpolation through the upsampled data
smooth_spline = InterpolatedUnivariateSpline(F_new, M_new, k=3)

ff = np.linspace(F[0], F[-1], 100)
plt.plot(F, M, 'ro')
plt.plot(ff, smooth_spline(ff), 'b')
plt.show()
Of course, no tricks can hide the fact that we don't know what happens between 5500 and 22050 (Hz, I presume); the nearly linear part there is just a placeholder.
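Another option worth mentioning, beyond the two-step trick above: a shape-preserving interpolant such as scipy.interpolate.PchipInterpolator cannot overshoot between data points, so it stays nonnegative for nonnegative data. A minimal sketch:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import PchipInterpolator

F = [0, 1000,1500,2000,2500,3000,3500,4000,4500,5000,5500,22050]
M = [0.,2.85,2.49,1.65,1.55,1.81,1.35,1.00,1.13,1.58,1.21,0.]

# PCHIP preserves monotonicity between data points (no overshoot);
# the trade-off is a less smooth second derivative than a cubic spline.
pchip = PchipInterpolator(F, M)
ff = np.linspace(F[0], F[-1], 500)
plt.plot(F, M, 'ro')
plt.plot(ff, pchip(ff), 'b')
plt.show()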

Heatmap with varying y axis

I would like to create a visualization like the upper part of this image. Essentially, a heatmap where each point in time has a fixed number of components, but these components are anchored to the y axis by labels (that I can supply) rather than by their row index in the heatmap's matrix.
I am aware of pcolormesh, but that does not seem to give me the y-axis functionality I seek.
Lastly, I am also open to solutions in R, although a Python option would be much preferable.
I am not completely sure I understand your meaning correctly, but looking at the picture you linked, you might be best off with a roll-your-own solution.
First, you need to create an array with the heatmap values so that you have one row for each label and one column for each time slot. Fill the array with NaNs and then write whatever heatmap values you have into the correct positions.
Then you need to trick imshow a bit to scale and show the image in the correct way.
For example:
import numpy as np
import matplotlib.pyplot as plt

# create some masked data
a = np.cumsum(np.random.random((20,200)), axis=0)
X, Y = np.meshgrid(np.arange(a.shape[1]), np.arange(a.shape[0]))
a[Y < 15*np.sin(X/50.)] = np.nan
a[Y > 10 + 15*np.sin(X/50.)] = np.nan

# draw the image along with some curves
plt.imshow(a, interpolation='nearest', origin='lower', extent=[-2,2,0,3])
xd = np.linspace(-2, 2, 200)
yd = 1 + .1 * np.cumsum(np.random.random(200) - .5)
plt.plot(xd, yd, 'w', linewidth=3)
plt.plot(xd, yd, 'k', linewidth=1)
plt.axis('auto')  # 'normal' in older matplotlib versions
plt.show()
Gives:
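If you also want the rows anchored to labels you supply rather than to matrix indices, yticks can handle that. A small sketch with hypothetical labels:

import numpy as np
import matplotlib.pyplot as plt

labels = ['comp A', 'comp B', 'comp C']      # hypothetical labels
data = np.full((len(labels), 100), np.nan)   # one row per label, one column per time slot
data[0, 10:60] = np.linspace(0., 1., 50)     # write known values; the rest stays NaN
data[1, :80] = np.random.random(80)
data[2, 30:] = np.random.random(70)

plt.imshow(data, interpolation='nearest', origin='lower', aspect='auto')
plt.yticks(range(len(labels)), labels)       # anchor rows to labels, not indices
plt.show()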

Matlab, Python: Fixing colormap to specified values

Fixing a colormap according to a 2D matrix of values is a simple but common task.
To demonstrate, consider the problem in Matlab; the solution does not need to be in Matlab (i.e., the code presented here is only for demonstration purposes).
x = [0,1,2; 3,4,5; 6,7,8];
imagesc(x)
axis square
axis off
The output is as follows:
When some values go above the previous maximum, the following happens:
x = [0,1,2; 3,4,5; 6,7,18];
This looks logical but causes problems when we wish to compare or trace elements across two maps. Since the colormap association has changed, it is almost impossible to match an individual cell for comparison or tracing.
The solution I implemented is to mask the matrix as:
x = [0,1,2; 3,4,5; 6,7,18];
m = 8;
x(x>=m) = m;
which works perfectly.
Since the provided code requires searching/filtering (extra time-consuming!), I wonder if there is a more general or efficient way to do this in Matlab, Python, etc.?
One case where this issue occurs is when we have many sequential simulations and wish to make a meaningful animation of the progress; in this case each color should keep its association fixed.
In Python, using the matplotlib package, the solution is as follows:
import pylab as pl
x = [[0,1,2],[3,4,5],[6,7,18]]
pl.matshow(x, vmin=0, vmax=8)
pl.axis('image')
pl.axis('off')
pl.show()
Here vmin and vmax are the data values mapped to the two ends of the colormap, so the colour association stays fixed.
The indexing is pretty quick, so I don't think you need to worry.
However, in Matlab, you can pass in the clims argument to imagesc:
imagesc(x,[0 8]);
This maps all values above 8 to the top colour in the colour scale, and all values below 0 to the bottom colour in the colour scale, and then stretches the scale for colours in-between.
imagesc documentation.
f1 = figure;
x = [0,1,2; 3,4,5; 6,7,8];
imagesc(x)
axis square
axis off
limits = get(gca,'CLim');  % f1 is still the current figure here
f2 = figure;
z = [0,1,2; 3,4,5; 6,7,18];
imagesc(z)
axis square
axis off
caxis(limits)
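A rough Python analogue of that figure-to-figure approach, reading the colour limits from one image and reusing them for another (a sketch):

import matplotlib.pyplot as plt

x = [[0,1,2],[3,4,5],[6,7,8]]
z = [[0,1,2],[3,4,5],[6,7,18]]

im1 = plt.matshow(x)
limits = im1.get_clim()   # colour limits of the first image

im2 = plt.matshow(z)
im2.set_clim(limits)      # reuse them so the colour association stays fixed
plt.show()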
