I have graphed (using matplotlib) a time series and its associated upper and lower confidence interval bounds (which I calculated in Stata). I used Pandas to read the stata.csv output file and so the series are of type pandas.core.series.Series.
Matplotlib allows me to graph these three series on the same plot, but I wish to shade between the upper and lower confidence bounds to generate a visual confidence interval. Unfortunately I get an error, and the shading doesn't work. I think this is to do with the fact that the functions between which I wish to fill are pandas.core.series.Series.
Another post on here suggests that passing my_series.values instead of my_series will fix this problem; however, I cannot get this to work. I'd really appreciate an example.
As long as you don't have NaN values in your data, you should be okay:
In [77]: from pylab import *; from pandas import Series
In [78]: x = Series(linspace(0, 2 * pi, 10000))
In [79]: y = sin(x)
In [80]: fill_between(x.values, y.min(), y.values, alpha=0.5)
Which yields:
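If the series do contain NaNs, one option (my own addition, not part of the answer above) is to mask them out before filling:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = pd.Series(np.linspace(0, 2 * np.pi, 100))
y = np.sin(x)
y.iloc[10:20] = np.nan            # simulate missing data

mask = y.notna()                  # keep only the rows where y is defined
plt.fill_between(x[mask].values, y.min(), y[mask].values, alpha=0.5)
```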
I am trying to upsample my dataframe in pandas (from 50 Hz to 2500 Hz). I have to upsample to match a sensor that was sampled at this higher frequency. I have points in x, y, z coming from a milling machine.
When I am plotting the original data the lines look straight, as I would expect.
I am interpolating the dataframe like this:
df.drop_duplicates(subset='time', inplace=True)
df.set_index('time', inplace=True)
df.index = pd.DatetimeIndex(df.index)
upsampled = df.resample('0.4ms').interpolate(method='linear')
plt.scatter(upsampled['X[mm]'], upsampled['Y[mm]'], s=0.5)
plt.show()
I also tried with
upsampled = df.resample('0.4L').interpolate(method='linear')
I expect the new points to always come between the original points. Since I am going from 50 Hz to 2500 Hz, I expect 50 points uniformly spaced between each pair of points in the original data. However, it seems that some of the original points are ignored, as can be seen in the picture below (the second picture is zoomed in on a particularly troublesome spot).
This figure shows the original points in orange and the upsampled, interpolated points in blue (both are scatter plots, although the upsampled points are so dense that they appear as a continuous line). The code for this is shown below.
upsampled = df.resample('0.4ms').interpolate(method='linear')
plt.scatter(upsampled['X[mm]'], upsampled['Y[mm]'], s=0.5, c='blue')
plt.scatter(df['X[mm]'], df['Y[mm]'], s=0.5, c='orange')
plt.gca().set_aspect('equal', adjustable='box')
plt.show()
Any ideas how I could make the interpolation work?
Most likely the problem is that the timestamps in the original and resampled DataFrames are not aligned, so when resampling we need to specify how to deal with that.
Since the original is at 50 Hz and the resampled is at 2500 Hz, simply taking the mean first should fix it:
upsampled = df.resample('0.4ms').mean().interpolate(method='linear')
Unfortunately, without having any sample data, I cannot verify that it works. Please let me know if it helps.
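Without the real data, here is a minimal sketch of the idea with made-up 50 Hz timestamps (the column name is borrowed from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical 50 Hz data: one sample every 20 ms
t = pd.date_range("2021-01-01", periods=5, freq="20ms")
df = pd.DataFrame({"X[mm]": np.arange(5.0)}, index=t)

# Bin onto the 2500 Hz grid first (mean collapses each bin to one row,
# leaving NaN wherever no original sample falls), then fill the gaps linearly
upsampled = df.resample("0.4ms").mean().interpolate(method="linear")
```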
I'm trying to use imshow to plot a 2-D Fourier transform of my data. However, imshow plots the data against its index in the array. I would like to plot the data against a set of arrays I have containing the corresponding frequency values (one array for each dimension), but can't figure out how.
I have a 2D array of data (a Gaussian pulse signal) that I Fourier transform with np.fft.fft2. This all works fine. I then get the corresponding frequency bins for each dimension with np.fft.fftfreq(len(data))*sampling_rate. I can't figure out how to use imshow to plot the data against these frequencies, though. The 1D equivalent of what I'm trying to do is using plt.plot(x, y) rather than just plt.plot(y).
My first attempt was to use imshow's "extent" argument, but as far as I can tell that just changes the axis limits, not the actual bins.
My next solution was to use np.fft.fftshift to arrange the data in numerical order and then simply re-scale the axis using this answer: Change the axis scale of imshow. However, the index to frequency bin is not a pure scaling factor, there's typically a constant offset as well.
My next attempt was to use hist2d instead of imshow, but that doesn't work, since hist2d plots the number of times an ordered pair occurs, while I want to plot a scalar value corresponding to specific ordered pairs (i.e. the power of the signal at specific frequency combinations).
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
f = 200
st = 2500
x = np.linspace(-1,1,2*st)
y = signal.gausspulse(x, fc=f, bw=0.05)
data = np.outer(np.ones(len(y)),y) # A simple example with constant y
Fdata = np.abs(np.fft.fft2(data))**2
freqx = np.fft.fftfreq(len(x))*st # What I want to plot my data against
freqy = np.fft.fftfreq(len(y))*st
plt.imshow(Fdata)
I should see a peak at (200, 0) corresponding to the frequency of my signal (with some fall-off around it corresponding to the bandwidth), but instead my maximum occurs at some random position corresponding to the frequency's index in my data array. If anyone has any ideas, fixes, or other functions to use, I would greatly appreciate it!
I cannot run your code, but I think you are looking for the extent= argument to imshow(). See the page on origin and extent for more information.
Something like this may work?
plt.imshow(Fdata, extent=(freqx[0],freqx[-1],freqy[0],freqy[-1]))
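Since the raw FFT output stores the negative frequencies in the second half of the array, the extent only lines up after an fftshift. A sketch with a stand-in array (not the asker's data):

```python
import numpy as np
import matplotlib.pyplot as plt

st = 2500                        # sampling rate, as in the question
n = 64
data = np.random.rand(n, n)      # stand-in for the 2-D signal
Fdata = np.abs(np.fft.fft2(data)) ** 2

# Put zero frequency in the middle so the axes increase monotonically
Fshift = np.fft.fftshift(Fdata)
freq = np.fft.fftshift(np.fft.fftfreq(n) * st)

plt.imshow(Fshift, origin="lower",
           extent=(freq[0], freq[-1], freq[0], freq[-1]))
```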
I am using Python to plot data (coming from many experiments) and I would like to use the boxplot method of the pandas library.
Executing df = pd.DataFrame(value, columns=['Col1']), the result is:
The problem comes from the extreme values. In Matlab the solution is to use the 'DataLimit' option:
boxplot(bp1,'DataLim',[4.2,4.3])
From Matlab documentation:
Data Limits and Maximum Distances
'DataLim' — Extreme data limits
[-Inf,Inf] (default) | two-element numeric vector
Extreme data limits, specified as the comma-separated pair consisting of 'DataLim' and a two-element numeric vector containing the lower and upper limits, respectively. The values specified for 'DataLim' are used by 'ExtremeMode' to determine which data points are extreme.
Is there something similar for Python?
Workaround:
However, I have a workaround (which I really don't like, because it changes the statistical distribution of the measurements): I just exclude the "problematic values" manually:
df = pd.DataFrame(value[100:],columns=['Col1'])
df.boxplot(column=['Col1'])
and the result is:
This is because I know where the problem is.
You can use ylim to constrain the axis without omitting the outliers from the calculation:
import numpy as np
import matplotlib.pyplot as plt

data = np.concatenate((np.random.rand(50) * 100, # spread
np.ones(25) * 50, # center
np.random.rand(10) * 100 + 100, # flier high
np.random.rand(10) * -100, # flier low
np.random.rand(2) * 10_000)) # unwanted outlier
fig1, ax1 = plt.subplots()
ax1.boxplot(data)
plt.ylim([-100, 200])
plt.show()
When using Matplotlib's scatterplot, sometimes autoscaling works, sometimes not.
How do I fix it?
As in the example provided in the bug report, this code works:
plt.figure()
x = np.array([0,1,2,3])
y = np.array([2,4,5,9])
plt.scatter(x,y)
But when using smaller values, the scaling fails to work:
plt.figure()
x = np.array([0,1,2,3])
y = np.array([2,4,5,9])
plt.scatter(x/10000,y/10000)
Edit: An example can be found here. I have not specified the specific cause in the question, because when encountering the error it is not obvious what causes it. Also, I have specified the solution and cause in my own answer.
In at least Matplotlib 1.5.1, there is a bug where autoscale fails for small data values, as reported here.
The workaround is to use .set_ylim(bottom, top) (documentation) to manually set the data limits (in this example for the y axis; to set the x axis, use .set_xlim(left, right)).
To automatically find data limits that are pleasing to the eye, the following helper function can be used:

def set_axlims(series, marginfactor):
    """
    Fix for a scaling issue with matplotlib's scatterplot and small values.
    Takes in a pandas series and a marginfactor (float).
    A marginfactor of 0.2 would, for example, set a 20% border distance on both sides.
    Output: [bottom, top]
    To be used with .set_ylim(bottom, top)
    """
    minv = series.min()
    maxv = series.max()
    datarange = maxv - minv
    border = abs(datarange * marginfactor)
    maxlim = maxv + border
    minlim = minv - border
    return minlim, maxlim
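A quick usage sketch (the helper is repeated in condensed form so the snippet runs on its own; the data values are made up, echoing the small values from the question):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def set_axlims(series, marginfactor):
    # same helper as above, condensed
    minv, maxv = series.min(), series.max()
    border = abs((maxv - minv) * marginfactor)
    return minv - border, maxv + border

y = pd.Series(np.array([2, 4, 5, 9]) / 10000)  # small values that trip autoscale
fig, ax = plt.subplots()
ax.scatter(np.arange(len(y)), y)
ax.set_ylim(*set_axlims(y, 0.2))  # 20% margin on both sides
```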
I am currently plotting 3 kernel density estimations together on the same graph. I assumed that kdeplots use relative frequency as the y value; however, for some of my data the kdeplot shows values way above 1.
The code I'm using:
sns.distplot(data1, kde_kws={"color": "b", "lw": 1.5, "shade": "False", "kernel": "gau", "label": "t"}, hist=False)
Does anyone know how I can make sure that the kdeplot either makes y value relative frequency, or allow me to adjust the ymax axis limit automatically to the maximum frequency calculated?
Okay, so I figured out that I just needed to set the autoscaling to tight; that way it didn't give negative values on the scale.
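The answer above doesn't show code; a guess at what tight autoscaling looks like in matplotlib (the plotted data here is a stand-in, not the asker's KDE):

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
data = np.random.normal(size=200)
ax.hist(data, bins=30, density=True)             # stand-in for the seaborn KDE curve
ax.autoscale(enable=True, axis="y", tight=True)  # clip the y axis to the data, no negative padding
```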