I am running scipy.interpolate.griddata on a set of coordinates that can have any number of dimensions (even 1). When the coordinates are 1D, the 'nearest' method produces NaNs instead of the closest values for targets outside the boundaries. An example:
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
target_points = [1.,2.,3.,4.,5.,6.,7.]
points = np.random.rand(50)*2*np.pi
values = np.sin(points)
interp = griddata(points, values, target_points, method='nearest')
plt.plot(points,values,'o')
plt.plot(target_points,interp,'ro')
print(interp)
plt.show()
The last value printed is a NaN. Am I doing something wrong? If this is a limitation of scipy, do you have a smart workaround?
Note that the linear/cubic methods are expected to give NaNs outside the boundaries, but this should not be the case for the 'nearest' method.
When the data is 1-dimensional, griddata defers to interpolate.interp1d:
if ndim == 1 and method in ('nearest', 'linear', 'cubic'):
    from .interpolate import interp1d
    points = points.ravel()
    ...
    ip = interp1d(points, values, kind=method, axis=0, bounds_error=False,
                  fill_value=fill_value)
    return ip(xi)
So even though method='nearest' is requested, griddata will not extrapolate, because interp1d behaves this way.
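One possible workaround is to skip griddata and call interp1d yourself; a minimal sketch, assuming scipy >= 0.17, where interp1d accepts fill_value='extrapolate':

import numpy as np
from scipy.interpolate import interp1d

points = np.random.rand(50)*2*np.pi
values = np.sin(points)
target_points = [1.,2.,3.,4.,5.,6.,7.]

# kind='nearest' with fill_value='extrapolate' returns the closest sample
# instead of NaN for targets outside the data range
ip = interp1d(points, values, kind='nearest', bounds_error=False,
              fill_value='extrapolate')
print(ip(target_points))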
However, there are other tools, such as scipy.cluster.vq (vector quantization), which you could use to find the nearest value. For example,
import numpy as np
import scipy.cluster.vq as vq
import matplotlib.pyplot as plt
target_points = np.array([1.,2.,3.,4.,5.,6.,7.])
points = (np.random.rand(50)*2*np.pi)
values = np.sin(points)
code, dist = vq.vq(target_points, points)  # index of the nearest point for each target
interp = values[code]
plt.plot(points,values,'o')
plt.plot(target_points,interp,'ro')
print(interp)
plt.show()
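If you'd rather avoid the clustering module, the same nearest-value lookup can be done with a KD-tree; a minimal sketch reusing the arrays from above:

from scipy.spatial import cKDTree

# cKDTree expects 2-D input of shape (n_points, n_dims)
tree = cKDTree(points.reshape(-1, 1))
_, idx = tree.query(np.asarray(target_points).reshape(-1, 1))
interp = values[idx]   # value at the nearest point for each target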
This looks like a bug in scipy.interpolate.griddata, because the behaviour does not match the documentation, which clearly states that the fill_value argument has no effect when the method is 'nearest'.
The output of the following line:
scipy.interpolate.griddata(points=np.array([1,2]), values=np.array([10,20]), xi=3, method='nearest', fill_value=-1)
is array(-1.0), which proves that fill_value does affect the output, contrary to what is stated in the documentation.
Related
How can I calculate the value of this integral? f_tu(t) is given as a numpy.array (the integral and the graph of f_tu were shown as images). How can I implement this?
Everything I could find looks something like this:
from math import sin
from scipy.integrate import quad

def f(x):
    return 1/sin(x)

I = quad(f, 0, 1)
but I have an array there, not a specific function like sin.
How about auc from sklearn.metrics?
import numpy as np
from scipy.integrate import quad
from sklearn.metrics import auc
x = np.arange(0, 100, 0.001)
y = np.sin(x)
print('auc:', auc(x,y))
print('quad:', quad(np.sin, 0, 100))
auc: 0.13818791291277366
quad: (0.1376811277123232, 9.459751315610276e-09)
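If you'd rather stay within scipy for the sampled-array case, scipy.integrate also offers Simpson's rule (a sketch, assuming scipy >= 1.6, where it is named simpson; older versions call it simps):

import numpy as np
from scipy.integrate import simpson

x = np.arange(0, 100, 0.001)
y = np.sin(x)
print('simpson:', simpson(y, x=x))   # close to the quad result above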
Okay, so you have one of those pesky infinity integrals. Here is how I would deal with it:
import numpy as np
from scipy.integrate import quad

def f(x):
    return 1/(x**2)   # put your function to integrate here

# note: 1/x**2 diverges at x=0, so integrate from 1 to infinity instead
print(quad(f, 1, np.inf))   # integrates from 1 to infinity
This returns two values. The first is the estimated value of the integral, and the second is the approximate absolute error of the integral which is useful to know.
If you want to integrate a numpy array, here is a simple solution:
import numpy as np

y = np.sin(np.arange(0, 10, 0.01))   # replace with your numpy array
print(np.trapz(y, dx=0.01))          # dx is the sample spacing
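If the samples are not uniformly spaced, np.trapz also accepts the x coordinates; a quick sketch:

import numpy as np

x = np.linspace(0, np.pi, 1000)   # hypothetical sample positions
y = np.sin(x)                     # the sampled function values
print(np.trapz(y, x))             # ~2.0, the exact integral of sin on [0, pi]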
Let's say I have the following function:
from math import log, exp

def f(x):
    return log(3*exp(3*x) + 7*exp(7*x))
I want to do two things:
1) plot the function over a range of x-values
2) find the root of the function using the Newton method from scipy
My problem is that it seems that plotting is best done with a numpy array x=np.linspace(-2,2,1000), but then evaluating the function raises the error TypeError: only size-1 arrays can be converted to Python scalars. I can fix this by simply changing log and exp to np.log and np.exp, respectively.
But doing so then makes scipy.optimize.newton unhappy.
It seems like I need to define the function twice, once for use in plotting (with np. ...) and once for optimizing in the form given above.
I can't imagine that this is actually the case. Any hints would be greatly appreciated.
Seems legit, you just need to use numpy functions instead of base math functions:
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt
%matplotlib inline
def f(x):
    return np.log(3*np.exp(3*x) + 7*np.exp(7*x))
x = np.linspace(-2,2,1000)
y = f(x)
plt.scatter(x, y)
optimize.root(f, 1)
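Since the question specifically asks for Newton's method, the same numpy-based f also works with optimize.newton, because np.log and np.exp accept plain scalars just as well as arrays; a quick sketch:

root = optimize.newton(f, x0=1.0)   # secant/Newton iteration starting at 1
print(root, f(root))                # f(root) should be close to 0

So there is no need to define the function twice.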
I need the values of the autocorrelation coefficients that come from autocorrelation_plot(). The problem is that the output of this function is not accessible, so I need another function to get such values. That's why I used acf() from statsmodels, but it does not produce the same plot as autocorrelation_plot() does. Here is my code:
from statsmodels.tsa.stattools import acf
from pandas.plotting import autocorrelation_plot
import matplotlib.pyplot as plt
import numpy as np
y = np.sin(np.arange(1,6*np.pi,0.1))
plt.plot(acf(y))
plt.show()
So the result is not the same as this:
autocorrelation_plot(y)
plt.show()
This seems to be related to the nlags parameter of acf:
nlags: int, optional
Number of lags to return autocorrelation for.
I don't know exactly what this does, but in the source of acf there is a slicing operation that shortens the array:
avf = acovf(x, unbiased=unbiased, demean=True, fft=fft, missing=missing)
acf = avf[:nlags + 1] / avf[0]
If you use statsmodels.tsa.stattools.acovf directly and normalize by the lag-0 value (avf[0]), the result is the same as with autocorrelation_plot.
So you can call it like
plt.plot(acf(y, nlags=len(y)))
to make it work.
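A quick consistency check (a sketch; keyword names and defaults differ slightly across statsmodels versions):

import numpy as np
from statsmodels.tsa.stattools import acf, acovf

y = np.sin(np.arange(1, 6*np.pi, 0.1))
avf = acovf(y)                    # autocovariance at lags 0..len(y)-1
manual = avf / avf[0]             # normalize to get the autocorrelation
auto = acf(y, nlags=len(y) - 1)   # ask acf for all lags explicitly
print(np.allclose(manual, auto))  # True, given matching fft settings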
An explanation of lag: https://math.stackexchange.com/questions/2548314/what-is-lag-in-a-time-series/2548350
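For intuition, the lag-k autocorrelation is just the normalized dot product of the demeaned series with a copy of itself shifted by k samples; a hand-rolled sketch:

import numpy as np

def lag_autocorr(y, k):
    y = np.asarray(y) - np.mean(y)   # demean first
    return np.sum(y[:-k] * y[k:]) / np.sum(y * y)

y = np.sin(np.arange(1, 6*np.pi, 0.1))
print(lag_autocorr(y, 1))   # close to 1: neighbouring samples move together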
I need to regrid data from an irregular grid (Lambert conformal conic) to a regular grid. I think pyresample is my best bet. In fact, my original lat, lon are not 1D (which seems to be needed to use basemap.interp or scipy.interpolate.griddata).
I found this SO answer helpful. However, I get empty interpolated data. I think it has to do with the choice of my radius of influence and with the fact that my data are wrapped (??).
This is my code:
import numpy as np
from matplotlib import pyplot as plt
import netCDF4
%matplotlib inline
url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR/Dailies/monolevel/hlcy.2009.nc"
SRHtemp = netCDF4.Dataset(url).variables['hlcy'][0,::]
Y_n = netCDF4.Dataset(url).variables['y'][:]
X_n = netCDF4.Dataset(url).variables['x'][:]
T_n = netCDF4.Dataset(url).variables['time'][:]
lat_n = netCDF4.Dataset(url).variables['lat'][:]
lon_n = netCDF4.Dataset(url).variables['lon'][:]
lat_n and lon_n are irregular, and are the latitude and longitude corresponding to the projected coordinates x, y.
Because of the way lon_n is, I added:
lon_n[lon_n<0] = lon_n[lon_n<0]+360
so that now if I plot them they look fine:
Then I create my new set of regular coordinates:
XI = np.arange(148,360)
YI = np.arange(0,87)
XI, YI = np.meshgrid(XI,YI)
Following the answer above I wrote the following code:
from pyresample.geometry import SwathDefinition
from pyresample.kd_tree import resample_nearest
def_a = SwathDefinition(lons=XI, lats=YI)
def_b = SwathDefinition(lons=lon_n, lats=lat_n)
interp_dat = resample_nearest(def_b,SRHtemp,def_a,radius_of_influence = 70000,fill_value = -9.96921e+36)
The resolution of the data is about 30 km, so I set the radius of influence to 70 km. The fill_value I pass is the one from the data, but of course I could just use zero or NaN. However, I get an empty array.
What am I doing wrong? Also, if there is another way of doing it, I am interested in knowing it. The pyresample documentation is a bit thin, and I need a bit more help.
I did find this answer, which suggests using another griddata function:
import matplotlib.mlab as ml
resampled_data = ml.griddata(lon_n.ravel(), lat_n.ravel(), SRHtemp.ravel(), XI, YI, interp="linear")
and it seems to be ok:
But I would like to understand more about pyresample, since it seems so powerful.
The problem is that XI and YI are integers, not floats. You can fix this by simply doing
XI = np.arange(148,360.)
YI = np.arange(0,87.)
XI, YI = np.meshgrid(XI,YI)
The inability to handle integer datatypes is an undocumented, unintuitive, and possibly buggy behavior from pyresample.
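With float grids, the resample call from the question should then return data. As a quick sanity check (a sketch reusing the names defined above; note that def_a must be rebuilt from the new float grids):

def_a = SwathDefinition(lons=XI, lats=YI)   # rebuild with the float grids
interp_dat = resample_nearest(def_b, SRHtemp, def_a,
                              radius_of_influence=70000, fill_value=np.nan)
print(np.isfinite(interp_dat).sum())        # nonzero once the grids are floats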
A few more notes on your coding style:
It's not necessary to overwrite the XI and YI variables; you don't gain much by this.
You should load the netCDF dataset just once and then access the variables via that object, as in the sketch below.
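A minimal sketch of that second suggestion, reusing the url from the question:

import netCDF4

ds = netCDF4.Dataset(url)   # open the dataset once
SRHtemp = ds.variables['hlcy'][0, ::]
lat_n = ds.variables['lat'][:]
lon_n = ds.variables['lon'][:]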
I'm new to python and scipy, and I am trying to filter acceleration data taken in 3 dimensions at 25 Hz. I'm having a weird problem: after applying the filter, the graph of my data is smoothed, but the values seem to be amplified quite a bit depending on the order and cutoff frequencies of the filter. Here is my code:
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
my_data = np.loadtxt("DATA-001.CSV", delimiter=",", skiprows=8)
N, Wn = signal.buttord( [3,11], [.3,18], .1, 10, True)
print(N)
print(Wn)
b,a = signal.butter(N, Wn, 'bandpass', analog=True)
filtered_z = signal.filtfilt(a,b,[my_data[1:500,3]],)
filtered_z = np.reshape(filtered_z, (499,))
plt.figure(1)
plt.subplot(411)
plt.plot(my_data[1:500,0],my_data[1:500,3])
plt.subplot(412)
plt.plot(my_data[1:500,0], filtered_z, 'k')
plt.show()
Right now, this code returns this graph:
I'm unsure how to get rid of this weird gain issue; does anyone have any suggestions?
Thank you!
You have your coefficients the wrong way around in signal.filtfilt. Should be:
filtered_z = signal.filtfilt(b,a,[my_data[1:500,3]],)
With the arguments swapped you effectively apply the inverse of the intended transfer function, so the size and ratio of the coefficients can result in amplification of the signal.
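As a side note, filtfilt is a digital filter, while the coefficients above were designed with analog=True. A digital redesign might look like the sketch below (the order, band edges, and synthetic data are assumptions, since the original CSV isn't available; the fs keyword needs scipy >= 1.2):

import numpy as np
from scipy import signal

fs = 25.0   # sampling rate from the question

# digital Butterworth band-pass; 3-11 Hz lies below the Nyquist rate fs/2
b, a = signal.butter(4, [3.0, 11.0], btype='bandpass', fs=fs)

t = np.arange(0, 20, 1/fs)   # synthetic stand-in for the CSV column
z = np.sin(2*np.pi*5*t) + 0.5*np.random.randn(t.size)
filtered_z = signal.filtfilt(b, a, z)   # b first, then a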