Regridding NetCDF4 in Python

I'm working with various climate models. Right now I'm regridding the latitudes and longitudes of these files from 2.5x2.5 degrees to 0.5x0.5 degrees, and I am completely lost. I've been using the Anaconda distribution for all of my netCDF4 needs and have made good progress; it's just the regridding that baffles me completely. I have three main arrays that I'm using:
The first is the data_array, a numpy array that contains the information for precipitation.
The second is the lat_array, a numpy array containing the latitude information.
The third is the lon_array, a numpy array containing the longitude information.
All this data came from the netCDF4 file.
Again, my data is currently on a 2.5x2.5 grid, meaning lon x lat is 144x72. I use np.meshgrid(lon_array, lat_array) to expand lon and lat into 144x72 coordinate grids. My data_array is also 144x72, thus matching up perfectly.
This is where I get stuck and I have no idea how to proceed.
My thoughts: I want my 144x72 to convert to 720x360 in order for it to be 0.5x0.5.
I know one way of creating the lon/lat arrays I want is np.arange(-89.75, 90.25, 0.5) and np.arange(-179.75, 180.25, 0.5) (np.arange excludes the endpoint, so a stop value just past the last coordinate gives exactly 360 and 720 points). But I don't know how to match up the data_array with that.
Can anyone please offer any assistance? Any help is much appreciated!
Note: I also have ESMF modules available to me.

An easy option would be nctoolkit (https://nctoolkit.readthedocs.io/en/latest/installing.html). It has a built-in method called to_latlon that easily achieves what you want. Just do the following for bilinear interpolation (and see the user guide for other methods):
import nctoolkit as nc
data = nc.open("infile.nc")
data.to_latlon(lon = [-179.75, 179.75], lat = [-89.75, 89.75], res = [0.5, 0.5])
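If you prefer to stay with the plain NumPy arrays you already have in memory (data_array, lat_array, lon_array), bilinear interpolation with scipy.interpolate.RegularGridInterpolator is another option. This is a minimal sketch, assuming lat_array and lon_array are 1-D and ascending and that data_array has shape (len(lat_array), len(lon_array)), i.e. (72, 144); transpose it first if yours is (144, 72):
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Build a bilinear interpolator over the coarse 2.5-degree grid
interp = RegularGridInterpolator((lat_array, lon_array), data_array,
                                 bounds_error=False, fill_value=None)

# Target 0.5-degree grid: 360 latitudes x 720 longitudes
new_lat = np.arange(-89.75, 90.25, 0.5)
new_lon = np.arange(-179.75, 180.25, 0.5)
lat2d, lon2d = np.meshgrid(new_lat, new_lon, indexing='ij')

# Evaluate at every (lat, lon) pair and reshape to (360, 720)
points = np.column_stack([lat2d.ravel(), lon2d.ravel()])
new_data = interp(points).reshape(lat2d.shape)
Note that this is plain interpolation, not conservative regridding; if you need to conserve area-weighted totals (often important for precipitation), the ESMF/xESMF "conservative" method you have available is the better tool.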

Related

How do I get arrays of coordinate values for a variable from a netCDF gridded dataset using Siphon and MetPy?

I have requested a netCDF subset using Siphon and formed a query to retrieve a variable within a bounding box:
from siphon.catalog import TDSCatalog
cat = TDSCatalog("https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_onedeg/catalog.xml?dataset=grib/NCEP/GFS/Global_onedeg/Best")
ncss = cat.datasets[0].subset()
query = ncss.query()
query.variables("Absolute_vorticity_isobaric")
query.lonlat_box(north=34., south=33., west=-102., east=-101.)
query.accept("netcdf4")
I am seeking a reliable, concise approach to getting the values of that variable's coordinates, specifically time and vertical level. A working, but impractical approach to this would be to request and work with the whole dataset.
Functional-but-Impractical Approach
Get the data
import xarray as xr
query.all_times()
data = ncss.get_data(query)
datastore = xr.backends.NetCDF4DataStore(data)
Get data as xarray.Dataset using MetPy's xarray accessor
ds = xr.open_dataset(datastore).metpy.parse_cf()
Get the coordinate axes from a constituent xarray.DataArray
For each variable of the dataset (an xarray.DataArray), calling ds.VARIABLE.metpy.DIMENSION has MetPy automatically return the appropriate coordinate variable (no matter what it is named, e.g. lat, lon, time, time1, altitude_above_msl, isobaric3, height_above_ground1), where DIMENSION is one of time, vertical, x, or y.
Get the values
In this case, ds.Absolute_vorticity_isobaric.metpy.time returns ds.time, and ds.Absolute_vorticity_isobaric.metpy.vertical returns ds.isobaric2. Adding .values to the call returns just the numpy.ndarray with the values I have been trying to get. So, calling ds.Absolute_vorticity_isobaric.metpy.time.values produces the following (which is truncated below):
array(['2019-11-17T00:00:00.000000000', '2019-11-17T03:00:00.000000000',
'2019-11-17T06:00:00.000000000', ..., '2020-01-02T06:00:00.000000000',
'2020-01-02T09:00:00.000000000', '2020-01-02T12:00:00.000000000'],
dtype='datetime64[ns]')
Calling ds.Absolute_vorticity_isobaric.metpy.time.values and ds.Absolute_vorticity_isobaric.metpy.vertical.values will return just the NumPy arrays, which is what I seek.
The Problem
While the above does in fact do what I want, it took nearly a minute and a half to run for just one variable, and it (I assume) unnecessarily taxes UCAR servers. Is there any way to get the output above without the massive overhead of loading all of that data itself?
If you are concerned about the performance of your original method and only wish to extract the time and vertical coordinates, I would recommend using OPeNDAP to access your data rather than NCSS. This will fetch only the metadata at first, and then lazily load the data you request (the time and vertical coordinates, in your case). Using MetPy v0.11 or newer, an example script using your TDS catalog of interest would look something like the following:
import metpy
import xarray as xr
from siphon.catalog import TDSCatalog
cat = TDSCatalog("https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_onedeg/catalog.xml?dataset=grib/NCEP/GFS/Global_onedeg/Best")
opendap_url = cat.datasets[0].access_urls['OPENDAP']
ds = xr.open_dataset(opendap_url)
time = ds['Absolute_vorticity_isobaric'].metpy.time.values
vertical = ds['Absolute_vorticity_isobaric'].metpy.vertical.values
print(time)
print(vertical)
This takes roughly a half-second to run on my system.
If you instead have MetPy older than v0.11, you will need to use .metpy.parse_cf() when opening the dataset, as follows:
ds = xr.open_dataset(opendap_url).metpy.parse_cf()

Python MNE - reading EEG data from array

I have EEG data that comes in the form of a 3D numpy array (epoch * channel * timepoint). timepoint is a 256-element array containing each sampled timepoint (1 s total, at 256 Hz). epoch is an experimental trial.
I'm trying to import the numpy array into a form Python-MNE (http://martinos.org/mne/stable/mne-python.html) understands, but I'm having some trouble.
First, I'm not sure if I should be importing this raw data as a RawArray or an EpochsArray. I tried the latter with this:
ch_names = list containing my 64 eeg channel names
allData = 3d numpy array as described above
info = mne.create_info(ch_names, 256, ch_types='eeg')
event_id = 1
events = np.array([200, event_id]) #I got this from a tutorial but really unsure what it does and I think this may be the problem
raw = mne.EpochsArray(allData, info, events=events)
picks = mne.pick_types(info, meg=False, eeg=True, misc=False)
raw.plot(picks=picks, show=True, block=True)
When I run this I get an index error: "too many indices for array"
Ultimately I want to do some STFT and CSP analysis on the data, but right now I'm in need of some help with the initial restructuring and importing into MNE.
What's the correct way to import this numpy data that would make it easiest to complete my intended analyses?
Is there any way you can convert the data you acquired from your EEG setup into the .fif format? The 'raw' data format the MNE page talks about in their tutorial is a .fif format file. If you can get your eeg data into .fif format, you can pretty much just follow the tutorial step by step...
Functions to convert from various other EEG file formats to .fif: http://martinos.org/mne/stable/manual/convert.html
If that's not an option, here are some thoughts:
EpochsArray() looks to be the correct function, as it expects a data array of shape (n_epochs, n_channels, n_times). Just to be sure, check the shape of your allData array with np.shape(allData) and confirm it matches.
On a related note, the help page for EpochsArray() mentions mne.read_events(); the big question, though, is where your events data might be stored for you to be able to read it...
Based on the tutorial you linked it seems like the way to get 'events' if you're starting from a .fif file is:
events = mne.find_events(raw, stim_channel='STI 014'). This makes me wonder if you have more than 64 channels in your numpy array and one of your channels is in fact a stimulation channel... if that's the case you could try feeding that stim channel to the mne.read_events() function. Alternatively, perhaps your stim or events channel might be a separate array or perhaps unprocessed?
Hope this is at least somewhat helpful and good luck!
In case someone else is wondering, they added a tutorial to their doc: Creating MNE-Python data structures from scratch. You should be able to find the 2 needed steps:
info structure creation
epochs from array creation
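For reference, here is a minimal sketch of those two steps for data shaped (n_epochs, n_channels, n_times); the channel names and event IDs are stand-ins for your own:
import numpy as np
import mne

n_epochs, n_channels, n_times = 10, 64, 256
allData = np.random.randn(n_epochs, n_channels, n_times)  # stand-in for your 3D array

# Step 1: the info structure (64 EEG channels sampled at 256 Hz)
ch_names = ['EEG%03d' % i for i in range(n_channels)]
info = mne.create_info(ch_names=ch_names, sfreq=256.0, ch_types='eeg')

# Step 2: events must be shape (n_epochs, 3):
# [sample index, previous value, event id] -- one row per epoch
events = np.column_stack([np.arange(n_epochs) * n_times,
                          np.zeros(n_epochs, dtype=int),
                          np.ones(n_epochs, dtype=int)])

epochs = mne.EpochsArray(allData, info, events=events)
epochs.plot(scalings='auto', show=True, block=True)
The "too many indices for array" error in the question comes from the 1-D events array: mne.EpochsArray expects a 2-D (n_epochs, 3) integer array, one row per epoch.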

Reading multidimensional array data into Python

I have data in the format of a 10000x500 matrix contained in a .txt file. In each row, data points are separated by a single whitespace, and each row ends with a newline.
Normally I was able to read this kind of multidimensional array data into Python by using the following snippet of code:
with open("position.txt") as f:
data = [line.split() for line in f]
# Get the data and convert to floats
ytemp = np.array(data)
y = ytemp.astype(np.float)
This code worked until now. When I try to use the exact same code with another set of data formatted in the same way, I get the following error:
setting an array element with a sequence.
When I try to get the 'shape' of ytemp, it gives me the following:
(10001,)
So it converts the rows to an array, but not the columns.
I can't think of any other information to include. Basically, I'm trying to convert my data from a .txt file into a multidimensional array in Python. The code worked before, but now, for a reason that is unclear to me, it doesn't. I tried comparing the data sets; of course they're huge, but everything looks quite similar between the data that works and the data that doesn't.
I would be more than happy to provide any other information you may need. Thanks in advance.
Use NumPy's built-in function:
data = numpy.loadtxt('position.txt')
Check out the documentation to explore other available options.
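If loadtxt fails too, a quick diagnostic (a sketch, using the file name from the question) is to check whether every row has the same number of columns; a single ragged row, for example a trailing partial line, is enough to make np.array build a 1-D object array and raise "setting an array element with a sequence":
with open("position.txt") as f:
    lengths = [len(line.split()) for line in f]
print(len(lengths), set(lengths))  # expect 10000 rows and a single column count, {500}
The (10001,) shape reported above suggests one extra row, which is exactly the kind of thing this check will surface.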

Why does netCDF4 give different results depending on how data is read?

I am coding in Python and trying to use netCDF4 to read in some floating-point netCDF data. My original code looked like:
from netCDF4 import Dataset
import numpy as np
infile='blahblahblah'
ds = Dataset(infile)
start_pt = 5 # or whatever
x = ds.variables['thedata'][start_pt:start_pt+2,:,:,:]
Now, because of various and sundry other things, I have to read 'thedata' one slice at a time:
x = np.zeros([2,I,J,K]) # I,J,K match the size of the input array
for n in range(2):
    x[n,:,:,:] = ds.variables['thedata'][start_pt+n,:,:,:]
The thing is that the two methods of reading give slightly different results. Nothing big, like one part in 10 to the fifth, but still ....
So can anyone tell me why this is happening and how I can guarantee the same results from the two methods? My thought was that the first method perhaps automatically establishes x as being the same type as the input data, while the second method establishes x as the default type for a numpy array. However, the input data is 64 bit and I thought the default for a numpy array was also 64 bit. So that doesn't explain it. Any ideas? Thanks.
The first example pulls the data into a NetCDF4 Variable object, while the second example pulls the data into a numpy array. Is it possible that the Variable object is just displaying the data with a different amount of precision?
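One way to test that (a sketch against the variable names in the question) is to compare what the two read paths actually hand back:
var = ds.variables['thedata']
print(var.dtype)                         # dtype as stored in the file
a = var[start_pt:start_pt+2, :, :, :]
print(type(a), a.dtype)                  # slicing may return a masked array in the file's dtype
print(type(x), x.dtype)                  # np.zeros defaults to float64
print(np.abs(np.asarray(a) - x).max())   # size of the discrepancy
If the dtypes differ, or if the variable carries scale_factor/add_offset attributes (which netCDF4 applies automatically on read), that is the first thing worth ruling out.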

Python - How to transform counts into m/s using the obspy module

I have a miniSEED file with a single-channel trace, and I assume the data is in counts (how can I check the units of the trace?). I need to transform this into m/s.
I already checked the obspy tutorial, and my main problem is that I don't know how to access the poles and zeros and the amplification factor from the miniSEED file.
Also, do I need the calibration file for this?
Here is my code:
from obspy.core import *
st=read('/Users/guilhermew/Documents/Projecto/Dados sismicos 1 dia/2012_130_DOC01.mseed')
st.plot()
Thanks in advance,
Guilherme
EDIT:
I finally understood how to convert the data. Obspy has different ways to achieve this, but it all comes down to removing the instrument response from the waveform data.
Like @Robert Barsch said, I needed another file to get the instrument response metadata.
So I came up with the following code:
parser = Parser("dir/parser/file")
for tr in stream_aux:
    stream_id = tr.stats.network + '.' + tr.stats.station + '..' + tr.stats.channel
    paz = parser.getPAZ(stream_id, tr.stats.starttime)
    df = tr.stats.sampling_rate
    tr.data = seisSim(tr.data, df, paz_remove=paz)
I'm using the seisSim function to convert the data.
My problem now is that the output doesn't look right (but I can't seem to post the image).
This is clearly a question which should be asked of the seismology community and not on StackOverflow! How about you write to the ObsPy user mailing list?
Update: I still feel the answer is that he/she should ask directly at the ObsPy mailing list. However, in order to give a proper answer for the actual question: MiniSEED is a data only format which does not contain any meta information such as poles and zeros or the used unit. So yes you will need another file such as RESP, SAC PAZ, Dataless SEED, Full SEED etc in order to get the station specific meta data. To apply your seismometer correction read http://docs.obspy.org/tutorial/code_snippets/seismometer_correction_simulation.html
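With a reasonably recent ObsPy and the response metadata in hand (StationXML, dataless SEED, etc.), a minimal sketch of the correction looks like the following; the file names are placeholders:
from obspy import read, read_inventory

st = read("2012_130_DOC01.mseed")
inv = read_inventory("station_metadata.xml")  # StationXML/dataless SEED with the response

st.remove_response(inventory=inv, output="VEL")  # output in m/s (ground velocity)
st.plot()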
To get it in real-life units and not counts, you need to remove the instrument response. I remove instrument response using this code:
# Define math defaults
from __future__ import division  # allows real division without rounding
# Retrieve modules needed
from obspy.core import read
import numpy as np
import matplotlib.pyplot as plt

#%% Choose and import data
str1 = read(fileloc)   # fileloc is the path to your miniSEED file
print(str1)            # show imported data
print(str1[0].stats)   # show stats for the first trace

#%% Remove instrument response
# create dictionary of poles and zeros
TrillC = {'gain': 800.0,
          'poles': [complex(-3.691000e-02, 3.712000e-02),
                    complex(-3.691000e-02, -3.712000e-02),
                    complex(-3.739000e+02, 4.755000e+02),
                    complex(-3.739000e+02, -4.755000e+02),
                    complex(-5.884000e+02, 1.508000e+03),
                    complex(-5.884000e+02, -1.508000e+03)],
          'sensitivity': 8.184000E+11,
          'zeros': [0, -4.341E+02]}  # two zeros -> velocity output

str1_remres = str1.copy()  # make a copy of the data, so the original isn't changed
str1_remres.simulate(paz_remove=TrillC, paz_simulate=None, water_level=60.0)
print("Instrument Response Removed")

plt.figure()
str1_remres_m = str1_remres.merge()
plt.plot(str1_remres_m[0].data)  # only plots the first trace of the stream
As you can see, I have manually defined the poles and zeros. There is probably a way to input them automatically, but this is the way I found that works.
Remember each instrument has different poles and zeros.
The number of zeros you use depends on what you want your output to be. Seismometers normally record velocity (2 zeros); see the sketch after this list:
3 zeros = displacement
2 zeros = velocity
1 zero = acceleration
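For instance, reusing the TrillC dictionary above (purely illustrative, not instrument-correct), appending one more zero to the removed response integrates the output once, turning velocity into displacement:
paz_disp = dict(TrillC)
paz_disp['zeros'] = TrillC['zeros'] + [0j]  # 3 zeros -> displacement

str1_disp = str1.copy()
str1_disp.simulate(paz_remove=paz_disp, paz_simulate=None, water_level=60.0)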
