Python - re-write a netcdf file after calculation

I have a netcdf4 file called test.nc
I am calculating monthly median (from the daily values) with the following code:
import os
import xarray as xr

os.chdir(inbasedir)   # inbasedir is defined earlier in the asker's script
data = xr.open_dataset('test.nc')
monthly_data = data.resample(freq='m', dim='time', how='median')
My question is: how can I write this output to a new netCDF file without having to re-write all the variables and metadata already included in the input netCDF file?

Not sure if it is exactly what you want, but this writes the resampled Dataset to a new netCDF file:
monthly_data.to_netcdf('newfile.nc')
You can use .drop() on the Dataset to remove variables you don't want in the output.
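A minimal sketch of the whole round trip, assuming the resampled Dataset contains a variable you don't want to keep (extra_var is a made-up name; .drop_vars() is the current spelling of .drop()):
import xarray as xr

data = xr.open_dataset('test.nc')
monthly_data = data.resample(time='1M').median()        # current xarray resample syntax
monthly_data = monthly_data.drop_vars(['extra_var'])    # 'extra_var' is hypothetical
monthly_data.to_netcdf('newfile.nc')                    # write the result to a new file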


Reading multiple hdf5 files from a folder

I currently have code that imports an hdf5 file and then computes the area under a curve.
import h5py
import numpy as np
import pandas as pd

file = h5py.File('/Users/hansari/Desktop/blabla', 'r')
xdata = file.get('data')          # returns the 'data' dataset
xdata = np.array(xdata)
xdata_df = pd.DataFrame(xdata)
table = xdata_df.reset_index()
This is the code I use to fetch the file.
I currently have a folder that has 25 hdf5 files. Is there a way to have the code run over all 25 files and spit out the result of the function for each?
I'm hoping to have it import one file, run through the whole script, and then repeat with the next hdf5 file, instead of importing all the data first and then running through the code with a massive amount of data.
I'm currently using glob.glob, but it's importing all of the files in one go and giving me a huge dataset that is hard to work with.
Without more code, I can't tell you what you are doing wrong. To demonstrate the process, I created a simple example that reads multiple HDF5 files and loads each into a Pandas dataframe using glob.iglob() and h5py. See the code below. The table dataframe is created inside the second loop and only contains data from one HDF5 file. You should add your function to compute the area under the curve inside the for file in glob.iglob() loop, as sketched at the end of the code below.
import glob
import h5py
import numpy as np
import pandas as pd

# First, create 3 simple H5 files
for fcnt in range(1, 4, 1):
    fname = f'file_{fcnt}.h5'
    with h5py.File(fname, 'w') as h5fw:
        arr = np.random.random(10*10).reshape(10, 10)
        h5fw.create_dataset('data', data=arr)

# Loop over H5 files and load each into a dataframe
for file in glob.iglob('file*.h5'):
    with h5py.File(file, 'r') as h5fr:
        xdata = h5fr['data'][()]
        table = pd.DataFrame(xdata).reset_index()
        print(table)
        # add code to compute area under the curve here
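        # If the area calculation is plain numerical integration, numpy's trapz
        # can stand in as a placeholder here (the asker's real function is not
        # shown in the question); 'index' is the column added by reset_index().
        areas = [np.trapz(row) for _, row in table.drop(columns='index').iterrows()]
        print(file, areas)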

Reading in a multiindex .csv file as returned from pandas using the ftable type in R

I have a multi-index (multi-column to be exact) pandas data frame in Python that I saved using the .to_csv() method. Now I would like to continue my analysis in R. For that I need to read in the .csv file. I know that R does not really support multi-index data frames like pandas does but it can handle ftables using the stats package. I tried to use read.ftable() but I can't figure out how to set the arguments right to correctly import the .csv file.
Here's some code to create a .csv file that has the same structure as my original data:
require(stats)
# create example .csv file with a multiindex as it would be saved when using pandas
fileConn<-file('test.csv')
long_string = paste("col_level_1,a,b,c,d\ncol_level_2,cat,dog,tiger,lion\ncol_level_3,foo,foo,foo,foo\nrow_level_1,,,,\n1,",
"\"0,525640810622065\",\"0,293400380474675\",\"0,591895790442417\",\"0,675403394728461\"\n2,\"0,253176104907883\",",
"\"0,107715459748816\",\"0,211636325794272\",\"0,618270276545688\"\n3,\"0,781049927692169\",\"0,72968971635063\",",
"\"0,913378426593516\",\"0,739497259262532\"\n4,\"0,498966730971063\",\"0,395825713762063\",\"0,252543611974303\",",
"\"0,240732390893718\"\n5,\"0,204075522469035\",\"0,227454178487449\",\"0,476571725142606\",\"0,804041968683541\"\n6,",
"\"0,281453400066927\",\"0,010059089264751\",\"0,873336799707968\",\"0,730105129502755\"\n7,\"0,834572206714808\",",
"\"0,668889079581709\",\"0,516135581764696\",\"0,999861473609101\"\n8,\"0,301692961056344\",\"0,702428450077691\",",
"\"0,211660363912457\",\"0,626178589354395\"\n9,\"0,22051883447221\",\"0,934567760412661\",\"0,757627523007149\",",
"\"0,721590060307143\"",sep="")
writeLines(long_string, fileConn)
close(fileConn)
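For reference, a frame with this layout could have been produced on the pandas side roughly like this (a sketch: the column labels are taken from the example above, and decimal=',' reproduces the comma decimal separator):
import numpy as np
import pandas as pd

# three column levels plus a named row index, as in the example file
columns = pd.MultiIndex.from_arrays(
    [list('abcd'), ['cat', 'dog', 'tiger', 'lion'], ['foo'] * 4],
    names=['col_level_1', 'col_level_2', 'col_level_3'])
df = pd.DataFrame(np.random.rand(9, 4),
                  index=pd.Index(range(1, 10), name='row_level_1'),
                  columns=columns)
df.to_csv('test.csv', decimal=',')   # writes 3 header rows, then the index-name row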
When opening the .csv file in a reader of your choice, you will see three column-header rows (col_level_1, col_level_2, col_level_3), a row_level_1 index column, and nine rows of comma-decimal values.
How can I read this in using R?
I found one solution without using read.ftable(), based on this post. Note that this won't give you the data in the ftable format:
# read the 3 header rows separately, then the data (skipping the 4 header lines)
headers <- read.csv(file='./test.csv', header=F, nrows=3, as.is=T, row.names=1)
dat <- read.table('./test.csv', skip=4, header=F, sep=',', row.names=1)
# collapse the 3 header levels into single column names like "a.cat.foo"
headers_collapsed <- apply(headers, 2, paste, collapse='.')
colnames(dat) <- headers_collapsed

NetCDF file with all data variable values missing when I read it into Python with both Xarray and netCDF4

I have a netCDF file generated by model output. The file contains spatially gridded variables over a 30-year time span, and I've confirmed that the file contains the data using Ferret within Linux. When I read the file into Python with both xarray and netCDF4, the file reads successfully with the correct dimensions, but the data variables are all missing.
I first obtained the error:
ValueError: unable to decode time units 'growing seasons since 2071-01-01 00:00:00' with 'the default calendar'. Try opening your dataset with decode_times=False or installing cftime if it is not installed.
So, I added the following lines in order to solve the time issue:
import pandas as pd
import xarray as xr

ds = xr.open_dataset('my_file.nc4', decode_times=False)
units, reference_date = ds.time.attrs['units'].split('since')
ds['time'] = pd.date_range(start=reference_date, periods=ds.sizes['time'], freq='A')
Now there is no error when reading the file in, but all of the data variables show NaN. I don't have a problem reading in any other netCDF files. I have a very similar file with the extension '.nc' instead of '.nc4', which is one of the tiles that make up the final file, and it reads in with all data present. I'm thinking there is some sort of disagreement between the dimensions of my dataset and xarray. Here is the summary of the dataset:

Export .csv file from .mat file generated with OpenModelica

I am trying to export a .csv file from a .mat file, which was generated with OpenModelica. The following code seems to work quite well:
from scipy.io import loadmat
import numpy
x = loadmat('results.mat')
traj=x['data_2'][0]
numpy.savetxt("results.csv", traj, delimiter=",")
However, there is an issue that I cannot solve. The line traj=x['data_2'][0] takes the array with the values (over time) of the first variable in the file (index 0). The problem is that I cannot work out the correspondence between the variable I am looking for and its index. Let's say I want to print the values of a variable called "My_model.T". How do I know the index of this variable?
The file format is described here: https://www.openmodelica.org/doc/OpenModelicaUsersGuide/1.17/technical_details.html#the-matv4-result-file-format
So you need to look up the variable's name in the name matrix, then look in the dataInfo matrix to see whether the variable is stored in data_1 or data_2 and which index it has in that matrix.
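A rough sketch of that lookup with scipy (treat it as a sketch, not a reference implementation: the orientation of the name and dataInfo matrices can differ depending on how the file was written, so you may need to transpose them; chars_as_strings=False keeps the name matrix as individual characters so the names can be reassembled):
from scipy.io import loadmat
import numpy

x = loadmat('results.mat', chars_as_strings=False)

# Rebuild the variable names: each column of 'name' holds one name, padded with blanks.
# If the names come out garbled, transpose the matrix first.
name_mat = x['name']
names = [''.join(name_mat[:, i]).strip() for i in range(name_mat.shape[1])]

i = names.index('My_model.T')                # the variable asked about in the question

# dataInfo column i: entry 0 says which matrix holds the variable (1 -> data_1,
# 2 -> data_2), entry 1 is its 1-based row in that matrix (a negative value
# marks a negated alias).
info = x['dataInfo'][:, i]
matrix_name = 'data_%d' % int(info[0])
row = abs(int(info[1])) - 1

traj = x[matrix_name][row]                   # values of My_model.T over time
numpy.savetxt("My_model_T.csv", traj, delimiter=",")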
Edit: and since the title asks how to create a CSV from a MAT-file... You can do this from an OpenModelica .mos script:
filterSimulationResults("M_res.mat", "M_res.csv", readSimulationResultVars("M_res.mat"))

How to access all the data saved in a h5py file

I save a number of numpy arrays into an h5py file under different names corresponding to different datasets. Assuming I don't know those dataset names, how can I access the saved data after opening the file? For example:
import h5py

f = h5py.File('filename', 'w')
f.create_dataset('file1', data=data1)
....
F = h5py.File('filename', 'r')
# next, how to read out all the datasets without knowing their names a priori?
filenames = list(F.keys())  # contains all the dataset names
data1 = F[filenames[0]][()].astype('float32')
...
See also the post How to know HDF5 dataset name in python
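To sketch the idea (the filename and dtype here just mirror the question): keys() only lists the top level of the file, while visititems() also walks into groups, so it can collect every dataset without knowing any names in advance:
import h5py

arrays = {}

def collect(name, obj):
    # called once for every group and dataset in the file; keep only the datasets
    if isinstance(obj, h5py.Dataset):
        arrays[name] = obj[()].astype('float32')

with h5py.File('filename', 'r') as f:
    f.visititems(collect)

for name, arr in arrays.items():
    print(name, arr.shape)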
