I am opening netCDF data (a subset of the full dataset) from an OPeNDAP server using a URL. When I open it, the data is (as far as I can tell) not actually loaded until a variable is requested. I would like to save the data to a file on disk; how would I do this?
I currently have:
import numpy as np
import netCDF4 as NC
url = u'http://etc/etc/hourly?varname[0:1:10][0:1:30]'
set = NC.Dataset(url) # I think data is not yet loaded here, only the "layout"
varData = set.variables['varname'][:,:] # I think data is loaded here
# now I want to save this data to a file (for example test.nc); set.close() obviously won't work
Hope someone can help, thanks!
If you can use xarray, this should work as:
import xarray as xr
url = u'http://etc/etc/hourly?varname[0:1:10][0:1:30]'
ds = xr.open_dataset(url, engine='netcdf4') # or engine='pydap'
ds.to_netcdf('test.nc')
The xarray documentation has another example of how you could do this.
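If you only need part of what the URL returns, you can also subset in xarray before writing. A minimal sketch, where 'varname' and the dimension name 'time' are placeholders for whatever your dataset actually uses:
import xarray as xr

url = u'http://etc/etc/hourly?varname[0:1:10][0:1:30]'
ds = xr.open_dataset(url, engine='netcdf4')

# Keep one variable and a slice along a dimension before writing to disk;
# 'time' is a placeholder dimension name.
subset = ds[['varname']].isel(time=slice(0, 10))
subset.to_netcdf('test_subset.nc')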
It's quite simple: create a new NetCDF file and copy whatever you want :) Luckily this can be automated for a large part, by copying the correct dimensions, NetCDF attributes, etc. from the input file. I quickly coded this example; the input file here happens to be a local file, but if the reading over OPeNDAP already works, it should work in a similar way.
import netCDF4 as nc4

# Open input file in read (r), and output file in write (w) mode:
nc_in  = nc4.Dataset('drycblles.default.0000000.nc', 'r')
nc_out = nc4.Dataset('local_copy.nc', 'w')

# For simplicity, copy all dimensions (with correct size) to the output file:
for dim in nc_in.dimensions:
    nc_out.createDimension(dim, nc_in.dimensions[dim].size)

# List of variables to copy (they have to be in nc_in...):
# If you want all variables, this could be replaced with nc_in.variables
vars_out = ['z', 'zh', 't', 'th', 'thgrad']

for var in vars_out:
    # Create the variable in the new file:
    var_in  = nc_in.variables[var]
    var_out = nc_out.createVariable(var, datatype=var_in.dtype, dimensions=var_in.dimensions)

    # Copy NetCDF attributes:
    for attr in var_in.ncattrs():
        var_out.setncattr(attr, var_in.getncattr(attr))

    # Copy data:
    var_out[:] = var_in[:]

nc_out.close()
Hope it helps, if not let me know.
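If you also want the file-level (global) attributes in the copy, a one-line addition along the same lines should do it (a sketch, assuming the same nc_in and nc_out handles as above, run before nc_out.close()):
# Copy all global attributes from the input file to the output file:
nc_out.setncatts({attr: nc_in.getncattr(attr) for attr in nc_in.ncattrs()})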
I am new here and trying to solve an interesting question about World of Tanks. I have heard that the data for every battle is stored on the client's disk in the Wargaming.net folder, and I want to use it for a batch analysis of our clan's battle performance.
It is said that these .dat files are a kind of JSON file, so I tried a couple of lines of Python code to read one, but it failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and clearly fails. Could anyone tell me what these files actually are?
Added on Feb. 9th, 2022
After trying another piece of code in a Jupyter Notebook, it seems like something can be read from the .dat files:
import struct
import numpy as np
import matplotlib.pyplot as plt
import io

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
    print('byte length: ', N)

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b' * N, f.read(1 * N))
The result is a tuple of byte values, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib
file = '4402905758116487.dat'
cache_file = open(file, 'rb') # This can be improved to not keep the file opened.
# When converting pickled items from Python 2 to Python 3 you need to use the "bytes" encoding or "latin1".
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw
# The data stored inside the pickled file will be a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')
# Lastly you can print all of these and see a lot of data inside.
The result contains a mixture of more binary blobs as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
First, you can look at the replay file itself in a text editor, although it won't show the code at the beginning of the file, which has to be cleaned out. Then there is a ton of info that you have to read in and figure out, but it is the stats for each player in the game. After that comes the part that deals with the actual replay itself, which you don't need.
You can grab the player IDs and tank IDs from the WoT developer area API if you want.
After loading the pickle files as gabzo mentioned, you will see that the result is simply a list of values, and without knowing what each value refers to, it's hard to make sense of it. The identifiers for the values can be extracted from your game installation:
import zipfile

WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"

archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')
for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the Python files (for example with uncompyle6) and go through the code to see the identifiers for the values.
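As a rough sketch of that decompile step (assuming uncompyle6 is installed via pip and the .pyc files were extracted to the path used in the snippet above):
import glob
import subprocess

# Decompile every extracted battle_results module into a "decompiled" folder:
for pyc in glob.glob("scripts/common/battle_results/*.pyc"):
    subprocess.run(["uncompyle6", "-o", "decompiled", pyc], check=True)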
One thing to note is that the list of values for the main pickle objects (like brAccount from gabzo's code) always has a checksum as the first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled python files.
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.
I have a VTK file that correctly populates the data in ParaView.
However, when I open that same file with VTK's Python API, I cannot for the life of me seem to find these same labeled datasets. Here's what I've tried:
import vtk
from vtk.numpy_interface import dataset_adapter as dsa
reader = vtk.vtkUnstructuredGridReader()
reader.SetFileName('test.vtk')
reader.Update()
adapter = dsa.WrapDataObject(reader.GetOutput())
print(adapter.PointData.keys()) # ['hu', 'disp']
print(adapter.CellData.keys()) # []
print(adapter.FieldData.keys()) # []
So, it seems that ParaView is able to identify the other datasets beyond just 'hu' and 'disp', but I cannot seem to find them in the corresponding Python object.
I'm assuming the data is there somewhere. Does anyone know why the other arrays, e.g. 'meanstress', don't appear as keys?
You need to ask the reader to read all the data.
reader.ReadAllScalarsOn()
reader.ReadAllVectorsOn()
...
Which of these you need depends on which kind of data you are trying to load (scalars, vectors, tensors, ...). See the full list here: https://vtk.org/doc/nightly/html/classvtkDataReader.html#a831f470c6fbfc6e7209a1243ccb546e2
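Putting that together with the reader from the question, a minimal sketch could look like the following; which of the ReadAll*On calls you actually need depends on how the arrays are stored in the file:
import vtk
from vtk.numpy_interface import dataset_adapter as dsa

reader = vtk.vtkUnstructuredGridReader()
reader.SetFileName('test.vtk')

# Ask the legacy reader to load every array, not just the first of each kind:
reader.ReadAllScalarsOn()
reader.ReadAllVectorsOn()
reader.ReadAllTensorsOn()
reader.ReadAllFieldsOn()
reader.Update()

adapter = dsa.WrapDataObject(reader.GetOutput())
print(adapter.PointData.keys())  # should now list the other arrays as well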
I saved a numpy array to disk in .npy format, and I load it with np.load(), but I don't know how to save the changes I make back to disk.
There are two options you could explore. The first is that, if you know the position of the change in the file, you can overwrite it directly:
file = open("path/to/file", "rb+")   # open for reading and writing without truncating
file.seek(position)                  # byte position of the data you want to change
file.seek(file.tell())               # there seems to be a bug in Python which requires you to do this
file.write(b"new information")       # overwrite the contents (must be bytes in "rb+" mode)
Also see here for why the file.seek(file.tell()) call is needed.
The second option is to load the array, modify it, and save the whole array again:
myarray = np.load("/path/to/my.npy")
myarray[10] = 50.0 # Any new value
np.save("/path/to/my.npy", myarray)
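If the array is large and you only want to change a few values, a third option (an assumption on my part, not something asked for in the question) is to open the .npy file as a writable memory map and change it in place:
import numpy as np

# Open the existing .npy file as a writable memory map:
arr = np.load("/path/to/my.npy", mmap_mode="r+")
arr[10] = 50.0   # modify in place
arr.flush()      # make sure the change is written back to disk
del arr          # releases the file handle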
I need to be able to quickly read lots of netCDF variables in Python (one variable per file). I'm finding that the Dataset function in the netCDF4 library is rather slow compared to reading utilities in other languages (e.g., IDL).
My variables have a shape of (2600, 5200) and type float. They don't seem that big to me (file size = 52 MB).
Here is my code:
import numpy as np
from netCDF4 import Dataset
import time
file = '20151120-235839.netcdf'
t0=time.time()
openFile = Dataset(file,'r')
raw_data = openFile.variables['MergedReflectivityQCComposite']
data = np.copy(raw_data)
openFile.close()
print time.time() - t0
It takes about 3 seconds to read one variable (one file). I think the main slowdown is np.copy. raw_data is <type 'netCDF4.Variable'>, thus the copy. Is this the best/fastest way to do netCDF reads in python?
Thanks.
The power of NumPy is that you can create views into existing data in memory via the metadata it retains about that data, so a copy will always be slower than a view. As JCOidl says, it's not clear why you don't just use:
raw_data = openFile.variables['MergedReflectivityQCComposite'][:]
For more info, see the SciPy Cookbook and the SO question "View onto a numpy array?".
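For comparison, a minimal timing sketch reusing the file and variable name from the question (the slice reads straight into a numpy array, skipping the extra np.copy pass):
import time
from netCDF4 import Dataset

t0 = time.time()
nc = Dataset('20151120-235839.netcdf', 'r')
data = nc.variables['MergedReflectivityQCComposite'][:]  # read via slicing
nc.close()
print('direct slice read took %.2f s' % (time.time() - t0))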
I'm not sure what to say about the np.copy operation (which is indeed slow), but I find that the PyNIO module from UCAR works well for both NetCDF and HDF files. This will place data into a numpy array:
import Nio
f = Nio.open_file(file, format="netcdf")
data = f.variables['MergedReflectivityQCComposite'][:]
f.close()
Testing your code versus the PyNIO code on a netCDF file I have resulted in 1.1 seconds for PyNIO versus 3.1 seconds for the netCDF4 module. Your results may vary; it is worth a look though.
You can use xarray for that.
%matplotlib inline
import xarray as xr
### Single netcdf file ###
ds = xr.open_dataset('path/file.nc')
### Opening multiple NetCDF files and concatenating them by time ####
ds = xr.open_mfdataset('path/*.nc', concat_dim='time')
To read the variable you can simply type ds.MergedReflectivityQCComposite or ds['MergedReflectivityQCComposite']
You can also use xr.load_dataset, but I find that it uses more memory than the open function. For xr.open_mfdataset, you can also chunk along the dimensions of the file if you want, as illustrated below. There are other options for both functions, and you might be interested in learning more about them in the xarray documentation.
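As an illustration of that chunking option (the chunk size here is an arbitrary example, and recent xarray versions also require combine='nested' when concat_dim is given):
import xarray as xr

# Open many files lazily, concatenated along time, with dask chunks:
ds = xr.open_mfdataset('path/*.nc', combine='nested', concat_dim='time', chunks={'time': 10})
data = ds['MergedReflectivityQCComposite'].values  # this triggers the actual read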
I am using Scientific.IO.NetCDF to read NetCDF data into Python. I am trying to read a 4D, 32-bit variable with size (366, 30, 476, 460), but I end up with zeros in my ndarray. Strangely, if I read just the 3D data (1, 30, 476, 460), the returned values are OK.
This is what I am trying to do:
from Scientific.IO.NetCDF import NetCDFFile as Dataset
from collections import namedtuple

# Define output data structure as a named tuple
Roms_data = namedtuple('Roms_data', 'Ti Tf Nt U V W Zeta')

# Open the NetCDF file for reading.
ncfile = Dataset(data_file, 'r')

if Tstart == -1:
    ti = 0
    tf = NTsav - 1
else:
    ti = Tstart - 1
    tf = Tend - 1

try:
    udata = ncfile.variables['u'][:]
    print str(udata.shape)
except:
    print ' Failed to read u data from ' + data_file
The "[:]" means i am reading the whole 4d variable 'u' into an ndarray called udata. This does not work and udata is full of zeros. However, if I do:
try:
    udata = ncfile.variables['u'][0,:,:,:]
    print str(udata.shape)
except:
    print ' Failed to read u data from ' + data_file
then "udata" that is now a 3d ndarray has the values it is supposed to read from the NetCDF file.
Any help? Thanks in advance.
It is unclear to me what may cause your problem, but I have one alternative suggestion you may try. It seems you are reading NetCDF4 data output from a ROMS ocean model. I do this regularly, but I always prefer to use the netcdf-python module for it:
from netCDF4 import Dataset
cdf=Dataset("ns8km_avg_16482_GLORYS2V1.nc","r")
u=cdf.variables["u"][:]
One benefit of the netcdf-python module is that it automatically adjusts for the offset, scale, and fill_value attributes in a NetCDF file. The 4D array read from the file will therefore be a masked array. I wonder whether the masking in your approach is not done correctly. Perhaps you could try installing netcdf-python and reading your data with this approach; hopefully it helps.
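As a quick way to check the masking, a small sketch using the same file and variable name as above (the printed counts will of course depend on your data):
import numpy as np
from netCDF4 import Dataset

cdf = Dataset("ns8km_avg_16482_GLORYS2V1.nc", "r")
u = cdf.variables["u"][:]            # returned as a numpy masked array by default
print(type(u))                       # numpy.ma.core.MaskedArray
print(np.ma.count_masked(u))         # number of cells hidden by the fill value
cdf.close()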
Cheers, Trond