I have a VTK file that loads correctly in ParaView, showing all of its labeled datasets.
However, when I open that same file with VTK's Python API, I cannot for the life of me seem to find these same labeled datasets. Here's what I've tried:
import vtk
from vtk.numpy_interface import dataset_adapter as dsa
reader = vtk.vtkUnstructuredGridReader()
reader.SetFileName('test.vtk')
reader.Update()
adapter = dsa.WrapDataObject(reader.GetOutput())
print(adapter.PointData.keys()) # ['hu', 'disp']
print(adapter.CellData.keys()) # []
print(adapter.FieldData.keys()) # []
So, it seems that ParaView is able to identify the other datasets beyond just 'hu' and 'disp', but I cannot seem to find them in the corresponding Python object.
I'm assuming the data is there somewhere. Does anyone know why the other arrays, e.g. 'meanstress', don't appear as keys?
You need to ask the reader to read all the data:
reader.ReadAllScalarsOn()
reader.ReadAllVectorsOn()
...
depending on which kind of data you are trying to load (scalars, vectors, tensors, and so on; see vtkDataReader for the whole list: https://vtk.org/doc/nightly/html/classvtkDataReader.html#a831f470c6fbfc6e7209a1243ccb546e2).
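For example, applied to the snippet from the question (a sketch; which of the ReadAll*On toggles you need depends on how the arrays are declared in the file):

import vtk
from vtk.numpy_interface import dataset_adapter as dsa

reader = vtk.vtkUnstructuredGridReader()
reader.SetFileName('test.vtk')
# Ask the reader to load every array of each kind, not just the active one.
reader.ReadAllScalarsOn()
reader.ReadAllVectorsOn()
reader.ReadAllTensorsOn()
reader.ReadAllFieldsOn()
reader.Update()

adapter = dsa.WrapDataObject(reader.GetOutput())
print(adapter.PointData.keys())  # the missing arrays, e.g. 'meanstress', should now appear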
I am struggling to find a way to retrieve metadata information from a FILE using GDAL.
Specifically, I would like to retrieve the band names and the order in which they are stored in a given file (be that a GeoTIFF or a NetCDF).
For instance, if we follow the description in the GDAL documentation, we have the "GetMetadata" method from gdal.Dataset (see here and here). Although this method returns a whole set of information regarding the dataset, it does not provide the band names or the order in which they are stored within the given FILE. As a matter of fact, this appears to be an old problem (from 2015) that has not been solved yet (more info here). R, it seems, has already solved this problem (see here), though Python hasn't.
Just to be thorough here, I know that there are other Python packages that can help in this endeavour (e.g., xarray, rasterio, etc.); nevertheless, it is important to keep the set of packages used in a single script small. Therefore, I would like to know a definitive way to find the band (a.k.a. variable) names and the order in which they are stored within a single FILE using gdal.
Please let me know your thoughts in this regard.
Below is a starting point for solving this issue, in which a file is opened by GDAL (creating a Dataset object).
from osgeo import gdal
from osgeo.gdal import Dataset

# 'input', 'file_name', and 'var' are assumed to be defined earlier in the script.
OpeneddatasetFile = gdal.Open(f'NETCDF:{input}/{file_name}.nc:' + var)

if isinstance(OpeneddatasetFile, Dataset):
    print("File opened successfully")
    # Here is where one should be capable of fetching the variable (a.k.a. band)
    # names of the OpeneddatasetFile. Ideally, it would be most welcome to have
    # some kind of method that could return a dictionary with this information,
    # something like:
    # VariablesWithinFile = OpeneddatasetFile.getVariablesWithinFileAsDictionary()
I have finally found a way to retrieve variable names from the NetCDF file using GDAL, thanks to the comments given by Robert Davy above.
I have organized the code into a set of functions to make it easier to follow. Notice that there is also a function for reading metadata from the NetCDF file, which returns this info in dictionary format (see the "readInfo" function).
from osgeo import gdal
from osgeo.gdal import Dataset, InfoOptions


def read_data(filename):
    dataset = gdal.Open(filename)
    if not isinstance(dataset, Dataset):
        raise FileNotFoundError("Impossible to open the netcdf file")
    return dataset


def readInfo(ds, infoFormat="json"):
    """How to: https://gdal.org/python/"""
    info = gdal.Info(ds, options=InfoOptions(format=infoFormat))
    return info


def listAllSubDataSets(infoDict: dict):
    subDatasetVariableKeys = [x for x in infoDict["metadata"]["SUBDATASETS"].keys()
                              if "_NAME" in x]
    subDatasetVariableNames = [infoDict["metadata"]["SUBDATASETS"][x]
                               for x in subDatasetVariableKeys]
    formattedSubDatasetVariableNames = []
    for x in subDatasetVariableNames:
        # Subdataset names look like 'NETCDF:"file.nc":variable';
        # keep only the variable name after the last colon.
        s = x.replace('"', '').split(":")[-1]
        formattedSubDatasetVariableNames.append(s)
    return formattedSubDatasetVariableNames


if __name__ == "__main__":
    filename = "netcdfFile.nc"
    ds = read_data(filename)
    infoDict = readInfo(ds)
    infoDict["VariableNames"] = listAllSubDataSets(infoDict)
I am new here, trying to solve an interesting question about World of Tanks. I heard that every battle's data is saved on the client's disk in the Wargaming.net folder, and I want to use that data to run a batch analysis of our clan's battle performance.
It is said that these .dat files are a kind of JSON file, so I tried a couple of lines of Python to read one, but failed:
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and obviously fails. Could anyone tell me what these files really contain?
Added on Feb. 9th, 2022
After trying another snippet in a Jupyter Notebook, it seems something can be read from the .dat files:
import struct
import numpy as np
import matplotlib.pyplot as plt
import io

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
    print('byte length: ', N)

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b' * N, f.read(1 * N))
The result is a tuple of signed byte values, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib
file = '4402905758116487.dat'
cache_file = open(file, 'rb')  # This could be improved to not keep the file open.

# When loading Python 2 pickles under Python 3, you need the 'bytes' or 'latin1' encoding.
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw

# The data stored inside the pickled file is a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')

# Finally, you can print any of these and see a lot of data inside.
The result contains a mixture of more binary blobs as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
First you can look at the replay file itself in a text editor, but there is encoded data at the beginning of the file that has to be cleaned out. Then there is a ton of info that you have to read in and figure out, but it is the stats for each player in the game. THEN comes the part that deals with the actual replay. You don't need that stuff.
You can grab the player IDs and tank IDs from the WoT developer area API if you want.
After loading the pickle files as gabzo mentioned, you will see that each is simply a list of values; without knowing what each value refers to, it's hard to make sense of them. The identifiers for the values can be extracted from your game installation:
import zipfile

WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"

archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')

for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the Python files (e.g. with uncompyle6) and go through the code to find the identifiers for the values.
One thing to note is that the list of values for each of the main pickle objects (like brAccount from gabzo's code) always has a checksum as its first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled Python files.
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.
I am scraping car listings and will end up with many pictures; that part is not a problem. I also want to save each car's specifications. I am wondering about the best way to do this efficiently. Ideally, I would have something like the built-in datasets in many libraries, such as:
print(dataset)
{
    'image': ([255, 203, 145, ...]),
    'info': (['Audi', '355 HP', ...])
}
That way, I could easily extract images and info with dataset['image'] and dataset['info'], or assign both at once like x, y = dataset.
There are several options, but for structured data like this it is common to use HDF5, which stores data behind a dictionary-like interface.
See the Python tutorial and the full documentation here:
http://docs.h5py.org/en/stable/quick.html
Here is a full Python example; notice the dictionary-like interface.
import h5py
import numpy as np

#####
# writing the output file
#####
my_file = h5py.File("output.h5", 'w')
my_file['info'] = np.bytes_("some_random pixels")  # HDF5 stores byte strings; np.bytes_ replaces the removed np.string_
my_file['image'] = np.random.rand(5, 5)
my_file.close()

#####
# reading the input file
#####
loaded_file = h5py.File("output.h5", 'r')
print(np.array(loaded_file['info']))  # strings come back as numpy bytes
print(np.array(loaded_file['image']))
loaded_file.close()
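Applied to the car data from the question, one possible layout is a group per car holding the image and its info (a sketch; the file name, group name, and field contents are hypothetical):

import h5py
import numpy as np

# Write one group per car, each with an 'image' and an 'info' entry.
with h5py.File("cars.h5", "w") as f:
    car = f.create_group("car_0001")
    car["image"] = np.random.randint(0, 256, size=(128, 128, 3), dtype=np.uint8)  # placeholder pixels
    car["info"] = np.bytes_("Audi;355 HP")

# Read them back, mirroring the x, y = dataset access from the question.
with h5py.File("cars.h5", "r") as f:
    x = np.array(f["car_0001/image"])
    y = f["car_0001/info"][()].decode()
    print(x.shape, y)  # (128, 128, 3) Audi;355 HP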
Background:
Good day. I need to extract information from a binary file produced by a piece of equipment. The equipment comes with a MATLAB function to import the binary file. From what I understand from the manual, the binary file contains 64-bit floating-point values between 0 and 1.
function phase = importData(folder, qUnit)
    fileName = sprintf('%s\\%s.PH', folder, qUnit);
    file = fopen(fileName, 'rb');
    fseek(file, 0, 'bof');
    phase = fread(file, inf, 'float64');
Question: The MATLAB function works fine, but I wish to import the data into Python. How can this be done in Python? I did some research myself and tried something like the code at the bottom, but when I do print(fileContent) to check the imported data, Python simply stops responding on Windows.
with open('sample.PH', mode='rb') as binary_file:
    fileContent = binary_file.read()
You can't read binary files like that into a useful form. You should use numpy.fromfile. This will give you a numeric vector of the data, similar to what you would get in MATLAB. You don't even need to open the file (although you can if you want); just give fromfile a filename and it will automatically open, read, then close the file for you.
import numpy as np
file_content = np.fromfile('sample.PH', np.float64)
Edit: here is how you can read multiple repeated (interleaved) values:
import numpy as np
file_content = np.fromfile('sample.PH', [('data', np.float32),
                                         ('time', np.float64)])
(I put the last call on two lines for clarity; it could be one line if you prefer, or ten for that matter, Python doesn't care.)
This will give you the equivalent of a MATLAB struct, where one field is the data and the other is the time. You can then access the data using file_content['data'] and the time using file_content['time']. There is more information at the link I provided above.
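For instance, to sanity-check the structured read (a sketch assuming the interleaved float32/float64 layout from the edit above):

import numpy as np

file_content = np.fromfile('sample.PH', [('data', np.float32),
                                         ('time', np.float64)])
print(file_content['data'][:5])  # first five data values
print(file_content['time'][:5])  # first five time stamps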
This forum has been extremely helpful for a Python novice like me to improve my knowledge. I have generated a large number of raw data files in text format from my CFD simulation. My objective is to import these text files into Python and do some post-processing on them. This is the code I currently have:
import os
import numpy as np
from matplotlib import pyplot as plt

path = 'E:/Fall2015/Research/CFDSimulations_Fall2015/ddn310/Autoexport/v1'
filenames = np.array(['v1-0520.txt', 'v1-0878.txt', 'v1-1592.txt', 'v1-3020.txt', 'v1-5878.txt'])

for name in filenames:
    data = os.path.join(path, name)
    # X and Y are the coordinates; U, V, T, Tr are the dependent variables.
    X, Y, U, V, T, Tr = np.loadtxt(data, usecols=(1, 2, 3, 4, 5, 6), skiprows=1, unpack=True)
    plt.figure(1)
    plt.plot(T, Y)

plt.legend(['vt1a', 'vtb', 'vtc', 'vtd', 'vte', 'vtf'])
plt.grid(True)
Is there a better way to do this, like importing all the text files (~10,000 files) into Python at once and then accessing whichever files I need for post-processing (maybe by indexing)? All the text files have the same number of columns and rows.
I am just a beginner at Python. I would be grateful if someone could help me or point me in the right direction.
Your post needs to be edited to show proper indentation.
Based on a quick read, I think you are:
reading a file, making a small edit, and writing it back
then loading it into a numpy array and plotting it
Presumably the purpose of your edit is to correct some header or value.
You don't need to write the file back. You can use content directly in loadtxt.
content = content.replace("nodenumber", "#nodenumber")  # prefix '#' so loadtxt skips the header line
data1 = np.loadtxt(content.splitlines())
Y = data1[:, 2]
temp = data1[:, 5]
loadtxt accepts anything that feeds it lines one by one; content.splitlines() makes a list of lines, which loadtxt can use.
The load could be more compact with:
Y, temp = np.loadtxt(content.splitlines(), usecols=(2,5), unpack=True)
With usecols you might not even need the replace step. You haven't given us a sample file to test.
I don't understand your multiple-file needs. One way or the other, you need to open and read each file, one by one, and it is best to close one before going on to the next. The with open(name) as f: syntax is great for ensuring that a file is closed.
You could collect the loaded data into larger lists or arrays. If Y and temp are identical in size for all files, they can be collected into a higher-dimensional array, e.g. YY[i, :] = Y for the i-th file, where YY is preallocated. If they can vary in size, it is better to collect them in lists, as in the sketch below.
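For example, a minimal sketch of that collection pattern (path, file names, and column indices taken from the question and the earlier snippet; the stacking at the end assumes every file has the same number of rows):

import os
import numpy as np

path = 'E:/Fall2015/Research/CFDSimulations_Fall2015/ddn310/Autoexport/v1'
filenames = ['v1-0520.txt', 'v1-0878.txt', 'v1-1592.txt', 'v1-3020.txt', 'v1-5878.txt']

all_Y, all_temp = [], []
for name in filenames:
    with open(os.path.join(path, name)) as f:
        Y, temp = np.loadtxt(f, usecols=(2, 5), skiprows=1, unpack=True)
    all_Y.append(Y)
    all_temp.append(temp)

# If every file has the same number of rows, stack into 2-D arrays:
YY = np.stack(all_Y)       # YY[i, :] is Y for the i-th file
TT = np.stack(all_temp)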