Background:
Good day, I need to extract information from a binary file produced by a piece of equipment. The equipment comes with a MATLAB function to import the binary file. From what I understand from the manual, the binary file contains 64-bit floating point values between 0 and 1.
function phase = importData(folder, qUnit)
    % Build the path to the .PH file and read its contents as 64-bit floats
    fileName = sprintf('%s\\%s.PH', folder, qUnit);
    file = fopen(fileName, 'rb');
    fseek(file, 0, 'bof');
    phase = fread(file, inf, 'float64');
    fclose(file);
end
Question: The MATLAB function works fine, but I wish to import the data into Python. How can this be done in Python? I did some research myself and tried something like the code below, but when I do print(fileContent) to check the imported data, Python simply stops responding in Windows.
with open('sample.PH', mode='rb') as binary_file:
    fileContent = binary_file.read()
Reading a binary file like that only gives you a raw bytes object, not a useful numerical form (and printing a huge bytes object is likely what makes your console stop responding). You should use numpy.fromfile. This will give you a numerical vector with the data, similar to what you would get in MATLAB. You don't even need to open the file yourself (although you can if you want): just give it a filename and it will automatically open, read, then close the file for you.
import numpy as np
file_content = np.fromfile('sample.PH', np.float64)
Edit: here is how you can read repeated records containing multiple fields:
import numpy as np
file_content = np.fromfile('sample.PH', [('data', np.float32),
                                         ('time', np.float64)])
(I put the dtype on two lines for clarity; it could be one line if you prefer, or ten for that matter, Python doesn't care.)
This will give you the equivalent of a MATLAB struct, where one field is the data and the other is the time. You can then access the data using file_content['data'] and the time using file_content['time']. There is more information in the numpy.fromfile documentation.
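For example, a quick check of what you loaded (a sketch; the two-field layout is just the example above, your .PH file may only contain the single float64 field):

import numpy as np

file_content = np.fromfile('sample.PH', [('data', np.float32),
                                         ('time', np.float64)])
print(file_content['data'][:5])  # first five values of the 'data' field
print(file_content['time'][:5])  # first five values of the 'time' field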
Related
I am new here, trying to solve an interesting question about World of Tanks. I have heard that every battle's data is stored on the client's disk in the Wargaming.net folder, and I want to do a batch of data analysis on our clan's battle performance.
It is said that these .dat files are a kind of JSON file, so I tried a couple of lines of Python code to read one, but failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and obviously fails. Could anyone tell me what these files really are?
Added on Feb. 9th, 2022
After trying another piece of code in a Jupyter Notebook, it seems something can be read from the .dat files:
import struct
import io

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
    print('byte length: ', N)

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b' * N, f.read(1 * N))
The result is a tuple of integers, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib

file = '4402905758116487.dat'
cache_file = open(file, 'rb')  # This could be improved to not keep the file open.

# To convert pickled items from Python 2 to Python 3 you need the 'bytes' or 'latin1' encoding.
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw

# The data stored inside the pickled file is a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')

# Lastly you can print any of these and see a lot of data inside.
The result contains a mixture of further binary blobs as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
First, you can look at the replay file itself in a text editor, though the binary data at the beginning of the file has to be cleaned out. Then there is a ton of info that you have to read in and figure out; it is the stats for each player in the game. Only then comes the part that concerns the actual replay itself, and you don't need that part.
You can grab the player IDs and tank IDs from the WoT developer area API if you want.
After loading the pickle files as gabzo mentioned, you will see that they are simply lists of values, and without knowing what each value refers to, it's hard to make sense of them. The identifiers for the values can be extracted from your game installation:
import zipfile

WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"

archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')
for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the Python files (e.g. with uncompyle6) and go through the code to see the identifiers for the values.
One thing to note is that the list of values for each main pickle object (like brAccount from gabzo's code) always has a checksum as its first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled Python files.
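As a rough sketch of what that looks like (assuming account_data from gabzo's code above unpickles to a flat list whose first element is the checksum):

# Hypothetical illustration: split the checksum off the list of account values.
# The real field names and their order come from the decompiled battle_results scripts.
checksum, *values = account_data
print('checksum:', checksum)
print('number of account values:', len(values))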
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.
I am trying to save variables to a text file after running a program, then read them in a different module so I can use them again in the original program. The point of this is to produce plots of four different outcomes of the main program.
My attempt at the code:
# main program
from numpy import array

a = array([[0.05562032, 0.05386903, 0.05216994, 0.03045489, 0.03029977,
            0.03014554],
           [0.        , 0.00175129, 0.00345037, 0.15353227, 0.1536874 ,
            0.15384163]])

# save parameters to an external file
save_parameters = open('save.txt', 'w')
save_parameters.write(str(a))
save_parameters.close()
I open the txt file in a Python module and save it as a variable, after correcting it manually (replacing spaces with commas):
# new program
from numpy import array

dat = "save.txt"
b = open(dat, "r")
c = array(b.read())
In the main program I now have:

a = array([[0.05562032, 0.05386903, 0.05216994, 0.03045489, 0.03029977,
            0.03014554],
           [0.        , 0.00175129, 0.00345037, 0.15353227, 0.1536874 ,
            0.15384163]])

# save parameters to an external file
save_parameters = open('save.txt', 'w')
save_parameters.write(str(a))
save_parameters.close()
#open the variable
from program import c
from matplotlib.pyplot import figure, plot
#and try to plot it
plot(c[1][:], label ='results2')
plot(c[0][:], label ='results1')
File "/Example.py", line 606, in example
plot(c[1][:], label ='results2') #model
IndexError: too many indices for array
If you want to save an array, you can't just save it as text and expect Python to figure it out. When you read it back, you're reading text (a string), and that's all your program can know about it.
If you want to save complex objects you have several other options:
You can save text (as you do) but parse it manually when reading it back to turn it into an array. This is complex to write without bugs, and gets even more complex for anything more complicated than an array.
You can save it using pickle - while this is a good solution for almost all objects, the file created wouldn't be readable to humans, and that's perhaps not what you want.
A good middle ground is to save objects as JSON - this is standard for most basic datatypes and works beautifully for dicts, lists and tuples (but fails for more complex objects such as numpy arrays), and more importantly, it is readable to humans such as yourself.
Let's say you go with JSON. Since JSON can't serialize a numpy array directly, first convert it to a plain list with a.tolist(). You then save the list like this:
import json

with open('save.txt', 'w') as f:
    json.dump(your_object, f)
As simple as that. To read back the list:
import json

with open('save.txt', 'r') as f:
    your_new_object = json.load(f)
This is fairly simple, isn't it? Notice I used a with statement to open the files, to make sure they are closed properly; it's also simpler to write. Using pickle is fairly similar and even has the same syntax, but objects are saved as bytes and not text (so you have to use the 'rb' and 'wb' file modes to read and write, respectively).
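For completeness, here is the same round trip with pickle (a sketch, using a hypothetical save.pkl filename; note the binary modes):

import pickle

with open('save.pkl', 'wb') as f:  # write in binary mode
    pickle.dump(your_object, f)

with open('save.pkl', 'rb') as f:  # read in binary mode
    your_new_object = pickle.load(f)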
To do the same thing with a numpy array, we can use numpy.save:
np.save('save', your_numpy_array)
And we read it back (note the added .npy extension) with numpy.load:
your_array = np.load('save.npy')
In terms of readability, the resulting file is semi-readable if you open it (less than JSON, more than pickle).
I have a lot of satellite data that is two-dimensional.
(I converted the H5 data to 2D array data that does not include latitude information, and I made the Lat/Lon information data separately.)
I know the real Lat/Lon coordinates and the grid coordinates in each dataset.
How can I partially read a 2D satellite file in Python?
numpy.fromfile is usually used to read binary files, and if I use its count option I can read a binary file partially.
However, I want to skip the records at the front of the file to save memory.
For example, if I have the following 3x3 2D data:

Python:
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

I want to read just a[2][0] in Python (result = 7).
When I read the file in Fortran, I used recl and rec:

Fortran:
open(1, file='example.bin', access='direct', recl=4)  ! recl=4 means a record length of 4 bytes
read(1, rec=(lat-1)*x+lon) value                      ! record number of element (lat, lon)
close(1)
lat is the latitude position in the data (lat = 3 in the above example; indexing starts at 1 in Fortran).
lon is the longitude position in the data (lon = 1 in the above example).
x is the number of values per row (x = 3 in the above example; the array is 3x3).
This way I can read the file using only 4 bytes of memory.
I want to know a similar method in Python.
Please share any information that could save time and memory.
Thank you for reading my question.
2016.10.28.
Solution

Python:
Suppose the file name contains the values [1, 2, 3, 4, 5, 6, 7, 8, 9] stored as int8.

import numpy as np

a = np.memmap(name, dtype='int8', mode='r', shape=(1,), offset=6)
print(a[0])
# result: 7
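To generalize the offset to any (lat, lon) position (a sketch, assuming row-major int8 data with x values per row and 1-based lat/lon as in the Fortran example):

import numpy as np

lat, lon, x = 3, 1, 3                  # 1-based position, 3 values per row
itemsize = np.dtype('int8').itemsize   # 1 byte per int8 element
offset = ((lat - 1) * x + (lon - 1)) * itemsize

a = np.memmap(name, dtype='int8', mode='r', shape=(1,), offset=offset)
print(a[0])  # -> 7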
To read .h5 files:
import h5py
ds = h5py.File(filename, "r")
variable = ds['variable_name']
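Note that an h5py dataset is read lazily: indexing it reads only the requested part from disk, which also answers the partial-read question (a sketch, reusing the names above):

import h5py

with h5py.File(filename, 'r') as ds:
    variable = ds['variable_name']
    row = variable[2, :]     # reads only the third row from disk
    value = variable[2, 0]   # or a single element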
It's hard to follow your description; some proper code indentation would help overcome the English language problems.
So you have data in an H5 file. The simplest approach is to use h5py to load it into a Python/numpy session, and select the necessary data from those arrays.
But it sounds as though you have written a portion of this data to a 'plain' binary file. It might help to know how you did that. Also, in what way is it 2D?
np.fromfile reads a file as though it were 1D. Can you read this file up to some count, and with the correct dtype?
np.fromfile also accepts an open file, so I think you can open the file, use seek to skip forward, and then read count items from there. But I haven't tested that idea.
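Untested, but something along these lines (a sketch; the int8 dtype and the skip of 6 items match the example above, adjust both to your real file):

import numpy as np

with open('example.bin', 'rb') as f:
    f.seek(6 * np.dtype(np.int8).itemsize)           # skip the first 6 items
    value = np.fromfile(f, dtype=np.int8, count=1)   # read exactly one item
print(value[0])  # -> 7 for the 3x3 example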
This forum has been extremely helpful for a Python novice like me. I have generated a large number of raw data files in text format from my CFD simulation. My objective is to import these text files into Python and do some post-processing on them. This is the code I currently have:
import numpy as np
from matplotlib import pyplot as plt
import os

path = 'E:/Fall2015/Research/CFDSimulations_Fall2015/ddn310/Autoexport/v1'
filenames = ['v1-0520.txt', 'v1-0878.txt', 'v1-1592.txt', 'v1-3020.txt', 'v1-5878.txt']

for name in filenames:
    data = os.path.join(path, name)
    # X and Y are the coordinates; U, V, T, Tr are the dependent variables
    X, Y, U, V, T, Tr = np.loadtxt(data, usecols=(1, 2, 3, 4, 5, 6), skiprows=1, unpack=True)
    plt.figure(1)
    plt.plot(T, Y)

plt.legend(['vt1a', 'vtb', 'vtc', 'vtd', 'vte', 'vtf'])
plt.grid(b=True)
Is there a better way to do this, like importing all the text files (~10,000 files) into Python at once and then accessing whichever file I need for post-processing (maybe by indexing)? All the text files have the same number of columns and rows.
I am just a beginner in Python, so I would be grateful if someone could help me or point me in the right direction.
Your post needs to be edited to show proper indentation.
Based on a quick read, I think you are:
reading a file, making a small edit, and writing it back,
then loading it into a numpy array and plotting it.
Presumably the purpose of your edit is to correct some header or value.
You don't need to write the file back. You can use content directly in loadtxt.
content = content.replace("nodenumber","#nodenumber") # Ignoring Node number column
data1=np.loadtxt(content.splitlines())
Y=data1[:,2]
temp=data1[:,5]
loadtxt accepts anything that feeds it content line by line; content.splitlines() makes a list of lines, which loadtxt can use.
The load could be more compact:
Y, temp = np.loadtxt(content.splitlines(), usecols=(2,5), unpack=True)
With usecols you might not even need the replace step. You haven't given us a sample file to test.
I don't understand your multiple-file needs, but one way or another you will need to open and read each file, one by one. And it is best to close one before going on to the next. The with open(name) as f: syntax is great for ensuring that a file is closed.
You could collect the loaded data in larger lists or arrays. If Y and temp are identical in size across files, they can be collected into a larger 2D array, e.g. YY[i,:] = Y for the ith file, where YY is preallocated, as in the sketch below. If they vary in size, it is better to collect them in lists.
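A sketch of that preallocation idea (assuming every file yields Y and temp vectors of a common length n, and reusing path and filenames from the question above):

import os
import numpy as np

n = 100                              # assumed common vector length
YY = np.empty((len(filenames), n))   # one row per file
TT = np.empty((len(filenames), n))

for i, name in enumerate(filenames):
    data = os.path.join(path, name)
    Y, temp = np.loadtxt(data, usecols=(2, 5), unpack=True)
    YY[i, :] = Y
    TT[i, :] = temp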
I am trying to read a Fortran file with headers as integers and the actual data as 32-bit floats. Using numpy's fromfile('mydatafile', dtype=np.float32) reads the whole file as float32, but I need the headers to be int32 in my output file. Using scipy's FortranFile, the headers read fine:
f = FortranFile('mydatafile', 'r')
headers = f.read_ints(dtype=np.int32)
but when I do:
data = f.read_reals(dtype=np.float32)
it returns an empty array. I know it shouldn't be empty, because numpy's fromfile reads all of the data. Oddly enough, the scipy method worked for other files in my dataset, just not this one. Perhaps I'm not understanding the difference between the two read methods in numpy and scipy. Is there a way to isolate the headers (dtype=np.int32) and the data (dtype=np.float32) when reading the file with either method?
np.fromfile takes a "count" argument, which specifies how many items to read. If you know the number of integers in the header in advance, a simple way to do what you want without any type conversions is to read the header as integers, then read the rest of the file as floats:
with open('filepath', 'rb') as f:  # open in binary mode
    header = np.fromfile(f, dtype=np.int32, count=number_of_integers)
    data = np.fromfile(f, dtype=np.float32)  # continues from the current file position
@DavidTrevelyan's way is quite okay. Another way is to use the fortranfile package in combination with struct. Neither way is ideal, but then neither is scipy's FortranFile.
At least this way you can read mixed-type data. Here's an example:
from fortranfile import FortranFile
from struct import unpack

to_open = 'mydatafile'  # path to your file
with FortranFile(to_open) as fh:
    dat = fh.readRecord()
    # '=4i20d': native byte order, 4 ints followed by 20 doubles;
    # adjust this to your actual record layout
    val_list = unpack('=4i20d', dat)
You can install it with pip install fortranfile. struct is in the standard library; the (un)pack format strings are documented in the struct module documentation.
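If the record really is four ints followed by twenty doubles, splitting the resulting tuple back apart is simple (a follow-up sketch):

header = val_list[:4]  # the four int values
data = val_list[4:]    # the twenty double values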