Read dictionary from file - python

Background (optional)
I am writing a Python script to analyse Abaqus (finite element software) outputs. This software generates an ".odb" file which has a proprietary format. However, you can access the data stored inside the database thanks to Python libraries specially developed by Dassault (owner of the Abaqus software). The script has to be run by the software in order to access these libraries:
abaqus python myScript.py
However, it is really hard to use new libraries this way, and I cannot make matplotlib run. So I would like to export the array I created to a file, and access it later with another script that would not need to be run through Abaqus.
The Problem
In order to manipulate the data, I am using collections. For example:
from collections import defaultdict
coord2s11 = defaultdict(list)
This structure stores the Z coordinate of a group of nodes and their stress value at each time step:
coord2s11[time_step][node_number][0]=z_coordinate
coord2s11[time_step][node_number][1]=stress_value
For a given time step, the output would be:
defaultdict(<type 'list'>, {52101: [-61.83229635920749, 0.31428813934326172], 52102: [-51.948098314163417, 0.31094224750995636],[...], 52152: [440.18335942363655, -0.11255115270614624]})
And glob (for all time steps):
defaultdict(<type 'list'>, {0.0: defaultdict(<type 'list'>, {52101: [0.0, 0.0],[...]}), 12.660835266113281: defaultdict(<type 'list'>, {52101: [0.0, 0.0],[...],52152: [497.74876378582229, -0.24295337498188019]})})
It may be visually unpleasant, but it is rather easy to use! I wrote this structure to a file using:
with open('node2coord.dat', 'w') as f:
    f.write(str(glob))
I tried to follow the solution I found on this post, but when I try to read the file and store the values inside a new dictionary:
import ast
with open('node2coord.dat', 'r') as f:
    s = f.read()
node2coord = ast.literal_eval(s)
I end up with SyntaxError: invalid syntax, which I guess comes from the defaultdict(<type 'list'>, ...) fragments scattered through the file.
Is there a way to get the data stored inside the file, or should I modify the way it is written? Ideally I would like to recreate the exact same structure I stored.
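Note that the ast.literal_eval approach can be made to work by converting the nested defaultdicts to plain dicts before writing, so the file contains only literal Python syntax. A minimal sketch, with placeholder data standing in for the real glob:

from collections import defaultdict
import ast

def to_plain(d):
    # recursively convert (default)dicts to plain dicts so str() emits only literals
    return {k: to_plain(v) for k, v in d.items()} if isinstance(d, dict) else d

glob = defaultdict(lambda: defaultdict(list))  # stand-in for the real data
glob[0.0][52101] = [-61.83229635920749, 0.31428813934326172]

with open('node2coord.dat', 'w') as f:
    f.write(str(to_plain(glob)))

with open('node2coord.dat', 'r') as f:
    node2coord = ast.literal_eval(f.read())  # now parses cleanly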
The solution by Joel Johnson
Creating a database using shelve. It is an easy and fast method. The following code did the trick for me to create the db:
import os
import shelve
curdir = os.path.dirname(__file__) #defining current directory
d = shelve.open(os.path.join(curdir, 'nameOfTheDataBase')) #creation of the db
d['keyLabel'] = glob # storing the dictionary in "d" under the key 'keyLabel'
d.close() # close the db
The with statement did not work for me (Abaqus ships Python 2, where shelve objects only gained context-manager support in Python 3.4).
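If you still want automatic cleanup on Python 2, contextlib.closing should give the same guarantee; a minimal sketch, with a placeholder standing in for glob:

import contextlib
import os
import shelve

glob = {'example': [0.0, 0.0]}  # placeholder for the real nested dictionary

curdir = os.path.dirname(__file__)
with contextlib.closing(shelve.open(os.path.join(curdir, 'nameOfTheDataBase'))) as d:
    d['keyLabel'] = glob  # the shelf is closed automatically on exit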
And then to open it again:
import os
import shelve
curdir = os.path.dirname(__file__)
d = shelve.open(os.path.join(curdir, 'nameOfTheDataBase')) #opening the db
newDictionary = d['keyLabel'] #loading the dictionary inside of newDictionary
d.close()
If you ever get an error saying
ImportError: No module named gdbm
Just install the gdbm module. For Linux:
sudo apt-get install python-gdbm
More information here

If you have access to shelve (which you should, because it's part of the standard library), I would highly recommend using that. shelve is an easy way to store and load Python objects without manually parsing and reconstructing them.
import shelve
with shelve.open('myData') as s:
    s["glob"] = glob
That's it for storing the data. Then, when you need to retrieve it...
import shelve
with shelve.open('myData') as s:
    glob = s["glob"]
It's as simple as that.

Related

Is there any feasible solution to read WOT battle results .dat files?

I am new here and trying to solve an interesting question about World of Tanks. I have heard that the data for every battle is stored on the client's disk in the Wargaming.net folder, and I want to do batch analysis of our clan's battle performances.
It is said that these .dat files are a kind of JSON file, so I tried a couple of lines of Python to read one, but failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and obviously fails. Well, could anyone tell me what is actually going on?
Added on Feb. 9th, 2022
After trying another snippet in a Jupyter Notebook, it seems something can be read from the .dat files:
import struct
import numpy as np
import matplotlib.pyplot as plt
import io
with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
    print('byte length: ', N)

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b' * N, f.read(1 * N))
The result is a tuple of integers, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib
file = '4402905758116487.dat'
cache_file = open(file, 'rb') # This can be improved to not keep the file opened.
# To load pickles written by Python 2 under Python 3 you need the "bytes" or "latin1" encoding.
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw
# The data stored inside the pickled file will be a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')
# Lastly you can print all of these and see a lot of data inside.
The result contains a mixture of further binary blobs as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
First you can look at the replay file itself in a text editor. But it won't show the code at the beginning of the file that has to be cleaned out. Then there is a ton of info that you have to read in and figure out but it is the stats for each player in the game. THEN it comes to the part that has to do with the actual replay. You don't need that stuff.
You can grab the player IDs and tank IDs from WoT developer area API if you want.
After loading the pickle files as gabzo mentioned, you will see that it is simply a list of values, and without knowing what each value refers to, it's hard to make sense of it. The identifiers for the values can be extracted from your game installation:
import zipfile
WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"
archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')
for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the Python files (with uncompyle6) and go through the code to see the identifiers for the values.
One thing to note is that the list of values for the main pickle objects (like brAccount from gabzo's code) always has a checksum as its first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled Python files.
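To illustrate how these pieces fit together, here is a hypothetical sketch (the field names and values below are made up, not the real identifiers) of pairing extracted identifiers with the unpickled values while skipping the leading checksum:

# hypothetical field names, as they might be extracted from the decompiled
# battle_results scripts -- not the real identifiers
field_names = ['damageDealt', 'shots', 'kills']
# hypothetical unpickled list; by convention the first element is the checksum
values = [911055733, 1200, 7, 2]
checksum, payload = values[0], values[1:]
record = dict(zip(field_names, payload))
print(record)  # {'damageDealt': 1200, 'shots': 7, 'kills': 2}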
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.

Python library to use .mat files [duplicate]

Is it possible to read binary MATLAB .mat files in Python?
I've seen that SciPy has alleged support for reading .mat files, but I'm unsuccessful with it. I installed SciPy version 0.7.0, and I can't find the loadmat() method.
An import is required, import scipy.io...
import scipy.io
mat = scipy.io.loadmat('file.mat')
Neither scipy.io.savemat nor scipy.io.loadmat works for MATLAB files of version 7.3. But the good part is that MATLAB version 7.3 files are HDF5 datasets, so they can be read using a number of tools, including h5py with NumPy.
For Python, you will need the h5py extension, which requires HDF5 on your system.
import numpy as np
import h5py
f = h5py.File('somefile.mat','r')
data = f.get('data/variable1')
data = np.array(data) # For converting to a NumPy array
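If you don't know the dataset paths inside the file in advance, it may help to list them first; a short sketch (somefile.mat as above):

import h5py

with h5py.File('somefile.mat', 'r') as f:
    print(list(f.keys()))                             # top-level groups/datasets
    f.visititems(lambda name, obj: print(name, obj))  # walk the full hierarchy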
First save the .mat file as:
save('test.mat', '-v7')
After that, in Python, use the usual loadmat function:
import scipy.io as sio
test = sio.loadmat('test.mat')
There is a nice package called mat4py which can easily be installed using
pip install mat4py
It is straightforward to use (from the website):
Load data from a MAT-file
The function loadmat loads all variables stored in the MAT-file into a simple Python data structure, using only Python’s dict and list objects. Numeric and cell arrays are converted to row-ordered nested lists. Arrays are squeezed to eliminate arrays with only one element. The resulting data structure is composed of simple types that are compatible with the JSON format.
Example: Load a MAT-file into a Python data structure:
from mat4py import loadmat
data = loadmat('datafile.mat')
The variable data is a dict with the variables and values contained in the MAT-file.
Save a Python data structure to a MAT-file
Python data can be saved to a MAT-file, with the function savemat. Data has to be structured in the same way as for loadmat, i.e. it should be composed of simple data types, like dict, list, str, int, and float.
Example: Save a Python data structure to a MAT-file:
from mat4py import savemat
savemat('datafile.mat', data)
The parameter data shall be a dict with the variables.
If you have MATLAB 2014b or newer installed, the MATLAB Engine for Python can be used:
import matlab.engine
eng = matlab.engine.start_matlab()
content = eng.load("example.mat", nargout=1)
Reading the file
import scipy.io
mat = scipy.io.loadmat(file_name)
Inspecting the type of MAT variable
print(type(mat))
#OUTPUT - <class 'dict'>
The keys inside the dictionary are MATLAB variables, and the values are the objects assigned to those variables.
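For example, to keep only the actual MATLAB variables and drop the metadata keys that loadmat adds (__header__, __version__, __globals__), something like this works:

import scipy.io

mat = scipy.io.loadmat('file.mat')
# drop scipy's metadata entries, keeping only the MATLAB variables
variables = {k: v for k, v in mat.items() if not k.startswith('__')}
print(list(variables))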
There is a great library for this task called pymatreader.
Just do as follows:
Install the package: pip install pymatreader
Import the relevant function of this package: from pymatreader import read_mat
Use the function to read the matlab struct: data = read_mat('matlab_struct.mat')
Use data.keys() to locate where the data is actually stored.
The keys will usually look like: dict_keys(['__header__', '__version__', '__globals__', 'data_opp']), where data_opp is the actual key which stores the data. The name of this key can of course change between different files.
Last step - create your dataframe: my_df = pd.DataFrame(data['data_opp']) (this assumes pandas is imported as pd).
That's it :)
There is also the MATLAB Engine for Python by MathWorks itself. If you have MATLAB, this might be worth considering (I haven't tried it myself but it has a lot more functionality than just reading MATLAB files). However, I don't know if it is allowed to distribute it to other users (it is probably not a problem if those persons have MATLAB. Otherwise, maybe NumPy is the right way to go?).
Also, if you want to do all the basics yourself, MathWorks provides detailed documentation on the structure of the file format (if the link changes, try googling matfile_format.pdf or its title, MAT-FILE Format). It's not as complicated as I personally thought, but obviously this is not the easiest way to go. It also depends on how many features of the .mat files you want to support.
I've written a "small" (about 700 lines) Python script which can read some basic .mat-files. I'm neither a Python expert nor a beginner and it took me about two days to write it (using the MathWorks documentation linked above). I've learned a lot of new stuff and it was quite fun (most of the time). As I've written the Python script at work, I'm afraid I cannot publish it... But I can give some advice here:
First read the documentation.
Use a hex editor (such as HxD) and look into a reference .mat-file you want to parse.
Try to figure out the meaning of each byte by saving the bytes to a .txt file and annotating each line.
Use classes to save each data element (such as miCOMPRESSED, miMATRIX, mxDOUBLE, or miINT32)
The .mat-files' structure is optimal for saving the data elements in a tree data structure; each node has one class and subnodes
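As a concrete starting point, here is a minimal sketch of reading the 128-byte Level 5 header described in the MathWorks documentation above ('reference.mat' is a placeholder for whatever file you are inspecting):

import struct

with open('reference.mat', 'rb') as f:
    header = f.read(128)                  # the Level 5 header is 128 bytes
    text = header[:116].rstrip(b'\x00 ')  # human-readable description
    version, = struct.unpack('<H', header[124:126])
    endian = header[126:128]              # b'IM' if written little-endian
    print(text)
    print(hex(version), endian)           # typically 0x100 b'IM'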
To read a mat file into a pandas DataFrame with mixed data types:
import pandas as pd
import scipy.io as sio

mat = sio.loadmat('file.mat')  # load mat-file
mdata = mat['myVar']  # variable in mat file
ndata = {n: mdata[n][0, 0] for n in mdata.dtype.names}
columns = [n for n, v in ndata.items() if v.size == 1]
d = dict((c, ndata[c][0]) for c in columns)
df = pd.DataFrame.from_dict(d)
display(df)  # display() is available in IPython/Jupyter
Apart from scipy.io.loadmat for v4 (Level 1.0), v6, and v7 to 7.2 mat-files, and h5py.File for 7.3-format mat-files, there is another type of mat-file in text data format instead of binary, usually created by Octave, which can't even be read in MATLAB.
Neither scipy.io.loadmat nor h5py.File can load them (tested on scipy 1.5.3 and h5py 3.1.0), and the only solution I found is numpy.loadtxt.
import numpy as np
mat = np.loadtxt('xxx.mat')
You can also use the hdf5storage library; see the official documentation for details on MATLAB version support.
import hdf5storage
label_file = "./LabelTrain.mat"
out = hdf5storage.loadmat(label_file)
print(type(out)) # <class 'dict'>
from os.path import dirname, join as pjoin
import scipy.io as sio
data_dir = pjoin(dirname(sio.__file__), 'matlab', 'tests', 'data')
mat_fname = pjoin(data_dir, 'testdouble_7.4_GLNX86.mat')
mat_contents = sio.loadmat(mat_fname)
You can use the above code to read one of the sample .mat files that ships with SciPy.
After struggling with this problem myself and trying other libraries (I have to say mat4py is a good one, but with a few limitations), I built this library ("matdata2py") that can handle most variable types and, most importantly for me, the "string" type. The .mat file needs to be saved in the -v7.3 version. I hope this can be useful for the community.
Installation:
pip install matdata2py
How to use this lib:
import matdata2py as mtp
To load the Matlab data file:
Variables_output = mtp.loadmatfile(file_Name, StructsExportLikeMatlab = True, ExportVar2PyEnv = False)
print(Variables_output.keys()) # with ExportVar2PyEnv = False the variables are as elements of the Variables_output dictionary.
With ExportVar2PyEnv = True you can see each variable separately as a Python variable with the same name as saved in the mat file.
Flag descriptions
StructsExportLikeMatlab = True/False structures are exported in dictionary format (False) or dot-based format similar to Matlab (True)
ExportVar2PyEnv = True/False export each variable separately into the Python environment (True) or keep all variables as elements of a single returned dictionary (False)
scipy works perfectly well to load .mat files, and we can use the dictionary's get() method to fetch a variable (loadmat already returns it as a NumPy array).
import scipy.io
import matplotlib.pyplot as plt

mat = scipy.io.loadmat('point05m_matrix.mat')
x = mat.get("matrix")  # the variable name stored in the file
print(type(x))
print(len(x))
plt.imshow(x, extent=[0, 60, 0, 55], aspect='auto')
plt.show()
To upload and read mat files in Python:
Install mat4py in Python. On successful installation we get:
Successfully installed mat4py-0.5.0
Import loadmat from mat4py.
Save the file's actual location inside a variable.
Load the mat file into a data value using Python:
pip install mat4py
from mat4py import loadmat
boston = r"E:\Downloads\boston.mat"
data = loadmat(boston, meta=False)

How to save multiple data at once in Python

I am running a script which takes, say, an hour to generate the data I want. I want to be able to save all of the relevant variables to some external file so I can fiddle with them later without having to run the hour-long calculation over again. Is there an easy way I can save all of the variables I need into one convenient file?
In Matlab I would just contain all of the results of the calculation in a single structure so that later I could just load results.mat and I would have everything I need stored as results.output1, results.output2 or whatever. What is the Python equivalent of this?
In particular, the data that I would like to save includes arrays of complex numbers, which seems to present difficulties for using things like json.
I suggest taking a look at the built-in shelve module, which provides a persistent, dictionary-like object and generally works with all native Python types, so you can do the following:
Write a complex number to some file (in my example it is named mydata) under key n (keep in mind that keys should be strings).
import shelve
my_number = 2+7j
with shelve.open('mydata') as db:
    db['n'] = my_number
Later, retrieve that number from the file:
import shelve
with shelve.open('mydata') as db:
    my_number = db['n']
print(my_number)  # (2+7j)
You can use the pickle module in Python: use its dump function to write all your data to a file, then load it back later. I suggest you read more about pickle.
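A minimal sketch of that approach (the file name and contents here are placeholders; note that complex numbers pickle without any extra work):

import pickle

results = {'output1': [1 + 2j, 3 + 4j], 'output2': 42.0}  # placeholder data

# dump everything to one file...
with open('results.pkl', 'wb') as f:
    pickle.dump(results, f)

# ...and load it back in a later session
with open('results.pkl', 'rb') as f:
    results = pickle.load(f)
print(results['output1'])  # [(1+2j), (3+4j)]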
I would recommend a JSON file. With json you can assign values to keys, just like dictionaries in stock Python. The json module is part of the standard library.
import json

path = "data.json"  # wherever you want the file
data = {"var1": "abcde", "var2": "fghij"}  # keys must be strings
with open(path, "w") as file:
    json.dump(data, file, indent=2, ensure_ascii=False)
You can also load this from the file using the same module:
with open(path, "r") as file:
    text = file.read()
data = json.loads(text)
Edit: json can also handle the basic Python datatypes (dict, list, str, int, float, bool, None), so if you want to save an array you can just define it in the dict (note that complex numbers, which the question asks about, are not JSON-serializable by default):
data = {"list1": ["ab", "cd", "ef"]}
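Since the question specifically mentions arrays of complex numbers, which json rejects by default, here is one possible workaround (a sketch, not part of the original answer): a custom encoder plus an object_hook that round-trips complex values as tagged [real, imag] pairs.

import json

class ComplexEncoder(json.JSONEncoder):
    def default(self, o):
        # serialize complex numbers as a tagged [real, imag] pair
        if isinstance(o, complex):
            return {'__complex__': [o.real, o.imag]}
        return json.JSONEncoder.default(self, o)

def as_complex(d):
    # turn the tagged pairs back into complex numbers on load
    if '__complex__' in d:
        re, im = d['__complex__']
        return complex(re, im)
    return d

s = json.dumps({'n': [1 + 2j, 3 + 4j]}, cls=ComplexEncoder)
print(json.loads(s, object_hook=as_complex))  # {'n': [(1+2j), (3+4j)]}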

Is it possible to store a python pickle object as a string inside a class?

I want to do some testing on a feature of my (Python) program which is computationally very heavy. I could run the code, store the output in a pandas.DataFrame, pickle the df, and distribute it with my package so that the tests can be run by users. However, I think this goes against the principles of unit testing, namely that a test should be independent of external sources and self-contained.
An alternative idea would be to store the pickle file as a string within an importable Python class, then dynamically write the pickle file and clean it up after the test. Is this possible, and if so, how can I do it?
Here's a small bit of code that simply writes a df to pickle.pickle in the current working directory.
import pickle
import os
import pandas
df = pandas.DataFrame([1,2,3,4,5,6])
filename = os.path.join(os.getcwd(), 'pickle.pickle')
df.to_pickle(filename)
Would it then be possible to somehow get a string version of the pickle so that I can store it in a class?
Would it then be possible to somehow get a string version of the pickle so that I can store it in a class?
Just read the full file:
with open(filename, "rb") as f:
    data = f.read()
Then, if you need to, you can just unpickle it with loads:
unpickled = pickle.loads(data)
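To go the rest of the way and embed those bytes in an importable class, one option (a sketch beyond what the answer above shows) is to base64-encode the pickle so it can live in source code as a plain string:

import base64
import pickle

import pandas

df = pandas.DataFrame([1, 2, 3, 4, 5, 6])
# in practice you would paste the generated string into your test module
payload = base64.b64encode(pickle.dumps(df)).decode('ascii')

class TestData(object):
    PICKLE_B64 = payload  # the pickled DataFrame as a base64 string

    @classmethod
    def load(cls):
        return pickle.loads(base64.b64decode(cls.PICKLE_B64))

restored = TestData.load()  # the same DataFrame, with no external file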

Working with JSON in Python 2.6?

I'm really new to Python, but I've picked a problem that actually pertains to work, and I think I'll learn along the way as I figure out how to do it.
I have a directory full of JSON-formatted files. I've gotten as far as importing everything in the directory into a list, and iterating through the list to do a simple print that verifies I got the data.
I'm trying to figure out how to actually work with a given JSON object in Python. In JavaScript, it's as simple as:
var x = {'asd':'bob'}
alert( x.asd ) //alerts 'bob'
Accessing the various properties on an object is simple dot notation. What's the equivalent for Python?
So this is my code that is doing the import. I'd like to know how to work with the individual objects stored in the list.
#! /usr/local/bin/python2.6
import os, json
#define path to reports
reportspath = "reports/"
# Gets all json files and imports them
dir = os.listdir(reportspath)
jsonfiles = []
for fname in dir:
    with open(reportspath + fname, 'r') as f:
        jsonfiles.append(json.load(f))

for i in jsonfiles:
    print i  # prints the contents of each file stored in jsonfiles
What you get when you json.load a file containing the JSON form of a Javascript object such as {'abc': 'def'} is a Python dictionary (normally and affectionately called a dict) (which in this case happens to have the same textual representation as the Javascript object).
To access a specific item, you use indexing, mydict['abc'], while in Javascript you'd use attribute-access notation, myobj.abc. What you get with attribute-access notation in Python are methods that you can call on your dict, for example mydict.keys() would give ['abc'], a list with all the key values that are present in the dictionary (in this case, only one, and it's a string).
Dictionaries are extremely rich in functionality, with a wealth of methods that will make your head spin plus strong support for many Python language structures (for example, you can loop on a dict, for k in mydict:, and k will step through the dictionary's keys, iteratively and sequentially).
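For example, with the object from the question (Python 2 syntax, matching the code above):

import json

mydict = json.loads('{"asd": "bob"}')
print mydict['asd']     # bob -- indexing, the Python spelling of x.asd
print mydict.keys()     # ['asd']
for k in mydict:        # looping over a dict steps through its keys
    print k, mydict[k]  # asd bob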
To access all properties, try eval() on the file contents instead, like:
import os

# define path to reports
reportspath = "reports/"

# gets all json files and imports them
filenames = os.listdir(reportspath)
for fname in filenames:
    # note: eval() executes arbitrary code; only use it on files you trust
    obj = eval(open(reportspath + fname).read())
    # now, obj is a normal python object
    print obj
    # list all properties...
    print dir(obj)
