Storing a Python lmfit object in an HDF5 file

I'm trying to store the results of a fit made with the lmfit package for Python in an HDF5 file using the h5py package.
Currently I find myself recreating the structure of the data object by hand (i.e. looping over all keys in the dictionary, getting the values, and saving them).
I have the feeling there has to be a more efficient/Pythonic way of saving such an object to an HDF5 file, similar to how pickling an object works.
Could anyone help me find a way to efficiently store the information contained in an lmfit.model.ModelFit or lmfit.parameter.Parameters object in an HDF5 file?
Edited to show the code currently used:
def add_analysis_datagroup_to_file(self, group_name='Analysis'):
    try:
        self.analysis_group = self.f.create_group(group_name)
    except ValueError:
        print 'Datagroup name "%s" already exists in hdf5 file' % group_name
        self.analysis_group = self.f[group_name]

def save_fitted_parameters(self, fit_results=None):
    if fit_results is None:
        fit_results = self.fit_results
    try:
        fit_grp = self.analysis_group.create_group('Fitted Params')
    except:
        fit_grp = self.analysis_group['Fitted Params']
    for parname, par in self.fit_results.params.iteritems():
        try:
            par_group = fit_grp.create_group(parname)
        except:
            par_group = fit_grp[parname]
        par_dict = vars(par)
        for val_name, val in par_dict.iteritems():
            if val_name == '_val':
                val_name = 'value'
            if val_name == 'correl' and val is not None:
                try:
                    correl_group = par_group.create_group(val_name)
                except:
                    correl_group = par_group[val_name]
                for cor_name, cor_val in val.iteritems():
                    correl_group.attrs.create(name=cor_name, data=cor_val)
            else:
                try:
                    par_group.attrs.create(name=val_name, data=val)
                except:
                    pass

This is quite an old post, but I just had the same problem, so hopefully this answer will help someone...
You can use the built-in dumps() method of lmfit's ModelResult class to serialize the result to a JSON string, which can then be saved in an HDF5 file as a variable-length string. You can also use a list comprehension to store an array of JSON strings if you want to keep multiple fit results in one dataset. The Parameters class and the Model class have dumps() methods as well.
To reload, use loads() (again, Parameters and Model objects can be restored the same way, as they also have loads() methods):
import h5py
import numpy as np

f = h5py.File('fit_result_example.hdf5', 'w')
grp = f.create_group('group1')
dt = h5py.special_dtype(vlen=str)  # variable-length string dtype
fit_results = np.asarray([fit_results.dumps()], dtype=dt)
grp.create_dataset('fit_results', data=fit_results)
f.close()
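For the reload direction, here is a minimal sketch, assuming the stored string came from Parameters.dumps() (a full ModelResult can be restored analogously via its loads() method):
import h5py
import lmfit

# read the JSON string back out of the HDF5 file
with h5py.File('fit_result_example.hdf5', 'r') as f:
    json_str = f['group1/fit_results'][0]
    if isinstance(json_str, bytes):  # h5py 3.x returns bytes for vlen strings
        json_str = json_str.decode('utf-8')

# rebuild a Parameters object from its dumps() output
params = lmfit.Parameters()
params.loads(json_str)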

Related

json tree iteration missing return condition

I'm iterating through a nested JSON tree with a Pandas dataframe. The issue I'm having is more or less simple to solve, but I'm out of ideas. When I'm traversing through the nested JSON tree, I get to a part where I can't get out of it and continue on another branch (i.e. when I reach Placeholder 1, I can't return and continue with Placeholder 2 (see the JSON below)). Here is my code so far:
def recursiveImport(df):
    for row, _ in enumerate(df):
        # Get ID, Name, Type
        id = df['ID'].values[row]
        name = df['Name'].values[row]
        type = df['Type'].values[row]
        # Iterate through Value
        if type == 'struct':
            for i in df.at[row, 'Value']:
                df = pd.json_normalize(i)
                recursiveImport(df)
        elif type != 'struct':
            value = df['Value'].values[row]
            print(f'Value: {value}')
            return

data = pd.read_json('work_gmt.json', orient='records')
print(data)
recursiveImport(data)
And the (minified) data I'm using for this is below (you can use an online JSON viewer to get a better look):
[{"ID":11,"Name":"Data","Type":"struct","Value":[[{"ID":0,"Name":"humidity","Type":"u32","Value":0},{"ID":0,"Name":"meta","Type":"struct","Value":[{"ID":0,"Name":"height","Type":"e32","Value":[0,0]},{"ID":0,"Name":"voltage","Type":"u16","Value":0},{"ID":0,"Name":"Placeholder 1","Type":"u16","Value":0}]},{"ID":0,"Name":"Placeholder 2","Type":"struct","Value":[{"ID":0,"Name":"volume","Type":"struct","Value":[{"ID":0,"Name":"volume profile","Type":"struct","Value":[{"ID":0,"Name":"upper","Type":"u8","Value":0},{"ID":0,"Name":"middle","Type":"u8","Value":0},{"ID":0,"Name":"down","Type":"u8","Value":0}]}]}]}]]}]
I tried using an indexed approach and keeping track of each branch, but that didn't work for me. Perhaps I have to use a stack/queue to keep track? Thanks in advance!
Cheers!
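The `return` inside the `elif` branch exits the whole function after the first leaf, so siblings such as Placeholder 2 are never reached. A minimal plain-Python sketch (dropping pandas, and assuming the JSON structure shown above) that visits every branch:
import json

def walk(nodes, depth=0):
    # visit every node; no early return, so siblings such as
    # "Placeholder 2" are still reached after a nested struct
    for node in nodes:
        if node['Type'] == 'struct':
            value = node['Value']
            # struct values are sometimes wrapped in an extra list layer
            children = value[0] if value and isinstance(value[0], list) else value
            walk(children, depth + 1)
        else:
            print('  ' * depth + f"{node['Name']}: {node['Value']}")

with open('work_gmt.json') as fh:
    walk(json.load(fh))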

How to store and load a Python dictionary with HDF5

I'm having issues loading a dictionary (string keys and array/list values) from an HDF5 file (I think storing works – a file is created and contains data). I'm receiving the following error:
ValueError: malformed node or string: <HDF5 dataset "dataset_1": shape (), type "|O">
My code is:
import ast
import h5py
import numpy as np

def store_table(self, filename):
    table = dict()
    table['test'] = list(np.zeros(7, dtype=int))
    with h5py.File(filename, "w") as file:
        file.create_dataset('dataset_1', data=str(table))
    file.close()

def load_table(self, filename):
    file = h5py.File(filename, "r")
    data = file.get('dataset_1')
    print(ast.literal_eval(data))
I've read online that using ast's literal_eval method should work, but it doesn't appear to help... How do I 'unpack' the HDF5 dataset so it's a dictionary again?
Any ideas would be appreciated.
It's not clear to me what you really want to accomplish. (I suspect your dictionaries have more than seven zeros; otherwise HDF5 is overkill for storing your data.) If you have a lot of very large dictionaries, it would be better to convert the data to a NumPy array and then either 1) create and load the dataset with data= or 2) create the dataset with an appropriate dtype and then populate it. You can create datasets with mixed datatypes, which is not addressed in the previous solution. If those situations don't apply, you might want to save the dictionary as attributes. Attributes can be associated with a group, a dataset, or the file object itself. Which is best depends on your requirements.
I wrote a short example to show how to load dictionary key/value pairs as attribute names/value pairs tagged to a group. For this example, I assumed the dictionary has a name key with the group name for association. The process is almost identical for a dataset or file object (just change the object reference).
import h5py

def load_dict_to_attr(h5f, thisdict):
    if 'name' not in thisdict:
        print('Dictionary missing name key. Skipping function.')
        return
    dname = thisdict.get('name')
    if dname in h5f:
        print('Group:' + dname + ' exists. Skipping function.')
        return
    else:
        grp = h5f.create_group(dname)
        for key, val in thisdict.items():
            grp.attrs[key] = val

###########################################
def get_grp_attrs(name, node):
    grp_dict = {}
    for k in node.attrs.keys():
        grp_dict[k] = node.attrs[k]
    print(grp_dict)

###########################################
car1 = dict(name='my_car', brand='Ford', model='Mustang', year=1964,
            engine='V6', disp=260, units='cu.in')
car2 = dict(name='your_car', brand='Chevy', model='Camaro', year=1969,
            engine='I6', disp=250, units='cu.in')
car3 = dict(name='dads_car', brand='Mercedes', model='350SL', year=1972,
            engine='V8', disp=4520, units='cc')
car4 = dict(name='moms_car', brand='Plymouth', model='Voyager', year=1989,
            engine='V6', disp=289, units='cu.in')
a_truck = dict(brand='Dodge', model='RAM', year=1984,
               engine='V8', disp=359, units='cu.in')

garage = dict(my_car=car1,
              your_car=car2,
              dads_car=car3,
              moms_car=car4,
              a_truck=a_truck)

with h5py.File('SO_61226773.h5', 'w') as h5w:
    for car in garage:
        print('\nLoading dictionary:', car)
        load_dict_to_attr(h5w, garage.get(car))

with h5py.File('SO_61226773.h5', 'r') as h5r:
    print('\nReading dictionaries from Group attributes:')
    h5r.visititems(get_grp_attrs)
If I understand what you are trying to do, this should work:
import numpy as np
import ast
import h5py

def store_table(filename):
    table = dict()
    table['test'] = list(np.zeros(7, dtype=int))
    with h5py.File(filename, "w") as file:
        file.create_dataset('dataset_1', data=str(table))

def load_table(filename):
    file = h5py.File(filename, "r")
    data = file.get('dataset_1')[...].tolist()
    file.close()
    return ast.literal_eval(data)

filename = "file.h5"
store_table(filename)
data = load_table(filename)
print(data)
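One caveat worth flagging: with h5py 3.x, scalar string datasets come back as bytes rather than str, and ast.literal_eval needs a str. A short sketch of the extra decode step, under that assumption:
import ast
import h5py

with h5py.File("file.h5", "r") as f:
    raw = f["dataset_1"][()]       # scalar string dataset
    if isinstance(raw, bytes):     # h5py >= 3 returns bytes here
        raw = raw.decode("utf-8")
    table = ast.literal_eval(raw)  # back to a Python dict
print(table)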
My preferred solution is just to convert the dictionary to ASCII and then store that binary data:
import h5py
import json
import itertools

# generate a test dictionary
testDict = {
    "one": 1,
    "two": 2,
    "three": 3,
    "otherStuff": [{"A": "A"}]
}

testFile = h5py.File("test.h5", "w")
# create a test dataset containing the binary representation of the dictionary data
testFile.create_dataset(name="dictionary",
                        shape=(len([i.encode("ascii", "ignore") for i in json.dumps(testDict)]), 1),
                        dtype="S10",
                        data=[i.encode("ascii", "ignore") for i in json.dumps(testDict)])
testFile.close()

testFile = h5py.File("test.h5", "r")
# load the test data back
dictionary = testFile["dictionary"][:].tolist()
dictionary = list(itertools.chain(*dictionary))
dictionary = json.loads(b''.join(dictionary))
The two key parts are:
testFile.create_dataset(name="dictionary",
                        shape=(len([i.encode("ascii", "ignore") for i in json.dumps(testDict)]), 1),
                        dtype="S10",
                        data=[i.encode("ascii", "ignore") for i in json.dumps(testDict)])
Where
data=[i.encode("ascii", "ignore") for i in json.dumps(testDict)]
converts the dictionary to a list of ASCII characters (the string shape may also be calculated from this).
Decoding back from the HDF5 container is a little simpler:
dictionary = testFile["dictionary"][:].tolist()
dictionary = list(itertools.chain(*dictionary))
dictionary = json.loads(b''.join(dictionary))
All this does is load the string from the HDF5 container and convert it to a list of bytes. Then I coerce this into a bytes object, which I can convert back to a dictionary with json.loads.
If you are OK with the extra library usage (json, itertools), I think this offers a somewhat more Pythonic solution (which in my case wasn't a problem, since I was using them anyway).
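A lighter variant of the same idea, sketched here on the assumption that a single variable-length string is acceptable, is to let h5py store the whole JSON document in one cell instead of one character per row:
import h5py
import json

testDict = {"one": 1, "two": 2, "three": 3, "otherStuff": [{"A": "A"}]}

# store the whole JSON document as one variable-length string
with h5py.File("test_vlen.h5", "w") as f:
    dt = h5py.special_dtype(vlen=str)
    f.create_dataset("dictionary", shape=(1,), dtype=dt,
                     data=[json.dumps(testDict)])

# load it back and parse
with h5py.File("test_vlen.h5", "r") as f:
    raw = f["dictionary"][0]
    if isinstance(raw, bytes):  # h5py 3.x returns bytes
        raw = raw.decode("utf-8")
    restored = json.loads(raw)
print(restored)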

Defining variables in a loop based on a dictionary

I am converting DICOMs to PNGs with Python 3.x and Pydicom. There are occasional errors when reading DICOM header tags, causing the script to crash. Until now, I worked around it by using exceptions like below:
try:
    studyd = ds.StudyDate
except:
    studyd = ''
    pass
...
This repetitive approach lengthens the code.
Unfortunately, I have failed to optimize it by defining a dictionary that maps the Pydicom header tags to the target variables and looping through it. How could I do this with something like:
ds = pydicom.dcmread()
tags = {'StudyDate': 'studyd', 'Modality': 'modal', 'PatientName': 'patname', etc.}
for key, val in tags.items():
    ...
Try this:
ds = pydicom.dcmread()
tags = {'StudyDate': 'studyd', 'Modality': 'modal', 'PatientName': 'patname', etc.}
header_dict = dict()
for key, val in tags.items():
    # default to '' so a missing header tag no longer raises AttributeError
    header_dict[val] = getattr(ds, key, '')
print(header_dict)
This uses getattr to read each header value and stores it in a dict under the specified name.
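As a side note, pydicom's Dataset also offers a dict-style get() with a default, which collapses the loop into a comprehension; a small sketch (the file path here is hypothetical):
import pydicom

ds = pydicom.dcmread("example.dcm")  # hypothetical input file
tags = {'StudyDate': 'studyd', 'Modality': 'modal', 'PatientName': 'patname'}

# Dataset.get(keyword, default) behaves like getattr with a fallback
header = {val: ds.get(key, '') for key, val in tags.items()}
print(header)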

How to read HDF5 files in Python

I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data within it.
My code
import h5py
import numpy as np
f1 = h5py.File(file_name,'r+')
This works and the file is read. But how can I access the data inside the file object f1?
Read HDF5
import h5py

filename = "file.hdf5"
with h5py.File(filename, "r") as f:
    # Print all root-level object names (aka keys);
    # these can be group or dataset names
    print("Keys: %s" % f.keys())
    # get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]

    # get the object type for a_group_key: usually group or dataset
    print(type(f[a_group_key]))

    # If a_group_key is a group name,
    # this gets the object names in the group and returns them as a list
    data = list(f[a_group_key])

    # If a_group_key is a dataset name,
    # this gets the dataset values and returns them as a list
    data = list(f[a_group_key])

    # preferred methods to get dataset values:
    ds_obj = f[a_group_key]      # returns a h5py dataset object
    ds_arr = f[a_group_key][()]  # returns a numpy array
Write HDF5
import h5py
import numpy as np

# Create random data
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)
See h5py docs for more information.
Alternatives
JSON: Nice for writing human-readable data; VERY commonly used (read & write)
CSV: Super simple format (read & write)
pickle: A Python serialization format (read & write)
MessagePack (Python package): More compact representation (read & write)
HDF5 (Python package): Nice for matrices (read & write)
XML: exists too *sigh* (read & write)
For your application, the following might be important:
Support by other programming languages
Reading / writing performance
Compactness (file size)
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python
Reading the file
import h5py
f = h5py.File(file_name, mode)
Studying the structure of the file by printing what HDF5 groups are present
for key in f.keys():
    print(key)  # Names of the root-level objects in the HDF5 file - can be groups or datasets
    print(type(f[key]))  # get the object type: usually group or dataset
Extracting the data
# Get the HDF5 group; key needs to be a group name from above
group = f[key]

# Check out what keys are inside that group
for key in group.keys():
    print(key)

# This assumes group[some_key_inside_the_group] is a dataset,
# and returns a np.array:
data = group[some_key_inside_the_group][()]

# Do whatever you want with data

# After you are done
f.close()
You can use Pandas:
import pandas as pd
pd.read_hdf(filename, key)
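A small round-trip sketch (the filename and key are made up here), with the caveat that pd.read_hdf expects files written in the pandas/PyTables format (e.g. via DataFrame.to_hdf, which needs the tables package installed) rather than arbitrary h5py files:
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
df.to_hdf("pandas_store.h5", key="my_table")     # writes PyTables format
df2 = pd.read_hdf("pandas_store.h5", "my_table")
print(df2)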
Here's a simple function I just wrote which reads a .hdf5 file generated by the save_weights function in keras and returns a dict with layer names and weights:
import h5py

def read_hdf5(path):
    weights = {}
    keys = []
    with h5py.File(path, 'r') as f:  # open file
        f.visit(keys.append)  # append all keys to list
        for key in keys:
            if ':' in key:  # contains data if ':' in key
                print(f[key].name)
                # note: .value was removed in h5py 3.0; use f[key][()] there
                weights[f[key].name] = f[key].value
    return weights
https://gist.github.com/Attila94/fb917e03b04035f3737cc8860d9e9f9b.
Haven't tested it thoroughly, but it does the job for me.
To read the content of a .hdf5 file as an array, you can do something like the following:
import numpy as np
myarray = np.fromfile('file.hdf5', dtype=float)
print(myarray)
Note, however, that np.fromfile reads raw bytes and knows nothing about the HDF5 layout, so in general it will not decode an HDF5 file correctly; prefer h5py (as shown above) for real HDF5 files.
Use the code below to read the data and convert it into a numpy array:
import h5py
import numpy as np

f1 = h5py.File('data_1.h5', 'r')
list(f1.keys())
X1 = f1['x']
y1 = f1['y']
df1 = np.array(X1.value)   # note: .value was removed in h5py 3.0; use X1[()] there
dfy1 = np.array(y1.value)
print(df1.shape)
print(dfy1.shape)
Preferred method to read dataset values into a numpy array:
import h5py

# use Python file context manager:
with h5py.File('data_1.h5', 'r') as f1:
    print(list(f1.keys()))  # print list of root level objects
    # following assumes 'x' and 'y' are dataset objects
    ds_x1 = f1['x']  # returns h5py dataset object for 'x'
    ds_y1 = f1['y']  # returns h5py dataset object for 'y'
    arr_x1 = f1['x'][()]  # returns np.array for 'x'
    arr_y1 = f1['y'][()]  # returns np.array for 'y'
    arr_x1 = ds_x1[()]  # uses dataset object to get np.array for 'x'
    arr_y1 = ds_y1[()]  # uses dataset object to get np.array for 'y'
    print(arr_x1.shape)
    print(arr_y1.shape)
# this applies only to files saved as full Keras models via model.save()
from keras.models import load_model
h = load_model('FILE_NAME.h5')
If you have named datasets in the HDF5 file, then you can use the following code to read those datasets and convert them into numpy arrays:
import h5py
import numpy as np

file = h5py.File('filename.h5', 'r')
xdata = file.get('xdata')
xdata = np.array(xdata)
If your file is in a different directory, you can add the path in front of 'filename.h5'.
What you need to do is create a dataset. If you take a look at the quickstart guide, it shows you that you need to use the file object in order to create a dataset. So, f.create_dataset and then you can read the data. This is explained in the docs.
Using bits of answers from this question and the latest doc, I was able to extract my numerical arrays using
import h5py

with h5py.File(filename, 'r') as h5f:
    h5x = h5f[list(h5f.keys())[0]]['x'][()]
Where 'x' is simply the X coordinate in my case.
Use this; it works fine for me:
import h5py

def read_hdf5():
    weights = {}
    keys = []
    with h5py.File("path.h5", 'r') as f:
        f.visit(keys.append)
        for key in keys:
            if ':' in key:
                print(f[key].name)
                weights[f[key].name] = f[key][()]
    return weights

print(read_hdf5())
If you are using h5py <= 2.9.0, then you can use:
import h5py

def read_hdf5():
    weights = {}
    keys = []
    with h5py.File("path.h5", 'r') as f:
        f.visit(keys.append)
        for key in keys:
            if ':' in key:
                print(f[key].name)
                weights[f[key].name] = f[key].value
    return weights

print(read_hdf5())

What is the best way to get values from a config file? [closed]

I have 15 values that I want to get from a config file and store them in separate variables.
I am using
from ConfigParser import SafeConfigParser
parser = SafeConfigParser()
parser.read(configFile)
and it is a really good library.
Option #1
If I rename a variable and want it to match the config-file entry, I have to edit the corresponding line in the function:
def fromConfig():
    # open file
    localOne = parser.get(section, 'one')
    localTwo = parser.get(section, 'two')
    return localOne, localTwo

one = ''
two = ''
# etc
one, two = fromConfig()
Option #2
It is cleaner to see where each variable gets its value from, but then I would be opening and closing the file for every variable:
def getValueFromConfigFile(option):
    # open file
    value = parser.get(section, option)
    return value

one = getValueFromConfigFile("one")
two = getValueFromConfigFile("two")
Option #3
This one doesn't make much sense, since I have to keep another list of all my variable names, but the function is cleaner:
def getValuesFromConfigFile(options):
    # open file
    values = []
    for option in options:
        values.append(parser.get(section, option))
    return values

one = ''
two = ''
configList = ["one", "two"]
one, two = getValuesFromConfigFile(configList)
EDIT:
Here is my attempt at reading the file once and storing all values in a dict, then using those values. I have a multi-line string and I use %(nl)s as a newline placeholder, so when I get the value:
message = parser.get(section, 'message', vars={'nl': '\n'})
Here is my code:
from ConfigParser import SafeConfigParser

def getValuesFromConfigFile(configFile):
    ''' reads a single section of a config file as a dict '''
    parser = SafeConfigParser()
    parser.read(configFile)
    section = parser.sections()[0]
    options = dict(parser.items(section))
    return options

options = getValuesFromConfigFile(configFile)
one = options["one"]
To get values from a single section as a dict:
options = dict(parser.items(section))
You could access individual values as usual: options["one"], options["two"]. In Python 3.2+ configparser provides dict-like access by itself.
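For instance, a short Python 3 sketch (the section and file names here are made up):
import configparser

parser = configparser.ConfigParser()
parser.read("settings.ini")          # hypothetical config file
one = parser["main"]["one"]          # mapping-style access
two = parser["main"].get("two", "")  # with a fallback default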
For flexibility, to support updating config from a variety of source formats and/or centralize configuration management; you could define custom class that encapsulates parsing/access to config variables e.g.:
class Config(object):
    # ..
    def update_from_ini(self, inifile):
        # read file and merge one section into the instance namespace
        parser = SafeConfigParser()
        parser.read(inifile)
        section = parser.sections()[0]
        self.__dict__.update(parser.items(section))
Individual values are available as instance attributes in this case: config.one, config.two.
A solution could also be to use dictionaries and JSON, which can make things very easy and reusable:
import json

def saveJson(fName, data):
    f = open(fName, "w+")
    f.write(json.dumps(data, indent=4))
    f.close()

def loadJson(fName):
    f = open(fName, "r")
    data = json.loads(f.read())
    f.close()
    return data

mySettings = {
    "one": "bla",
    "two": "blabla"
}

saveJson("mySettings.json", mySettings)
myLoadedSettings = loadJson("mySettings.json")
print myLoadedSettings["two"]
As a possible solution:
module_variables = globals() # represents the current global symbol table
for name in ('one', 'two'):
    module_variables[name] = parser.get(section, name)
print one, two
