Accessing all the data contained in an h5 file with Python

After I load an .h5 file and check its keys, is there any other data stored in the file that I might be missing? For example:
import h5py
a = '/path/to/file.h5'
a_h5 = h5py.File(a, 'r')  # open read-only
a_h5.keys()               # names of the top-level groups and datasets

From the h5py documentation, it looks like you can also do:
a_h5.values()
a_h5.items()
I don't know much about this format, but these look like additional information you can extract.
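Beyond the top level, groups can nest and every object can carry attributes. A minimal sketch of walking everything in the file with h5py's visititems() and the attrs mapping (the path is a placeholder):
import h5py

def dump(name, obj):
    # Called once for every group and dataset in the file.
    kind = 'Group' if isinstance(obj, h5py.Group) else 'Dataset'
    print(kind, name)
    for key, value in obj.attrs.items():  # attributes attached to this object
        print('  attr:', key, '=', value)

with h5py.File('/path/to/file.h5', 'r') as f:
    f.visititems(dump)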

Related

write HDF h5 dataset (via h5py) which is a mix of string and numpy list

I have the following two datasets (I have several of these tuples):
filename_string: "something"
filename_list: [1,2,3,4,5] # this is a numpy array.
I'd like to know how to write this in a compact format via h5py. The goal is for the end user to read this h5 data file and be able to deduce the list and its corresponding filename.
I am able to efficiently write the numpy list to h5, but strings seem to be a big problem: my code errors out when I include one.
Any help would be great - wasted a few hours looking for a solution!
This little scrap of code will create a dataset named something (from the variable filename_string) that contains the data in your list filename_list.
import h5py

filename_string = "something"
filename_list = [1, 2, 3, 4, 5]

with h5py.File('SO_63137136.h5', 'w') as h5f:
    # The dataset name comes from filename_string; its contents from filename_list.
    h5f.create_dataset(filename_string, data=filename_list)
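An alternative sketch, not from the original answer: keep the numbers as the dataset and attach the string as an HDF5 attribute, so the pair stays together under one fixed dataset name (the file and dataset names here are placeholders):
import numpy as np
import h5py

with h5py.File('example.h5', 'w') as h5f:
    ds = h5f.create_dataset('measurement', data=np.array([1, 2, 3, 4, 5]))
    ds.attrs['filename'] = 'something'  # the string rides along as an attribute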

How to use pickle to load data in a faster way?

Hi, I saved my data using the pickle package in Python. I did it like this:
import pickle

save = open('slidings.txt', 'wb')
pickle.dump(slidings, save)
save.close()
The data format is like this:
slidings = [dic, dic, ..., dic]
where dic = {key1: value1, key2: value2, ...}
and each value is a list.
The resulting file is 18 GB, which is very slow to load. I use the following code to load my data:
import pickle
df=open('slidings.txt','rb')
slidings=pickle.load(df)
df.close()
But it's very slow.
Is there an alternative way to do this, either a different way to save the data or a faster way to load it? Any suggestions would be much appreciated! Thank you.
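No answer appears in this excerpt, but one commonly suggested tweak is to dump with the highest pickle protocol: the binary protocols produce smaller files that load faster than protocol 0, which is the default on Python 2. A minimal sketch (the file name is a placeholder, and `slidings` is assumed to be in memory):
import pickle

with open('slidings.pkl', 'wb') as f:
    pickle.dump(slidings, f, protocol=pickle.HIGHEST_PROTOCOL)

with open('slidings.pkl', 'rb') as f:
    slidings = pickle.load(f)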

Creating *.mat file from Python without using dictionary

I have a few lists which I want to save to a *.mat file. But according to the scipy.io.savemat documentation, I need to create a dictionary with the lists and then use the command to save it to a *.mat file.
If I save it the way the docs describe, the mat file will contain a struct whose fields are the arrays I put in the dictionary. Now I have a problem here: another program (which is not editable) will load these mat files to plot some graphs from the data. That program cannot process the struct, because it is written to load a mat file and directly process the arrays in it.
So is there a way to save the mat file without using dictionaries? Please see the image for more understanding.
Thanks
This is the sample code I used to save my *.mat file:
import os
os.getcwd()
os.chdir(os.getcwd())
import scipy.io as sio

x = [1, 2, 3, 4, 5]
y = [234, 5445, 778]  # can be 1000 lists
data = {}
data['x'] = x
data['y'] = y
sio.savemat('test.mat', {'interpolated_data': data})
How about
import numpy as np
import scipy.io

scipy.io.savemat('interpolated_data_max_compare.mat',
                 {'NA1_X_order10_ACCE_ms2': np.zeros((3000, 1)),
                  'NA1_X_order10_DISP_mm': np.ones((3000, 1))})
Should work fine...
According to the code you added in your question, instead of sio.savemat('...', {'interpolated_data':data}), just save
sio.savemat('...', data)
and you should be fine: data is already a dictionary, so you don't need to add an extra level with {'interpolated_data': data} when saving.
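A minimal sketch of the question's code with that fix applied (the output file name is a placeholder): saving data directly makes x and y top-level MATLAB variables instead of fields of a struct.
import scipy.io as sio

x = [1, 2, 3, 4, 5]
y = [234, 5445, 778]
data = {'x': x, 'y': y}
sio.savemat('test_flat.mat', data)  # x and y become top-level MATLAB variables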
You could use the writing primitives directly:
import scipy.io.matlab as ml

f = open("something.mat", "wb")
mw = ml.mio5.MatFile5Writer(f)  # low-level MAT-file 5 writer
mw.put_variables({"testVar": 22})
f.close()
Note that recent SciPy releases have made the mio5 module private, so this relies on an older SciPy.

How to find HDF5 file groups/keys within Python?

Let's say someone gave me a random HDF5 document. I would like to write a function that checks which groups/"keys" are used.
Take pandas HDFStore(). Many methods that retrieve HDF5 data require you to know the key, e.g. pandas.HDFStore.get():
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.HDFStore.get.html
What is the most efficient way to check the identity of the keys when they are not known a priori?
You probably want to use the h5py package:
import h5py

with h5py.File("myfile.h5", "r") as f:
    print(list(f.keys()))  # a File works like a dict of its top-level members
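Since the question mentions HDFStore: pandas exposes the keys directly as well. A minimal sketch (the file name is a placeholder):
import pandas as pd

with pd.HDFStore('myfile.h5', mode='r') as store:
    print(store.keys())  # e.g. ['/df1', '/df2']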

An XML file inside HDF5, h5py

I am using h5py to save data (float numbers) in groups. In addition to the data itself, I need to include an additional file (an .xml file containing necessary information) within the hdf5. How do I do this? Is my approach wrong?
f = h5py.File('filename.h5', 'w')
f.create_dataset('/data/1', data=numpy_array_1)
f.create_dataset('/data/2', data=numpy_array_2)
.
.
My h5 tree should look like this:
/
/data
/data/1 (numpy_array_1)
/data/2 (numpy_array_2)
.
.
/morphology.xml (?)
One option is to add it as a variable-length string dataset.
http://code.google.com/p/h5py/wiki/HowTo#Variable-length_strings
E.g.:
import h5py
xmldata = """<xml>
<something>
<else>Text</else>
</something>
</xml>
"""
# Write the xml data as a variable-length string dataset...
f = h5py.File('test.hdf5', 'w')
str_type = h5py.string_dtype()  # older h5py versions used h5py.special_dtype(vlen=str)
ds = f.create_dataset('something.xml', shape=(1,), dtype=str_type)
ds[:] = xmldata
f.close()

# Read the xml data back...
f = h5py.File('test.hdf5', 'r')
print(f['something.xml'].asstr()[0])  # asstr() decodes bytes to str on h5py 3.x
If you just need to attach the XML file to the hdf5 file, you can add it as an attribute of the file:
with open('morphology.xml', 'rb') as xmlfh:
    h5f.attrs['xml'] = xmlfh.read()
You can then read the XML data back like this:
h5f.attrs['xml']
Note also that you can't store attributes larger than 64 KB, so you may want to compress the data before attaching it. Have a look at the compression modules in Python's standard library (e.g. zlib).
However, this doesn't make the information in the XML file very accessible. If you want to associate the metadata of each dataset with metadata in the XML file, you could map it as needed using an XML library like lxml. You could also add each field of the XML data as a separate attribute so that you can query datasets by XML field; it all depends on what's in the XML file. Try to think about how you would like to retrieve the data later.
You may also want to create a group for each XML file together with its datasets and put everything in a single hdf5 file. I don't know how large the files you are managing are; YMMV.
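As a sketch of retrieving the stored XML later (reusing the file and dataset names from the dataset-based example above), you can parse it back with the standard library's ElementTree:
import xml.etree.ElementTree as ET
import h5py

with h5py.File('test.hdf5', 'r') as f:
    root = ET.fromstring(f['something.xml'].asstr()[0])  # parse the stored XML text
print(root.tag)  # 'xml'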
