Unpacking data with h5py - python

I want to write numpy arrays to a file and easily load them in again.
I would like to have a function save() that preferably works in the following way:
data = [a, b, c, d]
save('data.h5', data)
which then does the following
h5f = h5py.File('data.h5', 'w')
h5f.create_dataset('a', data=a)
h5f.create_dataset('b', data=b)
h5f.create_dataset('c', data=c)
h5f.create_dataset('d', data=d)
h5f.close()
Then subsequently I would like to easily load this data with for example
a, b, c, d = load('data.h5')
which does the following:
h5f = h5py.File('data.h5', 'r')
a = h5f['a'][:]
b = h5f['b'][:]
c = h5f['c'][:]
d = h5f['d'][:]
h5f.close()
I can think of the following for saving the data:
h5f = h5py.File('data.h5', 'w')
data_str = ['a', 'b', 'c', 'd']
for name in data_str:
    h5f.create_dataset(name, data=eval(name))
h5f.close()
I can't think of a similar way of using data_str to then load the data again.

Rereading the question (was this edited or not?), I see load is supposed to function as:
a, b, c, d = load('data.h5')
This eliminates the global variable names issue that I worried about earlier. Just return the 4 arrays (as a tuple), and the calling expression takes care of assigning names. Of course this way, the global variable names do not have to match the names in the file, nor the names used inside the function.
def load(filename):
    h5f = h5py.File(filename, 'r')
    a = h5f['a'][:]
    b = h5f['b'][:]
    c = h5f['c'][:]
    d = h5f['d'][:]
    h5f.close()
    return a, b, c, d
Or using a data_str parameter:
def load(filename, data_str=['a', 'b', 'c', 'd']):
    h5f = h5py.File(filename, 'r')
    arrays = []
    for name in data_str:
        var = h5f[name][:]
        arrays.append(var)
    h5f.close()
    return arrays
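For symmetry, a save() helper can avoid eval() by taking the arrays explicitly, for example as a dict; this is just a sketch, and the helper name and signature are my own:
import h5py

def save(filename, arrays):
    # write each array in the dict to its own dataset, keyed by name
    with h5py.File(filename, 'w') as h5f:
        for name, arr in arrays.items():
            h5f.create_dataset(name, data=arr)

# usage, together with the load() defined above
save('data.h5', {'a': a, 'b': b, 'c': c, 'd': d})
a, b, c, d = load('data.h5')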
For loading all the variables in the file, see Reading ALL variables in a .mat file with python h5py
Below is an earlier answer that assumed you wanted to take the variable names from the file's key names.
This isn't an h5py issue. It's about creating global (or local) variables using names from a dictionary (or other structure). In other words, how to create a variable using a string as its name.
This issue comes up often in connection with argparse, the command-line parser. It gives an object like args = Namespace(a=1, b='value'). It is easy to turn that into a dictionary with vars(args), giving {'a': 1, 'b': 'value'}. But you have to do something tricky, and not Pythonic, to create a and b variables from it.
It's even worse if you create that dictionary inside a function, and then want to create global variables (i.e. outside the function).
The trick involves assigning to locals() or globals(). But since it's un-pythonic I'm reluctant to be more specific.
In so many words I'm saying the same thing as the accepted answer in https://stackoverflow.com/a/4467517/901925
For loading variables from a file into an IPython environment, see https://stackoverflow.com/a/28258184/901925 (ipython-loading-variables-to-workspace).

I would use deepdish (deepdish.io):
import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'obj2': obj2}, compression=('blosc', 9))
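Reading it back should be symmetric (a sketch continuing the example above; dd.io.load returns the stored structure, here a dict):
d = dd.io.load(filename)
dict1 = d['dict1']
obj2 = d['obj2']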

Related

How to pickle a function in Python?

I defined a simple function and pickled it. However, when I deserialised it in another file, I couldn't load it back and got an error. Here is an example:
import pickle
def fnc(c=0):
    a = 1
    b = 2
    return a, b, c
f = open('example', 'wb')
pickle.dump(fnc, f)
f.close()
f = open('example', 'rb')
fnc = pickle.load(f)
print(fnc)
print(fnc())
print(fnc(1))
<function fnc at 0x7f06345d7598>
(1, 2, 0)
(1, 2, 1)
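Note that pickle stores a function by reference (its module and qualified name), not by value, so unpickling in a different script only works if that script can import the same fnc definition; that is the usual cause of the error when deserialising "in another file". If you need to ship the function's code itself, a sketch using the third-party dill package (assuming it is installed) could look like this:
import dill

def fnc(c=0):
    a = 1
    b = 2
    return a, b, c

with open('example.dill', 'wb') as f:
    dill.dump(fnc, f)   # dill can serialise the function's code, not just its name
with open('example.dill', 'rb') as f:
    fnc2 = dill.load(f)
print(fnc2(1))          # (1, 2, 1)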
You can also do it using the shelve module. I believe it still uses pickle to store data, but a very convenient feature is that you can store data in the form of key-value pairs. For example, if you store an ML model, you can store the training data and/or feature column names along with the model itself, which makes it more convenient.
import shelve
def func(a, b):
    return a + b

# Now store the function ('c' creates the shelf file if it does not exist yet)
with shelve.open('foo.shlv', 'c') as shlv:
    shlv['function'] = func

# Load the function
with shelve.open('foo.shlv', 'r') as shlv:
    x = shlv['function']
    print(x(2, 3))

Python equivalent of R's save()?

In R I can save multiple objects to the hard drive using:
a = 3; b = "c"; c = 2
save(a, b, file = "filename.R")
I can then use load("filename.R") to get all objects back into the workspace. Is there an equivalent for Python?
I know I can use
import pickle
a = 3; b = "c"; c = 2
with open("filename.pkl", 'wb') as f:
    pickle.dump([a, b], f)
and load it back as:
with open("filename.pkl", 'rb') as f:
    a, b = pickle.load(f)
but this requires that I know what is inside filename.pkl in order to do the assignment a,b = pickle.load(f). Is there another way of doing it that is closer to what I did in R? If not, is there a reason for this that I currently fail to see?
--
edit: I don't agree that the linked question discusses the same issue. I am not asking for all variables, only specific ones. Might well be that there is no way to dump all variables (maybe since some variables in the global env cannot be exported or whatnot...) but still possible to export some.
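One way to get closer to R's behaviour without touching globals() is to pickle a dict, so the names travel with the data and you can restore only the objects you want (a sketch, not an exact equivalent of R's save/load):
import pickle

a = 3; b = "c"; c = 2
with open("filename.pkl", 'wb') as f:
    pickle.dump({'a': a, 'b': b, 'c': c}, f)

with open("filename.pkl", 'rb') as f:
    objs = pickle.load(f)
a, c = objs['a'], objs['c']   # pick out only the objects you need, by name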

Save some arrays in a same file

I would like to save different arrays in the same file:
a = [[1,2],[3,4],[5,6]]
b = [1,3,5]
I read this documentation about the np.savetxt function
Nevertheless, I can't work out how to save a and b in the same file so that I can access them from another program.
I would like something like this:
a = load("file_path",a)
b = load("file_path",b)
How can I do this?
You can use np.savez instead
np.savez("file_path.npz", a=a, b=b)
And then load with
npzfile = np.load("file_path.npz")
a = npzfile['a']
b = npzfile['b']
EDIT: Updated np.savez call, so arrays are saved with their own names.
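If you do not know (or remember) the array names, the loaded NpzFile object lists them, for example:
npzfile = np.load("file_path.npz")
print(npzfile.files)   # ['a', 'b']
a = npzfile['a']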

Assign variable names and data in a loop

I'm trying to load data into datasets in Python. My data is arranged by years. I want to assign variable names in a loop. Here's what it should look like in pseudocode:
import pandas as pd
for i in range(2010, 2017):
    Data+i = pd.read_csv("Data_from_" + str(i) + ".csv")
    # Stores data from file "Data_from_YYYY.csv" as dataset DataYYYY.
The resulting datasets would be Data2010 - Data2017.
While this is possible, it is not a good idea. Code with dynamically-generated variable names is difficult to read and maintain. That said, you could do it using the exec function, which executes code from a string. This would allow you to dynamically construct your variable names using string concatenation.
However, you really should use a dictionary instead. This gives you an object with dynamically-named keys, which is much more suited to your purposes. For example:
import pandas as pd
Data = {}
for i in range(2010, 2017):
    Data[i] = pd.read_csv("Data_from_" + str(i) + ".csv")
    # Stores data from file "Data_from_YYYY.csv" as Data[YYYY].
# Access data like this:
Data[2011]
You should also use snake_case for variable names in Python, so Data should be data.
If you really wanted to dynamically generate variables, you could do it like this. (But you aren't going to, right?)
import pandas as pd
for i in range(2010, 2017):
    exec("Data{} = pd.read_csv(\"Data_from_\" + str(i) + \".csv\")".format(i))
    # Stores data from file "Data_from_YYYY.csv" as dataset DataYYYY.
You can do this without exec too; have a look at Jooks' answer.
You can use globals(), locals(), or an __init__ that uses setattr on self to assign the variables to an instance.
In [1]: globals()['data' + 'a'] = 'n'
In [2]: print dataa
n
In [3]: locals()['data' + 'b'] = 'n'
In [4]: print datab
n
In [5]: class Data(object):
...:     def __init__(self, **kwargs):
...:         for k, v in kwargs.items():
...:             setattr(self, k, v)
...:
In [6]: my_data = Data(a=1, b=2)
In [7]: my_data.a
Out[7]: 1
I would probably go the third route. You may be approaching your solution in an unconventional way, as this pattern is not very familiar, even if it is possible.

Export function that names the export file after the input variable

I'm looking to get a function to export Numpy arrays, but to use the name of the variable input to the function as the name of the exported file. Something like:
MyArray = [some numbers]
def export(Varb):
    return np.savetxt("%s.dat" % Varb.name, Varb)
export(MyArray)
that will output a file called 'MyArray.dat' filled with [some numbers]. I can't work out how to do the 'Varb.name' bit. Any suggestions?
I'm new to Python and programming so I hope there is something simple I've missed!
Thanks.
I don't recommend such a code style, but you can do it this way:
import copy
myarray = range(4)
for k, v in iter(copy.copy(locals()).items()):
    if myarray == v:
        print(k)
This gives myarray as output. To do this in a function useful for exports, use:
import copy
def export_with_name(arg):
    """ Export variable's string representation with its name as filename """
    for k, v in iter(copy.copy(globals()).items()):
        if arg == v:
            with open(k, 'w') as handle:
                handle.writelines(repr(arg))
locals() and globals() both give dictionaries holding the variable names as keys and the variable values as values.
Use the function the following way:
some_data = list(range(4))
export_with_name(some_data)
gives a file called some_data with
[0, 1, 2, 3]
as content.
Tested and compatible with Python 2.7 and 3.3
You can't. Python objects don't know what named variables happen to be referencing them at any particular time. By the time the variable has been dereferenced and sent to the function, you don't know where it came from. Consider this bit of code:
map(export, myvarbs)
Here, the varbs were in some sort of container and didn't even have a named variable referencing them.
import os
import inspect
import re
number_array = [8,3,90]
def varname(p):
    # Inspect the source line two frames up (the caller of export()) and pull
    # the argument name out of the "export(...)" call found there.
    for line in inspect.getframeinfo(inspect.currentframe().f_back.f_back)[3]:
        m = re.search(r'\bexport\s*\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)', line)
        if m:
            return m.group(1)

def export(arg):
    file_name = "%s.dat" % varname(arg)
    fd = open(file_name, "w")
    fd.writelines(str(arg))
    fd.close()
export(number_array)
Refer to How can you print a variable name in python? for more details about varname().
