Save several arrays in the same file - Python

I would like to save different arrays in the same file:
a = [[1,2],[3,4],[5,6]]
b = [1,3,5]
I read the documentation for the np.savetxt function.
Nevertheless, I can't save a and b in the same file and then access them from another program.
I would like something like this:
a = load("file_path",a)
b = load("file_path",b)
How can I do this?

You can use np.savez instead:
np.savez("file_path.npz", a=a, b=b)
And then load it back with:
npzfile = np.load("file_path.npz")
a = npzfile['a']
b = npzfile['b']
EDIT: updated the np.savez call so the arrays are saved under their own names.
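Putting the two snippets together, a minimal round trip looks like this (the file name is just the example used above):
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([1, 3, 5])

# save both arrays under their own names in a single .npz archive
np.savez("file_path.npz", a=a, b=b)

# load them back; the keys match the keyword names used when saving
npzfile = np.load("file_path.npz")
print(npzfile.files)  # ['a', 'b']
a = npzfile['a']
b = npzfile['b']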

Related

How to pickle a function in Python?

I defined a simple function and pickled it. However, when I tried to deserialise it in another file, I couldn't load it back and got an error.
Here is an example:
import pickle

def fnc(c=0):
    a = 1
    b = 2
    return a, b, c

f = open('example', 'ab')
pickle.dump(fnc, f)
f.close()

f = open('example', 'rb')
fnc = pickle.load(f)

print(fnc)
print(fnc())
print(fnc(1))
<function fnc at 0x7f06345d7598>
(1, 2, 0)
(1, 2, 1)
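A likely reason the load fails in a separate file: pickle stores functions by reference (module name plus qualified name), not by their code, so the loading script must be able to import a module that defines fnc. A minimal sketch, assuming a hypothetical module mymodule.py:
# mymodule.py (hypothetical module holding the function)
def fnc(c=0):
    a = 1
    b = 2
    return a, b, c

# dump_it.py -- pickles a reference to mymodule.fnc
import pickle
from mymodule import fnc

with open('example', 'wb') as f:
    pickle.dump(fnc, f)

# load_it.py -- works because mymodule is importable here too
import pickle

with open('example', 'rb') as f:
    fnc = pickle.load(f)

print(fnc(1))  # (1, 2, 1)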
You can also do it using the shelve module. I believe it still uses pickle to store data under the hood, but a very convenient feature is that it stores data as key-value pairs. For example, if you store an ML model, you can store the training data and/or the feature column names along with the model itself, which makes it more convenient.
import shelve

def func(a, b):
    return a + b

# Now store the function
with shelve.open('foo.shlv', 'c') as shlv:  # 'c' creates the shelf if it does not exist yet
    shlv['function'] = func

# Load the function
with shelve.open('foo.shlv', 'r') as shlv:
    x = shlv['function']
    print(x(2, 3))

Closing files with dask's from_array function

I am using Dask's from_array function. Following the docs at https://docs.dask.org/en/latest/array-creation.html, I do the following:
>>> import h5py
>>> import dask.array as da
>>> f = h5py.File('myfile.hdf5')  # HDF5 file
>>> d = f['/data/path']           # Pointer to the on-disk array
>>> x = da.from_array(d, chunks=(1000, 1000))
But in this example, do you agree that I should close the HDF5 file after processing the data?
If so, it might be useful to add a feature to Dask array that lets you pass just the file pointer and the dataset key, so that Dask can close the source file, if any, when the dask array object is destroyed.
I know that a good way to proceed would be like this:
>>> import h5py
>>> import dask.array as da
>>> with h5py.File('myfile.hdf5') as f:  # HDF5 file
...     d = f['/data/path']              # Pointer to the on-disk array
...     x = da.from_array(d, chunks=(1000, 1000))
But sometimes it is not really handy. For example, in my code I have a function that returns a dask array from a file path, with some sanity checks in between, a bit like:
>>> import h5py
>>> import dask.array as da
>>> def get_dask_array(filepath, key):
...     f = h5py.File(filepath)  # HDF5 file
...     # ... some sanity checks here
...     d = f[key]               # Pointer to the on-disk array
...     # ... some sanity checks here
...     return da.from_array(d, chunks=(1000, 1000))
In this case, I find it ugly to also return the file pointer and keep it around for the duration of the processing before closing it.
Any suggestions on how I should proceed?
Thank you in advance for your answers,
Regards,
Edit: for now I am using a global variable inside the package, as follows:
@atexit.register
def clean_files():
    for f in SOURCE_FILES:
        if f:  # an open h5py.File evaluates to True; a closed one to False
            f.close()
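To connect this workaround to the get_dask_array function above, a rough sketch (names are only illustrative, reusing the question's own SOURCE_FILES idea) could register each opened file so the atexit hook closes whatever is still open:
import atexit
import h5py
import dask.array as da

SOURCE_FILES = []

def get_dask_array(filepath, key):
    f = h5py.File(filepath, 'r')
    SOURCE_FILES.append(f)  # keep a handle so it can be closed later
    d = f[key]
    return da.from_array(d, chunks=(1000, 1000))

@atexit.register
def clean_files():
    for f in SOURCE_FILES:
        if f:  # still open
            f.close()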

Python equivalent of R's save()?

In R I can save multiple objects to harddrive using:
a = 3; b = "c"; c = 2
save(a, b, file = "filename.R")
I can then use load("filename.R") to get all the objects back into the workspace. Is there an equivalent for Python?
I know I can use
import pickle
a = 3; b = "c"; c = 2
with open("filename.pkl", 'wb') as f:
pickle.dump([a,b], f)
and load it back as:
with open("filename.pkl", 'rb') as f:
a,b = pickle.load(f)
but this requires that I know what is inside filename.pkl in order to do the assignment a,b = pickle.load(f). Is there another way of doing it that is closer to what I did in R? If not, is there a reason for this that I currently fail to see?
--
edit: I don't agree that the linked question discusses the same issue. I am not asking for all variables, only specific ones. It may well be that there is no way to dump all variables (perhaps because some variables in the global environment cannot be exported), but it should still be possible to export some.
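Not from an actual answer, just a sketch of one way to get closer to R's save()/load(): pickle a dict keyed by the variable names and push the loaded entries back into the namespace with globals() (the same un-Pythonic trick mentioned in the h5py question below):
import pickle

a = 3; b = "c"; c = 2

# save selected variables, keyed by name
with open("filename.pkl", 'wb') as f:
    pickle.dump({'a': a, 'b': b}, f)

# load them back without knowing the names beforehand
with open("filename.pkl", 'rb') as f:
    objects = pickle.load(f)
globals().update(objects)  # recreates a and b, roughly like R's load()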

How to create numpy arrays automatically?

I want to create arrays in a for loop and have their names assigned automatically.
Using a for loop didn't work, and creating a dictionary with numpy.array() in it didn't work either. I am running out of ideas; I am not very experienced with Python.
import numpy as np

for file_name in folder:
    file_name = np.array()
    file_name.extend((blabla, blabla1))
I expected to get arrays with automatically assigned names, like file_name1, file_name2, ...
Instead, I got the warning "redeclared file_name defined above without usage", and at the line file_name = np.array() the output was:
TypeError: array() missing required argument 'object' (pos 1) ...
You can do it with globals() if you really want to use the strings as variable names.
globals()[file_name] = np.array((blabla, blabla1))
Example:
>>> globals()['test'] = 1
>>> test
1
Of course this populates the global namespace. Otherwise, you can use locals().
As @Mark Meyer said in a comment, you should use a dictionary (dict in Python), with file_name as the key.
As per your error: when you create a numpy array, you should provide an iterable (e.g. a list).
For example:
>>> folder = ['file1', 'file2']
>>> blabla = 0
>>> blabla1 = 1
>>> {f: np.array((blabla, blabla1)) for f in folder}
{'file1': array([0, 1]), 'file2': array([0, 1])}
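If each file should get different values, the same dictionary idea works with an ordinary for loop; this is only a sketch reusing the placeholder names from the question:
import numpy as np

folder = ['file1', 'file2']  # placeholder "file names" from the question
blabla, blabla1 = 0, 1       # placeholder values

arrays = {}
for file_name in folder:
    # one array per name, stored under that name as a dict key
    arrays[file_name] = np.array((blabla, blabla1))

print(arrays['file1'])  # [0 1]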

Unpacking data with h5py

I want to write numpy arrays to a file and easily load them in again.
I would like to have a function save() that preferably works in the following way:
data = [a, b, c, d]
save('data.h5', data)
which then does the following
h5f = h5py.File('data.h5', 'w')
h5f.create_dataset('a', data=a)
h5f.create_dataset('b', data=b)
h5f.create_dataset('c', data=c)
h5f.create_dataset('d', data=d)
h5f.close()
Subsequently, I would like to easily load this data back with, for example,
a, b, c, d = load('data.h5')
which does the following:
h5f = h5py.File('data.h5', 'r')
a = h5f['a'][:]
b = h5f['b'][:]
c = h5f['c'][:]
d = h5f['d'][:]
h5f.close()
I can think of the following for saving the data:
h5f = h5py.File('data.h5', 'w')
data_str = ['a', 'b', 'c', 'd']
for name in data_str:
    h5f.create_dataset(name, data=eval(name))
h5f.close()
I can't think of a similar way of using data_str to then load the data again.
Rereading the question (was this edited or not?), I see load is supposed to function as:
a, b, c, d = load('data.h5')
This eliminates the global variable names issue that I worried about earlier. Just return the 4 arrays (as a tuple), and the calling expression takes care of assigning names. Of course this way, the global variable names do not have to match the names in the file, nor the names used inside the function.
def load(filename):
    h5f = h5py.File(filename, 'r')
    a = h5f['a'][:]
    b = h5f['b'][:]
    c = h5f['c'][:]
    d = h5f['d'][:]
    h5f.close()
    return a, b, c, d
Or using a data_str parameter:
def load(filename, data_str=['a', 'b', 'c', 'd']):
    h5f = h5py.File(filename, 'r')
    arrays = []
    for name in data_str:
        var = h5f[name][:]
        arrays.append(var)
    h5f.close()
    return arrays
For loading all the variables in the file, see Reading ALL variables in a .mat file with python h5py
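For completeness, a matching save() could accept the arrays as keyword arguments instead of relying on eval; this is only a sketch of one possible counterpart, not part of the original answer:
import h5py

def save(filename, **arrays):
    # each keyword argument becomes a dataset in the file
    with h5py.File(filename, 'w') as h5f:
        for name, data in arrays.items():
            h5f.create_dataset(name, data=data)

# usage: save('data.h5', a=a, b=b, c=c, d=d)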
An earlier answer assumed you wanted to take the variable names from the file's key names.
This isn't an h5py issue. It's about creating global (or local) variables using names from a dictionary (or another structure). In other words, how to create a variable using a string as its name.
This issue comes up often in connection with argparse, the command-line parser. It gives an object like args = argparse.Namespace(a=1, b='value'). It is easy to turn that into a dictionary with vars(args): {'a': 1, 'b': 'value'}. But you have to do something tricky, and not Pythonic, to create a and b variables from it.
It's even worse if you create that dictionary inside a function and then want to create global variables (i.e. outside the function).
The trick involves assigning to locals() or globals(). But since it's un-Pythonic, I'm reluctant to be more specific.
In so many words I'm saying the same thing as the accepted answer in https://stackoverflow.com/a/4467517/901925
For loading variables from a file into an IPython environment, see
https://stackoverflow.com/a/28258184/901925 (ipython-loading-variables-to-workspace)
I would use deepdish (deepdish.io):
import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'obj2': obj2}, compression=('blosc', 9))
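Loading it back should be the mirror call (assuming deepdish's dd.io.load, which returns the saved dictionary):
import deepdish as dd

data = dd.io.load(filename)  # returns the dict that was saved
dict1 = data['dict1']
obj2 = data['obj2']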
