I defined a simple function and pickled it. However, when I deserialised it in another file, I couldn't load it back: I got an error.
Here is an example:
import pickle

def fnc(c=0):
    a = 1
    b = 2
    return a, b, c

f = open('example', 'ab')
pickle.dump(fnc, f)
f.close()

f = open('example', 'rb')
fnc = pickle.load(f)

print(fnc)
print(fnc())
print(fnc(1))
<function fnc at 0x7f06345d7598>
(1, 2, 0)
(1, 2, 1)
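Note that pickle stores a function by reference (module plus qualified name), not its code, which is why loading it in another file fails unless that file can import the original definition. A minimal sketch of a separate loader script (the file name loader.py is hypothetical):

# loader.py (hypothetical) -- this works only if the module that
# defined fnc is importable here, because the pickle stores the
# reference to fnc, not its bytecode.
import pickle

with open('example', 'rb') as f:
    fnc = pickle.load(f)

print(fnc())  # (1, 2, 0)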
You can also do this using the shelve module. I believe it still uses pickle to store data, but a very convenient feature is that it stores data as key-value pairs. For example, if you store an ML model, you can store the training data and/or feature column names along with the model itself, which makes it more convenient.
import shelve

def func(a, b):
    return a + b

# Now store the function ('c' creates the file if it doesn't
# exist; 'w' would require an existing one)
with shelve.open('foo.shlv', 'c') as shlv:
    shlv['function'] = func

# Load the function
with shelve.open('foo.shlv', 'r') as shlv:
    x = shlv['function']

print(x(2, 3))
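For instance, a sketch of the model-plus-metadata idea from above (the model stand-in and key names are hypothetical, just to show the key-value layout):

import shelve

# stand-in for a trained model; any picklable object works here
model = {'weights': [0.1, 0.2, 0.7]}
feature_columns = ['age', 'income', 'score']

# store the model together with its metadata under separate keys
with shelve.open('model.shlv', 'c') as shlv:
    shlv['model'] = model
    shlv['feature_columns'] = feature_columns

# read the metadata back without touching the model
with shelve.open('model.shlv', 'r') as shlv:
    print(shlv['feature_columns'])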
I have the following in a function:
import h5py
with h5py.File(path, 'r') as f:
    big = f['big']
    box = f['box']
I'm currently writing a test for the function where I try to mock it
by something like:
def test_function(mocker):
    mocker.patch("h5py.File", new=mocker.mock_open())
    ...
Where mocker comes from: https://pypi.org/project/pytest-mock/
What I want to achieve is for the mock to return a dict by the name of f, so I'm able to interact with it as in the function above.
Is this possible? I'm prepared to use whatever brute-force solution is out there...
As described here, you need to ensure the result of h5py.File().__enter__() returns an appropriate dictionary:
from unittest import mock

import h5py
import pytest

def foo():
    with h5py.File('.', 'r') as f:
        big = f['big']
        box = f['box']
    return big, box

def test_foo(mocker):
    d = {'big': 1, 'box': 2}
    m = mocker.MagicMock()
    m.__enter__.return_value = d
    mocker.patch("h5py.File", return_value=m)
    assert foo() == (1, 2)
So, I want to store a dictionary in a persistent file. Is there a way to use regular dictionary methods to add, print, or delete entries from the dictionary in that file?
It seems that I would be able to use cPickle to store the dictionary and load it, but I'm not sure where to take it from there.
If your keys (not necessarily the values) are strings, the shelve standard library module does what you want pretty seamlessly.
Use JSON
Similar to Pete's answer, I like using JSON because it maps very well to python data structures and is very readable:
Persisting data is trivial:
>>> import json
>>> db = {'hello': 123, 'foo': [1,2,3,4,5,6], 'bar': {'a': 0, 'b':9}}
>>> fh = open("db.json", 'w')
>>> json.dump(db, fh)
>>> fh.close()
and loading it is about the same:
>>> import json
>>> fh = open("db.json", 'r')
>>> db = json.load(fh)
>>> db
{'hello': 123, 'bar': {'a': 0, 'b': 9}, 'foo': [1, 2, 3, 4, 5, 6]}
>>> del db['foo'][3]
>>> db['foo']
[1, 2, 3, 5, 6]
In addition, JSON loading doesn't suffer from the same security issues that shelve and pickle do, although IIRC it is slower than pickle.
If you want to write on every operation:
If you want to save on every operation, you can subclass the Python dict object:
import os
import json

class DictPersistJSON(dict):
    def __init__(self, filename, *args, **kwargs):
        self.filename = filename
        self._load()
        self.update(*args, **kwargs)

    def _load(self):
        if os.path.isfile(self.filename) and os.path.getsize(self.filename) > 0:
            with open(self.filename, 'r') as fh:
                self.update(json.load(fh))

    def _dump(self):
        with open(self.filename, 'w') as fh:
            json.dump(self, fh)

    def __getitem__(self, key):
        return dict.__getitem__(self, key)

    def __setitem__(self, key, val):
        dict.__setitem__(self, key, val)
        self._dump()

    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)

    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
        self._dump()
Which you can use like this:
db = DictPersistJSON("db.json")
db["foo"] = "bar" # Will trigger a write
Which is woefully inefficient, but can get you off the ground quickly.
Unpickle from file when program loads, modify as a normal dictionary in memory while program is running, pickle to file when program exits? Not sure exactly what more you're asking for here.
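A minimal sketch of that workflow (the file name db.pickle and the atexit hook are my choices for illustration, not prescribed by the answer):

import atexit
import os
import pickle

DB_FILE = 'db.pickle'  # hypothetical file name

# unpickle at startup; start empty if the file doesn't exist yet
if os.path.exists(DB_FILE):
    with open(DB_FILE, 'rb') as f:
        db = pickle.load(f)
else:
    db = {}

# pickle back to disk when the program exits
@atexit.register
def _save():
    with open(DB_FILE, 'wb') as f:
        pickle.dump(db, f)

# in between, use db like a normal in-memory dictionary
db['answer'] = 42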
Assuming the keys and values have working implementations of repr, one solution is to save the string representation of the dictionary (repr(dict)) to a file and load it back with the eval function (eval(inputstring)). There are two main disadvantages to this technique:
1) It will not work with types that have an unusable implementation of repr (or may even seem to work, but fail). You'll need to pay at least some attention to what is going on.
2) Your file-load mechanism is basically executing Python code outright. Not great for security unless you fully control the input.
It has one advantage: it's absurdly easy to do.
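A minimal sketch of that round trip (the file name data.txt is just for illustration):

# save the dictionary's repr to a file
d = {'x': [1.0, 2.5], 'y': (3, 4)}
with open('data.txt', 'w') as f:
    f.write(repr(d))

# load it back by evaluating the string (see the security caveat above;
# ast.literal_eval is a safer choice when everything is a plain literal)
with open('data.txt') as f:
    d2 = eval(f.read())

print(d2 == d)  # True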
My favorite method (which does not use standard Python dictionary functions): read/write YAML files using PyYAML. See this answer for details, summarized here:
Create a YAML file, "employment.yml":
new jersey:
  mercer county:
    plumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36
Read it in Python:
import yaml

file_handle = open("employment.yml")
my_dictionary = yaml.safe_load(file_handle)
file_handle.close()
and now my_dictionary has all the values. If you need to do this on the fly, create a string containing the YAML and parse it with yaml.safe_load.
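For example, the on-the-fly case looks like this (a minimal sketch):

import yaml

yaml_text = """
new york:
  queens county:
    plumbers: 9
"""
my_dictionary = yaml.safe_load(yaml_text)
print(my_dictionary['new york']['queens county']['plumbers'])  # 9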
If using only strings as keys (as allowed by the shelve module) is not enough, the FileDict might be a good way to solve this problem.
Pickling has one disadvantage: it can be expensive if your dictionary has to be read and written frequently from disk and it's large, because pickle dumps the whole thing down and unpickles the whole thing back up.
If you have to handle small dicts, pickle is OK. If you are going to work with something more complex, go for Berkeley DB. It is basically made to store key-value pairs.
Have you considered using dbm?
import dbm
import pandas as pd
import numpy as np

db = dbm.open('mydbm.db', 'n')

# create some data
df1 = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(101, 200, size=(10, 3)), columns=list('EFG'))

# serialize the data and put it in the db dictionary
db['df1'] = df1.to_json()
db['df2'] = df2.to_json()

# in some other process:
db = dbm.open('mydbm.db', 'r')
df1a = pd.read_json(db['df1'])
df2a = pd.read_json(db['df2'])
This tends to work even without a db.close()
I want to write numpy arrays to a file and easily load them in again.
I would like to have a function save() that preferably works in the following way:
data = [a, b, c, d]
save('data.h5', data)
which then does the following
h5f = h5py.File('data.h5', 'w')
h5f.create_dataset('a', data=a)
h5f.create_dataset('b', data=b)
h5f.create_dataset('c', data=c)
h5f.create_dataset('d', data=d)
h5f.close()
Then subsequently I would like to easily load this data with for example
a, b, c, d = load('data.h5')
which does the following:
h5f = h5py.File('data.h5', 'r')
a = h5f['a'][:]
b = h5f['b'][:]
c = h5f['c'][:]
d = h5f['d'][:]
h5f.close()
I can think of the following for saving the data:
h5f = h5py.File('data.h5', 'w')
data_str = ['a', 'b', 'c', 'd']
for name in data_str:
    h5f.create_dataset(name, data=eval(name))
h5f.close()
I can't think of a similar way of using data_str to then load the data again.
Rereading the question (was this edited or not?), I see load is supposed to function as:
a, b, c, d = load('data.h5')
This eliminates the global variable names issue that I worried about earlier. Just return the 4 arrays (as a tuple), and the calling expression takes care of assigning names. Of course this way, the global variable names do not have to match the names in the file, nor the names used inside the function.
def load(filename):
    h5f = h5py.File(filename, 'r')
    a = h5f['a'][:]
    b = h5f['b'][:]
    c = h5f['c'][:]
    d = h5f['d'][:]
    h5f.close()
    return a, b, c, d
Or using a data_str parameter:
def load(filename, data_str=['a', 'b', 'c', 'd']):
    h5f = h5py.File(filename, 'r')
    arrays = []
    for name in data_str:
        var = h5f[name][:]
        arrays.append(var)
    h5f.close()
    return arrays
For loading all the variables in the file, see Reading ALL variables in a .mat file with python h5py
An earlier answer that assumed you wanted to take the variable names from the file key names.
This isn't an h5py issue. It's about creating global (or local) variables using names from a dictionary (or other structure). In other words, how to create a variable using a string as its name.
This issue has come up often in connection with argparse, a command-line parser. It gives an object like args = Namespace(a=1, b='value'). It is easy to turn that into a dictionary (with vars(args)): {'a': 1, 'b': 'value'}. But you have to do something tricky, and not Pythonic, to create a and b variables from it.
It's even worse if you create that dictionary inside a function and then want to create global variables (i.e. outside the function).
The trick involves assigning to locals() or globals(). But since it's un-Pythonic I'm reluctant to be more specific.
In so many words I'm saying the same thing as the accepted answer in https://stackoverflow.com/a/4467517/901925
For loading variables from a file into an Ipython environment, see
https://stackoverflow.com/a/28258184/901925 ipython-loading-variables-to-workspace
I would use deepdish (deepdish.io):
import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'obj2': obj2}, compression=('blosc', 9))
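Loading it back is the symmetric call; dd.io.load returns the stored dictionary (the key names below just mirror the save call above):

import deepdish as dd

# load the dictionary saved above and unpack it
data = dd.io.load(filename)
dict1 = data['dict1']
obj2 = data['obj2']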
I'm a novice programmer, so please pardon me for not using Python-specific vocabulary.
Suppose I define a class CarSpecs with attributes CarReg, Make, Model and Color, create several instances of this class (call them records) and one by one append them to a text file named SuperCars. What I want my program to do is to read the whole file and return the number of cars which are Red (i.e. by looking up the attribute Color of each instance).
Here's what I've done so far:
Defined a class:
class CarSpecs(object):
    def __init__(self, carreg, make, model, color):
        self.CarReg = carreg
        self.Make = make
        self.Model = model
        self.Color = color
Then I created several instances and defined a function to add the instances (or you could say "records") to SuperCars:
def addCar(CarRecord):
    import pickle
    CarFile = open('Supercars', 'ab')
    pickle.dump(CarRecord, CarFile)
    CarFile.close()
What do I do next to output the number of Red cars?
You'll have to open that file again, read all of the records, and then see which cars' Color attribute equals 'Red'. Because you're saving each instance in the pickle separately, you will have to do something like the following:
>>> import pickle
>>> with open('Supercars', 'rb') as f:
...     data = []
...     while True:
...         try:
...             data.append(pickle.load(f))
...         except EOFError:
...             break
...
>>>
>>> print([x for x in data if x.Color == 'Red'])
I suggest you store the data in a list and pickle that list; this way you don't have to use that hacky loop to get all the items. Storing such a list is easy. Assume you've created a list of CarSpecs objects and stored them in a list records:
>>> with open('Supercars', 'wb') as f:
...     pickle.dump(records, f)
...
>>>
and then reading it is as simple as:
>>> with open('Supercars', 'rb') as f:
...     data = pickle.load(f)
...
>>>
And you can even filter it easily:
>>> with open('Supercars', 'rb') as f:
...     data = [x for x in pickle.load(f) if x.Color == 'Red']
...
>>>
If you want to display the records before storing them in the pickle, you can just iterate over the records list and print the cars with a red color.
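Since the question asks for the number of red cars, counting them under the same assumptions is one more line:

# count the red cars instead of printing them
red_count = len([x for x in data if x.Color == 'Red'])
print(red_count)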
What is the easiest way to save and load data in python, preferably in a human-readable output format?
The data I am saving/loading consists of two vectors of floats. Ideally, these vectors would be named in the file (e.g. X and Y).
My current save() and load() functions use file.readline(), file.write() and string-to-float conversion. There must be something better.
The simplest way to get human-readable output is to use a serialisation format such as JSON. Python contains a json library you can use to serialise data to and from a string. Like pickle, you can use this with an IO object to write it to a file.
import json
file = open('/usr/data/application/json-dump.json', 'w+')
data = { "x": 12153535.232321, "y": 35234531.232322 }
json.dump(data, file)
If you want to get a simple string back instead of dumping it to a file, you can use json.dumps() instead:
import json
print(json.dumps({ "x": 12153535.232321, "y": 35234531.232322 }))
Reading back from a file is just as easy:
import json
file = open('/usr/data/application/json-dump.json', 'r')
print(json.load(file))
The json library is full-featured, so I'd recommend checking out the documentation to see what sorts of things you can do with it.
There are several options -- I don't exactly know what you like. If the two vectors have the same length, you could use numpy.savetxt() to save your vectors, say x and y, as columns:
import numpy

# saving:
f = open("data", "w")
f.write("# x y\n")  # column names
numpy.savetxt(f, numpy.array([x, y]).T)
f.close()

# loading:
x, y = numpy.loadtxt("data", unpack=True)
If you are dealing with larger vectors of floats, you should probably use NumPy anyway.
If it should be human-readable, I'd also go with JSON. Unless you need to exchange it with enterprise-type people, they like XML better. :-)

If it should be human-editable and isn't too complex, I'd probably go with some sort of INI-like format, for example via configparser (see the sketch below).

If it is complex and doesn't need to be exchanged, I'd go with just pickling the data, unless it's very complex, in which case I'd use ZODB.

If it's a LOT of data and needs to be exchanged, I'd use SQL.

That pretty much covers it, I think.
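For the INI-like option, a minimal sketch with configparser (the section and option names, and the comma-separated float encoding, are my own illustration, not a fixed convention):

import configparser

# write two float vectors as comma-separated strings
config = configparser.ConfigParser()
config['vectors'] = {'x': '1.0, 2.0, 3.0', 'y': '2.0, 1e15, -10.3'}
with open('data.ini', 'w') as f:
    config.write(f)

# read them back and convert to floats
config = configparser.ConfigParser()
config.read('data.ini')
x = [float(v) for v in config['vectors']['x'].split(',')]
y = [float(v) for v in config['vectors']['y'].split(',')]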
A simple serialization format that is easy for both humans and computers to read is JSON.
You can use the json Python module.
Here is an example of the encoder you would probably want to write for the Body class:
import json
import numpy as np

# add this to your code
class BodyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if hasattr(obj, '__jsonencode__'):
            return obj.__jsonencode__()
        if isinstance(obj, set):
            return list(obj)
        return obj.__dict__
# Here you construct your way to load your data for each instance
# you need to customize this function
def deserialize(data):
    bodies = [Body(d["name"], d["mass"], np.array(d["p"]), np.array(d["v"]))
              for d in data["bodies"]]
    axis_range = data["axis_range"]
    timescale = data["timescale"]
    return bodies, axis_range, timescale
# Here you construct your way to dump your data for each instance
# you need to customize this function
def serialize(data):
    file = open(FILE_NAME, 'w+')
    json.dump(data, file, cls=BodyEncoder, indent=4)
    print("Dumping Parameters of the Latest Run")
    print(json.dumps(data, cls=BodyEncoder, indent=4))
Here is an example of the class I want to serialize:
class Body(object):
    # you do not need to change your class structure
    def __init__(self, name, mass, p, v=(0.0, 0.0, 0.0)):
        # init variables like normal
        self.name = name
        self.mass = mass
        self.p = p
        self.v = v
        self.f = np.array([0.0, 0.0, 0.0])

    def attraction(self, other):
        # not important functions that I wrote...
        ...
Here is how to serialize:
# you need to customize this function
def serialize_everything():
    bodies, axis_range, timescale = generate_data_to_serialize()
    data = {"bodies": bodies, "axis_range": axis_range, "timescale": timescale}
    serialize(data)
Here is how to load it back:
def load_everything():
    data = json.loads(open(FILE_NAME, "r").read())
    return deserialize(data)
Since we're talking about a human editing the file, I assume we're talking about relatively little data.
How about the following skeleton implementation? It simply saves the data as key=value pairs and works with lists, tuples, and many other things.
def save(fname, **kwargs):
    f = open(fname, "wt")
    for k, v in kwargs.items():
        f.write("%s=%s\n" % (k, repr(v)))
    f.close()

def load(fname):
    ret = {}
    for line in open(fname, "rt"):
        k, v = line.strip().split("=", 1)
        ret[k] = eval(v)
    return ret
x = [1, 2, 3]
y = [2.0, 1e15, -10.3]
save("data.txt", x=x, y=y)
d = load("data.txt")
print(d["x"])
print(d["y"])
As I commented in the accepted answer, using numpy this can be done with a simple one-liner:
Assuming you have numpy imported as np (which is common practice),
np.savetxt('xy.txt', np.array([x, y]).T, fmt="%.3f", header="x y")
will save the data in the (optional) format and
x, y = np.loadtxt('xy.txt', unpack=True)
will load it.
The file xy.txt will then look like:
# x y
1.000 1.000
1.500 2.250
2.000 4.000
2.500 6.250
3.000 9.000
Note that the format string fmt=... is optional, but if the goal is human-readability it may prove quite useful. If used, it is specified using the usual printf-like codes (In my example: floating-point number with 3 decimals).