So, I want to store a dictionary in a persistent file. Is there a way to use regular dictionary methods to add, print, or delete entries from the dictionary in that file?
It seems that I would be able to use cPickle to store the dictionary and load it, but I'm not sure where to take it from there.
If your keys (not necessarily the values) are strings, the shelve standard library module does what you want pretty seamlessly.
Use JSON
Similar to Pete's answer, I like using JSON because it maps very well to python data structures and is very readable:
Persisting data is trivial:
>>> import json
>>> db = {'hello': 123, 'foo': [1,2,3,4,5,6], 'bar': {'a': 0, 'b':9}}
>>> fh = open("db.json", 'w')
>>> json.dump(db, fh)
and loading it is about the same:
>>> import json
>>> fh = open("db.json", 'r')
>>> db = json.load(fh)
>>> db
{'hello': 123, 'bar': {'a': 0, 'b': 9}, 'foo': [1, 2, 3, 4, 5, 6]}
>>> del new_db['foo'][3]
>>> new_db['foo']
[1, 2, 3, 5, 6]
In addition, JSON loading doesn't suffer from the same security issues that shelve and pickle do, although IIRC it is slower than pickle.
If you want to write on every operation:
If you want to save on every operation, you can subclass the Python dict object:
import os
import json
class DictPersistJSON(dict):
def __init__(self, filename, *args, **kwargs):
self.filename = filename
self._load();
self.update(*args, **kwargs)
def _load(self):
if os.path.isfile(self.filename)
and os.path.getsize(self.filename) > 0:
with open(self.filename, 'r') as fh:
self.update(json.load(fh))
def _dump(self):
with open(self.filename, 'w') as fh:
json.dump(self, fh)
def __getitem__(self, key):
return dict.__getitem__(self, key)
def __setitem__(self, key, val):
dict.__setitem__(self, key, val)
self._dump()
def __repr__(self):
dictrepr = dict.__repr__(self)
return '%s(%s)' % (type(self).__name__, dictrepr)
def update(self, *args, **kwargs):
for k, v in dict(*args, **kwargs).items():
self[k] = v
self._dump()
Which you can use like this:
db = DictPersistJSON("db.json")
db["foo"] = "bar" # Will trigger a write
Which is woefully inefficient, but can get you off the ground quickly.
Unpickle from file when program loads, modify as a normal dictionary in memory while program is running, pickle to file when program exits? Not sure exactly what more you're asking for here.
Assuming the keys and values have working implementations of repr, one solution is that you save the string representation of the dictionary (repr(dict)) to file. YOu can load it using the eval function (eval(inputstring)). There are two main disadvantages of this technique:
1) Is will not work with types that have an unuseable implementation of repr (or may even seem to work, but fail). You'll need to pay at least some attention to what is going on.
2) Your file-load mechanism is basically straight-out executing Python code. Not great for security unless you fully control the input.
It has 1 advantage: Absurdly easy to do.
My favorite method (which does not use standard python dictionary functions): Read/write YAML files using PyYaml. See this answer for details, summarized here:
Create a YAML file, "employment.yml":
new jersey:
mercer county:
pumbers: 3
programmers: 81
middlesex county:
salesmen: 62
programmers: 81
new york:
queens county:
plumbers: 9
salesmen: 36
Step 3: Read it in Python
import yaml
file_handle = open("employment.yml")
my__dictionary = yaml.safe_load(file_handle)
file_handle.close()
and now my__dictionary has all the values. If you needed to do this on the fly, create a string containing YAML and parse it wth yaml.safe_load.
If using only strings as keys (as allowed by the shelve module) is not enough, the FileDict might be a good way to solve this problem.
pickling has one disadvantage. it can be expensive if your dictionary has to be read and written frequently from disk and it's large. pickle dumps the stuff down (whole). unpickle gets the stuff up (as a whole).
if you have to handle small dicts, pickle is ok. If you are going to work with something more complex, go for berkelydb. It is basically made to store key:value pairs.
Have you considered using dbm?
import dbm
import pandas as pd
import numpy as np
db = b=dbm.open('mydbm.db','n')
#create some data
df1 = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(101,200, size=(10, 3)), columns=list('EFG'))
#serialize the data and put in the the db dictionary
db['df1']=df1.to_json()
db['df2']=df2.to_json()
# in some other process:
db=dbm.open('mydbm.db','r')
df1a = pd.read_json(db['df1'])
df2a = pd.read_json(db['df2'])
This tends to work even without a db.close()
Related
What is the recommended way of serializing a namedtuple to json with the field names retained?
Serializing a namedtuple to json results in only the values being serialized and the field names being lost in translation. I would like the fields also to be retained when json-ized and hence did the following:
class foobar(namedtuple('f', 'foo, bar')):
__slots__ = ()
def __iter__(self):
yield self._asdict()
The above serializes to json as I expect and behaves as namedtuple in other places I use (attribute access etc.,) except with a non-tuple like results while iterating it (which fine for my use case).
What is the "correct way" of converting to json with the field names retained?
If it's just one namedtuple you're looking to serialize, using its _asdict() method will work (with Python >= 2.7)
>>> from collections import namedtuple
>>> import json
>>> FB = namedtuple("FB", ("foo", "bar"))
>>> fb = FB(123, 456)
>>> json.dumps(fb._asdict())
'{"foo": 123, "bar": 456}'
This is pretty tricky, since namedtuple() is a factory which returns a new type derived from tuple. One approach would be to have your class also inherit from UserDict.DictMixin, but tuple.__getitem__ is already defined and expects an integer denoting the position of the element, not the name of its attribute:
>>> f = foobar('a', 1)
>>> f[0]
'a'
At its heart the namedtuple is an odd fit for JSON, since it is really a custom-built type whose key names are fixed as part of the type definition, unlike a dictionary where key names are stored inside the instance. This prevents you from "round-tripping" a namedtuple, e.g. you cannot decode a dictionary back into a namedtuple without some other a piece of information, like an app-specific type marker in the dict {'a': 1, '#_type': 'foobar'}, which is a bit hacky.
This is not ideal, but if you only need to encode namedtuples into dictionaries, another approach is to extend or modify your JSON encoder to special-case these types. Here is an example of subclassing the Python json.JSONEncoder. This tackles the problem of ensuring that nested namedtuples are properly converted to dictionaries:
from collections import namedtuple
from json import JSONEncoder
class MyEncoder(JSONEncoder):
def _iterencode(self, obj, markers=None):
if isinstance(obj, tuple) and hasattr(obj, '_asdict'):
gen = self._iterencode_dict(obj._asdict(), markers)
else:
gen = JSONEncoder._iterencode(self, obj, markers)
for chunk in gen:
yield chunk
class foobar(namedtuple('f', 'foo, bar')):
pass
enc = MyEncoder()
for obj in (foobar('a', 1), ('a', 1), {'outer': foobar('x', 'y')}):
print enc.encode(obj)
{"foo": "a", "bar": 1}
["a", 1]
{"outer": {"foo": "x", "bar": "y"}}
It looks like you used to be able to subclass simplejson.JSONEncoder to make this work, but with the latest simplejson code, that is no longer the case: you have to actually modify the project code. I see no reason why simplejson should not support namedtuples, so I forked the project, added namedtuple support, and I'm currently waiting for my branch to be pulled back into the main project. If you need the fixes now, just pull from my fork.
EDIT: Looks like the latest versions of simplejson now natively support this with the namedtuple_as_object option, which defaults to True.
I wrote a library for doing this: https://github.com/ltworf/typedload
It can go from and to named-tuple and back.
It supports quite complicated nested structures, with lists, sets, enums, unions, default values. It should cover most common cases.
edit: The library also supports dataclass and attr classes.
It's impossible to serialize namedtuples correctly with the native python json library. It will always see tuples as lists, and it is impossible to override the default serializer to change this behaviour. It's worse if objects are nested.
Better to use a more robust library like orjson:
import orjson
from typing import NamedTuple
class Rectangle(NamedTuple):
width: int
height: int
def default(obj):
if hasattr(obj, '_asdict'):
return obj._asdict()
rectangle = Rectangle(width=10, height=20)
print(orjson.dumps(rectangle, default=default))
=>
{
"width":10,
"height":20
}
There is a more convenient solution is to use the decorator (it uses the protected field _fields).
Python 2.7+:
import json
from collections import namedtuple, OrderedDict
def json_serializable(cls):
def as_dict(self):
yield OrderedDict(
(name, value) for name, value in zip(
self._fields,
iter(super(cls, self).__iter__())))
cls.__iter__ = as_dict
return cls
#Usage:
C = json_serializable(namedtuple('C', 'a b c'))
print json.dumps(C('abc', True, 3.14))
# or
#json_serializable
class D(namedtuple('D', 'a b c')):
pass
print json.dumps(D('abc', True, 3.14))
Python 3.6.6+:
import json
from typing import TupleName
def json_serializable(cls):
def as_dict(self):
yield {name: value for name, value in zip(
self._fields,
iter(super(cls, self).__iter__()))}
cls.__iter__ = as_dict
return cls
# Usage:
#json_serializable
class C(NamedTuple):
a: str
b: bool
c: float
print(json.dumps(C('abc', True, 3.14))
It recursively converts the namedTuple data to json.
print(m1)
## Message(id=2, agent=Agent(id=1, first_name='asd', last_name='asd', mail='2#mai.com'), customer=Customer(id=1, first_name='asd', last_name='asd', mail='2#mai.com', phone_number=123123), type='image', content='text', media_url='h.com', la=123123, ls=4512313)
def reqursive_to_json(obj):
_json = {}
if isinstance(obj, tuple):
datas = obj._asdict()
for data in datas:
if isinstance(datas[data], tuple):
_json[data] = (reqursive_to_json(datas[data]))
else:
print(datas[data])
_json[data] = (datas[data])
return _json
data = reqursive_to_json(m1)
print(data)
{'agent': {'first_name': 'asd',
'last_name': 'asd',
'mail': '2#mai.com',
'id': 1},
'content': 'text',
'customer': {'first_name': 'asd',
'last_name': 'asd',
'mail': '2#mai.com',
'phone_number': 123123,
'id': 1},
'id': 2,
'la': 123123,
'ls': 4512313,
'media_url': 'h.com',
'type': 'image'}
The jsonplus library provides a serializer for NamedTuple instances. Use its compatibility mode to output simple objects if needed, but prefer the default as it is helpful for decoding back.
This is an old question. However:
A suggestion for all those with the same question, think carefully about using any of the private or internal features of the NamedTuple because they have before and will change again over time.
For example, if your NamedTuple is a flat value object and you're only interested in serializing it and not in cases where it is nested into another object, you could avoid the troubles that would come up with __dict__ being removed or _as_dict() changing and just do something like (and yes this is Python 3 because this answer is for the present):
from typing import NamedTuple
class ApiListRequest(NamedTuple):
group: str="default"
filter: str="*"
def to_dict(self):
return {
'group': self.group,
'filter': self.filter,
}
def to_json(self):
return json.dumps(self.to_dict())
I tried to use the default callable kwarg to dumps in order to do the to_dict() call if available, but that didn't get called as the NamedTuple is convertible to a list.
Here is my take on the problem. It serializes the NamedTuple, takes care of folded NamedTuples and Lists inside of them
def recursive_to_dict(obj: Any) -> dict:
_dict = {}
if isinstance(obj, tuple):
node = obj._asdict()
for item in node:
if isinstance(node[item], list): # Process as a list
_dict[item] = [recursive_to_dict(x) for x in (node[item])]
elif getattr(node[item], "_asdict", False): # Process as a NamedTuple
_dict[item] = recursive_to_dict(node[item])
else: # Process as a regular element
_dict[item] = (node[item])
return _dict
simplejson.dump() instead of json.dump does the job. It may be slower though.
I'm trying to load data into datasets in Python. My data is arranged by years. I want to assign variable names in a loop. Here's what it should look like in pseudocode:
import pandas as pd
for i in range(2010,2017):
Data+i = pd.read_csv("Data_from_" +str(i) + ".csv")
# Stores data from file "Data_from_YYYY.csv" as dataset DataYYYY.
The resulting datasets would be Data2010 - Data2017.
While this is possible, it is not a good idea. Code with dynamically-generated variable names is difficult to read and maintain. That said, you could do it using the exec function, which executes code from a string. This would allow you to dynamically construct your variables names using string concatenation.
However, you really should use a dictionary instead. This gives you an object with dynamically-named keys, which is much more suited to your purposes. For example:
import pandas as pd
Data = {}
for i in range(2010,2017):
Data[i] = pd.read_csv("Data_from_" +str(i) + ".csv")
# Stores data from file "Data_from_YYYY.csv" as dataset DataYYYY.
# Access data like this:
Data[2011]
You should also use snake_case for variable names in Python, so Data should be data.
If you really wanted to dynamically generate variables, you could do it like this. (But you aren't going to, right?)
import pandas as pd
for i in range(2010,2017):
exec("Data{} = pd.read_csv(\"Data_from_\" +str(i) + \".csv\")".format(i))
# Stores data from file "Data_from_YYYY.csv" as dataset DataYYYY.
You can do this without exec too; have a look at Jooks' answer.
You can use globals, locals or a __init__ which uses setattr on self to assign the variables to an instance.
In [1]: globals()['data' + 'a'] = 'n'
In [2]: print dataa
n
In [3]: locals()['data' + 'b'] = 'n'
In [4]: print datab
n
In [5]: class Data(object):
...: def __init__(self, **kwargs):
...: for k, v in kwargs.items():
...: setattr(self, k, v)
...:
In [6]: my_data = Data(a=1, b=2)
In [7]: my_data.a
Out[7]: 1
I would probably go the third route. You may be approaching your solution in an unconventional way, as this pattern is not very familiar, even if it is possible.
How do I serialize a Python dictionary into a string, and then back to a dictionary? The dictionary will have lists and other dictionaries inside it.
It depends on what you're wanting to use it for. If you're just trying to save it, you should use pickle (or, if you’re using CPython 2.x, cPickle, which is faster).
>>> import pickle
>>> pickle.dumps({'foo': 'bar'})
b'\x80\x03}q\x00X\x03\x00\x00\x00fooq\x01X\x03\x00\x00\x00barq\x02s.'
>>> pickle.loads(_)
{'foo': 'bar'}
If you want it to be readable, you could use json:
>>> import json
>>> json.dumps({'foo': 'bar'})
'{"foo": "bar"}'
>>> json.loads(_)
{'foo': 'bar'}
json is, however, very limited in what it will support, while pickle can be used for arbitrary objects (if it doesn't work automatically, the class can define __getstate__ to specify precisely how it should be pickled).
>>> pickle.dumps(object())
b'\x80\x03cbuiltins\nobject\nq\x00)\x81q\x01.'
>>> json.dumps(object())
Traceback (most recent call last):
...
TypeError: <object object at 0x7fa0348230c0> is not JSON serializable
Pickle is great but I think it's worth mentioning literal_eval from the ast module for an even lighter weight solution if you're only serializing basic python types. It's basically a "safe" version of the notorious eval function that only allows evaluation of basic python types as opposed to any valid python code.
Example:
>>> d = {}
>>> d[0] = range(10)
>>> d['1'] = {}
>>> d['1'][0] = range(10)
>>> d['1'][1] = 'hello'
>>> data_string = str(d)
>>> print data_string
{0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], '1': {0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 1: 'hello'}}
>>> from ast import literal_eval
>>> d == literal_eval(data_string)
True
One benefit is that the serialized data is just python code, so it's very human friendly. Compare it to what you would get with pickle.dumps:
>>> import pickle
>>> print pickle.dumps(d)
(dp0
I0
(lp1
I0
aI1
aI2
aI3
aI4
aI5
aI6
aI7
aI8
aI9
asS'1'
p2
(dp3
I0
(lp4
I0
aI1
aI2
aI3
aI4
aI5
aI6
aI7
aI8
aI9
asI1
S'hello'
p5
ss.
The downside is that as soon as the the data includes a type that is not supported by literal_ast you'll have to transition to something else like pickling.
Use Python's json module, or simplejson if you don't have python 2.6 or higher.
If you fully trust the string and don't care about python injection attacks then this is very simple solution:
d = { 'method' : "eval", 'safe' : False, 'guarantees' : None }
s = str(d)
d2 = eval(s)
for k in d2:
print k+"="+d2[k]
If you're more safety conscious then ast.literal_eval is a better bet.
One thing json cannot do is dict indexed with numerals. The following snippet
import json
dictionary = dict({0:0, 1:5, 2:10})
serialized = json.dumps(dictionary)
unpacked = json.loads(serialized)
print(unpacked[0])
will throw
KeyError: 0
Because keys are converted to strings. cPickle preserves the numeric type and the unpacked dict can be used right away.
pyyaml should also be mentioned here. It is both human readable and can serialize any python object.
pyyaml is hosted here:
https://pypi.org/project/PyYAML
While not strictly serialization, json may be reasonable approach here. That will handled nested dicts and lists, and data as long as your data is "simple": strings, and basic numeric types.
A new alternative to JSON or YaML is NestedText. It supports strings that are nested in lists and dictionaries to any depth. It conveys nesting through the use of indenting, and so has no need for either quoting or escaping. As such, the result tends to be very readable. The result looks like YaML, but without all the special cases. It is especially appropriate for serializing code snippets. For example, here is an a single test case extracted from a much larger set that was serialized with NestedText:
base tests:
-
args: --quiet --config test7 files -N configs/subdir
expected:
> Archive: test7-\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d
> «TESTS»/configs/subdir/
> «TESTS»/configs/subdir/file
Be aware, that integers, floats, and bools are converted to strings.
If you are trying to only serialize then pprint may also be a good option. It requires the object to be serialized and a file stream.
Here's some code:
from pprint import pprint
my_dict = {1:'a',2:'b'}
with open('test_results.txt','wb') as f:
pprint(my_dict,f)
I am not sure if we can deserialize easily. I was using json to serialize and deserialze earlier which works correctly in most cases.
f.write(json.dumps(my_dict, sort_keys = True, indent = 2, ensure_ascii=True))
However, in one particular case, there were some errors writing non-unicode data to json.
What is the easiest way to save and load data in python, preferably in a human-readable output format?
The data I am saving/loading consists of two vectors of floats. Ideally, these vectors would be named in the file (e.g. X and Y).
My current save() and load() functions use file.readline(), file.write() and string-to-float conversion. There must be something better.
The most simple way to get a human-readable output is by using a serialisation format such a JSON. Python contains a json library you can use to serialise data to and from a string. Like pickle, you can use this with an IO object to write it to a file.
import json
file = open('/usr/data/application/json-dump.json', 'w+')
data = { "x": 12153535.232321, "y": 35234531.232322 }
json.dump(data, file)
If you want to get a simple string back instead of dumping it to a file, you can use json.dumps() instead:
import json
print json.dumps({ "x": 12153535.232321, "y": 35234531.232322 })
Reading back from a file is just as easy:
import json
file = open('/usr/data/application/json-dump.json', 'r')
print json.load(file)
The json library is full-featured, so I'd recommend checking out the documentation to see what sorts of things you can do with it.
There are several options -- I don't exactly know what you like. If the two vectors have the same length, you could use numpy.savetxt() to save your vectors, say x and y, as columns:
# saving:
f = open("data", "w")
f.write("# x y\n") # column names
numpy.savetxt(f, numpy.array([x, y]).T)
# loading:
x, y = numpy.loadtxt("data", unpack=True)
If you are dealing with larger vectors of floats, you should probably use NumPy anyway.
If it should be human-readable, I'd
also go with JSON. Unless you need to
exchange it with enterprise-type
people, they like XML better. :-)
If it should be human editable and
isn't too complex, I'd probably go
with some sort of INI-like format,
like for example configparser.
If it is complex, and doesn't need to
be exchanged, I'd go with just
pickling the data, unless it's very
complex, in which case I'd use ZODB.
If it's a LOT of data, and needs to
be exchanged, I'd use SQL.
That pretty much covers it, I think.
A simple serialization format that is easy for both humans to computers read is JSON.
You can use the json Python module.
Here is an example of Encoder until you probably want to write for Body class:
# add this to your code
class BodyEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, np.ndarray):
return obj.tolist()
if hasattr(obj, '__jsonencode__'):
return obj.__jsonencode__()
if isinstance(obj, set):
return list(obj)
return obj.__dict__
# Here you construct your way to dump your data for each instance
# you need to customize this function
def deserialize(data):
bodies = [Body(d["name"],d["mass"],np.array(d["p"]),np.array(d["v"])) for d in data["bodies"]]
axis_range = data["axis_range"]
timescale = data["timescale"]
return bodies, axis_range, timescale
# Here you construct your way to load your data for each instance
# you need to customize this function
def serialize(data):
file = open(FILE_NAME, 'w+')
json.dump(data, file, cls=BodyEncoder, indent=4)
print("Dumping Parameters of the Latest Run")
print(json.dumps(data, cls=BodyEncoder, indent=4))
Here is an example of the class I want to serialize:
class Body(object):
# you do not need to change your class structure
def __init__(self, name, mass, p, v=(0.0, 0.0, 0.0)):
# init variables like normal
self.name = name
self.mass = mass
self.p = p
self.v = v
self.f = np.array([0.0, 0.0, 0.0])
def attraction(self, other):
# not important functions that I wrote...
Here is how to serialize:
# you need to customize this function
def serialize_everything():
bodies, axis_range, timescale = generate_data_to_serialize()
data = {"bodies": bodies, "axis_range": axis_range, "timescale": timescale}
BodyEncoder.serialize(data)
Here is how to dump:
def dump_everything():
data = json.loads(open(FILE_NAME, "r").read())
return BodyEncoder.deserialize(data)
Since we're talking about a human editing the file, I assume we're talking about relatively little data.
How about the following skeleton implementation. It simply saves the data as key=value pairs and works with lists, tuples and many other things.
def save(fname, **kwargs):
f = open(fname, "wt")
for k, v in kwargs.items():
print >>f, "%s=%s" % (k, repr(v))
f.close()
def load(fname):
ret = {}
for line in open(fname, "rt"):
k, v = line.strip().split("=", 1)
ret[k] = eval(v)
return ret
x = [1, 2, 3]
y = [2.0, 1e15, -10.3]
save("data.txt", x=x, y=y)
d = load("data.txt")
print d["x"]
print d["y"]
As I commented in the accepted answer, using numpy this can be done with a simple one-liner:
Assuming you have numpy imported as np (which is common practice),
np.savetxt('xy.txt', np.array([x, y]).T, fmt="%.3f", header="x y")
will save the data in the (optional) format and
x, y = np.loadtxt('xy.txt', unpack=True)
will load it.
The file xy.txt will then look like:
# x y
1.000 1.000
1.500 2.250
2.000 4.000
2.500 6.250
3.000 9.000
Note that the format string fmt=... is optional, but if the goal is human-readability it may prove quite useful. If used, it is specified using the usual printf-like codes (In my example: floating-point number with 3 decimals).
So, I want to store a dictionary in a persistent file. Is there a way to use regular dictionary methods to add, print, or delete entries from the dictionary in that file?
It seems that I would be able to use cPickle to store the dictionary and load it, but I'm not sure where to take it from there.
If your keys (not necessarily the values) are strings, the shelve standard library module does what you want pretty seamlessly.
Use JSON
Similar to Pete's answer, I like using JSON because it maps very well to python data structures and is very readable:
Persisting data is trivial:
>>> import json
>>> db = {'hello': 123, 'foo': [1,2,3,4,5,6], 'bar': {'a': 0, 'b':9}}
>>> fh = open("db.json", 'w')
>>> json.dump(db, fh)
and loading it is about the same:
>>> import json
>>> fh = open("db.json", 'r')
>>> db = json.load(fh)
>>> db
{'hello': 123, 'bar': {'a': 0, 'b': 9}, 'foo': [1, 2, 3, 4, 5, 6]}
>>> del new_db['foo'][3]
>>> new_db['foo']
[1, 2, 3, 5, 6]
In addition, JSON loading doesn't suffer from the same security issues that shelve and pickle do, although IIRC it is slower than pickle.
If you want to write on every operation:
If you want to save on every operation, you can subclass the Python dict object:
import os
import json
class DictPersistJSON(dict):
def __init__(self, filename, *args, **kwargs):
self.filename = filename
self._load();
self.update(*args, **kwargs)
def _load(self):
if os.path.isfile(self.filename)
and os.path.getsize(self.filename) > 0:
with open(self.filename, 'r') as fh:
self.update(json.load(fh))
def _dump(self):
with open(self.filename, 'w') as fh:
json.dump(self, fh)
def __getitem__(self, key):
return dict.__getitem__(self, key)
def __setitem__(self, key, val):
dict.__setitem__(self, key, val)
self._dump()
def __repr__(self):
dictrepr = dict.__repr__(self)
return '%s(%s)' % (type(self).__name__, dictrepr)
def update(self, *args, **kwargs):
for k, v in dict(*args, **kwargs).items():
self[k] = v
self._dump()
Which you can use like this:
db = DictPersistJSON("db.json")
db["foo"] = "bar" # Will trigger a write
Which is woefully inefficient, but can get you off the ground quickly.
Unpickle from file when program loads, modify as a normal dictionary in memory while program is running, pickle to file when program exits? Not sure exactly what more you're asking for here.
Assuming the keys and values have working implementations of repr, one solution is that you save the string representation of the dictionary (repr(dict)) to file. YOu can load it using the eval function (eval(inputstring)). There are two main disadvantages of this technique:
1) Is will not work with types that have an unuseable implementation of repr (or may even seem to work, but fail). You'll need to pay at least some attention to what is going on.
2) Your file-load mechanism is basically straight-out executing Python code. Not great for security unless you fully control the input.
It has 1 advantage: Absurdly easy to do.
My favorite method (which does not use standard python dictionary functions): Read/write YAML files using PyYaml. See this answer for details, summarized here:
Create a YAML file, "employment.yml":
new jersey:
mercer county:
pumbers: 3
programmers: 81
middlesex county:
salesmen: 62
programmers: 81
new york:
queens county:
plumbers: 9
salesmen: 36
Step 3: Read it in Python
import yaml
file_handle = open("employment.yml")
my__dictionary = yaml.safe_load(file_handle)
file_handle.close()
and now my__dictionary has all the values. If you needed to do this on the fly, create a string containing YAML and parse it wth yaml.safe_load.
If using only strings as keys (as allowed by the shelve module) is not enough, the FileDict might be a good way to solve this problem.
pickling has one disadvantage. it can be expensive if your dictionary has to be read and written frequently from disk and it's large. pickle dumps the stuff down (whole). unpickle gets the stuff up (as a whole).
if you have to handle small dicts, pickle is ok. If you are going to work with something more complex, go for berkelydb. It is basically made to store key:value pairs.
Have you considered using dbm?
import dbm
import pandas as pd
import numpy as np
db = b=dbm.open('mydbm.db','n')
#create some data
df1 = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(101,200, size=(10, 3)), columns=list('EFG'))
#serialize the data and put in the the db dictionary
db['df1']=df1.to_json()
db['df2']=df2.to_json()
# in some other process:
db=dbm.open('mydbm.db','r')
df1a = pd.read_json(db['df1'])
df2a = pd.read_json(db['df2'])
This tends to work even without a db.close()