The difference between `str` and `pickle` serialization methods for dictionary values - python

If I want to save a Dictionary structure to a file and read this Dictionary from the file directly later, I have two methods but I do not know the differences between the two methods. Could anyone explain it?
Here is a simple example. Suppose this is my dictionary:
D = {'zzz': 123,
     'lzh': 321,
     'cyl': 333}
The first method to save it to the file:
with open('tDF.txt', 'w') as f:  # save
    f.write(str(D) + '\n')
with open('tDF.txt', 'r') as f:
    Data = f.read()    # read. Data is a string
    Data = eval(Data)  # convert back to a dictionary
The second method (using pickle):
import pickle
with open('tDF.txt', 'wb') as f:  # save (pickle needs binary mode)
    pickle.dump(D, f)
with open('tDF.txt', 'rb') as f:
    D = pickle.load(f)  # D is a dictionary again
I think the first method is much simpler. What are the differences?
Thanks!

Writing the str value representation
If you write the str value of your data, you rely on it being properly shaped.
In some cases (e.g. float numbers, but also more complex objects) you would lose some precision or information.
Using repr instead of str may improve the situation a bit, as repr is supposed to produce text which is likely to work when read back (but without any guarantee).
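A tiny illustration of the difference (the sample string is arbitrary):

```python
# str() of a string drops the quotes, so the written text is no longer
# valid Python source; repr() keeps them, so eval() can read it back.
s = "it's 'quoted'"

as_str = str(s)    # it's 'quoted'    -> eval(as_str) would be a SyntaxError
as_repr = repr(s)  # "it's 'quoted'"  -> valid Python source again

print(eval(as_repr) == s)  # True: repr round-trips for simple literals
```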
Writing pickled data
Pickle takes care of every bit, so you end up with precisely serialized information.
This is quite a significant difference.
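For instance, types like datetime or set survive a pickle round-trip exactly, while a str()/eval() round-trip would fail on them (a small sketch; the sample values are arbitrary):

```python
import pickle
from datetime import datetime

# Values that a str()/eval() round-trip cannot reconstruct:
# str(datetime(...)) is '2023-01-02 03:04:05', which eval() cannot parse.
D = {'when': datetime(2023, 1, 2, 3, 4, 5), 'tags': {'a', 'b'}}

blob = pickle.dumps(D)        # bytes, not human-readable
restored = pickle.loads(blob)

print(restored == D)  # True: every value is reconstructed exactly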
Using other serialization methods
Personally, I prefer serializing to JSON or sometimes YAML, as these formats are readable, portable, and can even be edited by hand.
Serialize to JSON
For JSON it works this way:
import json

data = {"a": "aha", "b": "bebe", "age": 123, "num": 3.1415}
with open("data.json", "w") as f:
    json.dump(data, f)
with open("data.json", "r") as f:
    readdata = json.load(f)
print(readdata)
Serialize to YAML
With YAML:
First, be sure you have a YAML lib installed, e.g.:
$ pip install pyyaml
Personally, I have it installed all the time, as I use it very often.
Then the script changes only a bit:
import yaml

data = {"a": "aha", "b": "bebe", "age": 123, "num": 3.1415}
with open("data.yaml", "w") as f:
    yaml.dump(data, f)
with open("data.yaml", "r") as f:
    readdata = yaml.safe_load(f)
print(readdata)
Conclusions
For rather simple data types, the methods described above work easily.
Once you start using instances of classes you have defined yourself, you need to provide
proper loaders and serializers for the given format. Describing this is out of scope of this question, but it
is definitely possible wherever a solution exists (there are types of values which
cannot be serialized reliably, like file pointers, database connections etc.)
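As a sketch of what such custom loaders and serializers can look like with json (the Point class and the "__point__" tag are made up for illustration):

```python
import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

def encode_point(obj):
    # Called by json for objects it cannot serialize itself.
    if isinstance(obj, Point):
        return {"__point__": True, "x": obj.x, "y": obj.y}
    raise TypeError("not serializable: %r" % (obj,))

def decode_point(d):
    # Called by json for every decoded dict; turn tagged ones back into Points.
    if d.get("__point__"):
        return Point(d["x"], d["y"])
    return d

text = json.dumps({"p": Point(1, 2)}, default=encode_point)
data = json.loads(text, object_hook=decode_point)
print(data["p"] == Point(1, 2))  # True
```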

Related

Load a JSON with raw_unicode_escape encoded strings

I have a JSON file where strings are encoded in raw_unicode_escape (the file itself is UTF-8). How do I parse it so that strings will be UTF-8 in memory?
For individual properties, I could use the following code, but the JSON is very big and manually converting every string after parsing isn't an option.
# Contents of file 'file.json' ('\u00c3\u00a8' is 'è')
# { "name": "\u00c3\u00a8" }
import json

with open('file.json', 'r') as input:
    j = json.load(input)
    j['name'] = j['name'].encode('raw_unicode_escape').decode('utf-8')
Since the JSON can be quite huge, the approach has to be "incremental" and I cannot read the whole file ahead of time, save it in a string and then do some processing.
Finally, I should note that the JSON is actually stored in a zip file, so instead of open() it's ZipFile.open().
Since codecs.open('file.json', 'r', 'raw_unicode_escape') works somehow, I took a look at its source code and came up with a solution.
>>> from codecs import getreader
>>>
>>> with open('file.json', 'r') as input:
...     reader = getreader('raw_unicode_escape')(input)
...     j = json.loads(reader.read().encode('raw_unicode_escape'))
...     print(j['name'])
...
è
Of course, that will work even if input is another type of file-like object, like a file inside a zip archive in my case.
Eventually, I turned down the hypothesis of an incremental encoder (it doesn't make sense with JSON), but for those interested I suggest taking a look at codecs.iterencode() as well.
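The double decode at the heart of this can be checked in isolation (a tiny sketch):

```python
# The file stores 'è' double-encoded: its UTF-8 bytes (0xC3 0xA8) were
# written as the two escapes \u00c3 \u00a8. Undoing that is a two-step hop:
mojibake = '\u00c3\u00a8'                       # the string as json.load sees it
raw = mojibake.encode('raw_unicode_escape')     # back to the raw bytes C3 A8
fixed = raw.decode('utf-8')                     # decode those bytes as UTF-8
print(fixed)  # è
```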

Dictionary to string not being read back as a dictionary

Since the JSON and pickle methods aren't working out, I've decided to save my dictionaries as strings, and that works, but they aren't being read back.
I.e.
Dictionary:
a = {'name': 'joe'}
Save:
file = open("save.txt", "w")
file.write(str(a))
file.close()
And that works.
But my load method doesn't read it.
Load:
f = open("save.txt", "r")
a = f
f.close()
So a just ends up being the file object, not my dictionary.
I really don't want to use json or pickle, is there any way I could get this method working?
First, you're not actually reading anything from the file (the file object is not its contents). Second, once you fix that, you're going to get a string and still need to transform that string into a dictionary.
Fortunately both are straightforward to address:
from ast import literal_eval

with open("save.txt") as infile:
    data = literal_eval(infile.read())

Change string list to list

So, I saved a list to a file as a string. In particular, I did:
f = open('myfile.txt','w')
f.write(str(mylist))
f.close()
But, later when I open this file again, take the (string-ified) list, and want to change it back to a list, what happens is something along these lines:
>>> list('[1,2,3]')
['[', '1', ',', '2', ',', '3', ']']
Could I make it so that I got the list [1,2,3] from the file?
There are two main options here. First, using ast.literal_eval:
>>> import ast
>>> ast.literal_eval('[1,2,3]')
[1, 2, 3]
Unlike eval, this is safer since it will only evaluate Python literals, such as lists, dictionaries, None, strings, etc. It raises an error if the string contains anything else, such as a function call.
Second, make use of the json module, using json.loads:
>>> import json
>>> json.loads('[1,2,3]')
[1, 2, 3]
A great advantage of using json is that it's cross-platform, and you can also write to file easily.
with open('data.txt', 'w') as f:
    json.dump([1, 2, 3], f)
In [285]: import ast
In [286]: ast.literal_eval('[1,2,3]')
Out[286]: [1, 2, 3]
Use ast.literal_eval instead of eval whenever possible:
ast.literal_eval:
Safely evaluate[s] an expression node or a string containing a Python
expression. The string or node provided may only consist of the
following Python literal structures: strings, numbers, tuples, lists,
dicts, booleans, and None.
Edit: Also, consider using json. json.loads operates on a different string format, but is generally faster than ast.literal_eval. (So if you use json.load, be sure to save your data using json.dump.) Moreover, the JSON format is language-independent.
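To see the format difference concretely (a small sketch):

```python
import ast
import json

# The two parsers accept different spellings of the same data:
py_text = "{'a': 1, 'flag': True, 'missing': None}"  # Python literal syntax
js_text = '{"a": 1, "flag": true, "missing": null}'  # JSON syntax

assert ast.literal_eval(py_text) == json.loads(js_text)

# json.loads rejects the Python spelling (single quotes, True/None):
try:
    json.loads(py_text)
except json.JSONDecodeError:
    print("json.loads cannot parse Python-style literals")
```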
Python developers traditionally use pickle to serialize their data and write it to a file.
You could do it like so:
import pickle
mylist = [1,2,3]
f = open('myfile', 'wb')
pickle.dump(mylist, f)
And then reopen like so:
import pickle
f = open('myfile', 'rb')
mylist = pickle.load(f) # [1,2,3]
I would write python objects to file using the built-in json encoding or, if you don't like json, with pickle and cpickle. Both allow for easy deserialization and serialization of data. I'm on my phone but when I get home I'll upload sample code.
EDIT:
Ok, get ready for a ton of Python code, and some opinions...
JSON
Python has builtin support for JSON, or JavaScript Object Notation, a lightweight data interchange format. JSON supports Python's basic data types, such as dictionaries (which JSON calls objects: basically just key-value pairs) and lists (comma-separated values enclosed by [ and ]). For more information on JSON, see this resource. Now to the code:
import json  # Don't forget to import

my_list = [1, 2, "blue", ["sub", "list", 5]]
with open('/path/to/file', 'w') as f:
    string_to_write = json.dumps(my_list)  # Dump to a string
    f.write(string_to_write)
# The character string [1, 2, "blue", ["sub", "list", 5]] is written to the file
Note that the with statement will close the file automatically when the block finishes executing.
To load the string back, use
with open('/path/to/file', 'r') as f:
    string_from_file = f.read()
    my_list = json.loads(string_from_file)  # Load the string
# my_list is now the Python object [1, 2, "blue", ["sub", "list", 5]]
I like JSON. Use JSON unless you really, really have a good reason not to.
CPICKLE
Another method for serializing Python data to a file is called pickling, in which we write more than we 'need' to a file so that we have some meta-information about how the characters in the file relate to Python objects. There is a builtin pickle module, but we'll use cPickle, because it is implemented in C and is much, much faster than pure-Python pickle (roughly an order of magnitude, though I don't have a citation for that number). The dumping code then becomes
import cPickle  # Python 2; in Python 3, plain pickle already uses the C implementation

with open('/path/to/file', 'w') as f:
    string_to_write = cPickle.dumps(my_list)  # Dump to a string
    f.write(string_to_write)
# A strange-looking character string is written to the file, but it does contain the contents of our list
To load, use
with open('/path/to/file', 'r') as f:
    string_from_file = f.read()
    my_list = cPickle.loads(string_from_file)  # Load the string
# my_list is now the Python object [1, 2, "blue", ["sub", "list", 5]]
Comparison
Note the similarities between the code we wrote using JSON and the code we wrote using cPickle. In fact, the only major difference between the two methods is what text (which characters) actually gets written to the file. I believe JSON is faster and more space-efficient than cPickle for simple data, but cPickle is a valid alternative. Also, the JSON format is much more universal than cPickle's opaque syntax.
A note on eval
Please don't use eval() haphazardly. It seems like you're relatively new to Python, and eval can be a risky function to jump right into. It allows for the unchecked evaluation of any Python code, and as such can be a) risky, if you ever are relying on the user to input text, and b) can lead to sloppy, non-Pythonic code.
That last point is just my two cents.
tl:dr; Use JSON to dump and load Python objects to file
Write to file without brackets: f.write(str(mylist)[1:-1])
After reading the line, split it to get a list: data = line.split(',')
To convert to integers: data = map(int, data)
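Putting those three steps together (a sketch; the file name is arbitrary):

```python
# Full round trip for the bracket-stripping approach described above.
mylist = [1, 2, 3]

with open('myfile.txt', 'w') as f:
    f.write(str(mylist)[1:-1])  # writes: 1, 2, 3

with open('myfile.txt') as f:
    line = f.read()

# split on commas, then convert each piece to int (int() strips whitespace)
data = list(map(int, line.split(',')))
print(data)  # [1, 2, 3]
```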

storing template replacement values in a separate file

Using string.Template I want to store the values to substitute into the template in separate files that I can loop through.
Looping is the easy part. I then want to run
result = s.safe_substitute(title=titleVar, content=contentVar)
on my template. I’m just a little stumped in what format to store these values in a text file and how to read that file with python.
What you are looking for is called serialization. In this case, you want to serialize a dict, such as
values = dict(title='titleVar', content='contentVar')
There are many ways to serialize, using XML, pickle, YAML, or JSON formats, for example. Here is how you could do it with JSON:
import string
import json

values = dict(title='titleVar', content='contentVar')

with open('/tmp/values', 'w') as f:
    json.dump(values, f)

with open('/tmp/values', 'r') as f:
    newvals = json.load(f)

s = string.Template('''\
$title
$content''')
result = s.safe_substitute(newvals)
print(result)

how to compare values in an existing dictionary and update the dictionary back to a file?

I am making a utility of sorts with a dictionary. What I am trying to achieve is this:
for each XML file that I parse, the existing dictionary is loaded from a file (output.dict) and compared/updated for the current key, then stored back along with the existing values. I tried has_key(), but got an AttributeError; it does not work.
Since I trying one file at a time, it creates multiple dictionaries and am unable to compare. This is where I am stuck.
def createUpdateDictionary(servicename, xmlfile):
    dictionary = {}
    if path.isfile == 'output.dict':
        dictionary.update(eval(open('output.dict'), 'r'))

    for event, element in etree.iterparse(xmlfile):
        dictionary.setdefault(servicename, []).append(element.tag)

    f = open('output.dict', 'a')
    write_dict = str(dictionary2)
    f.write(write_dict)
    f.close()
(here servicename is just a '.'-split of xmlfile, which forms the key, and the values are nothing but the elements' tag names)
def createUpdateDictionary(servicename, xmlfile):
    dictionary = {}
    if path.isfile == 'output.dict':
        dictionary.update(eval(open('output.dict'), 'r'))
There is a typo, as the 'r' argument belongs to open(), not eval(). Furthermore, you cannot evaluate a file object as returned by open(); you have to read() the contents first. (Also note that path.isfile is a function: it should be called as path.isfile('output.dict'), not compared with ==.)
f = open('output.dict', 'a')
write_dict = str(dictionary2)
f.write(write_dict)
f.close()
Here, you are appending the string representation to the file. The string representation is not guaranteed to represent the dictionary completely. It is meant to be readable by humans to allow inspection, not to persist the data.
Moreover, since you are using 'a' to append the data, you are storing multiple copies of the updated dictionary in the file. Your file might look like:
{}{"foo": []}{"foo": [], "bar":[]}
This is clearly not what you want; you won't even be able to eval() it later (syntax error!).
Since eval() will execute arbitrary Python code, it is considered evil and you really should not use it for object serialization. Either use pickle, which is the standard way of serialization in Python, or use json, which is a human-readable standard format supported by other languages as well.
import json

def createUpdateDictionary(servicename, xmlfile):
    with open('output.dict', 'r') as fp:
        dictionary = json.load(fp)

    # ... process XML, update dictionary ...

    with open('output.dict', 'w') as fp:
        json.dump(dictionary, fp)
