Change string list to list - python

So, I saved a list to a file as a string. In particular, I did:
f = open('myfile.txt','w')
f.write(str(mylist))
f.close()
But, later when I open this file again, take the (string-ified) list, and want to change it back to a list, what happens is something along these lines:
>>> list('[1,2,3]')
['[', '1', ',', '2', ',', '3', ']']
How can I get the actual list [1,2,3] back from the file?

There are two main options here. First, using ast.literal_eval:
>>> import ast
>>> ast.literal_eval('[1,2,3]')
[1, 2, 3]
Unlike eval, this is safe because it only evaluates Python literals, such as lists, dictionaries, strings, numbers, None, etc. It raises an error if the string contains anything else, such as arbitrary code.
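For example, a string containing a function call is rejected (the exact error text varies by Python version):
>>> ast.literal_eval('__import__("os").system("rm -rf /")')
Traceback (most recent call last):
  ...
ValueError: malformed node or string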
Second, make use of the json module, using json.loads:
>>> import json
>>> json.loads('[1,2,3]')
[1, 2, 3]
A great advantage of using json is that it's cross-platform, and you can also write to file easily.
with open('data.txt', 'w') as f:
    json.dump([1, 2, 3], f)
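Reading it back is symmetric; a small sketch reusing the data.txt file from above:
with open('data.txt', 'r') as f:
    mylist = json.load(f) # [1, 2, 3]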

In [285]: import ast
In [286]: ast.literal_eval('[1,2,3]')
Out[286]: [1, 2, 3]
Use ast.literal_eval instead of eval whenever possible:
ast.literal_eval:
Safely evaluate[s] an expression node or a string containing a Python
expression. The string or node provided may only consist of the
following Python literal structures: strings, numbers, tuples, lists,
dicts, booleans, and None.
Edit: Also, consider using json. json.loads operates on a different string format, but is generally faster than ast.literal_eval. (So if you use json.load, be sure to save your data using json.dump.) Moreover, the JSON format is language-independent.
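For instance, the two accept different notations: ast.literal_eval understands Python syntax (single quotes, tuples, None), while json.loads expects JSON syntax (double quotes, null, no tuples). A quick sketch:
>>> import ast, json
>>> ast.literal_eval("['a', None, (1, 2)]")
['a', None, (1, 2)]
>>> json.loads('["a", null, [1, 2]]')
['a', None, [1, 2]]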

Python developers traditionally use pickle to serialize their data and write it to a file.
You could do it like this:
import pickle
mylist = [1,2,3]
f = open('myfile', 'wb')
pickle.dump(mylist, f)
f.close()
And then reopen it like this:
import pickle
f = open('myfile', 'rb')
mylist = pickle.load(f) # [1,2,3]
f.close()
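If you prefer, the same thing with with blocks, so the files are closed automatically; this is purely a stylistic variant of the code above:
import pickle

mylist = [1, 2, 3]
with open('myfile', 'wb') as f:
    pickle.dump(mylist, f)

with open('myfile', 'rb') as f:
    mylist = pickle.load(f) # [1, 2, 3]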

I would write python objects to file using the built-in json encoding or, if you don't like json, with pickle and cPickle. Both allow for easy serialization and deserialization of data. I'm on my phone, but when I get home I'll upload sample code.
EDIT:
Ok, get ready for a ton of Python code, and some opinions...
JSON
Python has builtin support for JSON, or JavaScript Object Notation, a lightweight data-interchange format. JSON supports Python's basic data types, such as dictionaries (which JSON calls objects: basically just key-value pairs) and lists (comma-separated values enclosed by [ and ]). For more information on JSON, see this resource. Now to the code:
import json #Don't forget to import
my_list = [1,2,"blue",["sub","list",5]]
with open('/path/to/file', 'w') as f:
    string_to_write = json.dumps(my_list) #Dump the list to a string
    f.write(string_to_write)
#The character string [1,2,"blue",["sub","list",5]] is written to the file
Note that the with statement will close the file automatically when the block finishes executing.
To load the string back, use
with open('/path/to/file', 'r') as f:
    string_from_file = f.read()
my_list = json.loads(string_from_file) #Load the string
#my_list is now the Python object [1,2,"blue",["sub","list",5]]
I like JSON. Use JSON unless you really, really have a good reason not to.
CPICKLE
Another method for serializing Python data to a file is called pickling, in which we write more than we 'need' to a file so that we have some meta-information about how the characters in the file relate to Python objects. There is a builtin pickle module, but we'll use cPickle, because it is implemented in C and is much, much faster than pickle (about 100X, but I don't have a citation for that number). The dumping code then becomes
import cPickle #Don't forget to import
with open('/path/to/file', 'w') as f:
    string_to_write = cPickle.dumps(my_list) #Dump the list to a string
    f.write(string_to_write)
#A very weird character string is written to the file, but it does contain the contents of our list
To load, use
with open('/path/to/file', 'r') as f:
    string_from_file = f.read()
my_list = cPickle.loads(string_from_file) #Load the string
#my_list is now the Python object [1,2,"blue",["sub","list",5]]
Comparison
Note the similarities between the code we wrote using JSON and the code we wrote using cPickle. In fact, the only major difference between the two methods is what text (which characters) actually gets written to the file. I believe JSON is faster and more space-efficient than cPickle, but cPickle is a valid alternative. Also, the JSON format is much more universal than cPickle's weird syntax.
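If you are curious, you can compare the two encodings yourself; a quick check along these lines (the exact sizes depend on your Python version and pickle protocol):
import json, pickle

my_list = [1, 2, "blue", ["sub", "list", 5]]
print(len(json.dumps(my_list)))   # length of the JSON text
print(len(pickle.dumps(my_list))) # length of the pickle string; compare for yourself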
A note on eval
Please don't use eval() haphazardly. It seems like you're relatively new to Python, and eval can be a risky function to jump right into. It allows for the unchecked evaluation of any Python code, and as such can a) be risky, if you ever rely on the user to input text, and b) lead to sloppy, non-Pythonic code.
That last point is just my two cents.
tl;dr: Use JSON to dump and load Python objects to file.

Write to file without brackets: f.write(str(mylist)[1:-1])
After reading the line, split it to get a list: data = line.split(',')
To convert to integers: data = map(int, data)
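Putting those pieces together, a sketch (note that on Python 3, map() returns an iterator, so wrap it in list() if you need a list):
mylist = [1, 2, 3]
with open('myfile.txt', 'w') as f:
    f.write(str(mylist)[1:-1]) # writes "1, 2, 3" without brackets

with open('myfile.txt') as f:
    line = f.read()
data = list(map(int, line.split(','))) # [1, 2, 3]; int() tolerates the spaces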

Related

Load a JSON with raw_unicode_escape encoded strings

I have a JSON file where strings are encoded in raw_unicode_escape (the file itself is UTF-8). How do I parse it so that strings will be UTF-8 in memory?
For individual properties, I could use the following code, but the JSON is very big and manually converting every string after parsing isn't an option.
# Contents of file 'file.json' ('\u00c3\u00a8' is 'è')
# { "name": "\u00c3\u00a8" }
with open('file.json', 'r') as input:
    j = json.load(input)
j['name'] = j['name'].encode('raw_unicode_escape').decode('utf-8')
Since the JSON can be quite huge, the approach has to be "incremental" and I cannot read the whole file ahead of time, save it in a string and then do some processing.
Finally, I should note that the JSON is actually stored in a zip file, so instead of open() it's ZipFile.open().
Since codecs.open('file.json', 'r', 'raw_unicode_escape') works somehow, I took a look at its source code and came up with a solution.
>>> from codecs import getreader
>>>
>>> with open('file.json', 'r') as input:
...     reader = getreader('raw_unicode_escape')(input)
...     j = json.loads(reader.read().encode('raw_unicode_escape'))
...     print(j['name'])
...
è
Of course, that will work even if input is another type of file-like object, like a file inside a zip archive in my case.
Eventually, I've ruled out the hypothesis of an incremental encoder (it doesn't make much sense with JSON), but for those interested I suggest taking a look at this answer as well as codecs.iterencode().
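For completeness, since my JSON actually lives in a zip file, the same wrapping should work on the file-like object returned by ZipFile.open(); a sketch, with archive.zip and the member name as placeholders:
import json
import zipfile
from codecs import getreader

with zipfile.ZipFile('archive.zip') as zf:
    with zf.open('file.json') as input:
        reader = getreader('raw_unicode_escape')(input)
        j = json.loads(reader.read().encode('raw_unicode_escape'))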

Writing an object to python file

I have the following code:
matrix_file = open("abc.txt", "rU")
matrix = matrix_file.readlines()
keys = matrix[0]
vals = [line[1:] for line in matrix[1:]]
ea=open("abc_format.txt",'w')
ea.seek(0)
ea.write(vals)
ea.close()
However I am getting the following error:
TypeError: expected a character buffer object
How do I buffer the output and what data type is the variable vals?
vals is a list. If you want to write a list of strings to a file, as opposed to an individual string, use writelines:
ea=open("abc_format.txt",'w')
ea.seek(0)
ea.writelines(vals)
ea.close()
Note that this will not insert newlines for you (although in your specific case your strings already end in newlines, as pointed out in the comments). If you need to add newlines you could do the following as an example:
ea=open("abc_format.txt",'w')
ea.seek(0)
ea.writelines([line+'\n' for line in vals])
ea.close()
The write function will only handle characters or bytes. To write arbitrary objects, use python's pickle library. Write with pickle.dump(), read them back with pickle.load().
But if what you're really after is writing something in the same format as your input, you'll have to write out the matrix values and newlines yourself.
for line in vals:
    ea.write(line)
ea.close()
You've now written a file that looks like abc.txt, except that the first row and first character from each line has been removed. (You dropped those when constructing vals.)
Somehow I doubt this is what you intended, since you chose to name it abc_format.txt, but anyway this is how you write out a list of lines of text.
You cannot "write" objects to files. Rather, use the pickle module:
matrix_file = open("abc.txt", "rU")
matrix = matrix_file.readlines()
keys = matrix[0]
vals = [line[1:] for line in matrix[1:]]
#pickling begins!
import pickle
f = open("abc_format.txt", "w") #open the file for writing
pickle.dump(vals, f) #call with (object, file)
f.close()
Then read it like this:
import pickle
f = open("abc_format.txt")
vals = pickle.load(f) #exactly the same list
f.close()
You can do this with any kind of object, your own or built-in. You can only write strings and bytes to files, python's open() function just opens it like opening notepad would.
To answer your first question, vals is a list, because anything in [operation(i) for i in iterated_over] is a list comprehension, and list comprehensions make lists. To see what the type of any object is, just use the type() function; e.g. type([1,4,3])
Examples: https://repl.it/qKI/3
Documentation here:
https://docs.python.org/2/library/pickle.html and https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
First of all, instead of opening and closing the files separately, you can use a with statement, which does that job automatically. As for the error: as it says, the write method only accepts a character buffer object, so you need to convert your list to a string.
For example, you can use the join function, which joins the items of an iterable with a specific delimiter and returns the concatenated string.
with open("abc.txt", "rU") as f,open("abc_format.txt",'w') as out:
matrix = f.readlines()
keys = matrix[0]
vals = [line[1:] for line in matrix[1:]]
out.write('\n'.join(vals))
Also, as a more Pythonic way, since file objects are iterators you can do it with the following code: get the first line by calling next() on the file, and pass the rest to the join function:
with open("abc.txt", "rU") as f, open("abc_format.txt", 'w') as out:
    keys = next(f) # the first (header) line
    out.write('\n'.join(line[1:] for line in f))

The difference between `str` and `pickle` serialization methods for dictionary values

If I want to save a dictionary structure to a file and read this dictionary back from the file directly later, I have two methods, but I do not know the differences between them. Could anyone explain?
Here is a simple example. Suppose this is my dictionary:
D = {'zzz': 123,
     'lzh': 321,
     'cyl': 333}
The first method to save it to the file:
with open('tDF.txt','w') as f: # save
    f.write(str(D) + '\n')
with open('tDF.txt','r') as f:
    Data = f.read() # read. Data is a string
Data = eval(Data) # convert back to the dictionary structure
The second method (using pickle):
import pickle
with open('tDF.txt','w') as f: # save
    pickle.dump(D,f)
with open('tDF.txt','r') as f:
    D = pickle.load(f) # D is in dictionary structure format
I think the first method is much simpler. What are the differences?
Thanks!
Writing str value representation
If you write the str value of your data, you rely on it being properly shaped.
In some cases (e.g. float numbers, but also more complex objects) you would lose some precision or information.
Using repr instead of str might improve the situation a bit, as repr is supposed to provide the text in a form which is likely to work when reading it back (but without any guarantee).
Writing pickled data
Pickle takes care of every bit, so you will have the information serialized precisely.
This is quite a significant difference.
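A tiny illustration of the difference; note that how much str rounds depends on your Python version (older interpreters rounded floats to 12 significant digits):
import pickle

D = {'pi': 3.141592653589793}
text_form = str(D) # human-readable, but may round the float
restored = pickle.loads(pickle.dumps(D))
print(restored == D) # True: pickle round-trips the value exactly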
Using other serialization methods
Personally, I prefer serializing into json or sometimes yaml, as this form of data is well readable, portable, and can even be edited.
Serialize to JSON
For json it works this way:
import json
data = {"a", "aha", "b": "bebe", age: 123, num: 3.1415}
with open("data.json", "w") as f:
json.dump(data, f)
with open("data.json", "r") as f:
readdata = json.load(data, f)
print readdata
Serialize to YAML
With YAML:
First, be sure you have some YAML lib installed, e.g.:
$ pip install pyyaml
Personally, I have it installed all the time, as I use it very often.
Then the script changes only a bit:
import yaml
data = {"a", "aha", "b": "bebe", age: 123, num: 3.1415}
with open("data.yaml", "w") as f:
yaml.dump(data, f)
with open("data.yaml", "r") as f:
readdata = yaml.load(data, f)
print readdata
Conclusions
For rather simple data types, the methods described above work easily.
Once you start using instances of classes you have defined, you will need proper definitions of loaders and serializers for the given format. Describing this is out of the scope of this question, but it is definitely possible in all cases where a solution exists (there are types of values which cannot be serialized reliably, like file pointers, database connections, etc.).
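To give a taste of what such a loader/serializer pair can look like for json, here is a minimal sketch; the Point class and both hook functions are made up for illustration:
import json

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

def encode_point(obj):
    # called by json.dumps for objects it cannot serialize by itself
    if isinstance(obj, Point):
        return {'__point__': True, 'x': obj.x, 'y': obj.y}
    raise TypeError('not serializable: %r' % obj)

def decode_hook(d):
    # called by json.loads for every decoded JSON object
    if d.get('__point__'):
        return Point(d['x'], d['y'])
    return d

s = json.dumps({'corner': Point(1, 2)}, default=encode_point)
data = json.loads(s, object_hook=decode_hook) # data['corner'] is a Point again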

Writing Json in for loop in Python

I am downloading Json files from an API and use the following code to write the JSON. Each item in the loop gives me a JSON file. I need to save it and later extract entities from the appended JSON file using a loop.
for item in style_ls:
    dat = get_json(api, item)
    specs_dict[item] = dat
    with open("specs_append.txt", "a") as myfile:
        json.dump(dat, myfile)
    myfile.close()
    print item
with open("specs_data.txt", "w") as myfile:
    json.dump(specs_dict, myfile)
myfile.close()
I know that I cannot get a valid JSON format from specs_append.txt, but I can get one from specs_data.txt. I am doing the first one just because my program needs at least 3-4 days to complete and there is a high chance that my system may shut down. So is there any way I can do this efficiently?
If not, is there any way I can extract it from the specs_append.txt <{JSON}{JSON}> format (which is not a valid JSON format)?
If not, should I write specs_dict to a txt file every time in the loop, so that even if the program gets terminated I can start from that point in the loop and still get a valid JSON format?
I suggest several possible solutions.
One solution is to write custom code to slurp in the input file. I would suggest putting a special line before each JSON object in the file, such as: ###
Then you could write code like this:
import json

SPECIAL_LINE = '###\n' # the marker line, written before each JSON object

def json_get_objects(f):
    temp = ''
    line = next(f) # pull first line
    assert line == SPECIAL_LINE
    for line in f:
        if line != SPECIAL_LINE:
            temp += line
        else:
            # found special marker, temp now contains a complete JSON object
            j = json.loads(temp)
            yield j
            temp = ''
    # after loop is done, yield up the last JSON object
    if temp:
        j = json.loads(temp)
        yield j

with open("specs_data.txt", "r") as f:
    for j in json_get_objects(f):
        pass # do something with JSON object j
Two notes on this. First, I am simply appending to a string over and over; this used to be a very slow way to do this in Python, so if you are using a very old version of Python, don't do it this way unless your JSON objects are very small. Second, I wrote code to split the input and yield up JSON objects one at a time, but you could also use a guaranteed-unique string, slurp in all the data with a single call to f.read() and then split on your guaranteed-unique string using the str.split() method function.
Another solution would be to write the whole file as a valid JSON list of valid JSON objects. Write the file like this:
{"mylist":[
# first JSON object, followed by a comma
# second JSON object, followed by a comma
# third JSON object
]}
This would require your file appending code to open the file with writing permission, and seek to the last ] in the file before writing a comma plus newline, then the new JSON object on the end, and then finally writing ]} to close out the file. If you do it this way, you can use json.loads() to slurp the whole thing in and have a list of JSON objects.
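A rough sketch of that appending step; this assumes the file already holds a non-empty {"mylist":[...]} document (with an empty list, the leading comma would be wrong):
import json
import os

def append_json_object(path, obj):
    with open(path, 'rb+') as f:
        f.seek(-2, os.SEEK_END) # step back over the closing ]}
        assert f.read(2) == b']}'
        f.seek(-2, os.SEEK_END)
        f.write((',\n' + json.dumps(obj) + ']}').encode('utf-8'))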
Finally, I suggest that maybe you should just use a database. Use SQLite or something and just throw the JSON strings in to a table. If you choose this, I suggest using an ORM to make your life simple, rather than writing SQL commands by hand.
Personally, I favor the first suggestion: write in a special line like ###, then have custom code to split the input on those marks and then get the JSON objects.
EDIT: Okay, the first suggestion was sort of assuming that the JSON was formatted for human readability, with a bunch of short lines:
{
    "foo": 0,
    "bar": 1,
    "baz": 2
}
But it's all run together as one big long line:
{"foo":0,"bar":1,"baz":2}
Here are three ways to fix this.
0) write a newline before the ### and after it, like so:
###
{"foo":0,"bar":1,"baz":2}
###
{"foo":0,"bar":1,"baz":2}
Then each input line will alternately be ### or a complete JSON object.
1) As long as SPECIAL_LINE is completely unique (never appears inside a string in the JSON) you can do this:
with open("specs_data.txt", "r") as f:
temp = f.read() # read entire file contents
lst = temp.split(SPECIAL_LINE)
json_objects = [json.loads(x) for x in lst]
for j in json_objects:
pass # do something with JSON object j
The .split() method function can split up the temp string into JSON objects for you.
2) If you are certain that each JSON object will never have a newline character inside it, you could simply write JSON objects to the file, one after another, putting a newline after each; then assume that each line is a JSON object:
import json

def json_get_objects(f):
    for line in f:
        if line.strip():
            yield json.loads(line)

with open("specs_data.txt", "r") as f:
    for j in json_get_objects(f):
        pass # do something with JSON object j
I like the simplicity of option (2), but I like the reliability of option (0). If a newline ever got written in as part of a JSON object, option (0) would still work, but option (2) would error.
Again, you can also simply use an actual database (SQLite) with an ORM and let the database worry about the details.
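If you go the database route, even the standard library's sqlite3 module (no ORM) keeps the loop simple; a sketch, with specs.db as a made-up filename and item/dat borrowed from your loop:
import json
import sqlite3

conn = sqlite3.connect('specs.db')
conn.execute('CREATE TABLE IF NOT EXISTS specs (item TEXT, data TEXT)')

# inside your download loop, per item:
# conn.execute('INSERT INTO specs VALUES (?, ?)', (item, json.dumps(dat)))
# conn.commit() # commit each row; a crash loses at most the current item

# later, read everything back:
rows = conn.execute('SELECT item, data FROM specs')
specs_dict = dict((item, json.loads(data)) for item, data in rows)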
Good luck.
Append the JSON data to a dict on every loop.
In the end, dump this dict as JSON and write it to a file.
To give you an idea of appending data to a dict:
>>> d1 = {'suku':12}
>>> t1 = {'suku1':212}
>>> d1.update(t1)
>>> d1
{'suku1': 212, 'suku': 12}

how to compare values in an existing dictionary and update the dictionary back to a file?

I am making a utility of sorts with a dictionary. What I am trying to achieve is this:
for each XML file that I parse, the existing dictionary is loaded from a file (output.dict) and compared/updated for the current key, then stored back along with the existing values. I tried with has_key(), but I get an AttributeError; it does not work.
Since I am trying one file at a time, it creates multiple dictionaries and I am unable to compare them. This is where I am stuck.
def createUpdateDictionary(servicename, xmlfile):
    dictionary = {}
    if path.isfile == 'output.dict':
        dictionary.update (eval(open('output.dict'),'r'))
    for event, element in etree.iterparse(xmlfile):
        dictionary.setdefault(servicename, []).append(element.tag)
    f = open('output.dict', 'a')
    write_dict = str(dictionary2)
    f.write(write_dict)
    f.close()
(here servicename is nothing but a split('.') of xmlfile, which forms the key, and the values are nothing but the elements' tag names)
def createUpdateDictionary(servicename, xmlfile):
    dictionary = {}
    if path.isfile == 'output.dict':
        dictionary.update (eval(open('output.dict'),'r'))
There is a typo, as the 'r' argument belongs to open(), not eval(). Furthermore, you cannot evaluate a file object as returned by open(); you have to read() the contents first.
f = open('output.dict', 'a')
write_dict = str(dictionary2)
f.write(write_dict)
f.close()
Here, you are appending the string representation to the file. The string representation is not guaranteed to represent the dictionary completely. It is meant to be readable by humans to allow inspection, not to persist the data.
Moreover, since you are using 'a' to append the data, you are storing multiple copies of the updated dictionary in the file. Your file might look like:
{}{"foo": []}{"foo": [], "bar":[]}
This is clearly not what you want; you won't even be able to eval() it later (syntax error!).
Since eval() will execute arbitrary Python code, it is considered evil and you really should not use it for object serialization. Either use pickle, which is the standard way of serialization in Python, or use json, which is a human-readable standard format supported by other languages as well.
import json

def createUpdateDictionary(servicename, xmlfile):
    with open('output.dict', 'r') as fp:
        dictionary = json.load(fp)
    # ... process XML, update dictionary ...
    with open('output.dict', 'w') as fp:
        json.dump(dictionary, fp)
