My goal is to have a text file that allows me to append data to the end and retrieve and remove the first data entry from the file. Essentially I want to use a text file as a queue (first in, first out). I thought of two ways to accomplish this, but I am unsure which way is more Pythonic and efficient. The first way is to use the json library.
import json
def add_to_queue(item):
    q = retrieve_queue()
    q.append(item)
    write_to_queue(q)
def pop_from_queue():
    q = retrieve_queue()
    write_to_queue(q[1:])
    return q[0]
def write_to_queue(data):
    with open('queue.txt', 'w') as file_pointer:
        json.dump(data, file_pointer)
def retrieve_queue():
    try:
        with open('queue.txt', 'r') as file_pointer:
            return json.load(file_pointer)
    except (IOError, ValueError):
        return []
Seems pretty clean, but it requires serialization/deserialization of all of the json data every time I write/read, even though I only need the first item in the list.
The second option is to call readlines() and writelines() to retrieve and to store the data in the text file.
def add_to_queue(item):
    with open('queue.txt', 'a') as file_pointer:
        file_pointer.write(item + '\n')
def pop_from_queue():
    with open('queue.txt', 'r+') as file_pointer:
        lines = file_pointer.readlines()
        file_pointer.seek(0)
        file_pointer.truncate()
        file_pointer.writelines(lines[1:])
    return lines[0].strip()
Both of them work fine, so my question is: what is the recommended way to implement a "text file queue"? Is using json "better" (more Pythonic/faster/more memory efficient) than reading and writing to the file myself? Both of these solutions seem rather complicated based on the simplicity of the problem; am I missing a more obvious way to do this?
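One possible variation on the second option, offered only as a sketch: read just the first line, stream the remainder into a temporary file, and swap that file into place. This avoids holding every line in memory, although the rest of the file is still copied on each pop. The `path` parameter and the choice to raise `IndexError` on an empty queue are assumptions made for illustration, not part of the original code.

```python
import os
import shutil
import tempfile

def add_to_queue(item, path="queue.txt"):
    # Appending is cheap: only the new line is written.
    with open(path, "a") as file_pointer:
        file_pointer.write(item + "\n")

def pop_from_queue(path="queue.txt"):
    with open(path, "r") as src:
        first = src.readline()
        if not first:
            # Assumed behaviour: mirror list.pop() on an empty list.
            raise IndexError("pop from an empty queue")
        # Copy everything after the first line into a temp file
        # in the same directory, so the final rename stays on one filesystem.
        directory = os.path.dirname(os.path.abspath(path))
        with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as dst:
            shutil.copyfileobj(src, dst)
    # Both files are closed here; swap the shortened copy into place.
    os.replace(dst.name, path)
    return first.rstrip("\n")
```

The cost per pop is still proportional to the file size, so for large queues a database or an in-memory `collections.deque` that is flushed periodically may be a better fit.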
I have a generator that yields rows from a CSV file one at a time, something like:
import csv
def as_csv(filename):
    with open(filename) as fin:
        yield from csv.reader(fin)
However, I need to also capture the raw string returned from the file, as this needs to be persisted at the same time.
As far as I can tell, the csv built-in can be used on an ad-hoc basis, something like this:
import csv
def as_csv_and_raw(filename):
    with open(filename) as fin:
        for row in fin:
            raw = row.strip()
            # a reader object cannot be indexed, so pull its single row with next()
            values = next(csv.reader([raw]))
            yield (values, raw)
... but this has the overhead of creating a new reader and a new iterable for each row of the file, so on files with millions of rows I'm concerned about the performance impact.
It feels like I could create a coroutine that could interact with the primary function, yielding the parsed fields in a way where I can control the input directly without losing it, something like this:
import csv
def as_csv_and_raw(filename):
    with open(filename) as fin:
        reader = raw_to_csv(some_coroutine())
        next(reader)
        for row in fin:
            raw = row.strip()
            fields = reader.send(raw)
            yield fields, raw
def raw_to_csv(data):
    yield from csv.reader(data)
def some_coroutine():
    # what goes here?
    raise NotImplementedError
I haven't really wrapped my head around coroutines and using yield as an expression, so I'm not sure what goes in some_coroutine, but the intent is that each time I send a value in, that value is run through the csv.reader object and I get the set of fields back.
Can someone provide the implementation of some_coroutine, or alternately show me a better mechanism for getting the desired data?
You can use itertools.tee to create two independent iterators from the iterable file object, create a csv.reader from one of them, and then zip the other iterator with it for output:
import csv
from itertools import tee
def as_csv_and_raw(filename):
    with open(filename) as fin:
        row, raw = tee(fin)
        yield from zip(csv.reader(row), raw)
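One caveat worth checking against your data: pairing the reader with a second iterator through zip assumes each CSV record occupies exactly one physical line. If a quoted field can contain a newline, the two iterators drift apart. A sketch of an alternative that records the lines the reader actually consumed for each record is shown below. The helper name `tracked` and the `pending` list are illustrative, and the approach relies on `csv.reader` pulling lines from its input one at a time as it needs them.

```python
import csv

def as_csv_and_raw(filename):
    # newline="" lets the csv module see the line endings untouched.
    with open(filename, newline="") as fin:
        pending = []  # physical lines consumed for the current record

        def tracked():
            # Hand each line to csv.reader while remembering it.
            for line in fin:
                pending.append(line)
                yield line

        for fields in csv.reader(tracked()):
            raw = "".join(pending).rstrip("\r\n")
            pending.clear()
            yield fields, raw
```

For a record such as `"x<newline>y",z`, this yields the parsed fields together with both physical lines joined as the raw string, rather than pairing the fields with only the first line.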
Since the JSON and pickle methods aren't working out, I've decided to save my dictionaries as strings. That works, but they aren't being read back.
For example:
Dictionary:
a={'name': 'joe'}
Save:
file = open("save.txt", "w")
file.write(str(a))
file.close()
And that works.
But my load method doesn't read it.
Load:
f = open("save.txt", "r")
a = f
f.close()
So, it just doesn't become f.
I really don't want to use json or pickle, is there any way I could get this method working?
First, you're not actually reading anything from the file (the file is not its contents). Second, when you fix that, you're going to get a string and need to transform that into a dictionary.
Fortunately, both are straightforward to address:
from ast import literal_eval
with open("save.txt") as infile:
    data = literal_eval(infile.read())
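A minimal round trip might look like the sketch below. The function names `save_dict` and `load_dict` are made up for illustration. Note that `literal_eval` only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans and None), so this works for plain data but not for arbitrary objects.

```python
from ast import literal_eval

def save_dict(path, data):
    # repr() of a dict of plain literals is valid input for literal_eval.
    with open(path, "w") as outfile:
        outfile.write(repr(data))

def load_dict(path):
    # Read the file's contents (not the file object) and parse them.
    with open(path) as infile:
        return literal_eval(infile.read())
```

Usage would be `save_dict("save.txt", a)` followed later by `a = load_dict("save.txt")`.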
I am downloading JSON files from an API, and I use the following code to write the JSON. Each item in the loop gives me a JSON file. I need to save it and extract entities from the appended JSON file using a loop.
for item in style_ls:
    dat = get_json(api, item)
    specs_dict[item] = dat
    # the with block closes the file, so no explicit close() is needed
    with open("specs_append.txt", "a") as myfile:
        json.dump(dat, myfile)
    print(item)
with open("specs_data.txt", "w") as myfile:
    json.dump(specs_dict, myfile)
I know that I cannot get valid JSON from specs_append.txt, but I can get it from specs_data.txt. I am writing the first file only because my program needs at least 3-4 days to complete and there is a high chance that my system may shut down. So is there any way I can do this efficiently?
If not, is there any way I can extract the data from specs_append.txt, which is in <{JSON}{JSON}> format (not valid JSON)?
If not, should I write specs_dict to a txt file on every pass through the loop, so that even if the program gets terminated I can restart from that point in the loop and still get valid JSON?
I suggest several possible solutions.
One solution is to write custom code to slurp in the input file. I would suggest putting a special line before each JSON object in the file, such as: ###
Then you could write code like this:
import json
# Assumed marker; it must match the line the writer emits, newline included.
SPECIAL_LINE = '###\n'
def json_get_objects(f):
    temp = ''
    line = next(f) # pull first line
    assert line == SPECIAL_LINE
    for line in f:
        if line != SPECIAL_LINE:
            temp += line
        else:
            # found special marker, temp now contains a complete JSON object
            j = json.loads(temp)
            yield j
            temp = ''
    # after loop done, yield up last JSON object
    if temp:
        j = json.loads(temp)
        yield j
with open("specs_data.txt", "r") as f:
    for j in json_get_objects(f):
        pass # do something with JSON object j
Two notes on this. First, I am simply appending to a string over and over; this used to be a very slow way to do this in Python, so if you are using a very old version of Python, don't do it this way unless your JSON objects are very small. Second, I wrote code to split the input and yield up JSON objects one at a time, but you could also use a guaranteed-unique string, slurp in all the data with a single call to f.read() and then split on your guaranteed-unique string using the str.split() method function.
Another solution would be to write the whole file as a valid JSON list of valid JSON objects. Write the file like this:
{"mylist":[
# first JSON object, followed by a comma
# second JSON object, followed by a comma
# third JSON object
]}
This would require your file appending code to open the file with writing permission, and seek to the last ] in the file before writing a comma plus newline, then the new JSON object on the end, and then finally writing ]} to close out the file. If you do it this way, you can use json.loads() to slurp the whole thing in and have a list of JSON objects.
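The append step described above could be sketched roughly as follows. This assumes the file is only ever written by this one function, so it always ends with exactly a newline followed by `]}`. The function name `append_json_object` and the `TAIL` constant are illustrative.

```python
import json
import os

TAIL = b"\n]}"  # assumed closing bytes; only valid if this function wrote the file

def append_json_object(path, obj):
    data = json.dumps(obj).encode("utf-8")
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        # First object: write the opening wrapper, the object, and the closer.
        with open(path, "wb") as f:
            f.write(b'{"mylist":[\n' + data + TAIL)
        return
    with open(path, "r+b") as f:
        # Check that the file ends with the closer we expect.
        f.seek(-len(TAIL), os.SEEK_END)
        if f.read() != TAIL:
            raise ValueError("file does not end with the expected closing marker")
        # Overwrite the closer with a comma, the new object, and a fresh closer.
        f.seek(-len(TAIL), os.SEEK_END)
        f.write(b",\n" + data + TAIL)
```

After every call the file is valid JSON, so `json.load` on it returns a dict whose "mylist" key holds all objects appended so far. A crash in the middle of a write could still leave a truncated file, which is one reason the marker-line approach may be more robust.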
Finally, I suggest that maybe you should just use a database. Use SQLite or something and just throw the JSON strings in to a table. If you choose this, I suggest using an ORM to make your life simple, rather than writing SQL commands by hand.
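As a rough illustration of the database route, here is a sketch using the standard-library `sqlite3` module directly instead of an ORM, purely to keep the example dependency-free. The table name `specs` and the helper names are assumptions made for this example.

```python
import json
import sqlite3

def open_store(path):
    conn = sqlite3.connect(path)
    # One row per item; the JSON is stored as text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS specs (item TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn

def save_spec(conn, item, data):
    # "with conn" commits on success, so each item survives a later crash.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO specs (item, body) VALUES (?, ?)",
            (item, json.dumps(data)),
        )

def load_specs(conn):
    rows = conn.execute("SELECT item, body FROM specs")
    return {item: json.loads(body) for item, body in rows}
```

On restart, the keys returned by `load_specs` show which items have already been downloaded, so the loop can skip them.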
Personally, I favor the first suggestion: write in a special line like ###, then have custom code to split the input on those marks and then get the JSON objects.
EDIT: Okay, the first suggestion was sort of assuming that the JSON was formatted for human readability, with a bunch of short lines:
{
    "foo": 0,
    "bar": 1,
    "baz": 2
}
But it's all run together as one big long line:
{"foo":0,"bar":1,"baz":2}
Here are three ways to fix this.
0) Write a newline before the ### and after it, like so:
###
{"foo":0,"bar":1,"baz":2}
###
{"foo":0,"bar":1,"baz":2}
Then each input line will alternately be ### or a complete JSON object.
1) As long as SPECIAL_LINE is completely unique (never appears inside a string in the JSON) you can do this:
with open("specs_data.txt", "r") as f:
    temp = f.read() # read entire file contents
lst = temp.split(SPECIAL_LINE)
# skip empty chunks, such as the one before a leading marker
json_objects = [json.loads(x) for x in lst if x.strip()]
for j in json_objects:
    pass # do something with JSON object j
The .split() method function can split up the temp string into JSON objects for you.
2) If you are certain that each JSON object will never have a newline character inside it, you could simply write JSON objects to the file, one after another, putting a newline after each; then assume that each line is a JSON object:
import json
def json_get_objects(f):
    for line in f:
        if line.strip():
            yield json.loads(line)
with open("specs_data.txt", "r") as f:
    for j in json_get_objects(f):
        pass # do something with JSON object j
I like the simplicity of option (2), but I like the reliability of option (0). If a newline ever got written in as part of a JSON object, option (0) would still work, but option (2) would error.
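For option (2), the writing side could be sketched like this. With its default arguments (no `indent`), `json.dumps` escapes newline characters inside strings and emits the whole object on one line, so the one-object-per-line assumption holds as long as the file is written this way. The function names are illustrative.

```python
import json

def append_json_line(path, obj):
    # json.dumps without indent produces a single line;
    # newlines inside strings are written as the escape \n.
    with open(path, "a") as f:
        f.write(json.dumps(obj) + "\n")

def read_json_lines(path):
    with open(path) as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Because each call appends one complete line, an interrupted run loses at most the object being written at that moment.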
Again, you can also simply use an actual database (SQLite) with an ORM and let the database worry about the details.
Good luck.
Add the JSON data to a dict on every loop iteration.
At the end, dump this dict as JSON and write it to a file.
To give you an idea of how to append data to a dict:
>>> d1 = {'suku':12}
>>> t1 = {'suku1':212}
>>> d1.update(t1)
>>> d1
{'suku1': 212, 'suku': 12}
Is there a short way to get number of objects in pickled file - shorter than writing a function that opens the file, keeps calling pickle.load method and updating num_of_objs by 1 until it catches EOFError and returns the value?
No, there isn't. The pickle format does not store that information.
If you need that type of metadata, you need to add it to the file yourself when writing:
pickle.dump(len(objects), fileobj)
for ob in objects:
    pickle.dump(ob, fileobj)
Now the first record tells you how many more are to follow.
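For files that were written without such a count, the loop described in the question remains the fallback. A minimal sketch, with the function name `count_pickled_objects` chosen for illustration:

```python
import pickle

def count_pickled_objects(path):
    count = 0
    with open(path, "rb") as f:
        while True:
            try:
                pickle.load(f)  # load and discard one object
            except EOFError:
                # No more pickles in the file.
                return count
            count += 1
```

Every object is fully unpickled along the way, so this can be slow for large files, and as with any unpickling it should only be run on files from a trusted source.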
There is no direct way to find the number of objects in a pickle file, but if you are worried about running an endless loop, you could try the following:
import pickle
company_id_processed = []
with open("responses_pickle.pickle", "rb") as f:
    while True:
        try:
            current_id = pickle.load(f)['name']
            company_id_processed.append(current_id)
        except EOFError:
            print('Pickle ends')
            break
The best way is to store and load the data with a descriptive file name. For example, if you want to save two dataframes, you can name the pickle file "datasets_name_2DFs.pickle". When you want to load them, you can read the number from the file name and call pickle.load that many times in a for loop. This is easier for me. For the code part, you can do whatever suits you.
Or you can store the count in the file itself, like this:
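import pickle
# write the object count first, then each object
with open(path, "wb") as f:
    pickle.dump(len(data), f)
    for value in data:
        pickle.dump(value, f)
# read the count back and load exactly that many objects
data_list = []
with open(path, "rb") as f:
    for _ in range(pickle.load(f)):
        data_list.append(pickle.load(f))
print(data_list)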
I am a Python beginner struggling to create and save a list of tuples from a CSV file.
The code I have so far is:
def load_file(filename):
    fp = open(filename, 'Ur')
    data_list = []
    for line in fp:
        data_list.append(line.strip().split(','))
    fp.close()
    return data_list
and then I would like to save the file
def save_file(filename, data_list):
    fp = open(filename, 'w')
    for line in data_list:
        fp.write(','.join(line) + '\n')
    fp.close()
Unfortunately, my code returns a list of lists, not a list of tuples... Is there a way to create one list containing multiple tuples without using csv module?
split returns a list. If you want a tuple, convert it to a tuple:
data_list.append(tuple(line.strip().split(',')))
Please use the csv module.
First question: why is a list of lists bad? In the sense of "duck typing", this should be fine, so maybe you should think about it again.
If you really need a list of tuples - only small changes are needed.
Change the line
data_list.append(line.strip().split(','))
to
data_list.append(tuple(line.strip().split(',')))
That's it.
If you ever want to get rid of custom code (less code is better code), you could stick to the csv-module. I'd strongly recommend using as many library methods as possible.
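For comparison, a csv-module version of both functions might look like the sketch below. It assumes Python 3, where the csv documentation recommends opening files with `newline=""`. Unlike a plain split on commas, it also copes with quoted fields that themselves contain commas.

```python
import csv

def load_file(filename):
    # csv.reader yields each row as a list; convert each one to a tuple.
    with open(filename, newline="") as fp:
        return [tuple(row) for row in csv.reader(fp)]

def save_file(filename, data_list):
    # writerows accepts any iterable of rows, tuples included.
    with open(filename, "w", newline="") as fp:
        csv.writer(fp).writerows(data_list)
```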
To show off some more advanced Python features, your load_file function could also look like:
def load_file(filename):
    with open(filename, 'Ur') as fp:
        data_list = [tuple(line.strip().split(",")) for line in fp]
    return data_list
I use a list comprehension here; it's very concise and easy to understand.
Additionally, I use the with-statement, which will close your file pointer, even if an exception occurred within your code. Please always use with when working with external resources, like files.
Just wrap "tuple()" around the line.strip().split(',') and you'll get a list of tuples. You can see it in action in this runnable gist.