I know a similar question has already been asked, but it doesn't answer what I need; mine is a little different.
My code:
import json

def tFileRead(fileName, JSON=False):
    with open(fileName) as f:
        if JSON:
            return json.load(f)
        for line in f:
            yield line.rstrip('\n')
What I want to do:
If JSON is true, the function is reading from a JSON file and I want to return json.load(f); otherwise, I want to yield the lines of the file as a generator.
I've tried the alternative of converting the generator into json, but that got very messy, very fast, and doesn't work very well.
The first solution that came to my mind was to explicitly return a generator object which would provide the exact behavior you tried to achieve.
The problem is: if you explicitly returned a generator expression like return (line.rstrip('\n') for line in f), the file would be closed as soon as the function returned, and any further reading from it would raise an exception.
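A minimal sketch of that failure mode (the function and file names here are just illustrative):

def broken_read(file_name):
    with open(file_name) as f:
        return (line.rstrip('\n') for line in f)  # the with block closes f on return

gen = broken_read('example.txt')
next(gen)  # raises ValueError: I/O operation on closed file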
You should write two functions here: one that reads a json file and one for the normal file. Then you can write a wrapper that decides, based on an argument, which of the two functions to call.
Or just move the iteration part into another function like this:
import json

def iterate_file(file_name):
    with open(file_name) as fin:
        for line in fin:
            yield line.rstrip("\n")

def file_read(file_name, as_json=False):
    if as_json:
        with open(file_name) as fin:
            return json.load(fin)
    else:
        return iterate_file(file_name)
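For illustration, calling it might look like this (file names are assumed):

for line in file_read('notes.txt'):
    print(line)

config = file_read('config.json', as_json=True)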
You could yield from the dictionary loaded with JSON, thus iterating over the key-value pairs in the dict, but this would not be your desired behaviour.
def tFileRead(fileName, JSON=False):
    with open(fileName) as f:
        if JSON:
            yield from json.load(f).items()  # works, but differently
        for line in f:
            yield line.rstrip('\n')
It would be nice if you could just return a generator, but this will not work: because of the with statement, the file is closed as soon as the function returns, i.e. before the generator is consumed.
def tFileRead(fileName, JSON=False):
    with open(fileName) as f:
        if JSON:
            return json.load(f)
        else:
            return (line.rstrip('\n') for line in f)  # won't work
Alternatively, you could define another function just for yielding the lines from the file and use that in the generator:
def tFileRead(fileName, JSON=False):
    if JSON:
        with open(fileName) as f:
            return json.load(f)
    else:
        def withopen(fileName):
            with open(fileName) as f:
                yield from f
        return (line.rstrip('\n') for line in withopen(fileName))
But once you are there, you can really just use two separate functions for reading the file en bloc as JSON or for iterating over the lines...
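A minimal sketch of that two-function split (the names are mine, not the poster's):

import json

def load_json_file(file_name):
    # Read and parse the whole file at once.
    with open(file_name) as f:
        return json.load(f)

def iter_lines(file_name):
    # Lazily yield the file's lines without their trailing newlines.
    with open(file_name) as f:
        for line in f:
            yield line.rstrip('\n')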
Related
I am trying to transfer data received from one function (reading) to another (writing).
Existing data inside file.txt should be parsed as JSON and printed to the console; that data should then be taken by the second function and written to a file called pfile.txt.
I just can't get them to work together. When running each function separately as commands in plain shell, they work; combined, not so much. What am I missing here?
import json

def reading():
    filename = 'file.txt'
    with open(filename, 'r') as f:
        print(json.loads(f.read()))

reading()

def writing():
    with open('pfile.txt', 'w+') as pf:
        pf.write(reading() in writing())  # <-- this doesn't work
        pf.write('hello SO')              # <-- this does work

writing()
When you refer to a function with a pair of parentheses, Python will call that function with no arguments and resolve its return value (if any). This is not bash; functions pass data to each other as variables in memory, not through stdin/stdout.
Your code as written appears to be riddled with infinite loops (functions calling themselves) and will likely crash with "maximum recursion depth exceeded" errors. These can be fixed by not calling functions within themselves (or having cycles of functions that call each other).
There's nothing about your code as written that needs multiple functions. I'd go down to 1 function:
import json

def read_and_write():
    filename = 'file.txt'
    with open(filename, 'r') as f:
        content = json.loads(f.read())
        print(content)
    with open('pfile.txt', 'w+') as pf:
        pf.write(json.dumps(content))  # serialize the parsed object back to a string before writing
If you want two functions, try the following:
import json

def read():
    filename = 'file.txt'
    with open(filename, 'r') as f:
        content = json.loads(f.read())
    return content

def write():
    content = read()
    with open('pfile.txt', 'w+') as pf:
        pf.write(json.dumps(content))  # content may be a dict or list, so serialize it first
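To make the data flow explicit, the same idea can be sketched with the content passed as a parameter, reusing read() from above (json.dumps is my addition, since json.loads usually returns a dict or list, which write() cannot write directly):

import json

def write(content):
    with open('pfile.txt', 'w+') as pf:
        pf.write(json.dumps(content))  # serialize the parsed object back to text

write(read())  # read() returns the content; write() receives it as an argument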
90% of the time when I read a file, it ends up like this:
with open('file.txt') as f:
    for line in f:
        my_function(line)
This seems to be a very common scenario, so I thought of a shorter way, but is it safe? I mean, will the file be closed correctly, or do you see any other problems with this approach?
for line in open('file.txt'):
    my_function(line)
Edit: Thanks Eric, this seems to be the best solution. Hopefully I don't turn this into a discussion, but what do you think of this approach for the case where we want to use line in several operations (not just as an argument to my_function):
def line_generator(filename):
    with open(filename) as f:
        for line in f:
            yield line
and then using:
for line in line_generator('groceries.txt'):
    print line
    grocery_list += [line]
Does this function have disadvantages over iterate_over_file?
If you need this often, you could always define:

def iterate_over_file(filename, func):
    with open(filename) as f:
        for line in f:
            func(line)

def my_function(line):
    print line,
Your pythonic one-liner is now:
iterate_over_file('file.txt', my_function)
Using a context manager is the best way, and that pretty much bars the way to your one-liner solution. If you naively want to create a one-liner you get:
with open('file.txt') as f: for line in f: my_function(line) # wrong code!!
which is invalid syntax.
So if you badly want a one-liner you could do
with open('file.txt') as f: [my_function(line) for line in f]
but that's bad practice since you're creating a list comprehension only for the side effect (you don't care about the return of my_function).
Another approach would be
with open('file.txt') as f: collections.deque((my_function(line) for line in f), maxlen=0)
so no list comprehension is created, and you force consumption of the iterator using an itertools recipe (a 0-size deque: no memory is allocated either)
Conclusion:
to reach the "pythonic/one-liner" goal, we sacrificed readability.
Sometimes the best approach doesn't hold in one line, period.
Building upon the approach by Eric, you could also make it a bit more generic by writing a function that uses with to open the file and then simply returns the file. This, however:
def with_open(filename):
    with open(filename) as f:
        return f  # won't work!
does not work, as the file f will already be closed by with when returned by the function. Instead, you can make it a generator function, and yield the individual lines:
def with_open(filename):
    with open(filename) as f:
        for line in f:
            yield line
or shorter, with newer versions of Python:
def with_open(filename):
    with open(filename) as f:
        yield from f
And use it like this:
for line in with_open("test.txt"):
    print line
or this:
nums = [int(n) for n in with_open("test.txt")]
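One caveat worth noting (my addition, not from the original answer): the file inside with_open stays open until the generator is exhausted or explicitly closed, so if you leave a loop early, close the generator yourself:

gen = with_open("test.txt")
for line in gen:
    if line.startswith("#"):
        break  # leaving the loop early keeps the file open...
gen.close()  # ...so close the generator, which in turn closes the file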
My goal is to have a text file that allows me to append data to the end and retrieve and remove the first data entry from the file. Essentially I want to use a text file as a queue (first in, first out). I thought of two ways to accomplish this, but I am unsure which way is more Pythonic and efficient. The first way is to use the json library.
import json

def add_to_queue(item):
    q = retrieve_queue()
    q.append(item)
    write_to_queue(q)

def pop_from_queue():
    q = retrieve_queue()
    write_to_queue(q[1:])
    return q[0]

def write_to_queue(data):
    with open('queue.txt', 'w') as file_pointer:
        json.dump(data, file_pointer)

def retrieve_queue():
    try:
        with open('queue.txt', 'r') as file_pointer:
            return json.load(file_pointer)
    except (IOError, ValueError):
        return []
Seems pretty clean, but it requires serializing/deserializing all of the JSON data on every write/read, even though I only need the first item in the list.
The second option is to call readlines() and writelines() to retrieve and to store the data in the text file.
def add_to_queue(item):
    with open('queue.txt', 'a') as file_pointer:
        file_pointer.write(item + '\n')

def pop_from_queue():
    with open('queue.txt', 'r+') as file_pointer:
        lines = file_pointer.readlines()
        file_pointer.seek(0)
        file_pointer.truncate()
        file_pointer.writelines(lines[1:])
        return lines[0].strip()
Both of them work fine, so my question is: what is the recommended way to implement a "text file queue"? Is using json "better" (more Pythonic/faster/more memory efficient) than reading and writing to the file myself? Both of these solutions seem rather complicated based on the simplicity of the problem; am I missing a more obvious way to do this?
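As for speed, you could measure both variants instead of guessing. A rough sketch with timeit, assuming the JSON version and the readlines() version above were saved as json_queue.py and lines_queue.py (hypothetical module names):

import os
import timeit

for mod in ('json_queue', 'lines_queue'):
    if os.path.exists('queue.txt'):
        os.remove('queue.txt')  # both variants share queue.txt, so start each run clean
    setup = 'import {} as q; q.add_to_queue("seed")'.format(mod)
    stmt = 'q.add_to_queue("item"); q.pop_from_queue()'
    print(mod, timeit.timeit(stmt, setup=setup, number=1000))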
I have a list of around 100 files from which I want to read and match one word.
Here's the piece of code I wrote.
import re

y = 'C:\\prova.txt'
var1 = open(y, 'r')
for line in var1:
    if re.match('(.*)version(.*)', line):
        print line
var1.close()
Every time I try to pass a tuple to y, I get this error:
TypeError: coercing to Unicode: need string or buffer, tuple found.
(I think that open() does not accept any tuple but only strings)
So how could I get it to work with a list of files?
Thank you in advance!
You are quite correct that open doesn't accept a tuple and needs a string. So you have to iterate over the file names one by one:
import re

for path in paths:
    with open(path) as f:
        for line in f:
            if re.match('(.*)version(.*)', line):
                print line
Here I use paths as the variable that holds the file names; it can be a tuple, a list, or some other object that you can iterate over.
Use fileinput.input instead of open.
This module implements a helper class and functions to quickly write a loop over standard input or a list of files
[...] To specify an alternative list of filenames, pass it as the first argument to input(). A single file name is also allowed.
Example:
import fileinput

for line in fileinput.input(list_of_files):
    # etc...
Just iterate over the tuple. And you don't need a regex here.
y = ('C:\\prova.txt', 'C:\\prova2.txt')

for filename in y:
    with open(filename) as f:
        for line in f:
            if 'version' in line:
                print line
Using the with statement this way also saves you from having to close the files you're working with. They will be closed automatically when the with block is exited.
Something like this:
import re

files = ['a.txt', 'b.txt']
for f in files:
    with open(f, 'r') as var1:
        for line in var1:
            if re.match('(.*)version(.*)', line):
                print line
def simple_search(filenames, query):
    for filename in filenames:
        with open(filename) as f:
            for line_num, line in enumerate(f, 1):
                if query in line:
                    print filename, line_num, line.strip()
My added value: (1) it's useless to print the line contents without showing which line of which file it came from; (2) it doesn't double-space the output.
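Usage might look like this (file names assumed):

simple_search(['a.txt', 'b.txt'], 'version')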
I am trying to write a function in Python 3 that will write all lines that end with the string 'halloween' to a file. When I call this function, I can only get one line to write to the output file (file_2.txt). Can anyone point out where my problem is? Thanks in advance.
def parser(reader_o, infile_object, outfile_object):
    for line in reader_o:
        if line.endswith('halloween'):
            return(line)

with open("file_1.txt", "r") as file_input:
    reader = file_input.readlines()

with open("file_2.txt", "w") as file_output:
    file_output.write(parser(reader))
def parser(reader_o):
    for line in reader_o:
        if line.rstrip().endswith('halloween'):
            yield line

with open("file_1.txt", "r") as file_input:
    with open("file_2.txt", "w") as file_output:
        file_output.writelines(parser(file_input))
This is called a generator. It can also be written as an expression instead of a function:
with open("file_1.txt", "r") as file_input:
    with open("file_2.txt", "w") as file_output:
        file_output.writelines(line for line in file_input if line.rstrip().endswith('halloween'))
If you're on Python 2.7 / 3.2, you can do the two withs like this:
with open("file_1.txt", "r") as file_input, open("file_2.txt", "w") as file_output:
You don't need to call readlines() on the file; just letting the loop iterate over the open file itself does the exact same thing.
Your problem was that return always exits the function on the first match. yield suspends the loop, passes out the value, and then the generator can be resumed from the same point.
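A tiny sketch of that difference, separate from any file handling (example data is mine):

def first_match(lines):
    for line in lines:
        if 'halloween' in line:
            return line  # the function ends here: one result, loop abandoned

def all_matches(lines):
    for line in lines:
        if 'halloween' in line:
            yield line  # the generator pauses here and resumes on the next request

data = ['halloween party\n', 'nothing\n', 'happy halloween\n']
first_match(data)        # 'halloween party\n'
list(all_matches(data))  # ['halloween party\n', 'happy halloween\n']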
line.endswith('halloween') might only match on the last line of a file, since every other line will have a newline appended. rstrip the line first. Also, use yield instead of return.
if line.rstrip().endswith('halloween'):
    yield line
Note that this will also strip off spaces at the end of the line, which may or may not be what you want.
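If you only want the newline removed, a narrower variant (my suggestion) avoids stripping trailing spaces:

if line.rstrip('\n').endswith('halloween'):
    yield line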
You'll also have to modify your consumer to
with open("file_2.txt", "w") as file_output:
    for ln in parser(reader):
        file_output.write(ln)
Perhaps your parser function should be a generator. At the moment it's only called once and returns the first line that has "halloween" in it.
Like the following:
def parser(reader_o):
    for line in reader_o:
        if line.rstrip('\n').endswith('halloween'):  # strip the trailing newline before matching
            yield line

with open("file_1.txt", "r") as file_input:
    with open("file_2.txt", "w") as file_output:
        file_output.writelines(parser(file_input))