converting string to integer from file - python

I am asked to read lines from one file and print every other line from that file to another file. This is what I have so far.
my_file = open("thisFile.txt", "r")
out_file = open("thatFile.txt", "w")
for line in my_file:
    line = int(line)
    if line % 2 != 0:
        print(line, file=out_file)
my_file.close()
out_file.close()

from itertools import islice
with open('input') as fin, open('output', 'w') as fout:
    fout.writelines(islice(fin, None, None, 2))

This is a relatively simple task. Your main issue is the use of int(): you are trying to convert the entire line to a number, which then fails. What you actually want is to keep track of the number of the line you are on. The easiest way to do this is with the enumerate() builtin.
with open("thisFile.txt") as in_file, open("thatFile.txt", "w") as out_file:
    for line_no, line in enumerate(in_file, start=1):
        if line_no % 2:
            out_file.write(line)
Note the use of the with statement here too: it automatically closes files, and will do so even if exceptions occur. It's good practice to use it, and it's also significantly more readable.
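As a quick illustration of what enumerate(..., start=1) does, here is the same odd-line filter run against an in-memory list standing in for the file's lines (hypothetical data, not from the question):

```python
# Keep the odd-numbered items, exactly as the file-based loop above does;
# enumerate(..., start=1) pairs each line with its 1-based line number.
lines = ["first\n", "second\n", "third\n"]
kept = [line for n, line in enumerate(lines, start=1) if n % 2]
print(kept)  # ['first\n', 'third\n']
```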

The minimal change to your program to make it do what you want, bringing in no new concepts, is this:
my_file = open("thisFile.txt", "r")
out_file = open("thatFile.txt", "w")
i = 1
for line in my_file:
    if i % 2 != 0:
        print(line, file=out_file)
    i += 1
my_file.close()
out_file.close()
Your mistake was simply that you thought int(line) would give you the line number. What that actually does is attempt to interpret the text of each line as an integer, which is not what you want.
Now, a great deal of Python's virtue lies in its enormous library of helpful functions. The above works, but it isn't taking proper advantage of what the language has to offer. The first thing I would change is to use enumerate rather than an explicit counter; the second is to use the write method of files rather than print; and the third is to take advantage of the "nonzero is true" rule:
my_file = open("thisFile.txt", "r")
out_file = open("thatFile.txt", "w")
for i, line in enumerate(my_file, start=1):
    if i % 2:
        out_file.write(line)
my_file.close()
out_file.close()
I said that this works, but really it only works as long as nothing goes wrong. If anything throws an exception (for instance, if an I/O error occurs), the file objects don't get closed.1 The with statement tells Python to make sure that the files get closed, whether or not an exception happens; conveniently, this is also less typing.
with open("thisFile.txt", "r") as my_file, \
     open("thatFile.txt", "w") as out_file:
    for i, line in enumerate(my_file, start=1):
        if i % 2:
            out_file.write(line)
You may notice that this is the same as Lattyware's answer. (Jon Clement's answer is maybe a little too clever; he appears to be playing golf.)
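For what it's worth, the trick in that islice answer is just the step argument: islice(iterable, None, None, 2) yields every other item, starting with the first. A small stand-in list shows the behaviour:

```python
from itertools import islice

# A step of 2 with no start/stop bounds selects items 0, 2, 4, ...
items = ["a", "b", "c", "d", "e"]
print(list(islice(items, None, None, 2)))  # ['a', 'c', 'e']
```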
1 In this simple piece of code, if an exception happens CPython's garbage collector will notice that the file objects are no longer accessible and will close them for you, but this isn't something to rely on as your programs get more complicated.

Related

How to efficiently read and delete a specific line of a large file with a custom newline character using Python (3.9 preferred)?

Similar to this question, but slightly more complex
I have a large txt file, that looks something like this:
"
AAAAAAAAAAAAAA.BBBBBBBBBBBBBB.CCCCCCCCCCCCCC.DDDDDDDDDDDDDD.EEEEEEEEEEEEEE.FFFFFFFFFFFFFF.GGGGGGGGGGGGGG.HHHHHHHHHHHHHH.IIIIIIIIIIIIII.JJJJJJJJJJJJJJ.KKKKKKKKKKKKKK.
"
Each line break is a ".", the file ends in a line break, and each line is exactly 14 characters long.
GollyJer's answer to the mentioned question is good, but I have a few extra requirements:
I'd like to be able to input a specific line number and have that one line be returned.
Then I'd like the line that is read to be deleted from the file.
I can't have the real txt be loaded into RAM, as it's over 600 GB.
I don't know where to begin with altering the code to do this.
Is this even possible? How can I do this?
Thanks
I might explore the walrus operator to clean this up, and I really have no idea if this is going to be "fast enough". The idea is to read up to the point you want, read/print the stuff to delete, then read the rest:
line_to_delete = 2
with open("in.txt", "rt") as file_in:
    with open("out.txt", "wt") as file_out:
        file_out.write(file_in.read(15 * (line_to_delete - 1)))
        print(file_in.read(15))
        file_out.write(file_in.read())
I think that might be memory intensive, so you might produce a more stream-like result by doing:
line_to_delete = 2
with open("in.txt", "rt") as file_in:
    current_line = 1
    with open("out.txt", "wt") as file_out:
        while True:
            line = file_in.read(15)
            if not line:
                break
            if current_line == line_to_delete:
                print(line)
            else:
                file_out.write(line)
            current_line += 1
both print BBBBBBBBBBBBBB. and produce a file like:
AAAAAAAAAAAAAA.CCCCCCCCCCCCCC.DDDDDDDDDDDDDD.EEEEEEEEEEEEEE.FFFFFFFFFFFFFF.GGGGGGGGGGGGGG.HHHHHHHHHHHHHH.IIIIIIIIIIIIII.JJJJJJJJJJJJJJ.KKKKKKKKKKKKKK.
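Since every record here, including its "." terminator, is exactly 15 bytes, the read step can also be done with a direct seek() instead of scanning from the start. This is only a sketch (a BytesIO stands in for the 600 GB file), and deleting the record would still require rewriting everything after it:

```python
import io

RECORD = 15  # 14 characters plus the '.' terminator

def read_record(f, n):
    """Return record n (1-based) from a file of fixed 15-byte records."""
    f.seek(RECORD * (n - 1))
    return f.read(RECORD)

# A small in-memory stand-in for the real file.
data = io.BytesIO(b"AAAAAAAAAAAAAA.BBBBBBBBBBBBBB.CCCCCCCCCCCCCC.")
print(read_record(data, 2))  # b'BBBBBBBBBBBBBB.'
```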

Is there pythonic oneliner to iterate over lines of a file?

90% of the time when I read a file, it ends up like this:
with open('file.txt') as f:
    for line in f:
        my_function(line)
This seems to be a very common scenario, so I thought of a shorter way, but is this safe? I mean, will the file be closed correctly, or do you see any other problems with this approach?
for line in open('file.txt'):
    my_function(line)
Edit: Thanks Eric, this seems to be the best solution. Hopefully I don't turn this into a discussion, but what do you think of this approach for the case when we want to use line in several operations (not just as an argument for my_function):
def line_generator(filename):
    with open(filename) as f:
        for line in f:
            yield line
and then using:
for line in line_generator('groceries.txt'):
    print line
    grocery_list += [line]
Does this function have disadvantages over iterate_over_file?
If you need this often, you could always define:
def iterate_over_file(filename, func):
    with open(filename) as f:
        for line in f:
            func(line)

def my_function(line):
    print line,
Your pythonic one-liner is now:
iterate_over_file('file.txt', my_function)
Using a context manager is the best way, and that pretty much bars the way to your one-liner solution. If you naively try to create a one-liner you get:
with open('file.txt') as f: for line in f: my_function(line) # wrong code!!
which is invalid syntax.
So if you badly want a one-liner you could do
with open('file.txt') as f: [my_function(line) for line in f]
but that's bad practice since you're creating a list comprehension only for the side effect (you don't care about the return of my_function).
Another approach would be
with open('file.txt') as f: collections.deque((my_function(line) for line in f), maxlen=0)
so no list comprehension is created, and you force consumption of the iterator using an itertools recipe (a 0-size deque: no memory is allocated either).
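To see the deque recipe in isolation, here is the same consume-for-side-effects pattern applied to a plain generator (a stand-in for the file's lines):

```python
import collections

seen = []
# The zero-length deque consumes the generator item by item, triggering
# the side effect (append) and discarding each result; no list is built.
collections.deque((seen.append(x) for x in range(3)), maxlen=0)
print(seen)  # [0, 1, 2]
```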
Conclusion:
to reach the "pythonic/one-liner" goal, we sacrificed readability.
Sometimes the best approach doesn't hold in one line, period.
Building upon the approach by Eric, you could also make it a bit more generic by just writing a function that uses with to open the file and then returns the file. This, however:
def with_open(filename):
    with open(filename) as f:
        return f  # won't work!
does not work, as the file f will already be closed by with when returned by the function. Instead, you can make it a generator function, and yield the individual lines:
def with_open(filename):
    with open(filename) as f:
        for line in f:
            yield line
or shorter, with newer versions of Python:
def with_open(filename):
    with open(filename) as f:
        yield from f
And use it like this:
for line in with_open("test.txt"):
    print line
or this:
nums = [int(n) for n in with_open("test.txt")]

Most efficient way to "nibble" the first line of text from a text document then resave it in python

I have a text document that I would like to repeatedly remove the first line of text from every 30 seconds or so.
I have already written (or more accurately copied) the code for the python resettable timer object that allows a function to be called every 30 seconds in a non blocking way if not asked to reset or cancel.
Resettable timer in python repeats until cancelled
(If someone could check the way I implemented the repeat in that is ok, because my python sometimes crashes while running that, would be appreciated :))
I now want to write my function to load a text file, copy all but the first line, and rewrite it to the same text file. I can do it this way, I think... but is it the most efficient?
from collections import deque

def removeLine():
    with open(path, 'rU') as file:
        lines = deque(file)
        try:
            print lines.popleft()
        except IndexError:
            print "Nothing to pop?"
    with open(path, 'w') as file:
        file.writelines(lines)
This works, but is it the best way to do it?
I'd use the fileinput module with inplace=True:
import fileinput

def removeLine():
    inputfile = fileinput.input(path, inplace=True, mode='rU')
    next(inputfile, None)  # skip a line *if present*
    for line in inputfile:
        print line,  # write out again, but without an extra newline
    inputfile.close()
inplace=True causes sys.stdout to be redirected to the open file, so we can simply 'print' the lines.
The next() call is used to skip the first line; giving it a default None suppresses the StopIteration exception for an empty file.
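That default-argument behaviour of next() is easy to check in isolation:

```python
# With a default, next() returns it instead of raising StopIteration
# when the iterator is exhausted.
print(next(iter([]), None))   # None
print(next(iter([7]), None))  # 7
```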
This makes rewriting a large file more efficient as you only need to keep the fileinput readlines buffer in memory.
I don't think a deque is needed at all, even for your solution; just use next() there too, then use list() to catch the remaining lines:
def removeLine():
    with open(path, 'rU') as file:
        next(file, None)  # skip a line *if present*
        lines = list(file)
    with open(path, 'w') as file:
        file.writelines(lines)
but this requires you to read all of the file into memory; don't do that with large files.

Pythonic way to read file line by line?

Of the two methods below, what's the Pythonic way to go about reading files line by line?
with open('file', 'r') as f:
    for line in f:
        print line
or
with open('file', 'r') as f:
    for line in f.readlines():
        print line
Or is there something I'm missing?
File handles are their own iterators (specifically, they implement the iterator protocol) so
with open('file', 'r') as f:
    for line in f:
        # code
Is the preferred usage. f.readlines() returns a list of lines, which means absorbing the entire file into memory; that's generally ill-advised, especially for large files.
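The "file handles are their own iterators" point can be demonstrated with io.StringIO, which implements the same protocol as an open text file (Python 3 syntax here):

```python
import io

f = io.StringIO("one\ntwo\n")
print(iter(f) is f)  # True: iter() hands back the handle itself
print(next(f))       # 'one\n': each next() call yields one line
```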
It should be pointed out that I agree with the sentiment that context managers are worthwhile, and have included one in my code example.
Of the two you presented, the first is recommended practice. As pointed out in the comments, any solution (like that below) which doesn't use a context manager means that the file is left open, which is a bad idea.
Original answer, which leaves dangling file handles and so shouldn't be followed:
However, if you don't need f for any purpose other than reading the lines, you can just do:
for line in open('file', 'r'):
    print line
there's no need for the .readlines() method call.
PLUS: About the with statement
The execution behavior of the with statement is as commented below:
with open("xxx.txt", 'r') as f:
    # now, f is an opened file in context
    for line in f:
        # code with line
        pass
# when control exits *with*, f is closed
print f  # if you print, you'll get <closed file 'xxx.txt'>

Prepend a line to an existing file in Python

I need to add a single line to the first line of a text file and it looks like the only options available to me are more lines of code than I would expect from python. Something like this:
f = open('filename','r')
temp = f.read()
f.close()
f = open('filename', 'w')
f.write("#testfirstline")
f.write(temp)
f.close()
Is there no easier way? Additionally, I see this two-handle example more often than opening a single handle for reading and writing ('r+') - why is that?
Python makes a lot of things easy and contains libraries and wrappers for a lot of common operations, but the goal is not to hide fundamental truths.
The fundamental truth you are encountering here is that you generally can't prepend data to an existing flat structure without rewriting the entire structure. This is true regardless of language.
There are ways to save a filehandle or make your code less readable, many of which are provided in other answers, but none change the fundamental operation: You must read in the existing file, then write out the data you want to prepend, followed by the existing data you read in.
By all means save yourself the filehandle, but don't go looking to pack this operation into as few lines of code as possible. In fact, never go looking for the fewest lines of code -- that's obfuscation, not programming.
I would stick with separate reads and writes, but we certainly can express each more concisely:
Python2:
with file('filename', 'r') as original: data = original.read()
with file('filename', 'w') as modified: modified.write("new first line\n" + data)
Python3:
with open('filename', 'r') as original: data = original.read()
with open('filename', 'w') as modified: modified.write("new first line\n" + data)
Note: the file() function is not available in Python 3.
Other approach:
with open("infile") as f1:
    with open("outfile", "w") as f2:
        f2.write("#test firstline")
        for line in f1:
            f2.write(line)
or a one liner:
open("outfile", "w").write("#test firstline\n" + open("infile").read())
Thanks for the opportunity to think about this problem :)
Cheers
with open("file", "r+") as f: s = f.read(); f.seek(0); f.write("prepend\n" + s)
You can save one write call with this:
f.write('#testfirstline\n' + temp)
When using 'r+', you would have to rewind the file after reading and before writing.
Here's a 3-liner that I think is clear and flexible. It uses the list.insert function, so if you truly want to prepend to the file, use l.insert(0, 'insert_str'). When I actually did this for a Python module I am developing, I used l.insert(1, 'insert_str') because I wanted to skip the '# -*- coding: utf-8 -*-' string at line 0. Here is the code.
f = open(file_path, 'r'); s = f.read(); f.close()
l = s.splitlines(); l.insert(0, 'insert_str'); s = '\n'.join(l)
f = open(file_path, 'w'); f.write(s); f.close()
This does the job without reading the whole file into memory, though it may not work on Windows
import os
import shutil

def prepend_line(path, line):
    with open(path, 'r') as old:
        os.unlink(path)
        with open(path, 'w') as new:
            new.write(str(line) + "\n")
            shutil.copyfileobj(old, new)
One possibility is the following:
import os
open('tempfile', 'w').write('#testfirstline\n' + open('filename', 'r').read())
os.rename('tempfile', 'filename')
If you wish to insert text into the file after a specific piece of text, you can use the function below.
def prepend_text(file, text, after=None):
    ''' Prepend file with given raw text '''
    f_read = open(file, 'r')
    buff = f_read.read()
    f_read.close()
    f_write = open(file, 'w')
    inject_pos = 0
    if after:
        pattern = after
        inject_pos = buff.find(pattern) + len(pattern)
    f_write.write(buff[:inject_pos] + text + buff[inject_pos:])
    f_write.close()
So first you open the file, read it, and save it all into one string. Then we find the character position in the string where the injection will happen. Finally, with a single write and some careful indexing of the string, we rewrite the whole file, including the injected text.
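The splice step in that function is a plain string operation, which can be checked on its own (hypothetical data):

```python
# Find the injection point just past the pattern, then rebuild the string.
buff = "header\nbody\n"
pattern = "header\n"
inject_pos = buff.find(pattern) + len(pattern)
result = buff[:inject_pos] + "injected\n" + buff[inject_pos:]
print(result)
```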
Am I not seeing something, or couldn't we just use a buffer large enough to read in the input file in parts (instead of the whole content), and with this buffer traverse the file while it is open, swapping file and buffer contents as we go? This seems much more efficient (for big files especially) than reading the whole content into memory, modifying it in memory, and writing it back to the same file or (even worse) a different one. Sorry that I don't have time right now to implement a sample snippet; I'll get back to this later, but maybe you get the idea.
As I suggested in this answer, you can do it using the following:
import fileinput
from pathlib import Path
from typing import Union

def prepend_text(filename: Union[str, Path], text: str):
    with fileinput.input(filename, inplace=True) as file:
        for line in file:
            if file.isfirstline():
                print(text)
            print(line, end="")
If you rewrite it like this:
with open('filename') as f:
    read_data = f.read()
with open('filename', 'w') as f:
    f.write("#testfirstline\n" + read_data)
It's rather short and simple.
For 'r+' the file needs to exist already.
This worked for me:
def prepend(str, file):
    with open(file, "r") as fr:
        read = fr.read()
    with open(file, "w") as fw:
        fw.write(str + read)
