How do I start reading a file from the top in Python?

I am trying to add dependencies from a list to a requirements.txt file depending on the platform the software is going to run on. So I wrote the following code:
if platform.system() == 'Windows':
    # Add Windows-only requirements
    platform_specific_req = [req1, req2]
elif platform.system() == 'Linux':
    # Add Linux-only requirements
    platform_specific_req = [req3]

with open('requirements.txt', 'a+') as file_handler:
    for requirement in platform_specific_req:
        already_in_file = False
        # Make sure the requirement is not already in the file
        for line in file_handler.readlines():
            line = line.rstrip()  # remove '\n' at end of line
            if line == requirement:
                already_in_file = True
                break
        if not already_in_file:
            file_handler.write('{0}\n'.format(requirement))
    file_handler.close()
But what happens with this code is that when the second requirement is searched for among the requirements already in the file, for line in file_handler.readlines(): seems to be pointing at the end of the file, so the new requirement is only compared against the last element, and if it is not the same one it gets added. This is causing several elements to be duplicated in the file, since only the first requirement is compared against all the existing lines. How can I tell Python to start comparing from the top of the file again?
Solution:
I received many great responses and learned a lot, thanks, guys. I ended up combining two of the solutions, the one from Antti Haapala and the one from Matthew Franglen, into one. I am showing the final code here for reference:
# Append the extra requirements to the requirements.txt file
with open('requirements.txt', 'r') as file_in:
    reqs_in_file = set(line.rstrip() for line in file_in)
missing_reqs = set(platform_specific_req).difference(reqs_in_file)
with open('requirements.txt', 'a') as file_out:
    for req in missing_reqs:
        file_out.write('{0}\n'.format(req))

The answer to your explicit question: file_handler.seek(0) will seek it back to the beginning of the file.
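For instance, a minimal sketch of your loop with the rewind added (using the names from your snippet):

with open('requirements.txt', 'a+') as file_handler:
    for requirement in platform_specific_req:
        file_handler.seek(0)  # rewind to the top of the file before each scan
        if requirement not in (line.rstrip() for line in file_handler):
            file_handler.seek(0, 2)  # seek to the end before writing (mixing reads and writes)
            file_handler.write('{0}\n'.format(requirement))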
Some neat improvements:
You can use the file handler itself as an iterator instead of calling the readlines() method.
If your file is too large to read entirely into memory, then iterating over the lines in the file directly is fine - but you should change how you're doing it. As it is, you iterate over the entire file for each requirement, and I/O is costly. Instead, iterate over the lines once, and for each line check whether it is one of the requirements. Like so:
with open('requirements.txt', 'a+') as file_handler:
    file_handler.seek(0)  # 'a+' starts positioned at the end of the file; rewind before reading
    for line in file_handler:
        line = line.rstrip()
        if line in platform_specific_req:
            platform_specific_req.remove(line)
    for req in platform_specific_req:
        file_handler.write('{0}\n'.format(req))

You open the file handle before iterating over the existing requirement list, and then try to read the entire file once per requirement.
The file iterator is exhausted after the first requirement because you have not reopened it. Reopening the file for each iteration would be very wasteful - read the file into a list once and then use that inside the loops. Or do a set comparison!
file_content = set([line.rstrip() for line in file_handler])
only_in_platform = set(platform_specific_req).difference(file_content)
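Putting it together, a short sketch using the same names (the seek is needed because 'a+' opens positioned at the end):

with open('requirements.txt', 'a+') as file_handler:
    file_handler.seek(0)  # rewind so the read starts from the top
    file_content = set(line.rstrip() for line in file_handler)
    only_in_platform = set(platform_specific_req).difference(file_content)
    for req in only_in_platform:
        file_handler.write('{0}\n'.format(req))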

Do not try to read the file again for each requirement. While appending happens to work for this particular use case, for modifications in general it is easier to just:
Read the content from the file into a list (preferably skipping empty lines)
Modify the list
Open the file again for writing and save the modified data.
So, for example:

with open('requirements.txt', 'r') as fin:
    requirements = [i for i in (line.strip() for line in fin) if i]

for req in platform_specific_req:
    if req not in requirements:
        requirements.append(req)

with open('requirements.txt', 'w') as fout:
    for req in requirements:
        fout.write('{0}\n'.format(req))
        # or print(req, file=fout)

I know I'm answering a little late, but I would suggest doing it this way: opening the file once, then reading and appending in the same go. Note that this should work on every platform regardless of your operating system:
import os

def ensure_in_file(lines, file_path):
    '''
    Idempotent function to append lines to a file if they're not already there.
    '''
    with open(file_path, 'r+U') as f:  # r+U allows append, Universal Newline mode
        # set of all lines in the file, less newlines and trailing spaces
        file_lines = set(l.rstrip() for l in f)
        # write the lines not yet in the file, adding the os line separator as you go
        f.writelines(l + os.linesep for l in set(lines).difference(file_lines))
You can test this:

a_file = '/temp/temp/foo/bar'  # insert your own file path here
# with open(a_file, 'w') as f:  # ensure a blank file
#     pass
ensure_in_file(['this', 'that'], a_file)
with open(a_file, 'rU') as f:
    print f.read()
ensure_in_file(['this', 'that'], a_file)
with open(a_file, 'rU') as f:
    print f.read()
Each print statement should demonstrate that the file has each line once.

Related

How to update a line in a file in Python

How to update an existing line of a file in Python?
Example: I want to update session.xml to session-config.xml without writing a new line.
Input A.txt:
fix-config = session.xml
Expected output A.txt:
fix-config = session-config.xml
You can't mutate lines in a text file - you have to write an entirely new line, which, if not at the end of a file, requires rewriting all the rest.
The simplest way to do this is to store the file in a list, process it, and create a new file:
with open('A.txt') as f:
    l = list(f)
with open('A.txt', 'w') as output:
    for line in l:
        if line.startswith('fix-config'):
            output.write('fix-config = session-config.xml\n')
        else:
            output.write(line)
The solution @TigerhawkT3 suggested would work great for small and medium files.
For extremely large files, loading the entire file into memory might not be possible, and then you would want to process each line separately.
Something along these lines should work:
import shutil

with open('A.txt') as input_file:
    with open('temp.txt', 'w') as temp_file:
        for l in input_file:
            if l.startswith('fix-config'):
                temp_file.write('fix-config = session-config.xml\n')
            else:
                temp_file.write(l)
shutil.move('temp.txt', 'A.txt')

Open a file for input and output in Python

I have the following code, which is intended to remove specific lines of a file. When I run it, it prints the two filenames that live in the directory, then deletes all the information in them. What am I doing wrong? I'm using Python 3.2 under Windows.
import os

files = [file for file in os.listdir() if file.split(".")[-1] == "txt"]
for file in files:
    print(file)
    input = open(file, "r")
    output = open(file, "w")
    for line in input:
        print(line)
        # if line is good, write it to output
    input.close()
    output.close()
open(file, 'w') wipes the file. To prevent that, open the file in r+ mode (read and write, without wiping), then read it all at once, filter the lines, and write them back out again. Something like:
with open(file, "r+") as f:
lines = f.readlines() # read entire file into memory
f.seek(0) # go back to the beginning of the file
f.writelines(filter(good, lines)) # dump the filtered lines back
f.truncate() # wipe the remains of the old file
I've assumed that good is a function telling whether a line should be kept.
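For instance, good could be as simple as this (a hypothetical predicate - substitute whatever test decides which lines to keep):

def good(line):
    # keep every line that does not start with a comment marker
    return not line.startswith('#')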
If your file fits in memory, the easiest solution is to open the file for reading, read its contents to memory, close the file, open it for writing and write the filtered output back:
with open(file_name) as f:
    lines = list(f)
# filter lines
with open(file_name, "w") as f:  # this removes the file contents
    f.writelines(lines)
Since you are not intermingling read and write operations, the advanced file modes like "r+" are unnecessary here and only complicate things.
If the file does not fit into memory, the usual approach is to write the output to a new, temporary file, and move it back to the original file name after processing is finished.
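A minimal sketch of that approach, assuming the same good predicate as above (the temporary file is created next to the original so the move cannot cross filesystems):

import os
import shutil
import tempfile

with open(file_name) as f_in, tempfile.NamedTemporaryFile(
        'w', dir=os.path.dirname(os.path.abspath(file_name)), delete=False) as f_out:
    for line in f_in:
        if good(line):
            f_out.write(line)
shutil.move(f_out.name, file_name)  # replace the original with the filtered copy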
One way is to use the fileinput stdlib module. Then you don't have to worry about opening/closing files, file modes, and so on...
import fileinput
from contextlib import closing
import os

fnames = [fname for fname in os.listdir() if fname.split(".")[-1] == "txt"]  # or use os.path.splitext
with closing(fileinput.input(fnames, inplace=True)) as fin:
    for line in fin:
        # some condition
        if 'z' not in line:  # your condition here
            print line,  # trailing comma suppresses the newline; in Python 3: print(line, end='')
When using inplace=True, fileinput redirects stdout to the file currently being processed. A backup of the original can also be kept, which may come in useful if needed.
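For example, the backup argument is part of the stdlib fileinput API (the '.bak' suffix is just a conventional choice):

fileinput.input(fnames, inplace=True, backup='.bak')  # keeps each original as <name>.bak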
jon@minerva:~$ cat testtext.txt
one
two
three
four
five
six
seven
eight
nine
ten
After running the above with a condition of not line.startswith('t'):
jon@minerva:~$ cat testtext.txt
one
four
five
six
seven
eight
nine
You're deleting everything when you open the file for writing. You can't have the same file open separately for reading and writing at the same time. Use open(file, "r+") instead, and then save all the lines to another variable before writing anything.
You should not open the same file for reading and writing at the same time.
"w" means create an empty file for writing. If the file already exists, its data will be deleted.
So you can use a different file name for the output.

Python readline() on the Mac

New to Python and trying to learn the ropes of file I/O.
Working with pulling lines from a large (2 million line) file in this format:
56fr4
4543d
4343d
hirh3
I've been reading that readline() is best because it doesn't pull the whole file into memory. But when I try to read the documentation on it, it seems to be Unix-only, and I'm on a Mac.
Can I use readline on the Mac without loading the whole file into memory? What would the syntax be to simply read line number 3 in the file? The examples in the docs are a bit over my head.
Edit
Here is the function to return a code:
def getCode(i):
    with open("test.txt") as f:
        for index, line in enumerate(f):
            if index == i:
                code = # what does it equal?
                break
    return code
You don't need readline:
with open("data.txt") as file:
for line in file:
# do stuff with line
This will read the entire file line by line, but not all at once (so you don't need all the memory). If you want to stop reading the file because you found the line you want, use break to terminate the loop. If you know the index of the line you want, use this:
with open("data.txt") as file:
for index, line in enumerate(file):
if index == 2: # looking for third line (0-based indexes)
# do stuff with this line
break # no need to go on
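Applied to the getCode function from your edit, that pattern would look like this (a sketch assuming the code is simply the line's text with the trailing newline stripped):

def getCode(i):
    with open("test.txt") as f:
        for index, line in enumerate(f):
            if index == i:
                return line.rstrip('\n')  # the code is the stripped line itself
    return None  # the file has fewer than i + 1 lines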
+1 @SpaceC0wb0y
You could also do:
f = open('filepath')
f.readline() # first line - let it pass
f.readline() # second line - let it pass
third_line = f.readline()
f.close()

Python Overwriting files after parsing

I'm new to Python, and I need to do a parsing exercise. I got a file, and I need to parse it (just the headers), but after the process, I need to keep the file in the same format, with the same extension, and at the same place on disk, but only with the differences of the new headers.
I tried this code...
import re

for line in open('/home/name/db/str/dir/numbers/str.phy'):
    if line.startswith('ENS'):
        linepars = re.sub('ENS([A-Z]+)0+([0-9]{6})', '\\1\\2', line)
        print linepars
...and it does the job, but I don't know how to "overwrite" the file with the new parsing.
The easiest way, though not the most efficient (by far, especially for long files), would be to rewrite the complete file.
You could do this by opening a second file handle and rewriting each line, except in the case of a header, where you'd write the parsed header. For example:
import re

fr = open('/home/name/db/str/dir/numbers/str.phy')
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w')  # name this whatever makes sense
for line in fr:
    if line.startswith('ENS'):
        linepars = re.sub('ENS([A-Z]+)0+([0-9]{6})', '\\1\\2', line)
        fw.write(linepars)
    else:
        fw.write(line)
fw.close()
fr.close()
EDIT: Note that this does not use readlines(), so it's more memory-efficient. It also does not store every output line, but only one at a time, writing each to the file immediately.
Just as a cool trick, you could use the with statement on the input file to avoid having to close it (Python 2.5+):
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w')  # name this whatever makes sense
with open('/home/name/db/str/dir/numbers/str.phy') as fr:
    for line in fr:
        if line.startswith('ENS'):
            linepars = re.sub('ENS([A-Z]+)0+([0-9]{6})', '\\1\\2', line)
            fw.write(linepars)
        else:
            fw.write(line)
fw.close()
P.S. Welcome :-)
As others are saying here, you want to open a file and use that file object's .write() method.
The best approach would be to open an additional file for writing:
import os

current_cfg = open(...)
parsed_cfg = open(..., 'w')
for line in current_cfg:
    new_line = parse(line)
    print new_line
    parsed_cfg.write(new_line + '\n')
current_cfg.close()
parsed_cfg.close()
os.rename(....)  # rename old file to a backup name
os.rename(....)  # rename new file into place
Additionally, I'd suggest looking at the tempfile module and using one of its methods for either naming your new file or opening/creating it. Personally, I'd favor putting the new file in the same directory as the existing file to ensure that os.rename will work atomically (the configuration file name will be guaranteed to point at either the old file or the new file; in no case would it point at a partially written/copied file).
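A sketch of that tempfile-based approach (the path and regex are taken from the answers above; mkstemp/fdopen are one way to do it):

import os
import re
import tempfile

path = '/home/name/db/str/dir/numbers/str.phy'
regx = re.compile('ENS([A-Z]+)0+([0-9]{6})')

# create the temporary file in the same directory so os.rename cannot cross filesystems
fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
with os.fdopen(fd, 'w') as fw, open(path) as fr:
    for line in fr:
        fw.write(regx.sub('\\1\\2', line) if line.startswith('ENS') else line)
os.rename(tmp_path, path)  # atomically replace the original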
The following code DOES the job.
I mean it DOES overwrite the file ON ITSELF; that's what the OP asked for. That's possible because the transformations only remove characters, so the file pointer fo that writes is always BEHIND the file pointer fi that reads.
import re

regx = re.compile('\AENS([A-Z]+)0+([0-9]{6})')
with open('bomo.phy', 'rb+') as fi, open('bomo.phy', 'rb+') as fo:
    fo.writelines(regx.sub('\\1\\2', line) for line in fi)
I think the writing isn't performed by the operating system one line at a time but through a buffer, so several lines are read before a pool of transformed lines is written. That's what I think.
newlines = []
for line in open('/home/name/db/str/dir/numbers/str.phy').readlines():
    if line.startswith('ENS'):
        line = re.sub('ENS([A-Z]+)0+([0-9]{6})', '\\1\\2', line)
    newlines.append(line.rstrip('\n'))  # keep non-header lines too
open('/home/name/db/str/dir/numbers/str.phy', 'w').write('\n'.join(newlines))
(Side note: of course, if you are working with large files, the level of optimization required may depend on your situation. Python is by nature very non-lazily evaluated. The following solution is not a good choice if you are parsing large files, such as database dumps or logs, but a few tweaks, such as nesting the with clauses and using lazy generators or a line-by-line algorithm, can allow O(1)-memory behavior.)
import re

targetFile = '/home/name/db/str/dir/numbers/str.phy'

def replaceIfHeader(line):
    if line.startswith('ENS'):
        return re.sub('ENS([A-Z]+)0+([0-9]{6})', '\\1\\2', line)
    else:
        return line

with open(targetFile, 'r') as f:
    # strip the newlines here, since the join below re-inserts them
    newText = '\n'.join(replaceIfHeader(line.rstrip('\n')) for line in f)
try:
    # make backup of targetFile
    with open(targetFile, 'w') as f:
        f.write(newText)
except:
    # error encountered, do something to inform user where backup of targetFile is
    raise
edit: thanks to Jeff for suggestion
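For completeness, here is one way to get the line-by-line, O(1)-memory behavior mentioned in the side note, reusing replaceIfHeader and keeping a backup (the .bak suffix is just an illustrative choice):

import shutil

shutil.copy(targetFile, targetFile + '.bak')  # the backup doubles as our read source
with open(targetFile + '.bak') as f_in, open(targetFile, 'w') as f_out:
    for line in f_in:
        f_out.write(replaceIfHeader(line))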

Replace a word in a file

I am new to Python programming...
I have a .txt file. It looks like this:
0,Salary,14000
0,Bonus,5000
0,gift,6000
I want to replace the first '0' value with '1' in each line. How can I do this? Can anyone help me, with sample code?
Thanks in advance.
Nimmyliji
I know that you're asking about Python, but forgive me for suggesting that perhaps a different tool is better for the job. :) It's a one-liner via sed:
sed 's/^0,/1,/' yourtextfile.txt > output.txt
This applies the regex /^0,/ (which matches any 0, that occurs at the beginning of a line) to each line and replaces the matched text with 1, instead. The output is directed into the specified file, output.txt.
inFile = open("old.txt", "r")
outFile = open("new.txt", "w")
for line in inFile:
    outFile.write(",".join(["1"] + line.split(",")[1:]))
inFile.close()
outFile.close()
If you would like something more general, take a look at the Python csv module. It contains utilities for processing comma-separated values (abbreviated as CSV) in files. It can also work with an arbitrary delimiter, not only a comma. Since your sample is obviously a CSV file, you can use it as follows:
import csv
reader = csv.reader(open("old.txt"))
writer = csv.writer(open("new.txt", "w"))
writer.writerows(["1"] + line[1:] for line in reader)
To overwrite the original file with the new one:
import os
os.remove("old.txt")
os.rename("new.txt", "old.txt")
I think that writing to a new file and then renaming it is more fault-tolerant and less likely to corrupt your data than directly overwriting the source file. Imagine that your program raised an exception while the source file was already read into memory and reopened for writing: you would lose the original data, and your new data wouldn't be saved because of the crash. In my version, I only lose the new data while preserving the original.
o = open("output.txt", "w")
for line in open("file"):
    s = line.split(",")
    s[0] = "1"
    o.write(','.join(s))
o.close()
Or you can use fileinput with in-place editing:
import fileinput

for line in fileinput.FileInput("file", inplace=1):
    s = line.split(",")
    s[0] = "1"
    print ','.join(s),  # trailing comma: each line already ends with a newline
f = open(filepath, 'r')
data = f.readlines()
f.close()
edited = []
for line in data:
    edited.append('1' + line[1:])
f = open(filepath, 'w')
f.writelines(edited)
f.flush()
f.close()
Or in Python 2.5+:
with open(filepath, 'r') as f:
    data = f.readlines()
with open(outfilepath, 'w') as f:
    for line in data:
        f.write('1' + line[1:])
This should do it. I wouldn't recommend it for a truly big file though ;-)
What is going on (ex 1):
1: Open the file in read mode
2,3: Read all the lines into a list (each line is a separate index) and close the file.
4,5,6: Iterate over the list, constructing a new list where each line has the first character replaced by a '1'. The line[1:] slices the string from index 1 onward; we concatenate the '1' with the truncated string.
7,8,9: Reopen the file in write mode, write the list to the file (overwrite), flush the buffer, and close the file handle.
In Ex. 2:
I use the with statement, which lets the file handle close itself, but do essentially the same thing.
