Issues reading and writing txt files line by line

The script is written using PyQt 4.10.1 and Python 2.7.
I have been working on a simple tool that allows a user to search for paths and then save them to a config file for another program to read later. If a config file already exists, the script reads it and displays the existing paths for the user to edit or add to. I wrote a GUI to make it as user-friendly as possible. I am having a couple of issues with it.
First, when I read in the config file I use the following code:
try:
    self.paths = open(configFile, "r")
    self.data = self.paths.readlines()
    self.paths.close()
except:
    self.data = None
if self.data is not None:
    for line in self.data:
        print line
        #self.listDelegate is the model for my QListView
        self.listDelegate.insertRows(0, 1, line)
When I do that I get the following in my gui:
This (above) is how it looks when you first input the data (before the data is saved and then reopened)
This (above) is how the data looks after the config file is saved and then read back in (note the extra space below the path).
The config file is only read in when the script is first opened.
The following is how the config file looks when it is written out:
C:\Program Files
C:\MappedDrives
C:\NVIDIA
Now all of that wouldn't be a big deal, but when I open the config file to edit it with this tool, the extra space in the GUI is read as another line break, so the config file is then written as:
C:\Program Files

C:\MappedDrives

C:\NVIDIA
Then the problem just gets bigger and bigger every time I edit the file.
This issue leads me to the second issue (which I think may be the culprit). When I write the lines from the gui to the config file I use the following code:
rowCount = self.listDelegate.rowCount()
if rowCount > 0:
    myfile = open(configFile, 'w')
    for i in range(rowCount):
        myfile.write(str(self.listDelegate.index(i).data(role = QtCore.Qt.DisplayRole).toPyObject()))
        myfile.write("\n")
    myfile.close()
I am assuming that the issue with the extra line breaks is because I am adding the line breaks in manually. The problem is that I need each path to be on its own line for the config file to be usable later. I don't have a lot of experience writing out text files and everyone says that the easiest way to write them out line by line is to add in the line breaks by hand. If anyone has any better ideas I would love to hear them.
Sorry for the long winded explanation. If I am not being clear enough please tell me and I will try to explain myself better.
Thanks for the help!

The problem is that every time you read the file the line break remains at the end of the line. From the description of readline:
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline.
If you try
self.paths = open(configFile, "r")
self.data = self.paths.readlines()
for line in self.data:
    print repr(line)
which prints the representation of every line as a Python string literal, you will get something like
'C:\\Program Files\n'
'C:\\MappedDrives\n'
'C:\\NVIDIA\n'
As you add a newline again when writing the file, the easiest fix is probably to remove the trailing newline when reading:
for line in self.data:
    strippedLine = line.rstrip('\n')
    self.listDelegate.insertRows(0, 1, strippedLine)
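Putting the reading and writing halves together, a minimal round-trip sketch might look like the following. The helper names read_paths and write_paths are made up for illustration and the Qt model is left out entirely; the only point is that newlines are stripped on the way in and added exactly once on the way out. Skipping blank lines is an extra assumption, added so that an already damaged config file gets cleaned up.

```python
def read_paths(config_file):
    """Return the paths stored in config_file, one per line, without newlines."""
    try:
        with open(config_file, "r") as f:
            # rstrip('\n') drops the trailing newline that each line carries;
            # blank lines (left over from earlier saves) are skipped
            return [line.rstrip("\n") for line in f if line.strip()]
    except IOError:
        # missing or unreadable config file: start with an empty list
        return []


def write_paths(config_file, paths):
    """Write each path on its own line, adding exactly one newline per path."""
    with open(config_file, "w") as f:
        for path in paths:
            f.write(path + "\n")
```

Reading a file and writing the result straight back with these two helpers leaves it unchanged, so the blank lines no longer accumulate.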

Related

Python script not finding value in log file when the value is in the file

The code below is meant to find any xls or csv file used in a process. The .log file contains full paths with extensions and definitely contains multiple values with "xls" or "csv". However, Python can't find anything...Any idea? The weird thing is when I copy the content of the log file and paste it to another notepad file and save it as log, it works then...
infile=r"C:\Users\me\Desktop\test.log"
important=[]
keep_words=["xls","csv"]
with open(infile,'r') as f:
    for line in f:
        for word in keep_words:
            if word in line:
                important.append(line)
print(important)

I was able to figure it out...encoding issue...
with io.open(infile,encoding='utf16') as f:
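For completeness, here is a small sketch of the whole search with the encoding fix applied. The function name find_lines and the default encoding are assumptions for illustration; the log in the question happened to be UTF-16, but another file may need a different encoding. Unlike the loop in the question, a line is appended only once even if it contains several of the keywords.

```python
import io


def find_lines(path, keep_words, encoding="utf-16"):
    """Return the lines of path that contain any of keep_words."""
    important = []
    # io.open decodes the file while reading, so each line is ordinary text
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            if any(word in line for word in keep_words):
                important.append(line)
    return important
```

The reason the plain open() found nothing is that UTF-16 stores a zero byte next to each ASCII character, so the byte sequence "xls" never appears contiguously in the raw file.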

You could change the line
for line in f:
to
for line in f.readlines():
which reads all the lines into a list first. Note, though, that iterating over the file object directly already yields the same lines one by one, so this change does not alter what the search finds; if nothing matches, the file's encoding is the more likely cause.
I hope this helps.

Reading a csv file in python and creating database

I'm working on a function where I need to accept a CSV file name as a string, open and read it, create a database, and then return the database. The attempt I have so far seems to have the correct logic, but it is giving me the error "No such file or directory 'filename.csv'". The files I'm reading are called file0.csv, file1.csv, etc. I'll include an example of one of them with my code below. Does anyone have any advice on how to fix this? Thanks
Edit: I realize that what I included below is an example of the database. Apparently the first line of the file is the header row, and the code I have now reads the header row when it shouldn't. Here is the updated code:
Code:
def read_file(filename):
    thefile = open(filename)
    data = []
    for line in thefile:
        data.append(line)
    thefile.close()
    return data
Example database:
{'Leonardo da Vinci': [('Mona Lisa', 1503, 76.8, 53.0, 'oil paint', 'France'),
                       ('The Last Supper', 1495, 460.0, 880.0, 'tempera', 'Italy')]}

Let's look at just the first two lines of your code:
def read_file(filename):
    thefile = open('filename.csv')
I surmise that, since you want to be able to process more than one file with this code, you want to be able to call read_file substituting various filenames in place of filename. Correct?
OK, then one flaw in the code is that filename in the first line is a variable but 'filename.csv' is a literal. This means that no matter what you pass as filename in the first line, it will NOT change the literal. To do that, the second line would have to be, for instance,
    thefile = open('%s.csv' % filename, 'r')
This would put what's in the filename variable in place of the %s and do what you seem to want.
What most respondents are yammering about: your script (i.e., the Python code) might be in one folder or directory, but the files you want to process might be in a different one. When you run a script without telling it where to look for files, it looks in the current working directory, which is normally the folder you run it from. At your stage of the game, the easiest thing to do is to put the Python script and the files it needs all in the same folder, and then run them in that same folder.
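If keeping everything in one folder is not practical, one option is to pass the folder explicitly and report where the code looked when the file is missing. The sketch below is only an illustration; open_data_file and its base_dir parameter are hypothetical names, not part of the original code.

```python
import os


def open_data_file(filename, base_dir=None):
    """Open filename inside base_dir (default: the current working directory)."""
    if base_dir is None:
        base_dir = os.getcwd()
    path = os.path.join(base_dir, filename)
    if not os.path.isfile(path):
        # say where we looked, which is usually the missing piece of information
        raise IOError("No such file: %r (looked in %r)" % (filename, base_dir))
    return open(path, "r")
```

Calling open_data_file("file0.csv") behaves like a plain open(), except that the error message names the directory that was searched.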

Maybe you want something like this? It opens your file (provided you give its correct location), accumulates its lines and returns the list of lines in your file.
def read_file(filename):
    thefile = open(filename, 'r')
    lines = []
    for line in thefile:
        lines.append(line)
    thefile.close()
    return lines
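To get from a list of lines to the dictionary shown in the question, something along these lines could work. This is a sketch under a loud assumption: the column order (artist, title, year, height, width, medium, country) is guessed from the example database and is not confirmed by the question. The csv module does the splitting, and the header row is skipped.

```python
import csv


def read_file(filename):
    """Build {artist: [(title, year, height, width, medium, country), ...]}."""
    database = {}
    with open(filename, "r") as thefile:
        reader = csv.reader(thefile)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            # ASSUMED column order; adjust to match the real files
            artist, title, year, height, width, medium, country = row
            entry = (title, int(year), float(height), float(width), medium, country)
            database.setdefault(artist, []).append(entry)
    return database
```

setdefault creates the list the first time an artist is seen, so several works by the same artist end up under one key.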

Python - Opening a text file for edition before modifiying it in place with fileinput.input()

I have a Python script used to edit a text file. First, the first line of the text file is removed. After that, a line is added to the end of the text file.
I noticed a strange behaviour that I cannot explain:
This script works as expected (removes the first line and adds a line at the end of the file):
import fileinput
# remove first line of text file
i = 0
for line in fileinput.input('test.txt', inplace=True):
    i += 1
    if i != 1:
        print line.strip()
# add a line at the end of the file
f = open('test.txt', 'a+') # <= line that is moved
f.write('test5')
f.close()
But in the following script, where the text file is opened before the removal, the removal occurs but the content isn't added (with the write() method):
import fileinput
# file opened before removing
f = open('test.txt', 'a+') # <= line that is moved
# remove first line of text file
i = 0
for line in fileinput.input('test.txt', inplace=True):
    i += 1
    if i != 1:
        print line.strip()
# add a line at the end of the file
f.write('test5')
f.close()
Note that in the second example open() is placed at the beginning, whereas in the first it is called after removing the first line of the text file.
What is the explanation for this behaviour?

When using fileinput with the inplace parameter, the original file is moved to a backup file and standard output is redirected to a new file with the original name. The backup file is deleted when the output file is closed. In your example, you do not close the fileinput file explicitly, relying on the self-triggered closing, which is not documented and might be unreliable.
The behaviour you describe in the first example is best explained if we assume that opening the same file again triggers fileinput.close(). In your second example, the renaming only happens after f.close() is executed, thus overwriting the other changes (adding "test5").
So apparently you should explicitly call fileinput.close() in order to have full control over when your changes are written to disk. (It is generally recommended to release external resources explicitly as soon as they are not needed anymore.)
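A minimal sketch of that advice, assuming a small text file: the in-place rewrite is finished with an explicit fileinput.close() before the file is opened again for appending. The function name is made up for illustration, and sys.stdout.write is used instead of print so the snippet runs unchanged on Python 2 and 3.

```python
import fileinput
import sys


def drop_first_line_and_append(path, new_last_line):
    """Remove the first line of path in place, then append new_last_line."""
    try:
        for line in fileinput.input(path, inplace=True):
            if fileinput.lineno() != 1:
                # while inplace is active, sys.stdout points at the new file
                sys.stdout.write(line)
    finally:
        # finish the in-place rewrite before touching the file again
        fileinput.close()

    # only now open the file for appending, so we get the rewritten file
    with open(path, "a") as f:
        f.write(new_last_line)
```

Because the append happens only after fileinput.close(), the handle used for writing refers to the rewritten file rather than to the one that was moved aside.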
EDIT:
After more thorough testing, this is what I think is happening in your second example:
1. You open a stream with mode a+ to the text file and bind it to the variable f.
2. You use fileinput to alter the same file. Under the hood, the original file is renamed to a backup name and a new file is created under the original name. Note: this doesn't actually change the original file; rather, the original file is made inaccessible, as its former name now points to a new file.
3. However, the stream f still points to the original file (which no longer has its file name). You can still write to this file and close it properly, but you cannot see it anymore.
Please note that I'm not an expert in this kind of low-level operation; the details might be wrong and the terminology certainly is. Also, the behaviour might differ across operating systems and Python implementations. However, it might still help you understand why things go differently from what you expected.
In conclusion I'd say that you shouldn't be doing what you do in your second example. Why don't you just read the file into memory, alter it, and then write it back to disk? Actual in-place (on-disk) altering of files is no fun in Python, as it's too high-level a language.
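The read, alter, write-back approach could be sketched as follows, assuming the file comfortably fits in memory. The function name rewrite_in_memory is hypothetical.

```python
def rewrite_in_memory(path, new_last_line):
    """Drop the first line of path and append new_last_line, via a list in memory."""
    with open(path, "r") as f:
        lines = f.readlines()      # every line keeps its trailing newline

    lines = lines[1:]              # remove the first line
    lines.append(new_last_line)    # add the new content at the end

    with open(path, "w") as f:     # "w" truncates, so the file is fully replaced
        f.writelines(lines)
```

Only one file handle is open at any time, so the ordering problem from the second example cannot occur.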

Deleting the first line of a text file in python [duplicate]

I am writing software that allows users to write data into a text file. However, I am not sure how to delete the first line of the text file and rewrite it. I want the user to be able to update the text file's first line by clicking a button and typing something in, but that requires deleting the first line and writing a new one in its place, which I am not sure how to implement. Any help would be appreciated.
Edit:
So I seeked to the start of the file and tried to write another line, but that doesn't delete the previous line.
file.seek(0)
file.write("This is the new first line \n")

You did not describe how you opened the file to begin with. If you used file = open(somename, "a"), that file will not be truncated, but new data is written at the end (even after a seek, on most if not all modern systems). You would have to open the file with "r+".
But your example assumes that the line you write is exactly the same length as the line it replaces. There is no line organisation in files, just bytes, some of which indicate line endings.
What you need to do is use a temporary file or a temporary buffer in memory for all the lines and then write the lines out with the first one replaced.
If things fit in memory (which I assume since few users are going to type so much it does not fit), you should be able to do:
lines = open(somename, 'r').readlines()
lines[0] = "This is the new first line \n"
file = open(somename, 'w')
for line in lines:
    file.write(line)
file.close()
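To see why seek(0) followed by write() does not replace the line, the sketch below contrasts the two approaches. Both function names are made up for illustration. The first overwrites bytes at the start of the file and leaves everything after them in place; the second rewrites the whole file with the first line swapped.

```python
def overwrite_at_start(path, data):
    """Write data (bytes) over the first bytes of path without truncating it."""
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(data)  # only len(data) bytes change; the rest stays as it was


def replace_first_line(path, new_line):
    """Rewrite the whole file with its first line replaced by new_line."""
    with open(path, "r") as f:
        lines = f.readlines()
    if lines:
        lines[0] = new_line
    else:
        lines = [new_line]
    with open(path, "w") as f:
        f.writelines(lines)
```

With a file containing "old first line" and "second", overwriting the start with the 4 bytes "new\n" leaves the remainder "first line" behind as a stray second line, which is the effect described in the question.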

You could use readlines to get a list of lines and then use del on the first element of the list. This might help: http://www.daniweb.com/software-development/python/threads/68765/how-to-remove-a-number-of-lines-from-a-text-file-

Python Overwriting files after parsing

I'm new to Python, and I need to do a parsing exercise. I have a file, and I need to parse it (just the headers), but after the process I need to keep the file in the same format, with the same extension, and at the same place on disk, with only the new headers being different.
I tried this code...
import re
for line in open ('/home/name/db/str/dir/numbers/str.phy'):
    if line.startswith('ENS'):
        linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
        print linepars
..and it does the job, but I don't know how to "overwrite" the file with the new parsing.

The easiest way, but not the most efficient (by far, and especially for long files) would be to rewrite the complete file.
You could do this by opening a second file handle and rewriting each line, except in the case of the header, you'd write the parsed header. For example,
import re
fr = open('/home/name/db/str/dir/numbers/str.phy')
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense
for line in fr:
    if line.startswith('ENS'):
        linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
        fw.write(linepars)
    else:
        fw.write(line)
fw.close()
fr.close()
EDIT: Note that this does not use readlines(), so it's more memory efficient. It also does not store every output line, but only one at a time, writing it to the file immediately.
Just as a cool trick, you could use the with statement on the input file to avoid having to close it (Python 2.5+):
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense
with open('/home/name/db/str/dir/numbers/str.phy') as fr:
    for line in fr:
        if line.startswith('ENS'):
            linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
            fw.write(linepars)
        else:
            fw.write(line)
fw.close()
P.S. Welcome :-)

As others are saying here, you want to open a file and use that file object's .write() method.
The best approach would be to open an additional file for writing:
import os
current_cfg = open(...)
parsed_cfg = open(..., 'w')
for line in current_cfg:
    new_line = parse(line)
    print new_line
    parsed_cfg.write(new_line + '\n')
current_cfg.close()
parsed_cfg.close()
os.rename(....) # Rename old file to backup name
os.rename(....) # Rename new file into place
Additionally I'd suggest looking at the tempfile module and using one of its functions for either naming your new file or opening/creating it. Personally I'd favor putting the new file in the same directory as the existing file to ensure that os.rename will work atomically (the configuration file name will be guaranteed to point at either the old file or the new file; in no case would it point at a partially written/copied file).
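One way the tempfile suggestion might look in practice is sketched below. The names rewrite_file and shorten_header are hypothetical, and os.replace requires Python 3.3 or later (on Python 2.7, os.rename gives the same overwrite behaviour on POSIX systems only). The temporary file is created next to the target so the final rename stays on one filesystem.

```python
import os
import re
import tempfile


def shorten_header(line):
    """Apply the header substitution from the question to one line."""
    if line.startswith("ENS"):
        return re.sub(r"ENS([A-Z]+)0+([0-9]{6})", r"\1\2", line)
    return line


def rewrite_file(path, transform):
    """Apply transform to every line of path, then swap the result into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # same directory as the target, so the rename does not cross filesystems
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp, open(path, "r") as src:
            for line in src:
                tmp.write(transform(line))
        os.replace(tmp_path, path)  # the name now points at the new content
    except Exception:
        os.remove(tmp_path)  # do not leave a half-written temp file behind
        raise
```

Note that mkstemp creates the file with restrictive permissions, so the replaced file may end up with different permissions than the original; copying them over first is left out of this sketch.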

The following code DOES the job.
I mean it DOES overwrite the file IN PLACE; that's what the OP asked for. That's possible because the transformations only remove characters, so the file pointer fo that writes is always BEHIND the file pointer fi that reads.
import re
regx = re.compile(r'\AENS([A-Z]+)0+([0-9]{6})')
with open('bomo.phy','rb+') as fi, open('bomo.phy','rb+') as fo:
    fo.writelines(regx.sub('\\1\\2',line) for line in fi)
    fo.truncate()  # the result is shorter, so cut off the leftover tail of the old content
I think that the writing isn't performed by the operating system one line at a time but through a buffer. So several lines are read before a pool of transformed lines is written. That's what I think.

import re
newlines = []
for line in open ('/home/name/db/str/dir/numbers/str.phy').readlines():
    if line.startswith('ENS'):
        line = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
    newlines.append( line )
# lines from readlines() keep their newline, so join with the empty string
open ('/home/name/db/str/dir/numbers/str.phy', 'w').write(''.join(newlines))

(Side note: if you are working with large files, the level of optimization required depends on your situation. Python evaluates eagerly by default. The following solution is not a good choice if you are parsing large files, such as database dumps or logs, but a few tweaks, such as nesting the with clauses and using lazy generators or a line-by-line algorithm, allow O(1)-memory behavior.)
import re
import shutil
targetFile = '/home/name/db/str/dir/numbers/str.phy'
def replaceIfHeader(line):
    if line.startswith('ENS'):
        return re.sub('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
    else:
        return line
with open(targetFile, 'r') as f:
    # each line keeps its trailing newline, so join with the empty string
    newText = ''.join(replaceIfHeader(line) for line in f)
backupFile = targetFile + '.bak'  # backup name is an arbitrary choice
shutil.copy(targetFile, backupFile)  # make backup of targetFile
try:
    with open(targetFile, 'w') as f:
        f.write(newText)
except IOError:
    # error encountered, inform user where backup of targetFile is
    print('Write failed; backup is at ' + backupFile)
    raise
edit: thanks to Jeff for suggestion
