Strange error with text file editing in Python

I'm using a text file to store the weights of a neural network that I'm making, but I'm having serious trouble editing the weights stored in this text file. Essentially, I am making a file with a very regular format: Word + \t + Weight + \n. I then use the following code to run through this text file and grab the parts:
with open(Neuron_File, 'r+') as Original_Neurons:
    for Neuron in Original_Neurons:
        Word_Stem = re.sub(r'^([a-z-]*)([\t]?)([0-9.]*)(\n)$', r'\1', Neuron)
        Weight = float(re.sub(r'^([a-z-]*)([\t]?)([0-9.]*)(\n)$', r'\3', Neuron))
Which is working, however I would then like to be able to change the value of Weight, and write it back to the same text file in the same place. I have managed to successfully create a new file that is modified in the way that I would like, however I am having a strange problem with writing it back to the original file. I am using the below code for it:
def Replace(New_File, Old_File):
    for line in open(New_File):
        open(Old_File, 'w').write(str(line))
But for some reason this function simply breaks at a certain point in the file. The first 80% transfers fine, but then it cuts the file off at a seemingly random point in the middle of a line. Any ideas? I know there are other questions that are on similar topics, but none of them seem applicable to my situation, and I can't find any mention of another error like the one I'm getting.
Problem is navigable, but my primary interest is in what the origin of this error was. I've never seen anything like it and it intrigued me, as I had no idea what was going on, hoping someone on here would have more of an idea.

with open('input.txt') as in_file:
    with open('output.txt', 'w') as out_file:
        for line in in_file.readlines():
            word, weight = line.split()[:2]
            # end each record with '\n' so the Word\tWeight\n format is kept
            out_file.write('%s\t%s\n' % (word, float(weight) * 2))
The with block automatically closes the opened files.

You need to close the file handle for the file you're writing.
def Replace(New_File, Old_File):
    Old_File_Handle = open(Old_File, 'w')
    for line in open(New_File):
        Old_File_Handle.write(str(line))
    Old_File_Handle.close()
Alternately, use the with statement.
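The with-statement variant mentioned above could look roughly like this. It is a minimal sketch: the name replace_file and the throwaway files are illustrative, not taken from the question.

```python
import os
import tempfile


def replace_file(new_file, old_file):
    """Copy new_file over old_file, opening each file exactly once."""
    # Both handles are closed (and the output flushed) when the block exits.
    with open(new_file) as src, open(old_file, 'w') as dst:
        for line in src:
            dst.write(line)


# Small demonstration with throwaway files.
workdir = tempfile.mkdtemp()
new_path = os.path.join(workdir, 'new.txt')
old_path = os.path.join(workdir, 'old.txt')
with open(new_path, 'w') as f:
    f.write('alpha\t0.5\nbeta\t1.25\n')
with open(old_path, 'w') as f:
    f.write('stale contents\n')

replace_file(new_path, old_path)
with open(old_path) as f:
    copied = f.read()
```

Because the output file is opened once, before the loop, it is truncated once rather than on every line.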

Alternatively, you can just use something like shelve to handle this for you.
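As a rough illustration of the shelve suggestion, here is a sketch that stores word-to-weight pairs in a shelf instead of a hand-formatted text file. The path and the keys are made up for the example, and using the shelf as a context manager assumes Python 3.4 or later.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf; any writable path would do.
db_path = os.path.join(tempfile.mkdtemp(), 'neurons')

# Store word -> weight pairs; the shelf pickles the values for us.
with shelve.open(db_path) as weights:
    weights['run'] = 0.5
    weights['walk'] = 1.25

# Reopen later and update one weight without rewriting a text file by hand.
with shelve.open(db_path) as weights:
    weights['run'] = weights['run'] * 2
    updated = dict(weights)
```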

def Replace(New_File, Old_File):
    for line in open(New_File):
        # next line REWRITES outfile EVERY TIME!!!
        open(Old_File, 'w').write(str(line))
Result: Old_File will contain ONLY the LAST line.
Correct implementation:
def Replace(New_File, Old_File):
    # open files
    inp = open(New_File)
    out = open(Old_File, 'w')
    for line in inp:
        out.write(line)
    # close files
    out.close()
    inp.close()

Related

Printing results from my code to .txt doesn't work anymore

I am trying to print some of the results of my algorithm (score) to a .txt file to have that data for further analysis. Here, the algorithm should create the file and then open it to write the number down. Then I thought about closing it again.
My problem is that I can't even find the file. If I create one myself and only try to write the number, that doesn't work either.
This is for the analysis of Reinforcement Learning for a robot. The scores represent Q-values and are important for further analysis. Here, score is a random number.
if __name__ == '__main__':
    open('try.txt', 'w+').close()
    for e in range(agent.load_episode + 1, EPISODES):
        ...
        for t in range(agent.episode_step):
            ...
            if done:
                ...
                saveFile = open('try.txt','w')
                saveFile.write(str(score))
                saveFile.close()
In the first part I try to create a new file called try.txt (I only create the file once). Then I open the file, write something and close it again. When the next Q-value is calculated, the file is opened again.
Should the file contain only the last calculated value, all the values (possibly each on a new line) from a single run, or even values from separate runs? Nevertheless, this slightly modified snippet might be what you are looking for:
if __name__ == '__main__':
    with open('try.txt', 'w') as saveFile: # change to 'a' if you want the results to be stored between runs
        for e in range(agent.load_episode + 1, EPISODES):
            ...
            for t in range(agent.episode_step):
                ...
                if done:
                    ...
                    # saveFile.truncate() uncommenting this means that the file only stores the latest value
                    saveFile.write(str(score) + '\n') # write each result to new line
                    saveFile.flush() # this line makes the results accessible from file as soon as they are calculated
In Python, with is the preferred way of opening files, as it takes care of closing them at the right moment. When a file is opened in 'w' mode, the position is placed at the beginning of the file, and if the file had any data in it, it gets erased.
The 'a' mode appends to file. You may want to take a look at this.
Now I believe that you wanted to open and close the file on and on, as to have the data accessible as soon as the iteration is finished. That is what saveFile.flush() is for. Please let me know if this helps you!
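To make the difference between the two modes concrete, here is a small sketch that reopens a file once per score, first with 'w' and then with 'a'. The temporary path and the three scores are placeholders for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'try.txt')

# 'w' truncates on every open, so only the last score survives.
for score in (1, 2, 3):
    with open(path, 'w') as f:
        f.write(str(score) + '\n')
with open(path) as f:
    after_w = f.read()

# 'a' appends, so each score is kept on its own line.
os.remove(path)
for score in (1, 2, 3):
    with open(path, 'a') as f:
        f.write(str(score) + '\n')
with open(path) as f:
    after_a = f.read()
```

This is why reopening with 'w' inside the loop, as in the question, leaves at most one value in the file.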
To better control where the file gets created, make use of the os module:
import os
directory = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(directory, 'try.txt')
# print(file_path)
with open(file_path, 'w') as saveFile:
Try changing saveFile = open('try.txt', 'w') to with open('try.txt', 'a+') as saveFile:

Write strings to another file

The Problem - Update:
I could get the script to print its results, but had a hard time figuring out how to send the output to a file instead of the screen. The script below worked for printing results to the screen. I posted the solution right after this code; scroll to the [ solution ] at the bottom.
First post:
I'm using Python 2.7.3. I am trying to extract the last words of a text file after the colon (:) and write them into another txt file. So far I am able to print the results on the screen and it works perfectly, but when I try to write the results to a new file it gives me str has no attribute write/writeline. Here is the code snippet:
# the txt file I'm trying to extract last words from and write strings into a file
#Hello:there:buddy
#How:areyou:doing
#I:amFine:thanks
#thats:good:I:guess
x = raw_input("Enter the full path + file name + file extension you wish to use: ")
def ripple(x):
    with open(x) as file:
        for line in file:
            for word in line.split():
                if ':' in word:
                    try:
                        print word.split(':')[-1]
                    except (IndexError):
                        pass
ripple(x)
The code above works perfectly when printing to the screen. However I have spent hours reading Python's documentation and can't seem to find a way to have the results written to a file. I know how to open a file and write to it with writeline, readline, etc, but it doesn't seem to work with strings.
Any suggestions on how to achieve this?
PS: I didn't add the code that caused the write error, because I figured this would be easier to look at.
End of First Post
The Solution - Update:
Managed to get python to extract and save it into another file with the code below.
The Code:
inputFile = open ('c:/folder/Thefile.txt', 'r')
outputFile = open ('c:/folder/ExtractedFile.txt', 'w')
tempStore = outputFile
for line in inputFile:
    for word in line.split():
        if ':' in word:
            splitting = word.split(':')[-1]
            tempStore.writelines(splitting +'\n')
            print splitting
inputFile.close()
outputFile.close()
Update:
Check out droogans' code instead of mine; it is more efficient.
Try this:
with open('workfile', 'w') as f:
    f.write(word.split(':')[-1] + '\n')
If you really want to use the print method, you can:
from __future__ import print_function
print("hi there", file=f)
According to "Correct way to write line to file in Python", you should add the __future__ import if you are using Python 2; if you are using Python 3, print is already a function.
I think your question is good, and when you're done, you should head over to code review and get your code looked at for other things I've noticed:
# the txt file I'm trying to extract last words from and write strings into a file
#Hello:there:buddy
#How:areyou:doing
#I:amFine:thanks
#thats:good:I:guess
First off, thanks for putting example file contents at the top of your question.
x = raw_input("Enter the full path + file name + file extension you wish to use: ")
I don't think this part is necessary. You can just create a better parameter for ripple than x. I think file_loc is a pretty standard one.
def ripple(x):
    with open(x) as file:
With open, you are able to mark the operation happening to the file. I also like to name my file object according to its job. In other words, with open(file_loc, 'r') as r: reminds me that r.foo is going to be my file that is being read from.
        for line in file:
            for word in line.split():
                if ':' in word:
First off, your for word in line.split() statement does nothing but put the "Hello:there:buddy" string into a list: ["Hello:there:buddy"]. A better idea would be to pass split an argument, which does more or less what you're trying to do here. For example, "Hello:there:buddy".split(":") would output ['Hello', 'there', 'buddy'], making your search for colons an accomplished task.
                    try:
                        print word.split(':')[-1]
                    except (IndexError):
                        pass
Another advantage is that you won't need to check for an IndexError, since you'll have, at least, an empty string, which when split comes back as a list containing one empty string. In other words, it'll write nothing for that line.
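A short sketch of the split behaviour described here, using the sample line from the question plus an empty string as the edge case:

```python
line = "Hello:there:buddy"
parts = line.split(":")        # ['Hello', 'there', 'buddy']
last = parts[-1]               # 'buddy'

empty_parts = "".split(":")    # [''], a list holding one empty string
empty_last = empty_parts[-1]   # '', so indexing with [-1] never raises IndexError
```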
ripple(x)
For ripple(x), you would instead call ripple('/home/user/sometext.txt').
So, try looking over this, and explore code review. There's a guy named Winston who does really awesome work with Python and self-described newbies. I always pick up new tricks from that guy.
Here is my take on it, re-written out:
import os #for building the output file name
def ripple(file_loc='/typical/location/while/developing.txt'):
    outfile = "output.".join(os.path.basename(file_loc).split('.'))
    with open(file_loc, 'r') as r, open(outfile, 'w') as w:
        lines = r.readlines() #everything is one giant list
        # strip each line's newline first so '\n'.join does not double it
        w.write('\n'.join([line.rstrip('\n').split(':')[-1] for line in lines]))
ripple()
Try breaking this down, line by line, and changing things around. It's pretty condensed, but once you pick up comprehensions and using lists, it'll be more natural to read code this way.
You are trying to call .write() on a string object.
You either got your arguments mixed up (you'll need to call fileobject.write(yourdata), not yourdata.write(fileobject)) or you accidentally re-used the same variable for both your open destination file object and storing a string.
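The error can be reproduced in a few lines. This sketch uses io.StringIO as a stand-in for an open output file; the variable names are illustrative.

```python
import io

out = io.StringIO()          # stands in for an open output file
data = "buddy\n"

out.write(data)              # correct: file_object.write(string)

try:
    data.write(out)          # wrong way round: str has no .write()
except AttributeError as exc:
    error_message = str(exc)
```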

Changing contents of a file - Python

So I have a program which runs. This is part of the code:
FileName = 'Numberdata.dat'
NumberFile = open(FileName, 'r')
for Line in NumberFile:
    if Line == '4':
        print('1')
    else:
        print('9')
NumberFile.close()
A pretty pointless thing to do, yes, but I'm just doing it to enhance my understanding. However, this code doesn't work. The file remains as it is and the 4's are not replaced by 1's and everything else isn't replaced by 9's, they merely stay the same. Where am I going wrong?
Numberdata.dat is "444666444666444888111000444"
It is now:
FileName = 'Binarydata.dat'
BinaryFile = open(FileName, 'w')
for character in BinaryFile:
    if charcter == '0':
        NumberFile.write('')
    else:
        NumberFile.write('#')
BinaryFile.close()
You need to build up a string and write it to the file.
FileName = 'Numberdata.dat'
NumberFileHandle = open(FileName, 'r')
newFileString = ""
for Line in NumberFileHandle:
    for char in Line: # this will work for any number of lines.
        if char == '4':
            newFileString += "1"
        elif char == '\n':
            newFileString += char
        else:
            newFileString += "9"
NumberFileHandle.close()
NumberFileHandle = open(FileName, 'w')
NumberFileHandle.write(newFileString)
NumberFileHandle.close()
First, Line will never equal 4 because each line read from the file includes the newline character at the end. Try if Line.strip() == '4'. This will remove all white space from the beginning and end of the line.
Edit: I just saw your edit... naturally, if you have all your numbers on one line, the line will never equal 4. You probably want to read the file a character at a time, not a line at a time.
Second, you're not writing to any file, so naturally the file won't be getting changed. You will run into difficulty changing a file as you read it (since you have to figure out how to back up to the same place you just read from), so the usual practice is to read from one file and write to a different one.
Because you need to write to the file as well.
with open(FileName, 'w') as f:
    f.write(...)
Right now you are just reading and manipulating the data, but you're not writing them back.
At the end you'll need to reopen your file in write mode and write to it.
If you're looking for references, take a look at the open() documentation and at the Reading and Writing Files section of the Python Tutorial.
Edit: You shouldn't read from and write to the same file at the same time. You could either write to a temp file and call shutil.move() at the end, or load and manipulate your data and then re-open your original file in write mode and write it back.
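The temp-file route could be sketched as follows. The helper convert_char and the temporary directory are assumptions for the example; the file contents are the ones given in the question.

```python
import os
import shutil
import tempfile


def convert_char(ch):
    """Map '4' to '1', keep newlines, and turn everything else into '9'."""
    if ch == '4':
        return '1'
    if ch == '\n':
        return ch
    return '9'


workdir = tempfile.mkdtemp()
file_name = os.path.join(workdir, 'Numberdata.dat')
with open(file_name, 'w') as f:
    f.write('444666444666444888111000444')

# Write the transformed data to a temporary file in the same directory...
with open(file_name) as src, tempfile.NamedTemporaryFile(
        'w', dir=workdir, delete=False) as tmp:
    for line in src:
        tmp.write(''.join(convert_char(ch) for ch in line))

# ...then move it over the original once everything has been written.
shutil.move(tmp.name, file_name)

with open(file_name) as f:
    result = f.read()
```

The original file is never open for reading and writing at the same time, and it is only replaced after the new contents are complete.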
You are not sending any output to the file; you are simply printing 1 and 9 to stdout, which is usually the terminal or interpreter.
If you want to write to the file, you have to open it again with 'w', e.g.
out = open(FileName, 'w')
Then you can call out.write('1'), for example.
In Python 2 you can also use
print >>out, '1'
Also, if you want to overwrite the file, it is a better idea to read it completely first and write afterwards.
According to your comment:
Numberdata is just a load of numbers all one line. Maybe that's where I'm going wrong? It is "444666444666444888111000444"
I can tell you that the for loop iterates over lines, not over characters. That is a logic error.
Moreover, you have to write the file, as Rik Poggi said (just remember to open it in write mode).
A few things:
The r flag to open indicates read-only mode. This obviously won't let you write to the file.
print() outputs things to the screen. What you really want to do is output to the file. Have you read the Python File I/O tutorial?
for line in file_handle: loops through files one line at a time. Thus, if line == '4' will only be true if the line consists of a single character, 4, all on its own.
If you want to loop over characters in a string, then do something like for character in line:.
Modifying bits of a file "in place" is a bit harder than you think.
This is because if you insert data into the middle of a file, the rest of the data has to shuffle over to make room - this is really slow because everything after your insertion has to be rewritten.
In theory, a one-byte for one-byte replacement can be done fast, but in general people don't want to replace byte-for-byte, so this is an advanced feature. (See seek().) The usual approach is to just write out a whole new file.
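For completeness, a byte-for-byte replacement with seek() might look like the sketch below. It assumes a small file that fits in memory and swaps every '6' for a '9'; the path and contents are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'Numberdata.dat')
with open(path, 'wb') as f:
    f.write(b'444666444')

# 'r+b' opens for reading and writing without truncating the file.
with open(path, 'r+b') as f:
    data = f.read()
    for offset, byte in enumerate(bytearray(data)):
        if byte == ord('6'):
            f.seek(offset)       # jump to the byte to overwrite
            f.write(b'9')        # same length, so nothing has to shift

with open(path, 'rb') as f:
    patched = f.read()
```

This only works because each replacement has exactly the same length as the data it overwrites.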
Because print doesn't write to your file.
You have to open the file and read it, modify the string you obtain creating a new string, open again the file and write it again.
FileName = 'Numberdata.dat'
NumberFile = open(FileName, 'r')
data = NumberFile.read()
NumberFile.close()
dl = data.split('\n')
for i in range(len(dl)):
    if dl[i] =='4':
        dl[i] = '1'
    else:
        dl[i] = '9'
NumberFile = open(FileName, 'w')
NumberFile.write('\n'.join(dl))
NumberFile.close()
Try in this way. There are for sure different methods but this seems to be the most "linear" to me =)

Python Overwriting files after parsing

I'm new to Python, and I need to do a parsing exercise. I have a file and I need to parse it (just the headers), but afterwards I need to keep the file in the same format, with the same extension, and at the same place on disk, with only the new headers being different.
I tried this code...
for line in open ('/home/name/db/str/dir/numbers/str.phy'):
    if line.startswith('ENS'):
        linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
        print linepars
..and it does the job, but I don't know how to "overwrite" the file with the new parsing.
The easiest way, but not the most efficient (by far, and especially for long files) would be to rewrite the complete file.
You could do this by opening a second file handle and rewriting each line, except in the case of the header, you'd write the parsed header. For example,
import re
fr = open('/home/name/db/str/dir/numbers/str.phy')
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense
for line in fr:
    if line.startswith('ENS'):
        linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
        fw.write(linepars)
    else:
        fw.write(line)
fw.close()
fr.close()
EDIT: Note that this does not use readlines(), so it's more memory efficient. It also does not store every output line, but only one at a time, writing it to file immediately.
Just as a cool trick, you could use the with statement on the input file to avoid having to close it (Python 2.5+):
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense
with open('/home/name/db/str/dir/numbers/str.phy') as fr:
    for line in fr:
        if line.startswith('ENS'):
            linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
            fw.write(linepars)
        else:
            fw.write(line)
fw.close()
P.S. Welcome :-)
As others are saying here, you want to open a file and use that file object's .write() method.
The best approach would be to open an additional file for writing:
import os
current_cfg = open(...)
parsed_cfg = open(..., 'w')
for line in current_cfg:
    new_line = parse(line)
    print new_line
    parsed_cfg.write(new_line + '\n')
current_cfg.close()
parsed_cfg.close()
os.rename(....) # Rename old file to backup name
os.rename(....) # Rename new file into place
Additionally, I'd suggest looking at the tempfile module and using one of its functions for either naming your new file or opening/creating it. Personally I'd favor putting the new file in the same directory as the existing file to ensure that os.rename will work atomically (the configuration file name will be guaranteed to point at either the old file or the new file; in no case would it point at a partially written/copied file).
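One way to combine tempfile with a rename is sketched below. The function name rewrite_atomically, the sample file, and the header value are hypothetical. The sketch uses os.replace (Python 3.3+) rather than os.rename, because os.replace also overwrites an existing destination on Windows; that substitution is an assumption of this example, not part of the answer above.

```python
import os
import re
import tempfile


def rewrite_atomically(path, transform):
    """Rewrite path line by line via a temp file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file in `directory`, so the final rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w') as tmp, open(path) as src:
            for line in src:
                tmp.write(transform(line))
        os.replace(tmp_path, path)   # swap the new file into place
    except BaseException:
        os.remove(tmp_path)          # don't leave a half-written temp file
        raise


def strip_header(line):
    """Apply the header regex from the question to one line."""
    return re.sub('ENS([A-Z]+)0+([0-9]{6})', r'\1\2', line)


demo = os.path.join(tempfile.mkdtemp(), 'str.phy')
with open(demo, 'w') as f:
    f.write('ENSABC00000123456 header\nACGT\n')

rewrite_atomically(demo, strip_header)
with open(demo) as f:
    rewritten = f.read()
```

Readers of the file see either the old contents or the new contents, never a partially written file.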
The following code DOES the job.
I mean it DOES overwrite the file onto itself; that's what the OP asked for. That's possible because the transformations only remove characters, so the file pointer of fo, which writes, is always BEHIND the file pointer of fi, which reads.
import re
regx = re.compile(br'\AENS([A-Z]+)0+([0-9]{6})')
with open('bomo.phy','rb+') as fi, open('bomo.phy','rb+') as fo:
    fo.writelines(regx.sub(br'\1\2',line) for line in fi)
    fo.truncate() # the result is shorter, so cut off the leftover tail of the old content
I think that the writing isn't performed by the operating system one line at a time but through a buffer. So several lines are read before a pool of transformed lines are written. That's what I think.
import re
newlines = []
for line in open ('/home/name/db/str/dir/numbers/str.phy').readlines():
    if line.startswith('ENS'):
        line = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
    newlines.append( line ) # keep non-header lines too
# each line still ends with '\n', so join with '' rather than '\n'
open ('/home/name/db/str/dir/numbers/str.phy', 'w').write(''.join(newlines))
(sidenote: Of course if you are working with large files, you should be aware that the level of optimization required may depend on your situation. Python by nature is very non-lazily-evaluated. The following solution is not a good choice if you are parsing large files, such as database dumps or logs, but a few tweaks such as nesting the with clauses and using lazy generators or a line-by-line algorithm can allow O(1)-memory behavior.)
import re

targetFile = '/home/name/db/str/dir/numbers/str.phy'

def replaceIfHeader(line):
    if line.startswith('ENS'):
        return re.sub('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
    else:
        return line

with open(targetFile, 'r') as f:
    # lines keep their trailing newline, so join with '' rather than '\n'
    newText = ''.join(replaceIfHeader(line) for line in f)

try:
    # make backup of targetFile
    with open(targetFile, 'w') as f:
        f.write(newText)
except:
    # error encountered, do something to inform user where backup of targetFile is
    raise
edit: thanks to Jeff for suggestion

editing a single .txt line in python 3.1

i have some data stored in a .txt file in this format:
----------|||||||||||||||||||||||||-----------|||||||||||
1029450386abcdefghijklmnopqrstuvwxy0293847719184756301943
1020414646canBeFollowedBySpaces 3292532113435532419963
don't ask...
i have many lines of this, and i need a way to add more digits to the end of a particular line.
i've written code to find the line i want, but im stumped as to how to add 11 characters to the end of it. i've looked around, this site has been helpful with some other issues i've run into, but i can't seem to find what i need for this.
it is important that the line retain its position in the file, and its contents in their current order.
using python3.1, how would you turn this:
1020414646canBeFollowedBySpaces 3292532113435532419963
into
1020414646canBeFollowedBySpaces 329253211343553241996301846372998
As a general principle, there's no shortcut to "inserting" new data in the middle of a text file. You will need to make a copy of the entire original file in a new file, modifying your desired line(s) of text on the way.
For example:
import os
with open("input.txt") as infile:
    with open("output.txt", "w") as outfile:
        for s in infile:
            s = s.rstrip() # remove trailing newline
            if "target" in s:
                s += "0123456789"
            print(s, file=outfile)
os.rename("input.txt", "input.txt.original")
os.rename("output.txt", "input.txt")
Check out the fileinput module; it can do a sort of "in place" edit of files, though I believe temporary files are still involved internally.
import fileinput
for line in fileinput.input('input.txt', inplace=1, backup='.orig'):
    if line.startswith('1020414646canBeFollowedBySpaces'):
        line = line.rstrip() + '01846372998' '\n'
    print(line, end='')
The print now prints to the file instead of the console.
You might want to back up your original file before editing.
target_chain = b'1020414646canBeFollowedBySpaces 3292532113435532419963'
to_add = b'01846372998'
with open('zaza.txt','rb+') as f:
    ch = f.read() # bytes, because the file is opened in binary mode
    x = ch.find(target_chain)
    f.seek(x + len(target_chain),0)
    f.write(to_add)
    f.write(ch[x + len(target_chain):])
In this method it's absolutely obligatory to open the file in binary mode 'b', for a reason linked to Python's treatment of line endings (see Universal Newline, enabled by default). In Python 3 this also means the strings to search for and write must be bytes literals.
The mode 'r+' allows writing as well as reading.
In this method, what is before the target_chain in the file remains untouched, and what is after the target_chain is shifted ahead. As Greg Hewgill said, there is no way to move bits apart on a hard disk to insert new bits in the middle.
Evidently, if the file is very big, reading all of its content into ch could consume too much memory, and the algorithm should then be changed: read line after line until the line containing the target_chain, then read the next line before inserting, and then keep "reading the next line and rewriting over the current line" until the end of the file, in order to shift the content progressively from the line where the addition was made.
You see what I mean...
Copy the file, line by line, to another file. When you get to the line that needs extra chars then add them before writing.
