I am a newbie to python. I have a code in which I must write the contents again to my same file,but when I do it it clears my content.Please help to fix it.
How should I modify my code such that the contents will be written back on the same file?
My code:
import re
numbers = {}
with open('1.txt') as f,open('11.txt', 'w') as f1:
for line in f:
row = re.split(r'(\d+)', line.strip())
words = tuple(row[::2])
if words not in numbers:
numbers[words] = [int(n) for n in row[1::2]]
numbers[words] = [n+1 for n in numbers[words]]
row[1::2] = map(str, numbers[words])
indentation = (re.match(r"\s*", line).group())
print (indentation + ''.join(row))
f1.write(indentation + ''.join(row) + '\n')
In general, it's a bad idea to write over a file you're still processing (or change a data structure over which you are iterating). It can be done...but it requires much care, and there is little safety or restart-ability should something go wrong in the middle (an error, a power failure, etc.)
A better approach is to write a clean new file, then rename it to the old name. For example:
import re
import os
filename = '1.txt'
tempname = "temp{0}_{1}".format(os.getpid(), filename)
numbers = {}
with open(filename) as f, open(tempname, 'w') as f1:
# ... file processing as before
os.rename(tempname, filename)
Here I've dropped filenames (both original and temporary) into variables, so they can be easily referred to multiple times or changed. This also prepares for the moment when you hoist this code into a function (as part of a larger program), as opposed to making it the main line of your program.
You don't strictly need the temporary name to embed the process id, but it's a standard way of making sure the temp file is uniquely named (temp32939_1.txt vs temp_1.txt or tempfile.txt, say).
It may also be helpful to create backups of the files as they were before processing. In which case, before the os.rename(tempname, filename) you can drop in code to move the original data to a safer location or a backup name. E.g.:
backupname = filename + ".bak"
os.rename(filename, backupname)
os.rename(tempname, filename)
While beyond the scope of this question, if you used a read-process-overwrite strategy frequently, it would be possible to create a separate module that abstracted these file-handling details away from your processing code. Here is an example.
Use
open('11.txt', 'a')
To append to the file instead of w for writing (a new or overwriting a file).
If you want to read and modify file in one time use "r+' mode.
f=file('/path/to/file.txt', 'r+')
content=f.read()
content=content.replace('oldstring', 'newstring') #for example change some substring in whole file
f.seek(0) #move to beginning of file
f.write(content)
f.truncate() #clear file conent "tail" on disk if new content shorter then old
f.close()
Related
I have a folder with csv formated documents with a .arw extension. Files are named as 1.arw, 2.arw, 3.arw ... etc.
I would like to write a code that reads all the files, checks and replaces the forwardslash / with a dash -. And finally creates new files with the replaced character.
The code I wrote as follows:
for i in range(1,6):
my_file=open("/path/"+str(i)+".arw", "r+")
str=my_file.read()
if "/" not in str:
print("There is no forwardslash")
else:
str_new = str.replace("/","-")
print(str_new)
f = open("/path/new"+str(i)+".arw", "w")
f.write(str_new)
my_file.close()
But I get an error saying:
'str' object is not callable.
How can I make it work for all the files in a folder? Apparently my for loop does not work.
The actual error is that you are replacing the built-in str with your own variable with the same name, then try to use the built-in str() after that.
Simply renaming the variable fixes the immediate problem, but you really want to refactor the code to avoid reading the entire file into memory.
import logging
import os
for i in range(1,6):
seen_slash = False
input_filename = "/path/"+str(i)+".arw"
output_filename = "/path/new"+str(i)+".arw"
with open(input_filename, "r+") as input, open(output_filename, "w") as output:
for line in input:
if not seen_slash and "/" in line:
seen_slash = True
line_new = line.replace("/","-")
print(line_new.rstrip('\n')) # don't duplicate newline
output.write(line_new)
if not seen_slash:
logging.warn("{0}: No slash found".format(input_filename))
os.unlink(output_filename)
Using logging instead of print for error messages helps because you keep standard output (the print output) separate from the diagnostics (the logging output). Notice also how the diagnostic message includes the name of the file we found the problem in.
Going back and deleting the output filename when you have examined the entire input file and not found any slashes is a mild wart, but should typically be more efficient.
This is how I would do it:
for i in range(1,6):
with open((str(i)+'.arw'), 'r') as f:
data = f.readlines()
for element in data:
element.replace('/', '-')
f.close()
with open((str(i)+'.arw'), 'w') as f:
for element in data:
f.write(element)
f.close()
this is assuming from your post that you know that you have 6 files
if you don't know how many files you have you can use the OS module to find the files in the directory.
I have been searching for a solution for this and haven't been able to find one. I have a directory of folders which contain multiple, very-large csv files. I'm looping through each csv in each folder in the directory to replace values of certain headers. I need the headers to be consistent (from file to file) in order to run a different script to process all the data properly.
I found this solution that I though would work: change first line of a file in python.
However this is not working as expected. My code:
from_file = open(filepath)
# for line in f:
# if
data = from_file.readline()
# print(data)
# with open(filepath, "w") as f:
print 'DBG: replacing in file', filepath
# s = s.replace(search_pattern, replacement)
for i in range(len(search_pattern)):
data = re.sub(search_pattern[i], replacement[i], data)
# data = re.sub(search_pattern, replacement, data)
to_file = open(filepath, mode="w")
to_file.write(data)
shutil.copyfileobj(from_file, to_file)
I want to replace the header values in search_pattern with values in replacement without saving or writing to a different file - I want to modify the file. I have also tried
shutil.copyfileobj(from_file, to_file, -1)
As I understand it that should copy the whole file rather than breaking it up in chunks, but it doesn't seem to have an effect on my output. Is it possible that the csv is just too big?
I haven't been able to determine a different way to do this or make this way work. Any help would be greatly appreciated!
this answer from change first line of a file in python you copied from doesn't work in windows
On Linux, you can open a file for reading & writing at the same time. The system ensures that there's no conflict, but behind the scenes, 2 different file objects are being handled. And this method is very unsafe: if the program crashes while reading/writing (power off, disk full)... the file has a great chance to be truncated/corrupt.
Anyway, in Windows, you cannot open a file for reading and writing at the same time using 2 handles. It just destroys the contents of the file.
So there are 2 options, which are portable and safe:
create a file in the same directory, once copied, delete first file, and rename the new one
Like this:
import os
import shutil
filepath = "test.txt"
with open(filepath) as from_file, open(filepath+".new","w") as to_file:
data = from_file.readline()
to_file.write("something else\n")
shutil.copyfileobj(from_file, to_file)
os.remove(filepath)
os.rename(filepath+".new",filepath)
This doesn't take much longer, because the rename operation is instantaneous. Besides, if the program/computer crashes at any point, one of the files (old or new) is valid, so it's safe.
if patterns have the same length, use read/write mode
like this:
filepath = "test.txt"
with open(filepath,"r+") as rw_file:
data = rw_file.readline()
data = "h"*(len(data)-1) + "\n"
rw_file.seek(0)
rw_file.write(data)
Here we, read the line, replace the first line by the same amount of h characters, rewind the file and write the first line back, overwriting previous contents, keeping the rest of the lines. This is also safe, and even if the file is huge, it's very fast. The only constraint is that the pattern must be of the exact same size (else you would have remainders of the previous data, or you would overwrite the next line(s) since no data is shifted)
I'm fairly new to python and I'm having an issue with my python script (split_fasta.py). Here is an example of my issue:
list = ["1.fasta", "2.fasta", "3.fasta"]
for file in list:
contents = open(file, "r")
for line in contents:
if line[0] == ">":
new_file = open(file + "_chromosome.fasta", "w")
new_file.write(line)
I've left the bottom part of the program out because it's not needed. My issue is that when I run this program in the same direcoty as my fasta123 files, it works great:
python split_fasta.py *.fasta
But if I'm in a different directory and I want the program to output the new files (eg. 1.fasta_chromsome.fasta) to my current directory...it doesn't:
python /home/bin/split_fasta.py /home/data/*.fasta
This still creates the new files in the same directory as the fasta files. The issue here I'm sure is with this line:
new_file = open(file + "_chromosome.fasta", "w")
Because if I change it to this:
new_file = open("seq" + "_chromosome.fasta", "w")
It creates an output file in my current directory.
I hope this makes sense to some of you and that I can get some suggestions.
You are giving the full path of the old file, plus a new name. So basically, if file == /home/data/something.fasta, the output file will be file + "_chromosome.fasta" which is /home/data/something.fasta_chromosome.fasta
If you use os.path.basename on file, you will get the name of the file (i.e. in my example, something.fasta)
From #Adam Smith
You can use os.path.splitext to get rid of the .fasta
basename, _ = os.path.splitext(os.path.basename(file))
Getting back to the code example, I saw many things not recommended in Python. I'll go in details.
Avoid shadowing builtin names, such as list, str, int... It is not explicit and can lead to potential issues later.
When opening a file for reading or writing, you should use the with syntax. This is highly recommended since it takes care to close the file.
with open(filename, "r") as f:
data = f.read()
with open(new_filename, "w") as f:
f.write(data)
If you have an empty line in your file, line[0] == ... will result in a IndexError exception. Use line.startswith(...) instead.
Final code :
files = ["1.fasta", "2.fasta", "3.fasta"]
for file in files:
with open(file, "r") as input:
for line in input:
if line.startswith(">"):
new_name = os.path.splitext(os.path.basename(file)) + "_chromosome.fasta"
with open(new_name, "w") as output:
output.write(line)
Often, people come at me and say "that's hugly". Not really :). The levels of indentation makes clear what is which context.
I'm new to Python, and I need to do a parsing exercise. I got a file, and I need to parse it (just the headers), but after the process, i need to keep the file the same format, the same extension, and at the same place in disk, but only with the differences of new headers..
I tried this code...
for line in open ('/home/name/db/str/dir/numbers/str.phy'):
if line.startswith('ENS'):
linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
print linepars
..and it does the job, but I don't know how to "overwrite" the file with the new parsing.
The easiest way, but not the most efficient (by far, and especially for long files) would be to rewrite the complete file.
You could do this by opening a second file handle and rewriting each line, except in the case of the header, you'd write the parsed header. For example,
fr = open('/home/name/db/str/dir/numbers/str.phy')
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense
for line in fr:
if line.startswith('ENS'):
linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
fw.write(linepars)
else:
fw.write(line)
fw.close()
fr.close()
EDIT: Note that this does not use readlines(), so its more memory efficient. It also does not store every output line, but only one at a time, writing it to file immediately.
Just as a cool trick, you could use the with statement on the input file to avoid having to close it (Python 2.5+):
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense
with open('/home/name/db/str/dir/numbers/str.phy') as fr:
for line in fr:
if line.startswith('ENS'):
linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
fw.write(linepars)
else:
fw.write(line)
fw.close()
P.S. Welcome :-)
As others are saying here, you want to open a file and use that file object's .write() method.
The best approach would be to open an additional file for writing:
import os
current_cfg = open(...)
parsed_cfg = open(..., 'w')
for line in current_cfg:
new_line = parse(line)
print new_line
parsed.cfg.write(new_line + '\n')
current_cfg.close()
parsed_cfg.close()
os.rename(....) # Rename old file to backup name
os.rename(....) # Rename new file into place
Additionally I'd suggest looking at the tempfile module and use one of its methods for either naming your new file or opening/creating it. Personally I'd favor putting the new file in the same directory as the existing file to ensure that os.rename will work atomically (the configuration file named will be guaranteed to either point at the old file or the new file; in no case would it point at a partially written/copied file).
The following code DOES the job.
I mean it DOES overwrite the file ON ONESELF; that's what the OP asked for. That's possible because the transformations are only removing characters, so the file's pointer fo that writes is always BEHIND the file's pointer fi that reads.
import re
regx = re.compile('\AENS([A-Z]+)0+([0-9]{6})')
with open('bomo.phy','rb+') as fi, open('bomo.phy','rb+') as fo:
fo.writelines(regx.sub('\\1\\2',line) for line in fi)
I think that the writing isn't performed by the operating system one line at a time but through a buffer. So several lines are read before a pool of transformed lines are written. That's what I think.
newlines = []
for line in open ('/home/name/db/str/dir/numbers/str.phy').readlines():
if line.startswith('ENS'):
linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
newlines.append( linepars )
open ('/home/name/db/str/dir/numbers/str.phy', 'w').write('\n'.join(newlines))
(sidenote: Of course if you are working with large files, you should be aware that the level of optimization required may depend on your situation. Python by nature is very non-lazily-evaluated. The following solution is not a good choice if you are parsing large files, such as database dumps or logs, but a few tweaks such as nesting the with clauses and using lazy generators or a line-by-line algorithm can allow O(1)-memory behavior.)
targetFile = '/home/name/db/str/dir/numbers/str.phy'
def replaceIfHeader(line):
if line.startswith('ENS'):
return re.sub('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
else:
return line
with open(targetFile, 'r') as f:
newText = '\n'.join(replaceIfHeader(line) for line in f)
try:
# make backup of targetFile
with open(targetFile, 'w') as f:
f.write(newText)
except:
# error encountered, do something to inform user where backup of targetFile is
edit: thanks to Jeff for suggestion
I'm doing all this in the interpreter..
loc1 = '/council/council1'
file1 = open(loc1, 'r')
at this point i can do file1.read() and it prints the file's contents as a string to standard output
but if i add this..
string1 = file1.read()
string 1 comes back empty.. i have no idea what i could be doing wrong. this seems like the most basic thing!
if I go on to type file1.read() again, the output to standard output is just an empty string. so, somehow i am losing my file when i try to create a string with file1.read()
You can only read a file once. After that, the current read-position is at the end of the file.
If you add file1.seek(0) before you re-read it, you should be able to read the contents again. A better approach, however, is to read into a string the first time and then keep it in memory:
loc1 = '/council/council1'
file1 = open(loc1, 'r')
string1 = file1.read()
print string1
You do not lose it, you just move offset pointer to the end of file and try to read some more data. Since it is the end of the file, no more data is available and you get empty string. Try reopening file or seeking to zero position:
f.read()
f.seek(0)
f.read()
Using with is the best syntax to use because it closes the connection to the file after using it(since python 2.5):
with open('/council/council1', 'r') as input_file:
text = input_file.read()
print(text)
To quote the official documentation on read():
To read a file’s contents, call f.read(size)
When size is omitted or negative, the entire contents of the file will
be read and returned;
And the most relevant part:
If the end of the file has been reached, f.read() will return an empty
string ('').
Which means that if you use read() twice consecutively, it is expected that the second time you'll get an empty string. Either store it the first time or use f.seek(0) to go back to the start. Together, they provide a lower level API to give you greater control.
Besides using a context manager to automatically open and close the file, there's another way to read a whole text file, using pathlib, example below:
#!/usr/bin/env python3
from pathlib import Path
txt_file = Path("myfile.txt")
try:
content = txt_file.read_text()
except FileNotFoundError:
print("Could not find file")
else:
print(f"The content is: {content}")
print(f"I can also read again: {txt_file.read_text()}")
As you can see, you can call read_text() several times and you'll get the full content, no surprises. Of course you wouldn't want to do that in production code since read_text() opens and closes the file each time, it's still best to store it. I could recommend pathlib highly when dealing with files and file paths.
It's outside the scope, but it may be worth noting a difference when reading line by line. Unlike the file object obtained by open(), PosixPath returned by Path() is not iterable. The equivalent of:
with open('file.txt') as f:
for line in f:
print(line)
Would be something like:
for line in Path('file.txt').read_text().split('\n'):
print(line)
One advantage of the first approach, with open, is that the entire file is not read into memory at once.
make sure your location is correct. Do you actually have a directory called /council under your root directory (/) ?. also use, os.path.join() to create your path
loc1 = os.path.join("/path","dir1","dir2")