I have the following code which is intended to remove specific lines of a file. When I run it, it prints the two filenames that live in the directory, then deletes all information in them. What am I doing wrong? I'm using Python 3.2 under Windows.
import os
files = [file for file in os.listdir() if file.split(".")[-1] == "txt"]
for file in files:
    print(file)
    input = open(file,"r")
    output = open(file,"w")
    for line in input:
        print(line)
        # if line is good, write it to output
    input.close()
    output.close()
open(file, 'w') wipes the file. To prevent that, open it in r+ mode (read+write/don't wipe), then read it all at once, filter the lines, and write them back out again. Something like
with open(file, "r+") as f:
    lines = f.readlines() # read entire file into memory
    f.seek(0) # go back to the beginning of the file
    f.writelines(filter(good, lines)) # dump the filtered lines back
    f.truncate() # wipe the remains of the old file
I've assumed that good is a function telling whether a line should be kept.
If your file fits in memory, the easiest solution is to open the file for reading, read its contents to memory, close the file, open it for writing and write the filtered output back:
with open(file_name) as f:
    lines = list(f)
# filter lines
with open(file_name, "w") as f: # This removes the file contents
    f.writelines(lines)
Since you are not intermingling read and write operations, the advanced file modes like "r+" are unnecessary here, and only complicate things.
If the file does not fit into memory, the usual approach is to write the output to a new, temporary file, and move it back to the original file name after processing is finished.
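The temporary-file approach could be sketched roughly as follows. The names filter_file and keep are hypothetical, and os.replace needs Python 3.3 or later (on older versions, os.remove followed by os.rename is the usual substitute). This sketch does not clean up the temporary file if an error occurs midway.

```python
import os
import tempfile


def filter_file(path, keep):
    """Rewrite `path`, keeping only the lines for which keep(line) is true.

    Lines are streamed one at a time, so the file need not fit in memory.
    `filter_file` and `keep` are illustrative names, not from the answer.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that the final
    # rename stays on the same filesystem.
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:
            if keep(line):
                tmp.write(line)
    # Both files are closed here; move the filtered copy over the original.
    os.replace(tmp.name, path)
```

For example, filter_file("testtext.txt", lambda line: not line.startswith("t")) would drop every line that starts with "t".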
One way is to use the fileinput stdlib module. Then you don't have to worry about open/closing and file modes etc...
import fileinput
from contextlib import closing
import os
fnames = [fname for fname in os.listdir() if os.path.splitext(fname)[1] == ".txt"]
with closing(fileinput.input(fnames, inplace=True)) as fin:
    for line in fin:
        if 'z' not in line: # your condition here
            print(line, end='') # line already ends with a newline, so don't add another
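When using inplace=True, fileinput redirects stdout to the file currently being processed. The original is first moved to a backup file with a default '.bak' extension. By default that backup is deleted when the output file is closed; pass backup='.bak' (or another extension) if you want to keep it.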
jon#minerva:~$ cat testtext.txt
one
two
three
four
five
six
seven
eight
nine
ten
After running the above with a condition of not line.startswith('t'):
jon#minerva:~$ cat testtext.txt
one
four
five
six
seven
eight
nine
You're deleting everything when you open the file to write to it: opening with "w" truncates the file, so the read handle has nothing left to read. Use open(file,"r+") instead, and then save all the lines to another variable before writing anything.
You should not open the same file for reading and writing at the same time.
"w" means create an empty file for writing. If the file already exists, its data will be deleted.
So you can use a different file name for writing.
Related
How do I append to a file instead of overwriting it?
Set the mode in open() to "a" (append) instead of "w" (write):
with open("test.txt", "a") as myfile:
    myfile.write("appended text")
The documentation lists all the available modes.
You need to open the file in append mode, by setting "a" or "ab" as the mode. See open().
When you open with "a" mode, the write position will always be at the end of the file (an append). You can open with "a+" to allow reading, seek backwards and read (but all writes will still be at the end of the file!).
Example:
>>> with open('test1','wb') as f:
        f.write('test')
>>> with open('test1','ab') as f:
        f.write('koko')
>>> with open('test1','rb') as f:
        f.read()
'testkoko'
Note: Using 'a' is not the same as opening with 'w' and seeking to the end of the file - consider what might happen if another program opened the file and started writing between the seek and the write. On some operating systems, opening the file with 'a' guarantees that all your following writes will be appended atomically to the end of the file (even as the file grows by other writes).
A few more details about how the "a" mode operates (tested on Linux only). Even if you seek back, every write will append to the end of the file:
>>> f = open('test','a+') # Not using 'with' just to simplify the example REPL session
>>> f.write('hi')
>>> f.seek(0)
>>> f.read()
'hi'
>>> f.seek(0)
>>> f.write('bye') # Will still append despite the seek(0)!
>>> f.seek(0)
>>> f.read()
'hibye'
In fact, the fopen manpage states:
Opening a file in append mode (a as the first character of mode)
causes all subsequent write operations to this stream to occur at
end-of-file, as if preceded the call:
fseek(stream, 0, SEEK_END);
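The same behaviour can be reproduced from a script rather than a REPL session. This is a minimal sketch for Python 3; the function name append_then_seek is hypothetical, and the result shown is what Linux gives (as noted above, the behaviour was only tested there).

```python
import os
import tempfile


def append_then_seek(path):
    """Write, seek back to the start, write again in 'a+' mode, then read it all."""
    with open(path, "a+") as f:
        f.write("hi")
        f.seek(0)        # move the position back to the start
        f.write("bye")   # still lands at the end of the file
        f.seek(0)
        return f.read()


with tempfile.TemporaryDirectory() as scratch:
    # On Linux this prints: hibye
    print(append_then_seek(os.path.join(scratch, "test.txt")))
```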
Old simplified answer (not using with):
Example: (in a real program use with to close the file - see the documentation)
>>> open("test","wb").write("test")
>>> open("test","a+b").write("koko")
>>> open("test","rb").read()
'testkoko'
I always do this,
f = open('filename.txt', 'a')
f.write("stuff")
f.close()
It's simple, but very useful.
Python has many variations on the three main modes. These three modes are:
'w' write text
'r' read text
'a' append text
So to append to a file it's as easy as:
f = open('filename.txt', 'a')
f.write('whatever you want to write here (in append mode) here.')
Then there are the modes that just make your code fewer lines:
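'r+' read + write text (file must exist; contents are kept)
'w+' read + write text (file is created, or truncated if it exists)
'a+' append + read text (writes always go to the end)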
Finally, there are the modes of reading/writing in binary format:
'rb' read binary
'wb' write binary
'ab' append binary
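'rb+' read + write binary (file must exist; contents are kept)
'wb+' read + write binary (file is created, or truncated if it exists)
'ab+' append + read binary (writes always go to the end)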
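To see how the three "+" modes differ when writing, here is a small sketch that writes the same two characters into a scratch file using each mode. The helper name write_with_mode and the sample strings are made up for illustration.

```python
import os
import tempfile


def write_with_mode(mode, initial="abcdef", new="XY"):
    """Create a scratch file holding `initial`, write `new` using `mode`,
    and return the resulting file contents."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w") as f:
            f.write(initial)
        with open(path, mode) as f:
            f.write(new)
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)


for mode in ("r+", "w+", "a+"):
    print(mode, "->", write_with_mode(mode))
# r+ -> XYcdef     (overwrites from the start, keeps the rest)
# w+ -> XY         (truncates first)
# a+ -> abcdefXY   (always appends)
```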
You probably want to pass "a" as the mode argument. See the docs for open().
with open("foo", "a") as f:
    f.write("cool beans...")
There are other permutations of the mode argument for updating (+), truncating (w) and binary (b) mode but starting with just "a" is your best bet.
You can also do it with print instead of write:
with open('test.txt', 'a') as f:
    print('appended text', file=f)
If test.txt doesn't exist, it will be created...
In open(filename, "a"), the "a" selects append mode, which lets you add extra data to the end of an existing file.
You can use the following lines to append text to your file:
def FileSave(filename,content):
    with open(filename, "a") as myfile:
        myfile.write(content)
FileSave("test.txt","test1 \n")
FileSave("test.txt","test2 \n")
The 'a' parameter signifies append mode. If you don't want to use with open each time, you can easily write a function to do it for you:
def append(file, txt='\nFunction Successfully Executed'):
    # the parameter with a default value must come after the one without
    with open(file, 'a') as f:
        f.write(txt)
If you want to write somewhere else other than the end, you can use 'r+'†:
import os
with open(file, 'r+') as f:
    f.seek(0, os.SEEK_END)
    f.write("text to add")
Finally, the 'w+' parameter grants even more freedom. Specifically, it allows you to create the file if it doesn't exist, as well as empty the contents of a file that currently exists.
† Credit for this function goes to @Primusa
You can also open the file in r+ mode and then set the file position to the end of the file.
import os
with open('text.txt', 'r+') as f:
    f.seek(0, os.SEEK_END)
    f.write("text to add")
Opening the file in r+ mode will let you write to other file positions besides the end, while a and a+ force writing to the end.
If you want to append to a file:
with open("test.txt", "a") as myfile:
    myfile.write("append me")
Here myfile is bound to the file object returned by opening test.txt. open takes two arguments: the file that we want to open, and a string that names the kind of operation we want to perform on the file.
Here are the file mode options:
Mode Description
'r' This is the default mode. It Opens file for reading.
'w' This Mode Opens file for writing.
If file does not exist, it creates a new file.
If file exists it truncates the file.
'x' Creates a new file. If file already exists, the operation fails.
'a' Open file in append mode.
If file does not exist, it creates a new file.
't' This is the default mode. It opens in text mode.
'b' This opens in binary mode.
'+' This will open a file for reading and writing (updating)
If multiple processes are writing to the file, you must use append mode or the data will be scrambled. Append mode makes the operating system put every write at the end of the file, irrespective of where the writer thinks its position in the file is. This is a common issue for multi-process services like nginx or apache, where multiple instances of the same process write to the same log file. Consider what happens if you try to seek, then write:
This example does not work well with multiple processes:
f = open("logfile", "r+"); f.seek(0, os.SEEK_END); f.write("data to write")
writer1: seek to end of file. position 1000 (for example)
writer2: seek to end of file. position 1000
writer2: write data at position 1000 end of file is now 1000 + length of data.
writer1: write data at position 1000 writer1's data overwrites writer2's data.
By using append mode, the operating system will place any write at the end of the file.
f = open("logfile", "a"); f.seek(0, os.SEEK_END); f.write("data to write");
Append mode does not mean, "open file, go to end of the file once after opening it". It means, "open file, every write I do will be at the end of the file".
WARNING: For this to work you must write all your record in one shot, in one write call. If you split the data between multiple writes, other writers can and will get their writes in between yours and mangle your data.
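One way to make sure a record goes out in a single write call is to build the whole record first and hand it to the operating system in one os.write on a descriptor opened with O_APPEND. This is only a sketch: the name append_record is hypothetical, and whether a single write is truly atomic with respect to other writers depends on the operating system, the filesystem, and the record size.

```python
import os
import tempfile


def append_record(path, record):
    """Append one complete record to `path` using a single write call."""
    data = record.encode("utf-8")
    # O_APPEND asks the operating system to place every write at end-of-file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)  # the whole record goes out in one call
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as scratch:
    log_path = os.path.join(scratch, "logfile")
    append_record(log_path, "first record\n")
    append_record(log_path, "second record\n")
    with open(log_path) as f:
        print(f.read(), end="")
```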
Sometimes, beginners have this problem because they attempt to open and write to a file in a loop:
for item in my_data:
    with open('results.txt', 'w') as f:
        f.write(some_calculation(item))
The problem is that every time the file is opened for writing, it will be truncated (cleared out).
We can solve this by opening in append mode instead; but in cases like this, it will normally be better to solve the problem by inverting the logic. If the file is opened only once, then it won't get overwritten each time; and we can keep writing to it as long as it is open - we don't have to re-open it for each write (it would be pointless for Python to make things work that way, since it would add to the required code for no benefit).
Thus:
with open('results.txt', 'w') as f:
    for item in my_data:
        f.write(some_calculation(item))
The simplest way to append more text to the end of a file would be to use:
with open('/path/to/file', 'a+') as file:
    file.write("Additions to file")
The a+ in the open(...) statement opens the file in append mode and allows both reading and writing.
The with statement closes the file automatically when the block ends. If you open a file without with, it is good practice to call file.close() once you are done using it.
I got a text file like this
Bruce
brucechungulloa#outlook.com
I've used this to read the text file and export it to a list
with open('info.txt') as f:
    info = f.readlines()
for item in info:
    reportePaises = open('reportePaises.txt', 'w')
    reportePaises.write("%s\n" % item)
But when I want to write the elements of the list(info) into another text file, only the info[1] is written (the mail)
How can I write the entire list onto the text file?
with open('data.csv') as f:
    with open('test2.txt', 'a') as wp:
        for item in f.readlines():
            wp.write("%s" % item)
        wp.write('\n') # adds a new line after the looping is done
That will give you:
Bruce
brucechungulloa#outlook.com
In both files.
You were having problems because every time you open a file with 'w' flag, you overwrite it on the disk. So, you created a new file every time.
You should open the second file only once, in the with statement:
with open('info.txt') as f, open('reportePaises.txt', 'w') as reportePaises:
    info = f.readlines()
    for item in info:
        reportePaises.write(item)
As @Pynchia suggested, it's probably better not to use .readlines(), and to loop directly over the input file instead.
with open('info.txt') as f, open('reportePaises.txt', 'w') as reportePaises:
    for item in f:
        reportePaises.write(item)
This way you don't create a copy of the whole file in RAM by saving it to a list, which may cause a long delay if the file is big (and, obviously, uses more RAM). Instead, you treat the input file as an iterator and read the next line from disk on each iteration.
You also (if I did the testing right) don't need to append '\n' to every line. The newlines are already in item. Because of that you don't need to use string formatting at all, just reportePaises.write(item).
You are opening your file in write mode every time you write to a file, effectively overwriting the previous line that you wrote. Use the append mode, a, instead.
reportePaises = open('reportePaises.txt', 'a')
Edit: Alternatively, you can open the file once and instead of looping through the lines, write the whole contents as follows:
with open('info.txt') as f, open('reportePaises.txt', 'w') as file:
    file.write(f.read())
Try this, without opening the output file again and again.
with open('info.txt') as f:
    info = f.readlines()
with open('reportePaises.txt', 'w') as f1:
    for x in info:
        f1.write(x) # each line from readlines() already ends with '\n'
That will work.
Two problems here. One is you are opening the output file inside the loop. That means it is being opened several times. Since you also use the "w" flag that means the file is truncated to zero each time it is opened. Therefore you only get the last line written.
It would be better to open the output file once outside the loop. You could even use an outer with block.
You can simply try the code below. Your code did not work because you opened the file handle 'reportePaises' inside the for loop. You don't need to open the file handle again and again.
Try re-running your code line by line in the Python shell; it makes bugs like this easy to spot.
The code below will work:
with open('something.txt') as f:
    info = f.readlines()
reportePaises = open('reportePaises.txt', 'w')
for item in info:
    reportePaises.write("%s" % item)
reportePaises.close()
You don't need to add a \n to the output line because readlines preserves the \n character in each element of the info list. Observe below.
Try the following:
with open('something.txt') as f:
    info = f.readlines()
print(info)
The output you will get is
['Bruce\n', 'brucechungulloa#outlook.com']
When I run the following in the Python IDLE Shell:
f = open(r"H:\Test\test.csv", "rb")
for line in f:
    print line
#this works fine
however, when I run the following for a second time:
for line in f:
    print line
#this does nothing
This does not work because you've already seeked to the end of the file the first time. You need to rewind (using .seek(0)) or re-open your file.
Some other pointers:
Python has a very good csv module. Do not attempt to implement CSV parsing yourself unless doing so as an educational exercise.
You probably want to open your file in 'rU' mode, not 'rb'. 'rU' is universal newline mode, which handles source files from platforms with different line endings for you. (In Python 3, universal newlines are the default in text mode and the 'U' flag was removed in Python 3.11, so plain 'r' is enough there.)
Use with when working with file objects, since it will clean up the handles for you even in the case of errors. Example:
with open(r"H:\Test\test.csv", "rU") as f:
    for line in f:
        ...
You can read the data from the file into a variable, and then iterate over this data as many times as you want in your script. This is simpler than seeking back and forth, as long as the file fits in memory.
f = open(r"H:\Test\test.csv", "rb")
data = f.readlines()
f.close()
for line in data:
    print line
for line in data:
    print line
Output:
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
Because you've gone all the way through the CSV file, and the iterator is exhausted. You'll need to re-open it before the second loop.
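Both behaviours, the exhausted iterator and the fix of rewinding with seek(0), can be seen in a short sketch. It is written for Python 3, and the function name read_twice and the sample file contents are made up for illustration.

```python
import os
import tempfile


def read_twice(path):
    """Iterate over one open file three times, rewinding before the last pass."""
    with open(path) as f:
        first = [line for line in f]      # consumes the whole file
        exhausted = [line for line in f]  # nothing left to read
        f.seek(0)                         # rewind to the beginning
        second = [line for line in f]     # full contents again
    return first, exhausted, second


with tempfile.TemporaryDirectory() as scratch:
    csv_path = os.path.join(scratch, "test.csv")
    with open(csv_path, "w") as f:
        f.write("Line1,a\nLine2,b\n")
    first, exhausted, second = read_twice(csv_path)
    print(len(first), len(exhausted), len(second))  # prints: 2 0 2
```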
Hey I need to split a large file in python into smaller files that contain only specific lines. How do I do this?
You're probably going to want to do something like this:
big_file = open('big_file', 'r')
small_file1 = open('small_file1', 'w')
small_file2 = open('small_file2', 'w')
for line in big_file:
    if 'Charlie' in line: small_file1.write(line)
    if 'Mark' in line: small_file2.write(line)
big_file.close()
small_file1.close()
small_file2.close()
Opening a file for reading returns an object that allows you to iterate over the lines. You can then check each line (which is just a string of whatever that line contains) for whatever condition you want, then write it to the appropriate file that you opened for writing. It is worth noting that when you open a file with 'w' it will overwrite anything already written to that file. If you want to simply add to the end, you should open it with 'a', to append.
Additionally, if you expect there to be some possibility of error in your reading/writing code, and want to make sure the files are closed, you can use:
with open('big_file', 'r') as big_file:
    <do stuff prone to error>
Do you mean breaking it down into subsections? Like if I had a file with chapter 1, chapter 2, and chapter 3, you want it to be broken down into separate files for each chapter?
The way I've done this is similar to Wilduck's response, but closes the input file as soon as it reads in the data and keeps all the lines read in.
data_file = open('large_file_name', 'r')
lines = data_file.readlines()
data_file.close()
outputFile = open('output_file_one', 'w')
for line in lines:
    if 'SomeName' in line:
        outputFile.write(line)
outputFile.close()
If you wanted to have more than one output file you could either add more loops or open more than one outputFile at a time.
I'd recommend using Wilduck's response, however, as it uses less memory and will take less time with larger files, since it processes the file line by line instead of loading it all into a list first.
How big and does it need to be done in python? If this is on unix, would split/csplit/grep suffice?
First, open the big file for reading.
Second, open all the smaller file names for writing.
Third, iterate through every line. Every iteration, check to see what kind of line it is, then write it to that file.
More info on File I/O: http://docs.python.org/tutorial/inputoutput.html
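The three steps above could be sketched like this for any number of output files. The function name split_by_keyword and the keyword-to-path mapping are assumptions made for illustration; ExitStack (Python 3.3+) is used only so that every file is closed even if an error occurs.

```python
import os
import tempfile
from contextlib import ExitStack


def split_by_keyword(big_path, targets):
    """Copy each line of `big_path` into every output file whose keyword it contains.

    `targets` maps a keyword (str) to the path of the file that collects
    the lines containing that keyword.
    """
    with ExitStack() as stack:
        # First: open the big file for reading.
        big_file = stack.enter_context(open(big_path))
        # Second: open all the smaller files for writing.
        outputs = {
            keyword: stack.enter_context(open(out_path, "w"))
            for keyword, out_path in targets.items()
        }
        # Third: check every line and write it to the matching file(s).
        for line in big_file:
            for keyword, out in outputs.items():
                if keyword in line:
                    out.write(line)
```

For example, split_by_keyword('big_file', {'Charlie': 'small_file1', 'Mark': 'small_file2'}) mirrors the Charlie/Mark example earlier in the thread.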
I am new to Python programming...
I have a .txt file....... It looks like..
0,Salary,14000
0,Bonus,5000
0,gift,6000
I want to replace the first '0' value with '1' in each line. How can I do this? Can anyone help me with sample code?
Thanks in advance.
Nimmyliji
I know that you're asking about Python, but forgive me for suggesting that perhaps a different tool is better for the job. :) It's a one-liner via sed:
sed 's/^0,/1,/' yourtextfile.txt > output.txt
This applies the regex /^0,/ (which matches any 0, that occurs at the beginning of a line) to each line and replaces the matched text with 1, instead. The output is directed into the file output.txt specified.
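For comparison, the same substitution can be written in Python with the re module. This is a rough equivalent of the sed expression rather than part of the original answer, and the function name replace_leading_zero is made up.

```python
import re


def replace_leading_zero(line):
    """Replace a leading '0,' with '1,' and leave any other line untouched."""
    return re.sub(r"^0,", "1,", line)


sample = ["0,Salary,14000\n", "0,Bonus,5000\n", "10,gift,6000\n"]
print([replace_leading_zero(line) for line in sample])
# ['1,Salary,14000\n', '1,Bonus,5000\n', '10,gift,6000\n']
```

Like the sed version, this only touches lines whose first field is exactly 0, which is why the made-up "10,gift" line is left alone.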
inFile = open("old.txt", "r")
outFile = open("new.txt", "w")
for line in inFile:
    outFile.write(",".join(["1"] + (line.split(","))[1:]))
inFile.close()
outFile.close()
If you would like something more general, take a look at Python's csv module. It contains utilities for processing comma-separated values (abbreviated as csv) in files, and it can work with an arbitrary delimiter, not only a comma. Since your sample is obviously a csv file, you can use it as follows (Python 3):
import csv
# newline="" lets the csv module handle line endings itself
with open("old.txt", newline="") as src, open("new.txt", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerows(["1"] + row[1:] for row in csv.reader(src))
To overwrite original file with new one:
import os
os.remove("old.txt")
os.rename("new.txt", "old.txt")
I think that writing to a new file and then renaming it is more fault-tolerant and less likely to corrupt your data than directly overwriting the source file. Imagine that your program raised an exception after the source file had been read into memory and reopened for writing: you would lose the original data, and your new data wouldn't be saved because of the crash. With my approach, only the new data is lost and the original is preserved.
o=open("output.txt","w")
for line in open("file"):
    s=line.split(",")
    s[0]="1"
    o.write(','.join(s))
o.close()
Or you can use fileinput with in-place editing:
import fileinput
for line in fileinput.FileInput("file",inplace=1):
    s=line.split(",")
    s[0]="1"
    print(','.join(s), end='') # Python 3 syntax; the line already ends with a newline
f = open(filepath,'r')
data = f.readlines()
f.close()
edited = []
for line in data:
    edited.append( '1'+line[1:] )
f = open(filepath,'w')
f.writelines(edited)
f.flush()
f.close()
Or in Python 2.5+:
with open(filepath,'r') as f:
    data = f.readlines()
with open(outfilepath, 'w') as f:
    for line in data:
        f.write( '1' + line[1:] )
This should do it. I wouldn't recommend it for a truly big file though ;-)
What is going on (ex 1):
1: Open the file in read mode
2,3: Read all the lines into a list (each line is a separate element) and close the file.
4,5,6: Iterate over the list, constructing a new list where each line has its first character replaced by a 1. The line[1:] slices the string from index 1 onward. We concatenate the 1 with the sliced string.
7,8,9: Reopen the file in write mode, write the list to the file (overwrite), flush the buffer, and close the file handle.
In Ex. 2:
I use the with statement, which closes the file handle automatically, but do essentially the same thing.