I am trying to process a csv file in python that has ^M character in the middle of each row/line which is a newline. I cant open the file in any mode other than 'rU'.
If I do open the file in the 'rU' mode, it reads in the newline and splits the file (creating a newline) and gives me twice the number of rows.
I want to remove the newline altogether. How?
Note that, as the docs say:
csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable.
So, you can always stick a filter on the file before handing it to your reader or DictReader. Instead of this:
with open('myfile.csv', 'rU') as myfile:
for row in csv.reader(myfile):
Do this:
with open('myfile.csv', 'rU') as myfile:
filtered = (line.replace('\r', '') for line in myfile)
for row in csv.reader(filtered):
That '\r' is the Python (and C) way of spelling ^M. So, this just strips all ^M characters out, no matter where they appear, by replacing each one with an empty string.
I guess I want to modify the file permanently as opposed to filtering it.
First, if you want to modify the file before running your Python script on it, why not do that from outside of Python? sed, tr, many text editors, etc. can all do this for you. Here's a GNU sed example:
gsed -i'' 's/\r//g' myfile.csv
But if you want to do it in Python, it's not that much more verbose, and you might find it more readable, so:
First, you can't really modify a file in-place if you want to insert or delete from the middle. The usual solution is to write a new file, and either move the new file over the old one (Unix only) or delete the old one (cross-platform).
The cross-platform version:
os.rename('myfile.csv', 'myfile.csv.bak')
with open('myfile.csv.bak', 'rU') as infile, open('myfile.csv', 'wU') as outfile:
for line in infile:
outfile.write(line.replace('\r'))
os.remove('myfile.csv.bak')
The less-clunky, but Unix-only, version:
temp = tempfile.NamedTemporaryFile(delete=False)
with open('myfile.csv', 'rU') as myfile, closing(temp):
for line in myfile:
temp.write(line.replace('\r'))
os.rename(tempfile.name, 'myfile.csv')
Related
i have a large file in my local disk which contains some fixed length string in first line. I need to programmatically replace that fixed length string using python without reading whole file in memory .
i have tried opening the file in append mode and seeking to 0 position. And then replace the string which is of 9 bytes. The code is also added here , what i tried .
with open ("largefile.txt", 'a') as f:
f.seek(0,0)
f.write("123456789")
I think you just want to open the file for writing without truncating it, which would be r+. to make this reproducible, we first create a file that matches this format:
with open('many_lines.txt', 'w') as fd:
print('abcdefghi', file=fd)
for i in range(10000):
print(f'line {i:09}', file=fd)
then we basically do what you were doing, but with the correct mode:
with open('many_lines.txt', 'r+') as fd:
print('123456789', file=fd)
or you can use write directly, with:
with open('many_lines.txt', 'r+') as fd:
fd.write('123456789')
Note: I'm opening in r+ so that you'll get an FileNotFoundError if it doesn't exist (or the filename is misspelled) rather than just blindly creating a tiny file
The open modes are directly copied from the C/POSIX API for the fopen so your use of a will trigger behaviour that says:
Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar
I got a text file like this
Bruce
brucechungulloa#outlook.com
I've used this to read the text file and export it to a list
with open('info.txt') as f:
info = f.readlines()
for item in info:
reportePaises = open('reportePaises.txt', 'w')
reportePaises.write("%s\n" % item)
But when I want to write the elements of the list(info) into another text file, only the info[1] is written (the mail)
How can I write the entire list onto the text file?
with open('data.csv') as f:
with open('test2.txt', 'a') as wp:
for item in f.readlines():
wp.write("%s" % item)
wp.write('\n') # adds a new line after the looping is done
That will give you:
Bruce
brucechungulloa#outlook.com
In both files.
You were having problems because every time you open a file with 'w' flag, you overwrite it on the disk. So, you created a new file every time.
You should open the second file only once, in the with statement:
with open('info.txt') as f, open('reportePaises.txt', 'w') as reportePaises:
info = f.readlines()
for item in info:
reportePaises.write(item)
As #Pynchia suggested, it's probably better not to use .readlines(), and loop directly on input file instead.
with open('info.txt') as f, open('reportePaises.txt', 'w') as reportePaises:
for item in f:
reportePaises.write(item)
This way you don't create a copy of the while file in your RAM by saving it to a list, which may cause a huge delay if the file is big (and, obviously, uses more RAM). Instead, you treat the input file as an iterator and just read next line directly from your HDD on each iteration.
You also (if I did the testing right) don't need to append '\n' to every line. The newlines are already in item. Because of that you don't need to use string formatting at all, just reportePaises.write(item).
You are opening your file in write mode every time you write to a file, effectively overwriting the previous line that you wrote. Use the append mode, a, instead.
reportePaises = open('reportePaises.txt', 'a')
Edit: Alternatively, you can open the file once and instead of looping through the lines, write the whole contents as follows:
with open('reportePaises.txt', 'w') as file:
file.write(f.read())
Try this without open output file again and again.
with open('info.txt') as f:
info = f.readlines()
with open('reportePaises.txt', 'w') as f1:
for x in info:
f1.write("%s\n" % x)
That will work.
Two problems here. One is you are opening the output file inside the loop. That means it is being opened several times. Since you also use the "w" flag that means the file is truncated to zero each time it is opened. Therefore you only get the last line written.
It would be better to open the output file once outside the loop. You could even use an outer with block.
You can simply try the below code. Your code did not work because you added the opening on file handler 'reportPaises' within the for loop. You don't need to open the file handler again and again.
Try re running your code line by line in the python shell as it is very easy to debug the bugs in the code.
The below code will work
with open('something.txt') as f:
info = f.readlines()
reportePaises = open('reportePaises.txt', 'w')
for item in info:
reportePaises.write("%s" % item)
You don't need to add a \n to the output line because when you perform readlines, the \n character is preserved in the info list file. Please look observe below.
Try below
with open('something.txt') as f:
info = f.readlines()
print info
The output you will get is
['Bruce\n', 'brucechungulloa#outlook.com']
When I run the following in the Python IDLE Shell:
f = open(r"H:\Test\test.csv", "rb")
for line in f:
print line
#this works fine
however, when I run the following for a second time:
for line in f:
print line
#this does nothing
This does not work because you've already seeked to the end of the file the first time. You need to rewind (using .seek(0)) or re-open your file.
Some other pointers:
Python has a very good csv module. Do not attempt to implement CSV parsing yourself unless doing so as an educational exercise.
You probably want to open your file in 'rU' mode, not 'rb'. 'rU' is universal newline mode, which will deal with source files coming from platforms with different line endings for you.
Use with when working with file objects, since it will cleanup the handles for you even in the case of errors. Ex:
.
with open(r"H:\Test\test.csv", "rU") as f:
for line in f:
...
You can read the data from the file in a variable, and then you can iterate over this data any no. of times you want to in your script. This is better than doing seek back and forth.
f = open(r"H:\Test\test.csv", "rb")
data = f.readlines()
for line in data:
print line
for line in data:
print line
Output:
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
Because you've gone all the way through the CSV file, and the iterator is exhausted. You'll need to re-open it before the second loop.
I have the following code which is intended to remove specific lines of a file. When I run it, it prints the two filenames that live in the directory, then deletes all information in them. What am I doing wrong? I'm using Python 3.2 under Windows.
import os
files = [file for file in os.listdir() if file.split(".")[-1] == "txt"]
for file in files:
print(file)
input = open(file,"r")
output = open(file,"w")
for line in input:
print(line)
# if line is good, write it to output
input.close()
output.close()
open(file, 'w') wipes the file. To prevent that, open it in r+ mode (read+write/don't wipe), then read it all at once, filter the lines, and write them back out again. Something like
with open(file, "r+") as f:
lines = f.readlines() # read entire file into memory
f.seek(0) # go back to the beginning of the file
f.writelines(filter(good, lines)) # dump the filtered lines back
f.truncate() # wipe the remains of the old file
I've assumed that good is a function telling whether a line should be kept.
If your file fits in memory, the easiest solution is to open the file for reading, read its contents to memory, close the file, open it for writing and write the filtered output back:
with open(file_name) as f:
lines = list(f)
# filter lines
with open(file_name, "w") as f: # This removes the file contents
f.writelines(lines)
Since you are not intermangling read and write operations, the advanced file modes like "r+" are unnecessary here, and only compicate things.
If the file does not fit into memory, the usual approach is to write the output to a new, temporary file, and move it back to the original file name after processing is finished.
One way is to use the fileinput stdlib module. Then you don't have to worry about open/closing and file modes etc...
import fileinput
from contextlib import closing
import os
fnames = [fname for fname in os.listdir() if fname.split(".")[-1] == "txt"] # use splitext
with closing(fileinput.input(fnames, inplace=True)) as fin:
for line in fin:
# some condition
if 'z' not in line: # your condition here
print line, # suppress new line but adjust for py3 - print(line, eol='') ?
When using inplace=True - the fileinput redirects stdout to be to the file currently opened. A backup of the file with a default '.bak' extension is created which may come in useful if needed.
jon#minerva:~$ cat testtext.txt
one
two
three
four
five
six
seven
eight
nine
ten
After running the above with a condition of not line.startswith('t'):
jon#minerva:~$ cat testtext.txt
one
four
five
six
seven
eight
nine
You're deleting everything when you open the file to write to it. You can't have an open read and write to a file at the same time. Use open(file,"r+") instead, and then save all the lines to another variable before writing anything.
You should not open the same file for reading and writing at the same time.
"w" means create a empty for writing. If the file already exists, its data will be deleted.
So you can use a different file name for writing.
I am new to Python programming...
I have a .txt file....... It looks like..
0,Salary,14000
0,Bonus,5000
0,gift,6000
I want to to replace the first '0' value to '1' in each line. How can I do this? Any one can help me.... With sample code..
Thanks in advance.
Nimmyliji
I know that you're asking about Python, but forgive me for suggesting that perhaps a different tool is better for the job. :) It's a one-liner via sed:
sed 's/^0,/1,/' yourtextfile.txt > output.txt
This applies the regex /^0,/ (which matches any 0, that occurs at the beginning of a line) to each line and replaces the matched text with 1, instead. The output is directed into the file output.txt specified.
inFile = open("old.txt", "r")
outFile = open("new.txt", "w")
for line in inFile:
outFile.write(",".join(["1"] + (line.split(","))[1:]))
inFile.close()
outFile.close()
If you would like something more general, take a look to Python csv module. It contains utilities for processing comma-separated values (abbreviated as csv) in files. But it can work with arbitrary delimiter, not only comma. So as you sample is obviously a csv file, you can use it as follows:
import csv
reader = csv.reader(open("old.txt"))
writer = csv.writer(open("new.txt", "w"))
writer.writerows(["1"] + line[1:] for line in reader)
To overwrite original file with new one:
import os
os.remove("old.txt")
os.rename("new.txt", "old.txt")
I think that writing to new file and then renaming it is more fault-tolerant and less likely corrupt your data than direct overwriting of source file. Imagine, that your program raised an exception while source file was already read to memory and reopened for writing. So you would lose original data and your new data wouldn't be saved because of program crash. In my case, I only lose new data while preserving original.
o=open("output.txt","w")
for line in open("file"):
s=line.split(",")
s[0]="1"
o.write(','.join(s))
o.close()
Or you can use fileinput with in place edit
import fileinput
for line in fileinput.FileInput("file",inplace=1):
s=line.split(",")
s[0]="1"
print ','.join(s)
f = open(filepath,'r')
data = f.readlines()
f.close()
edited = []
for line in data:
edited.append( '1'+line[1:] )
f = open(filepath,'w')
f.writelines(edited)
f.flush()
f.close()
Or in Python 2.5+:
with open(filepath,'r') as f:
data = f.readlines()
with open(outfilepath, 'w') as f:
for line in data:
f.write( '1' + line[1:] )
This should do it. I wouldn't recommend it for a truly big file though ;-)
What is going on (ex 1):
1: Open the file in read mode
2,3: Read all the lines into a list (each line is a separate index) and close the file.
4,5,6: Iterate over the list constructing a new list where each line has the first character replaced by a 1. The line[1:] slices the string from index 1 onward. We concatenate the 1 with the truncated list.
7,8,9: Reopen the file in write mode, write the list to the file (overwrite), flush the buffer, and close the file handle.
In Ex. 2:
I use the with statement that lets the file handle closing itself, but do essentially the same thing.