This code, borrowed from elsewhere on Stack Overflow, removes every occurrence of "None" from the csv. However, it also adds an extra line to the csv. How can I change this code to remove that extra line? I think the problem is caused by inplace, but when I take inplace away the file is no longer altered by running the code.
def cleanOutputFile(filename):
    for line in fileinput.FileInput(filename,inplace=1):
        line = line.replace('None',"")
        print line
Thanks!
If you want to replace all the None's:
with open(filename) as f:
    lines = f.read().replace("None","")
with open(filename,"w") as f1:
    f1.write(lines)
Using rstrip with fileinput should also work:
import fileinput
for line in fileinput.FileInput(filename,inplace=1):
    print line.replace('None',"").rstrip() # remove newline character to avoid adding extra lines in the output
The problem here has nothing to do with fileinput, or with the replace.
Lines read from a file end in a newline (the only possible exception is the last line of a file that does not end with one).
print adds a newline, even if the thing you're printing already ends with a newline.
You can see this without even a file involved:
>>> a = 'abc'
>>> print a
abc
>>> a = 'abc\n'
>>> print a
abc

>>>
The solution is any of the following:
rstrip the newlines off the input: print line.rstrip('\n') (or do the strip earlier in your processing)
Use the "magic comma" to prevent print from adding the newline: print line,
Use Python 3-style print with from __future__ import print_function, so you can use the more flexible keyword arguments: print(line, end='')
Use sys.stdout.write instead of print.
Completely reorganize your code so you're no longer writing to stdout at all, but instead writing directly to a temporary file, or reading the whole file into memory and then writing it back out, etc.
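For reference, here is a minimal Python 3 sketch of two of the options above. The function name follows the question; everything else is illustrative rather than the asker's actual code. It relies on the fact that each line keeps its own newline, so the line is written back without adding another.

```python
import fileinput
import sys


def clean_output_file(filename):
    """Remove every 'None' from the file in place without adding blank lines."""
    # inplace=True redirects sys.stdout into the file being rewritten.
    with fileinput.input(filename, inplace=True) as f:
        for line in f:
            # Each line still carries its own '\n', so write it back as is
            # instead of letting print() append a second newline.
            sys.stdout.write(line.replace("None", ""))
            # Equivalent: print(line.replace("None", ""), end="")
```

Either `sys.stdout.write` or `print(..., end="")` works here; the choice is a matter of taste.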
I have a csv
"AA","AB","AC"
"BA","BB","BC"
"CA","CB","CC"
After removing a string, say ", the csv changes to this, with a blank line after each row:
AA,AB,AC

BA,BB,BC

CA,CB,CC

What should I do to avoid the unwanted lines?
import fileinput
for line in fileinput.FileInput("test.csv",inplace=1):
    line = line.replace('"','')
    print (line)
Looks like you're printing it, looks like Python 3, and looks like your file content already includes the necessary newlines. Therefore, you need to tell the print() function not to add its own newlines:
print(line, end='')
When read, each line includes the terminating newline character. Furthermore, print() adds a newline of its own, so you end up with two newlines.
But you are not using strip() as suggested by your question's title.
To get around that you can use rstrip() to remove any whitespace at the end of each line:
import fileinput
for line in fileinput.FileInput("test.csv",inplace=1):
    line = line.replace('"','').rstrip()
    print (line)
That will get rid of the extra new line characters, but note that it will also remove other whitespace at the end of the line.
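To make that difference concrete, here is a small sketch (the sample line is made up) comparing rstrip() with no argument against rstrip("\n"), which removes only the newline:

```python
line = "AA,AB,  \n"   # hypothetical row whose last field is two spaces

stripped_all = line.rstrip()          # removes the spaces and the newline
stripped_newline = line.rstrip("\n")  # removes only the newline

print(repr(stripped_all))      # 'AA,AB,'
print(repr(stripped_newline))  # 'AA,AB,  '
```

If trailing whitespace is significant in your data, rstrip("\n") is the safer of the two.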
An alternative is to prevent print() adding its own new line:
Python 2:
print(line), # comma prevents new line
Python 3:
print(line, end='')
Why are you doing this? You should use the csv module; it handles both the commas and the quotes for you. Example -
import csv
import fileinput
with fileinput.FileInput('test.csv',inplace=1) as f:
    reader = csv.reader(f)
    for row in reader:
        print (','.join(row))
Example/Demo -
>>> import csv
>>> import fileinput
>>> with fileinput.FileInput('test.csv',inplace=1) as f:
...     reader = csv.reader(f)
...     for row in reader:
...         print(','.join(row))
...
AA,AB,AC
BA,BB,BC
CA,CB,CC
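One caveat with joining the fields by hand: a field that itself contains a comma loses the quoting it needs. A possible variant, sketched here under the assumption that you are free to rewrite the file through a temporary copy (the helper name is hypothetical), lets csv.writer decide which fields still need quotes:

```python
import csv
import os
import tempfile


def rewrite_without_quotes(path):
    """Rewrite a CSV so that fields are quoted only when they need to be."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
        writer.writerows(csv.reader(src))
    os.replace(dst.name, path)
```

Because the writer emits exactly one line terminator per row, no blank lines appear in the result.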
You are seeing extra lines because the lines read from the file end with '\n' and the print(line) statement appends an extra newline.
You can use rstrip() to strip out the trailing newline:
import fileinput
for line in fileinput.FileInput("test.csv",inplace=1):
    line = line.rstrip().replace('"','')
    print (line)
I have this text file:
>P1;3RVYA
sequence:: : : : :::-1.00:-1.00
MYLRITNIVESSFFTKFIIYLIVLNGITMGLETSKTFMQSFGVYTTLFNQIVITIFTIEIILR-IYVHRISFFKD
PWSLFDFFVVAISLVPTSS---GFEILRVLRVLRLFRLVTAVPQMRKIVSALISVIPGMLSVIALMTLFFYIFAI
MATQLFGERFP---------------------------------------------EWFGTLGESFYTLFQVMTL
ESWSMGIVRP-LMEVYPYAWVFFIPFIFVVTFVMINLVVAICVDAM*
>P1;Dominio1
sequence:: : : : :::-1.00:-1.00
EWPPFEYMILATIIANCIVLALEQH---LPDDDKTPMSERLDDTEPYFIGIFCFEAGIKIIALGFAFHKGSYLRN
GWNVMDFVVVLTGILATVGTEFDLRTLRAVRVLRPLKLVSGIPSLQVVLKSIMKAMIPLLQIGLLLFFAILIFAI
IGLEFYMGKFHTTCFEEGTDDIQGESPAPCGTEEPARTCPNGTKCQPYWEGPNNGITQFDNILFAVLTVFQCITM
EGWTDLLYNSNDASGNTWNWLYFIPLIIIGSFFMLNLVLGVLSGEF*
I need to replace the word "sequence" on the line directly below the one containing 3RVYA (only that occurrence).
I have this code:
a="3RVYA"
for line in file('%s'%I):
    if a in line:
        print line
But it just printed "3RV". I need to print only the next line, the one that has the word "sequence", so that I can replace "sequence" with "structure".
I'm a beginner in Python, so... can somebody help me, please?
Thanks so much
You appear to be trying to modify a text file in-place by selectively changing some parts of it with other strings of different length. This isn't really possible, because of the way filesystems work nowadays (that is, byte-by-byte). However, the stdlib module fileinput simulates it well (behind the curtains it moves the original to a backup file and writes the output to a new file under the original name). So...:
import fileinput
replacing = False
for line in fileinput.input('thefile.txt', inplace=True):
    if replacing and 'sequence' in line:
        line = line.replace('sequence', 'structure')
        replacing = False
    elif '3RVYA' in line:
        replacing = True
    print line,
This is Python 2; in Python 3, the last line becomes, instead:
print(line, end='')
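If you want to try the flag logic without touching a file, the same idea can be sketched as a Python 3 generator over any iterable of lines. The function and parameter names here are made up for illustration:

```python
def replace_after_marker(lines, marker, old, new):
    """Yield lines, replacing `old` with `new` only on the first line that
    contains `old` after a line containing `marker`."""
    replacing = False
    for line in lines:
        if replacing and old in line:
            line = line.replace(old, new)
            replacing = False   # only the first occurrence after the marker
        elif marker in line:
            replacing = True
        yield line
```

Inside the fileinput loop you would then write each yielded line with print(line, end='').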
I have looked around StackOverflow and couldn't find an answer to my specific question so forgive me if I have missed something.
import re
target = open('output.txt', 'w')
for line in open('input.txt', 'r'):
    match = re.search(r'Stuff', line)
    if match:
        match_text = match.group()
        target.write(match_text + '\n')
    else:
        continue
target.close()
The file I am parsing is huge so need to process it line by line.
This (of course) leaves an additional newline at the end of the file.
How should I best change this code so that on the final iteration of the 'if match' loop it doesn't put the extra newline character at the end of the file. Should it look through the file again at the end and remove the last line (seems a bit inefficient though)?
The existing StackOverflow questions I have found cover removing all new lines from a file.
If there is a more pythonic / efficient way to write this code I would welcome suggestions for my own learning also.
Thanks for the help!
Another thing you can do, is to truncate the file. .tell() gives us the current byte number in the file. We then subtract one, and truncate it there to remove the trailing newline.
with open('a.txt', 'w') as f:
    f.write('abc\n')
    f.write('def\n')
    f.truncate(f.tell()-1)
On Linux and MacOS, the -1 is correct, but on Windows it needs to be -2. A more Pythonic method of determining which is to check os.linesep.
import os
remove_chars = len(os.linesep)
with open('a.txt', 'w') as f:
    f.write('abc\n')
    f.write('def\n')
    f.truncate(f.tell() - remove_chars)
kindall's answer is also valid, with the exception that you said it's a large file. This method will let you handle a terabyte-sized file on a gigabyte of RAM.
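A variation on the truncation idea, sketched here as an assumption rather than as part of the answer above: reopen the finished file in binary mode and look at the actual last bytes, instead of relying on os.linesep and on a text-mode tell(), whose return value Python documents as an opaque number. The helper name is hypothetical.

```python
import os


def strip_final_newline(path):
    """Remove one trailing line ending (CRLF or LF) from a file, in place."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Read at most the last two bytes to see which line ending is there.
        f.seek(max(size - 2, 0))
        tail = f.read()
        if tail.endswith(b"\r\n"):
            f.truncate(size - 2)
        elif tail.endswith(b"\n"):
            f.truncate(size - 1)
```

This only ever reads two bytes, so it stays cheap however large the file is.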
Write the newline of each line at the beginning of the next line. To avoid writing a newline at the beginning of the first line, use a variable that is initialized to an empty string and then set to a newline in the loop.
import re
with open('input.txt') as source, open('output.txt', 'w') as target:
    newline = ''
    for line in source:
        match = re.search(r'Stuff', line)
        if match:
            target.write(newline + match.group())
            newline = '\n'
I also restructured your code a bit (the else: continue is not needed, because what else is the loop going to do?) and changed it to use the with statement so the files are automatically closed.
The shortest path from what you have to what you want is probably to store the results in a list, then join the list with newlines and write that to the file.
import re
target = open('output.txt', 'w')
results = []
for line in open('input.txt', 'r'):
    match = re.search(r'Stuff', line)
    if match:
        results.append(match.group())
target.write("\n".join(results))
target.close()
Voilà, no extra newline at the beginning or end. Might not scale very well if the resulting list is huge. (And like kindall, I left out the else.)
Since you're performing the same regex over and over, you'd probably want to compile it beforehand.
import re
prog = re.compile(r'Stuff')
I tend to input from and output to stdin and stdout for simplicity. But that's a matter of taste (and specs).
from sys import stdin, stdout
Ignoring the specific requirement about removing the final EOL[1], and just addressing the bit about your own learning, the whole thing could be written like this:
from itertools import imap
stdout.writelines(match.group() + '\n' for match in imap(prog.search, stdin) if match)
[1] As others have commented, this is a Bad Thing, and it's extremely annoying when someone does this.
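itertools.imap exists only in Python 2. A rough Python 3 equivalent, assuming the same placeholder pattern as the question and using search so that a match anywhere in the line counts, might look like this:

```python
import re

prog = re.compile(r"Stuff")  # placeholder pattern taken from the question


def matched_text(lines):
    """Yield the matched text plus a newline for every line that matches."""
    # In Python 3 the built-in map() is already lazy, like imap was.
    for match in map(prog.search, lines):
        if match:
            yield match.group() + "\n"
```

To use it as a filter, pass the generator to sys.stdout.writelines(matched_text(sys.stdin)); the input is still consumed one line at a time.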
Specifically I have exported a csv file from Google Adwords.
I read the file line by line and change the phone numbers.
Here is the literal script:
for line in open('ads.csv', 'r'):
    newdata = changeNums(line)
    sys.stdout.write(newdata)
And changeNums() just performs some string replaces and returns the string.
The problem is that there is a musical note at the end of each printed line.
The original CSV does not have this note at the end of lines. Also, I cannot copy-paste the note.
Is this some kind of encoding issue or what's going on?
Try opening with universal line support:
for line in open('ads.csv', 'rU'):
    # etc
Either:
the original file has some extra characters in it (and they're being shown as this symbol in the terminal)
changeNums is creating those characters
stdout.write is sending some uninterpreted newline character, which again is being shown by the terminal as this symbol; try changing that line to print(newdata)
My guess: changeNums is adding it.
Best debugging commands:
print([ord(x) for x in line])
print([ord(x) for x in newdata])
print(line == newdata)
And check for the character values present in the string.
You can strip out the newlines by:
for line in open('ads.csv', 'r'):
    line = line.rstrip('\n')
    newdata = changeNums(line)
    sys.stdout.write(newdata)
An odd "note" character at the end is usually a CR/LF newline issue between *nix and *dos/*win environments.
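To check whether that is what is happening, you can look at the code points at the end of a line. In the sketch below the sample line is made up and the helper name is hypothetical; code point 13 is the carriage return, which some Windows consoles draw as a musical note.

```python
def show_code_points(text):
    """Return the code point of every character, to expose hidden ones."""
    return [ord(ch) for ch in text]


line = "555-1234\r\n"   # hypothetical line as stored in a Windows-style file
print(show_code_points(line[-2:]))   # [13, 10]: CR followed by LF

clean = line.rstrip("\r\n")   # drops CR and LF, whichever are present
```

Stripping with "\r\n" rather than just "\n" removes the stray carriage return as well.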
i have some data stored in a .txt file in this format:
----------|||||||||||||||||||||||||-----------|||||||||||
1029450386abcdefghijklmnopqrstuvwxy0293847719184756301943
1020414646canBeFollowedBySpaces 3292532113435532419963
don't ask...
i have many lines of this, and i need a way to add more digits to the end of a particular line.
i've written code to find the line i want, but im stumped as to how to add 11 characters to the end of it. i've looked around, this site has been helpful with some other issues i've run into, but i can't seem to find what i need for this.
it is important that the line retain its position in the file, and its contents in their current order.
using python3.1, how would you turn this:
1020414646canBeFollowedBySpaces 3292532113435532419963
into
1020414646canBeFollowedBySpaces 329253211343553241996301846372998
As a general principle, there's no shortcut to "inserting" new data in the middle of a text file. You will need to make a copy of the entire original file in a new file, modifying your desired line(s) of text on the way.
For example:
import os
with open("input.txt") as infile:
    with open("output.txt", "w") as outfile:
        for s in infile:
            s = s.rstrip('\n')  # remove trailing newline
            if "target" in s:
                s += "0123456789"
            print(s, file=outfile)
os.rename("input.txt", "input.txt.original")
os.rename("output.txt", "input.txt")
Check out the fileinput module; it can do sort-of "in-place" edits of files, though I believe temporary files are still involved internally.
import fileinput
for line in fileinput.input('input.txt', inplace=1, backup='.orig'):
    if line.startswith('1020414646canBeFollowedBySpaces'):
        line = line.rstrip() + '01846372998' '\n'
    print(line, end='')
The print now prints to the file instead of the console.
You might want to back up your original file before editing.
# bytes literals, because the file is opened in binary mode
target_chain = b'1020414646canBeFollowedBySpaces 3292532113435532419963'
to_add = b'01846372998'
with open('zaza.txt','rb+') as f:
    ch = f.read()
    x = ch.find(target_chain)
    f.seek(x + len(target_chain),0)
    f.write(to_add)
    f.write(ch[x + len(target_chain):])
With this method it is essential to open the file in binary mode ('b'): in text mode, Python's newline translation (see universal newlines, enabled by default) means that offsets in the string that was read may no longer match byte offsets in the file.
The mode 'r+' allows writing as well as reading.
With this method, whatever precedes target_chain in the file remains untouched, and whatever follows it is shifted forward. As Greg Hewgill said, there is no way to push bits apart on a hard disk to insert new bits in the middle.
Of course, if the file is very big, reading all of its content into ch could consume too much memory, and the algorithm should then be changed: read line by line until the line containing target_chain, then read the next line before inserting, and then keep repeating "read the next line, rewrite over the current line" until the end of the file, so that the content after the insertion point is shifted progressively.
You see what I mean...
Copy the file, line by line, to another file. When you get to the line that needs extra chars then add them before writing.
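A minimal sketch of that copy-and-modify approach in Python 3 follows. The helper name and parameters are hypothetical, and it assumes the target line can be recognised by its prefix. The copy is written to a temporary file in the same directory and swapped in at the end, so the line keeps its position and its original line ending.

```python
import os
import tempfile


def append_to_line(path, prefix, extra):
    """Append `extra` to every line starting with `prefix`, via a temp copy."""
    directory = os.path.dirname(os.path.abspath(path))
    # newline="" disables newline translation so line endings are preserved.
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        for line in src:
            if line.startswith(prefix):
                body = line.rstrip("\r\n")
                ending = line[len(body):]   # keep the original line ending
                line = body + extra + ending
            dst.write(line)
    os.replace(dst.name, path)   # swap the finished copy into place
```

For example, append_to_line('data.txt', '1020414646', '01846372998') would extend only the lines that begin with that prefix; 'data.txt' is a placeholder file name.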