I have a piece of code that's removing some unwanted lines from a text file and writing the results to a new one:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt','w')
for line in f:
    if not line.startswith('W'):
        g.write('%s\n' % line)
The original file is single-spaced, but the new file has an empty line between each line of text. How can I lose the empty lines?
You're not removing the newline from the input lines, so you shouldn't be adding one (\n) on output.
Either strip the newlines off the lines you read or don't add new ones as you write it out.
Just do:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt','w')
for line in f:
    if not line.startswith('W'):
        g.write(line)
Every line that you read from the original file has a \n (newline) character at the end, so do not add another one (right now you are adding one, which means you actually introduce empty lines).
My guess is that the variable line already has a newline in it, but you're writing an additional newline with g.write('%s\n' % line).
line has a newline at the end.
Remove the \n from your write, or rstrip line.
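A minimal sketch of the fix, assuming the same 'W' prefix test as in the question. The function name filter_lines and its parameters are illustrative and not from the original post. Each line read from the file keeps its trailing newline, so it is written back unchanged:

```python
def filter_lines(src_path, dst_path, prefix='W'):
    """Copy src_path to dst_path, skipping lines that start with prefix."""
    # 'with' closes both files even if an error occurs part-way through.
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            if not line.startswith(prefix):
                # line already ends with '\n', so write it as-is.
                dst.write(line)
```

It could be called as filter_lines('messyParamsList.txt', 'cleanerParamsList.txt').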
Related
I want to read a file and remove the spaces. I swear I've done this multiple times, but for some reason the method I used to use doesn't seem to be working. I must be making some small mistake somewhere, so I decided to make a small practice file (because the files I actually need to use are EXTREMELY LARGE) to find out.
the original file says:
abcdefg
(new line)
hijklmn
but I want it to say:
abcdefghijklmn
file = open('please work.txt', 'r')
for line in file:
    lines = line.strip()
    print(lines)
file.close()
However, it just says:
abcdefg
(new line)
hijklmn
and when I use line.strip('\n') it says:
abcdefg
(big new line)
hijklmn
Any help will be greatly appreciated, because this was the first thing I learned and suddenly I can't remember how to use it!
If what you want to do is to concatenate each line into a single line, you could utilize rstrip and concatenate to a result variable:
with open('test.txt', 'r') as fin:
    lines = ''
    for line in fin:
        stripped_line = line.rstrip()
        lines += stripped_line
print(lines)
From a text file looking like this:
abcdefg hijklmnop
this is a line
The result would be abcdefg hijklmnopthis is a line. If you also want to remove the spaces, you could add lines = lines.replace(' ', '') after the loop, which would result in abcdefghijklmnopthisisaline.
The (new line) in your output comes from print, which appends a \n. You can use print(lines, end='') to suppress it.
strip() only removes leading and trailing whitespace.
You can use string.replace(' ', '') to remove all spaces.
'abcdefg (new line) hijklmn'.replace(' ', '')
If your file has tabs, newlines, or other forms of whitespace, the above will not work and you will need a regex to remove all forms of whitespace in the file.
import re
string = '''this is a \n
test \t\t\t
\r
\v
'''
re.sub(r'\s', '', string)
#'thisisatest'
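As a regex-free alternative, str.split() with no arguments splits on any run of whitespace (spaces, tabs, newlines), so joining the pieces removes all of it. A short sketch; remove_whitespace is a hypothetical helper name:

```python
def remove_whitespace(text):
    """Return text with every whitespace character removed."""
    # split() with no argument splits on runs of any whitespace
    # and discards empty pieces, so join() glues the rest together.
    return ''.join(text.split())


print(remove_whitespace('abcdefg\n\nhijklmn\n'))  # abcdefghijklmn
```

For very large files it is probably better to apply this line by line rather than to the whole content at once.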
I have a file with content:
0x11111111
0x22222222
0x33333333
0x44444444
And I'm reading it line by line using:
f = open('test1', 'r')
print("Register CIP_REGS_CONTROL value:")
for i in range(4):
    content = f.read(11)
    f.seek(11, 1)
    print(content)
Note that each line is 11 bytes long because of the '\n' character at the end. But the output is:
0x11111111
0x33333333
There's an empty line after the 1st and 3rd lines, and I don't know why. If I delete the '\n' in each line and change the size for reading and seeking to 10, I get:
0x11111111
0x33333333
2 lines are also missing. Can anybody help? Thanks in advance.
Remove your seek call. Each call skips the next 11 bytes, because read also moves the current position.
Two things:
You don't need to seek after the read. Your position in the file will already be at the next character after the call to read.
When you call print, it appends a newline (\n) to your output.
The simplest (and safest, since it ensures your file gets closed properly) way would be to use a with construct and readline():
print("Register CIP_REGS_CONTROL value:")
with open('test1', 'r') as f:
    for i in range(4):
        print(f.readline().strip())
strip() when called with no arguments removes all whitespace (which includes \n) from the beginning and end of a string.
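To see why the seek call drops lines, here is a small sketch that uses an in-memory io.BytesIO buffer in place of the test1 file (the buffer and the helper name read_records are assumptions for illustration). Each read(11) advances the position by 11 bytes on its own, so four consecutive reads return the four records without any seek:

```python
import io


def read_records(stream, size, count):
    """Read count fixed-size records; read() itself advances the position."""
    records = []
    for _ in range(count):
        chunk = stream.read(size)  # moves the position forward by len(chunk)
        records.append(chunk.decode('ascii').rstrip('\n'))
    return records


data = io.BytesIO(b'0x11111111\n0x22222222\n0x33333333\n0x44444444\n')
for value in read_records(data, 11, 4):
    print(value)
```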
Specifically I have exported a csv file from Google Adwords.
I read the file line by line and change the phone numbers.
Here is the literal script:
for line in open('ads.csv', 'r'):
    newdata = changeNums(line)
    sys.stdout.write(newdata)
And changeNums() just performs some string replaces and returns the string.
The problem is that a musical note appears at the end of each printed line.
The original CSV does not have this note at the end of lines. Also, I cannot copy-paste the note.
Is this some kind of encoding issue or what's going on?
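Try opening with universal newline support (this is the Python 2 'rU' mode; Python 3 translates newlines in text mode by default, and the 'U' flag was removed in 3.11):
for line in open('ads.csv', 'rU'):
    # etc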
Either:
the original file has some characters in it (and they're being shown as this symbol in the terminal)
changeNums is creating those characters
stdout.write is sending some uninterpreted newline symbol, which the terminal again shows as this symbol; change this line to print(newdata)
My guess: changeNums is adding it.
Best debugging commands:
print([ord(x) for x in line])
print([ord(x) for x in newdata])
print(line == newdata)
And check for the character values present in the string.
You can strip out the newlines by:
for line in open('ads.csv', 'r'):
    line = line.rstrip('\n')
    newdata = changeNums(line)
    sys.stdout.write(newdata)
An odd "note" character at the end is usually a CR/LF newline issue between *nix and *dos/*win environments.
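One way to check the CR/LF theory, as a sketch: printing repr() of a line makes a stray '\r' visible, and rstrip('\r\n') removes both kinds of line-ending character. The helper name strip_eol and the sample line are made up for illustration:

```python
def strip_eol(line):
    """Remove any trailing carriage returns and line feeds."""
    # rstrip with a character set strips every trailing '\r' or '\n'.
    return line.rstrip('\r\n')


sample = 'call 555-0100\r\n'    # made-up line with a Windows-style ending
print(repr(sample))             # the '\r' is visible in the repr
print(repr(strip_eol(sample)))
```

If the repr of a real line from ads.csv shows '\r' before '\n', the file probably has Windows-style line endings.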
I am trying to write Jython code to delete spaces from a text file. I have the following scenario.
I have a text file like
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
Here all lines are single lines.
But what I want is:
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
So, in all, I want to start checking from the second line until I encounter END, and delete all spaces at the end of each line.
Can someone guide me in writing this code?
I tried:
srcfile=open('d:/BUR001.txt','r')
trgtfile=open('d:/BUR002.txt','w')
readfile=srcfile.readline()
while readfile:
    trgtfile.write(readfile.replace('\s',''))
    readfile=srcfile.readline()
srcfile.close()
trgtfile.close()
Thanks,
Mahesh
You can use the fact that those special lines start with special values:
line = srcfile.readline()
while line:
    line2 = line
    if not line2.startswith('START') and not line2.startswith('END'):
        line2 = line2.replace(' ','')
    trgtfile.write(line2)
    line = srcfile.readline()
Also note that the strings returned by readline() end with \n (or are empty at the end of the input file), and that this code removes all spaces from the line, not only those at the end of the line.
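If only the trailing spaces should go, rstrip() is enough. A sketch, assuming (as in the question) that lines beginning with START or END are left untouched; the function name and the in-memory sample are illustrative:

```python
def strip_trailing_spaces(lines):
    """Yield lines with trailing whitespace removed from data lines only."""
    for line in lines:
        if line.startswith('START') or line.startswith('END'):
            yield line  # header/footer kept as-is
        else:
            # rstrip() drops trailing spaces and the newline; add '\n' back.
            yield line.rstrip() + '\n'


sample = ['STARTBUR001 20120416\n', '20120416MES201667 2012   \n', 'END 0000000202\n']
for out in strip_trailing_spaces(sample):
    print(repr(out))
```

With the files from the question, this could be used as trgtfile.writelines(strip_trailing_spaces(srcfile)).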
If I understood your example, all you want is to remove empty lines. Instead of reading the file line by line, read it all at once:
content = srcfile.read()
and then remove empty lines from content:
while '\n\n' in content:
    content = content.replace('\n\n', '\n')
I have a text file with almost a thousand lines such as:
WorldNews,Current
WorldNews,Current,WorldNews#here',
'WorldNewsPro#here Zebra,Poacher',
'Dock,DS_URLs#here'
Zebra,Poacher,ZebraPoacher#here
Zebra,Dock,ZebraDock#here
Timer33,Timer33#here
Sometimes the line ends without "#here", sometimes it ends with "#here", sometimes it has "#here" in the middle of the line, and sometimes the line ends with "#here'".
I want to strip all the lines that do NOT have "#here" in them at all. I tried RegEx:
> (^(#here$))
> [\W](#here)
etc. with no luck.
How should I pull the lines with "#here" so my new file (or the output) has only:
WorldNews,Current,WorldNews#here',
'WorldNewsProfessional52#here
Zebra,Poacher',
'DocuShare,AC_DS_URLs#here'
Zebra,Poacher,ZebraPoacher#here
Zebra,DocuShare,ZebraDocushare#here
XNTimer,XNTimer#here
I was thinking it should read the whole line from start to end and if it has #here anywhere in the line, print it. If not, ignore and read the next line.
Thanks,
Adrian
Maybe this helps: (assuming filename is the name of your input file)
with open(filename) as stream:
    for line in stream:
        if '#here' in line:
            # line already ends with '\n'; strip it so print does not double-space
            print(line.rstrip('\n'))
You don't need regex. You can use string methods to do such simple filtering:
def hasstr(lines, s):
    # a generator expression can filter out the lines
    return (line for line in lines if s in line)
# get all lines in the file with #here in them
filtered = hasstr(open(the_file, 'rt'), '#here')
You want the in operator.
import sys
for line in sys.stdin:
    if '#here' in line:
        sys.stdout.write(line)