Specifically I have exported a csv file from Google Adwords.
I read the file line by line and change the phone numbers.
Here is the literal script:
for line in open('ads.csv', 'r'):
    newdata = changeNums(line)
    sys.stdout.write(newdata)
And changeNums() just performs some string replaces and returns the string.
The problem is that a musical note appears at the end of each printed line.
The original CSV does not have this note at the end of lines. Also, I cannot copy-paste the note.
Is this some kind of encoding issue or what's going on?
Try opening with universal line support:
for line in open('ads.csv', 'rU'):
    # etc
Either:
the original file has some characters in it (and they're being shown as this symbol in the terminal)
changeNums is creating those characters
stdout.write is sending some uninterpreted newline symbol, which again is being shown by the terminal as this symbol; try changing this line to print(newdata)
My guess: changeNums is adding it.
Best debugging commands:
print([ord(x) for x in line])
print([ord(x) for x in newdata])
print(line == newdata)
And check for the character values present in the string.
You can strip out the newlines by:
for line in open('ads.csv', 'r'):
    line = line.rstrip('\r\n')  # drop the line ending, including any '\r'
    newdata = changeNums(line)
    sys.stdout.write(newdata + '\n')  # put a single '\n' back
An odd "note" character at the end is usually a CR/LF newline issue between *nix and *dos/*win environments.
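As a rough illustration of the CR/LF explanation above, here is a small sketch assuming Python 3 and a made-up sample line (the phone number and text are hypothetical). repr() makes the otherwise invisible \r visible, and rstrip('\r\n') removes either style of line ending. As far as I know, older Windows consoles draw character 13 as a musical note, which would match the symptom described.

```python
# Hypothetical sample: a CSV line that ends with Windows-style CR/LF.
sample = "555-0100,Some Ad\r\n"

# repr() shows the hidden carriage return explicitly.
visible = repr(sample)

# Code points of the last two characters: 13 is '\r', 10 is '\n'.
ending = [ord(ch) for ch in sample[-2:]]

# Strip both '\r' and '\n' from the end, then add one '\n' back.
cleaned = sample.rstrip("\r\n") + "\n"

print(visible)
print(ending)
```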
I have a .txt file which contains only one line of text. For example:
command1;\ncommand2, output;\ncommand3\ncommand4, output;\n (but much longer). Since it is hard to read, I want to change this file to some more readable version. I want to remove all ';' and replace '\n' with a new line.
I have a few working solutions for this problem:
For example I could remove all '\n' and use print function. Or, replace \\n with \n:
def clean_file(file):
    # read file
    with open(file) as f:
        content = f.readline()
    # get rid of ';' and '\n'
    content = content.split(';')
    for ind, val in enumerate(content):
        content[ind] = val.replace('\\n', '\n') # it can be also replace(r'\n', '\n')
    # write to file
    with open(file, 'w') as f:
        for line in content:
            f.write(line)
OUT:
command1
command2, output
command3
command4, output
And in this scenario, it works properly!
But I have no idea why it is not working when I remove replace part:
def clean_file(file):
    # read file
    with open(file) as f:
        content = f.readline()
    # get rid of ';'
    content = content.split(';')
    # write to file
    with open(file, 'w') as f:
        for line in content:
            f.write(line)
OUT:
command1\ncommand2, output\ncommand3\ncommand4, output\n
This will print everything in one line.
Can someone explain to me why I have to replace '\n' with the same value?
The file was created on Windows, and I open it there, but I run the script on Linux.
Most editors in the Windows world (starting with Notepad) require \r\n to correctly display an end of line and ignore \n alone. On Linux, on the other hand, a single \n is enough for an end of line. If you run a Python script on Windows, it will automatically replace any '\n' with \r\n at write time and, symmetrically, replace \r\n read from a file with a single \n, provided the file is opened in text mode. None of that happens on Linux.
Long story short, text files have different end of lines on Linux and Windows, and text files having \r\n are known as dos text files on Linux.
You have probably been caught by that. The only way to be sure is to open the file in binary mode and display the byte values (in hex, which is more readable for people used to ASCII codes).
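A minimal sketch of that check, assuming Python 3. The helper name show_line_endings and the sample file are illustrative only and not part of the question.

```python
import os
import tempfile

def show_line_endings(path, limit=64):
    """Return the first `limit` bytes of a file as space-separated hex."""
    with open(path, "rb") as f:          # binary mode: no newline translation
        data = f.read(limit)
    return " ".join("%02x" % b for b in data)

# Build a small sample file containing one DOS-style and one Unix-style ending.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"ab\r\ncd\n")

hex_dump = show_line_endings(sample_path)
os.remove(sample_path)
print(hex_dump)
```

In the dump, a 0d 0a pair is \r\n, a lone 0a is \n, and 5c 6e would be a literal backslash followed by the letter n.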
You are not replacing the same value; you are removing the \ before \n. When handling strings, a backslash often introduces a special character (such as newline \n, tab \t, etc.), but sometimes you want an actual backslash. To get one in Python, write \\, which stands for a single backslash.
So in your first example Python comes to \n and thinks "new line"; in your second example Python sees \\n, so the first two characters mean "a backslash", and the n is then treated and printed like a normal n.
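The difference can be checked directly. This is only a sketch with a shortened, made-up version of the line from the question:

```python
# What the file actually contains: a backslash followed by the letter n.
literal = "command1;\\ncommand2"      # same text as r"command1;\ncommand2"

# What is wanted: a real newline character in place of those two characters.
converted = literal.replace("\\n", "\n")

print(len("\\n"), len("\n"))   # two characters versus one
print(literal)                 # printed on one line
print(converted)               # printed on two lines
```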
I have always replaced \t\t with \t999999999\t using code like
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\t', '\t999999999\t'),
So I thought code like the following would work for replacing \t\r with \t999999999\r
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\r', '\t999999999\r'),
But surprisingly it doesn't.
The input is tab-delimited txt.
Is \r something special that it can't be replaced in usual way? Then how can I replace it by python?
===Question edited====
I tried this.
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\n', '\t999999999\n'),
It works!
My input was separating lines by \r\n
Perhaps Python reads \r\n just as \n ?
Perhaps that's why it worked?
Does this code work if input separates lines by \r only?
\r is (part of) a line separator. Python normalises line separators when reading files in text mode, using only \n for lines; \r and \r\n are replaced by \n when reading.
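A small sketch of this behaviour, assuming Python 3, where the newline argument of open() controls the translation. The temporary file exists only for illustration.

```python
import os
import tempfile

# Write raw bytes so the file really contains \r\n, \r and \n endings.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"a\t\r\nb\t\rc\t\n")

# Default text mode (newline=None): every line ending is read as '\n'.
with open(path) as f:
    translated = f.readlines()

# newline='': lines are still split on all three endings, but left untranslated.
with open(path, newline="") as f:
    raw = f.readlines()

os.remove(path)
print(translated)
print(raw)
```

Under that assumption, a replace on '\t\n' also matches lines that were separated by \r alone, because the \r has already become \n by the time the string is seen.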
Note: when using fileinput you need to strip the newline from line, otherwise you end up with double newlines in your output. Do that rather than using print ..,:
for line in fileinput.input(input, inplace = 1):
    line = line.replace('\t\n', '\t999999999\n')
    print line.rstrip('\n')
print .., adds an extra space to all your lines.
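For comparison, a sketch of the same in-place edit assuming Python 3, where print() takes an end argument, so neither the trailing comma nor rstrip is needed. The sample file contents are made up.

```python
import fileinput
import os
import tempfile

# Illustrative input: the second row has an empty last column.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("1\t2\n3\t\n")

# With inplace=True, anything printed is written back into the file.
with fileinput.input(path, inplace=True) as lines:
    for line in lines:
        # end="" avoids adding a second newline after the one already in `line`.
        print(line.replace("\t\n", "\t999999999\n"), end="")

with open(path) as f:
    result = f.read()
os.remove(path)
```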
I have an excel file that I converted to a text file with a list of numbers.
test = 'filelocation.txt'
in_file = open(test,'r')
for line in in_file:
    print line
1.026106236
1.660274766
2.686381002
4.346655769
7.033036771
1.137969254
a = []
for line in in_file:
    a.append(line)
print a
'1.026106236\r1.660274766\r2.686381002\r4.346655769\r7.033036771\r1.137969254'
I wanted to assign each value (in each line) to an individual element in the list. Instead it creates a single element with the values separated by \r. I'm not sure what \r is; why is it being inserted?
I think I know a way to get rid of the \r from the string but i want to fix the problem from the source
To accept any of \r, \n, \r\n as a newline you could use 'U' (universal newline) file mode:
>>> open('test_newlines.txt', 'rb').read()
'a\rb\nc\r\nd'
>>> list(open('test_newlines.txt'))
['a\rb\n', 'c\r\n', 'd']
>>> list(open('test_newlines.txt', 'U'))
['a\n', 'b\n', 'c\n', 'd']
>>> open('test_newlines.txt').readlines()
['a\rb\n', 'c\r\n', 'd']
>>> open('test_newlines.txt', 'U').readlines()
['a\n', 'b\n', 'c\n', 'd']
>>> open('test_newlines.txt').read().split()
['a', 'b', 'c', 'd']
If you want to get a numeric (float) array from the file; see Reading file string into an array (In a pythonic way)
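A minimal sketch of reading such a file into a list of floats, assuming Python 3, where text mode already accepts \r, \n and \r\n without the 'U' flag. The function name read_floats and the sample file are illustrative.

```python
import os
import tempfile

def read_floats(path):
    """Read one float per line, whatever the line ending is."""
    with open(path) as f:              # default text mode translates all endings
        return [float(line) for line in f if line.strip()]

# Illustrative file using the '\r' separators seen in the question.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"1.026106236\r1.660274766\r2.686381002")

values = read_floats(path)
os.remove(path)
print(values)
```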
Use rstrip(), or rstrip('\r') if you're sure that the last character is always \r.
for line in in_file:
    print line.rstrip()
help on str.rstrip():
S.rstrip([chars]) -> string or unicode
Return a copy of the string S with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping
str.strip() removes both trailing and leading whitespace.
You can strip the carriage returns and newlines from the line by using strip()
line.strip()
i.e.
for line in in_file:
    a.append(line.strip())
print a
To fix this do:
for line in in_file:
    a.append(line.strip())
.strip() the lines to remove the whitespace that you don't need:
lines = []
with open('filelocation.txt', 'r') as handle:
    for line in handle:
        line = line.strip()
        lines.append(line)
        print line
print lines
Also, I'd advise that you use the with ... notation to open a file. It's cleaner and closes the file automatically.
First, I generally like @J.F. Sebastian's answer, but my use case is closer to Python 2.7.1: How to Open, Edit and Close a CSV file, since my string came from a text file that was output from Excel as a CSV and was then read using the csv module. As indicated at that question:
as for the 'rU' vs 'rb' vs ..., csv files really should be binary so
use 'rb'. However, it's not uncommon to have csv files from someone who
copied it into notepad on windows and later it was joined with some
other file so you have funky line endings. How you deal with that
depends on your file and your preference. – @kalhartt Jan 23 at 3:57
I'm going to stick with reading as 'rb' as recommended in the python docs. For now, I know that the \r inside a cell is a result of quirks of how I'm using Excel, so I'll just create a global option for replacing '\r' with something else, which for now will be '\n', but later could be '' (an empty string, not a double quote) with a simple json change.
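A sketch of the same idea assuming Python 3, where the csv documentation asks for text mode with newline='' instead of 'rb'. The option name CR_REPLACEMENT and the sample data are hypothetical.

```python
import csv
import os
import tempfile

CR_REPLACEMENT = "\n"   # hypothetical option; could later be "" instead

# Illustrative CSV: the second row has a '\r' inside a quoted cell.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b'id,note\r\n1,"first\rsecond"\r\n')

# newline='' lets the csv module see the embedded '\r' untouched.
with open(path, newline="") as f:
    rows = [[cell.replace("\r", CR_REPLACEMENT) for cell in row]
            for row in csv.reader(f)]

os.remove(path)
print(rows)
```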
I have a piece of code that's removing some unwanted lines from a text file and writing the results to a new one:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt','w')
for line in f:
    if not line.startswith('W'):
        g.write('%s\n' % line)
The original file is single-spaced, but the new file has an empty line between each line of text. How can I lose the empty lines?
You're not removing the newline from the input lines, so you shouldn't be adding one (\n) on output.
Either strip the newlines off the lines you read or don't add new ones as you write it out.
Just do:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt','w')
for line in f:
    if not line.startswith('W'):
        g.write(line)
Every line that you read from original file has \n (new line) character at the end, so do not add another one (right now you are adding one, which means you actually introduce empty lines).
My guess is that the variable "line" already has a newline in it, but you're writing an additional newline with the g.write('%s\n' % line)
line has a newline at the end.
Remove the \n from your write, or rstrip line.
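The rstrip variant could look roughly like this. It is a sketch assuming Python 3; the function name copy_without and the in-memory sample text are made up so the example runs without the real files.

```python
import io

def copy_without(src, dst, prefix="W"):
    """Copy lines from src to dst, skipping lines that start with prefix."""
    for line in src:
        if not line.startswith(prefix):
            # Strip the existing line ending, then add exactly one '\n'.
            dst.write("%s\n" % line.rstrip("\r\n"))

# In-memory stand-ins for the two files; the last line has no newline.
source = io.StringIO("alpha\nWidth=3\nbeta")
target = io.StringIO()
copy_without(source, target)
cleaned = target.getvalue()
print(repr(cleaned))
```

Compared with writing line unchanged, this form also guarantees that the last line ends with a newline.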
I am trying to write Jython code to delete spaces from a text file. I have the following scenario.
I have a text file like
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
Here all lines are single lines.
But what i want is like
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
So, in all, I want to start checking from the second line until I encounter END, and delete all spaces at the end of each line.
Can someone guide me in writing this code?
I tried this:
srcfile=open('d:/BUR001.txt','r')
trgtfile=open('d:/BUR002.txt','w')
readfile=srcfile.readline()
while readfile:
    trgtfile.write(readfile.replace('\s',''))
    readfile=srcfile.readline()
srcfile.close()
trgtfile.close()
Thanks,
Mahesh
You can use the fact that those special lines start with special values:
line = srcfile.readline()
while line:
    line2 = line
    if not line2.startswith('START') and not line2.startswith('END'):
        line2 = line2.replace(' ','')
    trgtfile.write(line2)
    line = srcfile.readline()
Also note that strings returned by readline() end with \n (or are empty at the end of the input file), and that this code removes all spaces from the line, not only those at the end of the line.
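If only the trailing spaces should go, something along these lines may be closer to what was asked. It is a sketch assuming Python 3 and in-memory sample data; the function name strip_trailing_spaces is made up, and whether it runs unchanged under Jython is not verified here.

```python
import io

def strip_trailing_spaces(src, dst):
    """Remove trailing spaces from every line except START/END records."""
    for line in src:
        if line.startswith(("START", "END")):
            dst.write(line)
        else:
            # Drop the line ending first, then the trailing spaces,
            # then put a single '\n' back.
            dst.write(line.rstrip("\r\n").rstrip(" ") + "\n")

# Shortened, illustrative version of the records from the question.
source = io.StringIO(
    "STARTBUR001 20120416\n"
    "20120416MES201667 2012   \n"
    "END 0000000202\n"
)
target = io.StringIO()
strip_trailing_spaces(source, target)
stripped = target.getvalue()
print(repr(stripped))
```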
If I understood your example, all you want is to remove empty lines, so instead of reading the file line by line, read it all at once:
content = srcfile.read()
and then remove empty lines from content:
while '\n\n' in content:
    content = content.replace('\n\n', '\n')
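The same collapsing can also be done in one pass with a regular expression. This is just an alternative sketch with made-up sample text, not something the answer above requires:

```python
import re

# Illustrative content with runs of empty lines.
content = "line1\n\n\nline2\n\nline3\n"

# Replace every run of two or more newlines with a single newline.
collapsed = re.sub(r"\n{2,}", "\n", content)
print(repr(collapsed))
```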