Python code that replaces \t\t failing to replace \t\r - python

I have always replaced \t\t with \t999999999\t using code like this:
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\t', '\t999999999\t'),
So I thought the following would work for replacing \t\r with \t999999999\r:
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\r', '\t999999999\r'),
But surprisingly it doesn't.
The input is a tab-delimited text file.
Is \r something special that can't be replaced in the usual way? If so, how can I replace it in Python?
===Question edited====
I tried this.
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\n', '\t999999999\n'),
It works!
My input was separating lines by \r\n
Perhaps Python reads \r\n just as \n ?
Perhaps that's why it worked?
Does this code work if input separates lines by \r only?

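\r is (part of) a line separator. Python normalises line separators when reading files in text mode with universal newlines, using only \n for lines; \r and \r\n are replaced by \n when reading. In Python 2 that mode has to be requested with 'U' (although on Windows plain text mode already turns \r\n into \n); in Python 3 it is the default.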
Note: when using fileinput, strip the newline from each line and use a plain print rather than print ..,; otherwise it is easy to end up with doubled newlines in your output:
for line in fileinput.input(input, inplace = 1):
    line = line.replace('\t\n', '\t999999999\n')
    print line.rstrip('\n')
print .., can also add an extra space to your lines.
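The normalisation described above can be checked directly. The sketch below is Python 3 (unlike the Python 2 snippets in this thread), and the helper name read_lines and the temporary sample file are assumptions made only for illustration. It writes a small tab-delimited file with \r\n line endings, then reads it once in the default text mode and once with newline='', which turns the translation off.

```python
import os
import tempfile

def read_lines(path, newline=None):
    """Return the lines of a text file; newline=None translates line endings to LF."""
    with open(path, 'r', newline=newline) as handle:
        return list(handle)

# Hypothetical sample data: two rows with an empty last column and CRLF endings.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as raw:
    raw.write(b'a\t\r\nb\t\r\n')

translated = read_lines(sample_path)                # default: CRLF becomes LF
untranslated = read_lines(sample_path, newline='')  # line endings kept as-is
os.remove(sample_path)

# The replacement from the question matches the translated lines.
fixed = [line.replace('\t\n', '\t999999999\n') for line in translated]

print(translated)
print(untranslated)
print(fixed)
```

In the translated lines there is no \r left to match, which is consistent with the '\t\r' replacement doing nothing while the '\t\n' one works. In Python 3's default mode a lone \r separator is translated to \n in the same way.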

Related

How to remove spaces from file to make one long line?

I want to read a file and remove the spaces. I swear I've done this multiple times, but for some reason the method I used to use doesn't seem to be working. I must be making some small mistake somewhere, so I decided to make a small practice file (because the files I actually need to use are EXTREMELY LARGE) to find out.
the original file says:
abcdefg
(new line)
hijklmn
but I want it to say:
abcdefghijklmn
file = open('please work.txt', 'r')
for line in file:
    lines = line.strip()
    print(lines)
file.close()
However, it just says:
abcdefg
(new line)
hijklmn
and when I use line.strip('\n') it says:
abcdefg
(big new line)
hijklmn
Any help will be greatly appreciated, because this was the first thing I learned and suddenly I can't remember how to use it!
If what you want to do is to concatenate each line into a single line, you could utilize rstrip and concatenate to a result variable:
with open('test.txt', 'r') as fin:
    lines = ''
    for line in fin:
        stripped_line = line.rstrip()
        lines += stripped_line
print(lines)
From a text file looking like this:
abcdefg hijklmnop
this is a line
The result would be abcdefg hijklmnopthis is a line. If you also wanted to remove the spaces, you could call lines = lines.replace(' ', '') after the loop, which would result in abcdefghijklmnopthisisaline.
The (new line) in your output comes from print, which writes a \n after each call. You can use print(lines, end='') to suppress it.
strip() only removes leading and trailing whitespace.
You can use string.replace(' ', '') to remove all spaces.
'abcdefg (new line) hijklmn'.replace(' ', '')
If your file has tabs, newlines, or other forms of whitespace, the above will not work and you will need a regex to remove all of it.
import re
string = '''this is a \n
test \t\t\t
\r
\v
'''
re.sub(r'\s', '', string)
#'thisisatest'
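As an alternative to the regex, str.split() with no arguments splits on any run of whitespace, so joining the pieces back together gives the same result. This is only a sketch in Python 3; the function name join_without_whitespace is made up for the example.

```python
def join_without_whitespace(text):
    """Drop every run of whitespace (spaces, tabs, newlines) and join the rest."""
    return ''.join(text.split())

# Sample text standing in for the contents of the practice file.
sample = 'abcdefg\nhijklmn\n'
joined = join_without_whitespace(sample)
print(joined)
```

For the extremely large files mentioned in the question, the line-by-line loop with rstrip shown earlier avoids reading everything into memory first.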

Write to file does not go to the new line when '\n' in string

I have a .txt file which contains only one line of text. For example:
command1;\ncommand2, output;\ncommand3\ncommand4, output;\n (but much longer). Since it is hard to read, I want to change this file to some more readable version. I want to remove all ';' and replace '\n' with a new line.
I have few working solutions for this problem:
For example I could remove all '\n' and use print function. Or, replace \\n with \n:
def clean_file(file):
    # read file
    with open(file) as f:
        content = f.readline()
    # get rid of ';' and '\n'
    content = content.split(';')
    for ind, val in enumerate(content):
        content[ind] = val.replace('\\n', '\n')  # it can be also replace(r'\n', '\n')
    # write to file
    with open(file, 'w') as f:
        for line in content:
            f.write(line)
OUT:
command1
command2, output
command3
command4, output
And in this scenario, it works properly!
But I have no idea why it is not working when I remove replace part:
def clean_file(file):
    # read file
    with open(file) as f:
        content = f.readline()
    # get rid of ';'
    content = content.split(';')
    # write to file
    with open(file, 'w') as f:
        for line in content:
            f.write(line)
OUT:
command1\ncommand2, output\ncommand3\ncommand4, output\n
This will print everything in one line.
Can someone explain to me why I have to replace '\n' with the same value?
The file was created, and I am opening it on windows, but the script I am running on Linux.
Most editors in the Windows world (starting with Notepad) require \r\n to correctly display an end of line and ignore \n alone. On Linux, on the other hand, a single \n is enough for an end of line. If you run a Python script on Windows, it will automatically replace any '\n' with \r\n at write time and, symmetrically, replace \r\n from a file with a single \n, provided the file is opened in text mode. None of that happens on Linux.
Long story short, text files have different line endings on Linux and Windows, and text files containing \r\n are known as DOS text files on Linux.
You have probably been caught by that, and the only way to be sure is to open the file in binary mode and display the byte values (in hex, to be more readable for people used to ASCII codes).
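A minimal Python 3 sketch of that check follows. The helper name hex_dump and the sample bytes are assumptions for illustration only; in practice you would point it at your own file.

```python
import os
import tempfile

def hex_dump(path, limit=64):
    """Return the first `limit` bytes of a file as space-separated hex pairs."""
    with open(path, 'rb') as handle:
        data = handle.read(limit)
    return ' '.join('{:02x}'.format(byte) for byte in data)

# Hypothetical sample: a real CRLF line ending followed by a literal backslash + n.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as raw:
    raw.write(b'ab\r\ncd\\n')

dump = hex_dump(sample_path)
os.remove(sample_path)
print(dump)
```

In the dump, 0d 0a is a \r\n line ending, a lone 0a is \n, and 5c 6e is a literal backslash followed by the letter n rather than a newline.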
You are not replacing the same value: you are replacing the two characters \ and n with a single newline character. In string literals a backslash usually introduces a special character (such as newline \n, tab \t, etc.), BUT sometimes you want an actual backslash! To get one in Python we write \\, which produces a single backslash.
So in your first example, Python comes across \n in the data being written and outputs a new line; in your second example the text still contains \\n, so the first two \ stand for one literal backslash, and the n is treated and written like a normal n.
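The difference is easy to see by comparing the two strings directly. This is just an illustrative Python 3 sketch with made-up variable names:

```python
# What the file actually contains: a backslash followed by the letter n.
escaped = 'command1;\\ncommand2'
# What the asker wants: a single newline character.
real = 'command1;\ncommand2'

print(len(escaped), len(real))               # the escaped form is one character longer
print(escaped.replace('\\n', '\n') == real)  # the replace turns one into the other
print(r'\n' == '\\n')                        # raw string and doubled backslash agree
```

This is why the replace call is not a no-op: its first argument is two characters long and its second is one.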

Music note appended to newlines Python

Specifically I have exported a csv file from Google Adwords.
I read the file line by line and change the phone numbers.
Here is the literal script:
for line in open('ads.csv', 'r'):
    newdata = changeNums(line)
    sys.stdout.write(newdata)
And changeNums() just performs some string replaces and returns the string.
The problem is at the end of the printed newlines is a musical note.
The original CSV does not have this note at the end of lines. Also, I cannot copy-paste the note.
Is this some kind of encoding issue or what's going on?
Try opening with universal line support:
for line in open('ads.csv', 'rU'):
    # etc
Either:
the original file has some characters in it (and they're being shown as this symbol in the terminal)
changeNums is creating those characters
stdout.write is sending some uninterpreted newline symbol that, again, is being shown by the terminal as this symbol; change this line to print(newdata)
My guess: changeNums is adding it.
Best debugging commands:
print([ord(x) for x in line])
print([ord(x) for x in newdata])
print line == newdata
And check for the character values present in the string.
You can strip out the newlines by:
for line in open('ads.csv', 'r'):
    line = line.rstrip('\n')
    newdata = changeNums(line)
    sys.stdout.write(newdata)
An odd "note" character at the end is usually a CR/LF newline issue between *nix and *dos/*win environments.
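If a stray carriage return is the culprit, it shows up as character code 13 just before the newline. The sketch below (Python 3; the sample line and the name strip_line_ending are invented for the example) shows how to spot it and strip it without touching other whitespace.

```python
def strip_line_ending(line):
    """Remove a trailing LF, CRLF or CR without touching other whitespace."""
    return line.rstrip('\r\n')

# Hypothetical CSV line as it might arrive with a Windows-style ending.
raw_line = 'name,555-0100\r\n'
tail_codes = [ord(ch) for ch in raw_line[-2:]]
cleaned = strip_line_ending(raw_line)

print(repr(raw_line))
print(tail_codes)
print(repr(cleaned))
```

After stripping, the script can add back whatever line ending the target environment expects when writing the line out.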

Python . How to get rid of '\r' in string?

I have an excel file that I converted to a text file with a list of numbers.
test = 'filelocation.txt'
in_file = open(test,'r')
for line in in_file:
    print line
1.026106236
1.660274766
2.686381002
4.346655769
7.033036771
1.137969254
a = []
for line in in_file:
    a.append(line)
print a
'1.026106236\r1.660274766\r2.686381002\r4.346655769\r7.033036771\r1.137969254'
I wanted to assign each value (in each line) to an individual element in the list. Instead it is creating one element with the values separated by \r. I'm not sure what \r is, but why is it being put in there?
I think I know a way to get rid of the \r in the string, but I want to fix the problem at the source.
To accept any of \r, \n, \r\n as a newline you could use the 'U' (universal newline) file mode:
>>> open('test_newlines.txt', 'rb').read()
'a\rb\nc\r\nd'
>>> list(open('test_newlines.txt'))
['a\rb\n', 'c\r\n', 'd']
>>> list(open('test_newlines.txt', 'U'))
['a\n', 'b\n', 'c\n', 'd']
>>> open('test_newlines.txt').readlines()
['a\rb\n', 'c\r\n', 'd']
>>> open('test_newlines.txt', 'U').readlines()
['a\n', 'b\n', 'c\n', 'd']
>>> open('test_newlines.txt').read().split()
['a', 'b', 'c', 'd']
If you want to get a numeric (float) array from the file; see Reading file string into an array (In a pythonic way)
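If the text has already been read as one string, str.splitlines() also treats \r, \n and \r\n as line boundaries, so it can stand in for universal newline mode. A short Python 3 sketch, using a few of the values from the question as sample data:

```python
# Sample string shaped like the single element the asker ended up with.
raw_text = '1.026106236\r1.660274766\r2.686381002'

# splitlines() breaks on CR, LF and CRLF and drops the separators.
values = [float(item) for item in raw_text.splitlines()]
print(values)
```

Each number then lands in its own list element, which is what the question was after.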
Use rstrip(), or rstrip('\r') if you're sure that the last character is always \r.
for line in in_file:
    print line.rstrip()
help on str.rstrip():
S.rstrip([chars]) -> string or unicode
Return a copy of the string S with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping
str.strip() removes both trailing and leading whitespaces.
You can strip the carriage returns and newlines from the line by using strip()
line.strip()
i.e.
for line in in_file:
    a.append(line.strip())
print a
To fix this do:
for line in in_file:
    a.append(line.strip())
.strip() the lines to remove the whitespace that you don't need:
lines = []
with open('filelocation.txt', 'r') as handle:
    for line in handle:
        line = line.strip()
        lines.append(line)
        print line
print lines
Also, I'd advise that you use the with ... notation to open a file. It's cleaner and closes the file automatically.
First, I generally like @J.F. Sebastian's answer, but my use case is closer to Python 2.7.1: How to Open, Edit and Close a CSV file, since my string came from a text file that was output from Excel as a csv and was then read using the csv module. As indicated at that question:
as for the 'rU' vs 'rb' vs ..., csv files really should be binary so
use 'rb'. However, its not uncommon to have csv files from someone who
copied it into notepad on windows and later it was joined with some
other file so you have funky line endings. How you deal with that
depends on your file and your preference. – @kalhartt Jan 23 at 3:57
I'm going to stick with reading as 'rb' as recommended in the python docs. For now, I know that the \r inside a cell is a result of quirks of how I'm using Excel, so I'll just create a global option for replacing '\r' with something else, which for now will be '\n', but later could be '' (an empty string, not a double quote) with a simple json change.

Writing to file with unwanted empty lines

I have a piece of code that's removing some unwanted lines from a text file and writing the results to a new one:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt','w')
for line in f:
    if not line.startswith('W'):
        g.write('%s\n' % line)
The original file is single-spaced, but the new file has an empty line between each line of text. How can I lose the empty lines?
You're not removing the newline from the input lines, so you shouldn't be adding one (\n) on output.
Either strip the newlines off the lines you read or don't add new ones as you write it out.
Just do:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt','w')
for line in f:
    if not line.startswith('W'):
        g.write(line)
Every line that you read from original file has \n (new line) character at the end, so do not add another one (right now you are adding one, which means you actually introduce empty lines).
My guess is that the variable "line" already has a newline in it, but you're writing an additional newline with g.write('%s\n' % line).
line has a newline at the end.
Remove the \n from your write, or rstrip line.
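The point made in these answers can be tried without touching real files. The sketch below is Python 3 and uses in-memory io.StringIO objects in place of the two text files; the function name copy_without_w_lines and the sample lines are made up for the example.

```python
import io

def copy_without_w_lines(source, target):
    """Copy lines from source to target, skipping those that start with 'W'."""
    for line in source:
        if not line.startswith('W'):
            target.write(line)  # line already ends with '\n', so add nothing

# In-memory stand-ins for messyParamsList.txt and cleanerParamsList.txt.
source = io.StringIO('alpha\nWarning\nbeta\n')
target = io.StringIO()
copy_without_w_lines(source, target)

result = target.getvalue()
print(repr(result))
```

Writing '%s\n' % line instead would put two newline characters after every kept line, which is the empty line seen in the question.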
