Python: How to get rid of '\r' in a string?

I have an excel file that I converted to a text file with a list of numbers.
test = 'filelocation.txt'
in_file = open(test, 'r')
for line in in_file:
    print line
1.026106236
1.660274766
2.686381002
4.346655769
7.033036771
1.137969254
a = []
for line in in_file:
    a.append(line)
print a
'1.026106236\r1.660274766\r2.686381002\r4.346655769\r7.033036771\r1.137969254'
I wanted to assign each value (one per line) to its own element in the list. Instead, I get a single element with the values separated by \r. I'm not sure what \r is, but why is it being put into the string?
I think I know a way to get rid of the \r in the string, but I want to fix the problem at the source.

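To accept any of \r, \n, or \r\n as a newline, you could use the 'U' (universal newline) file mode in Python 2 (Python 3 text mode does this by default, and the 'U' flag was removed in Python 3.11):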
>>> open('test_newlines.txt', 'rb').read()
'a\rb\nc\r\nd'
>>> list(open('test_newlines.txt'))
['a\rb\n', 'c\r\n', 'd']
>>> list(open('test_newlines.txt', 'U'))
['a\n', 'b\n', 'c\n', 'd']
>>> open('test_newlines.txt').readlines()
['a\rb\n', 'c\r\n', 'd']
>>> open('test_newlines.txt', 'U').readlines()
['a\n', 'b\n', 'c\n', 'd']
>>> open('test_newlines.txt').read().split()
['a', 'b', 'c', 'd']
If you want to get a numeric (float) array from the file, see "Reading file string into an array (In a pythonic way)".
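In Python 3 the same translation happens in the default text mode. The following is a minimal sketch, assuming one number per line; read_floats and the temporary demo file are illustrative names, not part of the original question.

```python
import os
import tempfile


def read_floats(path):
    """Read one float per line; newline=None (the default) accepts \\r, \\n and \\r\\n."""
    with open(path, 'r', newline=None) as handle:
        # Skip blank lines; float() tolerates the trailing '\n'.
        return [float(line) for line in handle if line.strip()]


# Build a small demo file that uses bare '\r' separators, as in the question.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as handle:
    handle.write(b'1.026106236\r1.660274766\r2.686381002')

values = read_floats(demo_path)
os.remove(demo_path)
print(values)
```

Writing the demo file in binary mode keeps the '\r' separators intact, so the read step really exercises the newline translation.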

Use rstrip(), or rstrip('\r') if you're sure that the last character is always \r.
for line in in_file:
    print line.rstrip()
Help on str.rstrip():
S.rstrip([chars]) -> string or unicode
Return a copy of the string S with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping
str.strip() removes both leading and trailing whitespace.
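A short sketch of how the chars argument behaves, using made-up sample strings. Note that rstrip('\r') only helps when '\r' really is the last character:

```python
line = '1.026106236\r\n'

# rstrip('\r') stops at the final '\n', so nothing is removed here.
only_cr = line.rstrip('\r')

# Passing both characters removes any trailing run of '\r' and '\n'.
cr_and_lf = line.rstrip('\r\n')

# strip() with no argument removes whitespace on both sides.
both_sides = ' 1.5 \r\n'.strip()

print(repr(only_cr), repr(cr_and_lf), repr(both_sides))
```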

You can strip the carriage returns and newlines from the line by using strip()
line.strip()
i.e.
for line in in_file:
    a.append(line.strip())
print a

To fix this do:
for line in in_file:
    a.append(line.strip())

.strip() the lines to remove the whitespace that you don't need:
lines = []
with open('filelocation.txt', 'r') as handle:
    for line in handle:
        line = line.strip()
        lines.append(line)
        print line
print lines
Also, I'd advise that you use the with ... notation to open a file. It's cleaner and closes the file automatically.

First, I generally like @J.F. Sebastian's answer, but my use case is closer to "Python 2.7.1: How to Open, Edit and Close a CSV file", since my string came from a text file that was output from Excel as a CSV and was then read in using the csv module. As indicated at that question:
as for the 'rU' vs 'rb' vs ..., csv files really should be binary so
use 'rb'. However, it's not uncommon to have csv files from someone who
copied it into Notepad on Windows and later it was joined with some
other file so you have funky line endings. How you deal with that
depends on your file and your preference. – @kalhartt Jan 23 at 3:57
I'm going to stick with reading as 'rb' as recommended in the python docs. For now, I know that the \r inside a cell is a result of quirks of how I'm using Excel, so I'll just create a global option for replacing '\r' with something else, which for now will be '\n', but later could be '' (an empty string, not a double quote) with a simple json change.
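A rough Python 3 sketch of that replacement step. CR_REPLACEMENT and clean_rows are hypothetical names, and the in-memory sample stands in for a file opened with newline='' (the Python 3 counterpart of the 'rb' advice for Python 2):

```python
import csv
import io

# Hypothetical option; in the setup described above it would come from JSON.
CR_REPLACEMENT = '\n'


def clean_rows(handle, replacement=CR_REPLACEMENT):
    """Parse CSV rows and replace carriage returns left inside cells."""
    return [[cell.replace('\r', replacement) for cell in row]
            for row in csv.reader(handle)]


# The second record has a bare '\r' inside a quoted cell.
sample = io.StringIO('id,note\r\n1,"first\rsecond"\r\n', newline='')
rows = clean_rows(sample)
print(rows)
```

Switching the replacement to an empty string later only requires changing the option value, not the parsing code.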

Related

How to remove spaces from file to make one long line?

I want to read a file and remove the spaces. I swear I've done this multiple times, but for some reason the method I used to use doesn't seem to be working. I must be making some small mistake somewhere, so I decided to make a small practice file (because the files I actually need to use are EXTREMELY LARGE) to find out.
the original file says:
abcdefg
(new line)
hijklmn
but I want it to say:
abcdefghijklmn
file = open('please work.txt', 'r')
for line in file:
    lines = line.strip()
    print(lines)
file.close()
However, it just says:
abcdefg
(new line)
hijklmn
and when I use line.strip('\n') it says:
abcdefg
(big new line)
hijklmn
Any help will be greatly appreciated, because this was the first thing I learned and suddenly I can't remember how to use it!
If what you want to do is to concatenate each line into a single line, you could utilize rstrip and concatenate to a result variable:
with open('test.txt', 'r') as fin:
    lines = ''
    for line in fin:
        stripped_line = line.rstrip()
        lines += stripped_line
print(lines)
From a text file looking like this:
abcdefg hijklmnop
this is a line
The result would be abcdefg hijklmnopthis is a line. If you also want to remove the spaces, you could call lines = lines.replace(' ', '') after the loop, which would result in abcdefghijklmnopthisisaline.
The (new line) in your output comes from print, which appends a \n after each call. You can use print(lines, end='') to suppress it.
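For very large files, repeated += on a string can be slow. As an alternative sketch (join_lines is an illustrative name, and the in-memory sample stands in for the real file), the pieces can be joined in one pass:

```python
import io


def join_lines(handle):
    """Concatenate all lines, dropping only the line-ending characters."""
    return ''.join(line.rstrip('\r\n') for line in handle)


# In-memory stand-in for open('test.txt'); a real file object works the same way.
sample = io.StringIO('abcdefg\nhijklmn\n')
joined = join_lines(sample)
print(joined)
```

The generator reads one line at a time, so only the final joined string is held in memory in addition to the current line.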
strip() only removes leading and trailing whitespace.
You can use string.replace(' ', '') to remove all spaces.
'abcdefg (new line) hijklmn'.replace(' ', '')
If your file contains tabs, newlines, or other kinds of whitespace, the above will not work, and you will need a regex to remove every kind of whitespace in the file.
import re
string = '''this is a \n
test \t\t\t
\r
\v
'''
re.sub(r'\s', '', string)
#'thisisatest'

Python code that replaces \t\t failing to replace \t\r

I have always replaced \t\t with \t999999999\t using code like
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\t', '\t999999999\t'),
So I thought that code like the following would work for replacing \t\r with \t999999999\r
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\r', '\t999999999\r'),
But surprisingly it doesn't.
The input is tab-delimited txt.
Is \r something special that it can't be replaced in usual way? Then how can I replace it by python?
===Question edited====
I tried this.
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\n', '\t999999999\n'),
It works!
My input was separating lines by \r\n
Perhaps Python reads \r\n just as \n ?
Perhaps that's why it worked?
Does this code work if input separates lines by \r only?
\r is (part of) a line separator. Python normalises line separators when reading files in text mode, using only \n for lines; \r and \r\n are replaced by \n when reading.
Note: when using fileinput, strip the newline from each line before printing it; otherwise you end up with double newlines in your output. This is preferable to using print ..., with a trailing comma:
for line in fileinput.input(input, inplace = 1):
    line = line.replace('\t\n', '\t999999999\n')
    print line.rstrip('\n')
print ..., with a trailing comma adds an extra space to all your lines.
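The normalisation described in this answer can be observed directly. This is a small Python 3 sketch using a temporary file with made-up contents; the file is read once in text mode and once in binary mode:

```python
import os
import tempfile

# Write a tab-delimited sample with Windows-style '\r\n' line endings.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as handle:
    handle.write(b'1\t\r\n2\t3\r\n')

with open(path, 'r') as handle:    # text mode: '\r\n' arrives as '\n'
    text_lines = handle.readlines()
with open(path, 'rb') as handle:   # binary mode: bytes are left untouched
    raw_lines = handle.readlines()
os.remove(path)

# Because text mode never shows the '\r', the pattern to match is '\t\n'.
fixed = [line.replace('\t\n', '\t999999999\n') for line in text_lines]
print(text_lines, raw_lines, fixed)
```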

cropping off characters in python

I am new to Python and I have a .txt file containing numbers and I read them into an array in Python with the code below:
numberInput = []
with open('input.txt') as file:
    numberInput = file.readlines()
print numberInput
Unfortunately, the output looks like this:
['54044\r\n', '14108\r\n', '79294\r\n', '29649\r\n', '25260\r\n', '60660\r\n', '2995\r\n', '53777\r\n', '49689\r\n', '9083\r\n', '16122\r\n', '90436\r\n', '4615\r\n', '40660\r\n', '25675\r\n', '58943\r\n', '92904\r\n', '9900\r\n', '95588\r\n', '46120']
How do I crop off the \r\n characters attached to each number in the array?
The \r\n you're seeing at the end of the strings is the newline indicator (a carriage return character followed by a newline character). You can easily remove it using str.strip:
numberInput = [line.strip() for line in file]
This is a list comprehension that iterates over your file (one line at a time) and strips off any whitespace found at either end of the line.
If you want to use the numbers from the file as integers, though, you can skip stripping the lines, since the int constructor ignores surrounding whitespace. Here's how it would look if you did the conversion directly:
numberInput = [int(line) for line in file]
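Both variants can be checked without a file. A small sketch, using a few of the values from the question as an in-memory list:

```python
raw_lines = ['54044\r\n', '14108\r\n', '46120']

# strip() removes the trailing '\r\n' and leaves the digits as text.
stripped = [line.strip() for line in raw_lines]

# int() ignores leading and trailing whitespace, including '\r\n'.
numbers = [int(line) for line in raw_lines]

print(stripped, numbers)
```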
You should use str.splitlines() instead of readlines():
numberInput = []
with open('input.txt') as file:
    numberInput = file.read().splitlines()
print numberInput
This reads the whole file and splits it on "universal newlines", so you get the same list without the \r\n.
See this question:
Best method for reading newline delimited files in Python and discarding the newlines?
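A brief sketch of splitlines() on an in-memory string (sample values taken from the question), including the keepends flag for comparison:

```python
text = '54044\r\n14108\r\n46120'

# splitlines() treats '\r\n', '\r' and '\n' as line boundaries and drops them.
numbers_as_text = text.splitlines()

# keepends=True keeps the original line endings attached to each piece.
with_ends = text.splitlines(keepends=True)

print(numbers_as_text, with_ends)
```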

Write a list to file containing text and hex values. How?

I need to write a list of values to a text file. Because of Windows, when I need to write a line feed character, Windows writes \r\n and other systems write \n.
It occurred to me that maybe I should write to the file in binary.
How do I create a list like the following example and write it to a file in binary?
output = ['my first line', hex_character_for_line_feed_here, 'my_second_line']
How come the following does not work?
output = ['my first line', '\x0a', 'my second line']
Don't. Open the file in text mode and just let Python handle the newlines for you.
When you use the open() function you can set how Python should handle newlines with the newline keyword parameter:
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
So the default method is to write the correct line separator for your platform:
with open(outputfilename, 'w') as outputfile:
    outputfile.write('\n'.join(output))
This does the right thing: on Windows, \r\n is saved instead of \n.
If you specifically want to write \n only and not have Python translate these for you, use newline='':
with open(outputfilename, 'w', newline='') as outputfile:
    outputfile.write('\n'.join(output))
Note that '\x0a' is exactly the same character as \n; \r is \x0d:
>>> '\x0a'
'\n'
>>> '\x0d'
'\r'
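The effect of the newline parameter can be verified by reading the written file back in binary mode. A minimal Python 3 sketch; written_bytes is an illustrative helper and the temporary file is only for the demonstration:

```python
import os
import tempfile

output = ['my first line', 'my second line']


def written_bytes(newline):
    """Write the joined lines with the given newline setting; return the raw bytes."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w', newline=newline) as handle:
            handle.write('\n'.join(output))
        with open(path, 'rb') as handle:
            return handle.read()
    finally:
        os.remove(path)


untranslated = written_bytes('')    # '\n' is written as-is on every platform
forced_crlf = written_bytes('\r\n')  # every '\n' becomes '\r\n'
print(untranslated, forced_crlf)
```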
Create a text file, "myTextFile" in the same directory as your Python script. Then write something like:
# 'wb' opens the file in "write binary" mode
output = ['my first line', '369as3', 'my_second_line']
with open("myTextFile.txt", 'wb') as myTextFile:
    for member in output:
        # Encode each line plus its line feed to bytes (or use whatever encoding you like)
        myTextFile.write((member + "\n").encode("utf-8"))
This outputs a binary text file that looks like:
my first line
369as3
my_second_line
Edit: Updated for Python 3

Music note appended to newlines Python

Specifically I have exported a csv file from Google Adwords.
I read the file line by line and change the phone numbers.
Here is the literal script:
for line in open('ads.csv', 'r'):
    newdata = changeNums(line)
    sys.stdout.write(newdata)
And changeNums() just performs some string replaces and returns the string.
The problem is at the end of the printed newlines is a musical note.
The original CSV does not have this note at the end of lines. Also, I cannot copy-paste the note.
Is this some kind of encoding issue or what's going on?
Try opening with universal line support:
for line in open('ads.csv', 'rU'):
    # etc
Either:
the original file has some extra characters in it (and they're being shown as this symbol in the terminal)
changeNums is creating those characters
stdout.write is sending some uninterpreted newline character that the terminal again shows as this symbol; try changing this line to print(newdata)
My guess: changeNums is adding it.
Best debugging commands:
print([ord(x) for x in line])
print([ord(x) for x in newdata])
print(line == newdata)
And check for the character values present in the string.
You can strip out the newlines by:
for line in open('ads.csv', 'r'):
    line = line.rstrip('\n')
    newdata = changeNums(line)
    sys.stdout.write(newdata)
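An odd "note" character at the end of a line is usually a CR/LF newline mismatch between *nix and DOS/Windows environments: in the Windows console's code page 437, a stray carriage return (character 13) is drawn as a musical note.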
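The character-code check suggested in the answers can be tried on a sample string. This is a sketch with a made-up line; a real line from ads.csv would be inspected the same way:

```python
line = 'Call 555-0100\r\n'

# Inspect the last two characters: 13 is carriage return, 10 is line feed.
tail_codes = [ord(ch) for ch in line[-2:]]

# Removing both characters leaves the visible text only.
cleaned = line.rstrip('\r\n')

print(tail_codes, repr(cleaned))
```

If 13 shows up in the codes, the stray symbol is most likely a carriage return that the terminal is rendering as a glyph.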
