cropping off characters in python - python

I am new to Python. I have a .txt file containing numbers, and I read them into a list with the code below:
numberInput = []
with open('input.txt') as file:
    numberInput = file.readlines()
print(numberInput)
Unfortunately, the output looks like this:
['54044\r\n', '14108\r\n', '79294\r\n', '29649\r\n', '25260\r\n', '60660\r\n', '2995\r\n', '53777\r\n', '49689\r\n', '9083\r\n', '16122\r\n', '90436\r\n', '4615\r\n', '40660\r\n', '25675\r\n', '58943\r\n', '92904\r\n', '9900\r\n', '95588\r\n', '46120']
How do I crop off the \r\n characters attached to each number in the array?

The \r\n you're seeing at the end of the strings is the newline indicator (a carriage return character followed by a newline character). You can easily remove it using str.strip:
numberInput = [line.strip() for line in file]
This is a list comprehension that iterates over your file (one line at a time) and strips off any whitespace found at either end of the line.
If you want to use the numbers from the file as integers, though, you can skip stripping the lines altogether, since the int constructor ignores leading and trailing whitespace. Here's how it would look if you did the conversion directly:
numberInput = [int(line) for line in file]
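As a self-contained sketch of the integer version: `io.StringIO` stands in for `input.txt` here (an assumption made only so the example runs without a real file); with a real file you would iterate over the `file` object inside the `with open(...)` block instead.

```python
import io

# Stand-in for input.txt: Windows-style '\r\n' line endings, as in the question.
sample = io.StringIO("54044\r\n14108\r\n79294\r\n2995\r\n46120")

# int() ignores leading and trailing whitespace, including '\r' and '\n',
# so no explicit strip() is needed before the conversion.
numberInput = [int(line) for line in sample]
print(numberInput)  # [54044, 14108, 79294, 2995, 46120]
```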

You should use str.splitlines() instead of readlines():
numberInput = []
with open('input.txt') as file:
    numberInput = file.read().splitlines()
print(numberInput)
This reads the whole file and splits it on "universal newlines", so you get the same list without the \r\n.
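A small comparison of str.splitlines() with a naive split('\n') on a string shaped like the question's data (the sample string is illustrative):

```python
text = "54044\r\n14108\r\n2995"

# splitlines() treats '\r\n' (and a lone '\r' or '\n') as a single line boundary
lines = text.splitlines()
print(lines)  # ['54044', '14108', '2995']

# split('\n') only breaks on '\n', so the '\r' stays attached to each number
naive = text.split("\n")
print(naive)  # ['54044\r', '14108\r', '2995']
```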
See this question:
Best method for reading newline delimited files in Python and discarding the newlines?

Related

CSV.Reader importing a list of lists

I am running the following on a csv of UIDs:
import csv
with open('C:/uid_sample.csv', newline='') as f:
    reader = csv.reader(f, delimiter=' ')
    uidlist = list(reader)
but the list returned is actually a list of lists:
[['27465307'], ['27459855'], ['27451353']...]
I'm using this workaround to get individual strings within one list:
for r in reader:
    print(' '.join(r))
i.e.
['27465307','27459855','27451353',...]
Am I missing a way to do this automatically with csv.reader, or is there perhaps an issue with the formatting of my CSV?
A CSV file is a file where each line, or row, contains columns that are usually delimited by commas. In your case, you told csv.reader() that your columns are delimited by a space. Since there aren't any spaces in any of the lines, each row of the csv.reader object has only one item. The problem here is that you aren't looking for a row with a single column; you are looking for a single item.
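If you still want to go through csv.reader, one option is to take the first (and only) column of each row. This is a sketch: `io.StringIO` stands in for `uid_sample.csv`, and the sample contents are just the three UIDs shown in the question.

```python
import csv
import io

# Stand-in for uid_sample.csv: one UID per line, no delimiters.
sample = io.StringIO("27465307\n27459855\n27451353\n", newline="")

# Each row from csv.reader is a list of columns; taking column 0 of every
# non-empty row gives a flat list of strings instead of one-item lists.
uidlist = [row[0] for row in csv.reader(sample) if row]
print(uidlist)  # ['27465307', '27459855', '27451353']
```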
Really, you just want a list of the lines in the file. You could use f.readlines(), but that would include the newline character in each line. That actually isn't a problem if all you need to do with each line is convert it to an integer, but you might want to remove those characters. That can be done quite easily with a list comprehension:
newlist = [line.strip() for line in f]
If you are merely iterating through the lines (with a for loop, for example), you probably don't need a list. If you don't mind the newline characters, you can iterate through the file object directly:
for line in f:
    uid = int(line)
    print(uid)
If the newline characters need to go, you could either take them out per line:
for line in f:
    line = line.strip()
    ...
or create a generator object:
uids = (line.strip() for line in f)
Note that reading a file is like reading a book: you can't read it again until you turn back to the first page, so remember to use f.seek(0) if you want to read the file more than once.
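The "turn back to the first page" behaviour can be seen directly with an in-memory file (`io.StringIO` is used as a stand-in for a real file object; the UIDs are the ones from the question):

```python
import io

f = io.StringIO("27465307\n27459855\n27451353\n")

first_pass = [line.strip() for line in f]
second_pass = [line.strip() for line in f]  # already at the end: nothing left

f.seek(0)  # rewind to the start of the file
third_pass = [line.strip() for line in f]

print(len(first_pass), len(second_pass), len(third_pass))  # 3 0 3
```

The same applies to a generator such as `uids` above: once consumed, it yields nothing more.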

List the first words per line from a text file in Python

I need to take the first word on each line of a text file and make a list of them.
I would copy the text here, but its formatting is quite messed up. I will try.
All the other text is unnecessary.
I have tried
string = []
for line in f:
    string.append(line.split(None, 1)[0])  # add only first word
which I took from another solution, but it keeps raising an IndexError ("list index out of range").
I can get the first word from the first line using string=text.partition(' ')[0]
but I do not know how to repeat this for the other lines.
I am still new to Python and to the site; I hope my formatting is bearable! (When opening the file, I set the encoding so that it accepts symbols, like so:
wikitxt = open('racinesPrefixesSuffixes.txt', 'r', encoding='utf-8')
Could this be the issue?)
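It raises an IndexError because some lines are empty: for an empty or whitespace-only line, split() returns an empty list, so indexing it with [0] fails. The utf-8 encoding is not the cause.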
You can do this:
words = []
for line in f:
    if line.strip():
        words.append(line.split(maxsplit=1)[0])
Here, line.strip() returns an empty (and therefore false) string when the line consists only of whitespace, so such lines are simply skipped.
Or, if you like list comprehension:
words = [line.split(maxsplit=1)[0] for line in f if line.strip()]
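A runnable sketch of the list comprehension, using a hypothetical in-memory stand-in for the file that mixes in an empty line and a whitespace-only line (note that `split(maxsplit=1)` with a keyword argument requires Python 3):

```python
import io

# Hypothetical stand-in for the opened file, including an empty line and
# a whitespace-only line: the kind of lines that caused the IndexError.
f = io.StringIO("alpha first line\n\n   \nbeta second line\ngamma third line\n")

# Skip lines that strip() to an empty string; split the rest at most once.
words = [line.split(maxsplit=1)[0] for line in f if line.strip()]
print(words)  # ['alpha', 'beta', 'gamma']

# Why the unfiltered version fails: a blank line splits into an empty list.
print("\n".split())  # []
```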

Keep the new line symbols in the string when writing in a text file Python

I have a list of strings, and some of the strings contain '\n's. I want to write this list of strings into a text file and then later on read it back and store it to a list by using readlines(). I have to keep the original text; meaning not removing the new lines from the text.
If I don't remove all these new lines then of course readlines() will return a larger number of strings than the original list.
How can I achieve this? Or is there really no way, so that I should use another format instead? Thanks.
The following:
from __future__ import print_function
strings = ["asd", "sdf\n", "dfg"]
with open("output.txt", "w") as out_file:
    for string in strings:
        print(repr(string), file=out_file)
with open("output.txt") as in_file:
    for line in in_file:
        print(line.strip())
prints
'asd'
'sdf\n'
'dfg'
To print it normally (without the quotes), you can use ast.literal_eval (after import ast): print(ast.literal_eval(line.strip()))
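Putting both halves together, a full round trip might look like this sketch (Python 3; the temporary directory is only there so the example doesn't leave `output.txt` behind):

```python
import ast
import os
import tempfile

strings = ["asd", "sdf\n", "dfg"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.txt")

    # repr() escapes the embedded newline, so each string occupies one line.
    with open(path, "w") as out_file:
        for string in strings:
            print(repr(string), file=out_file)

    # literal_eval() turns each repr back into the original string,
    # including the real '\n' inside "sdf\n".
    with open(path) as in_file:
        restored = [ast.literal_eval(line.strip()) for line in in_file.readlines()]

print(restored == strings)  # True
```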

Python: How to get rid of '\r' in a string?

I have an excel file that I converted to a text file with a list of numbers.
test = 'filelocation.txt'
in_file = open(test, 'r')
for line in in_file:
    print(line)
1.026106236
1.660274766
2.686381002
4.346655769
7.033036771
1.137969254
a = []
for line in in_file:
    a.append(line)
print(a)
'1.026106236\r1.660274766\r2.686381002\r4.346655769\r7.033036771\r1.137969254'
I wanted to assign each value (each line) to an individual element of the list. Instead, it creates one element separated by \r. I'm not sure what \r is, but why is it being put into the string?
I think I know a way to get rid of the \r in the string, but I want to fix the problem at the source.
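To accept any of \r, \n, \r\n as a newline, you could use 'U' (universal newline) file mode. This applies to Python 2; in Python 3, text mode uses universal newlines by default, and the 'U' flag was removed in Python 3.11: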
>>> open('test_newlines.txt', 'rb').read()
'a\rb\nc\r\nd'
>>> list(open('test_newlines.txt'))
['a\rb\n', 'c\r\n', 'd']
>>> list(open('test_newlines.txt', 'U'))
['a\n', 'b\n', 'c\n', 'd']
>>> open('test_newlines.txt').readlines()
['a\rb\n', 'c\r\n', 'd']
>>> open('test_newlines.txt', 'U').readlines()
['a\n', 'b\n', 'c\n', 'd']
>>> open('test_newlines.txt').read().split()
['a', 'b', 'c', 'd']
If you want to get a numeric (float) array from the file, see Reading file string into an array (In a pythonic way).
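In Python 3 no special mode is needed: opening the file in text mode with the default `newline=None` already recognises a lone '\r' as a line ending. A sketch that recreates a '\r'-separated file like the one in the question (the file name and values are taken from the question; the temporary directory is only for the demo) and reads it as floats:

```python
import os
import tempfile

def read_floats(path):
    # Text mode with the default newline=None enables universal newlines,
    # so '\r', '\n' and '\r\n' all end a line and are translated to '\n'.
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filelocation.txt")
    # Write bare '\r' line endings, like the Excel-exported file in the question.
    with open(path, "wb") as out:
        out.write(b"1.026106236\r1.660274766\r2.686381002")
    values = read_floats(path)

print(values)  # [1.026106236, 1.660274766, 2.686381002]
```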
Use rstrip(), or rstrip('\r') if you're sure that the last character is always \r:
for line in in_file:
    print(line.rstrip())
Help on str.rstrip():
S.rstrip([chars]) -> string or unicode
Return a copy of the string S with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping
str.strip() removes both leading and trailing whitespace.
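The differences between the three calls show up on a line with a leading space and a trailing '\r' (the sample value is illustrative):

```python
raw = "  1.026106236\r"

right_only = raw.rstrip()      # trailing whitespace only
both_ends = raw.strip()        # leading and trailing whitespace
cr_only = raw.rstrip("\r")     # only trailing '\r' characters

# rstrip('\r') does nothing here, because the last character is '\n', not '\r'
crlf = "1.026106236\r\n".rstrip("\r")

print(repr(right_only))  # '  1.026106236'
print(repr(both_ends))   # '1.026106236'
print(repr(cr_only))     # '  1.026106236'
print(repr(crlf))        # '1.026106236\r\n'
```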
You can strip the carriage returns and newlines from the line by using strip():
line.strip()
For example:
for line in in_file:
    a.append(line.strip())
print(a)
To fix this, do:
for line in in_file:
    a.append(line.strip())
.strip() the lines to remove the whitespace that you don't need:
lines = []
with open('filelocation.txt', 'r') as handle:
    for line in handle:
        line = line.strip()
        lines.append(line)
        print(line)
print(lines)
Also, I'd advise that you use the with ... notation to open a file. It's cleaner and closes the file automatically.
First, I generally like @J.F. Sebastian's answer, but my use case is closer to Python 2.7.1: How to Open, Edit and Close a CSV file, since my string came from a text file that Excel exported as CSV and that I then read with the csv module. As indicated at that question:
as for the 'rU' vs 'rb' vs ..., csv files really should be binary so
use 'rb'. However, it's not uncommon to have csv files from someone who
copied it into notepad on windows and later it was joined with some
other file so you have funky line endings. How you deal with that
depends on your file and your preference. – @kalhartt Jan 23 at 3:57
I'm going to stick with reading as 'rb' as recommended in the python docs. For now, I know that the \r inside a cell is a result of quirks of how I'm using Excel, so I'll just create a global option for replacing '\r' with something else, which for now will be '\n', but later could be '' (an empty string, not a double quote) with a simple json change.
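A rough Python 3 equivalent of that approach might look like the sketch below. It assumes the stray '\r' sits inside a quoted cell, opens the data with `newline=""` as the Python 3 csv documentation recommends, and uses a hypothetical `REPLACEMENT` setting in place of the JSON option described above:

```python
import csv
import io

# Stand-in for an Excel-exported CSV whose second cell contains a bare '\r'.
sample = io.StringIO('a,"b\rc"\nd,e\n', newline="")

REPLACEMENT = "\n"  # hypothetical setting; could later be "" instead

# csv.reader keeps the '\r' inside the quoted cell; replace it per cell.
rows = [[cell.replace("\r", REPLACEMENT) for cell in row]
        for row in csv.reader(sample)]
print(rows)  # [['a', 'b\nc'], ['d', 'e']]
```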

blank lines in file after sorting content of a text file in python

I have this small script that sorts the content of a text file
# The built-in function `open` opens a file and returns a file object.
# Read mode opens a file for reading only.
try:
    f = open("tracks.txt", "r")
    try:
        # Read the entire contents of a file at once.
        # string = f.read()
        # OR read one line at a time.
        #line = f.readline()
        # OR read all the lines into a list.
        lines = f.readlines()
        lines.sort()
        f.close()
        f = open('tracks.txt', 'w')
        f.writelines(lines) # Write a sequence of strings to a file
    finally:
        f.close()
except IOError:
    pass
The only problem is that the text ends up at the bottom of the file every time it's sorted...
I assume it also sorts the blank lines... does anybody know why?
And maybe you can suggest some tips on how to avoid this?
Thanks in advance.
An "empty" line read from a text file is represented in Python by a string containing only a newline ("\n"). You may also want to avoid lines whose "data" consists only of spaces, tabs, etc ("whitespace"). The str.strip() method lets you detect both cases (a newline is whitespace).
f = open("tracks.txt", "r")
# omit empty lines and lines containing only whitespace
lines = [line for line in f if line.strip()]
f.close()
lines.sort()
# now write the output file
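One way to finish the "write the output file" step is sketched below, using `with` blocks instead of explicit `close()` calls. The function name `sort_file_lines` is hypothetical, and the decision to end every line with exactly one '\n' is an assumption; it keeps a final line without a trailing newline from being glued onto its neighbour after sorting.

```python
import os
import tempfile

def sort_file_lines(path):
    """Sort a text file in place, dropping empty and whitespace-only lines."""
    with open(path) as handle:
        # line.strip() == "" catches both "\n" and lines of spaces/tabs
        lines = [line.rstrip("\n") for line in handle if line.strip()]
    lines.sort()
    with open(path, "w") as handle:
        # Give every line exactly one trailing newline, including the last one.
        handle.writelines(line + "\n" for line in lines)
    return lines

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tracks.txt")  # hypothetical sample file
    with open(path, "w") as out:
        out.write("z track\n\nc track\n   \nb track")
    sort_file_lines(path)
    with open(path) as check:
        result = check.read()

print(repr(result))  # 'b track\nc track\nz track\n'
```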
This is a perfect opportunity to do some test-based development (see below). Some observations:
In the example below, I omit the aspect of reading from and writing to a file. That's not essential to this question, in my opinion.
I assume you want to strip trailing newlines and omit blank lines. If not, you'll need to adjust. (But you'll have the framework for asserting/confirming the expected behavior.)
I agree with chryss above that you generally don't need to reflexively wrap things in try blocks in Python. That's an anti-pattern that comes from Java (which forces it), I believe.
Anyway, here's the test:
import unittest
def sort_lines(text):
    """Return text sorted by line, remove empty lines and strip trailing whitespace."""
    lines = text.split('\n')
    non_empty = [line.rstrip() for line in lines if line.strip()]
    non_empty.sort()
    return '\n'.join(non_empty)
class SortTest(unittest.TestCase):
    def test(self):
        data_to_sort = """z some stuff
c some other stuff
d more stuff after blank lines
b another line
a the last line"""
        actual = sort_lines(data_to_sort)
        expected = """a the last line
b another line
c some other stuff
d more stuff after blank lines
z some stuff"""
        self.assertEqual(actual, expected, "no match!")
if __name__ == "__main__":
    unittest.main()
The reason it sorts the blank lines is that they are there. A blank line is an empty string followed by \n (or \r\n or \r, depending on the OS). Perfectly sortable.
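Blank lines land at the top (pushing the text to the bottom) because strings compare character by character by code point, and '\n' (10) and ' ' (32) both come before any letter. A quick check with illustrative lines:

```python
lines = ["b another line\n", "\n", "a the last line\n", "   \n"]
ordered = sorted(lines)
print(ordered)
# ['\n', '   \n', 'a the last line\n', 'b another line\n']
```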
I should like to note that "try:" nested into a "try:... except" block is a bit ugly, and I'd close the file after reading, for style's sake.
