Why does [line in open("text.txt")] yield newlines? - python

In every solution I have seen for reading a file in Python, the newline character has to be stripped off explicitly:
In [5]: [line for line in open("text.txt", "r")]
Out[5]: ['line1\n', 'line2']
The intuitive behavior (judging by the popularity of some questions (1, 2) about this) would be to just yield the stripped lines.
What is the rationale behind this?

Well, this is a line. A line is defined by ending with the character \n. If a sequence of characters did not end with a \n (or EOF) how could we know it was a line?
"hello world"
"hello world\n"
The first is not a line: if we print it twice we might get
hello worldhello world
While the second version will give us
hello world
hello world

Migrating the asker's response/solution from the question to an answer:
Granted: 'intuitive' is subjective. 'Consistent', however, is less so. Apparently the 'line' concept in "line1\nline2".splitlines() is a different one than the one handled by the iter(open("text.txt")):
>>> assert(open("text.txt").readlines() == \
... open("text.txt").read().splitlines())
AssertionError
Pretty sure people do get caught by this.
So I was mistaken: maybe my intuition is just in line with the splitlines interpretation: the split stuff should not include the separators. Maybe the answer to my question is not technical, but more like "since PEP-xyz was approved by different people than PEP-qrs". Maybe I should post it to some python language forum.
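To make the two notions of "line" concrete, here is a minimal sketch in Python 3 syntax. It assumes a throwaway temporary file with the same two-line content as in the question; the file and the variable names are illustrative and not taken from the posts above.

```python
import os
import tempfile

# Hypothetical sample file with the same content as in the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line1\nline2")
    path = tmp.name

with open(path) as f:
    raw_lines = [line for line in f]              # iteration keeps the "\n"
with open(path) as f:
    stripped = [line.rstrip("\n") for line in f]  # strip the terminator explicitly
with open(path) as f:
    split = f.read().splitlines()                 # splitlines() drops the separators

os.remove(path)
print(raw_lines)
print(stripped)
print(split)
```

Because iteration keeps the terminator, joining the pieces with "".join(raw_lines) gives back the original content. That is one practical consequence of the behavior, although the posts above do not cite an official rationale.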

Related

Python IDLE shell - How to make multiline strings

ANSWERED (But if for some reason you want to read it you can)
I am a beginner at Python and I know how to type the correct syntax but when I repeat it, it comes out with no extra lines, it just says "\n"
DETAILS ON EXAMPLE BELOW:
My variable is fred and I set it to what it says below, and then on the 3rd line I made it print the text.
Please tell me if I didn't describe it correctly, keep in mind I'm a beginner.
EXAMPLE:
>>> fred = '''How do dinosaurs pay their bills?
With tyrannosaurus checks!'''
>>> fred
'How do dinosaurs pay their bills?\nWith tyrannosaurus checks!'
The \n in that string represents the new line symbol. For pretty much all purposes, it is a new line.
What you're doing is correct.
To print the lines on separate lines:
for line in fred.split("\n"):
    print(line)
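As a small sketch of why the shell shows "\n" while print() does not: typing a bare name at the prompt displays the string's repr(), which writes the newline as an escape sequence, whereas print() writes the actual character. The variable names other than fred are illustrative.

```python
fred = '''How do dinosaurs pay their bills?
With tyrannosaurus checks!'''

shown_by_shell = repr(fred)   # what the interactive prompt echoes for a bare name
lines = fred.split("\n")      # the two lines as separate strings

print(shown_by_shell)         # one line, containing a backslash followed by n
print(fred)                   # two lines, the newline is rendered
```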

Splitting long line printed to screen the right way in python

This might be a silly question but I'd like to know how other people handle this or if there's a standard/recommended way of going about it.
Below are two approaches to splitting a long text line when printing it to screen in python. Which one should be used?
Option 1
if some_condition:  # Senseless indenting.
    if another_condition:  # Senseless indenting.
        print 'This is a very long line of text that goes beyond the 80\n\
character limit.'
Option 2
if some_condition:  # Senseless indenting.
    if another_condition:  # Senseless indenting.
        print 'This is a very long line of text that goes beyond the 80'
        print 'character limit.'
I personally find Option 1 ugly but Option 2 seems like it would go against the pythonic way of keeping things simple by using a second print call.
One way to do it is with parentheses:
print ('This is a very long line of text that goes beyond the 80\n'
       'character limit.')
Of course, there are several ways of doing it. Another way (as suggested in comments) is the triple quote:
print '''This is a very long line of text that goes beyond the 80
character limit.'''
Personally I don't like that one much because it seems like breaking the indentation, but that's just me.
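The indentation concern with triple quotes can be sidestepped with textwrap.dedent from the standard library. This is a technique not mentioned in the answers above, shown here as a minimal Python 3 sketch; the two function names are hypothetical.

```python
import textwrap

def parenthesised():
    # Adjacent string literals inside parentheses are joined into one string.
    return ('This is a very long line of text that goes beyond the 80\n'
            'character limit.')

def triple_quoted():
    # dedent() removes the whitespace common to all lines, so the literal
    # can follow the indentation of the surrounding code.
    return textwrap.dedent('''\
        This is a very long line of text that goes beyond the 80
        character limit.''')

print(parenthesised())
print(triple_quoted())
```

Both functions return the same two-line string, so the choice is mostly a matter of taste.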
If you have a long string and want to insert line breaks at appropriate points, the textwrap module provides functionality to do just that. Ex:
import textwrap
def format_long_string(long_string):
    wrapper = textwrap.TextWrapper()
    wrapper.width = 80
    return wrapper.fill(long_string)
long_string = ('This is a really long string that is raw and unformatted '
               'that may need to be broken up into little bits')
print format_long_string(long_string)
This results in the following being printed:
This is a really long string that is raw and unformatted that may need to be
broken up into little bits
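For a one-off call, the module-level shortcut textwrap.fill does the same job without creating a TextWrapper by hand. A minimal Python 3 sketch, reusing the sample string from the answer above:

```python
import textwrap

long_string = ('This is a really long string that is raw and unformatted '
               'that may need to be broken up into little bits')

# fill() wraps and joins in one call; without width=80 the default of 70 applies.
wrapped = textwrap.fill(long_string, width=80)
print(wrapped)
```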

Python - Remove Last Line From String

I'm trying to capture and manipulate data within a Telnet session using telnetlib, things are going fairly well, however my newbness with Python is causing me some headache.
My issue is pretty straightforward: I am able to capture and display the data I want (so far), but I keep cycling through errors whenever I try to remove the last line from the data I have captured. My code goes something like this:
... Snipped Boring Stuff ...
tn.write('command to gather data' + '\r')
data = tn.read_very_eager()
print data
... snipped more boring stuff ...
Pretty simple... So, how in the world would I remove the last line of data from either tn.read_very_eager() or data() ?
Any direction would be awesome... Sorry for the really simple question, but the stuff I've read and tried so far has done nothing but frustrate me, my keyboard can't take much more abuse :)
You can remove the last line of a string like this:
def remove_last_line_from_string(s):
    return s[:s.rfind('\n')]
string = "String with\nsome\nnewlines to answer\non StackOverflow.\nThis line gets removed"
print string
string = remove_last_line_from_string(string)
print '\n\n'+string
The output will be:
>>>
String with
some
newlines to answer
on StackOverflow.
This line gets removed
String with
some
newlines to answer
on StackOverflow.
>>>
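One caveat with the rfind approach: when the string contains no newline, rfind returns -1 and the slice drops only the last character. The sketch below (Python 3, with a hypothetical helper name) uses str.rpartition instead. It assumes that a trailing newline ends the last line rather than starting an empty one, which may or may not match what the Telnet session returns.

```python
def remove_last_line(text):
    """Return text without its last line (illustrative helper, not from the answer)."""
    # Assumption: a single trailing newline terminates the last line.
    body = text[:-1] if text.endswith("\n") else text
    head, sep, _last = body.rpartition("\n")
    # No separator found means there was only one line, so nothing is left.
    return head if sep else ""

sample = "String with\nsome\nnewlines to answer\non StackOverflow.\nThis line gets removed"
print(remove_last_line(sample))
```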

Good method to substitute end-lines '\n' into spaces in a string

I have an error message that spans across multiple (2-3) lines. I want to catch it and embed in a warning. I think that substituting new-lines into spaces is ok.
My question is, which method is the best practice. I know this is not the best kind of question, but I want to code it properly. I also might be missing something. So far I have come up with 3 methods:
string.replace()
regular expression
string.translate()
I was leaning towards string.translate(); however, after reading how it works, I think it's overkill to convert every character into itself except '\n'. Regexp also seems like overkill for such a simple task.
Is there any other method designated to it, or should I pick up one of the aforementioned? I care about portability and robustness more than speed but it is still somewhat relevant.
Just use the replace method:
>>> "\na".replace("\n", " ")
' a'
>>>
It is the simplest solution. Using Regex is overkill and also means you have to import. translate is a little better, but still doesn't give anything that replace doesn't (except more typing of course).
replace should run faster too.
If you want to leave all these implementation details up to the python implementation you could do:
s = "This\nis\r\na\rtest"
print " ".join(s.splitlines())
# prints: This is a test
Note:
This method uses the universal newlines approach to splitting lines.
Which means:
universal newlines A manner of interpreting text streams in which all of the following are recognized as ending a line: the Unix end-of-line convention '\n', the Windows convention '\r\n', and the old Macintosh convention '\r'. See PEP 278 and PEP 3116, as well as str.splitlines() for an additional use.
A benefit of splitting lines over replacing linefeeds is that you can filter out lines you don't need, i.e. to avoid clutter in your log. For example, if you have this output of traceback.format_exc():
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: integer division or modulo by zero
And you need to add only the last line(s) to your log:
import traceback
try:
    1/0
except:  # of course you wouldn't catch exceptions like this in real code
    print traceback.format_exc().splitlines()[-1]
# prints: ZeroDivisionError: integer division or modulo by zero
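The same idea in Python 3 syntax might look like the sketch below. The helper name is hypothetical, and the exact wording of the exception message differs between Python versions, so no output is shown.

```python
import traceback

def last_traceback_line():
    """Illustrative helper: return only the final line of the current traceback."""
    try:
        1 / 0
    except ZeroDivisionError:
        # format_exc() returns the whole traceback as one string;
        # the last line holds the exception type and message.
        return traceback.format_exc().splitlines()[-1]

summary = last_traceback_line()
print(summary)
```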
For reference:
http://docs.python.org/2/library/stdtypes.html#str.splitlines
http://docs.python.org/2/library/stdtypes.html#str.join
http://docs.python.org/2/glossary.html#term-universal-newlines
http://www.python.org/dev/peps/pep-0278/
http://docs.python.org/2/library/traceback.html
This is another fast, portable option. It is more or less the same as replace but less readable:
errMsg = """Something went wrong
This message is long"""
" ".join(errMsg.splitlines())
Timing results follow, although they will certainly differ with message length:
>>> import timeit
>>> s = """\
' '.join('''Something went wrong
This message is long'''.splitlines())"""
>>> timeit.timeit(stmt=s, number=100000)
0.06071170746817329
>>> q = """'''\
Something went wrong
This message is long'''.replace("\\n",' ')"""
>>> timeit.timeit(stmt=q, number=100000)
0.049164684830429906
This should work on both Windows and Linux.
string.replace('\r\n', ' ').replace('\n', ' ')
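The approaches in this thread do not behave identically on every input. A minimal Python 3 sketch of the differences, using the test string from the splitlines answer plus one illustrative string with a trailing newline:

```python
s = "This\nis\r\na\rtest"

joined = " ".join(s.splitlines())                    # handles \n, \r\n and a bare \r
chained = s.replace("\r\n", " ").replace("\n", " ")  # leaves a bare \r untouched

trailing = "two\nlines\n"
joined_trailing = " ".join(trailing.splitlines())    # no space at the end
replaced_trailing = trailing.replace("\n", " ")      # trailing newline becomes a space

print(repr(joined))
print(repr(chained))
print(repr(joined_trailing))
print(repr(replaced_trailing))
```

Which result is preferable depends on whether old Macintosh line endings or a trailing newline can occur in the error message.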

Python : splitting and splitting

I need some help;
I'm trying to program a sort of command prompt with python
I need to split a text file into lines then split them into strings
example :
splitting
command1 var1 var2;
command2 (blah, bleh);
command3 blah (b bleh);
command4 var1(blah b(bleh * var2));
into :
line1=['command1','var1','var2']
line2=['command2']
line2_sub1=['blah','bleh']
line3=['blah']
line3_sub1=['b','bleh']
line4=['command4']
line4_sub1=['blah','b']
line4_sub2=['bleh','var2']
line4_sub2_operand=['*']
Would that be possible at all?
If so, could someone explain how or give me a piece of code that would do it?
Thanks a lot,
It's been pointed out that there appears to be no reasoning to your language. All I can do is point you to pyparsing, which is what I would use if I were solving a similar problem; here is a pyparsing example for the Python language.
Like everyone else is saying, your language is confusingly designed and you probably need to simplify it. But I'm going to give you what you're looking for and let you figure that out the hard way.
The standard python file object (returned by open()) is an iterator of lines, and the split() method of the python string class splits a string into a list of substrings. So you'll probably want to start with something like:
for line in command_file:
    words = line.split(' ')
http://docs.python.org/3/library/string.html
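You could use this code to read the file line by line and split each line on the whitespace between words:
f = open(filename)
while True:
    nextline = f.readline()
    if nextline == "":  # readline() returns an empty string only at end of file
        break
    wordlist = nextline.split()  # split("") would raise ValueError: empty separator
    print(wordlist)
f.close()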
What you're talking about is writing a simple programming language. It's not extraordinarily difficult if you know what you're doing, but it is the sort of thing most people take a full semester class to learn. The fact that you've got multiple different types of lexical units with what looks to be a non-trivial, recursive syntax means that you'll need a scanner and a parser. If you really want to teach yourself to do this, this might not be a bad place to start.
If you simplify your grammar such that each command only has a fixed number of arguments, you can probably get away with using regular expressions to represent the syntax of your individual commands.
Give it a shot. Just don't expect it to all work itself out overnight.
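As a rough illustration of the scanner-plus-parser idea, here is a minimal Python 3 sketch. It makes several assumptions that the question does not confirm: tokens are words, parentheses and the '*' operator; commas and semicolons are only separators; and nested lists are an acceptable output format in place of the line4_sub1 style variables. All names are hypothetical.

```python
import re

# Scanner: words, parentheses and '*'; commas, semicolons and spaces are skipped.
TOKEN = re.compile(r"\w+|[()*]")

def parse(tokens):
    """Parser: build nested lists, one nesting level per pair of parentheses."""
    result = []
    while tokens:
        tok = tokens.pop(0)
        if tok == "(":
            result.append(parse(tokens))  # recurse into the parenthesised group
        elif tok == ")":
            return result                 # close the current group
        else:
            result.append(tok)
    return result

def parse_line(line):
    return parse(TOKEN.findall(line))

print(parse_line("command1 var1 var2;"))
print(parse_line("command4 var1(blah b(bleh * var2));"))
```

The sketch does no error handling, so unbalanced parentheses are accepted silently; a real grammar would need checks for that, or a library such as pyparsing.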
