Reading lines including space - python

I want to read a file while preserving the spaces in each line.
My current code:
def data():
    f = open("save.aln")
    for line in f.readlines():
        print("</br>")
        print(line)
I am using Python, and the output is embedded in HTML.
File to be read: http://pastebin.com/EaeKsyvg
Thanks

It seems that your problem is that you need whitespace to be preserved in HTML. The simple solution is to put your output between <pre> tags:
def data():
    print("<pre>")
    with open("save.aln") as f:
        for line in f:
            # line already ends with "\n"; strip it so print() doesn't add a blank line
            print(line.rstrip("\n"))
    print("</pre>")
Note that in this case you don't need the <br> elements either, since the newline characters are also preserved.
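Inside <pre> the browser still parses any <, > or & in the file as markup. A minimal Python 3 sketch that escapes those characters before wrapping the file in <pre>; it assumes you want the HTML back as a string, and file_to_pre_html is a hypothetical helper name:

```python
import html


def file_to_pre_html(path):
    """Return the file's contents wrapped in <pre> so HTML keeps its spacing."""
    with open(path) as f:
        # Escape &, < and > so they are shown literally instead of parsed as tags.
        # quote=False leaves quotation marks alone (they are harmless inside <pre>).
        body = html.escape(f.read(), quote=False)
    return "<pre>" + body + "</pre>"
```

Usage would be print(file_to_pre_html("save.aln")). Reading the whole file at once is fine for a small alignment file like the one linked; for very large files you would escape line by line instead.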

The problem you are facing is that HTML collapses runs of whitespace into a single space. @itsadok's solution is great; I upvoted it. But it's not the only way to do this.
If you want to explicitly turn those spaces into HTML non-breaking space entities, you could do this:
def data():
    with open("save.aln") as f:
        for line in f:
            print("<br />")
            # "&nbsp;" needs its trailing semicolon to be a valid HTML entity
            print(line.replace(" ", "&nbsp;"))
Cheers
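The replace() approach passes every other character through unchanged, so a < or & in the file would still be read as markup. A minimal Python 3 sketch that escapes each line before swapping in the entities; line_to_html is a hypothetical helper name:

```python
import html


def line_to_html(line):
    """Convert one text line to HTML that keeps every space visible."""
    # Escape &, < and > first; doing the space replacement first would
    # turn the "&" of each "&nbsp;" into "&amp;".
    escaped = html.escape(line.rstrip("\n"), quote=False)
    # Replace each space with a non-breaking space entity and end the line.
    return escaped.replace(" ", "&nbsp;") + "<br />"


print(line_to_html("seq1   A<C>\n"))  # seq1&nbsp;&nbsp;&nbsp;A&lt;C&gt;<br />
```

The order of the two steps is the important design choice: escaping must happen before the entities are introduced.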

import html  # cgi.escape was removed in Python 3.8; html.escape replaces it
with open('save.aln') as f:
    for line in f:
        print(html.escape(line, quote=False))  # escape <, >, &
        print('<br/>')

Related

In python, why slicing doesn't work in readline()? [duplicate]

I'm using Python 3 to loop through lines of a .txt file that contains strings. These strings will be used in a curl command. However, it is only working correctly for the last line of the file. I believe the other lines end with newlines, which throws the string off:
url = "https://"
with open(file) as f:
    for line in f:
        str = (url + line)
        print(str)
This will return:
https://
endpoint1
https://
endpoint2
https://endpoint3
How can I make every line concatenate cleanly, like the last one?
I've looked at a couple of answers like How to read a file without newlines?, but this answer converts all content in the file to one line.
Use str.strip():
For example:
url = "https://"
with open(file) as f:
    for line in f:
        s = (url + line.strip())
        print(s)
If the strings end with newlines, you can call .strip() to remove them, e.g.:
url = "https://"
with open(file) as f:
    for line in f:
        full_url = (url + line.strip())  # avoid shadowing the built-in str
        print(full_url)
I think str.strip() will solve your problem
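Note that strip() removes all leading and trailing whitespace, not just the newline. If an entry could legitimately start or end with spaces, rstrip("\n") is the narrower fix. A small sketch, where io.StringIO stands in for the real file and the endpoint names are placeholders:

```python
import io

url = "https://"
# Stand-in for the real file: every line ends in "\n" except the last.
sample = io.StringIO("endpoint1\nendpoint2\nendpoint3")

# rstrip("\n") removes only the trailing newline.
urls = [url + line.rstrip("\n") for line in sample]
for full_url in urls:
    print(full_url)

# strip() also removes surrounding spaces and tabs; rstrip("\n") keeps them.
print(repr(" padded \n".strip()))       # 'padded'
print(repr(" padded \n".rstrip("\n")))  # ' padded '
```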

How to remove quotation marks from variable that writes to a file?

Every time I run this part of the code everything goes smoothly, BUT when it writes the variable to the file it shows up with quotation marks. Is there a way to remove them and write it as plain text?
import random
try:
    with open(tokens) as f:
        lines = f.readlines()
    answer = random.choice(lines)
    print(answer)
except:
    file_name = tokens
    opened_file = open(tokens, 'a')
    opened_file.write("%r\n" %user_input)
    opened_file.close()
The text written to the file looks like this:
'Whats up'
and I want it to look like this:
Whats up
In the line that writes to the file, you are using %r as the format specifier. The Python docs say:
Replacing %s and %r:
>>> "repr() shows quotes: {!r}; str() doesn't: {!s}".format('test1', 'test2')
"repr() shows quotes: 'test1'; str() doesn't: test2"
So replace this line:
opened_file.write("%r\n" % user_input)
with:
opened_file.write("%s\n" % user_input)
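To see the difference between %r and %s directly, here is a small sketch using the same sample text as the question (the f-string variant assumes Python 3.6 or later):

```python
user_input = "Whats up"

# %r formats with repr(), which wraps strings in quotes.
with_repr = "%r\n" % user_input
# %s formats with str(), which produces the plain text.
with_str = "%s\n" % user_input
# On Python 3.6+, an f-string gives the same result as %s.
with_fstring = f"{user_input}\n"

print(with_repr, end="")     # 'Whats up'
print(with_str, end="")      # Whats up
print(with_fstring, end="")  # Whats up
```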

How to rejoin split words in a file?

So far in Python I have created a file using this code:
text_file = open("Sentences_Positions.txt", "w")
text_file.write (str(positions))
text_file.write (str(ssplit))
text_file.close()
The code creates the file and writes to it the individual words I previously split. I need a way to open the file, join the split words, and then print them. I have tried:
text_file = open("Sentences_Positions.txt", "r")
rejoin = ("Sentences_positions.txt").join('')
print (rejoin)
But all this does is print a blank line in the shell. How should I approach this, and what other code could I try?
Read the file's contents, split them on spaces, and join the pieces with '':
content = text_file.read().split(' ')
print(''.join(content))
Replace:
rejoin = ("Sentences_positions.txt").join('')
with:
rejoin = ''.join(text_file.read().split(' '))
Also, you should probably use open as a context manager (a with statement) rather than calling it on its own:
with open("Sentences_Positions.txt") as text_file:
    rejoin = ''.join(text_file.read().split(' '))
print(rejoin)
Otherwise the file stays open; the with block closes it automatically when it finishes. The same applies to the first part of your code.
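One caveat: text_file.write(str(ssplit)) stores the list's repr, brackets, quotes and commas included, so splitting that text on ' ' and joining with '' will not give back the original sentence. If the goal is to recover the sentence with its spaces, one option is to write the words space-separated and split them back. A minimal sketch, assuming ssplit is a plain list of word strings (the sample words are hypothetical):

```python
# Hypothetical sample data standing in for the asker's `ssplit` list of words.
ssplit = ["The", "quick", "brown", "fox"]

# str(ssplit) would write "['The', 'quick', 'brown', 'fox']";
# writing the words joined by spaces keeps the file easy to read back.
with open("Sentences_Positions.txt", "w") as text_file:
    text_file.write(" ".join(ssplit))

# Read the words back and rejoin them into a sentence.
with open("Sentences_Positions.txt") as text_file:
    words = text_file.read().split()
rejoined = " ".join(words)
print(rejoined)  # The quick brown fox
```

If the positions list also has to go in the same file, putting it on its own line (or using a format such as JSON) keeps the two parts separable.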

Inserting an XML Element w/ Python

I have a quick and dirty build script that needs to update a couple of lines in a small xml config file. Since the file is so small, I'm using an admittedly inefficient process to update the file in place just to keep things simple:
import fileinput
import os
import re
def hide_osx_dock_icon(app):
    for line in fileinput.input(os.path.join(app, 'Contents', 'Info.plist'), inplace=True):
        line = re.sub(r'(<key>CFBundleDevelopmentRegion</key>)', r'<key>LSUIElement</key><string>1</string>\g<1>', line.strip(), flags=re.IGNORECASE)
    print(line.strip())
The idea is to find the <key>CFBundleDevelopmentRegion</key> text and insert the LSUIElement content right in front of it. I'm doing something just like this in another area and it's working fine, so I guess I'm just missing something, but I don't see it.
What am I doing wrong?
You are printing only the last line, because your print statement falls outside of the for loop:
for line in fileinput.input(os.path.join(app, 'Contents', 'Info.plist'), inplace=True):
    line = re.sub(r'(<key>CFBundleDevelopmentRegion</key>)', r'<key>LSUIElement</key><string>1</string>\g<1>', line.strip(), flags=re.IGNORECASE)
print(line.strip())
Indent that line to match the previous one:
for line in fileinput.input(os.path.join(app, 'Contents', 'Info.plist'), inplace=True):
    line = re.sub(r'(<key>CFBundleDevelopmentRegion</key>)', r'<key>LSUIElement</key><string>1</string>\g<1>', line.strip(), flags=re.IGNORECASE)
    print(line.strip())
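Since Info.plist is a property list, another option, different from the regex approach in the question, is the standard-library plistlib module (load/dump are available from Python 3.4). A minimal sketch, assuming the file is a valid plist; it mirrors the original script by storing LSUIElement as the string "1":

```python
import os
import plistlib


def hide_osx_dock_icon(app):
    """Set LSUIElement in <app>/Contents/Info.plist so the app gets no Dock icon."""
    plist_path = os.path.join(app, "Contents", "Info.plist")
    # plistlib.load detects XML and binary plist formats automatically.
    with open(plist_path, "rb") as fp:
        info = plistlib.load(fp)
    # Same value the regex version inserts: <string>1</string>.
    info["LSUIElement"] = "1"
    # Rewrite the whole file as an XML plist.
    with open(plist_path, "wb") as fp:
        plistlib.dump(info, fp)
```

Be aware that plistlib.dump regenerates the file (keys sorted, its own indentation), so the original layout is not preserved, which matters only if you diff the file.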

How to exclude U+2028 from line separators in Python when reading file?

I have a file in UTF-8, where some lines contain the U+2028 Line Separator character (http://www.fileformat.info/info/unicode/char/2028/index.htm). I don't want it to be treated as a line break when I read lines from the file. Is there a way to exclude it from separators when I iterate over the file or use readlines()? (Besides reading the entire file into a string and then splitting by \n.) Thank you!
I can't reproduce this behaviour in Python 2.5, 2.6 or 3.0 on Mac OS X; U+2028 is never treated as a line ending. Could you go into more detail about where you see this error?
That said, here is a subclass of the "file" class that might do what you want:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 only: subclasses the built-in file type.
class MyFile (file):
    def __init__(self, *arg, **kwarg):
        file.__init__(self, *arg, **kwarg)
        self.EOF = False
    def next(self, catchEOF = False):
        if self.EOF:
            raise StopIteration("End of file")
        try:
            nextLine= file.next(self)
        except StopIteration:
            self.EOF = True
            if not catchEOF:
                raise
            return ""
        if nextLine.decode("utf8")[-1] == u'\u2028':
            return nextLine+self.next(catchEOF = True)
        else:
            return nextLine
A = MyFile("someUnicode.txt")
for line in A:
    print line.strip("\n").decode("utf8")
I couldn't reproduce that behavior, but here's a naive solution that just merges readline results until they don't end with U+2028.
#!/usr/bin/env python
from __future__ import with_statement
def my_readlines(f):
    buf = u""
    for line in f.readlines():
        uline = line.decode('utf8')
        buf += uline
        if uline[-1] != u'\u2028':
            yield buf
            buf = u""
    if buf:
        yield buf
with open("in.txt", "rb") as fin:
    for l in my_readlines(fin):
        print(l)
Thanks to everyone for answering.
I think I know why you might not have been able to reproduce this. I just realized that it happens if I decode the file when opening it, as in:
f = codecs.open(filename, encoding='utf-8')
for line in f:
    print(line)
The lines are not split on U+2028 if I open the file first and then decode individual lines:
f = open(filename)
for line in f:
    print(line.decode("utf8"))
(I'm using Python 2.6 on Windows. The file was originally UTF16LE and then it was converted into UTF8).
This is very interesting; I guess I won't be using codecs.open much from now on :-).
If you use Python 3.0 (note that I don't, so I can't test), the documentation says you can pass an optional newline parameter to open to specify which line separator to use. However, the documentation doesn't mention U+2028 at all (it only mentions \r, \n, and \r\n as line separators), so it's actually a surprise to me that this even occurs (although I can confirm it with Python 2.6).
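For what it's worth, here is a small Python 3 sketch of the difference discussed in this thread (the sample text is made up). As far as I know, codecs' StreamReader.readline splits with str.splitlines(), which treats U+2028 as a line boundary, whereas text files from the built-in open() in Python 3 end lines only at \n, \r and \r\n:

```python
import os
import tempfile

# Sample UTF-8 content: the first line contains a U+2028 LINE SEPARATOR.
data = "first\u2028still first\nsecond\n"

# Write the sample to a temporary file.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Built-in open() in text mode leaves U+2028 inside the line.
    with open(path, encoding="utf-8") as f:
        file_lines = list(f)
finally:
    os.remove(path)

# str.splitlines() does treat U+2028 as a line break.
split_lines = data.splitlines()

print(len(file_lines))   # 2
print(len(split_lines))  # 3
```

So on Python 3, plain open(filename, encoding="utf-8") should give the behaviour the asker wants without decoding each line by hand.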
The codecs module is doing the RIGHT thing. U+2028 is named "LINE SEPARATOR" with the comment "may be used to represent this semantic unambiguously". So treating it as a line separator is sensible.
Presumably the creator would not have put the U+2028 characters there without good reason ... does the file have u"\n" as well? Why do you want lines not to be split on U+2028?
