In Python, why doesn't slicing work in readline()? [duplicate]

I'm using Python 3 to loop through lines of a .txt file that contains strings. These strings will be used in a curl command. However, it is only working correctly for the last line of the file. I believe the other lines end with newlines, which throws the string off:
url = "https://"
with open(file) as f:
    for line in f:
        str = (url + line)
        print(str)
This will return:
https://
endpoint1
https://
endpoint2
https://endpoint3
How can I get all the strings to concatenate like the last line?
I've looked at a couple of answers like How to read a file without newlines?, but this answer converts all content in the file to one line.

Use str.strip
Ex:
url = "https://"
with open(file) as f:
    for line in f:
        s = (url + line.strip())
        print(s)

If the strings end with newlines, you can call .strip() to remove them, i.e.:
url = "https://"
with open(file) as f:
    for line in f:
        full_url = url + line.strip()  # avoid shadowing the built-in str
        print(full_url)

I think str.strip() will solve your problem
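
To illustrate the answers above, here is a minimal sketch. It assumes you only want to drop the line terminator (so it uses rstrip("\r\n") rather than strip(), which would also remove leading and trailing spaces). The helper name build_urls and the in-memory sample standing in for the file are made up for the example.

```python
import io

def build_urls(lines, base="https://"):
    """Yield base + line for each non-empty line, without its line terminator."""
    for line in lines:
        endpoint = line.rstrip("\r\n")  # remove only "\n" or "\r\n"
        if endpoint:                     # skip blank lines
            yield base + endpoint

# io.StringIO stands in for the opened text file in this sketch
sample = io.StringIO("endpoint1\nendpoint2\nendpoint3")
urls = list(build_urls(sample))
print(urls)
```

The last line of a file often has no trailing newline, which is why only the final URL looked right in the question.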

Related

Python Regex to find CRLF

I'm trying to write a regex that will find any CRLF in python.
I am able to successfully open the file and use the newlines attribute to determine whether it uses CRLF or LF, but my numerous regex attempts have failed:
with open('test.txt', 'rU') as f:
    text = f.read()
    print repr(f.newlines)
regex = re.compile(r"[^\r\n]+", re.MULTILINE)
print(regex.match(text))
I've done numerous iterations on the regex, and in every case it will either detect \n as \r\n or not work at all.
You could try using the re library to search for the \r & \n patterns.
import re
with open("test.txt", "rU") as f:
    for line in f:
        if re.search(r"\r\n", line):
            print("Found CRLF")
            regex = re.compile(r"\r\n")
            line = regex.sub("\n", line)
        if re.search(r"\r", line):
            print("Found CR")
            regex = re.compile(r"\r")
            line = regex.sub("\n", line)
        if re.search(r"\n", line):
            print("Found LF")
            regex = re.compile(r"\n")
            line = regex.sub("\n", line)
        print(line)
Assuming your test.txt file looks something like this:
This is a test file
with a line break
at the end of the file.
As I mentioned in a comment, you're opening the file with universal newlines, which means that Python will automatically perform newline conversion when reading from or writing to the file. Your program therefore will not see CR-LF sequences; they will be converted to just LF.
Generally, if you want to portably observe all bytes from a file unchanged, then you must open the file in binary mode:
In Python 2:
from __future__ import print_function
import re
with open('test.txt', 'rb') as f:
    text = f.read()
regex = re.compile(r"[^\r\n]+", re.MULTILINE)
print(regex.match(text))
In Python 3:
import re
with open('test.txt', 'rb') as f:
    text = f.read()
regex = re.compile(rb"[^\r\n]+", re.MULTILINE)
print(regex.match(text))
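
Building on the binary-mode approach, the following is a rough Python 3 sketch of how the three line-ending styles could be counted once the bytes are read unchanged. The function name count_line_endings and the sample bytes are illustrative only; in practice the data would come from open('test.txt', 'rb').read().

```python
import re

def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in raw bytes."""
    crlf = len(re.findall(rb"\r\n", data))
    lone_cr = len(re.findall(rb"\r(?!\n)", data))   # CR not followed by LF
    lone_lf = len(re.findall(rb"(?<!\r)\n", data))  # LF not preceded by CR
    return {"CRLF": crlf, "CR": lone_cr, "LF": lone_lf}

# Sample bytes standing in for the contents of a file opened with 'rb'
sample = b"one\r\ntwo\nthree\rfour\r\n"
counts = count_line_endings(sample)
print(counts)
```

The lookahead and lookbehind keep a CRLF pair from also being counted as a separate CR and LF.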

Import file with string python for certain condition

I am trying to import a txt file to a list in python.
This is what I am doing right now:
with open('my_connection_page.txt', 'r') as f:
    url = f.readlines()
It just puts everything into url[0].
This is the text file:
[u'/scheck/', u'/amanda/', u'/in/amanda/', u'/462aa6aa/', u'/462aa6aa/', u'/895161106/', u'/895161106/', u'/anshabenhudson/']
What should I do?
Use url = f.read().split() instead. You can pass a delimiter to split().
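
Since the file shown in the question looks like a Python list literal, another option (not from the answer above, just a sketch under that assumption) is to parse it with ast.literal_eval, which returns the items as clean strings without the brackets, quotes, and u prefixes. The shortened text below stands in for f.read().

```python
import ast

# Stand-in for the file contents returned by f.read()
text = "[u'/scheck/', u'/amanda/', u'/in/amanda/']\n"

# literal_eval only evaluates literals (lists, strings, numbers, ...),
# so it is safer than eval() for data read from a file
urls = ast.literal_eval(text.strip())
print(urls)
```

This only works if the whole file is a valid literal; for plain delimited text, split() remains the simpler choice.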

Delete every non utf-8 symbols from string

I have a large number of files and a parser. What I have to do is strip all non-UTF-8 symbols and put the data in MongoDB.
Currently I have code like this:
with open(fname, "r") as fp:
    for line in fp:
        line = line.strip()
        line = line.decode('utf-8', 'ignore')
        line = line.encode('utf-8', 'ignore')
Somehow I still get an error:
bson.errors.InvalidStringData: strings in documents must be valid UTF-8:
1/b62010montecassianomcir\xe2\x86\x90ta0\xe2\x86\x90008923304320733/290066010401040101506055soccorin
I don't get it. Is there some simple way to do it?
UPD: it seems that Python and Mongo don't agree about the definition of a valid UTF-8 string.
Try the line below instead of the last two lines. Hope it helps:
line=line.decode('utf-8','ignore').encode("utf-8")
For python 3, as mentioned in a comment in this thread, you can do:
line = bytes(line, 'utf-8').decode('utf-8', 'ignore')
The 'ignore' parameter prevents an error from being raised if any characters are unable to be decoded.
If your line is already a bytes object (e.g. b'my string') then you just need to decode it with decode('utf-8', 'ignore').
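
As a small Python 3 illustration of the point about bytes objects, the sketch below decodes raw bytes with errors="ignore" and, for comparison, errors="replace". The helper name drop_invalid_utf8 and the sample bytes are made up for the example.

```python
def drop_invalid_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, silently dropping any invalid byte sequences."""
    return raw.decode("utf-8", errors="ignore")

# b"\xc3\xa9" is a valid UTF-8 sequence; b"\xff" and b"\xfe" are not
raw_line = b"caf\xc3\xa9 \xff\xfe ok"

clean = drop_invalid_utf8(raw_line)
marked = raw_line.decode("utf-8", errors="replace")  # keeps a U+FFFD marker instead

print(repr(clean))
print(repr(marked))
```

In Python 3, open() also accepts encoding and errors arguments, so the same policy can be applied while reading a file in text mode.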
Example of handling non-UTF-8 characters (this keeps only printable ASCII characters):
import string
test=u"\n\n\n\n\n\n\n\n\n\n\n\n\n\nHi <<First Name>>\nthis is filler text \xa325 more filler.\nadditilnal filler.\n\nyet more\xa0still more\xa0filler.\n\n\xa0\n\n\n\n\nmore\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nfiller.\x03\n\t\t\t\t\t\t almost there \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nthe end\n\n\n\n\n\n\n\n\n\n\n\n\n"
print(''.join(x for x in test if x in string.printable))
with open(fname, "r") as fp:
    for line in fp:
        line = line.strip()
        line = line.decode('cp1252').encode('utf-8')

With python and cherrypy, how do i read a txt file and display it to a page?

I have got a txt file in the format of:
line_1
line_2
line_3
I am trying to read it into a list and display it on a web page just as it looks inside the txt file, one line under another. Here is my code:
@cherrypy.expose
def readStatus(self):
    f = open("directory","r")
    lines = "\n".join(f.readlines())
    f.close()
    page += "<p>%s</p>" % (lines)
However, the output I have been getting is:
line_1 line_2 line_3
It would be great if someone could give me a hint as to what to do so that line_1, line_2 and line_3 are displayed on 3 separate lines in the web browser.
Thanks in advance.
You're wrapping paragraph tags around all of the filenames. You probably meant to put paragraph tags around each filename individually:
with open("directory", "r") as f:
    page = "\n".join("<p>%s</p>" % line for line in f)
Or, more semantically, you could put it all in an unordered list:
with open("directory", "r") as f:
    page = '<ul>%s</ul>' % "\n".join("<li>%s</li>" % line for line in f)
Alternatively, you could put it all inside of a pre (preformatted text) tag:
with open('directory', 'r') as f:
    page = '<pre>%s</pre>' % f.read()
Additionally, you might want to consider escaping the filenames with cgi.escape (html.escape in Python 3, where cgi.escape was removed in 3.8) so browsers don't interpret any special characters in the filename.
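
Putting the per-line paragraph idea and the escaping advice together, a Python 3 sketch might look like the following. It uses html.escape from the standard library; the function name lines_to_html and the sample lines are hypothetical, and in a CherryPy handler the returned string would be what the exposed method returns.

```python
import html

def lines_to_html(lines):
    """Wrap each line in its own <p> element, escaping HTML special characters."""
    paragraphs = []
    for line in lines:
        text = line.rstrip("\r\n")  # drop the line terminator
        paragraphs.append("<p>%s</p>" % html.escape(text))
    return "\n".join(paragraphs)

# A list of lines stands in for iterating over the opened file
page = lines_to_html(["line_1\n", "a <b> & c\n", "line_3"])
print(page)
```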

Reading lines including space

I want to read a file, including the spaces in each line.
My current code:
def data():
    f = open("save.aln")
    for line in f.readlines():
        print "</br>"
        print line
I am using Python, and the output is embedded in HTML.
File to be read - http://pastebin.com/EaeKsyvg
Thanks
It seems that your problem is that you need to preserve spaces in HTML. The simple solution would be to put your output between <pre> elements:
def data():
    print "<pre>"
    f = open("save.aln")
    for line in f.readlines():
        print line
    print "</pre>"
Note that in this case you don't need the <br> elements either, since the newline characters are also preserved.
The problem you are facing is that HTML collapses runs of whitespace. @itsadok's solution is great; I upvoted it. But it's not the only way to do this.
If you want to explicitly turn those spaces into HTML non-breaking spaces, you could do this:
def data():
    f = open("save.aln")
    for line in f.readlines():
        print "<br />"
        print line.replace(" ", "&nbsp;")
Cheers
import cgi
with open('save.aln') as f:
    for line in f:
        print cgi.escape(line) # escape <, >, &
        print '<br/>'
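
For Python 3, the escaping and space-preserving ideas from the answers above can be combined. This is only a sketch: the helper name preserve_spaces is made up, and it assumes html.escape as the replacement for cgi.escape. The order matters: escape first, then insert the entities, otherwise the & in &nbsp; would itself be escaped.

```python
import html

def preserve_spaces(line):
    """Escape HTML special characters, then turn spaces into non-breaking spaces."""
    escaped = html.escape(line.rstrip("\r\n"))  # escape <, >, & and quotes first
    return escaped.replace(" ", "&nbsp;")       # then protect runs of spaces

result = preserve_spaces("A  <B>\n")
print(result + "<br />")
```

Wrapping the escaped text in a pre element, as suggested earlier, avoids the replace step entirely.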
