Fetch URLs from a Text File Using Python

I have a list of URLs in a text file that I would like to fetch using urllib2. I know you can use urllib2.urlopen(the_url) to read the content from a URL, but how can I make Python read those URLs line by line, as they appear in the text file, and print the results?
Thanks.

If the file just consists of one URL per line, you can just do:
import urllib2

with open(filename, "r") as f:
    for line in f:
        url = urllib2.urlopen(line.strip())  # strip the trailing newline before opening
        ...
Is there something about the way the file is formatted that makes this more complicated?
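For completeness, here is a minimal end-to-end sketch that fetches and prints each URL, assuming Python 2's urllib2 and one URL per line; the urls.txt filename is just an example:
import urllib2

with open("urls.txt", "r") as f:  # example filename; one URL per line
    for line in f:
        url = line.strip()  # drop the trailing newline and surrounding whitespace
        if not url:
            continue  # skip blank lines
        response = urllib2.urlopen(url)
        print url
        print response.read()  # print the fetched content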

Related

Python O365 send email with HTML file

I'm using O365 for Python.
I'm sending an email and building the body using the setBodyHTML() function. However, at present I need to write the actual HTML code inside the function. I don't want to do that. I want Python to look at an HTML file I saved somewhere and send an email using that file as the body. Is that possible? Or am I confined to copy/pasting my HTML into that function? I'm using Office 365 for Business. Thanks.
In other words, instead of this: msg.setBodyHTML("<h3>Hello</h3>") I want to be able to do this: msg.setBodyHTML("C:\somemsg.html")
I guess you can assign the file content to a variable first, i.e.:
file = open('C:/somemsg.html', 'r')
content = file.read()
file.close()
msg.setBodyHTML(content)
You can do this via a simple reading of that file into a string, which you then can pass to the setBodyHTML function.
Here's a quick function example that will do the trick:
def load_html_from_file(path):
    contents = ""
    with open(path, 'r') as f:
        contents = f.read()
    return contents
Later, you can do something along the lines of
msg.setBodyHTML(load_html_from_file("C:\somemsg.html"))
or
html_contents = load_html_from_file("C:\somemsg.html")
msg.setBodyHTML(html_contents)
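If the HTML file contains non-ASCII characters, a variation that reads it with an explicit encoding may be safer. This is only a sketch assuming a UTF-8 file and Python 2's io module; whether setBodyHTML expects bytes or unicode depends on your O365 library version:
import io

def load_html_from_file(path, encoding='utf-8'):
    # io.open accepts an encoding argument in Python 2 and returns unicode text
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()

# raw string avoids backslash escapes in the Windows path
msg.setBodyHTML(load_html_from_file(r"C:\somemsg.html"))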

How to download file from url using python in the same format?

I have to download a file; I see I can do it using urllib2.
response = urllib2.urlopen(URL)
file = response.read()
But I can't read it line by line.
Here's what I tried:
response = urllib2.urlopen(URL)
for line in response.read():
    #do stuff
The whole file comes back as one line, but the original file was split by newlines. Can someone tell me how to get the downloaded file to look like the original one?
Just split the output:
response = urllib2.urlopen(URL)
lines = response.read().splitlines()
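Putting it together, here is a minimal sketch that downloads the file and writes it back to disk with the original line structure. It assumes Python 2 and a plain-text resource; URL is the placeholder from the question and downloaded.txt is just an example name:
import urllib2

response = urllib2.urlopen(URL)
lines = response.read().splitlines()

with open("downloaded.txt", "w") as out:
    for line in lines:
        # process each line here, then write it back with its newline
        out.write(line + "\n")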

Python Read URLs from File and print to file

I have a list of URLs in a text file from which I want to fetch the article text, author and article title. Once these three elements are obtained I want them written to a file. So far I can read the URLs from the text file, but Python only prints out the URLs and writes just one (the final) article. How can I rewrite my script so that Python reads and writes every single URL and its content?
I have the following Python script (version 2.7 - Mac OS X Yosemite):
from newspaper import Article

f = open('text.txt', 'r') #text file containing the URLS
for line in f:
    print line
url = line
first_article = Article(url)
first_article.download()
first_article.parse()
# write/append to file
with open('anothertest.txt', 'a') as f:
    f.write(first_article.title)
    f.write(first_article.text)
print str(first_article.title)
for authors in first_article.authors:
    print authors
if not authors:
    print 'No author'
print str(first_article.text)
You're getting only the last article because you're iterating over all the lines of the file:
for line in f:
    print line
and once the loop is over, line contains the last value, which is what the code after the loop uses:
url = line
If you move that code inside the loop, so that it looks like this:
with open('text.txt', 'r') as f: #text file containing the URLS
    with open('anothertest.txt', 'a') as fout:
        for url in f:
            print(u"URL Line: {}".format(url.encode('utf-8')))
            # you might want to remove endlines and whitespace from
            # around the URL, which is what strip() does
            article = Article(url.strip())
            article.download()
            article.parse()
            # write/append to file
            fout.write(article.title)
            fout.write(article.text)
            print(u"Title: {}".format(article.title.encode('utf-8')))
            # print authors only if there are authors to show.
            if len(article.authors) == 0:
                print('No author!')
            else:
                for author in article.authors:
                    print(u"Author: {}".format(author.encode('utf-8')))
            print("Text of the article:")
            print(article.text.encode('utf-8'))
I also made a few changes to improve your code:
use with open() also for reading the file, to properly release the file descriptor when you no longer need it;
call the output file fout to avoid shadowing the first file handle;
open fout once, before entering the loop, to avoid opening and closing the file on every iteration;
check the length of article.authors instead of checking for the existence of authors, because authors won't exist when that loop is never entered (article.authors being empty).
HTH

Python - html2text write to file

I have text files containing HTML tags, which I want to remove using html2text in Python:
import html2text
html = open("textFileWithHtml.txt").read()
print html2text.html2text(html)
My question is: how can I write the output to a .txt file? (I want to create a new text file without the HTML elements -- the file does not already exist.)
You need to open another file for writing.
import html2text

html_file = open("textFileWithHtml.txt")
html = html_file.read()
w = open("out.txt", "w")
w.write(html2text.html2text(html).encode('utf-8'))
html_file.close()
w.close()
You should open a file and write to it.
import html2text
# Open your file
with open("textFileWithHtml.txt", 'r') as f_html:
html = f_html.read()
# Open a file and write to it
with open('your_file.txt', 'w') as f:
f.write(html2text.html2text(html).encode('utf-8'))
It is good practice to use the with keyword when dealing with file objects, and it is more Pythonic too.
See the documentation on reading and writing files: https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
Edit
If you have issues with encoding, try using .encode('utf-8'); I've added it to my code snippet. See the Python Unicode HOWTO if you run into problems with this (https://docs.python.org/2/howto/unicode.html).
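As an alternative to the manual .encode('utf-8') call, here is a sketch that uses Python 2's io.open with an explicit encoding, assuming the input file is UTF-8 and that html2text returns unicode text (as it does for typical inputs):
import io
import html2text

with io.open("textFileWithHtml.txt", "r", encoding="utf-8") as f_html:
    html = f_html.read()

with io.open("out.txt", "w", encoding="utf-8") as f_out:
    # io.open handles the encoding, so no manual .encode() call is needed
    f_out.write(html2text.html2text(html))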

Python- need to append characters to the beginning and end of each line in text file

I should preface that I am a complete Python newbie.
I'm trying to create a script that will loop through a directory and its subdirectories looking for text files. When it encounters a text file it will parse the file, convert it to NITF XML, and upload it to an FTP directory.
At this point I am still working on reading the text file into variables so that they can be inserted into the XML document in the right places. An example of the text file is as follows.
Headline
Subhead
By A person
Paragraph text.
And here is the code I have so far:
with open("path/to/textFile.txt") as f:
#content = f.readlines()
head,sub,auth = [f.readline().strip() for i in range(3)]
data=f.read()
pth = os.getcwd()
print head,sub,auth,data,pth
My question is: how do I iterate through the body of the text file (data) and wrap each line in HTML P tags? For example:
<P>line of text in file</P> <P>Next line in text file</P>.
Something like:
output_format = '<p>{}</p>\n'.format

with open('input') as fin, open('output', 'w') as fout:
    fout.writelines(output_format(line.strip()) for line in fin)
This assumes that you want to write the new content back to the original file:
with open('path/to/textFile.txt') as f:
    content = f.readlines()

with open('path/to/textFile.txt', 'w') as f:
    for line in content:
        f.write('<p>' + line.strip() + '</p>\n')
import shutil

with open('infile') as fin, open('outfile', 'w') as fout:
    for line in fin:
        fout.write('<P>{0}</P>\n'.format(line[:-1]))  # slice off the newline. Same as `line.rstrip('\n')`.

# Only do this once you're sure the script works :)
shutil.move('outfile', 'infile')  # Need to replace the input file with the output file
In your case, you should probably replace
data = f.read()
with:
data = '\n'.join("<p>%s</p>" % l.strip() for l in f)
Use data = f.readlines() here, then iterate over data and try something like this:
for line in data:
    line = "<p>" + line.strip() + "</p>"
    # write line + '\n' to a file or do something else
Append the <p> and </p> around each line, for example:
data_new = []
data = f.readlines()
for lines in data:
    data_new.append("<p>%s</p>\n" % lines.strip())
You could use the fileinput module to modify one or more files in place, with optional backup file creation if desired (see its documentation for details). Here it is being used to process one file:
import fileinput

for line in fileinput.input('testinput.txt', inplace=1):
    print '<P>' + line[:-1] + '</P>'
The 'testinput.txt' argument could also be a sequence of two or more file names instead of just a single one, which could be useful especially if you're using os.walk() to generate the list of files in the directory and its subdirectories to process (as you probably should be doing); see the sketch below.
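A minimal sketch of that combination, assuming the files of interest end in .txt and live under the current directory (adjust the root path and extension to your setup):
import os
import fileinput

# collect every .txt file under the current directory and its subdirectories
txt_files = []
for dirpath, dirnames, filenames in os.walk('.'):
    for name in filenames:
        if name.endswith('.txt'):
            txt_files.append(os.path.join(dirpath, name))

# rewrite each file in place, wrapping every line in <P> tags;
# while inplace=1 is active, print output is redirected into the current file
for line in fileinput.input(txt_files, inplace=1):
    print '<P>' + line.rstrip('\n') + '</P>'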
