Pulling info from an api url - python

I'm trying to pull the average of temperatures from this API from a bunch of different ZIP codes. I can currently do so by manually changing the ZIP code in the URL for the API, but I was hoping it to be able to loop through a list of ZIP codes or ask for input and use those zip codes.
However, I'm rather new and have no idea on how to add variables and stuff to a link, either that or I'm overcomplicating it. So basically I was searching for some methods to add a variable to the link or something to the same effect so I can change it whenever I want.
import urllib.request
import json
out = open("output.txt", "w")
link = "http://api.openweathermap.org/data/2.5/weather?zip={zip-code},us&appid={api-key}"
print(link)
x = urllib.request.urlopen(link)
url = x.read()
out.write(str(url, 'utf-8'))
returnJson = json.loads(url)
print('\n')
print(returnJson["main"]["temp"])

import urllib.request
import json
zipCodes = ['123','231','121']
out = open("output.txt", "w")
for i in zipCodes:
link = "http://api.openweathermap.org/data/2.5/weather?zip=" + i + ",us&appid={api-key}"
x = urllib.request.urlopen(link)
url = x.read()
out.write(str(url, 'utf-8'))
returnJson = json.loads(url)
print(returnJson["main"]["temp"])
out.close()
You can achieve what you want by looping through a list of zipcodes and creating a new URL from them.

Related

After reading urls from a text file, how can I save all the responses into separate files?

I have a script that reads urls from a text file, performs a request and then saves all the responses in one text file. How can I save each response in a different text file instead of all in the same file? For example, if my text file labeled input.txt has 20 urls, I would like to save the responses in 20 different .txt files like output1.txt, output2.txt instead of just one .txt file. So for each request, the response in saved in a new .txt file. Thank you
import requests
from bs4 import BeautifulSoup
with open('input.txt', 'r') as f_in:
for line in map(str.strip, f_in):
if not line:
continue
response = requests.get(line)
data = response.text
soup = BeautifulSoup(data, 'html.parser')
categories = soup.find_all("a", {"class":'navlabellink nvoffset nnormal'})
for category in categories:
data = line + "," + category.text
with open('output.txt', 'a+') as f:
f.write(data + "\n")
print(data)
Here's a quick way to implement what others have hinted at:
import requests
from bs4 import BeautifulSoup
with open('input.txt', 'r') as f_in:
for i, line in enumerate(map(str.strip, f_in)):
if not line:
continue
...
with open(f'output_{i}.txt', 'w') as f:
f.write(data + "\n")
print(data)
You can make a new file by using open('something.txt', 'w'). If the file is found, it'll erase its content. Else, it'll make a new file named 'something.txt'. Now, you can use file.write() to write your info!
I'm not sure, if I understood your problem right.
I would create an array/list and would create an object for each url request and response. Then add the objects to the array/list and write for each object a different file.
There are at least two ways you could generate files for each url. One, shown below, is to create a hash of some data unique data of the file. In this case I chose category but your could also use the whole contents of the file. This creates a unique string to use for a file name so that two links with the same category text don't overwrite each other when saved.
Another way, not shown, is to find some unique value within the data itself and use it as the filename without hashing it. However, this can cause more problems than it solves since data on the Internet should not be trusted.
Here's your code with an MD5 hash used for a filename. MD5 is not a secure hashing function for passwords but it's safe for creating unique filenames.
Updated Snippet
import hashlib
import requests
from bs4 import BeautifulSoup
with open('input.txt', 'r') as f_in:
for line in map(str.strip, f_in):
if not line:
continue
response = requests.get(line)
data = response.text
soup = BeautifulSoup(data, 'html.parser')
categories = soup.find_all("a", {"class":'navlabellink nvoffset nnormal'})
for category in categories:
data = line + "," + category.text
filename = hashlib.sha256()
filename.update(category.text.encode('utf-8'))
with open('{}.html'.format(filename.hexdigest()), 'w') as f:
f.write(data + "\n")
print(data)
Code added
filename = hashlib.sha256()
filename.update(category.text.encode('utf-8'))
with open('{}.html'.format(filename.hexdigest()), 'w') as f:
Capturing Updated Pages
If you care about catching contents of a page at different points in time, hash the whole contents of the file. That way, if anything within the page changes the previous contents of the page aren't lost. In this case, I hash both the url and the file contents and concatenate the hashes with the URL hash followed by a hash of the file contents. That way, all versions of a file are visible when the directory is sorted.
hashed_contents = hashlib.sha256()
hashed_contents.update(category['href'].encode('utf-8'))
with open('{}.html'.format(filename.hexdigest()), 'w') as f:
for category in categories:
data = line + "," + category.text
hashed_url = hashlib.sha256()
hashed_url.update(category['href'].encode('utf-8'))
page = requests.get(category['href'])
hashed_content = hashlib.sha256()
hashed_content.update(page.text.encode('utf-8')
filename = '{}_{}.html'.format(hashed_url.hexdigest(), hashed_content.hexdigest())
with open('{}.html'.format(filename.hexdigest()), 'w') as f:
f.write(data + "\n")
print(data)

How to input a list of URLs saved in a .txt to a Python program?

I have a list of URLs saved in a .txt file and I would like to feed them, one at a time, to a variable named url to which I apply methods from the newspaper3k python library. The program extracts the URL content, authors of the article, a summary of the text, etc, then prints the info to a new .txt file. The script works fine when you give it one URL as user input, but what should I do in order to read from a .txt with thousands of URLs?
I am only beginning with Python, as a matter of fact this is my first script, so I have tried to simply say url = (myfile.txt), but I realized this wouldn't work because I have to read the file one line at a time. So I have tried to apply read() and readlines() to it, but it wouldn't work properly because 'str' object has no attribute 'read' or 'readlines'. What should I use to read those URLs saved in a .txt file, each beginning in a new line, as the input of my simple script? Should I convert string to something else?
Extract from the code, lines 1-18:
from newspaper import Article
from newspaper import fulltext
import requests
url = input("Article URL: ")
a = Article(url, language='pt')
html = requests.get(url).text
text = fulltext(html)
download = a.download()
parse = a.parse()
nlp = a.nlp()
title = a.title
publish_date = a.publish_date
authors = a.authors
keywords = a.keywords
summary = a.summary
Later I have built some functions to display the info in a desired format and save it to a new .txt. I know this is a very basic one, but I am honestly stuck... I have read other similar questions here but I couldn't properly understand or apply the suggestions. So, what is the best way to read URLs from a .txt file in order to feed them, one at a time, to the url variable, to which other methods are them applied to extract its content?
This is my first question here and I understand the forum is aimed at more experienced programmers, but I would really appreciate some help. If I need to edit or clarify something in this post, please let me know and I will correct immediately.
Here is one way you could do it:
from newspaper import Article
from newspaper import fulltext
import requests
with open('myfile.txt',r) as f:
for line in f:
#do not forget to strip the trailing new line
url = line.rstrip("\n")
a = Article(url, language='pt')
html = requests.get(url).text
text = fulltext(html)
download = a.download()
parse = a.parse()
nlp = a.nlp()
title = a.title
publish_date = a.publish_date
authors = a.authors
keywords = a.keywords
summary = a.summary
This could help you:
url_file = open('myfile.txt','r')
for url in url_file.readlines():
print url
url_file.close()
You can apply it on your code as the following
from newspaper import Article
from newspaper import fulltext
import requests
url_file = open('myfile.txt','r')
for url in url_file.readlines():
a = Article(url, language='pt')
html = requests.get(url).text
text = fulltext(html)
download = a.download()
parse = a.parse()
nlp = a.nlp()
title = a.title
publish_date = a.publish_date
authors = a.authors
keywords = a.keywords
summary = a.summary
url_file.close()

Using python to download table data without .csv file address provided

My purpose is to download the data from this website:
http://transoutage.spp.org/
When opening this website, in the bottom of web, there is a description used to illustrate how to auto-download the data. For example:
http://transoutage.spp.org/report.aspx?download=true&actualendgreaterthan=3/1/2018&includenulls=true
The code I wrote is this:
import requests
ul_begin = 'http://transoutage.spp.org/report.aspx?download=true'
timeset = '3/1/2018' #define the time, m/d/yyyy
fn = ['&actualendgreaterthan='] + [timeset] + ['&includenulls=true']
fn = ''.join(fn)
ul = ul_begin+fn
r = requests.get(ul, verify=False)
Since, if you input the web address,
http://transoutage.spp.org/report.aspx?download=true&actualendgreaterthan=3/1/2018&includenulls=true,
into the Chrome, it will auto-download the data in .csv file. I do not know how to continue my code.
Please help me!!!!
You need to write the response you receive to a file:
r = requests.get(ul, verify=False)
if 200 >= r.status_code <= 300:
# If the request has succeeded
file_path = '<path_where_file_has_to_be_downloaded>`
f = open(file_path, 'w+')
f.write(r.content)
f.close()
This will work properly if the csv file is small in size. but for large files, you need to use stream param to download: http://masnun.com/2016/09/18/python-using-the-requests-module-to-download-large-files-efficiently.html

How to prevent writing into txt file the same words using open(text.txt,a)?

I have a question regarding appending to text file. I have written a script and what this script does is that it will read the URL in JSON format and extract the list of titles and write into the file "WordsInCategory.text".
As this code will be used in a loop thus I used f1 = open('WordsInCategory.text', 'a').
But I encountered a problem, that is it will add in already existing title into the file.
I am having trouble coming out with a solution to solve this problem and using 'w' will overwrite what it is written.
My code is as follows:
import urllib2
import json
url1 ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtype=page&cmtitle=Category:Geography&cmlimit=100'
json_obj = urllib2.urlopen(url1)
data1 = json.load(json_obj)
f1 = open('WordsInCategory.text', 'a')
for item in data1['query']:
for i in data1['query']['categorymembers']:
f1.write((i['title']).encode('utf8')+"\n")
Please advice on how I should modify my code.
Thank you.
I would suggest saving every title in an array, before writing to a file (and hence writing only once to the given file). You can modify your code this way :
import urllib2
import json
data = []
f1 = open('WordsInCategory.text', 'w')
url1 ='https://en.wikipedia.org/w/api.php?\
action=query&format=json&list=categorymembers\
&cmtype=page&cmtitle=Category:Geography&cmlimit=100'
json_obj = urllib2.urlopen(url1)
data1 = json.load(json_obj)
for item in data1['query']:
for i in data1['query']['categorymembers']:
data.append(i['title'].encode('utf8')+"\n")
# Do additional requests, and append the new titles to the data array
f1.write(''.join(set(data)))
f1.close()
set allows me to delete any duplicate entry.
If keeping the titles in memory is a problem, you can check if the title already exists before writing it to the file, but it may be awfully time consuming :
import urllib2
import json
data = []
url1 ='https://en.wikipedia.org/w/api.php?\
action=query&format=json&list=categorymembers\
&cmtype=page&cmtitle=Category:Geography&cmlimit=100'
json_obj = urllib2.urlopen(url1)
data1 = json.load(json_obj)
for item in data1['query']:
for i in data1['query']['categorymembers']:
title = (i['title'].encode('utf8')+"\n")
with open('WordsInCategory.text', 'r') as title_check:
if title not in title_check:
data.append(title)
with open('WordsInCategory.text', 'a') as f1:
f1.write(''.join(set(data)))
# Handle additional requests
Hope it'll be helpful.
You can track the titles you added.
titles = []
and then add each title to the list when writing
if title not in titles:
# write to file
titles += title

Pulling data from xml page into .txt

Im trying to pull just the keywords from an xml output like shown on:
http://clients1.google.com/complete/search?hl=en&output=toolbar&q=test+a
I have tried putting together the below but i don't seem to get any errors or any output. Any ideas?
import urllib2 as ur
import re
f = ur.urlopen(u'http://clients1.google.com/complete/search?hl=en&output=toolbar&q=test+a')
res = f.readlines()
for d in res:
data = re.findall('<CompleteSuggestion><\/CompleteSuggestion>',d)
for i in data:
print i
file = open("keywords.txt", "a")
file.write(i + '\n')
file.close()
I am trying to,
Fetch the xml from url given
Store list of keywords from XML file, parsed using regex
Thanks,
from urllib2 import urlopen
import re
xml_url = u'http://clients1.google.com/complete/search?hl=en&output=toolbar&q=test+a'
xml_file_contents = urlopen(xml_url).readlines()
keywords_file = open("keywords.txt", "a")
for entry in xml_file_contents:
output = "\n".join(re.findall('data=\"([^\"]*)',entry))
print output
keywords_file.write(output + '\n')
keywords_file.close()
output:
test anxiety
test america
test adobe flash
test automation
test act
test alternator
test and set
test adblock
test adobe shockwave
test automation tools
Let me know in case of any doubt

Categories

Resources