I am pretty new to programming, so I am sure this is not correct, but it's the best I can do based on my research. Thanks.
import pandas as pd
import numpy as np
import requests
import yelp
requests.get(https://api.yelp.com/v3/autocomplete?text=del&latitude=37.786882&longitude=-122.399972,headers={'Authorization: Bearer <API KEY that I have>'})
My noob self tells me this is a dictionary:
headers={'Authorization: Bearer <API KEY>'}
I know this is probably 100% wrong, so I would really love to learn more about using REST APIs in Python. I am just doing this as a personal project. My overall goal is to be able to access Yelp's public data via the API. For example, I want to get the reviews for business X.
Update
requests.get("https://api.yelp.com/v3/autocomplete?text=del&latitude=37.786882&longitude=-122.399972",headers={'Authorization: Bearer <API KEY>'})
I now get the following error
AttributeError: 'set' object has no attribute 'items'
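For what it's worth, that error comes from the headers argument: {'Authorization: Bearer <API KEY>'} is a set containing one string (there is no key/value pair), and requests fails when it tries to call .items() on it. A minimal corrected call, keeping the placeholder key, would look like:
headers = {'Authorization': 'Bearer <API KEY>'}  # a dict: key and value are separate strings
requests.get(
    "https://api.yelp.com/v3/autocomplete?text=del&latitude=37.786882&longitude=-122.399972",
    headers=headers,
)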
You're definitely not 100% wrong, @g_altobelli!
Let's take the example of getting reviews for business X, where X is one of my favorite restaurants -- La Taqueria in San Francisco. Their restaurant ID (which can be found as the last element of the URL of their review page) is la-taqueria-san-francisco-2.
Now to our code:
You have the right idea using requests; I think your parameters might just be slightly off. It's helpful initially to define a few constants. Here's what I added:
import requests
API_KEY = "<my api key>"
API_HOST = 'https://api.yelp.com'
BUSINESS_PATH = '/v3/businesses/'
Then I created a function that took in the business ID and returned the JSON result from the reviews endpoint. That looked like this:
def get_business(business_id):
    business_path = BUSINESS_PATH + business_id
    url = API_HOST + business_path + '/reviews'
    headers = {'Authorization': f"Bearer {API_KEY}"}
    response = requests.get(url, headers=headers)
    return response.json()
Finally, I called the function with my values and printed the result:
results = get_business('la-taqueria-san-francisco-2')
print(results)
The output I got was JSON, and it looked roughly like the following:
{'reviews': [{'id': 'pD3Yvc4QdUCBISy077smYw', 'url': 'https://www.yelp.com/biz/la-taqueria-san-francisco-2?hrid=pD3Yvc4QdUCBISy077smYw&adjust_creative=hEbqN49-q6Ct_cMosX68Zg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=hEbqN49-q6Ct_cMosX68Zg', 'text': 'My second time here.. \nI love the Burito here it has the distinct taste of freshness.. we order super steak burito and boy it did not disappoint! everything...}
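Each element of results['reviews'] is a plain dictionary, so pulling out just the review text (the only field I'll assume here, since it appears in the output above) is a short loop:
for review in results.get('reviews', []):
    print(review['text'])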
Does this help? Let me know if you have any more questions.
EDIT:
In a similar vein, when I now try to log into their account with a POST request, what is returned is none of the errors they suggest on their site, but is in fact a "JSON exception". Is there any way to debug this, or is an error code 500 completely impossible to deal with?
I'm well aware this question has been asked before. Sadly, when trying the proposed answers, none worked. I have an extremely simple Python project with urllib, and I've never done web programming in Python before, nor am I even a regular Python user. My friend needs to get access to content from this site, but their user-friendly front-end is down and I learned that they have a public API to access their content. Not knowing what I'm doing, but glad to try to help and interested in the challenge, I have very slowly set out.
Note that it is necessary for me to only use standard Python libraries, so that any finished project could easily be emailed to their computer and just work.
The following works completely fine except for the "originalLanguage" query parameter, which the API documents as an array value. No matter whether I comma-separate values, or write "originalLanguage[0]" or "originalLanguage0" or anything else I've seen online, the server returns an error message along the lines of "Array value expected but string detected".
Is there any way for me to get this working? Because it clearly can work, otherwise the API wouldn't document it. Many thanks.
In case it helps, when using "[]" or "<>" or "{}" or any delimiter I could think of, my IDE didn't recognise it as part of the URL.
import urllib.request as request
import urllib.parse as parse
def make_query(url, params):
    url += "?"
    for i, (key, value) in enumerate(params.items()):
        url += key + '=' + value
        if i < len(params) - 1:
            url += '&'
    return url
base = "https://api.mangadex.org/manga"
params = {
    "limit": "50",
    "originalLanguage": "en"
}
url = make_query(base, params)
req = request.Request(url)
response = request.urlopen(req)
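I don't have MangaDex-specific confirmation, but a common convention for array-valued query parameters is a bracketed key repeated once per element (originalLanguage[]=en&originalLanguage[]=ja). A sketch using only the standard library, where the bracketed key name is the assumption:
import urllib.parse as parse
import urllib.request as request

params = {
    "limit": "50",
    "originalLanguage[]": ["en"],  # assumed array syntax; doseq repeats the key per element
}
url = "https://api.mangadex.org/manga?" + parse.urlencode(params, doseq=True)
response = request.urlopen(request.Request(url))
Note that urlencode percent-encodes the brackets (%5B%5D), which servers normally decode back before matching parameter names.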
I have recently started learning web scraping using Scrapy in Python and am facing issues scraping data from the AccuWeather site (https://www.accuweather.com/en/gb/london/ec4a-2/may-weather/328328?year=2020).
Basically I am capturing dates and their weather temperatures for reporting purposes.
When I inspected the site I found too many div tags, so I am getting confused about how to write the code. Hence I thought I would seek the experts' help on this.
Here is my code for your reference.
import scrapy
class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://www.accuweather.com/en/gb/london/ec4a-2/may-weather/328328?year=2020']

    def parse(self, response):
        All_div_tags = response.css('div.content-module')[0]
        # Grid_tag = All_div_tags.css('div.monthly-grid')
        Date_tag = All_div_tags.css('div.date::text').extract()
        yield {'Date': Date_tag}
I wrote this in PyCharm and am getting an error: "code is not handled or not allowed".
Please could someone help me with this?
I've tried to read some websites that gave me the same error. It happens because some websites don't allow web scraping on them. To get data from these websites, you would probably need to use their API if they have one.
Fortunately, AccuWeather has made it easy to use their API (unlike other APIs):
You first need to create an account at their developers' website: https://developer.accuweather.com/
Now, create a new app by going to My Apps > Add a new app.
You will probably see some information about your app (if you don't, click its name and it will show up). The only information you will need is your API key, which is required on every request.
AccuWeather has pretty good documentation of their API here, but I will show you how to use the most useful endpoints. You will need the location key of the city you want the weather for; it is shown in the URL of that city's weather page. For example, London's URL is www.accuweather.com/en/gb/london/ec4a-2/weather-forecast/328328, so its location key is 328328.
When you have the location key of the city/cities you want to get the weather from, open a file, and type:
import requests
import json
If you want the daily weather (as shown here), type:
response = requests.get(url="http://dataservice.accuweather.com/forecasts/v1/daily/1day/LOCATIONKEY?apikey=APIKEY")
print(response.status_code)
Replace APIKEY with your API key and LOCATIONKEY with the city's location key. It should now display 200 when you run it (meaning the request was successful).
Now, parse the response body as JSON:
response_json = json.loads(response.content)
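As an aside, requests can decode the body itself; this line is equivalent:
response_json = response.json()  # same dictionary as json.loads(response.content)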
And you can now get some information from it, such as the day's "definition":
print(response_json["Headline"]["Text"])
The minimum temperature:
min_temperature = response_json["DailyForecasts"][0]["Temperature"]["Minimum"]["Value"]
print(f"Minimum Temperature: {min_temperature}")
The maximum temperature:
max_temperature = response_json["DailyForecasts"][0]["Temperature"]["Maximum"]["Value"]
print(f"Maximum Temperature: {max_temperature}")
The minimum temperature and maximum temperature with the unit:
min_temperature = str(response_json["DailyForecasts"][0]["Temperature"]["Minimum"]["Value"]) + response_json["DailyForecasts"][0]["Temperature"]["Minimum"]["Unit"]
print(f"Minimum Temperature: {min_temperature}")
max_temperature = str(response_json["DailyForecasts"][0]["Temperature"]["Maximum"]["Value"]) + response_json["DailyForecasts"][0]["Temperature"]["Maximum"]["Unit"]
print(f"Maximum Temperature: {max_temperature}")
And more.
If you have any questions, let me know. I hope I could help you!
I want to crawl some data from this type of url:
http://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29/render?start=0&count=5&currency=&language=english
I don't know; it contains some kind of HTML tags, but I don't know how to actually scrape this page (I used BeautifulSoup for my other URLs).
Hope you can help me out.
The page you loaded is a JSON file. Use the JSON library like so:
import requests
import json
html = requests.get('http://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29/render?start=0&count=5&currency=&language=english')
# Load the parsed page into a JSON object.
steam_json = json.loads(html.text)
# Extract whatever you want like this:
success_status = steam_json['success']
The URL returns JSON, so you first have to load it as a native Python object; in this case the corresponding object is a dictionary, using the json library (if you ever need to do the same in Java, jsoup is a BeautifulSoup-like library):
import json, urllib2
request = urllib2.Request(url=your_url)
request.add_header('User-agent',user_agent) # let's say you want to add headers like user-agent etc...
response = urllib2.urlopen(request)
dico = json.loads(response.read())
Then you have to explore the key-value pairs which are of interest to you, and parse the values containing HTML as you usually do with BeautifulSoup.
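For instance, this render endpoint typically puts the listing markup under a results_html key (treat that key name as an assumption and check dico.keys() against the real response):
from bs4 import BeautifulSoup

# Parse the HTML fragment embedded in the JSON response.
soup = BeautifulSoup(dico.get('results_html', ''), 'html.parser')
print(soup.get_text())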
Also, note that the site from which you want to get data, can be hypermedia-driven (see HATEOAS), which is a kind of AJAX implemented with no graphical interface. Whatever it might be, it allows you to be more precise (and thus more server-friendly) in the data you request.
url_base = "http://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29/render?"
start = 0
count = 5
currency = ''
language = 'english'
your_url = url_base + "start={0}&count={1}&currency={2}&language={3}".format(start, count, currency, language)
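A slightly tidier variant is to let urllib build the query string for you (Python 2 here, to match the urllib2 code above; in Python 3 it is urllib.parse.urlencode):
from urllib import urlencode

params = {'start': 0, 'count': 5, 'currency': '', 'language': 'english'}
your_url = url_base + urlencode(params)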
So, I am playing around with Etilbudsavis' API (a Danish directory containing offers from retail stores). My goal is to retrieve data based on a search query. The API actually allows this out of the box. However, when I try, I end up with an error saying that my token is missing. Anyway, here is my code:
from urllib2 import urlopen
from json import load
import requests
body = {'api_key': 'secret_api_key'}
response = requests.post('https://api.etilbudsavis.dk/v2/sessions', data=body)
print response.text
new_body = {'_token:': 'token_obtained_from_POST_method', 'query:': 'coca-cola'}
new_response = requests.get('https://api.etilbudsavis.dk/v2/offers/search', data=new_body)
print new_response.text
Full error:
{"code":1107,"id":"00ilpgq7etum2ksrh4nr6y1jlu5ng8cj","message":"missing token","
details":"Missing token\nNo token found in request to an endpoint that requires
a valid token.","previous":null,"#note.1":"Hey! It looks like you found an error
. If you have any questions about this error, feel free to contact support with
the above error id."}
Since this is a GET request, you should use the params argument to pass the data in the URL.
new_response = requests.get('https://api.etilbudsavis.dk/v2/offers/search', params=new_body)
See the requests docs.
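One more detail worth checking, though the params fix above is the main one: the keys in the question contain stray colons inside the strings ('_token:' instead of '_token'), which the server would treat as different parameter names. A cleaned-up call, plus a quick way to see where the values end up:
new_body = {'_token': 'token_obtained_from_POST_method', 'query': 'coca-cola'}
new_response = requests.get('https://api.etilbudsavis.dk/v2/offers/search', params=new_body)
print(new_response.url)  # the parameters now appear in the query string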
I managed to solve the problem with the help of Daniel Roseman, who reminded me that playing with an API in the Python shell is different from interacting with the API in the browser. The docs clearly stated that you have to sign the API token in order for it to work. I missed that tiny detail ... Nevertheless, Daniel helped me figure everything out. Thanks again, Dan.
I've been looking for an API to automatically retrieve Google Insights information for part of another algorithm, but have been unable to find anything. The first result on Google delivers a site with a Python plugin which is now out of date.
Does such an API exist, or has anyone written a plugin, perhaps for python?
As far as I can tell, there is no API available as of yet, and neither is there a working implementation of a method for extracting data from Google Insights. However, I have found a solution to my (slightly more specific) problem, which could really just be solved by knowing how many times certain terms are searched for.
This can be done by interfacing with the Google Suggest protocol for web browser search bars. When you give it a word, it returns a list of suggested phrases as well as the number of times each phrase has been searched for (I'm not sure about the time unit; presumably in the last year).
Here is some Python code for doing this, slightly adapted from code by odewahn1 at O'Reilly Answers and working on Python 2.6 and lower:
from sgmllib import SGMLParser
import urllib2
import urllib
# Define the class that will parse the suggestion XML
class PullSuggestions(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.suggestions = []
        self.queries = []

    def start_suggestion(self, attrs):
        for a in attrs:
            if a[0] == 'data': self.suggestions.append(a[1])

    def start_num_queries(self, attrs):
        for a in attrs:
            if a[0] == 'int': self.queries.append(a[1])

# ENTER THE BASE QUERY HERE
base_query = ""  # This is the base query
base_query += "%s"
alphabet = "abcdefghijklmnopqrstuvwxyz"

for letter in alphabet:
    q = base_query % letter
    query = urllib.urlencode({'q': q})
    url = "http://google.com/complete/search?output=toolbar&%s" % query
    res = urllib2.urlopen(url)
    parser = PullSuggestions()
    parser.feed(res.read())
    parser.close()
    for i in range(0, len(parser.suggestions)):
        print "%s\t%s" % (parser.suggestions[i], parser.queries[i])
This at least solves the problem in part, but unfortunately it is still difficult to reliably obtain the number of searches for any specific word or phrase and impossible to obtain the search history of different phrases.
I just started searching for this and found a good way to retrieve the data using Python in the following script. Basically it passes a specialized quote symbol to Google's historical finance database.
# Imports this snippet needs (pandas for read_csv/DatetimeIndex, urllib2 for urlopen):
from urllib2 import urlopen
from pandas import read_csv, DatetimeIndex

def get_index(gindex, startdate=20040101):
    """
    API wrapper for Google Domestic Trends data.
    https://www.google.com/finance/domestic_trends

    Available Indices:
    'ADVERT', 'AIRTVL', 'AUTOBY', 'AUTOFI', 'AUTO', 'BIZIND', 'BNKRPT',
    'COMLND', 'COMPUT', 'CONSTR', 'CRCARD', 'DURBLE', 'EDUCAT', 'INVEST',
    'FINPLN', 'FURNTR', 'INSUR', 'JOBS', 'LUXURY', 'MOBILE', 'MTGE',
    'RLEST', 'RENTAL', 'SHOP', 'TRAVEL', 'UNEMPL'
    """
    base_url = 'http://www.google.com/finance/historical?q=GOOGLEINDEX_US:'
    full_url = '%s%s&output=csv&startdate=%s' % (base_url, gindex, startdate)
    dframe = read_csv(urlopen(full_url), index_col=0)
    dframe.index = DatetimeIndex(dframe.index)
    dframe = dframe.sort_index(0)
    # Drop constant columns and keep just the index series itself.
    for col in dframe.columns:
        if len(dframe[col].unique()) == 1:
            dframe.pop(col)
    if len(dframe.columns) == 1 and dframe.columns[0] == 'Close':
        dframe.columns = [gindex]
    return dframe[gindex]
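Example usage would be a one-liner; note that Google appears to have retired the Domestic Trends endpoint since this was written, so treat the snippet as historical:
travel = get_index('TRAVEL', startdate=20100101)
print(travel.head())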
I couldn't find any documentation provided by Google, but Brad Jasper seems to have come up with some method for querying Insights for information. Note: I'm not sure if it still works... Good luck!
Use Python to Access Google Insights API
Sadly, no; however, the Google AdWords API Keyword Estimator may solve your need.