I've been looking for an API to automatically retrieve Google Insights information for part of another algorithm, but have been unable to find anything. The first result on Google is a site with a Python plugin that is now out of date.
Does such an API exist, or has anyone written a plugin, perhaps for Python?
As far as I can tell, there is no API available as of yet, and neither is there a working implementation of a method for extracting data from Google Insights. However, I have found a solution to my (slightly more specific) problem, which could really just be solved by knowing how many times certain terms are searched for.
This can be done by interfacing with the Google Suggest protocol used by web-browser search bars. When you give it a word, it returns a list of suggested phrases as well as the number of times each phrase has been searched (I'm not sure about the time unit; presumably it's the last year).
Here is some Python code for doing this, slightly adapted from code by odewahn1 at O'Reilly Answers and working on Python 2.6 and lower:
from sgmllib import SGMLParser
import urllib2
import urllib

# Define the class that will parse the suggestion XML
class PullSuggestions(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.suggestions = []
        self.queries = []

    def start_suggestion(self, attrs):
        for a in attrs:
            if a[0] == 'data': self.suggestions.append(a[1])

    def start_num_queries(self, attrs):
        for a in attrs:
            if a[0] == 'int': self.queries.append(a[1])

# ENTER THE BASE QUERY HERE
base_query = ""  # This is the base query
base_query += "%s"

alphabet = "abcdefghijklmnopqrstuvwxyz"
for letter in alphabet:
    q = base_query % letter
    query = urllib.urlencode({'q': q})
    url = "http://google.com/complete/search?output=toolbar&%s" % query
    res = urllib2.urlopen(url)
    parser = PullSuggestions()
    parser.feed(res.read())
    parser.close()
    for i in range(0, len(parser.suggestions)):
        print "%s\t%s" % (parser.suggestions[i], parser.queries[i])
This at least solves the problem in part, but unfortunately it is still difficult to reliably obtain the number of searches for any specific word or phrase and impossible to obtain the search history of different phrases.
I just started looking into this and found a good way to retrieve the data with Python using the following script. Basically, it passes a specialized query to Google's historical financial database.
from urllib2 import urlopen
from pandas import read_csv, DatetimeIndex

def get_index(gindex, startdate=20040101):
    """
    API wrapper for Google Domestic Trends data.
    https://www.google.com/finance/domestic_trends

    Available Indices:
    'ADVERT', 'AIRTVL', 'AUTOBY', 'AUTOFI', 'AUTO', 'BIZIND', 'BNKRPT',
    'COMLND', 'COMPUT', 'CONSTR', 'CRCARD', 'DURBLE', 'EDUCAT', 'INVEST',
    'FINPLN', 'FURNTR', 'INSUR', 'JOBS', 'LUXURY', 'MOBILE', 'MTGE',
    'RLEST', 'RENTAL', 'SHOP', 'TRAVEL', 'UNEMPL'
    """
    base_url = 'http://www.google.com/finance/historical?q=GOOGLEINDEX_US:'
    full_url = '%s%s&output=csv&startdate=%s' % (base_url, gindex, startdate)
    dframe = read_csv(urlopen(full_url), index_col=0)
    dframe.index = DatetimeIndex(dframe.index)
    dframe = dframe.sort_index(0)
    # Drop columns that never change (they carry no information)
    for col in dframe.columns:
        if len(dframe[col].unique()) == 1:
            dframe.pop(col)
    # If only the 'Close' column remains, rename it after the requested index
    if len(dframe.columns) == 1 and dframe.columns[0] == 'Close':
        dframe.columns = [gindex]
    return dframe[gindex]
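For example, a minimal usage sketch (assuming pandas is installed and the Domestic Trends CSV endpoint is still reachable; 'AIRTVL' is just one of the index names listed in the docstring):

# Fetch the air-travel index starting from 2008 and show the first rows
airtvl = get_index('AIRTVL', startdate=20080101)
print airtvl.head()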
I couldn't find any documentation provided by Google, but Brad Jasper seems to have come up with some method for querying Insights for information. Note: I'm not sure if it still works... Good luck!
Use Python to Access Google Insights API
Sadly no; however, the Google AdWords API Keyword Estimator may solve your need.
I am pretty new to programming, so I am sure this is not correct, but it's the best I can do based on my research. Thanks.
import pandas as pd
import numpy as np
import requests
import yelp
requests.get(https://api.yelp.com/v3/autocomplete?text=del&latitude=37.786882&longitude=-122.399972,headers={'Authorization: Bearer <API KEY that I have>'})
My noob self tells me this is a dictionary:
headers={'Authorization: Bearer <API KeY>'}
I know this is probably 100% wrong, so I would really love to learn more about using REST APIs in Python. I am just doing this as a personal project. My overall goal is to be able to access Yelp's public data via the API. For example, I want to get the reviews for business X.
Update
requests.get("https://api.yelp.com/v3/autocomplete?text=del&latitude=37.786882&longitude=-122.399972",headers={'Authorization: Bearer <API KEY>'})
I now get the following error
AttributeError: 'set' object has no attribute 'items'
You're definitely not 100% wrong, @g_altobelli!
Let's take the example of getting reviews for business X, where X is one of my favorite restaurants -- la taqueria in San Francisco. Their restaurant id (which can be found in the url of their review page as the last element) is la-taqueria-san-francisco-2.
Now to our code:
You have the right idea using requests; I think your parameters might just be slightly off. It's helpful initially to have some headers. Here's what I added:
import requests
API_KEY = "<my api key>"
API_HOST = 'https://api.yelp.com'
BUSINESS_PATH = '/v3/businesses/'
Then I created a function, that took in the business id and returned a jsonified result of the basic data. That looked like this:
def get_business(business_id):
    business_path = BUSINESS_PATH + business_id
    url = API_HOST + business_path + '/reviews'
    headers = {'Authorization': f"Bearer {API_KEY}"}
    response = requests.get(url, headers=headers)
    return response.json()
Finally, I called the function with my values and printed the result:
results = get_business('la-taqueria-san-francisco-2')
print(results)
The output I got was json, and looked roughly like the following:
{'reviews': [{'id': 'pD3Yvc4QdUCBISy077smYw', 'url': 'https://www.yelp.com/biz/la-taqueria-san-francisco-2?hrid=pD3Yvc4QdUCBISy077smYw&adjust_creative=hEbqN49-q6Ct_cMosX68Zg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=hEbqN49-q6Ct_cMosX68Zg', 'text': 'My second time here.. \nI love the Burito here it has the distinct taste of freshness.. we order super steak burito and boy it did not disappoint! everything...}
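As a side note on the AttributeError from your update: {'Authorization: Bearer <API KEY>'} is a Python set literal containing a single string (there is no key: value pair), and requests expects a mapping for headers, which is why it complains that a 'set' object has no attribute 'items'. A minimal comparison, with a made-up token:

bad_headers = {'Authorization: Bearer abc123'}     # set literal -> AttributeError inside requests
good_headers = {'Authorization': 'Bearer abc123'}  # dict -> what requests expects for headers=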
Does this help? Let me know if you have any more questions.
We have a question-answer corpus like the one shown below:
Q: Why did Lincoln issue the Emancipation Proclamation?
A: The goal was to weaken the rebellion, which was led and controlled by slave owners.
Q: Who is most noted for his contributions to the theory of molarity and molecular weight?
A: Amedeo Avogadro
Q: When did he drop John from his name?
A: upon graduating from college
Q: What do beetles eat?
A: Some are generalists, eating both plants and animals. Other beetles are highly specialised in their diet.
Consider the questions as queries and the answers as documents.
We have to build a system that, for a given query (semantically similar to one of the questions in the question corpus), can retrieve the right document (the corresponding answer in the answer corpus).
Can anyone suggest an algorithm or a good way to proceed in building it?
Your question is too broad and the task you are trying to do is challenging. However, I suggest you read about IR-based factoid question answering. This document references many state-of-the-art techniques, and reading it should give you several ideas.
Please note that you need to follow a different approach for IR-based factoid QA than for knowledge-based QA. First, identify what type of QA system you want to build.
Lastly, I believe a simple document-matching technique won't be enough for QA, but you can try the simple Lucene-based approach @Debasis suggested and see whether it does well.
Consider a question and its answer (assuming there is only one) as one single document in Lucene. Lucene supports a field view of documents; so while constructing a document make question the searchable field. Once you retrieve the top ranked questions given a query question, use the get method of the Document class to return the answers.
A code skeleton (fill this up yourself):
//Index
IndexWriterConfig iwcfg = new IndexWriterConfig(new StandardAnalyzer());
IndexWriter writer = new IndexWriter(...);
....
Document doc = new Document();
doc.add(new Field("FIELD_QUESTION", questionBody, Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("FIELD_ANSWER", answerBody, Field.Store.YES, Field.Index.ANALYZED));
...
...
// Search
IndexReader reader = new IndexReader(..);
IndexSearcher searcher = new IndexSearcher(reader);
...
...
QueryParser parser = new QueryParser("FIELD_QUESTION", new StandardAnalyzer());
Query q = parser.parse(queryQuestion);
...
...
TopDocs topDocs = searcher.search(q, 10); // top-10 retrieved
// Accumulate the answers from the retrieved questions which
// are similar to the query (new) question.
StringBuffer buff = new StringBuffer();
for (ScoreDoc sd : topDocs.scoreDocs) {
    Document retrievedDoc = reader.document(sd.doc);
    buff.append(retrievedDoc.get("FIELD_ANSWER")).append("\n");
}
System.out.println("Generated answer: " + buff.toString());
I'm attempting to search for contacts via the Google Contacts API, using multiple search terms. Searching by a single term works fine and returns contact(s):
query = gdata.contacts.client.ContactsQuery()
query.text_query = '1048'
feed = gd_client.GetContacts(q=query)
for entry in feed.entry:
    # Do stuff
However, I would like to search by multiple terms:
query = gdata.contacts.client.ContactsQuery()
query.text_query = '1048 1049 1050'
feed = gd_client.GetContacts(q=query)
When I do this, no results are returned, and I've found so far that spaces are being replaced by + signs:
https://www.google.com/m8/feeds/contacts/default/full?q=3066+3068+3073+3074
I'm digging through the gdata-client-python code right now to find where it's building the query string, but wanted to toss the question out there as well.
According to the docs, both types of search are supported by the API, and I've seen some similar docs when searching through related APIs (Docs, Calendar, etc):
https://developers.google.com/google-apps/contacts/v3/reference#contacts-query-parameters-reference
Thanks!
Looks like I was mistaken in my understanding of the gdata query-string functionality: the '+' signs are just the URL encoding of spaces, and the individual terms are combined with AND, so a contact has to match all of them.
https://developers.google.com/gdata/docs/2.0/reference?hl=en#Queries
'The service returns all entries that match all of the search terms (like using AND between terms).'
Helps to read the docs and understand them!
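If OR-style behaviour is what you actually need, one workaround is to issue one query per term and merge the results yourself. A rough sketch, reusing the gd_client setup from the question and de-duplicating by entry id (the entry.id.text attribute comes from the gdata atom model and is worth double-checking):

terms = ['1048', '1049', '1050']
seen = set()
matches = []
for term in terms:
    query = gdata.contacts.client.ContactsQuery()
    query.text_query = term
    feed = gd_client.GetContacts(q=query)
    for entry in feed.entry:
        if entry.id.text not in seen:  # assume the id uniquely identifies a contact
            seen.add(entry.id.text)
            matches.append(entry)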
Here is my Duck Duck Go search script.
import duckduckgo
r = duckduckgo.query('DuckDuckGo')
print r.results[0].url
it returns: list index out of range. If I print r.results I get:
[<duckduckgo.Result object at 0x0000000002E98F60>]
But if I search for anything other than 'DuckDuckGo', it returns an empty value:
[]
I have followed exactly what they did in the example code.
https://github.com/mikejs/python-duckduckgo
That is documented behaviour. There are different result attributes.
Your first query returns a list of results.
r = duckduckgo.query('DuckDuckGo')
if r.type == 'answer':
    print r.results  # [<duckduckgo.Result object>]
Your other search returns a disambiguation, and your results are in r.related, not in r.results:
r = duckduckgo.query('Python')
if r.type == 'disambiguation':
    print r.related  # [<duckduckgo.Result object>]
Edit: python-duckduckgo uses the DuckDuckGo API, which does not give you all the search result links:
Our Instant Answer API gives you free access to many of our instant answers like: topic summaries (API example), categories (API example), disambiguation (API example), !bang redirects (API example), and definitions (API example).
This API does not include all of our links, however. That is, it is not a full search results API or a way to get DuckDuckGo results into your applications beyond our instant answers. Because of the way we generate our search results, we unfortunately do not have the rights to fully syndicate our results. For the same reason, we cannot allow framing our results without our branding. Please see our partnerships page for more info on guidelines and getting in touch with us.
You can't do what you want to do using the DuckDuckGo API but a possible workaround has been posted on Stackoverflow: https://stackoverflow.com/a/11923803/241866
I'd like to write a script (preferably in Python, but other languages are not a problem) that can parse what you type into a Google search. Suppose I search 'cats'; then I'd like to be able to parse the string cats and, for example, append it to a .txt file on my computer.
So if my searches were 'cats', 'dogs', 'cows' then I could have a .txt file like so,
cats
dogs
cows
Anyone know any APIs that can parse the search bar and return the string inputted? Or some object that I can cast into a string?
EDIT: I don't want to make a chrome extension or anything, but preferably a python (or bash or ruby) script I can run in terminal that can do this.
Thanks
If you have access to the URL, you can look for "&q=" to find the search term. (http://google.com/...&q=cats..., for example).
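For example, a minimal sketch in Python that pulls the q parameter out of a search URL (the example URL is made up; on Python 3 the import would come from urllib.parse instead of urlparse):

from urlparse import urlparse, parse_qs

url = 'http://google.com/search?hl=en&q=cats'
term = parse_qs(urlparse(url).query).get('q', [''])[0]
print term  # -> cats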
I can offer two popular solutions:
1) Google has a search-engine API: https://developers.google.com/products/#google-search
(it has a restriction of 100 requests per day)
Abridged code:
import re
import json
import urllib2

def gapi_parser(args):
    query = args.text; count = args.max_sites
    import config
    api_key = config.api_key
    cx = config.cx
    # Note: This API returns up to the first 100 results only.
    # https://developers.google.com/custom-search/v1/using_rest?hl=ru-RU#WorkingResults
    results = []; domains = set(); errors = []; start = 1
    while True:
        req = 'https://www.googleapis.com/customsearch/v1?key={key}&cx={cx}&q={q}&alt=json&start={start}'.format(key=api_key, cx=cx, q=query, start=start)
        if start >= 100:  # the Google API cannot go past the first 100 results
            break
        con = urllib2.urlopen(req)
        if con.getcode() == 200:
            data = con.read()
            j = json.loads(data)
            start = int(j['queries']['nextPage'][0]['startIndex'])
            for item in j['items']:
                match = re.search('^(https?://)?\w(\w|\.|-)+', item['link'])
                if match:
                    domain = match.group(0)
                    if domain not in results:
                        results.append(domain)
                        domains.update([domain])
                else:
                    errors.append('Can`t recognize domain: %s' % item['link'])
        if len(domains) >= args.max_sites:
            break
    print
    for error in errors:
        print error
    return (results, domains)
2) I wrote a Selenium-based script that parses the page in a real browser instance, but this solution has some restrictions, for example CAPTCHAs if you run searches like a robot.
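A rough sketch of that second approach (assuming the selenium package and a matching ChromeDriver are installed; the h3 selector is only a guess at Google's result markup, which changes over time):

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.google.com/search?q=cats')
# Result titles are usually rendered inside <h3> elements
for title in driver.find_elements_by_tag_name('h3'):
    print(title.text)
driver.quit()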
A few options you might consider, with their advantages and disadvantages:
URL:
Advantage: as Chris mentioned, accessing the URL and manually changing it is an option. It should be easy to write a script for this, and I can send you my Perl script if you want.
Disadvantage: I am not sure if you can do it. I wrote a Perl script for that before, but it didn't work because Google states that you can't use its services outside the Google interface. You might face the same problem.
Google's search API:
Advantage: popular choice with good documentation. It should be a safe choice.
Disadvantage: Google's restrictions.
Research other search engines:
Advantage: they might not have the same restrictions as Google. You might find some search engines that let you play around more and have more freedom in general.
Disadvantage: you're not going to get results that are as good as Google's.