get more than 10 results Google custom search API - python

I am trying to use the Google Custom Search API. What I want to do is fetch the first 20 results. I tried changing num=10 in the URL to 20, but that gives a 400 error. How can I fix this, or request the second page of results? (Note: I am searching the entire web.)
Here is the code I am using
import requests,json
url="https://www.googleapis.com/customsearch/v1?q=SmartyKat+Catnip+Cat+Toys&cx=012572433248785697579%3A1mazi7ctlvm&num=10&fields=items(link%2Cpagemap%2Ctitle)&key={YOUR_API_KEY}"
res=requests.get(url)
di=json.loads(res.text)

Unfortunately, it is not possible to receive more than 10 results per request from the Google Custom Search API. However, if you do want more results, you can make multiple calls, increasing the start parameter by 10 each time.
See this link: https://developers.google.com/custom-search/v1/using_rest#query-params
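For example, with the raw REST endpoint used in the question, fetching the first 20 results means two calls: one with start=1 and one with start=11. Here is a minimal sketch (YOUR_API_KEY and YOUR_CSE_ID are placeholders to fill in):

import requests

def custom_search_pages(query, api_key, cse_id, total=20):
    url = "https://www.googleapis.com/customsearch/v1"
    items = []
    # start is 1-indexed and each call returns at most 10 results
    for start in range(1, total + 1, 10):
        params = {
            "q": query,
            "cx": cse_id,
            "key": api_key,
            "num": min(10, total - start + 1),
            "start": start,
            "fields": "items(link,pagemap,title)",
        }
        res = requests.get(url, params=params)
        res.raise_for_status()
        items.extend(res.json().get("items", []))
    return items

results = custom_search_pages("SmartyKat Catnip Cat Toys", "YOUR_API_KEY", "YOUR_CSE_ID", total=20)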

The information in the accepted answer https://stackoverflow.com/a/55866268/42346 is accurate.
Below is a Python function I wrote as an extension of the function in the 4th step of this answer https://stackoverflow.com/a/37084643/42346. It returns up to 100 results from the Google Custom Search API by increasing the start parameter by 10 for each API call, handling the number of results to return automatically. For example, if you request 25 results the function will make 3 API calls returning 10, 10, and 5 results.
Background information:
For instructions on how to set-up a Google Custom Search engine: https://stackoverflow.com/a/37084643/42346
More detail about how to specify that it should search the entire web is here:
https://stackoverflow.com/a/11206266/42346
from googleapiclient.discovery import build
from pprint import pprint as pp
import math


def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    num_search_results = kwargs['num']
    if num_search_results > 100:
        raise NotImplementedError('Google Custom Search API supports max of 100 results')
    elif num_search_results > 10:
        kwargs['num'] = 10  # this cannot be > 10 in API call
        calls_to_make = math.ceil(num_search_results / 10)
    else:
        calls_to_make = 1

    kwargs['start'] = start_item = 1
    items_to_return = []
    while calls_to_make > 0:
        res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
        items_to_return.extend(res['items'])
        calls_to_make -= 1
        start_item += 10
        kwargs['start'] = start_item
        leftover = num_search_results - start_item + 1
        if 0 < leftover < 10:
            kwargs['num'] = leftover

    return items_to_return
And here's an example of how you'd call that:
NUM_RESULTS = 25
MY_SEARCH = 'why do cats chase their own tails'
MY_API_KEY = 'Google API key'
MY_CSE_ID = 'Custom Search Engine ID'
results = google_search(MY_SEARCH, MY_API_KEY, MY_CSE_ID, num=NUM_RESULTS)
for result in results:
    pp(result)

Related

How to get all customers data from Shopify Python API?

For a private Shopify app, I want to retrieve all of the customer data and write it to a CSV file. I have tried the option below, fetching 250 records per page, but I am getting an error:
HTTPError: Bad Request
import sys
import shopify
import pandas as pd

shopify.ShopifyResource.set_site(shop_url)

# Get all customers
def get_all_resources(resource, **kwargs):
    resource_count = resource.count(**kwargs)
    resources = []
    if resource_count > 0:
        for page in range(1, ((resource_count - 1) // 250) + 2):
            kwargs.update({"limit": 250, "page": page})
            resources.extend(resource.find(**kwargs))
    return resources

all_customers = get_all_resources(shopify.Customer)

data = []
for customer in all_customers:
    tempdata = []
    tempdata.append(customer.id)
    tempdata.append(customer.first_name)
    tempdata.append(customer.last_name)
    tempdata.append(customer.addresses)
    tempdata.append(customer.phone)
    tempdata.append(customer.email)
    data.append(tempdata)

df = pd.DataFrame(data, columns=['CustomerCode', 'FirstName', 'LastName', 'Address', 'MobileNo', 'Email'])
df.to_csv('CustomerDataFromServer.csv', index=False)

shopify.ShopifyResource.clear_session()
You cannot use page-based pagination anymore.
Use cursor-based pagination instead.
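Here is a minimal sketch of what that looks like with the shopify_python_api library, assuming a recent version where find() returns a PaginatedCollection exposing has_next_page()/next_page() (verify against the version you have installed):

import shopify

def get_all_customers():
    customers = []
    page = shopify.Customer.find(limit=250)  # first page
    customers.extend(page)
    while page.has_next_page():              # follows the cursor from the Link response header
        page = page.next_page()
        customers.extend(page)
    return customers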

How to retrieve large amounts of data (5000+ videos) from YouTube Data API v3?

My goal is to extract all videos from a playlist, which can contain many videos, from roughly 3,000 to more than 5,000. With maxResults=50 and pagination implemented via nextPageToken, I am only able to call the API 20 times, after which nextPageToken is no longer sent with the response.
I'm calling the API from a Python application. I have a while loop running until nextPageToken isn't sent; ideally this should happen AFTER all the videos are extracted, but it exits prematurely after calling the API 19-20 times.
def main():
    youtube = get_authorised_youtube()  # returns YouTube resource authorized with OAuth.

    first_response = make_single_request(youtube, None)  # make_single_request() takes in the youtube resource and nextPageToken, if any.
    nextPageToken = first_response["nextPageToken"]

    try:
        count = 0
        while True:
            response = make_single_request(youtube, nextPageToken)
            nextPageToken = response["nextPageToken"]

            count += 1
            print(count, end=" ")
            print(nextPageToken)
    except KeyError as e:  # KeyError to catch if nextPageToken wasn't present
        response.pop("items")
        print(response)  # prints the last response for analysis


if __name__ == '__main__':
    main()

snippet of make_single_request():

def make_single_request(youtube, nextPageToken):
    if nextPageToken is None:
        request = youtube.videos().list(
            part="id",
            myRating="like",
            maxResults=50
        )
    else:
        request = youtube.videos().list(
            part="id",
            myRating="like",
            pageToken=nextPageToken,
            maxResults=50
        )

    response = request.execute()
    return response
I expected the code to make upwards of 50 API calls, but it consistently makes only around 20.
Note: The following output was produced with an unpaid GCP account. The calls use part="id", which has a quota cost of 0, and the call limit according to GCP is 10,000. According to the quota console, I make only 20 calls.
Output:
1 CGQQAA
2 CJYBEAA
3 CMgBEAA
4 CPoBEAA
5 CKwCEAA
6 CN4CEAA
7 CJADEAA
8 CMIDEAA
9 CPQDEAA
10 CKYEEAA
11 CNgEEAA
12 CIoFEAA
13 CLwFEAA
14 CO4FEAA
15 CKAGEAA
16 CNIGEAA
17 CIQHEAA
18 CLYHEAA
19 {'kind': 'youtube#videoListResponse', 'etag': '"ETAG"', 'prevPageToken': 'CLYHEAE', 'pageInfo': {'totalResults': TOTAL_RESULTS(>4000), 'resultsPerPage': 50}}
EDIT: After changing maxResults=20, the code makes around 50 API calls, so the total number of videos that can be extracted appears to be capped at 1,000.
To obtain the entire list of liked videos of a given channel without any omissions, I suggest using the PlaylistItems endpoint instead, queried for the given channel's liked-videos playlist by passing the proper value to the endpoint's playlistId parameter.
A given channel's liked-videos playlist ID is obtained by querying the channel's own endpoint; the needed ID is found at .items.contentDetails.relatedPlaylists.likes.
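Here is a minimal sketch of that approach, reusing the get_authorised_youtube() helper from the question (the helper name and overall structure are assumptions based on the code above):

def get_all_liked_video_ids(youtube):
    # 1) Look up the channel's liked-videos playlist ID
    channel = youtube.channels().list(part="contentDetails", mine=True).execute()
    likes_playlist_id = channel["items"][0]["contentDetails"]["relatedPlaylists"]["likes"]

    # 2) Page through PlaylistItems until no nextPageToken is returned
    video_ids = []
    kwargs = {"part": "contentDetails", "playlistId": likes_playlist_id, "maxResults": 50}
    while True:
        response = youtube.playlistItems().list(**kwargs).execute()
        video_ids.extend(item["contentDetails"]["videoId"] for item in response["items"])
        if "nextPageToken" not in response:
            return video_ids
        kwargs["pageToken"] = response["nextPageToken"]

video_ids = get_all_liked_video_ids(get_authorised_youtube())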
If the goal is to retrieve the FULL list of liked videos, a tedious but working way is covered in this question: you basically scrape the data of a deep-link page.
What's not mentioned in that post is that after you have retrieved the video IDs and want more data, you can use the videos endpoint with a list of comma-separated video IDs to get more information.
If you need inspiration for the script, this is an adjusted version of the API scripts provided by YouTube. Just adjust the credentials file path and the input path of the file retrieved by doing the web scrape.
import os
import google_auth_oauthlib.flow
import googleapiclient.discovery
import googleapiclient.errors
import json

scopes = ["https://www.googleapis.com/auth/youtube.readonly"]


def do_request(youtube, video_ids):
    # https://developers.google.com/youtube/v3/docs/videos/list
    request = youtube.videos().list(
        part='contentDetails,id,snippet,statistics',
        id=','.join(video_ids),
        maxResults=50
    )
    return request.execute()["items"]


def main(video_ids):
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

    api_service_name = "youtube"
    api_version = "v3"
    client_secrets_file = "INPUTAPICREDFILEHERE./creds.json"

    # Get credentials and create an API client
    flow = google_auth_oauthlib.flow.InstalledAppFlow.from_client_secrets_file(
        client_secrets_file, scopes)
    credentials = flow.run_console()
    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, credentials=credentials)

    data = {'items': []}
    current_id_batch = []
    for id in video_ids:
        if len(current_id_batch) == 50:
            print(f"Fetching.. current batch {len(data['items'])} of {len(video_ids)}")
            result = do_request(youtube, current_id_batch)
            data['items'].extend(result)
            current_id_batch = []

        current_id_batch.append(id)

    result = do_request(youtube, current_id_batch)
    data['items'].extend(result)

    with open('./data.json', 'w') as outfile:
        outfile.write(json.dumps(data, indent=4))


if __name__ == "__main__":
    liked_vids = {}
    f = open('PATHTOLIKEDVIDEOS/liked_videos.json', encoding="utf8")
    liked_vids = json.load(f)
    main(list(liked_vids.keys()))
Try waiting some time between calls, like this:
import time
time.sleep(1) # time here in seconds

Dealing with request rate limits, MusicBrainz API

Question: Is a time delay a good way of dealing with request rate limits?
I am very new to requests, APIs and web services. I am trying to create a web service that, given an ID, makes a request to MusicBrainz API and retrieves some information. However, apparently I am making too many requests, or making them too fast. In the last line of the code, if the delay parameter is set to 0, this error will appear:
{'error': 'Your requests are exceeding the allowable rate limit. Please see http://wiki.musicbrainz.org/XMLWebService for more information.'}
And looking into that link, I found out that:
The rate at which your IP address is making requests is measured. If that rate is too high, all your requests will be declined (http 503) until the rate drops again. Currently that rate is (on average) 1 request per second.
Therefore I thought: okay, I will insert a time delay of 1 second and it will work. And it did work, but I guess there are nicer, neater and smarter ways of dealing with such a problem. Do you know one?
CODE:

####################################################
################### INSTRUCTIONS ###################
####################################################

'''
This script runs locally and returns a JSON formatted file, containing
information about the release-groups of an artist whose MBID must be provided.
'''

#########################################
############ CODE STARTS ################
#########################################

# IMPORT PACKAGES
# All of them come with the Anaconda3 installation, otherwise they can be installed with pip
import requests
import json
import math
import time

# Base URL for looking up release-groups on musicbrainz.org
root_URL = 'http://musicbrainz.org/ws/2/'

# Parameters to run an example
offset = 10
limit = 1
MBID = '65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab'


def collect_data(MBID, root_URL):
    '''
    Description: Auxiliary function to collect data from the MusicBrainz API

    Arguments:
        MBID - MusicBrainz Identity of some artist.
        root_URL - MusicBrainz root_URL for requests

    Returns:
        decoded_output - dictionary containing all the information about the release-groups
                         of type album of the requested artist
    '''
    # Joins paths. Note: Release-groups can be filtered by type.
    URL_complete = root_URL + 'release-group?artist=' + MBID + '&type=album' + '&fmt=json'

    # Creates a requests object and sends a GET request
    request = requests.get(URL_complete)
    assert request.status_code == 200

    output = request.content  # bytes
    decoded_output = json.loads(output)  # dict

    return decoded_output


def collect_releases(release_group_id, root_URL, delay=1):
    '''
    Description: Auxiliary function to collect data from the MusicBrainz API

    Arguments:
        release_group_id - ID of the release-group whose number of releases is to be extracted
        root_URL - MusicBrainz root_URL for requests

    Returns:
        releases_count - integer containing the number of releases of the release-group
    '''
    URL_complete = root_URL + 'release-group/' + release_group_id + '?inc=releases' + '&fmt=json'

    # Creates a requests object and sends a GET request
    request = requests.get(URL_complete)

    # Parses the content of the request to a dictionary
    output = request.content
    decoded_output = json.loads(output)

    # Time delay so as not to exceed the MusicBrainz request rate limit
    time.sleep(delay)

    releases_count = 0
    if 'releases' in decoded_output:
        releases_count = len(decoded_output['releases'])
    else:
        print(decoded_output)
        #raise ValueError(decoded_output)

    return releases_count


def paginate(store_albums, offset, limit=50):
    '''
    Description: Auxiliary function to paginate results

    Arguments:
        store_albums - Dictionary containing information about each release-group
        offset - Integer. Corresponds to the starting album to show.
        limit - Integer. Defaults to 50. Maximum number of albums to show per page

    Returns:
        albums_paginated - Paginated albums according to the specified limit and offset
    '''
    # Restricts limit to 150
    if limit > 150:
        limit = 150

    if offset > len(store_albums['albums']):
        raise ValueError('Offset is greater than number of albums')

    # Apply offset
    albums_offset = store_albums['albums'][offset:]

    # Count pages
    pages = math.ceil(len(albums_offset) / limit)

    albums_limited = []
    if len(albums_offset) > limit:
        for i in range(pages):
            albums_limited.append(albums_offset[i * limit: (i + 1) * limit])
    else:
        albums_limited = albums_offset

    albums_paginated = {'albums': None}
    albums_paginated['albums'] = albums_limited

    return albums_paginated


def post(MBID, offset, limit, delay=1):
    # Calls the auxiliary function 'collect_data' that retrieves the JSON file from the MusicBrainz API
    json_file = collect_data(MBID, root_URL)

    # Creates a list and a dictionary for storing the information about each release-group
    album_details_list = []
    album_details = {"id": None, "title": None, "year": None, "release_count": None}

    # Loops through all release-groups in the JSON file
    for item in json_file['release-groups']:
        album_details["id"] = item["id"]
        album_details["title"] = item["title"]
        album_details["year"] = item["first-release-date"].split("-")[0]
        album_details["release_count"] = collect_releases(item["id"], root_URL, delay)
        album_details_list.append(album_details.copy())

    # Creates a dictionary with all the albums of the artist
    store_albums = {"albums": None}
    store_albums["albums"] = album_details_list

    # Paginates the dictionary
    stored_paginated_albums = paginate(store_albums, offset, limit)

    # Returns a JSON-typed file containing the different albums arranged according to offset & limit
    return json.dumps(stored_paginated_albums)


# Runs the program and prints the JSON output as specified in the wording of the exercise
print(post(MBID, offset, limit, delay=1))
There aren't any nicer ways of dealing with this problem, other than asking the API owner to increase your rate limit. The only way to avoid rate-limit errors is to not make too many requests at a time, so short of hacking the API to bypass its request counter, you're stuck waiting about one second between requests.
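That said, you can make the delay a little smarter than a fixed sleep(1) before every call: only wait for whatever is left of the one-second window, and back off if a 503 slips through anyway. A minimal sketch (the class and its names are illustrative, not part of any library):

import time
import requests

class ThrottledClient:
    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last_request = 0.0

    def get(self, url, **kwargs):
        # Sleep only for the remainder of the 1-second window since the last request
        wait = self.min_interval - (time.monotonic() - self._last_request)
        if wait > 0:
            time.sleep(wait)
        response = requests.get(url, **kwargs)
        self._last_request = time.monotonic()
        if response.status_code == 503:  # rate-limited anyway: back off and retry once
            time.sleep(self.min_interval)
            response = requests.get(url, **kwargs)
            self._last_request = time.monotonic()
        return response

client = ThrottledClient(min_interval=1.0)
# then use client.get(URL_complete) inside collect_data/collect_releases instead of requests.get(...)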

Search via Python Search API timing out intermittently

We have an application that is basically just a form submission for requesting a team drive to be created. It's hosted on Google App Engine.
This timeout error is coming from a single field in the form that simply does typeahead for an email address. All of the names on the domain are indexed in the datastore, about 300k entities - nothing is being pulled directly from the directory api. After 10 seconds of searching (via the Python Google Search API), it will time out. This is currently intermittent, but errors have been increasing in frequency.
Error: line 280, in get_result raise _ToSearchError(e) Timeout: Failed to complete request in 9975ms
Essentially, speeding up the searches would resolve the issue. I looked at the code and I don't believe there is any room for improvement there. I am not sure whether increasing the instance class (currently an F2) would help, or whether there is another way to improve the index efficiency; I'm not entirely sure how one would do that, however. Any thoughts would be appreciated.
Search Code:
class LookupUsersorGrpService(object):
    '''
    lookupUsersOrGrps accepts various params and performs search
    '''
    def lookupUsersOrGrps(self, params):
        search_results_json = {}
        search_results = []
        directory_users_grps = GoogleDirectoryUsers()
        error_msg = 'Technical error'
        query = ''
        try:
            # Default a few values if not present
            if ('offset' not in params) or (params['offset'] is None):
                params['offset'] = 0
            else:
                params['offset'] = int(params['offset'])

            if ('limit' not in params) or (params['limit'] is None):
                params['limit'] = 20
            else:
                params['limit'] = int(params['limit'])

            # Search related to field name
            query = self.appendQueryParam(q=query, p=params, qname='search_name', criteria=':', pname='query', isExactMatch=True, splitString=True)
            # Search related to field email
            query = self.appendQueryParam(q=query, p=params, qname='search_email', criteria=':', pname='query', isExactMatch=True, splitString=True)

            # Perform search
            log.info('Search initialized :\"{}\"'.format(query))

            # sort results by name ascending
            expr_list = [search.SortExpression(expression='name', default_value='', direction=search.SortExpression.ASCENDING)]
            # construct the sort options
            sort_opts = search.SortOptions(expressions=expr_list)

            # Prepare the search index
            index = search.Index(name="GoogleDirectoryUsers", namespace="1")
            search_query = search.Query(
                query_string=query.strip(),
                options=search.QueryOptions(
                    limit=params['limit'],
                    offset=params['offset'],
                    sort_options=sort_opts,
                    returned_fields=directory_users_grps.get_search_doc_return_fields()
                ))

            # Execute the search query
            search_result = index.search(search_query)

            # Start collecting the values
            total_cnt = search_result.number_found
            params['limit'] = len(search_result.results)

            # Prepare the response object
            for teamdriveDoc in search_result.results:
                teamdriveRecord = GoogleDirectoryUsers.query(GoogleDirectoryUsers.email == teamdriveDoc.doc_id).get()
                if teamdriveRecord:
                    if teamdriveRecord.suspended == False:
                        search_results.append(teamdriveRecord.to_dict())

            search_results_json.update({"users": search_results})
            search_results_json.update({"limit": params['limit'] if len(search_results) > 0 else '0'})
            search_results_json.update({"total_count": total_cnt if len(search_results) > 0 else '0'})
            search_results_json.update({"status": "success"})

        except Exception as e:
            log.exception("Error in performing search")
            search_results_json.update({"status": "failed"})
            search_results_json.update({"description": error_msg})

        return search_results_json

    ''' Retrieves the given param from dict and adds to query if exists
    '''
    def appendQueryParam(self, q='', p=[], qname=None, criteria='=', pname=None,
                         isExactMatch=False, splitString=False, defaultValue=None):
        if (pname in p) or (defaultValue is not None):
            if len(q) > 0:
                q += ' OR '
            q += qname
            if criteria:
                q += criteria
            if defaultValue is None:
                val = p[pname]
            else:
                val = defaultValue
            if splitString:
                val = val.replace("", "~")[1:-1]
            # Helps to retain the passed argument as it is, for example email
            if isExactMatch:
                q += "\"" + val + "\""
            else:
                q += val
        return q
An Index instance's search method accepts a deadline parameter, so you could use that to increase the time that you are willing to wait for the search to respond:
search_result = index.search(search_query, deadline=30)
The documentation doesn't specify the acceptable values for deadline, but other App Engine services tend to accept values up to 60 seconds.

Turning off Ads using Facebook Marketing API v2.9 in Python

I've been struggling to turn off Facebook ads that are not performing well inside an ad set, using Python and the Facebook Marketing API. I'm a little concerned about the number of calls my code makes to the API. Another concern is that I'm using the 'get_insights' method to access the metrics I use for the decision logic, but I need 'get_ads' to be able to turn ads on/off, so I feel I'm doing things twice.
Here's an example of what I've been doing so far using the API v2.9:
from facebookads.api import FacebookAdsApi
from facebookads import adobjects
from facebookads.adobjects.adaccount import *
from facebookads.adobjects.campaign import *
from facebookads.adobjects.adset import *
from facebookads.adobjects.ad import *
from fctn import *  # this is just a file where I centralized some functions

import credentials
import copy

# Auth
my_app_id = credentials.my_app_id
my_app_secret = credentials.my_app_secret
my_access_token = credentials.my_access_token
api = FacebookAdsApi.init(my_app_id, my_app_secret, my_access_token)
ad_account = AdAccount(credentials.ad_account)

# Batch creation
my_batch = api.new_batch()

# Desired fields
fields = ['campaign_name', 'adset_name', 'ad_name', 'ctr', 'impressions']

# Getting all Adsets
ad_sets = ad_account.get_ad_sets(fields=[AdSet.Field.name, Ad.Field.created_time, Ad.Field.status],
                                 params={'effective_status': ['ACTIVE'],
                                         'date_preset': 'last_30d',
                                         'limit': 5000})

# We'll iterate over each adset because we want to compare just the ads inside the same adset
for ad_set in ad_sets:
    ads = ad_set.get_ads(fields=[Ad.Field.name, Ad.Field.created_time, Ad.Field.status],
                         params={'effective_status': ['ACTIVE'],
                                 'date_preset': 'last_30d',
                                 'limit': 5000})

    ads_insights = ad_set.get_insights(fields=fields,
                                       params={'level': 'ad',
                                               'date_preset': 'last_30d',
                                               'effective_status': ['ACTIVE'],
                                               'limit': 5000})

    # this is an external function to get the median in relation to some metric
    median_ctr = median_metric(ads_insights, 'ctr')

    print(median_ctr)
    print(ads_insights[0]['campaign_name'])
    print(ad_set['name'])
    print('BEFORE')
    print(ads)

    for i in range(0, len(ads)):
        if dias_ate_hoje(ads[i]['created_time'][:10]) < 10:
            # If the ad has been running less than 10 days, keep going
            continue
        else:
            if float(ads_insights[i]['impressions']) < 300:
                # If impressions are less than 300, keep going (just an arbitrary decision here)
                continue
            else:
                if float(ads_insights[i]['ctr']) < median_ctr:
                    # If the ad is in the worst half in relation to CTR: turn it off
                    ads[i].api_update(params={'status': 'PAUSED'}, batch=my_batch)
                else:
                    continue

    my_batch.execute()

    print('AFTER')
    print(ads)
I hope anyone who has already done something like this can help me improve this code, with fewer API calls and less duplicated code.
Thanks.
This is what you can do:
Get all ads at the account level: ad_account.get_ads()
Get insights at the account level: ad_account.get_insights(fields=fields, params={'level': 'ad', ...})
In your insights API calls, add adset_id and ad_id to fields, so that you can calculate each ad set's median CTR and pause the right ads.
This way you don't need to loop over ad sets and make API calls for each of them. If the insights data is too large, you can try the async insights API: https://developers.facebook.com/docs/marketing-api/insights/best-practices#asynchronous
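Here is a rough sketch of that account-level flow, reusing the names from the question (api, ad_account, my_batch, median_metric, Ad); it is an outline under those assumptions rather than a drop-in replacement, and it omits the created_time check, which would still need ad metadata from a single account-level get_ads() call:

from collections import defaultdict

insight_fields = ['adset_id', 'ad_id', 'ad_name', 'ctr', 'impressions']
insights = ad_account.get_insights(fields=insight_fields,
                                   params={'level': 'ad',
                                           'date_preset': 'last_30d',
                                           'limit': 5000})

# Group the ad-level rows by their ad set so the median is still computed per ad set
ads_by_adset = defaultdict(list)
for row in insights:
    ads_by_adset[row['adset_id']].append(row)

my_batch = api.new_batch()
for adset_id, rows in ads_by_adset.items():
    median_ctr = median_metric(rows, 'ctr')  # same external helper as in the question
    for row in rows:
        if float(row['impressions']) >= 300 and float(row['ctr']) < median_ctr:
            Ad(row['ad_id']).api_update(params={'status': 'PAUSED'}, batch=my_batch)
my_batch.execute()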

Categories

Resources