How to get all customers' data from the Shopify Python API?

For a private Shopify app, I want to retrieve all customer data and write it to a CSV file. I tried the approach below, fetching 250 records per page, but I am getting an error:
HTTPError: Bad Request
import sys
import pandas as pd
import shopify

shopify.ShopifyResource.set_site(shop_url)

# Get all customers
def get_all_resources(resource, **kwargs):
    resource_count = resource.count(**kwargs)
    resources = []
    if resource_count > 0:
        for page in range(1, ((resource_count - 1) // 250) + 2):
            kwargs.update({"limit": 250, "page": page})
            resources.extend(resource.find(**kwargs))
    return resources

all_customers = get_all_resources(shopify.Customer)

data = []
for customer in all_customers:
    tempdata = []
    tempdata.append(customer.id)
    tempdata.append(customer.first_name)
    tempdata.append(customer.last_name)
    tempdata.append(customer.addresses)
    tempdata.append(customer.phone)
    tempdata.append(customer.email)
    data.append(tempdata)

df = pd.DataFrame(data, columns=['CustomerCode', 'FirstName', 'LastName', 'Address', 'MobileNo', 'Email'])
df.to_csv('CustomerDataFromServer.csv', index=False)

shopify.ShopifyResource.clear_session()

You cannot use page-based pagination anymore; Shopify's REST API now requires cursor-based pagination instead.
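With a recent version of the ShopifyAPI library, find() returns a PaginatedCollection that follows the response's Link header for you, so collecting every customer looks roughly like the sketch below (a minimal example of the cursor approach; extract the per-customer fields afterwards exactly as in the question):

import shopify

def get_all_customers():
    # Fetch the first page with the maximum page size, then keep following
    # the cursor until there is no next page left.
    customers = []
    page = shopify.Customer.find(limit=250)
    customers.extend(page)
    while page.has_next_page():
        page = page.next_page()
        customers.extend(page)
    return customers

all_customers = get_all_customers()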

Related

How to connect data scraping with Google Sheets

I have this code from a web scraping script:
from smart_sensor_client.smart_sensor_client import SmartSensorClient
import json
import requests
import pprint

DEFAULT_SETTINGS_FILE = 'settings.yaml'

def run_task(settings_file=DEFAULT_SETTINGS_FILE) -> bool:
    # Create the client instance
    client = SmartSensorClient(settings_file=settings_file)

    # Authenticate
    if not client.authenticate():
        print('Authentication FAILED')
        return False

    # Get list of plants
    plants = client.get_plant_list()

    # Iterate the plant list and print all assets therein
    for plant in plants:
        # Get list of assets
        response = client.get_asset_list(organization_id=client.organization_id)
        if len(response) == 0:
            print('No assets in this plant')
        else:
            for asset in response:
                print(asset["assetName"], ':', asset["lastSyncTimeStamp"])
    ...
And the following is the JSON answer, filtered down to the information I am interested in:
Ventilador FCH 11 : 2020-06-11T09:48:45Z
VTL SVA1 : 2020-06-11T10:20:43Z
Dardelet PV-1 : 2020-06-11T09:58:14Z
CANDLOT 1 (MOTOR N°2) : 2020-06-11T10:37:39Z
PC N°1 S1/2A : 2020-06-11T10:57:34Z
VTL SVA2 : 2020-06-11T11:31:08Z
Ventilador FCH 6 : 2020-06-11T11:43:28Z
Vibrotamiz Tolva Tampon : 2020-06-11T11:44:43Z
Ventilador FCH 10 : 2020-06-11T11:08:03Z
Task SUCCESS
I would like to paste this into a Google Sheet, but I don't know how, or whether it's possible. Thank you!
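One way to get these rows into a Google Sheet is the gspread library; a minimal sketch, assuming a Google service account whose JSON key file is available and whose e-mail address has edit access to the target spreadsheet (the file name, spreadsheet name, and the rows list are placeholders):

import gspread

# Placeholders: point these at your own credentials file and spreadsheet.
gc = gspread.service_account(filename="service_account.json")
sheet = gc.open("Smart Sensor assets").sheet1

# Collect the (asset name, timestamp) pairs in a list instead of printing them,
# then append them row by row.
rows = [
    ["Ventilador FCH 11", "2020-06-11T09:48:45Z"],
    ["VTL SVA1", "2020-06-11T10:20:43Z"],
]
for row in rows:
    sheet.append_row(row)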

Get more than 10 results from the Google Custom Search API

I am trying to use the Google Custom Search API. What I want to do is get the first 20 results. I tried changing num=10 in the URL to 20, but that gives a 400 error. How can I fix this or request the second page of results? (Note: I am searching the entire web.)
Here is the code I am using
import requests, json

url = "https://www.googleapis.com/customsearch/v1?q=SmartyKat+Catnip+Cat+Toys&cx=012572433248785697579%3A1mazi7ctlvm&num=10&fields=items(link%2Cpagemap%2Ctitle)&key={YOUR_API_KEY}"
res = requests.get(url)
di = json.loads(res.text)
Unfortunately, it is not possible to receive more than 10 results per request from the Google Custom Search API. However, if you want more results you can make multiple calls, increasing the start parameter by 10 each time.
See this link: https://developers.google.com/custom-search/v1/using_rest#query-params
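For example, applying that advice directly to the URL style used in the question, two requests with start=1 and start=11 return the first 20 results (a sketch; the API key placeholder is kept as in the question):

import requests

base = ("https://www.googleapis.com/customsearch/v1"
        "?q=SmartyKat+Catnip+Cat+Toys"
        "&cx=012572433248785697579%3A1mazi7ctlvm"
        "&num=10&fields=items(link%2Cpagemap%2Ctitle)"
        "&key={YOUR_API_KEY}")

items = []
for start in (1, 11):                  # two pages of 10 results each
    data = requests.get(base + "&start=%d" % start).json()
    items.extend(data.get("items", []))

print(len(items))                      # up to 20 results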
The information in the accepted answer https://stackoverflow.com/a/55866268/42346 is accurate.
Below is a Python function I wrote as an extension of the function in the 4th step of this answer https://stackoverflow.com/a/37084643/42346 to return up to 100 results from the Google Custom Search API. It increases the start parameter by 10 for each API call and works out the number of results to request automatically. For example, if you ask for 25 results, the function makes 3 API calls returning 10, 10, and 5 results.
Background information:
For instructions on how to set up a Google Custom Search Engine: https://stackoverflow.com/a/37084643/42346
More detail about how to specify that it search the entire web: https://stackoverflow.com/a/11206266/42346
from googleapiclient.discovery import build
from pprint import pprint as pp
import math

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    num_search_results = kwargs['num']
    if num_search_results > 100:
        raise NotImplementedError('Google Custom Search API supports max of 100 results')
    elif num_search_results > 10:
        kwargs['num'] = 10  # this cannot be > 10 in API call
        calls_to_make = math.ceil(num_search_results / 10)
    else:
        calls_to_make = 1

    kwargs['start'] = start_item = 1
    items_to_return = []
    while calls_to_make > 0:
        res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
        items_to_return.extend(res['items'])
        calls_to_make -= 1
        start_item += 10
        kwargs['start'] = start_item
        leftover = num_search_results - start_item + 1
        if 0 < leftover < 10:
            kwargs['num'] = leftover

    return items_to_return
And here's an example of how you'd call that:
NUM_RESULTS = 25
MY_SEARCH = 'why do cats chase their own tails'
MY_API_KEY = 'Google API key'
MY_CSE_ID = 'Custom Search Engine ID'

results = google_search(MY_SEARCH, MY_API_KEY, MY_CSE_ID, num=NUM_RESULTS)
for result in results:
    pp(result)

Dealing with request rate limits, MusicBrainz API

Question: Is a time delay a good way of dealing with request rate limits?
I am very new to requests, APIs and web services. I am trying to create a web service that, given an ID, makes a request to MusicBrainz API and retrieves some information. However, apparently I am making too many requests, or making them too fast. In the last line of the code, if the delay parameter is set to 0, this error will appear:
{'error': 'Your requests are exceeding the allowable rate limit. Please see http://wiki.musicbrainz.org/XMLWebService for more information.'}
And looking into that link, I found out that:
The rate at which your IP address is making requests is measured. If that rate is too high, all your requests will be declined (http 503) until the rate drops again. Currently that rate is (on average) 1 request per second.
Therefore I thought, okay, I will insert a time delay of 1 second, and it will work. It did work, but I guess there are nicer, neater and smarter ways of dealing with such a problem. Do you know one?
CODE:
####################################################
################### INSTRUCTIONS ###################
####################################################
'''
This script runs locally and returns a JSON formatted file, containing
information about the release-groups of an artist whose MBID must be provided.
'''
#########################################
############ CODE STARTS ################
#########################################

# IMPORT PACKAGES
# All of them come with the Anaconda3 installation, otherwise they can be installed with pip
import requests
import json
import math
import time

# Base URL for looking up release-groups on musicbrainz.org
root_URL = 'http://musicbrainz.org/ws/2/'

# Parameters to run an example
offset = 10
limit = 1
MBID = '65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab'

def collect_data(MBID, root_URL):
    '''
    Description: Auxiliary function to collect data from the MusicBrainz API
    Arguments:
        MBID - MusicBrainz Identity of some artist.
        root_URL - MusicBrainz root_URL for requests
    Returns:
        decoded_output - dictionary containing all the information about the release-groups
                         of type album of the requested artist
    '''
    # Joins paths. Note: release-groups can be filtered by type.
    URL_complete = root_URL + 'release-group?artist=' + MBID + '&type=album' + '&fmt=json'
    # Creates a requests object and sends a GET request
    request = requests.get(URL_complete)
    assert request.status_code == 200
    output = request.content              # bytes
    decoded_output = json.loads(output)   # dict
    return decoded_output

def collect_releases(release_group_id, root_URL, delay=1):
    '''
    Description: Auxiliary function to collect data from the MusicBrainz API
    Arguments:
        release_group_id - ID of the release-group whose number of releases is to be extracted
        root_URL - MusicBrainz root_URL for requests
    Returns:
        releases_count - integer containing the number of releases of the release-group
    '''
    URL_complete = root_URL + 'release-group/' + release_group_id + '?inc=releases' + '&fmt=json'
    # Creates a requests object and sends a GET request
    request = requests.get(URL_complete)
    # Parses the content of the request to a dictionary
    output = request.content
    decoded_output = json.loads(output)
    # Time delay so as not to exceed the MusicBrainz request rate limit
    time.sleep(delay)
    releases_count = 0
    if 'releases' in decoded_output:
        releases_count = len(decoded_output['releases'])
    else:
        print(decoded_output)
        # raise ValueError(decoded_output)
    return releases_count

def paginate(store_albums, offset, limit=50):
    '''
    Description: Auxiliary function to paginate results
    Arguments:
        store_albums - Dictionary containing information about each release-group
        offset - Integer. Corresponds to starting album to show.
        limit - Integer. Defaults to 50. Maximum number of albums to show per page.
    Returns:
        albums_paginated - Paginated albums according to the specified limit and offset
    '''
    # Restricts limit to 150
    if limit > 150:
        limit = 150
    if offset > len(store_albums['albums']):
        raise ValueError('Offset is greater than number of albums')
    # Apply offset
    albums_offset = store_albums['albums'][offset:]
    # Count pages
    pages = math.ceil(len(albums_offset) / limit)
    albums_limited = []
    if len(albums_offset) > limit:
        for i in range(pages):
            albums_limited.append(albums_offset[i * limit : (i + 1) * limit])
    else:
        albums_limited = albums_offset
    albums_paginated = {'albums': None}
    albums_paginated['albums'] = albums_limited
    return albums_paginated

def post(MBID, offset, limit, delay=1):
    # Calls the auxiliary function 'collect_data' that retrieves the JSON file from the MusicBrainz API
    json_file = collect_data(MBID, root_URL)
    # Creates a list and a dictionary for storing the information about each release-group
    album_details_list = []
    album_details = {"id": None, "title": None, "year": None, "release_count": None}
    # Loops through all release-groups in the JSON file
    for item in json_file['release-groups']:
        album_details["id"] = item["id"]
        album_details["title"] = item["title"]
        album_details["year"] = item["first-release-date"].split("-")[0]
        album_details["release_count"] = collect_releases(item["id"], root_URL, delay)
        album_details_list.append(album_details.copy())
    # Creates a dictionary with all the albums of the artist
    store_albums = {"albums": None}
    store_albums["albums"] = album_details_list
    # Paginates the dictionary
    stored_paginated_albums = paginate(store_albums, offset, limit)
    # Returns a JSON string containing the different albums arranged according to offset & limit
    return json.dumps(stored_paginated_albums)

# Runs the program and prints the JSON output as specified in the wording of the exercise
print(post(MBID, offset, limit, delay=1))
There aren't any nicer ways of dealing with this problem, other than asking the API owner to increase your rate limit. The only way to avoid rate-limit errors is not to make too many requests at a time; short of hacking the API to bypass its request counter, you're stuck with waiting about one second between requests.
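That said, the waiting can at least be centralised in one helper that only sleeps when two requests would otherwise be less than a second apart, and retries once on a 503. A minimal sketch (the helper name and the single retry are illustrative; the one-second interval comes from the MusicBrainz documentation quoted above):

import time
import requests

MIN_INTERVAL = 1.0   # MusicBrainz allows roughly 1 request per second on average
_last_call = 0.0

def throttled_get(url, **kwargs):
    # Space requests at least MIN_INTERVAL apart and retry once on a 503.
    global _last_call
    wait = MIN_INTERVAL - (time.time() - _last_call)
    if wait > 0:
        time.sleep(wait)
    response = requests.get(url, **kwargs)
    _last_call = time.time()
    if response.status_code == 503:
        time.sleep(MIN_INTERVAL)
        response = requests.get(url, **kwargs)
        _last_call = time.time()
    return response

Swapping requests.get for throttled_get inside collect_data and collect_releases would then make the fixed time.sleep(delay) call unnecessary.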

Why are the videos on the most_recent standard feed so out of date?

I'm trying to grab the most recently uploaded videos. There's a standard feed for that - it's called most_recent. I don't have any problems grabbing the feed, but when I look at the entries inside, they're all half a year old, which is hardly recent.
Here's the code I'm using:
import requests
import os.path as P
import sys
from lxml import etree
import datetime

namespaces = {"a": "http://www.w3.org/2005/Atom", "yt": "http://gdata.youtube.com/schemas/2007"}
fmt = "%Y-%m-%dT%H:%M:%S.000Z"

class VideoEntry:
    """Data holder for the video."""
    def __init__(self, node):
        self.entry_id = node.find("./a:id", namespaces=namespaces).text
        published = node.find("./a:published", namespaces=namespaces).text
        self.published = datetime.datetime.strptime(published, fmt)

    def __str__(self):
        return "VideoEntry[id='%s']" % self.entry_id

def paginate(xml):
    root = etree.fromstring(xml)
    next_page = root.find("./a:link[@rel='next']", namespaces=namespaces)
    if next_page is None:
        next_link = None
    else:
        next_link = next_page.get("href")
    entries = [VideoEntry(e) for e in root.xpath("/a:feed/a:entry", namespaces=namespaces)]
    return entries, next_link

prefix = "https://gdata.youtube.com/feeds/api/standardfeeds/"
standard_feeds = set("top_rated top_favorites most_shared most_popular most_recent most_discussed most_responded recently_featured on_the_web most_viewed".split(" "))

feed_name = sys.argv[1]
assert feed_name in standard_feeds

feed_url = prefix + feed_name

all_video_ids = []
while feed_url is not None:
    r = requests.get(feed_url)
    if r.status_code != 200:
        break
    text = r.text.encode("utf-8")
    video_ids, feed_url = paginate(text)
    all_video_ids += video_ids

all_upload_times = [e.published for e in all_video_ids]
print min(all_upload_times), max(all_upload_times)
As you can see, it prints the min and max timestamps for the entire feed.
misha@misha-antec$ python get_standard_feed.py most_recent
2013-02-02 14:40:02 2013-02-02 14:54:00
misha@misha-antec$ python get_standard_feed.py top_rated
2006-04-06 21:30:53 2013-07-28 22:22:38
I've glanced through the downloaded XML and it appears to match the output. Am I doing something wrong?
Also, on an unrelated note, the feeds I'm getting are all about 100 entries (I'm paginating through them 25 at a time). Is this normal? I expected the feeds to be a bit bigger.
Regarding the "Most-Recent-Feed"-Topic: There is a ticket for this one here. Unfortunately, the YouTube-API-Teams doesn't respond or solved the problem so far.
Regarding the number of entries: That depends on the type of standardfeed, but for the most-recent-Feed it´s usually around 100.
Note: You could try using the "orderby=published" parameter to get recents videos, although I don´t know how "recent" they are.
https://gdata.youtube.com/feeds/api/videos?orderby=published&prettyprint=True
You can combine this query with the "category"-parameter or other ones (region-specific queries - like for the standard feeds - are not possible, afaik).
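A minimal sketch of plugging that query into the paginate() helper from the question (max-results=25 is just an illustrative value):

r = requests.get("https://gdata.youtube.com/feeds/api/videos?orderby=published&max-results=25")
entries, _ = paginate(r.text.encode("utf-8"))
for entry in entries:
    print entry, entry.published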

Fetching language detection from the Google API

I have a CSV with keywords in one column and the number of impressions in a second column.
I'd like to provide the keywords in a URL (while looping) and have the Google language API return which language each keyword is in.
I have it working manually. If I enter (with the correct API key):
http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&key=myapikey&q=merde
I get:
{"responseData": {"language":"fr","isReliable":false,"confidence":6.213709E-4}, "responseDetails": null, "responseStatus": 200}
which is correct: 'merde' is French.
So far I have this code, but I keep getting server-unreachable errors:
import time
import csv
from operator import itemgetter
import sys
import fileinput
import urllib2
import json

E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2

# not working
def parse_result(result):
    """Parse a JSONP result string and return a list of terms"""
    # Deserialize JSON to Python objects
    result_object = json.loads(result)
    # Get the rows in the table, then get the second column's value
    # for each row
    return row in result_object

# not working
def retrieve_terms(seedterm):
    print(seedterm)
    """Retrieves and parses data and returns a list of terms"""
    url_template = 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&key=myapikey&q=%(seed)s'
    url = url_template % {"seed": seedterm}
    try:
        with urllib2.urlopen(url) as data:
            data = perform_request(seedterm)
            result = data.read()
    except:
        sys.stderr.write('%s\n' % 'Could not request data from server')
        exit(E_OPERATION_ERROR)
    #terms = parse_result(result)
    #print terms
    print result

def main(argv):
    filename = argv[1]
    csvfile = open(filename, 'r')
    csvreader = csv.DictReader(csvfile)
    rows = []
    for row in csvreader:
        rows.append(row)
    sortedrows = sorted(rows, key=itemgetter('impressions'), reverse=True)
    keys = sortedrows[0].keys()

    for item in sortedrows:
        retrieve_terms(item['keywords'])

    try:
        outputfile = open('Output_%s.csv' % (filename), 'w')
    except IOError:
        print("The file is active in another program - close it first!")
        sys.exit()

    dict_writer = csv.DictWriter(outputfile, keys, lineterminator='\n')
    dict_writer.writer.writerow(keys)
    dict_writer.writerows(sortedrows)
    outputfile.close()
    print("File is Done!! Check your folder")

if __name__ == '__main__':
    start_time = time.clock()
    main(sys.argv)
    print("\n")
    print time.clock() - start_time, "seconds for script time"
Any idea how to finish the code so that it will work? Thank you!
Try adding referrer and userip as described in the docs:
An area to pay special attention to relates to correctly identifying yourself in your requests. Applications MUST always include a valid and accurate http referer header in their requests. In addition, we ask, but do not require, that each request contains a valid API Key. By providing a key, your application provides us with a secondary identification mechanism that is useful should we need to contact you in order to correct any problems. Read more about the usefulness of having an API key.
Developers are also encouraged to make use of the userip parameter (see below) to supply the IP address of the end-user on whose behalf you are making the API request. Doing so will help distinguish this legitimate server-side traffic from traffic which doesn't come from an end-user.
Here's an example based on the answer to the question "access to google with python":
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import urllib, urllib2
from pprint import pprint

api_key, userip = None, None
query = {'q': 'матрёшка'}
referrer = "https://stackoverflow.com/q/4309599/4279"

if userip:
    query.update(userip=userip)
if api_key:
    query.update(key=api_key)

url = 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&%s' % (
    urllib.urlencode(query))
request = urllib2.Request(url, headers=dict(Referer=referrer))
json_data = json.load(urllib2.urlopen(request))
pprint(json_data['responseData'])
Output
{u'confidence': 0.070496580000000003, u'isReliable': False, u'language': u'ru'}
Another issue might be that seedterm is not properly quoted:
if isinstance(seedterm, unicode):
    value = seedterm
else:  # bytes
    value = seedterm.decode(put_encoding_here)

url = 'http://...q=%s' % urllib.quote_plus(value.encode('utf-8'))
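Putting both points together, the question's retrieve_terms() could be replaced by a helper along these lines (a hypothetical sketch in the same Python 2/urllib2 style as above; it assumes the CSV keywords are UTF-8 encoded bytes):

import json
import urllib, urllib2

REFERRER = "https://stackoverflow.com/q/4309599/4279"

def detect_language(keyword, api_key=None):
    # Return the detected language code for one keyword.
    if isinstance(keyword, str):              # bytes in Python 2
        keyword = keyword.decode('utf-8')     # assumption: the CSV is UTF-8
    query = {'v': '1.0', 'q': keyword.encode('utf-8')}
    if api_key:
        query['key'] = api_key
    url = ('http://ajax.googleapis.com/ajax/services/language/detect?%s'
           % urllib.urlencode(query))
    request = urllib2.Request(url, headers={'Referer': REFERRER})
    data = json.load(urllib2.urlopen(request))
    return data['responseData']['language']

Calling detect_language(item['keywords'], api_key='myapikey') inside the loop in main() would then give one language code per CSV row.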
