Obtain all WooCommerce Orders via Python API

I'm looking to export all orders from the WooCommerce API via a python script.
I've followed the authentication process and I have been using the method to obtain orders described in the documentation. My code looks like the following:
wcapi = API(
    url="url",
    consumer_key=consumerkey,
    consumer_secret=consumersecret
)

r = wcapi.get('orders')
r = r.json()
r = r['orders']
print(len(r))  # output: 8
This outputs the most recent 8 orders, but I would like to access all of them. There are over 200 orders placed via WooCommerce right now. How do I access all of the orders?
Please tell me there is something simple I am missing.
My ultimate goal is to pull these orders automatically, transform them, and then upload to a visualization tool. All input is appreciated.

First: Initialize your API (as you did).
wcapi = API(
    url=eshop.url,
    consumer_key=eshop.consumer_key,
    consumer_secret=eshop.consumer_secret,
    wp_api=True,
    version="wc/v2",
    query_string_auth=True,
    verify_ssl=True,
    timeout=10
)
Second: Fetch the orders with your request (as you did).
r = wcapi.get("orders")
Third: Fetch the total pages.
total_pages = int(r.headers['X-WP-TotalPages'])
Fourth: For every page, fetch the JSON and access the data through the API.
for i in range(1, total_pages + 1):
    r = wcapi.get("orders?page=" + str(i)).json()
    ...

The relevant parameters found in the corresponding documentation are page and per_page. The per_page parameter defines how many orders should be retrieved at every request. The page parameter defines the current page of the order collection.
For example, the request sent by wcapi.get('orders?per_page=5&page=2') will return orders 6 to 10.
However, as the default of per_page is 10, it is not clear why you get only 8 orders.
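Putting page and per_page together with the X-WP-TotalPages header from the previous answer, a minimal sketch (assuming the same wcapi object initialised above) could look like this:
# Minimal sketch: collect every order by walking the pages.
# Assumes `wcapi` is the API object initialised earlier.
all_orders = []
page = 1
while True:
    response = wcapi.get("orders", params={"per_page": 100, "page": page})
    all_orders.extend(response.json())
    total_pages = int(response.headers["X-WP-TotalPages"])
    if page >= total_pages:
        break
    page += 1

print(len(all_orders))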

I encountered the same problem with paginated responses for products.
I built on the same approach described by @gtopal, whereby the X-WP-TotalPages header returned by WooCommerce is used to iterate through each page of results.
I knew that I would probably encounter the same issue for other WooCommerce API requests (such as orders), and I didn't want to clutter my code by repeating the same pagination loop every time I requested a paginated set of results.
To avoid this I used a decorator to abstract the pagination logic, so that get_all_wc_orders can focus just on the request.
I hope the decorator below might be useful to someone else (gist)
import logging

from woocommerce import API

log = logging.getLogger(__name__)

WC_MAX_API_RESULT_COUNT = 100

wcapi = API(
    url=url,
    consumer_key=key,
    consumer_secret=secret,
    version="wc/v3",
    timeout=300,
)


def wcapi_aggregate_paginated_response(func):
    """
    Decorator that repeatedly calls the decorated function to get
    all pages of a WooCommerce API response.
    Combines the response data into a single list.
    The decorated function must accept parameters:
    - wcapi object
    - page number
    """

    def wrapper(wcapi, page=0, *args, **kwargs):
        items = []
        page = 0
        num_pages = WC_MAX_API_RESULT_COUNT
        while page < num_pages:
            page += 1
            log.debug(f"{page=}")
            response = func(wcapi, page=page, *args, **kwargs)
            items.extend(response.json())
            num_pages = int(response.headers["X-WP-TotalPages"])
            num_products = int(response.headers["X-WP-Total"])
            log.debug(f"{num_products=}, {len(items)=}")
        return items

    return wrapper


@wcapi_aggregate_paginated_response
def get_all_wc_orders(wcapi, page=1):
    """
    Query the WooCommerce REST API for all orders.
    """
    response = wcapi.get(
        "orders",
        params={
            "per_page": WC_MAX_API_RESULT_COUNT,
            "page": page,
        },
    )
    response.raise_for_status()
    return response


orders = get_all_wc_orders(wcapi)
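A note on the loop's design: num_pages is initialised to WC_MAX_API_RESULT_COUNT only so the while loop runs at least once; after the first request it is overwritten with the real page count from the X-WP-TotalPages header, so the loop stops as soon as the last page has been fetched.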

Related

How to stop API feed from disconnecting without showing errors

I have constructed a news feed from a news agency which reads headlines as they're published (I'd rather not say which one for privacy reasons). One of the drawbacks of this API is that, unlike the filtered stream from the Twitter API, there is no true streaming endpoint: the documentation explicitly says that one needs to continuously fetch individual headlines in order to "simulate a wire feed". Thus, I have constructed a while loop that continuously fetches headlines and prints them to the terminal.
import requests
import xmltodict

import config  # The .py file that contains the username and password for my account with this service.

username = config.username
password = config.password

values = {
    'username': username,
    'password': password,
    'format': 'json'
}

# Functions

# The API requires fetching an authentication token using your username and password,
# which is regenerated every 24 hours.
def get_token():
    token = ''
    parameters = values
    url = '<URL>'
    response = requests.get(url, params=parameters)
    token = response.json()['authToken']['authToken'].replace('=', '')
    return token

# The API has multiple possible channels (though my subscription only has access to one),
# so I fetch the available channels using the authentication token obtained by the previous function.
def get_channel():
    token = get_token()
    parameters = {'token': token}
    url = '<URL>'
    response = requests.get(url, params=parameters)
    dict_data = xmltodict.parse(response.content)
    channel = dict_data['availableChannels']['channelInformation']['alias']
    return channel

def get_headline():
    url = '<URL>'
    channel = get_channel()
    token = get_token()
    values = {'channel': channel, 'token': token, 'limit': 1, 'maxAge': '10s'}
    response = requests.get(url, params=values)
    if response.status_code != 200:
        raise Exception(
            f"Cannot get stream (Error {response.status_code}): {response.text}")
    dict_data = xmltodict.parse(response.content)
    return dict_data

# According to the docs, the only way to simulate an actual live feed is to make constant requests for headlines,
# so I constructed a while loop that fetches headlines, adds each new headline to a set,
# and compares each fetched headline with the set to see if it has already been fetched.
# If it has not, the headline is printed.
def api_stream():
    print('Connected to Stream!')
    new_headline = ''
    data = set()
    while True:
        dict_data = get_headline()
        try:
            # News headlines are classified on a scale of priority from 1-4 (1 being highest priority).
            # Since I am only interested in headlines for breaking news, I only keep headlines of priority 1 or 2.
            if int(dict_data['results']['result']['priority']) < 3:
                new_headline = dict_data['results']['result']['headline']
            else:
                pass
            if new_headline not in data and new_headline != '':
                print(new_headline)
                data.add(new_headline)
        except KeyError:
            # The get_headline() function only fetches headlines that are at most 10 seconds old,
            # so if the most recent headline is older than that, ['result'] is missing from the
            # dictionary and a KeyError is raised, which I handle this way.
            continue
This code usually works throughout the day, but overnight, from around midnight until around 7:30am, it stops printing headlines without displaying an error message. I've tried a number of different things, such as wrapping this in another while loop and adding a second except block that calls the api_stream() function in the event of an error, but nothing has worked; it just stops fetching headlines without warning.
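One thing worth checking, since no error ever surfaces: requests.get without a timeout= argument can block indefinitely on a stalled connection, which would look exactly like a silent stop. A minimal sketch of the outer retry wrapper described above, assuming the helper functions from the question and arbitrary timeout/back-off values, could look like this:
import time

REQUEST_TIMEOUT = 10  # assumed value: seconds before a hung request raises instead of blocking
RETRY_DELAY = 30      # assumed value: seconds to wait before reconnecting

def run_forever():
    """Outer wrapper: restart the stream whenever anything inside it raises."""
    while True:
        try:
            api_stream()
        except Exception as exc:
            print(f"Stream stopped: {exc!r}; reconnecting in {RETRY_DELAY}s")
            time.sleep(RETRY_DELAY)

# For this wrapper to help, each requests.get() inside get_token(), get_channel()
# and get_headline() should also pass timeout=REQUEST_TIMEOUT so that a hung
# connection surfaces as an exception instead of hanging silently.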

What is the best Python design pattern for consuming paginated messages from Twilio's HTTP GET REST API response?

I'm trying to implement source_to_raw to consume Twilio API responses via a Python script. Below is some sample code I have tried; I hope there is a better way than this.
I'm exploring options to accomplish this via Python helper libraries, without any schema options, as it only goes to the raw zone. I ran into infinite loops of never-ending 'next_page_uri's. Twilio offers PageSize limits, but I couldn't figure out where the pages end, which I need in order to design an exit condition for the loops and conditional statements in my code. Any help regarding Twilio pagination on the Python/Azure Databricks stack would be greatly appreciated.
Following are the sample code and a couple of sample responses.
page_data = page_response(url, date, creds)
data.update(page_data)
while page_data['next_page_uri'] is not None:
    page_data = page_response(url, date, creds)
    data.update(page_data)
    next_page_url = data['next_page_uri']
    src_url = 'https://api.twilio.com'
    url = src_url + next_page_url
    print(url)
Sample Responses:
# response_0:
{
    "first_page_uri": "",
    "end": 11111,
    "previous_page_uri": "/2010-04-01/..../",
    "messages": [{raw...data}],
    "next_page_uri": "/2010-04-01/Accounts/ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/Messages.json?start=2020-12-02PageSize=50&Page=1",
    "page": 0
}

# response_1:
{
    "first_page_uri": "",
    "end": 49,
    "previous_page_uri": "",
    "messages": [{raw...data}],
    "next_page_uri": "/2010-04-01/Accounts/ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/Messages.json?start=2020-12-02PageSize=50&Page=2",
    "page": 1
}
I did not test this, but you can always do some recursion like the following code:
def get_twillo_data(url, date, creds, data, base_url='https://api.twilio.com'):
    _data = page_response(url, date, creds)
    data += _data['messages']
    next_page_uri = _data.get('next_page_uri')
    if next_page_uri:
        # Recurse with the next page's URI appended to the base URL,
        # and return the accumulated result.
        return get_twillo_data(
            url=base_url + next_page_uri,
            date=date,
            creds=creds,
            data=data,
            base_url=base_url,
        )
    return data


data = []
messages = get_twillo_data(
    url='https://api.twilio.com',
    date='ur_date',
    creds='ur_creds',
    data=data,
)
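As a non-recursive alternative, the sample responses above show that the last page simply comes back with an empty next_page_uri, so a plain loop can use that as its exit condition. This is only a sketch built on the question's hypothetical page_response helper, with a placeholder first-page URI:
def get_all_twilio_messages(date, creds,
                            base_url='https://api.twilio.com',
                            first_uri='/2010-04-01/Accounts/ACXXXX/Messages.json'):
    """Collect messages from every page, stopping when next_page_uri is empty."""
    messages = []
    uri = first_uri
    while uri:
        page_data = page_response(base_url + uri, date, creds)
        messages += page_data['messages']
        # An empty string or missing key means there is no further page.
        uri = page_data.get('next_page_uri')
    return messages
If the official twilio helper library is an option, it handles this pagination for you; client.messages.list() and client.messages.stream() follow next_page_uri internally until the collection is exhausted.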

The commits extracted from the GitHub API are different from the commits I get from git.Repo for the same project

What I am trying to do is extract the names of the classes that have been modified in a pull request. To do that, I do the following:
From the GitHub API:
1) I extract all the pull requests for one project
2) I extract all the commits for each pull request
3) I keep only the first commit and last commit for each pull request.
Since at this point I don't know how to extract the list of modified classes between these two commits per pull request, I use the 'git' package, like this:
I cloned the gson repository into D:\\projects\\gson
import git

repo = git.Repo("D:\\projects\\gson")
commits_list = list(repo.iter_commits())

temp = []
for x in commits_list[0].diff(commits_list[-1]):
    if x.a_path == x.b_path:
        if x.a_path.endswith('.java'):
            temp.append(x.a_path)
    else:
        if x.b_path.endswith('.java'):
            temp.append(x.b_path)
Here is how I extract commits from the GitHub API:
import requests

projects = [{'owner': 'google', 'repo': 'gson', 'pull_requests': []}]
nb = 0  # counts the number of requests sent

def get(url):
    global nb
    PARAMS = {
        'client_id': '----my_client_id---',
        'client_secret': '---my_client_secret---',
        'per_page': 100,
        'state': 'all'  # open, closed, all
    }
    result = requests.get(url=url, params=PARAMS)
    nb += 1
    if result.status_code not in [200, 304]:
        raise Exception('request error', url, result, result.headers)
    data = result.json()
    while 'next' in result.links.keys():
        result = requests.get(url=result.links['next']['url'],
                              params=PARAMS)
        data.extend(result.json())
        nb += 1
    return data

def get_pull_requests(repo):
    url = 'https://api.github.com/repos/{}/pulls'.format(repo)
    result = get(url)
    return result

def get_commits(url):
    result = get(url)
    return result

for i, project in enumerate(projects):
    project['pull_requests'] = \
        get_pull_requests('{}/{}'.format(project['owner'], project['repo']))
    for p in project['pull_requests']:
        p['commits'] = get_commits(p['commits_url'])
    print('{}/{}'.format(project['owner'], project['repo']), ':',
          len(project['pull_requests']))
Each of these two pieces of code works. The problem is that I get 287 commits from the GitHub API, but only 86 commits from git.Repo for the same project. When I try to match these commits, fewer than 40 of them match.
Questions:
1) Why am I getting different commits for the same project?
2) Which one is correct and I should use?
3) Is there a way I can know which commits belong to which pull request using git.Repo?
4) Is there a way I can extract the modified classes between two commits in GitHub API?
5) Does anyone know of a better way of extracting modified classes per pull request?
I know this is a long post, but I tried to be specific here. The answer to any of these questions would be very much appreciated.
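Regarding question 4: the GitHub REST API has a compare endpoint (GET /repos/{owner}/{repo}/compare/{base}...{head}) whose response includes a files list describing the changes between two commits. A rough sketch using requests directly (authentication omitted; the two SHAs are assumed to come from a pull request's first and last commits):
import requests

def get_modified_java_files(owner, repo, base_sha, head_sha):
    """List the .java files changed between two commits via the GitHub compare API."""
    url = 'https://api.github.com/repos/{}/{}/compare/{}...{}'.format(
        owner, repo, base_sha, head_sha)
    result = requests.get(url)
    result.raise_for_status()
    comparison = result.json()
    # Each entry in 'files' carries a 'filename' plus its status and diff stats.
    return [f['filename'] for f in comparison.get('files', [])
            if f['filename'].endswith('.java')]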

Vimeo API: get a list of links for downloading all video files

Good day.
I'm trying to get a list of all video files (direct download links) from a Vimeo account.
Is there a way to do it in one GET request? OK, in batches of 100, if that is a restriction of the API.
I have a hard-coded script that makes 12 GET requests (1100+ videos; according to the documentation, a request is limited to 100 results), and then makes over 1,000 requests to receive the direct links.
Is there a way to receive a list of links for downloading videos from Vimeo with one API request to the server?
P.S. The account is PRO.
import vimeo
import json
import config  # token is here

client = vimeo.VimeoClient(
    token=config.token
)

per_page = 100
answerDataAll = []
for i in range(12):
    page = i + 1
    getString = 'https://api.vimeo.com/me/videos?per_page=' + str(per_page) + '&page=' + str(page)
    dataFromServer = client.get(getString).json()['data']
    answerDataAll.extend(dataFromServer)

# creating list of videos
listOfItems = []
for item in answerDataAll:
    listOfItems.append(item['uri'])

# creating list of direct links, it is the goal
listOfUrls = []
for item in listOfItems:
    # isolating digits
    videoID = ""
    for sign in item:
        if sign.isdigit():
            videoID = videoID + sign
    requestForDownloading = client.get('http://player.vimeo.com/video/' + videoID + '/config').json()['request']['files']['progressive']
    for itm in requestForDownloading:
        if itm['width'] == 640:
            urlForDownloading = itm['url']
            listOfUrls.append(urlForDownloading)
You can get up to 100 videos per request, but understand that a request like that to /me/videos will return the full metadata for each video, which is a lot of data to parse through. The API or your client may also timeout while Vimeo's servers try to render your request.
You should use the fields parameter so that only the download metadata you need is returned. You should also specify the sort and direction, so you know exactly what order the videos will be returned in. The request uri should be formatted like this:
https://api.vimeo.com/me/videos?fields=uri,name,download&page=1&per_page=100&sort=date&direction=desc
Documentation of those parameters is found here:
https://developer.vimeo.com/api/common-formats#json-filter
https://developer.vimeo.com/api/common-formats#using-the-pagination-parameter
https://developer.vimeo.com/api/common-formats#using-the-sort-parameters
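To illustrate how that request could be combined with the PyVimeo client from the question, here is a sketch; the paging.next field and the shape of the download entries follow the usual Vimeo API response format, so treat them as assumptions to verify against your own responses:
import vimeo
import config  # token is here

client = vimeo.VimeoClient(token=config.token)

uri = ('/me/videos?fields=uri,name,download'
       '&page=1&per_page=100&sort=date&direction=desc')

download_links = []
while uri:
    body = client.get(uri).json()
    for video in body['data']:
        # On PRO accounts each video carries a 'download' list of renditions;
        # pick the 640px-wide one as in the original script.
        for rendition in video.get('download', []):
            if rendition.get('width') == 640:
                download_links.append(rendition['link'])
    # 'paging.next' holds the URI of the next page, or None on the last page.
    uri = body.get('paging', {}).get('next')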

GAE - How can I combine the results of several asynchronous url fetches?

I have a Google AppEngine (in Python) application where I need to perform 4 to 5 url fetches, and then combine the data before I print it out to the response.
I can do this without any problems using a synchronous workflow, but since the urls that I am fetching are not related or dependent on each other, performing this asynchronously would be ideal (and quickest).
I have read and re-read the documentation here, but I just can't figure out how to read the contents of each url. I've also searched the web for a small example (which is really what I need). I have seen this SO question, but again, it doesn't mention anything about reading the contents of these individual asynchronous url fetches.
Does anyone have any simple examples of how to perform 4 or 5 asynchronous url fetches with AppEngine? And then combine the results before I print it to the response?
Here is what I have so far:
rpcs = []
for album in result_object['data']:
    total_facebook_photo_count = total_facebook_photo_count + album['count']
    facebook_albumid_array.append(album['id'])

    # Get the photos in the photo album
    facebook_photos_url = 'https://graph.facebook.com/%s/photos?access_token=%s&limit=1000' % (album['id'], access_token)
    rpc = urlfetch.create_rpc()
    urlfetch.make_fetch_call(rpc, facebook_photos_url)
    rpcs.append(rpc)

for rpc in rpcs:
    result = rpc.get_result()
    self.response.out.write(result.content)
However, it still looks like the line result = rpc.get_result() forces it to wait for the first request to finish, then the second, then the third, and so forth. Is there a way to simply put the results in variables as they are received?
Thanks!
In the example, text = result.content is where you get the content (body).
To do url fetches in parallel, you could set them up, add them to a list, and check the results afterwards. Expanding on the example already mentioned, it could look something like:
from google.appengine.api import urlfetch

futures = []
for url in urls:
    rpc = urlfetch.create_rpc()
    urlfetch.make_fetch_call(rpc, url)
    futures.append(rpc)

contents = []
for rpc in futures:
    try:
        result = rpc.get_result()
        if result.status_code == 200:
            contents.append(result.content)
            # ...
    except urlfetch.DownloadError:
        # Request timed out or failed.
        # ...
        pass

concatenated_result = '\n'.join(contents)
In this example, we assemble the bodies of all the requests that returned status code 200, and concatenate them with a line break between each.
Or with ndb, my personal preference for anything async on GAE, something like:
from google.appengine.ext import ndb

@ndb.tasklet
def get_urls(urls):
    ctx = ndb.get_context()
    result = yield map(ctx.urlfetch, urls)
    contents = [r.content for r in result if r.status_code == 200]
    raise ndb.Return('\n'.join(contents))
I use this code (implemented before I learned about ndb tasklets):
while rpcs:
    rpc = UserRPC.wait_any(rpcs)
    result = rpc.get_result()
    # process result here
    rpcs.remove(rpc)
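A note on that last snippet: wait_any blocks until whichever of the outstanding RPCs completes first, so results are handled in completion order rather than the order the calls were issued, which is the "as they are received" behaviour the question asks for.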
