I am trying to grab all the members we have in a GitHub Organization. We have about 4K.
Using the documentation here, I am trying to page through the results, but it is not iterating through the pages of results.
Here is the code:
from dotenv import load_dotenv, find_dotenv
import json
import requests
import os

load_dotenv(find_dotenv())

headers = {
    "authorization": f"{os.getenv('github_token')}",
    "content-type": "application/json"
}

query_url = "https://api.github.com/orgs/<name of Org>/members?page="
members = []
page_no = 1
loop_control = 0
while loop_control == 0:
    url = query_url + str(page_no)
    request = requests.get(url, headers=headers)
    print(url)
    print(request.status_code)
    response = request.json()
    print(len(response))
    for i in response:
        members.append(i)
    if len(response) == 30:
        page_no += 1
    elif len(response) < 30:
        loop_control = 1

with open('data/github/response.json', 'w') as file:
    print(len(members))
    json.dump(members, file)
With this code, it grabs the first 30 results, then it grabs 7 for page 2 of the results.
Any ideas?
Two things to check about your script:
ensure the token is associated with an account that is a member of the organization
ensure your token has the read:org scope set
If either of these conditions is not met, the script will only see users who have public membership in the organization, which would explain the difference in numbers you're seeing.
To also improve the script's performance, you can add a per_page=100 query string parameter to get 100 results per API call instead of the default 30. This is documented in the Pagination section of the API docs.
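For example, the loop from the question could be reworked like this (a sketch; it also sends the token in the "token <value>" format GitHub expects, in case the raw value was the problem):

query_url = "https://api.github.com/orgs/<name of Org>/members?per_page=100&page="
headers = {
    "authorization": f"token {os.getenv('github_token')}",
    "content-type": "application/json"
}

members = []
page_no = 1
while True:
    response = requests.get(query_url + str(page_no), headers=headers).json()
    members.extend(response)
    if len(response) < 100:  # a short page means we've reached the last page
        break
    page_no += 1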
Related
I have a table (as a Pandas DataFrame) of (mostly) GitHub repos, for which I need to automatically extract the LICENSE link. However, it is a requirement that the link does not simply point to /blob/master/ but to a specific commit, as the master link might be updated at some point. I assembled a Python script to do this through the GitHub API, but using the API I am only able to retrieve the link with the master tag.
I.e. instead of
https://github.com/jsdom/abab/blob/master/LICENSE.md
I want
https://github.com/jsdom/abab/blob/8abc2aa5b1378e59d61dee1face7341a155d5805/LICENSE.md
Any idea if there is a way to automatically get the link to the latest commit for a file, in this case the LICENSE file?
This is the code I have written so far:
def githubcrawl(repo_url, session, headers):
    parts = repo_url.split("/")[3:]
    url_tmpl = "http://api.github.com/repos/{}/license"
    url = url_tmpl.format("/".join(parts))
    try:
        response = session.get(url, headers=headers)
        if response.status_code in [404]:
            return f"404: {repo_url}"
        else:
            data = json.loads(response.text)
            return data["html_url"]  # Returns the html URL to the LICENSE file
    except urllib.error.HTTPError as e:
        print(repo_url, "-", e)
        return f"http_error: {repo_url}"
token="mytoken" # Token for github authentication to get more requests per hour
headers={"Authorization": "token %s" % token}
session = requests.Session()
lizlinks = [] # List to store the links of the LICENSE files in
# iterate over DataFrame of applications/deps
for idx, row in df.iterrows():
# if idx < 5:
if type(row["Homepage"]) == type("str"):
repo_url = re.sub(r"\#readme", "", row["Homepage"])
response = session.get(repo_url, headers=headers)
repo_url = response.url # Some URLs are just redirects, so I get the actual repo url here
if "github" in repo_url and len(repo_url.split("/")) >= 3:
link = githubcrawl(repo_url, session, headers)
print(link)
lizlinks.append(link)
else:
print(row["Homepage"], "Not a github Repo")
lizlinks.append("Not a github repo")
else:
print(row["Homepage"], "Not a github Repo")
lizlinks.append("Not a github repo")
Bonus question: Would parallelizing this task work with the GitHub API? I.e., could I send multiple requests at once without being locked out (DoS), or is the for-loop a good approach to avoid this? It takes quite a while to go through the 1000-ish repos I have in that list.
OK, I found a way to get the unique SHA hash of the current commit. I believe that should always link to the license file at that point in time.
Using the Python git library (GitPython), I simply run the ls_remote git command and return the HEAD SHA:
def lsremote_HEAD(url):
    g = git.cmd.Git()
    HEAD_sha = g.ls_remote(url).split()[0]
    return HEAD_sha
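For example (using the repo from the question; the SHA shown is illustrative):

head_sha = lsremote_HEAD("https://github.com/jsdom/abab")
print(head_sha)  # e.g. '8abc2aa5b1378e59d61dee1face7341a155d5805'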
I can then replace the "master", "main" or whatever tag in my github_crawl function:
token="token_string"
headers={"Authorization": "token %s" % token}
session = requests.Session()
def githubcrawl(repo_url, session, headers):
parts = repo_url.split("/")[3:]
api_url_tmpl = "http://api.github.com/repos/{}/license"
api_url = api_url_tmpl.format("/".join(parts))
try:
print(api_url)
response = session.get(api_url, headers=headers)
if response.status_code in [404]:
return(f"404: {repo_url}")
else:
data = json.loads(response.text)
commit_link = re.sub(r"/blob/.+?/",rf"/blob/{lsremote_HEAD(repo_url)}/", data["html_url"])
return(commit_link)
except urllib.error.HTTPError as e:
print(repo_url, "-", e)
return f"http_error: {repo_url}"
Maybe this helps someone, so I'm posting this answer here.
This answer uses the following libraries:
import re
import git
import urllib
import json
import requests
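Putting it together, a call like the following (an untested sketch, reusing the session and headers defined above) should return a pinned link of the form asked about in the question:

link = githubcrawl("https://github.com/jsdom/abab", session, headers)
print(link)
# e.g. https://github.com/jsdom/abab/blob/8abc2aa5b1378e59d61dee1face7341a155d5805/LICENSE.md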
I am trying to start using the eBay API in Python and I can't find a single answer as to how to get an API key given eBay's new requirement of "Account Deletion/Closure Notifications." Here's the link: https://developer.ebay.com/marketplace-account-deletion
Specifically, I am told that "Your Keyset is currently disabled" because I have not completed whatever process is needed for this marketplace account deletion/closure notification.
The problems?
I have no idea if I need this.
I have no idea how to actually do this.
Re: 1. It looks like this is for anyone who stores user data. I don’t think that’s me intentionally because I really just want to get sold data and current listings, but is it actually me?
Re: 2. I don’t understand how to validate it and send back the proper responses. I’ve gotten quite good at python but I’m lost here.
eBay forums are completely useless and I see no one with an answer to this. Any help is greatly appreciated.
Re: 1. Same. Here's my interpretation: in order to use their APIs, you need to provide (and configure) your own API, so they can communicate with you programmatically and tell you which users have asked to have their accounts/data deleted.
Re: 2. To handle their GET and POST requests, I guess you'll need to configure a website's URL as an API endpoint. In Django, I might use something like this (untested) code:
import hashlib
import json
from django.http import (
    HttpResponse,
    JsonResponse,
    HttpResponseBadRequest
)

def your_api_endpoint(request):
    """
    API endpoint to handle the verification's challenge code and
    receive eBay's Marketplace Account Deletion/Closure Notifications.
    """
    # STEP 1: Handle verification's challenge code
    challengeCode = request.GET.get('challenge_code')
    if challengeCode is not None:
        # Token needs to be 32-80 characters long
        verificationToken = "your-token-012345678901234567890123456789"
        # URL needs to use HTTPS protocol
        endpoint_url = "https://your-domain.com/your-endpoint"
        # Hash elements need to be ordered as follows
        m = hashlib.sha256((challengeCode + verificationToken + endpoint_url).encode('utf-8'))
        # JSON field needs to be called challengeResponse
        return JsonResponse({"challengeResponse": m.hexdigest()}, status=200)
    # STEP 2: Handle account deletion/closure notification
    elif request.method == 'POST':
        notification_details = json.loads(request.body)
        # Verify notification is actually from eBay
        # ...
        # Delete/close account
        # ...
        # Acknowledge notification reception
        return HttpResponse(status=200)
    else:
        return HttpResponseBadRequest()
If you find the answer to question number one, please do let me know.
Re: 1. You need to comply with eBay's Marketplace Account Deletion/Closure Notification workflow if you are storing user data into your own database. For example, using eBay's Buy APIs, you may get access to what users are selling on eBay (for ex. an eBay feed of products). If those eBay sellers decide they want to remove all of their personal data from eBay's database, eBay is requesting you remove their data from your database as well. If you are NOT storing any eBay user data into your database, you do not need to comply. Here is where you can find more info: https://partnerhelp.ebay.com/helpcenter/s/article/Complying-with-the-eBay-Marketplace-Account-Deletion-Closure-Notification-workflow?language=en_US
Re: 2. To be honest I've spent days trying to figure this out in Python (Django), but I have a solution now and am happy to share it with whoever else comes across this issue. Here's my solution:
import os
import json
import base64
import hashlib
import requests
import logging
from OpenSSL import crypto
from rest_framework import status
from rest_framework.views import APIView
from django.http import JsonResponse

logger = logging.getLogger(__name__)

class EbayMarketplaceAccountDeletion(APIView):
    """
    This is required as per eBay Marketplace Account Deletion Requirements.
    See documentation here: https://developer.ebay.com/marketplace-account-deletion
    """
    # eBay config values
    CHALLENGE_CODE = 'challenge_code'
    VERIFICATION_TOKEN = os.environ.get('VERIFICATION_TOKEN')
    # ^ NOTE: You can make this value up so long as it is between 32-80 characters.
    ENDPOINT = 'https://example.com/ebay_marketplace_account_deletion'
    # ^ NOTE: Replace this with your own endpoint
    X_EBAY_SIGNATURE = 'X-Ebay-Signature'
    EBAY_BASE64_AUTHORIZATION_TOKEN = os.environ.get('EBAY_BASE64_AUTHORIZATION_TOKEN')
    # ^ NOTE: Here's how you can get your EBAY_BASE64_AUTHORIZATION_TOKEN:
    # import base64
    # base64.b64encode(b'{CLIENT_ID}:{CLIENT_SECRET}')

    def __init__(self):
        super(EbayMarketplaceAccountDeletion, self).__init__()

    def get(self, request):
        """
        Get challenge code and return challengeResponse: challengeCode + verificationToken + endpoint
        :return: Response
        """
        challenge_code = request.GET.get(self.CHALLENGE_CODE)
        challenge_response = hashlib.sha256(challenge_code.encode('utf-8') +
                                            self.VERIFICATION_TOKEN.encode('utf-8') +
                                            self.ENDPOINT.encode('utf-8'))
        response_parameters = {
            "challengeResponse": challenge_response.hexdigest()
        }
        return JsonResponse(response_parameters, status=status.HTTP_200_OK)

    def post(self, request):
        """
        Return 200 status code and remove from db.
        See how to validate the notification here:
        https://developer.ebay.com/api-docs/commerce/notification/overview.html#use
        """
        # Verify notification is actually from eBay
        # 1. Use a Base64 function to decode the X-EBAY-SIGNATURE header and retrieve the public key ID and signature
        x_ebay_signature = request.headers[self.X_EBAY_SIGNATURE]
        x_ebay_signature_decoded = json.loads(base64.b64decode(x_ebay_signature).decode('utf-8'))
        kid = x_ebay_signature_decoded['kid']
        signature = x_ebay_signature_decoded['signature']

        # 2. Call the getPublicKey Notification API method, passing in the public key ID ("kid") retrieved from the
        # decoded signature header. Documentation on getPublicKey:
        # https://developer.ebay.com/api-docs/commerce/notification/resources/public_key/methods/getPublicKey
        public_key = None
        try:
            ebay_verification_url = f'https://api.ebay.com/commerce/notification/v1/public_key/{kid}'
            oauth_access_token = self.get_oauth_token()
            headers = {
                'Authorization': f'Bearer {oauth_access_token}'
            }
            public_key_request = requests.get(url=ebay_verification_url, headers=headers, data={})
            if public_key_request.status_code == 200:
                public_key_response = public_key_request.json()
                public_key = public_key_response['key']
        except Exception as e:
            message_title = "Ebay Marketplace Account Deletion: Error calling getPublicKey Notification API."
            logger.error(f"{message_title} Error: {e}")
            return JsonResponse({}, status=status.HTTP_500_INTERNAL_SERVER_ERROR)

        # 3. Initialize the cryptographic library to perform the verification with the public key that is returned from
        # the getPublicKey method. If the signature verification fails, an HTTP status of 412 Precondition Failed is returned.
        pkey = crypto.load_publickey(crypto.FILETYPE_PEM, self.get_public_key_into_proper_format(public_key))
        certification = crypto.X509()
        certification.set_pubkey(pkey)
        notification_payload = request.body
        signature_decoded = base64.b64decode(signature)
        try:
            crypto.verify(certification, signature_decoded, notification_payload, 'sha1')
        except crypto.Error as e:
            message_title = "Ebay Marketplace Account Deletion: Signature invalid. " \
                            "The signature is invalid or there is a problem verifying the signature."
            logger.warning(f"{message_title} Error: {e}")
            return JsonResponse({}, status=status.HTTP_412_PRECONDITION_FAILED)
        except Exception as e:
            message_title = "Ebay Marketplace Account Deletion: Error performing cryptographic validation."
            logger.error(f"{message_title} Error: {e}")
            return JsonResponse({}, status=status.HTTP_412_PRECONDITION_FAILED)

        # Take appropriate action to delete the user data. Deletion should be done in a manner such that even the
        # highest system privilege cannot reverse the deletion.
        # TODO: Replace with your own data removal here

        # Acknowledge notification reception
        return JsonResponse({}, status=status.HTTP_200_OK)

    def get_oauth_token(self):
        """
        Returns the OAuth token from eBay which can be used for making other API requests such as getPublicKey.
        """
        url = 'https://api.ebay.com/identity/v1/oauth2/token'
        headers = {
            'Content-Type': 'application/x-www-form-urlencoded',
            'Authorization': f"Basic {self.EBAY_BASE64_AUTHORIZATION_TOKEN}"
        }
        payload = 'grant_type=client_credentials&scope=https%3A%2F%2Fapi.ebay.com%2Foauth%2Fapi_scope'
        request = requests.post(url=url, headers=headers, data=payload)
        data = request.json()
        return data['access_token']

    @staticmethod
    def get_public_key_into_proper_format(public_key):
        """
        Public key needs to have \n in places to be properly assessed by the crypto library.
        """
        return public_key[:26] + '\n' + public_key[26:-24] + '\n' + public_key[-24:]
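To sanity-check the GET handler, you can compute the expected hash yourself and compare it with what the endpoint returns. A rough sketch (the challenge code is made up for testing, and the token and endpoint values must match what the view uses):

import os
import hashlib
import requests

challenge_code = "test-challenge-123"  # made-up value for testing
verification_token = os.environ.get('VERIFICATION_TOKEN')
endpoint = 'https://example.com/ebay_marketplace_account_deletion'

# Same hash order as the view: challengeCode + verificationToken + endpoint
expected = hashlib.sha256(
    (challenge_code + verification_token + endpoint).encode('utf-8')
).hexdigest()

resp = requests.get(endpoint, params={'challenge_code': challenge_code})
assert resp.json()['challengeResponse'] == expected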
This is how I am dealing with the eBay notification requirement using Python 3 CGI. Because bytes are sent, you cannot use cgi.FieldStorage().
import os
import sys
import hashlib
import json
from datetime import datetime
from html import escape
import cgi
import cgitb
import io

include_path = '/var/domain_name/www'
sys.path.insert(0, include_path)
cgitb.enable(display=0, logdir=f"""{include_path}/tmp_errors""")  # include_path is OUTDIR

dt_now = datetime.now()
current_dt_now = dt_now.strftime("%Y-%m-%d_%H-%M-%S")

def enc_print(string='', encoding='utf8'):
    sys.stdout.buffer.write(string.encode(encoding) + b'\n')

html = ''
challengeCode = ''

# GET
myQuery = os.environ.get('QUERY_STRING')
if myQuery.find('=') != -1:
    pos = myQuery.find('=')
    var_name = myQuery[:pos]
    var_val = myQuery[pos+1:]
    challengeCode = var_val

# POST
if os.environ.get('CONTENT_LENGTH') != None:
    totalBytes = int(os.environ.get('CONTENT_LENGTH'))
    reqbytes = io.open(sys.stdin.fileno(), "rb").read(totalBytes)

if challengeCode != '':
    """
    API endpoint to handle the verification's challenge code and
    receive eBay's Marketplace Account Deletion/Closure Notifications.
    """
    # STEP 1: Handle verification's challenge code
    # Token needs to be 32-80 characters long
    verificationToken = "0123456789012345678901234567890123456789"  # sample token
    # URL needs to use HTTPS protocol
    endpoint = "https://domain_name.com/ebay/notification.py"  # sample endpoint
    # Hash elements need to be ordered as follows
    m = hashlib.sha256((challengeCode + verificationToken + endpoint).encode('utf-8'))
    # JSON field needs to be called challengeResponse
    enc_print("Content-Type: application/json")
    enc_print("Status: 200 OK")
    enc_print()
    enc_print('{"challengeResponse":"' + m.hexdigest() + '"}')
    exit()
else:
    # html += 'var length:' + str(totalBytes) + '\n'
    html += reqbytes.decode('utf-8') + '\n'
    # STEP 2: Handle account deletion/closure notification
    # Verify notification is actually from eBay
    # ...
    # Delete/close account
    # ...
    # Acknowledge notification reception
    with open(f"""./notifications/{current_dt_now}_user_notification.txt""", 'w') as f:
        f.write(html)
    enc_print("Content-Type: application/json")
    enc_print("Status: 200 OK")
    enc_print()
    exit()
I've been trying @José Matías Arévalo's code. It works except for the "STEP 2" branch: Django returns a 403 error. This is because, by default, Django uses CSRF middleware (Cross Site Request Forgery protection). To avoid the 403 error we need to mark the view as exempt from the protection, as described here: https://docs.djangoproject.com/en/dev/ref/csrf/#utilities. So add a couple of lines to the code:
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def your_api_endpoint(request):
And in my case I use the URL "https://your-domain.com/your-endpoint/" with a slash "/" at the end. Without this slash, eBay doesn't confirm the subscription.
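For completeness, the URL wiring in Django might look something like this (an untested sketch; the module path for the view is a placeholder):

# urls.py -- note the trailing slash on the route
from django.urls import path
from .views import your_api_endpoint

urlpatterns = [
    path('your-endpoint/', your_api_endpoint),
]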
I am using Flask and this is the code I have used:
from flask import Flask, request
import hashlib

# Create a random verification token; it needs to be 32-80 characters long
verification_token = 'a94cbd68e463cb9780e2008b1f61986110a5fd0ff8b99c9cba15f1f802ad65f9'
endpoint_url = 'https://dev.example.com'

app = Flask(__name__)

# There will be errors if you just use '/' as the route, as it will redirect eBay's request.
# eBay will send a request to https://dev.example.com?challenge_code=123
# The request will get redirected by Flask to https://dev.example.com/?challenge_code=123, which eBay will not accept.
endpoint = endpoint_url + '/test'

# The Content-Type header will be added automatically by Flask as 'application/json'
@app.route('/test')
def test():
    code = request.args.get('challenge_code')
    print('Requests argument:', code)
    code = code + verification_token + endpoint
    code = code.encode('utf-8')
    code = hashlib.sha256(code)
    code = code.hexdigest()
    print('Hexdigest:', code)
    final = {"challengeResponse": code}
    return final

## To run locally first use this:
# app.run(port=29)
I am trying to import a JSON URL (an API); the dataset has "nhits": 20843.
So this is my URL: https://opendata.reseaux-energies.fr/api/records/1.0/search/?dataset=injections-regionales-quotidiennes-consolidees-rpt&q=&rows=20843&facet=date&facet=region&facet=filiere&facet=plage_de_puissance
This is my code:
import requests
site = "https://opendata.reseaux-energies.fr/api/records/1.0/search/?dataset=injections-regionales-quotidiennes-consolidees-rpt&q=&rows=20843&facet=date&facet=region&facet=filiere&facet=plage_de_puissance"
r = requests.get(site)
data = r.json()
And I get an error message:
'raw_params': {'expected': '-1 <= rows <= 10000', 'field_value': 20843, 'field_name': 'rows'}, 'raw_message': 'Invalid field in API request: {field_name} with value {field_value}. Expected: {expected}', 'error_key': 'InvalidFieldInAPIRequestExpectedException'}
How can I avoid the error?
PS: sorry for my bad English.
You are requesting too many results at once. The error says that you may only request 10K results at once, so you need to split your request into multiple parts if you want to get every single result.
For my code I reduced the results per request to 1000 so that the request does not time out. Then for each subsequent request I increment the start parameter of the API by adding start=" + str(i) + "& to the URL. See the documentation: https://help.opendatasoft.com/apis/ods-search-v1/#available-apis
import requests
import math

datas = []
for i in range(math.ceil(20843/1000)):
    site = "https://opendata.reseaux-energies.fr/api/records/1.0/search/?dataset=injections-regionales-quotidiennes-consolidees-rpt&q=&rows=1000&start="+str(i*1000)+"&facet=date&facet=region&facet=filiere&facet=plage_de_puissance"
    r = requests.get(site)
    datas.append(r.json())
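Note that this collects whole response objects. If you only want the individual records, you can flatten them as you go (a sketch, assuming each response carries a "records" key as in the Opendatasoft search API):

import requests
import math

records = []
for i in range(math.ceil(20843/1000)):
    site = "https://opendata.reseaux-energies.fr/api/records/1.0/search/?dataset=injections-regionales-quotidiennes-consolidees-rpt&q=&rows=1000&start="+str(i*1000)
    records.extend(requests.get(site).json().get("records", []))
print(len(records))  # should approach nhits (20843)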
I have a list of domain names in a txt file ('links.txt') and I want to check their availability and write them to a different txt file ('available_domains.txt') if they are available. I wrote the code like this:
import requests
import time
import json

api_key = "3mM44UaguNL6GH_Kc3bKzig25G1mZtnA87nwS"
secret_key = "37ZnMbQkQrYJ5pF57ZhrEi"
headers = {"Authorization": "sso-key {}:{}".format(api_key, secret_key)}

url = "https://api.godaddy.com/v1/domains/available"
appraisal = "https://api.godaddy.com/v1/appraisal/{}"
do_appraise = True

with open("links.txt") as f:
    for domains in f:
        availability_res = requests.post(url, json=domains, headers=headers)
        for domain in json.loads(availability_res.text)['domains']:
            if domain['available']:
                with open("available_domains.txt", 'w', newline="", encoding="UTF-8") as f:
                    f.write(domain)
            else:
                print("Not Available")
But I'm getting an error like:
for domain in json.loads(availability_res.text)["domains"]:
KeyError: 'domains'
I'm new to this, and I don't think my code is quite right. If you have any ideas, can you help me with it?
I have found your issue.
You need to pass the domain using the "params" argument of the requests get function.
Here is a little snippet that worked for me (you will need to incorporate it into your for loop):
api_key = "YOURKEYHERE"
secret_key = "YOURKEYHERE"
headers = {"Authorization": "sso-key {}:{}".format(api_key, secret_key)}
url = "https://api.godaddy.com/v1/domains/available"
test = get(url, params={'domain':'google.co.uk'}, headers=headers).text
print(test)
I have been tasked to do the same thing in Python.
Thanks to this blog, Godaddy domain name API in Python, I was able to complete this task.
Try it out.
This error occurs because of the limit on requests you can send per minute. You have to add time.sleep(48) after every 20 requests; this way you will not get this error.
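For example, the request loop could be throttled like this (a sketch; it assumes `domains` is the list of names read from links.txt, and the body of the loop is the availability request from above):

import time

for count, domain in enumerate(domains, start=1):
    # ... send the availability request for this domain ...
    if count % 20 == 0:
        time.sleep(48)  # pause to stay under the per-minute limit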
I'm looking to export all orders from the WooCommerce API via a python script.
I've followed the authentication process and have been using the method to obtain orders described here. My code looks like the following:
wcapi = API(
    url="url",
    consumer_key=consumerkey,
    consumer_secret=consumersecret
)

r = wcapi.get('orders')
r = r.json()
r = r['orders']
print(len(r))  # output: 8
This outputs the most recent 8 orders, but I would like to access all of them. There are over 200 orders placed via woocommerce right now. How do I access all of the orders?
Please tell me there is something simple I am missing.
My ultimate goal is to pull these orders automatically, transform them, and then upload to a visualization tool. All input is appreciated.
First: Initialize your API (as you did).
wcapi = API(
    url=eshop.url,
    consumer_key=eshop.consumer_key,
    consumer_secret=eshop.consumer_secret,
    wp_api=True,
    version="wc/v2",
    query_string_auth=True,
    verify_ssl=True,
    timeout=10
)
Second: Fetch the orders with your request (as you did).
r = wcapi.get("orders")
Third: Fetch the total pages.
total_pages = int(r.headers['X-WP-TotalPages'])
Fourth: For every page, catch the JSON and access the data through the API.
for i in range(1, total_pages+1):
    r = wcapi.get("orders?&page=" + str(i)).json()
    ...
The relevant parameters found in the corresponding documentation are page and per_page. The per_page parameter defines how many orders should be retrieved per request. The page parameter defines the current page of the order collection.
For example, the request sent by wcapi.get('orders?per_page=5&page=2') will return the second page of five orders, i.e. orders 6 through 10.
However, as the default of per_page is 10, it is not clear why you get only 8 orders.
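With the Python client these parameters are cleaner to pass via the params keyword (the same style used in the answer below), for example:

# Fetch the second page of five orders per page (orders 6-10)
r = wcapi.get("orders", params={"per_page": 5, "page": 2})
orders = r.json()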
I encountered the same problem with paginated responses for products.
I built on the same approach described by @gtopal, whereby the X-WP-TotalPages header returned by WooCommerce is used to iterate through each page of results.
I knew that I would probably encounter the same issue for other WooCommerce API requests (such as orders), and I didn't want to have to confuse my code by repeatedly performing a loop when I requested a paginated set of results.
To avoid this I used a decorator to abstract the pagination logic, so that get_all_wc_orders can focus just on the request.
I hope the decorator below might be useful to someone else (gist)
import logging

from woocommerce import API

log = logging.getLogger(__name__)

WC_MAX_API_RESULT_COUNT = 100

wcapi = API(
    url=url,
    consumer_key=key,
    consumer_secret=secret,
    version="wc/v3",
    timeout=300,
)

def wcapi_aggregate_paginated_response(func):
    """
    Decorator that repeatedly calls a decorated function to get
    all pages of a WooCommerce API response.
    Combines the response data into a single list.
    Decorated function must accept parameters:
    - wcapi object
    - page number
    """
    def wrapper(wcapi, page=0, *args, **kwargs):
        items = []
        page = 0
        num_pages = WC_MAX_API_RESULT_COUNT
        while page < num_pages:
            page += 1
            log.debug(f"{page=}")
            response = func(wcapi, page=page, *args, **kwargs)
            items.extend(response.json())
            num_pages = int(response.headers["X-WP-TotalPages"])
            num_products = int(response.headers["X-WP-Total"])
            log.debug(f"{num_products=}, {len(items)=}")
        return items

    return wrapper

@wcapi_aggregate_paginated_response
def get_all_wc_orders(wcapi, page=1):
    """
    Query the WooCommerce REST API for all orders.
    """
    response = wcapi.get(
        "orders",
        params={
            "per_page": WC_MAX_API_RESULT_COUNT,
            "page": page,
        },
    )
    response.raise_for_status()
    return response

orders = get_all_wc_orders(wcapi)
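The same decorator can then be reused for any other paginated endpoint, e.g. products (a sketch along the same lines):

@wcapi_aggregate_paginated_response
def get_all_wc_products(wcapi, page=1):
    """
    Query the WooCommerce REST API for all products.
    """
    response = wcapi.get(
        "products",
        params={
            "per_page": WC_MAX_API_RESULT_COUNT,
            "page": page,
        },
    )
    response.raise_for_status()
    return response

products = get_all_wc_products(wcapi)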