I'm trying to get all users' information from the GitHub API using the Python Requests library. Here is my code:
import requests
import json
url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}
r = requests.get(url, headers=headers)
users = r.json()
with open('users.json', 'w') as outfile:
    json.dump(users, outfile)
So far I can dump the first page of users into a JSON file. I can also find the 'next' page's URL:
next_url = r.links['next'].get('url')
r2 = requests.get(next_url, headers=headers)
users2 = r2.json()
Since I don't know how many pages there are yet, how can I append the 2nd, 3rd, ... pages to 'users.json' sequentially in a while loop as fast as possible?
Thanks!
First, you need to open the file in 'a' mode, otherwise subsequent writes will overwrite everything:
import requests
import json
url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}
outfile = open('users.json', 'a')
while True:
    r = requests.get(url, headers=headers)
    users = r.json()
    json.dump(users, outfile)
    # GitHub drops the 'next' link when there are no more pages, but double-check this yourself
    url = r.links.get('next', {}).get('url')
    if not url:
        break
outfile.close()
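Note that dumping each page into the same file like this produces several JSON documents back to back, which json.load() won't read back as a single object. If that matters, a minimal sketch of a workaround, assuming the same url and headers as above, is to write one page per line (JSON Lines):
import requests, json

url = 'https://api.github.com/users'
headers = {'Authorization': 'token my_token'}  # assumption: same token as in the question

with open('users.jsonl', 'w') as outfile:
    while url:
        r = requests.get(url, headers=headers)
        outfile.write(json.dumps(r.json()) + '\n')    # one page (a JSON array) per line
        url = r.links.get('next', {}).get('url')      # None when there is no next page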
Append the data you get from the requests query to a list and move on to the next query.
Once you have all of the data you want, then proceed to try to concatenate the data into a file or into an object. You can also use threading to do multiple queries in parallel, but most likely there is going to be rate limiting on the api.
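A rough sketch of that approach, assuming the same url, token, and headers as in the question (and that GitHub stops returning a 'next' link on the last page):
import requests, json

url = 'https://api.github.com/users'
headers = {'Authorization': 'token my_token'}  # assumption: same token as in the question

all_users = []
while url:
    r = requests.get(url, headers=headers)
    r.raise_for_status()
    all_users.extend(r.json())                    # collect this page's users
    url = r.links.get('next', {}).get('url')      # None once there is no next page

# write everything out once, as a single valid JSON array
with open('users.json', 'w') as outfile:
    json.dump(all_users, outfile)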
I have been working on some code to download a day's worth of Teams usage data from the Graph API. I can successfully send the token and receive the response. The response apparently contains the URL to download the CSV file in the header, but I can't seem to find the code to grab it.
My code at the moment is as follows.
import requests, urllib, json, csv, os
client_id = urllib.parse.quote_plus('XXXX')
client_secret = urllib.parse.quote_plus('XXXX')
tenant = urllib.parse.quote_plus('XXXX')
auth_uri = 'https://login.microsoftonline.com/' + tenant \
+ '/oauth2/v2.0/token'
auth_body = 'grant_type=client_credentials&client_id=' + client_id \
+ '&client_secret=' + client_secret \
+ '&scope=https%3A%2F%2Fgraph.microsoft.com%2F.default'
authorization = requests.post(auth_uri, data=auth_body, headers={'Content-Type': 'application/x-www-form-urlencoded'})
token = json.loads(authorization.content)['access_token']
graph_uri = 'https://graph.microsoft.com/v1.0/reports/getTeamsUserActivityUserDetail(date=2023-01-22)'
response = requests.get(graph_uri, data=auth_body, headers={'Content-Type': 'application/json', 'Authorization': 'Bearer ' + token})
print(response.headers)
Is there any easy way to parse the URL from the header and to obtain the CSV file?
REF: https://learn.microsoft.com/en-us/graph/api/reportroot-getteamsuseractivityuserdetail?view=graph-rest-beta
response.headers is a case-insensitive dictionary of response headers, so you should be able to get the Location header this way:
locationUrl = response.headers['location']
# retrieving data from the URL using get method
response = requests.get(locationUrl)
# write response content to a file
with open("data.csv", 'wb') as f:
f.write(response.content)
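One caveat: Requests follows redirects by default, so by the time you inspect response.headers the redirect from the report endpoint may already have been followed and the CSV may already be sitting in response.content. A sketch that grabs the Location header explicitly, assuming the endpoint answers with a 302 redirect to a pre-authenticated download URL as the docs describe:
import requests

graph_uri = 'https://graph.microsoft.com/v1.0/reports/getTeamsUserActivityUserDetail(date=2023-01-22)'
headers = {'Authorization': 'Bearer ' + token}   # token obtained as in the question

# stop Requests from following the redirect so the Location header stays visible
first = requests.get(graph_uri, headers=headers, allow_redirects=False)
location_url = first.headers['Location']

# the download URL is typically pre-authenticated, so no Authorization header is needed here
csv_response = requests.get(location_url)
with open('data.csv', 'wb') as f:
    f.write(csv_response.content)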
Currently I have
import requests
import json
import csv
# Set the request parameters
url = 'https://dev.service-now.com/change_request_list.do?CSV&'
user = 'myuser'
pwd = 'mypass'
# Set proper headers (Unsure if this is needed)
headers = {"Accept":"application/json"}
# Do the HTTP request
response = requests.get(url, auth=(user, pwd), headers=headers )
response.raise_for_status()
with open('out.csv', 'w') as f:
    writer = csv.writer(f)
    for line in response.iter_lines():
        writer.writerow(line.decode('utf-8').split(','))
This gets the data I want from ServiceNow, but it is missing certain fields. I need the 'Opened' and 'Closed' columns and am unsure how to query for them with the code I have.
Any help would be perfect! I am really new to using Requests.
Here is a solution using the REST Table API, which lets you control which fields you want to pull. I also added a sysparm_query to restrict the rows.
import requests
import json
import csv
from urllib.parse import urlencode
url = 'https://dev.service-now.com/api/now/table/change_request'
user = 'myuser'
pwd = 'mypass'
fields = ['number', 'short_description', 'opened_at', 'closed_at']
params = {
    'sysparm_display_value': 'true',
    'sysparm_exclude_reference_link': 'true',
    'sysparm_limit': '5000',
    'sysparm_fields': ','.join(fields),
    'sysparm_query': 'sys_created_on>2020-09-15'
}
headers = {"Accept":"application/json"}
response = requests.get(url + '?' + urlencode(params),
                        auth=(user, pwd), headers=headers)
response.raise_for_status()
rows = response.json()['result']
with open('out.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(fields)
    for row in rows:
        outputrow = []
        for field in fields:
            outputrow.append(row[field])
        writer.writerow(outputrow)
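If more than 5,000 change requests match, the Table API can also be paged; a rough sketch using the sysparm_offset parameter, assuming the same url, params, headers, and credentials as above:
rows = []
offset = 0
while True:
    page_params = dict(params, sysparm_offset=str(offset))
    response = requests.get(url + '?' + urlencode(page_params),
                            auth=(user, pwd), headers=headers)
    response.raise_for_status()
    batch = response.json()['result']
    if not batch:
        break                   # no more records
    rows.extend(batch)
    offset += len(batch)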
At a glance, your code looks correct. I think you just need to update the web service URL you are using to provide the sysparm_default_export_fields=all parameter, i.e.:
dev.service-now.com/change_request_list.do?CSV&sysparm_default_export_fields=all
After that, you should get a response containing every field, including system created fields like sys_id and created_on. Alternatively, you could create a new view in ServiceNow and provide the sysparm_view=viewName parameter in your URL.
This is my first question, so please bear with me. I am working with an API that authenticates using an access token that expires in 15 minutes; there is no refresh token to use in lieu of a re-login. So far I have been able to get the access token and insert it into the requests.get call, but I cannot seem to get it to renew and am at a loss as to how.
All of the work done with this API, and in general, is with Python so I am hoping to keep it in Python throughout and in the same file.
I get a 401 status code once the 15 minutes are up, and 200 if successful. So far my only ideas are to put the renewal on a timer (though I cannot make heads or tails of the Stack Overflow posts or the documentation on doing that), to run the login in a separate script that this script calls for the current header variable (which would still require a timer), or to redo the login function whenever a request comes back with response.status_code != 200.
Example script for getting the access token
import requests, os, json, time, csv
def login(url, payload):
    # this will log into the API and get an access token
    auth = requests.post(url, data=payload).json()
    sessionToken = auth["token"]
    sessionTimer = auth["validFor"]
    headers = {'Access-Token': sessionToken}
    return headers
#calling the function to generate the token
if __name__ == '__main__':
    url = "url inserted here"
    u = input("Enter your username: ")
    p = input("Enter your password: ")
    t = input("Enter your tenancy name: ")
    payload = {'username': u, 'password': p, 'tenant': t}
    print("Logging in")
    headers = login(url, payload)

    # the actual work, as pulled from a csv file
    valuables = input("CSV file with filepath: ")
    file = open(valuables, 'r', encoding='utf-8')
    csvin = csv.reader(file)
    for row in csvin:
        try:
            uuidUrl = row[0]
            output_file = row[1]
            response = requests.get(uuidUrl, headers=headers)
            print(response.status_code)
            with open(output_file, 'wb') as fd:
                for chunk in response.iter_content(chunk_size=128):
                    fd.write(chunk)
        except requests.exceptions.RequestException:
            print(output_file, "may have failed")
            login(url, payload)
            continue
I couldn't get it to successfully recognize if response.status_code != 200: as a way to call back to login(). I also couldn't seem to get it to exit a while True: loop.
I apologize that I cannot give more details on accessing the API for other people to try out; it is non-public.
Eventually I was able to figure out the answer to my own question. Posting this for later users; the updated snippet is below.
Short version of the story: response.status_code sends back an integer, but I made the faulty assumption that it would be a string, so my comparison was no good.
for row in csvin:
    try:
        uuidUrl = row[0]
        xip_file = row[1]
        response = requests.get(uuidUrl, headers=headers)
        status = response.status_code
        print(status)
        if status == 401:
            print(xip_file, "may have failed, logging back in")
            headers = login(url, payload)        # refresh the expired token
            response = requests.get(uuidUrl, headers=headers)
            with open(xip_file, 'wb') as fd:
                for chunk in response.iter_content(chunk_size=128):
                    fd.write(chunk)
        else:
            with open(xip_file, 'wb') as fd:
                for chunk in response.iter_content(chunk_size=128):
                    fd.write(chunk)
    except requests.exceptions.RequestException:
        print(xip_file, "may have failed")
        headers = login(url, payload)
        continue
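Since the download block is duplicated in both branches, one way to tidy this up is a small helper that retries once on a 401. This is only a sketch, assuming the same login(), headers, url, payload, and csvin as above:
def fetch(uuidUrl, headers, url, payload):
    # GET uuidUrl, logging back in once if the token has expired
    response = requests.get(uuidUrl, headers=headers)
    if response.status_code == 401:
        headers = login(url, payload)            # refresh the expired token
        response = requests.get(uuidUrl, headers=headers)
    return response, headers

for row in csvin:
    uuidUrl, xip_file = row[0], row[1]
    response, headers = fetch(uuidUrl, headers, url, payload)
    with open(xip_file, 'wb') as fd:
        for chunk in response.iter_content(chunk_size=128):
            fd.write(chunk)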
As it stands, I am able to get the content of the web page at the PDF link EXAMPLE OF THE LINK HERE. BUT I don't want the content of the web page; I want the content of the PDF itself, so I can save it into a PDF file in a folder on my computer.
I have been successful in doing this on sites where I don't need to log in and don't need a proxy server.
Relevant CODE:
import os
import urllib2
import time
import requests
import urllib3
from random import *
s = requests.Session()
data = {"Username":"username", "Password":"password"}
url = "https://login.url.com"
print "doing things"
r2 = s.post(url, data=data, proxies = {'https' : 'https://PROXYip:PORT'}, verify=False)
#I get a response 200 from printing r2
print r2
download_url = "http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM"
file = open("F:\my_filepath\document" + str(maxCounter) + ".pdf", 'wb')
temp = s.get(download_url, proxies = {'https' : 'https://PROXYip:PORT'}, verify=False)
#This prints out the response from the proxy server (i.e. 200)
print temp
something = uniform(5,6)
print something
time.sleep(something)
#This gets me the content of the web page, not the content of the PDF
print temp.content
file.write(temp.content)
file.close()
I need help figuring out how to "download" the content of the PDF
try this:
import requests
url = 'http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM'
pdf = requests.get(url)
with open('walmart.pdf', 'wb') as file:
    file.write(pdf.content)
Edit
Try again with a requests Session to manage cookies (assuming they send you some after login), and maybe also a different proxy:
proxy_dict = {'https': 'ip:port'}
with requests.Session() as session:
    # Authentication request, use GET/POST whatever is needed
    # data variable should hold user/password information
    auth = session.get(login_url, data=data, proxies=proxy_dict, verify=False)
    if auth.status_code == 200:
        print(auth.cookies)  # Tell me if you got anything
        # we're continuing the same session, so the login cookies are sent automatically
        pdf = session.get(download_url, proxies=proxy_dict, verify=False)
        with open('walmart.pdf', 'wb') as file:
            file.write(pdf.content)
    else:
        print('No go, got {0} response'.format(auth.status_code))
I'm probably overlooking something spectacularly obvious, but I can't figure out why the following is happening.
I'm trying to POST a search query to http://www.arcade-museum.com using the Requests library, and whenever the query contains spaces, the resulting page contains no results. Compare the results of these snippets:
import requests
url = "http://www.arcade-museum.com/results.php"
payload = {'type': 'Videogame', 'q': '1942'}
r = requests.post(url, payload)
with open("search_results.html", mode="wb") as f:
f.write(r.content)
and
import requests
url = "http://www.arcade-museum.com/results.php"
payload = {'type': 'Videogame', 'q': 'Wonder Boy'}
r = requests.post(url, payload)
with open("search_results.html", mode="wb") as f:
f.write(r.content)
If you try the same query on the website, the latter will return a list of about 10 games. The same happens when posting the form data using the Postman REST client Chrome extension.
Again, it's probably something very obvious I'm overlooking, but I can't find what's causing this issue.
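One way to narrow this down is to inspect exactly what Requests sends as the form body, since the data is URL-encoded (spaces typically become +) before it is posted. A small diagnostic sketch, assuming the same url and payload as in the second snippet:
import requests

url = "http://www.arcade-museum.com/results.php"
payload = {'type': 'Videogame', 'q': 'Wonder Boy'}

# build the request without sending it, to see exactly what goes over the wire
prepared = requests.Request('POST', url, data=payload).prepare()
print(prepared.headers.get('Content-Type'))   # application/x-www-form-urlencoded
print(prepared.body)                          # e.g. type=Videogame&q=Wonder+Boy

Comparing that body with what the browser or Postman sends for the same search may show where the two diverge.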