Python append json to json file in a while loop

I'm trying to get all users' information from the GitHub API using the Python Requests library. Here is my code:
import requests
import json

url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}

r = requests.get(url, headers=headers)
users = r.json()

with open('users.json', 'w') as outfile:
    json.dump(users, outfile)
I can now dump the first page of users into a JSON file. I can also find the 'next' page's URL:
next_url = r.links['next'].get('url')
r2 = requests.get(next_url, headers=headers)
users2 = r2.json()
Since I don't know how many pages there are, how can I append the 2nd, 3rd, ... pages to 'users.json' sequentially in a while loop as fast as possible?
Thanks!

First, you need to open the file in 'a' (append) mode, otherwise each subsequent write will overwrite everything:
import requests
import json

url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}

outfile = open('users.json', 'a')
while True:
    r = requests.get(url, headers=headers)
    users = r.json()
    # note: each call appends a separate JSON array, so the file ends up
    # holding one JSON document per page rather than a single array
    json.dump(users, outfile)
    # requests drops the 'next' link on the last page, so stop when it is missing
    url = r.links.get('next', {}).get('url')
    if not url:
        break
outfile.close()

Append the data you get from each request to a list and move on to the next query.
Once you have all of the data you want, concatenate it and write it out to a file (or build whatever object you need) in one go. You could also use threading to run multiple queries in parallel, but the API will most likely rate-limit you.
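A minimal sketch of that approach, assuming the same GitHub endpoint and token as in the question (sequential requests, no threading):
import requests
import json

url = 'https://api.github.com/users'
headers = {'Authorization': 'token my_token'}  # placeholder token

all_users = []
while url:
    r = requests.get(url, headers=headers)
    r.raise_for_status()
    all_users.extend(r.json())                  # accumulate this page's users in a list
    url = r.links.get('next', {}).get('url')    # None once there is no further page

# write everything out once, producing a single valid JSON array
with open('users.json', 'w') as outfile:
    json.dump(all_users, outfile)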

Related

Grabbing the octet stream data from a Graph API response

I have been working on some code to download a day's worth of Teams usage data from the Graph API. I can successfully send the token and receive the response. The response apparently contains, in its headers, the URL to download the CSV file, but I can't seem to find the code to grab it.
My code at the moment is as follows.
import requests, urllib, json, csv, os

client_id = urllib.parse.quote_plus('XXXX')
client_secret = urllib.parse.quote_plus('XXXX')
tenant = urllib.parse.quote_plus('XXXX')

auth_uri = 'https://login.microsoftonline.com/' + tenant \
    + '/oauth2/v2.0/token'
auth_body = 'grant_type=client_credentials&client_id=' + client_id \
    + '&client_secret=' + client_secret \
    + '&scope=https%3A%2F%2Fgraph.microsoft.com%2F.default'
authorization = requests.post(auth_uri, data=auth_body, headers={'Content-Type': 'application/x-www-form-urlencoded'})
token = json.loads(authorization.content)['access_token']

graph_uri = 'https://graph.microsoft.com/v1.0/reports/getTeamsUserActivityUserDetail(date=2023-01-22)'
response = requests.get(graph_uri, headers={'Content-Type': 'application/json', 'Authorization': 'Bearer ' + token})
print(response.headers)
Is there any easy way to parse the URL from the header and to obtain the CSV file?
REF: https://learn.microsoft.com/en-us/graph/api/reportroot-getteamsuseractivityuserdetail?view=graph-rest-beta
response.headers is a case-insensitive dictionary of response headers, so you should be able to get the Location header this way:
locationUrl = response.headers['location']

# retrieve the CSV data from the download URL
response = requests.get(locationUrl)

# write the response content to a file
with open("data.csv", 'wb') as f:
    f.write(response.content)
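One caveat, offered as an assumption rather than tested code: requests follows redirects by default, and the report endpoint answers with a 302 whose Location header points at a pre-authenticated download URL, so that header may already have been consumed by the time you inspect response.headers. A sketch that disables redirects for the report call (reusing graph_uri and token from the question):
import requests

# graph_uri and token are assumed to come from the question's auth flow above
report = requests.get(
    graph_uri,
    headers={'Authorization': 'Bearer ' + token},
    allow_redirects=False,   # keep the 302 so its Location header stays visible
)
download_url = report.headers['Location']

# the download URL is pre-authenticated, so no Authorization header is needed
csv_response = requests.get(download_url)
with open('data.csv', 'wb') as f:
    f.write(csv_response.content)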

Pulling Change Request CSV from ServiceNow with specific columns (Fields) in Python

Currently I have
import requests
import json
import csv

# Set the request parameters
url = 'dev.service-now.com/change_request_list.do?CSV&'
user = 'myuser'
pwd = 'mypass'

# Set proper headers (unsure if this is needed)
headers = {"Accept": "application/json"}

# Do the HTTP request
response = requests.get(url, auth=(user, pwd), headers=headers)
response.raise_for_status()

with open('out.csv', 'w') as f:
    writer = csv.writer(f)
    for line in response.iter_lines():
        writer.writerow(line.decode('utf-8').split(','))
This gets the data I want from ServiceNow; however, it is missing certain fields. I need the 'Opened' and 'Closed' columns and am unsure how to request them with the code I have.
Any help would be perfect! I am really new to using requests.
Here is a solution using the REST Table API, which lets you control which fields you want to pull. I also added a sysparm_query to restrict the rows.
import requests
import json
import csv
from urllib.parse import urlencode

url = 'https://dev.service-now.com/api/now/table/change_request'
user = 'myuser'
pwd = 'mypass'

fields = ['number', 'short_description', 'opened_at', 'closed_at']
params = {
    'sysparm_display_value': 'true',
    'sysparm_exclude_reference_link': 'true',
    'sysparm_limit': '5000',
    'sysparm_fields': ','.join(fields),
    'sysparm_query': 'sys_created_on>2020-09-15'
}
headers = {"Accept": "application/json"}

response = requests.get(url + '?' + urlencode(params),
                        auth=(user, pwd), headers=headers)
response.raise_for_status()

rows = response.json()['result']
with open('out.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(fields)
    for row in rows:
        outputrow = []
        for field in fields:
            outputrow.append(row[field])
        writer.writerow(outputrow)
At a glance, your code looks correct. I think you just need to update the web service URL you are using to provide the sysparm_default_export_fields=all parameter, i.e.:
dev.service-now.com/change_request_list.do?CSV&sysparm_default_export_fields=all
After that, you should get a response containing every field, including system-created fields like sys_id and created_on. Alternatively, you could create a new view in ServiceNow and provide the sysparm_view=viewName parameter in your URL.
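For completeness, a minimal sketch of that change, reusing the question's credentials and CSV-writing loop (the instance URL and credentials are placeholders):
import requests
import csv

url = 'https://dev.service-now.com/change_request_list.do?CSV&sysparm_default_export_fields=all'
user = 'myuser'
pwd = 'mypass'

response = requests.get(url, auth=(user, pwd), headers={"Accept": "application/json"})
response.raise_for_status()

# write the exported CSV straight to disk, one row per line
with open('out.csv', 'w') as f:
    writer = csv.writer(f)
    for line in response.iter_lines():
        writer.writerow(line.decode('utf-8').split(','))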

renewing an access token in python

This is my first question, so please bear with me. I am working with an API that authenticates using an access token that expires in 15 minutes, and there is no refresh token to use in lieu of a re-login. So far I have been able to get the access token and insert it into the requests.get call, but I cannot seem to get it to renew and am at a loss as to how.
All of the work done with this API, and in general, is with Python so I am hoping to keep it in Python throughout and in the same file.
I get a 401 status code once the 15 minutes are up, and 200 if the request is successful. So far my only ideas are to put the login on a timer for renewal (but I cannot make heads or tails of the Stack Overflow posts or the documentation on doing that), to have the login running in a separate script that this script calls for the current header variable (which would still require a timer), or to redo the login function once a request comes back with response.status_code != 200.
Example script for getting the access token
import requests, os, json, time, csv

def login(url, payload):
    # this will log into the API and get an access token
    auth = requests.post(url, data=payload).json()
    sessionToken = auth["token"]
    sessionTimer = auth["validFor"]
    headers = {'Access-Token': sessionToken}
    return headers

# calling the function to generate the token
if __name__ == '__main__':
    url = "url inserted here"
    u = input("Enter your username: ")
    p = input("Enter your password: ")
    t = input("Enter your tenancy name: ")
    payload = {'username': u, 'password': p, 'tenant': t}
    print("Logging in")
    headers = login(url, payload)

    # the actual work, as pulled from a csv file
    valuables = input("CSV file with filepath: ")
    file = open(valuables, 'r', encoding='utf-8')
    csvin = csv.reader(file)
    for row in csvin:
        try:
            uuidUrl = row[0]
            output_file = row[1]
            response = requests.get(uuidUrl, headers=headers)
            print(response.status_code)
            with open(output_file, 'wb') as fd:
                for chunk in response.iter_content(chunk_size=128):
                    fd.write(chunk)
                fd.close()
        except requests.exceptions.RequestException:
            print(output_file, "may have failed")
            login(url, payload)
            continue
I couldn't get it to successfully recognize if response.status_code != 200: as a way to call back to login(). I also couldn't seem to get it to exit a while True: loop.
I apologize that I cannot give more details on accessing the API for other people to try out; it is non-public.
Eventually I was able to figure out the answer to my own question. Posting this for later users. The updated snippet is below.
Short version of the story: response.status_code returns an integer, but I had made the faulty assumption that it would be a string, so my comparison never matched.
for row in csvin:
    try:
        uuidUrl = row[0]
        xip_file = row[1]
        response = requests.get(uuidUrl, headers=headers)
        status = response.status_code
        print(status)
        if status == 401:
            print(xip_file, "may have failed, logging back in")
            headers = login(url, payload)
            response = requests.get(uuidUrl, headers=headers)
            with open(xip_file, 'wb') as fd:
                for chunk in response.iter_content(chunk_size=128):
                    fd.write(chunk)
        else:
            with open(xip_file, 'wb') as fd:
                for chunk in response.iter_content(chunk_size=128):
                    fd.write(chunk)
    except requests.exceptions.RequestException:
        print(xip_file, "may have failed")
        headers = login(url, payload)
        continue
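For reference, the same idea can be wrapped in a small helper so every download retries once after a re-login. This is only a sketch built on the question's login() function, not part of the original solution:
import requests

def get_with_relogin(uuid_url, headers, login_url, payload):
    # GET a resource; on a 401 (expired token) log in again once and retry.
    # Assumes login() from the question returns a fresh headers dict.
    response = requests.get(uuid_url, headers=headers)
    if response.status_code == 401:
        headers = login(login_url, payload)   # re-authenticate
        response = requests.get(uuid_url, headers=headers)
    return response, headers

The caller keeps the returned headers so later requests reuse the renewed token.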

Need to download the PDF, NOT the content of the webpage

So, as it stands, I am able to get the content of the web page at the PDF link (the link is shown in the code below), but I don't want the content of the web page; I want the content of the PDF itself, so I can save it as a PDF file in a folder on my computer.
I have been successful in doing this on sites that I don't need to log into and without a proxy server.
Relevant CODE:
import os
import urllib2
import time
import requests
import urllib3
from random import *
s = requests.Session()
data = {"Username":"username", "Password":"password"}
url = "https://login.url.com"
print "doing things"
r2 = s.post(url, data=data, proxies = {'https' : 'https://PROXYip:PORT'}, verify=False)
#I get a response 200 from printing r2
print r2
download_url = "http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM"
file = open("F:\my_filepath\document" + str(maxCounter) + ".pdf", 'wb')
temp = s.get(download_url, proxies = {'https' : 'https://PROXYip:PORT'}, verify=False)
#This prints out the response from the proxy server (i.e. 200)
print temp
something = uniform(5,6)
print something
time.sleep(something)
#This gets me the content of the web page, not the content of the PDF
print temp.content
file.write(temp.content)
file.close()
I need help figuring out how to "download" the content of the PDF
try this:
import requests

url = 'http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM'
pdf = requests.get(url)
with open('walmart.pdf', 'wb') as file:
    file.write(pdf.content)
Edit
Try again with a requests session to manage cookies (assuming they send you those after login) and also maybe a different proxy
proxy_dict = {'https': 'ip:port'}

with requests.Session() as session:
    # Authentication request, use GET/POST whatever is needed
    # data variable should hold user/password information
    auth = session.get(login_url, data=data, proxies=proxy_dict, verify=False)
    if auth.status_code == 200:
        print(auth.cookies)  # Tell me if you got anything
        # we're continuing the same session, so its cookies are sent automatically
        pdf = session.get(download_url, proxies=proxy_dict, verify=False)
        with open('walmart.pdf', 'wb') as file:
            file.write(pdf.content)
    else:
        print('No go, got {0} response'.format(auth.status_code))

POST form data containing spaces with Python requests

I'm probably overlooking something spectacularly obvious, but I can't find why the following is happening.
I'm trying to POST a search query to http://www.arcade-museum.com using the requests lib and whenever the query contains spaces, the resulting page contains no results. Compare the result of these snippets:
import requests

url = "http://www.arcade-museum.com/results.php"
payload = {'type': 'Videogame', 'q': '1942'}
r = requests.post(url, payload)
with open("search_results.html", mode="wb") as f:
    f.write(r.content)
and
import requests

url = "http://www.arcade-museum.com/results.php"
payload = {'type': 'Videogame', 'q': 'Wonder Boy'}
r = requests.post(url, payload)
with open("search_results.html", mode="wb") as f:
    f.write(r.content)
If you try the same queries on the website itself, the latter results in a list of about 10 games. The same thing happens when posting the form data with the Postman REST client Chrome extension.
Again, it's probably something very obvious I'm overlooking, but I can't find what's causing this issue.
