Print Specific Value from an API Request in Python

I am trying to print values from an API request. The JSON returned is large (4,000 lines), so I am just trying to get specific values from the key/value pairs and automate a message.
Here is what I have so far:
import requests
import json
import urllib

url = "https://api.github.com/repos/<companyName>/<repoName>/issues"  # url
payload = {}
headers = {
    'Authorization': 'Bearer <masterToken>'  # authorization works fine
}
name = (user.login)  # pretty sure nothing is being looked up here
url = (url)
print(hello %name, you have a pull request to view. See here %url for more information)  # i want to print those keys here
The JSON returned by the API GET request is as follows:
[
    {
        "url": "https://github.com/<companyName>/<repo>/issues/1000",
        "repository_url": "https://github.com/<companyName>/<repo>",
        "labels_url": "https://github.com/<companyName>/<repo>/issues/1000/labels{/name}",
        "comments_url": "https://github.com/<companyName>/<repo>/issues/1000",
        "events_url": "https://github.com/<companyName>/<repo>/issues/1000",
        "html_url": "https://github.com/<companyName>/<repo>/issues/1000",
        "id": <id>,
        "node_id": "<nodeID>",
        "number": 702,
        "title": "<titleName>",
        "user": {
            "login": "<userName>",
            "id": <idNumber>,
            "node_id": "nodeID",
            "avatar_url": "https://avatars3.githubusercontent.com/u/urlName?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/<userName>",
            "html_url": "https://github.com/<userName>",
            "followers_url": "https://api.github.com/users/<userName>/followers",
            "following_url": "https://api.github.com/users/<userName>/following{/other_user}",
            "gists_url": "https://api.github.com/users/<userName>/gists{/gist_id}",
            "starred_url": "https://api.github.com/users/<userName>/starred{/owner}{/repo}",
            "subscriptions_url": "https://api.github.com/users/<userName>/subscriptions",
            "organizations_url": "https://api.github.com/users/<userName>/orgs",
            "repos_url": "https://api.github.com/users/<userName>/repos",
            "events_url": "https://api.github.com/users/<userName>/events{/privacy}",
            "received_events_url": "https://api.github.com/users/<userName>/received_events",
            "type": "User",
            "site_admin": false
        }
    }
]
(Note: this JSON object repeats a few hundred times in the array.)
From the API response, I am trying to get the nested "login" and the url.
What am I missing?
Thanks
Edit:
Solved:
import requests
import json
import urllib

url = "https://api.github.com/repos/<companyName>/<repoName>/issues"
payload = {}
headers = {
    'Authorization': 'Bearer <masterToken>'
}
response = requests.get(url, headers=headers).json()  # pass the auth header along with the request
for obj in response:
    name = obj['user']['login']
    url = obj['url']
    print('Hello {0}, you have an outstanding ticket to review. For more information see here: {1}.'.format(name, url))

Since it's a JSON array, you have to loop over it. JSON objects are converted to dictionaries, so you use ['key'] to access the elements.
for obj in response:
    name = obj['user']['login']
    url = obj['url']
    print(f'hello {name}, you have a pull request to view. See here {url} for more information')

You can parse it into Python lists/dictionaries and then access it like any other Python object.
response = requests.get(...).json()
login = response[0]['user']['login']

You can convert JSON-formatted data to a Python dictionary like this:
https://www.w3schools.com/python/python_json.asp
import json

json_data = ...  # raw JSON text from the API response
dict_data = json.loads(json_data)
login = dict_data[0]['user']['login']
url = dict_data[0]['url']
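For completeness, requests can perform that json.loads step itself; a small sketch reusing the url and headers variables from the question, showing that the two forms are equivalent:
import json
import requests

response = requests.get(url, headers=headers)
data = json.loads(response.text)  # manual parse of the raw JSON text
data = response.json()            # equivalent result, parsed by requests
print(data[0]['user']['login'], data[0]['url'])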

Related

How to scrape data from sciencedirect

I want to scrape all data from ScienceDirect by keyword.
I know that ScienceDirect pages are rendered with AJAX, so their data can't be extracted directly via the URL of the search results page.
The page I want to scrape
I found the JSON data in one of the requests listed in the Network tab, so in my view I should be able to get the JSON via the URL of that request. But I get an error message and garbled output. Here is my code.
The request that contains the JSON
import requests as res
import json
from bs4 import BeautifulSoup

keyword = "digital game"
url = 'https://www.sciencedirect.com/search/api?'
payload = {
    'tak': keyword,
    't': 'ZNS1ixW4GGlMjTKbRHccgZ2dHuMVHqLqNBwYzIZayNb8FZvZFnVnLBYUCU%2FfHTxZMgwoaQmcp%2Foemth5%2FnqtM%2BGQW3NGOv%2FI0ng6yDADzynQO66j9EPEGT0aClusSwPFvKdDbfVcomCzYflUlyb3MA%3D%3D',
    'hostname': 'www.sciencedirect.com'
}
r = res.get(url, params=payload)
print(r.content)  # get garbled
r = r.json()
print(r)  # get error msg
The output is garbled (not the JSON data I expect), and calling .json() raises an error.
Try setting HTTP headers in the request, such as User-Agent, to mimic a standard web browser. This will return the query search results in JSON format.
import requests

keyword = "digital game"
url = 'https://www.sciencedirect.com/search/api?'
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'application/json'
}
payload = {
    'tak': keyword,
    't': 'ZNS1ixW4GGlMjTKbRHccgZ2dHuMVHqLqNBwYzIZayNb8FZvZFnVnLBYUCU%2FfHTxZMgwoaQmcp%2Foemth5%2FnqtM%2BGQW3NGOv%2FI0ng6yDADzynQO66j9EPEGT0aClusSwPFvKdDbfVcomCzYflUlyb3MA%3D%3D',
    'hostname': 'www.sciencedirect.com'
}
r = requests.get(url, headers=headers, params=payload)
# need to check if the response output is JSON
if "json" in r.headers.get("Content-Type", ""):
    data = r.json()
else:
    print(r.status_code)
    data = r.text
print(data)
Output:
{'searchResults': [{'abstTypes': ['author', 'author-highlights'], 'authors': [{'order': 1, 'name': 'Juliana Tay'},
..., 'resultsCount': 961}}
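Assuming the shape shown in that output (only searchResults, authors, and name are visible in the sample; any other keys would need checking against the real payload), a small sketch of iterating the parsed results:
# Walk the parsed search results returned above; only keys visible in the
# sample output are used here.
if isinstance(data, dict):
    for result in data.get('searchResults', []):
        author_names = [a.get('name') for a in result.get('authors', [])]
        print(author_names)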
I've got the same problem. The point is that sciencedirect.com is using Cloudflare, which blocks access for scraping bots. I tried different approaches like cloudscraper, cfscrape, etc., without success. I then made a small parser based on Selenium, which lets me take metadata from publications and put it into my own JSON file with the following schema (a rough sketch of the Selenium side follows after the schema):
schema = {
    "doi_number": {
        "metadata": {
            "pub_type": "Review article" | "Research article" | "Short communication" | "Conference abstract" | "Case report",
            "open_access": True | False,
            "title": "title_name",
            "journal": "journal_name",
            "date": "publishing_date",
            "volume": str,
            "issue": str,
            "pages": str,
            "authors": [
                "author1",
                "author2",
                "author3"
            ]
        }
    }
}
If you have any questions or ideas, feel free to contact me.
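A minimal sketch of that kind of Selenium-based approach, assuming a Chrome driver is available; the search URL and the CSS selector below are placeholders (not from the original post) and would need checking against the live page, and Cloudflare may still interfere:
import json
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.sciencedirect.com/search?qs=digital%20game")  # placeholder search URL

results = {}
for link in driver.find_elements(By.CSS_SELECTOR, "a.result-list-title-link"):  # placeholder selector
    # key by the article link; real DOI/metadata extraction would need more selectors
    results[link.get_attribute("href")] = {"metadata": {"title": link.text}}

driver.quit()

with open("sciencedirect_metadata.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)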

REST APIs: How to POST several entries from a (JSON) file in a python loop?

I am new to Python and the REST world.
My Python script
import json
import requests

with open(r"create-multiple-Users.json", "r") as payload:
    data = json.load(payload)
json_data = json.dumps(data, indent=2)
headers = {'content-type': 'application/json; charset=utf-8'}
for i in range(len(data)):
    r = requests.post('http://localhost:3000/users',
                      data=json_data, headers=headers)
Mock API server: https://github.com/typicode/json-server .
Entry file: "info.json" with Endpoint: /users that has one user initially.
{
    "users": [
        {
            "id": 1,
            "name": "John",
            "job": "Wong"
        }
    ]
}
Issue:
POSTing from a file with only one user works perfectly. The new user is appended to info.json as expected as an object.
But when trying to POST, say, 3 users from the file "create-multiple-Users.json" below, the whole list of 3 objects is appended to "info.json" 3 times (i.e. once per object/iteration).
[
    {
        "id": 10,
        "name": "Janet",
        "job": "Weaver"
    },
    {
        "id": 12,
        "name": "Kwonn",
        "job": "Wingtsel"
    },
    {
        "id": 13,
        "name": "Eve",
        "job": "Holt"
    }
]
I would expect the users to be appended one by one as separate objects.
Maybe I am oversimplifying the looping?
Any help is highly appreciated.
PS: Sorry I couldn't get the multiple-users file formatted ;(
A simple change in your for iteration would help:
import json
import requests

with open(r"create-multiple-Users.json", "r") as payload:
    data = json.load(payload)
headers = {'content-type': 'application/json; charset=utf-8'}
for row in data:  # iterate the JSON list directly
    r = requests.post('http://localhost:3000/users',
                      data=json.dumps(row), headers=headers)  # send one serialized object per request
I found the solution by using the hint thanks to "enriqueojedalara"
import json
import requests

with open(r"create-multiple-Users.json", "r") as payload:
    data = json.load(payload)  # <class 'list'>
headers = {'content-type': 'application/json; charset=utf-8'}
print("Total number of objects: ", len(data))
for i in range(len(data)):
    data_new = json.dumps(data[i])
    r = requests.post('http://localhost:3000/users', data=data_new, headers=headers)
    print("Item#", i, "added", " -> ", data_new)

How can I use a JSON file as an input to MS Azure text analytics using Python?

I've looked through many responses to variants of this problem but am still not able to get my code working.
I am trying to use the MS Azure text analytics service and when I paste the example code (including 2/3 sample sentences) it works as you might expect. However, my use case requires the same analysis to be performed on hundreds of free text survey responses so rather than pasting in each and every sentence, I would like to use a JSON file containing these responses as an input, pass that to Azure for analysis and receive back a JSON output.
The code I am using and the response it yields are shown below (note that the last bit of the ID 2 response has been chopped off before the error message).
key = "xxxxxxxxxxx"
endpoint = "https://blablabla.cognitiveservices.azure.com/"
import json
with open(r'example.json', encoding='Latin-1') as f:
data = json.load(f)
print (data)
import os
from azure.cognitiveservices.language.textanalytics import TextAnalyticsClient
from msrest.authentication import CognitiveServicesCredentials
def authenticateClient():
credentials = CognitiveServicesCredentials(key)
text_analytics_client = TextAnalyticsClient(
endpoint=endpoint, credentials=credentials)
return text_analytics_client
import requests
# pprint is used to format the JSON response
from pprint import pprint
import os
subscription_key = "xxxxxxxxxxxxx"
endpoint = "https://blablabla.cognitiveservices.azure.com/"
entities_url = "https://blablabla.cognitiveservices.azure.com/text/analytics/v2.1/entities/"
documents = data
headers = {"Ocp-Apim-Subscription-Key": subscription_key}
response = requests.post(entities_url, headers=headers, json=documents)
entities = response.json()
pprint(entities)
[{'ID': 1, 'text': 'dog ate my homework', {'ID': 2, 'text': 'cat sat on the
{'code': 'BadRequest',
'innerError': {'code': 'InvalidRequestBodyFormat',
'message': 'Request body format is wrong. Make sure the json '
'request is serialized correctly and there are no '
'null members.'},
'message': 'Invalid request'}
According to my research, when we call the Azure Text Analytics REST API to identify entities, the request body should look like this:
{
    "documents": [
        {
            "id": "1",
            "text": "."
        },
        {
            "id": "2",
            "text": ""
        }
    ]
}
For example, my JSON file:
[{
    "id": "1",
    "text": "dog ate my homework"
}, {
    "id": "2",
    "text": "cat sat on the sofa"
}]
My code
key = ''
endpoint = "https://<>.cognitiveservices.azure.com/"

import requests
from pprint import pprint
import os
import json

with open(r'd:\data.json', encoding='Latin-1') as f:
    data = json.load(f)
pprint(data)

entities_url = endpoint + "/text/analytics/v2.1/entities?showStats=true"
headers = {"Ocp-Apim-Subscription-Key": key}

# First attempt: post the raw list (this reproduces the BadRequest error)
documents = data
response = requests.post(entities_url, headers=headers, json=documents)
entities = response.json()
pprint(entities)
pprint("--------------------------------")

# Second attempt: wrap the list in a "documents" object, as the API expects
documents = {}
documents["documents"] = data
response = requests.post(entities_url, headers=headers, json=documents)
entities = response.json()
pprint(entities)
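For the original use case (hundreds of free-text survey responses), the same "documents" wrapper can be built directly from a plain list of strings; a small sketch, with two placeholder strings standing in for the real survey answers:
# Build the request body shape the API expects from a plain list of answers.
survey_responses = ["dog ate my homework", "cat sat on the sofa"]  # placeholders
documents = {
    "documents": [
        {"id": str(i + 1), "text": text}
        for i, text in enumerate(survey_responses)
    ]
}
# `documents` can then be posted with requests.post(entities_url, headers=headers, json=documents)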

Handling URL parts in an API-returned JSON format

*** I have updated code at the bottom.
I have a JSON object I'm working with and it's coming from Azure Analytics for an application we built. I'm trying to figure out how to parse the URL that comes back so that just the limit and location keys' data end up in separate columns. The code I'm using is listed here (keys are taken out, as well as the URL, because of API keys and tokens):
import requests
import pandas as pd
from urllib.parse import urlparse
from furl import furl
import json

d1 = '<This is where I have the rule for the API>'
querystring = {"timespan": "P7D"}  # gets last 7 days
headers = { Stuff in here for headers }
response = requests.request("GET", d1, headers=headers, params=querystring)
data = json.loads(response.text)
# then I clean up the stuff in the dataframe
for stuff in data['value']:
    del stuff['count']  # ...plus a list of all the non-needed fields in the json
newstuff = json.dumps(data, indent=2, sort_keys=True)
data2 = json.loads(newstuff)
OK, now here is the part I am having problems with. I want to pull out 3 columns of data from each row: ['request']['url'], ['timestamp'], and ['user']['id'].
I'm pretty sure I need a for loop, so I'm doing the following to get the pieces out.
for x in data2['value']:
    time = x['timestamp']
    user = x['user']['id']
    url = furl(x['request']['url'])
    limit = url.args['limit']
    location = url.args['location']
What's happening is that when I try this, I get that 'limit' does not exist for every URL. I think I have to do an if/else statement but am not sure how to formulate it. I need to get everything into a dataframe so I can parse it out into a cursor.execute statement, which I know how to do.
What's needed:
1. Get the information in the for loop into a dataframe.
2. Take the URL; if the URL does not have a limit or a location, make it None, otherwise put the limit in its own column and do the same for location.
Dataframe would look like this
timestamp user limit location
2018-01-01 bob#home.com null
2018-01-01 bill#home.com null
2018-01-01 same#home.com null null
2018-01-02 bob#home.com
here is the information on furl
here is some sample json to test with:
{
    "value": [{
        "request": {
            "url": "https://website/testing"
        },
        "timestamp": "2018-09-23T18:32:58.153z",
        "user": {
            "id": ""
        }
    },
    {
        "request": {
            "url": "https://website/testing/limit?location=31737863-c431-e6611-9420-90b11c44c42f"
        },
        "timestamp": "2018-09-23T18:32:58.153z",
        "user": {
            "id": "steve#home.com"
        }
    },
    {
        "request": {
            "url": "https://website/testing/dealanalyzer?limit=57bd5872-3f45-42cf-bc32-72ec21c3b989&location=31737863-c431-e611-9420-90b11c44c42f"
        },
        "timestamp": "2018-09-23T18:32:58.153z",
        "user": {
            "id": "tom#home.com"
        }
    }
    ]
}
import requests
import pandas as pd
from urllib.parse import urlparse
import json
from pandas.io.json import json_normalize

d1 = "https://nowebsite/v1/apps/11111111-2222-2222-2222-33333333333333/events/requests"
querystring = {"timespan": "P7D"}
headers = {
    'x-api-key': "xxxxxxxxxxxxxxxxxxxxxxxx",
    'Cache-Control': "no-cache",
    'Postman-Token': "xxxxxxxxxxxxxxxxxxxx"
}
response = requests.request("GET", d1, headers=headers, params=querystring)
data = json.loads(response.text)

# delete crap out of API GET Request
for stuff in data['value']:
    del stuff['count']
    del stuff['customDimensions']
    del stuff['operation']
    del stuff['session']
    del stuff['cloud']
    del stuff['ai']
    del stuff['application']
    del stuff['client']
    del stuff['id']
    del stuff['type']
    del stuff['customMeasurements']
    del stuff['user']['authenticatedId']
    del stuff['user']['accountId']
    del stuff['request']['name']
    del stuff['request']['success']
    del stuff['request']['duration']
    del stuff['request']['performanceBucket']
    del stuff['request']['resultCode']
    del stuff['request']['source']
    del stuff['request']['id']

newstuff = json.dumps(data, indent=2, sort_keys=True)
#print(newstuff)
# Now it's in a cleaner format to work with
data2 = json.loads(newstuff)
json_normalize(data2['value'])
From here the data is in a pandas dataframe and looks like I want it to.
I just need to know how to use furl to pull the limit and location out of the URL on a per-row basis and create the new limit and location columns as mentioned above.
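One way to sketch this, using urllib.parse (already imported above) instead of furl so that a missing limit or location simply becomes None; the key names follow the sample JSON earlier in the question:
from urllib.parse import urlparse, parse_qs
import pandas as pd

rows = []
for x in data2['value']:
    qs = parse_qs(urlparse(x['request']['url']).query)
    rows.append({
        'timestamp': x['timestamp'],
        'user': x['user']['id'],
        'limit': qs.get('limit', [None])[0],        # None when the URL has no limit
        'location': qs.get('location', [None])[0],  # None when the URL has no location
    })

df = pd.DataFrame(rows, columns=['timestamp', 'user', 'limit', 'location'])
print(df)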

Glassdoor API Not Printing Custom Response

I have the following problem when I try to print something from this API. I'm trying to set it up so I can access different headers and then print specific items from them, but instead, when I try to print soup, it gives me the entire API response in JSON format.
import requests, json, urlparse, urllib2
from BeautifulSoup import BeautifulSoup
url = "apiofsomesort"
#Create Dict based on JSON response; request the URL and parse the JSON
#response = requests.get(url)
#response.raise_for_status() # raise exception if invalid response
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(url,headers=hdr)
response = urllib2.urlopen(req)
soup = BeautifulSoup(response)
print soup
When it prints it looks like the below:
{
    "success": true,
    "status": "OK",
    "jsessionid": "0541E6136E5A2D5B2A1DF1F0BFF66D03",
    "response": {
        "attributionURL": "http://www.glassdoor.com/Reviews/airbnb-reviews-SRCH_KE0,6.htm",
        "currentPageNumber": 1,
        "totalNumberOfPages": 1,
        "totalRecordCount": 1,
        "employers": [{
            "id": 391850,
            "name": "Airbnb",
            "website": "www.airbnb.com",
            "isEEP": true,
            "exactMatch": true,
            "industry": "Hotels, Motels, & Resorts",
            "numberOfRatings": 416,
            "squareLogo": "https://media.glassdoor.com/sqll/391850/airbnb-squarelogo-1459271200583.png",
            "overallRating": 4.3,
            "ratingDescription": "Very Satisfied",
            "cultureAndValuesRating": "4.4",
            "seniorLeadershipRating": "4.0",
            "compensationAndBenefitsRating": "4.3",
            "careerOpportunitiesRating": "4.1",
            "workLifeBalanceRating": "3.9",
            "recommendToFriendRating": "0.9",
            "sectorId": 10025,
            "sectorName": "Travel & Tourism",
            "industryId": 200140,
            "industryName": "Hotels, Motels, & Resorts",
            "featuredReview": {
                "attributionURL": "http://www.glassdoor.com/Reviews/Employee-Review-Airbnb-RVW12111314.htm",
                "id": 12111314,
                "currentJob": false,
                "reviewDateTime": "2016-09-28 16:44:00.083",
                "jobTitle": "Employee",
                "location": "",
                "headline": "An amazing place to work!",
                "pros": "Wonderful people and great culture. Airbnb really strives to make you feel at home as an employee, and everyone is genuinely excited about the company mission.",
                "cons": "The limitations of Rails 3 and the company infrastructure make developing difficult sometimes.",
                "overall": 5,
                "overallNumeric": 5
            },
            "ceo": {
                "name": "Brian Chesky",
                "title": "CEO & Co-Founder",
                "numberOfRatings": 306,
                "pctApprove": 95,
                "pctDisapprove": 5,
                "image": {
                    "src": "https://media.glassdoor.com/people/sqll/391850/airbnb-brian-chesky.png",
                    "height": 200,
                    "width": 200
                }
            }
        }]
    }
}
I want to print out specific items like the employers' "name", "industry", etc.
You can load the JSON response into a dict then look for the values you want like you would in any other dict.
I took your data and saved it in an external JSON file to do a test since I don't have access to the API. This worked for me.
import json

# Load JSON from external file
with open(r'C:\Temp\json\data.json') as json_file:
    data = json.load(json_file)

# Print the values
print 'Name:', data['response']['employers'][0]['name']
print 'Industry:', data['response']['employers'][0]['industry']
Since you're getting your data from an API something like this should work.
import json
import urllib2

url = "apiofsomesort"
# Load JSON from API
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(url, headers=hdr)
response = urllib2.urlopen(req)
data = json.loads(response.read())

# Print the values
print 'Name:', data['response']['employers'][0]['name']
print 'Industry:', data['response']['employers'][0]['industry']
import json, urllib2

url = "http..."
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(url, headers=hdr)
response = urllib2.urlopen(req)
data = json.loads(response.read())

# Print the values
print 'numberOfRatings:', data['response']['employers'][0]['numberOfRatings']
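The snippets above are Python 2; a sketch of the same lookups using the requests library (already imported in the question) and Python 3 print syntax, assuming the response has the shape shown earlier:
import requests

url = "apiofsomesort"  # placeholder, as in the question
headers = {'User-Agent': 'Mozilla/5.0'}
data = requests.get(url, headers=headers).json()

employer = data['response']['employers'][0]
print('Name:', employer['name'])
print('Industry:', employer['industry'])
print('numberOfRatings:', employer['numberOfRatings'])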
