Iterate through nested JSON object and get values throughout - python

Working on an API project in which I'm trying to get all the redirect URLs from an API output such as https://urlscan.io/api/v1/result/39a4fc22-39df-4fd5-ba13-21a91ca9a07d/
Example of where I'm trying to pull the urls from:
"redirectResponse": {
"url": "https://www.coke.com/"
I currently have the following code:
import requests
import json
import time

# URL to be scanned
url = 'https://www.coke.com'

# URL Scan headers
headers = {'API-Key': apikey, 'Content-Type': 'application/json'}
data = {"url": url, "visibility": "public"}
response = requests.post('https://urlscan.io/api/v1/scan/', headers=headers, data=json.dumps(data))
uuid = response.json()['uuid']
responseUrl = response.json()['api']
time.sleep(10)

req = requests.Session()
r = req.get(responseUrl).json()
r.keys()

for value in r['data']['requests']['redirectResponse']['url']:
    print(f"{value}")
I get the following error: TypeError: list indices must be integers or slices, not str. I'm not sure of the best way to parse the nested JSON in order to get all the redirect URLs.

A redirectResponse isn't always present in the requests, so the code has to be written to handle that and keep going. In Python that's usually done with a try/except:
for obj in r['data']['requests']:
    try:
        redirectResponse = obj['request']['redirectResponse']
    except KeyError:
        continue  # Ignore and skip to the next one.
    url = redirectResponse['url']
    print(f'{url=!r}')
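If the nesting is deeper or varies, a generic recursive walk avoids hard-coding the path entirely. This is a sketch that collects every `url` found under a `redirectResponse` key anywhere in the parsed JSON; the inline `sample` only mimics the urlscan.io result shape and is not real API output:

```python
def find_redirect_urls(node):
    """Recursively walk a parsed JSON structure and yield every
    'url' found inside a 'redirectResponse' object."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "redirectResponse" and isinstance(value, dict) and "url" in value:
                yield value["url"]
            yield from find_redirect_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_redirect_urls(item)

# A small inline sample mimicking the urlscan.io result shape:
sample = {
    "data": {
        "requests": [
            {"request": {"redirectResponse": {"url": "https://www.coke.com/"}}},
            {"request": {}},  # no redirect on this request
        ]
    }
}
print(list(find_redirect_urls(sample)))  # ['https://www.coke.com/']
```

Because the walk checks types rather than fixed paths, it keeps working when some entries lack a redirectResponse.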


Making an API request (LinkedIn) for every element in a list from a response, with Python

I have a list of LinkedIn posts IDs. I need to request share statistics for each of those posts with another request.
The request function looks like this:
def ugcp_stats(headers):
    response = requests.get(f'https://api.linkedin.com/v2/organizationalEntityShareStatistics?q=organizationalEntity&organizationalEntity=urn%3Ali%3Aorganization%3A77487&ugcPosts=List(urn%3Ali%3AugcPost%3A{shid},urn%3Ali%3AugcPost%3A{shid2},...,urn%3Ali%3AugcPost%3A{shidx})', headers = headers)
    ugcp_stats = response.json()
    return ugcp_stats
urn%3Ali%3AugcPost%3A{shid},urn%3Ali%3AugcPost%3A{shid2},...,urn%3Ali%3AugcPost%3A{shidx} - these are the share URNs. Their number depends on the number of elements in my list.
What should I do next? Should I count the number of elements in my list and somehow amend the request URL to include all of them? Or maybe I should loop through the list and make a separate request for each of the elements and then append all the responses in one json file?
I'm struggling and I'm not quite sure how to write this. I don't even know how to parse the element into the request. Although I suspect it could look something like this:
for shid in shids:
    def ugcp_stats(headers):
        response = requests.get(f'https://api.linkedin.com/v2/organizationalEntityShareStatistics?q=organizationalEntity&organizationalEntity=urn%3Ali%3Aorganization%3A77487&ugcPosts=List(urn%3Ali%3AugcPost%3A & {shid})', headers = headers)
        ugcp_stats = response.json()
        return ugcp_stats
UPDATE - following your answers
The code looks like this now:
link = "https://api.linkedin.com/v2/organizationalEntityShareStatistics?q=organizationalEntity&organizationalEntity=urn%3Ali%3Aorganization%3A77487&ugcPosts=List"
def share_stats(headers, shids):
# Local variable
sample = ""
# Sample the shids in the right pattern
for shid in shids: sample += "urn%3Ali%3AugcPost%3A & {},".format(shid)
# Get the execution of the string content
response = eval(f"requests.get('{link}({sample[:-1]})', headers = {headers})")
# Return the stats
return response.json()
if __name__ == '__main__':
credentials = 'credentials.json'
access_token = auth(credentials) # Authenticate the API
headers = headers(access_token) # Make the headers to attach to the API call.
share_stats = share_stats(headers) # Get shares
print(share_stats)
But nothing seems to be happening. It finishes the script, but I don't get anything. What's wrong?
This is just a proof of concept for what I told you earlier in a comment. You will need to adapt it to your needs (even though I tried to do that for you) :)
Updated - based on your feedback.
#// IMPORTS
#// I'm assuming you are using the "requests" library.
#// The PyCharm IDE reports this import as unused, but "eval()" is using it.
import requests

#// GLOBAL VARIABLES
link: str = "https://api.linkedin.com/v2/organizationalEntityShareStatistics?q=organizationalEntity&organizationalEntity=urn%3Ali%3Aorganization%3A77487&ugcPosts=List"

#// Your function logic, updated
def share_stats(sheds: list, header: dict) -> any:
    # Local variable
    sample = ""
    # Build the sheds into the right pattern
    for shed in sheds:
        sample += "urn%3Ali%3AugcPost%3A & {},".format(shed)
    # Execute the string content
    response = eval(f"requests.get('{link}({sample[:-1]})', headers = {header})")
    # Return the stats as JSON
    return response.json()

#// Run only if this is the main file
if __name__ == '__main__':
    #// An empty sheds list for code validation
    debug_sheds: list = []
    credentials: str = "credentials.json"
    #// For me, "auth" is an unresolved reference; for you it should be fine.
    #// I'm assuming it is your function for reading the file content and converting it to Python.
    access_token = auth(credentials)  # Authenticate the API
    #// Your error came from this line.
    #// Error message: 'TypedDict' object is not callable
    #// Your code: headers = headers(access_token)
    #// When you want to get a dictionary value by key, use square brackets.
    headers = headers[access_token]  # Make the headers to attach to the API call.
    #// Here you should get an error/warning, because you did not provide the sheds the first time.
    #// Your code: share_stats = share_stats(headers)
    share_stats = share_stats(debug_sheds, headers)  # Get shares
    print(share_stats)
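As a side note, the same request can be made without `eval()` at all: an f-string can assemble the URL and `requests.get` can be called on it directly. This is a sketch assuming the `List(...)` query syntax shown in the question; the endpoint and URN prefix are copied from it, not verified against the LinkedIn API:

```python
import requests

BASE = ("https://api.linkedin.com/v2/organizationalEntityShareStatistics"
        "?q=organizationalEntity"
        "&organizationalEntity=urn%3Ali%3Aorganization%3A77487"
        "&ugcPosts=List")

def build_stats_url(shids):
    """Join the post IDs into the List(...) query syntax from the question."""
    urns = ",".join(f"urn%3Ali%3AugcPost%3A{shid}" for shid in shids)
    return f"{BASE}({urns})"

def share_stats(shids, headers):
    # Plain requests.get on the assembled string; no eval() needed.
    response = requests.get(build_stats_url(shids), headers=headers)
    response.raise_for_status()
    return response.json()

print(build_stats_url(["111", "222"]))
```

Besides being simpler, this avoids the injection risk of passing interpolated strings to `eval()`.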

Using Python Requests library to request Cookie as JSON in order to format as DataFrame

I'm trying to use Python Requests to retrieve a cookie from a website using a POST request and JSON request parameters.
I'm using KNIME, which allows you to output a response as a DataFrame. When I request the cookie and attempt to output it as a DataFrame with the following code:
from pandas import DataFrame
import requests
session = requests.Session()
url = "https://api.danmurphys.com.au/apis/ui/Address/SetFavouriteStore"
payload = {"StoreNo":"1276"}
x = session.post(url, json=payload)
output_table = DataFrame(x.cookies)
I get the following error:
Execute failed: No serializer extension having the id or processing python type "http.cookiejar.Cookie" could be found.
Unsupported column type in column: "0", column type: "<class 'http.cookiejar.Cookie'>".
I know the DataFrame output function would work if the response was JSON, so I've tried the following code:
x = session.post(url, json=payload)
res = session.cookies.get_dict()
output_table = DataFrame(res)
But this gives the following error:
ValueError: If using all scalar values, you must pass an index
If anyone knows how to format these cookies for DataFrame, or if I should be using a different library than Requests, please inform me. Thanks.
EDIT: using the following DataFrame constructor:
output_table = DataFrame(data=session.cookies.get_dict(), index=session.cookies.get_dict(), columns=None, dtype=None, copy=False)
is successful in outputting the cookies formatted as a DataFrame; however, the values are replicated across rows and columns, as depicted in the screenshot.
Since the ValueError points at passing all scalar values with an index, you may just do this:
from pandas import DataFrame
import requests
session = requests.Session()
url = "https://api.danmurphys.com.au/apis/ui/Address/SetFavouriteStore"
payload = {"StoreNo":"1276"}
x = session.post(url, json=payload)
res = session.cookies.get_dict()
output_table = DataFrame({'data': res})  # pass res as a dictionary value
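Alternatively, if you want one row per cookie with its attributes preserved, you can flatten the `CookieJar` into a list of dicts before constructing the DataFrame. This sketch demonstrates the idea on a locally built jar instead of the live POST (the attribute names are those of `http.cookiejar.Cookie`, which requests' cookies expose):

```python
from pandas import DataFrame
import requests

def cookies_to_frame(jar):
    """Flatten a CookieJar into one DataFrame row per cookie."""
    rows = [
        {"name": c.name, "value": c.value, "domain": c.domain, "path": c.path}
        for c in jar
    ]
    return DataFrame(rows, columns=["name", "value", "domain", "path"])

# Demonstrated on a locally built jar rather than a live request:
jar = requests.cookies.RequestsCookieJar()
jar.set("session", "abc123", domain="example.com", path="/")
df = cookies_to_frame(jar)
print(df)
```

Because each cookie becomes its own row, nothing is replicated across rows and columns the way it is when the same scalar dict is passed as both data and index.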

Issues while inserting data in cloudant DB

I am working on a project in which I am supposed to get some user input through a web application and send that data to a Cloudant DB. I am using Python for this use case. Below is the sample code:
import requests
import json
dict_key = {}
key = frozenset(dict_key.items())
doc = {
    {
        "_ID": "1",
        "COORD1": "1,1",
        "COORD2": "1,2",
        "COORD3": "2,1",
        "COORD4": "2,2",
        "AREA": "1",
        "ONAME": "abc",
        "STYPE": "black",
        "CROPNAME": "paddy",
        "CROPPHASE": "initial",
        "CROPSTARTDATE": "01-01-2017",
        "CROPTYPE": "temp",
        "CROPTITLE": "rice",
        "HREADYDATE": "06-03-2017",
        "CROPPRICE": "1000",
        "WATERRQ": "1000",
        "WATERSRC": "borewell"
    }
}
auth = ('uid', 'pwd')
headers = {'Content-type': 'application/json'}
post_url = "server_IP".format(auth[0])
req = requests.put(post_url, auth=auth, headers=headers, data=json.dumps(doc))
#req = requests.get(post_url, auth=auth)
print json.dumps(req.json(), indent=1)
When I am running the code, I am getting the below error:
"WATERSRC":"borewell"
TypeError: unhashable type: 'dict'
I searched a bit, and found below stackflow link as a prospective resolution
TypeError: unhashable type: 'dict'
It says that "To use a dict as a key you need to turn it into something that may be hashed first. If the dict you wish to use as key consists of only immutable values, you can create a hashable representation of it like this:
key = frozenset(dict_key.items())"
I have below queries:
1) I have tried using it in my code above, but I am not sure if I have used it correctly.
2) To put the data in the Cloudant DB, I am using the Python module "requests". In the code, I am using the below line to put the data in the DB:
req = requests.put(post_url, auth=auth,headers=headers, data=json.dumps(doc))
But I am getting below error:
"reason": "Only GET,HEAD,POST allowed"
I searched on that as well, and I found the IBM Bluemix documentation about it:
https://console.ng.bluemix.net/docs/services/Cloudant/basics/index.html#cloudant-basics
Having referred to the document, I believe I am using the right option, but maybe I am wrong.
If you are adding a document to the database and you know the _id, then you need to do an HTTP POST. Here's some slightly modified code:
import requests
import json
doc = {
    "_id": "2",
    "COORD1": "1,1",
    "COORD2": "1,2",
    "COORD3": "2,1",
    "COORD4": "2,2",
    "AREA": "1",
    "ONAME": "abc",
    "STYPE": "black",
    "CROPNAME": "paddy",
    "CROPPHASE": "initial",
    "CROPSTARTDATE": "01-01-2017",
    "CROPTYPE": "temp",
    "CROPTITLE": "rice",
    "HREADYDATE": "06-03-2017",
    "CROPPRICE": "1000",
    "WATERRQ": "1000",
    "WATERSRC": "borewell"
}
auth = ('admin', 'admin')
headers = {'Content-type': 'application/json'}
post_url = 'http://localhost:5984/mydb'
req = requests.post(post_url, auth=auth,headers=headers, data=json.dumps(doc))
print json.dumps(req.json(), indent=1)
Notice that:
- the _id field is supplied in the doc and is lower case
- the request call is a POST, not a PUT
- the post_url contains the name of the database being written to - in this case mydb
N.B. in the above example I am writing to a local CouchDB, but replacing the URL with your Cloudant URL and adding the correct credentials should get this working for you.

requests.post with Python

I'm connecting to a login protected API with a Python script here below.
import requests
url = 'https://api.json'
header = {'Content-Type': 'application/x-www-form-urlencoded'}
login = ('kjji#snm.com', 'xxxxx')
mnem = 'inputRequests':'{'inputRequests':'[{'function':'GDSP','identifier':'ibm','mnemonic':'IQ_TOTAL_REV'}]}}
r = requests.post(url, auth=login, data=mnem, headers=header)
print(r.json())
The connection is established, but I am getting an error from the API because of the format of the data request. The original format is shown below; I cannot find a way to enter this in the mnem above:
inputRequests={inputRequests:
[
{function:"xxx",identifier:"xxx",mnemonic:"xxx"},
]
}
The error given is
C:\Users\xxx\Desktop>pie.py
File "C:\Users\xxx\Desktop\pie.py", line 6
mnem={'inputRequests':'{'inputRequests':'[{'function':'xxx','identifier':'xx','mnemonic':'xxx'}]}}
^
SyntaxError: invalid syntax
I am unsure how to proceed from here. I cannot find anything in the requests documentation that explains how to insert several variables in the data field.
The requests module in Python accepts a native Python dict as the JSON data in a POST request, not a string. Therefore, you may try to define mnem like this:
mnem = {
    'inputRequests': [
        {'function': 'GDSP',
         'identifier': 'ibm',
         'mnemonic': 'IQ_TOTAL_REV'
        }
    ]
}
The data parameter should be a dictionary. Therefore, to pass the three parameters, try using:
mnem = {'function':'GDSP','identifier':'ibm','mnemonic':'IQ_TOTAL_REV'}
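Either way, requests can also serialize the dict itself if you pass it via the `json=` keyword instead of `data=`; it then sets the Content-Type header for you. A sketch using the payload from the answer above; the URL here is a placeholder, not the real endpoint, and `.prepare()` is used only to inspect the outgoing request without sending it:

```python
import requests

# The payload from the answer above; nesting matches the API's documented shape.
mnem = {
    "inputRequests": [
        {"function": "GDSP", "identifier": "ibm", "mnemonic": "IQ_TOTAL_REV"}
    ]
}

# json= serializes the dict to a JSON body and sets Content-Type automatically;
# .prepare() builds the request locally so we can inspect it without sending.
prepared = requests.Request(
    "POST", "https://api.example.invalid/api.json", json=mnem
).prepare()
print(prepared.headers["Content-Type"])  # application/json
```

With `json=`, there is no need to hand-build the quoted string that caused the SyntaxError in the question.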

How to consume JSON response in Python? [duplicate]

This question already has answers here:
HTTP requests and JSON parsing in Python [duplicate]
(8 answers)
Closed 7 years ago.
I am creating a Django web app.
There is a function which creates a JSON response like this:
def rest_get(request, token):
    details = Links.get_url(Links, token)
    result = {}
    if len(details) > 0:
        result['status'] = 200
        result['status_message'] = "OK"
        result['url'] = details[0].url
    else:
        result['status'] = 404
        result['status_message'] = "Not Found"
        result['url'] = None
    return JsonResponse(result)
And I get the response in the web browser like this:
{"status": 200, "url": "http://www.bing.com", "status_message": "OK"}
Now from another function I want to consume that response and extract the data out of it. How do I do it?
You can use the json library in Python to do the job. For example:
json_string = '{"first_name": "tom", "last_name":"harry"}'
import json
parsed_json = json.loads(json_string)
print(parsed_json['first_name'])
"tom"
Since you have created a web app, I am assuming you have exposed a URL from which you can get your JSON response, for example http://jsonplaceholder.typicode.com/posts/1.
import urllib2
import json
data = urllib2.urlopen("http://jsonplaceholder.typicode.com/posts/1").read()
parsed_json = json.loads(data)
The urlopen function sends an HTTP GET request to the given URL. parsed_json is a dictionary, and you can extract the required data from it.
print parsed_json['userId']
1
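Note that urllib2 is Python 2 only; on Python 3 the same fetch is usually done with requests. A sketch under that assumption; the parsing step is shown on a local string so it runs without a network call:

```python
import json
import requests

def fetch_json(url):
    """GET a URL and parse the JSON body (requests replaces urllib2 on Python 3)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # surface 4xx/5xx instead of parsing an error page
    return response.json()

# The parsing works identically on a local string, no network needed:
parsed_json = json.loads('{"userId": 1, "id": 1, "title": "example"}')
print(parsed_json["userId"])  # 1
```

`response.json()` wraps the same `json.loads` call, so the extracted dictionary is used exactly as in the urllib2 version.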
The answer I want to suggest is a little different. In your scenario, where one function needs to be accessed from both the server and the client end, I would suggest providing an extra parameter and changing the output based on it. This reduces overhead and unnecessary conversions.
For example, if you pass in an extra parameter and change the result like this, you don't need JSON parsing in Python at all. Of course there are solutions for parsing, but why convert to JSON and then parse it back when you can avoid that entirely?
def rest_get(request, token, return_json=True):
    details = Links.get_url(Links, token)
    result = {}
    if len(details) > 0:
        result['status'] = 200
        result['status_message'] = "OK"
        result['url'] = details[0].url
    else:
        result['status'] = 404
        result['status_message'] = "Not Found"
        result['url'] = None
    if return_json:  # this is a web response, so by default return_json = True
        return JsonResponse(result)
    return result
Then in your Python code, call it like this:
rest_get(request, token, return_json=False)  # we pass False, so the return type is a dictionary and we can use it right away.
