How to extract a couple of fields nested in response using python

How to extract a couple of fields nested in response using python - python

I'm a python beginner. I would like to ask for help regarding the retrieve the response data. Here's my script:
import pandas as pd
import re
import time
import requests as re
import json
response = re.get(url, headers=headers, auth=auth)
data = response.json()
Here's a part of json response:
{'result': [{'display': '',
'closure_code': '',
'service_offer': 'Integration Platforms',
'updated_on': '2022-04-23 09:05:53',
'urgency': '2',
'business_service': 'Operations',
'updated_by': 'serviceaccount45',
'description': 'ALERT returned 400 but expected 200',
'sys_created_on': '2022-04-23 09:05:53',
'sys_created_by': 'serviceaccount45',
'subcategory': 'Integration',
'contact_type': 'Email',
'problem_type': 'Design: Availability',
'caller_id': '',
'action': 'create',
'company': 'aaaa',
'priority': '3',
'status': '1',
'opened': 'smith.j',
'assigned_to': 'doe.j',
'number': '123456',
'group': 'blabla',
'impact': '2',
'category': 'Business Application & Databases',
'caused_by_change': '',
'location': 'All Locations',
'configuration_item': 'Monitor',
},
I would like to extract the data only for one group = 'blablabla'. Then I would like to extract fields such as:
number = data['number']
group = data['group']
service_offer = data['service_offer']
updated = data['updated_on']
urgency = data['urgency']
username = data['created_by']
short_desc = data['description']
How it should be done?
I know that to check the first value I should use:
service_offer = data['result'][0]['service_offer']
I've tried to create a dictionary, but, I'm getting an error:
data_result = response.json()['result']
payload ={
number = data_result['number']
group = data_result['group']
service_offer = data_result['service_offer']
updated = data_result['updated_on']
urgency = data_result['urgency']
username = data_result['created_by']
short_desc = data_result['description']
}
TypeError: list indices must be integers or slices, not str:
So, I've started to create something like below., but I'm stuck:
get_data = []
if len(data) > 0:
for item in range(len(data)):
get_data.append(data[item])
May I ask for help?

If data is your decoded json response from the question then you can do:
# find group `blabla` in result:
g = next(d for d in data["result"] if d["group"] == "blabla")
# get data from the `blabla` group:
number = g["number"]
group = g["group"]
service_offer = g["service_offer"]
updated = g["updated_on"]
urgency = g["urgency"]
username = g["sys_created_by"]
short_desc = g["description"]
print(number, group, service_offer, updated, urgency, username, short_desc)
Prints:
123456 blabla Integration Platforms 2022-04-23 09:05:53 2 serviceaccount45 ALERT returned 400 but expected 200

Related

Python Reddit API JSON issues (no PRAW)

I am trying to obtain replies to the comments on Threads. Here is what I have been able to accomplish by parsing JSON:
subreddit = 'wallstreetbets'
link = 'https://oauth.reddit.com/r/'+subreddit+'/hot'
hot = requests.get(link,headers = headers)
hot.json()
Here is output
{'kind': 'Listing',
'data': {'after': 't3_x8kidp',
'dist': 27,
'modhash': None,
'geo_filter': None,
'children': [{'kind': 't3',
'data': {'approved_at_utc': None,
'subreddit': 'wallstreetbets',
'selftext': '**Read [rules](https://www.reddit.com/r/wallstreetbets/wiki/contentguide), follow [Twitter](https://twitter.com/Official_WSB) and [IG](https://www.instagram.com/official_wallstreetbets/), join [Discord](https://discord.gg/wallstreetbets), see [ban bets](https://www.reddit.com/r/wallstreetbets/wiki/banbets)!**\n\n[dm mods because why not](https://www.reddit.com/message/compose/?to=/r/wallstreetbets)\n\n[Earnings Thread](https://wallstreetbets.reddit.com/x4ryjg)',
'author_fullname': 't2_bd6q5',
'saved': False,
'mod_reason_title': None,
'gilded': 0,
'clicked': False,
'title': 'What Are Your Moves Tomorrow, September 08, 2022',
'link_flair_richtext': [{'e': 'text', 't': 'Daily Discussion'}],
'subreddit_name_prefixed': 'r/wallstreetbets',
'hidden': False,
'pwls': 7,
'link_flair_css_class': 'daily',
'downs': 0,
'thumbnail_height': None,
'top_awarded_type': None,
'hide_score': False,
'name': 't3_x8ev67',
...
'created_utc': 1662594703.0,
'num_crossposts': 0,
'media': None,
'is_video': False}}],
'before': None}}
I then turned it into a data frame
df = pd.DataFrame()
for post in hot.json()['data']['children']:
df = df.append({
'subreddit' : post['data']['subreddit'],
'title': post['data']['title'],
'selftext': post['data']['selftext'],
'created_utc': post['data']['created_utc'],
'id': post['data']['id']
}, ignore_index = True)
With this, I was able to obtain a data frame like thisDataFrame
Then, to obtain the comments, I created a list with all the JSON script from the 26 posts, and then created a while loop to iterate through the json script.
supereme = len(list_of_comments)
indexy = pd.DataFrame()
while supereme > 0:
supereme -= 1
for g in range(0,len(list_of_comments[supereme]['data']['children'])-1):
indexy = pd.concat([indexy, pd.DataFrame.from_records([{
'body': list_of_comments[supereme]['data']['children'][g]['data']['body'],
'post_id': list_of_comments[supereme]['data']['children'][g]['data']['parent_id'] }])], ignore_index = True)
indexy
This gave me this: DataFrame
However, I am not able to obtain the replies to the comments. Any help? I tried to do this
posts = 26
for i in np.arange(0,27):
print('i',i)
if len(list_of_comments[i]['data']['children']) == 0:
continue
for j in np.arange(0,len(list_of_comments[i]['data']['children'])):
if len(list_of_comments[i]['data']['children'][j]['data']['replies']) == 0:
break
else:
print('j',len(list_of_comments[i]['data']['children'][j]['data']['replies']))
for z in np.arange(len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'])):
if len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children']) == 0:
break
print('z',z)
print(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['body'])
The first loop kinda works but it doesn't count up properly to get all the replies to all the posts itll only pull like one or two. We don't want to use PRAW

x=len(list_of_comments)
replies = pd.DataFrame()
for i in range(0,len(list_of_comments)):
try:
for j in range(0, len(list_of_comments[x]['data']['children'])):
try:
for z in range(0, len(list_of_comments[x]['data']['children'][j]['data']['replies']['data']['children'])):
#print(list_of_comments[x]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['body'])
#print(list_of_comments[x]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['link_id'])
replies = pd.concat([replies, pd.DataFrame.from_records([{
'body': list_of_comments[x]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['body'],
'post_id': list_of_comments[x]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['link_id']
}])], ignore_index = True)
except:
pass
except:
continue

Iterate through a nested dict inside of a list, with some missing keys

I need your guys' help on how to extract information from a nested dictionary inside a list. Here's the code to get the data:
import requests
import json
import time
all_urls = []
for x in range(5000,5010):
url = f'https://api.jikan.moe/v4/anime/{x}/full'
all_urls.append(url)
all_responses = []
for page_url in all_urls:
response = requests.get(page_url)
all_responses.append(response)
time.sleep(1)
print(all_responses)
data = []
for i in all_responses:
json_data = json.loads(i.text)
data.append(json_data)
print(data)
The sample of the extracted data is as follows:
[{'status': 404,
'type': 'BadResponseException',
'message': 'Resource does not exist',
'error': '404 on https://myanimelist.net/anime/5000/'},
{'status': 404,
'type': 'BadResponseException',
'message': 'Resource does not exist',
'error': '404 on https://myanimelist.net/anime/5001/'},
{'data': {'mal_id': 5002,
'url': 'https://myanimelist.net/anime/5002/Bari_Bari_Densetsu',
'images': {'jpg': {'image_url': 'https://cdn.myanimelist.net/images/anime/4/58873.jpg',
'small_image_url': 'https://cdn.myanimelist.net/images/anime/4/58873t.jpg',
'large_image_url': 'https://cdn.myanimelist.net/images/anime/4/58873l.jpg'},
'webp': {'image_url': 'https://cdn.myanimelist.net/images/anime/4/58873.webp',
'small_image_url': 'https://cdn.myanimelist.net/images/anime/4/58873t.webp',
'large_image_url': 'https://cdn.myanimelist.net/images/anime/4/58873l.webp'}},
'trailer': {'youtube_id': None,
'url': None,
'embed_url': None,
'images': {'image_url': None,
'small_image_url': None,
'medium_image_url': None,
'large_image_url': None,
'maximum_image_url': None}},
'title': 'Bari Bari Densetsu',
'title_english': None,
'title_japanese': 'バリバリ伝説',
'title_synonyms': ['Baribari Densetsu',
......
I need to extract the title from the list of data. Any help is appreciated! Also, any recommendation on a better/simpler/cleaner code to extract the json data from an API is also greatly appreciated!

Firstly, no need to create multiple lists. You can do everything in one loop:
import requests
import json
data = []
for x in range(5000,5010):
page_url = f'https://api.jikan.moe/v4/anime/{x}/full'
response = requests.get(page_url)
json_data = json.loads(response.text)
data.append(json_data)
print(data)
Second, to address your problem, you have two options. You can use dict.get:
for dic in data:
title = dic.get('title', 'no title')
Or use the try/except pattern:
for dic in data:
try:
title = dic['title']
except KeyError:
# deal with case where dict has no title
pass

huobi working python 3.6 example (including signature)

Working example to generate a valid url (including signature) for the Huobi API.
In the Huobi API documenation there is no explicit example that allows you to verify your signature creation method step by step.
My intention is to create that here, but I need help, because I haven't managed yet.
Try to order using huobi api
First, I succeed to get account information
from datetime import datetime
import requests
import json
import hmac
import hashlib
import base64
from urllib.parse import urlencode
#Get all Accounts of the Current User
#apiAccessKey = hb_access, apiSecretKey = hb_secret
timestamp = str(datetime.utcnow().isoformat())[0:19]
params = urlencode({'AccessKeyId': hb_access,
'SignatureMethod': 'HmacSHA256',
'SignatureVersion': '2',
'Timestamp': timestamp
})
method = 'GET'
endpoint = '/v1/account/accounts'
base_uri = 'api.huobi.pro'
pre_signed_text = method + '\n' + base_uri + '\n' + endpoint + '\n' + params
hash_code = hmac.new(hb_secret.encode(), pre_signed_text.encode(), hashlib.sha256).digest()
signature = urlencode({'Signature': base64.b64encode(hash_code).decode()})
url = 'https://' + base_uri + endpoint + '?' + params + '&' + signature
# url = 'https://' + base_uri + endpoint2 + '?' + params + '&' + signature
response = requests.request(method, url)
accts = json.loads(response.text)
print(accts)
it gives :
{'status': 'ok', 'data': [{'id': 1153465, 'type': 'spot', 'subtype': '', 'state': 'working'}, {'id': 1170797, 'type': 'otc', 'subtype': '', 'state': 'working'}]}
but, I have problem with market order.
want to look matchresults :
here is the discription from the website
https://alphaex-api.github.io/openapi/spot/v1/en/#search-match-results
GET /v1/order/matchresults
Parameter DataType Required Default Description Value Range
symbol string true btcusdt, bccbtc.Refer to
I know I should add symbol parameter, but I don't know how I do that,,,
and my code :
timestamp = str(datetime.utcnow().isoformat())[0:19]
params = urlencode({'AccessKeyId': hb_access,
'SignatureMethod': 'HmacSHA256',
'SignatureVersion': '2',
'Timestamp': timestamp,
})
method = 'GET'
base_uri = 'api.huobi.pro'
enpoint ='/v1/order/matchresults'
pre_signed_text = base_uri + '\n' + endpoint + '\n' + params
hash_code = hmac.new(hb_secret.encode(), pre_signed_text.encode(), hashlib.sha256).digest()
signature = urlencode({'Signature': base64.b64encode(hash_code).decode()})
url = 'https://'+base_uri + endpoint +'?'+params+'&'+signature
resp = requests.request(method, url)
resp.json()
it gives :
{'status': 'error',
'err-code': 'api-signature-not-valid',
'err-msg': 'Signature not valid: Verification failure [校验失败]',
'data': None}
I need help...
and another question is :
is way to make Sinature different with using Get method from using POST method?
like placing market order(POST /v1/order/orders/place), when I request using POST do I need to give different Signature?

To get exact symbol you have to provide required parameters in param variable
params = urlencode({'AccessKeyId': hb_access,
'SignatureMethod': 'HmacSHA256',
'SignatureVersion': '2',
'Timestamp': timestamp,
"symbol": "btcusdt" # allowed symbols: GET /v1/common/symbols
})
url variable:
https://api.huobi.pro/v1/order/matchresults?AccessKeyId=39****-b******-4*****-3****&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2022-09-26T19%3A09%3A16&symbol=btcusdt&Signature=1K**************************
Output:
{'status': 'ok', 'data': []}
Here is my way adding parameters and making request:
def huobi(endpoint: str, extra_params: dict={}, AccessKeyId: str=None, SecretKey: str=None) -> dict:
AccessKeyId = 'xxxxxx-xxxxxx-xxxxxx-xxxx' # tmp
SecretKey = 'xxxxxx-xxxxxx-xxxxxx-xxxx' # tmp
timestamp = str(datetime.utcnow().isoformat())[0:19]
standart_params = {
'AccessKeyId': AccessKeyId,
'SignatureMethod': 'HmacSHA256',
'SignatureVersion': '2',
'Timestamp': timestamp
}
params = dict(standart_params, **extra_params)
params_url_enc = urlencode(
sorted(params.items(), key=lambda tup: tup[0])
)
pre_signed = 'GET\n'
pre_signed += 'api.huobi.pro\n'
pre_signed += f'{endpoint}\n'
pre_signed += params_url_enc
sig_bin = hmac.new(
SecretKey.encode(),
pre_signed.encode(),
hashlib.sha256
).digest()
sig_b64_bytes = base64.b64encode(sig_bin)
sig_b64_str = sig_b64_bytes.decode()
signature = urlencode({'Signature': sig_b64_str})
url = f'https://api.huobi.pro{endpoint}?'
url += params_url_enc + '&'
url += signature
return requests.get(url).json()
endpoint = "/v2/account/valuation"
extra_params = {
"accountType": "1",
"valuationCurrency": "BTC"
}
print(huobi(endpoint=endpoint, extra_params=extra_params))

Looping through a function

I am struggling with figuring out the best way to loop through a function. The output of this API is a Graph Connection and I am a-little out of my element. I really need to obtain ID's from an api output and have them in a dict or some sort of form that I can pass to another API call.
**** It is important to note that the original output is a graph connection.... print(type(api_response) does show it as a list however, if I do a print(type(api_response[0])) it returns a
This is the original output from the api call:
[{'_from': None, 'to': {'id': '5c9941fcdd2eeb6a6787916e', 'type': 'user'}}, {'_from': None, 'to': {'id': '5cc9055fcc5781152ca6eeb8', 'type': 'user'}}, {'_from': None, 'to': {'id': '5d1cf102c94c052cf1bfb3cc', 'type': 'user'}}]
This is the code that I have up to this point.....
api_response = api_instance.graph_user_group_members_list(group_id, content_type, accept,limit=limit, skip=skip, x_org_id=x_org_id)
def extract_id(result):
result = str(result).split(' ')
for i, r in enumerate(result):
if 'id' in r:
id = (result[i+1].translate(str.maketrans('', '', string.punctuation)))
print( id )
return id
extract_id(api_response)
def extract_id(result):
result = str(result).split(' ')
for i, r in enumerate(result):
if 'id' in r:
id = (result[i+8].translate(str.maketrans('', '', string.punctuation)))
print( id )
return id
extract_id(api_response)
def extract_id(result):
result = str(result).split(' ')
for i, r in enumerate(result):
if 'id' in r:
id = (result[i+15].translate(str.maketrans('', '', string.punctuation)))
print( id )
return id
extract_id(api_response)
I have been able to use a function to extract the ID's but I am doing so through a string. I am in need of a scalable solution that I can use to pass these ID's along to another API call.
I have tried to use a for loop but because it is 1 string and i+1 defines the id's position, it is redundant and just outputs 1 of the id's multiple times.
I am receiving the correct output using each of these functions however, it is not scalable..... and just is not a solution. Please help guide me......

So to solve the response as a string issue I would suggest using python's builtin json module. Specifically, the method .loads() can convert a string to a dict or list of dicts. From there you can iterate over the list or dict and check if the key is equal to 'id'. Here's an example based on what you said the response would look like.
import json
s = "[{'_from': None, 'to': {'id': '5c9941fcdd2eeb6a6787916e', 'type': 'user'}}, {'_from': None, 'to': {'id': '5cc9055fcc5781152ca6eeb8', 'type': 'user'}}, {'_from': None, 'to': {'id': '5d1cf102c94c052cf1bfb3cc', 'type': 'user'}}]"
# json uses double quotes and null; there is probably a better way to do this though
s = s.replace("\'", '\"').replace('None', 'null')
response = json.loads(s) # list of dicts
for d in response:
for key, value in d['to'].items():
if key == 'id':
print(value) # or whatever else you want to do
# 5c9941fcdd2eeb6a6787916e
# 5cc9055fcc5781152ca6eeb8
# 5d1cf102c94c052cf1bfb3cc

Using regex to search for text that follows a specific word

I am searching a string of text which contains dictionaries that look like so:
soup_string = """{"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:00+01:00/13:00+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"full","slotDiscountedByDP":"false","slotId":"1hr-12-13-20190613","time":"12:00pm - 1:00pm","rawSlotPrice":"","slotDiscounted":"false"},
{"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:30+01:00/13:30+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"available","slotDiscountedByDP":"false","slotId":"1hr-12:30-13:30-20190613","time":"12:30pm - 1:30pm","rawSlotPrice":"","slotDiscounted":"false"}"""
I am looking to return the string which follows each key in the 'dictionaries'.
I have decided an appropriate method is to use Regex expressions. I can return each times and costs using
Costs = re.findall(r"\£[0-9]\.[0-9][0-9]", soup_string)
times = re.findall(r'\"(time)\"\:\"(.{14,16})\"\,', soup_string)
Essentially I would like to be able to look for each key in the dictionary, and search for a specific string then return the value.
The end goal is to create a dictionary with the 'Cost', 'Availability' and 'time'.
Full code:
import requests
from bs4 import BeautifulSoup
import json
postcode = "L4 0TH"
ASDA_url = "https://groceries.asda.com/api/user/checkpostcode?postcode="+ postcode + "&requestorigin=gi"
ASDA_url2 = "https://groceries.asda.com/api/slot/view?startdate=12%2F06%2F2019&deliveryoption=homedelivery&requestorigin=gi&_="
client = requests.Session()
r = client.get(ASDA_url)
r2 = client.get(ASDA_url2)
soup = BeautifulSoup(r2.text, 'html.parser')
soup_string = str(soup)
soup_dicts = json.loads('[' + soup_string + ']')
keep_keys = ('cost', 'availability', 'time')
filtered = [{k:soup_dict[k] for k in keep_keys} for soup_dict in soup_dicts]```

Given that you have multiple dictionaries, I'm not exactly sure what you're trying to obtain, but from my understanding this should help:
import json
soup_string = ''' ... ''' # As it is in the question
soup_dicts = json.loads('[' + soup_string + ']')
keep_keys = ('cost', 'availability', 'time')
filtered = [{k:soup_dict[k] for k in keep_keys} for soup_dict in soup_dicts]
It treats your string of dictionaries as a list of JSON dictionaries, and uses the json module to parse it. Then it filters out everything except the key/value pairs you need. The result is a list of the filtered dictionaries.
Output (i.e. value of filtered):
[
{'cost': '£2.00', 'availability': 'full', 'time': '12:00pm - 1:00pm'},
{'cost': '£2.00', 'availability': 'available', 'time': '12:30pm - 1:30pm'}
]
EDIT:
In response to you providing your code, I can see that you're calling str on the results from BeautifulSoup. Rather than doing that, you can just process the client.get() results directly:
import json
import requests
postcode = "L4 0TH"
ASDA_url = "https://groceries.asda.com/api/user/checkpostcode?postcode="+ postcode + "&requestorigin=gi"
ASDA_url2 = "https://groceries.asda.com/api/slot/view?startdate=12%2F06%2F2019&deliveryoption=homedelivery&requestorigin=gi&_="
client = requests.Session()
r = client.get(ASDA_url)
r2 = client.get(ASDA_url2)
dicts = r2.json()['slotHeader'][0]['slots']
keep_keys = ('cost', 'availability', 'time')
filtered = [{k:d[k] for k in keep_keys} for d in dicts]

First you need to put your data into a list and create a dictionary with key: data. (see my example below). Then use json to convert it as a dictionary of dictionaries. Then extract cost, availability and time per dictionary on a loop.
import json
soup_string = """{"data": [{"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:00+01:00/13:00+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"full","slotDiscountedByDP":"false","slotId":"1hr-12-13-20190613","time":"12:00pm - 1:00pm","rawSlotPrice":"","slotDiscounted":"false"}, {"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:30+01:00/13:30+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"available","slotDiscountedByDP":"false","slotId":"1hr-12:30-13:30-20190613","time":"12:30pm - 1:30pm","rawSlotPrice":"","slotDiscounted":"false"}]}"""
d = json.loads(soup_string)
result = []
cost, avail, time = [], [], []
for data in d['data']:
tmp = {}
tmp['Cost'] = data['cost']
tmp['Availability'] = data['availability']
tmp['Time'] = data['time']
result.append(tmp)
result
Output:
[{'Cost': '£2.00', 'Availability': 'full', 'Time': '12:00pm - 1:00pm'},
{'Cost': '£2.00', 'Availability': 'available', 'Time': '12:30pm - 1:30pm'}]

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to extract a couple of fields nested in response using python - python

Related

Python Reddit API JSON issues (no PRAW)

Iterate through a nested dict inside of a list, with some missing keys

huobi working python 3.6 example (including signature)

Looping through a function

Using regex to search for text that follows a specific word

Categories

Resources