json change dictionary item to a list with one dictionary - python

I'm working with a REST API for finding address details. I pass it an address and it passes back details for that address: lat/long, suburb, etc. I'm using the requests library, calling the json() method on the response and adding the parsed JSON to a list to analyse later.
What I'm finding is that when there is a single match for an address the 'FoundAddress' key in the json response contains a dictionary but when more than one match is found the 'FoundAddress' key contains a list of dictionaries.
The returned json looks something like:
For a single match:
{
'FoundAddress': {AddressDetails...}
}
For multiple matches:
{
'FoundAddress': [{Address1Details...}, {Address2Details...}]
}
I don't want to write code to handle a single match and then multiple matches.
How can I modify the 'FoundAddress' so that when there is a single match it changes it to a list with a single dictionary entry? Such that I get something like this:
{
'FoundAddress': [{AddressDetails...}]
}

If it's the external API sending responses in that format then you can't really change FoundAddress itself, since it will always arrive in that format.
You can change the response if you want to, since you have full control over what you've received:
r = response.json()  # parsed body of the requests response
fixed = r['FoundAddress'] if isinstance(r['FoundAddress'], list) else [r['FoundAddress']]
r['FoundAddress'] = fixed
Alternatively you can do the distinction at address usage time:
def func(foundAddress):
    # work with a single dictionary instance here
    ...
then:
result = list(map(func, r['FoundAddress'])) if isinstance(r['FoundAddress'], list) else [func(r['FoundAddress'])]
But honestly I'd take a clear:
if isinstance(r['FoundAddress'], list):
    result = [func(address) for address in r['FoundAddress']]
else:
    result = [func(r['FoundAddress'])]
or the response fix-up over the a if b else c one-liner any day.

If you can, I would just change the API. If you can't there's nothing magical you can do. You just have to handle the special case. You could probably do this in one place in your code with a function like:
def handle_found_addresses(found_addresses):
    if not isinstance(found_addresses, list):
        found_addresses = [found_addresses]
    ...
and then proceed from there to do whatever you do with found addresses as if the value is always a list with one or more items.
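A minimal sketch of that single-place normalisation, assuming the response has already been parsed into a dict (for example with response.json()). The helper name as_list and the sample address fields are hypothetical, not part of the API:

```python
def as_list(value):
    """Wrap a lone value in a list; return lists unchanged."""
    return value if isinstance(value, list) else [value]


# Hypothetical parsed responses: one match, then several matches.
single = {"FoundAddress": {"suburb": "Example"}}
multiple = {"FoundAddress": [{"suburb": "A"}, {"suburb": "B"}]}

for parsed in (single, multiple):
    parsed["FoundAddress"] = as_list(parsed["FoundAddress"])

print(single["FoundAddress"])         # a list holding the one dict
print(len(multiple["FoundAddress"]))  # the existing list is left alone
```

After this step, any later code can loop over 'FoundAddress' without checking its type again.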


python variable body API

Good Morning,
I need to understand how to insert the value of a variable into this string, in place of CHANGEME.
payload = '{\n\t"client": {\n\t\t"clientId": "name"\n\t},\n\t"contentFieldOption": {\n\t\t"returnLinkedContents": false,\n\t\t"returnLinkedCategories": false,\n\t\t"returnEmbedCodes": false,\n\t\t"returnThumbnailUrl": false,\n\t\t"returnItags": false,\n\t\t"returnAclInfo": false,\n\t\t"returnImetadata": false,\n\t\t"ignoreITagCombining": false,\n\t\t"returnTotalResults": true\n\t},\n\t"criteria": {\n\t\t"linkedCategoryOp": {\n\t\t\t"linkedCategoryIds": [\n\t\t\t\t" CHANGEME ",\n\t\t\t\t"!_TRASH"\n\t\t\t],\n\t\t\t"cascade": true\n\t\t}\n\t},\n\t"numberOfresults": 50,\n\t"offset": 0,\n\t"orderBy": "creationDate_A"\n}'
It is part of the body to be sent in an API POST request.
I have tried various alternatives, but none of them solved my problem.
Don't try to hack this string with regexes; you'll end up with invalid data in no time. Use json.loads() to convert it into a dictionary, find the key CHANGEME, and do whatever you need to do (which you do not really explain).
>>> paydict = json.loads(payload)
>>> print(json.dumps(paydict, indent=4))
{
    "criteria": {
        "linkedCategoryOp": {
            "linkedCategoryIds": [
                " CHANGEME ",
                "!_TRASH"
...
API objects usually have a consistent structure, so your variable is probably always in the list paydict["criteria"]["linkedCategoryOp"]["linkedCategoryIds"]. Find the index of " CHANGEME " in this list, and take it from there.
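As a rough sketch of that approach (the shortened payload and the replacement value "my_category_id" are made up for illustration):

```python
import json

# Shortened, hypothetical version of the payload from the question.
payload = '{"criteria": {"linkedCategoryOp": {"linkedCategoryIds": [" CHANGEME ", "!_TRASH"], "cascade": true}}}'

paydict = json.loads(payload)
ids = paydict["criteria"]["linkedCategoryOp"]["linkedCategoryIds"]

# Swap the placeholder in place; list.index raises ValueError if it is absent.
ids[ids.index(" CHANGEME ")] = "my_category_id"

# Serialise back to a JSON string for the POST body.
new_payload = json.dumps(paydict)
print(new_payload)
```

Because the value is set on the parsed structure, json.dumps takes care of quoting and escaping, so the result stays valid JSON whatever the new value contains.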
You can use re, Python's regular expressions module:
import re
payload = '{\n\t"client": {\n\t\t"clientId": "name"\n\t},\n\t"contentFieldOption": {\n\t\t"returnLinkedContents": false,\n\t\t"returnLinkedCategories": false,\n\t\t"returnEmbedCodes": false,\n\t\t"returnThumbnailUrl": false,\n\t\t"returnItags": false,\n\t\t"returnAclInfo": false,\n\t\t"returnImetadata": false,\n\t\t"ignoreITagCombining": false,\n\t\t"returnTotalResults": true\n\t},\n\t"criteria": {\n\t\t"linkedCategoryOp": {\n\t\t\t"linkedCategoryIds": [\n\t\t\t\t" CHANGEME ",\n\t\t\t\t"!_TRASH"\n\t\t\t],\n\t\t\t"cascade": true\n\t\t}\n\t},\n\t"numberOfresults": 50,\n\t"offset": 0,\n\t"orderBy": "creationDate_A"\n}'
payload = re.sub(r"\n|\t", "", payload).strip()  # do some cleanup
payload = re.sub(r"\s+CHANGEME\s+", "NEW VALUE", payload)  # replace the value (raw string so \s is a regex escape)
print(payload)  # CHANGEME is replaced with NEW VALUE
You could use a simple string replace to swap "CHANGEME" with something else.
new_str = 'IMCHANGED'
payload = payload.replace('CHANGEME', new_str)  # str.replace returns a new string, so reassign it
This solves your stated problem, unless there are extra constraints on the payload (right now this assumes it is a string and that every occurrence of the word CHANGEME should be replaced). Please clarify if that is the case.

python requests get request debugging

/api/stats
?fields=["clkCnt","impCnt"]
&ids=nkw0001,nkw0002,nkw0003,nkw0004
&timeRange={"since":"2019-05-25","until":"2019-06-17"}
I'm currently working with an API called naver_searchad_api.
(Link to the API's GitHub, if you want to check it out, but I don't think you need to.)
The final URL should be the base URL + /api/stats,
and the fields, ids and timeRange parameters should look like the ones above.
The request I wrote is below:
r = requests.get(BASE_URL + uri, params={'ids': ['nkw0001','nkw0002','nkw0003','nkw0004'], 'timeRange': {"since": "2019-05-25", "until": "2019-06-17"}}, headers=get_header(method,uri,API_KEY,SECRET_KEY,CUSTOMER_ID))
final_result = r.json()
print(final_result)
When I print the URL instead:
print(r.url)
it returns the following:
https://api.naver.com/stats?ids=nkw0001&ids=nkw0002&ids=nkw0002&ids=nkw0002&fields=clkCnt&fields=impCnt&timeRange=since&timeRange=until
The 'ids' key is repeated and the dates I passed are missing.
How can I make my code produce the right URL?
Query strings are key-value pairs. All keys and all values are strings. Anything that is not trivially convertible to string depends on convention. In other words, there is no standard for these things, so it depends on the expectations of the API.
For example, the API could define that lists of values are to be given as comma-separated strings, or it could say that anything complex should be JSON-encoded.
In fact, that's exactly what the API documentation says:
fields string
Fields to be retrieved (JSON format string).
For example, ["impCnt","clkCnt","salesAmt","crto"]
The same goes for timeRange. The other values can be left alone. Therefore we JSON-encode those two values only.
We can do that inline with a dict comprehension.
import json
import requests
params = {
    'fields': ["clkCnt", "impCnt"],
    'ids': 'nkw0001,nkw0002,nkw0003,nkw0004',
    'timeRange': {"since":"2019-05-25","until":"2019-06-17"},
}
resp = requests.get('https://api.naver.com/api/stats', {
    key: json.dumps(value) if key in ['fields', 'timeRange'] else value for key, value in params.items()
})
On top of complying with the API's expectations, all keys and values that go into the query string need to be URL-encoded. Luckily the requests module takes care of that part, so all we need to do is pass a dict to requests.get.
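To see roughly what the encoded query string looks like without sending a request, here is a sketch that uses only the standard library's urllib.parse. requests does its own encoding internally, so treat this as an approximation of the result rather than its exact code path; the compact separators are just a choice made here to match the format shown in the question.

```python
import json
from urllib.parse import parse_qs, urlencode

params = {
    "fields": ["clkCnt", "impCnt"],
    "ids": "nkw0001,nkw0002,nkw0003,nkw0004",
    "timeRange": {"since": "2019-05-25", "until": "2019-06-17"},
}

# JSON-encode only the two structured values; leave plain strings alone.
encoded = {
    key: json.dumps(value, separators=(",", ":")) if key in ("fields", "timeRange") else value
    for key, value in params.items()
}

# urlencode percent-encodes brackets, quotes, braces and commas.
query = urlencode(encoded)
print(query)

# Decoding the query string gives back the JSON text the API expects.
decoded = parse_qs(query)
print(decoded["fields"][0])
```

Each key now appears once in the query string, with the list and the date range carried as JSON text instead of being split into repeated keys.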

How do I properly format this API call?

I am making a telegram chatbot and can't figure out how to take out the [{' from the output.
def tether(bot, update):
    tetherCall = "https://api.omniexplorer.info/v1/property/31"
    tetherCallJson = requests.get(tetherCall).json()
    tetherOut = tetherCallJson['issuances'][:1]
    update.message.reply_text("Last printed tether: " + str(tetherOut) + " Please take TXID and paste it in this block explorer to see more info: https://www.omniexplorer.info/search")
My user will see this as a response: [{'grant': '25000000.00000000', 'txid': 'f307bdf50d90c92278265cd92819c787070d6652ae3c8af46fa6a96278589b03'}]
This looks like a list with a single dict in it:
[{'grant': '25000000.00000000',
'txid': 'f307bdf50d90c92278265cd92819c787070d6652ae3c8af46fa6a96278589b03'}]
You should be able to access the dict by indexing the list with [0]…
tetherOut[0]
# {'grant': '25000000.00000000',
# 'txid': 'f307bdf50d90c92278265cd92819c787070d6652ae3c8af46fa6a96278589b03'}
…and if you want to get a particular value from the dict you can index by its name, e.g.
tetherOut[0]['txid']
# 'f307bdf50d90c92278265cd92819c787070d6652ae3c8af46fa6a96278589b03'
Be careful chaining these things, though. If tetherOut is an empty list, tetherOut[0] will generate an IndexError. You'll probably want to catch that (and the KeyError that an invalid dict key will generate).
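One possible way to guard against both errors is sketched below; the helper name latest_txid is hypothetical, and the sample data is the response shown in the question:

```python
def latest_txid(issuances):
    """Return the txid of the first issuance, or None if it is unavailable."""
    try:
        return issuances[0]["txid"]
    except (IndexError, KeyError):
        # Empty list, or a first entry without a 'txid' key.
        return None


sample = [{'grant': '25000000.00000000',
           'txid': 'f307bdf50d90c92278265cd92819c787070d6652ae3c8af46fa6a96278589b03'}]

print(latest_txid(sample))
print(latest_txid([]))
```

The caller can then check for None and send a fallback message instead of letting the bot handler crash.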

How Do I Start Pulling Apart This Block of JSON Data?

I'd like to make a program that makes offline copies of math questions from Khan Academy. I have a huge 21.6MB text file that contains data on all of their exercises, but I have no idea how to start analyzing it, much less start pulling the questions from it.
Here is a pastebin containing a sample of the JSON data. If you want to see all of it, you can find it here. Warning for long load time.
I've never used JSON before, but I wrote up a quick Python script to try to load up individual "sub-blocks" (or equivalent, correct term) of data.
import sys
import json
exercises = open("exercises.txt", "r+b")
byte = 0
frontbracket = 0
backbracket = 0
while byte < 1000: #while byte < character we want to read up to
    #keep at 1000 for testing purposes
    char = exercises.read(1)
    sys.stdout.write(char)
    #Here we decide what to do based on what char we have
    if str(char) == "{":
        frontbracket = byte
        while True:
            char = exercises.read(1)
            if str(char)=="}":
                backbracket=byte
                break
        exercises.seek(frontbracket)
        block = exercises.read(backbracket-frontbracket)
        print "Block is " + str(backbracket-frontbracket) + " bytes long"
        jsonblock = json.loads(block)
        sys.stdout.write(block)
        print jsonblock["translated_display_name"]
        print "\nENDBLOCK\n"
    byte = byte + 1
Ok, the repeated pattern appears to be this: http://pastebin.com/4nSnLEFZ
To get an idea of the structure of the response, you can use JSONlint to copy/paste portions of your string and 'validate'. Even if the portion you copied is not valid, it will still format it into something you can actually read.
First, I have used the requests library to pull the JSON for you. It's a super-simple library when you're dealing with things like this. The API is slow to respond because it seems you're pulling everything, but it should work fine.
Once you get a response from the API, you can convert it directly to Python objects using .json(). What you have is essentially a mixture of nested lists and dictionaries that you can iterate through to pull out specific details. In my example below, my_list2 has to use a try/except structure because it seems that some of the entries do not have two items in the list under translated_problem_types. In that case, it will just put 'None' instead. You might have to use trial and error for such things.
Finally, since you haven't used JSON before, it's also worth noting that a JSON object behaves like a dictionary; you are not guaranteed the order in which you receive its details. However, in this case the outermost structure is a list, so in theory there may be a consistent order, but don't rely on it: we don't know how the list is constructed.
import requests
api_call = requests.get('https://www.khanacademy.org/api/v1/exercises')
json_response = api_call.json()
# Assume we first want to list "author name" with "author key"
# This should loop through the repeated pattern in the pastebin
# access items as a dictionary
my_list1 = []
for item in json_response:
    my_list1.append([item['author_name'], item['author_key']])
print(my_list1[0:5])
# Now let's assume we want the 'sha' of the SECOND entry in translated_problem_types
# to also be listed with author name
my_list2 = []
for item in json_response:
    try:
        the_second_entry = item['translated_problem_types'][0]['items'][1]['sha']
    except IndexError:
        the_second_entry = 'None'
    my_list2.append([item['author_name'], item['author_key'], the_second_entry])
print(my_list2[0:5])
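To cut down on the trial and error, a small helper can print the shape of the parsed data before you index into it. This is only a sketch: the function name outline and the sample record are hypothetical, and only the first element of each list is inspected on the assumption that the rest look similar.

```python
def outline(value, depth=0, max_depth=3):
    """Return indented lines describing the nesting of parsed JSON."""
    pad = "  " * depth
    if depth >= max_depth:
        return [pad + "..."]
    lines = []
    if isinstance(value, dict):
        for key, child in value.items():
            lines.append(f"{pad}{key}: {type(child).__name__}")
            if isinstance(child, (dict, list)):
                lines.extend(outline(child, depth + 1, max_depth))
    elif isinstance(value, list) and value:
        # Only the first element is inspected; the rest are assumed similar.
        first = value[0]
        lines.append(f"{pad}[0]: {type(first).__name__}")
        if isinstance(first, (dict, list)):
            lines.extend(outline(first, depth + 1, max_depth))
    return lines


# Hypothetical record shaped like the fields used above.
sample = [{"author_name": "A", "translated_problem_types": [{"items": [{"sha": "x"}]}]}]
print("\n".join(outline(sample)))
```

Raising max_depth shows deeper levels; keeping it small stops a 21.6MB response from flooding the terminal.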

Clean API results to get the headlines of news articles?

I have been having trouble finding a way to pull out specific text info from the Guardian API for my dissertation. I have managed to get all my text onto Python but how do you then clean it to get say, just the headlines of the news articles?
This is a snippet of the API result that I want to pull out info from:
{
"response": {
"status":"ok",
"userTier":"developer",
"total":1869990,
"startIndex":1,
"pageSize":10,
"currentPage":1,
"pages":186999,
"orderBy":"newest",
"results":[
{
"id":"sport/live/2016/jul/09/tour-de-france-2016-stage-eight-live",
"type":"liveblog",
"sectionId":"sport",
"sectionName":"Sport",
"webPublicationDate":"2016-07-09T13:21:36Z",
"webTitle":"Tour de France 2016: stage eight – live!",
"webUrl":"https://www.theguardian.com/sport/live/2016/jul/09/tour-de-france-2016-stage-eight-live",
"apiUrl":"https://content.guardianapis.com/sport/live/2016/jul/09/tour-de-france-2016-stage-eight-live",
"isHosted":false
},
{
"id":"sport/live/2016/jul/09/serena-williams-v-angelique-kerber-wimbledon-womens-final-live",
"type":"liveblog",
"sectionId":"sport",
"sectionName":"Sport",
"webPublicationDate":"2016-07-09T13:21:02Z",
"webTitle":"Serena Williams v Angelique Kerber: Wimbledon women's final –
...
Hopefully the OP will add the code used to the question.
One solution in Python: whatever you get back (from the methods offered by the requests module?) will either already be deeply nested structures you can index into, or something you can easily map to such structures (via json.loads(the_string_you_displayed)).
Sample:
d = json.loads(the_string_you_displayed)
head_line = d['response']['results'][0]['webTitle']
This stores in head_line the value of webTitle from the first dict (index 0) in the results "array" under the response key. (The question has since been updated, so the full path is now visible.)
That is, if I read the sample snippet correctly and it was simply cut off during copy and paste, since the sample as given is invalid JSON.
If the text is not valid JSON, you would have to sift through it via substring or pattern matching, which may well be very brittle ...
Update: So assuming the full response structure is stored inside a variable named data:
result_seq = data['response']['results'] # yields a list here
headlines = [result['webTitle'] for result in result_seq]
The last line is a list comprehension: it compactly builds a list by picking the value of the key webTitle from each dict result in result_seq.
An explicit for-loop solution picking them all would be:
result_seq = data['response']['results']
headlines = []
for result in result_seq:
    headlines.append(result['webTitle'])
This does not check for errors such as result dicts without a webTitle key, but Python will raise a matching exception (here a KeyError), and you can decide whether to wrap the processing in a try/except block or hope for the best ...
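If you would rather skip incomplete entries than handle an exception, one option is sketched here. The function name extract_headlines and the cut-down data dict are hypothetical; only the response, results and webTitle keys come from the API output shown in the question.

```python
def extract_headlines(data):
    """Collect webTitle values, skipping results that lack one."""
    results = data.get("response", {}).get("results", [])
    return [result["webTitle"] for result in results if "webTitle" in result]


# Cut-down, hypothetical response with one result missing its title.
data = {
    "response": {
        "results": [
            {"webTitle": "First headline"},
            {"id": "no-title-here"},
        ]
    }
}

print(extract_headlines(data))
```

Silently skipping entries hides malformed data, so for a dissertation dataset it may be worth counting how many results were dropped.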
