parse JSON values by multilevel keys - python

Yesterday I started learning Python, and now I want to parse some JSON values. I have read many tutorials and spent a lot of time trying to get values by a multilevel key (if I can call it that) in my script, but nothing works for me. Can you help me, please?
This is my JSON output:
{
"future.arte.tv": [
{
"mediaUrl": "http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR",
"micropost": {
"html": "Berlin ",
"plainText": "Berlin"
},
"micropostUrl": "http://future.arte.tv/de/der-erste-weltkrieg-die-rolle-von-wissenschaft-und-technik",
"publicationDate": "Tue Jun 17 20:31:33 CEST 2014",
"relevance": 5.9615083,
"timestamp": 1403029893606,
"type": "image"
}
],
"www.zdf.de": [
{
"mediaUrl": "http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025",
"micropost": {
"plainText": "Berlin direkt"
},
"micropostUrl": "http://www.zdf.de/ZDFmediathek/hauptnavigation/sendung-a-bis-z",
"publicationDate": "Tue Jun 10 16:25:42 CEST 2014",
"relevance": 3.7259426,
"timestamp": 1402410342400,
"type": "image"
}
]
}
I need to get the values stored under the "mediaUrl" key, so I tried this:
j = json.loads(jsonOutput)
keys = j.keys()
for key in keys:
    print key  # keys are future.arte.tv and www.zdf.de
    print j[key]["mediaUrl"]
but print j[key]["mediaUrl"] causes this error:
TypeError: list indices must be integers, not str
So I tried print j[key][0], but the result is not what I wanted (I want just the mediaUrl value; by the way, j[key][1] raises a "list index out of range" error):
{u'micropostUrl': u'http://www.berlin.de/special/gesundheit-und-beauty/ernaehrung/1692726-215-spargelhoefe-in-brandenburg.html', u'mediaUrl': u'http://berlin.de/binaries/asset/image_assets/42859/ratio_4_3/1371638570/170x130/', u'timestamp': 1403862143675, u'micropost': {u'plainText': u'Spargel', u'html': u'Spargel '}, u'publicationDate': u'Fri Jun 27 11:42:23 CEST 2014', u'relevance': 1.6377668, u'type': u'image'}
Can you give me some advice please?

Here is a list comprehension that should do it (d is the dictionary returned by json.loads):
>>> [d[i][0].get('mediaUrl') for i in d.keys()]
['http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025',
'http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR']
How it works
First, get a list of the top-level keys:
>>> d.keys()
['www.zdf.de', 'future.arte.tv']
Get the corresponding values (each one is a list containing a dictionary):
>>> [d[i] for i in d.keys()]
[[{'micropostUrl': 'http://www.zdf.de/ZDFmediathek/hauptnavigation/sendung-a-bis-z', 'mediaUrl': 'http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025', 'timestamp': 1402410342400L, 'micropost': {'plainText': 'Berlin direkt'}, 'publicationDate': 'Tue Jun 10 16:25:42 CEST 2014', 'relevance': 3.7259426, 'type': 'image'}], [{'micropostUrl': 'http://future.arte.tv/de/der-erste-weltkrieg-die-rolle-von-wissenschaft-und-technik', 'mediaUrl': 'http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR', 'timestamp': 1403029893606L, 'micropost': {'plainText': 'Berlin', 'html': 'Berlin '}, 'publicationDate': 'Tue Jun 17 20:31:33 CEST 2014', 'relevance': 5.9615083, 'type': 'image'}]]
Take the first dictionary in each list and grab the value of its 'mediaUrl' key:
>>> [d[i][0].get('mediaUrl') for i in d.keys()]
['http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025',
'http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR']
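The comprehension above only looks at the first element (index 0) of each list. If a site can ever contain more than one post, a nested loop collects every mediaUrl instead. This is a minimal Python 3 sketch (the question and answer use Python 2); it runs on a shortened copy of the question's JSON, and the helper name media_urls is only illustrative.

```python
import json

# Shortened copy of the JSON from the question.
json_output = """
{
  "future.arte.tv": [
    {"mediaUrl": "http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR",
     "type": "image"}
  ],
  "www.zdf.de": [
    {"mediaUrl": "http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025",
     "type": "image"}
  ]
}
"""

def media_urls(data):
    """Collect every mediaUrl from a {site: [post, ...]} mapping (hypothetical helper)."""
    urls = []
    for posts in data.values():          # each value is a list, not a dict
        for post in posts:               # so iterate over it instead of indexing with a str
            url = post.get("mediaUrl")   # .get() returns None instead of raising KeyError
            if url is not None:
                urls.append(url)
    return urls

j = json.loads(json_output)
for url in media_urls(j):
    print(url)
```

Using .get() means a post without a mediaUrl key is skipped rather than stopping the whole loop.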

Related

python function to transform data to JSON

How can I convert the string below to a dictionary?
code.py
message = event['Records'][0]['Sns']['Message']
print(message)
# this gives the below and the type is <class 'str'>
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
}
}
I need to add an additional field, "status": 1, so that it looks like this:
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
},
"status": 1
}
What is the best way to do this?
Update: I managed to get it working, somehow.
I used ast.literal_eval(message) like this:
import ast
D2 = ast.literal_eval(message)
D2["status"] = 1
print(D2)
#This gives the below
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
},
"status": 1
}
Is there a better way to do this? I'm not sure, so I wanted to check.
Can I check how do we convert the below to a dictionary?
As far as I can tell, data = { } assigns a dictionary with content to the variable data.
I would need to add an additional field called "status" : 1 such that it looks like this
A simple update should do the trick:
data.update({"status": 1})
I found two issues when trying to deserialise the string as JSON:
an invalid escape sequence, \' in I\'m (JSON has no \' escape)
unescaped newlines inside a string value
These can be worked around with:
import json
import re
data = data.replace("\\'", "'")  # \' is not a valid JSON escape
data = re.sub('\n\n"', '\\\\n\\\\n"', data)  # escape the raw newlines; no regex flags are needed
d = json.loads(data)
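To see both issues and the workaround in isolation, here is a small Python 3 sketch. The input is a hypothetical one-field string with the same two defects (an I\'m escape and two raw newlines before a closing quote); it is not the real SNS message.

```python
import json
import re

# Hypothetical message: the text contains I\'m (backslash + quote)
# and two raw newline characters right before the closing quote.
data = '{"text": "#test I\\\'m planning to buy the car today\n\n", "language": "en"}'

try:
    json.loads(data)
except json.JSONDecodeError as exc:
    print("json.loads on the raw string failed:", exc.msg)

fixed = data.replace("\\'", "'")               # drop the invalid \' escape
fixed = re.sub('\n\n"', '\\\\n\\\\n"', fixed)   # turn the raw newlines into \n escapes
d = json.loads(fixed)
print(repr(d["text"]))
```

After decoding, the \n escapes become real newline characters again inside d["text"].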
There are also surrogate pairs in the data, which may cause problems down the line. They can be fixed by running
data = data.encode('utf-16', 'surrogatepass').decode('utf-16')
before calling json.loads.
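A short Python 3 sketch of what the UTF-16 round trip does. It matters when a str already holds the two surrogate halves as separate code points; json.loads itself already joins a valid pair written as \ud83d\udd25 escape text in the JSON. The sample string is hypothetical.

```python
# Two separate surrogate code points, as they can appear in a Python str
# (for example one built from "\ud83d\udd25" in Python source code).
broken = "car today \ud83d\udd25"

# 'surrogatepass' lets the lone surrogates be encoded to UTF-16,
# and decoding the bytes joins the pair into one character.
fixed = broken.encode("utf-16", "surrogatepass").decode("utf-16")

print(len(broken), len(fixed))          # 12 11
print(fixed == "car today \U0001F525")  # True
```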
Once the data has been deserialised to a dict you can insert the new key/value pair.
d['status'] = 1
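The update's ast.literal_eval call works on this particular message because it also happens to be a valid Python literal. ast.literal_eval parses Python syntax, not JSON, so JSON's true, false and null make it fail, while json.loads handles them. A minimal sketch of the json route, including serialising back to a string in case the next step expects text; the shortened message and its "verified" field are hypothetical.

```python
import ast
import json

# Shortened, hypothetical message; the real one comes from
# event['Records'][0]['Sns']['Message'] and is a str.
message = '{"id": 1408763311479345152, "language": "en", "counter_emoji": {}, "verified": false}'

d = json.loads(message)    # JSON text -> dict
d["status"] = 1            # add the new key/value pair
print(json.dumps(d))       # dict -> JSON text again, if a string is needed downstream

# ast.literal_eval only understands Python literals, so JSON's false is rejected.
try:
    ast.literal_eval(message)
except ValueError as exc:
    print("literal_eval failed:", exc)
```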

Understanding what happens when you use references of arrays and append them to dictionaries

If someone has a better title for this question, I'm all ears (well, eyes)...
I just spent a while on this problem and found that the issue was my understanding. I had the code below (note the comment marked with "## <<--").
It takes a dictionary (summaryTotal; see the example data below) that summarises alarms: total counts plus a list of the summarised alarms with some information about each. The actual alarms are in summaryTotal['Alarms'], which is a list of dictionaries. My function filters this full list by alarm source and produces the same format as summaryTotal, but filtered.
The line of code I was having issue with was this one:
summaryFiltered['Alarms'].append(alarm)
In this line, alarm is really a reference to an element from the alarmsTotal list. The alarmsTotal list itself is a reference to summaryTotal['Alarms'].
When I used this line and appended alarm, the call at the bottom of the code, system.util.jsonEncode (an external function that converts a Python object into JSON; I'm not sure of the details), always failed with a 'too many recursive calls' error. When I instead built a new dictionary (alarmInfo) and copied the values from the actual alarm object into it, jsonEncode started working and no longer raised recursion exceptions.
I'd like to be able to explain why that is.
When I append alarm to my new list, I think I'm appending a reference to an element of summaryTotal['Alarms'], for example summaryTotal['Alarms'][20]. Is that right?
summaryTotal = {'ActiveUnacked': 0, 'ActiveAcked': 2, 'ClearUnacked': 23,
    'Alarms': [
        {
            "name": "Comms Fault",
            "eventTime": "Fri Mar 05 12:25:27 ACDT 2021",
            "label": "Comms Fault",
            "displayPath": "Refrigeration MSB4 MCC PWM Comms Fault",
            "source": "FolderA/FolderB/FolderC/MSB4 MCC PWM Comms OK",
            "state": "Cleared, Unacknowledged"
        },
        {
            "name": "Comms Fault",
            "eventTime": "Fri Mar 05 12:28:46 ACDT 2021",
            "label": "Comms Fault",
            "displayPath": "Refrigeration MSB4 MCC PWM Comms Fault",
            "source": "Folder1/Folder2/Folder3/MSB4 MCC PWM Comms OK",
            "state": "Cleared, Unacknowledged"
        }
    ]
}
alarmsTotal = summaryTotal['Alarms']
summaryFiltered = {'ActiveAcked': 0, 'ActiveUnacked': 0, 'ClearUnacked': 0, 'Alarms': []}
for alarm in alarmsTotal:
    if pathFilter in alarm['source']:
        alarmInfo = {'name': alarm['name'],
                     'label': alarm['label'],
                     'displayPath': alarm['displayPath'],
                     'source': alarm['source'],
                     'state': alarm['state'],
                     'eventTime': alarm['eventTime']
                     }
        summaryFiltered['Alarms'].append(alarm)  ## <<-- to fix the code, I added `alarmInfo` above and appended `alarmInfo` instead of `alarm`
        if alarm['state'] == 'Cleared, Unacknowledged':
            summaryFiltered['ClearUnacked'] += 1
        if alarm['state'] == 'Active, Unacknowledged':
            summaryFiltered['ActiveUnacked'] += 1
        if alarm['state'] == 'Active, Acknowledged':
            summaryFiltered['ActiveAcked'] += 1
ret = system.util.jsonEncode(summaryFiltered)
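In plain Python, appending alarm stores a reference to the very same dictionary object that summaryTotal['Alarms'] holds, whereas building alarmInfo (or calling dict(alarm)) creates a new dictionary. The sketch below shows that difference with the is operator and a mutation. It runs in standard Python 3 with a trimmed, hypothetical version of the question's data and a hypothetical pathFilter; it does not call Ignition's system.util.jsonEncode, so it does not reproduce the recursion error itself.

```python
# Hypothetical, trimmed-down alarm data in the same shape as summaryTotal.
summaryTotal = {
    "ClearUnacked": 2,
    "Alarms": [
        {"name": "Comms Fault", "source": "FolderA/FolderB/FolderC/MSB4 MCC PWM Comms OK"},
        {"name": "Comms Fault", "source": "Folder1/Folder2/Folder3/MSB4 MCC PWM Comms OK"},
    ],
}
pathFilter = "FolderA"  # hypothetical filter value

byReference = []  # receives the original dict objects
byCopy = []       # receives new (shallow) copies of them
for alarm in summaryTotal["Alarms"]:
    if pathFilter in alarm["source"]:
        byReference.append(alarm)   # the same object as summaryTotal["Alarms"][0]
        byCopy.append(dict(alarm))  # a new dict with the same keys and values

original = summaryTotal["Alarms"][0]
print(byReference[0] is original)  # True: one object, reachable from two lists
print(byCopy[0] is original)       # False: a separate object
print(byCopy[0] == original)       # True: equal contents

# A change made through the reference is visible through the original list too.
byReference[0]["state"] = "Cleared, Acknowledged"
print("state" in original)         # True
print("state" in byCopy[0])        # False
```

dict(alarm) is a shallow copy: if an alarm held nested lists or dictionaries, those inner objects would still be shared (copy.deepcopy would copy them as well).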

aws python lambda: reading csv file (iterator should return strings)

I'm getting this error when I test my Python 3.8 Lambda function.
The logs are:
soc-connect
contacts.csv
{'ResponseMetadata': {'RequestId': '9D7D7F0C5CB79984', 'HostId': 'wOd6HvIm+BpLOMKF2beRvqLiW0NQt5mK/kzjCjYxQ2kHQZY0MRCtGs3l/rqo4o0r4xAPuV1QpGM=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'wOd6HvIm+BpLOMKF2beRvqLiW0NQt5mK/kzjCjYxQ2kHQZY0MRCtGs3l/rqo4o0r4xAPuV1QpGM=', 'x-amz-request-id': '9D7D7F0C5CB79984', 'date': 'Thu, 26 Mar 2020 11:21:35 GMT', 'last-modified': 'Tue, 24 Mar 2020 16:07:30 GMT', 'etag': '"8a3785e750475af3ca25fa7eab159dab"', 'accept-ranges': 'bytes', 'content-type': 'text/csv', 'content-length': '52522', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2020, 3, 24, 16, 7, 30, tzinfo=tzutc()), 'ContentLength': 52522, 'ETag': '"8a3785e750475af3ca25fa7eab159dab"', 'ContentType': 'text/csv', 'Metadata': {}, 'Body': <botocore.response.StreamingBody object at 0x7f858dc1e6d0>}
1153
<_csv.reader object at 0x7f858ea76970>
[ERROR] Error: iterator should return strings, not bytes (did you open the file in text mode?)
The code snippet is:
import boto3
import csv

def digest_csv(bucket_name, key_name):
    # Let's use Amazon S3
    s3 = boto3.client('s3')
    print(bucket_name)
    print(key_name)
    s3_object = s3.get_object(Bucket=bucket_name, Key=key_name)
    print(s3_object)
    # read the contents of the file and split it into a list of lines
    lines = s3_object['Body'].read().splitlines(True)
    print(len(lines))
    contacts = csv.reader(lines, delimiter=';')
    print(contacts)
    # now iterate over those contacts
    for contact in contacts:
        # each row should be a list of field values
        # do whatever you want with each line here
        print('-*-'.join(contact))
I think the problem is with csv.reader.
I'm passing a list of lines as its first argument. Should that be changed?
Any ideas?
Instead of using csv.reader, the following worked for me (adjusted for your delimiter and variables):
for line in lines:
    contact = ''.join(line.decode().split(';'))
    print(contact)
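The error itself comes from csv.reader receiving bytes: s3_object['Body'].read() returns bytes, so splitlines() yields bytes lines, while csv.reader expects strings. An alternative that keeps csv.reader (and with it, its handling of quoted fields) is to decode the body first. A minimal sketch, assuming the file is UTF-8; the bytes literal is a hypothetical stand-in for the S3 body, so boto3 is not needed to run it.

```python
import csv
import io

# Stand-in for s3_object['Body'].read(), which returns bytes.
# The contents and the UTF-8 encoding are assumptions for this example.
raw = b"name;email\nAnna;anna@example.com\nBen;ben@example.com\n"

text = raw.decode("utf-8")                                # bytes -> str
contacts = csv.reader(io.StringIO(text), delimiter=";")  # reader now gets str lines

rows = list(contacts)
for contact in rows:
    # each row is a list of strings, e.g. ['Anna', 'anna@example.com']
    print("-*-".join(contact))
```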

How to extract first item in this json list

I have this JSON and I'm trying to extract the first element of the list inside it in Python. How do you extract the first item in ad_list?
Are there any built-in functions I can use to extract the first item from JSON? I need something other than simply iterating through it like an array.
P.S. I shortened the JSON data.
Here is the list.
{
u'data':{
u'ad_list':[
{
u'data':{
u'require_feedback_score':0,
u'hidden_by_opening_hours':False,
u'trusted_required':False,
u'currency':u'EGP',
u'require_identification':False,
u'is_local_office':False,
u'first_time_limit_btc':None,
u'city':u'',
u'location_string':u'Egypt',
u'countrycode':u'EG',
u'max_amount':u'20000',
u'lon':0.0,
u'sms_verification_required':False,
u'require_trade_volume':0.0,
u'online_provider':u'SPECIFIC_BANK',
u'max_amount_available':u'20000',
u'msg': u" \u2605\u2605\u2605\u2605\u2605 \u0645\u0631\u062d\u0628\u0627 \u2605\u2605\u2605\u2605\u2605\r\n\r\n\u0625\u0630\u0627 \u0643\u0646\u062a \u062a\u0631\u063a\u0628 \u0641\u064a \u0628\u064a\u0639 \u0627\u0648 \u0634\u0631\u0627\u0621 \u0627\u0644\u0628\u062a\u0643\u0648\u064a\u0646 \u062a\u0648\u0627\u0635\u0644 \u0645\u0639\u064a \u0648\u0633\u0623\u0642\u0648\u0645 \u0628\u062e\u062f\u0645\u062a\u0643\r\n\u0644\u0644\u062a\u0648\u0627\u0635\u0644: https: //tawk.to/hanyibrahim\r\n \u0627\u0644\u062e\u064a\u0627\u0631 \u0644\u0644\u062a\u062d\u0648\u064a\u0644: \u0627\u0644\u0628\u0646\u0643 \u0627\u0644\u062a\u062c\u0627\u0631\u064a \u0627\u0644\u062f\u0648\u0644\u064a \u0627\u0648\u0641\u0648\u062f\u0627\u0641\u0648\u0646 \u0643\u0627\u0634 \u0627\u0648 \u0627\u062a\u0635\u0627\u0644\u0627\u062a \u0641\u0644\u0648\u0633 \u0627\u0648 \u0627\u0648\u0631\u0627\u0646\u062c \u0645\u0648\u0646\u064a\r\n\r\n'' \u0634\u0643\u0631\u0627 ''\r\n\r\n\r\n \u2605\u2605\u2605\u2605\u2605 Hello \u2605\u2605\u2605\u2605\u2605\r\n\r\nIf you would like to trade Bitcoins please let me know and I will help you\r\nconnect: https: //tawk.to/hanyibrahim\r\nOption transfer:Bank CIB Or Vodafone Cash Or Etisalat Flous Or Orange Money\r\n'' Thank ''",
u'volume_coefficient_btc':u'1.50',
u'profile':{
u'username':u'hanyibrahim11',
u'feedback_score':100,
u'trade_count':u'3000+',
u'name':u'hanyibrahim11 (3000+; 100%)',
u'last_online':u'2019-01-14T17:54:52+00:00'
},
u'bank_name':u'CIB_Vodafone Cash_Etisalat Flous_Orange Money',
u'trade_type':u'ONLINE_BUY',
u'ad_id':803036,
u'temp_price':u'67079.44',
u'payment_window_minutes':90,
u'min_amount':u'50',
u'limit_to_fiat_amounts':u'',
u'require_trusted_by_advertiser':False,
u'temp_price_usd':u'3738.54',
u'lat':0.0,
u'visible':True,
u'created_at':u'2018-07-25T08:12:21+00:00',
u'atm_model':None,
u'is_low_risk':True
},
u'actions':{
u'public_view': u'https://localbitcoins.com/ad/803036'
}
},
{
u'data':{
u'require_feedback_score':0,
u'hidden_by_opening_hours':False,
u'trusted_required':False,
u'currency':u'EGP',
u'require_identification':False,
u'is_local_office':False,
u'first_time_limit_btc':None,
u'city':u'',
u'location_string':u'Egypt',
u'countrycode':u'EG',
u'max_amount':u'20000',
u'lon':0.0,
u'sms_verification_required':False,
u'require_trade_volume':0.0,
u'online_provider':u'CASH_DEPOSIT',
u'max_amount_available':u'20000',
u'msg':u'QNB, CIB deposite- Vodafone Cash - Etisalat Felous - Orange Money - Western Union - Money Gram \r\n- Please do not entiate a new trade request if you are not serious to finalize it.',
u'volume_coefficient_btc':u'1.50',
u'profile':{
u'username':u'Haboush',
u'feedback_score':99,
u'trade_count':u'500+',
u'name':u'Haboush (500+; 99%)',
u'last_online':u'2019-01-14T16:48:52+00:00'
},
u'bank_name':u'QNB\u2714CIB\u2714Vodafone\u2714Orange\u2714Etisalat\u2714WU',
u'trade_type':u'ONLINE_BUY',
u'ad_id':719807,
u'temp_price':u'66860.18',
u'payment_window_minutes':270,
u'min_amount':u'100',
u'limit_to_fiat_amounts':u'',
u'require_trusted_by_advertiser':False,
u'temp_price_usd':u'3726.32',
u'lat':0.0,
u'visible':True,
u'created_at':u'2018-03-24T19:29:08+00:00',
u'atm_model':None,
u'is_low_risk':True
},
u'actions':{
u'public_view': u'https://localbitcoins.com/ad/719807'
}
},
],
u'ad_count':17
}
}
Assuming your data structure is stored in the variable j, you can use j['data']['ad_list'][0] to extract the first item from the ad_list key. Use a try-except block to catch a possible IndexError exception if ad_list can ever be empty.
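A minimal Python 3 sketch of that suggestion, wrapping the index in try/except so an empty ad_list returns None instead of raising. The data is a heavily shortened version of the question's, and first_ad is a hypothetical helper name.

```python
# Heavily shortened version of the parsed response from the question.
j = {
    "data": {
        "ad_list": [
            {"data": {"ad_id": 803036, "temp_price": "67079.44"},
             "actions": {"public_view": "https://localbitcoins.com/ad/803036"}},
            {"data": {"ad_id": 719807, "temp_price": "66860.18"},
             "actions": {"public_view": "https://localbitcoins.com/ad/719807"}},
        ],
        "ad_count": 17,
    }
}

def first_ad(response):
    """Return the first entry of response['data']['ad_list'], or None if it is empty."""
    try:
        return response["data"]["ad_list"][0]
    except IndexError:
        return None

ad = first_ad(j)
print(ad["data"]["ad_id"])                  # 803036
print(ad["actions"]["public_view"])         # https://localbitcoins.com/ad/803036
print(first_ad({"data": {"ad_list": []}}))  # None
```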

Return particular string from response

I am trying to extract a particular string value from the response I get from a request URL.
Example:
response =
{
'assets': [
{
'VEG': True,
'CONTACT': '12345',
'CLASS': 'SIX',
'ROLLNO': 'A101',
'CITY': 'CHANDI',
}
],
"body": "**Trip**: 2017\r\n** Date**: 15th Jan 2015\r\n**Count**: 501\r\n\r\n"
}
This is the response I am getting. From it I need only Date: 15th Jan 2015, and I am not sure how to do that.
Any help would be appreciated.
Assuming it is a dictionary:
a={'body': '**Trip**: 2017\r\n** Date**: 15th Jan 2015\r\n**Count**: 501\r\n\r\n', 'assets': [{'VEG': True, 'CONTACT': '12345', 'CLASS': 'SIX', 'ROLLNO': 'A101', 'CITY': 'CHANDI'}]}
then
required=a['body'].split('\r\n')[1].replace('**','')
print required
result:
' Date: 15th Jan 2015'
Access the key body of the dictionary a.
Split it on \r\n to get the list ['**Trip**: 2017', '** Date**: 15th Jan 2015', '**Count**: 501', '', ''].
Take the element at index 1 and replace ** with an empty string ('').
a = yourdictionary['body']  # split the body text itself, not str() of the whole dict
print([e for e in a.split("\r\n") if "Date" in e][0].replace("**", "").strip())
Try this:
>>> body = response['body']
>>> bodylist = body.split('\r\n')
>>> for i, value in enumerate(bodylist):
...     bodylist[i] = value.split(':')
...
>>> for item in bodylist:
...     if item[0] == '** Date**':
...         print(item[1].strip())
...
15th Jan 2015
I tried it in the interpreter and it works. I don't know if it is the best code around, but it does the job ;-)
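If more than one field is needed, parsing every "**Key**: value" line into a dictionary avoids depending on the Date line sitting at a fixed position. A small Python 3 sketch, assuming each field is on its own line with the key before the first colon; parse_body is a hypothetical helper name.

```python
response = {
    "assets": [{"VEG": True, "CONTACT": "12345", "CLASS": "SIX", "ROLLNO": "A101", "CITY": "CHANDI"}],
    "body": "**Trip**: 2017\r\n** Date**: 15th Jan 2015\r\n**Count**: 501\r\n\r\n",
}

def parse_body(body):
    """Turn '**Key**: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in body.splitlines():       # handles the \r\n line endings
        if ":" not in line:
            continue
        key, value = line.split(":", 1)  # split on the first colon only
        fields[key.replace("**", "").strip()] = value.strip()
    return fields

fields = parse_body(response["body"])
print(fields["Date"])  # 15th Jan 2015
```

Splitting on the first colon only keeps values that themselves contain a colon (such as a time) intact.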
