Use python to create a nested json - python

{"0":{"posted_date":"25 Jun 2015"},"1":{"posted_date":"26 Jun 2015"}}
Note:
'0' and '1' come from a variable, count, which is generated in a loop.
"posted_date" is a fixed string.
"25 Jun 2015" and "26 Jun 2015" also come from a variable, date.
How can I create JSON output like the above with Python?
[edit-not working code]
import json
final = []
count = 0
postID = 224
while postID < 1200:
    final.append({count: {"posted_ID": postID}})
    count = count + 1
    postID = postID * 2
print(json.dumps(final))

import json
dates = ["25 Jun 2015", "26 Jun 2015", "27 Jun 2015"]
result = {}
for each, date in enumerate(dates):
    result.update({each: {"posted_date": date}})
jsoned = json.dumps(result)
You don't need a separate "count" variable: enumerate() supplies the index, and json.dumps() writes the integer keys out as strings.

First create the map the way you want it:
outMap = {}
outMap["0"]={}
outMap["0"]["posted_date"]="25 Jun 2015"
outMap["1"]={}
outMap["1"]["posted_date"]="26 Jun 2015"
Then use json.dumps() to get the json
import json
outjson = json.dumps(outMap)
print(outjson)
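The loop-driven case from the question can be sketched as follows. This is a minimal sketch, assuming the dates are already collected in a list; the helper name build_nested is hypothetical and not part of either answer.

```python
import json


def build_nested(dates):
    """Map "0", "1", ... to {"posted_date": date}, in input order."""
    # JSON object keys are always strings, so the counter is converted explicitly.
    return {str(count): {"posted_date": date} for count, date in enumerate(dates)}


nested = build_nested(["25 Jun 2015", "26 Jun 2015"])
print(json.dumps(nested))
# {"0": {"posted_date": "25 Jun 2015"}, "1": {"posted_date": "26 Jun 2015"}}
```

Building a dict rather than a list of single-key dicts is what gives the nested object shape shown in the question.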

Related

python function to transform data to JSON

How do I convert the string below to a dictionary?
code.py
message = event['Records'][0]['Sns']['Message']
print(message)
# this gives the below and the type is <class 'str'>
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
}
}
I need to add an additional field, "status": 1, so that it looks like this:
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
},
"status": 1
}
What is the best way of doing this?
Update: I managed to get it working with ast.literal_eval(message), as below.
import ast
D2 = ast.literal_eval(message)
D2["status"] = 1
print(D2)
# This gives the following
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
},
"status": 1
}
Is there a better way to do this? I'm not sure, so I wanted to check.
Can I check how do we convert the below to a dictionary?
As far as I can tell, data = { } assigns a dictionary with that content to the variable data.
I would need to add an additional field called "status" : 1 such that it looks like this
A simple update should do the trick.
data.update({"status": 1})
I found two issues when trying to deserialise the string as JSON:
an invalid escape, \' (as in I\'m)
unescaped newlines
These can be worked around with
import json
import re
data = data.replace("\\'", "'")
# no flags are needed here; the fourth positional argument of re.sub is count, not flags
data = re.sub('\n\n"', '\\\\n\\\\n"', data)
d = json.loads(data)
There are also surrogate pairs in the data which may cause problems down the line. These can be fixed by doing
data = data.encode('utf-16', 'surrogatepass').decode('utf-16')
before calling json.loads.
Once the data has been deserialised to a dict you can insert the new key/value pair.
d['status'] = 1
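When the message is valid JSON, json.loads is a more direct route than ast.literal_eval, which parses Python literals and so rejects JSON-only tokens such as true, false and null. A minimal sketch, assuming a well-formed message; the shortened message string below is a hypothetical stand-in for the real SNS payload.

```python
import json

# Hypothetical, shortened stand-in for the SNS message string.
message = '{"id": 1408763311479345152, "language": "en", "counter_emoji": {}}'

payload = json.loads(message)   # str -> dict
payload["status"] = 1           # add the new field
updated = json.dumps(payload)   # dict -> str, if a string is needed again

print(type(payload).__name__)
print(updated)
```

If the real message contains the escaping problems described above, those still have to be repaired before json.loads will accept it.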

Nested Counter for json data

I have a JSON data as:
{
    "persons": [
        {
            "city": "Seattle",
            "name": "Brian",
            "dob": "19-03-1980"
        },
        {
            "city": "Amsterdam",
            "name": "David",
            "dob": "19-09-1979"
        },
        {
            "city": "London",
            "name": "Joe",
            "dob": "19-01-1980"
        },
        {
            "city": "Kathmandu",
            "name": "Brian",
            "dob": "19-03-1980"
        }
    ]
}
How can I count individual elements, such as the number of persons born in each month from Jan to Dec (0 if none were born) and in a given year, using Python in a single iteration? I also want the number of unique names registered in each month.
Like:
1980 :3
--Jan:1
--Mar:2
1979 :1
--Sep:1
Names:
Mar 1980: 1 #Brian is same for both cities
Jan 1980: 1
Sep 1979: 1
counters_mon is the counter that holds the values for specific months of a year:
for k_mon, v_mon in counters_mon.items():
    print('{}={}'.format(k_mon, v_mon))
But I want the details to be printed too. How can I achieve this?
import json
with open('/path/to/your/json', 'r') as f:
    persons = json.load(f)
years_months = {}        # year -> {'count': total, 'months': {month: count}}
years_months_names = {}  # 'MM YYYY' -> set of unique names
for person in persons['persons']:
    year = person['dob'][-4:]
    month = person['dob'][3:5]
    month_year = month + ' ' + year
    name = person['name']
    if year not in years_months.keys():
        years_months[year] = { 'count': 1, 'months' : {} }
        if month not in years_months[year]['months'].keys():
            years_months[year]['months'][month] = 1
        else:
            years_months[year]['months'][month] += 1
    else:
        years_months[year]['count'] += 1
        if month not in years_months[year]['months'].keys():
            years_months[year]['months'][month] = 1
        else:
            years_months[year]['months'][month] += 1
    if month_year not in years_months_names.keys():
        years_months_names[month_year] = set([name])
    else:
        years_months_names[month_year].add(name)
for k, v in years_months.items():
    print(k + ': ' + str(v['count']))
    for month, count in v['months'].items():
        print("-- " + str(month) + ": " + str(count))
for k, v in years_months_names.items():
    print(k + ": " + str(len(v)))
I'm assuming that you have the path to your JSON file. I tested my answer on the JSON that you posted; make sure that your JSON is structured correctly.
This is a good case for using defaultdicts (https://docs.python.org/3/library/collections.html#collections.defaultdict).
# assume you have your data in a variable called data
from collections import defaultdict
from calendar import month_abbr
# slightly strange construction here, but we want two levels of defaultdict followed by lists
aggregate = defaultdict(lambda: defaultdict(list))
# then the population is super simple - you'll end up with something like
# aggregate[year][month] = [name1, name2]
for person in data['persons']:
    day, month, year = map(int, person['dob'].split('-'))
    aggregate[year][month].append(person['name'])
# I'm sorting in chronological order for printing
for year, months in sorted(aggregate.items()):
    print('{}: {}'.format(year, sum(len(names) for names in months.values())))
    for month, names in sorted(months.items()):
        print('--{}: {}'.format(month_abbr[month], len(names)))
for year, months in sorted(aggregate.items()):
    for month, names in sorted(months.items()):
        print('{} {}: {}'.format(month_abbr[month], year, len(set(names))))
Depending on how the data was going to be used I'd actually consider not having the complex nesting in the aggregation and instead opt for something like aggregate[(year, month)] = [name1, name2,...]. I find that the more nested my data, the more confusing it is to work with.
EDIT Alternatively you can create several structures on the first pass so the printing step is simplified. Again, I'm using defaultdict to clean up all the provisioning.
agg_years = defaultdict(lambda: defaultdict(int))  # [year][month] = counter
agg_years_total = defaultdict(int)  # [year] = counter
agg_months_names = defaultdict(set)  # [(year, month)] = set(name1, name2...)
for person in data['persons']:
    day, month, year = map(int, person['dob'].split('-'))
    agg_years[year][month] += 1
    agg_years_total[year] += 1
    agg_months_names[(year, month)].add(person['name'])
for year, months in sorted(agg_years.items()):
    print('{}: {}'.format(year, agg_years_total[year]))
    for month, quant in sorted(months.items()):
        print('--{}: {}'.format(month_abbr[month], quant))
for (year, month), names in sorted(agg_months_names.items()):
    print('{} {}: {}'.format(month_abbr[month], year, len(names)))
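The question also asks for a 0 in months where nobody was born, which the versions above do not print. One possible way to cover that, sketched here with collections.Counter keyed by (year, month) instead of nested defaultdicts; the inline data dict is a copy of the sample from the question, and this is an alternative arrangement, not the answerer's code.

```python
from calendar import month_abbr
from collections import Counter, defaultdict

data = {"persons": [
    {"city": "Seattle", "name": "Brian", "dob": "19-03-1980"},
    {"city": "Amsterdam", "name": "David", "dob": "19-09-1979"},
    {"city": "London", "name": "Joe", "dob": "19-01-1980"},
    {"city": "Kathmandu", "name": "Brian", "dob": "19-03-1980"},
]}

births = Counter()         # (year, month) -> number of persons
names = defaultdict(set)   # (year, month) -> unique names

# Single pass over the records.
for person in data["persons"]:
    _day, month, year = map(int, person["dob"].split("-"))
    births[(year, month)] += 1
    names[(year, month)].add(person["name"])

for year in sorted({y for y, _ in births}, reverse=True):
    total = sum(n for (y, _), n in births.items() if y == year)
    print("{} :{}".format(year, total))
    for month in range(1, 13):
        # A Counter returns 0 for a missing key, so empty months need no special case.
        print("--{}:{}".format(month_abbr[month], births[(year, month)]))

print("Names:")
for (year, month), unique in sorted(names.items()):
    print("{} {}: {}".format(month_abbr[month], year, len(unique)))
```

Looking up a missing key in a Counter does not insert it, so the zero-filled printing loop leaves the aggregated data unchanged.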

How could i parse the html into dictionary in elegant way

I'm trying to parse HTML into a dictionary.
My current code has a lot of logic in it.
It smells bad. I use lxml to help me parse it.
Is there a recommended way to parse this kind of HTML, which does not have much of a well-formed DOM?
Thanks so much.
original html
<p><strong>Departs:</strong> 5:15:00AM, Sat, Nov 28, 2015 - Taipei</p>
<p><strong>Arrives:</strong> 8:00:00AM, Sat, Nov 28, 2015 - Bangkok - Don Mueang</p>
<p><strong>Flight duration:</strong> 3h 45m</p>
<p><strong>Operated by:</strong> NokScoot</p>
expected result
{
Departs: "5:15:00AM, Sat, Nov 28, 2015",
Arrives: "8:00:00AM, Sat, Nov 28, 2015",
Flight duration: "3h 45m"
...
}
current code (work in progress)
doc_root = html.document_fromstring(resp.text)
for ele in doc_root.xpath('//ul[@class="tb_body"]'):
    if has_stops(ele.xpath('.//li[@class="tb_body_flight"]//span[@class="has_cuspopup"]')):
        continue
    set_trace()
    from_city = ele.xpath('.//li[@class="tb_body_city"]')[0]
    set_trace()
    sub_ele = ele.xpath('.//li[@class="tb_body_flight"]//span[@class="has_cuspopup"]')
    set_trace()
I created an example for the HTML you provided. It uses the popular Beautiful Soup library.
from bs4 import BeautifulSoup
data = '<p><strong>Departs:</strong> 5:15:00AM, Sat, Nov 28, 2015 - Taipei</p>\
<p><strong>Arrives:</strong> 8:00:00AM, Sat, Nov 28, 2015 - Bangkok - Don Mueang</p>\
<p><strong>Flight duration:</strong> 3h 45m</p>\
<p><strong>Operated by:</strong> NokScoot</p>'
soup = BeautifulSoup(data, 'html.parser')
res = {p.contents[0].text: p.contents[1].split(' - ')[0].strip() for p in soup.find_all('p')}
print(res)
Output:
{
'Departs:': '5:15:00AM, Sat, Nov 28, 2015',
'Flight duration:': '3h 45m',
'Operated by:': 'NokScoot',
'Arrives:': '8:00:00AM, Sat, Nov 28, 2015'
}
I think you should avoid relying on attributes if you want to keep your code compact.
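If installing a third-party parser is not an option, the same label/value extraction can be sketched with the standard library's html.parser. This is only an illustration under the assumption that every row has the shape <p><strong>Label:</strong> value</p>; the class name LabelValueParser is hypothetical, and it also strips the trailing colon from each key so the result is closer to the expected output in the question.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect {label: value} from <p><strong>Label:</strong> value</p> rows."""

    def __init__(self):
        super().__init__()
        self.result = {}
        self._in_strong = False
        self._label = None

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self._in_strong = False
        elif tag == "p":
            self._label = None  # a label never carries over to the next row

    def handle_data(self, data):
        if self._in_strong:
            # Text inside <strong> is the key; drop the trailing colon.
            self._label = data.strip().rstrip(":")
        elif self._label is not None and data.strip():
            # Text after </strong> is the value; keep the part before " - ".
            self.result[self._label] = data.split(" - ")[0].strip()


sample = (
    "<p><strong>Departs:</strong> 5:15:00AM, Sat, Nov 28, 2015 - Taipei</p>"
    "<p><strong>Flight duration:</strong> 3h 45m</p>"
)
parser = LabelValueParser()
parser.feed(sample)
parser.close()
print(parser.result)
```

Splitting on " - " mirrors the Beautiful Soup answer and drops the city after the timestamp; rows without that separator are kept whole.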

JSON Parsing help in Python

I have the data below in JSON format. I started with the code below, which throws a KeyError.
I'm not sure how to get all the data listed in the headers section.
I know that json_obj['offers'][0]['pkg']['Info'] is not right, but I'm not sure how to do it correctly.
How can I get to the different nodes, such as Info, PricingInfo, flt_Info, etc.?
{
"offerInfo":{
"siteID":"1",
"language":"en_US",
"currency":"USD"
},
"offers":{
"pkg":[
{
"offerDateRange":{
"StartDate":[
2015,
11,
8
],
"EndDate":[
2015,
11,
14
]
},
"Info":{
"Id":"111"
},
"PricingInfo":{
"BaseRate":1932.6
},
"flt_Info":{
"Carrier":"AA"
}
}
]
}
}
import os
import json
import csv
f = open('api.csv','w')
writer = csv.writer(f,delimiter = '~')
headers = ['Id' , 'StartDate', 'EndDate', 'Id', 'BaseRate', 'Carrier']
default = ''
writer.writerow(headers)
string = open('data.json').read().decode('utf-8')
json_obj = json.loads(string)
for pkg in json_obj['offers'][0]['pkg']['Info']:
    row = []
    row.append(json_obj['id']) # just to test, but I need the column values listed in the header section
    writer.writerow(row)
It looks like you're accessing the JSON incorrectly. After accessing json_obj['offers'] you index it with [0], but there is no array there; json_obj['offers'] gives you another dictionary.
For example, to get PricingInfo like you asked, access like this:
json_obj['offers']['pkg'][0]['PricingInfo']
or 11 from the StartDate like this:
json_obj['offers']['pkg'][0]['offerDateRange']['StartDate'][1]
I believe you get the KeyError because you index the dictionary with [0], which is not one of its keys.
Try replacing this piece of code:
for pkg in json_obj['offers'][0]['pkg']['Info']:
    row = []
    row.append(json_obj['id']) # just to test, but I need the column values listed in the header section
    writer.writerow(row)
with this:
for pkg in json_obj['offers']['pkg']:
    row = []  # start a fresh row for every package
    row.append(pkg['Info']['Id'])
    year = pkg['offerDateRange']['StartDate'][0]
    month = pkg['offerDateRange']['StartDate'][1]
    day = pkg['offerDateRange']['StartDate'][2]
    StartDate = "%d-%d-%d" % (year, month, day)
    print(StartDate)
    row.append(StartDate)
    writer.writerow(row)
Try this:
import json
with open('data.json') as f:
    json_obj = json.load(f)
pkg = json_obj["offers"]["pkg"][0]  # first (and only) package in the sample
print(pkg["Info"]["Id"])
print('-'.join(str(part) for part in pkg["offerDateRange"]["StartDate"]))
print('-'.join(str(part) for part in pkg["offerDateRange"]["EndDate"]))
print(pkg["Info"]["Id"])
print(pkg["PricingInfo"]["BaseRate"])
print(pkg["flt_Info"]["Carrier"])
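Putting the access paths from the answers together, the CSV export the question is aiming for might look like the sketch below. It assumes the JSON shape shown in the question; the helper pkg_to_row is hypothetical, the duplicated 'Id' header column is dropped, and an in-memory buffer stands in for the api.csv file.

```python
import csv
import io
import json

# Trimmed copy of the sample data from the question.
raw = '''{"offers": {"pkg": [{
    "offerDateRange": {"StartDate": [2015, 11, 8], "EndDate": [2015, 11, 14]},
    "Info": {"Id": "111"},
    "PricingInfo": {"BaseRate": 1932.6},
    "flt_Info": {"Carrier": "AA"}
}]}}'''


def pkg_to_row(pkg):
    """Flatten one "pkg" entry into the column order used by the header."""
    dates = pkg["offerDateRange"]
    return [
        pkg["Info"]["Id"],
        "-".join(str(part) for part in dates["StartDate"]),
        "-".join(str(part) for part in dates["EndDate"]),
        pkg["PricingInfo"]["BaseRate"],
        pkg["flt_Info"]["Carrier"],
    ]


json_obj = json.loads(raw)
buffer = io.StringIO()  # stands in for open('api.csv', 'w', newline='')
writer = csv.writer(buffer, delimiter="~")
writer.writerow(["Id", "StartDate", "EndDate", "BaseRate", "Carrier"])
for pkg in json_obj["offers"]["pkg"]:  # "offers" is a dict, "pkg" is the list
    writer.writerow(pkg_to_row(pkg))

print(buffer.getvalue())
```

Iterating over json_obj["offers"]["pkg"] rather than indexing [0] keeps the loop correct when the response contains more than one package.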

parse JSON values by multilevel keys

I started learning Python yesterday, and now I want to parse some JSON values. I have read many tutorials and spent a lot of time trying to get values by multilevel key (if I can call it that) in my script, but nothing works for me. Can you help me, please?
This is my JSON output:
{
"future.arte.tv": [
{
"mediaUrl": "http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR",
"micropost": {
"html": "Berlin ",
"plainText": "Berlin"
},
"micropostUrl": "http://future.arte.tv/de/der-erste-weltkrieg-die-rolle-von-wissenschaft-und-technik",
"publicationDate": "Tue Jun 17 20:31:33 CEST 2014",
"relevance": 5.9615083,
"timestamp": 1403029893606,
"type": "image"
}
],
"www.zdf.de": [
{
"mediaUrl": "http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025",
"micropost": {
"plainText": "Berlin direkt"
},
"micropostUrl": "http://www.zdf.de/ZDFmediathek/hauptnavigation/sendung-a-bis-z",
"publicationDate": "Tue Jun 10 16:25:42 CEST 2014",
"relevance": 3.7259426,
"timestamp": 1402410342400,
"type": "image"
}
]
}
I need to get the values stored under the "mediaUrl" key, so I tried:
j = json.loads(jsonOutput)
keys = j.keys()
for key in keys:
    print key # keys are future.arte.tv and www.zdf.de
    print j[key]["mediaUrl"]
but print j[key]["mediaUrl"] causes this error:
TypeError: list indices must be integers, not str
So I tried print j[key][0], but the result is not what I wanted (I want just the mediaUrl value; by the way, j[key][1] causes a list index out of range error):
{u'micropostUrl': u'http://www.berlin.de/special/gesundheit-und-beauty/ernaehrung/1692726-215-spargelhoefe-in-brandenburg.html', u'mediaUrl': u'http://berlin.de/binaries/asset/image_assets/42859/ratio_4_3/1371638570/170x130/', u'timestamp': 1403862143675, u'micropost': {u'plainText': u'Spargel', u'html': u'Spargel '}, u'publicationDate': u'Fri Jun 27 11:42:23 CEST 2014', u'relevance': 1.6377668, u'type': u'image'}
Can you give me some advice please?
Here is a list comprehension that should do it
>>> [d[i][0].get('mediaUrl') for i in d.keys()]
['http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025',
'http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR']
How it works
First you can get a list of the top-level keys
>>> d.keys()
['www.zdf.de', 'future.arte.tv']
Get the corresponding values
>>> [d[i] for i in d.keys()]
[[{'micropostUrl': 'http://www.zdf.de/ZDFmediathek/hauptnavigation/sendung-a-bis-z', 'mediaUrl': 'http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025', 'timestamp': 1402410342400L, 'micropost': {'plainText': 'Berlin direkt'}, 'publicationDate': 'Tue Jun 10 16:25:42 CEST 2014', 'relevance': 3.7259426, 'type': 'image'}], [{'micropostUrl': 'http://future.arte.tv/de/der-erste-weltkrieg-die-rolle-von-wissenschaft-und-technik', 'mediaUrl': 'http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR', 'timestamp': 1403029893606L, 'micropost': {'plainText': 'Berlin', 'html': 'Berlin '}, 'publicationDate': 'Tue Jun 17 20:31:33 CEST 2014', 'relevance': 5.9615083, 'type': 'image'}]]
For each dictionary, grab the value for the 'mediaUrl' key
>>> [d[i][0].get('mediaUrl') for i in d.keys()]
['http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025',
'http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR']
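The comprehension above reads only the first entry of each list. If a site can carry several entries, a nested loop covers them all. A minimal sketch, assuming the same shape as the question's data; the helper media_urls and the shortened URLs in the sample are hypothetical.

```python
import json

# Hypothetical, shortened sample with the same shape as the question's data.
json_output = '''{
  "future.arte.tv": [{"mediaUrl": "http://future.arte.tv/a.jpg", "type": "image"}],
  "www.zdf.de": [{"mediaUrl": "http://www.zdf.de/b"}, {"type": "image"}]
}'''


def media_urls(parsed):
    """Return every mediaUrl found in the per-site lists, skipping entries without one."""
    urls = []
    for items in parsed.values():      # each top-level value is a list of dicts
        for item in items:
            url = item.get("mediaUrl")  # None when the key is absent
            if url is not None:
                urls.append(url)
    return urls


urls = media_urls(json.loads(json_output))
print(urls)
```

Using .get() instead of item["mediaUrl"] avoids a KeyError for entries that have no media URL.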
