Python: Flatten and Parse certain sections of JSON - python

I have an input JSON that looks like this:
{"payment": {"payment_id": "AA340", "payment_amt": "20", "chk_nr": "321749", "clm_list": {"dtl": [{"clm_id": "1A2345", "name": "John", "adj": {"adj_id": "W123", "adj_cd": "45"}}, {"clm_id": "9999", "name": "Dilton", "adj": {"adj_id": "X123", "adj_cd": "5"}}]}}}
I need the output to look like this:
{"clm_id": "1A2345", "adj": {"adj_id": "W123"}, "payment_amt": "20", "chk_nr": "321749"}
{"clm_id": "9999", "adj": {"adj_id": "X123"}, "payment_amt": "20", "chk_nr": "321749"}
So the code takes in one JSON document, parses the claim array and normalizes it by adding the payment info to each claim. The nested JSON is pruned as well.
I'm able to parse the data, but I'm unsure how to normalize only certain sections of it.
The code below will parse the data, but NOT normalize it:
import json
keep = ["payment", "payment_id", "payment_amt", "clm_list", "dtl", "clm_id", "adj", "adj_id"]
old_dict = {"payment": {"payment_id": "AA340", "payment_amt": "20", "chk_nr": "321749", "clm_list": {"dtl": [{"clm_id": "1A2345", "name": "John", "adj": {"adj_id": "W123", "adj_cd": "45"}}, {"clm_id": "9999", "name": "Dilton", "adj": {"adj_id": "X123", "adj_cd": "5"}}]}}}
def recursively_prune_dict_keys(obj, keep):
    # Keep only the whitelisted keys, recursing into nested dicts and lists.
    if isinstance(obj, dict):
        return dict([(k, recursively_prune_dict_keys(v, keep)) for k, v in obj.items() if k in keep])
    elif isinstance(obj, list):
        return [recursively_prune_dict_keys(item, keep) for item in obj]
    else:
        return obj
new_dict = recursively_prune_dict_keys(old_dict, keep)
conv_json = new_dict["payment"]
print(json.dumps(conv_json))

The neatest way may be to simply pick through the data. Prune first, then copy the payment fields onto each pruned claim (they are read from old_dict because chk_nr is not in keep):
new_dict = recursively_prune_dict_keys(old_dict, keep)
payment = old_dict['payment']
claims = new_dict['payment']['clm_list']['dtl']
for claim in claims:
    claim['payment_amt'] = payment['payment_amt']
    claim['chk_nr'] = payment['chk_nr']
print(json.dumps(claims))
This will yield:
[{"chk_nr": "321749", "clm_id": "1A2345", "payment_amt": "20", "adj": {"adj_id": "W123"}}, {"chk_nr": "321749", "clm_id": "9999", "payment_amt": "20", "adj": {"adj_id": "X123"}}]
This contains the output you asked for, but not exactly as you may want to see it.
First, your desired output isn't valid JSON without the square brackets [] that would make it a list. But we can get rid of them by dumping each claim individually:
new_dict = recursively_prune_dict_keys(old_dict, keep)
payment = old_dict['payment']
claims = new_dict['payment']['clm_list']['dtl']
for claim in claims:
    claim['payment_amt'] = payment['payment_amt']
    claim['chk_nr'] = payment['chk_nr']
    print(json.dumps(claim))
This gives (on Python 3.7+, where dicts keep insertion order):
{"clm_id": "1A2345", "adj": {"adj_id": "W123"}, "payment_amt": "20", "chk_nr": "321749"}
{"clm_id": "9999", "adj": {"adj_id": "X123"}, "payment_amt": "20", "chk_nr": "321749"}
This matches your desired output. On older Python versions the key order may differ, because dicts only preserve insertion order from Python 3.7 onward. If the ordering is important there, you will want to read through How to Sort Python Dictionaries by Key or Value.
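If mutating the parsed data is a concern, the same normalization can be sketched by building a fresh record per claim. This is only a minimal sketch under assumptions: the function name normalize_claims and its payment_fields parameter are made up for illustration, and the fields kept per claim are hard-coded to match the desired output in the question.

```python
import json

doc = {"payment": {"payment_id": "AA340", "payment_amt": "20", "chk_nr": "321749",
                   "clm_list": {"dtl": [
                       {"clm_id": "1A2345", "name": "John",
                        "adj": {"adj_id": "W123", "adj_cd": "45"}},
                       {"clm_id": "9999", "name": "Dilton",
                        "adj": {"adj_id": "X123", "adj_cd": "5"}}]}}}


def normalize_claims(doc, payment_fields=("payment_amt", "chk_nr")):
    """Yield one new record per claim with the shared payment fields copied in.

    Hypothetical helper: the input document is left untouched.
    """
    payment = doc["payment"]
    shared = {field: payment[field] for field in payment_fields}
    for claim in payment["clm_list"]["dtl"]:
        # Pick only the claim fields wanted in the output.
        record = {
            "clm_id": claim["clm_id"],
            "adj": {"adj_id": claim["adj"]["adj_id"]},
        }
        record.update(shared)
        yield record


records = list(normalize_claims(doc))
for record in records:
    print(json.dumps(record))
```

Because each record is a new dict, the original claims keep their name and adj_cd fields and gain nothing.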

Related

How to convert time from list of dictionaries to date time format

I need to convert the time in a list of dictionaries, which I built from a JSON Lines file, to datetime format.
The JSON file looks like this:
{"name": "Thomas", "time_created": 1665070563, "gender": null}
{"name": "Lisa", "time_created": 1665226717, "gender": "female", "age": 59}
{"name": "James", "time_created": 1664913997, "gender": "male", "last_name": "Rogers"}
{"name": "Helen", "time_created": 1664651357, "gender": "female", "last_name": "Scott"}
{"name": "Nora", "time_created": 1664689732, "gender": "female", "age": null}
I tried to write this code:
import jsonlines
import datetime
with jsonlines.open('data.jsonl', 'r') as jsonl_f:
    lst = [obj for obj in jsonl_f]
for value_man in lst:
    for value in value_man.keys():
        value['time_created'] = datetime.datetime.fromtimestamp(value['time_created'])
print(lst)
but I get an error here:
    value['time_created'] = str(datetime.datetime.fromtimestamp(value['time_created']))
TypeError: string indices must be integers
You don't need to iterate over all the keys in each dict; the keys are strings, which is why the extra level of iteration is giving you a string instead of the actual dict. Just do:
for value in lst:
    value['time_created'] = datetime.datetime.fromtimestamp(value['time_created'])
You can also do this:
lst = [obj for obj in jsonl_f]
more simply as:
lst = list(jsonl_f)
You are accessing the key of the dictionary, not the value. You have to actually access the value, something along these lines:
import jsonlines
import datetime
with jsonlines.open('data.jsonl', 'r') as jsonl_f:
    lst = [obj for obj in jsonl_f]
for value_man in lst:
    for value in value_man.keys():
        if value == 'time_created':
            value_man[value] = datetime.datetime.fromtimestamp(value_man[value])
print(lst)
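As a self-contained illustration of the fix, here is a minimal sketch that uses only the standard library. It assumes the records are already available as JSON Lines strings (two sample lines from the question are inlined instead of reading data.jsonl), and it passes an explicit UTC timezone, which the original code does not do; without tz, fromtimestamp returns local time.

```python
import json
from datetime import datetime, timezone

# Two sample lines from the question; in practice they would be read from the file.
raw_lines = [
    '{"name": "Thomas", "time_created": 1665070563, "gender": null}',
    '{"name": "Lisa", "time_created": 1665226717, "gender": "female", "age": 59}',
]

records = [json.loads(line) for line in raw_lines]
for record in records:
    # Each record is a dict, so index it directly by key; no loop over keys is needed.
    record["time_created"] = datetime.fromtimestamp(record["time_created"], tz=timezone.utc)

print(records[0]["time_created"].isoformat())
```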

python getting json values from list

I have some json data similar to this...
{
"people": [
{
"name": "billy",
"age": "12"
...
...
},
{
"name": "karl",
"age": "31"
...
...
},
...
...
]
}
At the moment I can do this to get an entry from the people list:
wantedPerson = "karl"
for person in people:
    if person['name'] == wantedPerson:
        # I have the person's entry here
        break
Is there a better way of doing this? Something similar to how we can use .get('key')?
Thanks,
Chris
Assuming you load that JSON data using the standard library, you're fairly close to optimal. Perhaps you were looking for something like this:
from json import loads
text = '{"people": [{"name": "billy", "age": "12"}, {"name": "karl", "age": "31"}]}'
data = loads(text)
people = [p for p in data['people'] if p['name'] == 'karl']
If you frequently need to access this data, you might just do something like this:
all_people = {p['name']: p for p in data['people']}
print(all_people['karl'])
That is, all_people becomes a dictionary that uses the name as a key, so you can access any person in it quickly by accessing them by name. This assumes however that there are no duplicate names in your data.
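If duplicate names can occur, a dictionary keyed by name keeps only the last entry for each name. A minimal sketch of one way around that is to group the entries into lists. The second "karl" entry below is made-up data for illustration, and people_by_name is a hypothetical name.

```python
import json
from collections import defaultdict

# The second "karl" entry is hypothetical, added to show the duplicate case.
text = ('{"people": [{"name": "billy", "age": "12"}, '
        '{"name": "karl", "age": "31"}, {"name": "karl", "age": "40"}]}')
data = json.loads(text)

# Map each name to the list of all people carrying it.
people_by_name = defaultdict(list)
for person in data["people"]:
    people_by_name[person["name"]].append(person)

print(people_by_name["karl"])
```

Building the index is still one pass over the list, so this only pays off when the lookup is repeated.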
First, there's no problem with your current 'naive' approach - it's clear and efficient since you can't find the value you're looking for without scanning the list.
It seems that you refer to better as shorter, so if you want a one-liner solution, consider the following:
next((person for person in people if person['name'] == wantedPerson), None)
It gets the first person in the list that has the required name or None if no such person was found.
Similarly:
ps = {
    "people": [
        {
            "name": "billy",
            "age": "12"
        },
        {
            "name": "karl",
            "age": "31"
        },
    ]
}
print([x for x in ps['people'] if 'karl' in x.values()])
For possible alternatives or details see e.g. Get key by value in dictionary
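The next(...) one-liner from the answers above can be wrapped in a small helper to make the "first match or None" behaviour explicit. This is only a sketch; find_person is a made-up name, and it uses dict.get so that entries without a "name" key are skipped rather than raising KeyError.

```python
people = [
    {"name": "billy", "age": "12"},
    {"name": "karl", "age": "31"},
]


def find_person(people, wanted_name):
    """Return the first dict whose "name" matches, or None if there is none."""
    return next((p for p in people if p.get("name") == wanted_name), None)


print(find_person(people, "karl"))
print(find_person(people, "nobody"))
```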

Passing Jupyter Widget Dropdown list to second dropdown

I've got an issue with using Jupyter widgets, namely dropdowns, to produce a workflow. Currently my approach isn't working. I am aiming to do the following:
1. Run a function that produces a list.
2. Feed this list into a dropdown, from which I select one value (x).
3. Pass x to another function that has a dictionary; it picks up all values associated with this key and produces another list.
4. Feed that list into another dropdown, from which I pick one value for processing.
The issue I am running into is that I can get the first list produced and fed into a dropdown. However, the subsequent list is not captured; the function is captured instead, which of course fails down the road. Let me illustrate with some code:
This bit of code simply goes through a list of dictionaries, and places all the unique league instances into a list:
import json
import os
def league_names():
    league_list = []
    data_filenames = [data_file for data_file in os.listdir()
                      if data_file.endswith('.json')]
    with open(data_filenames[0]) as json_file:
        data = json.load(json_file)
    for x in data:
        if x['Competition'] is not None and x['Competition'] not in league_list:
            league_list.append(x['Competition'])
    return league_list[1:]
What the following will then do, is take that list, and search the same set of dictionaries, search for all the teams that are a part of that league, and add them to a list.
def team_names(league_select):
    team_list = []
    data_filenames = [data_file for data_file in os.listdir()
                      if data_file.endswith('.json')]
    with open(data_filenames[0]) as json_file:
        data = json.load(json_file)
    for x in data:
        if x['Competition'] == league_select and x['Team'] not in team_list:
            team_list.append(x['Team'])
    return team_list
How I want to interact with this, is that the first league list is passed to a dropdown, from which you pick a league. This passes the league to the second function, to pull all the teams. How this is done is with the following:
def league_interact():
    choice = interact(team_names, league_select=league_names())
    return type(choice)
league_interact()
Now this works, the list is successfully passed through, however what I simply cannot get to work, is for the interact from here to be transformed into a variable, that I can then pass to a subsequent function for further processing.
Below is an example of the json content:
[{"Team": "Yeovil Town FC", "Gender": "M", "Competition": "National League", "Earliest Season": "2003-2004", "Latest Season": "2020-2021", "Total Seasons": "18", "Championships": "1", "Other Names": "", "Code": "bd5179b9", "Prefix": "Yeovil-Town-Stats"},
{"Team": "Yeovil Town LFC", "Gender": "F", "Competition": "", "Earliest Season": "2017", "Latest Season": "2018-2019", "Total Seasons": "3", "Championships": "", "Other Names": "", "Code": "a506e4a2", "Prefix": "Yeovil-Town-Women-Stats"},
{"Team": "York City FC", "Gender": "M", "Competition": "", "Earliest Season": "2002-2003", "Latest Season": "2019-2020", "Total Seasons": "13", "Championships": "0", "Other Names": "", "Code": "e272e7a8", "Prefix": "York-City-Stats"},
{"Team": "Yorkshire Amateur AFC", "Gender": "M", "Competition": "", "Earliest Season": "2019-2020", "Latest Season": "2020-2021", "Total Seasons": "0", "Championships": "", "Other Names": "", "Code": "66379800", "Prefix": "Yorkshire-Amateur-AFC-Stats"}]
Question: How would I in the above case, use interact to produce the list created by the first choice, rather than a function? I have the type pulled here, where it is a 'function' rather than a list as expected. I tried using .value, and some derivatives, but none of them pushed out a value. Any idea how to approach this, so I can produce a secondary dropdown?
I've tried the following, but getting an error:
def league_interact():
    choice = interact(team_names, league_select=league_names())
    return choice
def team_interact():
    choice2 = interact(team_code, team_select=league_interact())
team_interact()
Error: ValueError: <function team_names at 0x0000021359D20B80> cannot be transformed to a widget
Thanks! I did trawl through the documentation, but how to approach this didn't quite click with me.
OK, so I actually managed to figure this out (league_box, team_box and season_box are widgets defined elsewhere):
@interact(league=league_box)
def choose_both(league):
    team_box.options = team_names(league_box.value)
    return
@interact_manual(team=team_box, use_season=season_box)
def choose_team(team, use_season):
    return team_choice_cap(team_data(team), use_season)
def team_choice_cap(data_set, use_season):
    code = data_set['Code']
    prefix = data_set['Prefix']
    return parse_seasons(code, prefix, use_season)
The interact decorator above refreshes the options of the second dropdown whenever a league is picked, and interact_manual then pulls up the rest of the details on a manual call.
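Independently of the widget wiring, the two lookup functions in the question re-open the JSON file on every call. A minimal sketch of a variant that works on already-loaded data is shown here, which makes it cheap to call from a dropdown callback. The sample rows are trimmed from the question's JSON, with the Competition of the last row changed for illustration. Skipping empty competition names is an assumption about what league_list[1:] was meant to achieve.

```python
# Rows trimmed from the question's sample; the last Competition value is made up.
data = [
    {"Team": "Yeovil Town FC", "Competition": "National League"},
    {"Team": "Yeovil Town LFC", "Competition": ""},
    {"Team": "York City FC", "Competition": "National League"},
    {"Team": "Yorkshire Amateur AFC", "Competition": "Northern League"},
]


def league_names(data):
    """Return the unique, non-empty competition names in first-seen order."""
    leagues = []
    for row in data:
        league = row.get("Competition")
        if league and league not in leagues:
            leagues.append(league)
    return leagues


def team_names(data, league_select):
    """Return the unique teams that play in the selected league."""
    teams = []
    for row in data:
        if row.get("Competition") == league_select and row["Team"] not in teams:
            teams.append(row["Team"])
    return teams


print(league_names(data))
print(team_names(data, "National League"))
```

With this shape, the list returned by team_names(data, selected_league) can be assigned to the options of the second dropdown, as the accepted approach does with team_box.options.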

Dynamic var origination from JSON object

this is going to be a kinky one... well it is for me as I've been trying to nail it for a week with no success so far :(
Let's say I get a nested JSON response from an API hit as:
{"Parameters": {
    "Name": {
        "Unparsed": null,
        "First": "John",
        "Middle": "A",
        "Last": "Smith",
        "Suffix": "Jr"
    },
    "Address": {
        "Unparsed": null,
        "Line1": "123 Main St",
        "Line2": "apt.2",
        "City": "New York",
        "State": "NY",
        "Zip": "12345"
    }
}}
and I want to create variables dynamically from the keys and assign each one the key's value.
I know how to do it with something like name_first = data.get("Name").get("First"), but that way I am highly dependent on the JSON response structure, and the above won't work if the structure changes (renamed, added or deleted keys).
So I am working on a Python script to do it, but so far I have had no luck getting this nailed.
thanks!
You might use locals().update to update the current variables. This snippet creates new variables such as Address_Line2, Name_Suffix, etc.:
from collections import deque
import json
st = deque()
st.append(([], json.loads(your_json)['Parameters']))
while len(st):
    prefix, item = st.pop()
    if isinstance(item, dict):
        # Descend into nested objects, extending the key path.
        for k, v in item.items():
            st.append((prefix + [k], v))
    else:
        print({'_'.join(prefix): item})
        # Only creates real variables at module level, where locals() is the module namespace.
        locals().update({'_'.join(prefix): item})
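Since locals().update has no effect on real local variables inside a function, a common alternative is to flatten the response into an ordinary dictionary and look the values up by their joined key. The following is a minimal sketch of that idea, not the answer's method; flatten is a hypothetical helper and the response string is a shortened version of the JSON in the question.

```python
import json


def flatten(obj, prefix=()):
    """Flatten nested dicts into one dict whose keys are joined with "_"."""
    flat = {}
    for key, value in obj.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key path along.
            flat.update(flatten(value, path))
        else:
            flat["_".join(path)] = value
    return flat


# Shortened version of the response shown in the question.
response = ('{"Parameters": {"Name": {"Unparsed": null, "First": "John", "Last": "Smith"}, '
            '"Address": {"City": "New York", "Zip": "12345"}}}')
params = flatten(json.loads(response)["Parameters"])

print(params["Name_First"])
print(params.get("Name_Nickname"))
```

Using params.get(...) means a renamed or missing key yields None instead of a NameError, which addresses the concern about the response structure changing.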

SimpleJson handling of same named entities

I'm using the Alchemy API in App Engine, so I'm using the simplejson library to parse responses. The problem is that the responses have entries that have the same name:
{
"status": "OK",
"usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
"url": "",
"language": "english",
"entities": [
{
"type": "Person",
"relevance": "0.33",
"count": "1",
"text": "Michael Jordan",
"disambiguated": {
"name": "Michael Jordan",
"subType": "Athlete",
"subType": "AwardWinner",
"subType": "BasketballPlayer",
"subType": "HallOfFameInductee",
"subType": "OlympicAthlete",
"subType": "SportsLeagueAwardWinner",
"subType": "FilmActor",
"subType": "TVActor",
"dbpedia": "http://dbpedia.org/resource/Michael_Jordan",
"freebase": "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000029161",
"umbel": "http://umbel.org/umbel/ne/wikipedia/Michael_Jordan",
"opencyc": "http://sw.opencyc.org/concept/Mx4rvViVq5wpEbGdrcN5Y29ycA",
"yago": "http://mpii.de/yago/resource/Michael_Jordan"
}
}
]
}
So the problem is that "subType" is repeated, so the dict that loads returns contains just "TVActor" rather than a list. Is there any way around this?
The rfc 4627 that defines application/json says:
An object is an unordered collection of zero or more name/value pairs
And:
The names within an object SHOULD be unique.
It means that AlchemyAPI should not return multiple "subType" names inside the same object and claim that it is JSON.
You could try to request the same data in XML format (outputMode=xml) to avoid ambiguity in the results, or convert the values of duplicate keys into lists:
import simplejson as json
from collections import defaultdict
def multidict(ordered_pairs):
    """Convert duplicate keys values to lists."""
    # read all values into lists
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    # unpack lists that have only 1 item
    for k, v in d.items():
        if len(v) == 1:
            d[k] = v[0]
    return dict(d)
# text is the JSON string defined in the example below
print(json.JSONDecoder(object_pairs_hook=multidict).decode(text))
Example
text = """{
"type": "Person",
"subType": "Athlete",
"subType": "AwardWinner"
}"""
Output
{u'subType': [u'Athlete', u'AwardWinner'], u'type': u'Person'}
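The same hook also works with the standard-library json module, which accepts object_pairs_hook in json.loads. The sketch below contrasts the default behaviour (the last duplicate wins) with a hook that collects duplicates into lists. The name collect_duplicates is made up for illustration, and the logic mirrors the answer above rather than adding anything new.

```python
import json
from collections import defaultdict


def collect_duplicates(ordered_pairs):
    """Group values of repeated keys into lists; keep single values as they are."""
    grouped = defaultdict(list)
    for key, value in ordered_pairs:
        grouped[key].append(value)
    return {key: (values[0] if len(values) == 1 else values)
            for key, values in grouped.items()}


text = '{"type": "Person", "subType": "Athlete", "subType": "AwardWinner"}'

plain = json.loads(text)  # default: the last duplicate overwrites earlier ones
merged = json.loads(text, object_pairs_hook=collect_duplicates)

print(plain)
print(merged)
```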
RFC 4627 for the application/json media type recommends unique keys, but it doesn't explicitly forbid duplicates:
The names within an object SHOULD be unique.
From rfc 2119:
SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
This is a known problem.
You can solve it by renaming the duplicate keys or by saving their values into an array.
You can use this code if you want.
import json
def parse_object_pairs(pairs):
    """
    Take a list of (key, value) tuples and check it for duplicate keys.
    If there are duplicates, return the pairs list itself;
    otherwise return a dict built from the pairs.
    >>> parse_object_pairs([("color", "red"), ("size", 3)])
    {'color': 'red', 'size': 3}
    >>> parse_object_pairs([("color", "red"), ("size", 3), ("color", "blue")])
    [('color', 'red'), ('size', 3), ('color', 'blue')]
    :param pairs: list of tuples.
    :return: dict or list that contains the pairs.
    """
    dict_without_duplicate = dict()
    for k, v in pairs:
        if k in dict_without_duplicate:
            return pairs
        else:
            dict_without_duplicate[k] = v
    return dict_without_duplicate
decoder = json.JSONDecoder(object_pairs_hook=parse_object_pairs)
str_json_can_be_with_duplicate_keys = '{"color": "red", "size": 3, "color": "red"}'
data_after_decode = decoder.decode(str_json_can_be_with_duplicate_keys)
