I started by converting XML to JSON.
I have a nested value whose index I would like to find in order to retrieve its sibling dictionary. These elements are under the same parent Location. The JSON below is abridged; there are many more Location dictionaries.
For example, given a value of New York at:
Location > LocationMetaData > LocationName
I'd like to retrieve:
Location > LocationData > TimeData > Period > DateTimeTo and
Location > LocationData > TimeData > Element > ElementIndex
I'm stuck here, trying something along the lines of:
json_d['Root']['Location'][0]['LocationMetaData']['LocationName']
JSON
{
"Root":{
"Origin":{
"Firstname":"Guy",
"Lastname":"Man"
},
"Identification":{
"Title":"The Title",
"DateTime":"2022-05-26 09:00"
},
"Location":[
{
"LocationMetaData":{
"LocationId":"192",
"LocationName":"New York"
},
"LocationData":{
"TimeData":[
{
"Period":{
"DateTimeFrom":"2022-05-26 12:00",
"DateTimeTo":"2022-05-26 13:00"
},
"Element":{
"ElementValue":"V",
"ElementIndex":"9"
}
},
{
"Period":{
"DateTimeFrom":"2022-05-26 13:00",
"DateTimeTo":"2022-05-26 14:00"
},
"Element":{
"ElementValue":"V",
"ElementIndex":"8"
}
},
{
"Period":{
"DateTimeFrom":"2022-05-26 14:00",
"DateTimeTo":"2022-05-26 15:00"
},
"Element":{
"ElementValue":"H",
"ElementIndex":"6"
}
}
]
}
}
],
"Location":[
{
"LocationMetaData":{
"LocationId":"168",
"LocationName":"Chicago"
},
"LocationData":{
"TimeData":[
{
"Period":{
"DateTimeFrom":"2022-05-26 12:00",
"DateTimeTo":"2022-05-26 13:00"
},
"Element":{
"ElementValue":"V",
"ElementIndex":"9"
}
},
{
"Period":{
"DateTimeFrom":"2022-05-26 13:00",
"DateTimeTo":"2022-05-26 14:00"
},
"Element":{
"ElementValue":"V",
"ElementIndex":"8"
}
},
{
"Period":{
"DateTimeFrom":"2022-05-26 14:00",
"DateTimeTo":"2022-05-26 15:00"
},
"Element":{
"ElementValue":"H",
"ElementIndex":"6"
}
}
]
}
}
]
}
}
Python
#!/usr/bin/env python3
import xmltodict
import json
import requests
url = "https://example.com/file.xml"
xml = requests.get(url)
json_s = json.dumps(xmltodict.parse(xml.content), indent=2)
#print(json_s)
# dict
json_d = json.loads(json_s)
# need to find index by providing value of LocationName
json_d = json_d['Root']['Location'][0]
city = json_d['LocationMetaData']['LocationName']  # new york
count = 0
for key in json_d['LocationData']['TimeData']:
    timeto = key['Period']['DateTimeTo']  # DateTimeTo sits under 'Period', not 'Root'
    indx = key['Element']['ElementIndex']
    level = key['Element']['ElementValue']
    level = {
        'L': 'Low',
        'M': 'Medium',
        'H': 'High',
        'V': 'Very High',
        'E': 'Extreme'}[level]
    print(f'{timeto}\t{indx} ({level})')
    count += 1
    if count == 15: break
If json_d is your parsed JSON file, you can try:
for location in json_d["Root"]["Location"]:
    city = location["LocationMetaData"]["LocationName"]
    date_time_to = []
    element_idx = []
    element_values = []
    for td in location["LocationData"]["TimeData"]:
        date_time_to.append(td["Period"]["DateTimeTo"])
        element_idx.append(td["Element"]["ElementIndex"])
        element_values.append(td["Element"]["ElementValue"])
    print(city)
    print("-" * 80)
    for t, e, v in zip(date_time_to, element_idx, element_values):
        print(f"{t} - {e} - {v}")
    print()
Prints:
Chicago
--------------------------------------------------------------------------------
2022-05-26 13:00 - 9 - V
2022-05-26 14:00 - 8 - V
2022-05-26 15:00 - 6 - H
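The loop above prints every location, whereas the question asks for a lookup by LocationName. A minimal sketch of such a lookup follows; the helper names find_location and readings and the small sample dictionary are hypothetical, not part of the original post. It also assumes the input may come from xmltodict, which returns a dict rather than a list when an element occurs only once.

```python
def find_location(root, name):
    """Return the first Location dict whose LocationName equals name, else None."""
    locations = root["Root"]["Location"]
    # xmltodict gives a dict (not a list) when there is a single <Location>
    if isinstance(locations, dict):
        locations = [locations]
    for location in locations:
        if location["LocationMetaData"]["LocationName"] == name:
            return location
    return None


def readings(location):
    """Return (DateTimeTo, ElementIndex) pairs for one Location."""
    time_data = location["LocationData"]["TimeData"]
    if isinstance(time_data, dict):
        time_data = [time_data]
    return [(td["Period"]["DateTimeTo"], td["Element"]["ElementIndex"])
            for td in time_data]


# Hypothetical, shortened stand-in for the parsed document
sample = {
    "Root": {
        "Location": [
            {
                "LocationMetaData": {"LocationId": "192", "LocationName": "New York"},
                "LocationData": {
                    "TimeData": [
                        {"Period": {"DateTimeTo": "2022-05-26 13:00"},
                         "Element": {"ElementValue": "V", "ElementIndex": "9"}},
                        {"Period": {"DateTimeTo": "2022-05-26 14:00"},
                         "Element": {"ElementValue": "V", "ElementIndex": "8"}},
                    ]
                },
            },
            {
                "LocationMetaData": {"LocationId": "168", "LocationName": "Chicago"},
                "LocationData": {"TimeData": []},
            },
        ]
    }
}

match = find_location(sample, "New York")
pairs = readings(match) if match is not None else []
print(pairs)
```

Returning the matching dictionary itself, rather than its index, avoids a second lookup; if the index is really needed, enumerate() over the same list would provide it.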
My dataframe is
fname lname city state code
Alice Lee Athens Alabama PXY
Nor Xi Mesa Arizona ABC
The output of json should be
{
"Employees":{
"Alice Lee":{
"code":"PXY",
"Address":"Athens, Alabama"
},
"Nor Xi":{
"code":"ABC",
"Address":"Mesa, Arizona"
}
}
}
df.to_json() gives no hierarchy to the JSON. Can you please suggest what I am missing? Is there a way to combine columns and give them a key name while writing JSON in pandas?
Thank you.
Try:
names = df[["fname", "lname"]].apply(" ".join, axis=1)
addresses = df[["city", "state"]].apply(", ".join, axis=1)
codes = df["code"]
out = {"Employees": {}}
for n, a, c in zip(names, addresses, codes):
    out["Employees"][n] = {"code": c, "Address": a}
print(out)
Prints:
{
"Employees": {
"Alice Lee": {"code": "PXY", "Address": "Athens, Alabama"},
"Nor Xi": {"code": "ABC", "Address": "Mesa, Arizona"},
}
}
We can populate a new dataframe with columns being "code" and "Address", and index being "full_name" where the latter two are generated from the dataframe's columns with string addition:
new_df = pd.DataFrame({"code": df["code"],
"Address": df["city"] + ", " + df["state"]})
new_df.index = df["fname"] + " " + df["lname"]
which gives
>>> new_df
code Address
Alice Lee PXY Athens, Alabama
Nor Xi ABC Mesa, Arizona
We can now call to_dict with orient="index":
>>> d = new_df.to_dict(orient="index")
>>> d
{"Alice Lee": {"code": "PXY", "Address": "Athens, Alabama"},
"Nor Xi": {"code": "ABC", "Address": "Mesa, Arizona"}}
To match your output, we wrap d with a dictionary:
>>> {"Employees": d}
{
"Employees":{
"Alice Lee":{
"code":"PXY",
"Address":"Athens, Alabama"
},
"Nor Xi":{
"code":"ABC",
"Address":"Mesa, Arizona"
}
}
}
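import json
# use a name other than "json" so the module is not shadowed
records = json.loads(df.to_json(orient='records'))
employees = {}
employees['Employees'] = [{obj['fname']+' '+obj['lname']:{'code':obj['code'], 'Address':obj['city']+', '+obj['state']}} for obj in records]
This outputs (note that 'Employees' is a list of single-key dictionaries here):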
{
'Employees': [
{
'Alice Lee': {
'code': 'PXY',
'Address': 'Athens, Alabama'
}
},
{
'Nor Xi': {
'code': 'ABC',
'Address': 'Mesa, Arizona'
}
}
]
}
You can solve this using df.iterrows():
employee_dict = {}
for row in df.iterrows():
    # row[0] is the index number, row[1] is the data respective to that index
    row_data = row[1]
    employee_name = row_data.fname + ' ' + row_data.lname
    employee_dict[employee_name] = {'code': row_data.code, 'Address':
                                    row_data.city + ', ' + row_data.state}
json_data = {'Employees': employee_dict}
Result:
{'Employees': {'Alice Lee': {'code': 'PXY', 'Address': 'Athens, Alabama'},
'Nor Xi': {'code': 'ABC', 'Address': 'Mesa, Arizona'}}}
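None of the answers above goes on to produce JSON text. As a rough sketch, the same nesting can be built with a dict comprehension and serialized with json.dumps. The records list below is a hypothetical stand-in for what df.to_dict(orient="records") would return for the sample dataframe, so that the example runs without pandas.

```python
import json

# Hypothetical stand-in for df.to_dict(orient="records")
records = [
    {"fname": "Alice", "lname": "Lee", "city": "Athens", "state": "Alabama", "code": "PXY"},
    {"fname": "Nor", "lname": "Xi", "city": "Mesa", "state": "Arizona", "code": "ABC"},
]

# Key each employee by full name and combine city and state into one field
employees = {
    f"{r['fname']} {r['lname']}": {
        "code": r["code"],
        "Address": f"{r['city']}, {r['state']}",
    }
    for r in records
}

json_text = json.dumps({"Employees": employees}, indent=2)
print(json_text)
```

Keying by full name assumes names are unique; a repeated name would overwrite the earlier entry.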
I have a csv file with the following structure:
team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 1,spring tournament,Catarina Corbell
Team 1,summer tournament,Cara Mejias
Team 1,summer tournament,Catarina Corbell
...
Team 10, spring tournament,Jessi Ravelo
I want to create a nested dictionary (team, tournament) containing a list of player dictionaries. The desired outcome would be something like:
{'data':
{Team 1:
{'spring tournament':
{'players': [
{name: Rebecca Cardone},
{name: Salina Youngblood},
{name: Catarina Corbell}]
},
{'summer tournament':
{'players': [
{name: Cara Mejias},
{name: Catarina Corbell}]
}
}
},
...
{Team 10:
{'spring tournament':
{'players': [
{name: Jessi Ravelo}]
}
}
}
}
I've been struggling to format it like this. I have been able to successfully nest the first level (team # --> tournament) but I cannot get the second level to nest. Currently, my code looks like this:
d = {}
header = True
with open("input.csv") as f:
    for line in f.readlines():
        if header:
            header = False
            continue
        team, tournament, player = line.strip().split(",")
        d_team = d.get(team,{})
        d_tournament = d_team.get(tournament, {})
        d_player = d_tournament.get('player',['name'])
        d_player.append(player)
        d_tournament['player'] = d_tournament
        d_team[tournament] = d_tournament
        d[team] = d_team
print(d)
What would be the next step in fixing my code so I can create the nested dictionary?
Some problems with your implementation:
You do d_player = d_tournament.get('player',['name']). But you actually want to get the key named players, and this should be a list of dictionaries. Each of these dictionaries must have the form {"name": "Player's Name"}. So you want
l_player = d_tournament.get('players',[]) (default to an empty list), and then do l_player.append({"name": player}) (I renamed it to l_player because it's a list, not a dict).
You do d_tournament['player'] = d_tournament. I suspect you meant d_tournament['player'] = d_player (or, with the renaming suggested above, d_tournament['players'] = l_player).
Strip the whitespace off the elements in the rows. Do team, tournament, player = (word.strip() for word in line.split(","))
Your code works fine after you make these changes.
I strongly suggest you use the csv.reader class to read your CSV file instead of manually splitting the line by commas.
Also, since Python's containers (lists and dictionaries) hold references to their contents, you can just add the container once and then modify it using mydict["key"] = value or mylist.append(), and these changes will be reflected in parent containers too. Because of this behavior, you don't need to repeatedly assign these things in the loop like you do with d_team[tournament] = d_tournament
import csv
allteams = dict()
hasHeader = True
with open("input.csv") as f:
    csvreader = csv.reader(f)
    if hasHeader: next(csvreader) # Consume one line if a header exists
    # Iterate over the rows, and unpack each row into three variables
    for team_name, tournament_name, player_name in csvreader:
        # If the team hasn't been processed yet, create a new dict for it
        if team_name not in allteams:
            allteams[team_name] = dict()
        # Get the dict object that holds this team's information
        team = allteams[team_name]
        # If the tournament hasn't been processed already for this team, create a new dict for it in the team's dict
        if tournament_name not in team:
            team[tournament_name] = {"players": []}
        # Get the tournament dict object
        tournament = team[tournament_name]
        # Add this player's information to the tournament dict's "players" list
        tournament["players"].append({"name": player_name})
# Add all teams' data to the "data" key in our result dict
result = {"data": allteams}
print(result)
Which gives us what we want (prettified output):
{
'data': {
'Team 1': {
'spring tournament': {
'players': [
{ 'name': 'Rebbecca Cardone' },
{ 'name': 'Salina Youngblood' },
{ 'name': 'Catarina Corbell' }
]
},
'summer tournament': {
'players': [
{ 'name': 'Cara Mejias' },
{ 'name': 'Catarina Corbell' }
]
}
},
'Team 10': {
' spring tournament': {
'players': [
{ 'name': 'Jessi Ravelo' }
]
}
}
}
}
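The point about containers holding references can be seen in isolation. The following is only an illustrative sketch with made-up values; it uses dict.setdefault, which stores the default in the parent once and returns the stored object on every later call.

```python
allteams = {}

# setdefault stores the new dict in the parent and returns that same object
team = allteams.setdefault("Team 1", {})
tournament = team.setdefault("spring tournament", {"players": []})

# Mutating the inner list is visible through the parent; no reassignment needed
tournament["players"].append({"name": "Rebbecca Cardone"})

same_object = allteams["Team 1"]["spring tournament"] is tournament
print(same_object)
print(allteams)
```

Because team and tournament are the very objects stored inside allteams, assigning them back into their parents after each change would be redundant.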
The example dictionary you describe is not possible (if you want multiple dictionaries under the key "Team 1", put them in a list), but this snippet:
if __name__ == '__main__':
    your_dict = {}
    with open("yourfile.csv") as file:
        all_lines = file.readlines()
        data_lines = all_lines[1:]  # Skipping "team,tournament,player" line
        for line in data_lines:
            line = line.strip()  # Remove \n
            team, tournament_type, player_name = line.split(",")
            team_dict = your_dict.get(team, {})  # e.g. "Team 1"
            tournaments_of_team_dict = team_dict.get(tournament_type, {'players': []})  # e.g. "spring_tournament"
            tournaments_of_team_dict["players"].append({'name': player_name})
            team_dict[tournament_type] = tournaments_of_team_dict
            your_dict[team] = team_dict
    your_dict = {'data': your_dict}
For this example yourfile.csv:
team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 2,spring tournament,Catarina Corbell
Team 1,summer tournament,Cara Mejias
Team 2,summer tournament,Catarina Corbell
Gives the following:
{
"data": {
"Team 1": {
"spring tournament": {
"players": [
{
"name": "Rebbecca Cardone"
},
{
"name": "Salina Youngblood"
}
]
},
"summer tournament": {
"players": [
{
"name": "Cara Mejias"
}
]
}
},
"Team 2": {
"spring tournament": {
"players": [
{
"name": "Catarina Corbell"
}
]
},
"summer tournament": {
"players": [
{
"name": "Catarina Corbell"
}
]
}
}
}
}
Maybe I'm overlooking something, but couldn't you use:
df.groupby(['team','tournament'])['player'].apply(list).reset_index().to_json(orient='records')
You might approach it this way:
from collections import defaultdict
import csv
from pprint import pprint
d = defaultdict(dict)
with open('f00.txt', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        d[ row['team'] ].setdefault(row['tournament'], []
                                    ).append(row['player'])
pprint(dict(d))
Prints:
{'Team 1': {'spring tournament': ['Rebbecca Cardone',
'Salina Youngblood',
'Catarina Corbell'],
'summer tournament': ['Cara Mejias', 'Catarina Corbell']},
'Team 10': {' spring tournament': ['Jessi Ravelo']}}
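The output above holds plain lists of names and keeps the stray leading space in ' spring tournament'. A possible variation, sketched under the assumption that the {'players': [{'name': ...}]} shape from the question is wanted, nests two defaultdicts and passes skipinitialspace=True so the csv module drops spaces that follow a delimiter. The in-memory csv_text is a hypothetical stand-in for the file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory stand-in for the CSV file
csv_text = """team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 1,summer tournament,Cara Mejias
Team 10, spring tournament,Jessi Ravelo
"""

# team -> tournament -> {"players": [...]}, created on first access
teams = defaultdict(lambda: defaultdict(lambda: {"players": []}))

reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
for row in reader:
    teams[row["team"]][row["tournament"]]["players"].append({"name": row["player"]})

# Convert the defaultdicts back to plain dicts for the final result
result = {"data": {team: dict(tournaments) for team, tournaments in teams.items()}}
print(result)
```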
I have a list of dictionaries like so
names = [{'id':1, 'name': 'Alice', 'dep_name':'Pammy', 'is_dep_minor':True, 'is_insured':False},
{'id':2, 'name': 'Alice', 'dep_name':'Trudy', 'is_dep_minor':False, 'is_insured':True},
{'id':3, 'name': 'Bob', 'dep_name':'Charlie', 'is_dep_minor':True, 'is_insured':True},]
I want to create a new unified dictionary with new properties that will be populated later. I am planning to eliminate the need to have two dicts for Alice when it can be nested inside. This is what I have so far.
from collections import defaultdict
results = []
for name in names:
    newdict = defaultdict(dict)
    newdict[name["name"]]["NEWKEY"] = None  # to be filled in later
    newdict[name["name"]]["ANOTHERKEY"] = None  # to be filled in later
    innerdict = defaultdict(dict)
    # key the inner dict by the dependent's actual name
    innerdict[name["dep_name"]]["is_minor"] = name["is_dep_minor"]
    innerdict[name["dep_name"]]["is_insured"] = name["is_insured"]
    newdict[name["name"]]["DEPENDENTS"] = innerdict
    results.append(newdict)
This gives me
[
{
"Alice" : {
"NEWKEY" : None,
"ANOTHERKEY" : None,
"DEPENDENTS" : {
"Pammy" : {
"is_minor" : True,
"is_insured" : False
}
}
}
},
{
"Alice" : {
"NEWKEY" : None,
"ANOTHERKEY" : None,
"DEPENDENTS" : {
"Trudy" : {
"is_minor" : False,
"is_insured" : True
}
}
}
},
# and the list goes on
]
What I'm aiming for is
{
"Alice" : {
"NEWKEY" : None,
"ANOTHERKEY" : None,
"DEPENDENTS" : {
"Pammy" : {
"is_minor" : True,
"is_insured" : False
},
"Trudy" : {
"is_minor" : False,
"is_insured" : True
}
}
}
},
Can someone help me out with this? Thanks in Advance
I found a solution to my problem. I did it like so
from collections import defaultdict
results = {}
def create_dict(name):
    # Build the entry for a person from their first record
    newdict = {}
    newdict["NEWKEY"] = None  # to be filled in later
    newdict["ANOTHERKEY"] = None  # to be filled in later
    innerdict = defaultdict(dict)
    innerdict[name["dep_name"]]["is_minor"] = name["is_dep_minor"]
    innerdict[name["dep_name"]]["is_insured"] = name["is_insured"]
    newdict["DEPENDENTS"] = innerdict
    return newdict
for name in names:
    if name["name"] in results:
        # Person already seen: add this dependent to the existing entry
        dname = name["dep_name"]
        entry = results[name["name"]]
        entry["DEPENDENTS"][dname]["is_minor"] = name["is_dep_minor"]
        entry["DEPENDENTS"][dname]["is_insured"] = name["is_insured"]
    else:
        results[name["name"]] = create_dict(name)
This gives the desired output.
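A shorter way to express the same merge, offered only as a sketch, is dict.setdefault: it returns the existing entry for a person or stores a fresh one, so the if/else branch is not needed. The names list is copied from the question (with the dependent spelled Trudy, as in the expected output), and the loop variable record is an arbitrary name.

```python
names = [
    {'id': 1, 'name': 'Alice', 'dep_name': 'Pammy', 'is_dep_minor': True, 'is_insured': False},
    {'id': 2, 'name': 'Alice', 'dep_name': 'Trudy', 'is_dep_minor': False, 'is_insured': True},
    {'id': 3, 'name': 'Bob', 'dep_name': 'Charlie', 'is_dep_minor': True, 'is_insured': True},
]

results = {}
for record in names:
    # Fetch the person's entry, creating it on first sight
    person = results.setdefault(
        record["name"],
        {"NEWKEY": None, "ANOTHERKEY": None, "DEPENDENTS": {}},
    )
    # Each dependent is keyed by name under DEPENDENTS
    person["DEPENDENTS"][record["dep_name"]] = {
        "is_minor": record["is_dep_minor"],
        "is_insured": record["is_insured"],
    }

print(results)
```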
I am trying to update a key while retaining its values within nested dictionaries.
While I have found a method to do so, I had to create new dictionaries to make it work. Could anyone offer better insight into the approach I have taken?
init_dict = {
'pageA' : {
0 : {
'menuA' : [
'a01',
'a02'
]
}
},
'pageB' : {
1 : {
'menuB' : [
'b10'
]
}
}
}
changed = {'pageB' : 0, 'pageA' : 1}
condense_dict = {}
for k, v in init_dict.items():
    for i in v.keys():
        condense_dict[k] = init_dict[k][i]
new_dict = {}
for i in condense_dict:
    new_dict[i] = {}
    new_dict[i][changed.get(i)] = condense_dict.get(i)
My expected output is as follows:
{
'pageA' : {
1 : {
'menuA' : [
'a01',
'a02'
]
}
},
'pageB' : {
0 : {
'menuB' : [
'b10'
]
}
}
}
You can pop the presumably only key from the sub-dict and assign it to the new key for each entry in changed:
for k, v in changed.items():
    init_dict[k][v] = init_dict[k].pop(next(iter(init_dict[k])))
init_dict becomes:
{'pageA': {1: {'menuA': ['a01', 'a02']}}, 'pageB': {0: {'menuB': ['b10']}}}
Using the .pop() method this can be done similar to this (although I'm sure you could rewrite it better)
init_dict = {
'pageA': {
0: {
'menuA' : [
'a01',
'a02'
]
}
},
'pageB': {
1: {
'menuB': [
'b10'
]
}
}
}
print(init_dict)
thing = init_dict.pop('pageA')
sub_thing = thing.pop(0)
redone = {1: sub_thing}
init_dict.update({'pageA': redone})
print(init_dict)
{'pageA': {0: {'menuA': ['a01', 'a02']}}, 'pageB': {1: {'menuB': ['b10']}}}
{'pageA': {1: {'menuA': ['a01', 'a02']}}, 'pageB': {1: {'menuB': ['b10']}}}
You can see it's the same data as we start with, but we changed 0 to 1
Here I use .pop() and change the dictionary in place. With the same init_dict as yours:
change_to = {1: 0, 0: 1}
for k, v in init_dict.items():
    # Iterate over a snapshot of the keys, since v is modified inside the loop
    for old_key in list(v.keys()):
        if old_key in change_to:
            v[change_to[old_key]] = v.pop(old_key)
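If the original dictionary should stay untouched, the re-keying can also be written as a nested dict comprehension that builds a new dictionary. This is a sketch that assumes, as in the question, that each page holds exactly one sub-key; with several sub-keys they would all collapse onto the same new key.

```python
init_dict = {
    "pageA": {0: {"menuA": ["a01", "a02"]}},
    "pageB": {1: {"menuB": ["b10"]}},
}
changed = {"pageB": 0, "pageA": 1}

# Rebuild each page with its new key; pages missing from `changed` keep the old key
new_dict = {
    page: {changed.get(page, old_key): menus for old_key, menus in sub.items()}
    for page, sub in init_dict.items()
}

print(new_dict)
```

The inner menu dictionaries are shared between init_dict and new_dict rather than copied, so copy.deepcopy would be needed if they are going to be modified independently.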