My dataframe is
fname lname city state code
Alice Lee Athens Alabama PXY
Nor Xi Mesa Arizona ABC
The output of json should be
{
"Employees":{
"Alice Lee":{
"code":"PXY",
"Address":"Athens, Alabama"
},
"Nor Xi":{
"code":"ABC",
"Address":"Mesa, Arizona"
}
}
}
df.to_json() gives no hierarchy to the json. Can you please suggest what am I missing? Is there a way to combine columns and give them a 'keyname' while writing json in pandas?
Thank you.
Try:
names = df[["fname", "lname"]].apply(" ".join, axis=1)
addresses = df[["city", "state"]].apply(", ".join, axis=1)
codes = df["code"]
out = {"Employees": {}}
for n, a, c in zip(names, addresses, codes):
out["Employees"][n] = {"code": c, "Address": a}
print(out)
Prints:
{
"Employees": {
"Alice Lee": {"code": "PXY", "Address": "Athens, Alabama"},
"Nor Xi": {"code": "ABC", "Address": "Mesa, Arizona"},
}
}
We can populate a new dataframe with columns being "code" and "Address", and index being "full_name" where the latter two are generated from the dataframe's columns with string addition:
new_df = pd.DataFrame({"code": df["code"],
"Address": df["city"] + ", " + df["state"]})
new_df.index = df["fname"] + " " + df["lname"]
which gives
>>> new_df
code Address
Alice Lee PXY Athens, Alabama
Nor Xi ABC Mesa, Arizona
We can now call to_dict with orient="index":
>>> d = new_df.to_dict(orient="index")
>>> d
{"Alice Lee": {"code": "PXY", "Address": "Athens, Alabama"},
"Nor Xi": {"code": "ABC", "Address": "Mesa, Arizona"}}
To match your output, we wrap d with a dictionary:
>>> {"Employee": d}
{
"Employee":{
"Alice Lee":{
"code":"PXY",
"Address":"Athens, Alabama"
},
"Nor Xi":{
"code":"ABC",
"Address":"Mesa, Arizona"
}
}
}
json = json.loads(df.to_json(orient='records'))
employees = {}
employees['Employees'] = [{obj['fname']+' '+obj['lname']:{'code':obj['code'], 'Address':obj['city']+', '+obj['state']}} for obj in json]
This outputs -
{
'Employees': [
{
'Alice Lee': {
'code': 'PXY',
'Address': 'Athens, Alabama'
}
},
{
'Nor Xi': {
'code': 'ABC',
'Address': 'Mesa, Arizona'
}
}
]
}
you can solve this using df.iterrows()
employee_dict = {}
for row in df.iterrows():
# row[0] is the index number, row[1] is the data respective to that index
row_data = row[1]
employee_name = row_data.fname + ' ' + row_data.lname
employee_dict[employee_name] = {'code': row_data.code, 'Address':
row_data.city + ', ' + row_data.state}
json_data = {'Employees': employee_dict}
Result:
{'Employees': {'Alice Lee': {'code': 'PXY', 'Address': 'Athens, Alabama'},
'Nor Xi': {'code': 'ABC', 'Address': 'Mesa, Arizona'}}}
Related
I want to transform a Dictionary in Python, from Dictionary 1 Into Dictionary 2 as follows.
transaction = {
"trans_time": "14/07/2015 10:03:20",
"trans_type": "DEBIT",
"description": "239.95 USD DHL.COM TEXAS USA",
}
I want to transform the above dictionary to the following
transaction = {
"trans_time": "14/07/2015 10:03:20",
"trans_type": "DEBIT",
"description": "DHL.COM TEXAS USA",
"amount": 239.95,
"currency": "USD",
"location": "TEXAS USA",
"merchant_name": "DHL"
}
I tried the following but it did not work
dic1 = {
"trans_time": "14/07/2015 10:03:20",
"trans_type": "DEBIT",
"description": "239.95 USD DHL.COM TEXAS USA"
}
print(type(dic1))
copiedDic = dic1.copy()
print("copiedDic = ",copiedDic)
updatekeys = ['amount', 'currency', 'merchant_name', 'location', 'trans_category']
for key in dic1:
if key == 'description':
list_words = dic1[key].split(" ")
newdict = {updatekeys[i]: x for i, x in enumerate(list_words)}
copiedDic.update(newdict)
print(copiedDic)
I got The following result
{
'trans_time': '14/07/2015 10:03:20',
'trans_type': 'DEBIT',
'description': '239.95 USD DHL.COM TEXAS USA',
'amount': '239.95',
'currency': 'USD',
'merchant_name': 'DHL.COM',
'location': 'TEXAS',
'trans_category': 'USA'
}
My Intended output should look like this:
transaction = {
"trans_time": "14/07/2015 10:03:20",
"trans_type": "DEBIT",
"description": "DHL.COM TEXAS USA",
"amount": 239.95,
"currency": "USD",
"location": "TEXAS USA",
"merchant_name": "DHL"
}
I think it would be easier to turn the value into an array of words and parse it. Here, an array of words 'aaa ' is created from the dictionary string 'transaction['description']'. Where there are more than one word(array element) 'join' is used to turn the array back into a string. The currency value itself is converted to fractional format from the string. In 'merchant_name', the segment up to the point is taken.
transaction = {
"trans_time": "14/07/2015 10:03:20",
"trans_type": "DEBIT",
"description": "239.95 USD DHL.COM TEXAS USA",
}
aaa = transaction['description'].split()
transaction['description'] = ' '.join(aaa[2:])
transaction['amount'] = float(aaa[0])
transaction['currency'] = aaa[1]
transaction['location'] = ' '.join(aaa[3:])
transaction['merchant_name'] = aaa[2].partition('.')[0]
print(transaction)
Output
{
'trans_time': '14/07/2015 10:03:20',
'trans_type': 'DEBIT',
'description': 'DHL.COM TEXAS USA',
'amount': 239.95,
'currency': 'USD',
'location': 'TEXAS USA',
'merchant_name': 'DHL'}
If you want to transform, you do not need the copy to the original dictionary.
Just do something like this:
new_keys = ['amount', 'currency', 'merchant_name', 'location', 'trans_category']
values = transaction["description"].split(' ')
for idx, key in enumerate(new_keys):
if key == "amount":
transaction[key] = float(values[idx])
else:
transaction[key] = values[idx]
I am new to Python and don't know how to achieve this. I am trying to convert CSV file to JSON format. Address have types 1. Primary 2. Work and Address is multi value attribute as well. Person can have 2 Primary address.
Input Data in CSV format
"f_name"|"l_name"|"address_type"|"address_line_1"|"city"|"state"|"postal_code"|"country"
Brad|Pitt|Primary|"18 Atherton"|Irvine|CA|"92620-2501"|USA
Brad|Pitt|work|"1325 S Grand Ave"|Santa Ana|CA|"92705-4406"|USA
Output Expecting in JSON Format
{
"f_name": "Brad",
"l_name": "Pitt",
"parsed_address": [
{
"address_type": "Primary",
"address": [
{
"address_line_1": "18 Atherton",
"city": "Irvine",
"state": "CA",
"postal_code": "92620-2501",
"country": "USA"
}
]
},
{
"address_type": "work",
"address": [
{
"address_line_1": "1325 S Grand Ave",
"city": "Santa Ana",
"state": "CA",
"postal_code": "92620-2501",
"country": "USA"
}
]
}
]
}
Code Tried
df = pd.read_csv("file")
g_cols = ['f_name','l_name']
address_field = ['address']
cols = ['address_line_1', 'address_line_2', 'address_line_3', 'city', 'state', 'postal_code', 'country']
for i in g_cols:
if i in dict_val.keys():
g_cols[g_cols.index(i)] = dict_val[i]
for i in cols:
if i in dict_val.keys():
cols[cols.index(i)] = dict_val[i]
df2 = df.drop_duplicates().groupby(g_cols)[cols].apply(lambda x: x.to_dict('records')).reset_index(
name=address_field).to_dict('record')
You were close. This should do too exactly what aim to do.
df = pd.read_csv("data.csv", sep="|")
df
dic = {}
for name, group in df.groupby(by=["name"]):
dic["name"] = name
dic["parsed_address"] = []
for address_type, group in df.groupby(by=["address_type"]):
address_dic = {}
address_dic["address_type"] = address_type
address_dic["address"] = group.drop(columns=["name", "address_type"]).to_dict(orient="records")
dic["parsed_address"].append(address_dic)
dic
I think you can try having a dictionary or list (json_data in the code below) to keep track of a person's data and iterating throw each row of the dataframe using for _, row in df.iterrows():
import pandas as pd
df = pd.read_csv("file", delimiter='|')
print(df)
json_data = {}
for _, row in df.iterrows():
name = row["name"]
address_type = row["address_type"]
address_line_1 = row["address_line_1"]
city = row["city"]
state = row["state"]
postal_code = row["postal_code"]
country = row["country"]
if name not in json_data:
json_data[name] = {
"name": name,
"parsed_address": []
}
address_list = None
for address in json_data[name]["parsed_address"]:
if address["address_type"] == address_type:
address_list = address
if address_list is None:
address_list = {
"address_type": address_type,
"address": []
}
json_data[name]["parsed_address"].append(address_list)
address_list["address"].append({
"address_line_1": address_line_1,
"city": city,
"state": state,
"postal_code": postal_code,
"country": country
})
lst = list(json_data.values())
# Verify data parsing
import json
print(json.dumps(lst, indent=2))
dic = {}
g_cols = ['id','first_name','last_name','address_type]
for name, group in df.groupby(g_cols)["address"]:
id = name[0]
dic["id"] = id
dic["parsed_address"] = []
for address_type, group in df.groupby(by=["address_type"]):
address_dic = {}
address_dic["address_type"] = address_type
address_dic["address"] = group.drop(
columns=["id", "first_name","last_name","address_type"]).to_dict("record")
dic["parsed_address"].append(address_dic)
I have a csv file with the following structure:
team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 1,spring tournament,Catarina Corbell
Team 1,summer tournament,Cara Mejias
Team 1,summer tournament,Catarina Corbell
...
Team 10, spring tournament,Jessi Ravelo
I want to create a nested dictionary (team, tournament) with a list of player dictionary. The desired outcome would be something like:
{'data':
{Team 1:
{'spring tournament':
{'players': [
{name: Rebecca Cardone},
{name: Salina Youngblood},
{name: Catarina Corbell}]
},
{'summer tournament':
{'players': [
{name: Cara Mejias},
{name: Catarina Corbell}]
}
}
},
...
{Team 10:
{'spring tournament':
{'players': [
{name: Jessi Ravelo}]
}
}
}
}
I've been struggling to format it like this. I have been able to successfully nest the first level (team # --> tournament) but I cannot get the second level to nest. Currently, my code looks like this:
d = {}
header = True
with open("input.csv") as f:
for line in f.readlines():
if header:
header = False
continue
team, tournament, player = line.strip().split(",")
d_team = d.get(team,{})
d_tournament = d_team.get(tournament, {})
d_player = d_tournament.get('player',['name'])
d_player.append(player)
d_tournament['player'] = d_tournament
d_team[tournament] = d_tournament
d[team] = d_team
print(d)
What would be the next step in fixing my code so I can create the nested dictionary?
Some problems with your implementation:
You do d_player = d_tournament.get('player',['name']). But you actually want to get the key named players, and this should be a list of dictionaries. Each of these dictionaries must have the form {"name": "Player's Name"}. So you want
l_player = d_tournament.get('players',[]) (default to an empty list), and then do l_player.append({"name": player}) (I renamed it to l_player because it's a list, not a dict).
You do d_tournament['player'] = d_tournament. I suspect you meant d_tournament['player'] = d_player
Strip the whitespace off the elements in the rows. Do team, tournament, player = (word.strip() for word in line.split(","))
Your code works fine after you make these changes
I strongly suggest you use the csv.reader class to read your CSV file instead of manually splitting the line by commas.
Also, since python's containers (lists and dictionaries) hold references to their contents, you can just add the container once and then modify it using mydict["key"] = value or mylist.append(), and these changes will be reflected in parent containers too. Because of this behavior, you don't need to repeatedly assign these things in the loop like you do with d_team[tournament] = d_tournament
allteams = dict()
hasHeader = True
with open("input.csv") as f:
csvreader = csv.reader(f)
if hasHeader: next(csvreader) # Consume one line if a header exists
# Iterate over the rows, and unpack each row into three variables
for team_name, tournament_name, player_name in csvreader:
# If the team hasn't been processed yet, create a new dict for it
if team_name not in allteams:
allteams[team_name] = dict()
# Get the dict object that holds this team's information
team = allteams[team_name]
# If the tournament hasn't been processed already for this team, create a new dict for it in the team's dict
if tournament_name not in team:
team[tournament_name] = {"players": []}
# Get the tournament dict object
tournament = team[tournament_name]
# Add this player's information to the tournament dict's "player" list
tournament["players"].append({"name": player_name})
# Add all teams' data to the "data" key in our result dict
result = {"data": allteams}
print(result)
Which gives us what we want (prettified output):
{
'data': {
'Team 1': {
'spring tournament': {
'players': [
{ 'name': 'Rebbecca Cardone' },
{ 'name': 'Salina Youngblood' },
{ 'name': 'Catarina Corbell' }
]
},
'summer tournament': {
'players': [
{ 'name': 'Cara Mejias' },
{ 'name': 'Catarina Corbell' }
]
}
},
'Team 10': {
' spring tournament': {
'players': [
{ 'name': 'Jessi Ravelo' }
]
}
}
}
}
The example dictionary you describe is not possible (if you want multiple dictionaries under the key "Team 1", put them in a list), but this snippet:
if __name__ == '__main__':
your_dict = {}
with open("yourfile.csv") as file:
all_lines = file.readlines()
data_lines = all_lines[1:] # Skipping "team,tournament,player" line
for line in data_lines:
line = line.strip() # Remove \n
team, tournament_type, player_name = line.split(",")
team_dict = your_dict.get(team, {}) # e.g. "Team 1"
tournaments_of_team_dict = team_dict.get(tournament_type, {'players': []}) # e.g. "spring_tournament"
tournaments_of_team_dict["players"].append({'name': player_name})
team_dict[tournament_type] = tournaments_of_team_dict
your_dict[team] = team_dict
your_dict = {'data': your_dict}
For this example yourfile.csv:
team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 2,spring tournament,Catarina Corbell
Team 1,summer tournament,Cara Mejias
Team 2,summer tournament,Catarina Corbell
Gives the following:
{
"data": {
"Team 1": {
"spring tournament": {
"players": [
{
"name": "Rebbecca Cardone"
},
{
"name": "Salina Youngblood"
}
]
},
"summer tournament": {
"players": [
{
"name": "Cara Mejias"
}
]
}
},
"Team 2": {
"spring tournament": {
"players": [
{
"name": "Catarina Corbell"
}
]
},
"summer tournament": {
"players": [
{
"name": "Catarina Corbell"
}
]
}
}
}
}
Process finished with exit code 0
Maybe I overlook somethign but couldn't you use:
df.groupby(['team','tournament'])['player'].apply(list).reset_index().to_json(orient='records')
You might approach it this way:
from collections import defaultdict
import csv
from pprint import pprint
d = defaultdict(dict)
with open('f00.txt', 'r') as f:
reader = csv.DictReader(f)
for row in reader:
d[ row['team'] ].setdefault(row['tournament'], []
).append(row['player'])
pprint(dict(d))
Prints:
{'Team 1': {'spring tournament': ['Rebbecca Cardone',
'Salina Youngblood',
'Catarina Corbell'],
'summer tournament': ['Cara Mejias', 'Catarina Corbell']},
'Team 10': {' spring tournament': ['Jessi Ravelo']}}
I'm trying to wrangle some data to make a recommender system for an app. Of course, to do this I need a record of which users like which posts. I currently have that data in a JSON file that is formatted like this (numbers being post id, and letters being user ids):
{
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}
I'm trying to figure out how to get this into a pandas dataframe that would look like this:
example format
I've tried using a few online JSON to CSV converters out of laziness which unsurprisingly didn't bring it into a useable format for me. I've tried using "print(json_normalize(data))", as well which also did not work, and put each instance of a like into separate columns.
Any advice?
This is a solution optimized for the peculiarities in your dataset.
import pandas as pd
data = {
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}}
formatted = [{'PostID': d, 'User Like': list(data[d].keys())} for d in data]
df = pd.DataFrame.from_dict(formatted)
Output:
From my experience for such simple formats, writing a quick and dirty loop is usually the fastest method rather than finding some ready solution and customizing it. An example for the data you gave here:
import json
my_json=""" {
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}
}"""
parsed_json = json.loads(my_json)
print(parsed_json)
# result:
# {'-1234': {'abc': 'abc', 'def': 'def', 'ghi': 'ghi'},
# '-5678': {'jkl': 'jkl', 'mno': 'mno'}}
for key in parsed_json.keys():
line = ''
line += key
line += ' | '
for value in parsed_json[key].values():
line += value + ', '
line = line[:-2] # stripping the ', ' from the end of the line
print(line)
# result:
# -1234 | abc, def, ghi
# -5678 | jkl, mno
Setup
Thanks Zaroth
import json
my_json=""" {
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}
}"""
parsed_json = json.loads(my_json)
Comprehension
pd.DataFrame(
[(k, [*v]) for k, v in parsed_json.items()],
columns=['PostID', 'User Like']
)
PostID User Like
0 -1234 [abc, def, ghi]
1 -5678 [jkl, mno]
OR
pd.DataFrame({
'PostID': [*parsed_json],
'User Like': [[*v] for v in parsed_json.values()]
})
data = {"-1234": {"abc": "abc","def": "def","ghi": "ghi"},"-5678": {"jkl": "jkl","mno": "mno"}}
key = []
val = []
for k,v in data.items():
key.append(k)
val.append(list(v.values()))
pd.DataFrame(zip(key,val),columns=['PostID','User Like'])
When I use:
for reports in raw_data:
for names in reports["names"]:
report_name = json.dumps(names).strip('"')
report_names.append(report_name)
I get the key/object name: 'report1', ...
When I use:
for reports in raw_data:
for names in reports["names"].values():
report_name = json.dumps(names).strip('"')
report_names.append(report_name)
I get the value of the object: 'name1', ...
How do get the object and value together, for example: 'report1': 'name1', ...
The json:
[
{
"names": {
"report1": "name1",
"report2": "name2"
}
},
{
"names": {
"report3": "name3",
"report4": "name4"
}
}
]
You need to loop over each dictionary in the object, then extract each key: value pair from items():
data = [
{
"names": {
"report1": "name1",
"report2": "name2"
}
},
{
"names": {
"report3": "name3",
"report4": "name4"
}
}
]
for d in data:
for k, v in d["names"].items():
print(k, v)
Result:
report1 name1
report2 name2
report3 name3
report4 name4
Or if you can just print out the tuple pairs:
for d in data:
for pair in d["names"].items():
print(pair)
# ('report1', 'name1')
# ('report2', 'name2')
# ('report3', 'name3')
# ('report4', 'name4')
If you want all of the pairs in a list, use a list comprehension:
[pair for d in data for pair in d["names"].items()]
# [('report1', 'name1'), ('report2', 'name2'), ('report3', 'name3'), ('report4', 'name4')]
Try something like this:
import json
with open(r'jsonfile.json', 'r') as f:
qe = json.load(f)
for item in qe:
if item == 'name1':
print(qe)