I'm new to Python programming.
I have tried hard to avoid these nested for loops, but without success.
My input data looks like this:
[
  {
    "province_id": "1",
    "name": "HCM",
    "districts": [
      {
        "district_id": "1",
        "name": "Thu Duc",
        "wards": [
          {
            "ward_id": "1",
            "name": "Linh Trung"
          },
          {
            "ward_id": "2",
            "name": "Linh Chieu"
          }
        ]
      },
      {
        "district_id": "2",
        "name": "Quan 9",
        "wards": [
          {
            "ward_id": "3",
            "name": "Hiep Phu"
          },
          {
            "ward_id": "4",
            "name": "Long Binh"
          }
        ]
      }
    ]
  },
  {
    "province_id": "2",
    "name": "Binh Duong",
    "districts": [
      {
        "district_id": "3",
        "name": "Di An",
        "wards": [
          {
            "ward_id": "5",
            "name": "Dong Hoa"
          },
          {
            "ward_id": "6",
            "name": "Di An"
          }
        ]
      },
      {
        "district_id": "4",
        "name": "Thu Dau Mot",
        "wards": [
          {
            "ward_id": "7",
            "name": "Hiep Thanh"
          },
          {
            "ward_id": "8",
            "name": "Hiep An"
          }
        ]
      }
    ]
  }
]
And my code is:
for province in data:
    for district in province['districts']:
        for ward in district['wards']:
            # Execute my function
            print('{}, {}, {}'.format(ward['name'], district['name'], province['name']))
Output
Linh Trung, Thu Duc, HCM
Linh Chieu, Thu Duc, HCM
Hiep Phu, Quan 9, HCM
...
Even though my code is working, it looks pretty ugly.
How can I avoid these nested for loops?
Your data structure is naturally nested, but one option you have for neatening your code is to write a generator function for iterating over it:
def all_wards(data):
    for province in data:
        for district in province['districts']:
            for ward in district['wards']:
                yield province, district, ward
This function has the same triply-nested loop in it as you currently have, but everywhere else in your code, you can now iterate over the data structure with a single non-nested loop:
for province, district, ward in all_wards(data):
    print('{}, {}, {}'.format(ward['name'], district['name'], province['name']))
If you prefer to avoid having too much indentation, here's an equivalent way to write the function, similar to #adarian's answer but without creating a temporary list:
def all_wards(data):
    return (
        (province, district, ward)
        for province in data
        for district in province['districts']
        for ward in district['wards']
    )
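Because all_wards gives you a flat iterable, it also composes nicely with comprehensions and filters. For example, collecting just the ward names of one province:
hcm_wards = [ward['name']
             for province, district, ward in all_wards(data)
             if province['name'] == 'HCM']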
Here is a one-liner version
[
    print("{}, {}, {}".format(ward["name"], district["name"], province["name"]))
    for province in data
    for district in province["districts"]
    for ward in district["wards"]
]
You could do something like this:
def print_district(district, province):
    for ward in district['wards']:
        print('{}, {}, {}'.format(ward['name'], district['name'], province['name']))

def print_province(province):
    for district in province['districts']:
        print_district(district, province)

for province in data:
    print_province(province)
Related
Input
id   name   country  lost_item  year  status    resolved_date  closed_date  refunded_date
123  John   US       Bike       2020  Resolved  2021-12-25
125  Mike   CAN      Car        2021  Refunded                              2021-11-22
123  John   US       Car        2019  Resolved  2021-12-25
563  Steve  CAN      Battery    2022  Closed                   2019-02-03
Desired output
{
  "items": {
    "item": [
      {
        "id": "123",
        "name": "John",
        "categories": {
          "category": [
            {
              "lost_item": "Bike",
              "year": "2020"
            },
            {
              "lost_item": "Car",
              "year": "2019"
            }
          ]
        },
        "country": "US",
        "status": "Resolved",
        "resolved_date": "2021-12-25"
      },
      {
        "id": "125",
        "name": "Mike",
        "categories": {
          "category": [
            {
              "lost_item": "Car",
              "year": "2021"
            }
          ]
        },
        "country": "CAN",
        "status": "Reopened",
        "refunded_date": "2021-11-22"
      },
      {
        "id": "563",
        "name": "Steve",
        "categories": {
          "category": [
            {
              "lost_item": "Bike",
              "year": "2020"
            }
          ]
        },
        "country": "CAN",
        "status": "Closed",
        "closed_date": "2019-02-03"
      }
    ]
  }
}
My code:
import json

import pandas as pd

df = pd.read_excel('C:/Users/hero/Desktop/sample.xlsx', sheet_name='catalog')
df["closed_date"] = df["closed_date"].astype(str)
df["resolved_date"] = df["resolved_date"].astype(str)
df["refunded_date"] = df["refunded_date"].astype(str)

partial = (
    df.groupby(['id', 'name', 'country', 'status', 'closed_date', 'resolved_date', 'refunded_date'], dropna=False)
      .apply(lambda x: {"category": x[['lost_item', 'year']].to_dict('records')})
      .reset_index(name="categories")
      .to_dict(orient="records")
)

res = []
for record in partial:
    clean = {key: value for (key, value) in record.items() if value != "NaT"}
    res.append(clean)
print(json.dumps(res, indent=2))  # I will be writing the final payload to a JSON file.
In my input, the fields id, name, country, and status are mandatory. The fields resolved_date, closed_date, and refunded_date are not mandatory and may be empty.
My questions:
Does including columns that have NaN values in the groupby have side effects for large datasets? I didn't find any problem with the above sample input.
Can I remove the fields resolved_date, closed_date, and refunded_date from the groupby and append these columns after the groupby?
What's the best way to handle the NaNs in the dataset? For my use case, if a NaN is present I have to drop that particular key, not the entire row.
Please let me know if there is any room for improvement in my existing code. Any help is appreciated.
Thanks
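For the key-dropping part specifically, a minimal hedged sketch (reusing partial from above, and assuming the date columns are left as real NaN/NaT values rather than being converted to the string "NaT" with astype(str)):
import pandas as pd

# drop NaN/NaT-valued keys per record instead of comparing against the string "NaT"
res = [
    {key: value for key, value in record.items() if pd.notna(value)}
    for record in partial
]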
So, I'm trying to parse this JSON object into multiple events, as that is the expected input for an ETL tool. I know this is quite straightforward if we do it via loops, if statements, and explicitly defined search fields for the given events. That approach is not feasible because I have multiple heavily nested JSON objects, and I would prefer to let Python recursion handle the heavy lifting. The following is a sample object consisting of strings, lists, and dicts (it basically covers most use cases in the data I have).
{
  "event_name": "restaurants",
  "properties": {
    "_id": "5a9909384309cf90b5739342",
    "name": "Mangal Kebab Turkish Restaurant",
    "restaurant_id": "41009112",
    "borough": "Queens",
    "cuisine": "Turkish",
    "address": {
      "building": "4620",
      "coord": {
        "0": -73.9180155,
        "1": 40.7427742
      },
      "street": "Queens Boulevard",
      "zipcode": "11104"
    },
    "grades": [
      {
        "date": 1414540800000,
        "grade": "A",
        "score": 12
      },
      {
        "date": 1397692800000,
        "grade": "A",
        "score": 10
      },
      {
        "date": 1381276800000,
        "grade": "A",
        "score": 12
      }
    ]
  }
}
And I want to convert it into the following list of dictionaries:
[
  {
    "event_name": "restaurants",
    "properties": {
      "restaurant_id": "41009112",
      "name": "Mangal Kebab Turkish Restaurant",
      "cuisine": "Turkish",
      "_id": "5a9909384309cf90b5739342",
      "borough": "Queens"
    }
  },
  {
    "event_name": "restaurant_address",
    "properties": {
      "zipcode": "11104",
      "ref_id": "41009112",
      "street": "Queens Boulevard",
      "building": "4620"
    }
  },
  {
    "event_name": "restaurant_address_coord",
    "ref_id": "41009112",
    "0": -73.9180155,
    "1": 40.7427742
  },
  {
    "event_name": "restaurant_grades",
    "properties": {
      "date": 1414540800000,
      "ref_id": "41009112",
      "score": 12,
      "grade": "A",
      "index": "0"
    }
  },
  {
    "event_name": "restaurant_grades",
    "properties": {
      "date": 1397692800000,
      "ref_id": "41009112",
      "score": 10,
      "grade": "A",
      "index": "1"
    }
  },
  {
    "event_name": "restaurant_grades",
    "properties": {
      "date": 1381276800000,
      "ref_id": "41009112",
      "score": 12,
      "grade": "A",
      "index": "2"
    }
  }
]
And most importantly, these events will be broken up into independent structured tables to conduct joins, so we need to create primary keys/unique identifiers. The deeply nested dictionaries should therefore carry their parent's id as a ref_id field. In this case, ref_id = restaurant_id from the parent dictionary.
Most of the examples on the internet flatten the whole object so it can be normalized into a dataframe, but to use this ETL tool to its full potential it would be ideal to solve this problem via recursion, outputting a list of dictionaries.
This is what one might call a brute force method. Create a translator function to move each item into the correct part of the new structure (like a schema).
# input dict
d = {
    "event_name": "demo",
    "properties": {
        "_id": "5a9909384309cf90b5739342",
        "name": "Mangal Kebab Turkish Restaurant",
        "restaurant_id": "41009112",
        "borough": "Queens",
        "cuisine": "Turkish",
        "address": {
            "building": "4620",
            "coord": {
                "0": -73.9180155,
                "1": 40.7427742
            },
            "street": "Queens Boulevard",
            "zipcode": "11104"
        },
        "grades": [
            {
                "date": 1414540800000,
                "grade": "A",
                "score": 12
            },
            {
                "date": 1397692800000,
                "grade": "A",
                "score": 10
            },
            {
                "date": 1381276800000,
                "grade": "A",
                "score": 12
            }
        ]
    }
}
def convert_structure(d: dict):
    '''function to convert to new structure'''
    # the new dict
    e = {}
    e['event_name'] = d['event_name']
    e['properties'] = {}
    e['properties']['restaurant_id'] = d['properties']['restaurant_id']
    # and so forth...
    # keep building the new structure / template
    # return a list
    return [e]

# run & print
x = convert_structure(d)
print(x)
The result (for the part done) looks like this:
[{'event_name': 'demo', 'properties': {'restaurant_id': '41009112'}}]
If a pattern is identified, then the above could be improved...
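Since the question specifically asks for recursion, here is a rough, untested sketch of how that could look. It is an assumption-laden illustration, not a definitive implementation: event names are built by concatenating the parent event name and the key (so it produces restaurants_address rather than the singular restaurant_address from the desired output), every child event gets a properties wrapper, and the parent id is assumed to live in a restaurant_id field.
def split_events(event_name, properties, ref_id=None):
    """Recursively split a nested properties dict into a flat list of events."""
    flat = {} if ref_id is None else {"ref_id": ref_id}
    children = []  # (child_event_name, list_of_child_dicts, came_from_list)

    for key, value in properties.items():
        if isinstance(value, dict):
            children.append(("{}_{}".format(event_name, key), [value], False))
        elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
            children.append(("{}_{}".format(event_name, key), value, True))
        else:
            flat[key] = value

    events = [{"event_name": event_name, "properties": flat}]
    # id that child events should reference (assumed to be restaurant_id here)
    child_ref = flat.get("restaurant_id", ref_id)

    for child_name, items, from_list in children:
        for index, item in enumerate(items):
            child_events = split_events(child_name, item, ref_id=child_ref)
            if from_list:
                child_events[0]["properties"]["index"] = str(index)
            events.extend(child_events)
    return events

events = split_events(d["event_name"], d["properties"])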
I have the following JSON file format, in which the data will have dynamic updates.
I have to parse the JSON file using Python with the following scenario:
If the status is "PASS", then the result should be the keys maths, q1, q2.
Kindly help me with this scenario using Python.
{
  "quiz": {
    "sport": {
      "q1": {
        "question": "Which one is correct team name in NBA?",
        "options": [
          "New York Bulls",
          "Los Angeles Kings",
          "Golden State Warriros",
          "Huston Rocket"
        ],
        "answer": "Huston Rocket"
      },
      "status": "BLOCK"
    },
    "maths": {
      "q1": {
        "question": "5 + 7 = ?",
        "options": [
          "10",
          "11",
          "12",
          "13"
        ],
        "answer": "12"
      },
      "q2": {
        "question": "12 - 8 = ?",
        "options": [
          "1",
          "2",
          "3",
          "4"
        ],
        "answer": "4"
      },
      "status": "PASS"
    }
  }
}
I hope the code below gives you the expected output:
import json

file = open('data.json')
data = json.load(file)

keys = []
for i in data['quiz']:
    if data['quiz'][i]['status'] != 'PASS':
        keys.append(i)

for key in keys:
    del data['quiz'][key]

print(data)

for key in data['quiz'].keys():
    print(key)
    for key1 in data['quiz'][key].keys():
        if key1 != 'status':
            print(key1)
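If the goal is to collect those keys rather than just print them, one possible variation (an assumption about how the result will be used) is to build a list without deleting anything from the loaded data:
passed = [
    [section] + [key for key in content if key != 'status']
    for section, content in data['quiz'].items()
    if content['status'] == 'PASS'
]
print(passed)  # e.g. [['maths', 'q1', 'q2']]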
I'm trying to export a DataFrame into a nested (hierarchical) JSON for D3.js, using a solution that only works for one level (parent, children).
Any help would be appreciated. I'm new to Python.
My DataFrame contains 7 levels.
Here is the expected solution
JSON Example:
{
  "name": "World",
  "children": [
    {
      "name": "Europe",
      "children": [
        {
          "name": "France",
          "children": [
            {
              "name": "Paris",
              "population": 1000000
            }
          ]
        }
      ]
    }
  ]
}
and here is the python method:
import json

def to_flare_json(df, filename):
    """Convert dataframe into nested JSON as in flare files used for D3.js"""
    flare = dict()
    d = {"name": "World", "children": []}

    for index, row in df.iterrows():
        parent = row[0]
        child = row[1]
        child1 = row[2]
        child2 = row[3]
        child3 = row[4]
        child4 = row[5]
        child5 = row[6]
        child_value = row[7]

        # Make a list of keys
        key_list = []
        for item in d['children']:
            key_list.append(item['name'])

        # if 'parent' is NOT a key in flare.json, append it
        if not parent in key_list:
            d['children'].append({"name": parent, "children": [{"value": child_value, "name1": child}]})
        # if parent IS a key in flare.json, add a new child to it
        else:
            d['children'][key_list.index(parent)]['children'].append({"value": child_value, "name11": child})

    flare = d

    # export the final result to a json file
    with open(filename + '.json', 'w') as outfile:
        json.dump(flare, outfile, indent=4, ensure_ascii=False)
    return ("Done")
[EDIT]
Here is a sample of my df
World  Continent  Region          Country  State          City   Boroughs  Population
1      Europe     Western Europe  France   Ile de France  Paris  17        821964
1      Europe     Western Europe  France   Ile de France  Paris  19        821964
1      Europe     Western Europe  France   Ile de France  Paris  20        821964
The structure you want is clearly recursive so I made a recursive function to fill it:
def create_entries(df):
    entries = []
    # Stopping case
    if df.shape[1] == 2:  # only 2 columns left
        for i in range(df.shape[0]):  # iterating on rows
            entries.append(
                {"Name": df.iloc[i, 0],
                 df.columns[-1]: df.iloc[i, 1]}
            )
    # Iterating case
    else:
        values = set(df.iloc[:, 0])  # Getting the set of unique values
        for v in values:
            entries.append(
                {"Name": v,
                 # reiterating the process but without the first column
                 # and only the rows with the current value
                 "Children": create_entries(
                     df.loc[df.iloc[:, 0] == v].iloc[:, 1:]
                 )}
            )
    return entries
All that's left is to create the dictionary and call the function:
mydict = {"Name": "World",
          "Children": create_entries(data.iloc[:, 1:])}
Then you just write your dict to a JSON file.
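For example, a minimal version of that last step (the output filename is arbitrary here):
import json

with open('flare.json', 'w', encoding='utf-8') as outfile:
    json.dump(mydict, outfile, indent=4, ensure_ascii=False)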
I hope my comments are explicit enough; the idea is to recursively use the first column of the dataset as the "Name" and the rest as the "Children".
Thank you Syncrossus for the answer, but this results in different branches for each borough or city.
The result is this:
"Name": "World",
"Children": [
{
"Name": "Western Europe",
"Children": [
{
"Name": "France",
"Children": [
{
"Name": "Ile de France",
"Children": [
{
"Name": "Paris",
"Children": [
{
"Name": "17ème",
"Population": 821964
}
]
}
]
}
]
}
]
},{
"Name": "Western Europe",
"Children": [
{
"Name": "France",
"Children": [
{
"Name": "Ile de France",
"Children": [
{
"Name": "Paris",
"Children": [
{
"Name": "10ème",
"Population": 154623
}
]
}
]
}
]
}
]
}
But the desired result is this
"Name": "World",
"Children": [
{
"Continent": "Europe",
"Children": [
{
"Region": "Western Europe",
"Children": [
{
"Country": "France",
"Children": [
{
"State": "Ile De France",
"Children": [
{
"City": "Paris",
"Children": [
{
"Boroughs": "17ème",
"Population": 82194
},
{
"Boroughs": "16ème",
"Population": 99194
}
]
},
{
"City": "Saint-Denis",
"Children": [
{
"Boroughs": "10ème",
"Population": 1294
},
{
"Boroughs": "11ème",
"Population": 45367
}
]
}
]
}
]
},
{
"Country": "Belgium",
"Children": [
{
"State": "Oost-Vlaanderen",
"Children": [
{
"City": "Gent",
"Children": [
{
"Boroughs": "2ème",
"Population": 1234
},
{
"Boroughs": "4ème",
"Population": 7456
}
]
}
]
}
]
}
]
}
]
}
]
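For reference, an untested sketch of how the recursive function might be adapted so that each level is keyed by its column name instead of a generic "Name" (this assumes the column order World, Continent, Region, Country, State, City, Boroughs, Population shown in the sample df):
def create_entries(df):
    entries = []
    level = df.columns[0]                  # e.g. "Continent", "Region", ...
    if df.shape[1] == 2:                   # leaf level: "Boroughs" + "Population"
        for _, row in df.iterrows():
            entries.append({level: row.iloc[0], df.columns[1]: row.iloc[1]})
    else:
        # group identical values so e.g. "Paris" appears only once per parent
        for value, group in df.groupby(level, sort=False):
            entries.append({
                level: value,
                "Children": create_entries(group.iloc[:, 1:])
            })
    return entries

mydict = {"Name": "World", "Children": create_entries(df.iloc[:, 1:])}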
I have some data from a project where the variables can change between motorcycle and car. I need to get the name out of them, and that value is inside the variable.
This is not the data I will be using, but it has the same structure; the "official" data is personal information, so I changed it to some random values. I cannot change the structure of the JSON data, since this is the way the server admins decided to structure it for some reason.
This is my Python code:
import json

with open('exampleData.json') as j:
    data = json.load(j)

name = 0
Vehicle = 0

for x in data:
    print(data['persons'][x]['name'])
    for i in data['persons'][x]['things']["Vehicles"]:
        print(data['persons'][x]['things']['Vehicles'][i]['type']['name'])
    print("\n")
This is my JSON data, extracted from the file "ExampleData.json" (sorry for the length, but it is kind of complex and necessary to understand the problem):
{
  "total": 2,
  "persons": [
    {
      "name": "Sven Svensson",
      "things": {
        "House": "apartment",
        "Vehicles": [
          {
            "id": "46",
            "type": {
              "name": "Kawasaki ER6N",
              "type": "motorcyle"
            },
            "Motorcycle": {
              "plate": "aaa111",
              "fields": {
                "brand": "Kawasaki",
                "status": "in shop"
              }
            }
          },
          {
            "id": "44",
            "type": {
              "name": "BMW m3",
              "type": "Car"
            },
            "Car": {
              "plate": "bbb222",
              "fields": {
                "brand": "BMW",
                "status": "in garage"
              }
            }
          }
        ]
      }
    },
    {
      "name": "Eric Vivian Matthews",
      "things": {
        "House": "House",
        "Vehicles": [
          {
            "id": "44",
            "type": {
              "name": "Volvo XC90",
              "type": "Car"
            },
            "Car": {
              "plate": "bbb222",
              "fields": {
                "brand": "Volvo",
                "status": "in garage"
              }
            }
          }
        ]
      }
    }
  ]
}
I want it to print out something like this:
Sven Svensson
Bmw M3
Kawasaki ER6n
Eric Vivian Matthews
Volvo XC90
but I get this error:
print(data['persons'][x]['name'])
TypeError: list indices must be integers or slices, not str
Process finished with exit code 1
What you need is
for person in data["persons"]:
    for vehicle in person["things"]["Vehicles"]:
        print(vehicle["type"]["name"])
        # look up the plate under the key named after the vehicle's type
        # (note: this assumes the type string matches that key exactly)
        vehicle_type = vehicle["type"]["type"]
        print(vehicle[vehicle_type]["plate"])
Here, the for loop iterates over the top-level dictionary, so x is one of its keys ('total' or 'persons'), not an index into the persons list:
for x in data:
Using that string key to index the list here:
print(data['persons'][x]['name'])
is what causes the error.
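A quick way to see this:
for x in data:
    print(repr(x))  # prints 'total' and 'persons' -- the top-level keys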
What you need is to iterate over the list under 'persons' directly, like so:
for x in data['persons']:
    print(x['name'])
    for vehicle in x['things']['Vehicles']:
        print(vehicle['type']['name'])
    print('\n')
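With the sample data above, this prints the following (print('\n') leaves an extra blank line between persons):
Sven Svensson
Kawasaki ER6N
BMW m3


Eric Vivian Matthews
Volvo XC90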