I have the following data frame:
data = [
{"id": 1, "parent_id": -1, "level": 1, "name": "Company"},
{"id": 2, "parent_id": 1, "level": 2, "name": "Bakery"},
{"id": 3, "parent_id": 1, "level": 2, "name": "Frozen"},
{"id": 4, "parent_id": 2, "level": 3, "name": "Bread"},
{"id": 5, "parent_id": 2, "level": 3, "name": "Pastry"},
{"id": 6, "parent_id": 3, "level": 3, "name": "Ice Cream"},
{"id": 7, "parent_id": 3, "level": 3, "name": "Sorbet"},
]
df = pd.DataFrame(data)
that looks like this:
   id  parent_id  level       name
0   1         -1      1    Company
1   2          1      2     Bakery
2   3          1      2     Frozen
3   4          2      3      Bread
4   5          2      3     Pastry
5   6          3      3  Ice Cream
6   7          3      3     Sorbet
I'm trying to represent the data as a dictionary like this:
data = {
"Company": {
"Bakery": [
"Bread",
"Pastry",
],
"Frozen": [
"Ice Cream",
"Sorbet",
],
},
}
I'm heavily struggling to achieve this result, so any help is appreciated! I've tried various for-loops but keep getting muddled up!
This is what I came up with (this code assumes consistency between parent_ids and levels and that all parent_ids exist):
# to store the final result
result = {}
# to store references of dictionaries by their ids
by_id = {}
for d in sorted(data, key=lambda d: d['level']):
    new_dict = {}
    if d['parent_id'] == -1:
        result[d['name']] = new_dict
    else:
        by_id[d['parent_id']][d['name']] = new_dict
    by_id[d['id']] = new_dict
At this point:
>>> result
{'Company': {'Bakery': {'Bread': {}, 'Pastry': {}}, 'Frozen': {'Ice Cream': {}, 'Sorbet': {}}}}
Now to convert empty dictionaries to a list of items, we use a recursive function:
def transform_dicts_to_lists(r):
    if any(r.values()):
        for k, v in r.items():
            r[k] = transform_dicts_to_lists(v)
        return r
    else:
        return list(r.keys())
result = transform_dicts_to_lists(result)
>>> result
{'Company': {'Bakery': ['Bread', 'Pastry'], 'Frozen': ['Ice Cream', 'Sorbet']}}
You can avoid the final processing step if you know that the maximum level is always 3 (in that case you can append the leaf names to lists directly instead of creating empty dicts).
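For completeness, here is a minimal pandas-based sketch that builds the same nested dict straight from df, assuming exactly three levels as in the sample data:
id_to_name = df.set_index("id")["name"].to_dict()
root_name = df.loc[df["parent_id"] == -1, "name"].iloc[0]

result = {
    root_name: {
        id_to_name[parent]: group["name"].tolist()
        for parent, group in df[df["level"] == 3].groupby("parent_id")
    }
}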
I want to merge two lists, match the data without duplicates, and map them to a new structure.
I have two lists. Here is the first list I am trying to merge:
cats = [
'orange', 'apple', 'banana'
]
and the second list:
types = [
{
"id": 1,
"type": "orange"
},
{
"id": 2,
"type": "apple"
},
{
"id": 3,
"type": "apple"
},
{
"id": 4,
"type": "orange"
},
{
"id": 5,
"type": "banana"
}
]
and I want to combine them to get this result:
[
{'orange': {
'UNIT': [1, 4]
}
},
{'apple': {
'UNIT': [2, 3]
}
},
{'banana': {
'UNIT': [5]
}
}
]
Here is my code after my tries:
matched = []
for item in types:
    for cat in cats:
        if item['type'] == cat:
            matched.append(
                {
                    cat: {
                        "UNIT": [i['id'] for i in types if 'id' in i]
                    }
                }
            )
and I get this result:
[{'orange': {'UNIT': [1, 2, 3, 4, 5]}},
{'apple': {'UNIT': [1, 2, 3, 4, 5]}},
{'apple': {'UNIT': [1, 2, 3, 4, 5]}},
{'orange': {'UNIT': [1, 2, 3, 4, 5]}},
{'banana': {'UNIT': [1, 2, 3, 4, 5]}}]
Your problem is the 'id' in i condition inside your list comprehension: it only checks that the key exists, so you collect every id regardless of type. Besides that, your code is more complex than it needs to be; you get duplicate entries because of the nested for loops and because you never check whether that fruit was already added to matched.
To reduce the two lists to the values you need, you can use:
cats = ['orange', 'apple', 'banana']
types = [{"id": 1, "type": "orange"},
         {"id": 2, "type": "apple"},
         {"id": 3, "type": "apple"},
         {"id": 4, "type": "orange"},
         {"id": 5, "type": "banana"}]
rv = {}
for inner in types:
    t = inner["type"]
    if t not in cats:   # not needed for the current data; only needed
        continue        # if some list elements have a type that does not occur in cats
    rv.setdefault(t, [])
    rv[t].append(inner["id"])
print(rv)
which gives a simpler dictionary with all the data you need:
{'orange': [1, 4], 'apple': [2, 3], 'banana': [5]}
From there you can build up your overly complex list of dicts with 1 key each:
lv = [{k:{"UNIT":v}} for k,v in rv.items()]
print (lv)
to get
[{'orange': {'UNIT': [1, 4]}},
{'apple': {'UNIT': [2, 3]}},
{'banana': {'UNIT': [5]}}]
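If you prefer, the same grouping can be written with collections.defaultdict, which removes the explicit setdefault call; a minimal sketch using the same cats and types as above:
from collections import defaultdict

rv = defaultdict(list)
for inner in types:
    if inner["type"] in cats:
        rv[inner["type"]].append(inner["id"])

lv = [{k: {"UNIT": v}} for k, v in rv.items()]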
Answer to the extended problem from the comment:
If you need to capture more fields, you can leverage the fact that lists and dicts are stored by reference:
cats = ['orange', 'apple', 'banana']
types = [{"id": 1, "type": "orange", "bouncyness": 42},
         {"id": 2, "type": "apple", "bouncyness": 21},
         {"id": 3, "type": "apple", "bouncyness": 63},
         {"id": 4, "type": "orange", "bouncyness": 84},
         {"id": 5, "type": "banana", "bouncyness": 99}]

rv = []    # list of dicts - single key is the fruit name
pil = {}   # keeps track of which fruit sits at what position in rv,
           # to avoid iterating over the list to find the correct fruit dict
di = None  # the current fruit's dictionary

for inner in types:
    t = inner["type"]
    if t not in cats:   # not needed for the current data; only needed
        continue        # if some list elements have a type that does not occur in cats

    # step 1: find the correct fruit dict in rv, or create a new one and add it
    di = None
    if t in pil:
        # get the cached dict by the fruit's position in rv
        di = rv[pil[t]]
    else:
        # create the fruit dict and cache its position in rv in pil
        di = {}
        rv.append(di)
        pil[t] = len(rv) - 1

    # step 2: create all the needed inner lists
    # you can speed this up using defaultdict(list) if speed gets
    # problematic - until then dict.setdefault is fine
    di.setdefault(t, [])
    di.setdefault("bouncyness", [])

    # step 3: fill with values
    di[t].append(inner["id"])
    di["bouncyness"].append(inner["bouncyness"])

print(rv)
to get
[{'orange': [1, 4], 'bouncyness': [42, 84]},
{'apple': [2, 3], 'bouncyness': [21, 63]},
{'banana': [5], 'bouncyness': [99]}]
Here is an alternative approach using the built-in filter() function and a lambda.
dict_lst = []
for cat in cats:
    cat_items = filter(lambda x: x['type'] == cat, types)
    cat_dict = {cat: {'UNIT': [x['id'] for x in cat_items]}}
    dict_lst.append(cat_dict)
print(dict_lst)
[{'orange': {'UNIT': [1, 4]}},
{'apple': {'UNIT': [2, 3]}},
{'banana': {'UNIT': [5]}}]
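One small note: in Python 3, filter() returns a lazy iterator, so cat_items can only be consumed once. Wrapping it in list() is a harmless tweak if you ever need to reuse it:
cat_items = list(filter(lambda x: x['type'] == cat, types))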
With a list comprehension:
[{cat:{'UNIT':[type_['id'] for type_ in types if type_['type'] == cat]}} for cat in cats]
Take a look at this example dictionary, which is stored in a JSON file:
{
"fridge": {
"vegetables": {
"Cucumber": 0,
"Carrot": 2,
"Lettuce": 5
},
"drinks": {
"Water": 12,
"Juice": 4,
"Soda": 2
}
}
}
So in this example we are showing the contents of my fridge, where we store the AMOUNT (2) of every ITEM (Soda) in every CATEGORY (drinks). The way we did this: we first created a dictionary for the fridge, and for every category we have another dictionary that maps each item to its amount.
Now let's say we went shopping... We bought some FRUITS at the supermarket and we got:
"fruits": {
"Apple": 3,
"Banana": 2,
"Melon": 1
}
We want to put this data (the fruits) into my fridge, except we don't have a CATEGORY for "fruits"! So not only do we have to add a new dictionary to my fridge, we also already have data that we want to add to it.
Now, this fridge thing was just an example to help you understand what I want. So how do you insert a new dictionary into an already existing one that has key-value pairs in it? In other words, I want my fridge to look like this:
{
"fridge": {
"vegetables": {
"Cucumber": 0,
"Carrot": 2,
"Lettuce": 5
},
"drinks": {
"Water": 12,
"Juice": 4,
"Soda": 2
},
"fruits": {
"Apple": 3,
"Banana": 2,
"Melon": 1
}
}
}
I tried APPEND, but as expected it does not work for dictionaries (it is for lists only), so I do not know what to do. Keep in mind that I do not want to re-define my data; I want to ADD data to the existing data so that I can edit it later. Would appreciate some help, thanks!
Suppose you have this string in proper json format:
j_str1='''\
{"fridge": {
"vegetables": {
"Cucumber": 0,
"Carrot": 2,
"Lettuce": 5
},
"drinks": {
"Water": 12,
"Juice": 4,
"Soda": 2
}
}}'''
And:
j_str2='''\
{"fruits": {
"Apple": 3,
"Banana": 2,
"Melon": 1
}}
'''
First convert to a Python dict:
import json
j=json.loads(j_str1)
>>> j
{'fridge': {'vegetables': {'Cucumber': 0, 'Carrot': 2, 'Lettuce': 5}, 'drinks': {'Water': 12, 'Juice': 4, 'Soda': 2}}}
Then update:
j["fridge"].update(json.loads(j_str2))
>>> j
{'fridge': {'vegetables': {'Cucumber': 0, 'Carrot': 2, 'Lettuce': 5}, 'drinks': {'Water': 12, 'Juice': 4, 'Soda': 2}, 'fruits': {'Apple': 3, 'Banana': 2, 'Melon': 1}}}
Then convert back into json:
>>> print(json.dumps(j,indent=3))
{
"fridge": {
"vegetables": {
"Cucumber": 0,
"Carrot": 2,
"Lettuce": 5
},
"drinks": {
"Water": 12,
"Juice": 4,
"Soda": 2
},
"fruits": {
"Apple": 3,
"Banana": 2,
"Melon": 1
}
}
}
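If the data really lives in a JSON file, the same update can be done in place by loading the file, assigning the new category, and writing it back. A minimal sketch, assuming a file named fridge.json (the filename is an assumption; the question only says the dict is stored in a JSON file):
import json

with open("fridge.json") as f:
    fridge = json.load(f)

# adding a new category is just a key assignment on the inner dict
fridge["fridge"]["fruits"] = {"Apple": 3, "Banana": 2, "Melon": 1}

with open("fridge.json", "w") as f:
    json.dump(fridge, f, indent=3)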
I have a dict stored under the variable parsed:
{
"8119300029": {
"store": 4,
"total": 4,
"web": 4
},
"8119300030": {
"store": 2,
"total": 2,
"web": 2
},
"8119300031": {
"store": 0,
"total": 0,
"web": 0
},
"8119300032": {
"store": 1,
"total": 1,
"web": 1
},
"8119300033": {
"store": 0,
"total": 0,
"web": 0
},
"8119300034": {
"store": 2,
"total": 2,
"web": 2
},
"8119300036": {
"store": 0,
"total": 0,
"web": 0
},
"8119300037": {
"store": 0,
"total": 0,
"web": 0
},
"8119300038": {
"store": 2,
"total": 2,
"web": 2
},
"8119300039": {
"store": 3,
"total": 3,
"web": 3
},
"8119300040": {
"store": 3,
"total": 3,
"web": 3
},
"8119300041": {
"store": 0,
"total": 0,
"web": 0
}
}
I am trying to get the "web" value from each entry but can only get the keys.
for x in parsed:
    print(x["web"])
I tried doing this ^ but kept getting this error: "string indices must be integers". Can somebody explain why this is wrong?
Because your x variable is the dict key name:
for x in parsed:
    print(parsed[x]['web'])
A little information on your parsed data: it is basically a dictionary of dictionaries. I won't go into too much of the nitty-gritty, but it would be worth reading up a bit on JSON: https://www.w3schools.com/python/python_json.asp
In your example, for x in parsed iterates through the keys of the parsed dictionary, e.g. 8119300029, 8119300030, etc. So x is a key (in this case, a string), not a dictionary. The reason you're getting the "string indices must be integers" error is that you're indexing a string with a string -- for example, x[0] would give you the first character 8 of the key 8119300029.
If you need to get each web value, then you need to access that key in the parsed[x] dictionary:
for x in parsed:
    print(parsed[x]["web"])
Output:
4
2
0
...
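As a side note, iterating with .items() gives you each key together with its inner dict, so no second lookup into parsed is needed:
for key, entry in parsed.items():
    print(entry["web"])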
I have a CSV file in a format similar to this
order_id, customer_name, item_1_id, item_1_quantity, Item_2_id, Item_2_quantity, Item_3_id, Item_3_quantity
1, John, 4, 1, 24, 4, 16, 1
2, Paul, 8, 3, 41, 1, 33, 1
3, Andrew, 1, 1, 34, 4, 8, 2
I want to export it to JSON; currently I am doing this:
df = pd.read_csv('simple.csv')
print(df.to_json(orient='records'))
And the output is
[
{
"Item_2_id": 24,
"Item_2_quantity": 4,
"Item_3_id": 16,
"Item_3_quantity": 1,
"customer_name": "John",
"item_1_id": 4,
"item_1_quantity": 1,
"order_id": 1
},
......
However, I would like the output to be
[
{
"customer_name": "John",
"order_id": 1,
"items": [
{ "id": 4, "quantity": 1 },
{ "id": 24, "quantity": 4 },
{ "id": 16, "quantity": 1 },
]
},
......
Any suggestions on a good way to do this?
In this particular project, there will not be more than 5 items per order.
Try the following:
import pandas as pd
import json

output_lst = []
## specify the first row as header
df = pd.read_csv('simple.csv', header=0)
## column_list is a list of column headers
column_list = df.columns.values
## iterate through all the rows
for index, row in df.iterrows():
    order_dict = {}
    items_lst = []
    for i, col_name in enumerate(column_list):
        ## for the first 2 columns simply copy the value into the dictionary
        if i < 2:
            element = row[col_name]
            if isinstance(element, str):
                ## strip if it is a string type value
                element = element.strip()
            else:
                ## cast to a plain int so json.dumps can serialize numpy integer types
                element = int(element)
            order_dict[col_name] = element
        elif "_id" in col_name:
            ## i+1 is used assuming that the item quantity comes right after the corresponding item id
            item_dict = {"id": int(row[col_name]),
                         "quantity": int(row[column_list[i + 1]])}
            items_lst.append(item_dict)
    order_dict["items"] = items_lst
    output_lst.append(order_dict)
print(json.dumps(output_lst))
If you run the above code with the simple.csv described in the question, you get the following output:
[
{
"order_id": 1,
"items": [
{
"id": 4,
"quantity": 1
},
{
"id": 24,
"quantity": 4
},
{
"id": 16,
"quantity": 1
}
],
" customer_name": "John"
},
{
"order_id": 2,
"items": [
{
"id": 8,
"quantity": 3
},
{
"id": 41,
"quantity": 1
},
{
"id": 33,
"quantity": 1
}
],
" customer_name": "Paul"
},
{
"order_id": 3,
"items": [
{
"id": 1,
"quantity": 1
},
{
"id": 34,
"quantity": 4
},
{
"id": 8,
"quantity": 2
}
],
" customer_name": "Andrew"
}
]
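Side note: the " customer_name" key keeps a leading space because the CSV header has spaces after the commas. If that matters, a small optional tweak is to strip it at parse time (skipinitialspace is a standard read_csv argument):
df = pd.read_csv('simple.csv', header=0, skipinitialspace=True)
## or, equivalently, strip the parsed column names afterwards:
## df.columns = df.columns.str.strip()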
Source DF:
In [168]: df
Out[168]:
   order_id customer_name  item_1_id  item_1_quantity  Item_2_id  Item_2_quantity  Item_3_id  Item_3_quantity
0         1          John          4                1         24                4         16                1
1         2          Paul          8                3         41                1         33                1
2         3        Andrew          1                1         34                4          8                2
Solution:
In [169]: %paste
import re
x = df[['order_id','customer_name']].copy()
x['id'] = \
pd.Series(df.loc[:, df.columns.str.contains(r'item_.*?_id',
flags=re.I)].values.tolist(),
index=df.index)
x['quantity'] = \
pd.Series(df.loc[:, df.columns.str.contains(r'item_.*?_quantity',
flags=re.I)].values.tolist(),
index=df.index)
x.to_json(orient='records')
## -- End pasted text --
Out[169]: '[{"order_id":1,"customer_name":"John","id":[4,24,16],"quantity":[1,4,1]},{"order_id":2,"customer_name":"Paul","id":[8,41,33],"quantity":[3,1,1]},{"order_id":3,"customer_name":"Andrew","id":[1,34,8],"quantity":[1,4,2]}]'
Intermediate helper DF:
In [82]: x
Out[82]:
   order_id customer_name           id   quantity
0         1          John  [4, 24, 16]  [1, 4, 1]
1         2          Paul  [8, 41, 33]  [3, 1, 1]
2         3        Andrew   [1, 34, 8]  [1, 4, 2]
Alternatively, the same JSON can be produced in a single chain by grouping the item columns by their suffix:
j = df.set_index(['order_id','customer_name']) \
      .groupby(lambda x: x.split('_')[-1], axis=1) \
      .agg(lambda x: x.values.tolist()) \
      .reset_index() \
      .to_json(orient='records')
import json
Beautified result:
In [122]: print(json.dumps(json.loads(j), indent=2))
[
{
"order_id": 1,
"customer_name": "John",
"id": [
4,
24,
16
],
"quantity": [
1,
4,
1
]
},
{
"order_id": 2,
"customer_name": "Paul",
"id": [
8,
41,
33
],
"quantity": [
3,
1,
1
]
},
{
"order_id": 3,
"customer_name": "Andrew",
"id": [
1,
34,
8
],
"quantity": [
1,
4,
2
]
}
]
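If you specifically need the nested "items" form requested in the question, the id/quantity lists can be zipped back together. A minimal sketch, assuming the helper frame x from In [82] above:
import json

records = []
for rec in x.to_dict(orient='records'):
    records.append({
        'order_id': int(rec['order_id']),        # cast in case pandas returns numpy ints
        'customer_name': rec['customer_name'],
        'items': [{'id': i, 'quantity': q}
                  for i, q in zip(rec['id'], rec['quantity'])],
    })

print(json.dumps(records, indent=2))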
I created a nested dictionary in Python like this:
{
    "Laptop": {
        "sony": 1,
        "apple": 2,
        "asus": 5
    },
    "Camera": {
        "sony": 2,
        "sumsung": 1,
        "nikon": 4
    }
}
But I couldn't figure out how to write this nested dict into a JSON file. Any comments will be appreciated!
import json

d = {
    "Laptop": {
        "sony": 1,
        "apple": 2,
        "asus": 5,
    },
    "Camera": {
        "sony": 2,
        "sumsung": 1,
        "nikon": 4,
    },
}

with open("my.json", "w") as f:
    json.dump(d, f)
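If you want the file to be human-readable you can pass an indent, and you can read it back later with json.load; continuing with d from above:
with open("my.json", "w") as f:
    json.dump(d, f, indent=4)

with open("my.json") as f:
    loaded = json.load(f)

print(loaded["Laptop"]["apple"])   # -> 2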