How to manipulate dictionary data in Python [duplicate]

I need your help with the following task: I have a list of dicts with the following data about products:
- id;
- title;
- country;
- seller;
In the output, I expect all dictionaries with the same id to be grouped into one, with a new key called "info" that holds
a list of dicts containing the "country" and "seller" of each product sharing that id.
Input data
data = [
{"id": 1, "title": "Samsung", "country": "France", "seller": "amazon_fr"},
{"id": 2, "title": "Apple", "country": "Spain", "seller": "amazon_es"},
{"id": 2, "title": "Apple", "country": "Italy", "seller": "amazon_it"},
]
Output data
result = [
{"id": 1, "title": "Samsung", "info": [{"country": "France", "seller": "amazon_fr"}]},
{"id": 2, "title": "Apple", "info": [{"country": "Spain", "seller": "amazon_es"}, {"country": "Italy", "seller": "amazon_it"}]},
]
Thanks a lot in advance for your efforts.
P.S. Pandas solutions are also appreciated.

Here's a plain Python solution that builds a result dictionary keyed by the id values from each dictionary in data and appends to the existing entry when a matching id is found. The values of that dictionary are then used to create the output list:
data = [
{"id": 1, "title": "Samsung", "country": "France", "seller": "amazon_fr"},
{"id": 2, "title": "Apple", "country": "Spain", "seller": "amazon_es"},
{"id": 2, "title": "Apple", "country": "Italy", "seller": "amazon_it"},
]
result = {}
for d in data:
    product_id = d['id']  # avoid shadowing the built-in id()
    info = {"country": d['country'], "seller": d['seller']}
    if product_id in result:
        # id already seen: add this country/seller pair to its list
        result[product_id]['info'].append(info)
    else:
        # first time this id appears: create its entry
        result[product_id] = {"id": product_id, "title": d['title'], "info": [info]}
result = list(result.values())
print(result)
Output:
[
{'title': 'Samsung', 'id': 1, 'info': [{'seller': 'amazon_fr', 'country': 'France'}]},
{'title': 'Apple', 'id': 2, 'info': [{'seller': 'amazon_es', 'country': 'Spain'},
{'seller': 'amazon_it', 'country': 'Italy'}
]
}
]
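The same grouping can be wrapped in a reusable function. This is only a minimal sketch, assuming every record carries the id, title, country and seller keys; the name group_products is hypothetical and not part of the answer above.

```python
def group_products(records):
    """Group flat product records by id, collecting country/seller pairs under 'info'."""
    grouped = {}
    for record in records:
        # setdefault creates the entry the first time an id is seen
        entry = grouped.setdefault(
            record["id"],
            {"id": record["id"], "title": record["title"], "info": []},
        )
        entry["info"].append(
            {"country": record["country"], "seller": record["seller"]}
        )
    # dicts preserve insertion order (Python 3.7+), so ids keep first-seen order
    return list(grouped.values())


data = [
    {"id": 1, "title": "Samsung", "country": "France", "seller": "amazon_fr"},
    {"id": 2, "title": "Apple", "country": "Spain", "seller": "amazon_es"},
    {"id": 2, "title": "Apple", "country": "Italy", "seller": "amazon_it"},
]
result = group_products(data)
```

Unlike the groupby approach, this does not need the input to be sorted, because each id is looked up directly in the dictionary.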

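You can use itertools.groupby. The input has to be sorted by the grouping key first, because groupby only groups consecutive items: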
from operator import itemgetter
from itertools import groupby
data.sort(key=itemgetter('id'))
group = groupby(data, key=lambda x: (x['id'], x['title']))
result = [
    {'id': i, 'title': t, 'info': [{'country': d['country'], 'seller': d['seller']} for d in v]}
    for (i, t), v in group
]
Output:
[{'id': 1,
'title': 'Samsung',
'info': [{'country': 'France', 'seller': 'amazon_fr'}]},
{'id': 2,
'title': 'Apple',
'info': [{'country': 'Spain', 'seller': 'amazon_es'},
{'country': 'Italy', 'seller': 'amazon_it'}]}]
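To see why the sort step matters, here is a small sketch with hypothetical, deliberately unsorted rows. itertools.groupby starts a new group whenever the key changes, so rows with the same id that are not adjacent end up in separate groups.

```python
from itertools import groupby

# Hypothetical unsorted input: the two id-2 rows are not adjacent
rows = [
    {"id": 2, "country": "Spain"},
    {"id": 1, "country": "France"},
    {"id": 2, "country": "Italy"},
]


def group_ids(items):
    """Return the key of every group that groupby yields, in order."""
    return [key for key, _ in groupby(items, key=lambda r: r["id"])]


unsorted_ids = group_ids(rows)                                # id 2 shows up twice
sorted_ids = group_ids(sorted(rows, key=lambda r: r["id"]))   # one group per id
```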

Related

Adding a count to object array in python

How would you count and group an array of objects using Counter?
list = [ {
"text": "London",
},
{
"text": "New york",
},
{
"text": "London",
}]
The output that I'm expecting is:
[ {
"text": "London",
"count": 2
},
{
"text": "New york",
"count": 1
}
]
An option would be the following:
# current_list is the list of dicts from the question
counter = {}
for d in current_list:
    for v in d.values():
        # start at 0 for a value that has not been seen yet
        counter[v] = counter.get(v, 0) + 1
print(counter)
new_list = [{"text": k, "count": v} for k, v in counter.items()]
print(new_list)
If you are looking for a more compact option, you can do this with a one-line list comprehension:
lst = [ {
"text": "London",
},
{
"text": "New york",
},
{
"text": "London",
}]
[{'text': location, 'count': [x['text'] for x in lst].count(location)} for location in set([x['text'] for x in lst])]
Output (the order of the entries may vary, because it comes from iterating over a set):
[{'text': 'London', 'count': 2}, {'text': 'New york', 'count': 1}]
Note: I renamed the original list to avoid shadowing the built-in name list.
To extend this to an arbitrary number of attributes that are unique to the location, use this:
[{**{n['text']: n for n in lst}[loc], **{'count': [x['text'] for x in lst].count(loc)}} for loc in set([x['text'] for x in lst])]
with a new input of
lst = [ {
"text": "London",
"country": "UK"
},
{
"text": "New york",
"country": "USA"
},
{
"text": "London",
"country": "UK"
}]
Output:
[{'text': 'New york', 'country': 'USA', 'count': 1},
{'text': 'London', 'country': 'UK', 'count': 2}]
I would do this in two steps:
>>> from collections import Counter
>>> l = [...]
>>> counter = Counter([x['text'] for x in l])
>>> [{'text': k, 'count': c} for k, c in counter.items()]
[{'text': 'London', 'count': 2}, {'text': 'New york', 'count': 1}]
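As a self-contained version of the two steps above, the following sketch wraps them in a function. The name count_by_key and its key parameter are assumptions made for illustration; they do not appear in the original answer.

```python
from collections import Counter


def count_by_key(records, key="text"):
    """Count how often each value of `key` occurs across the records."""
    counts = Counter(record[key] for record in records)
    # Counter is a dict subclass, so items() keeps first-seen order (Python 3.7+)
    return [{key: value, "count": n} for value, n in counts.items()]


places = [{"text": "London"}, {"text": "New york"}, {"text": "London"}]
counted = count_by_key(places)
```

Counter makes a single pass over the data, whereas the one-line comprehension with list.count rescans the whole list for every distinct value.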
Unserious itertools.groupby solution:
from itertools import groupby
L = [ {
"text": "London",
},
{
"text": "New york",
},
{
"text": "London",
}]
L = [{**key, "count" : len((*group,))} for key, group in groupby(sorted(L, key=str))]
print(L)
It works in this case because each dictionary has only one key, but it might not work if two dictionaries hold the same keys in a different order, since sorting by str depends on key order. Happy to improve it if you have any suggestions for the sorting function.
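One possible sorting function, offered only as a sketch: sort and group on the sorted list of each dictionary's items, which does not depend on key order. It assumes that all dictionaries share the same keys and that the values are mutually comparable (for example, all strings); the name count_dicts is hypothetical.

```python
from itertools import groupby


def canonical(d):
    """Order-independent sort key: the dict's items sorted by key name."""
    return sorted(d.items())


def count_dicts(dicts):
    """Count equal dicts, regardless of the order in which their keys were inserted."""
    result = []
    for _, group in groupby(sorted(dicts, key=canonical), key=canonical):
        members = list(group)
        result.append({**members[0], "count": len(members)})
    return result


cities = [
    {"text": "London", "country": "UK"},
    {"country": "UK", "text": "London"},   # same content, keys in opposite order
    {"text": "New york", "country": "USA"},
]
city_counts = count_dicts(cities)
```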

Fetch the JSON for a particular key value

For an input JSON:
[{
"Name": "John",
"Age": "23",
"Des": "SE"
},
{
"Name": "Rai",
"Age": "33",
"Des": "SSE"
},
{
"Name": "James",
"Age": "42",
"Des": "SE"
}
]
I want to keep only the entries where "Des" is "SE", and drop the "Des" key from them.
Required output:
[{
"Name": "John",
"Age": "23"
},
{
"Name": "James",
"Age": "42"
}
]
A list comprehension should do it:
out = [{'Name':d['Name'], 'Age':d['Age']} for d in lst if d['Des']=='SE']
Another way (note that pop removes "Des" from every dict in lst, including the ones that are filtered out):
out = [d for d in lst if d.pop('Des')=='SE']
Output:
[{'Name': 'John', 'Age': '23'}, {'Name': 'James', 'Age': '42'}]
To make it more dynamic if each json has more elements:
import json
input_str = '[{"Name": "John", "Age": "23", "Des": "SE"}, {"Name": "Rai", "Age": "33", "Des": "SSE"}, {"Name": "James", "Age": "42", "Des": "SE"}]'
input_list = json.loads(input_str)
# If you already converted to a list of dicts, then you don't need the above
# Using pop here removes the key you are using to filter
output = [each for each in input_list if each.pop("Des") == "SE"]
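If the input list should stay untouched, a non-mutating variant is possible. The sketch below builds new dictionaries instead of calling pop; the function name select_and_drop is an assumption made for illustration.

```python
def select_and_drop(records, key, value):
    """Keep records where record[key] == value, returning copies without that key."""
    return [
        {k: v for k, v in record.items() if k != key}
        for record in records
        if record.get(key) == value
    ]


people = [
    {"Name": "John", "Age": "23", "Des": "SE"},
    {"Name": "Rai", "Age": "33", "Des": "SSE"},
    {"Name": "James", "Age": "42", "Des": "SE"},
]
selected = select_and_drop(people, "Des", "SE")
```

Because the originals are copied rather than modified, the same list can be filtered again with a different value afterwards.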
Using the json module, you can load a file using load or a string using loads. From there, the result acts as a normal Python list of dictionaries, which you can iterate over and check the keys of. You then create a new list of the dictionaries that match your desired pattern and remove the key you no longer need. Example:
import json
jsonString = """[{
"Name": "John",
"Age": "23",
"Des": "SE"
},
{
"Name": "Rai",
"Age": "33",
"Des": "SSE"
},
{
"Name": "James",
"Age": "42",
"Des": "SE"
}
]"""
jsonList = json.loads(jsonString)
filteredList = []
def CheckDes(dataDict: dict):
    # Keep only the entries with Des == 'SE', dropping the Des key from them
    if dataDict['Des'] == 'SE':
        dataDict.pop('Des')
        filteredList.append(dataDict)
print(jsonList)
"""
[
{
'Name': 'John',
'Age': '23',
'Des': 'SE'
},
{
'Name': 'Rai',
'Age': '33',
'Des': 'SSE'
},
{
'Name': 'James',
'Age': '42',
'Des': 'SE'
}
]"""
for dataDict in jsonList:
    CheckDes(dataDict)
print(filteredList)
"""[
{
'Name': 'John',
'Age': '23'
},
{
'Name': 'James',
'Age': '42'
}
]
"""
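To illustrate the difference between the two json functions, here is a brief sketch. io.StringIO is used as a stand-in for an opened file, which is an assumption made so the example runs without touching the disk.

```python
import io
import json

text = '[{"Name": "John", "Des": "SE"}]'

# json.loads parses a string
from_string = json.loads(text)

# json.load parses a file-like object (io.StringIO stands in for an open file)
from_file = json.load(io.StringIO(text))
```

Both calls return the same list of dictionaries; only the kind of input differs.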

Pandas dataframe into nested child dictionary

I have a dataframe like below, where each 'level' drills down into more detail, with the last level having an id value.
data = [
{'id': 1, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Siamese Cat'},
{'id': 2, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Javanese Cat'},
{'id': 3, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Ursidae', 'level_4': 'Polar Bear'},
{'id': 4, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Labradore Retriever'},
{'id': 5, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Golden Retriever'}
]
I want to turn this into a nested dictionary of parent / child relationships like below.
var data = {
"name": "Animals",
"children": [
{
"name": "Carnivores",
"children": [
{
"name": "Felidae",
"children": [
{
"id": 1,
"name": "Siamese Cat",
"children": []
},
{
"id": 2,
"name": "Javanese Cat",
"children": []
}
]
},
{
"name": "Ursidae",
"children": [
{
"id": 3,
"name": "Polar Bear",
"children": []
}
]
},
{
"name": "Canidae",
"children": [
{
"id": 4,
"name": "Labradore Retriever",
"children": []
},
{
"id": 5,
"name": "Golden Retriever",
"children": []
}
]
}
]
}
]
}
I've tried several approaches of grouping the dataframe and also looping over individual rows, but haven't been able to find a working solution yet. Any help would be greatly appreciated!
@Timus's answer mimics your intention; however, you might have difficulty searching that dictionary, since each level has a key name and a key children. If that is what you intended, ignore my answer. However, if you would like a dictionary that you can search more easily through unique keys, you can try:
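df = df.set_index(['level_1', 'level_2', 'level_3', 'level_4'])

def make_dictionary(df):
    # Base case: only one index level is left, so map it to the remaining columns
    if df.index.nlevels == 1:
        return df.to_dict()
    dictionary = {}
    for key in df.index.get_level_values(0).unique():
        sub_df = df.xs(key)  # rows under this key, with the first index level dropped
        dictionary[key] = make_dictionary(sub_df)  # recurse into the next level
    return dictionary

make_dictionary(df)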
It requires setting the different levels as index, and you will end up with a slightly different dictionary:
{'Animals':
{'Carnivores':
{'Felidae':
{'id': {'Siamese Cat': 1,
'Javanese Cat': 2}},
'Ursidae':
{'id': {'Polar Bear': 3}},
'Canidae':
{'id': {'Labradore Retriever': 4,
'Golden Retriever': 5}}}
}
}
EDIT: Had to make an adjustment, because the result wasn't exactly as expected.
Here's an attempt that produces the expected output (if I haven't made a mistake, which wouldn't be a surprise, because I've made several on the way):
import pandas as pd

def pack_level(df):
    if df.columns[0] == 'id':
        return [{'id': i, 'name': name, 'children': []}
                for i, name in zip(df[df.columns[0]], df[df.columns[1]])]
    return [{'name': df.iloc[0, 0],
             'children': [entry for lst in df[df.columns[1]]
                          for entry in lst]}]

df = pd.DataFrame(data)
columns = list(df.columns[1:])
df = df.groupby(columns[:-1]).apply(pack_level)
for i in range(1, len(columns) - 1):
    df = (df.reset_index(level=-1, drop=False).groupby(columns[:-i])
            .apply(pack_level)
            .reset_index(level=-1, drop=True))
var_data = {'name': df.index[0], 'children': df.iloc[0]}
The result looks a bit different at first glance, but that should be only due to the sorting (from printing):
{
"children": [
{
"children": [
{
"children": [
{
"children": [],
"id": 4,
"name": "Labradore Retriever"
},
{
"children": [],
"id": 5,
"name": "Golden Retriever"
}
],
"name": "Canidae"
},
{
"children": [
{
"children": [],
"id": 1,
"name": "Siamese Cat"
},
{
"children": [],
"id": 2,
"name": "Javanese Cat"
}
],
"name": "Felidae"
},
{
"children": [
{
"children": [],
"id": 3,
"name": "Polar Bear"
}
],
"name": "Ursidae"
}
],
"name": "Carnivores"
}
],
"name": "Animals"
}
I've tried to be as generic as possible, but the first column has to be named id (as in your sample).
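For comparison, the same name/children tree can be built without pandas, directly from the list of dicts. This is a rough sketch rather than a replacement for the answers above: it assumes all rows share a single top-level value, that the leaf rows carry an id key, and the function name build_tree and its levels parameter are hypothetical.

```python
def build_tree(rows, levels):
    """Nest rows into name/children nodes, one tree level per column in `levels`."""
    # Assumption: every row has the same value in the first level (a single root)
    root = {"name": rows[0][levels[0]], "children": []}
    for row in rows:
        node = root
        # Walk the intermediate levels, creating a node the first time a name is seen
        for level in levels[1:-1]:
            name = row[level]
            child = next((c for c in node["children"] if c["name"] == name), None)
            if child is None:
                child = {"name": name, "children": []}
                node["children"].append(child)
            node = child
        # The last level is the leaf, which also carries the id
        node["children"].append(
            {"id": row["id"], "name": row[levels[-1]], "children": []}
        )
    return root


rows = [
    {"id": 1, "level_1": "Animals", "level_2": "Carnivores", "level_3": "Felidae", "level_4": "Siamese Cat"},
    {"id": 2, "level_1": "Animals", "level_2": "Carnivores", "level_3": "Felidae", "level_4": "Javanese Cat"},
    {"id": 3, "level_1": "Animals", "level_2": "Carnivores", "level_3": "Ursidae", "level_4": "Polar Bear"},
]
tree = build_tree(rows, ["level_1", "level_2", "level_3", "level_4"])
```

Children appear in the order their names are first seen in the input, so no sorting is involved.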

Python List to Nested Json

I am having the following problem.
class Inventory:
    def __init__(self, project_no, country, category, product, count):
        self.project_no = project_no
        self.country = country
        self.category = category
        self.product = product
        self.count = count
inventory_list = []
inventory_list.append(Inventory(1,'USA','Beverages','Milk',2))
inventory_list.append(Inventory(1,'USA','Beverages','Juice',5))
inventory_list.append(Inventory(1,'USA','Snacks','Potato Chips',2))
inventory_list.append(Inventory(1,'USA','Oils','Canola',5))
inventory_list.append(Inventory(1,'USA','Oils','Olive',8))
inventory_list.append(Inventory(1,'CAN','Beverages','Milk',7))
inventory_list.append(Inventory(1,'CAN','Beverages','Juice',8))
inventory_list.append(Inventory(1,'CAN','Snacks','Potato Chips',8))
inventory_list.append(Inventory(1,'CAN','Oils','Canola',3))
inventory_list.append(Inventory(1,'CAN','Oils','Olive',4))
{'Inventory': [{'Country': inv.country , 'Category' : [{inv.category : [{'Product' : inv.product}]}] } for inv in inventory_list]}
This code is giving me the following output.
{'Inventory': [{'Country': 'USA',
'Category': [{'Beverages': [{'Product': 'Milk'}]}]},
{'Country': 'USA', 'Category': [{'Beverages': [{'Product': 'Juice'}]}]},
{'Country': 'USA', 'Category': [{'Snacks': [{'Product': 'Potato Chips'}]}]},
{'Country': 'USA', 'Category': [{'Oils': [{'Product': 'Canola'}]}]},
{'Country': 'USA', 'Category': [{'Oils': [{'Product': 'Olive'}]}]},
{'Country': 'CAN', 'Category': [{'Beverages': [{'Product': 'Milk'}]}]},
{'Country': 'CAN', 'Category': [{'Beverages': [{'Product': 'Juice'}]}]},
{'Country': 'CAN', 'Category': [{'Snacks': [{'Product': 'Potato Chips'}]}]},
{'Country': 'CAN', 'Category': [{'Oils': [{'Product': 'Canola'}]}]},
{'Country': 'CAN', 'Category': [{'Oils': [{'Product': 'Olive'}]}]}]}
What I actually need is more like below.
{
"Inventory": [{
"country": "USA",
"category": [{
"Beverages": [{
"product": "Milk",
"count": 2
}, {
"product": "Juice",
"count": 5
}]
}, {
"Snacks": [{
"product": "Potato Chips",
"count": 2
}]
}, {
"Oils": [{
"product": "Canola",
"count": 5
}, {
"product": "Olive",
"count": 8
}]
}]
}, {
"country": "CAN",
"category": [{
"Beverages": [{
"product": "Milk",
"count": 7
}, {
"product": "Juice",
"count": 8
}]
}, {
"Snacks": [{
"product": "Potato Chips",
"count": 8
}]
}, {
"Oils": [{
"product": "Canola",
"count": 3
}, {
"product": "Olive",
"count": 4
}]
}]
}
]
}
How to do this?
I thought list comprehension is the way to go.
But I am having trouble beyond this point.
I thought this should be really easy for a python coder.
With my limited python I could only reach this far.
If anyone can help.
I would suggest you try serializing your Inventory class using the json module. However, it looks like you'll want to reorganize your data a bit. From what I can tell, you want to have an inventory that has a collection of countries which contain a set of products separated into categories.
First, let's define the Product class:
class Product(object):
    def __init__(self, name, count):
        self.product = name
        self.count = count
Next, we can define the Country class as a container for a set of Product objects, arranged in a dictionary using the category name as the key.
class Country(object):
    def __init__(self, name):
        self.name = name
        self.categories = dict()

    def add_product_to_category(self, category, product):
        # Create the category's list the first time the category is used
        if category not in self.categories:
            self.categories[category] = []
        self.categories[category].append(product)
Then, we can re-define the Inventory class as a container for a set of Country objects.
class Inventory(object):
    def __init__(self, project_no):
        self.project_no = project_no
        self.countries = []
Next, we can use simple methods to fill out our classes with the required data.
inv = Inventory(1)
us_set = Country('USA')
us_set.add_product_to_category('Beverages', Product('Milk', 2))
us_set.add_product_to_category('Beverages', Product('Juice', 5))
us_set.add_product_to_category('Snacks', Product('Potato Chips', 2))
us_set.add_product_to_category('Oils', Product('Canola', 5))
us_set.add_product_to_category('Oils', Product('Olive', 8))
canada_set = Country('CAN')
canada_set.add_product_to_category('Beverages', Product('Milk', 7))
canada_set.add_product_to_category('Beverages', Product('Juice', 8))
canada_set.add_product_to_category('Snacks', Product('Potato Chips', 8))
canada_set.add_product_to_category('Oils', Product('Canola', 3))
canada_set.add_product_to_category('Oils', Product('Olive', 4))
inv.countries.append(us_set)
inv.countries.append(canada_set)
Finally, (to actually answer your question, lul) to serialize the Inventory class, we have to define an encoder to use:
import json

class MyEncoder(json.JSONEncoder):
    def default(self, o):
        # Serialize any custom object through its attribute dictionary
        return o.__dict__
Now, we can just call json.dumps() to get a string output of our serialized Inventory.
json.dumps(inv, indent=2, cls=MyEncoder)
The output isn't exactly what you laid out, but I think this method will work well for you.
{
"project_no": 1,
"countries": [
{
"name": "USA",
"categories": {
"Beverages": [
{
"count": 2,
"product": "Milk"
},
{
"count": 5,
"product": "Juice"
}
],
"Oils": [
{
"count": 5,
"product": "Canola"
},
{
"count": 8,
"product": "Olive"
}
],
"Snacks": [
{
"count": 2,
"product": "Potato Chips"
}
]
}
},
{
"name": "CAN",
"categories": {
"Beverages": [
{
"count": 7,
"product": "Milk"
},
{
"count": 8,
"product": "Juice"
}
],
"Oils": [
{
"count": 3,
"product": "Canola"
},
{
"count": 4,
"product": "Olive"
}
],
"Snacks": [
{
"count": 8,
"product": "Potato Chips"
}
]
}
}
]
}
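If the exact layout from the question is needed, the flat list can also be grouped with plain dictionaries. The following is only a sketch: Item is a hypothetical namedtuple standing in for the question's flat Inventory class, and nest_inventory is an illustrative name, not part of the answer above.

```python
from collections import namedtuple

# Hypothetical stand-in for the question's flat Inventory class
Item = namedtuple("Item", "project_no country category product count")


def nest_inventory(items):
    """Group flat items by country, then by category, in first-seen order."""
    by_country = {}
    for item in items:
        categories = by_country.setdefault(item.country, {})
        categories.setdefault(item.category, []).append(
            {"product": item.product, "count": item.count}
        )
    return {
        "Inventory": [
            {
                "country": country,
                # one single-key dict per category, as in the requested output
                "category": [{name: products} for name, products in categories.items()],
            }
            for country, categories in by_country.items()
        ]
    }


items = [
    Item(1, "USA", "Beverages", "Milk", 2),
    Item(1, "USA", "Beverages", "Juice", 5),
    Item(1, "CAN", "Oils", "Olive", 4),
]
nested = nest_inventory(items)
```

The result contains only dicts, lists, strings and integers, so it can be passed to json.dumps without a custom encoder.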
Try using the json module, e.g.:
import json
...
inv_json = {'Inventory': [{'Country': inv.country , 'Category' : [{inv.category : [{'Product' : inv.product}]}] } for inv in inventory_list]}
json_formatted_str = json.dumps(inv_json, indent=2)
print(json_formatted_str)
https://www.journaldev.com/33302/python-pretty-print-json
