Merge two arrays by keys in Python

I have two arrays, "categories" and "products".
I need to build a third array in which each category contains the list of products that belong to it.
categories = [
{
'id': 1,
'name': 'first category'
},
{
'id': 2,
'name': 'second category'
},
{
'id': 3,
'name': 'third category'
},
]
products = [
{
'id': 1,
'id_category': 1,
'name': 'first product'
},
{
'id': 2,
'id_category': 2,
'name': 'second product'
},
{
'id': 3,
'id_category': 2,
'name': 'third product'
},
]
I need to combine these two arrays into one by "id_category", subject to one condition:
if there are no products in a category, remove that category from the list.
The expected result is:
products_by_category = [
{
"id":1,
"name":"first category",
"products":[
{
"id":1,
"name":"first product"
}
]
},
{
"id":2,
"name":"second category",
"products":[
{
"id":2,
"name":"second product"
},
{
"id":3,
"name":"third product"
}
]
}
]
I tried the following code:
for category in list(categories):
    category['products'] = []
    for product in products:
        if category['id'] == product['id_category']:
            category['products'].append(product)
    if not category['products']:
        categories.remove(category)
print(categories)

Here is something that does most of what you want. Note that it does not create a new data structure; it updates the categories in place and removes "id_category" from each product. Categories without products stay in the list, without a "products" key.
catdict = {cat['id']: cat for cat in categories}
for product in products:
    pcat = product['id_category']
    del product['id_category']
    cat = catdict[pcat]
    if "products" not in cat:
        cat["products"] = []
    cat["products"].append(product)
print(categories)

Here is another approach using defaultdict: first create a lookup from id_category to the details of its products, then use a list comprehension to merge it back into the categories.
from collections import defaultdict
# create a mapping from category id to its products
prod_lookup = defaultdict(list)
for prod in products:
    prod_lookup[prod['id_category']].append(
        {"id": prod['id'], "name": prod['name']}
    )
# merge back into categories, based on "id"; categories without products are skipped
products_by_category = [
    {**cat, "products": prod_lookup[cat['id']]}
    for cat in categories if prod_lookup.get(cat['id'])
]
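For comparison, here is a minimal, self-contained sketch that builds a new list and leaves both inputs untouched. The function name group_products_by_category is made up for illustration, and the sketch assumes every product carries an "id_category" key.

```python
from collections import defaultdict


def group_products_by_category(categories, products):
    """Return a new list of categories, each with its products attached.

    Categories without any product are skipped; the inputs are not modified.
    """
    # Map category id -> list of products (copied without the id_category key)
    by_category = defaultdict(list)
    for product in products:
        trimmed = {k: v for k, v in product.items() if k != "id_category"}
        by_category[product["id_category"]].append(trimmed)

    # Keep only the categories that have at least one product
    return [
        {**category, "products": by_category[category["id"]]}
        for category in categories
        if category["id"] in by_category
    ]


categories = [
    {"id": 1, "name": "first category"},
    {"id": 2, "name": "second category"},
    {"id": 3, "name": "third category"},
]
products = [
    {"id": 1, "id_category": 1, "name": "first product"},
    {"id": 2, "id_category": 2, "name": "second product"},
    {"id": 3, "id_category": 2, "name": "third product"},
]

products_by_category = group_products_by_category(categories, products)
print(products_by_category)
```

Because the products are copied, the original dictionaries keep their "id_category" key, which may or may not be what you want.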

Related

How to extract two values from dict in python?

I'm using Python 3 and I have a data set containing the following data. I'm trying to extract the desired values from this list. I have tried many ways but can't figure out how to do it.
slots_data = [
{
"id":551,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":552,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":525,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":524,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":553,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":550,
"user_id":2,
"time":"199322002",
"expire":"199322002"
}
]
# Desired output
# [
# {"user_id":1,"slots_ids":[551,552,553]}
# {"user_id":2,"slots_ids":[550]}
# {"user_id":3,"slots_ids":[524,525]}
# ]
I tried the following, which is obviously not correct. I couldn't figure out how to solve this problem:
final_list = []
for item in slots_data:
    obj = obj.dict()
    obj = {
        "user_id":item["user_id"],
        "slot_ids":item["id"]
    }
    final_list.append(obj)
print(set(final_list))
The other answer added here has a nice solution, but here's one without using pandas:
users = {}
for item in slots_data:
    # Check if we've seen this user before,
    if item['user_id'] not in users:
        # if not, create a new entry for them
        users[item['user_id']] = {'user_id': item['user_id'], 'slot_ids': []}
    # Add their slot ID to their dictionary
    users[item['user_id']]['slot_ids'].append(item['id'])
# We only need the values (dicts)
output_list = list(users.values())
Lots of good answers here.
If I was doing this, I would base my answer on setdefault and/or collections.defaultdict that can be used in a similar way. I think the defaultdict version is very readable but if you are not already importing collections you can do without it.
Given your data:
slots_data = [
{
"id":551,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":552,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
#....
]
You can reshape it into your desired output via:
## -------------------
## get the value for the key user_id if it exists
## if it does not, set the value for that key to a default
## use the value to append the current id to the sub-list
## -------------------
reshaped = {}
for slot in slots_data:
    user_id = slot["user_id"]
    slot_id = slot["id"]
    reshaped.setdefault(user_id, []).append(slot_id)
## -------------------
## -------------------
## take a second pass to finish the shaping in a sorted manner
## -------------------
reshaped = [
    {
        "user_id": user_id,
        "slots_ids": sorted(reshaped[user_id])
    }
    for user_id
    in sorted(reshaped)
]
## -------------------
print(reshaped)
That will give you:
[
{'user_id': 1, 'slots_ids': [551, 552, 553]},
{'user_id': 2, 'slots_ids': [550]},
{'user_id': 3, 'slots_ids': [524, 525]}
]
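The collections.defaultdict variant mentioned in this answer is not shown, so here is a rough sketch of how it might look. The sample data is a shortened stand-in for the question's slots_data, and the name ids_by_user is only illustrative.

```python
from collections import defaultdict

# Shortened sample in the same shape as the question's slots_data
slots_data = [
    {"id": 551, "user_id": 1},
    {"id": 525, "user_id": 3},
    {"id": 552, "user_id": 1},
    {"id": 524, "user_id": 3},
    {"id": 550, "user_id": 2},
]

# A missing key starts out as an empty list automatically
ids_by_user = defaultdict(list)
for slot in slots_data:
    ids_by_user[slot["user_id"]].append(slot["id"])

# Second pass: build the list of dicts, sorted by user and by slot id
reshaped = [
    {"user_id": user_id, "slots_ids": sorted(ids)}
    for user_id, ids in sorted(ids_by_user.items())
]
print(reshaped)
```

The only real difference from setdefault is that the default is declared once, when the dictionary is created, instead of on every lookup.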
I would suggest using pandas to group the user IDs together and then convert the result back to a list of dictionaries:
import pandas as pd
pd.DataFrame(slots_data).groupby('user_id')['id'].agg(list).reset_index().to_dict('records')
[{'user_id': 1, 'id': [551, 552, 553]},
{'user_id': 2, 'id': [550]},
{'user_id': 3, 'id': [525, 524]}]
Using just a simple loop:
>>> result = {}
>>> for i in slots_data:
...     if i['user_id'] not in result:
...         result[i['user_id']] = []
...     result[i['user_id']].append(i['id'])
...
>>> output = []
>>> for i in result:
...     dict_obj = dict(user_id=i, slots_id=result[i])
...     output.append(dict_obj)
...
>>> output
[{'user_id': 1, 'slots_id': [551, 552, 553]}, {'user_id': 3, 'slots_id': [525, 524]}, {'user_id': 2, 'slots_id': [550]}]
You can use the following to get it done in pure Python, without any dependencies.
slots_data = [
{
"id":551,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":552,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":525,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":524,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":553,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":550,
"user_id":2,
"time":"199322002",
"expire":"199322002"
}
]
user_wise_slots = {}
for slot_detail in slots_data:
    if slot_detail["user_id"] not in user_wise_slots:
        user_wise_slots[slot_detail["user_id"]] = {
            "user_id": slot_detail["user_id"],
            "slot_ids": []
        }
    user_wise_slots[slot_detail["user_id"]]["slot_ids"].append(slot_detail["id"])
# values() is a view, so convert it to get a plain list of dicts
print(list(user_wise_slots.values()))
This can be done using a list comprehension:
final_list = [{"user_id": user_id, "id":sorted([slot["id"] for slot in slots_data if slot["user_id"] == user_id])} for user_id in sorted(set([slot["user_id"] for slot in slots_data]))]
A more verbose and better formatted version of the same code:
all_user_ids = [slot["user_id"] for slot in slots_data]
unique_user_ids = sorted(set(all_user_ids))
final_list = [
    {
        "user_id": user_id,
        "id": sorted([slot["id"] for slot in slots_data if slot["user_id"] == user_id])
    }
    for user_id in unique_user_ids]
Explanation:
Get all the user ids with a list comprehension.
Get the unique user ids by creating a set, then sort them.
Create the final list of dictionaries using a list comprehension.
Each "id" field is itself a list built with a nested list comprehension: a slot's id is added to it only if the slot's user_id matches the current user.
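The comprehension above rescans slots_data once per user. If that matters, one possible alternative is to sort once and let itertools.groupby collect consecutive rows. This is only a sketch; the sample data is shortened rather than the full list from the question.

```python
from itertools import groupby
from operator import itemgetter

# Shortened sample in the same shape as the question's slots_data
slots_data = [
    {"id": 551, "user_id": 1},
    {"id": 525, "user_id": 3},
    {"id": 524, "user_id": 3},
    {"id": 553, "user_id": 1},
    {"id": 550, "user_id": 2},
]

# groupby only merges adjacent rows, so sort by (user_id, id) first
ordered = sorted(slots_data, key=itemgetter("user_id", "id"))

final_list = [
    {"user_id": user_id, "id": [slot["id"] for slot in group]}
    for user_id, group in groupby(ordered, key=itemgetter("user_id"))
]
print(final_list)
```

Sorting on both keys up front means the ids inside each group already come out in ascending order.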
Using pandas you can easily achieve the result.
First install pandas if you don't already have it:
pip install pandas
import pandas as pd
df = pd.DataFrame(slots_data)  # create a DataFrame
# group by user_id, collect the ids of each group into a list, and name that column slots_ids
df1 = df.groupby("user_id")['id'].apply(list).reset_index(name="slots_ids")
final_slots_data = df1.to_dict('records')  # convert the DataFrame into a list of dictionaries
final_slots_data
Output:
[{'user_id': 1, 'slots_ids': [551, 552, 553]},
{'user_id': 2, 'slots_ids': [550]},
{'user_id': 3, 'slots_ids': [525, 524]}]

How to convert list structure from one pattern to other pattern [closed]

How can I solve this with a list comprehension in Python?
[
{
"id": 1,
"cleaning_type": "Lite service",
"service_name": "Floors",
},
{
"id": 2,
"cleaning_type": "Lite service",
"service_name": "Bathrooms",
},
{
"id": 3,
"cleaning_type": "Lite service",
"service_name": "Kitchen",
},
{
"id": 4,
"cleaning_type": "Moving cleaning",
"service_name": "Kitchen Including All Appliances And Cabinets",
},
{
"id": 5,
"cleaning_type": "Moving cleaning",
"service_name": "Shift Products",
}
]
I want this to be in the following format:
[
{
id: 1,
cleaning_type: 'Lite service',
service_name: ['Floors', 'bathroom', 'kitchen'],
},
{
id: 2,
cleaning_type: 'Moving cleaning',
service_name: ['Kitchen Including All Appliances And Cabinets','Shift Products'],
},
]
I want the list in the second format, grouped by cleaning type, with the service names listed under each cleaning type.
This is as close as I could get to your desired format. I don't understand how you got id = 2 for the second item, since the cleaning_type of the item with an id of 2 is Lite service.
It's worth noting that dictionary keys need to be hashable (in practice, immutable) types. I've just used strings directly, instead of creating variables named id, cleaning_type, and service_name bound to the same strings (note the lack of quotes in your example).
items = [
{
'id': 1,
'cleaning_type': 'Lite service',
'service_name': 'Floors',
},
{
'id': 2,
'cleaning_type': 'Lite service',
'service_name': 'Bathrooms',
},
{
'id': 3,
'cleaning_type': 'Lite service',
'service_name': 'Kitchen',
},
{
'id': 4,
'cleaning_type': 'Moving cleaning',
'service_name': 'Kitchen Including All Appliances And Cabinets',
},
{
'id': 5,
'cleaning_type': 'Moving cleaning',
'service_name': 'Shift Products',
}
]
items_by_type = {}
for item in items:
    cleaning_type = item['cleaning_type']
    # We have not come across this cleaning type before, so create a new dict
    if cleaning_type not in items_by_type:
        new_item = {}
        new_item['id'] = item['id']
        new_item['cleaning_type'] = cleaning_type
        new_item['service_name'] = [item['service_name']] # Note: This is a list
        items_by_type[cleaning_type] = new_item
    # The dict already exists, so we only need to add the service name to the list that was previously created
    else:
        items_by_type[cleaning_type]['service_name'].append(item['service_name'])
# Transform to a list, since you don't want the keys
items_by_type_as_list = list(items_by_type.values())
expected_result = [
{
'id': 1,
'cleaning_type': 'Lite service',
'service_name': ['Floors', 'Bathrooms', 'Kitchen'],
},
{
'id': 4,
'cleaning_type': 'Moving cleaning',
'service_name': ['Kitchen Including All Appliances And Cabinets', 'Shift Products'],
}
]
print(items_by_type_as_list == expected_result)
Output:
True
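Since the question asks specifically for a list comprehension, here is one possible sketch built on itertools.groupby. It assumes, like the loop above, that each group should keep the id of its first item; the sample list is shortened and get_type is just a helper name chosen here.

```python
from itertools import groupby
from operator import itemgetter

items = [
    {"id": 1, "cleaning_type": "Lite service", "service_name": "Floors"},
    {"id": 2, "cleaning_type": "Lite service", "service_name": "Bathrooms"},
    {"id": 4, "cleaning_type": "Moving cleaning", "service_name": "Shift Products"},
]

get_type = itemgetter("cleaning_type")

# sorted() is stable, so rows keep their original order within each type.
# The inner generator materialises each group as a list so it can be read twice.
grouped = [
    {
        "id": rows[0]["id"],
        "cleaning_type": cleaning_type,
        "service_name": [row["service_name"] for row in rows],
    }
    for cleaning_type, rows in (
        (key, list(group))
        for key, group in groupby(sorted(items, key=get_type), key=get_type)
    )
]
print(grouped)
```

Whether this reads better than the explicit loop is a matter of taste; the loop is arguably easier to follow.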

Create a list of nested dictionaries from a single csv file in python

I have a csv file with the following structure:
team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 1,spring tournament,Catarina Corbell
Team 1,summer tournament,Cara Mejias
Team 1,summer tournament,Catarina Corbell
...
Team 10, spring tournament,Jessi Ravelo
I want to create a nested dictionary (team, tournament) with a list of player dictionaries. The desired outcome would be something like:
{'data':
{Team 1:
{'spring tournament':
{'players': [
{name: Rebecca Cardone},
{name: Salina Youngblood},
{name: Catarina Corbell}]
},
{'summer tournament':
{'players': [
{name: Cara Mejias},
{name: Catarina Corbell}]
}
}
},
...
{Team 10:
{'spring tournament':
{'players': [
{name: Jessi Ravelo}]
}
}
}
}
I've been struggling to format it like this. I have been able to successfully nest the first level (team # --> tournament) but I cannot get the second level to nest. Currently, my code looks like this:
d = {}
header = True
with open("input.csv") as f:
    for line in f.readlines():
        if header:
            header = False
            continue
        team, tournament, player = line.strip().split(",")
        d_team = d.get(team,{})
        d_tournament = d_team.get(tournament, {})
        d_player = d_tournament.get('player',['name'])
        d_player.append(player)
        d_tournament['player'] = d_tournament
        d_team[tournament] = d_tournament
        d[team] = d_team
print(d)
What would be the next step in fixing my code so I can create the nested dictionary?
Some problems with your implementation:
You do d_player = d_tournament.get('player',['name']). But you actually want to get the key named players, and this should be a list of dictionaries. Each of these dictionaries must have the form {"name": "Player's Name"}. So you want
l_player = d_tournament.get('players',[]) (default to an empty list), and then do l_player.append({"name": player}) (I renamed it to l_player because it's a list, not a dict).
You do d_tournament['player'] = d_tournament. I suspect you meant d_tournament['player'] = d_player (or, with the renaming from the previous point, d_tournament['players'] = l_player).
Strip the whitespace off the elements in the rows. Do team, tournament, player = (word.strip() for word in line.split(","))
Your code works fine after you make these changes
I strongly suggest you use the csv.reader class to read your CSV file instead of manually splitting the line by commas.
Also, since python's containers (lists and dictionaries) hold references to their contents, you can just add the container once and then modify it using mydict["key"] = value or mylist.append(), and these changes will be reflected in parent containers too. Because of this behavior, you don't need to repeatedly assign these things in the loop like you do with d_team[tournament] = d_tournament
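A minimal sketch of that reference behaviour, separate from the CSV handling (the variable names are illustrative only):

```python
teams = {}

# setdefault returns the dict stored under the key, creating it if needed
team = teams.setdefault("Team 1", {})
tournament = team.setdefault("spring tournament", {"players": []})

# Mutating the inner list is visible through the outer dict, because
# `tournament` and teams["Team 1"]["spring tournament"] are the same object
tournament["players"].append({"name": "Rebbecca Cardone"})

print(teams)
```

Nothing is assigned back into teams after the append, yet the new player shows up there. The same idea applied to the CSV file: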
import csv

allteams = dict()
hasHeader = True
with open("input.csv", newline="") as f:
    csvreader = csv.reader(f)
    if hasHeader: next(csvreader) # Consume one line if a header exists
    # Iterate over the rows, and unpack each row into three variables
    for team_name, tournament_name, player_name in csvreader:
        # If the team hasn't been processed yet, create a new dict for it
        if team_name not in allteams:
            allteams[team_name] = dict()
        # Get the dict object that holds this team's information
        team = allteams[team_name]
        # If the tournament hasn't been processed already for this team, create a new dict for it in the team's dict
        if tournament_name not in team:
            team[tournament_name] = {"players": []}
        # Get the tournament dict object
        tournament = team[tournament_name]
        # Add this player's information to the tournament dict's "players" list
        tournament["players"].append({"name": player_name})
# Add all teams' data to the "data" key in our result dict
result = {"data": allteams}
print(result)
Which gives us what we want (prettified output):
{
'data': {
'Team 1': {
'spring tournament': {
'players': [
{ 'name': 'Rebbecca Cardone' },
{ 'name': 'Salina Youngblood' },
{ 'name': 'Catarina Corbell' }
]
},
'summer tournament': {
'players': [
{ 'name': 'Cara Mejias' },
{ 'name': 'Catarina Corbell' }
]
}
},
'Team 10': {
' spring tournament': {
'players': [
{ 'name': 'Jessi Ravelo' }
]
}
}
}
}
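Note that the ' spring tournament' key for Team 10 keeps its leading space, because csv.reader does not trim fields by default. One way to avoid this, sketched here with an in-memory sample instead of input.csv, is the reader's skipinitialspace option, which ignores spaces directly after a delimiter:

```python
import csv
import io

# In-memory stand-in for the CSV file, including the stray space after a comma
sample = io.StringIO(
    "team,tournament,player\n"
    "Team 1,spring tournament,Rebbecca Cardone\n"
    "Team 10, spring tournament,Jessi Ravelo\n"
)

reader = csv.reader(sample, skipinitialspace=True)
next(reader)  # skip the header row
rows = list(reader)
print(rows)
```

This only handles spaces after the comma; trailing spaces would still need an explicit strip().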
The example dictionary you describe is not possible (if you want multiple dictionaries under the key "Team 1", put them in a list), but this snippet:
if __name__ == '__main__':
    your_dict = {}
    with open("yourfile.csv") as file:
        all_lines = file.readlines()
    data_lines = all_lines[1:] # Skipping "team,tournament,player" line
    for line in data_lines:
        line = line.strip() # Remove \n
        team, tournament_type, player_name = line.split(",")
        team_dict = your_dict.get(team, {}) # e.g. "Team 1"
        tournaments_of_team_dict = team_dict.get(tournament_type, {'players': []}) # e.g. "spring_tournament"
        tournaments_of_team_dict["players"].append({'name': player_name})
        team_dict[tournament_type] = tournaments_of_team_dict
        your_dict[team] = team_dict
    your_dict = {'data': your_dict}
For this example yourfile.csv:
team,tournament,player
Team 1,spring tournament,Rebbecca Cardone
Team 1,spring tournament,Salina Youngblood
Team 2,spring tournament,Catarina Corbell
Team 1,summer tournament,Cara Mejias
Team 2,summer tournament,Catarina Corbell
Gives the following:
{
"data": {
"Team 1": {
"spring tournament": {
"players": [
{
"name": "Rebbecca Cardone"
},
{
"name": "Salina Youngblood"
}
]
},
"summer tournament": {
"players": [
{
"name": "Cara Mejias"
}
]
}
},
"Team 2": {
"spring tournament": {
"players": [
{
"name": "Catarina Corbell"
}
]
},
"summer tournament": {
"players": [
{
"name": "Catarina Corbell"
}
]
}
}
}
}
Maybe I'm overlooking something, but couldn't you use:
df.groupby(['team','tournament'])['player'].apply(list).reset_index().to_json(orient='records')
You might approach it this way:
from collections import defaultdict
import csv
from pprint import pprint
d = defaultdict(dict)
with open('f00.txt', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        d[row['team']].setdefault(row['tournament'], []).append(row['player'])
pprint(dict(d))
Prints:
{'Team 1': {'spring tournament': ['Rebbecca Cardone',
'Salina Youngblood',
'Catarina Corbell'],
'summer tournament': ['Cara Mejias', 'Catarina Corbell']},
'Team 10': {' spring tournament': ['Jessi Ravelo']}}

Best way to query Django ORM to sum items by category per year (for time series)

Assume we have the models:
class Category(models.Model):
    description = models.CharField(...) # Ex: 'horror', 'classic', 'self-help', etc.
class Book(models.Model):
    category = models.ForeignKey(Category, ...)
    written_date = models.DateField(...)
I want to make a query that will eventually get me the total number of books per category per year! Like so:
{
'2019-01-01': { 'horror': 2, 'classic': 1},
'2020-01-01': { 'horror': 2, 'classic': 1, 'self-help': 4},
...
}
I was only able to come up with the following query:
Book.objects \
.annotate(year=TruncYear('written_date')) \
.values('year', 'category__description') \
.order_by('year') \
.annotate(total=Count('id'))
However this only gets me
{
{
"category__description": "Horror",
"year": "2019-01-01",
"total": 2
},
{
"category__description": "Classic",
"year": "2019-01-01",
"total": 1
},
{
"category__description": "Horror",
"year": "2020-01-01",
"total": 2
},
...
}
Is there any way to do this via the ORM, or do I have to manipulate the result directly? Thanks!
You can post-process the result with itertools.groupby:
from itertools import groupby
from operator import itemgetter
from django.db.models import Count
from django.db.models.functions import TruncYear
data = Book.objects.values(
    'category__description',
    year=TruncYear('written_date'),
).annotate(
    total=Count('id')
).order_by('year', 'category__description')
result = {
    yrs: {r['category__description']: r['total'] for r in rs}
    for yrs, rs in groupby(data, itemgetter('year'))
}
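The grouping step can be tried without a database. The sketch below uses hypothetical rows in the shape that .values(...).annotate(total=...) would produce; the years are written as strings here for brevity, whereas the real query yields date objects.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, already ordered by year as the query's order_by ensures
rows = [
    {"year": "2019-01-01", "category__description": "classic", "total": 1},
    {"year": "2019-01-01", "category__description": "horror", "total": 2},
    {"year": "2020-01-01", "category__description": "horror", "total": 2},
]

# groupby merges only adjacent rows, which is why the ordering matters
result = {
    year: {row["category__description"]: row["total"] for row in group}
    for year, group in groupby(rows, key=itemgetter("year"))
}
print(result)
```

If the rows were not ordered by year, the same year could appear in several groups and later groups would overwrite earlier ones in the dictionary.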

How to get distinct results when ordering by a related annotated field?

This is a Django (2.2) project using Python (3.7). Given the following models, how would I get distinct results in the query below?
class Profile(models.Model):
    user = models.ForeignKey(User, ...)
class Location(models.Model):
    profile = models.ForeignKey(Profile, ...)
    point = PointField()
class ProfileService(models.Model):
    profile = models.ForeignKey(Profile, ...)
    service = models.ForeignKey(Service, ...)
Here's the query I have so far which works but I end up with duplicate 'ProfileService' objects:
service = Service.objects.get(id=1)
qs = (
ProfileService.objects
.filter(service=service)
.annotate(distance=Distance('profile__location__point', self.point))
.order_by('distance')
)
If I add .distinct('profile') it obviously fails with SELECT DISTINCT ON expressions must match initial ORDER BY expressions.
I have a feeling that the solution lies in using __in but I need to keep the annotated distance field.
Further explanation
To help illustrate further, the lists below represent dummy data that will reproduce the issue:
services = [
{ 'id': 1, 'service': 'A', ... },
{ 'id': 2, 'service': 'B', ... },
]
users = [
{ 'id': 1, 'username': 'Jane Doe', 'email': 'jane@test.com', ... },
{ 'id': 2, 'username': 'John Doe', 'email': 'john@test.com', ... },
]
profiles = [
{ 'id': 1, 'user': 1, ... },
{ 'id': 2, 'user': 2, ... },
]
locations = [
{ 'id': 1, 'profile': 1, 'point': 'X', ... },
{ 'id': 2, 'profile': 1, 'point': 'Y', ... },
{ 'id': 3, 'profile': 2, 'point': 'Z', ... },
]
# 'point' would normally contain actual Point data.
# Letters (XYZ) just intended to represent unique Point data.
profile_services = [
{ 'id': 1, 'profile': 1, 'service': 1 },
{ 'id': 2, 'profile': 1, 'service': 2 },
{ 'id': 3, 'profile': 2, 'service': 1 },
]
It is the 'Location' objects that cause the duplications in the 'qs' queryset above (if a 'Profile' has only 1 'Location' associated with it, there is no duplicate result in 'qs'), however the user does need to keep the ability to provide multiple locations, we just need the closest.
Progress
Following the advice from 'Ivan Starostin', I have put together the following using subqueries:
locations = (
Location.objects
.filter(profile=OuterRef('profile'))
.annotate(distance=Distance('point', self.point))
.order_by('distance')
)
qs = (
ProfileService.objects
.filter(service=service)
.filter(profile__id__in=Subquery(locations.values('profile_id')[:1]))
.annotate(distance=Subquery(locations.values('distance')[:1]))
)
Now this solves the issue of duplicate results but it loses the annotated 'distance' value which should be annotated against the applicable ProfileService query object. Not sure if this is going in the right direction or not (any pointers would be greatly appreciated), I just want to avoid pulling the data into Python memory to get rid of the duplicates.
I have been referring to the following post too, but the accepted answer refuses to work in my queryset: Similar question
