I have a list of dictionaries something like this:
users=[{"name": "David", "team": "reds", "score1": 100, "score2": 20,},
{"name": "David", "team": "reds", "score1": 20, "score2": 60,},
{"name": "David", "team": "blues", "score1": 10, "score2": 70,}]
and would really like to get a new processed list of dictionaries something like
summary=[{"team": "reds", "total1": 120, "total2": 80,},
{"team": "blues", "total1": 10, "total2": 70,}]
preferably looping through the original data just once. I can create a dictionary holding a total value for each user key with this
summary = dict()
for user in users:
    if user['team'] not in summary:
        summary[user['team']] = float(user['score1'])
    else:
        summary[user['team']] += float(user['score1'])
to give
summary = {'reds': 120,'blues': 10}
but am struggling with producing the list of dictionaries, the nearest I can get is to create a dictionary at the first instance of a team, and then try to append to its values on subsequent occurrences...
summary = []
for user in users:
    if any(d['team'] == user['team'] for d in summary):
        # append to values in the relevant dictionary
        # ??
    else:
        # Add dictionary to list with some initial values
        d = {'team': user['team'], 'total1': user['score1'], 'total2': user['score2']}
        summary.append(dict(d))
...and it has gotten messy... Am I going about this in completely the wrong way? Can you change values in a dictionary within a list?
Thanks
I think this is a good case for using the pandas library:
>>> import pandas as pd
>>> dfUsers = pd.DataFrame(users)
>>> dfUsers
    name  score1  score2   team
0  David     100      20   reds
1  David      20      60   reds
2  David      10      70  blues
>>> dfUsers.groupby('team')[['score1', 'score2']].sum()  # select the numeric columns explicitly
       score1  score2
team
blues      10      70
reds      120      80
And if you really want to turn it into a list of dicts:
>>> dfRes = dfUsers.groupby('team')[['score1', 'score2']].sum()
>>> dfRes.columns = ['total1', 'total2'] # if you want to rename columns
>>> dfRes.reset_index().to_dict(orient='records')
[{'team': 'blues', 'total1': 10, 'total2': 70},
{'team': 'reds', 'total1': 120, 'total2': 80}]
Another way to do this is with itertools.groupby (the list must be sorted by the grouping key first):
>>> from itertools import groupby
>>> from operator import itemgetter
>>> users.sort(key=itemgetter('team'))
>>>
>>> res = [{'team': t[0], 'res': list(t[1])} for t in groupby(users, key=itemgetter('team'))]
>>> res = [{'team': t['team'], 'total1': sum(x['score1'] for x in t['res']), 'total2': sum(x['score2'] for x in t['res'])} for t in res]
>>> res
[{'team': 'blues', 'total1': 10, 'total2': 70},
{'team': 'reds', 'total1': 120, 'total2': 80}]
Or, if you really want plain Python:
>>> res = dict()
>>> for x in users:
...     if x['team'] not in res:
...         res[x['team']] = [x['score1'], x['score2']]
...     else:
...         res[x['team']][0] += x['score1']
...         res[x['team']][1] += x['score2']
...
>>> res = [{'team': k, 'total1': v[0], 'total2': v[1]} for k, v in res.items()]
>>> res
[{'team': 'reds', 'total1': 120, 'total2': 80},
{'team': 'blues', 'total1': 10, 'total2': 70}]
You are really close, you just need a way to look up which dictionary to update. This is the simplest way I can see.
summary = dict()
for user in users:
    team = user['team']
    if team not in summary:
        summary[team] = dict(team=team,
                             score1=float(user['score1']),
                             score2=float(user['score2']))
    else:
        summary[team]['score1'] += float(user['score1'])
        summary[team]['score2'] += float(user['score2'])
then
>>> print summary.values()
[{'score1': 120.0, 'score2': 80.0, 'team': 'reds'},
{'score1': 10.0, 'score2': 70.0, 'team': 'blues'}]
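The same lookup idea can also be written with collections.defaultdict, which removes the if/else branch. This is only a sketch: the helper name summarise is made up for illustration, and the total1/total2 keys are taken from the desired output in the question.

```python
from collections import defaultdict

users = [
    {"name": "David", "team": "reds", "score1": 100, "score2": 20},
    {"name": "David", "team": "reds", "score1": 20, "score2": 60},
    {"name": "David", "team": "blues", "score1": 10, "score2": 70},
]


def summarise(rows):
    """Sum score1/score2 per team in a single pass (hypothetical helper)."""
    # Every new team starts with zeroed totals, so no membership test is needed.
    totals = defaultdict(lambda: {"total1": 0, "total2": 0})
    for row in rows:
        entry = totals[row["team"]]
        entry["total1"] += row["score1"]
        entry["total2"] += row["score2"]
    # Flatten {team: totals} into the list-of-dicts shape asked for.
    return [dict(team=team, **t) for team, t in totals.items()]


summary = summarise(users)
print(summary)
```

The dictionary keyed by team is what makes the update cheap; the list of dicts is only built once at the end.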
Here's my solution which assumes that all scores that need to be added start with score:
users=[{"name": "David", "team": "reds", "score1": 100, "score2": 20,},
{"name": "David", "team": "reds", "score1": 20, "score2": 60,},
{"name": "David", "team": "blues", "score1": 10, "score2": 70,}]
totals = {}
for item in users:
    team = item['team']
    if team not in totals:
        totals[team] = {}
    for k, v in item.items():
        if k.startswith('score'):
            if k in totals[team]:
                totals[team][k] += v
            else:
                totals[team][k] = v
print(totals)
Output:
{'reds': {'score1': 120, 'score2': 80}, 'blues': {'score1': 10, 'score2': 70}}
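If the nested if/else gets in the way, the same "anything starting with score" idea can probably be expressed with a defaultdict of collections.Counter objects, since Counter.update adds to existing values instead of replacing them. The sketch below also flattens the result into the list of dicts the question asked for; renaming scoreN to totalN is my assumption about the desired key names.

```python
from collections import Counter, defaultdict

users = [
    {"name": "David", "team": "reds", "score1": 100, "score2": 20},
    {"name": "David", "team": "reds", "score1": 20, "score2": 60},
    {"name": "David", "team": "blues", "score1": 10, "score2": 70},
]

totals = defaultdict(Counter)
for item in users:
    # Counter.update() adds the given values to whatever is already stored.
    totals[item["team"]].update(
        {k: v for k, v in item.items() if k.startswith("score")}
    )

# Rename "scoreN" to "totalN" (assumed naming) and build one dict per team.
summary = [
    dict({"team": team},
         **{k.replace("score", "total", 1): v for k, v in scores.items()})
    for team, scores in totals.items()
]
print(summary)
```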
See comments inline for an explanation
import pprint
users=[{"name": "David", "team": "reds", "score1": 100, "score2": 20,},
{"name": "David", "team": "reds", "score1": 20, "score2": 60,},
{"name": "David", "team": "blues", "score1": 10, "score2": 70,}]
scores_by_team = dict()
for user in users:
    if user['team'] not in scores_by_team:
        # Make sure you're gonna have your scores zeroed so you can add the
        # user's scores later
        scores_by_team[user['team']] = {
            'total1': 0,
            'total2': 0
        }
    # Here the user's team exists for sure in scores_by_team
    scores_by_team[user['team']]['total1'] += user['score1']
    scores_by_team[user['team']]['total2'] += user['score2']
# So now, the scores you want have been calculated in a dictionary where the
# keys are the team names and the values are another dictionary with the scores
# that you actually wanted to calculate
print("Before making it a summary: %s" % pprint.pformat(scores_by_team))
summary = list()
# Use a separate loop variable so scores_by_team itself is not overwritten
for team_name, team_scores in scores_by_team.items():
    summary.append(
        {
            'team': team_name,
            'total1': team_scores['total1'],
            'total2': team_scores['total2'],
        }
    )
print("Summary: %s" % summary)
This outputs:
Before making it a summary: {'blues': {'total1': 10, 'total2': 70}, 'reds': {'total1': 120, 'total2': 80}}
Summary: [{'total1': 120, 'total2': 80, 'team': 'reds'}, {'total1': 10, 'total2': 70, 'team': 'blues'}]
I have a list of dicts that looks like this:
{
    "Player_Name":"Byeong-Hun An",
    "Tournament":[
        {
            "Name":"Arnold Palmer Invitational presented by Mastercard",
            "Points":"32.80",
            "Salary":"10300.00"
        }
    ]
},
{
    "Player_Name":"Byeong-Hun An",
    "Tournament":[
        {
            "Name":"Different",
            "Points":"18.80",
            "Salary":"10400.00"
        }
    ]
}
and I want this:
[
    {
        "Player_Name":"Byeong-Hun An",
        "Tournament":[
            {
                "Name":"Arnold Palmer Invitational presented by Mastercard",
                "Points":"32.80",
                "Salary":"10300.00"
            },
            {
                "Name":"Different",
                "Points":"18.80",
                "Salary":"10400.00"
            }
        ]
    }
]
I've tried collections, but it doesn't do exactly what I'm wanting. I essentially want to take every single player and combine all the tournament objects into one so each player has one object instead of each event having its own object.
Here's my code
import json
import numpy as np
import pandas as pd
from collections import Counter
# using json open the player objects file and set it equal to data
with open('PGA_Player_Objects.json') as json_file:
    data = json.load(json_file)
points = []
players = []
for a in data:
    for b in a['Tournament']:
        points.append(int(float(b['Points'])))
for x in data:
    players.append(x['Player_Name'])
def Average(lst):
    unrounded = sum(lst) / len(lst)
    return round(unrounded,2)
result = Counter()
for d in data:
    for b in d['Tournament']:
        result[d['Player_Name']] += int(float(b['Points']))
How can I do that?
If your list is in l:
l = [{'Player_Name': 'Byeong-Hun An', 'Tournament': [{'Name': 'Arnold Palmer Invitational presented by Mastercard', 'Points': '32.80', 'Salary': '10300.00'}]},
{'Player_Name': 'Byeong-Hun An', 'Tournament': [{'Name': 'Different', 'Points': '18.80', 'Salary': '10400.00'}]},]
Try this:
from itertools import groupby
result = []
for k, g in groupby(sorted(l, key=lambda x: x['Player_Name']), lambda x: x['Player_Name']):
    result.append({'Player_Name': k, 'Tournament': [i['Tournament'][0] for i in g]})
Then the result will be:
[{'Player_Name': 'Byeong-Hun An',
'Tournament': [
{'Name': 'Arnold Palmer Invitational presented by Mastercard',
'Points': '32.80',
'Salary': '10300.00'},
{'Name': 'Different',
'Points': '18.80',
'Salary': '10400.00'}]}]
This works as well when the list (here called lst) holds the records of a single player, and it's more general in that it works for arbitrary key names. Note that it merges every record into one dictionary:
from collections import defaultdict
d = defaultdict(list)
for dic in lst:
    for k, v in dic.items():
        if isinstance(v, list):
            d[k].extend(v)
        else:
            d[k] = v
answer = [dict(d)]
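When the list contains several players, one option is to key the defaultdict on the player name so that each player gets their own merged entry. A minimal sketch, using only the two records from the question; the names records, tournaments_by_player and merged are mine:

```python
from collections import defaultdict

records = [
    {"Player_Name": "Byeong-Hun An",
     "Tournament": [{"Name": "Arnold Palmer Invitational presented by Mastercard",
                     "Points": "32.80", "Salary": "10300.00"}]},
    {"Player_Name": "Byeong-Hun An",
     "Tournament": [{"Name": "Different",
                     "Points": "18.80", "Salary": "10400.00"}]},
]

tournaments_by_player = defaultdict(list)
for record in records:
    # extend() keeps every tournament, even if a record holds more than one.
    tournaments_by_player[record["Player_Name"]].extend(record["Tournament"])

# One dictionary per player, each with a unified Tournament list.
merged = [
    {"Player_Name": name, "Tournament": events}
    for name, events in tournaments_by_player.items()
]
print(merged)
```

Unlike the itertools.groupby version, this does not need the input to be sorted by player first.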
Here's my take on a solution.
1. Create a new list of dictionaries.
2. Iterate through the original list of dictionaries.
3. Store one copy of each player's common data in the new list of dictionaries.
4. Append each additional Tournament entry for that player to the unified Tournament list in that one dictionary.
Untested code below as an example, but should work with some tweaks.
listofDicts = [{'Player_Name': 'Byeong-Hun An', 'Tournament': [{'Name': 'Arnold Palmer Invitational presented by Mastercard', 'Points': '32.80', 'Salary': '10300.00'}]},{'Player_Name': 'Byeong-Hun An', 'Tournament': [{'Name': 'Different', 'Points': '18.80', 'Salary': '10400.00'}]}]
newListOfDicts = []
playerName = " "
playerNo = -1
for dicts in listofDicts:
    if playerName == dicts['Player_Name']:
        newListOfDicts[playerNo]['Tournament'].append(dicts['Tournament'][0])
    else:
        newListOfDicts.append(dicts)
        playerName = dicts['Player_Name']
        playerNo += 1
I couldn't find any examples that match my use case. I'm still working my way through Python lists and dictionaries.
Problem:
all_cars = {'total_count': 3,'cars': [{'name': 'audi','model': 'S7'}, {'name': 'honda', 'model': 'accord'},{'name': 'jeep', 'model': 'wrangler'} ]}
owners = {'users':[{'owner': 'Nick', 'car': 'audi'},{'owner': 'Jim', 'car': 'ford'},{'owner': 'Mike', 'car': 'mercedes'} ]}
def duplicate():
    for c in all_cars['cars']:
        if c['name'] == [c['users']for c in owners['users']]:
            pass
        else:
            res = print(c['name'])
    return res
output = ['honda', 'jeep', 'audi']
and
def duplicate():
    for c in all_cars['cars']:
        if c['name'] == 'audi':
            pass
        else:
            res = print(c['name'])
    return res
output - ['honda', 'jeep']
I am trying to find matching values in both dictionaries, using list comprehension, then return non-matching values only.
Solution: Using 'in' rather than '==' operator, I was able to compare values between both lists and skip duplicates.
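def duplicate():
    # Each entry in owners['users'] stores the car name under the 'car' key
    owned = [u['car'] for u in owners['users']]
    res = []
    for c in all_cars['cars']:
        if c['name'] in owned:
            pass
        else:
            # Collect the name; print() returns None, so it cannot be used to build a result
            res.append(c['name'])
    return res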
To answer the question in your title, you can conditionally add elements during a list comprehension using the syntax [y for y in z if y == a], where y == a stands for any condition you need. If the condition evaluates to True, the element y is added to the list; otherwise it is skipped.
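Applied to the data in the question, that could look something like the sketch below. The variable names owned and unowned are mine, and I am assuming the goal is the list of car names that no owner has.

```python
all_cars = {'total_count': 3,
            'cars': [{'name': 'audi', 'model': 'S7'},
                     {'name': 'honda', 'model': 'accord'},
                     {'name': 'jeep', 'model': 'wrangler'}]}
owners = {'users': [{'owner': 'Nick', 'car': 'audi'},
                    {'owner': 'Jim', 'car': 'ford'},
                    {'owner': 'Mike', 'car': 'mercedes'}]}

# Cars that already have an owner; a set makes the membership test cheap.
owned = {user['car'] for user in owners['users']}

# A car name is kept only when the condition after `if` is true.
unowned = [car['name'] for car in all_cars['cars'] if car['name'] not in owned]
print(unowned)  # ['honda', 'jeep']
```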
I would just keep a dictionary of all of the owner data together:
ownerData = { "Shaft" : {
"carMake" : "Audi",
"carModel" : "A8",
"year" : "2015" },
"JamesBond" : {
"carMake" : "Aston",
"carModel" : "DB8",
"year" : "2012" },
"JeffBezos" : {
"carMake" : "Honda",
"carModel" : "Accord",
"year" : "1989"}
}
Now you can loop through and query it something like this:
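for owner, car in ownerData.items():
    # owner is the key (a name); car is the nested dictionary with the details
    if "Audi" in car["carMake"]:
        print("Owner %s drives a %s %s %s" % (owner, car["year"], car["carMake"], car["carModel"]))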
Should output:
"Owner Shaft drives a 2015 Audi A8"
This way you can expand your data set for owners without creating multiple lists.
OK, based on your feedback on the solution above, here is how I would tackle your problem. Drop your common items into lists and then use "set" to print out the diff.
all_cars = {'total_count': 3,'cars': [{'name': 'audi','model': 'S7'},
{'name': 'honda', 'model': 'accord'},{'name': 'jeep', 'model': 'wrangler'} ]}
owners = {'users':[{'owner': 'Nick', 'car': 'audi'},{'owner': 'Jim',
'car': 'ford'},{'owner': 'Mike', 'car': 'mercedes'} ]}
allCarList = []
ownerCarList = []
for auto in all_cars['cars']:
    thisCar = auto['name']
    if thisCar not in allCarList:
        allCarList.append(thisCar)
for o in owners['users']:
    thisCar = o['car']
    if thisCar not in ownerCarList:
        ownerCarList.append(thisCar)
diff = list(set(allCarList) - set(ownerCarList))
print(diff)
I put this in and ran it and came up with this output:
['jeep', 'honda']
Hope that helps!
So I have a small data like this:
data = [
{"Name":"Arab","Code":"Zl"},
{"Name":"Korea","Code":"Bl"},
{"Name":"China","Code":"Bz"}
]
I want to plot a graph so that the x-axis is: "Bl", "Bz", "Zl" (alphabetical order)
and the y-axis is: "Korea", "China", "Arab" (corresponding to the codenames).
I thought of:
new_data = {}
for dic in data:
    country_data = dic["Name"]
    code_data = dic["Code"]
    new_data[code_data] = country_data
code_data = []
for codes in new_data.keys():
    code_data.append(codes)
code_data.sort()
name_data = []
for code in code_data:
    name_data.append(new_data[code])
Is there a better way to do this?
Perhaps by not creating a new dictionary?
So here's the data:
data = [
{"Name":"Arab","Code":"Zl"},
{"Name":"Korea","Code":"Bl"},
{"Name":"China","Code":"Bz"}
]
To create a new sorted list:
new_list = sorted(data, key=lambda k: k['Code'])
If you don't want to get a new list:
data[:] = sorted(data, key=lambda k: k['Code'])
The result is:
[{'Code': 'Bl', 'Name': 'Korea'}, {'Code': 'Bz', 'Name': 'China'}, {'Code': 'Zl', 'Name': 'Arab'}]
I hope I could help you!
A more compact way to produce the same results:
from operator import itemgetter
data = [
{"Name": "Arab", "Code": "Zl"},
{"Name": "Korea", "Code": "Bl"},
{"Name": "China", "Code": "Bz"}
]
sorted_data = ((d["Code"], d["Name"]) for d in sorted(data, key=itemgetter("Code")))
code_data, name_data = (list(item) for item in zip(*sorted_data))
print(code_data) # -> ['Bl', 'Bz', 'Zl']
print(name_data) # -> ['Korea', 'China', 'Arab']
Here's one way using operator.itemgetter and unpacking via zip:
from operator import itemgetter
_, data_sorted = zip(*sorted(enumerate(data), key=lambda x: x[1]['Code']))
codes, names = zip(*map(itemgetter('Code', 'Name'), data_sorted))
print(codes)
# ('Bl', 'Bz', 'Zl')
print(names)
# ('Korea', 'China', 'Arab')
I'm trying to change the result so that if there are 2 grades in the values, they are replaced with their average. I have tried many approaches but none worked.
I need to compute the average and remove the 2 individual grade values.
I wrote this code:
def myDict(grades, teachers):
    Dict={}
    for i1 in grades:
        for i2 in teachers:
            key=i2[1]
            value=[]
            Dict[key]=value #{'Statistics': [], 'Philosophy': [], 'Computer': [], 'Physics': [], 'English': []}
            for i1 in grades:
                if key==i1[-1]:
                    value.append(i1[0]) #{'Statistics': [23560, 23452], 'Philosophy': [], 'Computer': [23415, 12345], 'Physics': [23452, 23459], 'English': [12345]}
            for i1 in grades:
                if key==i1[-1]:
                    value.append(i1[1])
                    value_size=len(value)
                    if value_size>2:
                        end=int(value_size)/2
                        for i in value[-1:end]:
                            print float(count(i)/value_size)
    print Dict
grades = [[12345,75,'English'],
[23452,83,'Physics'],
[23560,81,'Statistics'],
[23415,61,'Computer'],
[23459,90,'Physics'],
[12345,75,'Computer'],
[23452,100,'Statistics']]
teachers = [['Aharoni','English'],
['Melamed','Physics'],
['Kaner','Computer'],
['Zloti','Statistics'],
['Korman','Philosophy']]
print myDict(grades, teachers)
The result is:
>>>
{'Statistics': [23560, 23452, 81, 100], 'Philosophy': [], 'Computer': [23415, 12345, 61, 75], 'Physics': [23452, 23459, 83, 90], 'English': [12345, 75]}
None
>>>
What I want to get (this is a work in progress; I am stuck at this stage):
{ 'Aharoni': [12345, 75.0], 'Kaner': [23415, 12345, 68.0], 'Melamed': [23452, 23459, 86.5], 'Korman': [], 'Zloti': [23560, 23452, 90.5] }
What about this simple loop:
myDict = {}
for teacher, subject in teachers:
    values = []
    scores = []
    for i1, i2, s in grades:
        if subject == s:
            values.append(i1)
            scores.append(i2)
    if scores:
        # float() avoids integer division under Python 2
        average = sum(scores) / float(len(scores))
        values.append(average)
    myDict[teacher] = values
First, iterate through the teachers and, for each matching subject in the grade list, append i1 to values and i2 to scores.
At the end of each iteration, you can easily compute the average of the i2 values (if the list is not empty) and then update your dictionary.
The output with your data would be:
{
'Korman': [],
'Melamed': [23452, 23459, 86.5],
'Zloti': [23560, 23452, 90.5],
'Aharoni': [12345, 75.0],
'Kaner': [23415, 12345, 68.0]
}
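The loop above scans the whole grades list once per teacher. If that ever matters, a variation is to group the grades by subject first and then look each teacher's subject up. This is only a sketch under that assumption; by_subject and result are names I made up.

```python
from collections import defaultdict

grades = [[12345, 75, 'English'],
          [23452, 83, 'Physics'],
          [23560, 81, 'Statistics'],
          [23415, 61, 'Computer'],
          [23459, 90, 'Physics'],
          [12345, 75, 'Computer'],
          [23452, 100, 'Statistics']]
teachers = [['Aharoni', 'English'],
            ['Melamed', 'Physics'],
            ['Kaner', 'Computer'],
            ['Zloti', 'Statistics'],
            ['Korman', 'Philosophy']]

# One pass over grades: collect student ids and scores per subject.
by_subject = defaultdict(lambda: {"ids": [], "scores": []})
for student_id, score, subject in grades:
    by_subject[subject]["ids"].append(student_id)
    by_subject[subject]["scores"].append(score)

result = {}
for teacher, subject in teachers:
    group = by_subject.get(subject)  # .get() avoids creating empty entries
    if group is None:
        result[teacher] = []  # no grades recorded for this subject
    else:
        average = sum(group["scores"]) / float(len(group["scores"]))
        result[teacher] = group["ids"] + [average]

print(result)
```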
List comprehensions are a great way to deal with a data structure like that:
def myDict(grades, teachers):
    subjects = [x[1] for x in teachers]
    d = {}
    for s in subjects:
        subject_grades_records = [x for x in grades if x[2] == s]
        value = [x[0] for x in subject_grades_records]
        if len(value) > 0:
            value.append(sum(x[1] for x in subject_grades_records) / float(len(subject_grades_records)))
        teacher = [x[0] for x in teachers if x[1] == s][0]
        d[teacher] = value
    return d
grades = [[12345,75,'English'],
[23452,83,'Physics'],
[23560,81,'Statistics'],
[23415,61,'Computer'],
[23459,90,'Physics'],
[12345,75,'Computer'],
[23452,100,'Statistics']]
teachers = [['Aharoni','English'],
['Melamed','Physics'],
['Kaner','Computer'],
['Zloti','Statistics'],
['Korman','Philosophy']]
print(repr(myDict(grades, teachers)))
# {'Kaner': [23415, 12345, 68.0], 'Aharoni': [12345, 75.0], 'Zloti': [23560, 23452, 90.5], 'Melamed': [23452, 23459, 86.5], 'Korman': []}
I have a list of dictionaries like this
data = [
{"_id": {"cohort_name": "09-01-2010", "segment_name": "LTV90-Prime", "driver_name": "ADB"}, "cohort_data": [
{"calculated": [],
"original": [{"1": 225.2699758337715}, {"2": 106.05173118059133}, {"3": 547.2908664469512},
{"4": 573.1083659247656}]}]},
{"_id": {"cohort_name": "11-01-2010", "segment_name": "LTV90-Prime", "driver_name": "Unit Loss Rate"},
"cohort_data": [{"calculated": [], "original": [{"1": 0.002687180620372531}, {"2": 0.001468127113897437}]}]},
{"_id": {"cohort_name": "11-01-2010", "segment_name": "LTV90-Prime", "driver_name": "Unit Loss Rate"},
"cohort_data": [{"calculated": [], "original": [{"10": 0.002687180620372531}, {"1": 0.002687180620372531},
{"2": 0.001468127113897437}]}]}
]
I am trying to group data based upon the driver_name and segment_name and push all cohort_name and cohort_data inside the internal dictionary.
The expected output is as follows
[{'driver_name': 'Unit Loss Rate',
'segment_name': 'LTV90-Prime',
'cohort_data': {
'5-01-2010': [{'1': 0.002687180620372531}, {'2': 0.001468127113897437}, {'10': 0.002687180620372531}],
'11-01-2010': [{'1': 0.002687180620372531}, {'2': 0.001468127113897437}]
}},
{'driver_name': 'ADB',
'segment_name': 'LTV90-Prime',
'cohort_data': {
"09-01-2010": [{'1': 225.2699758337715}, {'2': 106.05173118059133}, {'3': 547.2908664469512},
{'4': 573.1083659247656}]
}}
]
This is what I have done so far. I am stuck in pushing the cohort_name and cohort_data in the internal dictionary.
from collections import defaultdict

def get_data_list(d):
    final_data = None
    for i in d:
        calculated = i['calculated']
        original = i['original']
        if original:
            final_data = original
        elif calculated:
            final_data = calculated
    return final_data
dd = defaultdict(dict)
for i in data:
    df = {}
    id_ = i['_id']
    cohort_name_final, segment_name_final, driver_name_final = id_['cohort_name'], \
                                                               id_['segment_name'], \
                                                               id_['driver_name']
    cohort_data_final = i['cohort_data']
    if segment_name_final not in df and segment_name_final not in df:
        df['segment_name'] = segment_name_final
        df['driver_name'] = driver_name_final
        df['cohort_data'] = get_data_list(cohort_data_final)
    elif segment_name_final in df and segment_name_final in df:
        df['cohort_data'].append(get_data_list(cohort_data_final))
        # df['cohort_data'].append({cohort_name_final: get_data_list(cohort_data_final)})
I am using Python 3.4.3. The data shown here is a subset of an original dataset which is queried from the MongoDB database.
Please help.
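One possible way to do the grouping described above, offered as a sketch rather than a tested answer for the full dataset: key a nested defaultdict on the (driver_name, segment_name) pair, then on cohort_name. Two things here are assumptions on my part: records that share the same driver, segment and cohort are concatenated into one list, and the hypothetical helper pick_values prefers 'original' over 'calculated' for every block (roughly what get_data_list does).

```python
from collections import defaultdict

data = [
    {"_id": {"cohort_name": "09-01-2010", "segment_name": "LTV90-Prime", "driver_name": "ADB"},
     "cohort_data": [{"calculated": [],
                      "original": [{"1": 225.2699758337715}, {"2": 106.05173118059133},
                                   {"3": 547.2908664469512}, {"4": 573.1083659247656}]}]},
    {"_id": {"cohort_name": "11-01-2010", "segment_name": "LTV90-Prime", "driver_name": "Unit Loss Rate"},
     "cohort_data": [{"calculated": [],
                      "original": [{"1": 0.002687180620372531}, {"2": 0.001468127113897437}]}]},
    {"_id": {"cohort_name": "11-01-2010", "segment_name": "LTV90-Prime", "driver_name": "Unit Loss Rate"},
     "cohort_data": [{"calculated": [],
                      "original": [{"10": 0.002687180620372531}, {"1": 0.002687180620372531},
                                   {"2": 0.001468127113897437}]}]},
]


def pick_values(cohort_data):
    """Hypothetical helper: use 'original' when present, else 'calculated'."""
    values = []
    for block in cohort_data:
        values.extend(block["original"] or block["calculated"])
    return values


# (driver_name, segment_name) -> cohort_name -> list of values
grouped = defaultdict(lambda: defaultdict(list))
for record in data:
    id_ = record["_id"]
    key = (id_["driver_name"], id_["segment_name"])
    # Assumption: a repeated cohort_name within a group is merged by concatenation.
    grouped[key][id_["cohort_name"]].extend(pick_values(record["cohort_data"]))

result = [
    {"driver_name": driver, "segment_name": segment, "cohort_data": dict(cohorts)}
    for (driver, segment), cohorts in grouped.items()
]
print(result)
```

If repeated cohorts should replace rather than extend each other, only the line that calls extend() would need to change.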