sum values of specific keys in a dict - python

I have a list of dicts that looks like this:
source_dict = [{'ppl': 10, 'items': 15, 'airport': 'lax', 'city': 'Los Angeles', 'timestamp': 1, 'region': 'North America', 'country': 'United States'},
{'ppl': 20, 'items': 32, 'airport': 'JFK', 'city': 'New York', 'timestamp': 2, 'region': 'North America', 'country': 'United States'},
{'ppl': 50, 'items': 20, 'airport': 'ABC', 'city': 'London', 'timestamp': 1, 'region': 'Europe', 'country': 'United Kingdom'}... ]
#Gets the list of countries in the dict
countries = list(set(stats['country'] for stats in source_dict))
I know I can use a collections.Counter:
counter = collections.Counter()
for d in source_dict:
    counter.update(d)
But I want to group by country and get totals for only certain keys, not all of them.
So the result should be:
{'Country': 'United States', 'ppl': 30, 'items': 47},
{'Country': 'United Kingdom', 'ppl': 50, 'items': 20},...
I'm not sure how to incorporate multiple keys into a Counter to produce that result.

This is one approach using collections.defaultdict & collections.Counter.
Ex:
from collections import defaultdict, Counter
source_dict = [{'ppl': 10, 'items': 15, 'airport': 'lax', 'city': 'Los Angeles', 'timestamp': 1, 'region': 'North America', 'country': 'United States'},
{'ppl': 20, 'items': 32, 'airport': 'JFK', 'city': 'New York', 'timestamp': 2, 'region': 'North America', 'country': 'United States'},
{'ppl': 50, 'items': 20, 'airport': 'ABC', 'city': 'London', 'timestamp': 1, 'region': 'Europe', 'country': 'United Kingdom'} ]
result = defaultdict(Counter)
for stats in source_dict:
    result[stats['country']].update(Counter({'ppl': stats['ppl'], "items": stats['items']}))
#result = [{'Country': k, **v} for k, v in result.items()] #Required output
print(result)
Output:
defaultdict(<class 'collections.Counter'>,
{'United Kingdom': Counter({'ppl': 50, 'items': 20}),
'United States': Counter({'items': 47, 'ppl': 30})})
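If the exact list-of-dicts shape from the question is needed, the same grouping can be sketched with plain dictionaries. The helper name sum_by_group and its parameters are made up for this illustration and are not part of any library; the input is a trimmed copy of the question's data.

```python
from collections import defaultdict

def sum_by_group(rows, group_key, sum_keys):
    """Group rows by group_key and total only the keys listed in sum_keys."""
    # Each new group starts with a zero for every key we want to total.
    totals = defaultdict(lambda: dict.fromkeys(sum_keys, 0))
    for row in rows:
        bucket = totals[row[group_key]]
        for key in sum_keys:
            bucket[key] += row[key]
    # Flatten to one dict per group, the shape asked for in the question.
    return [{group_key: group, **sums} for group, sums in totals.items()]

# Trimmed copy of the question's data (only the keys used here).
source_dict = [
    {'ppl': 10, 'items': 15, 'country': 'United States'},
    {'ppl': 20, 'items': 32, 'country': 'United States'},
    {'ppl': 50, 'items': 20, 'country': 'United Kingdom'},
]

result = sum_by_group(source_dict, 'country', ['ppl', 'items'])
print(result)
```

Keys that are not listed in sum_keys (airport, timestamp and so on) are simply ignored, so non-numeric values never reach the addition.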

In pandas you can do this:
import io
import pandas as pd
dff=io.StringIO("""ppl,items,airport,city,timestamp,region,country
10,15,lax,Los Angeles,1,North America,United States
20,32,JFK,New York,2,North America,United States
50,20,ABC,London,1,Europe,United Kingdom""")
df3=pd.read_csv(dff)
df3
   ppl  items  airport         city  timestamp         region         country
0   10     15      lax  Los Angeles          1  North America   United States
1   20     32      JFK     New York          2  North America   United States
2   50     20      ABC       London          1         Europe  United Kingdom
df3.groupby('country').agg({'ppl':'sum', 'items':'sum'})
#                 ppl  items
# country
# United Kingdom   50     20
# United States    30     47

Related

How to Arrange a List of Dictionaries in Python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
data=[{'address': 'High Tech Campus 60', 'beta': 1.406659, 'ceo': 'Mr. Richard Clemmer', 'changes': -3.9400024, 'cik': '0001413447', 'city': 'Eindhoven', 'companyName': 'NXP Semiconductors N.V.', 'country': 'NL', 'currency': 'USD', ...}]
I have a dictionary.
I need to end up with a flat list of dictionaries: [{}, {}, ...]
How do I add them in a loop?
I tried to use append:
data_list.append(data.copy())
But it returns something different: [[{...}]]
How do I get a list in this format:
[{'address': 'High Tech Campus 60', 'beta': 1.406659, 'ceo': 'Mr. Richard Clemmer', 'changes': -3.9400024, 'cik': '0001413447', 'city': 'Eindhoven', 'companyName': 'NXP Semiconductors N.V.', 'country': 'NL', 'currency': 'USD', ...},
 {'address': '41st, 1155 Rene-Leve...W Flr 4000', 'beta': 2.219123, 'ceo': 'Mr. Klaus Paulini', 'changes': -0.00999999, 'cik': '0001113423', 'city': 'MONTREAL', 'companyName': 'Aeterna Zentaris Inc.', 'country': 'CA', 'currency': 'USD', ...},
 {'address': '125 Summer Street', 'beta': 0.0, 'ceo': 'Dr. Jean-Pierre Som...ossi Ph.D.', 'changes': 2.5800018, 'cik': '0001593899', 'city': 'Boston', 'companyName': 'Atea Pharmaceuticals, Inc.', 'country': 'US', 'currency': 'USD', ...},
 {'address': '401 Charmany Dr', 'beta': 1.073689, 'ceo': 'Mr. Corey Chambas', 'changes': 0.0, 'cik': '0001521951', 'city': 'Madison', 'companyName': 'First Business Finan...ices, Inc.', 'country': 'US', 'currency': 'USD', ...},
 {'address': '490 Arsenal Way', 'beta': 0.0, 'ceo': 'Mr. Marc A. Cohen', 'changes': -0.9699974, 'cik': '0001662579', 'city': 'Watertown', 'companyName': 'C4 Therapeutics, Inc.', 'country': 'US', 'currency': 'USD', ...},
 {'address': 'General-Guisan-Strasse 6', 'beta': 1.629418, 'ceo': 'Mr. Carlos Creus Moreira', 'changes': -0.09000015, 'cik': '0001738699', 'city': 'Zug', 'companyName': 'WISeKey Internationa...Holding AG', 'country': 'CH', 'currency': 'USD', ...},
 {'address': '508 W Wall St Ste 800', 'beta': 1.7762, 'ceo': 'Mr. Stephen Jumper', 'changes': -0.04999995, 'cik': '0000799165', 'city': 'Midland', 'companyName': 'Dawson Geophysical Company', 'country': 'US', 'currency': 'USD', ...},
 {'address': '955 Perimeter Road', 'beta': 0.0, 'ceo': 'Mr. Ravi Vig', 'changes': -1.2900009, 'cik': '0000866291', 'city': 'Manchester', 'companyName': 'Allegro MicroSystems, Inc.', 'country': 'US', 'currency': 'USD', ...},
 {'address': '490 Lapp Rd', 'beta': 1.138646, 'ceo': 'Ms. Geraldine Henwood', 'changes': -0.04999995, 'cik': '0001588972', 'city': 'Malvern', 'companyName': 'Recro Pharma, Inc.', 'country': 'US', 'currency': 'USD', ...},
 {'address': '5 Haplada Street, PO Box 5011', 'beta': 1.396288, 'ceo': 'Mr. Guy Bernstein', 'changes': -0.9300003, 'cik': '0000876779', 'city': 'OR YEHUDA', 'companyName': 'Magic Software Enter...rises Ltd.', 'country': 'IL', 'currency': 'USD', ...},
 {'address': '111 West 33rd Street', 'beta': 0.0, 'ceo': 'Mr. Richard Gumer', 'changes': -0.20249999, 'cik': '0001823323', 'city': 'New York', 'companyName': 'KL Acquisition Corp', 'country': 'US', 'currency': 'USD', ...},
 {'address': '2 Canal Park Ste 4', 'beta': 1.907176, 'ceo': 'Mr. Langley Steinert', 'changes': -1.4399986, 'cik': '0001494259', 'city': 'Cambridge', 'companyName': 'CarGurus, Inc.', 'country': 'US', 'currency': 'USD', ...},
 {'address': '119 Standard St', 'beta': 1.592636, 'ceo': 'Mr. Ethan Brown', 'changes': -3.859993, 'cik': '0001655210', 'city': 'El Segundo', 'companyName': 'Beyond Meat, Inc.', 'country': 'US', 'currency': 'USD', ...},
 {'address': '3854 American Way Ste A', 'beta': 0.502729, 'ceo': 'Mr. Paul Kusserow', 'changes': -1.5899963, 'cik': '0000896262', 'city': 'Baton Rouge', 'companyName': 'Amedisys, Inc.', 'country': 'US', 'currency': 'USD', ...},
 ...]
OK, it looks like what I have initially is not a dictionary but a one-element list containing a dictionary. So how do I add another dictionary to the list, after the comma?
Update: I managed to get a list of dictionaries, but it is not fully correct, because some rows include additional fields. The list looks like this: 'currency': 'USD', ...}, 'code', 'status', {'address': '5 ...
How can I validate a list of dictionaries and make sure every dictionary matches a predefined list of columns?
data_list.append(data[0].copy())
You could also do
data_list = data_list + data
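For the validation part of the question, one possible sketch is to keep only the entries that are dicts with exactly the expected keys and set everything else aside. The names EXPECTED_COLUMNS and split_valid are hypothetical, and the three-column list is an assumption standing in for the real (longer) column list; the sample records are trimmed from the question.

```python
# Hypothetical column list; replace with the real predefined columns.
EXPECTED_COLUMNS = {'address', 'city', 'country'}

def split_valid(records, expected=EXPECTED_COLUMNS):
    """Return (valid, rejected): dicts whose keys equal `expected`, and everything else."""
    valid, rejected = [], []
    for record in records:
        # Stray strings such as 'code' fail the isinstance check,
        # dicts with missing or extra keys fail the key comparison.
        if isinstance(record, dict) and set(record) == expected:
            valid.append(record)
        else:
            rejected.append(record)
    return valid, rejected

data_list = [
    {'address': 'High Tech Campus 60', 'city': 'Eindhoven', 'country': 'NL'},
    'code',    # stray strings, as in the question's update
    'status',
    {'address': '125 Summer Street', 'city': 'Boston', 'country': 'US'},
    {'address': '490 Lapp Rd', 'city': 'Malvern'},  # missing 'country'
]

valid, rejected = split_valid(data_list)
print(len(valid), rejected)
```

Keeping the rejected entries instead of silently dropping them makes it easier to see where the stray values came from.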

Adding value to dataframe based on dict

I have a problem with a list of dicts like this:
list_validation = [{'name': 'Alice', 'street': 'Baker Street', 'stamp': 'T05', 'city': 'London'}, {'name': 'Margaret', 'street': 'Castle Street', 'stamp': 'T01', 'city': 'Cambridge'}, {'name': 'Fred', 'street': 'Baker Street', 'stamp': 'T012', 'city': 'London'}]
Now in my dataframe there are columns
df = pd.DataFrame({'name': ['Fred', 'Jane', 'Alice', 'Margaret'], 'street': ['Baker Street', 'Downing Street', 'Baker Street', 'Castle Street'],
'stamp': ['', 'T03', '', ''],
'city': ['', 'London', '', ''],
'other irrelevant columns for this task' : [1, 2, 3, 4]
})
What I want is to fill the gaps of the stamp columns and the city columns, so it looks like this:
df2 = pd.DataFrame({'name': ['Fred', 'Jane', 'Alice', 'Margaret'], 'street': ['Baker Street', 'Downing Street', 'Baker Street', 'Castle Street'],
'stamp': ['T012', 'T03', 'T05', 'T01'],
'city': ['London', 'London', 'London', 'Cambridge'],
'other irrelevant columns for this task' : [1, 2, 3, 4]
})
I have been trying this, but it is not working:
new_dict = df[['name', 'street', 'stamp', 'city']].to_dict()
list(new_dict)
for l in list_validation:
    for row in new_dict:
        if l['name'] == row['name'] and l['street'] == row['street']:
            row['stamp'] = l['stamp']
            row['city'] = l['city']
This is one approach: iterate over each row in the dataframe and fill the missing values from the list.
List Definition:
list_validation = [{'name': 'Alice', 'street': 'Baker Street', 'stamp': 'T05', 'city': 'London'}, {'name': 'Margaret', 'street': 'Castle Street', 'stamp': 'T01', 'city': 'Cambridge'}, {'name': 'Fred', 'street': 'Baker Street', 'stamp': 'T012', 'city': 'London'}]
DataFrame Definition:
df = pd.DataFrame({'name': ['Fred', 'Jane', 'Alice', 'Margaret'], 'street': ['Baker Street', 'Downing Street', 'Baker Street', 'Castle Street'],
'stamp': ['', 'T03', '', ''],'city': ['', 'London', '', ''],'other irrelevant columns for this task' : [1, 2, 3, 4]})
Logic
for r,i in df.iterrows():
    name_in_df = i['name']
    # if pd.isna(i['stamp']):
    if not i['stamp']:
        for j in list_validation:
            if j['name'] == name_in_df:
                value_in_list = j['stamp']
                df.loc[r,'stamp'] = value_in_list
                break
    # if pd.isna(i['city']):
    if not i['city']:
        name_in_df = i['name']
        for j in list_validation:
            if j['name'] == name_in_df:
                value_in_list = j['city']
                df.loc[r,'city'] = value_in_list
                break
df
Here is the approach that I would use:
Set the index of the given dataframe to name and street (df1).
Create a new dataframe from list_validation and set its index to name and street as well (df2).
Mask the empty values in df1 and fill the masked values using the values from df2.
c = ['name', 'street']
df1 = df.set_index(c)
df2 = pd.DataFrame(list_validation).set_index(c)
df1.mask(df1.eq('')).fillna(df2).reset_index()
       name          street  stamp       city  other irrelevant columns for this task
0      Fred    Baker Street   T012     London                                       1
1      Jane  Downing Street    T03     London                                       2
2     Alice    Baker Street    T05     London                                       3
3  Margaret   Castle Street    T01  Cambridge                                       4
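For comparison, the idea behind aligning on (name, street) can be sketched without pandas, using a lookup dict keyed by that pair. Here rows is a plain list of dicts standing in for the dataframe, which is an assumption made to keep the example self-contained.

```python
list_validation = [
    {'name': 'Alice', 'street': 'Baker Street', 'stamp': 'T05', 'city': 'London'},
    {'name': 'Margaret', 'street': 'Castle Street', 'stamp': 'T01', 'city': 'Cambridge'},
    {'name': 'Fred', 'street': 'Baker Street', 'stamp': 'T012', 'city': 'London'},
]

# Plain-dict stand-in for the dataframe rows from the question.
rows = [
    {'name': 'Fred', 'street': 'Baker Street', 'stamp': '', 'city': ''},
    {'name': 'Jane', 'street': 'Downing Street', 'stamp': 'T03', 'city': 'London'},
    {'name': 'Alice', 'street': 'Baker Street', 'stamp': '', 'city': ''},
    {'name': 'Margaret', 'street': 'Castle Street', 'stamp': '', 'city': ''},
]

# One dictionary lookup per row instead of scanning list_validation each time.
lookup = {(v['name'], v['street']): v for v in list_validation}

for row in rows:
    match = lookup.get((row['name'], row['street']))
    if match is None:
        continue  # no validation entry for this person, leave the row alone
    for col in ('stamp', 'city'):
        if not row[col]:  # only fill gaps, never overwrite existing values
            row[col] = match[col]

print(rows)
```

Matching on both name and street avoids mixing up two people who share a name but live at different addresses.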

Python Decision Tree: Creating Relationship using Dictionary from Row data

I have hierarchical data (more than 10 generations) that records who each person's parent is. I want to represent this as a nested dict of dicts. Is there any way to achieve this?
sample input - List of Dict/Dataframe
[{'Name': 'Oli Bob', 'Location': 'United Kingdom', 'Parent': nan}, {'Name': 'Mary May', 'Location': 'Germany', 'Parent': 'Oli Bob'}, {'Name': 'Christine Lobowski', 'Location': 'France', 'Parent': 'Oli Bob'}, {'Name': 'Brendon Philips', 'Location': 'USA', 'Parent': 'Oli Bob'}, {'Name': 'Margret Marmajuke', 'Location': 'Canada', 'Parent': 'Brendon Philips'}, {'Name': 'Frank Harbours', 'Location': 'Russia', 'Parent': 'Brendon Philips'}, {'Name': 'Todd Philips', 'Location': 'United Kingdom', 'Parent': 'Frank Harbours'}, {'Name': 'Jamie Newhart', 'Location': 'India', 'Parent': nan}, {'Name': 'Gemma Jane', 'Location': 'China', 'Parent': nan}, {'Name': 'Emily Sykes', 'Location': 'South Korea', 'Parent': 'Gemma Jane'}, {'Name': 'James Newman', 'Location': 'Japan', 'Parent': nan}]
Desired Output
[
  {name:"Oli Bob", location:"United Kingdom", _children:[
    {name:"Mary May", location:"Germany"},
    {name:"Christine Lobowski", location:"France"},
    {name:"Brendon Philips", location:"USA", _children:[
      {name:"Margret Marmajuke", location:"Canada"},
      {name:"Frank Harbours", location:"Russia", _children:[{name:"Todd Philips", location:"United Kingdom"}]},
    ]},
  ]},
  {name:"Jamie Newhart", location:"India"},
  {name:"Gemma Jane", location:"China", _children:[
    {name:"Emily Sykes", location:"South Korea"},
  ]},
  {name:"James Newman", location:"Japan"},
];
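One way this could be approached is a two-pass build: first create one node per person, then attach each node to its parent's _children list. The sketch below is untested against the full dataset; the helper names is_missing and build_tree are made up, the input is a shortened copy of the sample, and treating a row whose parent is missing, unknown, or the row itself as a root is an assumption. It does not guard against longer cycles in the data.

```python
import math

def is_missing(value):
    """True for None or a float NaN, which is how pandas marks an empty parent."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def build_tree(rows):
    # Pass 1: one node per person, keyed by name (names are assumed unique).
    nodes = {r['Name']: {'name': r['Name'], 'location': r['Location']} for r in rows}
    roots = []
    # Pass 2: hang each node under its parent, or keep it as a root.
    for r in rows:
        node = nodes[r['Name']]
        parent = r['Parent']
        if is_missing(parent) or parent not in nodes or parent == r['Name']:
            roots.append(node)
        else:
            # '_children' is only created for people who actually have children.
            nodes[parent].setdefault('_children', []).append(node)
    return roots

nan = float('nan')
rows = [
    {'Name': 'Oli Bob', 'Location': 'United Kingdom', 'Parent': nan},
    {'Name': 'Mary May', 'Location': 'Germany', 'Parent': 'Oli Bob'},
    {'Name': 'Brendon Philips', 'Location': 'USA', 'Parent': 'Oli Bob'},
    {'Name': 'Frank Harbours', 'Location': 'Russia', 'Parent': 'Brendon Philips'},
    {'Name': 'Jamie Newhart', 'Location': 'India', 'Parent': nan},
]

tree = build_tree(rows)
print(tree)
```

Because every child is attached by reference, the depth of the hierarchy does not matter and no recursion is needed.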

How to find the number of people from a particular country in the data below?

[
{'Year': 1901,
'Category': 'Chemistry',
'Prize': 'The Nobel Prize in Chemistry 1901',
'Motivation': '"in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions"',
'Prize Share': '1/1',
'Laureate ID': 160,
'Laureate Type': 'Individual',
'Full Name': "Jacobus Henricus van 't Hoff",
'Birth Date': '1852-08-30',
'Birth City': 'Rotterdam',
'Birth Country': 'Netherlands',
'Sex': 'Male',
'Organization Name': 'Berlin University',
'Organization City': 'Berlin',
'Organization Country': 'Germany',
'Death Date': '1911-03-01',
'Death City': 'Berlin',
'Death Country': 'Germany'},
{'Year': 1901,
'Category': 'Literature',
'Prize': 'The Nobel Prize in Literature 1901',
'Motivation': '"in special recognition of his poetic composition, which gives evidence of lofty idealism, artistic perfection and a rare combination of the qualities of both heart and intellect"',
'Prize Share': '1/1',
'Laureate ID': 569,
'Laureate Type': 'Individual',
'Full Name': 'Sully Prudhomme',
'Birth Date': '1839-03-16',
'Birth City': 'Paris',
'Birth Country': 'France',
'Sex': 'Male',
'Organization Name': '',
'Organization City': '',
'Organization Country': '',
'Death Date': '1907-09-07',
'Death City': 'Châtenay',
'Death Country': 'France'}
]
If you want to find how many people share the same birth country in the given list of dicts (called val here), you can use the following code:
from collections import Counter
li = [each['Birth Country'] for each in val if each['Birth Country']]
print(dict(Counter(li)))
OUTPUT
{'Netherlands': 1, 'France': 1}
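To get the number for one particular country, as the title asks, the Counter can be indexed directly. This is a small sketch on a trimmed copy of the two records above; val is the assumed name of the list.

```python
from collections import Counter

# Trimmed copy of the two records from the question.
val = [
    {'Full Name': "Jacobus Henricus van 't Hoff", 'Birth Country': 'Netherlands'},
    {'Full Name': 'Sully Prudhomme', 'Birth Country': 'France'},
]

# Skip records where the country is missing or an empty string.
counts = Counter(p['Birth Country'] for p in val if p.get('Birth Country'))

print(counts['France'])
print(counts['Japan'])  # a Counter returns 0 for keys it has never seen
```

Indexing a Counter with an unseen key does not raise KeyError, so no extra membership check is needed.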

How to use a for loop to iterate through a list of dictionaries, select a key, and append the values to a new list

I'm a beginner and this is a basic question. I need to use a for loop to iterate through a list of dictionaries, and for a certain key in each of the dictionaries, append the value to a new list. The original list is a list of cities, with each dictionary in the list representing a city and containing information about the city. I need the loop to pick out the 'Population' key in each city dictionary, and append the value to a new list called city_populations.
I've only managed to append the population from one of the dictionaries to the list. I'm having trouble getting it to iterate through the list of dictionaries and append the population of each city. Here are three things I've tried:
Attempt 1:
city_populations = []
for city in cities:
    city_populations.append(cities[0]['Population'])
city_populations
Attempt 2:
city_populations = []
for city in cities:
    city_populations.append(cities[index]['Population'])
city_populations
Attempt 3:
city_populations = []
for city in cities:
    index = 0
    city_populations.append(cities[index]['Population'])
    index =+ 1
city_populations
Here is the list of cities:
[{'City': 'Buenos Aires',
'Country': 'Argentina',
'Population': 2891000,
'Area': 4758},
{'City': 'Toronto', 'Country': 'Canada', 'Population': 2800000, 'Area': 2731},
{'City': 'Pyeongchang',
'Country': 'South Korea',
'Population': 2581000,
'Area': 3194},
{'City': 'Marakesh', 'Country': 'Morocco', 'Population': 928850, 'Area': 200},
{'City': 'Albuquerque',
'Country': 'New Mexico',
'Population': 559277,
'Area': 491},
{'City': 'Los Cabos',
'Country': 'Mexico',
'Population': 287651,
'Area': 3750},
{'City': 'Greenville', 'Country': 'USA', 'Population': 84554, 'Area': 68},
{'City': 'Archipelago Sea',
'Country': 'Finland',
'Population': 60000,
'Area': 8300},
{'City': 'Walla Walla Valley',
'Country': 'USA',
'Population': 32237,
'Area': 33},
{'City': 'Salina Island', 'Country': 'Italy', 'Population': 4000, 'Area': 27},
{'City': 'Solta', 'Country': 'Croatia', 'Population': 1700, 'Area': 59},
{'City': 'Iguazu Falls',
'Country': 'Argentina',
'Population': 0,
'Area': 672}]
How could I achieve what I am trying to do here? Thanks for your help.
Your problem is that you are not using the loop variable city, which holds the current dictionary on each iteration. Try this:
city_populations = []
for city in cities:
    city_populations.append(city['Population'])
city_populations
Your 3rd attempt is almost there. It would work if you put index = 0 outside of the loop and changed index =+ 1 (which assigns +1 to index) to index += 1.
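A sketch of the third attempt with the counter repaired, run on a trimmed copy of the cities list (only three entries and two keys, to keep it short):

```python
# Trimmed copy of the cities list from the question.
cities = [
    {'City': 'Buenos Aires', 'Population': 2891000},
    {'City': 'Toronto', 'Population': 2800000},
    {'City': 'Solta', 'Population': 1700},
]

city_populations = []
index = 0  # initialise once, before the loop, so it is not reset every iteration
for city in cities:
    city_populations.append(cities[index]['Population'])
    index += 1  # '+=' increments; '=+ 1' would just assign +1 each time

print(city_populations)
```

The manual counter works, but it duplicates what the loop variable city already provides.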
Stop worrying about indexes. You want every Population field of each dictionary of the list of dictionaries.
Use a list comprehension:
city_populations = [city["Population"] for city in cities]
Here is my solution plus some bonus code:
cities = [{'City': 'Buenos Aires',
'Country': 'Argentina',
'Population': 2891000,
'Area': 4758},
{'City': 'Toronto', 'Country': 'Canada', 'Population': 2800000, 'Area': 2731},
{'City': 'Pyeongchang',
'Country': 'South Korea',
'Population': 2581000,
'Area': 3194},
{'City': 'Marakesh', 'Country': 'Morocco', 'Population': 928850, 'Area': 200},
{'City': 'Albuquerque',
'Country': 'New Mexico',
'Population': 559277,
'Area': 491},
{'City': 'Los Cabos',
'Country': 'Mexico',
'Population': 287651,
'Area': 3750},
{'City': 'Greenville', 'Country': 'USA', 'Population': 84554, 'Area': 68},
{'City': 'Archipelago Sea',
'Country': 'Finland',
'Population': 60000,
'Area': 8300},
{'City': 'Walla Walla Valley',
'Country': 'USA',
'Population': 32237,
'Area': 33},
{'City': 'Salina Island', 'Country': 'Italy', 'Population': 4000, 'Area': 27},
{'City': 'Solta', 'Country': 'Croatia', 'Population': 1700, 'Area': 59},
{'City': 'Iguazu Falls',
'Country': 'Argentina',
'Population': 0,
'Area': 672}]
# This is the specific solution to your problem
city_populations = []
for city in cities:
    city_populations.append(city['Population'])
print(city_populations)
# In order to better understand what is happening please try also this code
for city in cities:
    print (city)
    print (city['Population'])
# A more pythonic and elegant way to solve the problem is using list comprehension
city_populations = [city["Population"] for city in cities]
print(city_populations)
# If you want to be able to access specific keys / values you can use items()
for key, value in cities[0].items():
    print ("Key: " + key)
    print("Value: " + str(value))
