Pandas appending dictionary values with iterrows row values - python

I have a dict of city names, each having an empty list as a value. I am trying to use
df.iterrows()
to append corresponding names to each dict key(city):
for index, row in df.iterrows():
    dict[row['city']].append(row['fullname'])
Can somebody explain why the code above appends all possible 'fullname' values to each dict's key instead of appending them to their respective city keys?
I.e. instead of getting the result
{"City1":["Name1","Name2"],"City2":["Name3","Name4"]}
I'm getting
{"City1":["Name1","Name2","Name3","Name4"],"City2":["Name1","Name2","Name3","Name4"]}
Edit: providing a sample of the dataframe:
import pandas as pd

d = {'fullname': ['Jason', 'Katty', 'Molly', 'Nicky'],
     'city': ['Arizona', 'Arizona', 'California', 'California']}
df = pd.DataFrame(data=d)
Edit 2:
I'm pretty sure that my problem lies in my dict, since I created it in the following way:
cities = []
for i in df['city']:
    cities.append(i)
dict = dict.fromkeys(set(cities), [])
When I inspect dict, I get the expected output:
{"Arizona":[],"California":[]}
However, if I look up a specific key with dict['Arizona'], I get this:
{"index":[],"columns":[],"data":[]}

I'm surprised it works at all, because row is a Series.
How about this alternative approach:
for city in your_dict.keys():
    your_dict[city] += list(df["fullname"][df["city"] == city])
You should always avoid iterating through dataframes unless it's absolutely necessary.

The problem is indeed .fromkeys - the default value is evaluated once - so all of the keys are "pointing to" the same list.
>>> dict.fromkeys(['one', 'two'], [])
{'one': [], 'two': []}
>>> d = dict.fromkeys(['one', 'two'], [])
>>> d['one'].append('three')
>>> d
{'one': ['three'], 'two': ['three']}
You'd need a comprehension to create a distinct list for each key.
>>> d = { k: [] for k in ['one', 'two'] }
>>> d
{'one': [], 'two': []}
>>> d['one'].append('three')
>>> d
{'one': ['three'], 'two': []}
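The sharing can be checked directly with the is operator, and collections.defaultdict is another common way to get one fresh list per key. A minimal sketch; the variable names and the small list of (city, name) pairs are made up for illustration:

```python
from collections import defaultdict

# dict.fromkeys reuses the single list object it was given for every key.
shared = dict.fromkeys(['one', 'two'], [])
same_object = shared['one'] is shared['two']

# A comprehension evaluates [] once per key, so each key gets its own list.
separate = {k: [] for k in ['one', 'two']}
distinct_objects = separate['one'] is not separate['two']

# defaultdict(list) creates a new list the first time a key is accessed.
grouped = defaultdict(list)
pairs = [('Arizona', 'Jason'), ('Arizona', 'Katty'), ('California', 'Molly')]
for city, name in pairs:
    grouped[city].append(name)

print(same_object, distinct_objects, dict(grouped))
```

With defaultdict there is no need to build the keys up front, which sidesteps the fromkeys pitfall entirely.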
You are also manually implementing a groupby with your code:
>>> df.groupby('city')['fullname'].agg(list)
city
Arizona       [Jason, Katty]
California    [Molly, Nicky]
Name: fullname, dtype: object
If you want a dict:
>>> df.groupby('city')['fullname'].agg(list).to_dict()
{'Arizona': ['Jason', 'Katty'], 'California': ['Molly', 'Nicky']}

Related

extract dataframes from list of dictionaries and combine into one

I have a list of dictionaries. Each item in the list is a dictionary. Each dictionary is a pair of key and value with the value being a data frame.
I would like to extract all the data frames and combine them into one.
I have tried:
df = pd.DataFrame.from_dict(data)
for both the full data file and for each dictionary in the list.
This gives the following error:
ValueError: If using all scalar values, you must pass an index
I have also tried turning the dictionary into a list and then converting it to a pd.DataFrame, but I get:
KeyError: 0
Any ideas?
It should be doable with pd.concat(). Let's say you have a list of dictionaries l:
import numpy as np
import pandas as pd

l = [
    {'a': pd.DataFrame(np.arange(9).reshape((3, 3)))},
    {'b': pd.DataFrame(np.arange(9).reshape((3, 3)))},
    {'c': pd.DataFrame(np.arange(9).reshape((3, 3)))},
]
You can feed dataframes from each dict in the list to pd.concat():
df = pd.concat([[pd.DataFrame(df_) for df_ in dict_.values()][0] for dict_ in l])
In my example all data frames have the same columns, so the result has a shape of 9 x 3. If your dataframes have different columns, pd.concat will by default take the union of the columns and fill the gaps with NaN, so the output will require extra steps to process.
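If the dictionary keys should stay attached to the rows, one option is to pass them to pd.concat through its keys argument, which adds them as an outer index level. This is only a sketch under the assumption that every dict holds exactly one DataFrame; the names labels, frames and combined are illustrative:

```python
import numpy as np
import pandas as pd

l = [
    {'a': pd.DataFrame(np.arange(9).reshape((3, 3)))},
    {'b': pd.DataFrame(np.arange(9).reshape((3, 3)))},
    {'c': pd.DataFrame(np.arange(9).reshape((3, 3)))},
]

# Flatten the list of single-entry dicts into parallel lists of labels and frames.
labels = [key for dict_ in l for key in dict_]
frames = [frame for dict_ in l for frame in dict_.values()]

# keys= adds the labels as the outer level of a MultiIndex on the rows.
combined = pd.concat(frames, keys=labels)

print(combined.shape)
print(combined.loc['b'])  # rows that came from the dict keyed 'b'
```

Selecting with combined.loc['b'] then recovers the rows of one source frame without having kept the original dicts around.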
This should work.
import pandas as pd

dict1 = {'d1': pd.DataFrame({'a': [1,2,3], 'b': ['one', 'two', 'three']})}
dict2 = {'d2': pd.DataFrame({'a': [4,5,6], 'b': ['four', 'five', 'six']})}
dict3 = {'d3': pd.DataFrame({'a': [7,8,9], 'b': ['seven', 'eight', 'nine']})}
# List of dicts. You would start from here.
dicts_list = [dict1, dict2, dict3]
# Pull the single DataFrame out of each dict.
frames = [list(_dict.values())[0] for _dict in dicts_list]
# DataFrame.append was removed in pandas 2.0, so concatenate instead.
# ignore_index=True drops the old indexes and renumbers the rows.
df = pd.concat(frames, ignore_index=True)
print(df)
Just out of curiosity: Why are your sub-dataframes already included in a dictionary? An easy way of creating a dataframe from dictionaries is just building a list of dictionaries and then calling pd.DataFrame(list_with_dicts). If the keys are the same across all dictionaries, it should work. Just a suggestion from my side. Something like this:
list_with_dicts = [{'a': 1, 'b': 2}, {'a': 5, 'b': 4}, ...]
# my_df -> DataFrame with columns [a, b] and two rows with the values in the dict.
my_df = pd.DataFrame(list_with_dicts)

Converting pandas multiple columns to dictionaries

I have a pandas data frame with 100 columns and I want to create a dictionary for each column with the first column as key and I've been doing it manually.
Let me explain with a sample dataframe
ID   a    b     c
123  jon  foo   bar
789  pan  bam   fan
278  car  bike  boat
Let's consider the above table with column names ID, a, b, c. Now I am trying to create dictionaries for every column with ID being the key of the dictionary.
something like below
dicta = {ID: a}
dictb = {ID: b}
dictc = {ID: c}
What I am doing currently is:
dicta_ = dict(zip(df['ID'], df['a']))
dicta = {k:v for k,v in dicta_.items()}
dictb_ = dict(zip(df['ID'], df['b']))
dictb = {k:v for k,v in dictb_.items()}
dictc_ = dict(zip(df['ID'], df['c']))
dictc = {k:v for k,v in dictc_.items()}
and the above code is fetching me the desired result but I have to do it manually for all the 100 columns which is not the most efficient way to do it.
I would really appreciate if I could get some help or suggestion to automate the process by writing a loop or function. Thanks in advance!
set_index then use df.to_dict():
d = df.set_index('ID').to_dict()
then call the column with the value:
d['a']
# {123: 'jon', 278: 'car', 789: 'pan'}
d['b']
# {123: 'foo', 278: 'bike', 789: 'bam'}
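Applied to the sample table from the question, the whole approach can be sketched as follows. The DataFrame is rebuilt by hand here, and the name per_column is only illustrative:

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'ID': [123, 789, 278],
    'a': ['jon', 'pan', 'car'],
    'b': ['foo', 'bam', 'bike'],
    'c': ['bar', 'fan', 'boat'],
})

# One nested dict: outer keys are column names, inner keys are IDs.
per_column = df.set_index('ID').to_dict()

# Each hand-written dicta, dictb, ... becomes a lookup by column name.
for column, mapping in per_column.items():
    print(column, mapping)
```

This scales to 100 columns without any extra code, because every column other than ID becomes one entry of the outer dict.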

Python - Get dictionary element in a list of dictionaries after an if statement

How can I get a dictionary value in a list of dictionaries, based on the dictionary satisfying some condition? For instance, if one of the dictionaries in the list has the id=5, I want to print the value corresponding to the name key of that dictionary:
list = [{'name': 'Mike', 'id': 1}, {'name': 'Ellen', 'id': 5}]
id = 5
if any(m['id'] == id for m in list):
    print m['name']
This won't work because m is not defined outside the if statement.
You have a list of dictionaries, so you can use a list comprehension:
[d for d in lst if d['id'] == 5]
# [{'id': 5, 'name': 'Ellen'}]
new_list = [m['name'] for m in list if m['id'] == 5]
print('\n'.join(new_list))
This will be easy to accomplish with a single for-loop:
for d in list:
    if 'id' in d and d['id'] == 5:
        print(d['name'])
There are two key concepts to learn here. The first is that we used a for loop to "go through each element of the list". The second is that we used the in keyword to check whether a dictionary has a certain key.
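When only the first match is needed, the loop can be condensed with next() over a generator expression, with a default for the case where nothing matches. A small sketch; the names people, wanted_id, match and missing are illustrative, and dict.get is used so that entries without an id key are skipped rather than raising KeyError:

```python
people = [{'name': 'Mike', 'id': 1}, {'name': 'Ellen', 'id': 5}]
wanted_id = 5

# next() returns the first matching name, or the default (None) if there is none.
match = next((d['name'] for d in people if d.get('id') == wanted_id), None)

# An id that does not occur falls through to the default.
missing = next((d['name'] for d in people if d.get('id') == 99), None)

print(match)
print(missing)
```

Unlike a list comprehension, this stops scanning as soon as the first match is found.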
How about the following?
for entry in list:
    if entry['id'] == 5:
        print(entry['name'])
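ChainMap doesn't exist in Python 2, but in Python 3 you could wrap the dictionaries in a collections.ChainMap instead of a list. Note that a ChainMap returns the value from the first mapping that contains the key, so the code below prints 1 (Mike's id) rather than selecting the dictionary whose id is 5.
import collections
d = collections.ChainMap(*[{'name':'Mike', 'id': 1}, {'name':'Ellen', 'id': 5}])
if 'id' in d:
    print(d['id'])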
You can do it by using the filter function:
lis = [{'name': 'Mike', 'id': 1}, {'name': 'Ellen', 'id': 5}]
# In Python 3, filter() returns an iterator, so take the first match with next()
result = next(filter(lambda dic: dic['id'] == 5, lis))['name']
print(result)

Check if items in a list exist in dictionary

My question might be a little complicated to understand but here's actually the thing. I have a nested dictionary that looks like this:
dict_a = {'one': {'bird':2, 'tree':6, 'sky':1, 'TOTAL':9},
          'two': {'apple':3, 'sky':1, 'TOTAL':4},
          'three': {'tree':6, 'TOTAL':6},
          'four': {'nada':1, 'TOTAL':1},
          'five': {'orange':2, 'bird':3, 'TOTAL':5}
          }
and a list:
list1 = ['bird','tree']
newlist = []
How can I check whether the items in list1 appear in the nested dictionaries of dict_a, and append the matching outer keys to newlist? The output should look like this:
newlist = ['one','three','five']
since bird and tree happened to be in the nested dictionary of one, three and five.
What I can think of is:
for s, v in dict_a.items():
    for s1, v1 in v.items():
        for item in list1:
            if item == s1:
                newlist.append(s)
Make list1 a set and use dictionary views, and a list comprehension:
set1 = set(list1)
newlist = [key for key, value in dict_a.iteritems() if value.viewkeys() & set1]
In Python 3, use value.keys() and dict_a.items instead.
This tests if there is a set intersection between the dictionary keys and the set of keys you are looking for (an efficient operation).
Demo:
>>> dict_a = {'one': {'bird':2, 'tree':6, 'sky':1, 'TOTAL':9},
... 'two': {'apple':3, 'sky':1, 'TOTAL':4},
... 'three': {'tree':6, 'TOTAL':6},
... 'four': {'nada':1, 'TOTAL':1},
... 'five': {'orange':2, 'bird':3, 'TOTAL':5}
... }
>>> set1 = {'bird','tree'}
>>> [key for key, value in dict_a.iteritems() if value.viewkeys() & set1]
['three', 'five', 'one']
Note that dictionary ordering is arbitrary (depending on the keys used and dictionary insertion and deletion history), so the output list order may differ.
Technically speaking, you can use your list directly too (value.viewkeys() & list1 works) but making it a set states your intention more clearly.
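For reference, a Python 3 version of the same approach might look like this, with items() and keys() in place of the Python 2 methods. Dictionaries keep insertion order from Python 3.7 onwards, so here the result follows the order in which dict_a was written:

```python
dict_a = {'one': {'bird': 2, 'tree': 6, 'sky': 1, 'TOTAL': 9},
          'two': {'apple': 3, 'sky': 1, 'TOTAL': 4},
          'three': {'tree': 6, 'TOTAL': 6},
          'four': {'nada': 1, 'TOTAL': 1},
          'five': {'orange': 2, 'bird': 3, 'TOTAL': 5}}
list1 = ['bird', 'tree']
set1 = set(list1)

# dict.keys() is a set-like view in Python 3, so & computes the intersection.
# A non-empty intersection is truthy, which keeps the outer key.
newlist = [key for key, value in dict_a.items() if value.keys() & set1]

print(newlist)  # ['one', 'three', 'five']
```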

Dict containing keys with lists of lists

I have a dict structured as below:
{ 'records':[['15','2013-04-02','Mexico','blah','bleh',1,2],['25','2013-04-02','Italy','meh','heh',3,4]], 'attributes':['id','date','location','descr1','descr2','total1','total2'] }
It was created from json using json.load.
How can I iterate through the records key so that the first element of each list in ['records'] becomes a key in a new dict, and the remainder of that list becomes the value for that key?
Something like this is what I have in mind. It may not even be possible; I am new to Python:
{ '15':['2013-04-02','Mexico','blah','bleh',1,2], '25':['2013-04-02','Italy','meh','heh',3,4] }
Could someone point me in the right direction to going about iterating through the original dict to create the new one?
If d is your dictionary:
In [5]: {rec[0]:rec[1:] for rec in d['records']}
Out[5]:
{'15': ['2013-04-02', 'Mexico', 'blah', 'bleh', 1, 2],
'25': ['2013-04-02', 'Italy', 'meh', 'heh', 3, 4]}
rec_lsts = orig_dict['records']  # orig_dict is the dict loaded from JSON
new_dict = {}
for l_lst in rec_lsts:
    new_dict[l_lst[0]] = l_lst[1:]
d = { 'records':[['15','2013-04-02','Mexico','blah','bleh',1,2], ['25','2013-04-02','Italy','meh','heh',3,4]], 'attributes':['id','date','location','descr1','descr2','total1','total2']}
new_d = {}
for a in d['records']:
    new_d[a[0]] = a[1:]
print(new_d)
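Since the original dict also carries an attributes list, a possible variation is to label each value with its attribute name instead of keeping a bare list. This goes slightly beyond what was asked and assumes the attribute names line up positionally with each record; field_names and labelled are illustrative names:

```python
d = {
    'records': [['15', '2013-04-02', 'Mexico', 'blah', 'bleh', 1, 2],
                ['25', '2013-04-02', 'Italy', 'meh', 'heh', 3, 4]],
    'attributes': ['id', 'date', 'location', 'descr1', 'descr2', 'total1', 'total2'],
}

# Skip the first attribute ('id') because the id becomes the outer key.
field_names = d['attributes'][1:]

# Pair each remaining value with its attribute name.
labelled = {rec[0]: dict(zip(field_names, rec[1:])) for rec in d['records']}

print(labelled['15'])
```

A lookup such as labelled['25']['location'] then reads more clearly than indexing into a list by position.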
