Converting multiple pandas columns to dictionaries - Python

I have a pandas DataFrame with 100 columns, and I want to create a dictionary for each column with the first column as the key. I've been doing it manually.
Let me explain with a sample dataframe:
ID a b c
123 jon foo bar
789 pan bam fan
278 car bike boat
Let's consider the above table with column names ID, a, b, c. Now I am trying to create dictionaries for every column with ID being the key of the dictionary.
Something like below:
dicta = {ID: a}
dictb = {ID: b}
dictc = {ID: c}
What I am doing currently is:
dicta_ = dict(zip(df['ID'], df['a']))
dicta = {k:v for k,v in dicta_.items()}
dictb_ = dict(zip(df['ID'], df['b']))
dictb = {k:v for k,v in dictb_.items()}
dictc_ = dict(zip(df['ID'], df['c']))
dictc = {k:v for k,v in dictc_.items()}
The above code gets me the desired result, but I have to do it manually for all 100 columns, which is not efficient.
I would really appreciate some help or a suggestion for automating the process with a loop or function. Thanks in advance!

Use set_index, then df.to_dict():
d = df.set_index('ID').to_dict()
then call the column with the value:
d['a']
# {123: 'jon', 278: 'car', 789: 'pan'}
d['b']
# {123: 'foo', 278: 'bike', 789: 'bam'}
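If you still want one separate dictionary per column rather than indexing into d, a dict comprehension over the columns covers any number of them at once. A minimal sketch, assuming ID is a regular column as in the sample:
import pandas as pd

df = pd.DataFrame({'ID': [123, 789, 278],
                   'a': ['jon', 'pan', 'car'],
                   'b': ['foo', 'bam', 'bike'],
                   'c': ['bar', 'fan', 'boat']})

# one dict per non-ID column, equivalent to the manual zip approach
col_dicts = {col: dict(zip(df['ID'], df[col])) for col in df.columns if col != 'ID'}
print(col_dicts['a'])  # {123: 'jon', 789: 'pan', 278: 'car'}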

Related

Pandas appending dictionary values with iterrows row values

I have a dict of city names, each having an empty list as a value. I am trying to use
df.iterrows()
to append corresponding names to each dict key (city):
for index, row in df.iterrows():
    dict[row['city']].append(row['fullname'])
Can somebody explain why the code above appends all possible 'fullname' values to each dict's key instead of appending them to their respective city keys?
I.e. instead of getting the result
{"City1":["Name1","Name2"],"City2":["Name3","Name4"]}
I'm getting
{"City1":["Name1","Name2","Name3","Name4"],"City2":["Name1","Name2","Name3","Name4"]}
Edit: providing a sample of the dataframe:
d = {'fullname': ['Jason', 'Katty', 'Molly', 'Nicky'],
     'city': ['Arizona', 'Arizona', 'California', 'California']}
df = pd.DataFrame(data=d)
Edit 2:
I'm pretty sure that my problem lies in my dict, since I created it in the following way:
cities = []
for i in df['city']:
    cities.append(i)
dict = dict.fromkeys(set(cities), [])
When I call dict, I get the correct output:
{"Arizona":[],"California":[]}
However, if I specify a key, dict['Arizona'], I get this:
{"index":[],"columns":[],"data":[]}
I'm surprised it works at all, because row is a Series.
How about this alternative approach:
for city in your_dict.keys():
    your_dict[city] += list(df["fullname"][df["city"] == city])
You should always avoid iterating through dataframes unless it's absolutely necessary.
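For completeness, a minimal runnable sketch of that idea with the sample dataframe from the question, taking the city keys from the column itself rather than from a pre-built dict:
import pandas as pd

d = {'fullname': ['Jason', 'Katty', 'Molly', 'Nicky'],
     'city': ['Arizona', 'Arizona', 'California', 'California']}
df = pd.DataFrame(data=d)

# one distinct list per city, built without iterrows
your_dict = {city: list(df.loc[df['city'] == city, 'fullname']) for city in df['city'].unique()}
print(your_dict)  # {'Arizona': ['Jason', 'Katty'], 'California': ['Molly', 'Nicky']}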
The problem is indeed .fromkeys: the default value is evaluated once, so all of the keys point to the same list.
>>> dict.fromkeys(['one', 'two'], [])
{'one': [], 'two': []}
>>> d = dict.fromkeys(['one', 'two'], [])
>>> d['one'].append('three')
>>> d
{'one': ['three'], 'two': ['three']}
You'd need a comprehension to create a distinct list for each key.
>>> d = { k: [] for k in ['one', 'two'] }
>>> d
{'one': [], 'two': []}
>>> d['one'].append('three')
>>> d
{'one': ['three'], 'two': []}
You are also manually implementing a groupby with your code:
>>> df.groupby('city')['fullname'].agg(list)
city
Arizona [Jason, Katty]
California [Molly, Nicky]
Name: fullname, dtype: object
If you want a dict:
>>> df.groupby('city')['fullname'].agg(list).to_dict()
{'Arizona': ['Jason', 'Katty'], 'California': ['Molly', 'Nicky']}

Match values between two dictionaries, extract the keys with equal values into a new dictionary

For example, I have:
dict1 = {"name":"Cristian","surname":"Rossi","nationality":"Italy","color":"red"}
dict2 = {"country":"Italy","loc":"Milan","other":"red","car":"ford"}
The dictionaries are large, some thousands of elements.
In this example, the values present in both dictionaries are Italy and red. So I would like this result:
dict3 = {"nationality":"country","color":"other"}
Maybe it would be easier to convert the dictionaries into sets?
Thanks!
Get a set of the common values in both dictionaries. Then get the keys for those values and build a dictionary.
dict1 = {"name":"Cristian","surname":"Rossi","nationality":"Italy","color":"red"}
dict2 = {"country":"Italy","loc":"Milan","other":"red","car":"ford"}
common = set(dict1.values()) & set(dict2.values())
# map each shared value back to its key in each dict, so the pairing
# does not depend on the order the keys happen to appear in
keys1 = {v: k for k, v in dict1.items() if v in common}
keys2 = {v: k for k, v in dict2.items() if v in common}
d = {keys1[v]: keys2[v] for v in common}
print(d)
Output:
{'nationality': 'country', 'color': 'other'}
Here is one approach which first inverts the dicts and then looks at an intersection of the values. Given that intersection of values it then builds a final result with all of the keys that each value mapped to in the original dicts. Assumes Python 3.
d1 = {"name":"Cristian","surname":"Rossi","nationality":"Italy","color":"red"}
d2 = {"country":"Italy","loc":"Milan","other":"red","car":"ford"}
def inv_dict(d):
    inv = {}
    for k, v in d.items():
        inv.setdefault(v, []).append(k)
    return inv
id1 = inv_dict(d1)
id2 = inv_dict(d2)
result = {v:id1[v] + id2[v] for v in id1.keys() & id2.keys()}
print(result)
# {'Italy': ['nationality', 'country'], 'red': ['color', 'other']}
The output is slightly different than what you specified, but it's unclear how your example output would work if the same value appeared in multiple keys in one or both dicts.
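If each shared value is known to map to exactly one key per dict, the result above can be collapsed into the key-to-key form from the question. A sketch building on id1 and id2 from the snippet above; the [0] index is the assumption that there is only one key per value:
pairs = {id1[v][0]: id2[v][0] for v in id1.keys() & id2.keys()}
print(pairs)
# {'nationality': 'country', 'color': 'other'}  (key order may vary)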

Creating a column, where the value of each row is a key of a specified dict, based on whether existing column contains that dict value as a substring?

Say I have the following dictionary
dict = {'a': ['tool', 'device'], 'b': ['food', 'beverage']},
and I have a dataframe with a column with the first 2 row values as
'tools',
'foods'
and I want to create a new column where the 1st value is a, and the second is b.
What would be the best way to do this?
First, don't use the variable name dict, because it shadows a Python builtin. Then swap the values and keys of the dict to build a new lookup dict, get the matching substrings from the column with Series.str.findall using the keys of that dict, and map them with Series.map to create the new column:
import pandas as pd

d = {'a': ['tool', 'device'], 'b': ['food', 'beverage']}
df = pd.DataFrame({'col': ['tools', 'foods']})
# invert the mapping: each keyword points back to its key
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print(d1)
{'tool': 'a', 'device': 'a', 'food': 'b', 'beverage': 'b'}
df['new'] = df['col'].str.findall('|'.join(d1.keys())).str[0].map(d1)
print(df)
     col new
0  tools   a
1  foods   b
Or:
df['new'] = df['col'].str.extract('({})'.format('|'.join(d1.keys())), expand=False).map(d1)

Retrieving items from a nested dictionary with a nested for loop results in KeyError

I need to systematically access dictionaries that are nested within a list within a dictionary at the 3rd level, like this:
responses = {'1': {'responses': [{1st dict to be retrieved}, {2nd dict to be retrieved}, ...]},
'2': {'responses': [{1st dict to be retrieved}, {2nd dict to be retrieved}, ...]}, ...}
I need to unnest and transform these nested dicts into dataframes, so the end result should look like this:
responses = {'1': df1,
'2': df2, ...}
To achieve this, I built a for-loop to loop through all keys on the first level. Within that loop, I use another loop to extract each item from the nested dicts into a new empty dict called responses_dict:
responses_dict = {}
for key in responses.keys():
    for item in responses[key]['responses']:
        responses_dict[key].update(item)
However, I get:
KeyError: '1'
The inner loop works if I use it individually on a key within the dict, but that doesn't really help me since the data comes from an API and has to be updated dynamically every few minutes in production.
The next loop, to transform the result into dataframes, would look like this:
for key in responses_dict:
    responses_df[key] = pd.DataFrame.from_dict(responses_dict[key], orient='index')
But I haven't gotten to try that out since the first operation fails.
Try this:
from collections import defaultdict
responses_dict = defaultdict(dict) # instead of {}
Then your code will work.
In fact responses_dict[key] where key='1' doesn't exist yet.
So when you simply do print(responses_dict[key]) you get the same error, 1 is not a key of that dict and update is not used as it should be.
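Putting the defaultdict suggestion together with the original loop, a minimal sketch using made-up placeholder data (the nested dicts below are stand-ins for the real API payload):
from collections import defaultdict
import pandas as pd

# placeholder data; the real responses come from the API
responses = {'1': {'responses': [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}]},
             '2': {'responses': [{'e': 5}, {'f': 6}]}}

responses_dict = defaultdict(dict)
for key in responses:
    for item in responses[key]['responses']:
        responses_dict[key].update(item)   # merges every nested dict into one dict per key

responses_df = {key: pd.DataFrame.from_dict(v, orient='index') for key, v in responses_dict.items()}
print(responses_df['1'])
#    0
# a  1
# b  2
# c  3
# d  4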
Try the following syntax:
responses_dict = {}
for key in responses.keys():
    print(key)
    for item in responses[key]['responses']:
        responses_dict.update({key: item})
I prefer using dictionaries while updating a dictionary.
If you update with an existing key, the value of that key will be updated.
If you update with an new key-value pair, the pair will be added to that dictionary.
>>> d1 = {1: 10, 2: 20}
>>> d1.update({1: 20})
>>> d1
{1: 20, 2: 20}
>>> d1.update({3: 30})
>>> d1
{1: 20, 2: 20, 3: 30}
Try fixing your line with:
responses_dict = {}
for key in responses.keys():
    for item in responses[key]['responses']:
        responses_dict.update({key: item})
So basically, use a dictionary to update a dictionary; it's more readable and easier.
Try this:
import pandas as pd
from itertools import chain

responses = {'1': {'responses': [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}]},
             '2': {'responses': [{'e': 5}, {'f': 6}]}}
result = {k: pd.DataFrame(chain.from_iterable(v['responses'])) for k, v in responses.items()}
for df in result.values():
    print(df, end='\n\n')
Output:
   0
0  a
1  b
2  c
3  d

   0
0  e
1  f

Creating a table out of lists of data

So I have two lists of data:
person = ['bobby', 'tim', 'sarah', 'tim', 'sarah', 'bobby', 'tim', 'sarah', 'bobby']
places = ['loc1', 'loc1', 'loc2', 'loc1', 'loc2', 'loc2', 'loc1', 'loc2', 'loc1']
I have to use this data to show how many times a person visited a certain place.
How would I be able to use the above lists to get something like this:
person loc1 loc2
bobby 2 1
tim 3 0
sarah 0 3
If the list were like bobby = ['loc1', 'loc1', 'loc2'], I could use bobby.count('loc1') to find the data I need, but it is different here. Also, I have no idea how to make the table.
I am not asking for the entire code, I just need to know how I should start.
Use a temporary dict to store the values:
d = {}
for name, loc in zip(person, places):
    d.setdefault(name, []).append(loc)
And to print it:
>>> for k, v in d.items():
...     print(k, end='\t')
...     print(v.count('loc1'), end='\t')
...     print(v.count('loc2'))
...
tim 3 0
bobby 2 1
sarah 0 3
Hope this helps!
If you have to keep two lists storing the persons and their places, you can use the following code to construct a dict that indicates which places each person has visited and how many times.
persons = ['bobby', 'tim', 'sarah', 'tim', 'sarah', 'bobby', 'tim', 'sarah', 'bobby',]
places = ['loc1', 'loc1', 'loc2', 'loc1', 'loc2', 'loc2', 'loc1', 'loc2', 'loc1']
person_to_places = {}
total = len(persons)
for i in range(total):
    if persons[i] not in person_to_places:
        person_to_places[persons[i]] = []
    person_to_places[persons[i]].append(places[i])
# replace each person's list of visits with a dict of per-place counts
for name, visited in person_to_places.items():
    person_to_places[name] = {}
    for p in visited:
        if p not in person_to_places[name]:
            person_to_places[name][p] = 0
        person_to_places[name][p] += 1
Then you have a dict person_to_places like this:
{'bobby': {'loc1': 2, 'loc2': 1}, 'sarah': {'loc2': 3}, 'tim': {'loc1': 3}}
After that, output the table as you like.
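For the printing step, a rough sketch continuing from the person_to_places dict and the places list above (tab-separated, column names taken from the places list):
columns = sorted(set(places))
print('person', *columns, sep='\t')
for name, counts in sorted(person_to_places.items()):
    print(name, *(counts.get(p, 0) for p in columns), sep='\t')
# person  loc1  loc2
# bobby   2     1
# sarah   0     3
# tim     3     0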
I would use dictionaries. Start by creating a dictionary of dictionaries for each name like so (see how to create it below):
data = {'bobby': {'loc1': 0, 'loc2': 0},
        'sarah': {'loc1': 0, 'loc2': 0},
        'tim':   {'loc1': 0, 'loc2': 0}}
If you don't know the contents of your places and person lists in advance, you can use sets to build the data dictionary in the above form (there are several ways; here's mine):
data = {name: {place: 0 for place in set(places)} for name in set(person)}
And then simply loop through your lists, incrementing relevant places:
for i in range(len(person)):
    data[person[i]][places[i]] += 1
When your data dictionary is ready, you can print it however you want.
Here's what the data dictionary looks like after running the above code on the lists you provided.
>>> for key in sorted(data): print(key, data[key])
...
bobby {'loc2': 1, 'loc1': 2}
sarah {'loc2': 3, 'loc1': 0}
tim {'loc2': 0, 'loc1': 3}
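Since pandas is already in play elsewhere in this thread, the same table can also come from a single pd.crosstab call; a sketch, assuming the two cleaned-up lists from the question:
import pandas as pd

person = ['bobby', 'tim', 'sarah', 'tim', 'sarah', 'bobby', 'tim', 'sarah', 'bobby']
places = ['loc1', 'loc1', 'loc2', 'loc1', 'loc2', 'loc2', 'loc1', 'loc2', 'loc1']

# count visits per (person, place) pair
table = pd.crosstab(pd.Series(person, name='person'), pd.Series(places, name='place'))
print(table)
# place   loc1  loc2
# person
# bobby      2     1
# sarah      0     3
# tim        3     0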
