Pandas: Convert dataframe to dict of lists - python

I have a dataframe like this:
col1, col2
A 0
A 1
B 2
C 3
I would like to get this:
{ A: [0,1], B: [2], C: [3] }
I tried:
df.set_index('col1')['col2'].to_dict()
but that is not quite correct. Because 'A' is repeated, I end up with only A: 1 (the 0 gets overwritten). How can I fix this?

You can use a dictionary comprehension on a groupby.
>>> {idx: group['col2'].tolist()
...  for idx, group in df.groupby('col1')}
{'A': [0, 1], 'B': [2], 'C': [3]}

Solution
df.groupby('col1')['col2'].apply(lambda x: x.tolist()).to_dict()
{'A': [0, 1], 'B': [2], 'C': [3]}
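As a self-contained sketch of why the original attempt loses values and how the groupby version avoids it: the construction of df below is an assumption, since the question only shows the frame's printed form.

```python
import pandas as pd

# Assumed reconstruction of the DataFrame shown in the question
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C'], 'col2': [0, 1, 2, 3]})

# A dict holds one value per key, so the second 'A' row replaces the first
overwritten = df.set_index('col1')['col2'].to_dict()

# Grouping first collects every col2 value for a key into one list
result = df.groupby('col1')['col2'].apply(list).to_dict()

print(overwritten)
print(result)
```

The only difference between the two calls is that the values are gathered per key before the conversion to a dict.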

Related

How to convert Pandas data frame to dict with values in a list

I have a huge Pandas data frame with the structure follows as an example below:
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'], 'col2': [1, 2, 5, 2, 4, 6]})
df
col1 col2
0 A 1
1 A 2
2 B 5
3 C 2
4 C 4
5 C 6
The task is to build a dictionary with elements in col1 as keys and corresponding elements in col2 as values. For the example above the output should be:
A -> [1, 2]
B -> [5]
C -> [2, 4, 6]
Although I wrote a solution as
from collections import defaultdict
dd = defaultdict(list)  # list, not set: sets have no append()
for row in df.itertuples():
    dd[row.col1].append(row.col2)
I wonder if somebody is aware of a more "Python-native" solution using built-in pandas functions.
Without apply, we can do it by looping over the groups in a dict comprehension:
{x : y.tolist() for x , y in df.col2.groupby(df.col1)}
{'A': [1, 2], 'B': [5], 'C': [2, 4, 6]}
Use GroupBy.apply with list to get a Series of lists, then Series.to_dict:
d = df.groupby('col1')['col2'].apply(list).to_dict()
print(d)
{'A': [1, 2], 'B': [5], 'C': [2, 4, 6]}
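For comparison, here is a minimal sketch that puts the plain-Python loop next to the groupby version. It assumes the example frame from the question; the names grouped, by_loop and by_groupby are illustrative only. Zipping the two columns avoids building a tuple for every row.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'],
                   'col2': [1, 2, 5, 2, 4, 6]})

# Walk the two columns in lockstep and append each value under its key
grouped = defaultdict(list)
for key, value in zip(df['col1'], df['col2']):
    grouped[key].append(value)
by_loop = dict(grouped)

# The pandas version from the answer above
by_groupby = df.groupby('col1')['col2'].apply(list).to_dict()

print(by_loop)
print(by_groupby)
```

Both produce the same mapping. One difference to be aware of: groupby sorts the keys by default, while the loop keeps them in first-seen order.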

How Do You Check if an Entry Is in Pandas DataFrame?

How do I check if an entry is in a pandas DataFrame? For example, say
import pandas as pd
df = pd.DataFrame({'A' : [1,2,3], 'B' : [4,5,6], 'C' : [7,8,9]})
How do I check whether a row with the values 1, 4 exists under columns A, B, regardless of what is in C?
You can pass a dictionary (to isin) with the values to search by column:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
result = df.isin({'A': [1], 'B': [4]})
print(result)
Output
A B
0 True True
1 False False
2 False False
Afterwards you can check row by row whether every column matched, using all:
result = df.isin({'A': [1], 'B': [4]}).all(axis=1)
print(result)
Output
0 True
1 False
2 False
dtype: bool
To use it in an if statement, restricted to the columns ['A', 'B'], add any, for example:
if df[['A', 'B']].isin({'A': [1], 'B': [4]}).all(axis=1).any():
    print('found')
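An alternative sketch uses plain boolean masks instead of isin. The helper row_exists is hypothetical (it is not a pandas function) and assumes each column is compared against a single value.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})


def row_exists(frame, **values):
    """Return True if some row matches every column=value pair given."""
    # Start with all rows selected, then narrow the mask one column at a time
    mask = pd.Series(True, index=frame.index)
    for column, value in values.items():
        mask &= frame[column] == value
    return bool(mask.any())


found = row_exists(df, A=1, B=4)      # row 0 has A == 1 and B == 4
not_found = row_exists(df, A=1, B=5)  # no row has that combination
print(found, not_found)
```

Columns that are not named in the call, such as C here, are simply ignored.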

How to shorten this nested list comprehension?

This question is a continuation of this one: Comprehension list and output <generator object.<locals>.<genexpr> at 0x000002C392688C78>
I was advised to create a new question.
I have a few dicts inside another dict. They get pretty big sometimes, and since I'm keeping them in a log, I would like to limit their size to 30 'items' (key:value).
So I tried something like this (in the example I limit the size to two):
main_dict = {
    'A': {
        'a1': [1, 2, 3],
        'a2': [4, 5, 6]
    },
    'B': {
        'b1': [0, 2, 4],
        'b2': [1, 3, 5]
    }
}
print([main_dict[x][i][:2] for x in main_dict.keys() for i in main_dict[x].keys()])
The output I get is this:
[[1, 2], [4, 5], [0, 2], [1, 3]]
What I expected was this:
['A':['a1':[1, 2],'a2':[4, 5]], 'B':['b1':[0, 2], 'b2':[1, 3]]]
Or something like that. It doesn't have to be exactly that, but I need to know what value belongs to what dict, which isn't clear in the output I end up getting.
To put it simply, all I want is to cut short the sub-dicts inside the dictionary. Elegantly, if possible.
This is a nice clean way to do it in one line, without altering the original dictionary:
print({key: {sub_k: ls[:2] for sub_k, ls in sub_dict.items()} for key, sub_dict in main_dict.items()})
Output:
{'A': {'a1': [1, 2], 'a2': [4, 5]}, 'B': {'b1': [0, 2], 'b2': [1, 3]}}
Your original attempt used a list comprehension [], but this case actually needs a dict comprehension {}.
Try this:
print({key: {sub_key: lst[:2] for sub_key, lst in sub_dict.items()}
       for key, sub_dict in main_dict.items()})
Note the use of {} (dict comprehension) instead of [] (list comprehension)
A more efficient approach is to use nested for loops to delete the tail end of the sub-lists in-place:
for d in main_dict.values():
    for k in d:
        del d[k][2:]
main_dict becomes:
{'A': {'a1': [1, 2], 'a2': [4, 5]}, 'B': {'b1': [0, 2], 'b2': [1, 3]}}
d = {
    'A': {
        'a1': [1, 2, 3],
        'a2': [4, 5, 6],
        'a3': [7, 8, 9]
    },
    'B': {
        'b1': [0, 2, 4],
        'b2': [1, 3, 5]
    }
}
If the dictionaries are only nested one-deep:
q = []
for k, v in d.items():
    keys, values = v.keys(), v.values()
    values = (value[:2] for value in values)
    q.append((k, tuple(zip(keys, values))))
I have rewritten my code based on the comments provided. See below.
my_dict = {}
for key, value in main_dict.items():
    sub_dict = {}
    for sub_key, sub_value in value.items():
        sub_dict[sub_key] = sub_value[:2]
    my_dict[key] = sub_dict
print(my_dict)
This gives you something that looks like this, saved to a separate variable.
{'A': {'a1': [1, 2], 'a2': [4, 5]}, 'B': {'b1': [0, 2], 'b2': [1, 3]}}
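The answers above assume exactly two levels of nesting. If the depth can vary, a recursive helper is one possible generalisation. This is a sketch only; truncate_lists is a hypothetical name and it assumes the structure contains nothing but dicts, lists and plain values.

```python
def truncate_lists(obj, limit):
    """Return a copy of obj with every list cut to at most `limit` items."""
    if isinstance(obj, dict):
        # Rebuild the dict, descending into each value
        return {key: truncate_lists(value, limit) for key, value in obj.items()}
    if isinstance(obj, list):
        # Slicing creates a new list, so the original is left untouched
        return obj[:limit]
    return obj


main_dict = {
    'A': {'a1': [1, 2, 3], 'a2': [4, 5, 6]},
    'B': {'b1': [0, 2, 4], 'b2': [1, 3, 5]},
}

short = truncate_lists(main_dict, 2)
print(short)
```

Like the dict comprehension, this leaves main_dict unchanged, which matters if the full data is still needed after logging.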

Convert a Pandas DataFrame with repetitive keys to a dictionary

I have a DataFrame with two columns. I want to convert this DataFrame to a Python dictionary. I want the elements of the first column to be keys and the elements of the other column in the same row to be values. However, entries in the first column are repeated:
Keys Values
1 1
1 6
1 9
2 3
3 1
3 4
The dict I want is - {1: [1,6,9], 2: [3], 3: [1,4]}
I am using the code mydict = df.set_index('Keys').T.to_dict('list'); however, the output keeps only one value per key: {1: [9], 2: [3], 3: [4]}
IIUC you can groupby on the 'Keys' column and then apply list and call to_dict:
In[32]:
df.groupby('Keys')['Values'].apply(list).to_dict()
Out[32]: {1: [1, 6, 9], 2: [3], 3: [1, 4]}
Breaking down the above into steps:
In[35]:
# groupby on the 'Keys' and apply list to group values into a list
df.groupby('Keys')['Values'].apply(list)
Out[35]:
Keys
1 [1, 6, 9]
2 [3]
3 [1, 4]
Name: Values, dtype: object
convert to a dict
In[37]:
# make a dict
df.groupby('Keys')['Values'].apply(list).to_dict()
Out[37]: {1: [1, 6, 9], 2: [3], 3: [1, 4]}
Thanks to @P.Tillman for the suggestion that to_frame was unnecessary, kudos to him.
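Try this (note that unique() drops repeated values within each group and returns NumPy arrays rather than lists):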
df.groupby('Keys')['Values'].unique().to_dict()
Output:
{1: array([1, 6, 9]), 2: array([3]), 3: array([1, 4])}
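The two answers differ once a key has a repeated value. The sketch below uses hypothetical data (not the question's) with a duplicated pair to show the difference, and converts the arrays from unique() back to plain lists.

```python
import pandas as pd

# Hypothetical data: key 1 carries the value 6 twice
df = pd.DataFrame({'Keys': [1, 1, 1, 2], 'Values': [6, 6, 9, 3]})

# apply(list) keeps every value, duplicates included
as_lists = df.groupby('Keys')['Values'].apply(list).to_dict()

# unique() drops the repeat; tolist() turns each NumPy array into a list
as_unique = {key: arr.tolist()
             for key, arr in df.groupby('Keys')['Values'].unique().items()}

print(as_lists)
print(as_unique)
```

Which one is appropriate depends on whether repeated values carry meaning in the data.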

How do I build a dict using list comprehension?

I have two lists.
series = [1,2,3,4,5]
categories = ['A', 'B', 'A', 'C','B']
I want to build a dict where the categories are the keys.
Thanks for your answers. I'm looking to produce:
{'A' : [1, 3], 'B' : [2, 5], 'C' : [4]}
because the keys can't exist twice.
You have to have a list of tuples. The tuples are key/value pairs. You don't need a comprehension in this case, just zip:
dict(zip(categories, series))
Produces {'A': 3, 'B': 5, 'C': 4} (as pointed out by comments)
Edit: After looking at the keys, note that you can't have duplicate keys in a dictionary. So without further clarifying what you want, I'm not sure what solution you're looking for.
Edit: To get what you want, it's probably easiest to just do a for loop with either setdefault or a defaultdict.
categoriesMap = {}
for k, v in zip(categories, series):
    categoriesMap.setdefault(k, []).append(v)
That should produce {'A': [1, 3], 'B': [2, 5], 'C': [4]}
from collections import defaultdict
series = [1,2,3,4,5]
categories = ['A', 'B', 'A', 'C','B']
result = defaultdict(list)
for key, val in zip(categories, series):
    result[key].append(val)
Rather than being clever (I have an itertools solution I'm fond of) there's nothing wrong with a good, old-fashioned for loop:
>>> from collections import defaultdict
>>>
>>> series = [1,2,3,4,5]
>>> categories = ['A', 'B', 'A', 'C','B']
>>>
>>> d = defaultdict(list)
>>> for c,s in zip(categories, series):
...     d[c].append(s)
...
>>> d
defaultdict(<type 'list'>, {'A': [1, 3], 'C': [4], 'B': [2, 5]})
This doesn't use a list comprehension because a list comprehension is the wrong way to do it. But since you seem to really want one for some reason, how about:
>>> dict([(c0, [s for (c,s) in zip(categories, series) if c == c0]) for c0 in categories])
{'A': [1, 3], 'C': [4], 'B': [2, 5]}
That has not one but two list comprehensions, and is very inefficient to boot.
In principle you can do as Kris suggested: dict(zip(categories, series)). Just be aware that this only works when there are no duplicates in categories (your sample code has some).
EDIT :
Now that you've clarified what you intended, this will work as expected:
from collections import defaultdict
d = defaultdict(list)
for k, v in zip(categories, series):
    d[k].append(v)
d = {k: [] for k in categories}
# In Python 3, map is lazy, so wrap it in list() to actually run the appends
list(map(lambda k, v: d[k].append(v), categories, series))
result:
d is now {'A': [1, 3], 'B': [2, 5], 'C': [4]}
or (equivalent) using setdefault (thanks Kris R.)
d = {}
list(map(lambda k, v: d.setdefault(k, []).append(v), categories, series))
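One of the answers above mentions an itertools solution without showing it. The following is one possible version, not necessarily what that author had in mind, written as a sketch with the question's two lists.

```python
from itertools import groupby
from operator import itemgetter

series = [1, 2, 3, 4, 5]
categories = ['A', 'B', 'A', 'C', 'B']

# groupby only merges adjacent items, so sort the pairs by category first.
# sorted() is stable, so values keep their original order within a category.
pairs = sorted(zip(categories, series), key=itemgetter(0))

grouped = {key: [value for _, value in group]
           for key, group in groupby(pairs, key=itemgetter(0))}
print(grouped)
```

The sort makes this O(n log n), so the defaultdict loop remains the simpler choice unless the data is already ordered by key.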
