Convert a Pandas DataFrame with repetitive keys to a dictionary - python

I have a DataFrame with two columns that I want to convert to a Python dictionary. The elements of the first column should be the keys, and the elements of the other column in the same row should be the values. However, entries in the first column are repeated:
Keys Values
1 1
1 6
1 9
2 3
3 1
3 4
The dict I want is: {1: [1,6,9], 2: [3], 3: [1,4]}
I am using the code mydict=df.set_index('Keys').T.to_dict('list'), but the output keeps only the last value for each unique key: {1: [9], 2: [3], 3: [4]}

IIUC you can groupby on the 'Keys' column and then apply list and call to_dict:
In[32]:
df.groupby('Keys')['Values'].apply(list).to_dict()
Out[32]: {1: [1, 6, 9], 2: [3], 3: [1, 4]}
Breaking down the above into steps:
In[35]:
# groupby on the 'Keys' and apply list to group values into a list
df.groupby('Keys')['Values'].apply(list)
Out[35]:
Keys
1 [1, 6, 9]
2 [3]
3 [1, 4]
Name: Values, dtype: object
Then convert the result to a dict:
In[37]:
# make a dict
df.groupby('Keys')['Values'].apply(list).to_dict()
Out[37]: {1: [1, 6, 9], 2: [3], 3: [1, 4]}
Thanks to @P.Tillman for pointing out that to_frame was unnecessary.
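For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question; the variable name result is only for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({"Keys": [1, 1, 1, 2, 3, 3],
                   "Values": [1, 6, 9, 3, 1, 4]})

# Step 1: group the rows by key and collect each group's values into a list
grouped = df.groupby("Keys")["Values"].apply(list)

# Step 2: turn the resulting Series (index = keys) into a plain dict
result = grouped.to_dict()
print(result)
```

Within each group the values keep the order in which the rows appear in the DataFrame.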

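Try this (note that unique() drops repeated values within each key and returns NumPy arrays rather than lists):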
df.groupby('Keys')['Values'].unique().to_dict()
Output:
{1: array([1, 6, 9]), 2: array([3]), 3: array([1, 4])}
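The two approaches only agree when no value repeats within a key. A small sketch of the difference, using hypothetical data (not from the question) in which the value 6 appears twice under key 1:

```python
import pandas as pd

# Hypothetical data: the value 6 is repeated under key 1
df = pd.DataFrame({"Keys": [1, 1, 1, 2],
                   "Values": [6, 6, 9, 3]})

# apply(list) keeps every row, including repeats
as_lists = df.groupby("Keys")["Values"].apply(list).to_dict()

# unique() keeps the first occurrence of each value and returns NumPy arrays;
# tolist() converts each array back to a plain Python list
uniques = df.groupby("Keys")["Values"].unique()
as_unique_lists = {key: arr.tolist() for key, arr in uniques.items()}

print(as_lists)
print(as_unique_lists)
```

Which one is appropriate depends on whether repeated values carry meaning in your data.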

Related

Pandas dictionary creation optimization [duplicate]

I have an excel sheet that looks like so:
Column1 Column2 Column3
0 23 1
1 5 2
1 2 3
1 19 5
2 56 1
2 22 2
3 2 4
3 14 5
4 59 1
5 44 1
5 1 2
5 87 3
And I'm looking to extract that data, group it by column 1, and add it to a dictionary so it appears like this:
{0: [1],
1: [2,3,5],
2: [1,2],
3: [4,5],
4: [1],
5: [1,2,3]}
This is my code so far
excel = pandas.read_excel(r"e:\test_data.xlsx", sheetname='mySheet', parse_cols='A,C')
myTable = excel.groupby("Column1").groups
print myTable
However, my output looks like this:
{0: [0L], 1: [1L, 2L, 3L], 2: [4L, 5L], 3: [6L, 7L], 4: [8L], 5: [9L, 10L, 11L]}
Thanks!
You could groupby on Column1 and then take Column3 to apply(list) and call to_dict?
In [81]: df.groupby('Column1')['Column3'].apply(list).to_dict()
Out[81]: {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}
Or, do
In [433]: {k: list(v) for k, v in df.groupby('Column1')['Column3']}
Out[433]: {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}
According to the docs, the GroupBy.groups:
is a dict whose keys are the computed unique groups and corresponding
values being the axis labels belonging to each group.
If you want the values themselves, you can groupby 'Column1' and then call apply and pass the list method to apply to each group.
You can then convert it to a dict as desired:
In [5]:
dict(df.groupby('Column1')['Column3'].apply(list))
Out[5]:
{0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}
(Note: the L that follows the numbers is how Python 2 prints long integers.)
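To make the difference between row labels and values concrete, here is a short sketch. It builds the DataFrame in memory from the table in the question instead of reading the Excel file, so the variable names are illustrative only.

```python
import pandas as pd

# Data copied from the question's sheet (Column2 omitted, as in the question's column selection)
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# .groups maps each key to the row labels of the group, not to the values
labels = {key: list(idx) for key, idx in df.groupby("Column1").groups.items()}

# Selecting Column3 and applying list gives the values themselves
values = df.groupby("Column1")["Column3"].apply(list).to_dict()

print(labels)
print(values)
```

The labels dict reproduces the unexpected output from the question (row positions 0 to 11), while values matches the desired result.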

How to perform group by in python?

I need to read data from an Excel file and then perform a group by on it.
The structure of the data is as follows:
n c
1 2
1 3
1 4
2 3
2 4
2 5
3 1
3 2
3 3
I need to read these data and then generate a list of dictionaries based on the c value.
desired output would be a list of dictionaries with c as keys and values of n as values like this:
[{1:[3]}, {2:[1,3]}, {3:[1,2,3]}, {4:[1,2]}, {5:[2]}]
I use this function to read data and it works fine:
data = pandas.read_excel("pathtofile/filename.xlsx", header=None)
You can try this way:
d1 = df.groupby('c')['n'].agg(list).to_dict()
res = [{k:v} for k,v in d1.items()]
print(res)
Output:
[{1: [3]}, {2: [1, 3]}, {3: [1, 2, 3]}, {4: [1, 2]}, {5: [2]}]
Sample output dict
d=df.groupby('c').n.agg(list).to_dict()
{1: [3], 2: [1, 3], 3: [1, 2, 3], 4: [1, 2], 5: [2]}
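A runnable sketch of the same idea, assuming the sheet has already been loaded into a DataFrame whose columns are named n and c (here it is built in memory from the table in the question):

```python
import pandas as pd

# Data copied from the question; the column names n and c are assumed to be
# present on the DataFrame after loading
df = pd.DataFrame({"n": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "c": [2, 3, 4, 3, 4, 5, 1, 2, 3]})

# One dict mapping each c value to the list of n values that share it
by_c = df.groupby("c")["n"].agg(list).to_dict()

# Split that dict into a list of single-entry dicts, as requested
res = [{key: vals} for key, vals in by_c.items()]
print(res)
```

The single dict by_c is usually easier to work with; the list of one-entry dicts is only needed if something downstream expects that shape.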

Extracting from lists of pandas series to another based on indexes

I have a pandas DataFrame with two Series, each of which contains 2D arrays.
a is the first Series; its sub-arrays have different lengths:
a:
0 [[1,2,3,4,5,6,7,7],[1,2,3,4,5],[5,9,3,2]]
1 [[1,2,3],[6,7],[8,9,10]]
b is the second Series; each of its sub-arrays has only one element:
b:
0 [[0],[2],[3]]
1 [ [1],[0],[1]]
I want to extract elements of the a series based on indexes given in b.
The result of the above example should be like:
0 [1,3,2]
1 [2, 6, 9]
Can anyone please help? Thanks a lot
Setup
a = pd.Series({0: [[1, 2, 3, 4, 5, 6, 7, 7], [1, 2, 3, 4, 5], [5, 9, 3, 2]],
1: [[1, 2, 3], [6, 7], [8, 9, 10]]})
b = pd.Series({0: [[0], [2], [3]], 1: [[1], [0], [1]]})
Difficult to make this efficient since you have lists of varying sizes, but here's a solution using a list comprehension and zip:
out = pd.Series([[x[y] for x, [y] in zip(i, j)] for i, j in zip(a, b)])
0 [1, 3, 2]
1 [2, 6, 9]
dtype: object
You can use apply to index a with b:
df.apply(lambda row: [row.a[i][row.b[i][0]] for i in range(len(row.a))], axis=1)
0 [1, 3, 2]
1 [2, 6, 9]
dtype: object
Data:
data = {"a":[[[1,2,3,4,5,6,7,7],[1,2,3,4,5],[5,9,3,2]],
[[1,2,3],[6,7],[8,9,10]]],
"b": [[[0],[2],[3]],
[[1],[0],[1]]]}
df = pd.DataFrame(data)

Python : extracting only the first element on a dictionary of list using functions

In a previous post I asked how to construct sequences from a dataframe using dictionaries in Python:
Construct sequences from a dataframe using dictionaries in Python
I want to change the function proposed in the selected answer so that the lists contain only the items, without the dates.
Having:
{1: [1, 2], 2: [3, 1], 4: [5, 3, 1]}
Instead of :
{1: [[1, 'date_1'], [2, 'date_2']],
2: [[3, 'date_1'], [1, 'date_3']],
4: [[5, 'date_2'], [3, 'date_3'], [1, 'date_5']]}
Changing the function :
fnc = lambda x: x.sort_values('date').values.tolist()
df.set_index('users').groupby(level=0).apply(fnc).to_dict()
The lambda is called on the whole sub-DataFrame of each group, so you can select just the items column from the sorted result and keep only the column of interest:
In [249]:
fnc = lambda x: x.sort_values('date')['items'].values.tolist()
df.set_index('users').groupby(level=0).apply(fnc).to_dict()
Out[249]:
{1: [1, 2], 2: [3, 1], 4: [5, 3, 1]}
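The question does not show the input DataFrame, so the sketch below uses hypothetical data reconstructed from the expected output, with the rows deliberately shuffled. It also swaps in a slightly different technique from the answer above: sort once by date, then group and collect the items column. This relies on groupby preserving the row order within each group.

```python
import pandas as pd

# Hypothetical input, reconstructed from the expected output in the question;
# rows are shuffled so that the sort is actually needed
df = pd.DataFrame({
    "users": [4, 1, 2, 4, 1, 2, 4],
    "items": [1, 2, 1, 5, 1, 3, 3],
    "date": ["date_5", "date_2", "date_3", "date_2", "date_1", "date_1", "date_3"],
})

# Sort by date first, then collect only the items for each user
sequences = (
    df.sort_values("date")
      .groupby("users")["items"]
      .apply(list)
      .to_dict()
)
print(sequences)
```

The date strings here sort correctly as plain text only because they share the same prefix and single-digit suffix; real dates should be converted with pd.to_datetime before sorting.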

Pandas: Convert dataframe to dict of lists

I have a dataframe like this:
col1, col2
A 0
A 1
B 2
C 3
I would like to get this:
{ A: [0,1], B: [2], C: [3] }
I tried:
df.set_index('col1')['col2'].to_dict()
but that is not quite correct. Because 'A' is repeated, I end up with A: 1 only (0 gets overwritten). How can I fix this?
You can use a dictionary comprehension on a groupby.
>>> {idx: group['col2'].tolist()
for idx, group in df.groupby('col1')}
{'A': [0, 1], 'B': [2], 'C': [3]}
Solution
df.groupby('col1')['col2'].apply(lambda x: x.tolist()).to_dict()
{'A': [0, 1], 'B': [2], 'C': [3]}
