How to perform group by in python? - python

I need to read data from an Excel file and then perform a group-by on it.
The data is structured as follows:
n c
1 2
1 3
1 4
2 3
2 4
2 5
3 1
3 2
3 3
I need to read this data and then generate a list of dictionaries based on the c value.
The desired output is a list of dictionaries with each c value as a key and the corresponding n values as its list of values, like this:
[{1:[3]}, {2:[1,3]}, {3:[1,2,3]}, {4:[1,2]}, {5:[2]}]
I use this call to read the data, and it works fine:
data = pandas.read_excel("pathtofile/filename.xlsx", header=None)

You can try it this way:
d1 = df.groupby('c')['n'].agg(list).to_dict()
res = [{k:v} for k,v in d1.items()]
print(res)
Output:
[{1: [3]}, {2: [1, 3]}, {3: [1, 2, 3]}, {4: [1, 2]}, {5: [2]}]

Sample output dict
d=df.groupby('c').n.agg(list).to_dict()
{1: [3], 2: [1, 3], 3: [1, 2, 3], 4: [1, 2], 5: [2]}
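
For reference, here is a minimal self-contained sketch of the approach in the answers above. It builds the frame inline instead of reading the spreadsheet, so the literal DataFrame is only a stand-in for the Excel data, and it assumes the columns are actually named n and c (with header=None, pandas would label them 0 and 1 instead).

```python
import pandas as pd

# Stand-in for the spreadsheet contents (assumed column names: n and c).
df = pd.DataFrame({
    "n": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "c": [2, 3, 4, 3, 4, 5, 1, 2, 3],
})

# Collect the n values of each c group into a list, keyed by c.
grouped = df.groupby("c")["n"].agg(list).to_dict()

# Split the single dict into a list of one-entry dicts.
res = [{k: v} for k, v in grouped.items()]
print(res)
```

groupby sorts the group keys by default, which is why the result comes out ordered by c.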

Related

Pandas dictionary creation optimization [duplicate]

I have an excel sheet that looks like so:
Column1 Column2 Column3
0 23 1
1 5 2
1 2 3
1 19 5
2 56 1
2 22 2
3 2 4
3 14 5
4 59 1
5 44 1
5 1 2
5 87 3
And I'm looking to extract that data, group it by column 1, and add it to a dictionary so it appears like this:
{0: [1],
1: [2,3,5],
2: [1,2],
3: [4,5],
4: [1],
5: [1,2,3]}
This is my code so far
excel = pandas.read_excel(r"e:\test_data.xlsx", sheetname='mySheet', parse_cols='A,C')
myTable = excel.groupby("Column1").groups
print myTable
However, my output looks like this:
{0: [0L], 1: [1L, 2L, 3L], 2: [4L, 5L], 3: [6L, 7L], 4: [8L], 5: [9L, 10L, 11L]}
Thanks!
You could group by Column1, select Column3, apply list to each group, and then call to_dict:
In [81]: df.groupby('Column1')['Column3'].apply(list).to_dict()
Out[81]: {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}
Or, do
In [433]: {k: list(v) for k, v in df.groupby('Column1')['Column3']}
Out[433]: {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}
According to the docs, GroupBy.groups:
is a dict whose keys are the computed unique groups and corresponding
values being the axis labels belonging to each group.
That is why the output shows row labels rather than the values of Column3. If you want the values themselves, group by 'Column1', select 'Column3', and pass list to apply so that each group's values are collected into a list.
You can then convert it to a dict as desired:
In [5]:
dict(df.groupby('Column1')['Column3'].apply(list))
Out[5]:
{0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}
(Note: have a look at this SO question for why the numbers are followed by L)
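
To make the difference between .groups and apply(list) concrete, here is a small sketch that rebuilds the question's table inline. The literal DataFrame is an assumption standing in for the Excel sheet, and Column2 is left out because it plays no role in the grouping.

```python
import pandas as pd

# Stand-in for the sheet from the question (Column2 omitted).
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# .groups maps each key to the row labels of its group, not to column values.
row_labels = {k: list(v) for k, v in df.groupby("Column1").groups.items()}

# Selecting Column3 and applying list yields the values themselves.
values = df.groupby("Column1")["Column3"].apply(list).to_dict()

print(row_labels)
print(values)
```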

Extracting values from a dictionary for a respective key

I have a dictionary with the following pattern:
dict_one = {1: [2, 3, 4], 2: [3, 4, 4, 5],3 : [2, 5, 6, 6]}
I need to flatten it so that each row pairs a key with exactly one of its values, and then create a data frame from the result.
The output would be similar to:
1 2
1 3
1 4
2 3
2 4
2 4
2 5
3 2
3 5
3 6
3 6
Please help me with this.
dict_one = {1: [2, 3, 4], 2: [3, 4, 4, 5],3 : [2, 5, 6, 6]}
df_column = ['key','value']
for key in dict_one.keys():
    value = dict_one.values()
    row = (key,value)
extended_ground_truth = pd.DataFrame.from_dict(row, orient='index', columns=df_column)
extended_ground_truth.to_csv("extended_ground_truth.csv", index=None)
You can normalize the data as you iterate over the dictionary, emitting one (key, value) pair for every item in each list:
df = pd.DataFrame(((key, v) for key, values in dict_one.items() for v in values),
                  columns=["key", "value"])
You can wrap the values in lists, then use DataFrame.from_dict and finally use explode to expand the lists:
pd.DataFrame.from_dict({k: [v] for k, v in dict_one.items()}, orient='index').explode(0)
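
As a variation on the explode idea, the sketch below goes through a Series so that the result has named key and value columns. The column names are taken from the question's df_column list; the rest is an illustrative assumption, not the answerers' code.

```python
import pandas as pd

dict_one = {1: [2, 3, 4], 2: [3, 4, 4, 5], 3: [2, 5, 6, 6]}

# One Series element per key, each holding that key's list.
lists = pd.Series(dict_one, name="value").rename_axis("key")

# explode repeats the key once per list item; reset_index turns it into a column.
flat = lists.explode().reset_index()
print(flat)
```

After explode the value column has object dtype; calling flat["value"].astype(int) converts it back if integers are needed.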

How to convert Pandas data frame to dict with values in a list

I have a huge Pandas data frame structured like the example below:
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'], 'col2': [1, 2, 5, 2, 4, 6]})
df
col1 col2
0 A 1
1 A 2
2 B 5
3 C 2
4 C 4
5 C 6
The task is to build a dictionary with elements in col1 as keys and corresponding elements in col2 as values. For the example above the output should be:
A -> [1, 2]
B -> [5]
C -> [2, 4, 6]
Although I wrote a solution as
from collections import defaultdict
dd = defaultdict(list)
for row in df.itertuples():
    dd[row.col1].append(row.col2)
I wonder if somebody is aware of a more "Python-native" solution, using built-in pandas functions.
Without apply, you can use a dict comprehension over the groups:
{x : y.tolist() for x , y in df.col2.groupby(df.col1)}
{'A': [1, 2], 'B': [5], 'C': [2, 4, 6]}
Use GroupBy.apply with list for Series of lists and then Series.to_dict:
d = df.groupby('col1')['col2'].apply(list).to_dict()
print (d)
{'A': [1, 2], 'B': [5], 'C': [2, 4, 6]}
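
For comparison, here is a short sketch that runs the plain-Python loop next to the groupby version on the sample frame. Iterating with zip over the two columns is a substitution for itertuples made here for brevity; it is not part of the original posts.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({"col1": ["A", "A", "B", "C", "C", "C"],
                   "col2": [1, 2, 5, 2, 4, 6]})

# Plain-Python version: append each col2 value to the list for its col1 key.
dd = defaultdict(list)
for key, value in zip(df["col1"], df["col2"]):
    dd[key].append(value)
loop_result = dict(dd)

# pandas version from the answer above.
groupby_result = df.groupby("col1")["col2"].apply(list).to_dict()

print(loop_result)
print(groupby_result)
```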

Convert a Pandas DataFrame with repetitive keys to a dictionary

I have a DataFrame with two columns that I want to convert to a Python dictionary. The elements of the first column should be the keys, and the elements of the other column in the same rows should be the values. However, entries in the first column are repeated:
Keys Values
1 1
1 6
1 9
2 3
3 1
3 4
The dict I want is {1: [1,6,9], 2: [3], 3: [1,4]}
I am using the code mydict=df.set_index('Keys').T.to_dict('list'), but the output keeps only one value per unique key: {1: [9], 2: [3], 3: [4]}
IIUC you can groupby on the 'Keys' column and then apply list and call to_dict:
In[32]:
df.groupby('Keys')['Values'].apply(list).to_dict()
Out[32]: {1: [1, 6, 9], 2: [3], 3: [1, 4]}
Breaking down the above into steps:
In[35]:
# groupby on the 'Keys' and apply list to group values into a list
df.groupby('Keys')['Values'].apply(list)
Out[35]:
Keys
1 [1, 6, 9]
2 [3]
3 [1, 4]
Name: Values, dtype: object
Then convert to a dict:
In[37]:
# make a dict
df.groupby('Keys')['Values'].apply(list).to_dict()
Out[37]: {1: [1, 6, 9], 2: [3], 3: [1, 4]}
Thanks to @P.Tillman for the suggestion that to_frame was unnecessary, kudos to him
Try this:
df.groupby('Keys')['Values'].unique().to_dict()
Output:
{1: array([1, 6, 9]), 2: array([3]), 3: array([1, 4])}
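
The two answers differ in a way the sample data does not reveal: unique() drops repeated values within a group and returns NumPy arrays, while apply(list) keeps every row. The sketch below uses a hypothetical variant of the table with a repeated value (the second 6) to show this.

```python
import pandas as pd

# Hypothetical data: key 1 holds the value 6 twice.
df = pd.DataFrame({"Keys": [1, 1, 1, 2, 3, 3],
                   "Values": [1, 6, 6, 3, 1, 4]})

# apply(list) keeps duplicates.
as_lists = df.groupby("Keys")["Values"].apply(list).to_dict()

# unique() removes them; tolist() converts each array to a plain list.
as_unique = {k: v.tolist()
             for k, v in df.groupby("Keys")["Values"].unique().items()}

print(as_lists)
print(as_unique)
```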

merge python dictionary keys based on common elements

Let's say we have a dictionary like this:
{0: [2, 8], 1: [8, 4], 3: [5]}
Then we encounter the key-value pair 2, 8. Since the values 2 and 8 have already appeared under existing keys, I need to merge the first two keys and create a new dictionary like the following:
{0: [2, 8, 4], 3: [5]}
I understand that it's possible to do this with a lot of looping and deleting, but I'm really looking for a more Pythonic way.
Thanks in advance.
Your coworkers will hate you later, but here:
>>> import itertools
>>> d = {0: [2, 8], 1: [8, 4], 3: [5]}
>>> x = ((a,b) for a,b in itertools.combinations(d,2) if a in d and b in d and set(d[a]).intersection(d[b]))
>>> for a,b in x:d[min(a,b)].extend([i for i in d[max(a,b)] if i not in d[min(a,b)]]) or d.pop(max(a,b))
[8, 4]
>>> d
{0: [2, 8, 4], 3: [5]}
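d = {0: [2, 8],
     1: [8, 4],
     3: [5]}
# map each value to a key whose list contains it (the last such key wins)
revmap = {}
for k, vals in d.items():
    for v in vals:
        revmap[v] = k
k, v = 2, 8
# add the missing items from the list holding v to the list holding k, then drop the former
d[revmap[k]].extend([i for i in d[revmap[v]] if i not in d[revmap[k]]])
d.pop(revmap[v])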
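
A more readable take on the same idea might look like the sketch below. The function name merge_keys_sharing is hypothetical, and the rule it implements (merge every key whose list contains either value into the smallest such key) is one interpretation of the question, not a confirmed specification. It assumes the keys are sortable and it mutates the dictionary in place.

```python
def merge_keys_sharing(d, a, b):
    """Merge every key whose list contains a or b into the smallest such key."""
    hits = sorted(k for k, vals in d.items() if a in vals or b in vals)
    if len(hits) < 2:
        return d  # nothing to merge
    target, *others = hits
    for k in others:
        # Move the items over, skipping those the target already has.
        for item in d.pop(k):
            if item not in d[target]:
                d[target].append(item)
    return d


d = {0: [2, 8], 1: [8, 4], 3: [5]}
merge_keys_sharing(d, 2, 8)
print(d)
```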
