Create dictionary from 2 columns of Dataframe


I have a dataframe:
df = pd.DataFrame({
'ID': ['1', '4', '4', '3', '3', '3'],
'club': ['arts', 'math', 'theatre', 'poetry', 'dance', 'cricket']
})
Note: Both columns of the DataFrame can have repeated values.
I want to create a dictionary that maps every ID to its unique club names.
It should look something like this:
{
{'1':'arts'}, {'4':'math','theatre'}, {'3':'poetry','dance','cricket'}
}
Kindly help me with this

Try groupby() and then to_dict():
grouped = df.groupby("ID")["club"].apply(set)
print(grouped)
> ID
1 {arts}
3 {cricket, poetry, dance}
4 {math, theatre}
grouped_dict = grouped.to_dict()
print(grouped_dict)
> {'1': {'arts'}, '3': {'cricket', 'poetry', 'dance'}, '4': {'math', 'theatre'}}
Edit:
Changed to .apply(set) to get sets.

You can use a defaultdict:
from collections import defaultdict
d = defaultdict(set)
for k,v in zip(df['ID'], df['club']):
    d[k].add(v)
dict(d)
output:
{'1': {'arts'}, '4': {'math', 'theatre'}, '3': {'cricket', 'dance', 'poetry'}}
or for a format similar to the provided output:
[{k:v} for k,v in d.items()]
output:
[{'1': {'arts'}},
{'4': {'math', 'theatre'}},
{'3': {'cricket', 'dance', 'poetry'}}]
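Sets have no defined order, so the clubs may print in a different order than they appear in the frame. If order matters, here is a minimal sketch (an alternative, not taken from the answers above) that keeps each ID's clubs as a list in first-seen order, using a dict's keys as an insertion-ordered set. The plain lists ids and clubs are stand-ins for df['ID'] and df['club'].

```python
from collections import defaultdict

# Stand-ins for df['ID'] and df['club'] (same data as the question)
ids = ['1', '4', '4', '3', '3', '3']
clubs = ['arts', 'math', 'theatre', 'poetry', 'dance', 'cricket']

# Dict keys act as an insertion-ordered set: a repeated club is ignored,
# and the first occurrence keeps its position
ordered = defaultdict(dict)
for k, v in zip(ids, clubs):
    ordered[k][v] = None

result = {k: list(v) for k, v in ordered.items()}
print(result)
# {'1': ['arts'], '4': ['math', 'theatre'], '3': ['poetry', 'dance', 'cricket']}
```

With the DataFrame itself, pass df['ID'] and df['club'] to zip() exactly as in the defaultdict answer.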

Related

How to get individual fields in list of dictionary

I have a list of dictionaries called dictList that has data like so:
[{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}].
I am trying to create a new dictionary that uses the id as the key and total as the value.
So have tried this:
keys = [d['id'] for d in dictList]
values = [d['total'] for d in dictList]
new_dict[str(keys)]= values
However the output is: {"['5', '5']": [39, 43]}
I am not sure what is going on. I am just trying to get each id with its respective total, like 5, 39 and 5, 43, into new_dict.
EDIT:
Please note that dictList contains all the products with ID 5. There are other fields, but I didn't include them.

One approach:
data = [{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}]
res = {}
for d in data:
    key = d["id"]
    if key not in res:
        res[key] = 0
    res[key] += int(d["total"])
print(res)
Output
{'5': 82}
Alternative using collections.defaultdict:
from collections import defaultdict
data = [{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}]
res = defaultdict(int)
for d in data:
    key = d["id"]
    res[key] += int(d["total"])
print(res)
Output
defaultdict(<class 'int'>, {'5': 82})
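Both versions above add the totals together. If you instead want to keep every total for an id (the "5, 39 and 5, 43" in the question), keep in mind that a dict cannot hold the key '5' twice. A minimal sketch that collects the totals into a list per id instead, converting them to int as the answers do:

```python
from collections import defaultdict

data = [{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}]

# Each id maps to a list, so repeated ids keep all of their totals
totals_by_id = defaultdict(list)
for d in data:
    totals_by_id[d["id"]].append(int(d["total"]))

print(dict(totals_by_id))
# {'5': [39, 43]}
```

If you later need the sum, sum(totals_by_id['5']) gives 82.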

Use sorted and itertools.groupby to group by the 'id' key of each list element:
import itertools
dictList = [{'id': '5', 'total': '39'}, {'id': '10', 'total': '10'},
{'id': '5', 'total': '43'}, {'id': '10', 'total': '22'}]
groups = itertools.groupby(sorted(dictList, key=lambda item: item['id']),
                           key=lambda item: item['id'])
Next, take the sum of each group:
product_totals = {
key: sum(int(item['total']) for item in grp)
for key, grp in groups
}
Which gives:
{'10': 32, '5': 82}
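itertools.groupby only groups consecutive items that share a key, which is why the list is sorted first. A short illustration on the same dictList (get_id and the *_keys names are just for this demo):

```python
import itertools

dictList = [{'id': '5', 'total': '39'}, {'id': '10', 'total': '10'},
            {'id': '5', 'total': '43'}, {'id': '10', 'total': '22'}]

def get_id(item):
    return item['id']

# Without sorting, every run of equal ids becomes its own group
unsorted_keys = [key for key, _ in itertools.groupby(dictList, key=get_id)]
print(unsorted_keys)  # ['5', '10', '5', '10']

# After sorting by the same key, each id forms exactly one group
sorted_keys = [key for key, _ in itertools.groupby(sorted(dictList, key=get_id), key=get_id)]
print(sorted_keys)  # ['10', '5']
```

The ids are strings, so they sort lexicographically and '10' comes before '5'.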
If you have lots of such entries, you could consider using pandas to create a DataFrame. Pandas has vectorized methods that help you crunch numbers faster. The idea behind finding the sum of totals is the same, except that here we don't need to sort, because DataFrame.groupby sorts the group keys for us.
>>> import pandas as pd
>>> df = pd.DataFrame(dictList)
>>> df['total'] = df['total'].astype(int)
>>> df
id total
0 5 39
1 10 10
2 5 43
3 10 22
>>> df.groupby('id').total.sum()
id
10 32
5 82
Name: total, dtype: int32
>>> df.groupby('id').total.sum().to_dict()
{'10': 32, '5': 82}

Although I'm not sure what you are trying to do, try this:
new_dict = {}
for d in dictList:
    if d["id"] in new_dict:
        # id already seen: add to its running total
        new_dict[d["id"]] += int(d["total"])
    else:
        # first time this id appears
        new_dict[d["id"]] = int(d["total"])
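The if/else can be avoided with dict.get(), which returns a default (here 0) for ids that haven't been seen yet. A minimal sketch on the question's data:

```python
dictList = [{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}]

new_dict = {}
for d in dictList:
    # get() returns 0 the first time an id appears, instead of raising KeyError
    new_dict[d["id"]] = new_dict.get(d["id"], 0) + int(d["total"])

print(new_dict)  # {'5': 82}
```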

dict from a dict of list

I have a Python dictionary with following format:
d1 = {'Name':['ABC'], 'Number':['123'], 'Element 1':['1', '2', '3'],
'Element2':['1','2','3']}
Expected output:
{'Name': 'ABC', 'Number': '123',
'Elements': [{'Element 1': '1', 'Element2': '1'},
{'Element 1': '2', 'Element2': '2'},
{'Element 1': '3', 'Element2': '3'}]}
I have tried the following:
[{k: v[i] for k, v in d1.items() if i < len(v)}
for i in range(max([len(l) for l in d1.values()]))]
but getting this result:
[{'Name': 'ABC', 'Number': '123', 'Element 1': '1', 'Element2': '1'},
{'Element 1': '2', 'Element2': '2'},
{'Element 1': '3', 'Element2': '3'}]
How can I go from here?

I strongly recommend not trying to do everything in one line. It's not always more efficient, and almost always less readable if you have any branching logic or nested loops.
Given your dict, we can pop() the Name and Number keys into our new dict, taking the single item out of each list:
output = dict()
d1 = {'Name':['ABC'], 'Number':['123'], 'Element 1':['1', '2', '3'], 'Element2':['1','2','3']}
output["Name"] = d1.pop("Name")[0]
output["Number"] = d1.pop("Number")[0]
print(output)
# prints:
# {'Name': 'ABC', 'Number': '123'}
print(d1)
# prints:
# {'Element 1': ['1', '2', '3'], 'Element2': ['1', '2', '3']}
Then, we zip all remaining values in the dictionary, and add them to a new list:
mylist = []
keys = d1.keys()
for vals in zip(*d1.values()):
    temp_obj = dict(zip(keys, vals))
    mylist.append(temp_obj)
print(mylist)
# prints:
# [{'Element 1': '1', 'Element2': '1'},
#  {'Element 1': '2', 'Element2': '2'},
#  {'Element 1': '3', 'Element2': '3'}]
And finally, assign that to output["Elements"]:
output["Elements"] = mylist
print(output)
# prints:
# {'Name': 'ABC', 'Number': '123', 'Elements': [{'Element 1': '1', 'Element2': '1'}, {'Element 1': '2', 'Element2': '2'}, {'Element 1': '3', 'Element2': '3'}]}
Since you don't want to hardcode the first two keys, you can pop every key that isn't an element column instead (loop over a copy of the keys, because popping changes the dict):
for k in list(d1):
    if "element" not in k.lower():
        output[k] = d1.pop(k)[0]
Or as a dict comprehension; this leaves the non-element keys in d1, so skip them again when zipping:
output = {k: v[0] for k, v in d1.items() if "element" not in k.lower()}
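Putting those steps together, here is a minimal sketch as a reusable function. The name restructure and the rule "keys containing 'element' are per-position columns, every other key holds a one-item list" are assumptions based on the example data:

```python
def restructure(d1):
    """Unwrap single-value keys and zip the 'Element' columns by position."""
    # Keys without "element" hold one-item lists: unwrap them
    output = {k: v[0] for k, v in d1.items() if "element" not in k.lower()}
    # The remaining keys are columns of equal length
    element_keys = [k for k in d1 if "element" in k.lower()]
    columns = [d1[k] for k in element_keys]
    # zip(*columns) yields one tuple per position: ('1', '1'), ('2', '2'), ...
    output["Elements"] = [dict(zip(element_keys, row)) for row in zip(*columns)]
    return output


d1 = {'Name': ['ABC'], 'Number': ['123'], 'Element 1': ['1', '2', '3'],
      'Element2': ['1', '2', '3']}
print(restructure(d1))
```

Unlike the pop() version, this builds a new dict and leaves d1 unchanged.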

Use a list of tuples to create the elements list of dictionaries, and a small Convert helper to build each dictionary item from its tuples.
#https://www.geeksforgeeks.org/python-convert-list-tuples-dictionary/
d1 = {'Name':['ABC'], 'Number':['123'], 'Element 1':['1', '2', '3'],
      'Element2':['1','2','3']}
def Convert(tup, di):
    # copy each (key, value) pair from the list of tuples into di
    for a, b in tup:
        di[a] = b
    return di
result = {}
listElements = []
for key, value in d1.items():
    if isinstance(value, list) and len(value) > 1:
        # multi-item lists become (key, item) pairs
        for item in value:
            listElements.append((key, item))
    elif isinstance(value, list) and len(value) == 1:
        # single-item lists are unwrapped
        result[key] = value[0]
    else:
        result[key] = value
result['Elements'] = [Convert([(x, y)], {}) for x, y in listElements]
print(result)
output (each element value lands in its own one-key dict, rather than being grouped by position as in the expected output):
{'Name': 'ABC', 'Number': '123', 'Elements': [{'Element 1': '1'}, {'Element 1': '2'}, {'Element 1': '3'}, {'Element2': '1'}, {'Element2': '2'}, {'Element2': '3'}]}

I'm going to explain step by step:
We build the new_d1 variable, which is the dictionary you expect as output; it starts as {'Name': 'ABC', 'Number': '123'}. To get that, we use a dict comprehension over the keys that don't contain 'Element', taking the single item of each list:
new_d1 = {key: d1.get(key)[0] for key in filter(lambda x: 'Element' not in x, d1)}
We build the elements variable, a list holding only the dictionaries we have to manipulate to get the expected result. Here elements is [{'Element 1': ['1', '2', '3']}, {'Element2': ['1', '2', '3']}].
elements = [{key: d1.get(key)} for key in filter(lambda x: 'Element' in x, d1)]
Next we take a Cartesian product with itertools.product, pairing each key with each item of its values, e.g. [('Element 1', '1'), ('Element 1', '2'), ('Element 1', '3')].
product = [list(it.product(d.keys(), *d.values())) for d in elements]
Using zip, we line up the pairs by position and convert each group into a dictionary. Finally, we create the "Elements" key in new_d1:
elements_list = [dict(t) for t in zip(*product)]
new_d1["Elements"] = elements_list
print(new_d1)
Full code:
import itertools as it
from pprint import pprint
d1 = {'Name': ['ABC'], 'Number': ['123'], 'Element 1': ['1', '2', '3'],
      'Element2': ['1', '2', '3']}
new_d1 = {key: d1.get(key)[0] for key in filter(lambda x: 'Element' not in x, d1)}
elements = [{key: d1.get(key)} for key in filter(lambda x: 'Element' in x, d1)]
product = [list(it.product(d.keys(), *d.values())) for d in elements]
# zip(*product) groups the (key, item) pairs by position
elements_list = [dict(t) for t in zip(*product)]
new_d1["Elements"] = elements_list
pprint(new_d1)  # pprint sorts the keys, hence 'Elements' first below
Output:
{'Elements': [{'Element 1': '1', 'Element2': '1'},
{'Element 1': '2', 'Element2': '2'},
{'Element 1': '3', 'Element2': '3'}],
'Name': 'ABC',
'Number': '123'}

How to get the keys if the 'value' key is '0' in a dictionary

I have the dictionary product below. I need to create two new dictionaries from it, Out_1 and Out_2:
product = {'Product1': {'index': '1', '1': 'Book', '2': 'Pencil', '3': 'Pen','value': '1'},
'Product2': {'index': '2', '1': 'Marker', '2': 'MYSQL', '3': 'Scale','value': '0'}}
If the 'value' inside a product is '0', extract that product.
Expected Output
Out_1 = {'Product2': {'1': 'Marker', '2': 'MYSQL', '3': 'Scale', 'value': '0'}}
Out_2 = {'Product2': ['Marker', 'MYSQL', 'Scale', '0']}
Pseudocode is below. I tried, but I couldn't produce the output above:
Out_1 = {}
Out_2 = {i:[]}
for i,j in product.items():
    for a,b in j.items():
        if a['value'] == 0:
            Out_2.append(i)
I am getting an indices error. How do I get Out_1 and Out_2?

You can use dict comprehensions for this.
out_1 = {k: v for k, v in product.items() if v['value']=='0'}
out_2 = {k: list(v.values()) for k, v in product.items() if v['value']=='0'}
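The expected output in the question leaves out the 'index' entry, while the comprehensions above keep it. A minimal variant that filters that key out, assuming 'index' is the only field to drop:

```python
product = {'Product1': {'index': '1', '1': 'Book', '2': 'Pencil', '3': 'Pen', 'value': '1'},
           'Product2': {'index': '2', '1': 'Marker', '2': 'MYSQL', '3': 'Scale', 'value': '0'}}

# Keep only products whose 'value' is the string '0', minus their 'index' field
out_1 = {k: {f: x for f, x in v.items() if f != 'index'}
         for k, v in product.items() if v['value'] == '0'}
out_2 = {k: list(v.values()) for k, v in out_1.items()}

print(out_1)  # {'Product2': {'1': 'Marker', '2': 'MYSQL', '3': 'Scale', 'value': '0'}}
print(out_2)  # {'Product2': ['Marker', 'MYSQL', 'Scale', '0']}
```

'value' is compared with the string '0' because the data stores strings; comparing with the integer 0, as in the question's pseudocode, would never match.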

Do you really need this index variable? If not, why not use a list of dicts instead of a dict of dicts? Anyway, here is what you wanted:
products = {'Product1': {'index': '1', '1': 'Book', '2': 'Pencil', '3': 'Pen','value': '1'},
            'Product2': {'index': '2', '1': 'Marker', '2': 'MYSQL', '3': 'Scale','value': '0'}}
for k,product in products.items():
    product.pop('index', None)
    if product['value'] == '0':
        products[k] = list(product.values())
print(products)
>>> {'Product1': {'1': 'Book', '2': 'Pencil', '3': 'Pen', 'value': '1'}, 'Product2': ['Marker', 'MYSQL', 'Scale', '0']}
I didn't assign the results to separate variables like out1/out2, so this works even if you have more than 2 products. Note that it modifies products in place.

Here it is:
Code:
product = {'Product1': {'index': '1', '1': 'Book', '2': 'Pencil', '3': 'Pen','value': '1'},
'Product2': {'index': '2', '1': 'Marker', '2': 'MYSQL', '3': 'Scale','value': '0'}}
output_1 = {}
output_2 = {}
for key, val in product.items():
    if val['value'] == '0':
        output_1[key] = val
        # list() turns the dict_values view into a plain list
        output_2[key] = list(val.values())
print(output_1)
print(output_2)
Output:
{'Product2': {'index': '2', '1': 'Marker', '2': 'MYSQL', '3': 'Scale', 'value': '0'}}
{'Product2': ['2', 'Marker', 'MYSQL', 'Scale', '0']}

iterate over data frame rows in pandas

I want to iterate over the rows of a DataFrame and get a list of objects from them:
import pandas as pd
df=pd.read_json("inputfile.txt")
data
0 {'M': {'1': 'data', '2': 'data2'}}
1 {'M': {'3': '555', '5': '3333'}}
data=[]
for row in df.iterrows():
    d = [{k1+k2:v2 for k1,v1 in x.items() for k2,v2 in v1.items()} for x in row]
    data.append(d)
print (data)
I'm getting output like this:
[[{'M1': 'data', 'M2': 'data2'}], [{'M3': '555', 'M5': '3333'}]]
but I need output like this:
[{'M1': 'data', 'M2': 'data2'}, {'M3': '555', 'M5': '3333'}]

Use .extend() instead of .append(): it adds all items of the list passed as an argument to the end of the list, rather than adding the list itself as one item.
Example:
import pandas as pd
data = {'data':[{'M': {'1': 'data', '2': 'data2'}},{'M': {'3': '555', '5': '3333'}}]}
df = pd.DataFrame(data)
print(df)
result=[]
for row in df.iterrows():
    x = [{"{0}{1}".format(k,k1) : v1 for k,v in x[1].items() for k1,v1 in v.items()} for x in row[1].items() ]
    result.extend(x)
print(result)
Or as a single list comprehension:
x = [{"{0}{1}".format(k,k1) : v1 for k,v in x[1].items() for k1,v1 in v.items()} for row in df.iterrows()
     for x in row[1].items() ]
print(x)
Output:
data
0 {'M': {'1': 'data', '2': 'data2'}}
1 {'M': {'3': '555', '5': '3333'}}
[{'M1': 'data', 'M2': 'data2'}, {'M3': '555', 'M5': '3333'}]

By doing:
d = [{k1+k2:v2 for k1,v1 in x.items() for k2,v2 in v1.items()} for x in row]
you are creating a list and appending that whole list to data, which gives you a list of lists. Build one dict per row instead.
Modified code:
import pandas as pd
df = pd.DataFrame({'data': [{'M': {'1': 'data', '2': 'data2'}}, {'M': {'3': '555', '5': '3333'}}]})
data = []
for row in df.iterrows():
    # row is an (index, Series) pair; row[1] holds the cell values
    d = {k1+k2:v2 for x in row[1] for k1,v1 in x.items() for k2,v2 in v1.items()}
    data.append(d)
print(data)
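The row loop isn't strictly needed: each cell of the data column is already a nested dict, so you can flatten the cells directly. A minimal sketch with a hypothetical helper, flatten_keys, that joins the outer and inner keys the same way (k1+k2); it runs on a plain list of cells that stands in for df['data']:

```python
def flatten_keys(cell):
    """Turn {'M': {'1': 'data'}} into {'M1': 'data'} by joining outer and inner keys."""
    return {k1 + k2: v2 for k1, v1 in cell.items() for k2, v2 in v1.items()}


# Stand-in for the values of df['data']
cells = [{'M': {'1': 'data', '2': 'data2'}}, {'M': {'3': '555', '5': '3333'}}]

data = [flatten_keys(cell) for cell in cells]
print(data)  # [{'M1': 'data', 'M2': 'data2'}, {'M3': '555', 'M5': '3333'}]
```

With the DataFrame, the same call is [flatten_keys(cell) for cell in df['data']], since iterating over a Series yields its values.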

Exclude repeated values from a dictionary and increment the 'qty' field accordingly

Considering '1', '2', '3', '4' as the indexes and everything else as the values of a dictionary in Python, I'm trying to exclude the repeated values and increment the quantity field when a duplicate is found, e.g.:
Turn this:
a = {'1': {'name': 'Blue', 'qty': '1', 'sub': ['sky', 'ethernet cable']},
'2': {'name': 'Blue', 'qty': '1', 'sub': ['sky', 'ethernet cable']},
'3': {'name': 'Green', 'qty': '1', 'sub': []},
'4': {'name': 'Blue', 'qty': '1', 'sub': ['sea']}}
into this:
b = {'1': {'name': 'Blue', 'qty': '2', 'sub': ['sky', 'ethernet cable']},
'2': {'name': 'Green', 'qty': '1', 'sub': []},
'3': {'name': 'Blue', 'qty': '1', 'sub': ['sea']}}
I was able to exclude the duplicates, but I'm having a hard time incrementing the 'qty' field:
b = {}
for k,v in a.items():
    if v not in b.values():
        b[k] = v
P.S.: I posted this question earlier, but forgot to add that the dictionary can have that 'sub' field which is a list. Also, don't mind the weird string indexes.

First, convert each original dict's 'name' and 'sub' values into a comma-delimited string. Lists aren't hashable, but strings are, so we can use set():
data = [','.join([v['name']]+v['sub']) for v in a.values()]
This returns
['Blue,sky,ethernet cable', 'Green', 'Blue,sky,ethernet cable', 'Blue,sea']
Then use the nested dict and list comprehensions as below:
b = {str(i+1): {'name': j.split(',')[0],
                'qty': sum([int(qty['qty']) for qty in a.values()
                            if (qty['name'] == j.split(',')[0]) and (qty['sub'] == j.split(',')[1:])]),
                'sub': j.split(',')[1:]}
     for i, j in enumerate(set(data))}
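Joining into a comma-delimited string breaks if a name or sub item itself contains a comma, and iterating over a set makes the numbering unpredictable. An alternative sketch (not the approach above) keys a dict on the (name, tuple(sub)) pair, which is hashable, keeps the first-seen order, and stores qty as a string to match the question:

```python
a = {'1': {'name': 'Blue', 'qty': '1', 'sub': ['sky', 'ethernet cable']},
     '2': {'name': 'Blue', 'qty': '1', 'sub': ['sky', 'ethernet cable']},
     '3': {'name': 'Green', 'qty': '1', 'sub': []},
     '4': {'name': 'Blue', 'qty': '1', 'sub': ['sea']}}

# Lists are unhashable, so the sub list is converted to a tuple for the key
merged = {}
for v in a.values():
    key = (v['name'], tuple(v['sub']))
    if key in merged:
        merged[key]['qty'] = str(int(merged[key]['qty']) + int(v['qty']))
    else:
        merged[key] = {'name': v['name'], 'qty': v['qty'], 'sub': list(v['sub'])}

# Renumber the unique entries as '1', '2', ...
b = {str(i): entry for i, entry in enumerate(merged.values(), start=1)}
print(b)
```

For the question's input this produces exactly the b shown in the question, and a itself is not modified.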

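Maybe you can try to use a counter like this, which renumbers the unique entries and adds up 'qty' for duplicates:
b = {}
count = 1
for v in a.values():
    for existing in b.values():
        # same name and sub list: merge by adding the quantities
        if existing['name'] == v['name'] and existing['sub'] == v['sub']:
            existing['qty'] = str(int(existing['qty']) + int(v['qty']))
            break
    else:
        # no match found: store a copy under the next index
        b[str(count)] = dict(v)
        count += 1
print(b)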
