Grouping similar values in a dictionary

I'm new to programming and would appreciate help with the following in Python/pandas.
I have a dictionary whose values are lists. I'd like to group together the keys that have the same values. I've seen similar questions on here, but the catch in this case is that I want to disregard the order of the values. For example:
classmates={'jack':['20','male','soccer'],'brian':['26','male','tennis'],'charles':['male','soccer','20'],'zulu':['19','basketball','male']}
jack and charles have the same values but in a different order. I'd like an output that groups the keys by value irrespective of order. In this case, the output would be written to a CSV as
['20','male','soccer']: jack, charles
['26','male','tennis']: brian
['19','basketball','male']: zulu

Using frozensets, apply, groupby + agg:
import pandas as pd
s = pd.DataFrame(classmates).T.apply(frozenset, axis=1)
s2 = (pd.Series(s.index.values, index=s)
        .groupby(level=0)
        .agg(lambda x: list(x)))
s2
(soccer, 20, male) [charles, jack]
(26, male, tennis) [brian]
(basketball, male, 19) [zulu]
dtype: object
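For comparison, the frozenset idea also works without pandas. The sketch below is a minimal illustration rather than part of the answer above, and the helper name group_by_value_set is made up for this example. One caveat, which depends on what "similar" should mean: a frozenset ignores repeated items, so ['a', 'a', 'b'] and ['a', 'b'] would land in the same group, whereas a sorted tuple would keep them apart.

```python
from collections import defaultdict


def group_by_value_set(mapping):
    """Group keys whose value lists hold the same items, ignoring order.

    Hypothetical helper for illustration. A frozenset is hashable, so it can
    be used as a dictionary key, and it compares equal regardless of order.
    """
    groups = defaultdict(list)
    for key, values in mapping.items():
        groups[frozenset(values)].append(key)
    return dict(groups)


classmates = {
    'jack': ['20', 'male', 'soccer'],
    'brian': ['26', 'male', 'tennis'],
    'charles': ['male', 'soccer', '20'],
    'zulu': ['19', 'basketball', 'male'],
}

groups = group_by_value_set(classmates)
for members, names in groups.items():
    # sorted() gives each set a stable, readable ordering for display
    print(sorted(members), ':', ', '.join(names))
```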

You can invert the dictionary in the way you want with the following code:
classmates={'jack':['20','male','soccer'],'brian':['26','male','tennis'],'charles':['male','soccer','20'],'zulu':['19','basketball','male']}
out_dict = {}
for key, value in classmates.items():
    current_list = out_dict.get(tuple(sorted(value)), [])
    current_list.append(key)
    out_dict[tuple(sorted(value))] = current_list
print(out_dict)
On Python 3.7+, where dictionaries preserve insertion order, this prints
{('20', 'male', 'soccer'): ['jack', 'charles'], ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']}

from collections import defaultdict
ans = defaultdict(list)
classmates = {'jack': ['20', 'male', 'soccer'],
              'brian': ['26', 'male', 'tennis'],
              'charles': ['male', 'soccer', '20'],
              'zulu': ['19', 'basketball', 'male']
              }
for k, v in classmates.items():
    sorted_tuple = tuple(sorted(v))
    ans[sorted_tuple].append(k)
# ans is now the dict you wanted:
# defaultdict(<class 'list'>, {('20', 'male', 'soccer'): ['jack', 'charles'],
# ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']})
for k, v in ans.items():
    print(k, ':', v)
# output:
# ('20', 'male', 'soccer') : ['jack', 'charles']
# ('26', 'male', 'tennis') : ['brian']
# ('19', 'basketball', 'male') : ['zulu']
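The question also asks for the groups to be written to a CSV file, which the answers so far do not show. Here is a minimal sketch using the standard csv module. The column names and the choice to join values with spaces are assumptions made for this example, and io.StringIO stands in for a real file so the snippet runs anywhere.

```python
import csv
import io
from collections import defaultdict

classmates = {
    'jack': ['20', 'male', 'soccer'],
    'brian': ['26', 'male', 'tennis'],
    'charles': ['male', 'soccer', '20'],
    'zulu': ['19', 'basketball', 'male'],
}

# Group names under an order-insensitive key (a sorted tuple of the values).
groups = defaultdict(list)
for name, values in classmates.items():
    groups[tuple(sorted(values))].append(name)

# io.StringIO stands in for a file on disk; with a real file you would use
# open('groups.csv', 'w', newline='') instead (the file name is hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['values', 'names'])  # assumed column names
for values, names in groups.items():
    writer.writerow([' '.join(values), ', '.join(names)])

csv_text = buffer.getvalue()
print(csv_text)
```

The csv writer quotes any field that contains a comma, so a group with several names stays in a single column.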

First of all convert your dictionary to a pandas dataframe.
df= pd.DataFrame.from_dict(classmates,orient='index')
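Then sort it in ascending order by the first column.
df=df.sort_values(by=0,ascending=True)
Here 0 is the default name of the first column; you can rename it. Note that column 0 holds the age only when each list starts with the age, which is not the case for charles in the example.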

You could do this in one line, where a is the input dictionary:
print({tuple(sorted(v)): [k for k, vv in a.items() if sorted(vv) == sorted(v)] for v in a.values()})
Or here is a more detailed solution:
dict_1 = {'jack': ['20', 'male', 'soccer'], 'brian': ['26', 'male', 'tennis'], 'charles': ['male', 'soccer', '20'],
          'zulu': ['19', 'basketball', 'male']}
sorted_dict = {}
for key, value in dict_1.items():
    sorted_1 = sorted(value)
    sorted_dict[key] = sorted_1
tracking_of_duplicate = []
final_dict = {}
for key1, value1 in sorted_dict.items():
    if value1 not in tracking_of_duplicate:
        tracking_of_duplicate.append(value1)
        final_dict[tuple(value1)] = [key1]
    else:
        final_dict[tuple(value1)].append(key1)
print(final_dict)

Related

Dictionary of dictionaries from list of lists with year keys

I have a list of list like this:
[['2014', 'MALE', 'WHITE NON HISPANIC', 'Zachary', '90', '39'],
['2014', 'MALE', 'WHITE NON HISPANIC', 'Zev', '49', '65']]
I want to convert it into a dictionary like this:
{{2012: {1: 'David',
2: 'Joseph',
3: 'Michael',
4: 'Moshe'},
2013: {1: 'David',
2: 'Joseph',
3: 'Michael',
4: 'Moshe'},
I'm trying to do it with a dict comprehension like this:
boy_names = {row[0]:{i:row[3]} for i,row in enumerate(records) if row[1]=='MALE'}
But the result I'm getting is like:
{'2011': {7851: 'Zev'}, '2012': {9855: 'Zev'},
'2013': {11886: 'Zev'}, '2014': {13961: 'Zev'}}
If I'm right, each year key keeps being overwritten, so I end up with only the last name and its row number from enumerate for each year, but I have no idea how to solve it.

You can use the length of the sub-dict under the year key to calculate the next incremental numeric key for the sub-dict under the current year. Use the dict.setdefault method to default the value of a new key to an empty dict:
boy_names = {}
for year, _, _, name, _, _ in records:
    record = boy_names.setdefault(int(year), {})
    record[len(record) + 1] = name
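The loop above numbers every row it sees, while the question's own attempt keeps only rows where the second field is 'MALE'. A minimal sketch that adds that filter to the same setdefault approach might look like this. The function name names_by_year, the meaning given to the last columns, and the two extra sample rows are assumptions made up for illustration.

```python
def names_by_year(records, gender='MALE'):
    """Hypothetical helper: map year -> {1: first name, 2: second name, ...}."""
    result = {}
    # Column meanings beyond year, gender and name are assumed, not confirmed.
    for year, row_gender, _ethnicity, name, _col5, _col6 in records:
        if row_gender != gender:
            continue  # skip rows for the other gender
        ranked = result.setdefault(int(year), {})
        # The next key is one more than the number of names stored so far.
        ranked[len(ranked) + 1] = name
    return result


records = [
    ['2014', 'MALE', 'WHITE NON HISPANIC', 'Zachary', '90', '39'],
    ['2014', 'FEMALE', 'WHITE NON HISPANIC', 'Zoe', '70', '40'],   # made-up row
    ['2014', 'MALE', 'WHITE NON HISPANIC', 'Zev', '49', '65'],
    ['2013', 'MALE', 'WHITE NON HISPANIC', 'David', '100', '1'],   # made-up row
]

boy_names = names_by_year(records)
print(boy_names)
```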

I believe you need
data = [['2014', 'MALE', 'WHITE NON HISPANIC', 'Zachary', '90', '39'],
['2014', 'MALE', 'WHITE NON HISPANIC', 'Zev', '49', '65']]
result = {}
for i in data:  # iterate over the sub-lists
    result.setdefault(i[0], []).append(i[3])  # collect the names for each year
result = {k: dict(enumerate(v, 1)) for k, v in result.items()}  # number the names starting from 1
print(result)
# {'2014': {1: 'Zachary', 2: 'Zev'}}

Replace empty values of a dictionary with NaN

I have a dictionary with missing values (the key is there, but the associated value is empty). For example I want the dictionary below:
dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
to be changed to this form:
dct = {'ID':NaN, 'gender':'male', 'age':'20', 'weight':NaN, 'height':'5.7'}
How can I write that in the most time-efficient way?

You can use a dictionary comprehension. Also, as was noted in the comments, naming something dict in Python is not good practice because it shadows the built-in type:
dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
dct = {k: None if not v else v for k, v in dct.items() }
print(dct)
Output:
{'ID': None, 'gender': 'male', 'age': '20', 'weight': None, 'height': '5.7'}
Just replace None with whatever you want it to default to.
In your question, you want to replace with NaN.
You can use any of the following:
float('nan') if you are using Python 2.x, or with Python <3.5
math.nan for Python 3.5+
numpy.nan using numpy
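As a small illustration of the math.nan option, assuming Python 3.5+ and that only empty strings count as missing, the replacement and a later check for missing entries could be sketched like this. The variable names cleaned and missing are made up for the example.

```python
import math

dct = {'ID': '', 'gender': 'male', 'age': '20', 'weight': '', 'height': '5.7'}

# Replace only empty strings; every other value is kept unchanged.
cleaned = {k: (math.nan if v == '' else v) for k, v in dct.items()}

# NaN never compares equal to anything, including itself, so use math.isnan
# (guarded by isinstance, because the other values are strings).
missing = [k for k, v in cleaned.items()
           if isinstance(v, float) and math.isnan(v)]
print(missing)
```

Because of that comparison rule, a test such as cleaned['ID'] == math.nan is always False, which is why the check goes through math.isnan.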

You can use implicit syntax with boolean or expression:
In [1]: dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
In [2]: {k: v or None for k, v in dct.items()}
Out[2]: {'ID': None, 'age': '20', 'gender': 'male', 'height': '5.7', 'weight': None}
But be aware that in The Zen of Python it's said:
Explicit is better than implicit.

You can create a class object to represent NaN:
class NaN:
    def __init__(self, default=None):
        self.val = default
    def __repr__(self):
        return 'NaN'
dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
new_d = {a:NaN() if not b else b for a, b in dct.items()}
Output:
{'gender': 'male', 'age': '20', 'ID': NaN, 'weight': NaN, 'height': '5.7'}

You can use a for loop to iterate over all of the keys and values in the dictionary.
dct = {'ID': '', 'gender': 'male', 'age': '20', 'weight': '', 'height': '5.7'}
for key, value in dct.items():
    if value == '':
        dct[key] = 'NaN'
print(dct)
You created your dictionary with a series of key/value pairs.
I used a for loop and the .items() method to iterate over each key/value pair in your dictionary.
If the value of a key/value pair is an empty string, we change that particular value to 'NaN' and leave the rest unchanged.
When we print the new dictionary we get this output:
{'ID': 'NaN', 'gender': 'male', 'age': '20', 'weight': 'NaN', 'height': '5.7'}
This is time efficient because it is a single pass over the dictionary, so long as you are okay with the 'NaN' values being strings. I am not sure whether you want them to be strings; if not, you can change 'NaN' to None or float('nan') very simply.

How can I add a value in my list when a condition is true?

I have a dict with a lot of items:
{'id-quantity-60': u'1', 'id-quantity-35': u'3','id-product-35': u'on', 'id-product-60': u'on',}
I need to create a list with all three elements inside.
I'm expecting a list like this:
<type 'list'>: [['60', u'1', u'on'], ['35', u'3', u'on'],]
Each product id appears twice in the dict, once with the quantity and once with the on/off flag. The list should combine the quantity and the on/off flag under the same product id.
How can I do that? I tried it with something like this:
for key, value in request.params.items():
    if key[:12] == 'id-quantity-':
        if key[12:] in list:
            list.insert(key[12:], value)
        else:
            list.append([key[12:], value])
    if key[:11] == 'id-product-':
        if key[11:] in list:
            list.insert(key[11:], value)
        else:
            list.append([key[11:], value])
The problem is that I always get this list split up:
<type 'list'>: [['60', u'1'], ['35', u'3'], ['35', u'on'], ['60', u'on'],]
Finally, I should be able to fill the data in here (60, 1, True, for example):
data = request.registry['website']._subscription_exclude_product(cr, uid, [{'product_id': int(60), 'qty': int(1), 'like': bool(True)}], context=context)
Thank you very much.

Is this what you expect? Here x is your input dictionary:
products = {k.split('-')[-1]: v for k, v in x.items() if 'product' in k}
quantities = {k.split('-')[-1]: v for k, v in x.items() if 'quantity' in k}
# named result rather than all, which would shadow the built-in all()
result = [(k, v, 'on' if k in products else 'off') for k, v in quantities.items()]

You can use defaultdict() with a default list of 2 items to make it more flexible:
from collections import defaultdict
def default_list():
    return [None, None]
request = {'id-quantity-60': u'1', 'id-quantity-71': u'0', 'id-quantity-35': u'3','id-product-35': u'on', 'id-product-60': u'on'}
result = defaultdict(default_list)
for key, value in request.items():
    _, pattern, productid = key.split('-')
    if pattern == 'quantity':
        result[productid][0] = value
        result[productid][1] = 'on' if int(value) else 'off'
    elif pattern == 'product':
        result[productid][1] = value
Returns:
defaultdict(<function default_list at 0x7faa3d3efe18>,
{'35': ['3', 'on'],
'60': ['1', 'on'],
'71': ['0', 'off']})
In case you really need a list:
resultList = [[k] + v for k, v in result.items()]
# [['60', '1', 'on'], ['71', '0', 'off'], ['35', '3', 'on']]

Assuming keys for quantity and product are in the same format across the dictionary:
d={'id-quantity-60': u'1', 'id-quantity-35': u'3','id-product-35': u'on', 'id-product-60': u'on',}
l=[]
for k, v in d.items():
    if 'id-quantity' in k:
        x = k.replace('id-quantity-', '')
        y = 'id-product-' + str(x)
        l.append([x, v, d[y]])
print(l)
Output
[['60', '1', 'on'], ['35', '3', 'on']]

Suppose your input dictionary is in request.params variable,
quantity = {k.split("-")[-1]: v for k, v in request.params.items() if "quantity" in k}
product = {k.split("-")[-1]: v for k, v in request.params.items() if "product" in k}
result = [[k, v, product[k]] for k, v in quantity.items()]
print(result)
Output:
[['60', '1', 'on'], ['35', '3', 'on']]
Updated: replace result=... with the following
result = [[k, v, True if product.get(k) else False] for k, v in quantity.items()]
to get
[['35', '3', True], ['42', '0', False]]
if "id-product-42" is not in input dict.
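The question's last step needs a list of dicts with an integer product_id, an integer qty, and a boolean like. A sketch of that final conversion could look as follows. It assumes a ticked checkbox arrives as 'on' and an unticked one is simply absent; the helper name build_rows and the plain dict standing in for request.params are made up for this example.

```python
QUANTITY_PREFIX = 'id-quantity-'
PRODUCT_PREFIX = 'id-product-'


def build_rows(params):
    """Hypothetical helper: turn the form parameters into one dict per product."""
    quantities = {
        key[len(QUANTITY_PREFIX):]: value
        for key, value in params.items()
        if key.startswith(QUANTITY_PREFIX)
    }
    # Assumption: a checkbox is submitted as 'on' when ticked, absent otherwise.
    checked = {
        key[len(PRODUCT_PREFIX):]
        for key, value in params.items()
        if key.startswith(PRODUCT_PREFIX) and value == 'on'
    }
    return [
        {'product_id': int(pid), 'qty': int(qty), 'like': pid in checked}
        for pid, qty in quantities.items()
    ]


# A plain dict stands in for request.params here.
params = {
    'id-quantity-60': '1', 'id-quantity-35': '3',
    'id-product-35': 'on', 'id-product-60': 'on',
}

rows = build_rows(params)
print(rows)
```

The resulting list has the shape of the argument shown in the question, so it could be passed in place of the hand-written [{'product_id': ..., 'qty': ..., 'like': ...}] literal.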

Remove repeat values from a dictionary 'a' in a dictionary 'b' python

dictionary = {'0': "Linda", "1": "Anna", "2": 'Theda', "3":'Thelma',"4": 'Thursa',"5" :"Mary"}
dictionary2 = ['Linda', 'Ula', 'Vannie', 'Vertie', 'Mary']
I want to remove from dictionary the entries whose values also appear in dictionary2. I wrote the code like this:
[k for k, v in dictionary.items() if v not in dictionarya]
But it still prints the keys of names that appear in both, like this:
['0', '1', '2', '3', '4']
How can I remove all the repeated names so that it prints this instead?
['1', '2', '3', '4']
How do I get just the last loop? Thank you.

To get the keys in which the values aren't contained in the other list you can use the following list comprehension
>>> [k for k,v in dictionary.items() if v not in dictionary2]
['2', '4', '3', '1']

Just do the following list comprehension:
>>> dictionary = {'0': "Linda", "1": "Anna", "2":"Mary"}
>>> dictionary2 = ['Linda', 'Theda', 'Thelma', 'Thursa', 'Ula', 'Vannie', 'Vertie', 'Mary']
>>> value = [i for i in dictionary2 if i not in dictionary.values()]
>>> value
['Theda', 'Thelma', 'Thursa', 'Ula', 'Vannie', 'Vertie']

This works:
dictionary = {'0': "Linda", "1": "Anna", "2":"Mary"}
dictionary2 = ['Linda', 'Theda', 'Thelma', 'Thursa', 'Ula', 'Vannie', 'Vertie', 'Mary']
# I want to remove the same values from dictionary to dictionary2, I wrote the code like this:
output = [i for i in dictionary2 if i not in dictionary.values()]
Result:
['Theda', 'Thelma', 'Thursa', 'Ula', 'Vannie', 'Vertie']

You could convert dictionary2 to a set, and use this syntax:
>>> dictionary = {'0': "Linda", "1": "Anna", "2": 'Theda', "3":'Thelma',"4": 'Thursa',"5" :"Mary"}
>>> dictionary2 = ['Linda', 'Ula', 'Vannie', 'Vertie', 'Mary']
>>> remove_names = set(dictionary2)
>>> [name for name in dictionary.values() if name not in remove_names]
['Anna', 'Theda', 'Thelma', 'Thursa']
>>> [id for id, name in dictionary.items() if name not in remove_names]
['1', '2', '3', '4']
Note that dictionary2 isn't a dict but a list.
Also, before Python 3.7, dicts did not guarantee any ordering, so on older versions you cannot be sure that the result will always be 1, 2, 3, 4.
Finally, if all your dict keys are integers (or look like integers), you'd be better off using a list:
>>> names = ["Linda","Anna",'Theda','Thelma','Thursa',"Mary"]
>>> remove_names = set(['Linda', 'Ula', 'Vannie', 'Vertie', 'Mary'])
>>> list(enumerate(names))
[(0, 'Linda'), (1, 'Anna'), (2, 'Theda'), (3, 'Thelma'), (4, 'Thursa'), (5, 'Mary')]
>>> [name for name in names if name not in remove_names]
['Anna', 'Theda', 'Thelma', 'Thursa']
>>> [id for id, name in enumerate(names) if name not in remove_names]
[1, 2, 3, 4]
This should be more readable and faster than your code.
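If the goal is to drop the matching entries from the dictionary itself rather than to list the surviving keys, the same set-based membership test fits in a dict comprehension. This is a minimal sketch under that reading of the question; the name filtered is made up for the example.

```python
dictionary = {'0': "Linda", "1": "Anna", "2": 'Theda', "3": 'Thelma',
              "4": 'Thursa', "5": "Mary"}
dictionary2 = ['Linda', 'Ula', 'Vannie', 'Vertie', 'Mary']

# A set gives constant-time membership tests on average.
remove_names = set(dictionary2)

# Build a new dict that keeps only the entries whose name is not in the set.
filtered = {key: name for key, name in dictionary.items()
            if name not in remove_names}
print(filtered)
```

Building a new dict avoids deleting keys while iterating over the original, which would raise a RuntimeError.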

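dictionary = {'0': "Linda", "1": "Anna", "2":"Mary"}
dictionary2 = ['Linda', 'Theda', 'Thelma', 'Thursa', 'Ula','Vannie','Vertie', 'Mary']
# Deleting items by index while looping over the same list can skip elements
# or raise IndexError on other inputs, so rebuild the list in place instead.
names_to_remove = set(dictionary.values())
dictionary2[:] = [name for name in dictionary2 if name not in names_to_remove]
print(dictionary2)
Prints this:
['Theda', 'Thelma', 'Thursa', 'Ula', 'Vannie', 'Vertie']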

Matching and Appending

I'm trying to figure out how to run one list against another and, whenever the first names match, append the combined entry to a new list.
list1 = [["Ryan","10"],["James","40"],["John","30"],["Jake","15"],["Adam","20"]]
list2 = [["Ryan","Canada"],["John","United States"],["Jake","Spain"]]
So it looks something like this.
list3 = [["Ryan","Canada","10"],["John","United States","30"],["Jake","Spain","15"]]
So far I haven't really been able to even come close, so even the smallest guidance would be much appreciated. Thanks.

You could transform them into dictionaries and then use a list comprehension:
dic1 = dict(list1)
dic2 = dict(list2)
list3 = [[k,dic2[k],dic1[k]] for k in dic2 if k in dic1]

If ordering isn't a concern, the most straightforward way is to convert the lists into more suitable data structures: dictionaries.
ages = dict(list1)
countries = dict(list2)
That'll make it a cinch to combine the pieces of data:
>>> {name: [ages[name], countries[name]] for name in ages.keys() & countries.keys()}
{'Ryan': ['10', 'Canada'], 'Jake': ['15', 'Spain'], 'John': ['30', 'United States']}
Or even better, use nested dicts:
>>> {name: {'age': ages[name], 'country': countries[name]} for name in ages.keys() & countries.keys()}
{'Ryan': {'country': 'Canada', 'age': '10'},
'Jake': {'country': 'Spain', 'age': '15'},
'John': {'country': 'United States', 'age': '30'}}

If the names are unique you can make list1 into a dictionary and then loop through list2 adding items from this dictionary.
list1 = [["Ryan","10"],["James","40"],["John","30"],["Jake","15"],["Adam","20"]]
list2 = [["Ryan","Canada"],["John","United States"],["Jake","Spain"]]
list1_dict = dict(list1)
output = [item + [list1_dict[item[0]]] for item in list2]
If not, then you need to decide how to deal with cases of duplicate names.
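One possible policy, shown here only as a sketch, is to keep every age recorded for a name and emit one row per match, while silently skipping names that appear in list2 but not in list1. Both choices are assumptions, not something the question specifies. The rows come out as [name, country, age], the order the question asks for.

```python
from collections import defaultdict

list1 = [["Ryan", "10"], ["James", "40"], ["John", "30"], ["Jake", "15"], ["Adam", "20"]]
list2 = [["Ryan", "Canada"], ["John", "United States"], ["Jake", "Spain"]]

# Collect every age seen for a name, so duplicate names are not overwritten.
ages_by_name = defaultdict(list)
for name, age in list1:
    ages_by_name[name].append(age)

list3 = []
for name, country in list2:
    # .get() returns an empty list for unknown names and adds no new key.
    for age in ages_by_name.get(name, []):
        list3.append([name, country, age])

print(list3)
```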

You can use a set and an OrderedDict to combine the common names and keep order:
list1 = [["Ryan","10"],["James","40"],["John","30"],["Jake","15"],["Adam","20"]]
list2 = [["Ryan","Canada"],["John","United States"],["Jake","Spain"]]
from collections import OrderedDict
# get set of names from list2
names = set(name for name,_ in list2)
# create an OrderedDict using name as key and full sublist as value
# filtering out names that are not also in list2
d = OrderedDict((sub[0], sub) for sub in list1 if sub[0] in names)
for name, country in list2:
    if name in d:
        # add country from each sublist with common name
        d[name].append(country)
print(list(d.values()))  # on Python 2, print(d.values()) is enough
[['Ryan', '10', 'Canada'], ['John', '30', 'United States'], ['Jake', '15', 'Spain']]
If list2 only ever contains names that are also in list1, you can remove the if name in d: check.
