Creating a dictionary of only the max common pairing of groups - python

I would like to create a dictionary of the most common pairings, an "agreement" table. Is it possible to shorten the code that finds the agreement? Right now I don't like having to find the max count and then match on that count to find the "agreement".
import pandas as pd
from collections import defaultdict
df = pd.DataFrame({
'id': ['A', 'A', 'B', 'B', 'B', 'B'],
'value': [1, 1, 2, 2, 1, 2]})
df = df.groupby(["id","value"]).size().reset_index().rename(columns={0: "count"})
df["max_rank"] = df.groupby(["id"])["count"].transform("max")==df["count"]
df = df.loc[(df["max_rank"]==True)]
d = defaultdict(list)
for idx, row in df.iterrows():
    d[row['id']].append(row['value'])
d = [{k: v} for k, v in d.items()]
d
output:
[{'A': [1]}, {'B': [2]}]

You can build a dict that maps each id to a list of values, and then use the collections.Counter.most_common method to obtain the most common value for each id:
from collections import Counter
d = {'id': ['A', 'A', 'B', 'B', 'B', 'B'], 'value': [1, 1, 2, 2, 1, 2]}
mapping = {}
for k, v in zip(d['id'], d['value']):
    mapping.setdefault(k, []).append(v)
print({k: Counter(l).most_common(1)[0][0] for k, l in mapping.items()})
This outputs:
{'A': 1, 'B': 2}
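Note that most_common(1) returns a single value per id, whereas the question's output keeps a list per id and so could hold several tied values. A minimal sketch of a tie-preserving variant, assuming that behaviour is wanted (the function name most_common_values is hypothetical, not from the answer above):

```python
from collections import Counter, defaultdict


def most_common_values(ids, values):
    """Map each id to the list of its most frequent values (ties included)."""
    grouped = defaultdict(list)
    for key, value in zip(ids, values):
        grouped[key].append(value)

    result = {}
    for key, group in grouped.items():
        counts = Counter(group)
        top = max(counts.values())
        # Keep every value whose count equals the maximum, in first-seen order.
        result[key] = [v for v, c in counts.items() if c == top]
    return result


ids = ['A', 'A', 'B', 'B', 'B', 'B']
values = [1, 1, 2, 2, 1, 2]
print(most_common_values(ids, values))  # {'A': [1], 'B': [2]}
```

If an id has two equally frequent values, both appear in its list instead of one being picked arbitrarily.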

Related

Count frequencies (unique rows) from a pandas list type column

I have a dataframe (df) like this:
id col
1 [A, B, C, C]
2 [B, C, D]
3 [C, D, E]
And, I have list like this:
l = ["A", "C", "F"]
For each element in l, I want to count the unique rows they appear in df.
'A': 1, 'C': 3, 'F': 0
But I'm not getting the part where I can check if the value exists in the list-column of the dataframe.
d = {}
for i in l:
    df_tmp = df[i.isin(df['col'])]['id'] ## wrong, showing error, isin is not a string attribute
    d[i] = len(df_tmp)
Is there any way I can fix this? Or is there a cleaner or more efficient way?
N.B. There is a similar question Frequency counts for a pandas column of lists, but it is different as I have an external list to check the frequency.
Here we use the apply method, which applies a given function to each element of the column (in our case, a function that tests whether a letter is in that row's list). We then sum the True values, i.e. the rows in which the requested letter was found, and save the total to the dictionary. We do this for every requested letter. I have not tested the performance of this solution.
import pandas as pd
df = pd.DataFrame([
{'id': 1, 'col': ['A', 'B', 'C', 'C']},
{'id': 2, 'col': ['B', 'C', 'D']},
{'id': 3, 'col': ['C', 'D', 'E']}])
letters = ["A", "C", "D", "F"]
res = {v: df['col'].apply(lambda x: v in x).sum()
       for v in letters}
# output
# {'A': 1, 'C': 3, 'D': 2, 'F': 0}
You can just check the membership in the list for each value in ['A', 'C', 'F'] and compute sum() like:
vals = ['A', 'C', 'F']
{val: df['col'].apply(lambda x: val in x).sum() for val in vals}
output:
{'A': 1, 'C': 3, 'F': 0}
You can convert each list to a set, explode the col column, keep the rows whose value is in the list l, and then use value_counts() to count each value in the resulting Series.
l = ["A", "C", "D", "F"]
col = df['col'].apply(set).explode(ignore_index=True)
out = col[col.isin(l)].value_counts().reindex(l, fill_value=0).to_dict()
# or without defining `col`
out = (df['col'].apply(set).explode(ignore_index=True)
       [lambda d: d.isin(l)]
       .value_counts().reindex(l, fill_value=0).to_dict())
print(out)
{'A': 1, 'C': 3, 'D': 2, 'F': 0}
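The same row counting can also be sketched without pandas, assuming the list column is available as a plain list of lists (the names rows, wanted and row_sets are illustrative and not from the answers above). Converting each row to a set makes sure a repeated letter such as the second 'C' in the first row is counted only once per row:

```python
# Plain-Python sketch: count how many rows contain each wanted item.
rows = [['A', 'B', 'C', 'C'], ['B', 'C', 'D'], ['C', 'D', 'E']]
wanted = ['A', 'C', 'F']

# One set per row, so membership tests are fast and duplicates collapse.
row_sets = [set(row) for row in rows]

# True counts as 1 when summed, giving the number of rows containing the item.
counts = {item: sum(item in s for s in row_sets) for item in wanted}
print(counts)  # {'A': 1, 'C': 3, 'F': 0}
```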

Nested list comprehension on python

I'm a beginner in Python and I want to use a comprehension to create a dictionary. Let's say I have the two lists below and want to convert them to a dictionary like {'Key 1':['c','d'], 'Key 2':['a','f'], 'Key 3':['b','e']}. I can only think of the code below, and I don't know how to vary the key and the filter using a comprehension. How should I change my code?
value = ['a','b','c','d','e','f']
key = [2, 3, 1, 1, 3, 2]
{"Key 1" : [value for key,value in list(zip(key,value)) if key==1]}
This should do it:
value = ['a','b','c','d','e','f']
key = [2, 3, 1, 1, 3, 2]
answer = {}
for k, v in zip(key, value):
    if k in answer:
        answer[k].append(v)
    else:
        answer[k] = [v]
print(answer)
{2: ['a', 'f'], 3: ['b', 'e'], 1: ['c', 'd']}
EDIT: oops, jumped the gun. Apologies.
Here's the comprehension version, but it's not very efficient:
{
    k: [v for i, v in enumerate(value) if key[i] == k]
    for k in set(key)
}
EDIT 2:
Here's one that has better complexity:
import pandas as pd
series = pd.Series(key)
{
    k: [value[i] for i in indices]
    for k, indices in series.groupby(series).groups.items()
}
You could do it with dictionary comprehension and list comprehension:
{f"Key {k}" : [value for key,value in zip(key,value) if key == k] for k in key}
Your lists would yield the following:
{'Key 2': ['a', 'f'], 'Key 3': ['b', 'e'], 'Key 1': ['c', 'd']}
As requested.
Use dict.setdefault:
value = ['a', 'b', 'c', 'd', 'e', 'f']
key = [2, 3, 1, 1, 3, 2]
d = {}
{d.setdefault(f'Key {k}', []).append(v) for k, v in zip(key, value)}
print(d)
output
{'Key 2': ['a', 'f'], 'Key 3': ['b', 'e'], 'Key 1': ['c', 'd']}
Usually, it is written as an explicit loop (O(n) solution):
>>> letters = 'abcdef'
>>> digits = [2, 3, 1, 1, 3, 2]
>>> from collections import defaultdict
>>> result = defaultdict(list) # digit -> letters
>>> for digit, letter in zip(digits, letters):
...     result[digit].append(letter)
...
>>> result
defaultdict(<class 'list'>, {2: ['a', 'f'], 3: ['b', 'e'], 1: ['c', 'd']})
Nested comprehensions (O(n*n) solution) like in other answers:
>>> {
...     digit: [letter for d, letter in zip(digits, letters) if digit == d]
...     for digit in set(digits)
... }
{1: ['c', 'd'], 2: ['a', 'f'], 3: ['b', 'e']}
If you need to write it as a single dict comprehension, itertools.groupby could be used (O(n log n) solution):
>>> from itertools import groupby
>>> {
...     digit: [letter for _, letter in group]
...     for digit, group in groupby(
...         sorted(zip(digits, letters), key=lambda x: x[0]),
...         key=lambda x: x[0]
...     )
... }
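To get the exact "Key N" labels and ordering shown in the question, one option is to combine the O(n) grouping loop with a final relabelling step. This is only a sketch; the names groups and result are illustrative:

```python
from collections import defaultdict

value = ['a', 'b', 'c', 'd', 'e', 'f']
key = [2, 3, 1, 1, 3, 2]

# Single pass: collect the values that share the same numeric key.
groups = defaultdict(list)
for k, v in zip(key, value):
    groups[k].append(v)

# Relabel in sorted key order so the dict reads 'Key 1', 'Key 2', 'Key 3'.
result = {f"Key {k}": groups[k] for k in sorted(groups)}
print(result)  # {'Key 1': ['c', 'd'], 'Key 2': ['a', 'f'], 'Key 3': ['b', 'e']}
```

The grouping stays linear in the number of items; only the distinct keys are sorted.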

Creating a dictionary from two lists? While taking an average of values?

Wizards of stackoverflow,
I wish to combine two lists to create a dictionary. I have used dict and zip, but that does not do what I need.
If I had these lists
keys = ['a', 'a', 'b', 'c']
values = [6, 2, 3, 4]
I would like for the dictionary to reflect the average value such that the output would be:
a_dict = {'a' : 4, 'b' : 3, 'c' : 4}
As a bonus, though not required: is there any way to get a count of each duplicate?
That is, the output would be followed by something like "'a' was counted twice", other than just doing the count on the keys.
A straightforward solution (thanks @DeepSpace for the dict-comprehension suggestion):
keys = ['a', 'a', 'b', 'c']
values = [6, 2, 3, 4]
out = {}
for k, v in zip(keys, values):
    out.setdefault(k, []).append(v)
out = {key: sum(value) / len(value) for key, value in out.items()}
print(out)
Prints:
{'a': 4.0, 'b': 3.0, 'c': 4.0}
If you want count of keys, you can do for example:
out = {}
for k, v in zip(keys, values):
    out.setdefault(k, []).append(v)
out = {key: (sum(value) / len(value), len(value)) for key, value in out.items()}
print(out)
Prints:
{'a': (4.0, 2), 'b': (3.0, 1), 'c': (4.0, 1)}
The second element of each value is the count of that key.
Solution with itertools (if keys are sorted):
keys = ['a', 'a', 'b', 'c']
values = [6, 2, 3, 4]
from itertools import groupby
from statistics import mean
out = {}
for k, g in groupby(zip(keys, values), lambda k: k[0]):
    out[k] = mean(v for _, v in g)
print(out)
Prints:
{'a': 4, 'b': 3, 'c': 4}
Calculating the average and frequency of each key, stored as dic = {key: [avg, frequency]}:
keys = ['a', 'a', 'b', 'c']
values = [6, 2, 3, 4]
dic = {i:[[], 0] for i in keys}
for k, v in zip(keys, values):
    dic[k][0].append(v)
    dic[k][1] += 1
for k, v in dic.items():
    dic[k][0] = sum(dic[k][0]) / len(dic[k][0])
print(dic)
output
{'a': [4.0, 2], 'b': [3.0, 1], 'c': [4.0, 1]}
keys = ['a', 'a', 'b', 'c']
values = [6, 2, 3, 4]
d, count_dict = dict(), dict()
for i in range(len(keys)):
    try:
        d[keys[i]] += values[i]
        count_dict[keys[i]] += 1
    except KeyError:
        d[keys[i]] = values[i]
        count_dict[keys[i]] = 1
# Use a separate loop variable so the keys and values lists are not overwritten.
for k in d:
    d[k] = d[k] / count_dict[k]
    print(f'{k} comes {count_dict[k]} times')
print(d)
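The running-sum idea above can also be sketched with collections.defaultdict, which avoids the try/except. The function name mean_and_count is hypothetical, and returning (average, count) tuples is just one possible layout:

```python
from collections import defaultdict


def mean_and_count(keys, values):
    """Return {key: (average, count)} using running totals in a single pass."""
    totals = defaultdict(float)  # key -> sum of its values
    counts = defaultdict(int)    # key -> number of occurrences
    for k, v in zip(keys, values):
        totals[k] += v
        counts[k] += 1
    return {k: (totals[k] / counts[k], counts[k]) for k in totals}


keys = ['a', 'a', 'b', 'c']
values = [6, 2, 3, 4]
print(mean_and_count(keys, values))
# {'a': (4.0, 2), 'b': (3.0, 1), 'c': (4.0, 1)}
```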

pythonic way to reverse a dict where values are lists?

I have a dictionary that looks something like this:
letters_by_number = {
1: ['a', 'b', 'c', 'd'],
2: ['b', 'd'],
3: ['a', 'c'],
4: ['a', 'd'],
5: ['b', 'c']
}
I want to reverse it to look something like this:
numbers_by_letter = {
'a': [1, 3, 4],
'b': [1, 2, 5],
'c': [1, 3, 5],
'd': [1, 2, 4]
}
I know that I could do this by looping over the (key, value) pairs of letters_by_number, looping over each value (which is a list), and adding (val, key) to a list in the dictionary. This is cumbersome, and I feel like there must be a more "pythonic" way to do this. Any suggestions?
This is well-suited for collections.defaultdict:
>>> from collections import defaultdict
>>> numbers_by_letter = defaultdict(list)
>>> for k, seq in letters_by_number.items():
...     for letter in seq:
...         numbers_by_letter[letter].append(k)
...
>>> dict(numbers_by_letter)
{'a': [1, 3, 4], 'b': [1, 2, 5], 'c': [1, 3, 5], 'd': [1, 2, 4]}
Note that you don't really need the final dict() call (a defaultdict will already give you the behavior you probably want), but I included it here because the result from your question is type dict.
Use setdefault:
letters_by_number = {
1: ['a', 'b', 'c', 'd'],
2: ['b', 'd'],
3: ['a', 'c'],
4: ['a', 'd'],
5: ['b', 'c']
}
inv = {}
for k, vs in letters_by_number.items():
    for v in vs:
        inv.setdefault(v, []).append(k)
print(inv)
Output
{'a': [1, 3, 4], 'b': [1, 2, 5], 'c': [1, 3, 5], 'd': [1, 2, 4]}
A (trivial) subclass of dict would make this very easy:
class ListDict(dict):
    def __missing__(self, key):
        value = self[key] = []
        return value
letters_by_number = {
1: ['a', 'b', 'c', 'd'],
2: ['b', 'd'],
3: ['a', 'c'],
4: ['a', 'd'],
5: ['b', 'c']
}
numbers_by_letter = ListDict()
for key, values in letters_by_number.items():
    for value in values:
        numbers_by_letter[value].append(key)
from pprint import pprint
pprint(numbers_by_letter, width=40)
Output:
{'a': [1, 3, 4],
'b': [1, 2, 5],
'c': [1, 3, 5],
'd': [1, 2, 4]}
Here's a solution using a dict comprehension, without adding list elements in a loop. Build a set of keys by joining all the lists together, then build each list using a list comprehension. To be more efficient, I've first built another dictionary containing sets instead of lists, so that k in v is an O(1) operation.
from itertools import chain
def invert_dict_of_lists(d):
    d = { i: set(v) for i, v in d.items() }
    return {
        k: [ i for i, v in d.items() if k in v ]
        for k in set(chain.from_iterable(d.values()))
    }
Strictly speaking, dictionaries in modern versions of Python 3 retain the order in which keys are inserted, but here the keys are drawn from a set, so their order in the result is arbitrary rather than alphabetical as in your example. If you do want the keys in sorted order, change for k in set(...) to for k in sorted(set(...)).
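If the keys should instead follow the order in which the letters first appear in the lists, one possible variation (a sketch, not part of the answer above) is to deduplicate with dict.fromkeys, which keeps insertion order in Python 3.7+. The function name invert_dict_of_lists_ordered is hypothetical:

```python
from itertools import chain


def invert_dict_of_lists_ordered(d):
    """Invert {key: [items]} to {item: [keys]}, keeping first-appearance order."""
    # Sets make the `k in s` membership test O(1) on average.
    sets = {i: set(v) for i, v in d.items()}
    # dict.fromkeys drops duplicates but preserves the order items were seen in.
    ordered_items = dict.fromkeys(chain.from_iterable(d.values()))
    return {k: [i for i, s in sets.items() if k in s] for k in ordered_items}


letters_by_number = {
    1: ['a', 'b', 'c', 'd'],
    2: ['b', 'd'],
    3: ['a', 'c'],
    4: ['a', 'd'],
    5: ['b', 'c'],
}
print(invert_dict_of_lists_ordered(letters_by_number))
# {'a': [1, 3, 4], 'b': [1, 2, 5], 'c': [1, 3, 5], 'd': [1, 2, 4]}
```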

how to identify relationship/mapping between the two list in python?

I have created two lists.
list1 = ['a', 'b', 'c', 'a', 'd']
list2 = [1, 2, 3, 4, 5]
I want to find the relationship between these two lists based on index position, i.e.:
In list1, a is repeated 2 times, at indices 0 and 3. In list2, the values at indices 0 and 3 are 1 and 4, so the relationship is one-to-many: a:{1,4}
Next, b is not repeated in list1; its index is 1, and the value at index 1 in list2 is 2, so the relationship is one-to-one: b:{2}
My expected output is {a:{1,4},b:{2},c:{3},d:{5}}
I'd use a defaultdict:
from collections import defaultdict
list1 = ['a', 'b', 'c', 'a', 'd']
list2 = [1, 2, 3, 4, 5]
result = defaultdict(set)
for value1, value2 in zip(list1, list2):
    result[value1].add(value2)
print(dict(result))
outputs
{'a': {1, 4}, 'b': {2}, 'c': {3}, 'd': {5}}
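Since the question also describes each pairing as one-to-one or one-to-many, the grouped sets can be labelled in a second step. This is a sketch under the assumption that "one-to-many" simply means more than one distinct value for a key; the names mapping and kinds are illustrative:

```python
from collections import defaultdict

list1 = ['a', 'b', 'c', 'a', 'd']
list2 = [1, 2, 3, 4, 5]

# Group the values of list2 by the element at the same index in list1.
mapping = defaultdict(set)
for left, right in zip(list1, list2):
    mapping[left].add(right)

# Label each key by how many distinct values it maps to.
kinds = {
    k: "one-to-one" if len(v) == 1 else "one-to-many"
    for k, v in mapping.items()
}
print(kinds)
# {'a': 'one-to-many', 'b': 'one-to-one', 'c': 'one-to-one', 'd': 'one-to-one'}
```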
You can use a combination of dictionary and list comprehension to do this:
{x: [list2[i] for i, j in enumerate(list1) if j == x] for x in list1}
output:
{'a': [1, 4], 'b': [2], 'c': [3], 'd': [5]}
a = ['a', 'b', 'c', 'a', 'd']
b = [1, 2, 3, 4, 5]
ret = {}
for idx, _a in enumerate(a):
    # setdefault returns the existing list, or stores and returns a new one
    ret.setdefault(_a, []).append(b[idx])
And ret will be the output
One option is to zip the two lists:
L = list(zip(list1, list2))
Result:
[('a', 1), ('b', 2), ('c', 3), ('a', 4), ('d', 5)]
Use it to create a dictionary with sets as values:
D ={}
for key in L:
    if key[0] not in D:
        D[key[0]] = {key[1]}
    else:
        D[key[0]].add(key[1])
I would not do it this way in real code, but this approach is mildly entertaining and perhaps educational.
from itertools import groupby
from operator import itemgetter
xs = ['a', 'b', 'c', 'a', 'd']
ys = [1, 2, 3, 4, 5]
d = {
    x : set(y for _, y in group)
    for x, group in groupby(sorted(zip(xs, ys)), key = itemgetter(0))
}
print(d) # {'a': {1, 4}, 'b': {2}, 'c': {3}, 'd': {5}}
This is not pure Python, but since the question is tagged with pandas, I tried it this way.
Option-1
import pandas as pd
df = pd.DataFrame({'l1': list1, 'l2': list2})
res1 = df.groupby('l1').apply(lambda x: x.l2.values.tolist()).to_dict()
Option-2
print(df.groupby('l1')['l2'].unique().to_dict())
Output:
{'a': [1, 4], 'c': [3], 'b': [2], 'd': [5]}
