DataFrame columns to dictionary with multiple keys and values - python

I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({
    'From': ['a', 'b', 'c', 'd'],
    'To': ['h', 'm', 'f', 'f'],
    'week': [1, 2, 3, 3]
})
I want to use the columns 'To' and 'week' as keys that map to the values in 'From', creating a dictionary like {(1,'h'):'a',(2,'m'):'b',(3,'f'):['c','d']}. Is there a way to do this? I tried
dict(zip([tuple(x) for x in df[['week','To']].to_numpy()], df['From']))
but it only gives me {(1,'h'):'a',(2,'m'):'b',(3,'f'):'d'}. If there are multiple 'From's for the same ('week', 'To'), I want to put them in a list or set. Thanks!!

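You can use the .groupby() method followed by .apply(list) on the column From to collect each group's values into a list. From there, pandas has a .to_dict() method to convert the result to a dictionary. Note that the keys follow the order of the columns passed to groupby, so pass ['week', 'To'] instead if you want (week, To) keys as in the question.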
>>> df.groupby(['To', 'week'])['From'].apply(list).to_dict()
{('f', 3): ['c', 'd'], ('h', 1): ['a'], ('m', 2): ['b']}
>>>
>>> # use lambda to convert lists with only one value to string
>>> df.groupby(['To', 'week'])['From'].apply(lambda x: list(x) if len(x) > 1 else list(x)[0]).to_dict()
{('f', 3): ['c', 'd'], ('h', 1): 'a', ('m', 2): 'b'}

Use below code to get your desired dictionary:
df.groupby(['To','week'])['From'].agg(','.join).apply(lambda s: s.split(',') if ',' in s else s).to_dict()
Output:
>>> df.groupby(['To','week'])['From'].agg(','.join).apply(lambda s: s.split(',') if ',' in s else s).to_dict()
{('f', 3): ['c', 'd'], ('h', 1): 'a', ('m', 2): 'b'}
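This groups by To and week and joins the From values with commas, then uses apply to split the comma-separated strings back into lists, and finally converts the result to a dictionary. Note that it assumes the From values are strings that contain no commas themselves.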
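For comparison, here is a minimal pandas-free sketch of the same grouping. It also shows why the dict(zip(...)) attempt in the question lost 'c': a plain dict keeps only the last value assigned to a key. The variable names weeks, tos, froms, grouped and result are illustrative assumptions; the values are copied from the question's DataFrame.

```python
from collections import defaultdict

# Column values copied from the question's DataFrame (assumed names).
weeks = [1, 2, 3, 3]
tos = ['h', 'm', 'f', 'f']
froms = ['a', 'b', 'c', 'd']

# Collect every 'From' value under its (week, To) key instead of overwriting.
grouped = defaultdict(list)
for week, to, frm in zip(weeks, tos, froms):
    grouped[(week, to)].append(frm)

# Unwrap single-element lists to match the format asked for in the question.
result = {key: vals if len(vals) > 1 else vals[0]
          for key, vals in grouped.items()}
print(result)
```

Because nothing is joined into a string, this variant does not depend on the values being comma-free.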

Related

Python: Marking duplicates in list

I have an unordered Python list, and I want to create a second list that tells whether each value in the first list is a duplicate or unique.
Duplicates have to be marked as duplicate1, duplicate2 and so on.
I will create a dictionary from these lists, which will later be used in a pandas DataFrame.
I am stuck on the logic for the second list; could someone please help?
first_List = ['a', 'b', 'c', 'a', 'd', 'c']
EXPECTED OUTPUT:
second_List = ['dup1', 'unique', 'dup1', 'dup2', 'unique', 'dup2']
You can iterate over the list by index and, for the value at each index, check whether it is a duplicate (the isDuplicate boolean in the code below). If it is, count how many times that value has appeared in the list up to and including the current index, and append the corresponding string to second_List; otherwise append 'unique'.
second_List = []
for i in range(len(first_List)):
    isDuplicate = first_List.count(first_List[i]) > 1
    if isDuplicate:
        count = first_List[:i+1].count(first_List[i])
        second_List.append(f'dup{count}')
    else:
        second_List.append('unique')
OUTPUT:
['dup1', 'unique', 'dup1', 'dup2', 'unique', 'dup2']
Here is the equivalent List-Comprehension as well, if you are interested!
>>> [f'dup{first_List[:i+1].count(first_List[i])}'
... if first_List.count(first_List[i]) > 1
... else 'unique'
... for i in range(len(first_List))]
['dup1', 'unique', 'dup1', 'dup2', 'unique', 'dup2']
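Both versions above call list.count inside the loop, which rescans the list for every element. As a hedged alternative, the sketch below does the same labelling with two collections.Counter objects, so the list is traversed a fixed number of times. The function name mark_duplicates is an assumption made for illustration, not something from the answers.

```python
from collections import Counter

def mark_duplicates(values):
    """Label each value 'unique' or 'dupN', where N is its running occurrence number."""
    totals = Counter(values)  # total occurrences of each value
    seen = Counter()          # occurrences encountered so far
    labels = []
    for value in values:
        if totals[value] > 1:
            seen[value] += 1
            labels.append(f'dup{seen[value]}')
        else:
            labels.append('unique')
    return labels

first_List = ['a', 'b', 'c', 'a', 'd', 'c']
second_List = mark_duplicates(first_List)
print(second_List)
```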
A shorter version:
first_List = ['a', 'b', 'c', 'a', 'd', 'c']
d = {i:'' for i in first_List if first_List.count(i) > 1}
second_List = ['unique' if i not in d.keys() else f'dup{list(d.keys()).index(i)+1}' for i in first_List]
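Note that this numbers each distinct duplicated value rather than each occurrence, so for the list above it gives ['dup1', 'unique', 'dup2', 'dup1', 'unique', 'dup2'], which differs from the expected output in the question.
This is the same as
d = {i:'' for i in first_List if first_List.count(i) > 1}
second_List = list()
for i in first_List:
    text = 'unique' if i not in d.keys() else f'dup{list(d.keys()).index(i)+1}'
    second_List.append(text)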

Enumerating all possible scenarios

I am trying to find all of the possible orderings of a set of actions. Suppose I have 2 vehicles (A and B) and I want to use them by sending them out and then having them return. Send and return are two distinct actions, and I want to enumerate all of the possible sequences of sending and returning these vehicles. Thus the set is [A, A, B, B]. I use this code to enumerate:
from itertools import permutations
a = permutations(['A', 'A', 'B', 'B'])
# Print the permutations
seq = []
for i in list(a):
    seq.append(i)
seq = list(set(seq)) # remove duplicates
The result is as follows:
('A', 'B', 'B', 'A')
('A', 'B', 'A', 'B')
('A', 'A', 'B', 'B')
('B', 'A', 'B', 'A')
('B', 'B', 'A', 'A')
('B', 'A', 'A', 'B')
Suppose I assume the two vehicles are identical. Then it doesn't matter which one comes first (i.e. ABBA is the same as BAAB). Here is the result I expect:
('A', 'B', 'B', 'A')
('A', 'B', 'A', 'B')
('A', 'A', 'B', 'B')
I can do this easily by removing the last three elements. However, I run into a problem when I try to do the same thing for three vehicles (a = permutations(['A', 'A', 'B', 'B', 'C', 'C'])). How can I ensure that the result already accounts for the three vehicles being identical?
One way would be to generate all the permutations, then keep only those where the first mentions of the vehicles appear in alphabetical order.
In recent versions of Python, dict retains first-insertion order, so we can use it to determine the first mention; something like:
from itertools import permutations
seq = set()
for i in permutations(['A', 'A', 'B', 'B']):
    first_mentions = {car: None for car in i}.keys()
    if list(first_mentions) == sorted(first_mentions):
        seq.add(i)
(This works in practice since CPython 3.6, and officially since Python 3.7)
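To check that the first-mention filter extends to three vehicles, here is a small sketch that wraps it in a function. The name canonical_sequences and its actions_per_vehicle parameter are assumptions made for illustration. It uses dict.fromkeys to get the first mentions in order, which relies on the same insertion-order behaviour described above.

```python
from itertools import permutations

def canonical_sequences(vehicles, actions_per_vehicle=2):
    """Return the sequences whose first mentions appear in sorted order."""
    # Each vehicle appears once per action (send, return).
    pool = [v for v in vehicles for _ in range(actions_per_vehicle)]
    ordered = sorted(vehicles)
    result = set()
    for perm in permutations(pool):
        # dict.fromkeys keeps one entry per vehicle, in first-seen order.
        if list(dict.fromkeys(perm)) == ordered:
            result.add(perm)
    return sorted(result)

two = canonical_sequences(['A', 'B'])
three = canonical_sequences(['A', 'B', 'C'])
print(two)
print(len(three))
```

For two vehicles this yields the three sequences expected in the question. For three vehicles there are 90 distinct arrangements of AABBCC, and each one has 6 relabelled equivalents, so 15 remain. Generating every permutation first is fine at this size but grows factorially with the number of vehicles.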
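from itertools import permutations
a = permutations(['A', 'A', 'B', 'B'])
seq = []
for i in list(a):
    if i[0]=='A':
        seq.append(i)
seq = list(set(seq))
print(seq)
Try this; I think it should do for two vehicles. Note that fixing only the first element is not enough for three or more vehicles, because sequences that differ only by swapping B and C would both be kept.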

How to create dictionary with duplicate keys out of a column in python

I have two columns like this:
AB A
AD B
AB*AF D
CD*EG E
CG*AB H
I am trying to create a dictionary from the first column by splitting its values on the delimiter '*' into keys and values. There are duplicates that should not be deleted, since the combinations are different. In addition, I need to fill in empty values for the entries that are not combined with another value.
This is my code:
for x in lines:
    firstVal.append(x.split(' ')[0].split('*')[0])
    if '*' in x:
        secondVal.append(x.split(' ')[0].split('*')[1])
#to add empty values to create correct pairs (not the best way to compensate empty values!)
count = 0
while (count < 5):
    count = count + 1
    secondVal.insert(0, '')
#to create pairs
dictPairs = dict(zip(firstVal, secondVal))
I hope this fulfils your requirement:
first_column = ['AB', 'AD', 'AB*AF', 'CD*EG', 'CG*AB']
second_column = ['A', 'B', 'D', 'E', 'H']
d = {}
for index, item in enumerate(first_column):
    for key in item.split('*'):
        if key in d:
            d[key].append(second_column[index])
        else:
            d[key] = [second_column[index]]
OUTPUT:
{'AB': ['A', 'D', 'H'],
'AD': ['B'],
'AF': ['D'],
'CD': ['E'],
'EG': ['E'],
'CG': ['H']}
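The same grouping can be written a little more compactly with collections.defaultdict, which removes the explicit "key in d" check. This is only a sketch of an equivalent formulation, using the same two lists as the answer above; the loop variable names combined and value are illustrative.

```python
from collections import defaultdict

first_column = ['AB', 'AD', 'AB*AF', 'CD*EG', 'CG*AB']
second_column = ['A', 'B', 'D', 'E', 'H']

d = defaultdict(list)
for combined, value in zip(first_column, second_column):
    # A row like 'AB*AF' contributes its value to both 'AB' and 'AF'.
    for key in combined.split('*'):
        d[key].append(value)

d = dict(d)  # convert back to a plain dict for printing or comparison
print(d)
```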

Creating all combinations of a string given character values from a dictionary in python

var = {'A': 'A', 'W': 'AT', 'K': 'GT'}
lst = ['AWK']
Given the list and dictionary from above I would like to get the combinations of the string in lst:
[A,A,G], [A,A,T],[A,T,G],[A,T,T]
These combinations are derived from the string in lst and the variations of that string's individual characters (listed in the dict).
The dictionary is representing the different characters a specific lst character can represent. So a 'W' in lst can actually be an 'A' or 'T'.
As a further example, if lst = ['AW'], then the permutations would be: ['A','A'] and ['A','T'].
I am hoping something like itertools can help me with this.
IIUC, you can use itertools.product after building the appropriate list of possibilities for each character. For example:
>>> from itertools import product
>>> var = {'A': 'A', 'W': 'AT', 'K': 'GT'}
>>> word = "AWK"
>>> poss = [var[c] for c in word]
>>> poss
['A', 'AT', 'GT']
>>> list(product(*poss))
[('A', 'A', 'G'), ('A', 'A', 'T'), ('A', 'T', 'G'), ('A', 'T', 'T')]
Or, if you want new words instead of tuples:
>>> [''.join(p) for p in product(*poss)]
['AAG', 'AAT', 'ATG', 'ATT']
(BTW: these aren't permutations.)
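Since the question stores its input in a list (lst), the steps above can be wrapped in a small helper and applied to every word. This is a sketch under the assumption that every character of every word is a key of var; the function name expand and the result name expanded are hypothetical.

```python
from itertools import product

def expand(word, mapping):
    """Return every string obtainable by replacing each character with one of its options."""
    # mapping[c] is a string of allowed characters for c; product picks one per position.
    return [''.join(p) for p in product(*(mapping[c] for c in word))]

var = {'A': 'A', 'W': 'AT', 'K': 'GT'}
lst = ['AWK', 'AW']

expanded = {word: expand(word, var) for word in lst}
print(expanded)
```

A character missing from var would raise a KeyError here; whether to skip it or keep it unchanged is a choice the question does not settle.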
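It is hard to tell what you are asking, but I would guess this (the * unpacks the values into separate arguments; note that it ignores lst and relies on the order of the dictionary):
itertools.product(*var.values())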

Sort python dictionaries with 'value' as primary key and 'key' as secondary

What I am trying to do here is to display characters according to number of occurrences in a string in descending order. If two characters share the same number of occurrences, then they should be displayed as per the alphabetic order.
So given a string, 'abaddbccdd', what I want to display as output is:
['d', 'a', 'b', 'c']
Here is what I have done so far:
>>> from collections import Counter
>>> s = 'abaddbccdd'
>>> b = Counter(s)
>>> b
Counter({'d': 4, 'a': 2, 'c': 2, 'b': 2})
>>> b.keys()
['a', 'c', 'b', 'd']
>>> c = sorted(b, key=b.get, reverse=True)
>>> c
['d', 'a', 'c', 'b']
>>>
But how to handle the second part? 'a', 'b' and 'c' all appear in the text exactly twice and are out of order. What is the best way (hopefully shortest too) to do this?
This can be done in a single sorting pass. The trick is to do an ascending sort with the count numbers negated as the primary sorting key and the dictionary's key strings as the secondary sorting key.
b = {'d': 4, 'a': 2, 'c': 2, 'b': 2}
c = sorted(b, key=lambda k:(-b[k], k))
print(c)
output
['d', 'a', 'b', 'c']
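Putting this together with the Counter from the question, a minimal end-to-end sketch might look like the following. The function name chars_by_frequency is an assumption for illustration.

```python
from collections import Counter

def chars_by_frequency(text):
    """Characters ordered by descending count, ties broken alphabetically."""
    counts = Counter(text)
    # Negating the count makes an ascending sort put the largest counts first,
    # while the character itself is still compared in ascending order.
    return sorted(counts, key=lambda ch: (-counts[ch], ch))

ordered = chars_by_frequency('abaddbccdd')
print(ordered)
```

Negation works here because the primary key is numeric; for a non-numeric primary key, the two-pass stable sort is the more general option.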
The shortest way is:
>>> sorted(sorted(b), key=b.get, reverse=True)
['d', 'a', 'b', 'c']
So sort the sequence once in its natural order (the key order) then reverse sort on the values.
Note this won't have the fastest running time if the dictionary is large as it performs two full sorts, but in practice it is probably simplest because you want the values descending and the keys ascending.
The reason it works is that Python guarantees the sort to be stable. That means when the keys are equal the original order is preserved, so if you sort repeatedly from the last key back to the first you will get the desired result. Also reverse=True is different than just reversing the output as it also respects stability and only reverses the result where the keys are different.
You can use a lambda function:
>>> sorted(b, key=lambda char: (b.get(char), 1-ord(char)), reverse=True)
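If you are already using a Counter object, there is the Counter.most_common method. This returns a list of (element, count) pairs ordered from highest to lowest frequency. Elements with equal counts keep the order in which they were first encountered (as documented for Python 3.7 and later), so the ties come out alphabetical here only because 'a', 'b' and 'c' first appear in that order in the string.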
>>> b.most_common()
[('d', 4), ('a', 2), ('b', 2), ('c', 2)]
