Merging a list and a tuple in list - python

Suppose I have the following lists:
l1 = [['a','b','c'],['x','y','z'],['i','j','k']]
l2 = [(0,0.1),(0,0.2),(2,0.3),(2,0.4),(1,0.5),(0,0.6)]
I want to merge the two lists, matching the keys of l2 against the indices of l1, so that I get:
[[0.1,['a','b','c']],
 [0.2,['a','b','c']],
 [0.3,['i','j','k']],
 [0.4,['i','j','k']],
 [0.5,['x','y','z']],
 [0.6,['a','b','c']]]
How can I do this, given that the merge is not on keys in both lists?

A simple comprehension will do:
[[v, l1[i]] for i, v in l2]
# [[0.1, ['a', 'b', 'c']],
# [0.2, ['a', 'b', 'c']],
# [0.3, ['i', 'j', 'k']],
# [0.4, ['i', 'j', 'k']],
# [0.5, ['x', 'y', 'z']],
# [0.6, ['a', 'b', 'c']]]

A list comprehension with some indexing will do it nicely:
l1 = [['a', 'b', 'c'], ['x', 'y', 'z'], ['i', 'j', 'k']]
l2 = [(0, 0.1), (0, 0.2), (2, 0.3), (2, 0.4), (1, 0.5), (0, 0.6)]
result = [[value, l1[idx]] for idx, value in l2]
The equivalent for loop:
result = []
for idx, value in l2:
    l1_item = l1[idx]
    result.append([value, l1_item])
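Note that both versions put references to the rows of l1 into the result, not copies, so entries that come from the same index share one list object. A small sketch showing this, plus a copying variant for when the rows must be independent (the list(...) copy is an illustration, not part of the answers above):

```python
l1 = [['a', 'b', 'c'], ['x', 'y', 'z'], ['i', 'j', 'k']]
l2 = [(0, 0.1), (0, 0.2), (2, 0.3), (2, 0.4), (1, 0.5), (0, 0.6)]

# Shares the row objects of l1: result rows built from index 0 hold the same list
shared = [[value, l1[idx]] for idx, value in l2]
print(shared[0][1] is shared[1][1])

# list(...) makes a shallow copy of each row, so rows can be changed independently
copied = [[value, list(l1[idx])] for idx, value in l2]
copied[0][1].append('extra')
print(l1[0])  # l1 itself is untouched
```

With the shared version, appending to shared[0][1] would also change l1[0] and every other result row built from index 0.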

l1 = [['a','b','c'],['x','y','z'],['i','j','k']]
l2 = [(0,0.1),(0,0.2),(2,0.3),(2,0.4),(1,0.5),(0,0.6)]
print([[v,l1[k]] for k,v in l2])

Related

Similarity score between 2 nested lists of strings

I have 2 nested lists and I want to compare all of the nested lists in one list with all of the nested lists in the other.
Example Data:
list1 = [['a', 'b', 'c', 'x'], ['d', 'e', 'f', 'p'], ['g', 'h', 'i']]
list2 = [['g', 'a', 'c'], ['d', 'h', 'b'], ['e', 'f', 'x', 't', 'q']]
For the comparison I want to compare each nested list from one list to each nested list in the other list. The comparison score would be calculated by:
number of overlaps / length of the longer sublist
For the first nested list in list1 the scores with list2 would then be:
2/4 -> 0.5
1/4 -> 0.25
1/5 -> 0.2
It seems like the best way to do this would then be to create a matrix but I'm not sure if that is the best option or how to calculate the score.
I tried calculating the scores like this but the numbers didn't make sense:
overlaps = []
for sublist in list1:
    for sublist2 in list2:
        comb_list = sublist + sublist2
        num_overlap = len(set(comb_list))
        num_long = max(len(sublist), len(sublist2))
        overlap = num_overlap/num_long
        overlaps.append(overlap)
Solution
The issue with your original approach was that you were calculating the length of set(sublist + sublist2), which is the number of distinct elements in the union of the two sublists. What you want is the length of set(sublist) & set(sublist2), i.e. the size of their intersection:
A = [['a', 'b', 'c', 'x'], ['d', 'e', 'f', 'p'], ['g', 'h', 'i']]
B = [['g', 'a', 'c'], ['d', 'h', 'b'], ['e', 'f', 'x', 't', 'q']]
overlaps = []
for a in A:
    for b in B:
        num_overlap = len(set(a) & set(b))
        num_long = max(len(a), len(b))
        overlaps.append(num_overlap / num_long)
Output:
[0.5, 0.25, 0.2, 0.0, 0.25, 0.4, 0.3333333333333333, 0.3333333333333333, 0.0]
Comprehension version
Since SO likes comprehensions so much:
overlaps = [len(set(a) & set(b)) / max(len(a), len(b)) for a in A for b in B]
What went wrong in original approach
For a quick example of why your original approach didn't work using the first two comparisons:
>>> a = ['a', 'b', 'c', 'x']
>>> b = ['g', 'a', 'c']
>>> comb_list = a + b
>>> comb_list
['a', 'b', 'c', 'x', 'g', 'a', 'c']
>>> set(comb_list)
{'c', 'g', 'a', 'x', 'b'}
Workaround for original approach
>>> len(comb_list) - len(set(comb_list))
2
But that's more work than what I propose in my solution above.
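The workaround counts the items lost when the concatenation is turned into a set, so it only equals the intersection size when neither sublist contains duplicates of its own. A quick check with made-up sublists (hypothetical data, not from the question; the two helper names are illustrative):

```python
def overlap_by_difference(a, b):
    # Workaround: number of elements lost when the concatenation becomes a set
    comb_list = a + b
    return len(comb_list) - len(set(comb_list))

def overlap_by_intersection(a, b):
    # Number of distinct elements present in both sublists
    return len(set(a) & set(b))

# No repeats inside either sublist: both give 2
print(overlap_by_difference(['a', 'b', 'c', 'x'], ['g', 'a', 'c']))
print(overlap_by_intersection(['a', 'b', 'c', 'x'], ['g', 'a', 'c']))

# 'a' repeated inside the first sublist: the workaround reports 1, the intersection 0
print(overlap_by_difference(['a', 'a', 'b'], ['c']))
print(overlap_by_intersection(['a', 'a', 'b'], ['c']))
```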
Dictionary version
Since a list of scores without context might be difficult to interpret, let's use a dictionary:
scores = {}
for i, a in enumerate(A, 1):
    for j, b in enumerate(B, 1):
        num_overlap = len(set(a) & set(b))
        num_long = max(len(a), len(b))
        scores[f"a{i}b{j}"] = num_overlap / num_long
for key, val in scores.items():
    print(f"{key}: {val}")
Output:
a1b1: 0.5
a1b2: 0.25
a1b3: 0.2
a2b1: 0.0
a2b2: 0.25
a2b3: 0.4
a3b1: 0.3333333333333333
a3b2: 0.3333333333333333
a3b3: 0.0
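The question also mentioned a matrix. A nested comprehension gives one row per sublist of A and one column per sublist of B, so matrix[i][j] is the score of A[i] against B[j]. A sketch using the same scoring rule (the helper name score is illustrative):

```python
A = [['a', 'b', 'c', 'x'], ['d', 'e', 'f', 'p'], ['g', 'h', 'i']]
B = [['g', 'a', 'c'], ['d', 'h', 'b'], ['e', 'f', 'x', 't', 'q']]

def score(a, b):
    # Overlap count divided by the length of the longer sublist
    return len(set(a) & set(b)) / max(len(a), len(b))

# Row i holds the scores of A[i] against every sublist of B
matrix = [[score(a, b) for b in B] for a in A]
for row in matrix:
    print([round(x, 2) for x in row])
```

The rows hold the same nine values as the flat list above, just grouped by the sublist of A.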

Transition probability matrix

I have the following array:
a=[['A', 'B'],
['B', 'B'],
['B', 'C'],
['C', 'B'],
['B', 'A'],
['A', 'D'],
['D', 'D'],
['D', 'A'],
['A', 'B'],
['B', 'A'],
['A', 'D']]
I wish to make a transition probability matrix of this, such that I get:
[[P_AA,P_AB,P_AC,P_AD],
[P_BA,P_BB,P_BC,P_BD],
[P_CA,P_CB,P_CC,P_CD],
[P_DA,P_DB,P_DC,P_DD]]
(Above is for illustration), where P_AA counts how many ["A","A"] are in the array a and so on divided by P_AA+P_AB+P_AC+P_AD. I started by using Counter:
from collections import Counter
Counter(tuple(x) for x in a)
which correctly counts the pairs in the array:
Counter({('A', 'B'): 2,
('B', 'B'): 1,
('B', 'C'): 1,
('C', 'B'): 1,
('B', 'A'): 2,
('A', 'D'): 2,
('D', 'D'): 1,
('D', 'A'): 1})
So the matrix should be:
[[0,2/4,0,2/4],[2/4,1/4,1/4,0],[0,1,0,0],[1/2,0,0,1/2]]
A pandas-based solution:
import pandas as pd
from collections import Counter
# Create a raw transition matrix
matrix = pd.Series(Counter(map(tuple, a))).unstack().fillna(0)
# Normalize the rows
matrix.divide(matrix.sum(axis=1),axis=0)
# A B C D
#A 0.0 0.50 0.00 0.5
#B 0.5 0.25 0.25 0.0
#C 0.0 1.00 0.00 0.0
#D 0.5 0.00 0.00 0.5
If the number of elements is small, simply looping over all elements should be no problem:
import numpy as np
a = [['A', 'B'], ['B', 'B'], ['B', 'C'], ['C', 'B'], ['B', 'A'],
     ['A', 'D'], ['D', 'D'], ['D', 'A'], ['A', 'B'], ['B', 'A'], ['A', 'D']]
a = np.asarray(a)
elems = np.unique(a)
dim = len(elems)
P = np.zeros((dim, dim))
for j, x_in in enumerate(elems):
    for k, x_out in enumerate(elems):
        # count the rows of a that equal [x_in, x_out]
        P[j, k] = (a == [x_in, x_out]).all(axis=1).sum()
    # normalize row j unless x_in has no outgoing transitions
    if P[j, :].sum() > 0:
        P[j, :] /= P[j, :].sum()
Output:
array([[0. , 0.5 , 0. , 0.5 ],
[0.5 , 0.25, 0.25, 0. ],
[0. , 1. , 0. , 0. ],
[0.5 , 0. , 0. , 0.5 ]])
But you could also use the counter with a pre-allocated transition matrix, map the elements to indices, assign the counts as values, and normalize (last two steps just like I did).
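The counter-based variant described above might look like this. It is a sketch that assumes numpy, as in the loop version, and uses a dictionary to map each state to its row/column index; the zero-row guard via np.divide(..., where=...) is an addition, not part of the original answer.

```python
from collections import Counter

import numpy as np

a = [['A', 'B'], ['B', 'B'], ['B', 'C'], ['C', 'B'], ['B', 'A'],
     ['A', 'D'], ['D', 'D'], ['D', 'A'], ['A', 'B'], ['B', 'A'], ['A', 'D']]

counts = Counter(map(tuple, a))
elems = sorted({x for pair in a for x in pair})  # ['A', 'B', 'C', 'D']
index = {e: i for i, e in enumerate(elems)}      # state -> row/column index

# Pre-allocate the matrix and write each count into its cell
P = np.zeros((len(elems), len(elems)))
for (x_in, x_out), n in counts.items():
    P[index[x_in], index[x_out]] = n

# Normalize each row by its sum; rows with no outgoing transitions stay zero
row_sums = P.sum(axis=1, keepdims=True)
P = np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)
print(P)
```

Only the pairs that actually occur are visited, instead of comparing the whole array against every possible pair.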
from collections import Counter
a = [['A', 'B'],
['B', 'B'],
['B', 'C'],
['C', 'B'],
['B', 'A'],
['A', 'D'],
['D', 'D'],
['D', 'A'],
['A', 'B'],
['B', 'A'],
['A', 'D']]
counts = Counter(map(tuple, a))
letters = 'ABCD'
p = []
for letter in letters:
    d = sum(v for k, v in counts.items() if k[0] == letter)
    p.append([counts.get((letter, x), 0) / d for x in letters])
print(p)
Output:
[[0.0, 0.5, 0.0, 0.5],
[0.5, 0.25, 0.25, 0.0],
[0.0, 1.0, 0.0, 0.0],
[0.5, 0.0, 0.0, 0.5]]
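One caveat with this version: if a letter never appears as the first element of a pair (for example a terminal state), d is 0 and the division raises ZeroDivisionError. A guarded sketch; the function name transition_matrix and the extra ['D', 'E'] pair are hypothetical, added only to produce such a state:

```python
from collections import Counter

def transition_matrix(pairs, states):
    # Count each (from, to) transition
    counts = Counter(map(tuple, pairs))
    matrix = []
    for s in states:
        # Total number of transitions leaving state s
        total = sum(v for (src, _), v in counts.items() if src == s)
        if total == 0:
            # No outgoing transitions: keep a row of zeros instead of dividing by 0
            matrix.append([0.0] * len(states))
        else:
            matrix.append([counts.get((s, t), 0) / total for t in states])
    return matrix

a = [['A', 'B'], ['B', 'B'], ['B', 'C'], ['C', 'B'], ['B', 'A'],
     ['A', 'D'], ['D', 'D'], ['D', 'A'], ['A', 'B'], ['B', 'A'], ['A', 'D'],
     ['D', 'E']]  # hypothetical extra pair: 'E' only ever appears as a destination

for row in transition_matrix(a, 'ABCDE'):
    print(row)
```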
This is a problem that fits itertools and Counter perfectly. Take a look at the following (see note 1 below):
l = [['A', 'B'],
['B', 'B'],
['B', 'C'],
['C', 'B'],
['B', 'A'],
['A', 'D'],
['D', 'D'],
['D', 'A'],
['A', 'B'],
['B', 'A'],
['A', 'D']]
from collections import Counter
from itertools import product, groupby
unique_elements = set(x for y in l for x in y) # -> {'B', 'C', 'A', 'D'}
appearances = Counter(tuple(x) for x in l)
# generating all possible combinations to get the probabilities
all_combinations = sorted(list(product(unique_elements, unique_elements)))
# calculating and arranging the probabilities
table = []
for i, g in groupby(all_combinations, key=lambda x: x[0]):
    g = list(g)
    local_sum = sum(appearances.get(y, 0) for y in g)
    table.append([appearances.get(x, 0) / local_sum for x in g])
# [[0.0, 0.5, 0.0, 0.5], [0.5, 0.25, 0.25, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.5]]
Note 1: I am assuming you have a mistake in the formulation of your question: "...where P_AA counts how many ["A","A"] are in the array a and so on divided by P_AA + P_AB + P_AC + P_AD...". You mean to divide by something else, right?

How to use *args to combine multiple lists?

I currently have this function, which I'd like to make scalable to take in more lists. In other words, I'd like to use this function whether I have to combine 2 lists or 10 lists.
l1 = [['a','b','c'],['d','e','f']]
l2 = [['A','B','C'],['D','E','F']]
[L1 + L2 for L1, L2 in zip(l1, l2)]
The result should be:
[['a','b','c','A','B','C'],['d','e','f','D','E','F']]
Use:
[sum(l, []) for l in zip(*lists)]
Demo:
>>> l1 = [['a', 'b', 'c'], ['d', 'e', 'f']]
>>> l2 = [['A', 'B', 'C'], ['D', 'E', 'F']]
>>> lists = (l1, l2)
>>> [sum(l, []) for l in zip(*lists)]
[['a', 'b', 'c', 'A', 'B', 'C'], ['d', 'e', 'f', 'D', 'E', 'F']]
or, as a function:
def combine_lists(*lists):
    return [sum(l, []) for l in zip(*lists)]
combine_lists(l1, l2)
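sum(l, []) builds a new list at every addition, so its cost grows roughly quadratically with the number of lists being joined. For many lists, itertools.chain.from_iterable flattens each zipped group in a single pass. A sketch (the name combine_lists_chain and the third list l3 are illustrative):

```python
from itertools import chain

def combine_lists_chain(*lists):
    # zip(*lists) groups the i-th sublist of every input list;
    # chain.from_iterable concatenates each group without intermediate lists
    return [list(chain.from_iterable(group)) for group in zip(*lists)]

l1 = [['a', 'b', 'c'], ['d', 'e', 'f']]
l2 = [['A', 'B', 'C'], ['D', 'E', 'F']]
l3 = [[1, 2], [3, 4]]
print(combine_lists_chain(l1, l2, l3))
```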

append list of values to sublists

How do you append each item of one list to each sublist of another list?
a = [['a','b','c'],['d','e','f'],['g','h','i']]
b = [1,2,3]
Result should be:
[['a','b','c',1],['d','e','f',2],['g','h','i',3]]
Keep in mind that I want to do this to a very large list, so efficiency and speed are important.
I've tried:
for sublist,value in a,b:
    sublist.append(value)
It raises 'ValueError: too many values to unpack'.
Perhaps an index or an iterator could work, but I'm not sure how to apply one here.
a = [['a','b','c'],['d','e','f'],['g','h','i']]
b = [1,2,3]
for ele_a, ele_b in zip(a, b):
    ele_a.append(ele_b)
Result:
>>> a
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
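zip stops silently at the shorter input, so if a and b ever differ in length, the extra sublists are left untouched with no error. On Python 3.10 and later, zip(..., strict=True) raises ValueError instead; a version-independent sketch with an explicit length check (the helper name append_values is hypothetical):

```python
def append_values(sublists, values):
    # zip() would silently stop at the shorter input, so check lengths first
    if len(sublists) != len(values):
        raise ValueError(f"{len(sublists)} sublists but {len(values)} values")
    for sublist, value in zip(sublists, values):
        sublist.append(value)  # modifies the sublists in place

a = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
append_values(a, [1, 2, 3])
print(a)

try:
    append_values([['x']], [1, 2])
except ValueError as exc:
    print(exc)
```

The check costs two len() calls, so it does not affect the speed of the in-place loop on large lists.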
The reason your original solution did not work is that a,b does create a tuple, but not the one you want:
>>> z = a,b
>>> type(z)
<class 'tuple'>
>>> z
([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']], [1, 2, 3])
>>> len(z[0])
3
>>> for ele in z:
...     print(ele)
...
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
[1, 2, 3]
In your original code, the first item of this tuple (a list of 3 sublists) is unpacked into two names, sublist and value, hence the 'ValueError: too many values to unpack'. Using zip gives you what you want:
>>> list(zip(a, b))
[(['a', 'b', 'c'], 1), (['d', 'e', 'f'], 2), (['g', 'h', 'i'], 3)]
Here is a simple solution:
a = [['a','b','c'],['d','e','f'],['g','h','i']]
b = [1,2,3]
for i in range(len(a)):
    a[i].append(b[i])
print(a)
One option, using list comprehension:
a = [a[i] + [b[i]] for i in range(len(a))]
Just loop through the sublists, adding one item at a time:
for i in range(len(listA)):
    listA[i].append(listB[i])
You can do:
>>> a = [['a','b','c'],['d','e','f'],['g','h','i']]
>>> b = [1,2,3]
>>> [l1+[l2] for l1, l2 in zip(a,b)]
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]
You can also abuse a side effect of list comprehensions to get this done in place:
>>> [l1.append(l2) for l1, l2 in zip(a,b)]
[None, None, None]
>>> a
[['a', 'b', 'c', 1], ['d', 'e', 'f', 2], ['g', 'h', 'i', 3]]

Sort List in Python by two other lists

My question is very similar to these two links 1 and 2:
I have three different lists. I want to sort List1 based on List2 (in ascending order). However, I have repeats in List2. I then want to sort these repeats by List3 (in descending order). Confusing enough?
What I have:
List1 = ['a', 'b', 'c', 'd', 'e']
List2 = [4, 2, 3, 2, 4]
List3 = [0.1, 0.8, 0.3, 0.6, 0.4]
What I want:
new_List1 = ['b', 'd', 'c', 'e', 'a']
'b' comes before 'd' since 0.8 > 0.6. 'e' comes before 'a' since 0.4 > 0.1.
I think you should be able to do this by:
paired_sorted = sorted(zip(List2,List3,List1),key = lambda x: (x[0],-x[1]))
l2,l3,l1 = zip(*paired_sorted)
In action:
>>> List1 = ['a', 'b', 'c', 'd', 'e']
>>> List2 = [4, 2, 3, 2, 4]
>>> List3 = [0.1, 0.8, 0.3, 0.6, 0.4]
>>> paired_sorted = sorted(zip(List2,List3,List1),key = lambda x: (x[0],-x[1]))
>>> l2,l3,l1 = zip(*paired_sorted)
>>> print(l1)
('b', 'd', 'c', 'e', 'a')
Here's how it works. First, zip pairs up corresponding elements from your lists. We then sort those triples by the List2 item first and the negated List3 item second. Finally, zip with argument unpacking pulls the List1 elements back out. That gives a tuple; you could use a list comprehension instead if you want a list at the end of the day.
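A sketch of that last step done with a list comprehension, so the result is a list rather than a tuple:

```python
List1 = ['a', 'b', 'c', 'd', 'e']
List2 = [4, 2, 3, 2, 4]
List3 = [0.1, 0.8, 0.3, 0.6, 0.4]

# Sort the (List2, List3, List1) triples: ascending List2, then descending List3
paired_sorted = sorted(zip(List2, List3, List1), key=lambda x: (x[0], -x[1]))

# Keep only the List1 element of each triple
new_List1 = [item for _, _, item in paired_sorted]
print(new_List1)
```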
This gets a little tougher if you can't easily negate the values in List3 (e.g. if they're strings). In that case you need to do the sorting in 2 passes:
paired = zip(List2,List3,List1)
rev_sorted = sorted(paired,reverse=True,key=lambda x: x[1]) #"minor" sort first
paired_sorted = sorted(rev_sorted,key=lambda x:x[0]) #"major" sort last
l2,l3,l1 = zip(*paired_sorted)
(You could use operator.itemgetter(1) in place of lambda x:x[1] above if you prefer.) This works because Python's sort is stable: it doesn't reorder elements that compare equal.
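A sketch of the two-pass approach with strings as the tie-breaker. The List3 values here are hypothetical, chosen so that negation is impossible; operator.itemgetter replaces the lambdas:

```python
from operator import itemgetter

List1 = ['a', 'b', 'c', 'd', 'e']
List2 = [4, 2, 3, 2, 4]
List3 = ['apple', 'pear', 'fig', 'kiwi', 'plum']  # hypothetical string keys

paired = list(zip(List2, List3, List1))
# Pass 1 ("minor" key): List3 descending
rev_sorted = sorted(paired, key=itemgetter(1), reverse=True)
# Pass 2 ("major" key): List2 ascending; stability keeps pass-1 order among ties
paired_sorted = sorted(rev_sorted, key=itemgetter(0))
new_List1 = [item for _, _, item in paired_sorted]
print(new_List1)
```

Within the List2 == 2 group, 'pear' sorts after 'kiwi', so descending order puts 'b' before 'd'.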
This requires a decorate-sort-undecorate step:
decorated = list(zip(List1, List2, List3))
decorated.sort(key=lambda v: (v[1], -v[2]))
new_list1 = [v[0] for v in decorated]
or, combined into one line:
new_list1 = [v[0] for v in sorted(zip(List1, List2, List3), key=lambda v: (v[1], -v[2]))]
Output:
>>> List1 = ['a', 'b', 'c', 'd', 'e']
>>> List2 = [4, 2, 3, 2, 4]
>>> List3 = [0.1, 0.8, 0.3, 0.6, 0.4]
>>> new_list1 = [v[0] for v in sorted(zip(List1, List2, List3), key=lambda v: (v[1], -v[2]))]
>>> new_list1
['b', 'd', 'c', 'e', 'a']
>>> [v for i, v in sorted(enumerate(List1), key=lambda i_v: (List2[i_v[0]], -List3[i_v[0]]))]
['b', 'd', 'c', 'e', 'a']
This sorts the index/value pairs by using the indices to get the corresponding values from the other lists to use in the key function used for ordering by sorted(), and then extracts just the values using a list comprehension.
Here is a shorter alternative that sorts just the indices and then uses those indices to grab the values from List1:
>>> [List1[i] for i in sorted(range(len(List1)), key=lambda i: (List2[i], -List3[i]))]
['b', 'd', 'c', 'e', 'a']
