I have the following array:
a=[['A', 'B'],
['B', 'B'],
['B', 'C'],
['C', 'B'],
['B', 'A'],
['A', 'D'],
['D', 'D'],
['D', 'A'],
['A', 'B'],
['B', 'A'],
['A', 'D']]
I wish to make a transition probability matrix of this, such that I get:
[[P_AA,P_AB,P_AC,P_AD],
[P_BA,P_BB,P_BC,P_BD],
[P_CA,P_CB,P_CC,P_CD],
[P_DA,P_DB,P_DC,P_DD]]
(The above is for illustration.) Here P_AA counts how many ["A","A"] are in the array a, and so on, divided by P_AA+P_AB+P_AC+P_AD. I have started by using the counter:
from collections import Counter
Counter(tuple(x) for x in a)
which counts the elements of array correctly as:
Counter({('A', 'B'): 2,
('B', 'B'): 1,
('B', 'C'): 1,
('C', 'B'): 1,
('B', 'A'): 2,
('A', 'D'): 2,
('D', 'D'): 1,
('D', 'A'): 1})
So the matrix shall be,
[[0,2/5,0,2/5],[2/4,1/4,1/4,0],[0,1,0,0],[1/2,0,0,1/2]]
A pandas-based solution:
import pandas as pd
from collections import Counter
# Create a raw transition matrix
matrix = pd.Series(Counter(map(tuple, a))).unstack().fillna(0)
# Normalize the rows
matrix.divide(matrix.sum(axis=1),axis=0)
# A B C D
#A 0.0 0.50 0.00 0.5
#B 0.5 0.25 0.25 0.0
#C 0.0 1.00 0.00 0.0
#D 0.5 0.00 0.00 0.5
If the number of elements is small, simply looping over all elements should be no problem:
import numpy as np
a = [['A', 'B'], ['B', 'B'], ['B', 'C'], ['C', 'B'], ['B', 'A'],
     ['A', 'D'], ['D', 'D'], ['D', 'A'], ['A', 'B'], ['B', 'A'], ['A', 'D']]
a = np.asarray(a)
elems = np.unique(a)  # sorted unique states: ['A', 'B', 'C', 'D']
dim = len(elems)
P = np.zeros((dim, dim))
for j, x_in in enumerate(elems):
    for k, x_out in enumerate(elems):
        # count the rows of a that equal [x_in, x_out]
        P[j,k] = (a == [x_in, x_out]).all(axis=1).sum()
    # normalize the row, skipping states with no outgoing transitions
    if P[j,:].sum() > 0:
        P[j,:] /= P[j,:].sum()
Output:
array([[0. , 0.5 , 0. , 0.5 ],
[0.5 , 0.25, 0.25, 0. ],
[0. , 1. , 0. , 0. ],
[0.5 , 0. , 0. , 0.5 ]])
But you could also use the counter with a pre-allocated transition matrix, map the elements to indices, assign the counts as values, and normalize (last two steps just like I did).
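The steps just described (pre-allocate, map elements to indices, assign the counts, normalize) might be sketched as follows. This is only a minimal sketch: it uses plain nested lists rather than a NumPy array, and the names states and index are assumptions made for illustration.

```python
from collections import Counter

a = [['A', 'B'], ['B', 'B'], ['B', 'C'], ['C', 'B'], ['B', 'A'],
     ['A', 'D'], ['D', 'D'], ['D', 'A'], ['A', 'B'], ['B', 'A'], ['A', 'D']]

# Map each state to a row/column index (sorted for a stable order).
states = sorted({s for pair in a for s in pair})
index = {s: i for i, s in enumerate(states)}

# Pre-allocate the matrix, then write the counts into it.
P = [[0.0] * len(states) for _ in states]
for (src, dst), n in Counter(map(tuple, a)).items():
    P[index[src]][index[dst]] = n

# Normalize each row; rows with no outgoing transitions stay all-zero.
for row in P:
    total = sum(row)
    if total > 0:
        row[:] = [v / total for v in row]

print(P)
```

The Counter is only iterated once, so the nested loop over all state pairs is avoided.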
from collections import Counter
a = [['A', 'B'],
['B', 'B'],
['B', 'C'],
['C', 'B'],
['B', 'A'],
['A', 'D'],
['D', 'D'],
['D', 'A'],
['A', 'B'],
['B', 'A'],
['A', 'D']]
counts = Counter(map(tuple, a))
letters = 'ABCD'
p = []
for letter in letters:
    # d is the number of pairs that start with this letter
    d = sum(v for k, v in counts.items() if k[0] == letter)
    p.append([counts.get((letter, x), 0) / d for x in letters])
print(p)
Output:
[[0.0, 0.5, 0.0, 0.5],
[0.5, 0.25, 0.25, 0.0],
[0.0, 1.0, 0.0, 0.0],
[0.5, 0.0, 0.0, 0.5]]
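One thing to watch with this approach: if a letter never appears as the first element of a pair, the row total is 0 and the division fails. A hedged sketch of a guarded variant follows; transition_matrix is a hypothetical helper name, and returning an all-zero row for such a state is an assumption about what is wanted.

```python
from collections import Counter

def transition_matrix(pairs, states):
    """Row-normalized transition counts; states with no outgoing pairs get a zero row."""
    counts = Counter(map(tuple, pairs))
    matrix = []
    for src in states:
        # A Counter returns 0 for missing keys, so no .get() is needed.
        total = sum(counts[(src, dst)] for dst in states)
        matrix.append([counts[(src, dst)] / total if total else 0.0
                       for dst in states])
    return matrix

# 'C' never occurs as a source here, so its row is all zeros.
pairs = [['A', 'B'], ['B', 'A'], ['A', 'A']]
print(transition_matrix(pairs, 'ABC'))
```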
This is a problem that fits itertools and Counter perfectly. Take a look at the following [1]:
l = [['A', 'B'],
['B', 'B'],
['B', 'C'],
['C', 'B'],
['B', 'A'],
['A', 'D'],
['D', 'D'],
['D', 'A'],
['A', 'B'],
['B', 'A'],
['A', 'D']]
from collections import Counter
from itertools import product, groupby
unique_elements = set(x for y in l for x in y) # -> {'B', 'C', 'A', 'D'}
appearances = Counter(tuple(x) for x in l)
# generating all possible combinations to get the probabilities
all_combinations = sorted(list(product(unique_elements, unique_elements)))
# calculating and arranging the probabilities
table = []
for i, g in groupby(all_combinations, key=lambda x: x[0]):
    g = list(g)
    # total number of transitions leaving state i
    local_sum = sum(appearances.get(y, 0) for y in g)
    table.append([appearances.get(x, 0) / local_sum for x in g])
# [[0.0, 0.5, 0.0, 0.5], [0.5, 0.25, 0.25, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.5]]
[1] I am assuming you have a mistake in the formulation of your question: "...where P_AA counts how many ["A","A"] are in the array a and so on divided by P_AA + P_AB + P_AC + P_AD...". You mean to divide by something else, right?
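For reference, the normalization that produces the matrices shown in the answers on this page can be written as below. The symbol N_ij, the number of [i, j] pairs in the array, is notation assumed here for illustration and does not appear in the question.

```latex
P_{ij} = \frac{N_{ij}}{\sum_{k} N_{ik}},
\qquad \text{e.g.}\quad
P_{AB} = \frac{N_{AB}}{N_{AA} + N_{AB} + N_{AC} + N_{AD}}
       = \frac{2}{0 + 2 + 0 + 2} = \frac{1}{2}
```

That is, each count is divided by the sum of the counts in its row, not by a sum of probabilities.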
Related
Suppose I have the following lists:
l1 = [['a','b','c'],['x','y','z'],['i','j','k']]
l2 = [(0,0.1),(0,0.2),(2,0.3),(2,0.4),(1,0.5),(0,0.6)]
I want to merge the two lists, using the first element of each tuple in l2 as an index into l1, such that I get:
[[0.1,['a','b','c']],
[0.2,['a','b','c']],
[0.3,['i','j','k']],
[0.4,['i','j','k']],
[0.5,['x','y','z']],
[0.6,['a','b','c']]]
How can I do this, given that the merge is on a key in one list and a position in the other?
A simple comprehension will do:
[[v, l1[i]] for i, v in l2]
# [[0.1, ['a', 'b', 'c']],
# [0.2, ['a', 'b', 'c']],
# [0.3, ['i', 'j', 'k']],
# [0.4, ['i', 'j', 'k']],
# [0.5, ['x', 'y', 'z']],
# [0.6, ['a', 'b', 'c']]]
A list comprehension with some indexing will do it nicely:
l1 = [['a', 'b', 'c'], ['x', 'y', 'z'], ['i', 'j', 'k']]
l2 = [(0, 0.1), (0, 0.2), (2, 0.3), (2, 0.4), (1, 0.5), (0, 0.6)]
result = [[value, l1[idx]] for idx, value in l2]
The equivalent for loop:
result = []
for idx, value in l2:
    l1_item = l1[idx]
    result.append([value, l1_item])
l1 = [['a','b','c'],['x','y','z'],['i','j','k']]
l2 = [(0,0.1),(0,0.2),(2,0.3),(2,0.4),(1,0.5),(0,0.6)]
print([[v,l1[k]] for k,v in l2])
I have 2 nested lists and I want to compare all of the nested lists in one list with all of the nested lists in the other.
Example Data:
list1 = [['a', 'b', 'c', 'x'], ['d', 'e', 'f', 'p'], ['g', 'h', 'i']]
list2 = [['g', 'a', 'c'], ['d', 'h', 'b'], ['e', 'f', 'x', 't', 'q']]
For the comparison I want to compare each nested list from one list to each nested list in the other list. The comparison score would be calculated by:
number of overlaps / length of the longer sublist
For the first nested list in list1 the scores with list2 would then be:
2/4 -> 0.5
1/4 -> 0.25
1/5 -> 0.2
It seems like the best way to do this would then be to create a matrix but I'm not sure if that is the best option or how to calculate the score.
I tried calculating the scores like this but the numbers didn't make sense:
overlaps = []
for sublist in list1:
    for sublist2 in list2:
        comb_list = sublist + sublist2
        num_overlap = len(set(comb_list))
        num_long = max(len(sublist), len(sublist2))
        overlap = num_overlap/num_long
        overlaps.append(overlap)
Solution
The issue with your original approach is that you calculated the length of set(sublist + sublist2), which is the union of the two sublists, when what you wanted was the length of set(sublist) & set(sublist2), the set intersection:
A = [['a', 'b', 'c', 'x'], ['d', 'e', 'f', 'p'], ['g', 'h', 'i']]
B = [['g', 'a', 'c'], ['d', 'h', 'b'], ['e', 'f', 'x', 't', 'q']]
overlaps = []
for a in A:
    for b in B:
        num_overlap = len(set(a) & set(b))
        num_long = max(len(a), len(b))
        overlaps.append(num_overlap / num_long)
Output:
[0.5, 0.25, 0.2, 0.0, 0.25, 0.4, 0.3333333333333333, 0.3333333333333333, 0.0]
Comprehension version
Since SO likes comprehensions so much:
overlaps = [len(set(a) & set(b)) / max(len(a), len(b)) for a in A for b in B]
What went wrong in original approach
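For a quick example of why your original approach didn't work, take the first comparison. The set of the concatenated list holds every distinct element of both sublists (5 here), not the 2 elements they share: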
>>> a = ['a', 'b', 'c', 'x']
>>> b = ['g', 'a', 'c']
>>> comb_list = a + b
>>> comb_list
['a', 'b', 'c', 'x', 'g', 'a', 'c']
>>> set(comb_list)
{'c', 'g', 'a', 'x', 'b'}
Workaround for original approach
>>> len(comb_list) - len(set(comb_list))
2
But that's more work than what I propose in my solution above.
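One caveat about the workaround, sketched here with made-up data: subtracting the set length counts every repeated element in the concatenated list, so it only matches the intersection size when neither sublist contains duplicates of its own.

```python
# Illustrative data (not from the question): 'a' is repeated inside one sublist.
a = ['a', 'a', 'b']
b = ['c']

comb_list = a + b

# The workaround counts the repeated 'a' as an overlap...
workaround = len(comb_list) - len(set(comb_list))

# ...while the set intersection correctly finds nothing shared.
intersection = len(set(a) & set(b))

print(workaround, intersection)
```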
Dictionary version
Since a list of scores without context might be difficult to interpret, let's use a dictionary:
scores = {}
for i, a in enumerate(A, 1):
    for j, b in enumerate(B, 1):
        num_overlap = len(set(a) & set(b))
        num_long = max(len(a), len(b))
        scores[f"a{i}b{j}"] = num_overlap / num_long
for key, val in scores.items():
    print(f"{key}: {val}")
Output:
a1b1: 0.5
a1b2: 0.25
a1b3: 0.2
a2b1: 0.0
a2b2: 0.25
a2b3: 0.4
a3b1: 0.3333333333333333
a3b2: 0.3333333333333333
a3b3: 0.0
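Since the question mentions a matrix, the same scores can also be laid out as a nested list with one row per sublist of A and one column per sublist of B. This is a minimal sketch; overlap_score is a hypothetical helper name introduced for illustration, and it assumes no pair of sublists is empty on both sides.

```python
A = [['a', 'b', 'c', 'x'], ['d', 'e', 'f', 'p'], ['g', 'h', 'i']]
B = [['g', 'a', 'c'], ['d', 'h', 'b'], ['e', 'f', 'x', 't', 'q']]

def overlap_score(a, b):
    """Number of shared elements divided by the length of the longer list."""
    return len(set(a) & set(b)) / max(len(a), len(b))

# matrix[i][j] is the score of A[i] against B[j].
matrix = [[overlap_score(a, b) for b in B] for a in A]

for row in matrix:
    print(row)
```

With this layout, matrix[0] holds the three scores for the first sublist of A.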
I have a list with the length of n:
labels = ['a', 'b', 'c', 'd']
and an array of size m*n like:
values = array([[0. , 0.6, 0.3, 0.1],
[0.5, 0.1, 0.1, 0.3],
[0.1, 0.2, 0.3, 0.4]])
I wish to sort labels by each row of values and generate a new m*n array like:
labels_new = [['a', 'd', 'c', 'b'],
['b', 'c', 'd', 'a'],
['a', 'b', 'c', 'd']]
Is there any simple way to achieve this?
You can use NumPy's argsort function. Store the labels in a NumPy array and index it with the result:
In [6]: labels = np.array(['a', 'b', 'c', 'd'])
In [7]: labels[np.argsort(values)]
Out[7]:
array([['a', 'd', 'c', 'b'],
['b', 'c', 'd', 'a'],
['a', 'b', 'c', 'd']], dtype='<U1')
The argsort function returns the array of indices that would put an array in sorted order. That array can be used to reorder the labels:
[np.array(labels)[np.argsort(v)].tolist() for v in values]
#[['a', 'd', 'c', 'b'],
# ['b', 'c', 'd', 'a'],
# ['a', 'b', 'c', 'd']]
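If NumPy is not available, the same index-sorting idea can be sketched with the built-in sorted. The helper name sort_labels is an assumption for illustration. Because sorted is stable, tied values (the two 0.1 entries in the second row) keep their original left-to-right order; with np.argsort, passing kind="stable" should give the same guarantee.

```python
labels = ['a', 'b', 'c', 'd']
values = [[0.0, 0.6, 0.3, 0.1],
          [0.5, 0.1, 0.1, 0.3],
          [0.1, 0.2, 0.3, 0.4]]

def sort_labels(row, labels):
    """Reorder labels by the ascending order of the values in row."""
    # Indices of row in ascending value order (a pure-Python argsort).
    order = sorted(range(len(row)), key=row.__getitem__)
    return [labels[i] for i in order]

labels_new = [sort_labels(row, labels) for row in values]
print(labels_new)
```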
I'm trying to count the number of occurrences of elements within a list, if such elements are also lists. The order is also important.
[PSEUDOCODE]
lst = [ ['a', 'b', 'c'], ['d', 'e', 'f'], ['a', 'b', 'c'], ['c', 'b', 'a'] ]
print( count(lst) )
> { ['a', 'b', 'c'] : 2, ['d', 'e', 'f']: 1, ['c', 'b', 'a']: 1 }
One important factor is that ['a', 'b', 'c'] != ['c', 'b', 'a']
I have tried:
from collections import Counter
print( Counter([tuple(x) for x in lst]) )
print( [[x, list.count(x)] for x in set(lst)] )
Both treated ['a', 'b', 'c'] and ['c', 'b', 'a'] as equal, which I don't want.
I also tried:
from collections import Counter
print( Counter( lst ) )
This only raised an error, since lists can't be used as keys in dicts.
Is there a way to do this?
You can't use a list as a dict key because dictionary keys must be hashable, and lists, being mutable, are not. So you first need to convert each sub-list to a tuple. Then you can use collections.Counter to get the count of each tuple:
>>> from collections import Counter
>>> my_list = [ ['a', 'b', 'c'], ['d', 'e', 'f'], ['a', 'b', 'c'], ['c', 'b', 'a'] ]
# tuple(item) converts each sub-list to a hashable tuple
>>> Counter(tuple(item) for item in my_list)
Counter({('a', 'b', 'c'): 2, ('d', 'e', 'f'): 1, ('c', 'b', 'a'): 1})
Just use collections.Counter on an equivalent but hashable type, the tuple:
import collections
lst = [ ['a', 'b', 'c'], ['d', 'e', 'f'], ['a', 'b', 'c'], ['c', 'b', 'a'] ]
c = collections.Counter(tuple(x) for x in lst)
print(c)
result:
Counter({('a', 'b', 'c'): 2, ('d', 'e', 'f'): 1, ('c', 'b', 'a'): 1})
Lists are not hashable, but you can use tuples as a workaround:
l = [ ['a', 'b', 'c'], ['d', 'e', 'f'], ['a', 'b', 'c'], ['c', 'b', 'a'] ]
new_l = list(map(tuple, l))
final_l = {a:new_l.count(a) for a in new_l}
Output:
{('a', 'b', 'c'): 2, ('d', 'e', 'f'): 1, ('c', 'b', 'a'): 1}
Or, if you really want to use lists, you can create a custom class to mimic the functionality of a dictionary hashing lists:
class List_Count:
    def __init__(self, data):
        # store the counts keyed by tuple, since lists are not hashable
        new_data = list(map(tuple, data))
        self.__data = {i:new_data.count(i) for i in new_data}
    def __getitem__(self, val):
        # look up by comparing each stored tuple, as a list, to val
        newval = [b for a, b in self.__data.items() if list(a) == val]
        if not newval:
            raise KeyError("{} not found".format(val))
        return newval[0]
    def __repr__(self):
        return "{"+"{}".format(', '.join("{}:{}".format(list(a), b) for a, b in self.__data.items()))+"}"

l = List_Count([ ['a', 'b', 'c'], ['d', 'e', 'f'], ['a', 'b', 'c'], ['c', 'b', 'a'] ])
print(l)
print(l[['a', 'b', 'c']])
Output:
{['a', 'b', 'c']:2, ['d', 'e', 'f']:1, ['c', 'b', 'a']:1}
2
Another implementation with lists
l1 = [["a", "b", "c"], ["b", "c", "d"], ["a", "b", "c"], ["c", "b", "a"]]
def unique(l1):
    # collect each distinct sub-list once, preserving first-seen order
    l2 = []
    for element in l1:
        if element not in l2:
            l2.append(element)
    return l2

l2 = unique(l1)
for element in l2:
    print(element, l1.count(element))
If you want a dictionary instead, the keys have to be hashable, so convert each sub-list to a tuple in the last part:
output = {tuple(element): l1.count(element) for element in unique(l1)}
Don't use list as a variable name.
You can try this approach if you don't want to use any module:
list_1 = [ ['a', 'b', 'c'], ['d', 'e', 'f'], ['a', 'b', 'c'], ['c', 'b', 'a'] ]
track={}
for i in list_1:
    if tuple(i) not in track:
        track[tuple(i)]=1
    else:
        track[tuple(i)]+=1
print(track)
Output:
{('a', 'b', 'c'): 2, ('d', 'e', 'f'): 1, ('c', 'b', 'a'): 1}
You can also use a defaultdict:
import collections
list_1 = [ ['a', 'b', 'c'], ['d', 'e', 'f'], ['a', 'b', 'c'], ['c', 'b', 'a'] ]
# map each sub-list (as a tuple) to the positions where it occurs
d=collections.defaultdict(list)
for j,i in enumerate(list_1):
    d[tuple(i)].append(j)
# the number of positions is the count; prints a list of one-item dicts
print(list(map(lambda x:{x:len(d[x])},d.keys())))
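If only the counts are needed and not the positions, a defaultdict(int) is arguably a more direct fit. A minimal sketch of that variant follows; the name counts is chosen here for illustration.

```python
from collections import defaultdict

list_1 = [['a', 'b', 'c'], ['d', 'e', 'f'], ['a', 'b', 'c'], ['c', 'b', 'a']]

# Missing keys start at 0, so no membership check is needed.
counts = defaultdict(int)
for sub in list_1:
    counts[tuple(sub)] += 1

print(dict(counts))
```

Order is respected because ('a', 'b', 'c') and ('c', 'b', 'a') are different tuples.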
I have 2 lists in following format:
list1 = [[[a, b], 0.2], [[a, c], 0.5], [[a, d], 0.3], [[b, d], 0.1]]
# list1 is sorted by first element of its sublist
list2 = [[a, b], [a, d], [b, d]]
# list2 is sorted
I want the sum of the second elements of those sublists of list1 whose first element appears in list2.
So here the sum should be 0.2 + 0.3 + 0.1 = 0.6.
Note: every element of list2 always exists in list1.
My solution:
list11=[]
list12=[]
for i in list1:
    list11.append(i[0])
    list12.append(i[1])
sum=0
for i in list2:
    sum+=list12[list11.index(i)]
I hope there is a solution that does not involve creating temporary lists.
You can use a list comprehension to achieve this:
x = [['a', 'b'], ['a', 'd'], ['b', 'd']]
y = [[['a', 'b'], 0.2], [['a', 'c'], 0.5], [['a', 'd'], 0.3], [['b', 'd'], 0.1]]
total = sum([b for a, b in y if a in x])
Demo
>>> x = [['a','b'], ['a', 'd'], ['b', 'd']]
>>> y = [[['a', 'b'], 0.2], [['a', 'c'], 0.5], [['a', 'd'], 0.3], [['b', 'd'], 0.1]]
>>> [b for a, b in y if a in x]
[0.2, 0.3, 0.1]
>>> sum([b for a, b in y if a in x])
0.6
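The comprehension above checks a in x with a linear scan for every entry of y. For longer lists, a dictionary keyed by tuples may be a reasonable alternative. The sketch below is not part of the answer above; the name weights is an assumption, and like the asker's index-based loop it iterates over list2 and raises KeyError if a pair is missing from list1.

```python
list1 = [[['a', 'b'], 0.2], [['a', 'c'], 0.5], [['a', 'd'], 0.3], [['b', 'd'], 0.1]]
list2 = [['a', 'b'], ['a', 'd'], ['b', 'd']]

# Lists are not hashable, so use tuples as the dictionary keys.
weights = {tuple(pair): w for pair, w in list1}

# One dictionary lookup per element of list2.
total = sum(weights[tuple(pair)] for pair in list2)
print(total)
```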