I have a set of Strings: {'Type A', 'Type B', 'Type C'} for instance, I'll call it x. The set can have up to 10 strings.
There is also a big list of sets, for instance [{'Type A', 'Type B', 'Type C'}, {'Type A', 'Type B', 'Type C'}, {'Type B', 'Type C', 'Type D'}, {'Type E', 'Type F', 'Type G'}] and so on.
My goal is to return all the sets in the big list that contain 60% or more of the same elements as x. So in this example, it would return the first 3 sets but not the 4th.
I know I could iterate over every set, compare elements, and then use the number of similarities to go about my business, but this is quite time intensive and my big list will probably have many many sets. Is there a better way to go about this? I thought about using frozenset() and hashing them, but I'm not sure what hashing function I would use, and how I would compare hashes.
Any help would be appreciated - many thanks!
l = [{'Type A', 'Type B', 'Type C'}, {'Type A', 'Type B', 'Type C'}, {'Type B', 'Type C', 'Type D'}, {'Type E', 'Type F', 'Type G'}]
x = {'Type A', 'Type B', 'Type C'}
for s in l:
    print(len(x.intersection(s)))
Output:
3
3
2
0
With a function and a list of tuples returned:
def more_than(l,n):
    return [ (s,round(len(x.intersection(s))/len(x),2)) for s in l if len(x.intersection(s))/len(x) > n]
print (more_than(l,0.6))
Output:
[({'Type B', 'Type A', 'Type C'}, 1.0), ({'Type B', 'Type A', 'Type C'}, 1.0), ({'Type B', 'Type C', 'Type D'}, 0.67)]
Here, just for convenience, I used round(len(x.intersection(s))/len(x), 2), which has the form round(number, ndigits): round() rounds the ratio to the number of decimal places given by the second argument.
How about this?
x = {'Type A', 'Type B', 'Type C'}
lst = [{'Type A', 'Type B', 'Type C'},
{'Type A', 'Type B', 'Type C'},
{'Type B', 'Type C', 'Type D'},
{'Type E', 'Type F', 'Type G'}]
[s for s in lst if len(s.intersection(x)) > len(x) * 0.6]
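If the big list is queried many times, one possible way to avoid comparing x against every set is an inverted index that maps each string to the positions of the sets containing it. The sketch below is only an illustration under that assumption; build_index and similar_sets are hypothetical helper names, not part of the answers above, and the threshold uses >= to match "60% or more".

```python
from collections import Counter, defaultdict

def build_index(sets):
    # Hypothetical helper: map each element to the positions of the sets that contain it.
    index = defaultdict(list)
    for pos, s in enumerate(sets):
        for elem in s:
            index[elem].append(pos)
    return index

def similar_sets(x, sets, index, threshold=0.6):
    # Count, per candidate set, how many elements of x it shares.
    counts = Counter()
    for elem in x:
        for pos in index.get(elem, ()):
            counts[pos] += 1
    needed = threshold * len(x)
    # Only sets sharing at least one element with x are ever visited.
    return [sets[pos] for pos in sorted(counts) if counts[pos] >= needed]

lst = [{'Type A', 'Type B', 'Type C'},
       {'Type A', 'Type B', 'Type C'},
       {'Type B', 'Type C', 'Type D'},
       {'Type E', 'Type F', 'Type G'}]
x = {'Type A', 'Type B', 'Type C'}

index = build_index(lst)
matches = similar_sets(x, lst, index)
print(matches)
```

Building the index costs one pass over all sets, so this only pays off when the same list is searched repeatedly; for a single query the list comprehension above is simpler.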
Below is the use-case I am trying to solve:
I have 2 lists of lists: (l and d)
In [1197]: l
Out[1197]:
[['Cancer A', 'Ecog 9', 'Fill 6'],
['Cancer B', 'Ecog 1', 'Fill 1'],
['Cancer A', 'Ecog 0', 'Fill 0']]
In [1198]: d
Out[1198]: [[100], [200], [500]]
It's a 2-part problem here:
1. Sort l based on the priority of its columns, e.g. Cancer, Ecog and Fill (in this case key=(0,1,2)). The priority could be any other order, such as Ecog, Cancer, Fill, i.e. key=(1,0,2).
2. Sort d in the same order in which l was sorted in the step above.
Step #1 I'm able to achieve, like below:
In [1199]: import operator
In [1200]: sorted_l = sorted(l, key=operator.itemgetter(0,1,2))
In [1201]: sorted_l
Out[1201]:
[['Cancer A', 'Ecog 0', 'Fill 0'],
['Cancer A', 'Ecog 9', 'Fill 6'],
['Cancer B', 'Ecog 1', 'Fill 1']]
Now, I want to sort values of d in the same order as the sorted_l.
Expected output:
In [1201]: d
Out[1201]: [[500], [100], [200]]
What is the best way to do this?
Below is the solution, with help from @juanpa.arrivillaga:
In [1272]: import operator
In [1273]: key = operator.itemgetter(0, 1, 2)
# The key parameter lets you sort the zipped (l, d) pairs with your own function; here only the row from `l` is used.
In [1275]: sorted_l,sorted_d = zip(*sorted(zip(l, d), key=lambda x: key(x[0])))
In [1276]: sorted_l
Out[1276]:
(['Cancer A', 'Ecog 0', 'Fill 0'],
['Cancer A', 'Ecog 9', 'Fill 6'],
['Cancer B', 'Ecog 1', 'Fill 1'])
In [1277]: sorted_d
Out[1277]: ([500], [100], [200])
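The same zip, sort, unzip idea can be written as a plain script. This is a minimal sketch: sort_together is a hypothetical helper name, and it returns lists rather than the tuples that zip(*...) produces.

```python
from operator import itemgetter

l = [['Cancer A', 'Ecog 9', 'Fill 6'],
     ['Cancer B', 'Ecog 1', 'Fill 1'],
     ['Cancer A', 'Ecog 0', 'Fill 0']]
d = [[100], [200], [500]]

def sort_together(rows, values, columns):
    # Hypothetical helper: sort rows by the given column priority and
    # reorder values so each value stays paired with its row.
    key = itemgetter(*columns)
    pairs = sorted(zip(rows, values), key=lambda pair: key(pair[0]))
    sorted_rows = [row for row, _ in pairs]
    sorted_values = [value for _, value in pairs]
    return sorted_rows, sorted_values

sorted_l, sorted_d = sort_together(l, d, (0, 1, 2))
print(sorted_d)  # [[500], [100], [200]]

# A different priority (Ecog, Cancer, Fill) only changes the columns tuple.
by_ecog_l, by_ecog_d = sort_together(l, d, (1, 0, 2))
print(by_ecog_d)  # [[500], [200], [100]]
```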
I have a list of lists and want to know how to access or loop over only a certain index position in each inner list. The reason is that I only want to change the elements in that position. In this example, position 1 is the name of a client and position 2 is the age. I only want to run a condition on the age of a client.
L = [['Sam', '35'],['John', '45'], ['Steve', '99']]
L = ['Group 1' if '35' in x else 'Group 2' if '45' in x else 'Group 3' for x in L]
Result:
print(L)
L = ['Group 1', 'Group 2', 'Group 3']
What I actually want:
print(L)
L = [['Sam', 'Group 1'],['John', 'Group 2'], ['Steve', 'Group 3']]
You could try something like this:
L = [['Sam', '35'],['John', '45'], ['Steve', '99']]
L = [[x[0],'Group 1'] if '35' in x else [x[0],'Group 2'] if '45' in x else [x[0],'Group 3'] for x in L]
print(L)
Output:
[['Sam', 'Group 1'], ['John', 'Group 2'], ['Steve', 'Group 3']]
You can try
L = [[i[0], 'Group 1' if i[1] == '35' else 'Group 2' if i[1] == '45' else 'Group 3'] for i in L]
Output
[['Sam', 'Group 1'], ['John', 'Group 2'], ['Steve', 'Group 3']]
You could just add the first element back in your generated elements:
L = [[x[0], 'Group 1' if '35' in x else 'Group 2' if '45' in x else 'Group 3'] for x in L]
This would give you the wanted output
Although valid answers have been posted, when you find yourself chaining several if/else conditions it is often worth using a dictionary as a lookup instead. A dictionary is also easier to extend with new groups.
group_lookup = {'35': 'Group 1', '45': 'Group 2', '99': 'Group 3'}
L = [[x[0], group_lookup[x[1]]] for x in L]
print(L)
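One caveat with a plain dictionary lookup is that an age missing from the dictionary raises KeyError. A possible variation, assuming any unlisted age should fall into 'Group 3' as in the else branch of the other answers, is dict.get with a default. The names people and grouped and the extra 'Ann' row are made up for illustration.

```python
# Only the explicit cases are listed; everything else uses the default.
group_lookup = {'35': 'Group 1', '45': 'Group 2'}

people = [['Sam', '35'], ['John', '45'], ['Steve', '99'], ['Ann', '12']]

# Unpack each inner list so only the age (position 2) is looked up.
grouped = [[name, group_lookup.get(age, 'Group 3')] for name, age in people]
print(grouped)
```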
The code below generates unique combinations:
from itertools import permutations
comb3 = permutations([1,1,1,0,0,0] , 3)
def removeDuplicates(listofElements):
    # Create an empty list to store unique elements
    uniqueList = []
    # Iterate over the original list and for each element
    # add it to uniqueList, if it's not already there.
    for elem in listofElements:
        if elem not in uniqueList:
            uniqueList.append(elem)
    # Return the list of unique elements
    return uniqueList
comb3 = removeDuplicates(comb3)
for i in list(comb3):
    print(i)
Intermediate Output
The result is a list of tuples. Each position stands for A, B and C in turn, with 1 = EXIST and 0 = NOT EXIST.
(1, 1, 1)
(1, 1, 0)
(1, 0, 1)
(1, 0, 0)
(0, 1, 1)
(0, 1, 0)
(0, 0, 1)
(0, 0, 0)
convert to list of lists
Convert lists of tuples to a list of lists and replace its contents
res = [list(ele) for ele in comb3]
for i in list(res):
    if(i[0] == 1):
        i[0] = 'A Exist'
    if(i[0] == 0):
        i[0] = 'A Not Exist'
    if(i[1] == 1):
        i[1] = 'B Exist'
    if(i[1] == 0):
        i[1] = 'B Not Exist'
    if(i[2] == 1):
        i[2] = 'C Exist'
    if(i[2] == 0):
        i[2] = 'C Not Exist'
Display results
for i in list(res):
    print(i)
Final Output
['A Exist', 'B Exist', 'C Exist']
['A Exist', 'B Exist', 'C Not Exist']
['A Exist', 'B Not Exist', 'C Exist']
['A Exist', 'B Not Exist', 'C Not Exist']
['A Not Exist', 'B Exist', 'C Exist']
['A Not Exist', 'B Exist', 'C Not Exist']
['A Not Exist', 'B Not Exist', 'C Exist']
['A Not Exist', 'B Not Exist', 'C Not Exist']
Is there a more elegant or better way of replacing the contents of a list of list?
>>> names = ['A', 'B', 'C']
>>> verbs = [' Not Exist', ' Exist']
>>> [[names[n] + verbs[v] for n, v in enumerate(c)] for c in comb3]
[['A Exist', 'B Exist', 'C Exist'],
['A Exist', 'B Exist', 'C Not Exist'],
['A Exist', 'B Not Exist', 'C Exist'],
['A Exist', 'B Not Exist', 'C Not Exist'],
['A Not Exist', 'B Exist', 'C Exist'],
['A Not Exist', 'B Exist', 'C Not Exist'],
['A Not Exist', 'B Not Exist', 'C Exist'],
['A Not Exist', 'B Not Exist', 'C Not Exist']]
First, it's inefficient to generate permutations and then filter out the duplicates. What you're looking for is a Cartesian product. Using itertools.product with a repeat argument, you can get your desired intermediate output directly.
from itertools import product
comb3 = list(product([1,0], repeat=3))
#Output:
[(1, 1, 1),
(1, 1, 0),
(1, 0, 1),
(1, 0, 0),
(0, 1, 1),
(0, 1, 0),
(0, 0, 1),
(0, 0, 0)]
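To make the inefficiency concrete, the short check below counts what each approach generates. It is only an illustration; perms and prods are names chosen here.

```python
from itertools import permutations, product

# permutations treats the six list positions as distinct, so it yields
# 6 * 5 * 4 = 120 tuples, most of them duplicates.
perms = list(permutations([1, 1, 1, 0, 0, 0], 3))

# product yields each of the 2 ** 3 = 8 combinations exactly once.
prods = list(product([1, 0], repeat=3))

print(len(perms))                # 120
print(len(set(perms)))           # 8
print(set(perms) == set(prods))  # True
```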
From this point, you can use iteration and a mapping to cleanly get your desired output, as follows.
column_names = 'ABC' #To map all names with the number of items. We can think of these as column names.
code_mapping = {0: 'Not Exist', 1: 'Exist'} #For mapping the codes to meanings.
output = []
for item in comb3:
    row = [f"{name} {code_mapping[code]}" for name, code in zip(column_names, item)]
    output.append(row)
print(output)
Output:
[['A Exist', 'B Exist', 'C Exist'],
['A Exist', 'B Exist', 'C Not Exist'],
['A Exist', 'B Not Exist', 'C Exist'],
['A Exist', 'B Not Exist', 'C Not Exist'],
['A Not Exist', 'B Exist', 'C Exist'],
['A Not Exist', 'B Exist', 'C Not Exist'],
['A Not Exist', 'B Not Exist', 'C Exist'],
['A Not Exist', 'B Not Exist', 'C Not Exist']]
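The two steps above can also be folded into one small function that works for any number of names. This is a sketch; labelled_combinations and its labels parameter are hypothetical names introduced here.

```python
from itertools import product

def labelled_combinations(names, labels=('Not Exist', 'Exist')):
    # labels[0] is used for code 0 and labels[1] for code 1.
    return [
        [f"{name} {labels[code]}" for name, code in zip(names, codes)]
        for codes in product([1, 0], repeat=len(names))
    ]

rows = labelled_combinations('ABC')
for row in rows:
    print(row)
```

Because repeat is derived from len(names), adding a fourth name such as 'D' would need no other change.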
You can use set to remove the duplicate permutations, then map each remaining tuple to a list. Note that a set does not preserve order.
from itertools import permutations
import string
from pprint import pprint
alphabet = string.ascii_uppercase
comb3 = permutations([1,1,1,0,0,0] , 3)
comb3 = list(map(list,set(comb3)))
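for i in comb3:
    for index, value in enumerate(i):
        # 1 means the item exists, 0 means it does not
        i[index] = f'{alphabet[index]}{" Not " if value == 0 else " "}Exists'
pprint(comb3)
output (the row order may differ, since sets are unordered)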
[['A Not Exists', 'B Not Exists', 'C Exists'],
['A Exists', 'B Not Exists', 'C Not Exists'],
['A Exists', 'B Not Exists', 'C Exists'],
['A Not Exists', 'B Exists', 'C Exists'],
['A Exists', 'B Exists', 'C Not Exists'],
['A Not Exists', 'B Exists', 'C Not Exists'],
['A Exists', 'B Exists', 'C Exists'],
['A Not Exists', 'B Not Exists', 'C Not Exists']]
You can do all of this in just two lines (plus the import):
from itertools import permutations
comb3 = list(set(permutations([1,1,1,0,0,0] , 3))) # set will remove duplicates automatically
result = [[f"{i} {'' if j else 'NOT '}Exist" for i, j in zip(["A", "B", "C"], k)] for k in comb3]
result will be:
[['A Exist', 'B Exist', 'C NOT Exist'],
['A NOT Exist', 'B Exist', 'C Exist'],
['A NOT Exist', 'B Exist', 'C NOT Exist'],
['A Exist', 'B NOT Exist', 'C NOT Exist'],
['A NOT Exist', 'B NOT Exist', 'C Exist'],
['A Exist', 'B NOT Exist', 'C Exist'],
['A NOT Exist', 'B NOT Exist', 'C NOT Exist'],
['A Exist', 'B Exist', 'C Exist']]
Note that f-strings (f'') require Python 3.6 or higher.
You could do something like this:
a = ("A", "B", "C")
res = [["{} {}Exist".format(x, '' if y else 'NOT ') for x, y in zip(a, sub)] for sub in comb3]
or like that:
a = ("A {}Exist", "B {}Exist", "C {}Exist")
res = [[x.format('' if sub[i] else 'NOT ') for i, x in enumerate(a)] for sub in comb3]
or the most elegant of 'em all:
a = [("A NOT Exist", "B NOT Exist", "C NOT Exist"), ("A Exist", "B Exist", "C Exist")]
res = [[a[x][i] for i, x in enumerate(sub)] for sub in comb3]
and they all return:
print(res) # -> [['A Exist', 'B Exist', 'C Exist'],
# ['A Exist', 'B Exist', 'C NOT Exist'],
# ['A Exist', 'B NOT Exist', 'C Exist'],
# ['A Exist', 'B NOT Exist', 'C NOT Exist'],
# ['A NOT Exist', 'B Exist', 'C Exist'],
# ['A NOT Exist', 'B Exist', 'C NOT Exist'],
# ['A NOT Exist', 'B NOT Exist', 'C Exist'],
# ['A NOT Exist', 'B NOT Exist', 'C NOT Exist']]
(I'm sure this has been answered somewhere but I really couldn't find the right question. Perhaps I don't know the correct verb for this exercise?)
I have two lists:
prefix = ['A', 'B', 'C']
suffix = ['a', 'b']
And I want to get this:
output = ['A a', 'A b', 'B a', 'B b', 'C a', 'C b']
I am aware of the zip function, but it stops at the length of the shortest list:
output_wrong = [p+' '+s for p,s in zip(prefix,suffix)]
So what's the most Pythonic way of doing this?
EDIT:
While the majority of the answers prefer itertools.product, I much prefer this instead:
output = [i + ' ' + j for i in prefix for j in suffix]
as it doesn't require an extra import, however basic that module is (I don't know which way is faster, and this may be a matter of personal preference).
Use List Comprehension
prefix = ['A', 'B', 'C']
suffix = ['a', 'b']
result = [val+" "+val2 for val in prefix for val2 in suffix ]
print(result)
OUTPUT
['A a', 'A b', 'B a', 'B b', 'C a', 'C b']
Using itertools.product and list comprehension,
>>> from itertools import product
>>> [i + ' ' + j for i, j in product(prefix, suffix)]
# ['A a', 'A b', 'B a', 'B b', 'C a', 'C b']
Use itertools.product:
import itertools
prefix = ['A', 'B', 'C']
suffix = ['a', 'b']
print([f'{x} {y}' for x, y in itertools.product(prefix, suffix)])
# ['A a', 'A b', 'B a', 'B b', 'C a', 'C b']
This is called a Cartesian product:
[p + ' ' + s for p, s in itertools.product(prefix, suffix)]
Use product,
In [33]: from itertools import product
In [34]: list(map(lambda x:' '.join(x),product(prefix,suffix)))  # list() is needed on Python 3, where map returns an iterator
Out[34]: ['A a', 'A b', 'B a', 'B b', 'C a', 'C b']
Simply use list comprehension:
prefix = ['A', 'B', 'C']
suffix = ['a', 'b']
output = [i+" "+j for i in prefix for j in suffix]
print(output)
Output:
['A a', 'A b', 'B a', 'B b', 'C a', 'C b']
from itertools import product
list(map(' '.join, product(prefix, suffix)))  # list() is needed on Python 3, where map returns an iterator
# ['A a', 'A b', 'B a', 'B b', 'C a', 'C b']
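For two lists, the nested comprehension and itertools.product give the same result, so the choice is largely a matter of taste. One place where product may be more convenient is when more lists are involved. The sketch below illustrates both points; the middle list is invented for the example.

```python
from itertools import product

prefix = ['A', 'B', 'C']
suffix = ['a', 'b']

# Both spellings enumerate the same pairs in the same order.
nested = [p + ' ' + s for p in prefix for s in suffix]
via_product = [' '.join(pair) for pair in product(prefix, suffix)]
print(nested == via_product)  # True

# With a third (hypothetical) list, product needs only one more argument,
# while the comprehension would need one more for clause.
middle = ['x', 'y']
three_way = [' '.join(parts) for parts in product(prefix, middle, suffix)]
print(three_way[:3])  # ['A x a', 'A x b', 'A y a']
```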