How to efficiently search a list in python


I have a dictionary with only 4 keys (mydictionary) and a list (mynodes) as follows.
mydictionary = {0: {('B', 'E', 'G'), ('A', 'E', 'G'), ('A', 'E', 'F'), ('A', 'D', 'F'), ('C', 'D', 'F'), ('C', 'E', 'F'), ('A', 'D', 'G'), ('C', 'D', 'G'), ('C', 'E', 'G'), ('B', 'E', 'F')},
1: {('A', 'C', 'G'), ('E', 'F', 'G'), ('D', 'E', 'F'), ('A', 'F', 'G'), ('A', 'B', 'G'), ('B', 'D', 'F'), ('C', 'F', 'G'), ('A', 'C', 'E'), ('D', 'E', 'G'), ('B', 'F', 'G'), ('B', 'C', 'G'), ('A', 'C', 'D'), ('A', 'B', 'F'), ('B', 'D', 'G'), ('B', 'C', 'F'), ('A', 'D', 'E'), ('C', 'D', 'E'), ('A', 'C', 'F'), ('A', 'B', 'E'), ('B', 'C', 'E'), ('D', 'F', 'G')},
2: {('B', 'D', 'E'), ('A', 'B', 'D'), ('B', 'C', 'D')},
3: {('A', 'B', 'C')}}
mynodes = ['E', 'D', 'G', 'F', 'B', 'A', 'C']
I am checking how many times each node in mynodes list is in each key of mydictionary. For example, consider the above dictionary and list.
The output should be:
{'E': [(0, 6), (1, 8), (2, 1), (3, 0)],
'D': [(0, 4), (1, 8), (2, 3), (3, 0)],
'G': [(0, 5), (1, 10), (2, 0), (3, 0)],
'F': [(0, 5), (1, 10), (2, 0), (3, 0)],
'B': [(0, 2), (1, 9), (2, 3), (3, 1)],
'A': [(0, 4), (1, 9), (2, 1), (3, 1)],
'C': [(0, 4), (1, 9), (2, 1), (3, 1)]}
For example, consider E. It appears 6 times under key 0, 8 times under key 1, once under key 2 and 0 times under key 3.
My current code is as follows.
triad_class_for_nodes = {}
for node in mynodes:
    temp_list = []
    for key, value in mydictionary.items():
        temp_counting = 0
        for triad in value:
            #print(triad[0])
            if node in triad:
                temp_counting = temp_counting + 1
        temp_list.append(tuple((key, temp_counting)))
    triad_class_for_nodes.update({node: temp_list})
print(triad_class_for_nodes)
This works fine with the small dictionary values.
However, in my real dataset, I have millions of tuples in the value list for each of my 4 keys in my dictionary. Hence, my existing code is really inefficient and takes days to run.
When I searched for how to make this more efficient, I came across this question (Fastest way to search a list in python), which suggests converting the list of values to a set. I tried this as well. However, it also takes days to run.
I am just wondering if there is a more efficient way of doing this in python. I am happy to transform my existing data formats into different structures (such as pandas dataframe) to make things more efficient.
A small sample of mydictionary and mynodes is attached below for testing purposes. https://drive.google.com/drive/folders/15Faa78xlNAYLPvqS3cKM1v8bV1HQzW2W?usp=sharing
mydictionary: see triads.txt
import ast
with open("triads.txt", "r") as file:
    mydictionary = ast.literal_eval(file.read())
mynodes: see nodes.txt
with open("nodes.txt", "r") as file:
    mynodes = ast.literal_eval(file.read())
I am happy to provide more details if needed.

Since you tagged pandas: first convert the dict to a DataFrame, stack it, and then use crosstab.
import pandas as pd
s = pd.DataFrame.from_dict(mydictionary, orient='index').stack()
s = pd.DataFrame(s.values.tolist(), index=s.index).stack()
pd.crosstab(s.index.get_level_values(0), s)
col_0  A  B  C  D  E   F   G
row_0
0      4  2  4  4  6   5   5
1      9  9  9  8  8  10  10
2      1  3  1  3  1   0   0
3      1  1  1  0  0   0   0
Update
s=pd.crosstab(s.index.get_level_values(0), s).stack().reset_index()
s[['row_0',0]].apply(tuple,1).groupby(s['col_0']).agg(list).to_dict()

If you're not using pandas, you could do this with Counter from collections:
from collections import Counter,defaultdict
from itertools import product
counts = Counter((c,k) for k,v in mydictionary.items() for t in v for c in t )
result = defaultdict(list)
for c,k in product(mynodes,mydictionary):
    result[c].append((k,counts[(c,k)]))
print(result)
{'E': [(0, 6), (1, 8), (2, 1), (3, 0)],
'D': [(0, 4), (1, 8), (2, 3), (3, 0)],
'G': [(0, 5), (1, 10), (2, 0), (3, 0)],
'F': [(0, 5), (1, 10), (2, 0), (3, 0)],
'B': [(0, 2), (1, 9), (2, 3), (3, 1)],
'A': [(0, 4), (1, 9), (2, 1), (3, 1)],
'C': [(0, 4), (1, 9), (2, 1), (3, 1)]}
Counter will manage counting instances for each combination of mydictionary key and node. You can then use these counts to create the expected output.
EDIT Expanded counts line:
counts = Counter()                           # initialize Counter() object
for key, tupleSet in mydictionary.items():   # loop through dictionary
    for tupl in tupleSet:                    # loop through tuple set of each key
        for node in tupl:                    # loop through node character in each tuple
            counts[(node, key)] += 1         # count 1 node/key pair
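One way to package the counting idea above is a small function that builds one Counter per dictionary key in a single pass over the triads, instead of rescanning every triad for every node. This is only a sketch: the name `count_nodes_per_key` and the two-key sample are illustrative, not part of the original answers. It assumes no node repeats inside a triad (a repeated node would be counted twice, unlike the `node in triad` test in the question).

```python
from collections import Counter


def count_nodes_per_key(triads_by_key, nodes):
    """Return {node: [(key, count), ...]} using one pass over all triads."""
    # One Counter per dictionary key, filled by flattening that key's triads.
    per_key = {
        key: Counter(node for triad in triads for node in triad)
        for key, triads in triads_by_key.items()
    }
    # A Counter returns 0 for missing nodes, so no special-casing is needed.
    return {
        node: [(key, per_key[key][node]) for key in triads_by_key]
        for node in nodes
    }


# Keys 2 and 3 of the question's sample dictionary.
sample = {
    2: {('B', 'D', 'E'), ('A', 'B', 'D'), ('B', 'C', 'D')},
    3: {('A', 'B', 'C')},
}
result = count_nodes_per_key(sample, ['E', 'D', 'B'])
print(result)
```

The work is proportional to the total number of tuple elements plus the number of node/key pairs, rather than nodes multiplied by triads as in the nested-loop version.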

Related

Python Exercise Tuples, set and list

can you please help me? I tried everything, but I'm lost.
From the data I need to get this list of tuples.
The idea is to organize this list in the best readable way.
From the data, whenever I find a given number, e.g. 1, I need to make a tuple of that value and a list of the letters from the second column, in order of appearance, like ['E', 'B', 'E'].
This is the data:
[(1, 'E'),
(2, 'A'),
(5, 'B'),
(3, 'A'),
(6, 'C'),
(7, 'A'),
(9, 'A'),
(1, 'B'),
(2, 'E'),
(3, 'B'),
(7, 'C'),
(5, 'C'),
(3, 'D'),
(8, 'E'),
(9, 'B'),
(8, 'D'),
(3, 'E'),
(5, 'D'),
(8, 'E'),
(9, 'E'),
(7, 'E'),
(3, 'E'),
(5, 'D'),
(9, 'A'),
(4, 'E'),
(6, 'E'),
(8, 'A'),
(5, 'E'),
(6, 'A'),
(0, 'C'),
(9, 'A'),
(3, 'D'),
(5, 'E'),
(4, 'B'),
(6, 'B'),
(7, 'D'),
(8, 'B'),
(9, 'C'),
(1, 'E'),
(5, 'E')]
# Result/
# ('0', ['C'])
# ('1', ['E', 'B', 'E'])
# ('2', ['A', 'E'])
# ('3', ['A', 'B', 'D', 'E', 'E', 'D'])
# ('4', ['E', 'B'])
# ('5', ['B', 'C', 'D', 'D', 'E', 'E', 'E'])
# ('6', ['C', 'E', 'A', 'B'])
# ('7', ['A', 'C', 'E', 'D'])
# ('8', ['E', 'D', 'E', 'A', 'B'])
# ('9', ['A', 'B', 'E', 'A', 'A', 'C'])

You can use defaultdict:
from collections import defaultdict
data = [(1, 'E'), (2, 'A'), (5, 'B'), (3, 'A'), (6, 'C'), (7, 'A'), (9, 'A'), (1, 'B'),
(2, 'E'), (3, 'B'), (7, 'C'), (5, 'C'), (3, 'D'), (8, 'E'), (9, 'B'), (8, 'D'),
(3, 'E'), (5, 'D'), (8, 'E'), (9, 'E'), (7, 'E'), (3, 'E'), (5, 'D'), (9, 'A'),
(4, 'E'), (6, 'E'), (8, 'A'), (5, 'E'), (6, 'A'), (0, 'C'), (9, 'A'), (3, 'D'),
(5, 'E'), (4, 'B'), (6, 'B'), (7, 'D'), (8, 'B'), (9, 'C'), (1, 'E'), (5, 'E')]
d = defaultdict(list)
for x, y in data:
    d[str(x)].append(y)
output = sorted(d.items())
print(output)
Output:
[('0', ['C']), ('1', ['E', 'B', 'E']), ('2', ['A', 'E']), ('3', ['A', 'B', 'D', 'E', 'E', 'D']), ('4', ['E', 'B']), ('5', ['B', 'C', 'D', 'D', 'E', 'E', 'E']), ('6', ['C', 'E', 'A', 'B']), ('7', ['A', 'C', 'E', 'D']), ('8', ['E', 'D', 'E', 'A', 'B']), ('9', ['A', 'B', 'E', 'A', 'A', 'C'])]
Alternatively, if you know the keys a priori, you can create them in advance. This needs neither defaultdict nor sorted (the latter because dicts preserve insertion order in Python 3.7+).
d = {str(k): [] for k in range(10)}
for x, y in data:
    d[str(x)].append(y)
output = list(d.items())
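One caveat with sorting on string keys is that strings compare lexicographically, so '10' would sort before '2'. The question's data only has keys 0 to 9, so it is not affected, but a hedged sketch for larger keys is to group on the integer and convert to str at the very end. The data below is hypothetical; the key 10 is added only to show the ordering issue.

```python
from collections import defaultdict

# Hypothetical data; the key 10 is not in the original question.
data = [(10, 'X'), (2, 'A'), (1, 'E'), (2, 'E')]

groups = defaultdict(list)
for number, letter in data:
    groups[number].append(letter)  # keep the integer key while grouping

# Sort on the integer key, then convert to str for the final output.
output = [(str(number), letters) for number, letters in sorted(groups.items())]
print(output)
```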

How to readably and efficiently create a dictionary of iterable combinations with a tuple of their indices as keys?

I have some code that works, but it is hard to read at best, and it feels inefficient because it uses two list comprehensions where a single one should suffice.
What I need is to create a dictionary of all n combinations of the letters in alpha, with the key to the dictionary for each item being a tuple of the indices in alpha for the elements in the combination. This should work for any n:
n=2
from itertools import combinations
alpha = "abcde"
n = 2
D = {tuple([c_i[0] for c_i in comb]): tuple([c_i[1] for c_i in comb])
     for comb in combinations(enumerate(alpha), n)}
>>>{(0, 1): ('a', 'b'),
(0, 2): ('a', 'c'),
(0, 3): ('a', 'd'),
(0, 4): ('a', 'e'),
(1, 2): ('b', 'c'),
(1, 3): ('b', 'd'),
(1, 4): ('b', 'e'),
(2, 3): ('c', 'd'),
(2, 4): ('c', 'e'),
(3, 4): ('d', 'e')}
n=3
from itertools import combinations
alpha = "abcde"
n = 3
D = {tuple([c_i[0] for c_i in comb]): tuple([c_i[1] for c_i in comb])
     for comb in combinations(enumerate(alpha), n)}
>>>{(0, 1, 2): ('a', 'b', 'c'),
(0, 1, 3): ('a', 'b', 'd'),
(0, 1, 4): ('a', 'b', 'e'),
(0, 2, 3): ('a', 'c', 'd'),
(0, 2, 4): ('a', 'c', 'e'),
(0, 3, 4): ('a', 'd', 'e'),
(1, 2, 3): ('b', 'c', 'd'),
(1, 2, 4): ('b', 'c', 'e'),
(1, 3, 4): ('b', 'd', 'e'),
(2, 3, 4): ('c', 'd', 'e')}
This is working as desired, but I want to know if there is a more readable implementation, or one where I don't need a separate comprehension for [c_i[0] for c_i in comb] and [c_i[1] for c_i in comb] as this feels inefficient.
Note: this is a minimal case representation of a more complex problem where the elements of alpha are arguments to an expensive function and I want to store the output of f(alpha[i], alpha[j], alpha[k]) in a dictionary for ease of lookup without recomputation: ans = D[(i, j, k)]

Try this: (I feel it's a lot less complicated than the other answer, but that one works well too)
from itertools import combinations
alpha = "abcde"
n = 2
print({key: tuple([alpha[i] for i in key]) for key in combinations(range(len(alpha)), n)})
Output:
{(0, 1): ('a', 'b'), (0, 2): ('a', 'c'), (0, 3): ('a', 'd'), (0, 4): ('a', 'e'), (1, 2): ('b', 'c'), (1, 3): ('b', 'd'), (1, 4): ('b', 'e'), (2, 3): ('c', 'd'), (2, 4): ('c', 'e'), (3, 4): ('d', 'e')}

One way to avoid the seemingly redundant tuple key-value formation is to use zip with an assignment expression (Python 3.8+):
from itertools import combinations
alpha = "abcde"
n = 2
D = {(k:=list(zip(*comb)))[0]:k[1] for comb in combinations(enumerate(alpha), n)}
Output:
{(0, 1): ('a', 'b'), (0, 2): ('a', 'c'), (0, 3): ('a', 'd'), (0, 4): ('a', 'e'), (1, 2): ('b', 'c'), (1, 3): ('b', 'd'), (1, 4): ('b', 'e'), (2, 3): ('c', 'd'), (2, 4): ('c', 'e'), (3, 4): ('d', 'e')}
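The question's note mentions storing the result of an expensive function of the combination's elements, looked up by index tuple. A minimal sketch of that use, iterating over index combinations as in the first answer, might look like the following. The function `f` here is a hypothetical stand-in (it just joins and upper-cases its arguments); the real function is not given in the question.

```python
from itertools import combinations


def f(*letters):
    # Hypothetical stand-in for the expensive function from the question.
    return "".join(letters).upper()


alpha = "abcde"
n = 3

# Key: tuple of indices into alpha. Value: f applied to the matching elements.
D = {
    idx: f(*(alpha[i] for i in idx))
    for idx in combinations(range(len(alpha)), n)
}

ans = D[(0, 2, 4)]  # lookup without recomputation
print(ans)
```

Each combination is computed once, and the index tuple doubles as the lookup key, so no separate pass is needed to extract indices and values.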

Group a list of N elements into sub-groups of two or one

I have a list of n numbers, e.g. [1, 2, 3, 4] and I want to write some code in python which finds all possible combinations where the numbers are grouped into either 2s or 1s and all n elements are present.
For example with the n=4 case the solution will be,
[(0, 1, 2, 3), ((0, 1), 2, 3), ((0, 2), 1, 3), ((0, 3), 2, 1),
((1, 2), 0, 3), ((1, 3), 0, 2), ((2, 3), 0, 1), ((0, 1), (2, 3)),
((0, 2), (1, 3)), ((0, 3), (1, 2))]
To give an intuitive explanation, I am looking for all combinations of four people where they are either in a pair or working alone.
I have tried the following mixture of combinations but I cannot see a clear method of continuing from here,
from itertools import combinations
pairs = list(combinations([0, 1, 2, 3], 2))
groups = list(combinations(pairs, 2))
unique = []
for p in groups:
    if len(set([i for sub in p for i in sub])) == 4:
        unique.append(p)
unique + pairs
>> [((0, 1), (2, 3)),
((0, 2), (1, 3)),
((0, 3), (1, 2)),
(0, 1),
(0, 2),
(0, 3),
(1, 2),
(1, 3),
(2, 3)]
If I could fill in each entry so that all four numbers were present by inserting them as individuals (not in pairs) this would give my solution.
I don't need to do this for very large lists so I am not overly concerned with running time (I say this as I am aware that my current method can get out of hand quickly with large n).
Any help would be great!
Thanks

This was interestingly tricky. I used the combinations function to pull out each even subset and then wrote a recursive pairing process to make all possible pairings in those even subsets.
from itertools import combinations

def make_pairs(source, pairs=None, used=None):
    if pairs is None:  # entry level
        sin = 0
        pairs = []
        used = [False] * len(source)
    else:
        sin = 1
        while used[sin]:
            sin += 1
    used[sin] = True
    for dex in range(sin + 1, len(source)):
        if not used[dex]:
            pairs.append((source[sin], source[dex]))
            if len(pairs) * 2 == len(source):
                yield tuple(pairs)
            else:
                used[dex] = True
                yield from make_pairs(source, pairs, used)
                used[dex] = False
            pairs.pop()
    used[sin] = False

def make_ones_and_twos(source):
    yield tuple(source)  # all singles case
    inpair = 2
    while inpair <= len(source):
        for paired in combinations(source, inpair):  # choose elements going in pairs
            singles = tuple(sorted(set(source) - set(paired)))  # others are singleton
            # partition paired into actual pairs
            for pairing in make_pairs(paired):
                yield (pairing + singles)
        inpair += 2

# sample run
elements = ['a', 'c', 'g', 't', 'u']
print(*make_ones_and_twos(elements), sep='\n')
giving output
('a', 'c', 'g', 't', 'u')
(('a', 'c'), 'g', 't', 'u')
(('a', 'g'), 'c', 't', 'u')
(('a', 't'), 'c', 'g', 'u')
(('a', 'u'), 'c', 'g', 't')
(('c', 'g'), 'a', 't', 'u')
(('c', 't'), 'a', 'g', 'u')
(('c', 'u'), 'a', 'g', 't')
(('g', 't'), 'a', 'c', 'u')
(('g', 'u'), 'a', 'c', 't')
(('t', 'u'), 'a', 'c', 'g')
(('a', 'c'), ('g', 't'), 'u')
(('a', 'g'), ('c', 't'), 'u')
(('a', 't'), ('c', 'g'), 'u')
(('a', 'c'), ('g', 'u'), 't')
(('a', 'g'), ('c', 'u'), 't')
(('a', 'u'), ('c', 'g'), 't')
(('a', 'c'), ('t', 'u'), 'g')
(('a', 't'), ('c', 'u'), 'g')
(('a', 'u'), ('c', 't'), 'g')
(('a', 'g'), ('t', 'u'), 'c')
(('a', 't'), ('g', 'u'), 'c')
(('a', 'u'), ('g', 't'), 'c')
(('c', 'g'), ('t', 'u'), 'a')
(('c', 't'), ('g', 'u'), 'a')
(('c', 'u'), ('g', 't'), 'a')

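This may do it for the four-element case (the check for 4 distinct members and the two-pair grouping are hard-coded):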
from itertools import combinations
elements = [0, 1, 2, 3]
pairs = list(combinations(elements, 2))
groups = list(combinations(pairs, 2))
unique = []
for p in groups:
    if len(set([i for sub in p for i in sub])) == 4:
        unique.append(p)
pairs_with_singles = []
for pair in pairs:
    missing = set(elements) - set(pair)
    pairs_with_singles.append((pair, *missing))
print(unique + pairs_with_singles + [tuple(elements)])
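For an arbitrary number of elements, a shorter recursive formulation is also possible: the first element either stays alone or is paired with one of the later elements, and the remainder is grouped recursively. This is a sketch of a different technique from the two answers above, and the name `ones_and_twos` is illustrative. Unlike the question's listing, singles and pairs appear in element order rather than pairs first.

```python
def ones_and_twos(items):
    """Yield every grouping of items into singletons and pairs."""
    items = tuple(items)
    if not items:
        yield ()
        return
    first, rest = items[0], items[1:]
    # Case 1: the first element stays on its own.
    for tail in ones_and_twos(rest):
        yield (first,) + tail
    # Case 2: the first element is paired with each later element in turn.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in ones_and_twos(remaining):
            yield ((first, partner),) + tail


groupings = list(ones_and_twos([0, 1, 2, 3]))
print(*groupings, sep='\n')
```

For four elements this yields 10 groupings, matching the count in the question, and for five elements it yields 26, matching the output of the first answer.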

Changing dictionary format

I want to change a dictionary below ...
dict = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
into other form like this:
dict = [
('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7),
('B', 'D', 5), ('C', 'D', 12)]
This is what I have done.
dict = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
if(i[0] in dict):
value = dict[i[0]]
newvalue = i[1],i[2]
value.append(newvalue)
dict1[i[0]]=value
else:
newvalue = i[1],i[2]
l=[]
l.append(newvalue)
dict[i[0]]=l
print(dict)
Thanks

A Python tuple is an immutable object, so any operation that tries to modify it (like append) is not allowed. However, the following workaround can be used. Note that it appends the key as the last element of each tuple rather than the first.
dict = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
new_dict = []
for key, tuple_list in dict.items():
    for tuple_item in tuple_list:
        entry = list(tuple_item)
        entry.append(key)
        new_dict.append(tuple(entry))
print(new_dict)
Output:
[('B', 1, 'A'), ('C', 3, 'A'), ('D', 7, 'A'), ('D', 5, 'B'), ('D', 12, 'C')]

A simple approach could be:
new_dict = []
for letter1, pairs in dict.items():
    for letter2, value in pairs:
        new_dict.append((letter1, letter2, value))

With a list comprehension:
dict_ = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
result = [(key, value[0], value[1]) for key, list_ in dict_.items() for value in list_]
Output:
[('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7), ('B', 'D', 5), ('C', 'D', 12)]

You can iterate through the dictionary using .items(). Notice that each value is by itself a list of tuples. We want to unpack each tuple, so we need a nested for-loop as shown below. res is the output list that we will populate within the loop.
res = []
for key, values in dict.items():
    for value in values:
        res.append((key, value[0], value[1]))
Sample output:
>>> res
[('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7), ('B', 'D', 5), ('C', 'D', 12)]
EDIT: If value is a tuple of more than two elements, we would modify the last line as follows, using tuple unpacking:
res.append((key, *value))
This effectively unpacks all the elements of value. For example,
>>> test = (1, 2, 3)
>>> (0, *test)
(0, 1, 2, 3)
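The tuple-unpacking idea combines naturally with a list comprehension. The following is a small sketch; the variable names `graph` and `edges` are illustrative, chosen to avoid shadowing the built-in `dict`.

```python
graph = {
    'A': [('B', 1), ('C', 3), ('D', 7)],
    'B': [('D', 5)],
    'C': [('D', 12)],
}

# Prefix each inner tuple with its key; *value copes with tuples of any length.
edges = [(key, *value) for key, values in graph.items() for value in values]
print(edges)
```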

pytest nested function parametrization

If I have this list of tuples:
[(['a', 'b', 'c'], [1, 2, 3]),
(['d', 'e', 'f'], [4, 5, 6])]
How can I parametrize a test function, so the following pairs are tested:
[('a', 1), ('a', 2), ('a', 3),
('b', 1), ('b', 2), ('b', 3),
('c', 1), ('c', 2), ('c', 3),
('d', 4), ('d', 5), ('d', 6),
('e', 4), ('e', 5), ('e', 6),
('f', 4), ('f', 5), ('f', 6)]
I'm aware that two stacked decorators will combine the two lists in one of the tuples.

Use itertools.product.
Example code is here:
import itertools
A = [(['a', 'b', 'c'], [1, 2, 3]),
(['d', 'e', 'f'], [4, 5, 6])]
L = []
for i in range(len(A)):
    L += list(itertools.product(A[i][0], A[i][1]))
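To connect this back to pytest, the flattened list can be built once at module level and handed to a single `pytest.mark.parametrize` decorator. The sketch below builds the list with the standard library only. The decorator usage is shown as a comment so the snippet runs without pytest installed, and the names `PAIRS` and `test_pair` are illustrative.

```python
from itertools import chain, product

A = [(['a', 'b', 'c'], [1, 2, 3]),
     (['d', 'e', 'f'], [4, 5, 6])]

# Cartesian product within each tuple, concatenated across tuples.
PAIRS = list(chain.from_iterable(product(letters, numbers) for letters, numbers in A))
print(PAIRS)

# With pytest available, the list can be passed straight to the decorator:
#
# import pytest
#
# @pytest.mark.parametrize("letter, number", PAIRS)
# def test_pair(letter, number):
#     ...
```

This keeps the pairing per tuple, so combinations such as ('a', 4) are never generated, which is what two stacked decorators over the full lists would produce.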
