All combinations of a complicated list - Python

I want to find all possible combinations of the following list:
data = ['a','b','c','d']
I know it looks like a straightforward task, and it can be achieved with something like the following code:
from itertools import combinations
comb = [c for i in range(1, len(data)+1) for c in combinations(data, i)]
But what I actually want is a way to give each element of the list data two possibilities ('a' or '-a').
Examples of such combinations are ['a','b'], ['-a','b'], ['a','b','-c'], etc., but of course not something like ['-a','a'].

You could write a generator function that takes a sequence and yields each possible combination of negations. Like this:
import itertools
def negations(seq):
    for prefixes in itertools.product(["", "-"], repeat=len(seq)):
        yield [prefix + value for prefix, value in zip(prefixes, seq)]
print(list(negations(["a", "b", "c"])))
Result (whitespace modified for clarity):
[
[ 'a', 'b', 'c'],
[ 'a', 'b', '-c'],
[ 'a', '-b', 'c'],
[ 'a', '-b', '-c'],
['-a', 'b', 'c'],
['-a', 'b', '-c'],
['-a', '-b', 'c'],
['-a', '-b', '-c']
]
You can integrate this into your existing code with something like
comb = [x for i in range(1, len(data)+1) for c in combinations(data, i) for x in negations(c)]

Once you have generated the regular combinations, you can do a second pass to generate the ones with "negation." I'd think of it like a binary number, with the number of elements in each combination being the number of bits. Count from 0b0000 to 0b1111 via 0b0001, 0b0010, etc., and wherever a bit is set, negate that element in the result. This produces 2^n combinations for each input combination of length n.
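A minimal Python 3 sketch of that counting approach follows; the helper name signed_variants is hypothetical. Bit i of mask decides whether element i gets a "-" prefix, so a combination of length i contributes 2**i variants, and the total over all non-empty combinations is the sum of C(n, i) * 2**i, which by the binomial theorem is 3**n - 1 (80 for four elements).

```python
from itertools import combinations

def signed_variants(combo):
    """Yield every sign pattern of combo; bit i of mask negates element i."""
    n = len(combo)
    for mask in range(2 ** n):  # 0b00...0 up to 0b11...1
        yield [("-" if mask & (1 << i) else "") + value
               for i, value in enumerate(combo)]

data = ['a', 'b', 'c', 'd']
comb = [variant
        for r in range(1, len(data) + 1)
        for c in combinations(data, r)
        for variant in signed_variants(c)]

print(len(comb))  # 80
```

Because the signs are applied to combinations of distinct elements, a pair like ['-a', 'a'] can never appear.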

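Here is a one-liner, but it can be hard to follow:
from itertools import product
comb = [sum(t, []) for t in product(*[([x], ['-' + x], []) for x in data])]
First, map each element of data to the lists of what it can become in the result: [x], ['-' + x], or nothing ([]). Then take the product of those options (unpacked with *) to get all possibilities. Finally, flatten each combination with sum. Note that this also yields the empty list [], since every element may be omitted.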
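The same idea can be unrolled for readability. This sketch assumes Python 3; it flattens with itertools.chain.from_iterable instead of sum(t, []) (sum builds a new list at every addition) and drops the empty combination [] that the one-liner also produces, so for four elements it keeps 3**4 - 1 = 80 lists.

```python
from itertools import chain, product

data = ['a', 'b', 'c', 'd']

# Each element can appear as itself, negated, or not at all.
choices = [([x], ['-' + x], []) for x in data]

comb = []
for picks in product(*choices):
    # e.g. (['a'], [], ['-c'], []) -> ['a', '-c']
    flat = list(chain.from_iterable(picks))
    if flat:  # skip the empty combination
        comb.append(flat)

print(len(comb))  # 80
```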

My solution basically has the same idea as John Zwinck's answer. After you have produced the list of all combinations
comb = [c for i in range(1, len(data)+1) for c in combinations(data, i)]
you generate all possible positive/negative combinations for each element of comb. I do this by iterating through the total number of sign combinations, 2**N for an element of length N, and treating each one as a binary number, where each binary digit stands for the sign of one element. (E.g. a two-element list would have 4 possible combinations, 0 to 3, represented by 0b00 => (+,+), 0b01 => (-,+), 0b10 => (+,-) and 0b11 => (-,-).)
def twocombinations(it):
    sign = lambda c, i: "-" if c & 2**i else ""
    l = list(it)
    if len(l) < 1:
        return
    # for each of the 2**len(l) sign combinations, make a tuple with the
    # appropriate sign before each element
    for c in range(2**len(l)):
        yield tuple(sign(c, i) + el for i, el in enumerate(l))
Now we apply this function to every element of comb and flatten the resulting nested iterator:
import itertools
l = itertools.chain.from_iterable(map(twocombinations, comb))

Related

How to use enumerate in a list comprehension with two lists?

I just started using list comprehensions and I'm struggling with them. In this case, I need to get the current index in each list (sequence_0 and sequence_1) at each step of the iteration. How can I do that?
The idea is to get the longest sequence of equal nucleotides (a motif) shared by the two sequences. Once a matching pair is found, the program should continue with the next nucleotides of the sequences, checking whether they are also equal and, if so, elongating the motif with them. The final output should be a list of all the motifs found.
The problem is that, to continue with the next nucleotides once a pair is found, I need the position of the pair in both sequences. The index method does not work in this case, and that's why I need enumerate.
Also, I don't fully understand why x and y are wrapped in parentheses; it would be good to understand that too :)
Just to explain: the lists contain DNA sequences, so it's basically something like:
sequence_1 = ['A', 'T', 'C', 'A', 'C']
def find_shared_motif(arq):
    data = fastaread(arq)
    seqs = [list(sequence) for sequence in data.values()]
    motifs = [[]]
    i = 0
    sequence_0, sequence_1 = seqs[0], seqs[1] # just to simplify
    for x, y in [(x, y) for x in zip(sequence_0[::], sequence_0[1::]) for y in zip(sequence_1[::], sequence_1[1::])]:
        print(f'Pairs {"".join(x)} and {"".join(y)} being analyzed...')
        if x == y:
            print(f'Pairs {"".join(x)} and {"".join(y)} match!')
            motifs[i].append(x[0]), motifs[i].append(x[1])
            k = sequence_0.index(x[0]) + 2 # NOT RETURNING THE RIGHT NUMBER
            u = sequence_1.index(y[0]) + 2
            print(k, u)
            # Determines if the rest of the sequence is compatible
            print(f'Starting to elongate the motif {x}...')
            for j, m in enumerate(sequence_1[u::]):
                try:
                    # Checks if the nucleotide is equal for both of the sequences
                    print(f'Analyzing the pair {sequence_0[k + j]}, {m}')
                    if m == sequence_0[k + j]:
                        motifs[i].append(m)
                        print(f'The pair {sequence_0[k + j]}, {m} is equal!')
                    # Stop in the first nonequal residue
                    else:
                        print(f'The pair {sequence_0[k + j]}, {m} is not equal.')
                        break
                except IndexError:
                    print('IndexError, end of the string')
        else:
            i += 1
            motifs.append([])
    return motifs
...
One way to approach this is to start by zipping both lists:
a = ['A', 'T', 'C', 'A', 'C']
b = ['A', 'T', 'C', 'C', 'T']
c = list(zip(a,b))
In that case, c will have the list of tuples below
c = [('A','A'), ('T','T'), ('C','C'), ('A','C'), ('C','T')]
Then you can use a list comprehension with enumerate:
d = [(i, t) for i, t in enumerate(c)]
This will bring something like this to you:
d = [(0, ('A','A')), (1, ('T','T')), (2, ('C','C')), ...]
Of course you can go for a one-liner, if you want:
d = [(i, t) for i, t in enumerate(zip(a,b))]
# gives [(0, ('A','A')), (1, ('T','T')), (2, ('C','C')), ...]
Now you have to deal with the nested tuples. Focus on the inner ones: what you want is to compare the first element of each inner tuple with the second. But you also need the position where a difference occurs, which is the index in the outer tuple. So let's build a function for it. Inside the function, i captures the positions and t captures the inner tuples:
def compare(a, b):
    d = [(i, t) for i, t in enumerate(zip(a,b))]
    for i, t in d:
        if t[0] != t[1]:
            return i
    return -1
That way, if you get -1 at the end, all elements in both lists are equal, position by position. Otherwise, you get the position of the first difference between them.
It is important to notice that, for two lists of different sizes, zip stops at the end of the shorter list, so you only get as many pairs as the smaller list has elements. The extra elements of the longer list are ignored.
Ex.
>>> list(zip([1,2], [3,4,5]))
[(1, 3), (2, 4)]
You can use the function compare with your code to get the positions where the lists differ, and use that to build your motifs.
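Since the question is specifically about getting positions from both lists, here is a sketch (the function name shared_motifs is hypothetical, and it assumes plain lists of single-letter strings) that puts enumerate on both sides of the comprehension. Each matching 2-letter window then comes with its start index in seq_0 and in seq_1, so list.index(), which always returns the first occurrence, is no longer needed. As for the parentheses in the original code, (x, y) simply builds a tuple, which "for x, y in ..." then unpacks again.

```python
def shared_motifs(seq_0, seq_1):
    """Return every run of 2+ equal nucleotides shared by seq_0 and seq_1."""
    # (i, j) = start positions where the 2-letter windows of both sequences
    # match; enumerate() supplies the positions directly.
    starts = [(i, j)
              for i, pair_0 in enumerate(zip(seq_0, seq_0[1:]))
              for j, pair_1 in enumerate(zip(seq_1, seq_1[1:]))
              if pair_0 == pair_1]

    motifs = []
    for i, j in starts:
        k = 2  # the matching window already covers two positions
        # Elongate while neither sequence has ended and the nucleotides agree.
        while (i + k < len(seq_0) and j + k < len(seq_1)
               and seq_0[i + k] == seq_1[j + k]):
            k += 1
        motifs.append(seq_0[i:i + k])
    return motifs

a = ['A', 'T', 'C', 'A', 'C']
b = ['A', 'T', 'C', 'C', 'T']
motifs = shared_motifs(a, b)
print(motifs)                # [['A', 'T', 'C'], ['T', 'C']]
print(max(motifs, key=len))  # ['A', 'T', 'C']
```

The result also contains shorter motifs nested inside longer ones; max(..., key=len) picks the longest.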

How to generate a list [A,B,C,...,AA] whose length depends on the length of my data?

I'm stuck at one point.
I have a list containing n elements, and I have to generate a new list based on its length.
For example, if my list contains 25 elements, the new list will be [A,B,C,...,Y], and if it contains 26 elements, it will be [A,B,C,...,Z].
Up to Z I can easily build the list, but now I am getting more than 26 elements.
For example, if the length is 27, I want this kind of list: [A,B,C,...,Y,Z,AA].
So how can I build this kind of list? Any suggestions?
Here is a way to write this with itertools: itr generates an infinite sequence of combinations, first of length 1, then of length 2, etc. islice then takes only the required number of elements.
from string import ascii_uppercase
from itertools import product, count, islice
itr = ("".join(tup)
       # choose number of letters, e.g. 1, 2, 3
       for k in count(1)
       # choose all tuples of k letters, e.g. (A, ), (B, ), ... (A, A,) ...
       for tup in product(ascii_uppercase, repeat=k))
res = list(islice(itr, 28))
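If you only need the label for one particular position, it can also be computed directly. This is a sketch of the usual spreadsheet-column scheme (bijective base 26, which has no zero digit); the function name column_label is hypothetical. It produces the same sequence as itr above: A..Z, AA, AB, ..., ZZ, AAA, ...

```python
from string import ascii_uppercase

def column_label(index):
    """Return the label for a 0-based index: 0 -> 'A', 25 -> 'Z', 26 -> 'AA'."""
    n = index + 1  # work 1-based, since bijective base 26 has no zero digit
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = ascii_uppercase[rem] + label
    return label

res = [column_label(i) for i in range(28)]
print(res[-3:])  # ['Z', 'AA', 'AB']
```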
You can do something like this (note that after Z it produces AA, BB, CC, ... rather than AA, AB, AC, ...):
import string
def letters_list_generator(number):
    alphabet_string = string.ascii_uppercase
    alphabet_list = list(alphabet_string)
    letter_list = []
    for i in range(0, number):
        multiplier = i // 26 + 1
        letter_index = i % 26
        string_for_list = alphabet_list[letter_index] * multiplier
        letter_list.append(string_for_list)
    return letter_list
For example, you can try this code:
list_1 = ['A', 'B', 'C']
list_2 = ['1', '2', '3']
list_3 = []
for i in range(3):
    list_3.append(list_1[i] + list_2[i])
The result is list_3 = ['A1', 'B2', 'C3']. By adapting the for loop, you can obtain what you want.

Generating all possible k-mers (string combinations) from a given list

I have a string S that is composed of 20 characters:
S='ARNDCEQGHILKMFPSTWYV'
I need to generate all possible k-mer combinations from a given input k.
When k == 3, then there are 8000 combinations (20*20*20) and the output list looks like this:
output = ['AAA', 'AAR', ..., 'AVV', ..., 'VVV'] #len(output)=8000
When k == 2, then there are 400 combinations (20*20) and the output list looks like this:
output = ['AA', 'AR', 'AN', ..., 'VV'] #len(output)=400
When k == 1, then there are only 20 combinations:
output =['A', 'R', 'N', ..., 'Y', 'V'] #len(output)=20
I know how to do this if the number k is fixed; for example, if k == 3, then I can do this:
output = []
for a in S:
    for b in S:
        for c in S:
            output.append(a+b+c)
#then len(output)=8000
But the number k is chosen randomly.
I tried to use permutations, but it does not give me strings with repeated letters like 'AAA'; maybe it can and I'm just doing it wrong.
What you are looking for is itertools.product(). Use its repeat argument to set k.
from itertools import product
...
list(product('ARNDCEQGHILKMFPSTWYV', repeat=2)) # len = 400
list(product('ARNDCEQGHILKMFPSTWYV', repeat=3)) # len = 8000
Bear in mind that it yields tuples of characters by default; if you want strings instead, you can join them in a list comprehension as below:
[''.join(c) for c in product('ARNDCEQGHILKMFPSTWYV', repeat=3)]
# ['AAA', 'AAR', ..., 'AVV', ..., 'VVV']
You can use itertools.product and generate the random value for k:
import itertools
import random
S = 'ARNDCEQGHILKMFPSTWYV'
final_results = map(''.join, itertools.product(*[S]*random.randint(1, 10)))
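Just generate a random integer V in the range 0..L^k-1, where L is the string length and k is the k-mer length. Then build the corresponding combination by reading V as a base-L number (this gives one random k-mer):
import random
L = len(S)
V = random.randrange(L ** k)  # random integer in 0..L**k - 1
C = []
for i in range(k):
    C.append(S[V % L])  # i-th letter using integer modulo
    V //= L             # integer division
kmer = ''.join(C)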
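The same integer-to-k-mer mapping also works for enumerating all k-mers instead of sampling one: letting V run over every value in range(L**k) visits each k-mer exactly once. In this sketch (kmer_from_index is a hypothetical helper name), the digits are filled from the last position backwards, so the order matches itertools.product(S, repeat=k).

```python
S = 'ARNDCEQGHILKMFPSTWYV'

def kmer_from_index(V, k, alphabet=S):
    """Decode V (0 <= V < len(alphabet)**k) as a k-digit base-L number."""
    L = len(alphabet)
    letters = [''] * k
    for i in range(k - 1, -1, -1):    # least significant digit = last letter
        letters[i] = alphabet[V % L]  # current digit via integer modulo
        V //= L                       # drop that digit
    return ''.join(letters)

k = 2
all_kmers = [kmer_from_index(V, k) for V in range(len(S) ** k)]
print(len(all_kmers), all_kmers[:3], all_kmers[-1])  # 400 ['AA', 'AR', 'AN'] VV
```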

How to separate uppercase and lowercase letters in a string?

I have written code that separates the characters at 'even' and 'odd' indices, and I would like to modify it so that it separates characters by upper/lower case.
I can't figure out how to do this for a string such as "AbBZxYp". I have tried using .lower and .upper but I think I'm using them incorrectly.
def upperLower(string):
    odds=""
    evens=""
    for index in range(len(string)):
        if index % 2 == 0:
            evens = evens + string[index]
        if not (index % 2 == 0):
            odds = odds + string[index]
    print("Odds: ", odds)
    print("Evens: ", evens)
Are you looking to get two strings, one with all the uppercase letters and another with all the lowercase letters? Below is a function that will return two strings, the upper then the lowercase:
def split_upper_lower(input):
    upper = ''.join([x for x in input if x.isupper()])
    lower = ''.join([x for x in input if x.islower()])
    return upper, lower
You can then call it with the following:
upper, lower = split_upper_lower('AbBZxYp')
which gives you two variables, upper and lower. Use them as necessary.
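If you would rather keep the shape of your original loop, a minimal single-pass sketch (the name upper_lower is just for illustration) only needs the index-parity test replaced by isupper()/islower(). Characters that are neither, such as digits or spaces, end up in neither string.

```python
def upper_lower(text):
    """Split text into (uppercase letters, lowercase letters) in one pass."""
    uppers = []
    lowers = []
    for ch in text:
        if ch.isupper():
            uppers.append(ch)
        elif ch.islower():
            lowers.append(ch)
        # anything else (digits, spaces, punctuation) is skipped
    return ''.join(uppers), ''.join(lowers)

print(upper_lower("AbBZxYp"))  # ('ABZY', 'bxp')
```

Collecting characters in lists and joining once at the end avoids building a new string on every iteration.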
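>>> ''.join(filter(str.isupper, "AbBZxYp"))
'ABZY'
>>> ''.join(filter(str.islower, "AbBZxYp"))
'bxp'
(In Python 2, filter() on a string returns a string directly; in Python 3 it returns an iterator, hence the ''.join.)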
By the way, for odd/even indices you could simply use slicing:
>>> "AbBZxYp"[::2]
'ABxp'
>>> "AbBZxYp"[1::2]
'bZY'
There is an itertools recipe called partition that can do this. Here is the implementation from the itertools recipes:
from itertools import filterfalse, tee
def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
Upper and Lowercase Letters
You can implement the recipe above manually, or install a library that implements it for you, e.g. pip install more_itertools:
import more_itertools as mit
iterable = "AbBZxYp"
pred = lambda x: x.islower()
children = mit.partition(pred, iterable)
[list(c) for c in children]
# [['A', 'B', 'Z', 'Y'], ['b', 'x', 'p']]
Here partition uses a predicate function to determine whether each item in the iterable is lowercase. If not, the item goes into the false group; otherwise, it goes into the true group. Both groups are lazy iterators, so we iterate over them (here by converting each to a list) to expose their contents.
Even and Odd Indices
You can modify this to work for odd and even indices as well:
import itertools as it
import more_itertools as mit
iterable = "AbBZxYp"
pred = lambda x: x[0] % 2 != 0
children = mit.partition(pred, tuple(zip(it.count(), iterable)))
[[i[1] for i in list(c)] for c in children]
# [['A', 'B', 'x', 'p'], ['b', 'Z', 'Y']]
Here we zip an itertools.count() object to enumerate the iterable. Then we iterate the children so that the sub items yield the letters only.
See also more_itertools docs for more tools.

Python: how to efficiently cycle through the first few elements of a list

I have a very long list in which I would like to replace strings. I have made a simplified example below to illustrate my problem.
my_list = ['a7_1_1', 'a7_2_1', 'a7_3_1','a7_1_2', 'a7_2_2', 'a7_3_2','a7_1_3', 'a7_2_3', 'a7_3_3']
Out[12]:
['a7_1_1',
'a7_2_1',
'a7_3_1',
'a7_1_2',
'a7_2_2',
'a7_3_2',
'a7_1_3',
'a7_2_3',
'a7_3_3']
I would like to replace the strings with the first 3 strings plus a suffix, so the final list should look like:
my_new_list = ['a7_1_1', 'a7_2_1', 'a7_3_1','a7_1_1.1', 'a7_2_1.1', 'a7_3_1.1','a7_1_1.2', 'a7_2_1.2', 'a7_3_1.2']
Out[15]:
['a7_1_1',
'a7_2_1',
'a7_3_1',
'a7_1_1.1',
'a7_2_1.1',
'a7_3_1.1',
'a7_1_1.2',
'a7_2_1.2',
'a7_3_1.2']
Is there an easy way to do this?
Using the itertools.cycle() function:
import itertools as it  #1
def cycle_first_n(lst, n):
    """ cycles through first n elements of the list """
    c = it.cycle(lst[:n])  #2
    for idx in range(len(lst)):  #3
        sfx = idx // n
        yield next(c) + ('.' + str(sfx) if sfx > 0 else '')  #4
#1: itertools is a library for creating iterators for efficient looping.
#2: creates an iterator that cycles through a slice of the first n elements of the list.
#3: range is lazy in Python 3, so this avoids creating a presumably long list in memory (see the question); in Python 2, use xrange instead.
#4: yield means we are creating a generator, again to avoid creating a long list in memory.
How to use the function:
lst = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
for o in cycle_first_n(lst, 3):
    print(o, end=' ')
Output
a b c a.1 b.1 c.1 a.2 b.2
I'm not quite sure what you mean. Check whether this is what you want:
>>> my_list = ['a7_1_1', 'a7_2_1', 'a7_3_1','a7_1_2', 'a7_2_2', 'a7_3_2','a7_1_3', 'a7_2_3', 'a7_3_3']
>>> my_new_list = sum([[x, x+'.1', x+'.2'] for x in my_list[:3]], [])
>>> print(my_new_list)
['a7_1_1', 'a7_1_1.1', 'a7_1_1.2', 'a7_2_1', 'a7_2_1.1', 'a7_2_1.2', 'a7_3_1', 'a7_3_1.1', 'a7_3_1.2']
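For comparison, here is a non-lazy sketch that produces exactly the interleaved order from the question (the name suffix_first_n is hypothetical). Element i reuses the base lst[i % n] and, from the second block onwards, gets the suffix i // n, the same arithmetic cycle_first_n uses.

```python
def suffix_first_n(lst, n):
    """Repeat the first n items, adding .1, .2, ... for each later block."""
    return [lst[i % n] + (f'.{i // n}' if i >= n else '')
            for i in range(len(lst))]

my_list = ['a7_1_1', 'a7_2_1', 'a7_3_1', 'a7_1_2', 'a7_2_2', 'a7_3_2',
           'a7_1_3', 'a7_2_3', 'a7_3_3']
print(suffix_first_n(my_list, 3))
```

This builds the whole list at once, so for a very long list the generator version keeps memory use lower.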
