I created this function, and it finds the positions of a base in a DNA sequence, like dna = ['A', 'G', 'C', 'G', 'T', 'A', 'G', 'T', 'C', 'G', 'A', 'T', 'C', 'A', 'A', 'T', 'T', 'A', 'T', 'A', 'C', 'G', 'A', 'T', 'C', 'G', 'G', 'G', 'T', 'A', 'T']. I need it to find more than one base at a time, like 'AT'. Can anyone help?
def position(list, value):
    pos = []
    for n in range(len(list)):
        if list[n] == value:
            pos.append(n)
    return pos
You can work with the dna sequence as a string, and then use regex:
import re
dna_str = ''.join(dna)
pattern = r'AT'
pos = [(i.start(0), i.end(0)) for i in re.finditer(pattern, dna_str)]
print(pos)
[(10, 12), (14, 16), (17, 19), (22, 24), (29, 31)]
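Note that `re.finditer` returns non-overlapping matches. That makes no difference for 'AT', which cannot overlap itself, but for a motif such as 'AA' a match starting inside a previous match would be skipped. A minimal sketch, assuming you also want overlapping positions, wraps the pattern in a zero-width lookahead; `find_all` is a hypothetical helper name:

```python
import re

def find_all(seq, motif):
    """Return the start index of every (possibly overlapping) occurrence of motif."""
    # (?=...) is a zero-width lookahead: nothing is consumed, so the engine
    # tries every position and overlapping matches are reported.
    # re.escape makes the motif match literally.
    pattern = f"(?={re.escape(motif)})"
    return [m.start() for m in re.finditer(pattern, seq)]

dna = ['A', 'G', 'C', 'G', 'T', 'A', 'G', 'T', 'C', 'G', 'A', 'T', 'C', 'A', 'A',
       'T', 'T', 'A', 'T', 'A', 'C', 'G', 'A', 'T', 'C', 'G', 'G', 'G', 'T', 'A', 'T']
dna_str = ''.join(dna)

print(find_all(dna_str, 'AT'))                          # [10, 14, 17, 22, 29]
print([m.start() for m in re.finditer('AA', 'AAAA')])   # [0, 2]    (non-overlapping)
print(find_all('AAAA', 'AA'))                           # [0, 1, 2] (overlapping)
```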
Side note: it's best not to use built-in names for variables. list is a Python built-in (not a keyword), and rebinding it hides the built-in list type.
def position(l: list, values: list) -> list:
    pos = []
    for i, val in enumerate(l):
        if val in values:
            pos.append(i)
    return pos
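To see why shadowing `list` matters in practice: once the name is rebound to a value, any later call to `list(...)` in the same scope fails. A minimal demonstration at module level:

```python
# Rebinding the name 'list' hides the built-in type in this module.
list = ['A', 'T']
try:
    bases = list('GC')        # calls the list object, not the type
except TypeError as exc:
    print(exc)                # 'list' object is not callable

# Deleting the module-level name makes the built-in visible again.
del list
print(list('GC'))             # ['G', 'C']
```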
You should definitely use Python's built-in tools. For instance, instead of position(list, value) you could use a list comprehension:
[n for n,x in enumerate(dna) if x == 'A']
Finding a bigram could be reduced to the above if you consider pairs of letters:
[n for n,x in enumerate(zip(dna[:-1], dna[1:])) if x==('A','T')]
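The pair trick generalizes to a motif of any length by comparing slices instead of zipped pairs. A sketch, with `motif_positions` as a hypothetical helper name; because every window is checked, it also reports overlapping matches:

```python
def motif_positions(seq, motif):
    """Indices where the list `seq` contains `motif` (a string or a list of bases)."""
    motif = list(motif)           # 'AT' -> ['A', 'T']
    k = len(motif)
    # Compare every length-k window; the range stops so no window runs past the end.
    return [n for n in range(len(seq) - k + 1) if seq[n:n + k] == motif]

dna = ['A', 'G', 'C', 'G', 'T', 'A', 'G', 'T', 'C', 'G', 'A', 'T', 'C', 'A', 'A',
       'T', 'T', 'A', 'T', 'A', 'C', 'G', 'A', 'T', 'C', 'G', 'G', 'G', 'T', 'A', 'T']

print(motif_positions(dna, 'AT'))    # [10, 14, 17, 22, 29]
print(motif_positions(dna, 'GAT'))   # [9, 21]
```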
If instead you want to find the positions of either 'A' or 'T', you can specify that as the condition:
[n for n,x in enumerate(dna) if x in ('A', 'T')]
Python's str.find will efficiently locate a substring starting from any position in a string.
def positions(dnalist, substr):
    dna = "".join(dnalist)  # make a single string
    st = 0
    pos = []
    while True:
        a_pos = dna.find(substr, st)
        if a_pos < 0:
            return pos
        pos.append(a_pos)
        st = a_pos + 1
Test usage:
>>> testdna = ['A', 'G', 'C', 'G', 'T', 'A', 'G', 'T', 'C', 'G', 'A', 'T', 'C', 'A', 'A', 'T', 'T', 'A', 'T', 'A', 'C', 'G', 'A', 'T', 'C', 'G', 'G', 'G', 'T', 'A', 'T']
>>> positions(testdna, "AT")
[10, 14, 17, 22, 29]
I have the following list:
lst = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
I would like to reorder the list so that the first element is followed by the sixth and the eleventh, the second by the seventh and the twelfth, and so on. The output should be:
['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
What I have tried so far:
lst = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
new_lst = [lst[0], lst[5], lst[10], lst[1], lst[6], lst[11], lst[2], lst[7], lst[12], lst[3], lst[8], lst[13] , lst[4], lst[9], lst[14]]
new_lst
This provides the desired output, but I am looking for a better way to do it. How do I do that?
From the pattern: reshape into a 2-D array with 5 columns, then transpose and flatten.
sum is convenient here because you can give it a start value; the identity is () or [], depending on whether you are concatenating tuples or lists.
### sol 1
import numpy as np
print('Using numpy')
x = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
np.array(x).reshape((-1, 5)).transpose().reshape(-1)
# array(['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O'], dtype='<U1')
# Sol 2
print('One more way without numpy')
list(
    sum(
        zip(x[:5], x[5:10], x[10:]),
        ()
    )
)
# Sol 3
print('One more way without numpy')
sum(
    [list(y) for y in zip(x[:5], x[5:10], x[10:])],
    []
)
# Sol 4
print('One more way without numpy')
list(
    sum(
        [y for y in zip(x[:5], x[5:10], x[10:])],
        ()
    )
)
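A note on the sum approach: each + builds a brand-new tuple or list, so summing many groups costs time quadratic in the total length (the Python docs suggest itertools.chain for concatenating iterables). A sketch of the same flattening with `itertools.chain.from_iterable`, using the same three slices:

```python
from itertools import chain

x = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']

# zip walks the three rows column by column: (A, F, K), (B, G, L), ...
columns = zip(x[:5], x[5:10], x[10:])
# chain.from_iterable yields the items of each tuple in turn, without
# building the intermediate tuples that sum(..., ()) creates.
result = list(chain.from_iterable(columns))
print(result)
# ['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
```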
You can also use list comprehension if you want to avoid libraries:
lst = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
[x for t in zip(lst[:5], lst[5:10], lst[10:]) for x in t]
# ['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
If you want each of the first five elements followed by the elements five and ten positions after it, a plain loop does it:
# Must contain exactly 15 values (three groups of five)
input_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
output_list = []
for i in range(len(input_list) // 3):
    output_list.append(input_list[i])
    output_list.append(input_list[i + 5])
    output_list.append(input_list[i + 10])
print(output_list)
No libraries used. It will give the desired result:
['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
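The reshape/transpose idea can also be written without NumPy using stride slicing: `lst[i::5]` takes every fifth element starting at index `i`, which is column `i` when the list is viewed as rows of five. A sketch, where `interleave_columns` and its `n_cols` parameter are hypothetical names (5 in the question):

```python
def interleave_columns(lst, n_cols):
    # lst[i::n_cols] is column i when lst is viewed as rows of length n_cols;
    # concatenating the columns is the transpose-and-flatten step.
    return [x for i in range(n_cols) for x in lst[i::n_cols]]

lst = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
print(interleave_columns(lst, 5))
# ['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
```

Unlike the hard-coded indices, this does not depend on the list having exactly 15 items; if the length is not a multiple of `n_cols`, the trailing columns are simply shorter.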
I have a list of lists. I want to find all flat lists that keep the order of each sublist. As an example, let's say I have a list of lists like this:
ll = [['D', 'O', 'G'], ['C', 'A', 'T'], ['F', 'I', 'S', 'H']]
It is trivial to get one solution. I managed to write the following code, which generates a random flat list that keeps the order of each sublist.
import random

# Flatten the list of lists
flat = [x for l in ll for x in l]
# Shuffle to gain randomness
random.shuffle(flat)
for l in ll:
    # Find the idxs in the flat list that belong to the sublist
    idxs = [i for i, x in enumerate(flat) if x in l]
    # Change the order to match the order in the sublist
    for j, idx in enumerate(idxs):
        flat[idx] = l[j]
print(flat)
This can generate flat lists like the following:
['F', 'D', 'O', 'C', 'A', 'G', 'I', 'S', 'T', 'H']
['C', 'D', 'F', 'O', 'G', 'I', 'S', 'A', 'T', 'H']
['C', 'D', 'O', 'G', 'F', 'I', 'S', 'A', 'T', 'H']
['F', 'C', 'D', 'I', 'S', 'A', 'H', 'O', 'T', 'G']
As you can see, 'A' always appears after 'C', 'T' always appears after 'A', 'O' always appears after 'D', and so on...
However, I want to get all possible solutions.
Please note that:
I want general code that works for any given list of lists, not just for "dog cat fish";
It does not matter whether there are duplicates or not, because every item is distinguishable.
Can anyone suggest a fast Python algorithm for this?
Suppose you are combining the lists by hand. Instead of shuffling and putting things back in order, you would select one list and take its first element, then again select a list and take its first (unused) element, and so on. So the algorithm you need is this: What are all the different ways to pick from a collection of lists with these particular sizes?
In your example you have lists of length 3, 3 and 4. Suppose you had a bucket with three red balls, three yellow balls and four green balls: which orderings are possible? Model this, and then just pick the first unused element from the corresponding list to get your output.
Say what? For your example, the (distinct) pick orders would be given by
set(itertools.permutations("RRRYYYGGGG"))
For any list of lists, we'll use integer keys instead of letters. The pick orders are:
import itertools

elements = []
for key, lst in enumerate(ll):
    elements.extend([key] * len(lst))
pick_orders = set(itertools.permutations(elements))
Then you just use each pick order to present the elements from your list of lists, say with pop(0) (from a copy of the lists, since pop() is destructive).
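A sketch of that idea, assuming small inputs: the distinct pick orders come from `set(itertools.permutations(...))` as described, and instead of `pop(0)` on copies, each sublist is read through its own fresh iterator, which leaves the original lists untouched. `all_interleavings` is a hypothetical name:

```python
import itertools

def all_interleavings(ll):
    # One key per element: key k appears len(ll[k]) times.
    elements = []
    for key, lst in enumerate(ll):
        elements.extend([key] * len(lst))

    # Distinct pick orders (multiset permutations). This still enumerates every
    # raw permutation internally, so it is only practical for short inputs.
    pick_orders = set(itertools.permutations(elements))

    result = []
    for order in sorted(pick_orders):
        # A fresh iterator per sublist yields its items in their original order,
        # so next() plays the role of pop(0) without modifying ll.
        iters = [iter(lst) for lst in ll]
        result.append([next(iters[key]) for key in order])
    return result

print(all_interleavings([['D', 'O'], ['C']]))
# [['D', 'O', 'C'], ['D', 'C', 'O'], ['C', 'D', 'O']]
```

For the question's three lists this yields 10!/(3!·3!·4!) = 4,200 orderings, but it has to walk all 10! raw permutations to find them, so it slows down quickly as the lists grow.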
Yet another solution, but this one doesn't use any libraries.
def recurse(lst, indices, total, curr):
    done = True
    for l, (pos, index) in zip(lst, enumerate(indices)):
        if index < len(l):  # can increment index
            curr.append(l[index])  # add on corresponding value
            indices[pos] += 1  # increment index
            recurse(lst, indices, total, curr)
            # backtrack
            indices[pos] -= 1
            curr.pop()
            done = False  # modification made, so not done
    if done:  # no changes made
        total.append(curr.copy())
    return

def list_to_all_flat(lst):
    seq = [0] * len(lst)  # set up indexes
    total, curr = [], []
    recurse(lst, seq, total, curr)
    return total

if __name__ == "__main__":
    lst = [['D', 'O', 'G'], ['C', 'A', 'T'], ['F', 'I', 'S', 'H']]
    print(list_to_all_flat(lst))
Try:
from itertools import permutations, chain

ll = [["D", "O", "G"], ["C", "A", "T"], ["F", "I", "S", "H"]]
x = [[(i1, i2, o) for i2, o in enumerate(subl)] for i1, subl in enumerate(ll)]
l = sum(len(subl) for subl in ll)

def is_valid(c):
    seen = {}
    for i1, i2, _ in c:
        if i2 != seen.get(i1, -1) + 1:
            return False
        else:
            seen[i1] = i2
    return True

for c in permutations(chain(*x), l):
    if is_valid(c):
        print([o for *_, o in c])
Prints:
['D', 'O', 'G', 'C', 'A', 'T', 'F', 'I', 'S', 'H']
['D', 'O', 'G', 'C', 'A', 'F', 'T', 'I', 'S', 'H']
['D', 'O', 'G', 'C', 'A', 'F', 'I', 'T', 'S', 'H']
['D', 'O', 'G', 'C', 'A', 'F', 'I', 'S', 'T', 'H']
['D', 'O', 'G', 'C', 'A', 'F', 'I', 'S', 'H', 'T']
['D', 'O', 'G', 'C', 'F', 'A', 'T', 'I', 'S', 'H']
['D', 'O', 'G', 'C', 'F', 'A', 'I', 'T', 'S', 'H']
['D', 'O', 'G', 'C', 'F', 'A', 'I', 'S', 'T', 'H']
...
['F', 'I', 'S', 'H', 'C', 'D', 'A', 'O', 'T', 'G']
['F', 'I', 'S', 'H', 'C', 'D', 'A', 'T', 'O', 'G']
['F', 'I', 'S', 'H', 'C', 'A', 'D', 'O', 'G', 'T']
['F', 'I', 'S', 'H', 'C', 'A', 'D', 'O', 'T', 'G']
['F', 'I', 'S', 'H', 'C', 'A', 'D', 'T', 'O', 'G']
['F', 'I', 'S', 'H', 'C', 'A', 'T', 'D', 'O', 'G']
You can use a recursive generator function:
ll = [['D', 'O', 'G'], ['C', 'A', 'T'], ['F', 'I', 'S', 'H']]

def get_combos(d, c=None):
    c = [] if c is None else c
    if not any(d):  # every sublist has been used up
        yield c
        return
    for a, b in enumerate(d):
        if b:  # take the next unused item of sublist a
            yield from get_combos(d[:a] + [b[1:]] + d[a+1:], c + [b[0]])

print(list(get_combos(ll)))
Output (first ten permutations):
[['D', 'O', 'G', 'C', 'A', 'T', 'F', 'I', 'S', 'H'], ['D', 'O', 'G', 'C', 'A', 'F', 'T', 'I', 'S', 'H'], ['D', 'O', 'G', 'C', 'A', 'F', 'I', 'T', 'S', 'H'], ['D', 'O', 'G', 'C', 'A', 'F', 'I', 'S', 'T', 'H'], ['D', 'O', 'G', 'C', 'A', 'F', 'I', 'S', 'H', 'T'], ['D', 'O', 'G', 'C', 'F', 'A', 'T', 'I', 'S', 'H'], ['D', 'O', 'G', 'C', 'F', 'A', 'I', 'T', 'S', 'H'], ['D', 'O', 'G', 'C', 'F', 'A', 'I', 'S', 'T', 'H'], ['D', 'O', 'G', 'C', 'F', 'A', 'I', 'S', 'H', 'T'], ['D', 'O', 'G', 'C', 'F', 'I', 'A', 'T', 'S', 'H']]
For simplicity, let's start with a list containing two sublists.
from itertools import permutations

Ls = [['D', 'O', 'G'], ['C', 'A', 'T']]
L_flattened = []
for L in Ls:
    for item in L:
        L_flattened.append(item)
print("L_flattened:", L_flattened)
print(list(permutations(L_flattened, len(L_flattened))))
[('D', 'O', 'G', 'C', 'A', 'T'), ('D', 'O', 'G', 'C', 'T', 'A'), ('D', 'O', 'G', 'A', 'C', 'T'), ('D', 'O', 'G', 'A', 'T', 'C'), ('D', 'O', 'G', 'T', 'C', 'A'), ('D', 'O', 'G', 'T', 'A', 'C'), ('D', 'O', 'C', 'G', 'A', 'T'),
('D', 'O', 'C', 'G', 'T', 'A'), ('D', 'O', 'C', 'A', 'G', 'T'), ('D', 'O', 'C', 'A', 'T', 'G'),
...
Beware that the number of permutations grows very quickly with the number of items.
In your example there are 10 items, and P(10, 10) = 10! = 3,628,800.
I suggest calculating the number of permutations first, to get an idea of the size before running the actual code (which may otherwise cause a memory error or freeze/crash the system).
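These sizes can be computed up front with the standard library. All orderings of the 10 letters number 10!, while the orderings that keep every sublist in order are counted by the multinomial coefficient 10!/(3!·3!·4!), since only one of each sublist's internal orderings is allowed. `count_valid` is a hypothetical helper name:

```python
from math import factorial, perm

ll = [['D', 'O', 'G'], ['C', 'A', 'T'], ['F', 'I', 'S', 'H']]
n = sum(len(sub) for sub in ll)

print(perm(n, n))        # 3628800, i.e. 10! orderings of the 10 letters

def count_valid(ll):
    # Multinomial coefficient: n! / (len(l1)! * len(l2)! * ...)
    total = factorial(sum(len(sub) for sub in ll))
    for sub in ll:
        total //= factorial(len(sub))
    return total

print(count_valid(ll))   # 4200
```

So only about 1 in 864 raw permutations is a valid answer, which is why filtering every permutation is slow.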
You can try verifying all possible permutations:
import itertools
import numpy as np

ll = [['D', 'O', 'G'], ['C', 'A', 'T'], ['F', 'I', 'S', 'H']]
flat = [x for l in ll for x in l]
good_permutations = []
count = 0
# Iterate lazily instead of building list(...) of all 10! = 3,628,800 tuples
for perm in itertools.permutations(flat):
    count += 1
    cond = True
    for l in ll:
        idxs = [perm.index(x) for x in l]
        # check if ordered
        if not np.all(np.diff(np.array(idxs)) >= 0):
            cond = False
            break
    if cond:
        good_permutations.append(perm)
    if count >= 10000:
        break
print(len(good_permutations))
It is only a basic solution, as it is really slow to compute (I set count to limit the number of permutations that are checked).
This question already has answers here:
How do I split a list into equally-sized chunks?
(66 answers)
Closed 6 years ago.
I'm trying to create and initialize a matrix. The problem is that every row of the matrix I create is the same, rather than moving through the data set.
I've tried to fix it by checking whether the value was already in the matrix, but that didn't solve my problem.
def createMatrix(rowCount, colCount, dataList):
    mat = []
    for i in range(rowCount):
        rowList = []
        for j in range(colCount):
            if dataList[j] not in mat:
                rowList.append(dataList[j])
        mat.append(rowList)
    return mat

def main():
    alpha = ['a','b','c','d','e','f','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    mat = createMatrix(5,5,alpha)
    print(mat)
The output should be like this:
['a','b','c','d','e'] , ['f','h','i','j','k'], ['l','m','n','o','p'] , ['q','r','s','t','u'], ['v','w','x','y','z']
My issue is that I keep getting the first list, a, b, c, d, e, for all 5 rows.
You need to keep track of the current index in your loop.
Essentially you want to turn a list like 0, 1, 2, 3, 4, ..., 24 (these are the indices of your initial array, alpha) into:
R1C1, R1C2, R1C3, R1C4, R1C5
R2C1, R2C2... etc
I added the logic to do this the way you are currently doing it:
def createMatrix(rowCount, colCount, dataList):
    mat = []
    for i in range(rowCount):
        rowList = []
        for j in range(colCount):
            # you need to increment through dataList here; each row
            # starts colCount items after the previous one:
            rowList.append(dataList[colCount * i + j])
        mat.append(rowList)
    return mat

def main():
    alpha = ['a','b','c','d','e','f','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    mat = createMatrix(5,5,alpha)
    print(mat)

main()
which then prints out:
[['a', 'b', 'c', 'd', 'e'], ['f', 'h', 'i', 'j', 'k'], ['l', 'm', 'n', 'o', 'p'], ['q', 'r', 's', 't', 'u'], ['v', 'w', 'x', 'y', 'z']]
The reason you were always receiving a,b,c,d,e is because when you write this:
rowList.append(dataList[j])
what it effectively does is iterate over indices 0-4 for every row. So basically:
i = 0
rowList.append(dataList[0])
rowList.append(dataList[1])
rowList.append(dataList[2])
rowList.append(dataList[3])
rowList.append(dataList[4])
i = 1
rowList.append(dataList[0]) # should be 5
rowList.append(dataList[1]) # should be 6
rowList.append(dataList[2]) # should be 7
rowList.append(dataList[3]) # should be 8
rowList.append(dataList[4]) # should be 9
etc.
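The flat index of row `i`, column `j` is `i * colCount + j`: each completed row skips a full row's worth of columns. A stride of `rowCount` happens to give the same result for a 5×5 matrix, but not for non-square shapes. A quick check with a hypothetical snake_case `create_matrix` and a 2×3 / 3×2 example:

```python
def create_matrix(row_count, col_count, data):
    # Row i starts at offset i * col_count in the flat list.
    return [[data[i * col_count + j] for j in range(col_count)]
            for i in range(row_count)]

data = ['a', 'b', 'c', 'd', 'e', 'f']
print(create_matrix(2, 3, data))   # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(create_matrix(3, 2, data))   # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```

With `i * row_count + j` instead, the 2×3 case would produce a second row of `['c', 'd', 'e']`, overlapping the first.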
You can use a list comprehension:
>>> li= ['a','b','c','d','e','f','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
>>> [li[i:i+5] for i in range(0,len(li),5)]
[['a', 'b', 'c', 'd', 'e'], ['f', 'h', 'i', 'j', 'k'], ['l', 'm', 'n', 'o', 'p'], ['q', 'r', 's', 't', 'u'], ['v', 'w', 'x', 'y', 'z']]
Or, if you don't mind tuples, use zip (in Python 3, zip returns an iterator, so wrap it in list):
>>> list(zip(*[iter(li)]*5))
[('a', 'b', 'c', 'd', 'e'), ('f', 'h', 'i', 'j', 'k'), ('l', 'm', 'n', 'o', 'p'), ('q', 'r', 's', 't', 'u'), ('v', 'w', 'x', 'y', 'z')]
Or apply list to the tuples:
>>> list(map(list, zip(*[iter(li)]*5)))
[['a', 'b', 'c', 'd', 'e'], ['f', 'h', 'i', 'j', 'k'], ['l', 'm', 'n', 'o', 'p'], ['q', 'r', 's', 't', 'u'], ['v', 'w', 'x', 'y', 'z']]
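Why the zip(*[iter(li)]*5) idiom works: `[iter(li)]*5` is a list holding the same iterator five times, so each tuple that zip builds pulls five consecutive items from it. zip silently drops a trailing group shorter than 5, whereas `itertools.zip_longest` keeps it and pads with a fill value. A sketch, with `chunks` as a hypothetical helper name:

```python
from itertools import zip_longest

def chunks(seq, n, fill=None):
    it = iter(seq)
    # n references to ONE iterator: zip_longest advances it n times per tuple.
    return [list(group) for group in zip_longest(*[it] * n, fillvalue=fill)]

letters = list('abcdefg')
print([list(g) for g in zip(*[iter(letters)] * 3)])  # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(chunks(letters, 3))   # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', None, None]]
```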
What is the simplest method for replacing one item in a list with two?
So:
list=['t' , 'r', 'g', 'h', 'k']
if I wanted to replace 'r' with 'a' and 'b':
list = ['t' , 'a' , 'b', 'g', 'h', 'k']
It can be done fairly easily with slice assignment:
>>> l = ['t' , 'r', 'g', 'h', 'k']
>>>
>>> pos = l.index('r')
>>> l[pos:pos+1] = ('a', 'b')
>>>
>>> l
['t', 'a', 'b', 'g', 'h', 'k']
Also, don't call your variable list, since that name is already used by a built-in function.
If the list contains more than one occurrence of 'r', you can use a list comprehension or itertools.chain.from_iterable with a generator expression. But if the list contains just one such item, go for @arshajii's solution.
>>> lis = ['t' , 'r', 'g', 'h', 'k']
>>> [y for x in lis for y in ([x] if x != 'r' else ['a', 'b'])]
['t', 'a', 'b', 'g', 'h', 'k']
or:
>>> from itertools import chain
>>> list(chain.from_iterable([x] if x != 'r' else ['a', 'b'] for x in lis))
['t', 'a', 'b', 'g', 'h', 'k']
Here's an overcomplicated way to do it that splices over every occurrence of 'r'. Just for fun.
>>> from functools import reduce
>>> l = ['t', 'r', 'g', 'h', 'r', 'k', 'r']
>>> reduce(lambda p,v: p + list('ab' if v=='r' else v), l, [])
['t', 'a', 'b', 'g', 'h', 'a', 'b', 'k', 'a', 'b']
Now go upvote one of the more readable answers. :)