How to split a list up - python

I have a list that I called lst, it is as follows:
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
What I want to know is how to split this up into four-letter strings: the first made of the first, second, third, and fourth letters; the next made of the second, third, fourth, and fifth letters; and so on. Each string should then be added to a new list to be compared against a main list.
Thanks

To get the first sublist, use lst[0:4]. Use python's join function to merge it into a single string. Use a for loop to get all the sublists.
sequences = []
sequence_size = 4
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
for i in range(len(lst) - sequence_size + 1):
    sequence = ''.join(lst[i : i + sequence_size])
    sequences.append(sequence)
print(sequences)

All 4-grams (without padding):
# window size:
ws = 4
lst2 = [
    ''.join(lst[i:i+ws])
    for i in range(0, len(lst))
    if len(lst[i:i+ws]) == ws
]
Non-overlapping 4-grams:
lst3 = [
    ''.join(lst[i:i+ws])
    for i in range(0, len(lst), ws)
    if len(lst[i:i+ws]) == ws
]
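Both variants above can be folded into one helper. The following is a minimal sketch, not part of the original answer: the function name `ngrams` and its `step` parameter are assumptions made for illustration.

```python
def ngrams(seq, size, step=1):
    """Join each full-length window of `seq` into a string.

    `ngrams` and `step` are hypothetical names, not from the answer above.
    """
    # Stopping at len(seq) - size + 1 guarantees every slice has exactly
    # `size` items, so no separate length check is needed.
    return [''.join(seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
overlapping = ngrams(lst, 4)              # step of 1: sliding window
non_overlapping = ngrams(lst, 4, step=4)  # step equal to the window size
print(overlapping)
print(non_overlapping)
```

With a step of 1 this behaves like the sliding-window comprehension; with a step equal to the window size it yields the non-overlapping chunks and drops a short trailing remainder.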

The other answers solve your problem, but if you are looking for a more Pythonic way to do this, you can use a list comprehension. It is often recommended because it keeps the code short, although it can sometimes reduce readability.
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
result = [''.join(lst[i:i+4]) for i in range(len(lst)-3)]
print(result)

Use:
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
i = 0
New_list = []
while i < (len(lst) - 3):
    New_list.append(lst[i] + lst[i+1] + lst[i+2] + lst[i+3])
    i += 1
print(New_list)
Output:
['ACTG', 'CTGA', 'TGAC', 'GACG', 'ACGC', 'CGCA', 'GCAG']
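The question also mentions comparing the resulting strings to a main list. A minimal sketch of that step, assuming the main list is a plain list of four-letter strings; the name `main_list` and its contents are made up for illustration.

```python
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
windows = [''.join(lst[i:i + 4]) for i in range(len(lst) - 3)]

# Hypothetical reference data; replace with the real main list.
main_list = ['TGAC', 'GGGG', 'GCAG']

# A set makes each membership test O(1) on average.
main_set = set(main_list)
matches = [w for w in windows if w in main_set]
print(matches)  # ['TGAC', 'GCAG']
```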

Related

Index of a list item that occurs multiple times

I have the following code
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
for x in items:
    print(x, end='')
    print(items.index(x), end='')
## outputs: a0a0b2a0c4c4d6
I understand that index() returns the position of the first matching item in the list, but is it possible for me to get an output of a0a1b2a3c4c5d6 instead?
It would be optimal for me to keep using the for loop because I will be editing the list.
edit: I made a typo with the c indexes
And in case you really feel like doing it in one line:
EDIT - using .format or format-strings makes this shorter / more legible, as noted in the comments
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
print("".join("{}{}".format(e,i) for i,e in enumerate(items)))
On Python 3.6 and later you can use an f-string:
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
print("".join(f"{e}{i}" for i, e in enumerate(items)))
ORIGINAL
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
print("".join((str(e) for item_with_index in enumerate(items) for e in item_with_index[::-1])))
Note that the reversal is needed (item_with_index[::-1]) because you want the items printed before the index but enumerate gives tuples with the index first.
I think you're looking for a0a1b2a3c4c5d6 instead.
for i, x in enumerate(items):
    print("{}{}".format(x,i), end='')
Don't add or remove items from your list as you are traversing it. If you want the output specified, you can use enumerate to get the items and the indices of the list.
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
for idx, x in enumerate(items):
    print("{}{}".format(x, idx), end='')
# outputs a0a1b2a3c4c5d6
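The advice about not modifying a list while iterating over it can be illustrated as follows. This is only a sketch, assuming that "editing the list" means dropping or transforming some items; the rule used here (drop every 'c', upper-case the rest) is invented for the example.

```python
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']

# Build a new list instead of calling remove()/insert() on `items`
# inside the loop, which would shift the positions still to be visited.
edited = []
for idx, x in enumerate(items):
    if x == 'c':  # hypothetical rule: drop this item
        continue
    edited.append("{}{}".format(x.upper(), idx))

print(edited)  # ['A0', 'A1', 'B2', 'A3', 'D6']
```

The indices come from the original, untouched list, so they stay stable regardless of which items are skipped.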

How to group similar sequence preserving order in Python? [duplicate]

I want to split a list sequence of items in Python or group them if they are similar.
I already found a solution but I would like to know if there is a better and more efficient way to do it (always up to learn more).
Here is the main goal
input = ['a','a', 'i', 'e', 'e', 'e', 'i', 'i', 'a', 'a']
desired_output = [['a','a'], ['i'], ['e','e', 'e'], ['i', 'i'], ['a', 'a']]
So basically I chose to group by similar neighbours. I tried to find a way to split them where they differ, but had no success doing it.
I'm also keen to hear a better way to state the problem.
#!/usr/bin/env python3
def group_seq(listA):
    listA = [[n] for n in listA]
    for i,l in enumerate(listA):
        _curr = l
        _prev = None
        _next= None
        if i+1 < len(listA):
            _next = listA[i+1]
        if i > 0:
            _prev = listA[i-1]
        if _next is not None and _curr[-1] == _next[0]:
            listA[i].extend(_next)
            listA.pop(i+1)
        if _prev is not None and _curr[0] == _prev[0]:
            listA[i].extend(_prev)
            listA.pop(i-1)
    return listA
listA = ['a','a', 'i', 'e', 'e', 'e', 'i', 'i', 'a', 'a']
output = group_seq(listA)
print(listA)
# ['a', 'a', 'i', 'e', 'e', 'e', 'i', 'i', 'a', 'a']
print(output)
# [['a', 'a'], ['i'], ['e', 'e', 'e'], ['i', 'i'], ['a', 'a']]
I think itertools.groupby is probably the nicest way to do this. It's flexible and efficient enough that it's rarely to your advantage to re-implement it yourself:
from itertools import groupby
inp = ['a','a', 'i', 'e', 'e', 'e', 'i', 'i', 'a', 'a']
output = [list(g) for k,g in groupby(inp)]
print(output)
prints
[['a', 'a'], ['i'], ['e', 'e', 'e'], ['i', 'i'], ['a', 'a']]
If you do implement it yourself, it can probably be much simpler. Just keep track of the previous value and the current list you're appending to:
def group_seq(listA):
    prev = None
    cur = None
    ret = []
    for l in listA:
        if l == prev: # assumes list doesn't contain None
            cur.append(l)
        else:
            cur = [l]
            ret.append(cur)
        prev = l
    return ret
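The hand-written version above relies on None never appearing in the input. A possible variant, offered as a sketch rather than as the answerer's code, avoids that assumption by comparing against the last group instead of a separate `prev` variable:

```python
def group_seq(items):
    """Group equal neighbouring items, preserving order."""
    groups = []
    for item in items:
        # Extend the last group if it holds the same value;
        # otherwise start a new group.
        if groups and groups[-1][-1] == item:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


print(group_seq(['a', 'a', 'i', 'e', 'e', 'e', 'i', 'i', 'a', 'a']))
print(group_seq([None, None, 1]))
```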

Sorting a list using an alphabet string

I'm trying to sort a list containing only lowercase letters by using the string:
alphabet = "abcdefghijklmnopqrstuvwxyz"
that is, without using sort, and with O(n) complexity only.
I got here:
def sort_char_list(lst):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    new_list = []
    length = len(lst)
    for i in range(length):
        new_list.insert(alphabet.index(lst[i]),lst[i])
        print (new_list)
    return new_list
for this input :
m = list("emabrgtjh")
I get this:
['e']
['e', 'm']
['a', 'e', 'm']
['a', 'b', 'e', 'm']
['a', 'b', 'e', 'm', 'r']
['a', 'b', 'e', 'm', 'r', 'g']
['a', 'b', 'e', 'm', 'r', 'g', 't']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'j']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'h', 'j']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'h', 'j']
It looks like something goes wrong along the way, and I can't understand why. If anyone could enlighten me, that would be great.
You are looking for a bucket sort. Here:
def sort_char_list(lst):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Here, create the 26 buckets
    new_list = [''] * len(alphabet)
    for letter in lst:
        # This is the bucket index
        # You could use `ord(letter) - ord('a')` in this specific case, but it is not mandatory
        index = alphabet.index(letter)
        new_list[index] += letter
    # Assemble the buckets
    return ''.join(new_list)
As for complexity, since alphabet is a pre-defined fixed-size string, searching for a letter in it requires at most 26 operations, which qualifies as O(1). The overall complexity is therefore O(n).
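The `ord(letter) - ord('a')` alternative mentioned in the code comment above can be sketched as a counting sort. This is an illustration, not the answer's code: the name `count_sort_letters` is hypothetical, and the input is assumed to contain only the lowercase letters a to z. Unlike the version above, it returns a list rather than a string.

```python
def count_sort_letters(letters):
    """Sort lowercase ASCII letters in O(n) by counting occurrences."""
    counts = [0] * 26
    for letter in letters:
        # Bucket index computed arithmetically, no search in a string.
        counts[ord(letter) - ord('a')] += 1

    result = []
    for offset, count in enumerate(counts):
        # A string repeated `count` times extends the list one letter at a time.
        result.extend(chr(ord('a') + offset) * count)
    return result


print(count_sort_letters(list("emabrgtjh")))
```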

Start loop after certain element in list is reached

How do I start executing code in a for loop after a certain element in the list has been reached? I've got something that works, but is there a more Pythonic or faster way of doing this?
list = ['a', 'b', 'c', 'd', 'e', 'f']
condition = 0
for i in list:
    if i == 'c' or condition == 1:
        condition = 1
        print(i)
One way would be to iterate over a generator combining dropwhile and islice:
from itertools import dropwhile, islice
data = ['a', 'b', 'c', 'd', 'e', 'f']
for after in islice(dropwhile(lambda L: L != 'c', data), 1, None):
    print(after)
If you want to include the matching element itself, drop the islice.
A little simplified code:
lst = ['a', 'b', 'c', 'd', 'e', 'f']
start_index = lst.index('c')
for i in lst[start_index:]:
    print(i)
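Note that list.index raises ValueError when the element is absent. A small sketch of one way to handle that, and of optionally starting strictly after the element; the helper name `items_from` and its `inclusive` flag are assumptions made for illustration.

```python
def items_from(seq, marker, inclusive=True):
    """Return the items of `seq` from the first `marker` onward.

    Returns an empty list if `marker` does not occur.
    """
    try:
        start = seq.index(marker)
    except ValueError:
        return []
    return seq[start:] if inclusive else seq[start + 1:]


lst = ['a', 'b', 'c', 'd', 'e', 'f']
for item in items_from(lst, 'c'):
    print(item)
```

Slicing copies the tail of the list, which is usually fine for short lists; the dropwhile approach avoids the copy.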

Python Remove SOME duplicates from a list while maintaining order?

I want to remove certain duplicates in my python list.
I know there are ways to remove all duplicates, but I wanted to remove only consecutive duplicates, while maintaining the list order.
For example, I have a list such as the following:
list1 = [a,a,b,b,c,c,f,f,d,d,e,e,f,f,g,g,c,c]
However, I want to remove the duplicates, and maintain order, but still keep the 2 c's and 2 f's, such as this:
wantedList = [a,b,c,f,d,e,f,g,c]
So far, I have this:
z = 0
j=0
list2=[]
for i in list1:
    if i == "c":
        z = z+1
        if (z==1):
            list2.append(i)
        if (z==2):
            list2.append(i)
        else:
            pass
    elif i == "f":
        j = j+1
        if (j==1):
            list2.append(i)
        if (j==2):
            list2.append(i)
        else:
            pass
    else:
        if i not in list2:
            list2.append(i)
However, this method gives me something like:
wantedList = [a,b,c,c,d,e,f,f,g]
Thus, not maintaining the order.
Any ideas would be appreciated! Thanks!
Not completely sure if c and f are special cases, or if you want to compress consecutive duplicates only. If it is the latter, you can use itertools.groupby():
>>> import itertools
>>> list1
['a', 'a', 'b', 'b', 'c', 'c', 'f', 'f', 'd', 'd', 'e', 'e', 'f', 'f', 'g', 'g', 'c', 'c']
>>> [k for k, g in itertools.groupby(list1)]
['a', 'b', 'c', 'f', 'd', 'e', 'f', 'g', 'c']
To remove consecutive duplicates from a list, you can use the following generator function:
def remove_consecutive_duplicates(a):
    last = None
    for x in a:
        if x != last:
            yield x
        last = x
With your data, this gives:
>>> list1 = ['a','a','b','b','c','c','f','f','d','d','e','e','f','f','g','g','c','c']
>>> list(remove_consecutive_duplicates(list1))
['a', 'b', 'c', 'f', 'd', 'e', 'f', 'g', 'c']
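One caveat with the generator above: because `last` starts as None, a leading None in the input would be silently dropped. A sketch of a variant that starts from a private sentinel object instead; the name `_MISSING` is hypothetical.

```python
_MISSING = object()  # hypothetical sentinel that never equals real data


def remove_consecutive_duplicates(a):
    last = _MISSING
    for x in a:
        if x != last:
            yield x
        last = x


print(list(remove_consecutive_duplicates([None, None, 'a', 'a', None])))
# [None, 'a', None]
```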
If you want to ignore certain items when removing duplicates...
list2 = []
for item in list1:
    if item not in list2 or item in ('c','f'):
        list2.append(item)
EDIT: Note that this doesn't remove consecutive items
EDIT
Never mind, I read your question wrong. I thought you were wanting to keep only certain sets of doubles.
I would recommend something like this. It allows a general form to keep certain doubles once.
list1 = ['a','a','b','b','c','c','f','f','d','d','e','e','f','f','g','g','c','c']
doubleslist = ['c', 'f']
def remove_duplicate(firstlist, doubles):
    newlist = []
    for x in firstlist:
        if x not in newlist:
            newlist.append(x)
        elif x in doubles:
            newlist.append(x)
            doubles.remove(x)
    return newlist
print(remove_duplicate(list1, doubleslist))
A simple solution is to compare each element to the previous one:
a=1
b=2
c=3
d=4
e=5
f=6
g=7
list1 = [a,a,b,b,c,c,f,f,d,d,e,e,f,f,g,g,c,c]
output_list=[list1[0]]
for ctr in range(1, len(list1)):
    if list1[ctr] != list1[ctr-1]:
        output_list.append(list1[ctr])
print(output_list)
list1 = ['a', 'a', 'b', 'b', 'c', 'c', 'f', 'f', 'd', 'd', 'e', 'e', 'f', 'f', 'g', 'g', 'c', 'c']
wantedList = []
for item in list1:
    if len(wantedList) == 0:
        wantedList.append(item)
    elif len(wantedList) > 0:
        if wantedList[-1] != item:
            wantedList.append(item)
print(wantedList)
Fetch each item from the main list (list1).
If wantedList is empty, add that item.
If not, check whether the last item in wantedList differs from the item fetched from list1.
If the items are different, append the item to wantedList.
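The same steps can be written more compactly. A sketch using a list comprehension that keeps an item when it is the first one or differs from its predecessor:

```python
list1 = ['a', 'a', 'b', 'b', 'c', 'c', 'f', 'f', 'd', 'd',
         'e', 'e', 'f', 'f', 'g', 'g', 'c', 'c']

wanted = [
    item
    for i, item in enumerate(list1)
    if i == 0 or item != list1[i - 1]
]
print(wanted)  # ['a', 'b', 'c', 'f', 'd', 'e', 'f', 'g', 'c']
```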
