How to group similar sequence preserving order in Python? [duplicate]

This question already has an answer here:
How can I group equivalent items together in a Python list?
I want to split a list of items in Python into groups, putting neighbouring items together when they are similar (equal).
I already found a solution, but I would like to know if there is a better and more efficient way to do it (I'm always keen to learn more).
Here is the main goal:
input = ['a','a', 'i', 'e', 'e', 'e', 'i', 'i', 'a', 'a']
desired_output = [['a','a'], ['i'], ['e','e', 'e'], ['i', 'i'], ['a', 'a']]
So basically I want to group equal neighbours. I tried to find a way to split the list wherever adjacent items differ, but had no success doing it.
I'd also welcome advice on a better way to state the problem.
#!/usr/bin/env python3
def group_seq(listA):
    listA = [[n] for n in listA]
    for i,l in enumerate(listA):
        _curr = l
        _prev = None
        _next= None
        if i+1 < len(listA):
            _next = listA[i+1]
        if i > 0:
            _prev = listA[i-1]
        if _next is not None and _curr[-1] == _next[0]:
            listA[i].extend(_next)
            listA.pop(i+1)
        if _prev is not None and _curr[0] == _prev[0]:
            listA[i].extend(_prev)
            listA.pop(i-1)
    return listA
listA = ['a','a', 'i', 'e', 'e', 'e', 'i', 'i', 'a', 'a']
output = group_seq(listA)
print(listA)
['a', 'a', 'i', 'e', 'e', 'e', 'i', 'i', 'a', 'a']
print(output)
[['a', 'a'], ['i'], ['e', 'e', 'e'], ['i', 'i'], ['a', 'a']]

I think itertools.groupby is probably the nicest way to do this. It's flexible and efficient enough that it's rarely to your advantage to re-implement it yourself:
from itertools import groupby
inp = ['a','a', 'i', 'e', 'e', 'e', 'i', 'i', 'a', 'a']
output = [list(g) for k,g in groupby(inp)]
print(output)
prints
[['a', 'a'], ['i'], ['e', 'e', 'e'], ['i', 'i'], ['a', 'a']]
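groupby also accepts an optional key function: neighbouring items end up in the same group when their keys are equal. That is one way to group items that are "similar" rather than strictly identical. The sketch below assumes, purely for illustration, that case should be ignored (key=str.lower); the sample list is made up.

```python
from itertools import groupby

# Hypothetical input: treat 'a' and 'A' as similar by grouping on a lower-cased key
inp = ['a', 'A', 'i', 'e', 'E', 'e', 'i', 'a']

# groupby compares key(item) of neighbouring items; each group g yields the original items
output = [list(g) for _, g in groupby(inp, key=str.lower)]
print(output)  # [['a', 'A'], ['i'], ['e', 'E', 'e'], ['i'], ['a']]
```

groupby only merges adjacent items, so equal items separated by something else stay in separate groups, which is the order-preserving behaviour asked for.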
If you do implement it yourself, it can probably be much simpler. Just keep track of the previous value and the current list you're appending to:
def group_seq(listA):
    prev = None
    cur = None
    ret = []
    for l in listA:
        if l == prev: # assumes list doesn't contain None
            cur.append(l)
        else:
            cur = [l]
            ret.append(cur)
        prev = l
    return ret
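If the list might contain None, one way to lift the assumption in the comment above is to start prev at a private sentinel object that never compares equal to a real item. This is a sketch; the name _MISSING is made up here.

```python
_MISSING = object()  # hypothetical sentinel: compares unequal to every list item

def group_seq(listA):
    prev = _MISSING
    cur = None
    ret = []
    for item in listA:
        if item == prev:
            # Same as the previous item: extend the current group
            cur.append(item)
        else:
            # New value: start a new group and record it in the result
            cur = [item]
            ret.append(cur)
        prev = item
    return ret

print(group_seq([None, None, 'a', 'a', 'i']))  # [[None, None], ['a', 'a'], ['i']]
```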

Related

Python recursion systematic ordering

I wrote my code and it's working perfectly, but the output doesn't really look good. I want it to look more presentable/systematic. How do I do that? This is the kind of result I'm currently getting:
and this is the type of result I want:
This code finds all permutations of its input.
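def permutations(aSet):
    # Base case: wrap in a list so callers always get a list of permutations
    # (returning aSet itself breaks for non-string items, e.g. permutations([1, 2]))
    if len(aSet) <= 1: return [list(aSet)]
    all_perms = []
    first_element = aSet[0:1]
    subset = aSet[1:]
    partial = permutations(subset)
    for permutation in partial:
        # insert the first element at every possible position
        for index in range(len(aSet)):
            new_perm = list(permutation[:index])
            new_perm.extend(first_element)
            new_perm.extend(permutation[index:])
            all_perms.append(new_perm)
    return all_perms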
I can't figure out what to try.
You can sort the output list with a custom key function. Here keyFunc converts a permutation (a list of characters) into a single string, so the sort is lexicographic.
from pprint import pprint
# insert your function here
def keyFunc(char_list):
    return ''.join(char_list)
chars = list('dog')
permutation = permutations(chars)
permutation.sort(key=keyFunc)
pprint(permutation)
Output:
[['d', 'g', 'o'],
['d', 'o', 'g'],
['g', 'd', 'o'],
['g', 'o', 'd'],
['o', 'd', 'g'],
['o', 'g', 'd']]
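Because Python compares lists element by element, a key function isn't strictly needed when every item is a single character: sorting the lists directly gives the same order. So that it runs on its own, the sketch below builds the permutations with the standard library's itertools.permutations instead of the function above.

```python
import itertools
from pprint import pprint

# Lists compare element by element, so for single-character items
# a plain sorted() matches sorting with key=''.join
perms = [list(p) for p in itertools.permutations('dog')]
pprint(sorted(perms))
```

With multi-character items the two orders can differ, so the join-based key is the safer choice when you specifically want string order.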
Here's a way to order the permutations differently: for each item in the input array, take it out of the array, find all permutations of the remaining subarray, then prepend this item to each permutation of this subarray. This has the effect of placing permutations with similar prefixes together.
from pprint import pprint
def permutations2(chars):
    if len(chars) <= 1: return [chars]
    all_perms = []
    for idx, char in enumerate(chars):
        subarr = chars[:idx] + chars[idx+1:]
        subperms = permutations2(subarr)
        for subperm in subperms:
            new_perm = [char] + subperm
            all_perms.append(new_perm)
    return all_perms
chars = list('dog')
pprint(permutations2(chars))
Result:
[['d', 'o', 'g'],
['d', 'g', 'o'],
['o', 'd', 'g'],
['o', 'g', 'd'],
['g', 'd', 'o'],
['g', 'o', 'd']]
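The standard library's itertools.permutations produces the same prefix-grouped order: its documentation says the tuples are emitted in lexicographic order of the input positions, so sorting the input first yields sorted output. A minimal sketch:

```python
import itertools
from pprint import pprint

chars = list('dog')
# Grouped by first element, in input order (same order as permutations2 above)
in_input_order = [list(p) for p in itertools.permutations(chars)]
pprint(in_input_order)

# Sorting the input first gives the permutations in sorted order
in_sorted_order = [list(p) for p in itertools.permutations(sorted(chars))]
pprint(in_sorted_order)
```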

Remove characters from a list with for loop [duplicate]

This question already has answers here:
How to remove items from a list while iterating?
I'm having trouble understanding how a for loop works in Python. I want to remove a character from a list by using a for loop to iterate through the list, but the output is not as expected.
In the following code I want to remove the character 'e':
lista = ['g', 'e', 'e', 'k', 'e','s', 'e', 'e']
for x in lista:
    if x == 'e':
        lista.remove(x)
print(lista)
It prints ['g', 'k', 's', 'e', 'e'] when I was expecting ['g', 'k', 's'].
Thank you.
You shouldn't remove items from a list while you iterate over it. When you remove an item, the list shrinks and every later item shifts one position to the left, but the loop still moves on to the next index. So when you encounter an 'e' and remove it, the item that slides into its place is never checked: you're effectively jumping over an item.
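To see the skipping in action, here is an instrumented copy of the question's loop that records every item the loop actually visits (the visited list exists only for this illustration):

```python
lista = ['g', 'e', 'e', 'k', 'e', 's', 'e', 'e']
visited = []  # illustration only: what the for loop actually sees
for x in lista:
    visited.append(x)
    if x == 'e':
        lista.remove(x)  # removes the first 'e', shifting later items left

print(visited)  # ['g', 'e', 'k', 'e', 'e'] ('s' is never visited)
print(lista)    # ['g', 'k', 's', 'e', 'e']
```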
To solve your problem, iterate over a copy of your list:
lista = ['g', 'e', 'e', 'k', 'e','s', 'e', 'e']
for x in lista.copy():
    if x == 'e':
        lista.remove(x)
print(lista)
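You can also delete by index, as long as you walk the indices backwards, so that deleting an item never shifts a position you have yet to visit:
lista = ['g', 'e', 'e', 'k', 'e','s', 'e', 'e']
for i in range(len(lista) - 1, -1, -1):
    element = lista[i]
    if element == 'e':
        del lista[i]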
The above approach will modify the original list.
A simpler way, which builds a new list instead, is as follows:
list(filter(('e').__ne__, lista))
Both methods produce
['g', 'k', 's']
Another solution is to keep calling remove until it raises ValueError, which happens once no 'e' is left:
while True:
    try:
        lista.remove("e")
    except ValueError:
        break
A simple list comprehension also works. The idea is not to update the list while iterating over it.
lista = ['g', 'e', 'e', 'k', 'e','s', 'e', 'e']
lista = [i for i in lista if i!='e']
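The comprehension rebinds the name lista to a new list. If other code holds a reference to the original list and should see the change, one option is slice assignment, which replaces the contents of the same list object. A short sketch (alias is a name introduced only to show this):

```python
lista = ['g', 'e', 'e', 'k', 'e', 's', 'e', 'e']
alias = lista  # a second reference to the same list object

# The comprehension builds the filtered list first, then slice assignment
# replaces the contents of the existing list in place, so alias sees it too
lista[:] = [x for x in lista if x != 'e']
print(alias)  # ['g', 'k', 's']
```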
I think the most Pythonic way is to use a list comprehension, as shown below:
lista = ['g', 'e', 'e', 'k', 'e','s', 'e', 'e']
lista = [x for x in lista if x != 'e']
print(lista)
The reason your method is not working is that you cannot safely remove items from a list while iterating over it, because each removal changes the index of every item after it.
When you remove an item from a list, the list is updated in place and shrinks.
Thus, in your case, each removal shifts the remaining items to the left, so the loop skips some items and finishes before all the 'e's have been removed.
What you need to do is to keep removing the element while it still exists:
lista = ['g', 'e', 'e', 'k', 'e', 's', 'e', 'e']
while 'e' in lista:
    lista.remove('e')
print(lista)
UPDATE:
As @mad_ pointed out, you can reduce the time complexity with a single pass (each in check and each remove call rescans the list):
lista = ['g', 'e', 'e', 'k', 'e', 's', 'e', 'e']
print([i for i in lista if i != 'e'])

How to split a list up

I have a list that I called lst, it is as follows:
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
What I want to know is how to split this up into four-letter strings: the first starts at the first letter (letters one to four), the next at the second letter (letters two to five), and so on. Then I want to add them to a new list to be compared with a main list.
Thanks
To get the first sublist, use lst[0:4]. Use Python's str.join method to merge it into a single string, and a for loop to get all the sublists:
sequences = []
sequence_size = 4
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
for i in range(len(lst) - sequence_size + 1):
    sequence = ''.join(lst[i : i + sequence_size])
    sequences.append(sequence)
print(sequences)
All 4-grams (without padding):
# window size:
ws = 4
lst2 = [
    ''.join(lst[i:i+ws])
    for i in range(0, len(lst))
    if len(lst[i:i+ws]) == ws
]
Non-overlapping 4-grams:
lst3 = [
    ''.join(lst[i:i+ws])
    for i in range(0, len(lst), ws)
    if len(lst[i:i+ws]) == ws
]
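With the sample lst above, lst2 holds the seven overlapping windows and lst3 holds ['ACTG', 'ACGC'] (the trailing 'AG' is dropped because it is shorter than ws). Another common idiom for overlapping windows, shown here as a sketch, zips together shifted copies of the list; zip stops at the shortest input, so only full windows come out:

```python
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
ws = 4  # window size

# zip(lst[0:], lst[1:], lst[2:], lst[3:]) lines up each position with the
# next ws - 1 items and stops when the shortest slice runs out
windows = [''.join(t) for t in zip(*(lst[i:] for i in range(ws)))]
print(windows)  # ['ACTG', 'CTGA', 'TGAC', 'GACG', 'ACGC', 'CGCA', 'GCAG']
```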
I think the other answers solve your problem, but if you are looking for a Pythonic way to do this, use a list comprehension. It is often recommended for concise code, although it can sometimes hurt readability. It is also quite a bit shorter.
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
result = [''.join(lst[i:i+4]) for i in range(len(lst)-3)]
print(result)
Use:
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
i=0
New_list=[]
while i<(len(lst)-3):
    New_list.append(lst[i]+lst[i+1]+lst[i+2]+lst[i+3])
    i+=1
print(New_list)
Output:
['ACTG', 'CTGA', 'TGAC', 'GACG', 'ACGC', 'CGCA', 'GCAG']

Search for a char in list of lists

I have a list of lists, and I want to return those sublists that have a specific char.
If the list is:
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
I want to retrieve ['g', 'j'] (or its position) if I search for 'j' or 'g'.
Try this:
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
def search(search_char):
    result = [x for x in lst if search_char in x]
    return result
print(search('g'))
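Since the question also asks for the position, a small variant can return the index together with the sublist. The function name find_first and its None-on-miss behaviour are choices made for this sketch:

```python
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]

def find_first(search_char, lists):
    # next() takes the first matching (index, sublist) pair from the
    # generator, or returns the default None if nothing matches
    return next(((i, sub) for i, sub in enumerate(lists) if search_char in sub), None)

print(find_first('j', lst))  # (1, ['g', 'j'])
print(find_first('q', lst))  # None
```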
For a start, there is a problem with your variable name: list is not a keyword but the name of a built-in type, so using it as a variable shadows it. Try my_list instead.
This works for returning the list you want:
# Try this
my_list = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
def check_for_letter(a_list, search):
    # return the first sublist that contains the search character
    for sublist in a_list:
        if search in sublist:
            return sublist
Session below:
>>> check_for_letter(my_list,"j")
['g', 'j']
>>>
This is one way. It also works when the character occurs in several sublists, yielding every matching index.
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
def searcher(lst, x):
    for i in range(len(lst)):
        if x in lst[i]:
            yield i
list(searcher(lst, 'g')) # [1]
list(map(lst.__getitem__, searcher(lst, 'g'))) # [['g', 'j']]
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
spec_char = input("What character do you want to find?: ")  # ask for a character to find
def find_char(spec_char):
    for list_count, each_list in enumerate(lst):  # iterates through each list in lst
        if spec_char in each_list:  # checks if specified character is in each_list
            return spec_char, lst[list_count]  # returns both the character and the list that contains it
def main():  # good habit for organisation
    print(find_char(spec_char))  # print the returned value of find_char
if __name__ == '__main__':
    main()
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
def search(spec_char):
    for subList in lst:
        if spec_char in subList:
            return subList
    return False
print(search('g'))
['g', 'j']
y = [['a','b'],['c','d'],['e','f'],['f']]
result = [x for x in y if 'f' in x]
Here I used 'f' as the character to search for; result is [['e', 'f'], ['f']].
Alternatively, we can also use the lambda and the filter functions.
The basics of lambda and filter can be found in the Python documentation.
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
ch = input('Enter the character: ')  # or directly type the character in ''
# lambda <parameter to the function>: <value to be returned>
# filter(<lambda function to check for the condition>, <sequence to iterate over>)
r = list(filter(lambda sub: ch in sub, lst))
print(r)
Note: To see the value returned by the lambda and the filter functions, I am storing the result in a list and printing the final output.
Below is a solution with an explanation.
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]  # this is your list
character = input("Enter the character: ")  # the character you are searching for
for a in lst:  # a is each sublist in lst
    for b in a:  # b is each item in the sublist a
        if b == character:  # check whether this item is the character you want
            print(a)  # print the sublist that contains the character
            break  # stop after the first match so a sublist is printed only once
Here is what the output looks like:
C:\Users\User\Desktop\python>coc.py
Enter the character: a
['a', 'e']
C:\Users\User\Desktop\python>coc.py
Enter the character: w
['m', 'n', 'w']

Sorting a list using an alphabet string

I'm trying to sort a list containing only lowercase letters by using the string
alphabet = "abcdefghijklmnopqrstuvwxyz"
without using sort, and with only O(n) complexity.
Here is what I have so far:
def sort_char_list(lst):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    new_list = []
    length = len(lst)
    for i in range(length):
        new_list.insert(alphabet.index(lst[i]),lst[i])
        print (new_list)
    return new_list
For this input:
m = list("emabrgtjh")
I get this:
['e']
['e', 'm']
['a', 'e', 'm']
['a', 'b', 'e', 'm']
['a', 'b', 'e', 'm', 'r']
['a', 'b', 'e', 'm', 'r', 'g']
['a', 'b', 'e', 'm', 'r', 'g', 't']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'j']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'h', 'j']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'h', 'j']
It looks like something goes wrong along the way, and I can't see why. If anyone can enlighten me, that would be great.
You are looking for a bucket sort. Here:
def sort_char_list(lst):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Here, create the 26 buckets
    new_list = [''] * len(alphabet)
    for letter in lst:
        # This is the bucket index
        # You could use `ord(letter) - ord('a')` in this specific case, but it is not mandatory
        index = alphabet.index(letter)
        new_list[index] += letter
    # Assemble the buckets; list(...) turns the joined string back into a list of letters
    return list(''.join(new_list))
As for complexity, since alphabet is a predefined, fixed-size string, searching for a letter in it requires at most 26 operations, which counts as O(1). The overall complexity is therefore O(n).
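As for why the original attempt goes wrong: list.insert(i, x) inserts before position i of the list as it currently is, and when i is past the end it simply appends. The alphabet index of a letter is therefore only a meaningful position if the list already has one slot per letter. Replaying the first few inserts from the question shows where the order breaks: 'g' has alphabet index 6, but the list has only five items at that point, so 'g' lands at the end after 'r'.

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"
new_list = []
for ch in "emabrg":
    # alphabet.index(ch) is often larger than len(new_list), so insert() just appends
    new_list.insert(alphabet.index(ch), ch)
print(new_list)  # ['a', 'b', 'e', 'm', 'r', 'g']
```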
