The last list elements and conditional loops

mylist="'a','b','c'"
count=0
i=0
while count< len(mylist):
    if mylist[i]==mylist[i+1]:
        print mylist[i]
    count +=1
    i +=1
Error:
File "<string>", line 6, in <module>
IndexError: string index out of range
I'm assuming that when it gets to the last (nth) element it can't find an (n+1)th element to compare it to, so it gives me an error.
Interestingly, I think I've done this before on a larger list without hitting this problem. Here is an example (with credit to Raymond Hettinger for fixing it up):
list=['a','a','x','c','e','e','f','f','f']
i=0
count = 0
while count < len(list)-2:
    if list[i] == list[i+1]:
        if list [i+1] != list [i+2]:
            print list[i]
            i+=1
            count +=1
        else:
            print "no"
            count += 1
    else:
        i +=1
        count += 1
For crawling through a list in the way I've attempted, is there any fix so that I don't go out of range? I plan to implement this on a very large list, where I'll have to check if "list[i]==list[i+16]", for example. In the future, I would like to add conditions like "if int(mylist[i+3])-int(mylist[i+7])>10: newerlist.append(mylist[i])". So it's important that I solve this problem.
I thought about inserting a break statement, but was unsuccessful.
I know this is not the most efficient approach, but I'm at the point where it's what I understand best.

It sounds like you are trying to compare elements in your list at various fixed offsets. Perhaps something like this could help:
for old, new in zip(lst, lst[n:]):
    if some_cond(old, new):
        do_work()
Explanation:
lst[n:] returns a copy of lst, starting from the element at index n (mind the 0-indexing):
>>> lst = [1,2,2,3]
>>> lst[1:]
[2, 2, 3]
zip(l1, l2) pairs up the elements of the two lists as tuples (it returns a list in Python 2 and an iterator in Python 3, hence the list() call here):
>>> list(zip(lst, lst[1:]))
[(1, 2), (2, 2), (2, 3)]
Note that it stops as soon as either list runs out. In this case, the offset list runs out first.
When iterating over tuples, you can unpack them directly into the loop variables, so
for old, new in zip(lst, lst[1:]):
loops through the elements you want (pairs of successive elements in your list).
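For the offset comparison from the question (for example list[i] == list[i+16]), the zip approach could be sketched roughly as follows. The function name matches_at_offset and the sample data are made up for illustration, and the sketch assumes the input is a list or another sliceable sequence.

```python
def matches_at_offset(items, n):
    """Return (index, value) pairs where items[i] == items[i + n].

    zip() stops when the shorter (offset) sequence runs out,
    so items[i + n] is never looked up past the end of the list.
    """
    return [
        (i, old)
        for i, (old, new) in enumerate(zip(items, items[n:]))
        if old == new
    ]


data = ['a', 'a', 'x', 'c', 'e', 'e', 'f', 'f', 'f']
print(matches_at_offset(data, 1))   # adjacent duplicates
print(matches_at_offset(data, 16))  # offset longer than the list: no pairs, no error
```

Because the slice is simply empty when n exceeds the list length, the second call returns an empty list instead of raising IndexError.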

As a general idea, if you are trying to look ahead a certain number of places, you can do a few things:
In the loop check (i.e. count < length), check against the largest offset you use. In your example you wanted to look 16 places ahead, so you would need to check count < (length - 16). The downside is that the last 16 elements won't be iterated over.
Check inside the loop that the index is valid. That is, start each if statement with: if i + 16 < length and logic_you_want_to_check. This lets you continue through the whole loop; when the index would be out of bounds, the condition is simply false instead of raising an error.
Note: this probably isn't what you want, but I'll add it for completeness. Wrap around your logic. This only works if wrapping around makes sense for your data. If you literally want to check the 16th index ahead of your current index (like a place in a line, perhaps), then wrapping around doesn't suit well. But if you don't need that, and want to model your values in a circular pattern, you can take your index modulo the length. That is: if array[i] == array[(i + 16) % len(array)] would check either 16 ahead or wrap around to the front of the array.
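The three options above can be put side by side in a small sketch. The function names and the sample list are hypothetical, chosen only to make the difference visible; each function returns the indices i where the element equals the one offset places ahead.

```python
def shrink_bound(items, offset):
    """Option 1: stop the loop early so i + offset always exists."""
    hits = []
    for i in range(len(items) - offset):
        if items[i] == items[i + offset]:
            hits.append(i)
    return hits


def guard_inside(items, offset):
    """Option 2: visit every index, but test the bound before indexing."""
    hits = []
    for i in range(len(items)):
        if i + offset < len(items) and items[i] == items[i + offset]:
            hits.append(i)
    return hits


def wrap_around(items, offset):
    """Option 3: treat the list as circular using the modulus operator."""
    length = len(items)
    return [i for i in range(length) if items[i] == items[(i + offset) % length]]


data = ['a', 'b', 'c', 'a', 'b']
print(shrink_bound(data, 2))  # no match inside the list
print(guard_inside(data, 2))  # same result, but the loop visited every index
print(wrap_around(data, 2))   # indices 3 and 4 match after wrapping to the front
```

Option 2 is the one to pick if other, non-look-ahead conditions still need to run for the last elements.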

Edit:
Right, with the new information in the OP, this becomes much simpler. Use the itertools grouper() recipe to group the data for each person into tuples:
import itertools
def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks or blocks"""
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)
data = ['John', 'Sally', '5', '10', '11', '4', 'John', 'Sally', '3', '7', '7', '10', 'Bill', 'Hallie', '4', '6', '2', '1']
grouper(data, 6)
Now your data looks like:
[
('John', 'Sally', '5', '10', '11', '4'),
('John', 'Sally', '3', '7', '7', '10'),
('Bill', 'Hallie', '4', '6', '2', '1')
]
Which should be easy to work with, by comparison.
Old Answer:
If you need to make more arbitrary links, rather than just checking continuous values:
def offset_iter(iterable, n):
    offset = iter(iterable)
    consume(offset, n)
    return offset
data = ['a', 'a', 'x', 'c', 'e', 'e', 'f', 'f', 'f']
offset_3 = offset_iter(data, 3)
for item, plus_3 in zip(data, offset_3):  # Naturally, itertools.izip() in 2.x
    print(item, plus_3)                   # if memory usage is important.
Naturally, you would want to use semantically valid names. The advantage to this method is it works with arbitrary iterables, not just lists, and is efficient and readable, without any ugly, inefficient iteration by index. If you need to continue checking once the offset values have run out (for other conditions, say) then use itertools.zip_longest() (itertools.izip_longest() in 2.x).
Using the consume() recipe from itertools.
import itertools
import collections
def consume(iterator, n):
    """Advance the iterator n-steps ahead. If n is None, consume entirely."""
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(itertools.islice(iterator, n, n), None)
I would, however, greatly question if you need to re-examine your data structure in this case.
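A more compact way to get a similar offset iterator, assuming Python 3, is itertools.islice(); pairing it with itertools.zip_longest() keeps the loop running after the offset values are exhausted. This is only a sketch of that combination, not the code from the answer, and the variable names are illustrative.

```python
from itertools import islice, zip_longest

data = ['a', 'a', 'x', 'c', 'e', 'e', 'f', 'f', 'f']

# islice(data, 3, None) yields data[3], data[4], ... without copying the list.
pairs = list(zip_longest(data, islice(data, 3, None)))

for item, plus_3 in pairs:
    # plus_3 is None (the default fillvalue) once the offset iterator has run out.
    print(item, plus_3)
```

With plain zip() the loop would stop after six pairs; with zip_longest() the last three items are still visited, paired with None.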
Original Answer:
I'm not sure what your aim is, but from what I gather you probably want itertools.groupby():
>>> import itertools
>>> data = ['a', 'a', 'x', 'c', 'e', 'e', 'f', 'f', 'f']
>>> grouped = itertools.groupby(data)
>>> [(key, len(list(items))) for key, items in grouped]
[('a', 2), ('x', 1), ('c', 1), ('e', 2), ('f', 3)]
You can use this to work out when there are (arbitrarily large) runs of repeated items. It's worth noting you can provide itertools.groupby() with a key argument that will group them based on any factor you want, not just equality.
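As a rough illustration of both points, the sketch below first picks out the values that occur in a run of two or more, then groups by a key function instead of by equality. The mixed list is invented sample data, and str.isdigit is just one possible key.

```python
import itertools

data = ['a', 'a', 'x', 'c', 'e', 'e', 'f', 'f', 'f']

# Keep only the values that occur in a run of two or more.
repeated = [key for key, items in itertools.groupby(data) if len(list(items)) > 1]
print(repeated)

# With a key function, consecutive items are grouped by the key's result
# rather than by equality; here, by whether the string is numeric.
mixed = ['John', 'Sally', '5', '10', 'Bill', '4']
runs = [(is_digit, list(items))
        for is_digit, items in itertools.groupby(mixed, key=str.isdigit)]
print(runs)
```

Note that groupby() only groups consecutive items, so unsorted data can produce the same key more than once.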

If you adhere to "Practicality beats purity":
for idx, element in enumerate(yourlist[n:]):
    if yourlist[idx] == element:  # element is yourlist[idx + n]
        ...
If you don't care about memory efficiency, go for second's answer. If you want the purest answer, go for Lattyware's.

Related

How to wrap a string or an array around and slice the wrapped string or array in Python?

Before anything: I did read Wrapping around a python list as a slice operation and wrapping around slices in Python / numpy
This question is not a duplicate of any of those two questions simply because this question is a totally different question. So stop downvoting it and do not mark it as a duplicate. In the first mentioned thread, the "wrap" there means something different. For the second mentioned thread, they dealt with ndarray and can only work for integers only.
Real question:
How to slice a string or an array from a point to another point with an end between them?
Essentially, we want to do something like this,
n = whatever we want
print(string[n-5:n+6])
The above code may look normal, but it doesn't work near the edges (near the beginning or the end of the string/array), because Python's slicing doesn't allow slicing through the end of the array and continuing from the beginning. What if n is smaller than 5, or the string is shorter than n+6?
Here's a better example, consider that we have
array = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
For every element in the array, we want to print the element together with its two nearest neighbors as one string:
print("Two neighbors:")
for i, x in enumerate(array):
    print(array[i-1] + array[i] + array[(i+1)%len(array)])
Output:
Two neighbors:
kab
abc
bcd
cde
def
efg
fgh
ghi
hij
ijk
jka
So far so good, let's do it with four neighbors.
print("Four neighbors:")
for i, x in enumerate(array):
    print(array[i-2] + array[i-1] + array[i] + array[(i+1)%len(array)] + array[(i+2)%len(array)])
Output:
Four neighbors:
jkabc
kabcd
abcde
bcdef
cdefg
defgh
efghi
fghij
ghijk
hijka
ijkab
You can see where this is going: as the desired number of neighbors grows, so does the number of terms we must type out one by one.
Is there a way instead of s[n-3]+s[n-2]+s[n-1]+s[n]+s[n+1]+s[n+2]+s[n+3], we can do something like s[n-3:n+4]?
Note that s[n-3:n]+s[n:(n+4)%len(s)] doesn't work at the edges.
NOTE:
For the particular example above, it is possible to use 3*array, or to add a number of elements to the front and to the back, to essentially "pad" it.
However, this type of answer costs a bit of memory AND cannot work when we want to wrap around many times.
Consider the following,
# len(string) = 10
# n = 0 or any number we want
print(string[n-499:n+999])
If the start and end indices can be flexible instead of mirroring each other(eg. string[n-2:n+9] instead of string[n-3:n+4]), it is even better.

A solution which doesn't use an excessive amount of memory is as follows
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
def get_sequences(a_list, sequence_length):
    sequences = []
    for i in range(len(a_list)):
        sequences.append("".join(str(a_list[(x + i) % len(a_list)]) for x in range(sequence_length)))
    return sequences
print(get_sequences(my_list, 2))
print(get_sequences(my_list, 3))
will output
['12', '23', '34', '45', '56', '67', '78', '89', '91']
['123', '234', '345', '456', '567', '678', '789', '891', '912']
This is nice because it utilizes a generator everywhere that it can.

This could give ideas. The only thing to check is the order in your interval. Works with any n.
array = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
def print_neighbors(n_neighbors):
    for idx in range(len(array)):
        start = (idx- n_neighbors//2) % len(array)
        end = (idx+n_neighbors//2) % len(array) + 1
        if start > end:
            print(''.join(array[start:] + array[:end]))
        else:
            print(''.join(array[start:end]))
>>> print_neighbors(6)
ijkabcd
jkabcde
kabcdef
abcdefg
bcdefgh
cdefghi
defghij
efghijk
fghijka
ghijkab
hijkabc

You could create a class to wrap your original iterable like this:
class WrappingIterable():
    def __init__(self, orig):
        self.orig=orig
    def __getitem__(self, index):
        return self.orig[index%len(self.orig)]
    def __len__(self):
        return len(self.orig)
>>> w = WrappingIterable("qwerty")
>>> for i in range(-2, 8):
...     print(w[i])
t
y
q
w
e
r
t
y
q
w
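The same modulo indexing can also cover the slice-like case from the question (something like s[n-3:n+4], or even string[n-499:n+999]) without padding the sequence. The helper below is a hypothetical sketch, not part of the class above; wrapped_slice is an invented name, and it assumes the sequence supports len() and integer indexing.

```python
def wrapped_slice(seq, start, stop):
    """Return seq[start:stop] as if seq repeated forever in both directions."""
    if not seq:
        return seq[:0]  # nothing to wrap around
    length = len(seq)
    picked = [seq[i % length] for i in range(start, stop)]
    # Give strings back as strings; everything else comes back as a list.
    return ''.join(picked) if isinstance(seq, str) else picked


array = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
print(''.join(wrapped_slice(array, -2, 3)))         # window around index 0
print(wrapped_slice("qwerty", 4, 9))                # runs off the end and wraps
print(len(wrapped_slice("0123456789", -499, 999)))  # wraps many times
```

The start and stop values do not have to be symmetric around n, and only the requested window is built in memory.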

For this particular issue you can use a snippet like this:
def print_neighbors(l, n):
    wrapped = l[-(n//2):] + l + l[:(n//2)]
    for i in range(len(l)):
        print(''.join(wrapped[i:i+n+1]))
l = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
print_neighbors(l, 2)
print_neighbors(l, 4)
Hope it makes sense!

Comparing elements between two lists of lists in python

List A:
[('Harry', 'X', 'A'),
('James', 'Y', 'G'),
('John', 'Z', 'D')]
List B:
[('Helen', '2', '(A; B)', '3'),
('Victor', '9', '(C; D; E)', '4'),
('Alan', '10', '(A)', '57'),
('Paul', '11', '(F; B)', '43'),
('Sandra', '12', '(F)', '31')]
Basically I have to take the third element of each tuple in list A (for x in listA -> x[2]) and check whether any tuple in list B contains the same element (for y in listB, x[2] == y[2]), but I'm just losing my mind with this.
My idea was to get the third element from each list in list B, put them into a new list, and then remove that ";" so I could access each element way more easily.
for x in listB:
    j = x[2]
    j = j.strip().split(', ')
    for k in j:
        FinalB.append(k)
FinalB = [(k[1:-1].split(";")) for k in FinalB]
Then I'd take the third element from each list of list A and compare them with the elements inside each list of FinalB: if there was a match, I'd get the index of the element in FinalB (the one that's matched), use that index to access his list in listB and get the first element of his list inside list B (basically, I have to know the names from the users inside each list that have the same 3rd element)
My code so far:
FinalB= []
DomainsList = []
for x in listA:
    j = x[2]
    j = j.strip().split(', ')
    for k in j:
        FinalB.append(k)
FinalB = [(k[1:-1].split(";")) for k in FinalB]
for y in listA:
    for z in FinalB:
        for k in z:
            if y[2] == k:
                m = FinalB.index(z)
                DomainsList.append([listA[m][0],listB[m][0]])
return DomainsList
Yes, this is not working (no error, I probably just did this in an absolute wrong way) and I can't figure out what and where I'm doing wrong.

First, I think a better way to handle '(C; D; E)' is to change it to 'CDE', so the first loop becomes:
FinalB = [''.join(filter(str.isalpha, x[2])) for x in listB]
We take each string and keep only the alphabetic characters (the ''.join() is needed in Python 3, where filter() returns an iterator rather than a string), so we end up with:
In [18]: FinalB
Out[18]: ['AB', 'CDE', 'A', 'FB', 'F']
This means we can use listA[x][2] in FinalB[y] to test if we have a match:
for y in listA:
    for z in FinalB:
        if y[2] in z:
            DomainsList.append([y[0], listB[FinalB.index(z)][0]])
I had to tweak the arguments to append() to pick the right elements, so we end up with:
In [17]: DomainsList
Out[17]: [['Harry', 'Helen'], ['Harry', 'Alan'], ['John', 'Victor']]
Usefully, if instead of '(C; D; E)' you have '(foo; bar; baz)', then with just one tweak the code can work for that too:
import re
FinalB = [list(filter(None, re.split(r"[; \(\)]+", x[2]))) for x in listB]
The remaining code works as before.

It always helps to start a question with context and details.
The Python version could also come into play.
The data structure you have given us to work with is very questionable, especially the third element in each of the tuples in listB. Why have a string element and then define it like this: '(C; D; E)'?
Even though I don't understand where you are coming from with this or what it is meant to achieve (no context is provided in the post), this code should get you there.
It will give you a list of tuples (listC), each with two elements: the first is the name from listA and the second is the name from listB, for every pair that matches as described in the post.
NOTE: at the moment the match is simply done with find(), which works perfectly with the provided data. However, you may need to change this if your data could cause false positives or if you want to ignore case.
listA = [('Harry', 'X', 'A'), ('James', 'Y', 'G'), ('John', 'Z', 'D')]
listB = [('Helen', '2', '(A; B)', '3'),
         ('Victor', '9', '(C; D; E)', '4'),
         ('Alan', '10', '(A)', '57'),
         ('Paul', '11', '(F; B)', '43'),
         ('Sandra', '12', '(F)', '31')]
listC = []
for a in listA:
    for b in listB:
        if b[2].find(a[2]) != -1:
            listC.append((a[0], b[0]))
print(listC)
This gives you.
[('Harry', 'Helen'), ('Harry', 'Alan'), ('John', 'Victor')]
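If false positives from substring matching are a concern (for example a code 'A' matching inside a longer code such as 'AB'), one possible variation is to parse the third field into a set and test exact membership. This is a sketch under the assumption that the field always looks like '(X; Y; Z)'; parse_codes is a hypothetical helper name.

```python
listA = [('Harry', 'X', 'A'), ('James', 'Y', 'G'), ('John', 'Z', 'D')]
listB = [('Helen', '2', '(A; B)', '3'),
         ('Victor', '9', '(C; D; E)', '4'),
         ('Alan', '10', '(A)', '57'),
         ('Paul', '11', '(F; B)', '43'),
         ('Sandra', '12', '(F)', '31')]


def parse_codes(field):
    """Turn a string like '(C; D; E)' into the set {'C', 'D', 'E'}."""
    inner = field.strip().strip('()')
    return {code.strip() for code in inner.split(';') if code.strip()}


# Parse each row of listB once, then test membership exactly.
parsed_b = [(row[0], parse_codes(row[2])) for row in listB]
matches = [(a[0], name_b)
           for a in listA
           for name_b, codes in parsed_b
           if a[2] in codes]
print(matches)
```

Parsing listB up front also avoids re-scanning each string for every row of listA.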

Convert 5A2B4C11G string to [(5,"A"),(2,"B"),(4,"C"),(11,"G")] in Python

The title pretty much says it all. I have a small run-length decoding script:
def RLdecode(characterList):
    decodedString = ""
    for count, character in characterList:
        decodedString += character.upper() * count
    return decodedString
That script requires a list (or whatever this is) that looks like:
[(5,"A"),(2,"B"),(4,"C"),(11,"G")]
But in order to make it more user-friendly, I want the user to be able to input a string like this:
"5A2B4C11G"
How would I convert a string like the one above into a list readable by my script? Also, sorry that the title of the question is very specific, but I don't know what the process is called :\

Using itertools.groupby:
There's a nice way to do the letter/digit grouping using itertools.groupby:
import itertools
a="5A2B4C11G"
result = ["".join(v) for k,v in itertools.groupby(a,str.isdigit)]
That returns ['5', 'A', '2', 'B', '4', 'C', '11', 'G']
Unfortunately, this flattens the number/letter pairs, so more work is required. Note that applying Kaushik's solution to that list gives the expected result, now that the digits are grouped properly:
[(int(result[i]),result[i+1]) for i in range(0,len(result),2)]
Result:
[(5, 'A'), (2, 'B'), (4, 'C'), (11, 'G')]
Using regexes:
In this case, regular expressions are well suited to extracting the patterns with the required hierarchy.
Just match one or more digits followed by a letter, and convert the resulting tuples to the (integer, string) format with a list comprehension, all in one line:
import re
a="5A2B4C11G"
result = [(int(i),v) for i,v in re.findall(r'(\d+)([A-Z])',a)]
print(result)
gives:
[(5, 'A'), (2, 'B'), (4, 'C'), (11, 'G')]
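Putting the regex parsing together with a decoder in the spirit of the one from the question might look like the sketch below. The names parse_rle and rl_decode are illustrative, and the pattern assumes each count is followed by exactly one ASCII letter.

```python
import re


def parse_rle(text):
    """'5A2B4C11G' -> [(5, 'A'), (2, 'B'), (4, 'C'), (11, 'G')]"""
    return [(int(count), char)
            for count, char in re.findall(r'(\d+)([A-Za-z])', text)]


def rl_decode(pairs):
    """Expand (count, character) pairs into the decoded string."""
    return ''.join(char.upper() * count for count, char in pairs)


pairs = parse_rle("5A2B4C11G")
print(pairs)
print(rl_decode(pairs))
```

Using ''.join() over a generator avoids building the result through repeated string concatenation.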

Using list comprehension :
#s is the string
[(int(s[i]),s[i+1]) for i in range(0,len(s),2)]
#driver values
IN : s="5A2B4C"
OUT : [(5, 'A'), (2, 'B'), (4, 'C')]
Here range(0,len(s),2) gives the values [0, 2, 4], which we use to step through the string.
NOTE : this of course only works with strings of even length and with numbers below 10.
EDIT : As for numbers with two or more digits, the answer by Jean-François Fabre works well.

You can do this with regex if you want:
In one line
sorted_list=[i for i in re.findall(pattern, a, re.M)]
Same approach :
import re
a="5A2B4C"
pattern=r'(\d)(\w)'
list=[]
art=re.findall(pattern,a,re.M)
for i in art:
    list.append(i)
print(list)
For your new edited problem here is my new solution :
import re
a = "5A2B4C11G"
pattern = r'([0-9]+)([a-zA-Z])'
list = []
art = re.findall(pattern, a, re.M)
for i in art:
    list.append(i)
print(list)
Output:
[('5', 'A'), ('2', 'B'), ('4', 'C'), ('11', 'G')]

You have already got the answer from Jean-François Fabre.
The process is called run-length decoding.
The whole process can be done in one line with the following code.
from re import sub
text = "5A2B4C11G"
sub(r'(\d+)(\D)', lambda m: m.group(2) * int(m.group(1)),text)
OUTPUT : 'AAAAABBCCCCGGGGGGGGGGG'
NOTE: This is not the answer, just an optimization idea for the OP, as the answer is already given by Jean-François Fabre.

import re
str = "5A2B4C11G"
pattern = r"(\d+)(\D)" # group1: digit(s), group2: non-digit
substitution = r"\1,\2 " # "digits,nondigit "
temp = re.sub(pattern, substitution, str) # gives "5,A 2,B 4,C 11,G "
temp = temp.split() # gives ['5,A', '2,B', '4,C', '11,G']
result = [el.split(",") for el in temp] # gives [['5', 'A'], ['2', 'B'],
# ['4', 'C'], ['11', 'G']] - see note
First we replace each sequence of digits followed by a symbol with something we can split on two levels, choosing two different delimiters in the replacement string r"\1,\2 ":
a space for the 1st-level (outer) split(), and
a comma for the 2nd-level (inner) one.
Then we apply those 2 splits.
Note: If you have a significant reason to obtain tuples (instead of good enough inner lists), simply apply the tuple() function in the last statement:
result = [tuple(el.split(",")) for el in temp]

Grouping every three items together in list - Python [duplicate]

This question already has answers here:
How can you split a list every x elements and add those x amount of elements to an new list?
(4 answers)
Closed 5 years ago.
I have a list consisting of a repeating pattern, i.e.
list=['a','1','first','b','2','second','c','3','third','4','d','fourth']
I am not sure how long this list will be (it could be fairly long), but I want to create lists from the repeating pattern, along with generated names, i.e.
list_1=['a','1','first']
list_2=['b','2','second']
list_3=['c','3','third']
..... etc
What is the best, basic code (not requiring import of modules) that I can use to achieve this?

You can get the chunks using zip():
>>> lst = ['a','1','first','b','2','second','c','3','third','4','d','fourth']
>>> list(zip(*[iter(lst)]*3))
[('a', '1', 'first'), ('b', '2', 'second'), ('c', '3', 'third'), ('4', 'd', 'fourth')]
Using zip() avoids creating intermediate lists, which could be important if you have long lists.
zip(*[iter(lst)]*3) could be rewritten:
i = iter(lst)  # Create an iterator from the list
zip(i, i, i)   # zip the same iterator 3 times, giving chunks of 3 from the original list
But the former, while a little more cryptic, is more general.
If you need names for these lists then I would suggest using a dictionary:
>>> d = {'list_{}'.format(i): e for i, e in enumerate(zip(*[iter(lst)]*3), 1)}
>>> d
{'list_1': ('a', '1', 'first'),
'list_2': ('b', '2', 'second'),
'list_3': ('c', '3', 'third'),
'list_4': ('4', 'd', 'fourth')}
>>> d['list_2']
('b', '2', 'second')

Try this:
chunks = [data[x:x+3] for x in range(0, len(data), 3)]
It will make sublists of 3 items each (data is your list).
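The two approaches behave differently when the list length is not a multiple of the chunk size, which matters if you are not sure how long the list will be. The comparison below is a minimal sketch; the function names and the seven-item list are made up for illustration.

```python
def chunks_by_slicing(items, size):
    """Slice-based chunks; a short final chunk is kept."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def chunks_by_zip(items, size):
    """zip-based chunks; leftover items that do not fill a chunk are dropped."""
    iterator = iter(items)
    return list(zip(*[iterator] * size))


lst = ['a', '1', 'first', 'b', '2', 'second', 'c']
print(chunks_by_slicing(lst, 3))
print(chunks_by_zip(lst, 3))
```

Slicing returns lists and keeps the trailing ['c'], while the zip idiom returns tuples and silently discards it. Neither needs an import.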

Python string pattern recognition/compression

I can do basic regex alright, but this is slightly different, namely I don't know what the pattern is going to be.
For example, I have a list of similar strings:
lst = ['asometxt0moretxt', 'bsometxt1moretxt', 'aasometxt10moretxt', 'zzsometxt999moretxt']
In this case the common pattern is two segments of common text: 'sometxt' and 'moretxt', starting and separated by something else that is variable in length.
The common string and variable string can of course occur at any order and at any number of occasions.
What would be a good way to condense/compress the list of strings into their common parts and individual variations?
An example output might be:
c = ['sometxt', 'moretxt']
v = [('a','0'), ('b','1'), ('aa','10'), ('zz','999')]

This solution finds the two longest common substrings and uses them to delimit the input strings:
def an_answer_to_stackoverflow_question_1914394(lst):
    """
    >>> lst = ['asometxt0moretxt', 'bsometxt1moretxt', 'aasometxt10moretxt', 'zzsometxt999moretxt']
    >>> an_answer_to_stackoverflow_question_1914394(lst)
    (['sometxt', 'moretxt'], [('a', '0'), ('b', '1'), ('aa', '10'), ('zz', '999')])
    """
    delimiters = find_delimiters(lst)
    return delimiters, list(split_strings(lst, delimiters))
find_delimiters and friends finds the delimiters:
import itertools
def find_delimiters(lst):
    """
    >>> lst = ['asometxt0moretxt', 'bsometxt1moretxt', 'aasometxt10moretxt', 'zzsometxt999moretxt']
    >>> find_delimiters(lst)
    ['sometxt', 'moretxt']
    """
    candidates = list(itertools.islice(find_longest_common_substrings(lst), 3))
    if len(candidates) == 3 and len(candidates[1]) == len(candidates[2]):
        raise ValueError("Unable to find useful delimiters")
    if candidates[1] in candidates[0]:
        raise ValueError("Unable to find useful delimiters")
    return candidates[0:2]
def find_longest_common_substrings(lst):
    """
    >>> lst = ['asometxt0moretxt', 'bsometxt1moretxt', 'aasometxt10moretxt', 'zzsometxt999moretxt']
    >>> list(itertools.islice(find_longest_common_substrings(lst), 3))
    ['sometxt', 'moretxt', 'sometx']
    """
    # Try the longest possible length first, then work downwards.
    for i in range(min_length(lst), 0, -1):
        for substring in common_substrings(lst, i):
            yield substring
def min_length(lst):
    return min(len(item) for item in lst)
def common_substrings(lst, length):
    """
    >>> list(common_substrings(["hello", "world"], 2))
    []
    >>> list(common_substrings(["aabbcc", "dbbrra"], 2))
    ['bb']
    """
    assert length <= min_length(lst)
    returned = set()
    for i, item in enumerate(lst):
        for substring in all_substrings(item, length):
            in_all_others = True
            for j, other_item in enumerate(lst):
                if j == i:
                    continue
                if substring not in other_item:
                    in_all_others = False
            if in_all_others:
                if substring not in returned:
                    returned.add(substring)
                    yield substring
def all_substrings(item, length):
    """
    >>> list(all_substrings("hello", 2))
    ['he', 'el', 'll', 'lo']
    """
    for i in range(len(item) - length + 1):
        yield item[i:i+length]
split_strings splits the strings using the delimiters:
import re
def split_strings(lst, delimiters):
    """
    >>> lst = ['asometxt0moretxt', 'bsometxt1moretxt', 'aasometxt10moretxt', 'zzsometxt999moretxt']
    >>> list(split_strings(lst, find_delimiters(lst)))
    [('a', '0'), ('b', '1'), ('aa', '10'), ('zz', '999')]
    """
    # Escape the delimiters so regex metacharacters in them are matched literally.
    pattern = "|".join(re.escape(delimiter) for delimiter in delimiters)
    for item in lst:
        parts = re.split(pattern, item)
        yield tuple(part for part in parts if part != '')

Here is a scary one to get the ball rolling.
>>> import re
>>> makere = lambda n: ''.join(['(.*?)(.+)(.*?)(.+)(.*?)'] + ['(.*)(\\2)(.*)(\\4)(.*)'] * (n - 1))
>>> inp = ['asometxt0moretxt', 'bsometxt1moretxt', 'aasometxt10moretxt', 'zzsometxt999moretxt']
>>> re.match(makere(len(inp)), ''.join(inp)).groups()
('a', 'sometxt', '0', 'moretxt', '', 'b', 'sometxt', '1', 'moretxt', 'aa', '', 'sometxt', '10', 'moretxt', 'zz', '', 'sometxt', '999', 'moretxt', '')
I hope its sheer ugliness will inspire better solutions :)

This seems to be an example of the longest common subsequence problem. One way could be to look at how diffs are generated. The Hunt-McIlroy algorithm seems to have been the first, and as such the simplest, especially since it is apparently non-heuristic.
The first link contains detailed discussion and (pseudo) code examples. Assuming, of course, I'm not completely off the track here.

This looks much like the LZW algorithm for data (text) compression. There should be Python implementations out there, which you may be able to adapt to your needs.
I assume you have no a priori knowledge of these substrings that repeat often.

I guess you should start by identifying substrings (patterns) that frequently occur in the strings. Since naively counting substrings in a set of strings is rather computationally expensive, you'll need to come up with something smart.
I've done substring counting on a large amount of data using generalized suffix trees (example here). Once you know the most frequent substrings/patterns in the data, you can take it from there.

How about subbing out the known text, and then splitting?
import re
[re.sub('(sometxt|moretxt)', ',', x).split(',') for x in lst]
# results in
[['a', '0', ''], ['b', '1', ''], ['aa', '10', ''], ['zz', '999', '']]
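A small variation, assuming the common text is already known, is to split on it directly with re.split() and drop the empty strings, which yields the tuples from the question's example output. The variable names known and variable_parts are illustrative only.

```python
import re

lst = ['asometxt0moretxt', 'bsometxt1moretxt', 'aasometxt10moretxt', 'zzsometxt999moretxt']
known = ['sometxt', 'moretxt']

# re.escape guards against delimiters that contain regex metacharacters.
pattern = '|'.join(re.escape(text) for text in known)

# Split each string on the known text and keep only the non-empty pieces.
variable_parts = [tuple(part for part in re.split(pattern, item) if part)
                  for item in lst]
print(variable_parts)
```

Splitting directly also avoids trouble when the variable text itself contains the comma used as a placeholder in the substitution approach.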
