I know that in python there is a in operator which can used to check whether any sub-string or char is present in a string or not. I want to do this by checking each string (of length substring). Is the code below is the only way or is there any other way that I can achieve this?
m = "college"
s = "col"
lm = len(m)
ls = len(s)
f = 0
for i in range(lm):
if (i+ls) <= lm:
if s == m[i:(i+ls)]:
global f
f = 1
break
if f:
print "present"
else:
print "not present"
What I am doing here is if my sub-string is col, my program checks the string of length sub-string with sub-string by moving from start to end of the main-string and returns true or not.
col
oll
lle
leg
ege
Your code is a legitimate way to quickly implement a general substring search, but not the only one. More efficient algorithms include Boyer-Moore string search, Knuth-Morris-Pratt search, or search implemented with a DFA.
This is a large topic and your question does not make it clear what kind of information you are actually after. In case of Python, it is of course most effective to simply use the in operator and the related methods str.find and str.index, all of which deploy a simplified Boyer-Moore.
You could try something like this:
In [1]: m = 'college'
In [2]: s = 'col'
In [3]: if any(m[i:i+len(s)] == s for i in range(len(m)-len(s)+1)):
...: print 'Present'
...: else:
...: print 'Not present'
...:
Present
Where the any checks every substring of m of length len(s) and sees if it equals s. If so, it returns True and stops further processing (this is called 'short-circuiting' and is pretty similar to the break you have above).
Here is what the any piece would look like if we replaced it with a list comprehension and took out the equality comparison:
In [4]: [m[i:i+len(s)] for i in range(len(m)-len(s)+1)]
Out[4]: ['col', 'oll', 'lle', 'leg', 'ege']
You don't need global there. Also, you can do
In [1]: %paste
m = "college"
s = "col"
In [2]: 'not ' * all(s != m[i:i+len(s)] for i in range(1+len(m)-len(s))) + 'present'
Out[2]: 'present'
But actually you should of course just do s in m,
This kind of problem calls for a functional solution:
def strcomp(s, subs):
if len(s) < len(subs):
return False
elif s[0:len(subs)] == subs:
return True
else:
return strcomp(s[1:], subs)
You call recursively the strcomp function, each time with the "long" string - s losing its head until you either find subs in the first position or s becomes shorter than subs.
Related
Given 2 strings, each containing a DNA sequence, the function returns a bool to show if a contiguous sub-fragment of length 5 or above exists in string1 that can pair w/ a fragment of str2.
Here is what I tried using the functions "complement" and "reverese_complement" that I created but it doesn't give the right result:
def complement5(sequence1,sequence2):
for i in range(len(sequence1) - 4):
substr1 = sequence1[i:i+5]
if complement(substr1) in sequence2 or reverese_complement(substr1) in sequence2:
print(f'{substr1, complement(substr1), reverese_complement(substr1)} in {sequence2}')
return True
else:
return False
Then when I try:
complement5('CAATTCC','CTTAAGG')
it gives False instead of True
I personally would try to identify the whole complement first, then try to use the python function find(). See the example below
s = "This be a string"
if s.find("is") == -1:
print("No 'is' here!")
else:
print("Found 'is' in the string.")
So... for your genetics problem here:
Given str1 = "AGAGAG..."
Given str2 = "TCTCTC..."
Iterate through str1 sub-chunks (of length 5?) with a for loop and identify possible complements.
str1-complement-chunk = "TCTCTC" #use a method here. Obviously i'm simplifiying.
if (str2.find(str1-complement-chunk ) == -1):
print("dang couldn't find it")
else:
print('There exists a complement at <index>')
Additionally, you may reverse the chunk if you need to do so. I believe it can be accessed with str1[:-1], or something like that idk tho.
This question already has answers here:
Anagram of String 2 is Substring of String 1
(5 answers)
Closed 5 years ago.
EDIT: Posting my final solution because this was a very helpful thread and I want to add some finality to it. Using the advice from both answers below I was able to craft a solution. I added a helper function in which I defined an anagram. Here is my final solution:
def anagram(s1, s2):
s1 = list(s1)
s2 = list(s2)
s1.sort()
s2.sort()
return s1 == s2
def Question1(t, s):
t_len = len(t)
s_len = len(s)
t_sort = sorted(t)
for start in range(s_len - t_len + 1):
if anagram(s[start: start+t_len], t):
return True
return False
print Question1("app", "paple")
I am working on some practice technical interview questions and I'm stuck on the following question:
Find whether an anagram of string t is a substring of s
I have worked out the following two variants of my code, and a solution to this I believe lies in a cross between the two. The problem I am having is that the first code always prints False., regardless of input. The second variation works to some degree. However, it cannot sort individual letters. For example t=jks s=jksd will print True! however t=kjs s=jksd will print False.
def Question1():
# Define strings as raw user input.
t = raw_input("Enter phrase t:")
s = raw_input("Enter phrase s:")
# Use the sorted function to find if t in s
if sorted(t.lower()) in sorted(s.lower()):
print("True!")
else:
print("False.")
Question1()
Working variant:
def Question1():
# Define strings as raw user input.
t = raw_input("Enter phrase t:")
s = raw_input("Enter phrase s:")
# use a loop to find if t is in s.
if t.lower() in s.lower():
print("True!")
else:
print("False.")
Question1()
I believe there is a solution that lies between these two, but I'm having trouble figuring out how to use sorted in this situation.
You're very much on the right track. First, please note that there is no loop in your second attempt.
The problem is that you can't simply sort all of s and then look for sorted(t) in that. Rather, you have to consider each len(t) sized substring of s, and check that against the sorted t. Consider the trivial example:
t = "abd"
s = "abdc"
s trivially contains t. However, when you sort them, you get the strings abd and abcd, and the in comparison fails. The sorting gets other letters in the way.
Instead, you need to step through s in chunks the size of t.
t_len = len(t)
s_len = len(s)
t_sort = sorted(t)
for start in range(s_len - t_len + 1):
chunk = s[start:start+t_len]
if t_sort == sorted(chunk):
# SUCCESS!!
I think your problem lies in the "substring" requirement. If you sort, you destroy order. Which means that while you can determine that an anagram of string1 is an anagram of a substring of string2, until you actually deal with string2 in order, you won't have a correct answer.
I'd suggest iterating over all the substrings of length len(s1) in s2. This is a straightforward for loop. Once you have the substrings, you can compare them (sorted vs sorted) with s1 to decide if there is any rearrangement of s1 that yields a contiguous substring of s2.
Viz:
s1 = "jks"
s2 = "aksjd"
print('s1=',s1, ' s2=', s2)
for offset in range(len(s2) - len(s1) + 1):
ss2 = s2[offset:offset+len(s1)]
if sorted(ss2) == sorted(s1):
print('{} is an anagram of {} at offset {} in {}'.format(ss2, s1, offset, s2))
[Edit: as someone pointed out I have used improperly the palindrom concept, now I have edited with the correct functions. I have done also some optimizations in the first and third example, in which the for statement goes until it reach half of the string]
I have coded three different versions for a method which checks if a string is a palindrome. The method are implemented as extensions for the class "str"
The methods also convert the string to lowercase, and delete all the punctual and spaces. Which one is the better (faster, pythonic)?
Here are the methods:
1) This one is the first solution that I thought of:
def palindrom(self):
lowerself = re.sub("[ ,.;:?!]", "", self.lower())
n = len(lowerself)
for i in range(n//2):
if lowerself[i] != lowerself[n-(i+1)]:
return False
return True
I think that this one is the more faster because there aren't transformations or reversing of the string, and the for statement breaks at the first different element, but I don't think it's an elegant and pythonic way to do so
2) In the second version I do a transformation with the solution founded here on stackoverflow (using advanced slicing string[::-1])
# more compact
def pythonicPalindrom(self):
lowerself = re.sub("[ ,.;:?!]", "", self.lower())
lowerReversed = lowerself[::-1]
if lowerself == lowerReversed:
return True
else:
return False
But I think that the slicing and the comparision between the strings make this solution slower.
3) The thirds solution that I thought of, use an iterator:
# with iterator
def iteratorPalindrom(self):
lowerself = re.sub("[ ,.;:?!]", "", self.lower())
iteratorReverse = reversed(lowerself)
for char in lowerself[0:len(lowerself)//2]:
if next(iteratorReverse) != char:
return False
return True
which I think is way more elegant of the first solution, and more efficient of the second solution
So, I decided to just timeit, and find which one was the fastest. Note that the final function is a cleaner version of your own pythonicPalindrome. It is defined as follows:
def palindrome(s, o):
return re.sub("[ ,.;:?!]", "", s.lower()) == re.sub("[ ,.;:?!]", "", o.lower())[::-1]
Methodology
I ran 10 distinct tests per function. In each test run, the function was called 10000 times, with arguments self="aabccccccbaa", other="aabccccccbaa". The results can be found below.
palindrom iteratorPalindrome pythonicPalindrome palindrome
1 0.131656638 0.108762937 0.071676536 0.072031984
2 0.140950052 0.109713793 0.073781851 0.071860462
3 0.126966087 0.109586756 0.072349792 0.073776719
4 0.125113136 0.108729573 0.094633969 0.071474645
5 0.130878159 0.108602964 0.075770395 0.072455015
6 0.133569472 0.110276694 0.072811747 0.071764222
7 0.128642812 0.111065438 0.072170571 0.072285204
8 0.124896702 0.110218949 0.071898959 0.071841214
9 0.123841905 0.109278358 0.077430437 0.071747112
10 0.124083576 0.108184210 0.080211147 0.077391086
AVG 0.129059854 0.109441967 0.076273540 0.072662766
STDDEV 0.005387429 0.000901370 0.007030835 0.001781309
It would appear that the cleaner version of your pythonicPalindrome is marginally faster, but both functions clearly outclass the alternatives.
It seems that you want to know the execution time of your blocks of code and compare them.
You can use the timeit module.
Here's a quick way:
import timeit
start = timeit.default_timer()
#Your code here
stop = timeit.default_timer()
print stop - start
Read more:
Option 1
Option 2
You could also time this one-liner that does not use re, but itertools instead:
def isPalindrom(self):
return all(i==j for i, j in itertools.zip_longest((i.lower() for i in self if i not in " ,.;:?!"), (j.lower() for j in self[::-1] if j not in " ,.;:?!")))
Or, explained in more details:
def isPalindrom(self):
#using generators to not use memory
stripped_self = (i.lower() for i in self if i not in " ,.;:?!")
reversed_stripped_self = (j.lower() for j in self[::-1] if j not in " ,.;:?!")
return all(self_char==reversed_char for self_char, reversed_char in itertools.zip_longest(stripped_self, reversed_stripped_self))
Recall that filter works on strings:
>>> st="One string, with punc. That also needs lowercase!"
>>> filter(lambda c: c not in " ,.;:?!", st.lower())
'onestringwithpuncthatalsoneedslowercase'
So your test can be a one liner that is obvious in function:
>>> str
'!esacrewol sdeen osla tahT .cnup htiw ,gnirts enO'
>>> filter(lambda c: c not in " ,.;:?!", st.lower())==filter(lambda c: c not in " ,.;:?!", str.lower()[::-1])
True
Or, if you are going to use a regex, just reverse the result with the idiomatic str[::-1]:
>>> "123"[::-1]
'321'
>>> re.sub(r'[ ,.;:?!]', '', st.lower())==re.sub(r'[ ,.;:?!]', '', str.lower())[::-1]
True
The fastest may be to use string.tranlate to delete the characters:
>>> import string
>>> string.translate(st, None, " ,.;:?!")
'OnestringwithpuncThatalsoneedslowercase'
>>> string.translate(st, None, " ,.;:?!")==string.translate(str, None, " ,.;:?!")[::-1]
True
When we pass a word it checks if it can be reversed,If it can be reversed it prints "This is a Palindrome". or "This is NOT a Palindrome"
def reverse(word):
x = ''
for i in range(len(word)):
x += word[len(word)-1-i]
return x
word = input('give me a word:\n')
x = reverse(word)
if x == word:
print('This is a Palindrome')
else:
print('This is NOT a Palindrome')
Why not using a more pythonic way!
def palindrome_checker(string):
string = string.lower()
return string == string[::-1] # returns a boolean
Most of the questions I've found are biased on the fact they're looking for letters in their numbers, whereas I'm looking for numbers in what I'd like to be a numberless string.
I need to enter a string and check to see if it contains any numbers and if it does reject it.
The function isdigit() only returns True if ALL of the characters are numbers. I just want to see if the user has entered a number so a sentence like "I own 1 dog" or something.
Any ideas?
You can use any function, with the str.isdigit function, like this
def has_numbers(inputString):
return any(char.isdigit() for char in inputString)
has_numbers("I own 1 dog")
# True
has_numbers("I own no dog")
# False
Alternatively you can use a Regular Expression, like this
import re
def has_numbers(inputString):
return bool(re.search(r'\d', inputString))
has_numbers("I own 1 dog")
# True
has_numbers("I own no dog")
# False
You can use a combination of any and str.isdigit:
def num_there(s):
return any(i.isdigit() for i in s)
The function will return True if a digit exists in the string, otherwise False.
Demo:
>>> king = 'I shall have 3 cakes'
>>> num_there(king)
True
>>> servant = 'I do not have any cakes'
>>> num_there(servant)
False
Use the Python method str.isalpha(). This function returns True if all characters in the string are alphabetic and there is at least one character; returns False otherwise.
Python Docs: https://docs.python.org/3/library/stdtypes.html#str.isalpha
https://docs.python.org/2/library/re.html
You should better use regular expression. It's much faster.
import re
def f1(string):
return any(i.isdigit() for i in string)
def f2(string):
return re.search('\d', string)
# if you compile the regex string first, it's even faster
RE_D = re.compile('\d')
def f3(string):
return RE_D.search(string)
# Output from iPython
# In [18]: %timeit f1('assdfgag123')
# 1000000 loops, best of 3: 1.18 µs per loop
# In [19]: %timeit f2('assdfgag123')
# 1000000 loops, best of 3: 923 ns per loop
# In [20]: %timeit f3('assdfgag123')
# 1000000 loops, best of 3: 384 ns per loop
You could apply the function isdigit() on every character in the String. Or you could use regular expressions.
Also I found How do I find one number in a string in Python? with very suitable ways to return numbers. The solution below is from the answer in that question.
number = re.search(r'\d+', yourString).group()
Alternatively:
number = filter(str.isdigit, yourString)
For further Information take a look at the regex docu: http://docs.python.org/2/library/re.html
Edit: This Returns the actual numbers, not a boolean value, so the answers above are more correct for your case
The first method will return the first digit and subsequent consecutive digits. Thus 1.56 will be returned as 1. 10,000 will be returned as 10. 0207-100-1000 will be returned as 0207.
The second method does not work.
To extract all digits, dots and commas, and not lose non-consecutive digits, use:
re.sub('[^\d.,]' , '', yourString)
I'm surprised that no-one mentionned this combination of any and map:
def contains_digit(s):
isdigit = str.isdigit
return any(map(isdigit,s))
in python 3 it's probably the fastest there (except maybe for regexes) is because it doesn't contain any loop (and aliasing the function avoids looking it up in str).
Don't use that in python 2 as map returns a list, which breaks any short-circuiting
You can accomplish this as follows:
if a_string.isdigit():
do_this()
else:
do_that()
https://docs.python.org/2/library/stdtypes.html#str.isdigit
Using .isdigit() also means not having to resort to exception handling (try/except) in cases where you need to use list comprehension (try/except is not possible inside a list comprehension).
You can use NLTK method for it.
This will find both '1' and 'One' in the text:
import nltk
def existence_of_numeric_data(text):
text=nltk.word_tokenize(text)
pos = nltk.pos_tag(text)
count = 0
for i in range(len(pos)):
word , pos_tag = pos[i]
if pos_tag == 'CD':
return True
return False
existence_of_numeric_data('We are going out. Just five you and me.')
You can use range with count to check how many times a number appears in the string by checking it against the range:
def count_digit(a):
sum = 0
for i in range(10):
sum += a.count(str(i))
return sum
ans = count_digit("apple3rh5")
print(ans)
#This print 2
import string
import random
n = 10
p = ''
while (string.ascii_uppercase not in p) and (string.ascii_lowercase not in p) and (string.digits not in p):
for _ in range(n):
state = random.randint(0, 2)
if state == 0:
p = p + chr(random.randint(97, 122))
elif state == 1:
p = p + chr(random.randint(65, 90))
else:
p = p + str(random.randint(0, 9))
break
print(p)
This code generates a sequence with size n which at least contain an uppercase, lowercase, and a digit. By using the while loop, we have guaranteed this event.
any and ord can be combined to serve the purpose as shown below.
>>> def hasDigits(s):
... return any( 48 <= ord(char) <= 57 for char in s)
...
>>> hasDigits('as1')
True
>>> hasDigits('as')
False
>>> hasDigits('as9')
True
>>> hasDigits('as_')
False
>>> hasDigits('1as')
True
>>>
A couple of points about this implementation.
any is better because it works like short circuit expression in C Language and will return result as soon as it can be determined i.e. in case of string 'a1bbbbbbc' 'b's and 'c's won't even be compared.
ord is better because it provides more flexibility like check numbers only between '0' and '5' or any other range. For example if you were to write a validator for Hexadecimal representation of numbers you would want string to have alphabets in the range 'A' to 'F' only.
What about this one?
import string
def containsNumber(line):
res = False
try:
for val in line.split():
if (float(val.strip(string.punctuation))):
res = True
break
except ValueError:
pass
return res
containsNumber('234.12 a22') # returns True
containsNumber('234.12L a22') # returns False
containsNumber('234.12, a22') # returns True
I'll make the #zyxue answer a bit more explicit:
RE_D = re.compile('\d')
def has_digits(string):
res = RE_D.search(string)
return res is not None
has_digits('asdf1')
Out: True
has_digits('asdf')
Out: False
which is the solution with the fastest benchmark from the solutions that #zyxue proposed on the answer.
Also, you could use regex findall. It's a more general solution since it adds more control over the length of the number. It could be helpful in cases where you require a number with minimal length.
s = '67389kjsdk'
contains_digit = len(re.findall('\d+', s)) > 0
Simpler way to solve is as
s = '1dfss3sw235fsf7s'
count = 0
temp = list(s)
for item in temp:
if(item.isdigit()):
count = count + 1
else:
pass
print count
alp_num = [x for x in string.split() if x.isalnum() and re.search(r'\d',x) and
re.search(r'[a-z]',x)]
print(alp_num)
This returns all the string that has both alphabets and numbers in it. isalpha() returns the string with all digits or all characters.
This too will work.
if any(i.isdigit() for i in s):
print("True")
You can also use set.intersection
It is quite fast, better than regex for small strings.
def contains_number(string):
return True if set(string).intersection('0123456789') else False
An iterator approach. It consumes all characters unless a digit is met. The second argument of next fix the default value to return when the iterator is "empty". In this case it set to False but also '' works since it is casted to a boolean value in the if.
def has_digit(string):
str_iter = iter(string)
while True:
char = next(str_iter, False)
# check if iterator is empty
if char:
if char.isdigit():
return True
else:
return False
or by looking only at the 1st term of a generator comprehension
def has_digit(string):
return next((True for char in string if char.isdigit()), False)
I'm surprised nobody has used the python operator in. Using this would work as follows:
foo = '1dfss3sw235fsf7s'
bar = 'lorem ipsum sit dolor amet'
def contains_number(string):
for i in range(10):
if str(i) in list(string):
return True
return False
print(contains_number(foo)) #True
print(contains_number(bar)) #False
Or we could use the function isdigit():
foo = '1dfss3sw235fsf7s'
bar = 'lorem ipsum sit dolor amet'
def contains_number(string):
for i in list(string):
if i.isdigit():
return True
return False
print(contains_number(foo)) #True
print(contains_number(bar)) #False
These functions basically just convert s into a list, and check whether the list contains a digit. If it does, it returns True, if not, it returns False.
I'm just starting to learn python and I have this exercise that's puzzling me:
Create a function that can pack or unpack a string of letters.
So aaabb would be packed a3b2 and vice versa.
For the packing part of the function, I wrote the following
def packer(s):
if s.isalpha(): # Defines if unpacked
stack = []
for i in s:
if s.count(i) > 1:
if (i + str(s.count(i))) not in stack:
stack.append(i + str(s.count(i)))
else:
stack.append(i)
print "".join(stack)
else:
print "Something's not quite right.."
return False
packer("aaaaaaaaaaaabbbccccd")
This seems to work all proper. But the assignment says that
if the input has (for example) the letter a after b or c, then
it should later be unpacked into it's original form.
So "aaabbkka" should become a3b2k2a, not a4b2k2.
I hence figured, that I cannot use the "count()" command, since
that counts all occurrences of the item in the whole string, correct?
What would be my options here then?
On to the unpacking -
I've thought of the basics what my code needs to do -
between the " if s.isalpha():" and else, I should add an elif that
checks whether or not the string has digits in it. (I figured this would be
enough to determine whether it's the packed version or unpacked).
Create a for loop and inside of it an if sentence, which then checks for every element:
2.1. If it has a number behind it > Return (or add to an empty stack) the number times the digit
2.2. If it has no number following it > Return just the element.
Big question number 2 - how do I check whether it's a number or just another
alphabetical element following an element in the list? I guess this must be done with
slicing, but those only take integers. Could this be achieved with the index command?
Also - if this is of any relevance - so far I've basically covered lists, strings, if and for
and I've been told this exercise is doable with just those (...so if you wouldn't mind keeping this really basic)
All help appreciated for the newbie enthusiast!
SOLVED:
def packer(s):
if s.isalpha(): # Defines if unpacked
groups= []
last_char = None
for c in s:
if c == last_char:
groups[-1].append(c)
else:
groups.append([c])
last_char = c
return ''.join('%s%s' % (g[0], len(g)>1 and len(g) or '') for g in groups)
else: # Seems to be packed
stack = ""
for i in range(len(s)):
if s[i].isalpha():
if i+1 < len(s) and s[i+1].isdigit():
digit = s[i+1]
char = s[i]
i += 2
while i < len(s) and s[i].isdigit():
digit +=s[i]
i+=1
stack += char * int(digit)
else:
stack+= s[i]
else:
""
return "".join(stack)
print (packer("aaaaaaaaaaaabbbccccd"))
print (packer("a4b19am4nmba22"))
So this is my final code. Almost managed to pull it all off with just for loops and if statements.
In the end though I had to bring in the while loop to solve reading the multiple-digit numbers issue. I think I still managed to keep it simple enough. Thanks a ton millimoose and everyone else for chipping in!
A straightforward solution:
If a char is different, make a new group. Otherwise append it to the last group. Finally count all groups and join them.
def packer(s):
groups = []
last_char = None
for c in s:
if c == last_char:
groups[-1].append(c)
else:
groups.append([c])
last_char = c
return ''.join('%s%s'%(g[0], len(g)) for g in groups)
Another approach is using re.
Regex r'(.)\1+' can match consecutive characters longer than 1. And with re.sub you can easily encode it:
regex = re.compile(r'(.)\1+')
def replacer(match):
return match.group(1) + str(len(match.group(0)))
regex.sub(replacer, 'aaabbkka')
#=> 'a3b2k2a'
I think You can use `itertools.grouby' function
for example
import itertools
data = 'aaassaaasssddee'
groupped_data = ((c, len(list(g))) for c, g in itertools.groupby(data))
result = ''.join(c + (str(n) if n > 1 else '') for c, n in groupped_data)
of course one can make this code more readable using generator instead of generator statement
This is an implementation of the algorithm I outlined in the comments:
from itertools import takewhile, count, islice, izip
def consume(items):
from collections import deque
deque(items, maxlen=0)
def ilen(items):
result = count()
consume(izip(items, result))
return next(result)
def pack_or_unpack(data):
start = 0
result = []
while start < len(data):
if data[start].isdigit():
# `data` is packed, bail
return unpack(data)
run = run_len(data, start)
# append the character that might repeat
result.append(data[start])
if run > 1:
# append the length of the run of characters
result.append(str(run))
start += run
return ''.join(result)
def run_len(data, start):
"""Return the end index of the run of identical characters starting at
`start`"""
return start + ilen(takewhile(lambda c: c == data[start],
islice(data, start, None)))
def unpack(data):
result = []
for i in range(len(data)):
if data[i].isdigit():
# skip digits, we'll look for them below
continue
# packed character
c = data[i]
# number of repetitions
n = 1
if (i+1) < len(data) and data[i+1].isdigit():
# if the next character is a digit, grab all the digits in the
# substring starting at i+1
n = int(''.join(takewhile(str.isdigit, data[i+1:])))
# append the repeated character
result.append(c*n) # multiplying a string with a number repeats it
return ''.join(result)
print pack_or_unpack('aaabbc')
print pack_or_unpack('a3b2c')
print pack_or_unpack('a10')
print pack_or_unpack('b5c5')
print pack_or_unpack('abc')
A regex-flavoured version of unpack() would be:
import re
UNPACK_RE = re.compile(r'(?P<char> [a-zA-Z]) (?P<count> \d+)?', re.VERBOSE)
def unpack_re(data):
matches = UNPACK_RE.finditer(data)
pairs = ((m.group('char'), m.group('count')) for m in matches)
return ''.join(char * (int(count) if count else 1)
for char, count in pairs)
This code demonstrates the most straightforward (or "basic") approach of implementing that algorithm. It's not particularly elegant or idiomatic or necessarily efficient. (It would be if written in C, but Python has the caveats such as: indexing a string copies the character into a new string, and algorithms that seem to copy data excessively might be faster than trying to avoid this if the copying is done in C and the workaround was implemented with a Python loop.)