Which of these two function definitions is more efficient in Python, given that they do the same task? I.e., when should we use a for loop and when should we use a while loop?
def count_to_first_vowel(s):
    ''' (str) -> str
    Return the substring of s up to but not including the first vowel in s. If no vowel
    is present, return s.
    >>> count_to_first_vowel('hello')
    'h'
    >>> count_to_first_vowel('cherry')
    'ch'
    >>> count_to_first_vowel('xyz')
    'xyz'
    '''
    substring = ''
    for char in s:
        if char in 'aeiouAEIOU':
            return substring
        substring = substring + char
    return substring
or
def count_to_first_vowel(s):
    ''' (str) -> str
    Return the substring of s up to but not including the first vowel in s. If no vowel
    is present, return s.
    >>> count_to_first_vowel('hello')
    'h'
    >>> count_to_first_vowel('cherry')
    'ch'
    >>> count_to_first_vowel('xyz')
    'xyz'
    '''
    substring = ''
    i = 0
    while i < len(s) and s[i] not in 'aeiouAEIOU':
        substring = substring + s[i]
        i = i + 1
    return substring
The for loop iterates over the string directly, so it never has to compute the length or index into the string. The while loop, as written, has to evaluate len(s) on every iteration. There may also be extra overhead in indexing the string (s[i]) each time the while condition is evaluated.
If the while loop is recalculating something like len() each time, I think it is more efficient to use for. Both of them still have to test at least one condition per iteration, though.
Rewriting the while loop to store the length in a local variable first, e.g. n = len(s) (calling it len would shadow the built-in len), may remove that extra cost and make the two extremely close. This is even more true when you consider that the vowel check (char in 'aeiouAEIOU') is itself a second, inner loop executed on every iteration.
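If you want to check this on your own machine rather than reason about it, a small timeit harness is enough. This is a sketch: the names prefix_for, prefix_while and prefix_takewhile are hypothetical, the while version hoists len(s) and returns a slice instead of concatenating (a deliberate change from the question's code), and the itertools.takewhile variant is an extra alternative not taken from the answers above. No timings are claimed here; they will vary by machine and Python version.

```python
import timeit
from itertools import takewhile

VOWELS = 'aeiouAEIOU'

def prefix_for(s):
    # for loop: iterates over the characters directly, no len() or indexing
    substring = ''
    for char in s:
        if char in VOWELS:
            return substring
        substring += char
    return substring

def prefix_while(s):
    # while loop with len(s) computed once and stored in a local variable
    n = len(s)
    i = 0
    while i < n and s[i] not in VOWELS:
        i += 1
    return s[:i]  # a single slice instead of repeated concatenation

def prefix_takewhile(s):
    # standard-library alternative: keep characters until the first vowel
    return ''.join(takewhile(lambda c: c not in VOWELS, s))

if __name__ == '__main__':
    sample = 'bcdfghjklmnpqrstvwxyz' * 50 + 'a'  # long prefix, vowel at the end
    for fn in (prefix_for, prefix_while, prefix_takewhile):
        # timeit runs immediately, so the lambda sees the current fn
        t = timeit.timeit(lambda: fn(sample), number=1000)
        print(f'{fn.__name__}: {t:.4f}s')
```

All three return the same result, so the choice can be made on readability first and measured timings second.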
I'm trying to compress a string so that any sequence of letters in strict alphabetical order is replaced by the first letter plus the length of the sequence.
For example, the string "abcdefxylmno", would become: "a6xyl4"
Single letters that aren't in order with the one before or after just stay the way they are.
How do I check that two letters are successors (a,b) and not simply in alphabetical order (a,c)? And how do I keep iterating on the string until I find a letter that doesn't meet this requirement?
I'm also trying to do this in a way that makes it easier to write an inverse function (that given the result string gives me back the original one).
EDIT:
I've managed to get the function working, thanks to your suggestion of using the alphabet string as comparison; now I'm very much stuck on the inverse function: given "a6xyl4" expand it back into "abcdefxylmno".
After quite some time, I managed to split the string after every number, and I wrote a function that expands a two-character string, but it doesn't work when I use it on a longer string:
from string import ascii_lowercase as abc
def subString(start,n):
    L=[]
    ind = abc.index(start)
    newAbc = abc[ind:]
    for i in range(len(newAbc)):
        while i < n:
            L.append(newAbc[i])
            i+=1
        res = ''.join(L)
        return res
def unpack(S):
    for i in range(len(S)-1):
        if S[i] in abc and S[i+1] not in abc:
            lett = str(S[i])
            num = int(S[i+1])
            return subString(lett,num)
def separate(S):
    lst = []
    for i in S:
        lst.append(i)
    for el in lst:
        if el.isnumeric():
            ind = lst.index(el)
            lst.insert(ind+1,"-")
    a = ''.join(lst)
    L = a.split("-")
    if S[-1].isnumeric():
        L.remove(L[-1])
        return L
    else:
        return L
def inverse(S):
    L = separate(S)
    for i in L:
        return unpack(i)
Each of these functions works on its own, but inverse(S) doesn't output anything. What's the mistake?
You can use the ord() function, which returns an integer representing the Unicode code point of a character. Letters that are consecutive in the alphabet differ by 1, so you can implement a simple function:
def is_successor(a, b):
    # check for marginal cases if we don't ensure
    # the input restriction somewhere else
    # (range() excludes its stop value, hence the + 1 so that 'z' and 'Z' count)
    if ord(a) not in range(ord('a'), ord('z') + 1) and ord(a) not in range(ord('A'), ord('Z') + 1):
        return False
    if ord(b) not in range(ord('a'), ord('z') + 1) and ord(b) not in range(ord('A'), ord('Z') + 1):
        return False
    # returns True if they are sequential
    return (ord(b) - ord(a)) == 1
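The same ord() difference test can drive the whole splitting step. Here is a sketch, assuming ASCII letters only; split_runs is a hypothetical name, and it inlines the successor check instead of calling is_successor so that the block runs on its own.

```python
from string import ascii_letters

def split_runs(text):
    """Split text into maximal runs of successive letters, e.g. 'abcxy' -> ['abc', 'xy']."""
    runs = []
    for char in text:
        prev = runs[-1][-1] if runs else None
        # Extend the current run only when both characters are ASCII letters
        # whose code points differ by exactly 1 ('a' -> 'b', 'x' -> 'y', ...)
        if (prev is not None and prev in ascii_letters and char in ascii_letters
                and ord(char) - ord(prev) == 1):
            runs[-1] += char
        else:
            runs.append(char)
    return runs

print(split_runs('abcdefxylmno'))  # ['abcdef', 'xy', 'lmno']
```

Once the runs are known, each one can be written out as its first letter plus its length.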
You can use the chr() built-in for the reversing stage: it returns the string representing the character whose Unicode code point is the integer given as its argument.
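A sketch of that reversing stage with chr() and ord(), assuming the packed string contains only lowercase letters each optionally followed by a run length (the regular expression and the name expand are illustrative, not from the question). Incidentally, in the question's inverse() the return statement sits inside the for loop, so only the first piece would ever be expanded; this version collects every piece before joining.

```python
import re

def expand(packed):
    """Expand e.g. 'a6xyl4' back to 'abcdefxylmno'.

    Assumes: lowercase ASCII letters, each optionally followed by a count.
    """
    pieces = []
    # Each token is one lowercase letter plus an optional (possibly multi-digit) count
    for letter, count in re.findall(r'([a-z])(\d*)', packed):
        n = int(count) if count else 1
        # chr(ord(letter) + k) steps forward through the alphabet
        pieces.append(''.join(chr(ord(letter) + k) for k in range(n)))
    return ''.join(pieces)

print(expand('a6xyl4'))  # abcdefxylmno
```

Because a bare letter simply means a run of length 1, both 'a6xyl4' and 'a6x2l4' expand to the same original string.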
This builds on the idea that acceptable subsequences will be substrings of the ABC:
from string import ascii_lowercase as abc  # 'abcdefg...'
text = 'abcdefxylmno'
stack = []
cache = ''
# collect subsequences
for char in text:
    if cache + char in abc:
        cache += char
    else:
        stack.append(cache)
        cache = char
# if present, append the last sequence
if cache:
    stack.append(cache)
# stack is now ['abcdef', 'xy', 'lmno']
# Build the final string 'a6x2l4'
result = ''.join(f'{s[0]}{len(s)}' if len(s) > 1 else s for s in stack)
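Wrapped into a function, the same substring-of-the-alphabet idea can also reproduce the question's "a6xyl4", where two-letter runs are left as they are. This is a sketch: the function name compress and the min_run parameter are hypothetical additions, and lowercase ASCII input is assumed.

```python
from string import ascii_lowercase as abc

def compress(text, min_run=2):
    """Replace runs of consecutive letters with first letter + run length.

    Runs shorter than min_run (a hypothetical knob) are kept verbatim.
    """
    runs = []
    cache = ''
    for char in text:
        if cache + char in abc:   # still a contiguous slice of the alphabet
            cache += char
        else:
            if cache:
                runs.append(cache)
            cache = char
    if cache:
        runs.append(cache)
    return ''.join(f'{r[0]}{len(r)}' if len(r) >= min_run else r for r in runs)

print(compress('abcdefxylmno'))             # a6x2l4
print(compress('abcdefxylmno', min_run=3))  # a6xyl4
```

Leaving short runs verbatim keeps the output unambiguous for an inverse that treats a bare letter as a run of length 1.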
I am trying to make a Python program check whether at least one letter is in a string.
import string
s = ('hello')
if string.ascii_lowercase in s:
    print('yes')
else:
    print('no')
It always just prints no
Well, string.ascii_lowercase is equal to 'abcdefghijklmnopqrstuvwxyz'. That doesn't look like it's contained in hello, right?
What you should do instead is to go over the letters in ascii_lowercase and check if any of them are in your string s.
import string
s = ('hello')
if any([letter in s for letter in string.ascii_lowercase]):
    print('yes')
else:
    print('no')
Wonderfully smart people in the comments have pointed out that you can drop the [ ] brackets that would usually create a list, turning our list comprehension into a generator expression. Because any() stops at the first true value, this avoids checking every single letter in ascii_lowercase and makes our code a little bit faster: as it stands, the whole list is built first and only then checked. With the generator, the letters are checked only up to e, as that's in 'hello'.
I was able to shave off a whole nanosecond this way! Still, straight up going through the whole list should be fine as well for most cases and is certainly simpler.
An efficient way to check if some string s contains any character from some alphabet:
import string
alphabet = frozenset(string.ascii_lowercase)
any(letter in alphabet for letter in s)
Key points:
Avoid linear search by storing the alphabet in a set instead of a more general iterable that doesn't allow fast (O(1)) check of elements
Loop over the input rather than over the alphabet: the alphabet is a small set of constant size, so each membership test against it is cheap, and even very large inputs are handled efficiently, without linear searching and without the memory cost of putting the input (instead of the alphabet) in a set
Avoid unnecessary list creation (and wasted memory) by using a generator expression
Here are some inferior alternatives.
Linear search over string.ascii_lowercase:
any(letter in string.ascii_lowercase for letter in s)
Linear search over string.ascii_lowercase, and a useless list creation:
any([letter in string.ascii_lowercase for letter in s])
Linear search over the input, very poor performance in the worst case when the input is very long and does not contain any character from the alphabet:
any(letter in s for letter in string.ascii_lowercase)
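To compare these variants yourself, a rough timeit sketch on a worst-case input (a long string with no lowercase letter) could look like this. The function names are hypothetical and no results are claimed; actual numbers depend on your machine and Python version.

```python
import string
import timeit

ALPHABET = frozenset(string.ascii_lowercase)

def via_set(s):
    # loop over the input, O(1) membership test against a frozenset
    return any(c in ALPHABET for c in s)

def via_string(s):
    # loop over the input, linear search in the 26-character string
    return any(c in string.ascii_lowercase for c in s)

def via_alphabet_loop(s):
    # loop over the alphabet, linear search through the (possibly long) input
    return any(letter in s for letter in string.ascii_lowercase)

if __name__ == '__main__':
    worst_case = 'X' * 100_000  # no lowercase letter, so nothing short-circuits
    for fn in (via_set, via_string, via_alphabet_loop):
        t = timeit.timeit(lambda: fn(worst_case), number=20)
        print(f'{fn.__name__}: {t:.4f}s')
```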
Currently you are checking whether the whole string string.ascii_lowercase is in s.
You have to check every single character of string.ascii_lowercase instead.
The naive solution would look like this:
>>> import string
>>> s = 'hello'
>>> for letter in string.ascii_lowercase:
...     if letter in s:
...         print('yes')
...         break
... else:
...     print('no')
...
yes
Here, the else block will only execute if the loop was not broken by the break statement.
A shorthand for the for loop would be to use the any builtin paired with a generator-expression:
>>> contained = any(letter in s for letter in string.ascii_lowercase)
>>> print('yes' if contained else 'no')
yes
Finally, you can improve the runtime of both implementations by using the set of characters from s, i.e. s = set(s). This will ensure that every in check is performed in constant time rather than iterating over s for every letter that is searched.
edit: Here's another short one:
>>> if set(s).intersection(string.ascii_lowercase):
...     print('yes')
... else:
...     print('no')
...
yes
This uses the fact that an empty set (the possible result of the intersection) will be treated as False in the if check.
(It has the slight drawback that the computation of the intersection does not stop once a single shared letter is found.)
Make a set of each string and check whether their intersection is empty:
def share_letter(s1, s2):
    return bool(set(s1).intersection(s2))
string.ascii_lowercase is a string that contains all the lowercase letters, i.e. abcdefghijklmnopqrstuvwxyz.
So, in the condition if string.ascii_lowercase in s, you are checking whether s contains the substring abcdefghijklmnopqrstuvwxyz.
You can try this:
if any(e in string.ascii_lowercase for e in s):
    ...
The expression inside any is a generator, so it stops checking at the first match.
Another way to do this is:
if any(e.islower() for e in s):
    ...
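The two checks are not quite equivalent: str.islower() is Unicode-aware, while string.ascii_lowercase only contains a to z. A small sketch showing the difference (the helper names are hypothetical):

```python
import string

def has_ascii_lower(s):
    # only the 26 ASCII letters a-z count
    return any(c in string.ascii_lowercase for c in s)

def has_any_lower(s):
    # any Unicode lowercase character counts, e.g. 'é' or 'ß'
    return any(c.islower() for c in s)

print(has_ascii_lower('héllo'), has_any_lower('héllo'))  # True True
print(has_ascii_lower('é'), has_any_lower('é'))          # False True
```

Pick ascii_lowercase if "letter" means a to z only, and islower() if accented lowercase letters should also count.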
This is another option:
import string
s = ('hello')
alpha = string.ascii_lowercase
if any(i in alpha for i in s):
    print('yes')
else:
    print('no')
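Or maybe quicker (the else belongs to the for loop, so "no" is printed only when the loop finishes without hitting break):
import string
s = ('hello')
alpha = string.ascii_lowercase
for l in s:
    if l in alpha:
        print("yes")
        break
else:
    print("no")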
I'm trying to delete all the characters at the end of a string following the last occurrence of a '+'. So for instance, if my string is 'Mother+why+is+the+river+laughing' I want to reduce this to 'Mother+why+is+the+river'. I don't know what the string will be in advance though.
I thought of iterating backwards over the string. Something like:
while letter in my_string[::-1] != '+':
    my_string = my_string[:-1]
This won't work because letter is not predefined.
Any thoughts?
Just use str.rsplit():
my_string = my_string.rsplit('+', 1)[0]
.rsplit() splits from the end of a string; with a limit of 1 it'll only split on the very last + in the string and [0] gives you everything before that last +.
Demo:
>>> 'Mother+why+is+the+river+laughing'.rsplit('+', 1)[0]
'Mother+why+is+the+river'
If there is no + in the string, the original string is returned:
>>> 'Mother'.rsplit('+', 1)[0]
'Mother'
As for your loop: you are testing against a reversed string, and the condition returns True until the last + has been removed; you'd have to test inside the loop which character you just removed:
while True:
    last = my_string[-1]
    my_string = my_string[:-1]
    if last == '+':
        break
but this is rather inefficient compared to using str.rsplit(); creating a new string for each character removed is costly.
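If you prefer an explicit index, str.rfind() gives the same result as rsplit() here with a single slice. A sketch (the helper name strip_after_last is hypothetical); note that s.rpartition('+')[0] is not a drop-in replacement, because it returns '' when there is no + at all.

```python
def strip_after_last(s, sep='+'):
    """Drop the last `sep` and everything after it; return s unchanged if sep is absent."""
    idx = s.rfind(sep)  # -1 when sep does not occur
    return s if idx == -1 else s[:idx]

print(strip_after_last('Mother+why+is+the+river+laughing'))  # Mother+why+is+the+river
print(strip_after_last('Mother'))                            # Mother
```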
I understand that recursion is when a function calls itself; however, I can't figure out exactly how to get my function to call itself to get the desired result. I simply need to count the vowels in the string given to the function.
def recVowelCount(s):
    'return the number of vowels in s using a recursive computation'
    vowelcount = 0
    vowels = "aEiou".lower()
    if s[0] in vowels:
        vowelcount += 1
    else:
        ???
I came up with this in the end, thanks to some insight from here.
def recVowelCount(s):
    'return the number of vowels in s using a recursive computation'
    vowels = "aeiouAEIOU"
    if s == "":
        return 0
    elif s[0] in vowels:
        return 1 + recVowelCount(s[1:])
    else:
        return 0 + recVowelCount(s[1:])
Try this, it's a simple solution:
def recVowelCount(s):
    if not s:
        return 0
    return (1 if s[0] in 'aeiouAEIOU' else 0) + recVowelCount(s[1:])
It handles vowels in both uppercase and lowercase. It might not be the most efficient way to recursively traverse a string (because each recursive call creates a new sliced string), but it's easy to understand:
Base case: if the string is empty, then it has zero vowels.
Recursive step: if the first character is a vowel add 1 to the solution, otherwise add 0. Either way, advance the recursion by removing the first character and continue traversing the rest of the string.
The second step will eventually reduce the string to zero length, therefore ending the recursion. Alternatively, the same procedure can be implemented using tail recursion - not that it makes any difference regarding performance, given that CPython doesn't implement tail recursion elimination.
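To avoid creating a new sliced string on every call, the recursion can advance an index instead. This is a sketch with a hypothetical name and an extra i parameter; it is still recursive, so very long strings will still hit Python's recursion limit.

```python
VOWELS = 'aeiouAEIOU'

def rec_vowel_count(s, i=0):
    """Count vowels recursively by moving an index forward instead of slicing."""
    if i == len(s):  # base case: past the last character
        return 0
    # bool is a subclass of int, so True/False add as 1/0
    return (s[i] in VOWELS) + rec_vowel_count(s, i + 1)

print(rec_vowel_count('murcielago'))  # 5
```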
def recVowelCount(s):
    def loop(s, acc):
        if not s:
            return acc
        return loop(s[1:], (1 if s[0] in 'aeiouAEIOU' else 0) + acc)
    return loop(s, 0)
Just for fun, if we remove the restriction that the solution has to be recursive, this is how I'd solve it:
def iterVowelCount(s):
    vowels = frozenset('aeiouAEIOU')
    return sum(1 for c in s if c in vowels)
Either way, this works:
>>> recVowelCount('murcielago')
5
>>> iterVowelCount('murcielago')
5
Your function probably needs to look generally like this:
if the string is empty, return 0.
if the string isn't empty and the first character is a vowel, return 1 + the result of a recursive call on the rest of the string
if the string isn't empty and the first character is not a vowel, return the result of a recursive call on the rest of the string.
Use slicing to remove the first character and test the rest. You don't need an else block, because you need to call the function in every case: if you put the recursive call in an else block, it will not be made whenever the current character is a vowel.
### Improved Code
def recVowelCount(s):
    'return the number of vowels in s using a recursive computation'
    # You could also define your `vowels` string once, as a module-level constant
    vowels = "aEiou".lower()
    if not s:
        return 0
    if s[0] in vowels:
        return 1 + recVowelCount(s[1:])
    return recVowelCount(s[1:])
# Invoke the function
print(recVowelCount("rohit"))  # Prints 2
This will call your recursive function with new string with 1st character sliced.
This is the straightforward approach:
VOWELS = 'aeiouAEIOU'
def count_vowels(s):
    if not s:
        return 0
    elif s[0] in VOWELS:
        return 1 + count_vowels(s[1:])
    else:
        return 0 + count_vowels(s[1:])
Here is the same with less code:
def count_vowels_short(s):
    if not s:
        return 0
    return int(s[0] in VOWELS) + count_vowels_short(s[1:])
Here is another one, written in tail-recursive style:
def count_vowels_tailrecursion(s, count=0):
return count if not s else count_vowels_tailrecursion(s[1:], count + int(s[0] in VOWELS))
Unfortunately, this will fail for long strings.
>>> medium_sized_string = str(list(range(1000)))
>>> count_vowels(medium_sized_string)
...
RuntimeError: maximum recursion depth exceeded while calling a Python object
If this is of interest, look at this blog article.
Here's a functional programming approach for you to study:
map_ = lambda func, lst: [func(lst[0])] + map_(func, lst[1:]) if lst else []
reduce_ = lambda func, lst, init: reduce_(func, lst[1:], func(init, lst[0])) if lst else init
add = lambda x, y: int(x) + int(y)
is_vowel = lambda a: a in 'aeiou'
s = 'How razorback-jumping frogs can level six piqued gymnasts!'
num_vowels = reduce_(add, map_(is_vowel, s), 0)
The idea is to divide the problem into two steps, where the first ("map") converts the data into another form (a letter -> 0/1) and the second ("reduce") collects converted items into one single value (the sum of 1's).
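The same map/reduce split can be written with the built-in map() and functools.reduce(), which iterate rather than recurse and so are not limited by the recursion depth. A sketch reusing the is_vowel predicate and sentence from above, with operator.add standing in for the add lambda:

```python
import operator
from functools import reduce

is_vowel = lambda a: a in 'aeiou'
s = 'How razorback-jumping frogs can level six piqued gymnasts!'

# "map": each character becomes True/False; "reduce": fold them into one sum
# (True and False add as 1 and 0, starting from the initial value 0)
num_vowels = reduce(operator.add, map(is_vowel, s), 0)
print(num_vowels)  # 15
```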
References:
http://en.wikipedia.org/wiki/Map_(higher-order_function)
http://en.wikipedia.org/wiki/Reduce_(higher-order_function)
http://en.wikipedia.org/wiki/MapReduce
Another, more advanced solution is to convert the problem into tail recursive and use a trampoline to eliminate the recursive call:
def count_vowels(s):
    f = lambda s, n: lambda: f(s[1:], n + (s[0] in 'aeiou')) if s else n
    t = f(s, 0)
    while callable(t): t = t()
    return t
Note that unlike naive solutions this one can work with very long strings without causing "recursion depth exceeded" errors.
How can I check that, for every character in a string, all the characters alphabetically smaller than it appear before it? E.g. aab is correct while aacb is not, because in the second case we have 'c' but 'b' is not present before it.
Also aac is not correct as it does not have 'b' before 'c'.
Here is some pseudocode. It works for cases like abac too.
max = 'a' - 1   // character immediately before 'a'
for char in string
    if char > max + 1
        // bad string, stop algorithm
    end
    if char > max
        max = char
    end
end
The idea is that we need to check only that the character preceding the current one alphabetically has occurred before. If we have character e now and d has occurred before, then c, b and a did too.
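A direct Python translation of that pseudocode, as a sketch assuming the string contains only lowercase ASCII letters (is_gapless is a hypothetical name; ord() stands in for the pseudocode's character arithmetic):

```python
def is_gapless(s):
    """True if each letter's alphabetical predecessor appeared earlier in s.

    Assumes s contains only lowercase ASCII letters.
    """
    max_code = ord('a') - 1  # code point just before 'a'
    for char in s:
        code = ord(char)
        if code > max_code + 1:
            return False      # a letter was skipped: bad string
        if code > max_code:
            max_code = code   # highest letter seen so far
    return True

for test in ['aab', 'abac', 'aabaacaabade', 'ba', 'aac', 'aacb']:
    print(test, is_gapless(test))
```

Only the highest letter seen so far needs to be tracked, so the check is a single pass with constant extra memory.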
Consider this as a bad answer
import string
foo = string.printable[10:36]
a = 'aac'
for i in a:
    if i == 'a': continue
    if a.rfind(foo[foo.rfind(i) - 1]) != -1: continue
    else: print('check_not cleared'); break
ALPHA = 'abcdefghijklmnopqrstuvwxyz'
tests = [
    'aab', 'abac', 'aabaacaabade',  # First 3 tests should eval True
    'ba', 'aac', 'aabbccddf'        # Last 3 tests should eval False
]
def CheckString(test):
    alpha_counter = 0
    while test:
        if test[0] == ALPHA[alpha_counter]:
            test = test.replace(ALPHA[alpha_counter], '')
            alpha_counter += 1
        else:
            return False
    return True
for test in tests:
    print(CheckString(test))
True
True
True
False
False
False
Given your criteria...
All you need to do is check the first letter to see if it passes your criteria... if it does, remove all occurrences of that letter from the string. And move onto the next letter. Your given criteria makes it easy because you just need to check alphabetically.
Take the string aabaacaabade, for example.
first letter 'a' passes the criteria [there are no letters before 'a']
remove all 'a's from the string; remaining string: bcbde
first letter 'b' passes the criteria [there was an 'a' before the 'b']
remove all 'b's from the string; remaining string: cde
first letter 'c' passes the criteria [there was an 'a' and a 'b' before the 'c']
remove all 'c's from the string; remaining string: de
...
That should work if I understood your criteria correctly.
I believe I understand your question, and here is my attempt at answering it; if I have misunderstood, please correct me.
The standard comparisons (<, <=, >, >=, ==, !=) apply to strings. They compare strings character by character, using each character's Unicode code point (which matches the ASCII value for ASCII characters). That means that, for lowercase letters, the greater-than and less-than operators compare strings in alphabetical order.
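A quick illustration of the code-point ordering, and of why it only matches alphabetical order within a single case:

```python
# Strings compare by the code points of their characters
print('a' < 'b')           # True
print('B' < 'a')           # True, because ord('B') is 66 and ord('a') is 97
print(ord('B'), ord('a'))  # 66 97
print(sorted('bAaB'))      # ['A', 'B', 'a', 'b']
```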
You might want to use the character's code point (its ASCII value), obtained with ord().
mystr = "aab"
curr = ord(mystr[0])
for char in mystr[1:]:
    if ord(char) < curr:
        print("This character should not be here")
    if ord(char) > curr:
        curr = ord(char)
Changes made to reflect user470379's suggestion:
mystr = "aab"
curr = mystr[0]
for char in mystr[1:]:
    if char < curr:
        print("This character should not be here")
    if char > curr:
        curr = char
The idea is very simple: each char in the string should not be less than its predecessor, and it shouldn't be larger than its predecessor + 1.
How about this? It simplifies the problem by first removing duplicate characters, then you only need to check the string is a prefix of the string containing all lowercase (ascii) letters.
import string
def uniq(s):
    last = None
    for c in s:
        if c != last: yield c
        last = c
def is_gapless_ascending(s):
    s = ''.join(uniq(s))
    return string.ascii_lowercase.startswith(s)