Unusual behaviour when using string concatenation inside for loop

So the code below properly removes all the vowels from a string as expected.
def disemvowel(string):
    # Letters to remove & the new, vowel-free string
    vowels_L = list('aeiouAEIOU')
    new_string = ""
    # Go through each word in the string
    for word in string:
        # Go through each character in the word
        for character in word:
            # Skip over vowels, include everything else
            if character in vowels_L:
                pass
            else:
                new_string += character
    # Put a space after every word
    new_string += ' '
    # Exclude space placed at end of string
    return new_string[:-1]

no_vowels = disemvowel('Nasty Comment: Stack exchange sucks!')
print(no_vowels)

>>>>python remove_vowels.py
>>>>Nsty Cmmnt: Stck xchng scks!
However, when I move the statement new_string += ' ' to where I think it should be (I come from a C/C++ background), I get a weird result:
def disemvowel(string):
    # Letters to remove & the new, vowel-free string
    vowels_L = list('aeiouAEIOU')
    new_string = ""
    # Go through each word in the string
    for word in string:
        # Go through each character in the word
        for character in word:
            # Skip over vowels, include everything else
            if character in vowels_L:
                pass
            else:
                new_string += character
        # THIS IS THE LINE OF CODE THAT WAS MOVED
        # Put a space after every word
        new_string += ' '
    # Exclude space placed at end of string
    return new_string[:-1]

no_vowels = disemvowel('Nasty Comment: Stack exchange sucks!')
print(no_vowels)

>>>>python remove_vowels.py
>>>>N s t y C m m n t : S t c k x c h n g s c k s !
Instead of a space being placed only after a word has finished being iterated over, a space is also placed wherever there was a vowel. I was hoping someone would be able to explain why this occurs, even though in C the result would be quite different. Also, any suggestions to streamline/condense the code would be welcome! : )

for word in string doesn't iterate over the words; it iterates over the characters. You don't need to add spaces at all, because the spaces in the original string are preserved.
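A quick way to see this is to look at what the loop variable actually holds. The sketch below uses a shortened sample string (an assumption for illustration, not the asker's full input) and contrasts iterating a string directly with iterating the result of str.split():

```python
sample = "Nasty Comment"  # shortened sample input, for illustration only

# Iterating a string yields one character at a time, spaces included.
chars = [item for item in sample]
print(chars)

# str.split() with no arguments yields the whitespace-separated words.
words = [item for item in sample.split()]
print(words)
```

So an inner loop over each "word" of a plain string only ever sees a single character.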

As interjay comments, your indentation is way off. Python relies on indentation to determine which statements belong to which block, instead of the more common BEGIN ... END or { ... }.
In addition, user2357112 observes that you are expecting words from your string, whereas a string is simply a sequence of characters, and for word in string will set word to one character of string at a time.
It is also much cleaner to use not in rather than an if together with a pass.
This is much closer to what you intended:
def disemvowel(string):
    # Letters to remove & the new, vowel-free string
    vowels_list = 'aeiouAEIOU'
    new_string = ""
    # Go through each character in the string
    for character in string:
        # Skip over vowels, include everything else
        if character not in vowels_list:
            new_string += character
    return new_string

print(disemvowel('Nasty Comment: Stack exchange sucks!'))
Output:
Nsty Cmmnt: Stck xchng scks!
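Since the question also asked for ways to condense the code, here is one possible shorter form. It is only a sketch of the same idea (keep every character that is not a vowel), using a generator expression with str.join(); the set is an optional choice for membership tests, not something the answers above require:

```python
def disemvowel(string):
    # A set makes the "is this a vowel?" test a constant-time lookup.
    vowels = set('aeiouAEIOU')
    # Keep every character that is not a vowel and glue them back together.
    return ''.join(ch for ch in string if ch not in vowels)

print(disemvowel('Nasty Comment: Stack exchange sucks!'))
```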

Related

Swap last two characters in a string, make it lowercase, and add a space

I'm trying to take the last two letters of a string, swap them, make them lowercase, and leave a space in the middle. For some reason the output gives me white space before the word.
For example, if the input was APPLE then the output should be e l
It would also be nice to ignore non-letter characters, so if the word was App3e then the output would be e p
def last_Letters(word):
    last_two = word[-2:]
    swap = last_two[-1:] + last_two[:1]
    for i in swap:
        if i.isupper():
            swap = swap.lower()
    return swap[0]+ " " +swap[1]

word = input(" ")
print(last_Letters(word))

You can try the following function:
import re

def last_Letters(word):
    letters = re.sub(r'\d', '', word)
    if len(letters) > 1:
        return letters[-1].lower() + ' ' + letters[-2].lower()
    return None
It follows these steps:
1. Remove all the digits.
2. If at least two characters remain:
   - lowercase the last two characters
   - build the required string by concatenating the last letter, a space, and the second-to-last letter
   - return the string
3. Otherwise, return None.

Since I said there was a simpler way, here's what I would write:
text = input()
result = ' '.join(reversed([ch.lower() for ch in text if ch.isalpha()][-2:]))
print(result)
How this works:
[ch.lower() for ch in text] creates a list of lowercase characters from some iterable text
adding if ch.isalpha() filters out anything that isn't an alphabetical character
adding [-2:] selects the last two from the preceding sequence
and reversed() takes the sequence and returns an iterable with the elements in reverse
' '.join(some_iterable) will join the characters in the iterable together with spaces in between.
So, result is set to be the last two characters of all of the alphabetical characters in text, in reverse order, separated by a space.
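To make those steps concrete, the one-liner can be unrolled into named intermediate values. This is just an illustrative sketch; the variable names are made up here, and the input is hard-coded to the App3e example from the question instead of being read with input():

```python
text = "App3e"  # sample value from the question, hard-coded for the sketch

# Lowercase every character, keeping only alphabetic ones.
lowered = [ch.lower() for ch in text if ch.isalpha()]   # ['a', 'p', 'p', 'e']

# Take the last two of those.
last_two = lowered[-2:]                                  # ['p', 'e']

# reversed() returns an iterator; list() is used here only to display it.
swapped = list(reversed(last_two))                       # ['e', 'p']

# Join with a single space in between.
result = ' '.join(swapped)                               # 'e p'
print(result)
```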
Part of what makes Python so powerful and popular, is that once you learn to read the syntax, the code very naturally tells you exactly what it is doing. If you read out the statement, it is self-describing.

Encoding duplicate words is not working in the code?

I am a learner and I was doing a character encoding exercise on Codewars.
My code is failing the tests for "(" and ")" and for random characters.
def duplicate_encode(word):
    #your code here
    word = word.lower()
    for ch in word:
        if word.count(ch) == 1:
            word = word.replace(ch, "(")
        else:
            word = word.replace(ch, ")")
    return word
Can anybody help?
The problem statement is as follows :
The goal of this exercise is to convert a string to a new string where each character in the new string is "(" if that character appears only once in the original string, or ")" if that character appears more than once in the original string. Ignore capitalization when determining if a character is a duplicate.

For example input: HaO#lknFmcxzI( RHJ
When your iteration gets as far as the (, it will count the number of ( in the string including some that were not in the original string, because your function puts new ( into the string.
Also if your function uses replace(ch, ...) and ch is equal to ( or ), you are altering all the parentheses you have added so far.
A way to avoid that is to not keep altering the string while you're looking at it, but build up a new separate sequence of characters.
def duplicate_encode(word):
    word = word.lower()
    new = []
    for ch in word:
        if word.count(ch) == 1:
            new.append('(')
        else:
            new.append(')')
    return ''.join(new)
The one liner in your comments:
''.join('(' if word.lower().count(ch) == 1 else ')' for ch in word.lower())
uses a generator expression. It iterates through the string (transformed to lower case), and generates either ( or ) for each one (depending on the count), and then at the end, joins up all the characters to a new string.
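Both versions above call word.count(ch) once per character, which rescans the string each time. If that ever matters, one alternative is to count everything once up front. The following is a sketch of that idea using collections.Counter from the standard library; it is a variation, not the approach the answer itself proposes:

```python
from collections import Counter

def duplicate_encode(word):
    lowered = word.lower()
    # Count every character once, before building the result.
    counts = Counter(lowered)
    # The counts come from the original text, so parentheses in the
    # input are treated like any other character.
    return ''.join('(' if counts[ch] == 1 else ')' for ch in lowered)

print(duplicate_encode("Success"))
print(duplicate_encode("(( @"))
```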

Python - string index out of range issue

This is the question I was given to solve:
Create a program that inputs a phrase (like a famous quotation) and prints all of the words that start with h-z.
I solved the problem, but the first two methods didn't work and I wanted to know why:
#1 string index out of range
quote = input("enter a 1 sentence quote, non-alpha separate words: ")
word = ""
for character in quote:
    if character.isalpha():
        word += character.upper()
    else:
        if word[0].lower() >= "h":
            print(word)
            word = ""
        else:
            word = ""
I get the IndexError: string index out of range message for any words after "g". Shouldn't the else statement catch it? I don't get why it doesn't, because if I remove the brackets [] from word[0], it works.
#2: last word not printing
quote = input("enter a 1 sentence quote, non-alpha separate words: ")
word = ""
for character in quote:
    if character.isalpha():
        word += character.upper()
    else:
        if word.lower() >= "h":
            print(word)
            word = ""
        else:
            word = ""
In this example, it works to a degree. It eliminates any words before 'h' and prints words after 'h', but for some reason doesn't print the last word. It doesn't matter what quote I use; it doesn't print the last word even if it's after 'h'. Why is that?

You're indexing word[0]. This accesses the first character of the string word. If word is empty (that is, word == ""), there is no first character to access, so you get an IndexError. This happens whenever a non-alphabetic character (e.g. a digit, a dash, or a second separator in a row) is reached while word is still empty.
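The difference between indexing and a guarded check can be seen in isolation. The helper name starts_h_to_z below is made up for this sketch; it only shows one way to avoid touching word[0] when word is empty:

```python
word = ""

# Indexing an empty string raises IndexError.
try:
    first = word[0]
except IndexError as exc:
    print("IndexError:", exc)

def starts_h_to_z(word):
    # An empty string is falsy, so word[0] is only evaluated
    # when there is at least one character.
    return bool(word) and word[0].lower() >= "h"

print(starts_h_to_z(""))
print(starts_h_to_z("Wheresoever"))
print(starts_h_to_z("go"))
```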
The second problem, with your second code snippet leaving off the last word, comes from the approach you're using. It looks like you're trying to walk through the sentence character by character and decide whether to print a word once you have read all of it (which you know because you hit a space character). But the last character in your sentence isn't a space; it's just the last letter of the last word. So your else branch is never executed for that word, and it is never printed.
I'd recommend an entirely different approach, using the method str.split(). This method is built into Python and will transform one string into a list of smaller strings, split on the character/substring you specify. So if I do
quote = "Hello this is a sentence"
words = quote.split(' ')
print(words)
you'll end up seeing this:
['Hello', 'this', 'is', 'a', 'sentence']
A couple of things to keep in mind on your next approach to this problem:
You need to account for empty words (like if I have two spaces in a row for some reason), and make sure they don't break the script.
You need to account for non-alphabetic characters like digits and dashes. You can either ignore them or handle them differently, but you have to have something in place.
You need to make sure that you handle the last word at some point, even if the sentence doesn't end in a space character.
Good luck!
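One way the three points above could be handled together is sketched below. The function name words_h_to_z and the sample quote are assumptions made for illustration. Non-letters are turned into spaces first, and str.split() with no arguments then discards empty pieces and still returns the last word:

```python
def words_h_to_z(quote):
    # Replace every non-letter with a space so digits, dashes and
    # punctuation all act as word separators.
    cleaned = ''.join(ch if ch.isalpha() else ' ' for ch in quote)
    # split() with no arguments never returns empty strings, so w[0]
    # is always safe, and the final word needs no special handling.
    return [w.upper() for w in cleaned.split() if w[0].lower() >= 'h']

for word in words_h_to_z("Wheresoever you go, go with all your heart"):
    print(word)
```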

Instead of what you're doing, you can iterate over each word in the string and check whether it begins with one of those letters. Read about the function str.split(): the parameter is the separator, in this case ' ' since you want whole words, and it returns a list of strings. Iterate over that list in the loop and it should work.

replace() function is not replacing 'e' character

My code should recognize the vowels and remove them from the input string using the replace() function. It works fine except for the letter 'e'.
If the input is "Hey look Words!" the output is "Hey lk Wrds!".
It removes the 'e' only if the "vowels" string is equal to "e" or "eE"!
I am curious to know why.
def anti_vowel(text):
    vowles="AaEeOoIiUu"
    newstr=""
    for i in text:
        if i in vowles:
            newstr=text.replace(i,"")
    return newstr

You are placing only the last replacement result in newstr. All your previous str.replace() results are discarded.
For your input text Hey look Words!, the last vowel encountered is o so only o is replaced. The e replacement did take place and was stored in newstr but that value was then discarded when you set newstr to the result of the o replacement. It thus depends on the input string what vowel exactly will remain replaced; for the sentence 'The cat sat on the mat' it'll be a as that is the last vowel you test and replace.
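Printing the intermediate values makes the overwriting visible. This is only a tracing sketch of the question's loop, written at module level with the question's sample input:

```python
text = "Hey look Words!"
vowels = "AaEeOoIiUu"

newstr = ""
for ch in text:
    if ch in vowels:
        # Each replacement starts again from the unmodified text,
        # so the previous value of newstr is thrown away.
        newstr = text.replace(ch, "")
        print(repr(ch), "->", newstr)

print("final:", newstr)
```

The final value is "Hey lk Wrds!", because the last vowel the loop meets is an o.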
Just loop directly over vowels and replace each of those characters; it is safe to call str.replace() when the first argument is not present in the string. Store the result back in text so that any subsequent replacements stick:
def anti_vowel(text):
    vowels = "AaEeOoIiUu"
    for vowel in vowels:
        text = text.replace(vowel, "")
    return text
Better still, use the str.translate() method to replace all vowels in one go:
# Python 2 version
def anti_vowel(text):
    vowels = "AaEeOoIiUu"
    return text.translate(None, vowels)

# Python 3 version
def anti_vowel(text):
    vowels = str.maketrans(dict.fromkeys("AaEeOoIiUu"))
    return text.translate(vowels)
str.translate() makes all replacements at once; the method changed between Python 2 str and Python 3 str, but in both versions all the vowels are ignored as the new string is built, without any further loops.

There's no reason to iterate through all the letters in the word; the replace() method does that for you. And you are erasing newstr every time, so by the end, all you're doing is replacing u. Here's what you need to do.
def anti_vowel(text):
    vowels = "AaEeIiOoUu"
    for i in vowels:
        text = text.replace(i, "")
    return text
This way, each time you replace text, you save and keep the replaced string. What you were doing earlier was making newstr into text without A, then replacing newstr with text sans a (but with A), so on and so forth. The end result was text without u but with everything else.

You should change your code to:
def anti_vowel(text):
    vowles="AaEeOoIiUu"
    newstr=text
    for i in newstr:
        if i in vowles:
            newstr=newstr.replace(i,"")
    return newstr
Then you will accumulate each replacement in your final string.
The way you are doing it, you always start from the original string and replace only one vowel character in each iteration. So, in the end, you get a result with only one of those replacements applied to the original string.

Python script to insert space between different character types: Why is this so slow?

I'm working with some text that has a mix of languages, which I've already done some processing on and which is in the form of a list of single characters (called "letters"). I can tell which language each character belongs to by simply testing whether it has case or not (with a small function called "test_lang"). I then want to insert a space between characters of different types, so I don't have any words that are a mix of character types. At the same time, I want to insert a space between words and punctuation (which I defined in a list called "punc"). I wrote a script that does this in a very straightforward way that made sense to me (below), but it is apparently the wrong way to do it, because it is incredibly slow.
Can anyone tell me what the better way to do this is?
# Add a space between Arabic/foreign mixes, and between words and punc
cleaned = ""
i = 0
while i <= len(letters)-2: #range excludes last letter to avoid Out of Range error for i+1
    cleaned += letters[i]
    # words that have case are Latin; otherwise Arabic
    if test_lang(letters[i]) != test_lang(letters[i+1]):
        cleaned += " "
    if letters[i] in punc or letters[i+1] in punc:
        cleaned += " "
    i += 1
cleaned += letters[len(letters)-1] # add in last letter

There are a few things going on here:
You call test_lang() on every letter in the string twice; this is probably the main reason this is slow.
Concatenating strings in Python isn't very efficient; you should instead use a list or generator and then use str.join() (most likely, ''.join()).
Here is the approach I would take, using itertools.groupby():
from itertools import groupby

def keyfunc(letter):
    return (test_lang(letter), letter in punc)

cleaned = ' '.join(''.join(g) for k, g in groupby(letters, keyfunc))
This will group the letters into consecutive letters of the same language and whether or not they are punctuation, then ''.join(g) converts each group back into a string, then ' '.join() combines these strings adding a space between each string.
Also, as noted in comments by DSM, make sure that punc is a set.

Every time you perform a string concatenation, a new string is created. The longer the string gets, the longer each concatenation takes.
http://en.wikipedia.org/wiki/Schlemiel_the_Painter's_algorithm
You might be better off declaring a list big enough to store the characters of the output, and joining them at the end.
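A rough sketch of that list-then-join pattern follows. Python lists grow as needed, so the sketch simply appends instead of pre-sizing the list. The function name add_separators and the needs_space predicate are hypothetical, introduced only so the example can run on its own:

```python
def add_separators(letters, needs_space):
    parts = []
    # Pair each letter with the one that follows it.
    for current, following in zip(letters, letters[1:]):
        parts.append(current)
        if needs_space(current, following):
            parts.append(' ')
    # zip stops one short, so add the final letter (if there is one).
    parts.extend(letters[-1:])
    # A single join at the end builds the output string once.
    return ''.join(parts)

# Hypothetical predicate: separate digits from non-digits.
print(add_separators(list("ab12cd"), lambda a, b: a.isdigit() != b.isdigit()))
```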

I suggest an entirely different solution that should be very fast:
import re
cleaned = re.sub(r"(?<!\s)\b(?!\s)", " ", letters, flags=re.LOCALE)
This inserts a space at every word boundary (defining words as "sequences of alphanumeric characters, including accented characters in your current locale", which should work in most cases), unless it's a word boundary next to whitespace.
This should split between Latin and Arabic characters as well as between Latin and punctuation.

Assuming test_lang is not the bottleneck, I'd try:
''.join(
    x + ' '
    if x in punc or y in punc or test_lang(x) != test_lang(y)
    else x
    for x, y in zip(letters[:-1], letters[1:])
) + letters[-1]  # the pairs only emit x, so append the last letter

Here is a solution that uses yield. I would be interested to know whether this runs any faster than your original solution.
This avoids all the indexing in the original. It just iterates through the input, holding onto a single previous character.
This should be easy to modify if your requirements change in the future.
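ch_sep = ' '

def _sep_chars_by_lang(s_input):
    itr = iter(s_input)
    # Use a default so empty input ends the generator instead of raising
    ch_prev = next(itr, None)
    if ch_prev is None:
        return
    yield ch_prev
    # A for loop stops cleanly at the end of input; a bare next() inside
    # "while True" raises RuntimeError in a generator on Python 3.7+
    for ch in itr:
        if test_lang(ch_prev) != test_lang(ch) or ch_prev in punc:
            yield ch_sep
        yield ch
        ch_prev = ch

def sep_chars_by_lang(s_input):
    return ''.join(_sep_chars_by_lang(s_input))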

Keeping the basic logic of the OP's original code, we speed it up by not doing all that [i] and [i+1] indexing. We use a prev and next reference that scan through the string, maintaining prev one character behind next:
# Add a space between Arabic/foreign mixes, and between words and punc
cleaned = ''
prev = letters[0]
for next in letters[1:]:
    cleaned += prev
    if test_lang(prev) != test_lang(next):
        cleaned += ' '
    if prev in punc or next in punc:
        cleaned += ' '
    prev = next
cleaned += next
Testing on a string of 10 million characters shows this is about twice the speed of the OP code. The "string concatenation is slow" complaint is obsolete, as others have pointed out. Running the test again using the ''.join(...) idiom shows a slightly slower execution than using string concatenation.
Further speedup may come through not calling the test_lang() function but by inlining some simple code. Can't comment as I don't really know what test_lang() does :).
Edit: removed a 'return' statement that should not have been there (testing remnant!).
Edit: Could also speedup by not calling test_lang() twice on the same character (on next in one loop and then prev in the following loop). Cache the test_lang(next) result.
