How to efficiently remove single letter words in Python


I want to remove single letter words such as a, i e, e g, b f f, y o l o, c y l using a Python function.
My current code looks as follows:
def remove_single_letters(concept):
    mystring = "valid"
    if len(concept) == 1:
        mystring = "invalid"
    if len(concept) > 1:
        validation = []
        splits = concept.split()
        for item in splits:
            if len(item) > 1:
                validation.append("valid")
        if not validation:  # no word longer than one letter was found
            mystring = "invalid"
    return mystring
print(remove_single_letters("b f f"))
It works fine. However, I am wondering if there is a more efficient way (one that takes less time) of doing it in Python.

Here is a single line solution:
def remove_single_letters(concept):
    return ["valid", "invalid"][concept.count(" ") >= len(concept) // 2]
Update: Note that this is shorter and cool looking but does not necessarily run faster.
Explanation:
concept.count(" "): Returns the number of spaces in string
>= len(concept) // 2: Returns True if the spaces make up at least half of the string, rounded down (this fails when legitimate words are separated by multiple spaces, as user202729 mentioned)
["valid", "invalid"][result]: This part is just for fun: it returns the first element if the result is False and the second element if the result is True (because False is equal to 0 and True is equal to 1).
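A quick check of the one-liner makes the explanation concrete, including the multiple-space weakness described above. The test strings are my own examples, not taken from the question.

```python
def remove_single_letters(concept):
    # A bool indexes the list: False -> 0 -> "valid", True -> 1 -> "invalid"
    return ["valid", "invalid"][concept.count(" ") >= len(concept) // 2]

print(remove_single_letters("b f f"))     # invalid: 2 spaces >= 5 // 2
print(remove_single_letters("machine"))   # valid: 0 spaces < 7 // 2
print(remove_single_letters("ab    cd"))  # invalid, although no word is one letter long
```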

I would go for a more concise solution (though not a faster one, as both solutions are O(n)) if you want to check whether any single-letter word exists in the string:
remove_single_letters = lambda concept:"invalid" if 1 in [len(item) for item in concept.split()] else "valid"
print(remove_single_letters("a b c"))
#prints invalid
An ordinary function would be like this:
def remove_single_letters(concept):
    return "invalid" if 1 in [len(item) for item in concept.split()] else "valid"
They both check the length of each element of the split input to find any item of length 1, and both are insensitive to multiple spaces thanks to split().
If you want to check the strings that are totally made up of single characters:
def remove_single_letters(concept):
    u = set(len(item) for item in concept.split())
    return "invalid" if len(u) == 1 and 1 in u else "valid"
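The "totally made up of single characters" check can also be sketched with all(), which stops at the first word longer than one letter instead of building a set. This is my variant, not the answer's code. The explicit words check is needed because all() of an empty sequence is True, so without it an empty string would be reported as invalid (the set version returns valid for it).

```python
def remove_single_letters(concept):
    words = concept.split()  # split() also collapses runs of whitespace
    # all() short-circuits on the first word that is longer than one character
    if words and all(len(word) == 1 for word in words):
        return "invalid"
    return "valid"

print(remove_single_letters("y o l o"))     # invalid
print(remove_single_letters("e commerce"))  # valid
```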

Related

Identifying a substring when the number of characters in between doesn't matter

def checkPattern(x, string):
    e = len(string)
    if len(x) < e:
        return False
    for i in range(e - 1):
        x = string[i]
        y = string[i + 1]
        last = x.rindex(x)
        first = x.index(y)
        if last == -1 or first == -1 or last > first:
            return False
    return True
if __name__ == "__main__":
    x = str(input())
    string = "hello"
    if checkPattern(x, string) is True:
        print('YES')
    if checkPattern(x, string) is False:
        print('NO')
So basically, the code is supposed to identify a substring when the number of characters between the substring's letters doesn't matter. string = "hello" is supposed to be the substring. While the characters in between don't matter, the order still does, so if I type "h.e.l.l.o", for example, it's a YES, but if it's something like "hlelo" it's a NO. I sort of copied the base of the code and I'm still a little new to Python, so sorry if the question and code aren't clear.

Assuming I understand correctly, so that the input hlelo gives NO and the input h.e..l.l.!o gives YES, the following code should work:
def checkPattern(x, string):
    assert x and string, "Error. Both inputs should be non-empty."
    count_idx = 0  # index of the next character of string to look for
    for letter in x:
        if letter == string[count_idx]:
            count_idx += 1  # increment to look for the next character
        # the pattern is found once the number of matches equals the length of string
        if count_idx == len(string):
            return True
    return False
if __name__ == "__main__":
    inp = input()
    string = "hello"
    if checkPattern(inp, string):
        print('YES')
    else:
        print('NO')
Explanation: Regardless of what else the input string x contains, you want to check whether each character of the search string hello appears in it in the correct order. My solution loops over x and counts how many of the characters h, e, l, l, o it has found so far, starting from 0. Once it finds a match for h, it moves on to look for e, and so on. If you search through all of x and the counter never reaches the length of the search string (i.e. you could not find all the hello characters), it returns False.
EDIT: Fixed a small bug in the way the return worked: it now returns True as soon as the counter reaches the length of the search string. Also added more examples given in the comments.
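The same in-order check (a subsequence test) can be written more compactly with a shared iterator. This is an alternative idiom, not the answer's code, and is_subsequence is a hypothetical name. The expression letter in it advances the iterator past the first match, so each letter of the pattern has to be found after the previous one.

```python
def is_subsequence(word: str, pattern: str) -> bool:
    it = iter(word)  # one iterator shared by every membership test below
    # `letter in it` consumes characters up to and including the first match,
    # so later letters can only match further to the right in word
    return all(letter in it for letter in pattern)

for word in ["h.e.l.l.o", "h.e..l.l.!o", "hlelo"]:
    print(word, "YES" if is_subsequence(word, "hello") else "NO")
```

For the three inputs this prints YES, YES and NO, matching the behaviour described in the question.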

Here is my solution to this problem:
pattern = "hello"
def patternCheck(word, pattern) -> bool:
    plist = list(pattern)
    wlist = list(word)
    for p in plist:
        if p in wlist:
            for _ in range(wlist.index(p), -1, -1):
                wlist.pop(_)
        else:
            return False
    return True
print(patternCheck("h.e.l.l.o", pattern))       # True
print(patternCheck("aalohel", pattern))         # False
print(patternCheck("hhhhheeelllooo", pattern))  # True
Explanation
First we convert our strings to lists:
plist = list(pattern)
wlist = list(word)
Then a for loop checks whether every element of the pattern list is in the word list:
for p in plist:
    if p in wlist:
If it is, we remove all the elements from index 0 up to and including the index of that element:
for _ in range(wlist.index(p), -1, -1):
    wlist.pop(_)
We remove the elements in decreasing order of their indices to protect ourselves from IndexError: pop index out of range.
If the for loop finishes normally, every pattern character was found in order, so we return True. If an element is not found in the word list, there is no match and we return False.
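Popping items from the front of a list shifts all the remaining items each time. A sketch of the same idea that mutates nothing, using str.find with a start offset instead (pattern_check is a hypothetical name, and this is my variant rather than the answer's code):

```python
def pattern_check(word: str, pattern: str) -> bool:
    pos = 0  # only look at characters after the previous match
    for p in pattern:
        idx = word.find(p, pos)  # -1 if p does not occur at or after pos
        if idx == -1:
            return False
        pos = idx + 1
    return True

print(pattern_check("h.e.l.l.o", "hello"))       # True
print(pattern_check("aalohel", "hello"))         # False
print(pattern_check("hhhhheeelllooo", "hello"))  # True
```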

How to check whether every lowercase 'g' in a string has another 'g' adjacent to it?

I need to write a program that returns True/False based on the following conditions:
A lowercase 'g' in a string is "happy" if there is another 'g' adjacent to it, such as "gg". Return True if all the g's in the given string are happy; otherwise, return False.
The desired output is:
string: xggt
output: True
string: abgsx
output: False
string: xxggygxx
output: False
I am a little lost with this question and am scratching my head trying to solve it. Would anyone be able to help me and explain how the desired result is achieved?
My current code is:
s=input("Enter string: ")
happy = true
sad = false
while s <='g':
    print ("Happy?",sad)
elif s <='gg':
    print("Happy?", happy)

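The re library is very helpful when looking for patterns in strings.
In this case, a g is unhappy exactly when the string contains one of the following patterns:
"g"
"...Xg"
"gX..."
"...XgX..."
where X is any character except 'g', "..." stands for the rest of the string, and a side without "..." is the start or end of the string.
So with the re library it would be: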
import re
s=input("Enter string: ")
res = not bool(re.search(r"(^g$|[^g]g$|^g[^g]|[^g]g[^g])", s))
print("Happy?", res)
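The four alternatives can be collapsed into a single pattern with lookarounds: (?<!g)g(?!g) matches a g that is neither preceded nor followed by another g, i.e. an unhappy g. This is a sketch of that variant, not the answer's code, and is_happy is a hypothetical name.

```python
import re

def is_happy(s: str) -> bool:
    # (?<!g): not preceded by g; (?!g): not followed by g -> an isolated g
    return re.search(r"(?<!g)g(?!g)", s) is None

for s in ["xggt", "abgsx", "xxggygxx"]:
    print(s, is_happy(s))
```

This prints True, False and False for the three sample strings, and a string with no g at all counts as happy.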

The classic way of doing this would be with a regex, but I'll show how to do it with itertools.groupby:
>>> from itertools import groupby
>>> def is_happy(s):
... """Returns whether all 'g's are next to at least one other 'g'."""
... return all(sum(1 for _ in x) > 1 for g, x in groupby(s, "g".__eq__) if g)
...
>>> is_happy("xggt")
True
>>> is_happy("abgsx")
False
>>> is_happy("xxggygxx")
False
Let's break down the individual pieces:
for g, x in groupby(s, "g".__eq__)
groupby turns an iterable (in this case s) into "groups" based on a grouping function.
g in this iteration is the result of "g".__eq__, i.e. whether the group consists of gs or not. x is the actual group (an iterator over one run of consecutive characters from s).
if g
The if g at the end means we skip any g, x pair where g is False (i.e. any group that isn't made of gs).
sum(1 for _ in x) > 1
This tells us whether the group has more than one element. If x were a string we could just use its len, but it's a _grouper which is a one-use iterable that doesn't implement __len__, so instead we use sum to iterate over it and count 1 for each character.
all(...)
tells us whether all of the elements in the iteration are true: are all the g groups longer than one element?
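To see what groupby actually produces, it helps to materialise the groups for one of the sample strings. Each group is joined into a string here only for display; the single-character (True, 'g') run is what makes is_happy return False.

```python
from itertools import groupby

s = "xxggygxx"
# Pair each "is this a run of g's?" flag with the characters of that run
runs = [(is_g, "".join(chars)) for is_g, chars in groupby(s, "g".__eq__)]
print(runs)
# [(False, 'xx'), (True, 'gg'), (False, 'y'), (True, 'g'), (False, 'xx')]
```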

Regular expressions can be a little hard to read and understand, although they can be more efficient. Here is an implementation without them: it searches for each letter g and then looks for a preceding or following g. If neither is found, the string is unhappy:
s = input("Enter string: ")
happy = True
for ctr, letter in enumerate(s):
    if letter == "g":
        if ctr > 0 and s[ctr - 1] == "g":
            pass
        elif ctr < (len(s) - 1) and s[ctr + 1] == "g":
            pass
        else:
            # unhappy: this g has no neighbouring g
            happy = False
            break
print(s, happy)

Here's a way of doing it by checking the index before and after every "g" in a word.
my_inputs = ["xggt", "abgsx", "xxggygxx"]
for _input in my_inputs:
    state = True  # assume happy until an isolated "g" is found
    for i, char in enumerate(_input):
        if char == "g":
            has_left = i > 0 and _input[i - 1] == "g"
            has_right = i < len(_input) - 1 and _input[i + 1] == "g"
            if not (has_left or has_right):
                state = False  # this "g" has no neighbouring "g"
                break
    print(f"string: {_input}\noutput: {state}\n")
output:
string: xggt
output: True
string: abgsx
output: False
string: xxggygxx
output: False

Remove string character after run of n characters in string

Suppose you have a given string and an integer, n. Every time a character appears in the string more than n times in a row, you want to remove some of the characters so that it only appears n times in a row. For example, for the case n = 2, we would want the string 'aaabccdddd' to become 'aabccdd'. I have written this crude function that runs without errors but doesn't quite get me what I want:
def strcut(string, n):
    for i in range(len(string)):
        for j in range(n):
            if i + j < len(string)-(n-1):
                if string[i] == string[i+j]:
                    beg = string[:i]
                    ends = string[i+1:]
                    string = beg + ends
    print(string)
These are the outputs for strcut('aaabccdddd', n):
| n | output | expected |
|---|---|---|
| 1 | 'abcdd' | 'abcd' |
| 2 | 'acdd' | 'aabccdd' |
| 3 | 'acddd' | 'aaabccddd' |
I am new to Python, but I am pretty sure that my error is in line 3, 4 or 5 of my function. Does anyone have any suggestions, or know of any methods that would make this easier?

This may not answer why your code does not work, but here's an alternate solution using regex:
import re
def strcut(string, n):
    return re.sub(fr"(.)\1{{{n-1},}}", r"\1"*n, string)
How it works: the formatted pattern is "(.)\1{n-1,}", so if n=3 the pattern becomes "(.)\1{2,}":
(.) is a capture group that matches any single character
\1 matches the first capture group
{2,} matches the previous token 2 or more times
The replacement string is the first capture group repeated n times
For example, take str = "aaaab" and n = 3. The first "a" is the capture group (.). The next three characters, "aaa", match \1{2,} (in this example, a{2,}). So the whole thing matches "a" + "aaa" = "aaaa", which is replaced with "aaa".
regex101 can explain it better than me.
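Running the function on the string from the question for n = 1, 2 and 3 reproduces the expected outputs listed there:

```python
import re

def strcut(string, n):
    # (.) captures one character and \1{n-1,} requires at least n-1 more copies,
    # so every run of at least n copies is replaced by exactly n copies
    return re.sub(fr"(.)\1{{{n-1},}}", r"\1" * n, string)

for n in (1, 2, 3):
    print(n, strcut("aaabccdddd", n))
# 1 abcd
# 2 aabccdd
# 3 aaabccddd
```

One caveat: . does not match a newline unless the re.DOTALL flag is passed, so runs of newlines are left untouched.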

You can implement a stack data structure.
The idea is to push each new character onto the stack. If it is the same as the previous character on the stack, increase the counter and push it only if the counter is still within the limit n. If it differs from the previous character, push it and reset the counter to 1.
def func(string, n):
    stack = []
    counter = None
    for i in string:
        if not stack:
            counter = 1
            stack.append(i)
        elif stack[-1]==i:
            if counter+1<=n:
                stack.append(i)
                counter+=1
        elif stack[-1]!=i:
            stack.append(i)
            counter = 1
    return ''.join(stack)
print(func('aaabbcdaaacccdsdsccddssse', 2)=='aabbcdaaccdsdsccddsse')
print(func('aaabccdddd',1 )=='abcd')
print(func('aaabccdddd',2 )=='aabccdd')
print(func('aaabccdddd',3 )=='aaabccddd')
output
True
True
True
True

The method I would use is to create a new empty string at the start of the function and then, every time a run of the same character exceeds n, simply not add the extra characters to the output string. This is computationally efficient because it makes a single pass over the input, so it is O(n):
def strcut(string, n):
    new_string = ""
    first_c, s = "", 0  # current character and the length of its run so far
    for c in string:
        if c != first_c:  # a new run starts
            first_c, s = c, 0
        s += 1
        if s > n:
            continue  # drop characters beyond the n-th in a row
        new_string += c
    return new_string
print(strcut("aabcaaabbba", 2))  # output: aabcaabba

To answer the question simply:
appears in the string more than n times in a row
the following code is small and simple, and will work fine :-)
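def strcut(string: str, n: int) -> str:
    tmp = "*" * (n+1)  # placeholder prefix so the first comparison has something to look at
    for char in string:
        # only append if the last n characters are not already n copies of char
        if tmp[len(tmp) - n:] != char * n:
            tmp += char
    return tmp[n+1:]  # strip the placeholder prefix
print(strcut("aaabccdddd", 1))
print(strcut("aaabccdddd", 2))
print(strcut("aaabccdddd", 3))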
Output:
abcd
aabccdd
aaabccddd
Notes:
The character "*" in the line tmp = "*" * (n+1) can be any character that does not appear in the string; it is just a placeholder to handle the start, when no characters have been added yet.
The slice tmp[n+1:] at the end simply removes the "*" characters added at the beginning.

You don't need nested loops. Keep track of the current character and its count: include a character when the count is less than or equal to n, and reset the current character and count when it changes.
def strcut(s, n):
    result = ''  # resulting string
    char, count = '', 0  # initial character and count
    for c in s:  # only loop once over the characters
        if c == char: count += 1  # increase count
        else: char, count = c, 1  # reset character/count
        if count <= n: result += c  # include character if count is ok
    return result
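The same single pass can also be expressed with itertools.groupby, which splits the string into runs of identical characters, and itertools.islice, which keeps at most the first n items of each run. This is an alternative sketch rather than part of the answer above; building the result with join also avoids repeated string concatenation.

```python
from itertools import groupby, islice

def strcut(s: str, n: int) -> str:
    # groupby yields (char, run) pairs for each run of identical characters;
    # islice(run, n) keeps at most the first n characters of that run
    return "".join("".join(islice(run, n)) for _, run in groupby(s))

print(strcut("aaabccdddd", 2))   # aabccdd
print(strcut("aabcaaabbba", 2))  # aabcaabba
```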

Just to give some ideas, here is a different approach. I didn't like how n was iterating each time: even if I was on i=3 and n=2, I still jumped to i=4 even though I had already checked that character while going through n. And since you are checking the next n characters in the string, your method doesn't fit with keeping the strings in order. Here is a rough method that I find easier to read.
def strcut(string, n):
    for i in range(len(string)-1, 0, -1):  # I go backwards assuming you want to keep the front characters
        # string[i] is redundant if it ends a run of n+1 identical characters
        # (string.count would also count non-adjacent occurrences)
        if i >= n and string[i-n:i+1] == string[i] * (n+1):
            string = remove(string, i)
    print(string)
def remove(string, i):
    return string[:i] + string[i+1:]  # drop the character at index i
strcut('aaabccdddd',2)

How to make a for loop iterate through each item in a string with an if statement?

I'm trying to make a function that takes in a string from a user and then outputs the same string. However, for each letter in an even position it outputs the corresponding lowercase letter, and for each letter in an odd position it outputs the corresponding uppercase letter. Keep in mind that only one word will be passed through it at a time.
I've tried to create a for loop with an if statement nested within it, but so far, the for loop stops after iterating through the first letter. My code is below:
def converter(string):
    for letters in string:
        if len(letters) % 2 == 0:
            return letters.lower()
        elif len(letters)% 2 != 0:
            return letters.upper()
When I run the code:
converter('app')
The output I get is 'A'
The expected output should be 'aPp'

The first thing you need to know is that in Python, strings are immutable. So "modifying" a string means you have to build a new string from scratch (here, I call it newstring).
Second, you are misunderstanding the loop. You are saying for letters in string. This loop iterates over each letter of the string, so on the first iteration letters is just the first letter of the string. Since a single letter always has length 1, which is odd, you convert it to upper case and return it immediately; you never reach the rest of the letters! What you actually need is each letter's position, which enumerate gives you alongside the letter. In the code below, I also change the variable name to the singular letter to make this clear.
This amends all of those problems:
def converter(string):
    newstring = ""
    for i, letter in enumerate(string):
        if i % 2 == 0:
            newstring += letter.lower()
        elif i % 2 != 0:
            newstring += letter.upper()
    return newstring
This can be boiled down to a nice list comprehension:
def converter(string):
    return "".join([letter.lower() if i % 2 == 0 else letter.upper()
                    for i, letter in enumerate(string)])

In [1]: def converter(string):
...: return ''.join([j.upper() if i % 2 == 1 else j.lower() for i, j in enumerate(string)])
In [2]: converter('apple')
Out[2]: 'aPpLe'

''.join([s.lower() if c % 2 == 0 else s.upper() for c, s in enumerate('apple')])
# returns 'aPpLe'
The condition comes first, then the iteration over the string using the good old enumerate built-in.

Count vowels from raw input

I have a homework question which asks to read a string through raw input and count how many vowels are in the string. This is what I have so far but I have encountered a problem:
def vowels():
    vowels = ["a","e","i","o","u"]
    count = 0
    string = raw_input ("Enter a string: ")
    for i in range(0, len(string)):
        if string[i] == vowels[i]:
            count = count+1
    print count
vowels()
It counts vowels, but because of if string[i] == vowels[i]:, it will only count each vowel once, as i keeps increasing in the range. How can I change this code to check the inputted string for vowels without running into this problem?

in operator
You probably want to use the in operator instead of the == operator - the in operator lets you check to see if a particular item is in a sequence/set.
1 in [1,2,3] # True
1 in [2,3,4] # False
'a' in ['a','e','i','o','u'] # True
'a' in 'aeiou' # Also True
Some other comments:
Sets
The in operator is most efficient when used with a set, which is a data type specifically designed to be quick for "is item X part of this set of items" kind of operations.*
vowels = set(['a','e','i','o','u'])
*dicts are also efficient with in, which checks to see if a key exists in the dict.
Iterating on strings
A string is a sequence type in Python, which means that you don't need to go to all of the effort of getting the length and then using indices - you can just iterate over the string and you'll get each character in turn:
E.g.:
for character in my_string:
    if character in vowels:
        # ...
Initializing a set with a string
Above, you may have noticed that creating a set with pre-set values (at least in Python 2.x) involves using a list. This is because the set() type constructor takes a sequence of items. You may also notice that in the previous section, I mentioned that strings are sequences in Python - sequences of characters.
What this means is that if you want a set of characters, you can actually just pass a string of those characters to the set() constructor; you don't need a list of single-character strings. In other words, the following two lines are equivalent:
set_from_string = set('aeiou')
set_from_list = set(['a','e','i','o','u'])
Neat, huh? :) Do note, however, that this can also bite you if you're trying to make a set of strings, rather than a set of characters. For instance, the following two lines are not the same:
set_with_one_string = set(['cat'])
set_with_three_characters = set('cat')
The former is a set with one element:
'cat' in set_with_one_string # True
'c' in set_with_one_string # False
Whereas the latter is a set with three elements (each one a character):
'c' in set_with_three_characters # True
'cat' in set_with_three_characters # False
Case sensitivity
Comparing characters is case sensitive. 'a' == 'A' is False, as is 'A' in 'aeiou'. To get around this, you can transform your input to match the case of what you're comparing against:
lowercase_string = input_string.lower()
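Putting these points together (membership with in, a set of vowels built from a string, iterating directly over the characters, and lowercasing first) gives something like the following sketch. It is written for Python 3, and count_vowels is a hypothetical name.

```python
def count_vowels(text: str) -> int:
    vowels = set("aeiou")            # set of single characters built from a string
    count = 0
    for character in text.lower():  # lowercase first so 'A' matches 'a'
        if character in vowels:
            count += 1
    return count

print(count_vowels("The lazy DOG jumped Over"))  # 7
```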

You can simplify this code:
def vowels():
    vowels = 'aeiou'
    count = 0
    string = raw_input ("Enter a string: ")
    for i in string:
        if i in vowels:
            count += 1
    print count
Strings are iterable in Python.

for i in range(0, len(string)):
    if string[i] == vowels[i]:
This actually has a subtler problem than only counting each vowel once: it only tests whether the first letter of the string is exactly a, whether the second is exactly e, and so on, until you get past the fifth. It will then try to test string[5] == vowels[5], which raises an IndexError.
You don't want to use i to index into vowels; you want a nested loop with a second index that makes sense for vowels, e.g.:
for i in range(len(string)):
    for j in range(len(vowels)):
        if string[i] == vowels[j]:
            count += 1
This can be simplified further by realising that, in Python, you very rarely want to iterate over the indexes of a sequence: the for loop already knows how to iterate over everything you could access as string[0], string[1] and so on, giving:
for s in string:
    for v in vowels:
        if s == v:
            count += 1
The inner loop can be simplified using the in operation on lists - it does exactly the same thing as this code, but it keeps your code's logic at a higher level (what you want to do vs. how to do it):
for s in string:
    if s in vowels:
        count += 1
Now, it turns out that Python lets you do math with booleans (which is what s in vowels gives you) and ints: True behaves as 1 and False as 0, so True + True + False is 2. This leads to a one-liner using a generator expression and sum:
sum(s in vowels for s in string)
Which reads as 'for every character in string, count how many are in vowels'.

You can use filter for a one-liner (this relies on Python 2, where filter applied to a string returns a string):
print len(filter(lambda ch:ch.lower() in "aeiou","This is a String"))

Here's a more condensed version using sum with a generator:
def vowels():
    string = raw_input("Enter a string: ")
    print sum(1 for x in string if x.lower() in 'aeiou')
vowels()
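The code in the question and in several answers is Python 2. In Python 3, raw_input was renamed input, print is a function, and filter returns an iterator, so len(filter(...)) no longer works. A Python 3 sketch of the sum-with-generator version, with the counting split into a hypothetical count_vowels helper so it can be tried without typing input:

```python
def count_vowels(text: str) -> int:
    # each membership test is a bool, and True sums as 1, False as 0
    return sum(ch in "aeiou" for ch in text.lower())

def vowels() -> None:
    string = input("Enter a string: ")  # Python 3's replacement for raw_input
    print(count_vowels(string))         # print is a function in Python 3

print(count_vowels("This is a String"))  # 4
```

Call vowels() to prompt for a string interactively.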

A variation on the theme:
Mystring = "The lazy DOG jumped Over"
Usestring = ""
count = 0
for i in Mystring:
    if i.lower() in 'aeiou':
        count += 1
        Usestring += '^'
    else:
        Usestring += ' '
print(Mystring + '\n' + Usestring)
print('Vowels =', count)
The lazy DOG jumped Over
  ^  ^    ^   ^  ^  ^ ^
Vowels = 7
