Deleting adjacent repeating characters in a string

Deleting adjacent repeating characters in a string - python

I want to delete adjacent repeating characters in a string in Python 3. For ex if the input is AABBC the output is ABC or if the input is AAABBBCC the output is ABC. I made two attempts to solve this problem.
Attempt #1
string = input()
for i in range (len(string) - 1):
if string[i] == string[i+1]:
string.replace(string[i],"")
print(string)
The above code returns the same string that is entered. If I enter AABBC it simply returns the same string. Not knowing what I was doing wrong, I tried another attempt.
Attempt #2
string = input()
new = []
for i in string:
new.append(i)
for i in range (len(new) - 3):
"""in the above line, if I set the range to (len(new)-2), it gives me an
error saying "list index out of range"."""
if new[i] == new[i+1]:
new.pop(i)
print(new)
The above code works for double repeating characters, but fails when there are 3 or more repeating characters. If I input AABBC it returns the list ['A','B','C'] which is perfectly fine, but with the input AAABBCC it returns ['A', 'A', 'B', 'C'].

Using Regex:
import re
s = ["AABBC", "AAABBBCC"]
for i in s:
print( re.sub(r"(.)\1+", r"\1", i) )
Or:
s = ["AABBC", "AAABBBCC"]
for i in s:
temp = []
for j in i:
if not temp:
temp.append(j)
else:
if temp[-1] != j:
temp.append(j)
print("".join(temp))
Output:
ABC
ABC

You can use itertools to group the characters like,
>>> import itertools
>>> [x[0] for x in itertools.groupby('AABBCC')]
['A', 'B', 'C']

string = 'AAABBBCC'
result = ''
for letter in string:
if len(result) > 0:
if result[-1] != letter: # [-1] is the last letter
result += letter
else:
result = letter # the first letter will always be included
print(result) # ABC
That is, only append the letter if it is not already at the end of the result string.

An easy to understand short solution :
mot = 'AABBBCC'
a = [mot[0]] + [mot[i] if mot[i]!=mot[i-1] else '' for i in range(1, len(mot))]
>>> ['A', '', 'B', '', '', 'C', '']
result = ''
for lettre in a:
result += lettre
result
>>> 'ABC'
You first create a list of the letters which respects a certain condition, then you convert this list into a string. This algorithm can be used for many different conditions.
Note that you don't need to import any new library.

Related

Need help for p-language deciphering

I don't know if you're familiar with the P-language or if it's something that's just known in my country. Basically, everytime you come across a vowel in a word, you replace the vowel the same vowel + p + the vowel again.
So 'home' would be 'hopomepe' in the p-language.
Now I'm tasked to decipher the p-language and turn sentences that are written in the p-language back to normal.
p = str(input())
for letter in range(1, len(p)):
if p[letter]=='.':
break
if p[letter-1]==p[letter+1] and p[letter]=='p':
p = p[:letter-1] + p[letter+1:]
print(p)
This is my code so far, it works except I don't know how to make it work for double vowel sounds like 'io' in scorpion (scoporpiopion) for example.
Also when a sentence starts with a vowel, this code doesn't work on that vowel.
For example 'Apan epelepephapant' becomes 'Apan elephant' with my code.
And my code crashes with string index out of bounds when it doesn't end on '.' but it crashes everytime when I don't have that if for the '.' case.
TLDR; How can I get change my code so it works for double vowels and at the start of my sentence.
EDIT: To clarify, like in my example, combination vowels should count as 1 vowel. Scorpion would be Scoporpiopion instead of Scoporpipiopon, boat would be boapoat, boot would be boopoot, ...

You can do it using regular expressions:
import re
def decodePLanguage(p):
return re.subn(r'([aeiou]+)p\1', r'\1', p, flags=re.IGNORECASE)[0]
In [1]: decodePLanguage('Apan epelepephapant')
Out[1]: 'An elephant'
In [2]: decodePLanguage('scoporpiopion')
Out[2]: 'scorpion'
This uses re.subn function to replace all regex matches.
In r'([aeiou]+)p\1', the [aeiou]+ part matches several vowels in a row, and \1 ensures you have the same combination after a p.
Then r'\1' is used to replace the whole match with the first vowel group.

EDIT: working code
def decipher(p):
result = ''
while len(p) > 2:
# first strip out all the consecutive consonants each iteration
idx = 0
while p[idx].lower() not in 'aeiou' and idx < len(p) - 2:
idx += 1
result += p[:idx]
p = p[idx:]
# if there is any string remaining to process, that starts with a vowel
if len(p) > 2:
idx = 0
# scan forward until 'p'
while p[idx].lower() != 'p':
idx += 1
# sanity check
if len(p) < (idx*2 + 1) or p[:idx].lower() != p[idx+1:2*idx+1].lower():
raise ValueError
result += p[:idx]
p = p[2*idx+1:]
result += p
return result
In your example input 'Apan epelepephapant', you compare 'A' == 'a' and get False. It seems you want to compare 'a' == 'a', that is, the str.lower() of each.
It also seems you don't check if the character before the p and after the p is a vowel; that is, if you come across the string hph, as written, your function deciphers it to simply h.
Earlier version of code below:
def decipher(p):
while len(p) > 2:
if p[0].lower() in 'aeiou' and p[0].lower() == p[2].lower() and p[1] == 'p':
result += p[0]
p = p[3:]
else:
result += p[0]
p = p[1:]
result += p
return result
called as e.g.
p = str(input())
print(decipher(p))

Since #Kolmar already gave a regex solution, I'm going to add one without regex
To help think through this, I'm going to first show you my solution to encode a regular string into p-language. In this approach, I group the characters in the string by whether or not they are vowels using itertools.groupby(). This function groups consecutive elements having the same key in the same group.
def p_encode(s):
vowels = {'a', 'e', 'i', 'o', 'u'}
s_groups = [(k, list(v)) for k, v in itertools.groupby(s, lambda c: c.lower() in vowels)]
# For scorpion, this will look like this:
# [(False, ['s', 'c']),
# (True, ['o']),
# (False, ['r', 'p']),
# (True, ['i', 'o']),
# (False, ['n'])]
p_output = []
# Now, we go over each group and do the encoding for the vowels.
for is_vowel_group, group_chars in s_groups:
p_output.extend(group_chars) # Add these chars to the output
if is_vowel_group: # Special treatment for vowel groups
p_output.append("p")
p_output.extend(c.lower() for c in group_chars)
return "".join(p_output)
I added a list comprehension to define s_groups to show you how it works. You can skip the list comprehension and directly iterate for is_vowel_group, group_chars in itertools.groupby(s, lambda c: c.lower() in vowels)
Now, to decode this, we can do something similar in reverse, but this time manually do the grouping because we need to process ps differently when they're in the middle of a vowel group.
I suggest you don't modify the string as you're iterating over it. At best, you'll have written some code that's hard to understand. At worst, you'll have bugs because the loop will try to iterate over more indices than actually exist.
Also, you iterate over 1..len(p), and then try to access p[i+1]. In the last iteration this will throw an IndexError. And because you want repeated vowels to count as a single group, this doesn't work. You're going to have to group the vowels and non-vowels separately, and then join them into a single string.
def p_decode(p):
vowels = {'a', 'e', 'i', 'o', 'u'}
p_groups = []
current_group = None
for c in p:
if current_group is not None:
# If the 'vowelness' of the current group is the same as this character
# or ( the current group is a vowel group
# and the current character is a 'p'
# and the current group doesn't contain a 'p' already )
if (c.lower() in vowels) == current_group[0] or \
( current_group[0] and
c.lower() == 'p' and
'p' not in current_group[1]):
current_group[1].append(c) # Add c to the current group
else:
current_group = None # Reset the current group to None so you can make it later
if current_group is None:
current_group = (c.lower() in vowels, [c]) # Make the current group
p_groups.append(current_group) # Append it to the list
# For scorpion => scoporpiopion
# p_groups looks like this:
# [(False, ['s', 'c']),
# (True, ['o', 'p', 'o']),
# (False, ['r', 'p']),
# (True, ['i', 'o', 'p', 'i', 'o']),
# (False, ['n'])]
p_output = []
for is_vowel_group, group_chars in p_groups:
if is_vowel_group:
h1 = group_chars[:len(group_chars)//2] # First half of the group
h2 = group_chars[-len(group_chars)//2+1:] # Second half of the group, excluding the p
# Add the first half to the output
p_output.extend(h1)
if h1 != h2:
# The second half of this group is not repeated characters
# so something in the input was wrong!
raise ValueError(f"Invalid input '{p}' to p_decode(): vowels before and after 'p' are not the same in group '{''.join(group_chars)}'")
else:
# Add all chars in non-vowel groups to the output
p_output.extend(group_chars)
return "".join(p_output)
And now, we have:
words = ["An elephant", "scorpion", "boat", "boot", "Hello World", "stupid"]
for w in words:
p = p_encode(w)
d = p_decode(p)
print(w, p, d, sep=" | ")
Which gives (prettification mine):
Word
Encoded
Decoded
An elephant
Apan epelepephapant
An elephant
scorpion
scoporpiopion
scorpion
boat
boapoat
boat
boot
boopoot
boot
Hello World
Hepellopo Woporld
Hello World
stupid
stupupipid
stupid
Also, words that aren't actually encoded correctly (such as "stupid") throw a ValueError
>>> p_decode("stupid")
ValueError: Invalid input 'stupid' to p_decode(): vowels before and after 'p' are not the same in group 'upi'

multiplying letter of string by digits of number

I want to multiply letter of string by digits of number. For example for a word "number" and number "123"
output would be "nuummmbeerrr". How do I create a function that does this? My code is not usefull, because it doesn't work.
I have only this
def new_word(s):
b=""
for i in range(len(s)):
if i % 2 == 0:
b = b + s[i] * int(s[i+1])
return b
for new_word('a3n5z1') output is aaannnnnz .

Using list comprehension and without itertools:
number = 123
word = "number"
new_word = "".join([character*n for (n, character) in zip(([int(c) for c in str(number)]*len(str(number)))[0:len(word)], word)])
print(new_word)
# > 'nuummmbeerrr'
What it does (with more details) is the following:
number = 123
word = "number"
# the first trick is to link each character in the word to the number that we want
# for this, we multiply the number as a string and split it so that we get a list...
# ... with length equal to the length of the word
numbers_to_characters = ([int(c) for c in str(number)]*len(str(number)))[0:len(word)]
print(numbers_to_characters)
# > [1, 2, 3, 1, 2, 3]
# then, we initialize an empty list to contain the repeated characters of the new word
repeated_characters_as_list = []
# we loop over each number in numbers_to_letters and each character in the word
for (n, character) in zip(numbers_to_characters, word):
repeated_characters_as_list.append(character*n)
print(repeated_characters_as_list)
# > ['n', 'uu', 'mmm', 'b', 'ee', 'rrr']
new_word = "".join(repeated_characters_as_list)
print(new_word)
# > 'nuummmbeerrr'

This will solve your issue, feel free to modify it to fit your needs.
from itertools import cycle
numbers = cycle("123")
word = "number"
output = []
for letter in word:
output += [letter for _ in range(int(next(numbers)))]
string_output = ''.join(output)
EDIT:
Since you're a beginner This will be easier to understand for you, even though I suggest reading up on the itertools module since its the right tool for this kind of stuff.
number = "123"
word = "number"
output = []
i = 0
for letter in word:
if(i == len(number)):
i = 0
output += [letter for _ in range(int(number[i]))]
i += 1
string_output = ''.join(output)
print(string_output)

you can use zip to match each digit to its respective char in the word (using itertools.cycle for the case the word is longer), then just multiply the char by that digit, and finally join to a single string.
try this:
from itertools import cycle
word = "number"
number = 123
number_digits = [int(d) for d in str(number)]
result = "".join(letter*num for letter,num in zip(word,cycle(number_digits)))
print(result)
Output:
nuummmbeerrr

Python: Case Sensitive Counter

I'm making a small script that needs to take any string as an argument and determine if there are any duplicates (letters or numbers). However, it needs to treat uppercase and lowercase as different entities. What I got so far is:
import collections
string = str(input('Enter Sequence: '))
x = list(string)
a = [item for item, count in collections.Counter(x).items() if count > 1]
if len(a) == 0:
return True
else:
return False
This yields correct results only if there is no both upper and lower case instances of the same letter, so it wont work if I enter i.e. 'moOse'
If anyone can help on how to make different case letters count separately, I'd appreciate it.
Thank you

Unable to replicate your problem - Counter is case sensitive.
Demo:
import collections
for s in ["aa","aAbBcC"]:
x = list(s)
a = [item for item, count in collections.Counter(x).items() if count > 1]
if len(a) == 0:
print(x, "has no dupes")
else:
print(x, "has dupes:", a)
Output:
['a', 'a'] has dupes: ['a']
['a', 'A', 'b', 'B', 'c', 'C'] has no dupes
No need to use/import Counter to test if you only have unique elements. Compare len(set(data)) against len(data):
def is_unique(d):
return len(set(d)) == len(d)
for d in ["qwertzui4567QWERTZUI","AA"]:
print(f"{d} :" ,'is Unique' if is_unique(d) else 'has Duplicates')
Output:
qwertzui4567QWERTZUI : is Unique
AA : has Duplicates

How to write my own split function without using .split and .strip function?

How to write my own split function? I just think I should remove spaces, '\t' and '\n'. But because of the shortage of knowledge, I have no idea of doing this question
Here is the original question:
Write a function split(string) that returns a list of words in the
given string. Words may be separated by one or more spaces ' ' , tabs
'\t' or newline characters '\n' .
And there are examples:
words = split('duff_beer 4.00') # ['duff_beer', '4.00']
words = split('a b c\n') # ['a', 'b', 'c']
words = split('\tx y \n z ') # ['x', 'y', 'z']
Restrictions: Don't use the str.split method! Don't use the str.strip method

Some of the comments on your question provide really interesting ideas to solve the problem with the given restrictions.
But assuming you should not use any python builtin split function, here is another solution:
def split(string, delimiters=' \t\n'):
result = []
word = ''
for c in string:
if c not in delimiters:
word += c
elif word:
result.append(word)
word = ''
if word:
result.append(word)
return result
Example output:
>>> split('duff_beer 4.00')
['duff_beer', '4.00']
>>> split('a b c\n')
['a', 'b', 'c']
>>> split('\tx y \n z ')
['x', 'y', 'z']

I think using regular expressions is your best option as well.
I would try something like this:
import re
def split(string):
return re.findall('\S+',string)
This should return a list of all none whitespace characters in your string.
Example output:
>>> split('duff_beer 4.00')
['duff_beer', '4.00']
>>> split('a b c\n')
['a', 'b', 'c']
>>> split('\tx y \n z ')
['x', 'y', 'z']

This is what you can do with assigning a list, This is tested on python3.6
Below is Just an example..
values = 'This is a sentence'
split_values = []
tmp = ''
for words in values:
if words == ' ':
split_values.append(tmp)
tmp = ''
else:
tmp += words
if tmp:
split_values.append(tmp)
print(split_values)
Desired output:
$ ./splt.py
['This', 'is', 'a', 'sentence']

You can use the following function that sticks to the basics, as your professor apparently prefers:
def split(s):
output = []
delimiters = {' ', '\t', '\n'}
delimiter_found = False
for c in s:
if c in delimiters:
delimiter_found = True
elif output:
if delimiter_found:
output.append('')
delimiter_found = False
output[-1] += c
else:
output.append(c)
return output
so that:
print(split('duff_beer 4.00'))
print(split('a b c\n'))
print(split('\tx y \n z '))
would output:
['duff_beer', '4.00']
['a', 'b', 'c']
['x', 'y', 'z']

One approach would be to iterate over every char until you find a seperator, built a string from that chars and append it to the outputlist like this:
def split(input_str):
out_list = []
word = ""
for c in input_str:
if c not in ("\t\n "):
word += c
else:
out_list.append(word)
word = ""
out_list.append(word)
return out_list
a = "please\nsplit\tme now"
print(split(a))
# will print: ['please', 'split', 'me', 'now']
Another thing you could do is by using regex:
import re
def split(input_str):
out_list = []
for m in re.finditer('\S+', input_str):
out_list.append(m.group(0))
return out_list
a = "please\nsplit\tme now"
print(split(a))
# will print: ['please', 'split', 'me', 'now']
The regex \S+ is looking for any sequence of non whitespace characters and the function re.finditer returns an iterator with MatchObject instances over all non-overlapping matches for the regex pattern.

Please find my solution, it is not the best one, but it works:
def convert_list_to_string(b):
localstring=""
for i in b:
localstring+=i
return localstring
def convert_string_to_list(b):
locallist=[]
for i in b:
locallist.append(i)
return locallist
def mysplit(inputString, separator):
listFromInputString=convert_string_to_list(inputString)
part=[]
result=[]
j=0
for i in range(0, len(listFromInputString)):
if listFromInputString[i]==separator:
part=listFromInputString[j:i]
j=i+1
result.append(convert_to_string(part))
else:
pass
if j != 0:
result.append(convert_to_string(listFromInputString[j:]))
if len(result)==0:
result.append(inputString)
return result
Test:
mysplit("deesdfedefddfssd", 'd')
Result: ['', 'ees', 'fe', 'ef', '', 'fss', '']

Some of your solutions are very good, but it seems to me that there are more alternative options than using the function:
values = 'This is a sentence'
split_values = []
tmp = ''
for words in values:
if words == ' ':
split_values.append(tmp)
tmp = ''
else:
tmp += words
if tmp:
split_values.append(tmp)
print(split_values)

a is string and s is pattern here.
a="Tapas Pall Tapas TPal TapP al Pala"
s="Tapas"
def fun(a,s):
st=""
l=len(s)
li=[]
lii=[]
for i in range(0,len(a)):
if a[i:i+l]!=s:
st=st+a[i]
elif i+l>len(a):
st=st+a[i]
else:
li.append(st)
i=i+l
st=""
li.append(st)
lii.append(li[0])
for i in li[1:]:
lii.append(i[l-1:])
return lii
print(fun(a,s))
print(a.split(s))

This handles for whitespaces in strings and returns empty lists if present
def mysplit(strng):
#
# put your code here
#
result = []
words = ''
for char in strng:
if char != ' ':
words += char
else:
if words:
result.append(words)
words = ''
result.append(words)
for item in result:
if item == '':
result.remove(item)
return result
print(mysplit("To be or not to be, that is the question"))
print(mysplit("To be or not to be,that is the question"))
print(mysplit(" "))
print(mysplit(" abc "))
print(mysplit(""))

def mysplit(strng):
my_string = ''
liste = []
for x in range(len(strng)):
my_string += "".join(strng[x])
if strng[x] == ' ' or x+1 == len(strng):
liste.append(my_string.strip())
my_string = ''
liste = [elem for elem in liste if elem!='']
return liste

It is always a good idea to provide algorithm before coding:
This is the procedure for splitting words on delimiters without using any python built in method or function:
Initialize an empty list [] called result which will be used to save the resulting list of words, and an empty string called word = "" which will be used to concatenate each block of string.
Keep adding string characters as long as the delimiter is not reached
When you reach the delimiter, and len(word) = 0, Don't do whatever is below. Just go to the next iteration. This will help detecting and removing leading spaces.
When you reach the delimiter, and len(word) != 0, Append word to result, reinitialize word and jump to the next iteration without doing whatever is below
Return result
def my_split(s, delimiter = [" ","\t"]):
result,word = [], "" # Step 0
N = len(s)
for i in range(N) : #
if N == 0:# Case of empty string
return result
else: # Non empty string
if s[i] in delimiter and len(word) == 0: # Step 2
continue # Step 2: Skip, jump to the next iteration
if s[i] in delimiter and len(word) != 0: # Step 3
result.append(word) # Step 3
word = "" # Step 3
continue # Step 3: Skip, jump to the next iteration
word = word + s[i] # Step 1.
return result
print(my_split(" how are you? please split me now! "))

All the above answers are good, there is a similar solution with an extra empty list.
def my_split(s):
l1 = []
l2 = []
word = ''
spaces = ['', '\t', ' ']
for letters in s:
if letters != ' ':
word += letters
else:
l1.append(word)
word = ''
if word:
l1.append(word)
for words in l1:
if words not in spaces:
l2.append(words)
return l2
my_string = ' The old fox jumps into the deep river'
y = my_split(my_string)
print(y)

Using a For Loop to Change Words in Strings to List Items

I am trying to use a for loop to find every word in a string that contains exactly one letter e.
My guess is that I need to use a for loop to first separate each word in the string into its own list (for example, this is a string into ['this'], ['is'], ['a'], ['string'])
Then, I can use another For Loop to check each word/list.
My string is stored in the variable joke.
I'm having trouble structuring my For Loop to make each word into its own list. Any suggestions?
j2 = []
for s in joke:
if s[0] in j2:
j2[s[0]] = joke.split()
else:
j2[s[0]] = s[0]
print(j2)

This is a classic case for list comprehensions. To generate a list of words containing exactly one letter 'e', you would use the following source.
words = [w for w in joke.split() if w.count('e') == 1]

For finding words with exactly one letter 'e', use regex
import re
mywords = re.match("(\s)*[e](\s)*", 'this is your e string e')
print(mywords)

I would use Counter:
from collections import Counter
joke = "A string with some words they contain letters"
j2 = []
for w in joke.split():
d = Counter(w)
if 'e' in d.keys():
if d['e'] == 1:
j2.append(w)
print(j2)
This results in:
['some', 'they']

A different way to do it using numpy which is all against for:
s = 'Where is my chocolate pizza'
s_np = np.array(s.split())
result = s_np[np.core.defchararray.count(s_np, 'e').astype(bool)]

This is one way:
mystr = 'this is a test string'
[i for i in mystr.split() if sum(k=='e' for k in i) == 1]
# test
If you need an explicit loop:
result = []
for i in mystr:
if sum(k=='e' for k in i) == 1:
result.append(i)

sentence = "The cow jumped over the moon."
new_str = sentence.split()
count = 0
for i in new_str:
if 'e' in i:
count+=1
print(i)
print(count)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Deleting adjacent repeating characters in a string - python

Using Regex: import re s = ["AABBC", "AAABBBCC"] for i in s: print( re.sub(r"(.)\1+", r"\1", i) ) Or: s = ["AABBC", "AAABBBCC"] for i in s: temp = [] for j in i: if not temp: temp.append(j) else: if temp[-1] != j: temp.append(j) print("".join(temp)) Output: ABC ABC

You can use itertools to group the characters like, >>> import itertools >>> [x[0] for x in itertools.groupby('AABBCC')] ['A', 'B', 'C']

Related

Need help for p-language deciphering

multiplying letter of string by digits of number

Python: Case Sensitive Counter

How to write my own split function without using .split and .strip function?

Using a For Loop to Change Words in Strings to List Items

Categories

Resources