Python: how to replace a substring with a number of its occurrences?


Let's say I have a string presented in the following fashion:
st = 'abbbccccaaaAAbccc'
The task is to encode it so that each run of identical characters is replaced by the character followed by the number of times it repeats:
st = 'a1b3c4a3A2b1c3'
I know one possible solution but it's too bulky and primitive.
s = str(input())
l = len(s)-1
c = 1
t = ''
if len(s)==1:
    t = t +s+str(c)
else:
    for i in range(0,l):
        if s[i]==s[i+1]:
            c +=1
        elif s[i]!=s[i+1]:
            t = t + s[i]+str(c)
            c = 1
    for j in range(l,l+1):
        if s[-1]==s[-2]:
            t = t +s[j]+str(c)
        elif s[-1]!=s[-2]:
            t = t +s[j]+str(c)
            c = 1
print(t)
Is there a shorter, more elegant way to solve this?
P.S. I'm an inexperienced Python user and a new Stack Overflow member, so I beg your pardon if the question is asked incorrectly.

Take advantage of the standard library:
from itertools import groupby
st = "abbbccccaaaAAbccc"
print("".join("{}{}".format(key, len(list(group))) for key, group in groupby(st)))
Output:
a1b3c4a3A2b1c3
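The sketch below shows what groupby passes to the join by splitting the work into two steps. The helper names runs and run_length_encode are hypothetical and not part of the answer. Counting with sum(1 for _ in group) avoids turning each group into a list. Note that groupby compares characters exactly, so 'a' and 'A' form separate runs.

```python
from itertools import groupby
from typing import List, Tuple


def runs(text: str) -> List[Tuple[str, int]]:
    """Return (character, run length) for each run of identical adjacent characters."""
    # groupby yields (key, group) where group is an iterator over one run;
    # counting its items gives the run length without building a list.
    return [(char, sum(1 for _ in group)) for char, group in groupby(text)]


def run_length_encode(text: str) -> str:
    """Write each run as the character followed by its length."""
    return "".join(f"{char}{length}" for char, length in runs(text))


print(runs("abbbcccc"))                        # [('a', 1), ('b', 3), ('c', 4)]
print(run_length_encode("abbbccccaaaAAbccc"))  # a1b3c4a3A2b1c3
```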

Just loop through and count. There are more graceful snippets, but this will get the job done and is clear:
st = 'abbbccccaaaAAbccc'
count = 1
char = st[0]
new_st = []
for c in st[1:]:
    if c == char:
        count += 1
    else:
        new_st.append(char + str(count))
        char = c
        count = 1
new_st.append(char + str(count))
s2 = "".join(new_st)
print(s2) # a1b3c4a3A2b1c3
If you want a fancy recursive solution:
def compose(s):
    if not s:
        return ""
    count = 1
    for char in s[1:]:
        if s[0] != char:
            break
        count += 1
    return s[0] + str(count) + compose(s[count:])
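compose() makes one recursive call per run. In CPython the default recursion limit is 1000 (see sys.getrecursionlimit()), so a string with roughly a thousand or more runs raises RecursionError. Each s[count:] slice also copies the rest of the string. The sketch below does the same scan iteratively. The name compose_iterative is hypothetical.

```python
def compose_iterative(s):
    # Same scan as compose(), but an index moves forward instead of recursing
    parts = []
    start = 0
    while start < len(s):
        count = 1
        # Extend the run while the next character equals the run's first character
        while start + count < len(s) and s[start + count] == s[start]:
            count += 1
        parts.append(s[start] + str(count))
        start += count
    return "".join(parts)


print(compose_iterative("abbbccccaaaAAbccc"))  # a1b3c4a3A2b1c3
print(len(compose_iterative("ab" * 5000)))     # 20000, with no RecursionError
```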

Related

How can I get the count of irregularly repeating characters?

The input is xyz = 'aaabbbaaa' and I want the output 3a3b3a.
xyz = 'aaabbbaaa'
p = xyz[0]
i = 0
out = {}
while i < len(xyz):
    if p == xyz[i]:
        if xyz[i] not in out:
            out[xyz[i]] = []
        out[xyz[i]].append(xyz[i])
    else:
        p = xyz[i]
    i += 1
print(out)
How can I achieve this?

This is probably the simplest method and the easiest to understand.
Create a tally variable and increment it while you see repeating characters. When you see a different character, append the tally and the previous character to the output string and reset the tally to 1. Repeat until the string ends.
xyz = 'aaabbbaaa'
tally = 1
string = ''
prev = xyz[0]
for char in xyz[1:]:
    if char == prev:
        tally += 1
    else:
        string += str(tally) + prev
        prev = char
        tally = 1
string += str(tally) + prev
print(string) # 3a3b3a
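itertools.groupby can produce the same count-then-character format. It also handles an empty string, where the tally loop raises IndexError on xyz[0]. The name count_then_char is hypothetical.

```python
from itertools import groupby


def count_then_char(text):
    # For each run of equal adjacent characters, emit "<run length><character>"
    return "".join(f"{sum(1 for _ in group)}{char}" for char, group in groupby(text))


print(count_then_char("aaabbbaaa"))  # 3a3b3a
```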

What result do you expect if the string contains single characters? Suppose a single character should just appear without a count (the := operator below requires Python 3.8 or later):
>>> from re import sub
>>> s = 'aabbbcaaabc'
>>> sub(r'(\w)\1*', lambda m: f"{l if (l:=len(m[0]))>1 else ''}{m[1]}", s)
'2a3bc3abc'
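The question's own format (count first, every run counted) can use the same re.sub idea without the walrus operator. This sketch assumes that every character should be encoded, so it uses . with re.DOTALL. With \w, punctuation and whitespace are left unmatched and copied through unchanged. The name encode_runs is hypothetical.

```python
import re


def encode_runs(text):
    # (.)\1* matches one character followed by any further copies of it;
    # m[0] is the whole run and m[1] is the single repeated character.
    # re.DOTALL lets "." match newlines as well.
    return re.sub(r"(.)\1*", lambda m: f"{len(m[0])}{m[1]}", text, flags=re.DOTALL)


print(encode_runs("aaabbbaaa"))  # 3a3b3a
```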

Longest substring without repeating characters in python

This is a pretty standard interview question: find the length of the longest substring without repeating characters. Here are the test cases:
abcabcbb -> 3
bbbbb -> 1
pwwkew -> 3
bpfbhmipx -> 7
tmmzuxt -> 5
Here's my code, which uses a simple two-pointer approach:
def lengthOfLongestSubstring(s):
    checklist = {}
    starting_index_of_current_substring = 0
    length_of_longest_substring = 0
    for i, v in enumerate(s):
        if v in checklist:
            starting_index_of_current_substring = checklist[v] + 1
        else:
            length_of_current_substring = i - starting_index_of_current_substring + 1
            length_of_longest_substring = max(length_of_current_substring, length_of_longest_substring)
        checklist[v] = i
    return length_of_longest_substring
My code passes all the test cases except the last one (actual 4, expected 5). Can someone help me modify the code to handle the last test case? I don't want to reinvent the algorithm.

Here is a simple tweak to your two-pointer code that finds the longest substring without repeating characters.
There are two changes. First, instead of calculating the length of the longest substring only when v is not in checklist, I calculate it on every iteration. Second, the start index only ever moves forward (max(...)). This means a stale entry in checklist, such as the first 't' in 'tmmzuxt', cannot move the window backwards.
def lengthOfLongestSubstring(s):
    checklist = {}
    starting_index_of_current_substring = 0
    length_of_longest_substring = 0
    for i, v in enumerate(s):
        if v in checklist:
            starting_index_of_current_substring = max(starting_index_of_current_substring, checklist[v] + 1)
        checklist[v] = i
        length_of_longest_substring = max(length_of_longest_substring, i - starting_index_of_current_substring + 1)
    return length_of_longest_substring
## Main
result = {}
for string in ['abcabcbb', 'bbbbb', 'ppwwkew', 'wcabcdeghi', 'bpfbhmipx', 'tmmzuxt', 'AABGAKGIMN', 'stackoverflow']:
    result[string] = lengthOfLongestSubstring(string)
print(result)
Sample run:
{'abcabcbb': 3, 'bbbbb': 1, 'ppwwkew': 3, 'wcabcdeghi': 8, 'bpfbhmipx': 7, 'tmmzuxt': 5, 'AABGAKGIMN': 6, 'stackoverflow': 11}
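One quick way to gain confidence in fixes like these is to compare them with an obviously correct brute-force version on many random strings. The function names, the four-letter alphabet and the sample sizes below are arbitrary choices.

```python
import random


def longest_unique_window(s):
    """Sliding window: start only ever moves forward."""
    last_seen = {}
    start = 0
    best = 0
    for i, ch in enumerate(s):
        # Only a previous copy inside the current window forces start forward
        if last_seen.get(ch, -1) >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best


def longest_unique_brute_force(s):
    """O(n^2) reference: grow a repeat-free substring from every start index."""
    best = 0
    for start in range(len(s)):
        seen = set()
        for ch in s[start:]:
            if ch in seen:
                break
            seen.add(ch)
        best = max(best, len(seen))
    return best


random.seed(0)  # fixed seed so the check is reproducible
for _ in range(2000):
    # A small alphabet makes repeated characters (and stale entries) common
    sample = "".join(random.choice("abcd") for _ in range(random.randint(0, 12)))
    assert longest_unique_window(sample) == longest_unique_brute_force(sample), sample
print("all random checks passed")
```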

This post is pretty old, but I think my solution fixes the bug in the original code:
def lengthOfLongestSubstring(s):
    checklist = {}
    starting_index_of_current_substring = 0
    length_of_longest_substring = 0
    for i, v in enumerate(s):
        if v in checklist:
            if checklist[v] >= starting_index_of_current_substring:
                starting_index_of_current_substring = checklist[v] + 1
        length_of_current_substring = i - starting_index_of_current_substring + 1
        length_of_longest_substring = max(length_of_current_substring, length_of_longest_substring)
        checklist[v] = i
    return length_of_longest_substring

This doesn't really build on your solution, but it's a simpler approach, just to give you an idea of how else it could be solved:
def longest_substr(s):
    longest = 0
    for start_index in range(len(s)):
        contains = set()
        for letter in s[start_index:]:
            if letter in contains:
                break
            contains.add(letter)
        longest = max(longest, len(contains))
    return longest

I would prefer this solution, optimised for both time and space:
def lengthOfLongestSubstring(self, s: str) -> int:  # written as a class method; self is unused
    curlen = maxlen = 0  # curlen: length of the longest repeat-free substring ending at the current character
    for i, num in enumerate(s):
        curlen -= s[i-curlen:i].find(num)
        maxlen = max(maxlen, curlen)
    return maxlen
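The one-line update relies on str.find returning -1 when the character is absent from the current window. Here is a standalone sketch of the same method without self (length_of_longest_substring is a hypothetical name), with the reasoning written out as comments:

```python
def length_of_longest_substring(s: str) -> int:
    # curlen: length of the longest repeat-free substring ending at index i
    curlen = maxlen = 0
    for i, ch in enumerate(s):
        # The current window is s[i - curlen:i]. If ch is not in it, find() returns -1
        # and curlen grows by 1. If ch sits at offset p in the window, the new window
        # starts just after that copy, so its length becomes curlen - p.
        curlen -= s[i - curlen:i].find(ch)
        maxlen = max(maxlen, curlen)
    return maxlen


for case in ["abcabcbb", "bbbbb", "pwwkew", "bpfbhmipx", "tmmzuxt"]:
    print(case, length_of_longest_substring(case))
```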

Find the longest substring without repeating characters:
def find_non_repeating_substring(input_str):
    output_length = 0
    longest_sub_str = ''
    # Try every starting index and grow a substring until a character repeats
    for index in range(len(input_str)):
        l_str = ''
        for i in range(index, len(input_str)):
            if input_str[i] not in l_str:
                l_str = l_str + input_str[i]
            else:
                break
        sub_str_length = len(l_str)
        if sub_str_length > output_length:
            output_length = sub_str_length
            longest_sub_str = l_str
    return output_length, longest_sub_str

if __name__ == '__main__':
    input_str = input("Please enter the string: ")
    sub_str_length, sub_str = find_non_repeating_substring(input_str)
    print('Longest substring length is "{0}" and the substring is "{1}"'.format(sub_str_length, sub_str))

Python: Letters to numbers not working correctly

So I have this project for school and I am close to finishing it, but there is one part I just can't seem to get working properly. One of the functions I have is:
vowels = "aeiou"
consonants = "bcdfghjklmnpqrstvwyz"
def alphapinDecode(tone):
    s = tone.lower()
    pin = ''
    for ch in s:
        if ch in consonants:
            idx = consonants.find(ch)
        elif ch in vowels:
            idx2 = vowels.find(ch)
            pin = str(pin) + str(idx*5 + idx2)
    print(pin)
    return None
For the most part the function runs exactly how I want it to: I pass in a string and it prints the corresponding numbers as a string.
For example:
>>> alphapinDecode('bomelela')
3464140
But when I do this one:
>>> alphapinDecode('bomeluco')
it prints 346448 instead of 3464408, which is what it is supposed to produce (according to my assignment). I know the function gives the correct answer for the code as written, but what am I missing to make it include the 0 before the 8?
EDIT:
The function is supposed to take the string you pass (tone) and break it into 2-letter chunks (consonant/vowel pairs). For each pair, it looks up the consonant's index in consonants and the vowel's index in vowels and combines them into a number. For example, alphapinDecode('hi') gives 27 because consonants.find('h') gives idx = 5 and vowels.find('i') gives idx2 = 2, and 5*5 + 2 = 27.

I think your lecturer is trying to test students' coding adaptability.
If you really want to achieve output like that, try the following:
vowels = "aeiou"
consonants = "bcdfghjklmnpqrstvwyz"
def alphapinDecode(tone):
    s = tone.lower()
    pin = ''
    for ch in s:
        if ch in consonants:
            idx = consonants.find(ch)
        elif ch in vowels:
            idx2 = vowels.find(ch)
            # Zero-pad each pair to two digits; '%02d' % value does the same in Python 2 and 3
            num = '{:02d}'.format((idx*5) + idx2)
            pin = pin + num
    print(int(pin))
    return None
alphapinDecode('bomeluco') # 3464408
alphapinDecode('bomelela') # 3464140

Your approach is perhaps awkward. I would iterate two characters at a time:
vowels = "aeiou"
consonants = "bcdfghjklmnpqrstvwyz"
def alphapinDecode(tone):
    s = tone.lower()
    pin = ''
    # Step over the string two characters at a time
    for i in range(0, len(s), 2):
        ch1 = s[i]
        ch2 = s[i+1]
        if ch1 in consonants and ch2 in vowels:
            idx1 = consonants.find(ch1)
            idx2 = vowels.find(ch2)
            this_pair = idx1*5 + idx2
            # For debugging
            print(this_pair)
            pin = pin + str(this_pair)
    # We need to print without leading zeroes
    print(int(pin))
    # Returning the pin as an integer is better, IMO
    return int(pin)
OK, now that the code looks a bit better, you can see, I hope, that for the co pair in your second test input the value is 1*5 + 3, which of course equals 8, but you really want 08. There are a number of ways to do this, but since you're a beginner I'll illustrate the easiest one:
this_pair = idx1*5 + idx2
if this_pair < 10:
    # If the pair is less than ten, prepend a leading zero
    this_pair_pin = '0' + str(this_pair)
else:
    this_pair_pin = str(this_pair)
pin = pin + this_pair_pin
EDIT: Let's forget about accumulating the answer in a string, since we can simplify the code:
pin = 0
#...
this_pair = idx1*5 + idx2
# Each pair contributes exactly two decimal digits, so shift the pin left by two digits
pin = pin * 100 + this_pair
print(pin)
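Here is a minimal sketch of the integer version from the EDIT, put together. It assumes the input always alternates consonant and vowel and raises ValueError otherwise; that validation is an addition, not part of the assignment as described. alphapin_decode, VOWELS and CONSONANTS are hypothetical names. The largest pair value is 19*5 + 4 = 99, so every pair fits in two decimal digits, which is what makes pin * 100 + pair safe.

```python
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwyz"


def alphapin_decode(tone):
    """Decode consonant/vowel pairs into an integer PIN, two digits per pair."""
    s = tone.lower()
    if len(s) % 2:
        raise ValueError("expected an even number of letters (consonant/vowel pairs)")
    pin = 0
    for i in range(0, len(s), 2):
        consonant, vowel = s[i], s[i + 1]
        if consonant not in CONSONANTS or vowel not in VOWELS:
            raise ValueError(f"{consonant + vowel!r} is not a consonant/vowel pair")
        pair = CONSONANTS.index(consonant) * 5 + VOWELS.index(vowel)
        # Shifting by two decimal digits keeps a pair such as 8 as "08"
        pin = pin * 100 + pair
    return pin


print(alphapin_decode("bomeluco"))  # 3464408
```

Only a leading zero in the very first pair disappears, because the result is an integer. That matches the expected 3464140 for 'bomelela'.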

Python: Converting word to list of letters, then returning indexes of the letters against lower case alphabet

I have already completed the task, but only in its most basic form. I'm looking for help shortening it so that it works for any word, not just one with eight letters. Here's what I've got so far (a bit long for what it does):
alpha = list(map(chr, range(97, 123)))
word = "computer"
word_list = list(word)
one = word[0]
two = word[1]
three = word[2]
four = word[3]
five = word[4]
six = word[5]
seven = word[6]
eight = word[7]
one_index = str(alpha.index(one))
two_index = str(alpha.index(two))
three_index = str(alpha.index(three))
four_index = str(alpha.index(four))
five_index = str(alpha.index(five))
six_index = str(alpha.index(six))
seven_index = str(alpha.index(seven))
eight_index = str(alpha.index(eight))
print (one + "=" + one_index)
print (two + "=" + two_index)
print (three + "=" + three_index)
print (four + "=" + four_index)
print (five + "=" + five_index)
print (six + "=" + six_index)
print (seven + "=" + seven_index)
print (eight + "=" + eight_index)

What you are probably looking for is a for loop.
Using a for loop, your code could look like this:
word = "computer"
for letter in word:
    index = ord(letter)-97
    if (index<0) or (index>25):
        print ("'{}' is not in the lowercase alphabet.".format(letter))
    else:
        print ("{}={}".format(letter, str(index+1))) # +1 to make a=1
If you use
for letter in word:
    #code
the indented code will be executed for every letter in the word (or for every element of word, if word is a list, for example).
A good place to start learning more about loops is here: https://en.wikibooks.org/wiki/Python_Programming/Loops
You can find plenty of resources on the internet covering this topic.

Use a for loop:
alpha = list(map(chr, range(97, 123)))
word = "computer"
for l in word:
    print('{} = {}'.format(l, alpha.index(l.lower())))
Result:
c = 2
o = 14
m = 12
p = 15
u = 20
t = 19
e = 4
r = 17

Start with a dict that maps each letter to its number (string.ascii_lowercase works in Python 2 and 3; string.lowercase exists only in Python 2):
import string
d = dict((c, ord(c)-ord('a')) for c in string.ascii_lowercase)
Then pair each letter of your string with the appropriate index:
result = [(c, d[c]) for c in word]
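The same mapping can be written as a dict comprehension over enumerate, which avoids the ord() arithmetic. letter_index is a hypothetical name. A character outside a-z, such as an uppercase letter, would raise KeyError.

```python
import string

# Map each lowercase letter to its 0-based position: a -> 0, b -> 1, ..., z -> 25
letter_index = {letter: i for i, letter in enumerate(string.ascii_lowercase)}

word = "computer"
pairs = [(letter, letter_index[letter]) for letter in word]
print(pairs)  # [('c', 2), ('o', 14), ('m', 12), ('p', 15), ('u', 20), ('t', 19), ('e', 4), ('r', 17)]
```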

Thanks for the help. I managed to solve it myself in a different way, using a function and a while loop. It's not as short, but it works for all lowercase words:
alpha = list(map(chr, range (97,123)))
word = "computer"
count = 0
y = 0
def indexfinder (number):
    o = word[number]
    i = str(alpha.index(o))
    print (o + "=" + i)
while count < len(word):
    count = count + 1
    indexfinder (y)
    y = y+1

Create a compress function in Python?

I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.
Ex:
>>> compress("ddaaaff")
'd2a3f2'
def compress(s):
    count=0
    for i in range(0,len(s)):
        if s[i] == s[i-1]:
            count += 1
        c = s.count(s[i])
    return str(s[i]) + str(c)

Here is a short python implementation of a compression function:
def compress(string):
    res = ""
    count = 1
    #Add in first character
    res += string[0]
    #Iterate through loop, skipping last one
    for i in range(len(string)-1):
        if(string[i] == string[i+1]):
            count+=1
        else:
            if(count > 1):
                #Ignore if no repeats
                res += str(count)
            res += string[i+1]
            count = 1
    #Add the count of the last run
    if(count > 1):
        res += str(count)
    return res
Here are a few examples:
>>> compress("ddaaaff")
'd2a3f2'
>>> compress("daaaafffyy")
'da4f3y2'
>>> compress("mississippi")
'mis2is2ip2i'

Short version with generators:
from itertools import groupby
import re
def compress(string):
    return re.sub(r'(?<![0-9])[1](?![0-9])', '', ''.join('%s%s' % (char, sum(1 for _ in group)) for char, group in groupby(string)))
(1) Grouping by character with groupby(string)
(2) Counting the length of each group with sum(1 for _ in group) (a group is an iterator, so len() cannot be applied to it directly)
(3) Joining into the proper format
(4) Removing the count 1 for single characters, i.e. every '1' with no digit immediately before or after it
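The regex in step (4) cannot tell a count of 1 from a literal digit in the input. For example, 'ab1' becomes 'ab111'. The sketch below instead skips the count while formatting each group, so there is nothing to remove afterwards.

```python
from itertools import groupby


def compress(string):
    # Emit the count only for runs longer than 1, so no '1's need stripping later
    parts = []
    for char, group in groupby(string):
        length = sum(1 for _ in group)
        parts.append(char if length == 1 else f"{char}{length}")
    return "".join(parts)


print(compress("ddaaaff"))  # d2a3f2
```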

There are several reasons why this doesn't work. You really need to try debugging this yourself first. Put in a few print statements to trace the execution. For instance:
def compress(s):
    count=0
    for i in range(0, len(s)):
        print("Checking character", i, s[i])
        if s[i] == s[i-1]:
            count += 1
        c = s.count(s[i])
        print("Found", s[i], c, "times")
    return str(s[i]) + str(c)
print(compress("ddaaaff"))
Here's the output:
Checking character 0 d
Found d 2 times
Checking character 1 d
Found d 2 times
Checking character 2 a
Found a 3 times
Checking character 3 a
Found a 3 times
Checking character 4 a
Found a 3 times
Checking character 5 f
Found f 2 times
Checking character 6 f
Found f 2 times
f2
(1) You throw away the results of all but the last letter's search.
(2) You count all occurrences, not merely the consecutive ones.
(3) You cast a string to a string -- redundant.
Try working through this example with pencil and paper. Write down the steps you use, as a human being, to parse the string. Work on translating those to Python.

x="mississippi"
res = ""
count = 0
while (len(x) > 0):
    count = 1
    res= ""
    for j in range(1, len(x)):
        if x[0]==x[j]:
            count= count + 1
        else:
            res = res + x[j]
    print(x[0], count, end=" ")
    x=res

Here is another simple way to do it:
def compress(str1):
    output = ''
    initial = str1[0]
    output = output + initial
    count = 1
    for item in str1[1:]:
        if item == initial:
            count = count + 1
        else:
            if count == 1:
                count = ''
            output = output + str(count)
            count = 1
            initial = item
            output = output + item
    # Append the count of the final run if it repeats
    if count > 1:
        output = output + str(count)
    print (output)
This gives the required output, for example:
>>> compress("aaaaaaaccddddeehhyiiiuuo")
a7c2d4e2h2yi3u2o
>>> compress("lllhhjuuuirrdtt")
l3h2ju3ir2dt2
>>> compress("mississippi")
mis2is2ip2i

from collections import Counter
def string_compression(string):
    counter = Counter(string)
    result = ''
    for k, v in counter.items():
        result = result + k + str(v)
    print(result)
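Counter counts every occurrence of a character, not consecutive runs. Its result therefore differs from run-length encoding whenever a character reappears later in the string. Here is a minimal comparison of the two on such a string:

```python
from collections import Counter
from itertools import groupby

text = "mississippi"

# Counter tallies every occurrence of each character, wherever it appears
totals = "".join(f"{char}{n}" for char, n in Counter(text).items())

# groupby only merges characters that are adjacent, i.e. runs
runs = "".join(f"{char}{sum(1 for _ in group)}" for char, group in groupby(text))

print(totals)  # m1i4s4p2
print(runs)    # m1i1s2i1s2i1p2i1
```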

text = "mississippi"
count = 1
for i in range(1, len(text) + 1):
    if i == len(text):
        print(text[i - 1] + str(count), end="")
        break
    else:
        if text[i - 1] == text[i]:
            count += 1
        else:
            print(text[i - 1] + str(count), end="")
            count = 1
Output: m1i1s2i1s2i1p2i1

s=input("Enter the string:")
temp={}
result=""
for x in s:
    if x in temp:
        temp[x]=temp[x]+1
    else:
        temp[x]=1
for key,value in temp.items():
    result+=str(key)+str(value)
print(result)

Here is something I wrote.
def stringCompression(str1):
    counter=0
    prevChar = str1[0]
    str2=""
    charChanged = False
    loopCounter = 0
    for char in str1:
        if(char==prevChar):
            counter+=1
            charChanged = False
        else:
            str2 += prevChar + str(counter)
            counter=1
            prevChar = char
            if(loopCounter == len(str1) - 1):
                str2 += prevChar + str(counter)
                charChanged = True
        loopCounter+=1
    if(not charChanged):
        str2+= prevChar + str(counter)
    return str2
Not the best code, I guess, but it works well.
a -> a1
aaabbbccc -> a3b3c3

This is one solution to the problem, but keep in mind that this method only pays off when there is a lot of repetition, specifically when consecutive characters repeat. Otherwise, the output is longer than the input.
e.g.,
AABCD --> A2B1C1D1
BcDG ---> B1c1D1G1
def compress_string(s):
    result = [""] * len(s)
    visited = None
    index = -1  # becomes 0 when the first character is seen
    count = 1
    for c in s:
        if c == visited:
            count += 1
            result[index] = f"{c}{count}"
        else:
            count = 1
            index += 1
            result[index] = f"{c}{count}"
        visited = c
    return "".join(result)
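Because of that, a common refinement is to fall back to the original string whenever the encoding is not shorter. The sketch below uses itertools.groupby rather than the index bookkeeping above. compress_if_shorter is a hypothetical name.

```python
from itertools import groupby


def compress_if_shorter(s):
    # Always emit char + count, as compress_string does, then keep the shorter string
    encoded = "".join(f"{char}{sum(1 for _ in group)}" for char, group in groupby(s))
    return encoded if len(encoded) < len(s) else s


print(compress_if_shorter("AABCD"))     # AABCD
print(compress_if_shorter("aaaaaaab"))  # a7b1
```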

You can simply achieve that by:
gstr="aaabbccccdddee"
last=gstr[0]
count=0
rstr=""
for i in gstr:
    if i==last:
        count=count+1
    elif i!=last:
        rstr=rstr+last+str(count)
        count=1
        last=i
rstr=rstr+last+str(count)
print ("Required string for given string {} after conversion is {}.".format(gstr,rstr))

Here is a Python implementation that lowercases the input and counts every occurrence of each character (not just consecutive ones):
def compress(word):
    list1=[]
    for i in range(len(word)):
        list1.append(word[i].lower())
    num=0
    dict1={}
    for i in range(len(list1)):
        if(list1[i] in list(dict1.keys())):
            dict1[list1[i]]=dict1[list1[i]]+1
        else:
            dict1[list1[i]]=1
    s=list(dict1.keys())
    v=list(dict1.values())
    word=''
    for i in range(len(s)):
        word=word+s[i]+str(v[i])
    return word
d=compress('xxcccdex')
print(d)

The logic below does not depend on any particular data structure or on groupby, sets, or other compression helpers. It ignores case and also counts repeated characters that are not adjacent:
def fstrComp_1(stng):
    sRes = ""
    cont = 1
    for i in range(len(stng)):
        if not stng[i] in sRes:
            stng = stng.lower()
            n = stng.count(stng[i])
            if n > 1:
                cont = n
                sRes += stng[i] + str(cont)
            else:
                sRes += stng[i]
    print(sRes)
fstrComp_1("aB*b?cC&")

I wanted to do it by partitioning the string.
So aabbcc would become: ['aa', 'bb', 'cc']
This is how I did it:
def compression(string):
    # Creating a partitioned list
    alist = list(string)
    master = []
    n = len(alist)
    for i in range(n):
        # i > 0 stops alist[-1] (the last character) from being compared with the first one
        if i > 0 and alist[i] == alist[i-1]:
            master[-1] += alist[i]
        else:
            master.append(alist[i])
    # Adding the partitions together in a new string
    newString = ""
    for i in master:
        newString += i[0] + str(len(i))
    # If the newString is longer than the old string, return old string (you've not
    # compressed it in length)
    if len(newString) > n:
        return string
    return newString
string = 'aabbcc'
print(compression(string))

string = 'aabccccd'
# expected output: '2a1b4c1d'
new_string = ""
count = 1
for i in range(len(string)-1):
    if string[i] == string[i+1]:
        count = count + 1
    else:
        new_string = new_string + str(count) + string[i]
        count = 1
new_string = new_string + str(count) + string[-1]
print(new_string)

For a coding interview, where the focus was the algorithm rather than my knowledge of Python, its internal representation of data structures, or the time complexity of operations such as string concatenation:
def compress(message: str) -> str:
    output = ""
    length = 0
    previous: str = None
    for char in message:
        if previous is None or char == previous:
            length += 1
        else:
            output += previous
            if length > 1:
                output += str(length)
            length = 1
        previous = char
    if previous is not None:
        output += previous
        if length > 1:
            output += str(length)
    return output
For code I'd actually use in production (not reinventing any wheels, more testable, using iterators until the last step for space efficiency, and using join() instead of string concatenation for time efficiency):
from itertools import groupby
from typing import Iterator
def compressed_groups(message: str) -> Iterator[str]:
    for char, group in groupby(message):
        length = sum(1 for _ in group)
        yield char + (str(length) if length > 1 else "")
def compress(message: str) -> str:
    return "".join(compressed_groups(message))
Taking things a step further, for even more testability:
from itertools import groupby
from typing import Iterator
from collections import namedtuple
class Segment(namedtuple('Segment', ['char', 'length'])):
    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")
def segments(message: str) -> Iterator[Segment]:
    for char, group in groupby(message):
        yield Segment(char, sum(1 for _ in group))
def compress(message: str) -> str:
    return "".join(str(s) for s in segments(message))
Going all-out and providing a Value Object CompressedString:
from itertools import groupby
from typing import Iterator
from collections import namedtuple
class Segment(namedtuple('Segment', ['char', 'length'])):
    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")
class CompressedString(str):
    @classmethod
    def compress(cls, message: str) -> "CompressedString":
        return cls("".join(str(s) for s in cls._segments(message)))
    @staticmethod
    def _segments(message: str) -> Iterator[Segment]:
        for char, group in groupby(message):
            yield Segment(char, sum(1 for _ in group))
def compress(message: str) -> str:
    return CompressedString.compress(message)

def compress(val):
    print(len(val))
    end=0
    count=1
    result=""
    for i in range(0,len(val)-1):
        #print(val[i],val[i+1])
        if val[i]==val[i+1]:
            count=count+1
            #print(count,val[i])
        elif val[i]!=val[i+1]:
            #print(end,i)
            result=result+val[end]+str(count)
            end=i+1
            count=1
    result=result+val[-1]+str(count)
    return result
res=compress("I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.")
print(len(res))

Use the re module from Python's standard library:
def compress(string):
    import re
    p=r'(\w+?)\1+' # non-greedy: group 1 is the shortest unit that repeats
    sub_str=string
    for m in re.finditer(p,string):
        num=m[0].count(m[1])
        sub_str=re.sub(m[0],f'{m[1]}{num}',sub_str)
    return sub_str
string='aaaaaaaabbbbbbbbbcccccccckkkkkkkkkkkppp'
string2='ababcdcd'
string3='abcdabcd'
string4='ababcdabcdefabcdcd'
print(compress(string))
print(compress(string2))
print(compress(string3))
print(compress(string4))
Result:
a8b9c8k11p3
ab2cd2
abcd2
ab2cdabcdefabcd2

Using itertools.groupby with a generator expression:
from itertools import groupby
text = "aaaddddffwwqqaattttttteeeeeee"
print(''.join(char + str(len(list(group))) for char, group in groupby(text)))

def compress(string):
    # taking out unique characters from the string
    unique_chars = []
    for c in string:
        if not c in unique_chars:
            unique_chars.append(c)
    # Now count the characters
    res = ""
    for i in range(len(unique_chars)):
        count = string.count(unique_chars[i])
        res += unique_chars[i]+str(count)
    return res
string = 'aabccccd'
print(compress(string))

from collections import Counter
def char_count(input_str):
    my_dict = Counter(input_str)
    print(my_dict)
    output_str = ""
    for i in input_str:
        if i not in output_str:
            output_str += i
            output_str += str(my_dict[i])
    return output_str
result = char_count("zddaaaffccc")
print(result)

This is a modification of Patrick Yu's code. His version omits the count for single characters, so it does not produce the expected output for the test cases below.
SAMPLE INPUT:
c
aaaaaaaaaabcdefgh
EXPECTED OUTPUT:
c1
a10b1c1d1e1f1g1h1
OUTPUT OF Patrick's code:
c
a10bcdefgh
Below is the modified code:
def Compress(S):
    Ans = S[0]
    count = 1
    for i in range(len(S)-1):
        if S[i] == S[i+1]:
            count += 1
        else:
            if count >= 1:
                Ans += str(count)
            Ans += S[i+1]
            count = 1
    if count>=1:
        Ans += str(count)
    return Ans
The only change is the condition that compares count with 1: greater than (">") becomes greater than or equal to (">="). Since count is always at least 1, the count is now appended for every run.
