Replace in string based on function output - python

So, for input:
accessibility,random good bye
I want output:
a11y,r4m g2d bye
So, basically, I have to abbreviate all words of length greater than or equal to 4 in the following format: first_letter + length_of_all_letters_in_between + last_letter
I tried this:
re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", r"\1" + str(len(r"\2")) + r"\3", s)
But it does not work. In JS, I would easily do:
str.replace(/([A-Za-z])([A-Za-z]{2,})([A-Za-z])/g, function(m, $1, $2, $3){
    return $1 + $2.length + $3;
});
How do I do the same in Python?
EDIT: I cannot afford to lose any punctuation present in original string.

What you are doing in JavaScript is certainly right, you are passing an anonymous function. What you do in Python is to pass a constant expression ("\12\3", since len(r"\2") is evaluated before the function call), it is not a function that can be evaluated for each match!
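To make the point above concrete, here is a small sketch (the variable names template and failed are only illustrative) showing that the replacement argument is already a fixed string before re.sub runs, and that the string is not even a valid template for this pattern:

```python
import re

s = "accessibility,random good bye"
pattern = r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])"

# len(r"\2") measures the two-character literal (backslash + "2"),
# so the template is built once, before re.sub ever sees a match.
template = r"\1" + str(len(r"\2")) + r"\3"
print(repr(template))  # '\\12\\3'

try:
    re.sub(pattern, template, s)
    failed = False
except re.error as exc:
    # "\12" is read as a reference to group 12, which the pattern lacks.
    failed = True
    print("re.sub rejected the template:", exc)
```

Passing a callable instead defers the work until each match is found, which is what the JavaScript version does.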
While anonymous functions in Python aren't quite as useful as they are in JS, they do the job here:
>>> import re
>>> re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", lambda m: "{}{}{}".format(m.group(1), len(m.group(2)), m.group(3)), "accessability, random good bye")
'a11y, r4m g2d bye'
What happens here is that the lambda is called for each substitution, taking a match object. I then retrieve the needed information and build a substitution string from that.

The issue you're running into is that len(r'\2') is always 2, not the length of the second capturing group in your regular expression. You can use a lambda expression to create a function that works just like the code you would use in JavaScript:
re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])",
       lambda m: m.group(1) + str(len(m.group(2))) + m.group(3),
       s)
The m argument to the lambda is a match object, and the calls to its group method are equivalent to the backreferences you were using before.
It might be easier to just use a simple word matching pattern with no capturing groups (group() can still be called with no argument to get the whole matched text):
re.sub(r'\w{4,}', lambda m: m.group()[0] + str(len(m.group())-2) + m.group()[-1], s)

tmp, out = "",""
for ch in s:
    if ch.isspace() or ch in {",", "."}:
        out += "{}{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1], ch) if len(tmp) > 3 else tmp + ch
        tmp = ""
    else:
        tmp += ch
out += "{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1]) if len(tmp) > 3 else tmp
print(out)
a11y,r4m g2d bye
If you only want alpha characters use str.isalpha:
tmp, out = "", ""
for ch in s:
    if not ch.isalpha():
        out += "{}{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1], ch) if len(tmp) > 3 else tmp + ch
        tmp = ""
    else:
        tmp += ch
out += "{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1]) if len(tmp) > 3 else tmp
print(out)
a11y,r4m g2d bye
The logic is the same for both; only the check differs. If not ch.isalpha() is True, we have found a non-alpha character, so we need to process the tmp string and add it to the out string. If len(tmp) is not greater than 3, as per the requirement, we just add the tmp string plus the current char to out.
We need a final out += "{}{}{}".format(...) outside the loop to catch the case where the string does not end in a comma, space, etc. If the string did end in a non-alpha character, we would be adding an empty string, so it would make no difference to the output.
It will preserve punctuation and spaces:
s = "accessibility,random good bye !! foobar?"
def func(s):
    tmp, out = "", ""
    for ch in s:
        if not ch.isalpha():
            out += "{}{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1], ch) if len(tmp) > 3 else tmp + ch
            tmp = ""
        else:
            tmp += ch
    # Append whatever word is still pending after the loop
    return out + ("{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1]) if len(tmp) > 3 else tmp)
print(func(s))
a11y,r4m g2d bye !! f4r?
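The same idea, splitting the text into runs of letters and runs of everything else, could also be sketched with itertools.groupby. This is only an illustrative alternative (the function name abbreviate is made up here), not part of the answer above:

```python
from itertools import groupby


def abbreviate(text):
    """Abbreviate every run of 4+ letters; copy everything else unchanged."""
    parts = []
    # groupby yields consecutive runs that share the same str.isalpha() result
    for is_word, chars in groupby(text, key=str.isalpha):
        chunk = "".join(chars)
        if is_word and len(chunk) > 3:
            chunk = "{}{}{}".format(chunk[0], len(chunk) - 2, chunk[-1])
        parts.append(chunk)
    return "".join(parts)


print(abbreviate("accessibility,random good bye !! foobar?"))
# a11y,r4m g2d bye !! f4r?
```

Because the non-letter runs are copied through untouched, punctuation and spacing survive without any special handling after the loop.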

Keep it simple...
>>> s = "accessibility,random good bye"
>>> re.sub(r'\B[A-Za-z]{2,}\B', lambda x: str(len(x.group())), s)
'a11y,r4m g2d bye'
\B matches between two word characters or between two non-word characters, so it helps match all the characters of a word except the first and the last.
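To see which part of each word that pattern actually consumes, a quick check with re.findall can help. This is an illustrative sketch only (the names interior and result are made up here):

```python
import re

s = "accessibility,random good bye"

# Only the interior letters are matched; "bye" has a single interior
# letter, so it does not satisfy {2,} and is left alone.
interior = re.findall(r"\B[A-Za-z]{2,}\B", s)
print(interior)  # ['ccessibilit', 'ando', 'oo']

# Replacing each interior run with its length gives the abbreviation.
result = re.sub(r"\B[A-Za-z]{2,}\B", lambda m: str(len(m.group())), s)
print(result)  # a11y,r4m g2d bye
```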

As a more precise alternative, you can pass a separate function to re.sub and use the simple regex r"(\b[a-zA-Z]+\b)".
>>> def replacer(x):
...     g = x.group(0)
...     if len(g) > 3:
...         return '{}{}{}'.format(g[0], len(g) - 2, g[-1])
...     else:
...         return g
...
>>> re.sub(r"(\b[a-zA-Z]+\b)", replacer, s)
'a11y,r4m g2d bye'
Also as a pythonic and general way, to get the replaced words within a list you can use a list comprehension using re.finditer :
>>> from operator import sub
>>> rep=['{}{}{}'.format(i.group(0)[0],abs(sub(*i.span()))-2,i.group(0)[-1]) if len(i.group(0))>3 else i.group(0) for i in re.finditer(r'(\w+)',s)]
>>> rep
['a11y', 'r4m', 'g2d', 'bye']
re.finditer returns an iterator that yields match objects; you can iterate over it and get the start and end of each match with the span() method.

Using regex and comprehension:
import re
s = "accessibility,random good bye"
print("".join(w[0] + str(len(w) - 2) + w[-1] if len(w) > 3 else w for w in re.split(r"(\W)", s)))
Gives:
a11y,r4m g2d bye

Have a look at the following code
sentence = "accessibility,random good bye"
sentence = sentence.replace(',', " ")
sentence_list = sentence.split(" ")
for item in sentence_list:
    if len(item) >= 4:
        print(item[0] + str(len(item) - 2) + item[-1])
The only thing you still need to take care of is commas and other punctuation characters.

Related

remove consecutive substrings from a string without importing any packages

I want to remove consecutive "a" substrings and replace them with one "a" from a string without importing any packages.
For example, I want to get abbccca from aaabbcccaaaa.
Any suggestions?
Thanks.
This function collapses consecutive repeats of a given char in your string:
def remove_duplicated_char(string, char):
    new_s = ""
    prev = ""
    for c in string:
        # Skip c only when it repeats the previous char and is the target char
        if c == prev and c == char:
            continue
        new_s += c
        prev = c
    return new_s
print(remove_duplicated_char("aaabbcccaaaa", "a"))
What's wrong with using a loop?
oldstring = 'aaabbcccaaaa'
# Initialise the first character as the same as the initial string
# as this will always be the same.
newstring = oldstring[0]
# Loop through each character starting at the second character
# check if the preceding character is an a, if it isn't add it to
# the new string. If it is an a then check if the current character
# is an a too. If the current character isn't an a then add it to
# the new string.
for i in range(1, len(oldstring)):
    if oldstring[i-1] != 'a':
        newstring += oldstring[i]
    else:
        if oldstring[i] != 'a':
            newstring += oldstring[i]
print(newstring)
Python regular expressions will do it.
If you don't know about regexes, they are extremely powerful for
this kind of matching:
import re
s = 'aaabbcccaaaa'  # named s rather than str, which would shadow the built-in
print(re.sub('a+', 'a', s))
You can use a function that removes double values of a string occurrence recursively until only one occurrence of the repeating string remains:
val = 'aaabbcccaaaaaaaaaaa'
def remove_doubles(v):
    # Collapse each 'aa' into 'a', then recurse until no doubled 'a' remains
    v = v.replace('aa', 'a')
    if 'aa' in v:
        return remove_doubles(v)
    return v
print(remove_doubles(val))
There are many ways to do this. Here's another one:
def remove_duplicates(s, x):
    t = [s[0]]
    for c in s[1:]:
        if c != x or t[-1] != x:
            t.append(c)
    return ''.join(t)
print(remove_duplicates('aaabbcccaaaa', 'a'))

How to replace every successive character in a string with ‘#’ in Python?

Example- For Given string ‘Hello World’ returned string is ‘H#l#o W#r#d’.
I tried this code, but the spaces are replaced as well. I want the spaces between words to be preserved.
def changer():
    ch=[]
    for i in 'Hello World':
        ch.append(i)
    for j in range(1,len(ch),2):
        ch[j]= '#'
    s=''
    for k in ch:
        s=s+k
    print(s)
changer()
Output - H#l#o#W#r#d
Output i want = H#l#o W#r#d
You can str.split on whitespace to get substrings, then for each substring replace all the odd characters with '#' while preserving the even characters. Then str.join the replaced substrings back together.
>>> ' '.join(''.join('#' if v%2 else j for v,j in enumerate(i)) for i in s.split())
'H#l#o W#r#d'
You can test each character before replacing it and leave the spaces alone. Note that range() is evaluated only once, so reassigning increment inside the loop does not change the step; the space survives here because of the isspace() check:
def changer():
    ch=[]
    increment = 2
    for i in 'Hello World':
        ch.append(i)
    for j in range(1,len(ch),increment):
        if not ch[j].isspace():
            ch[j]= '#'
            increment = 2
        else:
            increment = 1
    s=''
    for k in ch:
        s=s+k
    print(s)
changer()
Since you said you don't want spaces to be included in the output, don't include them:
ch=[]
for i in 'Hello World':
    ch.append(i)
for j in range(1,len(ch),2):
    if ch[j] != " ": # don't 'include' spaces
        ch[j]= '#'
s=''
for k in ch:
    s=s+k
print(s)
There are a lot of very inconsistent answers here. I think we need a little more info to get you the solution you are expecting. Can you give a string with more words in it to confirm your desired output? You said you want every successive character to be a #, and gave an example of H#l#o W#r#d. Do you want the space to be included in determining what the next character should be? Or should the space be written, but skipped over as a determining factor for the next character? The other option would be 'H#l#o #o#l#', where the space is included in the new text but is ignored when determining the next character.
Some of the answers give something like this:
string = "Hello World This Is A Test"
'H#l#o W#r#d T#i# I# A T#s#'
'H#l#o W#r#d T#i# #s A T#s#'
'H#l#o W#r#d T#i# I# A T#s# '
This code gives the output: 'H#l#o W#r#d T#i# #s A T#s#'
string = 'Hello World This Is A Test'
solution = ''
c = 0
for letter in string:
    if letter == ' ':
        solution += ' '
        c += 1
    elif c % 2:
        solution += "#"
        c += 1
    else:
        solution += letter
        c += 1
print(solution)
If you actually want to keep the whitespace in the output without letting it affect which character comes next, all you need to do is remove the counter increment from the first check, so the spaces do not affect the succession. The result would then be: 'H#l#o #o#l# T#i# I# A #e#t'
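As a rough sketch of the two behaviours described above, both can live in one function with a switch. The function name hash_alternate and the count_spaces parameter are made up for this illustration:

```python
def hash_alternate(text, count_spaces=True):
    """Replace every second character with '#', always writing spaces as-is.

    count_spaces=True:  a space still advances the position counter.
    count_spaces=False: spaces are written but ignored by the counter.
    """
    result = []
    position = 0
    for ch in text:
        if ch == " ":
            result.append(ch)
            if count_spaces:
                position += 1
        else:
            result.append("#" if position % 2 else ch)
            position += 1
    return "".join(result)


text = "Hello World This Is A Test"
print(hash_alternate(text))                      # H#l#o W#r#d T#i# #s A T#s#
print(hash_alternate(text, count_spaces=False))  # H#l#o #o#l# T#i# I# A #e#t
```

Neither variant restarts the pattern at each word, so neither reproduces 'H#l#o W#r#d' for every input; that needs a per-word approach.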
You could use accumulate from itertools to build the resulting string progressively
from itertools import accumulate
s = "Hello World"
p = "".join(accumulate(s,lambda r,c:c if {r[-1],c}&set(" #") else "#"))
print(p)
Using your algorithm, you can process each word individually, this way you don't run into issues with spaces. Here's an adaptation of your algorithm where each word is concatenated to a string after being processed:
my_string = 'Hello World'
my_processed_string = ''
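for word in my_string.split(' '):
    ch = []
    for i in word:
        ch.append(i)
    for j in range(1, len(ch), 2):
        ch[j] = '#'
    for k in ch:
        my_processed_string += k
    my_processed_string += ' '
# Drop the trailing space added after the last word
my_processed_string = my_processed_string.rstrip(' ')
print(my_processed_string)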
You can maintain a count separate of whitespace and check its lowest bit, replacing the character with hash depending on even or odd.
def changer():
    ch=[]
    count = 0 # hash even vals (skips 0 because count advanced before compare)
    for c in 'Hello World':
        if c.isspace():
            count = 0 # restart count
        else:
            count += 1
            if not count & 1:
                c = '#'
        ch.append(c)
    s = ''.join(ch)
    print(s)
changer()
Result
H#l#o W#r#d
I have not made many changes to your code, so I think this may be easy for you to understand.
def changer():
    ch=[]
    h='Hello World' #I have put your string in a variable
    for i in h:
        ch.append(i)
    for j in range(1,len(ch),2):
        if h[j]!=' ':
            ch[j]= '#'
    s=''
    for k in ch:
        s=s+k
    print(s)
changer()

Swapping uppercase and lowercase in a string [duplicate]

This question already has answers here:
How can I invert (swap) the case of each letter in a string?
(8 answers)
How can I use `return` to get back multiple values from a loop? Can I put them in a list?
(2 answers)
Closed 6 months ago.
I would like to change the chars of a string from lowercase to uppercase.
My code is below, the output I get with my code is a; could you please tell me where I am wrong and explain why?
Thanks in advance
test = "AltERNating"
def to_alternating_case(string):
    words = list(string)
    for word in words:
        if word.isupper() == True:
            return word.lower()
        else:
            return word.upper()
print to_alternating_case(test)
If you want to invert the case of that string, try this:
>>> 'AltERNating'.swapcase()
'aLTernATING'
There are two answers to this: an easy one and a hard one.
The easy one
Python has a built-in string method that does exactly this:
string.swapcase()
The hard one
You define your own function. Your function is wrong because
iterating over a string yields it letter by letter, and you return after the first letter instead of continuing the iteration.
def to_alternating_case(string):
    temp = ""
    for character in string:
        if character.isupper():
            temp += character.lower()
        else:
            temp += character.upper()
    return temp
Your loop iterates over the characters in the input string. It then returns from the very first iteration. Thus, you always get a 1-char return value.
test = "AltERNating"
def to_alternating_case(string):
    words = list(string)
    rval = ''
    for c in words:
        if c.isupper():
            rval += c.lower()
        else:
            rval += c.upper()
    return rval
print(to_alternating_case(test))
That's because your function returns the first character only: the return statement exits the function, which ends your for loop on its first iteration.
Also, note that it is unnecessary to convert the string into a list by running words = list(string), because you can iterate over a string just as you did with the list.
If you're looking for an algorithmic solution instead of the swapcase() then modify your method this way instead:
test = "AltERNating"
def to_alternating_case(string):
    res = ""
    for word in string:
        if word.isupper() == True:
            res = res + word.lower()
        else:
            res = res + word.upper()
    return res
print(to_alternating_case(test))
You are returning after handling only the first letter of the word, which is not what you expect. As others have suggested, you can loop over the string directly rather than converting it to a list, and the expression if <variable-name> == True can be simplified to if <variable-name>. The answer with these modifications is as follows:
test = "AltERNating"
def to_alternating_case(string):
    result = ''
    for word in string:
        if word.isupper():
            result += word.lower()
        else:
            result += word.upper()
    return result
print(to_alternating_case(test))
OR using list comprehension :
def to_alternating_case(string):
    result = [word.lower() if word.isupper() else word.upper() for word in string]
    return ''.join(result)
OR using map, lambda:
def to_alternating_case(string):
    result = map(lambda word: word.lower() if word.isupper() else word.upper(), string)
    return ''.join(result)
You should do that like this:
test = "AltERNating"
def to_alternating_case(string):
    newstring = ""
    # Visit every character instead of returning after the first one
    for word in string:
        if word.isupper():
            newstring += word.lower()
        else:
            newstring += word.upper()
    return newstring
print(to_alternating_case(test))
def myfunc(string):
    i=0
    newstring=''
    for x in string:
        if i%2==0:
            newstring=newstring+x.lower()
        else:
            newstring=newstring+x.upper()
        i+=1
    return newstring
contents='abcdefgasdfadfasdf'
temp=''
ss=list(contents)
for item in range(len(ss)):
    if item%2==0:
        temp+=ss[item].lower()
    else:
        temp+=ss[item].upper()
print(temp)
You can also put this code inside a function and use the return keyword in place of print.
string=input("enter string:")
temp=''
ss=list(string)
for item in range(len(ss)):
    if item%2==0:
        temp+=ss[item].lower()
    else:
        temp+=ss[item].upper()
print(temp)
Here is a short form of the hard way:
alt_case = lambda s : ''.join([c.upper() if c.islower() else c.lower() for c in s])
print(alt_case('AltERNating'))
As I was looking for a way to turn an all-upper or all-lower string into alternating case, here is a solution to that problem:
alt_case = lambda s : ''.join([c.upper() if i%2 == 0 else c.lower() for i, c in enumerate(s)])
print(alt_case('alternating'))
You could use swapcase() method
string_name.swapcase()
or you could be a little bit fancy and use list comprehension
string = "thE big BROWN FoX JuMPeD oVEr thE LAZY Dog"
y = "".join([val.upper() if val.islower() else val.lower() for val in string])
print(y)
>>> 'THe BIG brown fOx jUmpEd OveR THe lazy dOG'
This doesn't use any 'pythonic' methods and gives the answer in a basic logical format using ASCII :
sentence = 'aWESOME is cODING'
words = sentence.split(' ')
sentence = ' '.join(reversed(words))
ans =''
for s in sentence:
    if ord(s) >= 97 and ord(s) <= 122:
        ans = ans + chr(ord(s) - 32)
    elif ord(s) >= 65 and ord(s) <= 90 :
        ans = ans + chr(ord(s) + 32)
    else :
        ans += ' '
print(ans)
So, the output will be : Coding IS Awesome

How to remove or strip off white spaces without using strip() function?

Write a function that accepts an input string consisting of alphabetic
characters and removes all the leading whitespace of the string and
returns it without using .strip(). For example if:
input_string = " Hello "
then your function should return a string such as:
output_string = "Hello "
The below is my program for removing white spaces without using strip:
def Leading_White_Space (input_str):
    length = len(input_str)
    i = 0
    while (length):
        if(input_str[i] == " "):
            input_str.remove()
        i =+ 1
        length -= 1
#Main Program
input_str = " Hello "
result = Leading_White_Space (input_str)
print (result)
I chose the remove function as it would be easy to get rid off the white spaces before the string 'Hello'. Also the program tells to just eliminate the white spaces before the actual string. By my logic I suppose it not only eliminates the leading but trailing white spaces too. Any help would be appreciated.
You can loop over the characters of the string and stop when you reach a non-space one. Here is one solution:
def Leading_White_Space(input_str):
    for i, c in enumerate(input_str):
        if c != ' ':
            return input_str[i:]
    return ''  # the string was empty or contained only spaces
Edit:
@PM 2Ring mentioned a good point. If you want to handle all types of whitespace (e.g. \t, \n, \r), you need to use isspace(), so a correct solution could be:
def Leading_White_Space(input_str):
    for i, c in enumerate(input_str):
        if not c.isspace():
            return input_str[i:]
    return ''  # the string was empty or contained only whitespace
Here's another way to strip the leading whitespace, that actually strips all leading whitespace, not just the ' ' space char. There's no need to bother tracking the index of the characters in the string, we just need a flag to let us know when to stop checking for whitespace.
def my_lstrip(input_str):
    leading = True
    result = ''  # stays empty if the input is empty or all whitespace
    for ch in input_str:
        if leading:
            # All the chars read so far have been whitespace
            if not ch.isspace():
                # The leading whitespace is finished
                leading = False
                # Start saving chars
                result = ch
        else:
            # We're past the whitespace, copy everything
            result += ch
    return result
# test
input_str = " \n \t Hello "
result = my_lstrip(input_str)
print(repr(result))
output
'Hello '
There are various other ways to do this. Of course, in a real program you'd simply use the string .lstrip method, but here are a couple of cute ways to do it using an iterator:
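def my_lstrip(input_str):
    it = iter(input_str)
    for ch in it:
        if not ch.isspace():
            break
    else:
        # Empty or all-whitespace input: nothing is left to keep
        return ''
    return ch + ''.join(it)
and
def my_lstrip(input_str):
    it = iter(input_str)
    # next(it, '') returns '' once the iterator is exhausted
    ch = next(it, '')
    while ch.isspace():
        ch = next(it, '')
    return ch + ''.join(it)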
Use re.sub:
>>> import re
>>> input_string = " Hello "
>>> re.sub(r'^\s+', '', input_string)
'Hello '
or
>>> def remove_space(s):
...     ind = len(s)  # if s is all spaces, nothing is kept
...     for i,j in enumerate(s):
...         if j != ' ':
...             ind = i
...             break
...     return s[ind:]
...
>>> remove_space(input_string)
'Hello '
>>>
Just to be thorough and without using other modules, we can also specify which whitespace to remove (leading, trailing, both or all), including tab and new line characters. The code I used (which is, for obvious reasons, less compact than other answers) is as follows and makes use of slicing:
def no_ws(string, which='left'):
    """
    `which` takes the value 'left'/'right'/'both'/'all' to remove the relevant
    whitespace.
    """
    remove_chars = (' ', '\n', '\t')
    if which == 'all':
        return ''.join([s for s in string if s not in remove_chars])
    first_char = 0
    last_char = len(string)  # exclusive end index of the slice we keep
    if which in ['left', 'both']:
        first_char = len(string)  # all-whitespace input keeps nothing
        for idx, letter in enumerate(string):
            if letter not in remove_chars:
                first_char = idx
                break
    if which in ['right', 'both']:
        last_char = 0
        for idx, letter in enumerate(string[::-1]):
            if letter not in remove_chars:
                last_char = len(string) - idx
                break
    return string[first_char:last_char]
You can use itertools.dropwhile to remove all occurrences of particular characters from the start of your string, like this:
import itertools
def my_lstrip(input_str,remove=" \n\t"):
    return "".join( itertools.dropwhile(lambda x:x in remove,input_str))
To make it more flexible, I added an extra argument called remove, which holds the characters to strip from the string, with a default value of " \n\t". dropwhile then skips every leading character that is in remove; to check this I use a lambda function (a practical way of writing short anonymous functions).
Here are a few tests:
>>> my_lstrip(" \n \t Hello ")
'Hello '
>>> my_lstrip(" Hello ")
'Hello '
>>> my_lstrip(" \n \t Hello ")
'Hello '
>>> my_lstrip("--- Hello ","-")
' Hello '
>>> my_lstrip("--- Hello ","- ")
'Hello '
>>> my_lstrip("- - - Hello ","- ")
'Hello '
>>>
the previous function is equivalent to
def my_lstrip(input_str,remove=" \n\t"):
    for i,x in enumerate(input_str):
        if x not in remove:
            return input_str[i:]
    return ""  # every character was in remove, as dropwhile would also yield nothing

Functions Assist

I'm new to python and for an exercise I'm creating a function that would do the same as the .replace method.
I have this so far:
def replace_str (string, substring, replace):
    my_str = ""
    for index in range(len(string)):
        if string[index:index+len(substring)] == substring :
            my_str += replace
        else:
            my_str += string[index]
    return my_str
When tested with:
print (replace_str("hello", "ell", "xx"))
It returns:
hxxllo
I was hoping someone could help point me in the right direction so that it replaces "ell" with "xx" and then skips to the "o" and prints:
hxxo
as the .replace string method would do.
Usually, using a while with an index variable maintained by hand is a bad idea, but when you need to manipulate the index within the loop, it can be an okay option:
def replace_str(string, substring, replace):
    my_str = ""
    index = 0
    while index < len(string):
        if string[index:index+len(substring)] == substring:
            my_str += replace
            # advance index past the end of replaced part
            index += len(substring)
        else:
            my_str += string[index]
            # advance index to the next character
            index += 1
    return my_str
Note that x.replace(y, z) does something different when y is empty. If you want to match that behavior, it may be worth a special case in your code.
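For reference, "abc".replace("", "-") returns "-a-b-c-": the replacement is inserted before every character and once more at the end. A minimal sketch of how that special case might be folded into the loop above follows; the parameter name replacement is just an illustrative choice, and this is one possible way to handle it rather than the only one:

```python
def replace_str(string, substring, replacement):
    """Replace non-overlapping occurrences of substring, scanning left to right."""
    if substring == "":
        # Mirror str.replace: the replacement goes before each character
        # and once more at the very end.
        return replacement + "".join(ch + replacement for ch in string)

    result = ""
    index = 0
    while index < len(string):
        if string[index:index + len(substring)] == substring:
            result += replacement
            index += len(substring)  # skip past the matched text
        else:
            result += string[index]
            index += 1
    return result


print(replace_str("hello", "ell", "xx"))  # hxxo
print(replace_str("abc", "", "-"))        # -a-b-c-
```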
You could do the following:
import sys
def replace_str(string, substring, replace):
    new_string = ''
    substr_idx = 0
    for character in string:
        if character == substring[substr_idx]:
            substr_idx += 1
        else:
            new_string += character
        if substr_idx == len(substring):
            new_string += replace
            substr_idx = 0
    return new_string
if len(sys.argv) != 4:
    print("Usage: %s [string] [substring] [replace]" % sys.argv[0])
    sys.exit(1)
print(replace_str(sys.argv[1], sys.argv[2], sys.argv[3]))
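Note that using the str.join() method on a list (list.append is O(1)) works faster than the above, but you said that you can't use the string methods. Also note that this simple matcher discards the characters of a partial match that later fails (for example, "help" with the substring "ell" comes out as "hp"), so it is only reliable when no such partial matches occur.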
Example usage:
$ python str.py hello ell pa
hpao
$ python str.py helloella ell pa
hpaopaa
