Remove a prefix from a string [duplicate]

This question already has answers here:
How to remove the left part of a string?
(21 answers)
Closed 6 years ago.
I am trying to do the following, in a clear pythonic way:
def remove_prefix(str, prefix):
    return str.lstrip(prefix)
print(remove_prefix('template.extensions', 'template.'))
This gives:
xtensions
Which is not what I was expecting (extensions). Obviously (stupid me), I have used lstrip wrongly: lstrip removes every leading character that appears in the passed chars string. It does not treat that string as a literal substring, but as "a set of characters to remove from the beginning of the string".
Is there a standard way to remove a substring from the beginning of a string?

As noted by @Boris-Verkhovskiy and @Stefan, on Python 3.9+ you can use
text.removeprefix(prefix)
In older versions you can get the same behavior with:
def remove_prefix(text, prefix):
    if text.startswith(prefix):
        return text[len(prefix):]
    return text  # or whatever
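A minimal sketch, assuming you want a single helper that runs on both old and new interpreters: it uses str.removeprefix when the method exists and falls back to slicing otherwise. The name strip_prefix is made up for this illustration. The first print also shows why lstrip produced xtensions in the question.

```python
def strip_prefix(text, prefix):
    """Return text without prefix if it starts with it, else text unchanged."""
    if hasattr(str, "removeprefix"):      # Python 3.9+
        return text.removeprefix(prefix)
    if text.startswith(prefix):           # fallback for older versions
        return text[len(prefix):]
    return text


# lstrip treats its argument as a set of characters, so it also eats the
# leading 'e' of 'extensions', because 'e' occurs in 'template.'.
print('template.extensions'.lstrip('template.'))         # xtensions
print(strip_prefix('template.extensions', 'template.'))  # extensions
print(strip_prefix('extensions', 'template.'))           # extensions
```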

Short and sweet:
def remove_prefix(text, prefix):
    return text[text.startswith(prefix) and len(prefix):]
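This one-liner works because `and` returns one of its operands: when startswith is False, the slice start is False, which counts as 0 (bool is a subclass of int); otherwise it is len(prefix). A small illustration, with the intermediate value spelled out:

```python
text, prefix = 'template.extensions', 'template.'

start = text.startswith(prefix) and len(prefix)
print(start)          # 9
print(text[start:])   # extensions

other = 'extensions'
start = other.startswith(prefix) and len(prefix)
print(start)          # False
print(other[start:])  # extensions
```

It is compact, but the explicit if/else versions are arguably easier to read.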

What about this (a bit late):
def remove_prefix(s, prefix):
    return s[len(prefix):] if s.startswith(prefix) else s

regex solution (the best way is the solution by @Elazar; this is just for fun)
import re
def remove_prefix(text, prefix):
    return re.sub(r'^{0}'.format(re.escape(prefix)), '', text)
>>> print(remove_prefix('template.extensions', 'template.'))
extensions

I think you can use methods of the str type to do this. There's no need for regular expressions:
def remove_prefix(text, prefix):
    if text.startswith(prefix):  # only modify the text if it starts with the prefix
        text = text.replace(prefix, "", 1)  # remove one instance of prefix
    return text

def remove_prefix(s, prefix):
    if s.startswith(prefix):
        return s[len(prefix):]
    else:
        return s

Related

Using regex as search string for python's "in" keyword [duplicate]

This question already has answers here:
Regular expression to filter list of strings matching a pattern
(5 answers)
Closed 2 years ago.
Say I have a dictionary of sets of paths:
my_dict['some_key'] = {'abc/hi/you','xyz/hi/you','jkl/hi/you'}
I want to see if a path is in this set. If I have the whole path, I would simply do the following:
str = 'abc/hi/you'
if str in my_dict['some_key']:
    print(str)
But what if I don't know that b is what comes between a and c? What if it could be literally anything? If I were running ls in a shell, I'd just put * and call it a day.
What I want to be able to do is have str be a regex:
regx = '^a.*c/hi/you$'  # just assume this is the ideal regex. Doesn't really matter.
if regx in my_dict['some_key']:
    print('abc/hi/you')  # print the actual path, not the regx
What is a clean and fast way to implement something like this?

You need to loop through the set rather than a simple in call.
To avoid setting up the whole dictionary of sets for the example I have abstracted it as simply my_set.
import re
my_set = {'abc/hi/you','xyz/hi/you','jkl/hi/you'}
regx = re.compile('^a.*c/hi/you$')
for path in my_set:
    if regx.match(path):
        print(path)
I chose to compile instead of simply re.match() because the set could have 1 million plus elements in the actual implementation.
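Since the question mentions shell-style `*`, here is a hedged sketch that uses the standard library's fnmatch module as an alternative to writing the regex by hand. The set and patterns come from the example above; sorted is only there to make the output order predictable.

```python
import fnmatch
import re

my_set = {'abc/hi/you', 'xyz/hi/you', 'jkl/hi/you'}

# Regex version: collect every matching path instead of printing in a loop.
regx = re.compile(r'^a.*c/hi/you$')
regex_hits = sorted(path for path in my_set if regx.match(path))
print(regex_hits)  # ['abc/hi/you']

# Glob version: '*' matches any run of characters, as in a shell.
glob_hits = sorted(fnmatch.filter(my_set, 'a*c/hi/you'))
print(glob_hits)   # ['abc/hi/you']
```

Either way every element is still visited once, so the cost stays linear in the size of the set.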

You can subclass the set class and implement the a in b operator
import re
from collections import defaultdict
class MySet(set):
    def __contains__(self, regexStr):
        regex = re.compile(regexStr)
        for e in self:
            if regex.match(e):
                return True
        return False
my_dict = defaultdict(MySet)
my_dict['some_key'].add('abc/hi/you')
regx = '^a.*c/hi/you$'
if regx in my_dict['some_key']:
    print('abc/hi/you')
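One caveat with overriding __contains__ is that every `in` test on the set becomes a regex match, including tests that were meant as literal lookups. A possible alternative, sketched here with a made-up class name PathSet and method name matching, keeps normal membership intact and exposes the regex search as a separate method that returns the actual paths:

```python
import re
from collections import defaultdict


class PathSet(set):
    """A set of paths with an extra regex search (hypothetical helper)."""

    def matching(self, pattern):
        """Return the members that the regular expression matches."""
        regex = re.compile(pattern)
        return {path for path in self if regex.match(path)}


my_dict = defaultdict(PathSet)
my_dict['some_key'].update({'abc/hi/you', 'xyz/hi/you', 'jkl/hi/you'})

print(my_dict['some_key'].matching(r'^a.*c/hi/you$'))  # {'abc/hi/you'}
print('xyz/hi/you' in my_dict['some_key'])             # True, plain lookup
```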

Checked docs on String.replace() not working anyway Python [duplicate]

This question already has answers here:
Why doesn't calling a string method (such as .replace or .strip) modify (mutate) the string?
(3 answers)
Closed 4 years ago.
I have read all the python docs on String.replace, yet I am still having trouble with my code. My code is as follows:
# within a class
def fixdata(self):
    global inputs_list
    for i in range(0, len(inputs_list)):
        inputs_list[i].replace("\n", "")
        print(inputs_list[i])  # to check output
What I am hoping for is that all \n characters (newlines) are replaced with the empty string "" so that trailing newlines are removed. .strip(), .rstrip(), and, I'm assuming, .lstrip() do not work.

Doc: string.replace()
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
You replace and throw the copy away:
# within a class
def fixdata(self):
    global inputs_list
    for i in range(0, len(inputs_list)):
        inputs_list[i].replace("\n", "")  # Thrown away
        print(inputs_list[i])  # to check output
More docs:
lstrip() / rstrip() / strip()
Return a copy of the string ...
Fix:
def fixdata(self):
    global inputs_list  # a global in a class method? Pass it to the class and use a member?
    inputs_list = [item.rstrip() for item in inputs_list]

You need to assign the new string like this:
inputs_list[i] = inputs_list[i].replace("\n", "")
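To make the point concrete, here is a small sketch (the sample data is invented for illustration). Strings are immutable, so replace returns a new string and leaves the original untouched unless you rebind the name or list element:

```python
inputs_list = ["first line\n", "second\nline\n"]  # made-up sample data

line = inputs_list[0]
line.replace("\n", "")  # result is discarded; line is unchanged
print(repr(line))       # 'first line\n'

# Rebind each element to the string that replace() returns.
inputs_list = [item.replace("\n", "") for item in inputs_list]
print(inputs_list)      # ['first line', 'secondline']
```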

Python - First and last character in string must be alpha numeric, else delete [duplicate]

This question already has answers here:
How to remove non-alphanumeric characters at the beginning or end of a string
(5 answers)
Closed 6 years ago.
I am wondering how I can implement a string check, where I want to make sure that the first (&last) character of the string is alphanumeric. I am aware of the isalnum, but how do I use this to implement this check/substitution?
So, I have a string like so:
st="-jkkujkl-ghjkjhkj*"
and I would want back:
st="jkkujkl-ghjkjhkj"
Thanks..

Though not exactly what you want, using str.strip should serve your purpose:
>>> import string
>>> st = "-jkkujkl-ghjkjhkj*"
>>> st.strip(string.punctuation)
'jkkujkl-ghjkjhkj'

You could use regex like shown below:
import re
# \W matches any non-word character; '_' counts as a word character, so it is listed separately
# If the string starts or ends with characters from the set [\W_], replace them with ''
p = re.compile(r'^[\W_]+|[\W_]+$')
st="-jkkujkl-ghjkjhkj*"
print(p.subn('', st)[0])
Output:
jkkujkl-ghjkjhkj
Edit:
If your special chars are in the set: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
@Abhijit's answer is much simpler and cleaner.
If you are not sure then this regex version is better.

You can use the following two expressions:
st = re.sub(r'^\W*', '', st)
st = re.sub(r'\W*$', '', st)
This will strip all non-word characters from the beginning and the end of the string, not just the first ones. Underscores are kept, since \W treats _ as a word character.

You could use a regular expression.
Something like this could work;
\w.+?\w
However, I don't know how to do a regexp match in Python.

hint 1: ord() can convert a character to its character number
hint 2: lowercase ASCII letters are between 97 and 122 in ord()
hint 3: st[0] returns the first character of string st, and st[-1] returns the last

An exact answer to your question may be the following:
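def stringCheck(astring):
    if len(astring) < 2:  # empty or one-character string: avoid IndexError and duplication
        return astring if astring.isalnum() else ''
    firstChar = astring[0] if astring[0].isalnum() else ''
    lastChar = astring[-1] if astring[-1].isalnum() else ''
    return firstChar + astring[1:-1] + lastChar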
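If the goal is to drop every non-alphanumeric character at either end, and not only one, a sketch along these lines uses isalnum as the question suggested. The function name strip_non_alnum is made up here.

```python
def strip_non_alnum(s):
    """Trim non-alphanumeric characters from both ends of s."""
    start, end = 0, len(s)
    # Move start forward past leading non-alphanumeric characters.
    while start < end and not s[start].isalnum():
        start += 1
    # Move end backward past trailing non-alphanumeric characters.
    while end > start and not s[end - 1].isalnum():
        end -= 1
    return s[start:end]


print(strip_non_alnum("-jkkujkl-ghjkjhkj*"))  # jkkujkl-ghjkjhkj
print(repr(strip_non_alnum("--**")))          # ''
```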

Python: Best practice for dynamically constructing regex [duplicate]

This question already has answers here:
Escaping regex string
(4 answers)
Closed 9 months ago.
I have a simple function to remove a "word" from some text:
def remove_word_from(word, text):
    if not text or not word: return text
    rec = re.compile(r'(^|\s)(' + word + r')($|\s)', re.IGNORECASE)
    return rec.sub(r'\1\3', text, 1)
The problem, of course, is that if word contains characters such as "(" or ")" things break, and it generally seems unsafe to stick a random word in the middle of a regex.
What's best practice for handling cases like this? Is there a convenient, secure function I can call to escape "word" so it's safe to use?

You can use re.escape(word) to escape the word.
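Applied to the function from the question, that could look like the sketch below. Only the re.escape call and the raw-string prefix on the second pattern fragment are new; the rest is the asker's code.

```python
import re


def remove_word_from(word, text):
    """Remove the first whitespace-delimited occurrence of word from text."""
    if not text or not word:
        return text
    # re.escape makes every character in word match literally.
    rec = re.compile(r'(^|\s)(' + re.escape(word) + r')($|\s)', re.IGNORECASE)
    return rec.sub(r'\1\3', text, count=1)


print(repr(remove_word_from('(b)', 'a (b) c')))  # 'a  c'
```

Note that the surrounding whitespace on both sides is kept, so two spaces remain where the word was.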

Unless you're forced to use regexps, couldn't you use the replace method for strings instead?
text = text.replace(word, '')
This avoids the problems with special characters. Note that, unlike the regex, it removes every occurrence, even inside other words.

Write a sanitizer function and pass word through that first.
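import re
from functools import reduce  # reduce is a builtin only in Python 2

def sanitize(word):
    # Backslash-escape each listed metacharacter (note: ^, $ and \ are not covered).
    def literalize(wd, escapee):
        return wd.replace(escapee, "\\%s" % escapee)
    return reduce(literalize, "()[]*?{}.+|", word)

def remove_word_from(word, text):
    if not text or not word: return text
    rec = re.compile(r'(^|\s)(' + sanitize(word) + r')($|\s)', re.IGNORECASE)
    return rec.sub(r'\1\3', text, 1)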

How does Python's triple-quote string work?

How should this function be changed to return "123456"?
def f():
    s = """123
    456"""
    return s
UPDATE: Everyone, the question is about understanding how to not have \t or whatever when having a multiline string, not how to use the re module.

Don't use a triple-quoted string when you don't want extra whitespace, tabs and newlines.
Use implicit continuation, it's more elegant:
def f():
    s = ('123'
         '456')
    return s

def f():
    s = """123\
456"""
    return s
Don't indent any of the triple-quoted string's lines after the first line; end every line except the last with a backslash.

Adjacent string literals are concatenated, so you can use:
def f():
    s = ("123"
         "456")
    return s
This will allow you to keep indentation as you like.

textwrap.dedent("""\
    123
    456""")
From the standard library. The first "\" is necessary because this function works by removing the common leading whitespace.
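A short sketch of what dedent actually returns inside an indented function. It strips the shared indentation but leaves the newline between the lines, so one more step (here splitlines plus join, as one option) is needed to get "123456":

```python
import textwrap


def f():
    s = textwrap.dedent("""\
        123
        456""")
    return s


print(repr(f()))                        # '123\n456'
print(repr(''.join(f().splitlines())))  # '123456'
```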

Maybe I'm missing something obvious but what about this:
def f():
    s = """123456"""
    return s
or simply this:
def f():
    s = "123456"
    return s
or even simpler:
def f():
    return "123456"
If that doesn't answer your question, then please clarify what the question is about.

You might want to check str.splitlines([keepends]):
Return a list of the lines in the string, breaking at line boundaries.
This method uses the universal newlines approach to splitting lines.
Line breaks are not included in the resulting list unless keepends is
given and true.
Python recognizes "\r", "\n", and "\r\n" as line boundaries for 8-bit strings.
So, for the problem at hand, we could do something like this:
>>> s = """123
... 456"""
>>> s
'123\n456'
>>> ''.join(s.splitlines())
'123456'

re.sub(r'\D+', '', s)
will return a string; if you want an integer, convert this string with int.

Try
import re
and then
return re.sub(r"\s+", "", s)

My guess is:
def f():
    s = """123
    456"""
    return u'123456'
Minimum change and does what is asked for.
