Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I can use other escape characters without any problem, but neither my Atom text editor nor Python itself treats \s as an escape character; both see it as ordinary characters.
print '\s', test_line
just writes
\stesting_bot1
How can I make the editor and Python treat this as an escape character that stands for a space?
\s isn't an escape sequence in Python. \t, \n, \r etc are (see the Python lexical analysis docs) but non-special characters will not be interpreted as anything special, hence your \s appearing literally.
However, \s does mean whitespace in regular expression syntax, of course...
I think you may be confusing a regex with a string. For a normal string, you just need to use the space character to print it:
print(' testing_bot1')
\s is not an escape sequence, so it will be interpreted as just backslash + "s".
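To make the distinction concrete, here is a minimal sketch (the sample text is made up for illustration). It shows that \s in a string literal is just two characters, while the same two characters handed to the re module act as the whitespace class.

```python
import re

# A raw string keeps the backslash as-is: two characters, "\" and "s".
text = r"\s"
print(len(text))  # 2

# As a regular expression, those two characters mean "any whitespace".
replaced = re.sub(r"\s", "_", "testing bot 1")
print(replaced)  # testing_bot_1

# To print an actual space, just put a space in the string.
print(" " + "testing_bot1")
```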
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Edit: Some moderators recommended that I make myself clearer, so here we go.
As a personal project in Python, I'm making a very simple program that asks the user for an email address and then checks whether the syntax of the email is correct.
I made a tuple of special characters that are not allowed in an email address; one of those characters is "\". I searched online like crazy for how to put \ into a str, with no result. I also tried looking up what \ does, with no result either.
V = "\" doesn't work; it gives me a syntax error. I know it is possible to get a backslash into a string because I've done it with input().
Please help.
It's not clear to me what language you're using - but in most cases you need to escape the backslash, as it is an escape character itself.
V="\\"
Escaping exists so that you can include special characters (for example, a double quote) in a string:
V="The following will be in quotes: \"Hello, World\""
In this case, the escaped double quotes will be treated as literal characters in the string, and will not signal the end of the string as they would without the escape character.
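Assuming the language is Python, as the question says, the escaped backslash can go straight into the tuple of forbidden characters. The sketch below is only illustrative: the tuple contents and the helper name `has_forbidden_char` are assumptions, not the asker's actual code.

```python
# "\\" in source code is a single backslash character at runtime.
backslash = "\\"
print(len(backslash))  # 1

# Hypothetical tuple of characters to reject in an email address.
FORBIDDEN = ("\\", " ", ",", ";")

def has_forbidden_char(address):
    """Return True if the address contains any forbidden character."""
    return any(ch in address for ch in FORBIDDEN)

print(has_forbidden_char("user@example.com"))    # False
print(has_forbidden_char("us\\er@example.com"))  # True
```

A raw string such as r"\\" would not help here: a raw string cannot end in a single backslash, so "\\" is the usual spelling.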
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
What do the lines of code below do, and what is their Jython equivalent?
Function Import_PUERTOR(strField, strRecord)
Dim re
Set re = New RegExp
re.Pattern = "^\s*"
re.MultiLine = False
strField = re.Replace(strField,"")
End Function
This code strips leading whitespace from the strField string.
No regex conversion is needed: Python has a non-regex built-in for that, which is faster and shorter to write:
strField = strField.lstrip()
lstrip returns a copy of the string with leading characters removed.
Syntax
str.lstrip([chars])
chars
Optional. String specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix; rather, all combinations of its values are stripped.
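A short sketch of both points, with made-up sample strings: `lstrip()` gives the same result as a direct regex translation of the pattern `^\s*`, and the chars argument is treated as a set of characters rather than a prefix.

```python
import re

field = "  \t abc  "

# Direct translation of the VBScript pattern "^\s*".
via_regex = re.sub(r"^\s*", "", field)

# Built-in equivalent: removes leading whitespace only.
via_lstrip = field.lstrip()

print(repr(via_regex))   # 'abc  '
print(repr(via_lstrip))  # 'abc  '

# chars is a set of characters, not a prefix: every leading "w" or "."
# is removed until a character outside the set is reached.
print("www.example.com".lstrip("w."))  # example.com
```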
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
In my chat app TalkTalkTalk, for usernames, I allowed alphanumeric characters only (A-Z, a-z, 0-9):
username = re.sub(r'\W+', '', username) # regex to keep alphanumeric only
This is a bit too restrictive because UTF-8 characters are useful in many cases (people whose names use an alphabet other than Latin, etc.). Now I would like to allow these useful UTF-8 characters from other alphabets, and even things like ❤ ☀ ☆ ☂ ☻ ♞ ☯ ☭ ☢. (Why not?)
But I don't want:
all kinds of whitespace and all kinds of newlines (\n)
malicious characters that render as empty, such as the zero-width joiner: http://unicode-table.com/fr/200D/
etc., and more generally any character that could make userA<malicious_char> look like the real userA.
Which are the printable UTF8 characters? (to be used in a username)
How to filter them with a regex, for example in Python?
Note: This question is about finding a regex to filter them, so it's not a duplicate of some linked questions.
You can use the re.UNICODE flag and Unicode escapes in the pattern. Note that \u200b is not technically defined as whitespace, so \s alone does not match it and it has to be listed explicitly.
This works in Python 2.7 and 3:
import re
username = u'My \u200bNick \u2602 \u263b \u200c '
white_chars = [r'\s', u'\u200b', u'\u200c']  # raw string for \s; extend the list as needed
regex_str = '[' + ''.join(white_chars) + ']'
regex = re.compile(regex_str, flags=re.UNICODE)
regex.sub("", username)  # at the interactive prompt this echoes the repr
print(regex.sub("", username))
you get
u'MyNick\u2602\u263b'
MyNick☂☻
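Listing every invisible character by hand gets long. An alternative sketch, assuming Python 3, filters by Unicode general category instead: it drops everything in the "C" groups (control, format, unassigned, which include the zero-width characters) and the "Z" groups (separators, which include spaces). The function name `clean_username` is made up for this example, and the approach does not address look-alike letters from different alphabets.

```python
import unicodedata

def clean_username(name):
    """Drop characters whose Unicode category starts with C or Z."""
    return "".join(
        ch for ch in name
        if unicodedata.category(ch)[0] not in ("C", "Z")
    )

username = "My \u200bNick \u2602 \u263b \u200c "
cleaned = clean_username(username)
print(cleaned)  # MyNick☂☻
```

Symbols such as ☂ (category So) survive, while the space, newline and zero-width characters are removed.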
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
Why does Python's re module escape semicolon characters?
print(re.escape('text;text'))
gives me the following output:
text\;text
>>> re.escape.__doc__
'Escape all non-alphanumeric characters in pattern.'
It escapes ; (semicolon) because ; is not an alphanumeric character.
It escapes a semicolon because that is what it's designed to do. As per the docs, it escapes all non-alphanumeric characters.
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
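The sketch below shows what re.escape is for: matching a string literally even when it contains metacharacters. The sample strings are made up. One caveat: the behaviour quoted above is from older Python versions. Since Python 3.7, re.escape only escapes characters that can have special meaning in a regex, so ; is left alone there. Either way the escaped pattern matches the original text.

```python
import re

literal = "a.b;c"

# Unescaped, "." matches any character, so "axb;c" also matches.
loose = re.fullmatch(literal, "axb;c")

# Escaped, the pattern matches only the literal text.
strict_pattern = re.compile(re.escape(literal))
exact = strict_pattern.fullmatch("a.b;c")
wrong = strict_pattern.fullmatch("axb;c")

print(loose is not None)  # True
print(exact is not None)  # True
print(wrong is not None)  # False
```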
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
def symbolsReplaceDashes(text):
I want to replace all spaces and symbols with hyphens, because I want to use the result in a URL.
import re
text = "this isn't alphanumeric"
result = re.sub(r'\W','-',text) # result will be "this-isn-t-alphanumeric"
The \W class is the inverse of the \w class, which matches alphanumeric characters and underscores ([a-zA-Z0-9_] in ASCII mode; on Python 3, \w also matches Unicode letters and digits by default). Thus, replacing every character that matches \W with a dash leaves you with a string that consists only of word characters and dashes, suitable for a URL.
Instead of a regex, if you want to escape a string for use in a URL, use urllib.quote() or urllib.quote_plus() (in Python 3 these functions live in urllib.parse). For more complex queries, you might want to build the URL using urllib.urlencode(). You can reverse the quoting with urllib.unquote() and urllib.unquote_plus().
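For reference, a small Python 3 sketch of the quoting functions, which are found in urllib.parse there. The query parameter names are made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

text = "this isn't alphanumeric"

quoted = quote(text)            # spaces become %20
quoted_plus = quote_plus(text)  # spaces become +

print(quoted)       # this%20isn%27t%20alphanumeric
print(quoted_plus)  # this+isn%27t+alphanumeric

# Both forms can be reversed.
print(unquote(quoted) == text)            # True
print(unquote_plus(quoted_plus) == text)  # True

# urlencode builds a whole query string from key/value pairs.
query = urlencode({"q": "a b", "page": 2})
print(query)  # q=a+b&page=2
```

Unlike the \W substitution, quoting is lossless: the original text can be recovered from the URL.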
This response doesn't use regular expressions, but should also work, with greater control over the types of symbols to filter. It uses the unicodedata module to remove all symbols by checking the categories of the characters.
import unicodedata
# See http://www.dpawson.co.uk/xsl/rev2/UnicodeCategories.html for character categories
replace = ('Sc', 'Sk', 'Sm', 'So', 'Zs')
def symbolsReplaceDashes(text):
    L = []
    for char in text:
        # unicode() is the Python 2 built-in; on Python 3, pass char directly
        if unicodedata.category(unicode(char)) in replace:
            L.append('-')
        else:
            L.append(char)
    return ''.join(L)
You may need to use something like urllib.quote(output.encode('utf-8')) to encode characters if ranges are beyond basic ASCII alphanumeric characters.