How can I handle custom emojis and clean them? For example turn <a:load:742504529278402560> into just :load:?
There doesn't seem to be a built-in way in the library to do this, though.
Here is a way:
import re
def cleanemojis(string):
    return re.sub(r"<a?:([a-zA-Z0-9_-]{1,32}):[0-9]{17,21}>", r":\1:", string)
>>> cleanemojis("Loading <a:load:742504529278402560>")
"Loading :load:"
You would probably want to use a regex. Try this one:
import re
pattern = r":\w*:"
# NEXT IS JUST A TEST
string = "<a:load:742504529278402560>"
result = re.search(pattern, string)
print(result.group())
As Arie Chertkov said, that would be the ideal way to do it. As per your request, I've written it into a function.
import re
pattern = r":\w*:"
def clean(string):
    result = re.search(pattern, string)
    return result.group()
print(clean("<a:load:742504529278402560>"))
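One caveat with the re.search approach above: it returns only the first match, and it returns None when the text contains no emoji, so clean() would raise AttributeError on plain text. A minimal sketch that collects every emoji name instead, assuming the `<a:name:id>` / `<:name:id>` shape from the question (the name emoji_names is made up for illustration):

```python
import re

# Assumed shape: optional "a" for animated emojis, a name, then a numeric ID.
EMOJI_PATTERN = re.compile(r"<a?:(\w+):\d+>")

def emoji_names(text):
    """Return the name of every custom emoji found in text (possibly none)."""
    return EMOJI_PATTERN.findall(text)

print(emoji_names("Loading <a:load:742504529278402560> done <:ok:742504529278402561>"))
print(emoji_names("no emojis here"))
```

Because findall returns an empty list when nothing matches, the caller does not need a separate None check.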
I want to change all numbers in a document to words. The following two functions detect numbers in a string by pattern and convert them to words through the num2words library.
import num2words
from re import sub
def _conv_num(match):
    word=num2words(match)
    return word
def change_to_word(text):
    normalized_text = sub(r'[^\s]*\d+[^\s]*', lambda m: _conv_num(m.group()), text)
    return normalized_text
When I use these two functions with the following code
txt="there are 3 books"
change_to_word(txt)
Python raises this error:
TypeError: 'module' object is not callable
I tried to find a similar post, but it seems that nobody has had the same issue, or I didn't search in the proper way, so kindly help me with a solution or a link about it.
Regards
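The TypeError comes from import num2words, which binds the module rather than the function of the same name; import the function itself. I would do it like this:
import re
from num2words import num2words
def _conv_num(match):
    # match.group() is a string of digits, so convert it to an int first
    return num2words(int(match.group()))
def numbers_to_words(text):
    return re.sub(r'\b\d+\b', _conv_num, text)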
For clarity, import the whole regular expression library and use re.sub() instead of just sub.
There is no need for a lambda if your conversion function takes a match instead of a string.
Use word boundary matchers (\b) in the regular expression.
Use a more descriptive name for the main function.
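The points above can be tried without installing num2words. The sketch below swaps in a tiny stand-in converter (the _SMALL_NUMBERS table and the to_word helper are hypothetical and only cover 0 to 9), so the re.sub callback pattern is visible on its own:

```python
import re

# Hypothetical stand-in for num2words, covering single digits only.
_SMALL_NUMBERS = ["zero", "one", "two", "three", "four",
                  "five", "six", "seven", "eight", "nine"]

def to_word(match):
    """Callback for re.sub: receives a match object, returns replacement text."""
    number = int(match.group())
    if number < len(_SMALL_NUMBERS):
        return _SMALL_NUMBERS[number]
    return match.group()  # leave larger numbers untouched in this sketch

def numbers_to_words(text):
    # \b keeps digits embedded in words (such as "abc3def") from matching
    return re.sub(r"\b\d+\b", to_word, text)

print(numbers_to_words("there are 3 books"))
```

Passing the function directly works because re.sub calls it with the match object for each hit, which is why no lambda is needed.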
I want to use regex to get a part of the string. I want to remove the Kerberos prefix and everything up to the last slash, leaving only the Username.
import re
text = 'Kerberos://DME.DMS.WORLD.DMSHEN/Username'
reg1 = re.compile(r"^((Kerberos?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$",text)
print(reg1)
Expected output
Username
I am new to regex and tried this pattern, but it doesn't seem to work.
Your regex works just fine, but the second argument of re.compile is for flags, not the text to search, so pass the text to re.match instead. I am also assuming you would like to make most of the groups non-capturing (you can do that by adding ?: to each group).
It will give you the following:
re.match(r"^(?:(?:Kerberos?|ftp):\/)?\/?(?:[^:\/\s]+)(?:(\/\w+)*\/)(?P<u>[\w\-\.]+[^#?\s]+)(?:.*)?(?:#[\w\-]+)?$", text).group('u')
Also, for future reference, try using https://regex101.com/ , it has an easy way to test your regex + explanations on each part.
How about this simple one:
import re
text = 'Kerberos://DME.DMS.WORLD.DMSHEN/Username'
reg1 = re.findall(r"//.*/(.*)", text)
print(''.join(reg1))
# Username
If you want, you can use split instead of regex:
text = 'Kerberos://DME.DMS.WORLD.DMSHEN/Username'
m = text.split('/')[-1]
print(m)
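If only the last path segment is needed, two shorter options may be worth comparing. This is a sketch that assumes the username never contains a slash; str.rpartition and the anchored pattern [^/]+$ both take whatever follows the final /:

```python
import re

text = 'Kerberos://DME.DMS.WORLD.DMSHEN/Username'

# rpartition splits on the last "/" and returns (head, separator, tail)
username_from_partition = text.rpartition('/')[2]

# [^/]+$ matches the run of non-slash characters at the end of the string
match = re.search(r'[^/]+$', text)
username_from_regex = match.group() if match else None

print(username_from_partition)
print(username_from_regex)
```

The regex variant yields None for input that ends in a slash, while rpartition yields an empty string, so the choice depends on how such input should be reported.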
I have a list with lines like:
=cat-egory/packagename-version
so I have to split each line into 3 different variables, like
category = cat-egory
package_name = packagename
package_version = version
I have to avoid the = and / characters.
I am fond of Perl, so I used to write a regexp like:
(?<==)\w+.\w+
which would give me cat-egory without the leading = character,
and so on, but as far as I know ?<= does not work in Python. How should I extract the data then?
It seems to be working well. See: https://regex101.com/r/nnMRKd/2
Seems to work OK, maybe you are just missing the basic Python framework for capturing:
import re
text = "=cat-egory/packagename-version"
results = re.search(r"(?<==)\w+.\w+", text)
if results:
    print(results.group(0))
output:
cat-egory
Make sure to use .search instead of .match, as suggested by a comment. The .group method is how you reference what you have captured, instead of $1 in Perl. Nothing too fancy here :)
You could even go one step further and use tuple unpacking:
import re
string = "=cat-egory/packagename-version"
rx = re.compile(r'(?<==)([^/]+)/([^-]+)-(.+)')
for match in rx.finditer(string):
    category, package_name, version = match.groups()
    print(category)
# cat-egory
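Named groups are another option when the three parts should come back labelled. The sketch below assumes that the version is everything after the last hyphen on the line, which may not hold for every package naming scheme; LINE_PATTERN and split_line are illustrative names:

```python
import re

# Assumption: the version is everything after the last "-" on the line.
LINE_PATTERN = re.compile(
    r'=(?P<category>[^/]+)/(?P<package_name>.+)-(?P<package_version>[^-]+)'
)

def split_line(line):
    """Return a dict with the three named parts, or None if the line does not fit."""
    match = LINE_PATTERN.fullmatch(line)
    return match.groupdict() if match else None

print(split_line("=cat-egory/packagename-version"))
```

The leading = and the / are matched outside the groups, so neither character ends up in the extracted values.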
So I have a string that contains a URL.
The URL/string is something like this:
https://example.com/main/?code=32ll48hma6ldfm01bpki&data=57600&data2=aardappels
I want to get the code, but I couldn't figure out how. I looked at the .split() method, but I do not think it is efficient, and I couldn't really find a way to get it working.
Use urlparse and parse_qs from the urlparse module:
from urlparse import urlparse, parse_qs
# For Python 3:
# from urllib.parse import urlparse, parse_qs
url = 'https://example.com/main'
url += '/?code=32ll48hma6ldfm01bpki&data=57600&data2=aardappels'
parsed = urlparse(url)
code = parse_qs(parsed.query).get('code')[0]
It does exactly what you want.
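One thing to watch in the snippet above: .get('code') returns None when the parameter is absent, so indexing it with [0] raises TypeError. A small Python 3 sketch that handles the missing case (the helper name get_query_param and its default argument are assumptions for illustration):

```python
from urllib.parse import urlparse, parse_qs

def get_query_param(url, name, default=None):
    """Return the first value of query parameter `name`, or `default`."""
    query = urlparse(url).query
    values = parse_qs(query).get(name)  # a list of values, or None if absent
    return values[0] if values else default

url = 'https://example.com/main/?code=32ll48hma6ldfm01bpki&data=57600&data2=aardappels'
print(get_query_param(url, 'code'))
print(get_query_param(url, 'missing', default='n/a'))
```

parse_qs maps each parameter to a list because a name can appear more than once in a query string; this sketch keeps only the first value.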
As @IronFist mentions, the .split() method works only if you assume there is no '&' in the code parameter. Under that assumption, you can call .split() a few times to get the desired code parameter:
url = "https://example.com/main/?code=32ll48hma6ldfm01bpki&data=57600&data2=aardappels"
code = url.split('/?')[1].split('&')[0].split('=')[1]
There are many ways of doing this. The easier way is to use urlparse. The other way is to use a regular expression, but experts suggest that using regular expressions on URLs can be tedious and makes the code very difficult to maintain.
Another easy way is shown below:
str1 = 'https://example.com/main/?code=32ll48hma6ldfm01bpki&data=57600&data2=aardappels'
codeStart = str1.find('code=')
codeEnd = str1.find('&data=')
print(str1[codeStart+5:codeEnd])
Using regular expressions:
>>> import re
>>> url = 'https://example.com/main/?code=32ll48hma6ldfm01bpki&data=57600&data2=aardappels'
>>> code = re.search("code=([0-9a-zA-Z]+)&?", url).group(1)
>>> print(code)
32ll48hma6ldfm01bpki
I want to normalize strings like
'1:2:3','10:20:30'
to
'01:02:03','10:20:30'
using Python's re module,
so I am trying to select a string like '1:2:3' and then match the single digits '1', '2', '3'. Here is my pattern:
^\d(?=\D)|(?<=\D)\d(?=\D)|(?<=\D)\d$
It works, but I think the pattern is not simple enough. Could anybody help me simplify it? Or use map()/split() if that is more suitable.
\b matches between a word character and a non-word character.
>>> import re
>>> l = ['1:2:3','10:20:30']
>>> [re.sub(r'\b(\d)\b', r'0\1', i) for i in l]
['01:02:03', '10:20:30']
re.sub(r"(?<!\d)(\d)(?!\d)",r"0\1",test_str)
You can simplify it to this. See the demo:
https://regex101.com/r/nD5jY4/4#python
If the string is like
x="""'1:2:3','10:20:30'"""
Then do
print(",".join([re.sub(r"(?<!\d)(\d)(?!\d)",r"0\1",i) for i in x.split(",")]))
You could do this with re, but pretty much nobody will know how it works afterwards. I'd recommend this instead:
':'.join("%02d" % int(x) for x in original_string.split(':'))
It's clearer how it works.
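A variant of the same split-and-join idea that stays with strings throughout, using str.zfill. This is a sketch assuming every field is a plain non-negative number; the name pad_fields and the width parameter are made up for illustration:

```python
def pad_fields(value, width=2):
    """Left-pad each colon-separated field with zeros to `width` characters."""
    return ':'.join(part.zfill(width) for part in value.split(':'))

print([pad_fields(s) for s in ['1:2:3', '10:20:30']])
```

Unlike the "%02d" version, this one does not convert to int, so it will not raise on a non-numeric field; it pads whatever text is there.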