Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
Expression example:
"abcddomain_rgz.png"
"djhajhdomain_rgb1.png"
I want to replace domain*.png in the expressions above with "domain.json".
Expected results:
"abcddomain.json"
"djhajhdomain.json"
This is a typical use case for a regular expression, as mentioned in the comment section. Since you do not know the exact length of the text between domain and .png, you need a regular expression to perform the replacement.
Python provides the re module, whose sub function performs the replacement:
import re
string = "djhajhdomain_rgb1.png"
result = re.sub(r"domain.*\.png", "domain.json", string)  # \. matches a literal dot
print(result)
This prints:
djhajhdomain.json
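A minimal sketch applying the same idea to both sample names from the question. The helper name to_domain_json is hypothetical, and the pattern assumes every name ends in ".png"; the escaped dot and the $ anchor keep the match from firing on unrelated text.

```python
import re

# Assumption: file names end in ".png"; \. is a literal dot, $ anchors at the end.
PATTERN = re.compile(r"domain.*\.png$")

def to_domain_json(name):
    """Replace everything from 'domain' through the trailing '.png' (hypothetical helper)."""
    return PATTERN.sub("domain.json", name)

names = ["abcddomain_rgz.png", "djhajhdomain_rgb1.png"]
results = [to_domain_json(n) for n in names]
print(results)
```

Names that do not match the pattern are returned unchanged, because re.sub leaves the input alone when nothing matches.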
Use Python's regular expressions instead (the re module):
re.sub(r'domain.*\.png$', r"domain.json", 'djhajhdomain_rgb1.png')
Your best bet here would be regex.
x = "djhajhdomain_rgb1.png"
y = "djhajhdomain.json"
import re
pattern = re.compile(r'\w+domain')
ext = '.json'
match = re.match(pattern, x).group(0)
result = match+ext
assert result == y
Import the re module.
Compile a pattern to search for in the string. (Note that the pattern only accepts alphanumeric characters and/or underscores before the literal string "domain".)
Set a predefined extension string.
Use the compiled pattern to match the string.
Concatenate the match and the extension.
Confirm that the result matches the desired output.
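The steps above can be sketched as a small function. The name swap_extension and the fallback of returning the input unchanged are assumptions added here; the guard matters because re.match returns None when the pattern does not match, and calling .group(0) on None raises AttributeError.

```python
import re

pattern = re.compile(r'\w+domain')

def swap_extension(name, ext='.json'):
    """Hypothetical helper: keep everything up to 'domain', then append ext."""
    match = pattern.match(name)
    if match is None:
        # Assumption: leave names without 'domain' untouched.
        return name
    return match.group(0) + ext

print(swap_extension("djhajhdomain_rgb1.png"))
print(swap_extension("other.png"))
```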
Closed 3 years ago.
I want to extract all the text printed after "AAAAAAAAAAAAAAAAAA"
Give me some text!
AAAAAAAAAAAAAAAAAA
S
p
p
p
Epppp
The following does not work:
import re
m = re.findall(r'AAAAAAAAAAAAAAAAAA(.*)', result)
print m[0]
Also, can I specify a variable in a regular expression instead of a hard coded string: "AAAAAAAAAAAAAAAAAA"?
Reason being, the text: "AAAAAAAAAAAAAAAAAA" is a variable and changes. So, I would like to look for a specific variable value in the pattern and then extract all the text after it.
Use re.S or re.DOTALL (they are synonyms) to have findall match across lines. Or, in your case, search is probably more appropriate since you only want one match. Also, to have it work for a non-hard-coded string, simply use string formatting or string concatenation. To avoid having unescaped regex characters in the string, run it through re.escape.
import re
result = """Give me some text!
AAAAAAAAAAAAAAAAAA
S
p
p
p
Epppp"""
s = 'AAAAAAAAAAAAAAAAAA'
# With formatting
m = re.search(r'{}(.*)'.format(re.escape(s)), result, re.S)
# With concatenation
m = re.search(re.escape(s) + r'(.*)', result, re.S)
print(m.group(1))
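A short sketch wrapping the concatenation variant in a function, using a marker that contains regex metacharacters to show why re.escape is needed. The function name text_after and the sample text are made up for illustration.

```python
import re

def text_after(marker, text):
    """Return everything after the first occurrence of marker, or None (hypothetical helper)."""
    # re.escape turns characters like [ and ] into literals;
    # re.S lets . match newlines so the capture spans lines.
    m = re.search(re.escape(marker) + r'(.*)', text, re.S)
    return m.group(1) if m else None

sample = "header\n--[x]--\nline 1\nline 2"
tail = text_after("--[x]--", sample)
print(repr(tail))

# For a plain literal marker, str.partition gives the same tail without a regex.
same_tail = sample.partition("--[x]--")[2]
```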
Closed 4 years ago.
I want to replace every caret character with a unicode superscript, for nicer printing of equations in python. My problem is, every caret may be followed by a different exponent value, so in the unicode string u'\u00b*', the * wildcard needs to be the exponent I want to print in the string. I figured some regex would work for this, but my experience with that is very little.
For example, suppose I have the string
"x^3-x^2"
I would then want this to be converted to the Unicode string
u"x\u00b3-x\u00b2"
You can use re.sub and str.translate to catch exponents and change them to unicode superscripts.
import re
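def to_superscript(num):
    # Map each ASCII digit to its Unicode superscript counterpart
    transl = str.maketrans(dict(zip('1234567890', '¹²³⁴⁵⁶⁷⁸⁹⁰')))
    return num.translate(transl)
s = 'x^3-x^2'
# Raw string so that \^, \s and \d reach the regex engine unchanged
out = re.sub(r'\^\s*(\d+)', lambda m: to_superscript(m[1]), s)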
print(out)
Output
x³-x²
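A possible extension, sketched under the assumption that negative exponents such as x^-1 should also be converted. The function name superscript_exponents is hypothetical; the translation table adds the superscript minus sign (U+207B) to the ten digits.

```python
import re

# Digits plus '-' mapped to their Unicode superscript forms.
SUPERSCRIPTS = str.maketrans('0123456789-', '⁰¹²³⁴⁵⁶⁷⁸⁹⁻')

def superscript_exponents(expr):
    """Replace ^N or ^-N with superscript characters (hypothetical helper)."""
    return re.sub(r'\^\s*(-?\d+)',
                  lambda m: m.group(1).translate(SUPERSCRIPTS),
                  expr)

print(superscript_exponents('x^3-x^2'))
print(superscript_exponents('x^-1 + y^10'))
```

Only a minus sign that directly follows the caret is converted, so the subtraction in x^3-x^2 is left as is.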
Closed 4 years ago.
I would like to extract some strings from between quotes using a regular expression. The text is shown below:
CCKeyUpDomReady('test.asmx/asdasd', 'QMlPJZTOH09XOPCcbB2jcg==', '0OO6h+G2Tzhr5XWj1Upg0A==', '0OO6h+G2Tzhr5XWj1Upg0A==', '/qqwweq2.asmx/qqq')
The expected result is:
test.asmx/asdasd
/qqwweq2.asmx/qqq
How can I do it? Here is the platform for testing:
https://regexr.com/3n142
The criterion: the string between the quotes must contain the word "asmx". The real text is much longer than what is shown above. Think of it as searching for asmx URLs in a website's source code.
See regex in use here
'((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*)'
' Match this literally
((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*) Capture the following into capture group 1
(?:[^'\\]|\\.)* This is a beautiful trick gathered from PhiLho's answer to Regex for quoted string with escaping quotes. It matches escaped ' or any other character.
asmx The OP's search string/criterion
(?:[^'\\]|\\.)* This again
' Match this literally
The result is in capture group 1:
test.asmx/asdasd
/qqwweq2.asmx/qqq
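For reference, a minimal sketch of applying this pattern with re.findall. The choice of Python and the variable names are assumptions, since the question does not name a language; with a single capture group, findall returns the contents of group 1 for each match.

```python
import re

text = ("CCKeyUpDomReady('test.asmx/asdasd', 'QMlPJZTOH09XOPCcbB2jcg==', "
        "'0OO6h+G2Tzhr5XWj1Upg0A==', '0OO6h+G2Tzhr5XWj1Upg0A==', '/qqwweq2.asmx/qqq')")

# Same pattern as above: a quoted string (escaped characters allowed) containing "asmx".
pattern = re.compile(r"'((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*)'")

urls = pattern.findall(text)
print(urls)
```

One caveat: the pattern does not track which quote opens a string, so if "asmx" appeared in the text between two quoted strings, that gap could be reported as a match.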
Closed 6 years ago.
I created a small python function to remove some undesired elements from strings written in Chinese.
Those undesired elements feature an ampersand at the beginning (&Something).
The function uses a regex to spot them, remove them and return the longest part of the string without those undesired elements, but for some reason it's not working as expected.
I tested the function on strings in other languages and alphabets and it works as expected.
# -*- coding: utf-8 -*-
import re
def clean_sentence(my_text):
    split_the_text = re.split(r'([&].*?\s)', my_text)
    longest_sentence = max(split_the_text, key=len)
    return longest_sentence
my_string = "一个神奇的鸭子飞在与&SOMETHING然后唱支歌给&PERSON"
print clean_sentence(my_string)
That's the output:
õ©Çõ©¬þÑ×ÕÑçþÜäÚ©¡Õ¡ÉÚú×Õ£¿õ©Ä&SOMETHINGþäÂÕÉÄÕö▒µö»µ¡îþ╗Ö&PERSON
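Pretty simple:
Your pattern requires a whitespace character after each &Something tag, but your string contains none. If SOMETHING and PERSON consist only of English letters, digits or underscores, you might get along with:
import re
def clean_sentence(my_text):
    # re.ASCII restricts \w to [A-Za-z0-9_]; without it, Python 3 also
    # treats the Chinese characters after the tag as word characters
    split_the_text = re.split(r'&\w+', my_text, flags=re.ASCII)
    longest_sentence = max(split_the_text, key=len)
    return longest_sentence
my_string = "一个神奇的鸭子飞在与&SOMETHING然后唱支歌给&PERSON"
print(clean_sentence(my_string))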
# 一个神奇的鸭子飞在与
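A small sketch of a detail worth checking on Python 3: for str patterns, \w matches Unicode word characters by default, which includes Chinese characters, so &\w+ can consume the text that follows a tag. The variable names below are illustrative.

```python
import re

s = "一个神奇的鸭子飞在与&SOMETHING然后唱支歌给&PERSON"

# Default (Unicode) matching: \w+ runs on through the Chinese text after &SOMETHING.
unicode_parts = re.split(r'&\w+', s)

# re.ASCII limits \w to [A-Za-z0-9_], so only the tag itself is removed.
ascii_parts = re.split(r'&\w+', s, flags=re.ASCII)

print(unicode_parts)
print(ascii_parts)
```

The longest piece is the same in both cases for this input, but only the re.ASCII variant keeps the middle segment available.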
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I have the following link:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg
How to take just this one part of the link:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
and remove everything else? I also want to keep the extension.
I want to remove this part:
._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_
and keep this part:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
How can I do this in python?
You could use:
re.sub(r'\._[\w.,-]*(\.(?:jpg|png|gif))$', r'\1', inputurl)
This makes some assumptions but works on your input. The match starts at the ._ sequence, then consumes any run of letters, digits, dashes, underscores, dots or commas, and finally captures the extension. I picked an explicit small group of possible extensions; you could also use (\.\w+)$ at the end instead to accept any extension made of word characters.
Demo:
>>> import re
>>> inputurl = 'http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg'
>>> re.sub(r'\._[\w.,-]*(\.(?:jpg|png|gif))$', r'\1', inputurl)
'http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg'
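A sketch of the wider (\.\w+)$ variant mentioned above, wrapped in a function. The name strip_modifiers and the example.com URL are made up for illustration; only the Amazon URL comes from the question.

```python
import re

def strip_modifiers(url):
    """Drop a '._...' modifier block before the extension (hypothetical helper)."""
    # (\.\w+)$ accepts any extension made of word characters.
    return re.sub(r'\._[\w.,-]*(\.\w+)$', r'\1', url)

amazon = ('http://ecx.images-amazon.com/images/I/51JXXb2vpDL.'
          '_SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg')
print(strip_modifiers(amazon))
print(strip_modifiers('http://example.com/images/I/abc._SX300_.webp'))
print(strip_modifiers('http://example.com/a.jpg'))
```

URLs without a ._ sequence pass through unchanged.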
url = "http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg"
parts = url.split(".")
# Drop the second-to-last dot-separated piece (the modifier block), keep the extension
print(".".join(parts[:-2]) + "." + parts[-1])
prints
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
The following should work:
import re
url = "http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg"
print(re.sub(r"(https?://.+?)\._.+(\.\w+)", r'\1\2', url))
The above code prints
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
An important detail: more sample links would be needed to pin down the correct pattern. I'm currently assuming you want everything up to the first ._ sequence.
url = re.sub(r"(/[^./]+)\.[^/]*?(\.[^.]+)$", r"\1\2", url)