re.compile regex assistance (python, beautifulsoup)

re.compile regex assistance (python, beautifulsoup) - python

Using this code from a different thread
import re
import requests
from bs4 import BeautifulSoup
data = """
<script type="text/javascript">
window._propertyData =
{ *** a lot of random code and other data ***
"property": {"street": "21st Street", "apartment": "2101", "available": false}
*** more data ***
}
</script>
"""
soup = BeautifulSoup(data, "xml")
pattern = re.compile(r'\"street\":\s*\"(.*?)\"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print pattern.search(script.text).group(1)
This gets me the desired result:
21st Street
However, i was trying to get the whole thing by trying different variations of the regex and couldn't achieve the output to be:
{"street": "21st Street", "apartment": "2101", "available": false}
I have tried the following:
pattern = re.compile(r'\"property\":\s*\"(.*?)\{\"', re.MULTILINE | re.DOTALL)
Its not producing the desired result.
Your help is appreciated!
Thanks.

As per commented above , correct your typo and you use this
r"property\W+({.*?})"
RegexDemo
property : look for exact string
\W+ : matches any non-word character
({.*?}) : capture group one
.* matches any character inside braces {}
? matches as few times as possible

You can try this:
\"property\":\s*(\{.*?\})
capture group 1 contains yor desired data
Explanation
Sample Code:
import re
regex = r"\"property\":\s*(\{.*?\})"
test_str = ("window._propertyData = \n"
" { *** a lot of random code and other data ***\n"
" \"property\": {\"street\": \"21st Street\", \"apartment\": \"2101\", \"available\": false}\n"
" *** more data ***\n"
" }")
matches = re.finditer(regex, test_str, re.MULTILINE | re.DOTALL)
for matchNum, match in enumerate(matches):
print(match.group(1))
Run it here

Try this, It may be long but work's fine
\"property\"\:\s*(\{((?:\"\w+\"\:\s*\"?[\w\s]+\"?\,?\s?)+?)\})
https://regex101.com/r/7KzzRV/3

import re
import ast
data = """
<script type="text/javascript">
window._propertyData =
{ *** a lot of random code and other data ***
"property": {"street": "21st Street", "apartment": "2101", "available": false}
*** more data ***
}
</script>
"""
property = re.search(r'"property": ({.+?})', data)
str_form = property.group(1)
print('str_form: ' + str_form)
dict_form = ast.literal_eval(str_form.replace('false', 'False'))
print('dict_form: ', dict_form)
out:
str_form: {"street": "21st Street", "apartment": "2101", "available": false}
dict_form: {'available': False, 'street': '21st Street', 'apartment': '2101'}

Related

How to replace URL parts with regex in Python?

I have a JSON file with URLs that looks something like this:
{
"a_url": "https://foo.bar.com/x/env_t/xyz?asd=dfg",
"b_url": "https://foo.bar.com/x/env_t/xyz?asd=dfg",
"some other property": "blabla",
"more stuff": "yep",
"c_url": "https://blabla.com/somethingsomething?maybe=yes"
}
In this JSON, I want to look up all URLs that have a specific format, and then replace some parts in it.
In URLs that have the format of the first 2 URLs, I want to replace "foo" by "fooa" and "env_t" by "env_a", so that the output looks like this:
{
"a_url": "https://fooa.bar.com/x/env_a/xyz?asd=dfg",
"b_url": "https://fooa.bar.com/x/env_a/xyz?asd=dfg",
"some other property": "blabla",
"more stuff": "yep",
"c_url": "https://blabla.com/somethingsomething?maybe=yes"
}
I can't figure out how to do this. I came up with this regex:
https://foo([a-z]?)\.bar\.com/x/(.+)/.+\"
In regex101 this matches my URLs and captures the groups that I'm seeking to replace, but I can't figure out how to do this with Python's regex.sub().

1. Using regex
import regex
url = 'https://foo.bar.com/x/env_t/xyz?asd=dfg'
new_url = regex.sub(
'(env_t)',
'env_a',
regex.sub('(foo)', 'fooa', url)
)
print(new_url)
output:
https://fooa.bar.com/x/env_a/xyz?asd=dfg
2. Using str.replace
with open('./your-json-file.json', 'r+') as f:
content = f.read()
new_content = content \
.replace('foo', 'fooa') \
.replace('env_t', 'env_a')
f.seek(0)
f.write(new_content)
f.truncate()

python/ruby regex for domain in text

I have text as result i need to find all domains on this text, domains with subdomain and without with https and without;
opt-i.mydomain.com 'www-oud.mydomain.com'
https\\u003a\\u002f\\u002fapi.noddos.com\\u002fabout\\u002fen-us\\u002fsignin\\u002f"
<script type="text/javascript">
(function(){
var baseUrl = 'https\x3A\x2F\x2Fhubspot.mydomain.com';
var baseUrl = 'https\x3A\x2F\x2Fhub-spot.mydomain.com';
</script>
#=========================================================
define("modules/constants/env", [], function() {
return {"BATCH_THUMB_ENDPOINTS": [], "LIVE_TRANSCODE_SERVER": "showbox-tr.dropbox.com", "STATIC_CONTENT_HOST": "cfl.dropboxstatic.com", "NOTES_WEBSERVER": "paper.dropbox.com", "REDIRECT_SAFE_ORIGINS": ["www.dropbox.com", "dropbox.com", "api.dropboxapi.com", "api.dropbox.com", "linux.dropbox.com", "photos.dropbox.com", "carousel.dropbox.com", "client-web.dropbox.com", "services.pp.dropbox.com", "www.dropbox.com", "docsend.com", "paper.dropbox.com", "notes.dropbox.com", "test.composer.dropbox.com", "showcase.dropbox.com", "collections.dropbox.com", "embedded.hellosign.com", "help.dropbox.com", "help-stg.dropbox.com", "experience.dropbox.com", "learn.dropbox.com", "learn-stage.dropbox.com", "app.hellosign.com", "replay.dropbox.com"], "PROF_SHARING_WEBSERVER": "showcase.dropbox.com", "FUNCAPTCHA_SERVER": "dropboxcaptcha.com", "__esModule": true};
});
#========================================================
https://mydomain.co/path/2
https://api.mydomain.co/path/2
https://api-v1.mydomain.co/path/2
https://superdomain.com:443
https://api.superdomain.com:443
https\\u003a\\u002f\\u002fapi.noddos.com\\u002fabout\\u002fen-us\\u002fsignin\\u002f"
https\\u003a\\u002f\\u002fnoddos.com\\u002fabout\\u002fen-us\\u002fsignin\\u002f"
root.I13N_config.location = "https:\u002F\u002Flocation.com\u002Faccount\u002Fchallenge\u002Frecaptcha
root.I13N_config.location = "https:\u002F\u002Fapi.location.com\u002Faccount\u002Fchallenge\u002Frecaptcha
&scope=openid%20profile%20https%3A%2F%2Fapi.domain2.com%2Fv2%2FOfficeHome.All&response_mode=form_post&nonce
&scope=openid%20profile%20https%3A%2F%2Fdomain2.com%2Fv2%2FOfficeHome.All&response_mode=form_post&nonce
https%3a%2f%2fwww.anotherdomain.com%2fv2%2
i try this regex but its not capture all i need.
re.compile(
r'''((?<=x2[fF]|02[fF]|%2[fF])|(?<=//))(\w\.|\w[A-Za-z0-9-]{0,61}\w\.){1,3}[A-Za-z]{2,6}|(?<=["'])(\w\.|\w[A-Za-z0-9-]{0,61}\w\.){1,3}[A-Za-z]{2,6}(?=["'])''',
re.VERBOSE)
captured result by regex:
{'showbox-tr.dropbox.com', 'api.dropbox.com', 'api-v1.mydomain.co', 'www-oud.mydomain.com', 'cfl.dropboxstatic.com', 'hub-spot.mydomain.com', 'paper.dropbox.com', 'superdomain.com', 'api.superdomain.com', 'linux.dropbox.com', 'embedded.hellosign.com', 'api.location.com', 'api.dropboxapi.com', 'www.dropbox.com', 'location.com', 'api.domain2.com', 'dropbox.com', 'api.noddos.com', 'dropboxcaptcha.com', 'learn-stage.dropbox.com', 'test.composer.dropbox.com', 'help-stg.dropbox.com', 'replay.dropbox.com', 'domain2.com', 'hubspot.mydomain.com', 'learn.dropbox.com', 'help.dropbox.com', 'collections.dropbox.com', 'app.hellosign.com', 'api.mydomain.co', 'noddos.com', 'docsend.com', 'mydomain.co', 'notes.dropbox.com', 'photos.dropbox.com', 'client-web.dropbox.com', 'services.pp.dropbox.com', 'OfficeHome.All', 'showcase.dropbox.com', 'carousel.dropbox.com', 'experience.dropbox.com'}
expected results so in general its not capture "opt-i.mydomain.com":
{'opt-i.mydomain.com', 'hubspot.mydomain.com', 'embedded.hellosign.com', 'carousel.dropbox.com', 'api.dropboxapi.com', 'experience.dropbox.com', 'linux.dropbox.com', 'api.superdomain.com', 'noddos.com', 'showcase.dropbox.com', 'app.hellosign.com', 'www-oud.mydomain.com', 'showbox-tr.dropbox.com', 'help-stg.dropbox.com', 'api.domain2.com', 'notes.dropbox.com', 'paper.dropbox.com', 'services.pp.dropbox.com', 'collections.dropbox.com', 'learn.dropbox.com', 'location.com', 'api.location.com', 'docsend.com', 'api.dropbox.com', 'replay.dropbox.com', 'mydomain.co', 'hub-spot.mydomain.com', 'www.dropbox.com', 'learn-stage.dropbox.com', 'domain2.com', 'help.dropbox.com', 'api.mydomain.co', 'api-v1.mydomain.co', 'superdomain.com', 'dropboxcaptcha.com', 'api.noddos.com', 'dropbox.com', 'test.composer.dropbox.com', 'cfl.dropboxstatic.com', 'client-web.dropbox.com', 'opt-i.mydomain.com', 'photos.dropbox.com'}
i also test this regex it match all domains better than previus but problem is with doamin`s that has unicode example:
"https\\u003a\\u002f\\u002fapi.noddos.com" will capture "u002fapi.noddos.com" but we need "api.noddos.com"
re.compile(
r'''
([a-z0-9][a-z0-9\-]{0,61}[a-z0-9]\.)+[a-z0-9][a-z0-9\-]*[a-z0-9]
''', re.VERBOSE)

import re
regex = r"(?:[a-z0-9][a-z0-9\-]{0,61}[a-z0-9]\.)+[a-z0-9][a-z0-9\-]*[a-z0-9]"
data = """opt-i.mydomain.com 'www-oud.mydomain.com'
https\\u003a\\u002f\\u002fapi.noddos.com\\u002fabout\\u002fen-us\\u002fsignin\\u002f"
<script type="text/javascript">
(function(){
var baseUrl = 'https\x3A\x2F\x2Fhubspot.mydomain.com';
var baseUrl = 'https\x3A\x2F\x2Fhub-spot.mydomain.com';
</script>
#=========================================================
define("modules/constants/env", [], function() {
return {"BATCH_THUMB_ENDPOINTS": [], "LIVE_TRANSCODE_SERVER": "showbox-tr.dropbox.com", "STATIC_CONTENT_HOST": "cfl.dropboxstatic.com", "NOTES_WEBSERVER": "paper.dropbox.com", "REDIRECT_SAFE_ORIGINS": ["www.dropbox.com", "dropbox.com", "api.dropboxapi.com", "api.dropbox.com", "linux.dropbox.com", "photos.dropbox.com", "carousel.dropbox.com", "client-web.dropbox.com", "services.pp.dropbox.com", "www.dropbox.com", "docsend.com", "paper.dropbox.com", "notes.dropbox.com", "test.composer.dropbox.com", "showcase.dropbox.com", "collections.dropbox.com", "embedded.hellosign.com", "help.dropbox.com", "help-stg.dropbox.com", "experience.dropbox.com", "learn.dropbox.com", "learn-stage.dropbox.com", "app.hellosign.com", "replay.dropbox.com"], "PROF_SHARING_WEBSERVER": "showcase.dropbox.com", "FUNCAPTCHA_SERVER": "dropboxcaptcha.com", "__esModule": true};
});
#========================================================
https://mydomain.co/path/2
https://api.mydomain.co/path/2
https://api-v1.mydomain.co/path/2
https://superdomain.com:443
https://api.superdomain.com:443
https\\u003a\\u002f\\u002fapi.noddos.com\\u002fabout\\u002fen-us\\u002fsignin\\u002f"
https\\u003a\\u002f\\u002fnoddos.com\\u002fabout\\u002fen-us\\u002fsignin\\u002f"
root.I13N_config.location = "https:\u002F\u002Flocation.com\u002Faccount\u002Fchallenge\u002Frecaptcha
root.I13N_config.location = "https:\u002F\u002Fapi.location.com\u002Faccount\u002Fchallenge\u002Frecaptcha
&scope=openid%20profile%20https%3A%2F%2Fapi.domain2.com%2Fv2%2FOfficeHome.All&response_mode=form_post&nonce
&scope=openid%20profile%20https%3A%2F%2Fdomain2.com%2Fv2%2FOfficeHome.All&response_mode=form_post&nonce"""
data = re.sub(r"u[0-9a-z]{4}", "", data)
matches = re.findall(regex, data)
#output matches:
['opt-i.mydomain.com', 'www-oud.mydomain.com', 'api.noddos.com', 'ht.mydomain.com', 'hub-spot.mydomain.com', 'showbox-tr.dropbox.com', 'cfl.dropboxstatic.com', 'paper.dropbox.com', 'www.dropbox.com', 'dropbox.com', 'api.dropboxapi.com', 'api.dropbox.com', 'linux.dropbox.com', 'photos.dropbox.com', 'carousel.dropbox.com', 'client-web.dropbox.com', 'services.pp.dropbox.com', 'www.dropbox.com', 'docsend.com', 'paper.dropbox.com', 'notes.dropbox.com', 'test.composer.dropbox.com', 'showcase.dropbox.com', 'collections.dropbox.com', 'embedded.hellosign.com', 'help.dropbox.com', 'help-stg.dropbox.com', 'experience.dropbox.com', 'learn.dropbox.com', 'learn-stage.dropbox.com', 'app.hellosign.com', 'replay.dropbox.com', 'showcase.dropbox.com', 'dropboxcaptcha.com', 'mydomain.co', 'api.mydomain.co', 'api-v1.mydomain.co', 'somain.com', 'api.somain.com', 'api.noddos.com', 'noddos.com', 'config.location', 'location.com', 'config.location', 'api.location.com', 'api.domain2.com', 'domain2.com']

Python: How to get full match with RegEx

I'm trying to filter out a link from some java script. The java script part isin't relevant anymore because I transfromed it into a string (text).
Here is the script part:
<script>
setTimeout("location.href = 'https://airdownload.adobe.com/air/win/download/30.0/AdobeAIRInstaller.exe';", 2000);
$(function() {
$("#whats_new_panels").bxSlider({
controls: false,
auto: true,
pause: 15000
});
});
setTimeout(function(){
$("#download_messaging").hide();
$("#next_button").show();
}, 10000);
</script>
Here is what I do:
import re
def get_link_from_text(text):
text = text.replace('\n', '')
text = text.replace('\t', '')
text = re.sub(' +', ' ', text)
search_for = re.compile("href[ ]*=[ ]*'[^;]*")
debug = re.search(search_for, text)
return debug
What I want is the href link and I kind of get it, but for some reason only like this
<_sre.SRE_Match object; span=(30, 112), match="href = 'https://airdownload.adobe.com/air/win/dow>
and not like I want it to be
<_sre.SRE_Match object; span=(30, 112), match="href = 'https://airdownload.adobe.com/air/win/download/30.0/AdobeAIRInstaller.exe'">
So my question is how to get the full link and not only a part of it.
Might the problem be that re.search isin't returning longer strings? Because I tried altering the RegEx, I even tried matching the link 1 by 1, but it still returns only the part I called out earlier.

I've modified it slightly, but for me it returns the complete string you desire now.
import re
text = """
<script>
setTimeout("location.href = 'https://airdownload.adobe.com/air/win/download/30.0/AdobeAIRInstaller.exe';", 2000);
$(function() {
$("#whats_new_panels").bxSlider({
controls: false,
auto: true,
pause: 15000
});
});
setTimeout(function(){
$("#download_messaging").hide();
$("#next_button").show();
}, 10000);
</script>
"""
def get_link_from_text(text):
text = text.replace('\n', '')
text = text.replace('\t', '')
text = re.sub(' +', ' ', text)
search_for = re.compile("href[ ]*=[ ]*'[^;]*")
debug = search_for.findall(text)
print(debug)
get_link_from_text(text)
Output:
["href = 'https://airdownload.adobe.com/air/win/download/30.0/AdobeAIRInstaller.exe'"]

Dynamically double-quote "keys" in text to form valid JSON string in python

I'm working with text contained in JS variables on a webpage and extracting strings using regex, then turning it into JSON objects in python using json.loads().
The issue I'm having is the unquoted "keys". Right now, I'm doing a series of replacements (code below) to "" each key in each string, but what I want is to dynamically identify any unquoted keys before passing the string into json.loads().
Example 1 with no space after : character
json_data1 = '[{storeName:"testName",address:"12345 Road",address2:"Suite 500",city:"testCity",storeImage:"http://www.testLink.com",state:"testState",phone:"999-999-9999",lat:99.9999,lng:-99.9999}]'
Example 2 with space after : character
json_data2 = '[{storeName: "testName",address: "12345 Road",address2: "Suite 500",city: "testCity",storeImage: "http://www.testLink.com",state: "testState",phone: "999-999-9999",lat: 99.9999,lng: -99.9999}]'
Example 3 with space after ,: characters
json_data3 = '[{storeName: "testName", address: "12345 Road", address2: "Suite 500", city: "testCity", storeImage: "http://www.testLink.com", state: "testState", phone: "999-999-9999", lat: 99.9999, lng: -99.9999}]'
Example 4 with space after : character and newlines
json_data4 = '''[
{
storeName: "testName",
address: "12345 Road",
address2: "Suite 500",
city: "testCity",
storeImage: "http://www.testLink.com",
state: "testState",
phone: "999-999-9999",
lat: 99.9999, lng: -99.9999
}]'''
I need to create pattern that identifies which are keys and not random string values containing characters such as the string link in storeImage. In other words, I want to dynamically find keys and double-quote them to use json.loads() and return a valid JSON object.
I'm currently replacing each key in the text this way
content = re.sub('storeName:', '"storeName":', content)
content = re.sub('address:', '"address":', content)
content = re.sub('address2:', '"address2":', content)
content = re.sub('city:', '"city":', content)
content = re.sub('storeImage:', '"storeImage":', content)
content = re.sub('state:', '"state":', content)
content = re.sub('phone:', '"phone":', content)
content = re.sub('lat:', '"lat":', content)
content = re.sub('lng:', '"lng":', content)
Returned as string representing valid JSON
json_data = [{"storeName": "testName", "address": "12345 Road", "address2": "Suite 500", "city": "testCity", "storeImage": "http://www.testLink.com", "state": "testState", "phone": "999-999-9999", "lat": 99.9999, "lng": -99.9999}]
I'm sure there is a better way of doing this but I haven't been able to find or come up with a regex pattern to handle these. Any help is greatly appreciated!

Something like this should do the job: ([{,]\s*)([^"':]+)(\s*:)
Replace for: \1"\2"\3
Example: https://regex101.com/r/oV0udR/1

That repetition is of course unnecessary. You could put everything into a single regex:
content = re.sub(r"\b(storeName|address2?|city|storeImage|state|phone|lat|lng):", r'"\1":', content)
\1 contains the match within the first (in this case, only) set of parentheses, so "\1": surrounds it with quotes and adds back the colon.
Note the use of a word boundary anchor to make sure we match only those exact words.

Regex: (\w+)\s?:\s?("?[^",]+"?,?)
Regex demo
import re
text = 'storeName: "testName", '
text = re.sub('(\w+)\s?:\s?("?[^",]+"?,?)', "\"\g<1>\":\g<2>", text)
print(text)
Output: "storeName":"testName",

How to extract text from file (with in script tag) using Python or beautifulsoup

Could you please help me with this little thing. I am looking to extract lat and lng value from the below code in SCRIPT tag(not in Body) using Beautiful soup(Python) or python. I am new to Python and blog are recommending to use Beautiful soup for extracting.
I want these two values lat: 21.25335 , lng: 81.649445
I am using regular expression for this . My regular expresion "^l([a-t])(:) ([0-9])([^,]+)"
Check this link for Regular expression and html file -
http://regexr.com/3glde
I get those two value with this regular expression but i want only those lat and lng value (numeric part ) to be stored in variable .
Here below is my python code which I am using
import re
pattern = re.compile("^[l]([a-t])([a-t])(\:) ([0-9])([^,]+)")
for i, line in enumerate(open('C:\hile_text.html')):
for match in re.finditer(pattern, line):
print 'Found on line %s: %s' % (i+1, match.groups())
Output:
Found on line 3218: ('a', 't', ':', '2', '1.244791')
Found on line 3219: ('n', 'g', ':', '8', '1.643486')
I want only those numeric value as output like 21.25335,81.649445 and want to store these values in variables or else you can provide alternate code to this.
plzz soon help me out .Thanks in anticipation.
This is the script tag in html file .
<script type="text/javascript">
window.mapDivId = 'map0Div';
window.map0Div = {
lat: 21.25335,
lng: 81.649445,
zoom: null,
locId: 5897747,
geoId: 297595,
isAttraction: false,
isEatery: true,
isLodging: false,
isNeighborhood: false,
title: "Aman Age Roll & Chicken ",
homeIcon: true,
url: "/Restaurant_Review-g297595-d5897747-Reviews-Aman_Age_Roll_Chicken-Raipur_Raipur_District_Chhattisgarh.html",
minPins: [
['hotel', 20],
['restaurant', 20],
['attraction', 20],
['vacation_rental', 0] ],
units: 'km',
geoMap: false,
tabletFullSite: false,
reuseHoverDivs: false,
noSponsors: true };
ta.store('infobox_js', 'https://static.tacdn.com/js3/infobox-c-v21051733989b.js');
ta.store("ta.maps.apiKey", "");
(function() {
var onload = function() {
if (window.location.hash == "#MAPVIEW") {
ta.run("ta.mapsv2.Factory.handleHashLocation", {}, true);
}
}
if (window.addEventListener) {
if (window.history && window.history.pushState) {
window.addEventListener("popstate", function(e) {
ta.run("ta.mapsv2.Factory.handleHashLocation", {}, false);
}, false);
}
window.addEventListener('load', onload, false);
}
else if (window.attachEvent) {
window.attachEvent('onload', onload);
}
})();
ta.store("mapsv2.show_sidebar", true);
ta.store('mapsv2_restaurant_reservation_js', ["https://static.tacdn.com/js3/ta-mapsv2-restaurant-reservation-c-v2430632369b.js"]);
ta.store('mapsv2.typeahead_css', "https://static.tacdn.com/css2/maps_typeahead-v21940478230b.css");
// Feature gate VR price pins on SRP map. VRC-14803
ta.store('mapsv2.vr_srp_map_price_enabled', true);
ta.store('mapsv2.geoName', 'Raipur');
ta.store('mapsv2.map_addressnotfound', "Address not found"); ta.store('mapsv2.map_addressnotfound3', "We couldn\'t find that location near {0}. Please try another search."); ta.store('mapsv2.directions', "Directions from {0} to {1}"); ta.store('mapsv2.enter_dates', "Enter dates for best prices"); ta.store('mapsv2.best_prices', "Best prices for your stay"); ta.store('mapsv2.list_accom', "List of accommodations"); ta.store('mapsv2.list_hotels', "List of hotels"); ta.store('mapsv2.list_vrs', "List of holiday rentals"); ta.store('mapsv2.more_accom', "More accommodations"); ta.store('mapsv2.more_hotels', "More hotels"); ta.store('mapsv2.more_vrs', "More Holiday Homes"); ta.store('mapsv2.sold_out_on_1', "SOLD OUT on 1 site"); ta.store('mapsv2.sold_out_on_y', "SOLD OUT on 2 sites"); </script>

Your regular expression is a bit messed up.
^l says you are trying to match an 'l' that is the very first character on a line.
^\s+(l[an][gt])(:\s+)(\d+\.\d+) would work better. Checkout an regerx analyzer tool such as http://www.myezapp.com/apps/dev/regexp/show.ws to get a breakdown of what is happening.
Here is a breakdown
Sequence: match all of the followings in order
BeginOfLine
Repeat
WhiteSpaceCharacter
one or more times
CapturingGroup
GroupNumber:1
Sequence: match all of the followings in order
l
AnyCharIn[ a n]
AnyCharIn[ g t]
CapturingGroup
GroupNumber:2
Sequence: match all of the followings in order
:
Repeat
WhiteSpaceCharacter
one or more times
CapturingGroup
GroupNumber:3
Sequence: match all of the followings in order
Repeat
Digit
one or more times
.
Repeat
Digit
one or more times

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

re.compile regex assistance (python, beautifulsoup) - python

As per commented above , correct your typo and you use this r"property\W+({.?})" RegexDemo property : look for exact string \W+ : matches any non-word character ({.?}) : capture group one .* matches any character inside braces {} ? matches as few times as possible

Try this, It may be long but work's fine \"property\"\:\s(\{((?:\"\w+\"\:\s\"?[\w\s]+\"?\,?\s?)+?)\}) https://regex101.com/r/7KzzRV/3

Related

How to replace URL parts with regex in Python?

python/ruby regex for domain in text

Python: How to get full match with RegEx

Dynamically double-quote "keys" in text to form valid JSON string in python

How to extract text from file (with in script tag) using Python or beautifulsoup

Categories

Resources

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

re.compile regex assistance (python, beautifulsoup) - python

As per commented above , correct your typo and you use this r"property\W+({.*?})" RegexDemo property : look for exact string \W+ : matches any non-word character ({.*?}) : capture group one .* matches any character inside braces {} ? matches as few times as possible

Try this, It may be long but work's fine \"property\"\:\s*(\{((?:\"\w+\"\:\s*\"?[\w\s]+\"?\,?\s?)+?)\}) https://regex101.com/r/7KzzRV/3

Related

How to replace URL parts with regex in Python?

python/ruby regex for domain in text

Python: How to get full match with RegEx

Dynamically double-quote "keys" in text to form valid JSON string in python

How to extract text from file (with in script tag) using Python or beautifulsoup

Categories

Resources

As per commented above , correct your typo and you use this r"property\W+({.?})" RegexDemo property : look for exact string \W+ : matches any non-word character ({.?}) : capture group one .* matches any character inside braces {} ? matches as few times as possible

Try this, It may be long but work's fine \"property\"\:\s(\{((?:\"\w+\"\:\s\"?[\w\s]+\"?\,?\s?)+?)\}) https://regex101.com/r/7KzzRV/3