When finding and replacing text, how can I replace a piece of text that changes each day, i.e. anything between (( and )), whatever it is?
Cheers!
Use regular expressions (http://docs.python.org/library/re.html)?
Could you please be more specific, I don't think I fully understand what you are trying to accomplish.
EDIT:
Ok, now I see. This may be done even easier, but here goes:
>>> import re
>>> s = "foo(bar)whatever"
>>> r = re.compile(r"(\()(.+?)(\))")
>>> r.sub(r"\1baz\3",s)
'foo(baz)whatever'
For multiple levels of parentheses this will not work, or rather it WILL work, but will do something you probably don't want it to do.
Oh hey, as a bonus here's the same regular expression, only now it will replace the string in the innermost parentheses:
r1 = re.compile(r"(\()([^()]+?)(\))")
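The original question asked about text between (( and )) rather than single parentheses. A minimal sketch of that case follows, assuming the markers are never nested; the names DOUBLE_PAREN, replace_marked and replace_inside are made up for illustration.

```python
import re

# "((", then as little text as possible, then "))".
# re.DOTALL lets the marked text span several lines.
DOUBLE_PAREN = re.compile(r"\(\(.*?\)\)", re.DOTALL)

def replace_marked(text, replacement):
    """Replace every ((...)) span, delimiters included."""
    # Passing a function avoids backslash processing of the replacement text.
    return DOUBLE_PAREN.sub(lambda m: replacement, text)

def replace_inside(text, replacement):
    """Replace only the text between (( and )), keeping the delimiters."""
    return DOUBLE_PAREN.sub(lambda m: "((" + replacement + "))", text)

sample = "Report for ((2010-05-01)) is ready"
print(replace_marked(sample, "today"))
print(replace_inside(sample, "today"))
```

The non-greedy .*? matters here: with a greedy .* two marked spans on one line would be swallowed as a single match.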
I've recently decided to jump into the deep end of the Python pool and start converting some of my R code over to Python and I'm stuck on something that is very important to me. In my line of work, I spend a lot of time parsing text data, which, as we all know, is very unstructured. As a result, I've come to rely on the lookaround feature of regex and R's lookaround functionality is quite robust. For example, if I'm parsing a PDF that might introduce some spaces in between letters when I OCR the file, I'd get to the value I want with something like this:
oAcctNum <- str_extract(textBlock[indexVal], "(?<=ORIG\\s?:\\s?/\\s?)[A-Z0-9]+")
In Python, this isn't possible because the use of ? makes the lookbehind a variable-width expression as opposed to a fixed-width. This functionality is important enough to me that it deters me from wanting to use Python, but instead of giving up on the language I'd like to know the Pythonista way of addressing this issue. Would I have to preprocess the string before extracting the text? Something like this:
cleaned = re.sub(r"(?<=\b\w)\s(?=\w\b)", "", textBlock[indexVal])
oAcctNum = re.search(r"(?<=ORIG:/)([A-Z0-9]+)", cleaned).group(1)
Is there a more efficient way to do this? Because while this example was trivial, this issue comes up in very complex ways with the data I work with and I'd hate to have to do this kind of preprocessing for every line of text I analyze.
Lastly, I apologize if this is not the right place to ask this question; I wasn't sure where else to post it. Thanks in advance.
Notice that if you can use groups, you generally do not need lookbehinds. So how about
match = re.search(r"ORIG\s?:\s?/\s?([A-Z0-9]+)", string)
if match:
    text = match.group(1)
In practice:
>>> string = 'ORIG : / AB123'
>>> match = re.search(r"ORIG\s?:\s?/\s?([A-Z0-9]+)", string)
>>> match
<_sre.SRE_Match object; span=(0, 14), match='ORIG : / AB123'>
>>> match.group(1)
'AB123'
In the case you described, you need to use a capture group:
"(?<=ORIG\\s?:\\s?/\\s?)[A-Z0-9]+"
will become
r"ORIG\s?:\s?/\s?([A-Z0-9]+)"
The value will be in .group(1). Note that raw strings are preferred.
Here is some sample code:
import re
# re.IGNORECASE lets [A-Z0-9]+ match the lowercase test string as well
p = re.compile(r'ORIG\s?:\s?/\s?([A-Z0-9]+)', re.IGNORECASE)
test_str = "ORIG:/texthere"
print(p.search(test_str).group(1))
Unless you need overlapping matches, using a capturing group instead of a look-behind is rather straightforward.
print(re.findall(r"ORIG\s?:\s?/\s?([A-Z0-9]+)", test_str))
You can use findall directly; when the regex contains groups, it returns the groups rather than the whole matches.
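As a rough sketch of how the capture-group approach from these answers might be wrapped for reuse, the helpers below return either every account number or just the first one. The names ORIG_ACCOUNT, extract_accounts and first_account are invented for this example, and the sample text is made up.

```python
import re

# Optional whitespace is matched outside the group, so no lookbehind is needed.
ORIG_ACCOUNT = re.compile(r"ORIG\s?:\s?/\s?([A-Z0-9]+)")

def extract_accounts(text):
    """Return every account number that follows an ORIG:/ marker."""
    # With exactly one group, findall returns the group contents as strings.
    return ORIG_ACCOUNT.findall(text)

def first_account(text):
    """Return the first account number, or None when there is no match."""
    m = ORIG_ACCOUNT.search(text)
    return m.group(1) if m else None

ocr_text = "ORIG : / AB123 and ORIG:/XY9"
print(extract_accounts(ocr_text))
print(first_account("no marker here"))
```

Returning None instead of calling .group(1) on a failed search avoids the AttributeError that the one-line versions raise when the marker is missing.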
I have a need to recover 2 results of a regular expression in Python: what is searched and all else.
For example, in:
"boofums",3,4
I'd like to find what is in the quotes and what isn't:
boofums
,3,4
What I have so far is:
bobbles = '"boofums",3,4'
pickles = re.split(r'\".*\"', bobbles)
morton = re.match(r'\".*\"', bobbles)
print(pickles[1])
print(morton[0])
,3,4
"boofums"
This seems to me insanely inefficient and not Python-esque. Is there a better way to do this? (Sorry for the "is there a better way" construct on StackOverflow, but... I need to do this better! 😂)
...and if you can help me extract just what's in the quotes, something that I'd easily do in Perl or Ruby, all the better!
You're probably best off with regex groupings:
So for your example I'd use something like
regex = re.compile(r'"(.*)"(.*)')
bobble_groups = regex.match(bobbles)
You can then use bobble_groups.group(1) to get just the text inside the quotation marks, and bobble_groups.group(2) to get everything after them.
See named groups if you don't want to depend on an index number.
a, b = re.match('"(.*)"(.*)', bobbles).groups()
Parentheses define groups that are "saved" to the match object.
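For the named groups mentioned above, a minimal sketch could look like this. The group names quoted and rest and the helper split_quoted are arbitrary choices for the example, and [^"]* is used instead of .* on the assumption that the quoted part contains no embedded quotes.

```python
import re

# (?P<name>...) gives each group a name that can be used instead of an index.
QUOTED_THEN_REST = re.compile(r'"(?P<quoted>[^"]*)"(?P<rest>.*)')

def split_quoted(line):
    """Return (text inside the quotes, everything after), or None."""
    m = QUOTED_THEN_REST.match(line)
    if m is None:
        return None
    return m.group("quoted"), m.group("rest")

print(split_quoted('"boofums",3,4'))
```

One search yields both pieces, so the separate re.split and re.match calls from the question are no longer needed.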
I currently have some code that goes to a URL, fetches the source code, and I'm trying to get it to return a variable from the string. So I created:
changetime = refreshsource.find('VARIABLE pm NST')
But it wouldn't find the area in the string because the word is not VARIABLE, it is something else. How would I retrieve the constantly changing VARIABLE from that string?
A regular expression will be able to achieve this for you. If you give some examples of what the variable will be, we could come up with a stricter expression. To match what you have above, something like the following will do:
import re
# this will match 01:23, 11:34, 12:00, etc.
timex = re.compile('.*(\d{2}:\d{2})[ ]?pm NST')
match = timex.match(text, re.M|re.S)
variable = match.groups(0)
Edit: this code will actually work (unlike that first attempt :) ):
import re
# this will match 01:23, 11:34, 12:00, etc.
timex = re.compile(r'(\d{2}:\d{2})[ ]?pm NST')
match = timex.search(text)
if match:
    variable = match.group(1)
If the pattern is really that simple, then this seems a typical case where regular expressions come in quite handy.
Note: if you are new to regular expressions, you may want to read an introduction, such as http://www.regular-expressions.info.
On the other hand, if the pattern is more complex, then you may want to use an HTML parser, like for instance BeautifulSoup.
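To illustrate why the first attempt above fails and the second works, here is a small sketch. The page string is invented, and TIME_BEFORE_NST and find_time are hypothetical names. Two things differ: match() only matches at the start of the string, and on a compiled pattern the second positional argument of match() is a start position, not flags.

```python
import re

TIME_BEFORE_NST = re.compile(r"(\d{2}:\d{2}) ?pm NST")

def find_time(page_source):
    """Return the hh:mm text that precedes 'pm NST', or None."""
    # search() scans the whole string; match() would only try position 0.
    m = TIME_BEFORE_NST.search(page_source)
    return m.group(1) if m else None

page = "<b>Clock:</b> 11:34 pm NST"  # made-up sample of fetched source
print(TIME_BEFORE_NST.match(page))   # None: the page does not start with a time
print(find_time(page))
```

Flags such as re.M or re.S belong in the re.compile() call, where they are interpreted as flags.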
I have to split a string into a list of substrings, one for each parenthesized sub-expression.
Let's say I have (9+2-(3*(4+2))); then I should get (4+2), (3*6) and (9+2-18).
The basic objective is to learn which of the inner parentheses is going to be evaluated first and then evaluate it.
Please help....
It would be helpful if you could suggest a method using the re module. Just so it is clear to everyone: this is not homework, and I understand Polish notation. What I am looking for is a way to use the power of Python and the re module to do this in fewer lines of code.
Thanks a lot....
eval is insecure, so you have to check the input string for dangerous content first.
>>> import re
>>> e = "(9+2-(3*(4+2)))"
>>> while '(' in e:
...     inner = re.search(r'(\([^()]+\))', e).group(1)
...     e = re.sub(re.escape(inner), eval('str' + inner), e)
...     print(inner, end=' ')
...
(4+2) (3*6) (9+2-18)
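One possible way to avoid eval altogether is to evaluate each innermost group with a small whitelist-based walker over the ast module. This is only a sketch under the assumption that the input contains nothing but numbers and + - * /; safe_arith, reduce_steps and INNERMOST are names invented here.

```python
import ast
import operator
import re

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_arith(expr):
    """Evaluate numbers combined with + - * / and reject everything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression: %r" % expr)
    return walk(ast.parse(expr, mode="eval"))

# A parenthesized group that contains no further parentheses.
INNERMOST = re.compile(r"\([^()]+\)")

def reduce_steps(expr):
    """Return (groups in evaluation order, fully reduced expression)."""
    steps = []
    while True:
        m = INNERMOST.search(expr)
        if m is None:
            break
        steps.append(m.group(0))
        value = safe_arith(m.group(0))
        # Splice the value in by position so only this one group is replaced.
        expr = expr[:m.start()] + str(value) + expr[m.end():]
    return steps, expr

print(reduce_steps("(9+2-(3*(4+2)))"))
```

Names, function calls and attribute access all hit the ValueError branch, so input such as __import__('os') is rejected instead of executed.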
Try something like this:
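import re
a = "(9+2-(3*(4+2)))"
# match a group with no further '(' inside, i.e. an innermost one
s, r = a, re.compile(r'\([^(]*?\)')
while '(' in s:
    g = r.search(s).group(0)
    # count=1 replaces only the group just evaluated, not every match
    s = r.sub(str(eval(g)), s, count=1)
    print(g)
    print(s)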
This sounds very homeworkish, so I am going to reply with some good reading that might lead you down the right path. Take a peek at http://en.wikipedia.org/wiki/Polish_notation. It's not exactly what you want, but understanding it will lead you pretty close to the answer.
I don't know exactly what you want to do, but if you want to add other operations and have more control over the expression, I suggest you use a parser.
http://www.dabeaz.com/ply/ <-- ply, for example
I need to do something with a regex, but I'm really not good at it; I haven't used one in a long time.
/a/c/a.doc
I need to change it to
\\a\\c\\a.doc
Please try to do it using a regular expression in Python.
I'm entirely in favor of helping user483144 distinguish "solution" from "regular expression", as the previous two answerers have already done. It occurs to me, moreover, that os.path.normpath() http://docs.python.org/library/os.path.html might be what he's really after.
Why do you think every solution to your problem needs a regular expression?
>>> s="/a/c/a.doc"
>>> '\\'.join(s.split("/"))
'\\a\\c\\a.doc'
By the way, if you are going to change path separators, you may just as well use os.path.join, e.g.
import os
mypath = os.path.join("C:\\", "dir", "dir1")
Python will choose the correct separator for your platform. Also, check out os.sep if you are interested.
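Because os.path follows the platform the script runs on, the result of os.path.join or os.path.normpath depends on where you run it. As a sketch for trying out the Windows behaviour on any system, the standard library exposes the Windows flavour as ntpath and the POSIX flavour as posixpath; the variable names below are just for illustration.

```python
import ntpath      # the implementation behind os.path on Windows
import posixpath   # the implementation behind os.path on POSIX systems

# normpath under Windows rules turns forward slashes into backslashes.
windows_style = ntpath.normpath("/a/c/a.doc")
print(windows_style)

# join uses the separator of whichever flavour is called.
windows_joined = ntpath.join("C:\\", "dir", "dir1")
posix_joined = posixpath.join("dir", "dir1")
print(windows_joined)
print(posix_joined)
```

Note that the result contains single backslashes; they only show up doubled in the repr of the string.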
You can do it without regular expressions:
x = '/a/c/a.doc'
x = x.replace('/',r'\\')
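But if you really want to use re (note that re.sub treats \\ in the replacement as one escaped backslash, so this inserts a single backslash where the str.replace call above inserts two):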
x = re.sub('/', r'\\', x )
Does \\ mean "\\" (a single backslash) or r"\\" (two backslashes)?
re.sub(r'/', r'\\', 'a/b/c')
Always use r'....' when you write a regular expression.
'\\\\'.join(r'/a/c/a.doc'.split("/"))
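The answers in this thread do not all produce the same string, because str.replace and re.sub interpret the replacement differently. A short sketch of the difference, with variable names chosen just for the example:

```python
import re

path = "/a/c/a.doc"

# str.replace inserts the replacement exactly as given.
single = path.replace("/", "\\")        # one backslash per slash
double = path.replace("/", r"\\")       # two backslashes per slash

# re.sub processes backslash escapes in the replacement,
# so r"\\" there stands for a single literal backslash.
via_re = re.sub("/", r"\\", path)             # one backslash per slash
via_re_double = re.sub("/", r"\\\\", path)    # two backslashes per slash

print(single)
print(double)
print(via_re == single, via_re_double == double)
```

Which variant is wanted depends on whether the target text should really contain doubled backslashes or whether the doubling in the question was just Python string-literal notation.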