Python regular expression problem

I need to do something with a regex, but I'm really not good at it and haven't used one in a long time.
/a/c/a.doc
I need to change it to
\\a\\c\\a.doc
Please try to do it using a regular expression in Python.

I'm entirely in favor of helping user483144 distinguish "solution" from "regular expression", as the previous two answerers have already done. It occurs to me, moreover, that os.path.normpath() http://docs.python.org/library/os.path.html might be what he's really after.
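A minimal sketch of the normpath suggestion, assuming the goal is a Windows-style path. os.path.normpath only converts forward slashes when Python is running on Windows, so this sketch uses ntpath (the Windows flavour of os.path) to get the same behaviour on any platform. The variable name is just for illustration.

```python
import ntpath  # the Windows implementation of os.path, importable everywhere

# normpath replaces forward slashes with backslashes (single ones)
converted = ntpath.normpath('/a/c/a.doc')
print(converted)        # prints \a\c\a.doc
print(repr(converted))  # the repr shows each backslash doubled
```

Note that this yields single backslashes; the doubled form only appears in the repr.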

Why do you think every solution to your problem needs a regular expression?
>>> s="/a/c/a.doc"
>>> '\\'.join(s.split("/"))
'\\a\\c\\a.doc'
By the way, if you are going to change path separators, you may just as well use os.path.join, e.g.
mypath = os.path.join("C:\\","dir","dir1")
Python will choose the correct slash for you. Also, check out os.sep if you are interested.

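You can do it without regular expressions:
x = '/a/c/a.doc'
x = x.replace('/', r'\\')  # r'\\' is two literal backslashes; use '\\' for one
But if you really want to use re:
import re
x = re.sub('/', r'\\', x)  # in a re.sub replacement, r'\\' produces ONE backslash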

Does \\ mean "\\" or r"\\"?
re.sub(r'/', r'\\', 'a/b/c')
Always use r'...' when you write a regular expression.
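To make the "\\" versus r"\\" question concrete, here is a small sketch (variable names are only illustrative). It shows that the same r'\\' argument gives different results in str.replace and in re.sub, because re.sub processes backslash escapes in the replacement string.

```python
import re

path = '/a/c/a.doc'

# str.replace inserts the replacement verbatim: two backslashes per slash
via_replace = path.replace('/', r'\\')

# re.sub treats r'\\' in the replacement as an escaped, single backslash
via_sub = re.sub(r'/', r'\\', path)

# to get two backslashes out of re.sub, escape each of them
via_sub_double = re.sub(r'/', r'\\\\', path)

print(via_replace)     # \\a\\c\\a.doc
print(via_sub)         # \a\c\a.doc
print(via_sub_double)  # \\a\\c\\a.doc
```

Which of the two outputs is wanted depends on whether the doubled backslashes in the question are literal or just how Python displays a single backslash.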

'\\\\'.join(r'/a/c/a.doc'.split("/"))  # '\\\\' is two literal backslashes; use '\\' for one

Related

Python re.sub with regex

I need help with a regex inside re.sub. In this case I am replacing the match with nothing ("").
My current code:
file_list = ['F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip']
foldernames = [re.sub('(\d{4})_All.zip', '', i) for i in file_list]
The Result I am trying to achieve is:
foldernames = ['F_5500_SF_PART7','F_5500_SF','F_5500','F_SCH_A_PART1']
I think part of the complexity is the fact that there is already regex in my file_list. Hoping someone smarter could help.

You don't need a regular expression, you're removing fixed strings. So you can just use the str.replace() method.
foldernames = [i.replace('_[0-9][0-9][0-9][0-9]_All.zip', '').replace('_[0-9][0-9][0-9][0-9]_all.zip', '') for i in file_list]
The two calls to replace() are needed to handle both All and all. Or if the rest of the filename is always uppercase, you could use:
foldernames = [i.upper().replace('_[0-9][0-9][0-9][0-9]_ALL.ZIP', '') for i in file_list]

Barmar's answer is the most appropriate for your problem. But if you actually need to use a regex (let's say not all the files have the same fixed "[0-9][0-9][0-9][0-9]" string), then you can use:
r'_(\[[-\d]*\]){4}_[aA]ll\.zip'
(the [aA]ll at the end is for matching the lower-case "all" in your first case; the dot is escaped so that it only matches a literal ".")
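A short sketch of how that pattern could be plugged into the list comprehension from the question. It assumes the file names really contain the literal text "[0-9]" four times, as in the question's file_list, and anchors the pattern at the end of the name as an extra precaution.

```python
import re

file_list = ['F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
             'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
             'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
             'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip']

# \[ and \] match literal brackets; [-\d]* matches the "0-9" inside them
suffix = re.compile(r'_(\[[-\d]*\]){4}_[aA]ll\.zip$')

foldernames = [suffix.sub('', name) for name in file_list]
print(foldernames)
# ['F_5500_SF_PART7', 'F_5500_SF', 'F_5500', 'F_SCH_A_PART1']
```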

Python split with regular expression to divide string

I have a need to recover 2 results of a regular expression in Python: what is searched and all else.
For example, in:
"boofums",3,4
I'd like to find what is in the quotes and what isn't:
boofums
,3,4
What I have so far is:
bobbles = '"boofums",3,4'
pickles = re.split(r'\".*\"', bobbles)
morton = re.match(r'\".*\"', bobbles)
print(pickles[1])
print(morton[0])
,3,4
"boofums"
This seems to me insanely inefficient and not Python-esque. Is there a better way to do this? (Sorry for the "is there a better way" construct on StackOverflow, but... I need to do this better! 😂)
...and if you can help me extract just what's in the quotes, something that I'd easily do in Perl or Ruby, all the better!

You're probably best off with regex groupings:
So for your example I'd use something like
regex = re.compile("\"(.*)\"(.*)")
bobble_groups = regex.match(bobbles)
You can then use bobble_groups.group(1) to get just the text inside the quotation marks, and bobble_groups.group(2) to get everything after them.
See named groups if you don't want to depend on an index number.

a, b = re.match('"(.*)"(.*)', bobbles).groups()
Parentheses define groups that are "saved" to the match object.
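For the named-groups variant mentioned above, a minimal sketch might look like this. The group names quoted and rest are made up for the example, and [^"]* is used instead of .* so the first group stops at the first closing quote.

```python
import re

bobbles = '"boofums",3,4'

# (?P<name>...) gives a group a name in addition to its index
m = re.match(r'"(?P<quoted>[^"]*)"(?P<rest>.*)', bobbles)

print(m.group('quoted'))  # boofums
print(m.group('rest'))    # ,3,4
```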

De-greedifying a regular expression in python

I'm trying to write a regular expression that will convert a full path filename to a short filename for a given filetype, minus the file extension.
For example, I'm trying to get just the name of the .bar file from a string using
re.search('/(.*?)\.bar$', '/def_params/param_1M56/param/foo.bar')
According to the Python re docs, *? is the ungreedy version of *, so I was expecting to get
'foo'
returned for match.group(1) but instead I got
'def_params/param_1M56/param/foo'
What am I missing here about greediness?

What you're missing isn't so much about greediness as about regular expression engines: they work from left to right, so the / matches as early as possible and the .*? is then forced to work from there. In this case, the best regex doesn't involve greediness at all (you need backtracking for that to work; it will, but could take a really long time to run if there are a lot of slashes), but a more explicit pattern:
'/([^/]*)\.bar$'
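The difference can be checked directly. This sketch runs the question's lazy pattern and the explicit pattern on the same path; the variable names are only for illustration.

```python
import re

path = '/def_params/param_1M56/param/foo.bar'

# The engine tries the leftmost '/' first; the lazy .*? then grows
# until \.bar$ can match, so it swallows the intermediate slashes.
lazy = re.search(r'/(.*?)\.bar$', path)

# [^/]* cannot cross a slash, so only the last path component can match.
explicit = re.search(r'/([^/]*)\.bar$', path)

print(lazy.group(1))      # def_params/param_1M56/param/foo
print(explicit.group(1))  # foo
```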

I would suggest changing your regex so that it doesn't rely on greediness.
You want only the filename before the extension .bar and everything after the final /. This should do:
re.search(r'/([^/]*)\.bar$', '/def_params/param_1M56/param/foo.bar')
What this does is match /, then zero or more characters (as many as possible) that are not /, and then .bar. The parentheses capture the name, so match.group(1) returns it.

I don't claim to understand the non-greedy operators all that well, but a solution for that particular problem would be to use ([^/]*?)

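The regular expression engine scans from the left, so the first / it finds anchors the match. Put a greedy .* at the start, so that it consumes everything up to the last /, and it should work.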

I like regex but there is no need of one here.
path = '/def_params/param_1M56/param/foo.bar'
print(path.rsplit('/',1)[1].rsplit('.')[0])
path = '/def_params/param_1M56/param/fululu'
print(path.rsplit('/',1)[1].rsplit('.')[0])
path = '/def_params/param_1M56/param/one.before.two.dat'
print(path.rsplit('/',1)[1].rsplit('.',1)[0])
result
foo
fululu
one.before.two

Other people have answered the regex question, but in this case there's a more efficient way than regex:
file_name = path[path.rindex('/')+1 : path.rindex('.')]
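Another regex-free option, sketched here as an alternative rather than as what the answers above proposed, is the standard library's path helpers. posixpath is used so that "/" is treated as the separator on every platform; short_name is a hypothetical helper name. Unlike the rindex version, this does not raise ValueError when the file name has no dot.

```python
import posixpath


def short_name(path):
    """Return the last path component without its final extension."""
    base = posixpath.basename(path)            # 'foo.bar'
    stem, _ext = posixpath.splitext(base)      # ('foo', '.bar')
    return stem


print(short_name('/def_params/param_1M56/param/foo.bar'))             # foo
print(short_name('/def_params/param_1M56/param/fululu'))              # fululu
print(short_name('/def_params/param_1M56/param/one.before.two.dat'))  # one.before.two
```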

Try this one on for size:
match = re.search(r'.*/(.*?)\.bar$', '/def_params/param_1M56/param/foo.bar')

Splitting an expression

I have to split a string into a list of substrings, such that every parenthesized subexpression is split out.
Let's say I have (9+2-(3*(4+2))); then I should get (4+2), (3*6) and (9+2-18).
The basic objective is to learn which of the inner parentheses will be evaluated first and then evaluate it.
Please help....
It would be helpful if you could suggest a method using the re module. Just so this is clear to everyone: it is not homework, and I understand Polish notation. What I am looking for is a way to use the power of Python and the re module to do it in fewer lines of code.
Thanks a lot....

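eval is insecure, so you have to check the input string for dangerous things.
>>> import re
>>> e = "(9+2-(3*(4+2)))"
>>> while '(' in e:
...     # find an innermost group: parentheses with no parentheses inside
...     inner = re.search(r'(\([^()]+\))', e).group(1)
...     # 'str' + inner is e.g. 'str(4+2)', which evaluates to '6'
...     e = re.sub(re.escape(inner), eval('str' + inner), e)
...     print(inner, end=' ')
...
(4+2) (3*6) (9+2-18)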

Try something like this:
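import re
a = "(9+2-(3*(4+2)))"
s, r = a, re.compile(r'\([^(]*?\)')
while '(' in s:
    g = r.search(s).group(0)  # first innermost parenthesized group
    # count=1 replaces only the group just evaluated, not every match
    s = r.sub(str(eval(g)), s, count=1)
    print(g)
print(s)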
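Since eval is risky on untrusted input, here is a sketch of the same innermost-first loop that avoids it. It is an alternative to the answers above, not their method: a small evaluator built on the ast module that accepts only numbers and + - * /. The names safe_arith and reduction_steps are hypothetical.

```python
import ast
import operator
import re

# Only these binary operators are accepted
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def safe_arith(expr):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression: %r" % expr)
    return walk(ast.parse(expr, mode='eval'))


def reduction_steps(expr):
    """Return (groups in evaluation order, fully reduced expression)."""
    innermost = re.compile(r'\([^()]+\)')  # parentheses with none inside
    steps = []
    while True:
        m = innermost.search(expr)
        if m is None:
            break
        steps.append(m.group(0))
        value = safe_arith(m.group(0))
        # splice the value in place of exactly this group
        expr = expr[:m.start()] + str(value) + expr[m.end():]
    return steps, expr


steps, result = reduction_steps("(9+2-(3*(4+2)))")
print(steps)   # ['(4+2)', '(3*6)', '(9+2-18)']
print(result)  # -7
```

Splicing by match position, instead of calling sub on the whole string, keeps other identical-looking groups untouched until their own turn.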

This sounds very homeworkish so I am going to reply with some good reading that might lead you down the right path. Take a peek at http://en.wikipedia.org/wiki/Polish_notation. It's not exactly what you want but understanding will lead you pretty close to the answer.

I don't know exactly what you want to do, but if you want to add other operations and have more control over the expression, I suggest you use a parser.
http://www.dabeaz.com/ply/ <-- ply, for example

Replace in Python-* equivalent?

If I am finding and replacing some text, how can I get it to replace text that changes each day, i.e. anything between (( and )), whatever it is?
Cheers!

Use regular expressions (http://docs.python.org/library/re.html)?
Could you please be more specific, I don't think I fully understand what you are trying to accomplish.
EDIT:
Ok, now I see. This may be done even easier, but here goes:
>>> import re
>>> s = "foo(bar)whatever"
>>> r = re.compile(r"(\()(.+?)(\))")
>>> r.sub(r"\1baz\3",s)
'foo(baz)whatever'
For multiple levels of parentheses this will not work, or rather it WILL work, but will do something you probably don't want it to do.
Oh hey, as a bonus here's the same regular expression, only now it will replace the string in the innermost parentheses:
r1 = re.compile(r"(\()([^()]+?)(\))")
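For the double-parenthesis markers in the question, a minimal sketch could look like the following. The sample text and the replacement value are invented for the example, and it assumes the markers are never nested.

```python
import re

text = "Report for ((2010-10-22)) is ready"

# \(\( and \)\) match the literal markers; .*? matches whatever is between
updated = re.sub(r"\(\(.*?\)\)", "((today))", text)

print(updated)  # Report for ((today)) is ready
```

If the replacement text could contain backslashes, passing a function such as lambda m: new_text as the second argument avoids escape processing.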
