De-greedifying a regular expression in python

I'm trying to write a regular expression that will convert a full path filename to a short filename for a given filetype, minus the file extension.
For example, I'm trying to get just the name of the .bar file from a string using
re.search('/(.*?)\.bar$', '/def_params/param_1M56/param/foo.bar')
According to the Python re docs, *? is the ungreedy version of *, so I was expecting to get
'foo'
returned for match.group(1) but instead I got
'def_params/param_1M56/param/foo'
What am I missing here about greediness?

What you're missing isn't so much about greediness as about how regular expression engines work: they scan from left to right, so the / matches at the earliest possible position and the .*? is then forced to extend from there until \.bar$ can match. In this case, the best regex doesn't rely on greediness at all (a greedy prefix would work through backtracking, but could take a really long time to run if there are a lot of slashes); use a more explicit pattern instead:
'/([^/]*)\.bar$'
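A minimal sketch of the difference, using the path from the question (the variable names are illustrative only):

```python
import re

path = '/def_params/param_1M56/param/foo.bar'

# The engine tries start positions from left to right, so the first '/'
# wins; the lazy .*? then has to stretch until '\.bar$' can match.
lazy = re.search(r'/(.*?)\.bar$', path)
print(lazy.group(1))      # def_params/param_1M56/param/foo

# Excluding '/' from the captured text forces the match to begin
# at the last slash.
explicit = re.search(r'/([^/]*)\.bar$', path)
print(explicit.group(1))  # foo
```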

I would suggest changing your regex so that it doesn't rely on greediness.
You want only the filename before the extension .bar and after the final /. This should do:
re.search(r'/([^/]*)\.bar$', '/def_params/param_1M56/param/foo.bar')
This matches /, then captures zero or more characters (as many as possible) that are not /, and then matches .bar at the end of the string.

I don't claim to understand the non-greedy operators all that well, but a solution for that particular problem would be to use ([^/]*?)

The regular expression engine starts from the left, so the first / in the string is the one that matches. Put a greedy .* at the start of the pattern and it should work.

I like regex but there is no need for one here.
path = '/def_params/param_1M56/param/foo.bar'
print(path.rsplit('/',1)[1].rsplit('.')[0])
path = '/def_params/param_1M56/param/fululu'
print(path.rsplit('/',1)[1].rsplit('.')[0])
path = '/def_params/param_1M56/param/one.before.two.dat'
print(path.rsplit('/',1)[1].rsplit('.',1)[0])
Result:
foo
fululu
one.before.two

Other people have answered the regex question, but in this case there's a more efficient way than regex:
file_name = path[path.rindex('/')+1 : path.rindex('.')]
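Note that str.rindex raises ValueError when the character is absent, so the slice above fails on a path with no dot at all. As a hedged alternative (a swapped-in technique, not what this answer proposes), the standard library's path helpers handle that case. The function name short_name is made up for illustration, and posixpath is used so that '/' is treated as the separator on any platform:

```python
import posixpath

def short_name(path):
    """Return the last path component without its final extension."""
    base = posixpath.basename(path)         # text after the last '/'
    root, _ext = posixpath.splitext(base)   # split off the last '.ext', if any
    return root

print(short_name('/def_params/param_1M56/param/foo.bar'))             # foo
print(short_name('/def_params/param_1M56/param/one.before.two.dat'))  # one.before.two
print(short_name('/def_params/param_1M56/param/fululu'))              # fululu
```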

Try this one on for size:
match = re.search(r'.*/(.*?)\.bar$', '/def_params/param_1M56/param/foo.bar')
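With a pattern of this shape, it matters whether the dot before bar is escaped, because an unescaped . matches any character. A small sketch, where '/a/foobar' is a hypothetical input chosen to show the difference:

```python
import re

loose = r'.*/(.*?).bar$'    # '.' before 'bar' matches any character
strict = r'.*/(.*?)\.bar$'  # '\.' matches only a literal dot

name = '/a/foobar'          # hypothetical path with no '.bar' extension
loose_match = re.search(loose, name)
strict_match = re.search(strict, name)
print(loose_match.group(1))  # fo
print(strict_match)          # None

path = '/def_params/param_1M56/param/foo.bar'
print(re.search(strict, path).group(1))  # foo
```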

Related

python - the string after a forward slash can't be found by the re module

It seems a very simple problem, but this has taken me a few hours:
mystr='link/123'
pattern = re.compile(r'123')
print(pattern.match(mystr))
And the result is None.
As far as I know, '/' is just an ordinary character, so I have no idea why re does not work.

match will only match at the beginning of the string.
https://docs.python.org/3/library/re.html#re.regex.match
Use search instead.
https://docs.python.org/3/library/re.html#re.regex.search
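A short sketch of the difference, reusing the string and pattern from the question. The third call uses the optional pos argument of the compiled pattern's match method:

```python
import re

mystr = 'link/123'
pattern = re.compile(r'123')

# match only succeeds if the pattern matches at the starting position.
print(pattern.match(mystr))             # None

# search scans forward until it finds a position where the pattern matches.
print(pattern.search(mystr).group())    # 123

# match can be told to start at a given index instead of 0.
print(pattern.match(mystr, 5).group())  # 123
```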

regex match proc name without slash

I have a list of proc names on Linux. Some have slash, some don't. For example,
kworker/23:1
migration/39
qmgr
I need to extract just the proc name, without the slash and everything after it. I tried a few different ways but still can't get it completely right. What's wrong with my regex? Any help would be much appreciated.
>>> str='kworker/23:1'
>>> match=re.search(r'^(.+)\/*',str)
>>> match.group(1)
'kworker/23:1'

The problem with the regex is that the greedy .+ runs to the end of the string: everything after it is optional, so \/* is left to match nothing. To fix this, replace the . with a class that matches anything but a /.
([^\/]+)\/?.*
works. In case it is new to you, [^\/] matches any character except a slash, because the ^ at the start of the class inverts which characters are matched.
Alternatively, you can also use split as suggested by Moses Koledoye. split is often better for simple string manipulation, while regex enables you to perform very complex tasks with rather little code.

An alternative to regex is to split on slash and take the first item:
>>> s ='kworker/23:1'
>>> s.split('/')[0]
'kworker'
This also works when the string does not contain a slash:
>>> s = 'qmgr'
>>> s.split('/')[0]
'qmgr'
But if you're going to stick to re, I think re.sub is what you want, as you won't need to fetch the matching group:
>>> import re
>>> s ='kworker/23:1'
>>> re.sub(r'/.*$', '', s)
'kworker'
On a side note, assigning to the name str shadows the built-in str type, which you don't want.
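As a quick check, here is a sketch that runs two of these approaches over the three names from the question. It assumes every name is non-empty and does not begin with a slash:

```python
import re

names = ['kworker/23:1', 'migration/39', 'qmgr']

# Regex: take the run of non-slash characters at the start of the name.
by_regex = [re.match(r'[^/]+', n).group() for n in names]

# String method: partition returns the text before the first '/',
# or the whole string when there is no '/'.
by_partition = [n.partition('/')[0] for n in names]

print(by_regex)      # ['kworker', 'migration', 'qmgr']
print(by_partition)  # ['kworker', 'migration', 'qmgr']
```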

Find string in possibly multiple parentheses?

I am looking for a regular expression that discriminates between a string that contains a numerical value enclosed between parentheses, and a string that contains one outside of them. The problem is, parentheses may be nested:
So, for example the expression should match the following strings:
hey(example1)
also(this(onetoo2(hard)))
but(here(is(a(harder)one)maybe23)Hehe)
But it should not match any of the following:
this(one)is22misleading
how(to(go)on)with(multiple)3parent(heses(around))
So far I've tried
\d[A-Za-z] \)
and simple things like that. The problem with this one is that it does not match example 2, because the number is followed by a ( rather than a ).
How could I solve this one?

The problem is not one of pattern matching. That means regular expressions are not the right tool for this.
Instead, you need lexical analysis and parsing. There are many libraries available for that job.
You might try the parsing or pyparsing libraries.
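For this particular check a full parser may be more than is needed. A minimal sketch, assuming the requirement is "at least one digit occurs inside some pair of parentheses", is to track the nesting depth while scanning. The function name has_digit_in_parens is hypothetical, and a string with digits both inside and outside parentheses would count as a match here, which may or may not be what the asker wants:

```python
def has_digit_in_parens(text):
    """Return True if some digit occurs inside at least one pair of parentheses."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth = max(depth - 1, 0)  # tolerate a stray ')'
        elif ch.isdigit() and depth > 0:
            return True
    return False

should_match = ['hey(example1)',
                'also(this(onetoo2(hard)))',
                'but(here(is(a(harder)one)maybe23)Hehe)']
should_not_match = ['this(one)is22misleading',
                    'how(to(go)on)with(multiple)3parent(heses(around))']

print([has_digit_in_parens(s) for s in should_match])      # [True, True, True]
print([has_digit_in_parens(s) for s in should_not_match])  # [False, False]
```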

These types of regexes are not always easy, but sometimes it's possible to come up with a way, provided the input remains somewhat consistent. A pattern generally like this should work:
(.*(\([\d]+[^(].*\)|\(.*[^)][\d]+.*\)).*)
Code:
import re
p = re.compile(r'(.*(\([\d]+[^(].*\)|\(.*[^)][\d]+.*\)).*)', re.MULTILINE)
result = re.findall(p, searchtext)
print(result)
Result:
https://regex101.com/r/aL8bB8/1

Python regular expression problem

I need to do something with a regex, but I'm really not good at it and haven't used one in a long time.
/a/c/a.doc
I need to change it to
\\a\\c\\a.doc
Please try to do it using a regular expression in Python.

I'm entirely in favor of helping user483144 distinguish "solution" from "regular expression", as the previous two answerers have already done. It occurs to me, moreover, that os.path.normpath() http://docs.python.org/library/os.path.html might be what he's really after.

Why do you think every solution to your problem needs a regular expression?
>>> s="/a/c/a.doc"
>>> '\\'.join(s.split("/"))
'\\a\\c\\a.doc'
By the way, if you are going to change path separators, you may just as well use os.path.join
eg
mypath = os.path.join("C:\\","dir","dir1")
Python will choose the correct slash for you. Also, check out os.sep if you are interested.

You can do it without regular expressions:
x = '/a/c/a.doc'
x = x.replace('/',r'\\')
But if you really want to use re:
x = re.sub('/', r'\\', x )

Does \\ mean "\\" or r"\\"?
re.sub(r'/', r'\\', 'a/b/c')
Always use r'....' when you write a regular expression.
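The distinction matters because str.replace inserts the replacement text verbatim, while re.sub processes backslash escapes in the replacement. A small sketch showing that the same r'\\' argument yields different results in the two calls:

```python
import re

x = '/a/c/a.doc'

# str.replace inserts the two backslash characters of r'\\' as they are.
doubled = x.replace('/', r'\\')

# re.sub treats '\\' in the replacement as an escape for one backslash.
single = re.sub('/', r'\\', x)

print(doubled)  # \\a\\c\\a.doc
print(single)   # \a\c\a.doc
print(single == x.replace('/', '\\'))  # True
```

Which of the two outputs is wanted depends on whether the doubled backslashes in the question are literal characters or just the escaped spelling of single ones.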

'\\\\'.join(r'/a/c/a.doc'.split("/"))

Regular expression to match start of filename and filename extension

What is the regular expression to match strings (in this case, file names) that start with 'Run' and have a filename extension of '.py'?
The regular expression should match any of the following:
RunFoo.py
RunBar.py
Run42.py
It should not match:
myRunFoo.py
RunBar.py1
Run42.txt
The SQL equivalent of what I am looking for is ... LIKE 'Run%.py' ....

For a regular expression, you would use:
re.match(r'Run.*\.py$', filename)
A quick explanation:
. means match any character.
* means match zero or more repetitions of the preceding element (hence .* means any sequence of characters)
\ escapes the dot so that it matches a literal '.'
$ indicates "end of the string", so we don't match "Run_foo.py.txt"
However, for this task, you're probably better off using simple string methods, i.e.
filename.startswith("Run") and filename.endswith(".py")
Note: if you want case insensitivity (i.e. matching "run.PY" as well as "Run.py"), use the re.I option to the regular expression, or convert to a specific case (e.g. filename.lower()) before using string methods.
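A brief sketch that runs both approaches over the six file names from the question. Since re.match always starts at the beginning of the string, no leading ^ is needed:

```python
import re

names = ['RunFoo.py', 'RunBar.py', 'Run42.py',
         'myRunFoo.py', 'RunBar.py1', 'Run42.txt']

pattern = re.compile(r'Run.*\.py$')

# Regex approach: match anchors at the start, '$' at the end.
by_regex = [n for n in names if pattern.match(n)]

# String-method approach: no regex involved.
by_methods = [n for n in names if n.startswith('Run') and n.endswith('.py')]

print(by_regex)    # ['RunFoo.py', 'RunBar.py', 'Run42.py']
print(by_methods)  # ['RunFoo.py', 'RunBar.py', 'Run42.py']
```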

I don't really understand why you're after a regular expression to solve this 'problem'. You're just after a way to find all .py files that start with 'Run'. So this is a simple solution that will work, without resorting to compiling and running a regular expression:
import os
for filename in os.listdir(dirname):
    root, ext = os.path.splitext(filename)
    if root.startswith('Run') and ext == '.py':
        print(filename)

Warning:
jobscry's answer ("^Run.?.py$") is incorrect (will not match "Run123.py", for example).
orlandu63's answer ("/^Run[\w]*?.py$/") will not match "RunFoo.Bar.py".
(I don't have enough reputation to comment, sorry.)

/^Run.*\.py$/
Or, in python specifically:
import re
re.match(r"^Run.*\.py$", stringtocheck)
This will match "Runfoobar.py", but not "runfoobar.PY". To make it case insensitive, instead use:
re.match(r"^Run.*\.py$", stringtocheck, re.I)

You don't need a regular expression, you can use glob, which takes wildcards e.g. Run*.py
For example, to get those files in your current directory...
import os, glob
files = glob.glob(os.path.join(os.getcwd(), "Run*.py"))
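If the names are already in a list rather than on disk, the same wildcard can be applied with the standard fnmatch module. A minimal sketch using the file names from the question, with fnmatchcase chosen so that the comparison is case-sensitive on every platform:

```python
import fnmatch

names = ['RunFoo.py', 'RunBar.py', 'Run42.py',
         'myRunFoo.py', 'RunBar.py1', 'Run42.txt']

# The wildcard must match the whole name, so 'myRunFoo.py' and
# 'RunBar.py1' are rejected.
matches = [n for n in names if fnmatch.fnmatchcase(n, 'Run*.py')]

print(matches)  # ['RunFoo.py', 'RunBar.py', 'Run42.py']
```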

If you write a slightly more complex regular expression, you can get an extra feature: extract the bit between "Run" and ".py":
>>> import re
>>> regex = '^Run(?P<name>.*)\.py$'
>>> m = re.match(regex, 'RunFoo.py')
>>> m.group('name')
'Foo'
(the extra bit is the parentheses and everything between them, except for '.*' which is as in Rob Howard's answer)

This probably doesn't fully comply with file-naming standards, but here it goes:
/^Run[\w]*?\.py$/

Maybe:
^Run.*\.py$
just a quick try
