Splitting an expression - python

I have to split a string into a list of substrings so that every parenthesized subexpression is split out.
Let's say I have (9+2-(3*(4+2))); then I should get (4+2), (3*6) and (9+2-18).
The basic objective is to learn which of the inner parentheses will be evaluated first, and then evaluate it.
Please help.
It would be helpful if you could suggest a method using the re module. Just to be clear for everyone: this is not homework, and I understand Polish notation. What I am looking for is a way to use the power of Python and the re module to do this in fewer lines of code.
Thanks a lot.

eval is insecure, so you have to check the input string for dangerous content first.
>>> import re
>>> e = "(9+2-(3*(4+2)))"
>>> while '(' in e:
...     inner = re.search(r'(\([^()]+\))', e).group(1)
...     # 'str' + inner builds e.g. "str(4+2)", which eval turns into the string "6"
...     e = re.sub(re.escape(inner), eval('str' + inner), e)
...     print(inner, end=' ')
...
(4+2) (3*6) (9+2-18)
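Regarding the warning about eval: one way to keep the same "reduce the innermost parentheses first" loop without calling eval is to evaluate each group with a small whitelist evaluator built on the standard ast module. This is only a sketch under some assumptions: the names safe_eval and reduction_steps are made up for this example, only + - * / and unary minus are supported, and it needs Python 3.8 or newer.

```python
import ast
import operator
import re

# Only these binary operators are allowed; anything else raises ValueError.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr):
    """Evaluate a small arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression: %r" % expr)
    return walk(ast.parse(expr, mode="eval"))

def reduction_steps(expr):
    """Return the innermost parenthesized groups in the order they get evaluated."""
    innermost = re.compile(r"\([^()]+\)")
    steps = []
    while True:
        m = innermost.search(expr)
        if m is None:
            break
        steps.append(m.group(0))
        # Splice the computed value back in place of the group just found.
        expr = expr[:m.start()] + str(safe_eval(m.group(0))) + expr[m.end():]
    return steps

print(reduction_steps("(9+2-(3*(4+2)))"))
# ['(4+2)', '(3*6)', '(9+2-18)']
```

Splicing by match position instead of calling re.sub means only the group that was just found gets replaced, even if the same text appears twice in the expression.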

Try something like this:
import re
a = "(9+2-(3*(4+2)))"
s, r = a, re.compile(r'\([^(]*?\)')
while '(' in s:
    g = r.search(s).group(0)
    # count=1 replaces only the group just found, not every innermost group
    s = r.sub(str(eval(g)), s, count=1)
    print(g)
print(s)

This sounds very homeworkish so I am going to reply with some good reading that might lead you down the right path. Take a peek at http://en.wikipedia.org/wiki/Polish_notation. It's not exactly what you want but understanding will lead you pretty close to the answer.

I don't know exactly what you want to do, but if you want to add other operations and have more control over the expression, I suggest you use a parser, for example PLY:
http://www.dabeaz.com/ply/

Related

Python split with regular expression to divide string

I have a need to recover 2 results of a regular expression in Python: what is searched and all else.
For example, in:
"boofums",3,4
I'd like to find what is in the quotes and what isn't:
boofums
,3,4
What I have so far is:
import re
bobbles = '"boofums",3,4'
pickles = re.split(r'\".*\"', bobbles)
morton = re.match(r'\".*\"', bobbles)
print(pickles[1])
print(morton[0])
which prints:
,3,4
"boofums"
This seems to me insanely inefficient and not Python-esque. Is there a better way to do this? (Sorry for the "is there a better way" construct on StackOverflow, but... I need to do this better! 😂)
...and if you can help me extract just what's in the quotes, something that I'd easily do in Perl or Ruby, all the better!
You're probably best off with regex groupings.
For your example I'd use something like:
regex = re.compile(r'"(.*)"(.*)')
bobble_groups = regex.match(bobbles)
You can then use bobble_groups.group(1) to get just the text inside the quotation marks, and bobble_groups.group(2) to get everything after them.
See named groups if you don't want to depend on an index number.
a, b = re.match('"(.*)"(.*)', bobbles).groups()
Parentheses define groups that are "saved" to the match object.
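For the named-groups suggestion, a minimal sketch could look like the following. The group names quoted and rest are made up for this example, and [^"]* is used instead of .* on the assumption that the quoted part itself never contains a quote character.

```python
import re

bobbles = '"boofums",3,4'

# (?P<name>...) gives a group a name, so it can be fetched by name instead of by index.
pattern = re.compile(r'"(?P<quoted>[^"]*)"(?P<rest>.*)')

m = pattern.match(bobbles)
if m:
    quoted = m.group("quoted")  # text inside the quotes
    rest = m.group("rest")      # everything after the closing quote
    print(quoted)
    print(rest)
```

With .* inside the quotes the match is greedy and would run to the last quote on the line, which matters once a line holds more than one quoted field.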

backward search in python?

I have the list with lines like:
=cat-egory/packagename-version
so I have to split it up into 3 different variables, like
category = cat-egory
package_name = packagename
package_version = version
I have to avoid the = and / chars.
I am fond of Perl, so I used to write a regexp like:
(?<==)\w+.\w+
which would give me cat-egory without the leading = character,
and so on. But as far as I know, ?<= does not work in Python. How should I extract the data then?
It seems to be working well. See: https://regex101.com/r/nnMRKd/2
Seems to work OK, maybe you are just missing the basic Python framework for capturing:
import re
text = "=cat-egory/packagename-version"
results = re.search(r"(?<==)\w+.\w+", text)
if results:
    print(results.group(0))
output:
cat-egory
Make sure to use .search instead of .match, as suggested in a comment: .match only tries to match at the start of the string, where the lookbehind for = cannot succeed. The .group method is how you reference what you have captured, instead of $1 in Perl. Nothing too fancy here :)
You could even go one step further and use tuple unpacking:
import re
string = "=cat-egory/packagename-version"
rx = re.compile(r'(?<==)([^/]+)/([^-]+)-(.+)')
for match in rx.finditer(string):
    category, package_name, version = match.groups()
    print(category)
# cat-egory

Splitting string by '],[', but keeping the brackets

I've got a string in this format:
a = "[a,b,c],[e,d,f],[g,h,i]"
Each part I want to split off is separated by ],[. I tried a.split("],[") but the brackets at the split points get removed.
In my example that would be:
["[a,b,c","e,d,f","g,h,i]"]
I was wondering if there was a way to keep the brackets after the split?
Desired outcome:
["[a,b,c]","[e,d,f]","[g,h,i]"]
The problem is that str.split removes whatever substring you split on from the resulting list. I think it would be better in this case to use the slightly more powerful split function from the re module:
>>> from re import split
>>> a = "[a,b,c],[e,d,f],[g,h,i]"
>>> split(r'(?<=\]),(?=\[)', a)
['[a,b,c]', '[e,d,f]', '[g,h,i]']
>>>
(?<=\]) is a lookbehind assertion which looks for ]. Similarly, (?=\[) is a lookahead assertion which looks for [. Both constructs are explained in Regular Expression Syntax.
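To make the zero-width behaviour concrete, here is a small sketch that runs the lookaround split next to a different technique, re.findall, which matches each bracketed group directly instead of splitting. The findall variant is an alternative of mine, not part of the answer above, and it assumes the groups never nest.

```python
import re

a = "[a,b,c],[e,d,f],[g,h,i]"

# Lookarounds are zero-width: only the comma between ']' and '[' is consumed,
# so both brackets stay in the resulting pieces.
parts = re.split(r"(?<=\]),(?=\[)", a)

# Alternative: match every "[...]" group instead of splitting on the separator.
groups = re.findall(r"\[[^\]]*\]", a)

print(parts)
print(groups)
```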
Python is very flexible, so you just have to manage it a bit and be adaptive to your case.
In [8]: a = "[a,b,c],[e,d,f],[g,h,i]"
   ...: a.replace('],[', '] [').split(" ")
Out[8]: ['[a,b,c]', '[e,d,f]', '[g,h,i]']
The other answers are correct, but here is another way to go.
Important note: this is just to present another option that may prove useful in certain cases. Don't do it in the general case, and do so only if you're absolutely certain that you have control over the expression you're passing into the exec statement.
# provided you declared a, b, c, d, e, f, g, h, i beforehand
>>> exp = "[a,b,c],[e,d,f],[g,h,i]"
>>> exec("my_object = " + exp)
>>> my_object
([a,b,c],[e,d,f],[g,h,i])
Then, you can do whatever you like with my_object.
Provided that you have full control over exp, this approach seems more appropriate and Pythonic to me, because you are treating a piece of Python code written in a string as a... piece of Python code written in a string (hence the exec statement), without manipulating it through regexps or artificial hacks.
Just keep in mind that it can be dangerous.

in python find index in list if combination of strings exist

I'm writing my first script and trying to learn python.
But I'm stuck and can't get out of this one.
I'm writing a script to change file names.
Let's say I have string = "this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv"
I want the result to be string = "This Is Test3 E00"
This is what I have so far:
l = list(string)
# Transform the string into a list
for i in l:
    if "E" in l:
        p = l.index("E")
        if isinstance((p+1), int) is True:
            if isinstance((p+2), int) is True:
                delp = p+3
                a = p-3
                del l[delp:]
new = "".join(l)
new = new.replace("."," ")
print (new)
The idea is to get the index of "E" and check whether two integers follow it,
then delete everything after the second integer.
However, this will not work if there is an "E" anyplace else.
At the moment the result I get is:
this is tEst
because it is finding the index of the first "E" in the list and deleting everything after index+3.
I guess my question is: how do I get the index in the list only if a combination of strings exists?
I can't seem to find out how.
Thanks for everyone's answers.
I was going in another direction, but it is also not working.
If someone could see why, it would be awesome. It is much better to learn by doing than by just copying what others write :)
This is what I came up with:
for i in l:
    if i=="E" and isinstance((i+1), int ) is True:
        p = l.index(i)
        print (p)
Can anyone tell me why this isn't working? I get an error.
Thank you so much
Have you ever heard of a Regular Expression?
Check out python's re module. Link to the Docs.
Basically, you can define a "regex" that would match "E and then two integers" and give you the index of it.
After that, I'd just use python's "Slice Notation" to choose the piece of the string that you want to keep.
Then, check out the string methods for str.replace to swap the periods for spaces, and str.title to put them in Title Case
An easy way is to use a regex to find up until the E followed by 2 digits criteria, with s as your string:
import re
up_until = re.match(r'(.*?E\d{2})', s).group(1)
# this.is.tEst3.E00
Then, we replace the . with a space and then title case it:
output = up_until.replace('.', ' ').title()
# This Is Test3 E00
The technique to consider using is Regular Expressions. They allow you to search for a pattern of text in a string, rather than a specific character or substring. Regular Expressions have a bit of a tough learning curve, but are invaluable to learn and you can use them in many languages, not just in Python. Here is the Python resource for how Regular Expressions are implemented:
http://docs.python.org/2/library/re.html
The pattern you are looking to match in your case is an "E" followed by two digits. In Regular Expressions (usually shortened to "regex" or "regexp"), that pattern looks like this:
E\d\d # ('\d' is the specifier for any digit 0-9)
In Python, you create a string of the regex pattern you want to match, and pass that and your file name string into the search() function of the re module. Regex patterns tend to use a lot of backslashes, so it's common in Python to prefix the pattern string with 'r' (a raw string), which tells the Python interpreter not to treat backslashes as escape characters. All of this together looks like this:
import re
filename = 'this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv'
match_object = re.search(r'E\d\d', filename)
if match_object:
    # Group 0 is the entire match; end(0) is the index just past its last character
    index_of_Exx = match_object.end(0)
    truncated_filename = filename[:index_of_Exx]
    # Now take care of any more processing
Regular expressions can get very detailed (and complex). In fact, you can probably accomplish your entire task of fully changing the file name using a single regex that's correctly put together. But since I don't know the full details about what sorts of weird file names might come into your program, I can't go any further than this. I will add one more piece of information: if the 'E' could possibly be lower-case, then you want to add a flag as a third argument to your pattern search which indicates case-insensitive matching. That flag is 're.I' and your search() method would look like this:
match_object = re.search(r'E\d\d', filename, re.I)
Read the documentation on Python's 're' module for more information, and you can find many great tutorials online, such as this one:
http://www.zytrax.com/tech/web/regex.htm
And before you know it you'll be a superhero. :-)
The reason why this isn't working:
for i in l:
    if i=="E" and isinstance((i+1), int ) is True:
        p = l.index(i)
        print (p)
...is because 'i' contains a character from the list 'l', not an integer. You compare it with 'E' (which works), but then try to add 1 to it, which raises a TypeError because a string and an integer cannot be added.
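If you would rather stay with a loop than use a regex, one possible fix is to iterate over positions with enumerate and test the characters that follow, not the index arithmetic. This is only a sketch: the function name truncate_after_episode is made up, and it assumes the marker is an upper-case "E" followed by exactly two digits.

```python
def truncate_after_episode(name):
    """Keep everything up to and including the first 'E' followed by two digits."""
    for i, ch in enumerate(name):
        # Slicing never raises IndexError, so this is safe near the end of the string.
        following = name[i + 1:i + 3]
        if ch == "E" and len(following) == 2 and following.isdigit():
            return name[:i + 3]
    return name  # no 'E' + two digits found: leave the name unchanged

filename = "this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv"
result = truncate_after_episode(filename).replace(".", " ").title()
print(result)
# This Is Test3 E00
```

The "E" inside "tEst3" is skipped because the two characters after it are "st", which is exactly the case the original index("E") approach could not handle.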

Replace in Python-* equivalent?

If I am finding and replacing some text, how can I get it to replace text that changes each day, i.e. anything between (( and )), whatever it is?
Cheers!
Use regular expressions (http://docs.python.org/library/re.html)?
Could you please be more specific, I don't think I fully understand what you are trying to accomplish.
EDIT:
Ok, now I see. This may be done even easier, but here goes:
>>> import re
>>> s = "foo(bar)whatever"
>>> r = re.compile(r"(\()(.+?)(\))")
>>> r.sub(r"\1baz\3",s)
'foo(baz)whatever'
For multiple levels of parentheses this will not work, or rather it WILL work, but will do something you probably don't want it to do.
Oh hey, as a bonus here's the same regular expression, only now it will replace the string in the innermost parentheses:
r1 = re.compile(r"(\()([^()]+?)(\))")
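Since the question mentions (( and )) as the markers, the same idea adapted to double parentheses might look like the sketch below. The function name replace_between_double_parens is made up, and it assumes the markers are kept, do not nest, and sit on one line (the dot does not match newlines by default).

```python
import re

def replace_between_double_parens(text, replacement):
    # Non-greedy .*? stops at the first "))" after each "((".
    # A function is used as the replacement so that backslashes in
    # `replacement` are inserted literally, not treated as group references.
    return re.sub(r"\(\(.*?\)\)", lambda m: "((" + replacement + "))", text)

print(replace_between_double_parens("date: ((2010-01-01)) end", "today"))
# date: ((today)) end
```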
