Can we add/change scope with a plugin in Sublime Text 3? - python

In a custom *.sublime-syntax file, I have this:
- match: '(\d+)/(\d+)x (.+)$'
  captures:
    1: constant.numeric.owned.items_tracker
    2: constant.numeric.needed.items_tracker
    3: variable.description.items_tracker
I would like to set capture 3's scope to variable.description.done.items_tracker instead, if capture 1 is greater than capture 2.
I think it is not possible to do this in sublime-syntax, so can I do this with Python in a plugin, and how?

In Sublime, syntax definitions are the only way to apply scopes, and the scopes are applied by using regular expression matches such as the ones outlined in your question.
The syntax facility doesn't have any sort of direct "conditional" logic that would allow it to take a programmatic action like converting the matches to integers, comparing them, and doing something different based on the result.
Additionally, although a plugin can modify the source of a file at will, it can't apply scopes; that is strictly the purview of the syntax definition itself.
As such, what you want to do is not directly possible in the general case. A potential workaround might be to have as many rules as there are combinations of numeric values, but that's not practical unless the total range of values and the spreads between them are very small.
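If visual highlighting (rather than a true scope) would be enough, a plugin can approximate the effect with view.add_regions(), which colors regions using the color of an existing scope. A minimal sketch under that assumption (the listener class, region key, and the markup.inserted scope are all illustrative choices):
import re
import sublime
import sublime_plugin

LINE_RE = re.compile(r'(\d+)/(\d+)x (.+)$')

class ItemsTrackerHighlighter(sublime_plugin.ViewEventListener):
    def on_modified_async(self):
        done = []
        for line in self.view.lines(sublime.Region(0, self.view.size())):
            m = LINE_RE.search(self.view.substr(line))
            if m and int(m.group(1)) > int(m.group(2)):
                # Highlight only the description (capture 3) of this line
                start = line.begin() + m.start(3)
                done.append(sublime.Region(start, line.begin() + m.end(3)))
        # Borrows the color of an existing scope; it does not create a new scope
        self.view.add_regions('items_tracker_done', done, 'markup.inserted')
A real plugin would also check first (for example with view.match_selector()) that the view actually uses your syntax before doing this work.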

Related

How is PLY's parsetab.py formatted?

I'm working on a project to convert MATLAB code to Python, and have been somewhat successful after building off others' work. The tool uses PLY (an implementation of the lex and yacc parsing tools for Python) to parse the MATLAB input. Unfortunately, it is a requirement that my code is written in Python 3, not Python 2. The tool runs without issue in Python 2, but I get a strange error in Python 3 (assuming A is an array):
log_idx = A <= 16;
^
SyntaxError: Unexpected "=" (parser)
The MATLAB code I am trying to convert is:
idx = A <= 16;
which should convert to almost the same thing in Python 3:
idx = A <= 16
The only real difference between the Python 3 code and the Python 2 code is the PLY-generated parsetab.py file, which has substantial differences in the following variables:
_tabversion
_lr_signature
_lr_action_items
_lr_goto_items
I'm having trouble understanding the purpose of these variables and why they could be different when the only difference was the Python version used to generate the parsetab.py file.
I tried searching for documentation on this, but was unsuccessful. I originally suspected it could be a difference in the way strings are formatted between Python 2 and Python 3, but that didn't turn anything up either. Is there anyone familiar with PLY that could give some insight into how these variables are generated, or why the Python version is creating this difference?
Edit: I'm not sure if this will be useful to anyone because the file is very long and cryptic, but below is an example showing part of the first lines of _lr_action_items and _lr_goto_items:
Python 2:
_lr_action_items = {'DOTDIV':([6,9,14,20,22,24,32,34,36,42,46,47,52,54,56,57,60,71,72,73,74,75 ...
_lr_goto_items = {'lambda_args':([45,80,238,],[99,161,263,]),'unwind':([1,8,28,77,87,160,168,177 ...
Python 3:
_lr_action_items = {'END_STMT':([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,26,27,39,41,48,50 ...
_lr_goto_items = {'top':([0,],[1,]),'stmt':([1,44,46,134,137,207,212,214,215,244,245,250 ...
I'm going to go out on a limb here, because you have provided practically no indication of what code you are actually using. So I'm just going to assume that you copied the lexer.py file from the GitHub repository you linked to in your question.
There's an important clue in this error message:
log_idx = A <= 16;
^
SyntaxError: Unexpected "=" (parser)
Evidently, <= is not being scanned as a single token; otherwise, the parser would not see an = token at that point in the input. This can only mean that the scanner is returning two tokens, < and =, and if that's the case, it is most certainly a syntax error, as you would expect from
log_idx = A < = 16;
To figure out why the lexer would do this, it's important to understand how the Ply (default) lexer works. It gathers up all the lexer patterns from variables whose names start with t_, which must be either functions or variables whose values are strings. It then sorts them as follows:
function docstrings, in order by line number in the source file.
string values, in reverse order by length.
See Specification of Tokens in the Ply manual.
That usually does the right thing, but not always. The intention of sorting in reverse order by length is that a prefix pattern will come after a pattern which matches a longer string. So if you had patterns '<' and '<=', '<=' would be tried first, and so in the case where the input had <=, the < pattern would never be tried. That's important, since if '<' is tried first, '<=' will never be recognised.
However, this simple heuristic does not always work. The fact that a regular expression is shorter does not necessarily mean that its match will be shorter. So if you expect "maximal munch" semantics, you sometimes have to be careful about your patterns. (Or you can supply them as docstrings, because then you have complete control over the order.)
And whoever created that lexer.py file was not careful about their patterns, because it includes (among other issues):
t_LE = r"<="
t_LT = r"\<"
Note that since these are raw strings, the backslash is retained in the second string, so both patterns are of length 2:
>>> len(r"\<")
2
>>> len(r"<=")
2
Since the two patterns have the same length, their relative order in the sort is unspecified. And it is quite possible that the two versions of Python produce different sort orders, either because of differences in the implementation of the sort, or because of differences in the order in which the dictionary of variables is iterated, or some combination of the above.
< has no special significance in a Python regular expression, so there is no need to backslash-escape it in the definition of t_LT. (Clearly, since it is not backslash-escaped in t_LE.) So the simplest solution would be to make the sort order unambiguous by removing the backslash:
t_LE = r"<="
t_LT = r"<"
Now, t_LE is longer and will definitely be tried first.
That's not the only instance of this problem in the lexer file, so you might want to revise it carefully.
Note: You could also fix the problem by adding an unnecessary backslash to the t_LE pattern; there is an argument for taking the attitude, "When in doubt, escape." However, it is useful to know which characters need to be escaped in a Python regex, and the Python documentation for the re package contains a complete list. Also, consider using long raw strings for patterns which include quotes, since neither " nor ' need to be backslash escaped in a Python regex.
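As a hedged illustration of the docstring approach mentioned above, here is a minimal self-contained lexer (the token set is invented for the example) in which t_LE is guaranteed to be tried before t_LT, because function rules are applied in source-line order rather than sorted by pattern length:
import ply.lex as lex

tokens = ('ID', 'LE', 'LT', 'NUMBER')

def t_LE(t):
    r'<='
    return t

def t_LT(t):
    r'<'
    return t

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

t_ID = r'[A-Za-z_]\w*'
t_ignore = ' \t'

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('A <= 16')
for tok in lexer:
    print(tok.type, tok.value)   # ID A, then LE <=, then NUMBER 16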

How to change priority in math order(asterisk)

I want users to input math formulas in my system. How can I convert the case 1 formula to the case 2 formula using Python? In other words, I would like to change the order of operations, specifically for double asterisks.
#case1
>>> 3*2**3**2*5
7680
#case2
>>> 3*(2**3)**2*5
960
Not only is this not something that Python supports, but really, why would you want to? Modifying BIDMAS or PEMDAS (depending on your location) would not only give you incorrect answers, but would also confuse the hell out of any devs looking at the code.
Just use brackets like in Case 2, it's what they're for.
If users are supposed to enter formulas into your program, I would suggest keeping it as is. The reason is that exponentiation in mathematics is right-associative, meaning the execution goes from the top level down. For example: a**b**c = a**(b**c), by convention.
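You can see the convention directly in the interpreter:
>>> 2**3**2        # right-associative: evaluated as 2**(3**2)
512
>>> (2**3)**2      # brackets force the other grouping
64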
There are some programs that use bottom-up resolution of stacked exponentiation -- MS Excel and LibreOffice among them -- but it is against the regular convention, and it has always confused the hell out of me.
If you would like to override this behavior, and still be mathematically correct, you have to use brackets.
You can always declare your own power function that resolves the way you want it -- something like numpy.power(). You could overload the built-in behaviour, but that's too much hassle.
Read this
Below is an example that achieves this using re:
import re

expression = '3*2**3**2*5'
# List of all sub-expressions matching the criterion
asterisk_exprs = re.findall(r"\d+\*\*\d+", expression)
for expr in asterisk_exprs:
    # Wrap each matched sub-expression in brackets
    expression = expression.replace(expr, "({})".format(expr))
# Value of variable 'expression': '3*(2**3)**2*5'
To evaluate the mathematical value of the resulting str expression, use eval:
>>> eval(expression)
960

Why use Python format rather than slicing?

I've inherited some Python code and during a review I discovered several instances in which the author uses format to slice strings, such as:
someStr = 'ABCDEFG'
newStr1 = '{0}{1}{2}'.format(someStr[0], someStr[1], someStr[2])
Why would they do that, instead of slicing it, such as:
newStr2 = someStr[0:3]
Both achieve the same result, even when using a left-padded numeric sequence such as '012345' (meaning that '012' is produced, as expected, rather than '12').
It seems to me that slicing is more intuitive, so I'm curious if there is an advantage to using format?
There is no advantage to using str.format() here; I have no idea why they are doing that. There is a small chance they were doing it to force an indexing error if the input string was shorter than 3 characters, but I'd have expected an explicit comment stating this (and personally I'd have made it an explicit check).
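To illustrate that difference with a hypothetical two-character input:
>>> s = 'AB'
>>> s[:3]          # slicing stops quietly at the end of the string
'AB'
>>> '{0}{1}{2}'.format(s[0], s[1], s[2])   # indexing past the end raises
Traceback (most recent call last):
  ...
IndexError: string index out of range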
Just use a slice:
newStr2 = someStr[:3]
Note that you can drop the 0 here; it is the default. If you are going to update the code (do so only when doing other maintenance), you might want to bring it in line with the PEP 8 style guide, including its naming conventions. Local variables should use lower_case_with_underscores, not mixedCamelCase.

Python - Evaluating a string expression in a string

I am trying to do something like this
Evaluating a mathematical expression in a string
Update - Some details about the app
The app 'exposes' some variables to the users. An example of an exposed variable is user_name. The user of the app can then create a new variable called 'user_name_upper' that is set to user_name.upper(). Another example: given the exposed variables first_name and last_name, the user can create a new variable 'full_name = last_name.upper() + ',' + first_name.upper()'. This is entered using an input box UI element, so there are no hooks into the program. Or think of it as a report, like Excel, where I can create a new column as a manipulation of some already defined columns.
The users of this app are not programmers, but they can be given a list of examples to find their way around string manipulations.
However, my expressions will be used for string manipulation. Something like "string3 = string1 + string2", where I'd like to set the value of string3 to the value of string1 with string2 appended.
Or "string1 = string2.lower()"
I have researched and have come to the conclusion that eval can be used, but is very dangerous. From what I understand, ast.literal_eval() will not work with string manipulation methods like lower(),
as described here Why does this string not work with ast.literal_eval
Any suggestion on how to go about this?
ast.literal_eval is the wrong function. It only evaluates literals like 2.3 or "hello".
What you want is the built-in function compile() or ast.parse(). These functions can, as far as I understand (I have never used them), create abstract syntax trees. Look at the second paragraph of:
http://docs.python.org/2/library/ast.html
Of course it's risky to let your users enter arbitrary expressions. However, I think you are asking this question because you want to search the AST for problematic code.
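As a hedged sketch of that idea (the node whitelist below is an assumption to be extended for whatever syntax you decide to allow; ast.Constant requires Python 3.8+): parse the expression, reject any node type or variable name you did not expose, and only then evaluate it. A real implementation should also restrict attribute names (for example, forbid leading underscores).
import ast

ALLOWED_NODES = (ast.Expression, ast.BinOp, ast.Add, ast.Call,
                 ast.Attribute, ast.Name, ast.Load, ast.Constant)

def is_safe(expr, exposed_names):
    # Reject anything that is not a simple expression over exposed names
    try:
        tree = ast.parse(expr, mode='eval')
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        if isinstance(node, ast.Name) and node.id not in exposed_names:
            return False
    return True

names = {'first_name', 'last_name'}
print(is_safe("last_name.upper() + ',' + first_name.upper()", names))  # True
print(is_safe("__import__('os')", names))                              # False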
Although I would recommend other methods of using this, if you need to use dynamic variables (when you don't know what they will be called, or how many there will be), I find that dictionaries work well.
Ex:
def createVariable(variables, string1, string2):
    # Store the user-defined value under the name held in string1
    variables[string1] = string2.lower()
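For example (hypothetical names), the user-defined variable is then just a dictionary entry:
variables = {}
createVariable(variables, 'user_name_lower', 'JohnDoe')
print(variables['user_name_lower'])   # johndoe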

Function Parser with RegEx in Python

I have source code in Fortran (almost irrelevant) and I want to parse the function names and arguments.
E.g. using
(\w+)\([^\(\)]+\)
with
a(b(1 + 2 * 2), c(3,4))
I get the following: (as expected)
b, 1 + 2 * 2
c, 3,4
where I would need
a, b(1 + 2 * 2), c(3,4)
b, 1 + 2 * 2
c, 3,4
Any suggestions?
Thanks for your time...
It can be done with regular expressions: use them to tokenize the string, and work with the tokens (i.e. see re.Scanner). Alternatively, just use pyparsing.
This is a nonlinear grammar -- you need to be able to recurse on a set of allowed rules. Look at pyparsing to do simple CFG (Context Free Grammar) parsing via readable specifications.
It's been a while since I've written out CFGs, and I'm probably rusty, so I'll refer you to the Python EBNF to get an idea of how you can construct one for a subset of a language syntax.
Edit: If the example will always be simple, you can code a small state machine class/function that iterates over the tokenized input string, as #Devin Jeanpierre suggests.
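A minimal sketch of that state-machine idea (assuming balanced parentheses and no parentheses inside strings or comments; the helper name is illustrative) tracks the nesting depth by hand instead of using a full grammar:
import re

def find_calls(src):
    # Yield (name, arguments) for every call in src, outermost first
    for m in re.finditer(r'(\w+)\(', src):
        name, start = m.group(1), m.end()
        depth = 1
        for i in range(start, len(src)):
            if src[i] == '(':
                depth += 1
            elif src[i] == ')':
                depth -= 1
                if depth == 0:   # found the matching close paren
                    yield name, src[start:i]
                    break

for name, args in find_calls('a(b(1 + 2 * 2), c(3,4))'):
    print(name, args, sep=', ')
# a, b(1 + 2 * 2), c(3,4)
# b, 1 + 2 * 2
# c, 3,4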
You can take a look at PLY (Python Lex-Yacc), it's (in my opinion) very simple to use and well documented, and it comes with a calculator example which could be a good starting point.
I don't think this is a job for regular expressions... they can't really handle nested patterns.
This is because regexes are compiled into FSMs (Finite State Machines). In order to parse arbitrarily nested expressions, you can't use a FSM, because you need infinitely many states to keep track of the arbitrary nesting. Also see this SO thread.
You can't do this with regular expressions only; the structure is recursive. You should first match the outermost function and its arguments, print the name of the function, then do the same (match the function name, then its arguments) with each of its arguments. Regexes alone are not enough.
