I want to make a kind of simple spreadsheet with python.
I need to parse a formula from a string.
All the operations I need for now are: + - * / ^ ()
Formulas will always start with '='.
Examples of input:
1. =4+8-6/2
This is simple: just use eval().
2. =4b+12*(2+5)
where '4b' is a link to another cell in spreadsheet (a variable).
** It's possible to make all the links like 'b4' instead of '4b'
The script would substitute the link (variable) with corresponding value.
What I'm unable to achieve is making the parser 'understand' variables like '4b' or 'b4'; the rest is pretty simple.
What would you suggest?
P.S. I'm new to Python. I tried pyparsing, but it's too complicated to use, or to adapt its custom examples to my needs. I hope to find a simpler solution.
ActiveState has an amazingly simple recipe for a Python-based spreadsheet that meets most of your requirements, I believe. Its class SpreadSheet is defined in terms of a couple of internal dictionaries along with a very small number of relatively short methods.
The related comments are also very interesting and show how it could be extended and made to minimize potential security issues. I highly recommend you take a look at it.
A major limitation I noted is that there's no dependency checking, so updating a cell doesn't automatically update any cells that depend on it, or any that depend on those, and so on.
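For the cell-reference part of the question, here is a minimal sketch. It is not the ActiveState recipe. It assumes references are written letter-first ('b4'), that cells live in a plain dict keyed by lower-case name, and that '^' means exponentiation. The names evaluate, _eval_node and _CELL_REF are made up for this illustration. Instead of calling eval() on raw input, it substitutes each reference with its computed value and then walks the parsed expression with the standard ast module, allowing only arithmetic.

```python
import ast
import operator
import re

# Arithmetic operators the formulas may use (assumption: nothing else is needed).
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg, ast.UAdd: operator.pos,
}
# Cell references such as b4 or AA12: letters followed by digits.
_CELL_REF = re.compile(r"\b[A-Za-z]+[0-9]+\b")


def _eval_node(node):
    """Evaluate a parsed expression, rejecting anything but arithmetic."""
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("unsupported syntax in formula")


def evaluate(cells, name, _seen=()):
    """Return the numeric value of cell `name`, following references."""
    if name in _seen:
        raise ValueError("circular reference involving " + name)
    raw = cells[name]
    if not raw.startswith("="):
        return float(raw)  # plain value, not a formula
    expr = raw[1:].replace("^", "**")
    # Replace every reference with its value, parenthesised so that
    # negative numbers keep their sign under ** and friends.
    expr = _CELL_REF.sub(
        lambda m: "(%r)" % evaluate(cells, m.group(0).lower(), _seen + (name,)),
        expr,
    )
    return _eval_node(ast.parse(expr, mode="eval"))


cells = {"a1": "=4+8-6/2", "b4": "3", "c1": "=b4+12*(2+5)", "d1": "=b4^2"}
```

With these cells, evaluate(cells, "c1") looks up b4 first and then computes the rest. Nothing is cached, so every call re-evaluates the cells it depends on. That is slow for large sheets, but it avoids the stale-value problem described above.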
Lambda expressions to create anonymous functions would be an efficient way to handle this. I would recommend checking out Peter Norvig's Udacity course, Design of Computer Programs. As part of lesson 2 he covers a scenario very similar to this while looking at cryptarithmetic. The course is free and self-paced, so you can dip in and skip around lessons and sub-lessons as much as you need:
https://www.udacity.com/course/cs212
The section below goes into more detail, but basically someone stated that the Ruby-written DSL RSpec couldn't be rewritten in Python. Is that true? If so, why?
I'm wanting to better understand the technical differences between Ruby and Python.
Update: Why am I asking this question?
The Running away from RSpec discussion has some statements about it being "impossible" to recreate RSpec in Python. I was trying to make the question a little broader in hopes of learning more of the technical differences between Ruby and Python. In hindsight, maybe I should have tightened the question's scope to just asking if it truly is impossible to recreate RSpec in Python, and if so why.
Below are just a few quotes from the Running away from RSpec discussion.
Initial Question
For the past few weeks I have been thinking a lot about RSpec and why there is no clear, definite answer when someone asks:
"I'm looking for a Python equivalent of RSpec. Where can I find such a
thing?"
Probably the most common (and understandable) answer is that Python syntax
wouldn't allow such a thing whereas in Ruby it is possible.
First Response to Initial Question
Not syntax exactly. Rspec monkeypatches every object inside of its
scope, inserting the methods "should" and "should_not". You can do
something in python, but you can't monkeypatch the built-in types.
Another Response
As you suggest, it's impossible. Mote and PySpec are just fancy ways
to name your tests: weak implementations of one tiny corner of RSpec.
Mote uses horrible settrace magic; PySpec adds a bunch of
domain-irrelevant noise. Neither even supports arbitrary context
strings. RSpec is more terse, more expressive, removes the noise, and
is an entirely reasonable thing to build in Ruby.
That last point is important: it's not just that RSpec is possible in
Ruby; it's actually idiomatic.
If I had to point out one great difficulty for creating a Python RSpec, it would be the lack of a good syntax in Python for creating anonymous functions (as in JavaScript) or blocks (as in Ruby). The only option for a Python programmer is to use lambdas, which is not an option at all because lambdas just accept one expression. The do ... end blocks used in RSpec would have to be written as a function before calling describe and it, as in the example below:
def should_do_stuff():
    # ...

it("should do stuff", should_do_stuff)
Not so sexy, right?
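One way to soften this, offered as a sketch and not as a claim about how any existing library works, is to make it a decorator, so the description sits directly above the function body. SPECS and run_specs are hypothetical names invented for the example.

```python
SPECS = []  # hypothetical registry of (description, function) pairs


def it(description):
    """Decorator that registers the decorated function as a spec."""
    def register(func):
        SPECS.append((description, func))
        return func
    return register


def run_specs():
    """Run every registered spec and report 'ok' or the failure message."""
    results = {}
    for description, func in SPECS:
        try:
            func()
            results[description] = "ok"
        except AssertionError as exc:
            results[description] = "FAILED: %s" % exc
    return results


@it("should add numbers")
def _():
    assert 1 + 1 == 2


@it("should fail loudly")
def _():
    assert 1 + 1 == 3, "arithmetic is broken"


results = run_specs()
```

The function still needs a name (here the throwaway _), which is exactly the limitation discussed above. The decorator only moves the description next to the code.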
There are some difficulties in creating the should methods, but I bet it would be a smaller problem. Actually, one does not even need to use such an unusual syntax—you could get similar results (maybe even better, depending on your taste) using the Jasmine syntax, which can be trivially implemented.
That said, I feel that Python syntax is more focused on efficiently representing the usual program components such as classes, functions, variables, etc. It is not well suited to being extended. I, for one, think that a good Python program is one where I can see objects, and functions, and variables, and I understand what each one of these elements does. Ruby programmers, OTOH, seem to seek a more prose-like style, where a new language is defined for a new problem. It is a good way of doing things, too, but not a Pythonic way. Python is good at representing algorithms, not prose.
Sometimes it is a draconian limit. How could one use BDD, for example? Well, the usual way of pushing these limits in Python is to effectively write your own DSL, but it should REALLY be another language. That is what Pyccuracy is, for example: another language for BDD. A more mainstream example is doctest. (Actually, if I were to write a BDD library in Python, I would base it on doctest.) Another example of a Python DSL is Twill. And yet another example is reStructuredText, used in Sphinx.
Summarizing: IMHO the hardest barrier to DSLs in Python is the lack of a flexible syntax for creating anonymous functions. And it is not a fault*: Python is not fond of having its syntax heavily explored anyway—it is considered to make code less clear in the Python universe. If you want a new syntax in Python you are well advised to write your own language, or at least it is the way I feel.
* Or maybe it is - I have to confess that I miss anonymous functions. However, I recognize that they would be hard to implement elegantly given the Python semantic indentation.
I set out on an attempt to implement something like rspec in Python.
I got this:
with It('should pass') as test:
    test.should_be_equal(1, 1)
source: https://gist.github.com/2029866
(thoughts?)
EDIT: My answer to your question is that the lack of anonymous blocks prevents a Ruby DSL like RSpec from being rewritten in Python but you can get a close approximation using with statements.
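To make the with-statement idea concrete, here is a minimal sketch of what a class like It could look like. It is not the code from the gist. The passed attribute and the choice to swallow assertion failures are assumptions made for the illustration.

```python
class It:
    """Context manager that names a spec and records whether it passed."""

    def __init__(self, description):
        self.description = description
        self.passed = None  # unknown until the block has finished

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.passed = exc_type is None
        print("it %s: %s" % (self.description, "ok" if self.passed else "FAILED"))
        # Swallow assertion failures so later specs still run;
        # any other exception propagates as usual.
        return exc_type is not None and issubclass(exc_type, AssertionError)

    def should_be_equal(self, actual, expected):
        assert actual == expected, "expected %r, got %r" % (expected, actual)


with It('should pass') as test:
    test.should_be_equal(1, 1)

with It('should fail') as failing:
    failing.should_be_equal(1, 2)
```

The with block plays the role of Ruby's do ... end block: it holds several statements without needing a named function.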
One of Ruby's strengths is in the creation of DSLs. However the reasons given for it being difficult in python can be sidestepped. For example you can easily subclass the builtin types, e.g:
>>> class myint(int): pass
>>> i = myint(5)
>>> i
5
If I were going to create a DSL in python I'd use pyparsing or Parsley and something like the above behind the scenes, optimizing the syntax for the problem, not the implementation language.
By mixing Mamba and Expects, I think you can get very close to what RSpec is for Rails...
https://github.com/nestorsalceda/mamba
https://github.com/jaimegildesagredo/expects
Also, I think Specter should match your expectations with testing:
https://github.com/jmvrbanac/Specter
http://specter.readthedocs.io/en/latest/writing_tests/index.html
I think this is what you are looking for. Yes, we did the "impossible" in Python.
"sure" is a utility belt for expressive Python tests, created by Gabriel Falcão.
I am creating (researching the possibility of) a highly customizable Python client, and I would like to allow users to edit code in another language to customize how the program runs (analogous to a browser, which is itself written in C/C++ and runs other languages, HTML/JS). So my question is: is there any programming language implemented in pure Python that I can look at as a reference (or use directly)? I need a simple language (simple statements and ifs will do).
Edit: Sorry if I did not make myself clear, but what I want is "a language to customize the running of the program". Even though pypi seems a great option, what I am looking for is something simpler, which I can study and extend myself if the need arises. My Google searches point towards XML-based languages (BMEL, XForms, etc.).
The question isn't completely clear on scope, but I have a hunch that PyPy, embedding other full languages, and similar solutions might be overkill. It sounds like iamgopal may really be interested in something more like Interpreter Pattern or Little Language.
If the language you want to support is really small (see the Interpreter Pattern link), then hand-coding this yourself in Python won't be too hard. You can write a simple parser (Google around; here's one example), then walk the AST and evaluate user expressions.
However, if you expect this to be used for a long time or by many people, it may be worth throwing a real language at the problem. (I'd recommend Python itself if your users are already familiar with basic Python syntax).
Ren'Py is a modification to Python syntax built on top of Python itself, using the language tools in the stdlib.
For your users' sake, don't use an XML-based language. XML is an awful basis for a programming language and your users will hate you for it.
Here is a suggestion. Use a strict subset of Python for your language. Use the compiler module to convert their code into an abstract syntax tree, and walk the tree to validate that the code conforms to your subset before converting the AST into Python bytecode.
N.B. I just checked the docs and see that the compiler package is deprecated in 2.6 and removed in Python 3.x. Does anyone know why that is?
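On current Python versions, the standard ast module covers the parse-and-walk part of this suggestion. Below is a minimal sketch, assuming the subset is limited to assignments, if statements, comparisons and basic arithmetic. ALLOWED_NODES and run_restricted are names invented for the example, and a whitelist like this should be treated as a starting point, not a security guarantee.

```python
import ast

# Node types that make up the hypothetical subset; everything else is rejected.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.If,
    ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
)


def run_restricted(source, variables):
    """Validate `source` against the subset, then run it.

    Returns the namespace after execution, so the host program can
    read back whatever the user script assigned.
    """
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("%s is not allowed" % type(node).__name__)
    code = compile(tree, "<user script>", "exec")
    namespace = dict(variables)
    # No builtins are exposed; calls and attribute access are already
    # rejected by the whitelist above.
    exec(code, {"__builtins__": {}}, namespace)
    return namespace


user_script = "if hits > 3:\n    level = 'high'\nelse:\n    level = 'low'\n"
result = run_restricted(user_script, {"hits": 5})
```

Because calls, imports and attribute access are not on the whitelist, a script such as "import os" is rejected before anything runs.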
Numerous template languages such as Cheetah, Django templates, Genshi, Mako, Mighty might serve as an example.
Why not Python itself? With some care you can use eval to run user code.
One of the good things about interpreted scripting languages is that you don't need an extra scripting language!
PLY (Python Lex-Yacc) may be of interest to you.
Possibly Common Lisp (or any other Lisp) would be the best choice for that task, because Lisp makes it easy to extend the host language with powerful macros and construct a DSL (domain-specific language).
If all you need is simple if statements and expressions, I'm sure it wouldn't be an awful task to parse each line. Something like
if some flag
    activate some feature
    deactivate some feature
elif some other flag
    activate some feature
    activate some feature
else
    logout
Just write a class which, while parsing takes the first word, checks if it's "if, elif, else," etc, and if so, check a flag and set a flag saying you either are or are not executing until the next conditional. If it's not a conditional, call a function based on the first keyword that would modify the program state in some way.
The class could store some local execution state (are we in an if statement? If so are we executing this branch?) and have another class containing some global application state (flags that are checkable by if statements, etc).
This is probably the wrong thing to do in your situation (it's very prone to bugs, it's dangerous if you don't treat the data in the scripts correctly), but it's at least a start if you do decide to interpret your own mini-language.
Seriously though, if you try this, be very, very, srs careful. Don't give the scripts any functionality that they don't definitely need, because you are almost certainly opening security holes by doing something like this.
Don't say I didn't warn you.
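The keyword-dispatch idea described above can be sketched roughly as follows. This is an illustration under assumptions, not a hardened interpreter: conditionals cannot be nested, an unindented command ends the current if/elif/else, and the host program supplies the flags and the table of command handlers. All names are invented for the example.

```python
def run_script(script, flags, commands):
    """Interpret a flat if/elif/else script, one line at a time.

    flags    -- dict mapping flag text to True/False (global application state)
    commands -- dict mapping the first word of a line to a handler function
    """
    executing = True      # are we currently executing lines?
    branch_taken = False  # has a branch of the current conditional run?
    for raw in script.splitlines():
        if not raw.strip():
            continue
        word, _, rest = raw.strip().partition(" ")
        rest = rest.strip()
        if word == "if":
            executing = bool(flags.get(rest, False))
            branch_taken = executing
        elif word == "elif":
            executing = not branch_taken and bool(flags.get(rest, False))
            branch_taken = branch_taken or executing
        elif word == "else":
            executing = not branch_taken
            branch_taken = True
        else:
            if not raw[0].isspace():
                # An unindented command ends the conditional block.
                executing, branch_taken = True, False
            if executing:
                if word not in commands:
                    raise ValueError("unknown command: %r" % word)
                commands[word](rest)


log = []
commands = {
    "activate": lambda arg: log.append(("activate", arg)),
    "deactivate": lambda arg: log.append(("deactivate", arg)),
    "logout": lambda arg: log.append(("logout", arg)),
}
script = """\
if beta
    activate search
    deactivate legacy
elif admin
    activate audit
else
    logout
activate banner
"""
run_script(script, {"beta": False, "admin": True}, commands)
```

Only commands present in the table can ever run, which is one way of following the advice not to give scripts any functionality they don't definitely need.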
Take this simple C# LINQ query, and imagine that db.Numbers is an SQL table with one column Number:
var result =
from n in db.Numbers
where n.Number < 5
select n.Number;
This will run very efficiently in C#, because it generates an SQL query something like
select Number from Numbers where Number < 5
What it doesn't do is select all the numbers from the database, and then filter them in C#, as it might appear to do at first.
Python supports a similar syntax:
result = [n.Number for n in Numbers if n.Number < 5]
But the if clause here does the filtering on the client side, rather than the server side, which is much less efficient.
Is there something as efficient as LINQ in Python? (I'm currently evaluating Python vs. IronPython vs. Boo, so an answer that works in any of those languages is fine.)
I think sqlsoup in SQLAlchemy gives you the quickest solution in Python if you want a clear(ish) one-liner. Look at the page to see.
It should be something like...
result = [n.Number for n in db.Numbers.filter(db.Numbers.Number < 5).all()]
LINQ is a language feature of C# and VB.NET. It is a special syntax recognized by the compiler and treated specially. It is also dependent on another language feature called expression trees.
Expression trees are a little different in that they are not special syntax. They are written just like any other class instantiation, but the compiler does treat them specially under the covers by turning a lambda into an instantiation of a run-time abstract syntax tree. These can be manipulated at run-time to produce a command in another language (i.e. SQL).
The C# and VB.NET compilers take LINQ syntax and turn it into lambdas, then pass those into expression tree instantiations. Then there are a bunch of framework classes that manipulate these trees to produce SQL. You can also find other libraries, both MS-produced and third party, that offer "LINQ providers", which basically pop a different AST processor in to produce something from the LINQ other than SQL.
So one obstacle to doing these things in another language is the question whether they support run-time AST building/manipulation. I don't know whether any implementations of Python or Boo do, but I haven't heard of any such features.
Look closely at SQLAlchemy. This can probably do much of what you want. It gives you Python syntax for plain-old SQL that runs on the server.
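The general technique that lets Python syntax turn into server-side SQL is operator overloading: comparing a column object with a value builds a description of the condition instead of evaluating it. The toy below illustrates only that idea using the standard sqlite3 module. Column, Condition and where are invented for this sketch and are not SQLAlchemy's API.

```python
import sqlite3


class Condition:
    """A SQL fragment plus the parameters to bind to it."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = params


class Column:
    """Stands in for a table column; comparisons build SQL, not booleans."""

    def __init__(self, name):
        self.name = name

    def __lt__(self, value):
        return Condition("%s < ?" % self.name, (value,))


def where(conn, table, column, condition):
    """Run the filter inside the database and return the selected values."""
    query = "SELECT %s FROM %s WHERE %s" % (column.name, table, condition.sql)
    return [row[0] for row in conn.execute(query, condition.params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Numbers (Number INTEGER)")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(n,) for n in range(10)])

number = Column("Number")
result = where(conn, "Numbers", number, number < 5)
```

Here number < 5 never compares anything in Python. It produces the fragment "Number < ?", and the database does the filtering, as in the LINQ example from the question.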
I believe that when IronPython 2.0 is complete, it will have LINQ support (see this thread for some example discussion). Right now you should be able to write something like:
Queryable.Select(Queryable.Where(someInputSequence, somePredicate), someFuncThatReturnsTheSequenceElement)
Something better might have made it into IronPython 2.0b4 - there's a lot of current discussion about how naming conflicts were handled.
Boo supports list generator expressions using the same syntax as python. For more information on that, check out the Boo documentation on Generator expressions and List comprehensions.
A key factor for LINQ is the ability of the compiler to generate expression trees.
I am using a macro in Nemerle that converts a given Nemerle expression into an Expression tree object.
I can then pass this to the Where/Select/etc extension methods on IQueryables.
It's not quite the syntax of C# and VB, but it's close enough for me.
I got the Nemerle macro via a link on this post:
http://groups.google.com/group/nemerle-dev/browse_thread/thread/99b9dcfe204a578e
It should be possible to create a similar macro for Boo. It's quite a bit of work however, given the large set of possible expressions you need to support.
Ayende has given a proof of concept here:
http://ayende.com/Blog/archive/2008/08/05/Ugly-Linq.aspx
I'm writing a server that I expect to be run by many different people, not all of whom I will have direct contact with. The servers will communicate with each other in a cluster. Part of the server's functionality involves selecting a small subset of rows from a potentially very large table. The exact choice of what rows are selected will need some tuning, and it's important that it's possible for the person running the cluster (eg, myself) to update the selection criteria without getting each and every server administrator to deploy a new version of the server.
Simply writing the function in Python isn't really an option, since nobody is going to want to install a server that downloads and executes arbitrary Python code at runtime.
What I need are suggestions on the simplest way to implement a Domain Specific Language to achieve this goal. The language needs to be capable of simple expression evaluation, as well as querying table indexes and iterating through the returned rows. Ease of writing and reading the language is secondary to ease of implementing it. I'd also prefer not to have to write an entire query optimiser, so something that explicitly specifies what indexes to query would be ideal.
The interface that this will have to compile against will be similar in capabilities to what the App Engine datastore exports: You can query for sequential ranges on any index on the table (eg, less-than, greater-than, range and equality queries), then filter the returned row by any boolean expression. You can also concatenate multiple independent result sets together.
I realise this question sounds a lot like I'm asking for SQL. However, I don't want to require that the datastore backing this data be a relational database, and I don't want the overhead of trying to reimplement SQL myself. I'm also dealing with only a single table with a known schema. Finally, no joins will be required. Something much simpler would be far preferable.
Edit: Expanded description to clear up some misconceptions.
Building a DSL to be interpreted by Python.
Step 1. Build the run-time classes and objects. These classes will have all the cursor loops and SQL statements and all of that algorithmic processing tucked away in their methods. You'll make heavy use of the Command and Strategy design patterns to build these classes. Most things are a command, options and choices are plug-in strategies. Look at the design for Apache Ant's Task API -- it's a good example.
Step 2. Validate that this system of objects actually works. Be sure that the design is simple and complete. Your tests will construct the Command and Strategy objects, and then execute the top-level Command object. The Command objects will do the work.
At this point you're largely done. Your run-time is just a configuration of objects created from the above domain. [This isn't as easy as it sounds. It requires some care to define a set of classes that can be instantiated and then "talk among themselves" to do the work of your application.]
Note that what you'll have will require nothing more than declarations. What's wrong with procedural? Once you start to write a DSL with procedural elements, you find that you need more and more features until you've written Python with different syntax. Not good.
Further, procedural language interpreters are simply hard to write. State of execution, and scope of references are simply hard to manage.
You can use native Python -- and stop worrying about "getting out of the sandbox". Indeed, that's how you'll unit test everything, using a short Python script to create your objects. Python will be the DSL.
["But wait", you say, "If I simply use Python as the DSL people can execute arbitrary things." Depends on what's on the PYTHONPATH, and sys.path. Look at the site module for ways to control what's available.]
A declarative DSL is simplest. It's entirely an exercise in representation. A block of Python that merely sets the values of some variables is nice. That's what Django uses.
You can use the ConfigParser as a language for representing your run-time configuration of objects.
You can use JSON or YAML as a language for representing your run-time configuration of objects. Ready-made parsers are totally available.
You can use XML, too. It's harder to design and parse, but it works fine. People love it. That's how Ant and Maven (and lots of other tools) use declarative syntax to describe procedures. I don't recommend it, because it's a wordy pain in the neck. I recommend simply using Python.
Or, you can go off the deep-end and invent your own syntax and write your own parser.
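As an illustration of the JSON option above, here is a minimal sketch in which a purely declarative document is turned into a row predicate. The schema (the "op", "field", "value" and "args" keys) and the name build_predicate are assumptions invented for the example. The point is only that the document can select among behaviours the server already implements and cannot introduce new ones.

```python
import json
import operator

# Comparisons the declarative format may name (hypothetical vocabulary).
COMPARISONS = {
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
    "eq": operator.eq,
}


def build_predicate(spec):
    """Turn a declarative spec (nested dicts) into a function row -> bool."""
    op = spec["op"]
    if op in ("and", "or"):
        parts = [build_predicate(s) for s in spec["args"]]
        combine = all if op == "and" else any
        return lambda row: combine(p(row) for p in parts)
    if op == "not":
        inner = build_predicate(spec["args"][0])
        return lambda row: not inner(row)
    if op not in COMPARISONS:
        raise ValueError("unknown operation: %r" % op)
    compare, field, value = COMPARISONS[op], spec["field"], spec["value"]
    return lambda row: compare(row[field], value)


spec_text = """
{"op": "and", "args": [
    {"op": "lt", "field": "age", "value": 30},
    {"op": "not", "args": [{"op": "eq", "field": "name", "value": "bob"}]}
]}
"""
predicate = build_predicate(json.loads(spec_text))
rows = [
    {"name": "ann", "age": 25},
    {"name": "bob", "age": 20},
    {"name": "cy", "age": 40},
]
selected = [row["name"] for row in rows if predicate(row)]
```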
I think we're going to need a bit more information here. Let me know if any of the following is based on incorrect assumptions.
First of all, as you pointed out yourself, there already exists a DSL for selecting rows from arbitrary tables-- it is called "SQL". Since you don't want to reinvent SQL, I'm assuming that you only need to query from a single table with a fixed format.
If this is the case, you probably don't need to implement a DSL (although that's certainly one way to go); it may be easier, if you are used to Object Orientation, to create a Filter object.
More specifically, a "Filter" collection that would hold one or more SelectionCriterion objects. You can implement these to inherit from one or more base classes representing types of selections (Range, LessThan, ExactMatch, Like, etc.) Once these base classes are in place, you can create column-specific inherited versions which are appropriate to that column. Finally, depending on the complexity of the queries you want to support, you'll want to implement some kind of connective glue to handle AND and OR and NOT linkages between the various criteria.
If you feel like it, you can create a simple GUI to load up the collection; I'd look at the filtering in Excel as a model, if you don't have anything else in mind.
Finally, it should be trivial to convert the contents of this Collection to the corresponding SQL, and pass that to the database.
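A minimal sketch of such criterion objects and their conversion to SQL might look like this. The class names follow the ones suggested above where possible. ALLOWED_COLUMNS, the to_sql method and the use of '?' placeholders are assumptions made for the example.

```python
ALLOWED_COLUMNS = {"name", "age"}  # hypothetical fixed schema


class Criterion:
    """Base class: each criterion renders to (sql_fragment, parameters)."""

    def to_sql(self):
        raise NotImplementedError


class Comparison(Criterion):
    symbol = None  # set by subclasses

    def __init__(self, column, value):
        if column not in ALLOWED_COLUMNS:
            raise ValueError("unknown column: %r" % column)
        self.column = column
        self.value = value

    def to_sql(self):
        # Values are passed as bound parameters, never pasted into the SQL.
        return "%s %s ?" % (self.column, self.symbol), [self.value]


class LessThan(Comparison):
    symbol = "<"


class ExactMatch(Comparison):
    symbol = "="


class And(Criterion):
    def __init__(self, *criteria):
        self.criteria = criteria

    def to_sql(self):
        fragments, params = [], []
        for criterion in self.criteria:
            sql, values = criterion.to_sql()
            fragments.append("(%s)" % sql)
            params.extend(values)
        return " AND ".join(fragments), params


class Not(Criterion):
    def __init__(self, criterion):
        self.criterion = criterion

    def to_sql(self):
        sql, params = self.criterion.to_sql()
        return "NOT (%s)" % sql, params


where_sql, where_params = And(
    LessThan("age", 30), Not(ExactMatch("name", "bob"))
).to_sql()
```

The resulting fragment and parameter list can be appended to a query whose SELECT and FROM parts stay under the program's control.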
However: if what you are after is simplicity, and your users understand SQL, you could simply ask them to type in the contents of a WHERE clause, and programmatically build up the rest of the query. From a security perspective, if your code has control over the columns selected and the FROM clause, and your database permissions are set properly, and you do some sanity checking on the string coming in from the users, this would be a relatively safe option.
"implement a Domain Specific Language"
"nobody is going to want to install a server that downloads and executes arbitrary Python code at runtime"
I want a DSL but I don't want Python to be that DSL. Okay. How will you execute this DSL? What runtime is acceptable if not Python?
What if I have a C program that happens to embed the Python interpreter? Is that acceptable?
And -- if Python is not an acceptable runtime -- why does this have a Python tag?
Why not create a language that when it "compiles" it generates SQL or whatever query language your datastore requires ?
You would be basically creating an abstraction over your persistence layer.
You mentioned Python. Why not use Python? If someone can "type in" an expression in your DSL, they can type in Python.
You'll need some rules on structure of the expression, but that's a lot easier than implementing something new.
You said nobody is going to want to install a server that downloads and executes arbitrary code at runtime. However, that is exactly what your DSL will do (eventually) so there probably isn't that much of a difference. Unless you're doing something very specific with the data then I don't think a DSL will buy you that much and it will frustrate the users who are already versed in SQL. Don't underestimate the size of the task you'll be taking on.
To answer your question however, you will need to come up with a grammar for your language, something to parse the text and walk the tree, emitting code or calling an API that you've written (which is why my comment that you're still going to have to ship some code).
There are plenty of educational texts on grammars for mathematical expressions you can refer to on the net; that part is fairly straightforward. You can use a parser generator tool like ANTLR or Yacc to help you generate the parser (or use a language like Lisp/Scheme and marry the two up). Coming up with a reasonable SQL grammar won't be easy. But google 'BNF SQL' and see what you come up with.
Best of luck.
It really sounds like SQL, but perhaps it's worth trying SQLite if you want to keep it simple?
It sounds like you want to create a grammar not a DSL. I'd look into ANTLR which will allow you to create a specific parser that will interpret text and translate to specific commands. ANTLR provides libraries for Python, SQL, Java, C++, C, C# etc.
Also, here is a fine example of an ANTLR calculation engine created in C#
A context-free grammar usually has a tree-like structure, and functional programs have a tree-like structure too. I don't claim the following would solve all of your problems, but it is a good step in that direction if you are sure that you don't want to use something like SQLite3.
def select_keys(keys, from_):
    # keys maps output name -> (source column, transform function)
    return ({k: fun(v, row) for k, (v, fun) in keys.items()}
            for row in from_)

def select_where(from_, where):
    return (row for row in from_
            if where(row))

def default_keys_transform(keys, transform=lambda v, row: row[v]):
    return {k: (k, transform) for k in keys}

def select(keys=None, from_=None, where=None):
    """
    SELECT v1 AS k1, 2*v2 AS k2 FROM table WHERE v1 = a AND v2 >= b OR v3 = c

    translates to

    select(dict(k1=(v1, lambda v1, r: r[v1]), k2=(v2, lambda v2, r: 2*r[v2]))
        , from_=table
        , where=lambda r: r[v1] == a and r[v2] >= b or r[v3] == c)
    """
    assert from_ is not None
    idfunc = lambda k, t: t
    select_k = idfunc if keys is None else select_keys
    if isinstance(keys, list):
        keys = default_keys_transform(keys)
    idfunc = lambda t, w: t
    select_w = idfunc if where is None else select_where
    return select_k(keys, select_w(from_, where))
How do you make sure that you are not giving users the ability to execute arbitrary code? This framework admits all possible functions. Well, you can write a wrapper over it for security that exposes a fixed list of function objects that are acceptable.
import operator

ALLOWED_FUNCS = [operator.mul, operator.add, ...]  # List of allowed funcs

def select_secure(keys=None, from_=None, where=None):
    if keys is not None and isinstance(keys, dict):
        for v, fun in keys.values():
            assert fun in ALLOWED_FUNCS
    if where is not None:
        assert_composition_of_allowed_funcs(where, ALLOWED_FUNCS)
    return select(keys=keys, from_=from_, where=where)
How do you write assert_composition_of_allowed_funcs? It is very difficult to do that in Python but easy in Lisp. Let us assume that where is a list of functions to be evaluated in a Lisp-like format, i.e. where=(operator.add, (operator.getitem, row, v1), 2) or where=(operator.mul, (operator.add, (operator.getitem, row, v2), 2), 3).
This makes it possible to write an apply_lisp function that makes sure that the where function is only made up of ALLOWED_FUNCS or constants like float, int, str.
def apply_lisp(where, rowsym, rowval, ALLOWED_FUNCS):
    assert where[0] in ALLOWED_FUNCS
    # Python 3 has no apply() builtin; call the function with unpacked arguments.
    return where[0](*[
        (apply_lisp(w, rowsym, rowval, ALLOWED_FUNCS)
         if isinstance(w, tuple)
         else rowval if w is rowsym
         else w if isinstance(w, (float, int, str))
         else None)
        for w in where[1:]])
As an aside, you will also need to check for exact types, because you do not want your types to be overridden. So do not use isinstance; use type(w) in (float, int, str). Oh boy, we have run into:
Greenspun's Tenth Rule of Programming: any sufficiently complicated
C or Fortran program contains an ad hoc informally-specified
bug-ridden slow implementation of half of Common Lisp.
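For completeness, here is one way the assert_composition_of_allowed_funcs helper mentioned earlier could be sketched, using the exact-type check instead of isinstance. It assumes the Lisp-like tuple format described above. ROW is a hypothetical placeholder object standing for the current row, and the particular ALLOWED_FUNCS tuple is only an example.

```python
import operator

# Example whitelist; the real list depends on the application.
ALLOWED_FUNCS = (operator.add, operator.mul, operator.getitem, operator.lt)
ROW = object()  # hypothetical symbol that stands for "the current row"


def assert_composition_of_allowed_funcs(expr, allowed_funcs=ALLOWED_FUNCS):
    """Raise ValueError unless expr uses only whitelisted functions,
    the ROW symbol, and plain int/float/str constants."""
    if expr is ROW or type(expr) in (int, float, str):
        return  # exact types only: subclasses are rejected below
    if type(expr) is not tuple or not expr:
        raise ValueError("not a valid expression: %r" % (expr,))
    head, args = expr[0], expr[1:]
    # Identity comparison, so an object with a custom __eq__ cannot
    # pretend to be one of the allowed functions.
    if not any(head is func for func in allowed_funcs):
        raise ValueError("function not allowed: %r" % (head,))
    for arg in args:
        assert_composition_of_allowed_funcs(arg, allowed_funcs)


good = (operator.lt, (operator.getitem, ROW, "n"), 5)
assert_composition_of_allowed_funcs(good)  # passes silently
```

The check walks the expression without evaluating it, so a rejected expression never gets the chance to call anything.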