Python: Parsing formula from spreadsheet


I want to make a kind of simple spreadsheet with python.
I need to parse a formula from a string.
All the operations I need for now are: + - * / ^ ()
Formulas will be always starting with '='.
Examples of input:
1. =4+8-6/2
This one is simple: just use eval().
2. =4b+12*(2+5)
where '4b' is a link to another cell in the spreadsheet (a variable).
** It's possible to make all the links like 'b4' instead of '4b'.
The script would substitute the link (variable) with corresponding value.
What I'm unable to achieve is making the parser 'understand' variables like '4b' or 'b4'; the rest is pretty simple.
What would you suggest?
P.S. I'm new to Python. I tried pyparsing, but it's too complicated to use, and I couldn't adapt its examples to my needs. I hope to find a simpler solution.

ActiveState has an amazingly simple recipe for a Python-based spreadsheet that meets most of your requirements, I believe. Its class SpreadSheet is defined in terms of a couple of internal dictionaries along with a very small number of relatively short methods.
The related comments are also very interesting and show how it could be extended and made to minimize potential security issues. I highly recommend you take a look at it.
A major limitation I noted is that there's no dependency checking, so updating a cell doesn't automatically update any cells that depend on it, or any that depend on those, and so on.
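A minimal sketch of that dictionary-plus-eval idea, written from scratch here rather than copied from the recipe (the class and method names are illustrative). Formulas are stored as strings, and the sheet object itself is passed to eval as the local namespace, so a name such as b4 inside a formula is looked up through __getitem__, which evaluates that cell in turn. This is also why the 'b4' form of references matters: '4b' is not a valid Python name. Converting '^' to '**' is an assumption to match the question's operators, and emptying __builtins__ narrows, but does not eliminate, the risk of evaluating untrusted input.

```python
class SpreadSheet:
    """Cells hold formula strings; reading a cell evaluates its formula."""

    def __init__(self):
        self._cells = {}

    def __setitem__(self, key, formula):
        # Accept the question's syntax: a leading '=' and '^' for powers.
        self._cells[key] = formula.lstrip("=").replace("^", "**")

    def getformula(self, key):
        return self._cells[key]

    def __getitem__(self, key):
        # Names in the formula (cell references such as b4) are looked up in
        # `self`, which recursively evaluates the referenced cell.
        # An unknown name raises KeyError here, which eval reports as NameError.
        return eval(self._cells[key], {"__builtins__": {}}, self)
```

For example, after s["b4"] = "=2^3" and s["a1"] = "=b4+12*(2+5)", reading s["a1"] gives 92. A circular reference (a cell that refers to itself) ends in a RecursionError.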

Lambda expressions (anonymous functions) would be an efficient way to handle this. I would recommend checking out some of Peter Norvig's Udacity course, Design of Computer Programs. In lesson 2 he covers a scenario very similar to this one, looking at cryptarithmetic. The course is free and self-paced, so you can dip in and skip around lessons and sub-lessons as much as you need:
https://www.udacity.com/course/cs212
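As a rough sketch of the lambda idea (in the spirit of that lesson, not the course's actual code): find the cell references in the formula with a regular expression, then build and eval the source of a lambda whose parameters are those references. The resulting function can be called again and again with different cell values. The name compile_formula and the letters-then-digits reference pattern are assumptions.

```python
import re

# Assumed reference syntax: one or more letters followed by digits, e.g. b4 or aa12.
CELL_REF = re.compile(r"\b[A-Za-z]+[0-9]+\b")

def compile_formula(formula):
    """Turn '=b4+12*(2+5)' into (params, function taking the cell values in params order)."""
    body = formula.lstrip("=").replace("^", "**")
    params = sorted(set(CELL_REF.findall(body)))
    source = f"lambda {', '.join(params)}: {body}"
    # Builtins are emptied to limit what a formula can reach (not a real sandbox).
    return params, eval(source, {"__builtins__": {}})
```

For instance, params, f = compile_formula("=b4+12*(2+5)") gives params == ["b4"], and f(8) returns 92. Compiling once and calling many times is what makes this cheap when only the cell values change.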


What is the pythonic way to implement a css parser/replacer

I want to implement a script that reads a CSS file and makes meaningful changes to it (adding/removing/replacing lines/words etc.). The basic logic is to implement an RTL (right-to-left) transformation.
I could think of quite a few approaches to it:
file reader - read a line, analyze it and make the needed changes to it.
two phase scan - create in memory model, scan and change it, save model to text.
regular expressions - It might be quite difficult because some of them might be very complex.
Basically, what I'm wondering is which of these (or other) methods would be the Pythonic way to do it. Are there any relevant libraries you think I should be familiar with for this kind of operation?
Edit:
It should be noted that this is a "learn Python through a usable project" kind of project, so I'm not familiar with most of the libraries you might mention here.

If you want something "quick and dirty" there are many interesting ways to do this. (As you said: line-by-line, regular expressions, …)
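For instance, a whole-file regular expression that swaps left and right covers many simple stylesheets. A minimal sketch (the function name is illustrative):

```python
import re

# Matches "left"/"right" as whole words, including inside property names
# such as margin-left or padding-right (the hyphen is a word boundary).
LEFT_RIGHT = re.compile(r"\b(left|right)\b")

def flip_rtl(css):
    """Quick-and-dirty RTL transform: swap every 'left' with 'right' and vice versa."""
    return LEFT_RIGHT.sub(lambda m: "right" if m.group(1) == "left" else "left", css)
```

flip_rtl("p { margin-left: 4px; float: right; }") returns "p { margin-right: 4px; float: left; }". It will also flip left/right inside selectors, URLs and comments, and it ignores four-value shorthands such as margin: 1px 2px 3px 4px, which is exactly the kind of input where a real parser pays off.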
But if you want to do it "right" (correct on all kinds of inputs) you’ll need a real parser based on the official CSS tokenization and grammar. In Python there is cssutils and tinycss. (Disclaimer: I’m tinycss’s author.) If you want to learn, I like to think that tinycss’s source code is straightforward and easy to learn :)

Good practices in writing MATLAB code? [closed]

I would like to know the basic principles and etiquette of writing a well structured code.

Read Code Complete, it will do wonders for everything. It'll show you where, how, and when things matter. It's pretty much the Bible of software development (IMHO.)

These are the two most important things to keep in mind when you are writing code:
Don't write code that you've already written.
Don't write code that you don't need to write.

MATLAB Programming Style Guidelines by Richard Johnson is a good resource.

Well, if you want it in layman's terms:
I recommend that people write the shortest readable program that works.
There are a lot more rules about how to format code, name variables, design classes, and separate responsibilities. But you should not forget that all of those rules exist only to make sure that your code is easy to check for errors and maintainable by someone other than the original author. If you keep the above recommendation in mind, your program will be just that.

This list could go on for a long time but some major things are:
Indent.
Descriptive variable names.
Descriptive class / function names.
Don't duplicate code. If it needs duplication, put it in a class / function.
Use getters / setters.
Only expose what's necessary in your objects.
Single responsibility principle.
Learn how to write good comments, not lots of comments.
Take pride in your code!
Two good places to start:
Clean-Code Handbook
Code-Complete
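On the getters/setters point: in Python, the idiomatic form is a property, which lets a class validate or compute an attribute while callers still write obj.attr. A minimal sketch with a hypothetical Account class (the leading underscore marks _balance as internal, in line with exposing only what's necessary):

```python
class Account:
    """Hypothetical example: the balance can be read freely but only set to valid values."""

    def __init__(self, balance=0):
        self.balance = balance  # goes through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        # The validation lives in one place instead of at every call site.
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value
```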

If you want something to use as a reference or etiquette, I often follow the official Google style conventions for whatever language I'm working in, such as for C++ or for Python.
The Practice of Programming by Rob Pike and Brian W. Kernighan also has a section on style that I found helpful.

First of all, "codes" is not the right word to use. A code is a representation of another thing, usually numeric. The correct words are "source code", and the plural of source code is source code.
--
Writing good source code:
Comment your code.
Use variable names longer than several letters. Between 5 and 20 is a good rule of thumb.
Fewer, denser lines of code are not better; use whitespace.
Being "clever" with your code is a good way to confuse yourself or another person later on.
Decompose the problem into its components and use hierarchical design to assemble the solution.
Remember that you will need to change your program later on.
Comment your code.
There are many fads in computer programming. Their proponents consider those who are not following the fad unenlightened and not very with-it. The current major fads seem to be "Test Driven Development" and "Agile". The fad in the 1990s was 'Object Oriented Programming'. Learn the useful core parts of the ideas that come around, but don't be dogmatic and remember that the best program is one that is getting the job done that it needs to do.
A very trivial example of over-condensed code, off the top of my head:
    for (int i=0, j=i; i<10 && j!=100; i++) {
        if (i==j) return i*j;
        else j*=2;
    }
This, by contrast, is more readable:
    int j = 0;
    for (int i = 0; i < 10; i++)
    {
        if (i == j)
        {
            return i * j;
        }
        else
        {
            j *= 2;
            if (j == 100)
            {
                break;
            }
        }
    }
The second example has the logic for exiting the loop clearly visible; the first example has the logic entangled with the control flow. Note that these two programs do exactly the same thing. My programming style takes up a lot of lines of code, but I have never once encountered a complaint about it being hard to understand stylistically, while I find the more condensed approaches frustrating.
An experienced programmer can and will read both, but the first may make them pause for a moment to work out what is happening. Forcing the reader to sit down and stare at the code is not a good idea. Code needs to be obvious. Each problem has an intrinsic complexity in expressing its solution. Code should not be more complex than that, if at all possible.
That is the essence of what the other poster tried to convey: don't make the program longer than it needs to be. Longer has two meanings: more lines of code (i.e., putting braces on their own line), and more complex. Making a program more complex than it needs to be is not good. Making it more readable is good.

Have a look at 97 Things Every Programmer Should Know.
It's free and contains a lot of gems like this one:
There is one quote that I think is particularly good for all software developers to know and keep close to their hearts:
Beauty of style and harmony and grace and good rhythm depends on simplicity.
— Plato
In one sentence I think this sums up the values that we as software developers should aspire to.
There are a number of things we strive for in our code:
Readability
Maintainability
Speed of development
The elusive quality of beauty
Plato is telling us that the enabling factor for all of these qualities is simplicity.

The Python Style Guide is always a good starting point!

European Standards For Writing and Documenting Exchangeable Fortran 90 Code has been in my bookmarks for ages. Also, since you are interested in MATLAB, there was a thread here on organising MATLAB code.

Personally, I've found that I learned more about programming style from working through SICP, the MIT introductory computer science text (I'm about a quarter of the way through), than from any other book. That being said, if you're going to be working in Python, the Google style guide is an excellent place to start.
I read somewhere that most programs (scripts, anyway) should never be more than a couple of lines long: all the requisite functionality should be abstracted into functions or classes. I tend to agree.

Many good points have been made above, and I second all of them. I would also add that correct spelling and consistency are things to practise in your code (and in real life).
I've worked with some offshore teams, and though their English is pretty good, their spelling errors caused a lot of confusion. For instance, if you need to look for some function (e.g., getFeedsFromDatabase) and they misspell "database" or something else, that can be a big or small headache, depending on how many dependencies you have on that particular function. The fact that the misspelling gets repeated over and over within the code will, first, drive you nuts and, second, make the code difficult to parse.
Also, keep up with consistency in terms of naming variables and functions. There are many protocols to go by but as long as you're consistent in what you do, others you work with will be able to better read your code and be thankful for it.

This covers pretty much everything said here, and more. In my opinion it's the best site for what you're looking for (the Zen of Python parts especially are fun and true):
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
It talks about both PEP 20 and PEP 8, some Easter eggs (fun stuff), etc.

You can have a look at the Stanford online course Programming Methodology (CS106A). The instructor gives several really good pieces of advice for writing source code.
Some of them are as follows:
Write programs for people to read, not just for computers to read. Both need to be able to read it, but it's far more important that a person reads and understands it, and that the computer still executes it correctly. That's the first major software engineering principle to think about.
How to write comments:
Put in comments to clarify things in the program that are not obvious.
How to decompose:
One method solves one problem.
Each method has roughly 1 to 15 lines of code.
Give methods good names.
Write comments for your code.

Unit Tests
Python and MATLAB are dynamic languages. As your code base grows, you will be forced to refactor your code. In contrast to statically typed languages, no compiler will detect 'broken' parts of your project. Using a unit test framework in the xUnit style not only compensates for the missing compiler checks, but also allows refactoring with continuous verification of all parts of your project.
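A minimal sketch using Python's built-in unittest module (itself an xUnit-style framework); slugify is a hypothetical function standing in for your own code. Once tests like these exist, a refactoring that changes behaviour shows up as a failing test rather than as a surprise at runtime.

```python
import unittest

def slugify(title):
    """Hypothetical function under test: lower-case words joined by hyphens."""
    return "-".join(title.lower().split())

class SlugifyTest(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_extra_whitespace_is_ignored(self):
        self.assertEqual(slugify("  Good   Practices "), "good-practices")
```

Saved as, say, test_slugify.py, it can be run with python -m unittest test_slugify.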
Source Control
Track your source code with a version control system like svn, git or any other derivative. You'll be able to go back and forth in your code history, making branches or creating tags for deployed/released versions.
Bug Tracking
Use a bug tracking system, if possible connected with your source control system, in order to stay on top of your issues. You may not be able, or required, to fix every issue right away.
Reduce Entropy
While integrating new features in your existing code base, you will add more lines of code, and potentially more complexity. This will increase entropy. Try to keep your design clean, by introducing an interface, or inheritance hierarchy in order to reduce entropy again. Not paying attention to code entropy will render your code unmaintainable over time.
All of The Above Mentioned
Purely coding-related topics, like using a style guide, not duplicating code, etc., have already been mentioned.

A small addition to the wonderful answers already here regarding MATLAB:
Avoid long scripts; instead, write functions (subroutines) in separate files. This will make the code more readable and easier to optimize.
Use MATLAB's built-in functions. That is, learn about the many, many functions MATLAB offers instead of reinventing the wheel.
Use code sections, and whatever other code-structuring features the newest MATLAB version offers.
Learn how to benchmark your code using timeit and profile. You'll discover that sometimes for loops are the better solution.

The best advice I got when I asked this question was as follows:
Never code while drunk.

Make it readable, make it intuitive, make it understandable, and make it commented.

Advice on translating code from very unrelated languages (in this case Scheme to Python)?

Reasoning: I'm trying to convert a large library from Scheme to Python.
Are there any good strategies for doing this kind of conversion? Specifically cross-paradigm in this case, since Python is more OO and Scheme is functional.
Totally subjective, so I'm making it community wiki.

I would treat the original language implementation almost like a requirements specification, and write up a design based on it (most importantly including detailed interface definitions, both for the external interfaces and for those between modules within the library). Then I would implement from that design.
What I would most definitely NOT do is any kind of function-by-function translation.

Use the Scheme implementation as a way of generating test cases. I'd write a function that can call Scheme code and read the output, converting it back into Python.
That way, you can write test cases that look like this:
    def test_f():
        assert_equal(library.f(42), reference_implementation('(f 42)'))
This doesn't help you translate the library, but it will give you pretty good confidence that what you have gives the right results.
Of course, depending on what the Scheme code does, it may not be quite as simple as this...
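One possible sketch of the reference_implementation helper used in that test, assuming GNU Guile as the interpreter (its -c option evaluates an expression and exits; adjust the command for your Scheme and to load your library). The expression is wrapped in write so the result is printed in a readable form, and a small reader converts numbers, #t/#f, strings and nested lists back into Python values. Everything here is illustrative: the reader is deliberately naive (no escape decoding, no dotted pairs).

```python
import re
import subprocess

# Assumption: GNU Guile is on PATH; `guile -c EXPR` evaluates EXPR and exits.
SCHEME_CMD = ("guile", "-c")

# Tokens of printed Scheme data: parentheses, string literals, or other atoms.
TOKEN_RE = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()]+')

def parse_scheme_output(text):
    """Convert printed Scheme data (numbers, #t/#f, strings, nested lists) to Python values."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        if tok in ("#t", "#f"):
            return tok == "#t"
        if tok.startswith('"'):
            return tok[1:-1]  # naive: escape sequences are left as-is
        # Try Python's number syntax (close to, but not identical with, Scheme's).
        for convert in (int, float):
            try:
                return convert(tok)
            except ValueError:
                pass
        return tok  # e.g. symbols stay as strings

    return read()

def reference_implementation(expr, cmd=SCHEME_CMD):
    """Evaluate expr with the Scheme interpreter and convert the printed result."""
    completed = subprocess.run([*cmd, f"(write {expr})"],
                               capture_output=True, text=True, check=True)
    return parse_scheme_output(completed.stdout)
```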

I would set up a bunch of whiteboards and write out the algorithms from the Scheme code. Then I would implement the algorithms in Python. Then, as @PaulHankin suggests, use the Scheme code as a way to write test cases for the Python code.

If you don't have time to do as the others have suggested and actually re-implement the functionality, there is no reason you CAN'T implement it in a strictly functional fashion.
Python supports the key features necessary to do functional programming, and you might find that your time was better spent doing other things, especially if absolute optimization is not required. On the other hand, you might find bug-hunting to be quite hard.
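If you do take the direct, functional route, most Scheme idioms map onto Python's map, filter, lambda and functools.reduce. One catch worth knowing: CPython does not eliminate tail calls (the default recursion limit is about 1000 frames), so tail-recursive Scheme loops are best rewritten as explicit loops. A small sketch; the function names are illustrative:

```python
import operator
from functools import reduce

# Scheme original (SRFI-1 fold), for comparison:
#   (define (sum-of-squares lst)
#     (fold + 0 (map (lambda (x) (* x x)) lst)))
def sum_of_squares(lst):
    return reduce(operator.add, map(lambda x: x * x, lst), 0)

# Tail-recursive Scheme original:
#   (define (sum-to n acc)
#     (if (= n 0) acc (sum-to (- n 1) (+ acc n))))
# Rebinding the parameters in a loop replaces the tail call, so large n
# does not hit Python's recursion limit.
def sum_to(n, acc=0):
    while n != 0:
        n, acc = n - 1, acc + n
    return acc
```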

Write a Python interpreter in Scheme and directly translate your program to that :-) You can start with def:
    (define-syntax def
      (syntax-rules ()
        ((def func-name rest ...)
         (define func-name (lambda rest ...)))))
    ;; test
    (def sqr (x) (* x x))
    (sqr 2) ; => 4

First Order Logic Engine

I'd like to create an application that can do simple reasoning using first order logic. Can anyone recommend an "engine" that can accept an arbitrary number of FOL expressions, and allow querying of those expressions (preferably accessible via Python)?

Don't query using first-order logic (FOL) unless you absolutely have to: first-order logic is not decidable, only semi-decidable, so some queries will unavoidably fail to terminate.
Description logic is essentially a decidable fragment of first-order logic, reformulated in a manner that is good for talking about classes of entity and their interrelationships. There are many engines for description logic in Python, for example seth, based on OWL-DL.
If you are really sure that you need the vastness of FOL, then FLiP is worth a look. I've not used it (I'm not really keen on Python, to be honest), but this is a good approach to making logic checking available to a programming language.

PyLog:
PyLog is a first-order logic library including a Prolog engine in Python.
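To give a feel for what a Prolog-style engine does under the hood, here is a toy backward-chaining prover over Horn clauses with unification. It is written from scratch as an illustration and is not PyLog's API; the term encoding (strings starting with an uppercase letter are variables, tuples are compound terms) is an assumption. It omits the occurs check, negation and cuts, and like Prolog's depth-first search it can loop forever on some rule sets (for example, left-recursive ones), which is the non-termination risk mentioned in the previous answer.

```python
import itertools

# Term encoding (assumed): "X" is a variable, "tom" or 3 is a constant,
# ("parent", "tom", "X") is a compound term.
def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def walk(term, subst):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst):
    """Return subst extended so that a and b are equal, or None (no occurs check)."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

_fresh = itertools.count()

def rename(term, n):
    """Give a clause's variables fresh names so separate uses of the clause don't clash."""
    if is_var(term):
        return f"{term}_{n}"
    if isinstance(term, tuple):
        return tuple(rename(t, n) for t in term)
    return term

def solve(goals, rules, subst):
    """Yield every substitution proving all goals, depth-first like Prolog."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    for head, body in rules:
        n = next(_fresh)
        s = unify(first, rename(head, n), subst)
        if s is not None:
            yield from solve([rename(g, n) for g in body] + rest, rules, s)

def resolve(term, subst):
    """Replace bound variables in term by their values."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return tuple(resolve(t, subst) for t in term)
    return term
```

With rules such as (("parent", "tom", "bob"), []), (("parent", "bob", "ann"), []), (("ancestor", "X", "Y"), [("parent", "X", "Y")]) and (("ancestor", "X", "Y"), [("parent", "X", "Z"), ("ancestor", "Z", "Y")]), the query [resolve("Who", s) for s in solve([("ancestor", "tom", "Who")], rules, {})] returns ["bob", "ann"].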

Recipe 303057: Pythologic -- Prolog syntax in Python / http://code.activestate.com/recipes/303057/

Writing a Domain Specific Language for selecting rows from a table

I'm writing a server that I expect to be run by many different people, not all of whom I will have direct contact with. The servers will communicate with each other in a cluster. Part of the server's functionality involves selecting a small subset of rows from a potentially very large table. The exact choice of which rows are selected will need some tuning, and it's important that the person running the cluster (e.g., myself) can update the selection criteria without getting each and every server administrator to deploy a new version of the server.
Simply writing the function in Python isn't really an option, since nobody is going to want to install a server that downloads and executes arbitrary Python code at runtime.
What I need are suggestions on the simplest way to implement a Domain Specific Language to achieve this goal. The language needs to be capable of simple expression evaluation, as well as querying table indexes and iterating through the returned rows. Ease of writing and reading the language is secondary to ease of implementing it. I'd also prefer not to have to write an entire query optimiser, so something that explicitly specifies what indexes to query would be ideal.
The interface that this will have to compile against will be similar in capabilities to what the App Engine datastore exports: you can query for sequential ranges on any index on the table (e.g., less-than, greater-than, range and equality queries), then filter the returned rows by any boolean expression. You can also concatenate multiple independent result sets together.
I realise this question sounds a lot like I'm asking for SQL. However, I don't want to require that the datastore backing this data be a relational database, and I don't want the overhead of trying to reimplement SQL myself. I'm also dealing with only a single table with a known schema. Finally, no joins will be required. Something much simpler would be far preferable.
Edit: Expanded description to clear up some misconceptions.

Building a DSL to be interpreted by Python.
Step 1. Build the run-time classes and objects. These classes will have all the cursor loops and SQL statements and all of that algorithmic processing tucked away in their methods. You'll make heavy use of the Command and Strategy design patterns to build these classes. Most things are a command, options and choices are plug-in strategies. Look at the design for Apache Ant's Task API -- it's a good example.
Step 2. Validate that this system of objects actually works. Be sure that the design is simple and complete. Your tests will construct the Command and Strategy objects, and then execute the top-level Command object. The Command objects will do the work.
At this point you're largely done. Your run-time is just a configuration of objects created from the above domain. [This isn't as easy as it sounds. It requires some care to define a set of classes that can be instantiated and then "talk among themselves" to do the work of your application.]
Note that what you'll have will require nothing more than declarations. What's wrong with procedural? Once you start to write a DSL with procedural elements, you find that you need more and more features until you've written Python with different syntax. Not good.
Further, procedural language interpreters are simply hard to write. State of execution, and scope of references are simply hard to manage.
You can use native Python -- and stop worrying about "getting out of the sandbox". Indeed, that's how you'll unit test everything, using a short Python script to create your objects. Python will be the DSL.
["But wait", you say, "If I simply use Python as the DSL people can execute arbitrary things." Depends on what's on the PYTHONPATH, and sys.path. Look at the site module for ways to control what's available.]
A declarative DSL is simplest. It's entirely an exercise in representation. A block of Python that merely sets the values of some variables is nice. That's what Django uses.
You can use the ConfigParser as a language for representing your run-time configuration of objects.
You can use JSON or YAML as a language for representing your run-time configuration of objects. Ready-made parsers are readily available.
You can use XML, too. It's harder to design and parse, but it works fine. People love it. That's how Ant and Maven (and lots of other tools) use declarative syntax to describe procedures. I don't recommend it, because it's a wordy pain in the neck. I recommend simply using Python.
Or, you can go off the deep-end and invent your own syntax and write your own parser.
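A minimal sketch of the declarative approach, assuming JSON as the configuration language and a table represented as a list of dicts (the config keys queries, index, lo, hi, filters, column, op, value and the class names are all hypothetical). The config only names whitelisted operations; the Command (RangeQuery) and Strategy (Compare) objects hold all the behaviour, so nothing from the config is ever executed as code.

```python
import json
import operator
from dataclasses import dataclass

# Strategies the configuration may name; anything else is rejected.
OPERATORS = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
             "le": operator.le, "gt": operator.gt, "ge": operator.ge}

@dataclass
class Compare:
    """Strategy: compare one column of a row against a constant."""
    column: str
    op: str
    value: object

    def __post_init__(self):
        if self.op not in OPERATORS:
            raise ValueError(f"operator not allowed: {self.op!r}")

    def matches(self, row):
        return OPERATORS[self.op](row[self.column], self.value)

@dataclass
class RangeQuery:
    """Command: range-scan one index, then keep rows that pass every filter."""
    index: str
    lo: object
    hi: object
    filters: list

    def execute(self, table):
        # A real datastore would use its index here; this sketch scans a list of dicts.
        for row in table:
            if self.lo <= row[self.index] <= self.hi and all(f.matches(row) for f in self.filters):
                yield row

def build(config_text):
    """Turn the declarative JSON configuration into command objects."""
    spec = json.loads(config_text)
    return [RangeQuery(q["index"], q["lo"], q["hi"],
                       [Compare(f["column"], f["op"], f["value"]) for f in q.get("filters", [])])
            for q in spec["queries"]]

def run(queries, table):
    """Concatenate the results of several independent queries."""
    return [row for query in queries for row in query.execute(table)]
```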

I think we're going to need a bit more information here. Let me know if any of the following is based on incorrect assumptions.
First of all, as you pointed out yourself, there already exists a DSL for selecting rows from arbitrary tables: it is called "SQL". Since you don't want to reinvent SQL, I'm assuming that you only need to query from a single table with a fixed format.
If this is the case, you probably don't need to implement a DSL (although that's certainly one way to go); it may be easier, if you are used to Object Orientation, to create a Filter object.
More specifically, a "Filter" collection that would hold one or more SelectionCriterion objects. You can implement these to inherit from one or more base classes representing types of selections (Range, LessThan, ExactMatch, Like, etc.) Once these base classes are in place, you can create column-specific inherited versions which are appropriate to that column. Finally, depending on the complexity of the queries you want to support, you'll want to implement some kind of connective glue to handle AND and OR and NOT linkages between the various criteria.
If you feel like it, you can create a simple GUI to load up the collection; I'd look at the filtering in Excel as a model, if you don't have anything else in mind.
Finally, it should be trivial to convert the contents of this Collection to the corresponding SQL, and pass that to the database.
However: if what you are after is simplicity, and your users understand SQL, you could simply ask them to type in the contents of a WHERE clause, and programmatically build up the rest of the query. From a security perspective, if your code has control over the columns selected and the FROM clause, and your database permissions are set properly, and you do some sanity checking on the string coming in from the users, this would be a relatively safe option.
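A minimal sketch of that Filter/SelectionCriterion design, with the AND/OR/NOT glue provided through Python's &, | and ~ operators (an implementation choice, not something the answer prescribes). Each criterion can both test a row in Python and render itself as a parameterised SQL fragment. The ? placeholders assume a driver that uses the qmark style, as sqlite3 does, and the column whitelist is a hypothetical safeguard so that only known column names are ever interpolated into SQL text.

```python
# Hypothetical whitelist: only these names are ever interpolated into SQL.
ALLOWED_COLUMNS = {"name", "age", "country"}

class Criterion:
    """Base class: subclasses implement matches(row) and to_sql() -> (sql, params)."""
    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)

    def __invert__(self):
        return Not(self)

class ColumnCriterion(Criterion):
    sql_template = ""  # set by subclasses, e.g. "{col} = ?"

    def __init__(self, column, *values):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column!r}")
        self.column, self.values = column, list(values)

    def to_sql(self):
        return self.sql_template.format(col=self.column), list(self.values)

class ExactMatch(ColumnCriterion):
    sql_template = "{col} = ?"
    def matches(self, row):
        return row[self.column] == self.values[0]

class LessThan(ColumnCriterion):
    sql_template = "{col} < ?"
    def matches(self, row):
        return row[self.column] < self.values[0]

class Range(ColumnCriterion):
    sql_template = "{col} BETWEEN ? AND ?"
    def matches(self, row):
        lo, hi = self.values
        return lo <= row[self.column] <= hi

class Binary(Criterion):
    connective = ""  # "AND" or "OR" in subclasses

    def __init__(self, left, right):
        self.left, self.right = left, right

    def to_sql(self):
        (ls, lp), (rs, rp) = self.left.to_sql(), self.right.to_sql()
        return f"({ls}) {self.connective} ({rs})", lp + rp

class And(Binary):
    connective = "AND"
    def matches(self, row):
        return self.left.matches(row) and self.right.matches(row)

class Or(Binary):
    connective = "OR"
    def matches(self, row):
        return self.left.matches(row) or self.right.matches(row)

class Not(Criterion):
    def __init__(self, inner):
        self.inner = inner

    def matches(self, row):
        return not self.inner.matches(row)

    def to_sql(self):
        sql, params = self.inner.to_sql()
        return f"NOT ({sql})", params
```

For example, (ExactMatch("country", "NZ") & ~LessThan("age", 18)).to_sql() returns ("(country = ?) AND (NOT (age < ?))", ["NZ", 18]), ready to append after WHERE.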

"implement a Domain Specific Language"
"nobody is going to want to install a server that downloads and executes arbitrary Python code at runtime"
I want a DSL but I don't want Python to be that DSL. Okay. How will you execute this DSL? What runtime is acceptable if not Python?
What if I have a C program that happens to embed the Python interpreter? Is that acceptable?
And -- if Python is not an acceptable runtime -- why does this have a Python tag?

Why not create a language that, when it "compiles", generates SQL or whatever query language your datastore requires?
You would be basically creating an abstraction over your persistence layer.

You mentioned Python. Why not use Python? If someone can "type in" an expression in your DSL, they can type in Python.
You'll need some rules on the structure of the expression, but that's a lot easier than implementing something new.
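A minimal sketch of such rules, using the standard ast module to reject any expression containing syntax outside a small whitelist (boolean logic, comparisons, basic arithmetic, column names and literals) before compiling it. The function name and whitelist are assumptions. This blocks function calls, attribute access and subscripts, but it is not a complete sandbox: a large string multiplication, for example, could still exhaust memory.

```python
import ast

# Node types a selection expression may contain; anything else (calls,
# attribute access, subscripts, lambdas, comprehensions, ...) is rejected.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not, ast.USub,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Name, ast.Load, ast.Constant,
)

def compile_filter(expr, columns):
    """Check that expr uses only whitelisted syntax and known column names, then compile it."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"not allowed: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in columns:
            raise ValueError(f"unknown column: {node.id}")
    code = compile(tree, "<filter>", "eval")
    # Each row (a dict) becomes the local namespace, so column names resolve to row values.
    return lambda row: eval(code, {"__builtins__": {}}, row)
```

compile_filter("age >= 18 and country == 'NZ'", {"age", "country"}) returns a predicate that can be applied to each row dict.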

You said nobody is going to want to install a server that downloads and executes arbitrary code at runtime. However, that is exactly what your DSL will do (eventually) so there probably isn't that much of a difference. Unless you're doing something very specific with the data then I don't think a DSL will buy you that much and it will frustrate the users who are already versed in SQL. Don't underestimate the size of the task you'll be taking on.
To answer your question, however, you will need to come up with a grammar for your language, something to parse the text and walk the tree, emitting code or calling an API that you've written (which is why I say you're still going to have to ship some code).
There are plenty of educational texts on the net about grammars for mathematical expressions that you can refer to; that part is fairly straightforward. You may have a parser generator tool like ANTLR or Yacc you can use to help you generate the parser (or use a language like Lisp/Scheme and marry the two up). Coming up with a reasonable SQL grammar won't be easy, but google 'BNF SQL' and see what you come up with.
Best of luck.
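For the simple arithmetic grammar mentioned in that answer (which also covers the spreadsheet formulas at the top of this page), a hand-written recursive-descent parser is a common alternative to a parser generator. A minimal sketch, assuming references are written as names such as b4 and that ^ is right-associative exponentiation: tokenize with a regular expression, parse into a tree of tuples, then walk the tree to evaluate it. All names are illustrative.

```python
import operator
import re

# Grammar:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := '-' factor | power
#   power  := atom ('^' factor)?
#   atom   := NUMBER | NAME | '(' expr ')'
TOKEN = re.compile(r"\s*(?:(?P<num>\d+(?:\.\d*)?|\.\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[-+*/^()]))")

def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        pos = m.end()
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i][1] if self.i < len(self.tokens) else None

    def take(self):
        if self.i >= len(self.tokens):
            raise SyntaxError("unexpected end of formula")
        self.i += 1
        return self.tokens[self.i - 1]

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.take()[1], node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.take()[1], node, self.factor())
        return node

    def factor(self):
        if self.peek() == "-":
            self.take()
            return ("neg", self.factor())
        return self.power()

    def power(self):
        base = self.atom()
        if self.peek() == "^":
            self.take()
            return ("^", base, self.factor())  # right-associative
        return base

    def atom(self):
        kind, text = self.take()
        if kind == "num":
            return ("num", float(text))
        if kind == "name":
            return ("ref", text)
        if text == "(":
            node = self.expr()
            if self.peek() != ")":
                raise SyntaxError("missing ')'")
            self.take()
            return node
        raise SyntaxError(f"unexpected {text!r}")

def parse(formula):
    tokens = tokenize(formula.lstrip("="))
    parser = Parser(tokens)
    tree = parser.expr()
    if parser.i != len(tokens):
        raise SyntaxError(f"unexpected {tokens[parser.i][1]!r}")
    return tree

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}

def evaluate(node, cells):
    """Walk the tree; 'ref' nodes are looked up in the cells mapping."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "ref":
        return cells[node[1]]
    if kind == "neg":
        return -evaluate(node[1], cells)
    return OPS[kind](evaluate(node[1], cells), evaluate(node[2], cells))
```

evaluate(parse("=b4+12*(2+5)"), {"b4": 8}) returns 92.0. Because the tree is plain data, the same parser could just as well emit code or call an API instead of evaluating.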

It really sounds like SQL, so perhaps it's worth trying SQLite if you want to keep it simple?
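A minimal sketch with Python's built-in sqlite3 module and a throwaway in-memory database; the table name and columns are hypothetical. Values go through ? parameters, but the WHERE text itself is interpolated into the query, so it must come from a trusted source or be generated by your own code.

```python
import sqlite3

def select_rows(rows, where_sql, params=()):
    """Load rows into an in-memory SQLite table and run a parameterised WHERE clause."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE items (name TEXT, age INTEGER, country TEXT)")
        con.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
        query = f"SELECT name, age, country FROM items WHERE {where_sql} ORDER BY rowid"
        return con.execute(query, params).fetchall()
    finally:
        con.close()
```

For a real server you would keep a persistent database file rather than reloading the rows for every query.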

It sounds like you want to create a grammar, not a DSL. I'd look into ANTLR, which lets you create a specific parser that interprets text and translates it into specific commands. ANTLR provides runtimes for Python, Java, C++, C, C#, etc.
Also, here is a fine example of an ANTLR calculation engine created in C#

A context-free grammar usually has a tree-like structure, and functional programs have a tree-like structure too. I don't claim the following would solve all of your problems, but it is a good step in that direction if you are sure that you don't want to use something like SQLite3.
    from functools import partial

    def select_keys(keys, from_):
        # Project each row onto the requested keys, transforming each value with fun.
        return ({k: fun(v, row) for k, (v, fun) in keys.items()}
                for row in from_)

    def select_where(from_, where):
        # Keep only the rows for which the predicate is true.
        return (row for row in from_
                if where(row))

    def default_keys_transform(keys, transform=lambda v, row: row[v]):
        return {k: (k, transform) for k in keys}

    def select(keys=None, from_=None, where=None):
        """
        SELECT v1 AS k1, 2*v2 AS k2 FROM table WHERE v1 = a AND v2 >= b OR v3 = c
        translates to
        select(dict(k1=(v1, lambda v1, r: r[v1]), k2=(v2, lambda v2, r: 2*r[v2])),
               from_=table,
               where=lambda r: r[v1] == a and r[v2] >= b or r[v3] == c)
        """
        assert from_ is not None
        idfunc = lambda k, t: t
        select_k = idfunc if keys is None else select_keys
        if isinstance(keys, list):
            keys = default_keys_transform(keys)
        idfunc = lambda t, w: t
        select_w = idfunc if where is None else select_where
        return select_k(keys, select_w(from_, where))
How do you make sure that you are not giving users the ability to execute arbitrary code? This framework admits all possible functions. Well, you can write a wrapper over it for security that exposes a fixed list of acceptable function objects.
    import operator

    ALLOWED_FUNCS = [operator.mul, operator.add, ...]  # list of allowed funcs; "..." stands for the rest

    def select_secure(keys=None, from_=None, where=None):
        if keys is not None and isinstance(keys, dict):
            for v, fun in keys.values():
                assert fun in ALLOWED_FUNCS
        if where is not None:
            # assert_composition_of_allowed_funcs is discussed below; it is not defined here.
            assert_composition_of_allowed_funcs(where, ALLOWED_FUNCS)
        return select(keys=keys, from_=from_, where=where)
How do you write assert_composition_of_allowed_funcs? It is very difficult to do in Python but easy in Lisp. Let us assume that where is a nested tuple of functions to be evaluated in a Lisp-like format, i.e. where=(operator.add, (operator.getitem, row, v1), 2) or where=(operator.mul, (operator.add, (operator.getitem, row, v2), 2), 3).
This makes it possible to write an apply_lisp function that makes sure the where expression is made up only of ALLOWED_FUNCS or constants like float, int, str.
    def apply_lisp(where, rowsym, rowval, ALLOWED_FUNCS):
        assert where[0] in ALLOWED_FUNCS
        # Python 3 has no built-in apply(), so call the function with unpacked arguments.
        return where[0](*[
            (apply_lisp(w, rowsym, rowval, ALLOWED_FUNCS)
             if isinstance(w, tuple)
             else rowval if w is rowsym
             else w if isinstance(w, (float, int, str))
             else None)
            for w in where[1:]])
Aside: you will also need to check for exact types, because you do not want your types to be overridden. So do not use isinstance; use type(w) in (float, int, str). Oh boy, we have run into:
Greenspun's Tenth Rule of Programming: any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
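A self-contained variant of the apply_lisp idea above, offered as a sketch under a few assumptions of its own: a sentinel object ROW stands for "the current row", functions are matched by identity (so an object with a rigged __eq__ cannot pass the membership check), constants are checked with exact types as the answer advises, and violations raise ValueError instead of using assert (asserts are stripped when Python runs with -O). The whitelist contents are illustrative.

```python
import operator

ROW = object()  # sentinel standing for "the current row" inside an expression

ALLOWED_FUNCS = (operator.add, operator.mul, operator.getitem,
                 operator.eq, operator.ge, operator.and_, operator.or_)
ALLOWED_CONSTANTS = (float, int, str, bool)

def apply_lisp(expr, row):
    """Evaluate a nested tuple (func, arg, ...) built only from whitelisted pieces."""
    func, *args = expr
    # Identity check: equality could be faked by an object with a custom __eq__.
    if not any(func is allowed for allowed in ALLOWED_FUNCS):
        raise ValueError(f"function not allowed: {func!r}")
    values = []
    for arg in args:
        if type(arg) is tuple:
            values.append(apply_lisp(arg, row))
        elif arg is ROW:
            values.append(row)
        elif type(arg) in ALLOWED_CONSTANTS:
            # Exact type check, so subclasses with overridden operators are rejected.
            values.append(arg)
        else:
            raise ValueError(f"argument not allowed: {arg!r}")
    return func(*values)

def select_where(rows, where):
    return [row for row in rows if apply_lisp(where, row)]
```

With rows = [{"v1": 1, "v2": 5}, {"v1": 2, "v2": 1}], select_where(rows, (operator.ge, (operator.getitem, ROW, "v2"), 3)) keeps only the first row.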
