I'm reading through Learning Python (3rd Edition), by Mark Lutz, and I'm in the portion that is dealing with the nuts and bolts of Python syntax.
He defines the Python language-structure hierarchy as follows:
Programs are composed of modules
Modules contain statements
Statements contain expressions
Expressions create and process objects
I'm a little confused about the definition of Python statements.
I've heard Expressions described as anything that is a value, but also can contain things like addition, etc.
Is it safe to say that statements are structured operations on expressions that drive a module's logic?
Yes, you are almost there.
Expressions are something that evaluate to some value.
On the other hand, statements are something that cause some action.
That action can be on some object, based on result of an expression which may or may not involve some other object(s).
I found this with a quick Google search, is it what you where looking for?
What is the difference between an expression and a statement in Python?
"Statements (see 1, 2), on the other hand, are everything that can make up a line (or several lines) of Python code. Note that expressions are statements as well."
I'm quite wary of classifications like this, and especially attempts to make them into a hierarchy. An expression can also be, for example, a function call; I guess that falls into your "anything that is a value" definition since a function always returns a value even if it is None.
A statement is really everything else; assignment, flow control (eg defining a for or while loop, try/except, break, continue...), the introduction of a function or a class definition (the def or class keywords), and so on.
Related
Does Python have some convention for names of predicate functions? For example predicate function names end with p for Common Lisp or ? for Scheme. Sometimes are used prefixes like is_ or has_, ..., but they may not be appropriate in some cases (they can be quite long to make sense and does not have to be obvious that function is predicate at first sight).
No, Python has no such official convention. Read PEP 8 for what is standardized in Python.
In Python, readability counts. If something is a predicate function, anyone should be able to immediately tell what that function is without knowing any of your standards. isinstance, str.isupper, hasattr are all part of the standard library, which is a good indication of the canonical style. I can immediately tell what these functions do and return without even reading the documentation. Go with whatever you think is the most readable to someone unfamiliar with your code, which will most likely be is or has prefixes.
What i mean is, how is the syntax defined, i.e. how can i make my own constructs like these?
I realise in a lot of languages, things like this will be built into the compiler / spec, and so it's dealt with by the compiler (at least that how i understand it to work).
But with python, everything i've come across so far has been accessible to the programmer, and so you more or less have the freedom to do whatever you want.
How would i go about writing my own version of for or while? Is it even possible?
I don't have any actual application for this, so the answer to any WHY?! questions is just "because why not?" or "curiosity".
No, you can't, not from within Python. You can't add new syntax to the language. (You'd have to modify the source code of Python itself to make your own custom version of Python.)
Note that the iterator protocol allows you to define objects that can be used with for in a custom way, which covers a lot of the possible use cases of writing your own iteration syntax.
Well, you have a couple of options for creating your own syntax:
Write a higher-order function, like map or reduce.
Modify python at the C level. This is, as you might expect, relatively easy as compared with fiddling with many other languages. See this article for an example: http://eli.thegreenplace.net/2010/06/30/python-internals-adding-a-new-statement-to-python/
Fake it using the debug facilities, or the encodings facility. See this code: http://entrian.com/goto/download.html and http://timhatch.com/projects/pybraces/
Use a preprocessor. Here's one project that tries to make this easy: http://www.fiber-space.de/langscape/doc/index.html
Use of the python facilities built in to achieve a similar effect (decorators, metaclasses, and the like).
Obviously, none of this is quite what you're looking for, but python, unlike smalltalk or lisp, isn't (necessarily) programmed in itself and guarantees to expose its own underlying execution and parsing mechanisms at runtime.
You can't make equivalent constructs. for, while, if etc. are statements, and they are built into the language with their own specific syntax. There are languages that do allow this sort of thing though (to some degree), such as Scala.
while, print, for etc. are keywords. That means they are parsed by the python parser whilst reading the code, stripped any redundant characters and result in tokens. Afterwards a lexer takes those tokens as input and builds a program tree which is then excuted by the interpreter. Said so, those constructs are used only as syntactic sugar for underlying lexical machinery and as such are not visible from inside the code.
Since Python uses tabs spacing to indicate scope (and as such, has no end of } symbols), does that limit the language in any way from having particular functionality?
Note: I'm not talking about personal preferences on coding-style, I'm talking about real language limitation as a direct result of not having an end statement?
For example, it appears by a post directly from Guido that the lack of multi-line lamba's due to Python not having a terminating end / } symbol?
If so, what other Python limations are there because of this language design decision to use indentation?
Update:
Please note this question is not about Lambda's and technically, not even Python per se. It's about programming language design ... and what limitations does a programming language have when it's designed to have indentation (as opposed to end statements) represent block scope.
There is no lack of end/ }: an end is represented by a "dedent" to the previous depth. So there is no limitation in any way.
Example:
x = 123
while x > 10:
if x % 21:
print("x")
print("y")
print("z")
A "begin" corresponds to increasing of indentation level (after while, after if).
An "end" corresponds to decreasing of indentation level (after the respective print()s).
If you omit the print("y"), you have a "dedentation" to the topmost level, which corresponds to having two successive "end"s.
The answer to this question ranges somewhere between syntactic sugar and language style, i.e. how to phrase a problem elegant and compliant to language philosophy. Any turing-complete language, even assembly language and C - definitely lacking any lambda support - may solve any problem. Lambda allows just a different (arguably more elegant if looking from functional language viewpoint) phrasing of stuff also stateable using standard function definition. So I can't recognize a limitation here beyond having to code differently.
One of the biggest limitations (if you would call it that) is that you can not use tabs and spaces in the same program.
And you shouldn't. Ever.
There are no apparent structural limitations, except perhaps when parsing a python source file (parsing, not interpreting):
def foo(bar):
# If bar contains multiple elements
if len(bar) > 1:
return bar
This is perfectly legal python code, however, when parsing the file, you may run into trouble trying to figure out which indentation level the comment belongs to.
What do you mean by "limitation"? Do you mean there are computations Python cannot perform that other languages can? In that case, the answer is definitely No. Python is turing complete.
Do you mean that the lack of end statements change how computations are expressed in Python? In that case, the answer is "mostly not". You must understand that Python's dedent is an end statement; it's a byte sequence that the interpreter recognizes as the end of a block.
However, as others have mentioned, the use of indentation to denote blocks is awkward when it comes to inline functions (Python's lambda). This means the style of Python programs might be slightly different than from JavaScript, for example (where it's common to write large inline functions).
That being said, many languages don't even have inline functions to begin with, so I wouldn't call this a limitation.
If I am evaluating a Python string using eval(), and have a class like:
class Foo(object):
a = 3
def bar(self, x): return x + a
What are the security risks if I do not trust the string? In particular:
Is eval(string, {"f": Foo()}, {}) unsafe? That is, can you reach os or sys or something unsafe from a Foo instance?
Is eval(string, {}, {}) unsafe? That is, can I reach os or sys entirely from builtins like len and list?
Is there a way to make builtins not present at all in the eval context?
There are some unsafe strings like "[0] * 100000000" I don't care about, because at worst they slow/stop the program. I am primarily concerned about protecting user data external to the program.
Obviously, eval(string) without custom dictionaries is unsafe in most cases.
eval() will allow malicious data to compromise your entire system, kill your cat, eat your dog and make love to your wife.
There was recently a thread about how to do this kind of thing safely on the python-dev list, and the conclusions were:
It's really hard to do this properly.
It requires patches to the python interpreter to block many classes of attacks.
Don't do it unless you really want to.
Start here to read about the challenge: http://tav.espians.com/a-challenge-to-break-python-security.html
What situation do you want to use eval() in? Are you wanting a user to be able to execute arbitrary expressions? Or are you wanting to transfer data in some way? Perhaps it's possible to lock down the input in some way.
You cannot secure eval with a blacklist approach like this. See Eval really is dangerous for examples of input that will segfault the CPython interpreter, give access to any class you like, and so on.
You can get to os using builtin functions: __import__('os').
For python 2.6+, the ast module may help; in particular ast.literal_eval, although it depends on exactly what you want to eval.
Note that even if you pass empty dictionaries to eval(), it's still possible to segfault (C)Python with some syntax tricks. For example, try this on your interpreter: eval("()"*8**5)
You are probably better off turning the question around:
What sort of expressions are you wanting to eval?
Can you insure that only strings matching some narrowly defined syntax are eval()d?
Then consider if that is safe.
For example, if you are wanting to let the user enter an algebraic expression for evaluation, consider limiting them to one letter variable names, numbers, and a specific set of operators and functions. Don't eval() strings containing anything else.
There is a very good article on the un-safety of eval() in Mark Pilgrim's Dive into Python tutorial.
Quoted from this article:
In the end, it is possible to safely
evaluate untrusted Python expressions,
for some definition of “safe” that
turns out not to be terribly useful in
real life. It’s fine if you’re just
playing around, and it’s fine if you
only ever pass it trusted input. But
anything else is just asking for
trouble.
I'm working on a macro system for Python (as discussed here) and one of the things I've been considering are units of measure. Although units of measure could be implemented without macros or via static macros (e.g. defining all your units ahead of time), I'm toying around with the idea of allowing syntax to be extended dynamically at runtime.
To do this, I'm considering using a sort of partial evaluation on the code at compile-time. If parsing fails for a given expression, due to a macro for its syntax not being available, the compiler halts evaluation of the function/block and generates the code it already has with a stub where the unknown expression is. When this stub is hit at runtime, the function is recompiled against the current macro set. If this compilation fails, a parse error would be thrown because execution can't continue. If the compilation succeeds, the new function replaces the old one and execution continues.
The biggest issue I see is that you can't find parse errors until the affected code is run. However, this wouldn't affect many cases, e.g. group operators like [], {}, (), and `` still need to be paired (requirement of my tokenizer/list parser), and top-level syntax like classes and functions wouldn't be affected since their "runtime" is really load time, where the syntax is evaluated and their objects are generated.
Aside from the implementation difficulty and the problem I described above, what problems are there with this idea?
Here are a few possible problems:
You may find it difficult to provide the user with helpful error messages in case of a problem. This seems likely, as any compilation-time syntax error could be just a syntax extension.
Performance hit.
I was trying to find some discussion of the pluses, minuses, and/or implementation of dynamic parsing in Perl 6, but I couldn't find anything appropriate. However, you may find this quote from Nicklaus Wirth (designer of Pascal and other languages) interesting:
The phantasies of computer scientists
in the 1960s knew no bounds. Spurned
by the success of automatic syntax
analysis and parser generation, some
proposed the idea of the flexible, or
at least extensible language. The
notion was that a program would be
preceded by syntactic rules which
would then guide the general parser
while parsing the subsequent program.
A step further: The syntax rules would
not only precede the program, but they
could be interspersed anywhere
throughout the text. For example, if
someone wished to use a particularly
fancy private form of for statement,
he could do so elegantly, even
specifying different variants for the
same concept in different sections of
the same program. The concept that
languages serve to communicate between
humans had been completely blended
out, as apparently everyone could now
define his own language on the fly.
The high hopes, however, were soon
damped by the difficulties encountered
when trying to specify, what these
private constructions should mean. As
a consequence, the intreaguing idea of
extensible languages faded away rather
quickly.
Edit: Here's Perl 6's Synopsis 6: Subroutines, unfortunately in markup form because I couldn't find an updated, formatted version; search within for "macro". Unfortunately, it's not too interesting, but you may find some things relevant, like Perl 6's one-pass parsing rule, or its syntax for abstract syntax trees. The approach Perl 6 takes is that a macro is a function that executes immediately after its arguments are parsed and returns either an AST or a string; Perl 6 continues parsing as if the source actually contained the return value. There is mention of generation of error messages, but they make it seem like if macros return ASTs, you can do alright.
Pushing this one step further, you could do "lazy" parsing and always only parse enough to evaluate the next statement. Like some kind of just-in-time parser. Then syntax errors could become normal runtime errors that just raise a normal Exception that could be handled by surrounding code:
def fun():
not implemented yet
try:
fun()
except:
pass
That would be an interesting effect, but if it's useful or desirable is a different question. Generally it's good to know about errors even if you don't call the code at the moment.
Macros would not be evaluated until control reaches them and naturally the parser would already know all previous definitions. Also the macro definition could maybe even use variables and data that the program has calculated so far (like adding some syntax for all elements in a previously calculated list). But this is probably a bad idea to start writing self-modifying programs for things that could usually be done as well directly in the language. This could get confusing...
In any case you should make sure to parse code only once, and if it is executed a second time use the already parsed expression, so that it doesn't lead to performance problems.
Here are some ideas from my master's thesis, which may or may not be helpful.
The thesis was about robust parsing of natural language.
The main idea: given a context-free grammar for a language, try to parse a given
text (or, in your case, a python program). If parsing failed, you will have a partially generated parse tree. Use the tree structure to suggest new grammar rules that will better cover the parsed text.
I could send you my thesis, but unless you read Hebrew this will probably not be useful.
In a nutshell:
I used a bottom-up chart parser. This type of parser generates edges for productions from the grammar. Each edge is marked with the part of the tree that was consumed. Each edge gets a score according to how close it was to full coverage, for example:
S -> NP . VP
Has a score of one half (We succeeded in covering the NP but not the VP).
The highest-scored edges suggest a new rule (such as X->NP).
In general, a chart parser is less efficient than a common LALR or LL parser (the types usually used for programming languages) - O(n^3) instead of O(n) complexity, but then again you are trying something more complicated than just parsing an existing language.
If you can do something with the idea, I can send you further details.
I believe looking at natural language parsers may give you some other ideas.
Another thing I've considered is making this the default behavior across the board, but allow languages (meaning a set of macros to parse a given language) to throw a parse error at compile-time. Python 2.5 in my system, for example, would do this.
Instead of the stub idea, simply recompile functions that couldn't be handled completely at compile-time when they're executed. This will also make self-modifying code easier, as you can modify the code and recompile it at runtime.
You'll probably need to delimit the bits of input text with unknown syntax, so that the rest of the syntax tree can be resolved, apart from some character sequences nodes which will be expanded later. Depending on your top level syntax, that may be fine.
You may find that the parsing algorithm and the lexer and the interface between them all need updating, which might rule out most compiler creation tools.
(The more usual approach is to use string constants for this purpose, which can be parsed to a little interpreter at run time).
I don't think your approach would work very well. Let's take a simple example written in pseudo-code:
define some syntax M1 with definition D1
if _whatever_:
define M1 to do D2
else:
define M1 to do D3
code that uses M1
So there is one example where, if you allow syntax redefinition at runtime, you have a problem (since by your approach the code that uses M1 would be compiled by definition D1). Note that verifying if syntax redefinition occurs is undecidable. An over-approximation could be computed by some kind of typing system or some other kind of static analysis, but Python is not well known for this :D.
Another thing that bothers me is that your solution does not 'feel' right. I find it evil to store source code you can't parse just because you may be able to parse it at runtime.
Another example that jumps to mind is this:
...function definition fun1 that calls fun2...
define M1 (at runtime)
use M1
...function definition for fun2
Technically, when you use M1, you cannot parse it, so you need to keep the rest of the program (including the function definition of fun2) in source code. When you run the entire program, you'll see a call to fun2 that you cannot call, even if it's defined.