I'm reading through Learning Python (3rd Edition), by Mark Lutz, and I'm in the portion that is dealing with the nuts and bolts of Python syntax.
He defines the Python language-structure hierarchy as follows:
Programs are composed of modules
Modules contain statements
Statements contain expressions
Expressions create and process objects
I'm a little confused about the definition of Python statements.
I've heard expressions described as anything that evaluates to a value, though they can also contain things like addition, etc.
Is it safe to say that statements are structured operations on expressions that drive a module's logic?
Yes, you are almost there.
Expressions are things that evaluate to some value.
Statements, on the other hand, are things that cause some action.
That action may operate on some object, based on the result of an expression, which may or may not involve other objects.
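To make the distinction concrete, here is a minimal sketch (purely illustrative, not from the book):

# Expressions evaluate to a value and can appear anywhere a value is expected.
2 + 3             # evaluates to 5
len("hello")      # a function call is an expression too; evaluates to 5

# Statements perform an action; they are the units a program is made of.
total = 2 + 3     # an assignment statement, driven by the expression on the right
if total > 4:     # an if statement chooses which action to take
    print(total)  # an expression statement: evaluated for its side effect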
I found this with a quick Google search; is it what you were looking for?
What is the difference between an expression and a statement in Python?
"Statements (see 1, 2), on the other hand, are everything that can make up a line (or several lines) of Python code. Note that expressions are statements as well."
I'm quite wary of classifications like this, and especially attempts to make them into a hierarchy. An expression can also be, for example, a function call; I guess that falls into your "anything that is a value" definition since a function always returns a value even if it is None.
A statement is really everything else: assignment, flow control (e.g. defining a for or while loop, try/except, break, continue...), the introduction of a function or a class definition (the def or class keywords), and so on.
What I mean is, how is the syntax defined, i.e. how can I make my own constructs like these?
I realise that in a lot of languages, things like this will be built into the compiler/spec, and so it's dealt with by the compiler (at least that's how I understand it to work).
But with Python, everything I've come across so far has been accessible to the programmer, and so you more or less have the freedom to do whatever you want.
How would I go about writing my own version of for or while? Is it even possible?
I don't have any actual application for this, so the answer to any WHY?! questions is just "because why not?" or "curiosity".
No, you can't, not from within Python. You can't add new syntax to the language. (You'd have to modify the source code of Python itself to make your own custom version of Python.)
Note that the iterator protocol allows you to define objects that can be used with for in a custom way, which covers a lot of the possible use cases of writing your own iteration syntax.
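For instance, a class implementing the iterator protocol (CountDown is a made-up name here) can be consumed by an ordinary for loop:

class CountDown:
    # Hypothetical example: iterate from n down to 1.
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration  # tells the for loop to stop
        value = self.n
        self.n -= 1
        return value

for i in CountDown(3):
    print(i)  # prints 3, 2, 1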
Well, you have a couple of options for creating your own syntax:
Write a higher-order function, like map or reduce (a small sketch of this approach follows at the end of this answer).
Modify python at the C level. This is, as you might expect, relatively easy as compared with fiddling with many other languages. See this article for an example: http://eli.thegreenplace.net/2010/06/30/python-internals-adding-a-new-statement-to-python/
Fake it using the debug facilities, or the encodings facility. See this code: http://entrian.com/goto/download.html and http://timhatch.com/projects/pybraces/
Use a preprocessor. Here's one project that tries to make this easy: http://www.fiber-space.de/langscape/doc/index.html
Use the facilities built into Python to achieve a similar effect (decorators, metaclasses, and the like).
Obviously, none of this is quite what you're looking for, but Python, unlike Smalltalk or Lisp, isn't (necessarily) programmed in itself and doesn't guarantee to expose its own underlying execution and parsing mechanisms at runtime.
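As a sketch of the first option (repeat_until is a made-up name, not new syntax), a plain higher-order function can capture a loop-like pattern by taking the "body" as a callable:

def repeat_until(condition, body):
    # An ordinary function, not a new statement: the loop body is passed in as a callable.
    while not condition():
        body()

counter = {"value": 0}
repeat_until(lambda: counter["value"] >= 3,
             lambda: counter.update(value=counter["value"] + 1))
print(counter["value"])  # 3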
You can't make equivalent constructs. for, while, if etc. are statements, and they are built into the language with their own specific syntax. There are languages that do allow this sort of thing though (to some degree), such as Scala.
while, print, for etc. are keywords. They are recognised while the source code is read: the tokenizer (lexer) strips redundant characters and turns the text into tokens, and the parser then takes those tokens as input and builds a program tree, which is compiled and executed by the interpreter. That said, these constructs are handled entirely by that machinery, so they are not visible or replaceable from inside your code.
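You can watch this pipeline from Python itself with the standard tokenize and ast modules; this is only inspection, though, not a way to change the syntax:

import ast
import io
import tokenize

source = "for x in range(3): print(x)"

# The tokenizer turns the source text into a stream of tokens.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tok.type, tok.string)

# The parser then builds a syntax tree; the loop shows up as a For node.
print(ast.dump(ast.parse(source)))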
The format function in builtins seems to be like a subset of the str.format method, used specifically for the case of formatting a single object.
eg.
>>> format(13, 'x')
'd'
is apparently preferred over
>>> '{0:x}'.format(13)
'd'
and IMO it does look nicer, but why not just use str.format in every case to make things simpler? Both of these were introduced in 2.6, so there must be a good reason for having both at once. What is it?
Edit: I was asking about str.format and format, not why we don't have a (13).format
tl;dr: format just calls obj.__format__ and is used by the str.format method, which does even higher-level stuff. At the lower level it makes sense to teach an object how to format itself.
It is just syntactic sugar
The fact that this function shares its name and format specification with str.format can be misleading. The existence of str.format is easy to explain: it does complex string interpolation (replacing the old % operator); format can format a single object as a string, the smallest subset of the str.format specification. So, why do we need format?
The format function is an alternative to the obj.format('fmt') construct found in some OO languages. This decision is consistent with the rationale for len (on why Python uses a function len(x) instead of a property x.length like Javascript or Ruby).
When a language adopts the obj.format('fmt') construct (or obj.length, obj.toString and so on), classes are prevented from having an attribute called format (or length, toString, you get the idea) - otherwise it would shadow the standard method from the language. In this case, the language designers are placing the burden of preventing name clashes on the programmer.
Python is very fond of the PoLA (Principle of Least Astonishment) and adopted the __dunder__ (double underscores) convention for built-ins in order to minimize the chance of conflicts between user-defined attributes and the language built-ins. So obj.format('fmt') becomes obj.__format__('fmt'), and of course you can call obj.__format__('fmt') instead of format(obj, 'fmt') (the same way you can call obj.__len__() instead of len(obj)).
Using your example:
>>> '{0:x}'.format(13)
'd'
>>> (13).__format__('x')
'd'
>>> format(13, 'x')
'd'
Which one is cleaner and easier to type? Python's design is very pragmatic: it is not only cleaner but is well aligned with Python's duck-typed approach to OO, and gives the language designers freedom to change/extend the underlying implementation without breaking legacy code.
PEP 3101 introduced the new str.format method and the format built-in without any comment on the rationale for the format function, but the implementation is obviously just syntactic sugar:
def format(value, format_spec=''):   # format_spec defaults to the empty string
    return value.__format__(format_spec)
And here I rest my case.
What Guido said about it (or is it official?)
Quoting the very BDFL about len:
First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the ease with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/
source: pyfaq#effbot.org (the original post here also has the original question Guido was answering). abarnert also suggests:
There's additional reasoning about len in the Design and History FAQ. Although it's not as complete or as good of an answer, it is indisputably official. – abarnert
Is this a practical concern or just syntax nitpicking?
This is a very practical and real-world concern in languages like Python, Ruby or Javascript because in dynamically typed languages any mutable object is effectively a namespace, and the concept of private methods or attributes is a matter of convention. Possibly I could not put it better than abarnert in his comment:
Also, as far as the namespace-pollution issue with Ruby and JS, it's worth pointing out that this is an inherent problem with dynamically-typed languages. In statically-typed languages as diverse as Haskell and C++, type-specific free functions are not only possible, but idiomatic. (See The Interface Principle.) But in dynamically-typed languages like Ruby, JS, and Python, free functions must be universal. A big part of language/library design for dynamic languages is picking the right set of such functions.
For example, I just left Ember.js in favor of Angular.js because I was tired of namespace conflicts in Ember; Angular handles this using an elegant Python-like strategy of prefixing built-in methods (with $thing in Angular, instead of underscores like Python), so they do not conflict with user-defined methods and properties. Yes, the whole __thing__ is not particularly pretty, but I'm glad Python took this approach because it is very explicit and avoids the PoLA class of bugs regarding object namespace clashes.
I think format and str.format do different things. Even though you could use str.format for both, it makes sense to have separate versions.
The top level format function is part of the new "formatting protocol" that all objects support. It simply calls the __format__ method of the object it is passed, and returns a string. This is a low-level task, and Python's style is to usually have builtin functions for those. Paulo Scardine's answer explains some of the rationale for this, but I don't think it really addresses the differences between what format and str.format do.
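To see the protocol in action (Money is a made-up class, purely for illustration): once a class defines __format__, both the format built-in and str.format pick it up:

class Money:
    # Hypothetical class that knows how to format itself.
    def __init__(self, amount):
        self.amount = amount
    def __format__(self, format_spec):
        if format_spec == "short":
            return "${:.0f}".format(self.amount)
        return "${:,.2f}".format(self.amount)

price = Money(1234.56)
print(format(price))                    # $1,234.56  -- calls price.__format__('')
print(format(price, "short"))           # $1235      -- calls price.__format__('short')
print("Total: {:short}".format(price))  # str.format delegates to the same method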
The str.format method is a bit more high-level, and also a bit more complex. It can not only format multiple objects into a single result, but it can also reorder, repeat, index, and do various other transformations on the objects. Don't just think of "{}".format(obj); str.format is really designed for more complicated tasks, like these:
"{1} {0} {1!r}".format(obj0, obj1) # reorders, repeats, and and calls repr on obj1
"{0.value:.{0.precision}f}".format(obj) # uses attrs of obj for value and format spec
"{obj[name]}".format(obj=my_dict) # takes argument by keyword, and does an item lookup
For the low-level formatting of each item, str.format relies on the same machinery of the format protocol, so it can focus its own efforts on the higher level stuff. I doubt it actually calls the builtin format, rather than its arguments' __format__ methods, but that's an implementation detail.
While ("{:"+format_spec+"}").format(obj) is guaranteed to give the same results as format(obj, format_spec), I suspect the latter will be a bit faster, since it doesn't need to parse the format string to check for any of the complicated stuff. However the overhead may be lost in the noise in a real program.
When it comes to usage (including examples on Stack Overflow), you may see more str.format use simply because some programmers do not know about format, which is both new and fairly obscure. In contrast, it's hard to avoid str.format (unless you have decided to stick with the % operator for all of your formatting). So, the ease (for you and your fellow programmers) of understanding a str.format call may outweigh any performance considerations.
If I am evaluating a Python string using eval(), and have a class like:
class Foo(object):
    a = 3
    def bar(self, x): return x + self.a
What are the security risks if I do not trust the string? In particular:
Is eval(string, {"f": Foo()}, {}) unsafe? That is, can you reach os or sys or something unsafe from a Foo instance?
Is eval(string, {}, {}) unsafe? That is, can I reach os or sys entirely from builtins like len and list?
Is there a way to make builtins not present at all in the eval context?
There are some unsafe strings like "[0] * 100000000" I don't care about, because at worst they slow/stop the program. I am primarily concerned about protecting user data external to the program.
Obviously, eval(string) without custom dictionaries is unsafe in most cases.
eval() will allow malicious data to compromise your entire system, kill your cat, eat your dog and make love to your wife.
There was recently a thread about how to do this kind of thing safely on the python-dev list, and the conclusions were:
It's really hard to do this properly.
It requires patches to the python interpreter to block many classes of attacks.
Don't do it unless you really want to.
Start here to read about the challenge: http://tav.espians.com/a-challenge-to-break-python-security.html
What situation do you want to use eval() in? Are you wanting a user to be able to execute arbitrary expressions? Or are you wanting to transfer data in some way? Perhaps it's possible to lock down the input in some way.
You cannot secure eval with a blacklist approach like this. See Eval really is dangerous for examples of input that will segfault the CPython interpreter, give access to any class you like, and so on.
You can get to os using builtin functions: __import__('os').
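Roughly: even with empty globals and locals dictionaries, CPython inserts __builtins__ into the globals for you, so __import__ stays reachable, and even blocking __builtins__ is not a real sandbox:

# Dangerous: do not run on untrusted input.
print(eval("__import__('os').getcwd()", {}, {}))  # works: __builtins__ is added automatically

# Blocking __builtins__ helps a little, but attribute chains starting from
# literals can still reach interesting objects:
print(eval("().__class__.__bases__[0].__subclasses__()", {"__builtins__": {}}, {}))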
For Python 2.6+, the ast module may help; in particular ast.literal_eval, although it depends on exactly what you want to eval.
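If what you actually need is to parse data (numbers, strings, lists, dicts and the like) rather than run arbitrary expressions, ast.literal_eval accepts only Python literals and rejects everything else:

import ast

print(ast.literal_eval("[1, 2, {'a': 3}]"))   # [1, 2, {'a': 3}]

try:
    ast.literal_eval("__import__('os').system('ls')")
except ValueError as exc:
    print("rejected:", exc)  # not a literal, so it is refused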
Note that even if you pass empty dictionaries to eval(), it's still possible to segfault (C)Python with some syntax tricks. For example, try this on your interpreter: eval("()"*8**5)
You are probably better off turning the question around:
What sort of expressions are you wanting to eval?
Can you ensure that only strings matching some narrowly defined syntax are eval()'d?
Then consider if that is safe.
For example, if you are wanting to let the user enter an algebraic expression for evaluation, consider limiting them to one letter variable names, numbers, and a specific set of operators and functions. Don't eval() strings containing anything else.
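One way to sketch that idea (only a sketch, with no guarantee of safety) is to parse the string with the ast module and refuse anything outside a small whitelist of node types before evaluating it:

import ast

# Node types considered acceptable for a tiny calculator language (illustrative list).
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub,
)

def safe_eval(expr, variables):
    # Evaluate a small arithmetic expression after whitelisting its AST nodes.
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed syntax: %s" % type(node).__name__)
    # Names can only resolve to the variables explicitly passed in.
    return eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}}, dict(variables))

print(safe_eval("x * (a + b)", {"x": 2, "a": 3, "b": 4}))  # 14
# safe_eval("__import__('os').system('ls')", {}) raises ValueError (Call/Attribute are not whitelisted)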
There is a very good article on the un-safety of eval() in Mark Pilgrim's Dive into Python tutorial.
Quoted from this article:
In the end, it is possible to safely evaluate untrusted Python expressions, for some definition of “safe” that turns out not to be terribly useful in real life. It’s fine if you’re just playing around, and it’s fine if you only ever pass it trusted input. But anything else is just asking for trouble.
Coming from a C# background the naming convention for variables and method names are usually either camelCase or PascalCase:
// C# example
string thisIsMyVariable = "a";
public void ThisIsMyMethod()
In Python, I have seen the above but I have also seen underscores being used:
# python example
this_is_my_variable = 'a'
def this_is_my_function():
Is there a more preferable, definitive coding style for Python?
See Python PEP 8: Function and Variable Names:
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
Variable names follow the same convention as function names.
mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility.
The Google Python Style Guide has the following convention:
module_name, package_name, ClassName, method_name, ExceptionName, function_name, GLOBAL_CONSTANT_NAME, global_var_name, instance_var_name, function_parameter_name, local_var_name.
A similar naming scheme should be applied to a CLASS_CONSTANT_NAME
David Goodger (in "Code Like a Pythonista" here) describes the PEP 8 recommendations as follows:
joined_lower for functions, methods, attributes, variables
joined_lower or ALL_CAPS for constants
StudlyCaps for classes
camelCase only to conform to pre-existing conventions
As the Style Guide for Python Code admits,
The naming conventions of Python's library are a bit of a mess, so we'll never get this completely consistent
Note that this refers just to Python's standard library. If they can't get that consistent, then there hardly is much hope of having a generally-adhered-to convention for all Python code, is there?
From that, and the discussion here, I would deduce that it's not a horrible sin if one keeps using e.g. Java's or C#'s (clear and well-established) naming conventions for variables and functions when crossing over to Python. Keeping in mind, of course, that it is best to abide with whatever the prevailing style for a codebase / project / team happens to be. As the Python Style Guide points out, internal consistency matters most.
Feel free to dismiss me as a heretic. :-) Like the OP, I'm not a "Pythonista", not yet anyway.
As mentioned, PEP 8 says to use lower_case_with_underscores for variables, methods and functions.
I prefer using lower_case_with_underscores for variables and mixedCase for methods and functions; it makes the code more explicit and readable, thus following the Zen of Python's "explicit is better than implicit" and "Readability counts".
There is PEP 8, as other answers show, but PEP 8 is only the style guide for the standard library, and it's only taken as gospel therein. One of the most frequent deviations from PEP 8 in other code is variable naming, specifically for methods. There is no single predominant style, although considering the volume of code that uses mixedCase, if one were to take a strict census one would probably end up with a version of PEP 8 with mixedCase. There is little other deviation from PEP 8 that is quite as common.
Further to what JohnTESlade has answered, Google's Python style guide has some pretty neat recommendations:
Names to Avoid
single character names except for counters or iterators
dashes (-) in any package/module name
__double_leading_and_trailing_underscore__ names (reserved by Python)
Naming Convention
"Internal" means internal to a module or protected or private within a class.
Prepending a single underscore (_) has some support for protecting module variables and functions (not included with import * from). Prepending a double underscore (__) to an instance variable or method effectively serves to make the variable or method private to its class (using name mangling); a short sketch of this appears after this list.
Place related classes and top-level functions together in a module. Unlike Java, there is no need to limit yourself to one class per module.
Use CapWords for class names, but lower_with_under.py for module names. Although there are many existing modules named CapWords.py, this is now discouraged because it's confusing when the module happens to be named after a class. ("wait -- did I write import StringIO or from StringIO import StringIO?")
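As a small illustration of the underscore conventions above (Account is just a made-up class name):

class Account:
    def __init__(self):
        self._balance = 0      # single underscore: "internal", enforced only by convention
        self.__audit_log = []  # double underscore: name-mangled to _Account__audit_log

acct = Account()
print(acct._balance)             # accessible, but the underscore says "hands off"
print(acct._Account__audit_log)  # the mangled name is still reachable if you insist
# acct.__audit_log               # would raise AttributeError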
Guidelines derived from Guido's Recommendations
Most Python people prefer underscores, but even though I have been using Python for more than 5 years now, I still do not like them. They just look ugly to me, but maybe that's all the Java in my head.
I simply like CamelCase better since it fits better with the way classes are named; it feels more logical to have SomeClass.doSomething() than SomeClass.do_something(). If you look around in the global module index in Python, you will find both, which is due to the fact that it's a collection of libraries from various sources that grew over time, not something that was developed by one company like Sun with strict coding rules. I would say the bottom line is: use whatever you like better; it's just a question of personal taste.
Personally I try to use CamelCase for classes, and mixedCase for methods and functions. Variables are usually underscore-separated (when I can remember). This way I can tell at a glance what exactly I'm calling, rather than everything looking the same.
There is a paper about this: http://www.cs.kent.edu/~jmaletic/papers/ICPC2010-CamelCaseUnderScoreClouds.pdf
TL;DR: it says that snake_case is more readable than camelCase. That's why modern languages use (or should use) snake_case wherever they can.
The coding style is usually part of an organization's internal policy/convention standards, but I think in general, the all_lower_case_underscore_separator style (also called snake_case) is most common in python.
I personally use Java's naming conventions when developing in other programming languages as it is consistent and easy to follow. That way I am not continuously struggling over what conventions to use which shouldn't be the hardest part of my project!
Whether inside a class or not:
Variables and functions are lowercase, as shown below:
name = "John"
def display(name):
    print(name)
And if they're more than one word, they're separated with underscore "_" as shown below:
first_name = "John"
def display_first_name(first_name):
    print(first_name)
And, if a variable is a constant, it's uppercase as shown below:
FIRST_NAME = "John"
As Lenin has said... I'm from the Java/C# world too, and SQL as well.
I have caught myself struggling to read, at first sight, complex constructions like a list inside a dictionary of lists, where everything is an object.
As for me, camelCase or its variants should become the standard for any language; underscores should be reserved for complex names.
Typically, one follows the conventions used in the language's standard library.