Managing Perl habits in a Python environment

Perl habits die hard. Variable declaration, scoping, and global/local handling differ between the two languages. Is there a set of recommended Python idioms that will make the transition from Perl coding to Python coding less painful?
Subtle variable misspelling can waste an extraordinary amount of time.
I understand the variable declaration issue is quasi-religious among Python folks.
I'm not arguing for language changes or features, just a reliable bridge between
the two languages that will keep my Perl habits from sinking my Python efforts.
Thanks.

Splitting Python classes into separate files (like in Java, one class per file) helps find scoping problems, although this is not idiomatic python (that is, not pythonic).
I have been writing python after much perl and found this from tchrist to be useful, even though it is old:
http://linuxmafia.com/faq/Devtools/python-to-perl-conversions.html
Getting used to doing without perl's most excellent variable scoping has been the second most difficult issue with my perl->python transition. The first is obvious if you have much perl: CPAN.

I like the question, but I don't have any experience in Perl so I'm not sure how to best advise you.
I suggest you do a Google search for "Python idioms". You will find some gems. In particular:
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
http://docs.python.org/dev/howto/doanddont.html
http://jaynes.colorado.edu/PythonIdioms.html
As for the variable "declaration" issue, here's my best advice for you:
Remember that in Python, objects have a life of their own, separate from variable names. A variable name is a tag that is bound to an object. At any time, you may rebind the name to a different object, perhaps of a completely different type. Thus, this is perfectly legal:
x = 1 # bind x to integer, value == 1
x = "1" # bind x to string, value is "1"
Python is in fact strongly typed; try executing the code 1 + "1" and see how well it works, if you don't believe me. The integer object with value 1 does not accept addition of a string value, in the absence of explicit type coercion. So Python names never ever have sigil characters that flag properties of the variable; that's just not how Python does things. Any legal identifier name could be bound to any Python object of any type.
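To make the rebinding and strong-typing points concrete, here is a minimal sketch. The names result, error_message and total are only illustrative.

```python
# A name can be rebound to objects of completely different types.
x = 1
assert isinstance(x, int)
x = "1"
assert isinstance(x, str)

# The objects themselves are strongly typed: int + str is refused.
try:
    result = 1 + "1"
except TypeError as exc:
    result = None
    error_message = str(exc)

# An explicit conversion states the intent and works.
total = 1 + int("1")
```

The TypeError is raised at run time, when the addition is attempted, not when the name is rebound.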

In Python, $_ does not exist (the closest thing is _ in the interactive shell, which holds the last result), and variables with global scope are frowned upon.
In practice this has two major effects:
In Python you can't use regular expressions as naturally as in Perl, so matching each iterated $_ and similarly catching matches is more cumbersome
Python functions tend to be called explicitly or have default variables
However these differences are fairly minor when one considers that in Python just about everything becomes a class. When I used to do Perl I thought of "carving"; in Python I rather feel I am "composing".
Python doesn't have the idiomatic richness of Perl and I think it is probably a mistake to attempt to do the translation.
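As a rough illustration of the regular expression point above, this is roughly what a Perl-style "match each line and capture" loop tends to look like in Python. The sample data and the names lines, pair_pattern and ages are made up for the sketch.

```python
import re

# Hypothetical sample input; in Perl this loop would lean on $_ and $1, $2.
lines = ["alice=30", "not a pair", "bob=41"]
pair_pattern = re.compile(r"^(\w+)=(\d+)$")

ages = {}
for line in lines:                      # the loop variable is named explicitly
    match = pair_pattern.match(line)    # the match object stands in for $1, $2
    if match:
        name, age = match.groups()
        ages[name] = int(age)
```

It is more verbose than the Perl equivalent, but every value has a name you can see.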

Read, understand, follow, and love PEP 8, which details the style guidelines for everything about Python.
Seriously, if you want to know about the recommended idioms and habits of Python, that's the source.

Don't mis-type your variable names. Seriously. Use short, easy, descriptive ones, use them locally, and don't rely on the global scope.
If you're doing a larger project that isn't served well by this, use pylint, unit tests and coverage.py to make SURE your code does what you expect.
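A small sketch of why the tooling matters, using two deliberately misspelled names (totl and cuont are the typos; the functions are hypothetical). Reading a misspelled name fails loudly, but assigning to one quietly creates a new local, which is the kind of thing a linter such as pylint will typically report as an unused variable.

```python
def total_price(prices):
    total = 0
    for price in prices:
        total += price
    return totl  # misspelled on purpose: reading an unbound name


def count_items(items):
    count = 0
    for _ in items:
        cuont = count + 1  # misspelled on purpose: silently creates a new local
    return count


try:
    total_price([1, 2, 3])
except NameError as exc:
    read_typo_error = str(exc)

# The assignment typo does not raise; the function just returns the wrong value.
wrong_count = count_items(["a", "b", "c"])
```

A unit test asserting that count_items returns 3 would catch the second case as well.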
Copied from a comment in one of the other threads:
"‘strict vars’ is primarily intended to stop typoed references and missed-out ‘my’s from creating accidental globals (well, package variables in Perl terms). This can't happen in Python as bare assignments default to local declaration, and bare unassigned symbols result in an exception."

Related

A technique for a C++-to-Python migrant to lessen the impact of missing identifier declarations

Before coming to a concrete example, let me mention the problem. As a beginning Python programmer with extensive experience in C++, I'm always missing variable declarations. I could yield to the temptation of documenting the type of every nontrivial identifier, but I have a feeling that that would not be terribly pythonic. For one thing, it would be silly that neither the interpreter nor any tool parse these informal declarations. And if the interpreter did, that would be an entirely different language.
As an alternative to writing mere comments, I am contemplating switching to a mode of creating datatypes whose only purpose is to enforce types/interfaces. They would streamline the code and would make me detect type errors at earlier stages. For this convenience I would be paying a little loss in efficiency from the indirection.
For example, to avoid writing as a comment "Dictionary of Employee objects indexed by employeeID", I would write a wrapper class called "EmployeeDict", whose interface would limit the operations that can/cannot be performed.
Would such an idea fly in the long term? Does it defeat the spirit of Python in some way? Is it used by experienced Pythonistas?
For those conversant in C++, I would in other words be translating
typedef std::map<EmployeeId, Employee> MyMap;
into a type. (Though I am not actually porting any code across.)
Update
Even if it's unpythonic, as HumphreyTriscuit confirms, I am loath to write comments that get read by humans without also automating the type checking a little. It's nice that this issue is resolved in 3.5, but I'm stuck for the time being with 2.7, and so I'll mark jsbueno's answer correct until someone can suggest a way to solve this problem in 2.7, à la "assert isinstance(param, dict)", but one that also concisely confirms the type of the key/value, somewhat paralleling C++.
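For what it's worth, a check along those lines that should also run on 2.7 might look like the sketch below. The helper name assert_dict_of and the Employee stand-in are assumptions made for illustration, not an established idiom.

```python
def assert_dict_of(mapping, key_type, value_type):
    """Assert that mapping is a dict whose keys and values have the given types."""
    assert isinstance(mapping, dict), "expected a dict"
    for key, value in mapping.items():
        assert isinstance(key, key_type), "bad key: %r" % (key,)
        assert isinstance(value, value_type), "bad value: %r" % (value,)


class Employee(object):
    """Hypothetical stand-in for the Employee class in the question."""
    def __init__(self, name):
        self.name = name


employees = {101: Employee("Ada"), 102: Employee("Grace")}
assert_dict_of(employees, int, Employee)  # passes silently
```

Note that the check walks the whole dictionary on every call, and that assert statements are skipped when Python runs with -O.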

Actually, as of Python 3.5, the language comes bundled with tools for parameter type annotations that can be introspected by third-party tools, of which there might be some out there already.
Anyway, take a look at https://www.python.org/dev/peps/pep-0484/
Even if you don't use any other tools, the way described in PEP 484 above is the "Pythonic way" of declaring types, and it won't conflict with other third-party tools. So, if you want to write a tool chain of your own as you describe, you should start by using function annotations as described in that PEP.
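A minimal sketch of what such annotations look like on Python 3.5+, assuming a hypothetical Employee class and find_name function. Python itself does not enforce the annotations; they are stored on the function for tools to read.

```python
from typing import Dict, get_type_hints


class Employee:
    """Hypothetical stand-in for the Employee class in the question."""
    def __init__(self, name: str) -> None:
        self.name = name


def find_name(employees: Dict[int, Employee], employee_id: int) -> str:
    """Return the name of the employee with the given ID."""
    return employees[employee_id].name


# The annotations are kept on the function and can be introspected.
hints = get_type_hints(find_name)
```

Here Dict[int, Employee] plays roughly the role of the std::map typedef in the question.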
That is good for documenting (and enforcing, if need be) parameters and return values. For class attributes, you can check this answer of mine, based on crafting a special __setattr__ method on a base class of your hierarchy:
Force python class member variable to be specific type
As for local variables - there is no way to enforce/check their type but code comments.
And a last piece of advice to keep you "on the Python way": remember to be permissive and check for interfaces, rather than specific classes.
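One way to read "check for interfaces" is to test against an abstract base class rather than dict itself. This is only a sketch; employee_names is a hypothetical function, and on Python 2.7 the Mapping class lives in collections rather than collections.abc.

```python
from collections.abc import Mapping
from types import MappingProxyType


def employee_names(employees):
    """Accept any mapping-like object, not just dict."""
    if not isinstance(employees, Mapping):
        raise TypeError("employees must be a mapping")
    return sorted(employees.values())


from_dict = employee_names({2: "Grace", 1: "Ada"})
# A read-only view is not a dict, but it still satisfies the Mapping interface.
from_proxy = employee_names(MappingProxyType({3: "Linus"}))
```

A check for isinstance(employees, dict) would have rejected the second call for no good reason.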

How do the for / while / print things work in python?

What i mean is, how is the syntax defined, i.e. how can i make my own constructs like these?
I realise in a lot of languages, things like this will be built into the compiler / spec, and so it's dealt with by the compiler (at least that how i understand it to work).
But with python, everything i've come across so far has been accessible to the programmer, and so you more or less have the freedom to do whatever you want.
How would i go about writing my own version of for or while? Is it even possible?
I don't have any actual application for this, so the answer to any WHY?! questions is just "because why not?" or "curiosity".

No, you can't, not from within Python. You can't add new syntax to the language. (You'd have to modify the source code of Python itself to make your own custom version of Python.)
Note that the iterator protocol allows you to define objects that can be used with for in a custom way, which covers a lot of the possible use cases of writing your own iteration syntax.
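As an illustration of the iterator protocol mentioned above, here is a small sketch of a class that plugs into for. Countdown is a made-up example; in Python 2 the method is spelled next rather than __next__.

```python
class Countdown:
    """Iterate from start down to 1."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Raising StopIteration tells the for loop to finish.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


collected = [n for n in Countdown(3)]
```

The for statement itself is unchanged; only the object being iterated decides what each step yields.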

Well, you have a couple of options for creating your own syntax:
Write a higher-order function, like map or reduce.
Modify python at the C level. This is, as you might expect, relatively easy as compared with fiddling with many other languages. See this article for an example: http://eli.thegreenplace.net/2010/06/30/python-internals-adding-a-new-statement-to-python/
Fake it using the debug facilities, or the encodings facility. See this code: http://entrian.com/goto/download.html and http://timhatch.com/projects/pybraces/
Use a preprocessor. Here's one project that tries to make this easy: http://www.fiber-space.de/langscape/doc/index.html
Use of the python facilities built in to achieve a similar effect (decorators, metaclasses, and the like).
Obviously, none of this is quite what you're looking for, but Python, unlike Smalltalk or Lisp, isn't (necessarily) programmed in itself and doesn't guarantee to expose its own underlying execution and parsing mechanisms at runtime.
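The first option in the list, a higher-order function, can be sketched like this. repeat_until is a hypothetical stand-in for a "do ... until" loop, which Python has no statement for.

```python
def repeat_until(condition, body):
    """Call body() repeatedly until condition() is true; body runs at least once."""
    while True:
        body()
        if condition():
            break


values = []
repeat_until(lambda: len(values) >= 3, lambda: values.append(len(values)))
```

It is a function call rather than new syntax, so the loop body has to be passed in as a callable.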

You can't make equivalent constructs. for, while, if etc. are statements, and they are built into the language with their own specific syntax. There are languages that do allow this sort of thing though (to some degree), such as Scala.

while, print, for etc. are keywords (print only in Python 2; in Python 3 it is an ordinary function). That means they are recognised by Python's tokenizer (lexer) while it reads the code, which strips redundant characters and produces tokens. Afterwards the parser takes those tokens as input and builds a program tree, which is then compiled and executed by the interpreter. That said, those constructs exist only as syntax for the underlying machinery and as such are not visible as objects from inside the code.
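You can at least look at the result of that process from within Python, using the standard ast and keyword modules. A short sketch, with illustrative variable names:

```python
import ast
import keyword

source = "for i in range(3):\n    pass\n"

# Parse the source text into a syntax tree without executing it.
tree = ast.parse(source)
first_statement = tree.body[0]

is_for_node = isinstance(first_statement, ast.For)
for_is_keyword = keyword.iskeyword("for")
```

This lets you inspect how a for statement is represented, but it does not let you define a new statement.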

Python Core Library and PEP8

I was trying to understand why Python is said to be a beautiful language. I was directed to the beauty of PEP 8... and it was strange. In fact it says that you can use any convention you want, just be consistent... and suddenly I found some strange things in the core library:
request()
getresponse()
set_debuglevel()
endheaders()
http://docs.python.org/py3k/library/http.client.html
The functions below are new in Python 3.1. What part of the PEP 8 convention is used here?
popitem()
move_to_end()
http://docs.python.org/py3k/library/collections.html
So my question is: is PEP 8 used in the core library, or not? Why is it like that?
Is this the same situation as in PHP, where I cannot just remember the name of a function because every possible way of writing the name is in use?
Why is PEP 8 not used in the core library, even for the new functions?

PEP 8 recommends using underscores as the default choice, but leaving them out is generally done for one of two reasons:
consistency with some other API (e.g. the current module, or a standard interface)
because leaving them out doesn't hurt readability (or even improves it)
To address the specific examples you cite:
popitem is a longstanding method on dict objects. Other APIs that adopt it retain that spelling (i.e. no underscore).
move_to_end is completely new. Despite other methods on the object omitting underscores, it follows the recommended PEP 8 convention of using underscores, since movetoend is hard to read (mainly because toe is a word, so most people's brains will have to back up and reparse once they notice the nd)
set_debuglevel (and the newer set_tunnel) should probably have left the underscore out for consistency with the rest of the HTTPConnection API. However, the original author may simply have preferred set_debuglevel to setdebuglevel (note that debuglevel is also an argument to the HTTPConnection constructor, explaining the lack of a second underscore) and then the author of set_tunnel simply followed that example.
set_tunnel is actually another case where dropping the underscore arguably hurts readability. The juxtaposition of the two "t"s in settunnel isn't conducive to easy parsing.
Once these inconsistencies make it into a Python release module, it generally isn't worth the hassle to try and correct them (this was done to de-Javaify the threading module interface between Python 2 and Python 3, and the process was annoying enough that nobody else has volunteered to "fix" any other APIs afflicted by similar stylistic problems).
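To see the two spellings side by side, here is a short sketch using collections.OrderedDict, which has both methods from the question. The scores data is made up.

```python
from collections import OrderedDict

scores = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

scores.move_to_end("a")                  # newer method, spelled with underscores
last_key, last_value = scores.popitem()  # dict-style name, no underscore

remaining_keys = list(scores)
```

Both calls sit on the same object, which is exactly the mix of styles the question noticed.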

From PEP8:
But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!
What you have mentioned here is somewhat consistent with the PEP8 guidelines; actually, the main inconsistencies are in other parts, usually with CamelCase.

The Python standard library is not as tightly controlled as it could be, and the style of modules varies. I'm not sure what your examples are meant to illustrate, but it is true that Python's library does not have one voice, as Java's does, or Win32. The language (and library) are built by an all-volunteer crew, with no corporation paying salaries to people dedicated to the language, and it sometimes shows.
Of course, I believe other factors outweigh this negative, but it is a negative nonetheless.

Python Notation?

I've just started using Python and I was thinking about which notation I should use. I've read the PEP 8 guide about notation for Python and I agree with most stuff there except function names (which I prefer in mixedCase style).
In C++ I use a modified version of the Hungarian notation where I don't include information about type but only about the scope of a variable (for instance lVariable for a local variable and mVariable for a member variable of a class, g for global, s for static, in for a function's input and out for a function's output.)
I don't know if this notation style has a name but I was wondering whether it's a good idea not to use such a notation style in Python. I am not extremely familiar with Python so you guys/gals might see issues that I can't imagine yet.
I'm also interested to see what you think of it in general :) Some people might say it makes the code less readable, but I've become used to it and code written without these labels is the code that is less readable for me.

(Almost every Python programmer will say it makes the code less readable, but I've become used to it and code written without these labels is the code that is less readable for me)
FTFY.
Seriously though, it will help you but confuse and annoy other Python programmers that try to read your code.
This also isn't as necessary because of how Python itself works. For example you would never need your "mVariable" form because it's obvious in Python:
class Example(object):
    def __init__(self):
        self.my_member_var = "Hello"
    def sample(self):
        print(self.my_member_var)
e = Example()
e.sample()
print(e.my_member_var)
No matter how you access a member variable (using self.foo or myinstance.foo) it's always clear that it's a member.
The other cases might not be so painfully obvious, but if your code isn't simple enough that a reader can keep in mind "the 'names' variable is a parameter" while reading a function you're probably doing something wrong.

Use PEP-8. It is almost universal in the Python world.

I violate PEP8 in my code. I use:
lowercaseCamelCase for methods and functions
_prefixedWithUnderscoreLowercaseCamelCase for "private" methods
underscore_spaced for variables (any)
_prefixed_with_underscore_variables for "private" self variables (attributes)
CapitalizedCamelCase for classes and modules (although I am moving to lowercasedmodules)
I never liked hungarian notation. A variable name should be easy and concise, provide sufficient information to be clear where (in which scope) it's used and what is its purpose, easy to read, concerned about the meaning of what it refers to, not its technical mumbo-jumbo (eg. type).
The reasons behind my violations are practical considerations and previous experience.
In C++ and Java, it's tradition to have CapitalizedCamel for classes and lowercaseCamel for member functions.
I worked on a codebase where the underscore prefix was used to indicate private but not that much private. We did not want to mess with the Python name mangling (double underscore). This gave us the chance to bend the formalities a bit and peek at the internal class state during unit testing.
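The difference between the single and the double underscore can be sketched as follows. Account and its attributes are hypothetical names used only for illustration.

```python
class Account:
    def __init__(self):
        self._balance = 10      # private by convention only; still reachable
        self.__pin = 1234       # name-mangled to _Account__pin


account = Account()

visible_balance = account._balance          # works, e.g. from a unit test
has_plain_pin = hasattr(account, "__pin")   # the unmangled name does not exist
mangled_pin = account._Account__pin         # mangling obscures, it does not hide
```

With a single underscore a test can read the state directly, which is the convenience described above.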

There exists a handy pep-8 compliance script you can run against your code:
http://github.com/cburroughs/pep8.py/tree/master

It'll depend on the project and the target audience.
If you're building an open source application/plug-in/library, stick with the PEP guidelines.
If this is a project for your company, stick with the company conventions, or something similar.
If this is your own personal project, then use whatever convention is fluid and easy for you to use.
I hope this makes sense.

You should simply be consistent with your naming conventions in your own code. However, if you intend to release your code to other developers you should stick to PEP-8.
For example, the 4 spaces vs. 1 tab question is a big deal when you have a collaborative project. When people submit code with tabs to a shared source repository, developers end up constantly arguing over whitespace issues (which in Python is a BIG deal).
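To show why whitespace is more than a matter of taste here, this sketch compiles a snippet where one line is indented with a tab and the next with eight spaces. It assumes Python 3, which refuses such inconsistent mixing; mixed_source and outcome are illustrative names.

```python
# One line indented with a tab, the next with eight spaces.
mixed_source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(mixed_source, "<example>", "exec")
    outcome = "compiled"
except TabError:
    outcome = "TabError"
```

In an editor the two lines can look identically indented, which is what makes the problem hard to spot in a shared repository.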
Python and all languages have preferred conventions. You should learn them.
Java likes mixedCaseStuff.
C likes szHungarianNotation.
Python prefers stuff_with_underscores.
You can write Java code with_python_type_function_names.
You can write Python code with javaStyleMixedCaseFunctionNamesThatAreSupposedToBeReallyExplict
as long as you're consistent :p

What is the naming convention in Python for variable and function?

Coming from a C# background the naming convention for variables and method names are usually either camelCase or PascalCase:
// C# example
string thisIsMyVariable = "a"
public void ThisIsMyMethod()
In Python, I have seen the above but I have also seen underscores being used:
# python example
this_is_my_variable = 'a'
def this_is_my_function():
Is there a more preferable, definitive coding style for Python?

See Python PEP 8: Function and Variable Names:
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
Variable names follow the same convention as function names.
mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility.

The Google Python Style Guide has the following convention:
module_name, package_name, ClassName, method_name, ExceptionName, function_name, GLOBAL_CONSTANT_NAME, global_var_name, instance_var_name, function_parameter_name, local_var_name.
A similar naming scheme should be applied to a CLASS_CONSTANT_NAME

David Goodger (in "Code Like a Pythonista") describes the PEP 8 recommendations as follows:
joined_lower for functions, methods, attributes, variables
joined_lower or ALL_CAPS for constants
StudlyCaps for classes
camelCase only to conform to pre-existing conventions
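Put together, the recommendations listed above might look like this in a small module. RetryPolicy and the other names are invented purely to show the casing.

```python
MAX_RETRIES = 3                                  # ALL_CAPS constant


class RetryPolicy:                               # StudlyCaps class
    def __init__(self, max_retries=MAX_RETRIES):
        self.max_retries = max_retries           # joined_lower attribute

    def should_retry(self, attempt_count):       # joined_lower method
        return attempt_count < self.max_retries


default_policy = RetryPolicy()                   # joined_lower variable
```

The casing alone tells a reader which names are classes, constants, or ordinary functions and variables.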

As the Style Guide for Python Code admits,
The naming conventions of Python's library are a bit of a mess, so we'll never get this completely consistent
Note that this refers just to Python's standard library. If they can't get that consistent, then there hardly is much hope of having a generally-adhered-to convention for all Python code, is there?
From that, and the discussion here, I would deduce that it's not a horrible sin if one keeps using e.g. Java's or C#'s (clear and well-established) naming conventions for variables and functions when crossing over to Python. Keeping in mind, of course, that it is best to abide with whatever the prevailing style for a codebase / project / team happens to be. As the Python Style Guide points out, internal consistency matters most.
Feel free to dismiss me as a heretic. :-) Like the OP, I'm not a "Pythonista", not yet anyway.

As mentioned, PEP 8 says to use lower_case_with_underscores for variables, methods and functions.
I prefer lower_case_with_underscores for variables and mixedCase for methods and functions, since I find this makes the code more explicit and readable, in keeping with the Zen of Python's "Explicit is better than implicit" and "Readability counts".

There is PEP 8, as other answers show, but PEP 8 is only the style guide for the standard library, and it's only taken as gospel therein. One of the most frequent deviations from PEP 8 in other pieces of code is the naming, specifically for methods. There is no single predominant style, although considering the volume of code that uses mixedCase, if one were to make a strict census one would probably end up with a version of PEP 8 with mixedCase. There is little other deviation from PEP 8 that is quite as common.

Further to what @JohnTESlade has answered, Google's Python style guide has some pretty neat recommendations:
Names to Avoid
single character names except for counters or iterators
dashes (-) in any package/module name
__double_leading_and_trailing_underscore__ names (reserved by Python)
Naming Convention
"Internal" means internal to a module or protected or private within a class.
Prepending a single underscore (_) has some support for protecting module variables and functions (they are not included by from module import *). Prepending a double underscore (__) to an instance variable or method effectively serves to make the variable or method private to its class (using name mangling).
Place related classes and top-level functions together in a module. Unlike Java, there is no need to limit yourself to one class per module.
Use CapWords for class names, but lower_with_under.py for module names. Although there are many existing modules named CapWords.py, this is now discouraged because it's confusing when the module happens to be named after a class. ("wait -- did I write import StringIO or from StringIO import StringIO?")
Guidelines derived from Guido's Recommendations

Most Python people prefer underscores, but even though I have been using Python for more than 5 years now, I still do not like them. They just look ugly to me, but maybe that's all the Java in my head.
I simply like CamelCase better since it fits better with the way classes are named. It feels more logical to have SomeClass.doSomething() than SomeClass.do_something(). If you look around in the global module index in Python, you will find both, which is due to the fact that it's a collection of libraries from various sources that grew over time and not something that was developed by one company like Sun with strict coding rules. I would say the bottom line is: use whatever you like better, it's just a question of personal taste.

Personally I try to use CamelCase for classes, mixedCase methods and functions. Variables are usually underscore separated (when I can remember). This way I can tell at a glance what exactly I'm calling, rather than everything looking the same.

There is a paper about this: http://www.cs.kent.edu/~jmaletic/papers/ICPC2010-CamelCaseUnderScoreClouds.pdf
TL;DR It says that snake_case is more readable than camelCase. That's why modern languages use (or should use) snake wherever they can.

The coding style is usually part of an organization's internal policy/convention standards, but I think in general, the all_lower_case_underscore_separator style (also called snake_case) is most common in python.

I personally use Java's naming conventions when developing in other programming languages as it is consistent and easy to follow. That way I am not continuously struggling over what conventions to use which shouldn't be the hardest part of my project!

Whether or not being in class or out of class:
A variable and function are lowercase as shown below:
name = "John"
def display(name):
    print("John")
And if they're more than one word, they're separated with underscore "_" as shown below:
first_name = "John"
def display_first_name(first_name):
    print(first_name)
And, if a variable is a constant, it's uppercase as shown below:
FIRST_NAME = "John"

Lenin has told... I'm from Java/C# world too. And SQL as well.
Scrutinized myself in attempts to find first sight understandable examples of complex constructions like list in the dictionary of lists where everything is an object.
As for me - camelCase or their variants should become standard for any language. Underscores should be preserved for complex sentences.

Typically, one follows the conventions used in the language's standard library.
