Naming convention for containers in the Python standard library

Naming convention for containers in the Python standard library - python

Consider the naming convention used for different types of containers in the Python standard library:
Why do some methods follow camel case, but others like deque and defaultdict don't? How are these methods different from each other in a way that would explain this difference?
If it's because at some point the convention changed, why wouldn't the module e.g. provide alias them with camel case names to old names as well?

Usually in python, class names follow the "pascal" case convention, methods / functions follow the "snake" case convention.
But here is a official reference from https://www.python.org/dev/peps/pep-0008/:
Package and Module Names
Modules should have short, all-lowercase
names. Underscores can be used in the module name if it improves
readability. Python packages should also have short, all-lowercase
names, although the use of underscores is discouraged.
When an extension module written in C or C++ has an accompanying
Python module that provides a higher level (e.g. more object oriented)
interface, the C/C++ module has a leading underscore (e.g. _socket).
Class names
Class names should normally use the CapWords convention.
The naming convention for functions may be used instead in cases where
the interface is documented and used primarily as a callable.
Note that there is a separate convention for builtin names: most
builtin names are single words (or two words run together), with the
CapWords convention used only for exception names and builtin
constants.

Related

What is the capitalization standard for class names in the Python Standard Library?

The norm for Python standard library classes seems to be that class names are lowercase - this appears to hold true for built-ins such as str and int as well as for most classes that are part of standard library modules that must be imported such as datetime.date or datetime.datetime.
But, certain standard library classes such as enum.Enum and decimal.Decimal are capitalized. At first glance, it might seem that classes are capitalized when their name is equal to the module name, but that does not hold true in all cases (such as datetime.datetime).
What's the rationale/logic behind the capitalization conventions for class names in the Python Standard Library?

The Key Resources section of the Developers Guide lists PEP 8 as the style guide.
From PEP 8 Naming Conventions, emphasis mine.
The naming conventions of Python's library are a bit of a mess, so
we'll never get this completely consistent -- nevertheless, here are
the currently recommended naming standards. New modules and packages
(including third party frameworks) should be written to these
standards, but where an existing library has a different style,
internal consistency is preferred.
Also from PEP 8
A style guide is about consistency. Consistency with this style guide
is important. Consistency within a project is more important.
Consistency within one module or function is the most important.
...
Some other good reasons to ignore a particular guideline:
To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean
up someone else's mess (in true XP style).
Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.
You probably will never know why Standard Library naming conventions conflict with PEP 8 but it is probably a good idea to follow it for new stuff or even in your own projects.

Pep 8 is considered to be the standard style guide by many Python devs. This recommends to name classes using CamelCase/CapWords.
The naming convention for functions may be used instead in cases where the interface is documented and used primarily as a callable.
Note that there is a separate convention for builtin names: most builtin names are single words (or two words run together), with the CapWords convention used only for exception names and builtin constants.
Check this link for PEP8 naming conventions and standards.
datetime is a part of standard library,
Python’s standard library is very extensive, offering a wide range of facilities as indicated by the long table of contents listed below. The library contains built-in modules (written in C) that provide access to system functionality such as file I/O that would otherwise be inaccessible to Python programmers, as well as modules written in Python that provide standardized solutions for many problems that occur in everyday programming.
In some cases, like sklearn, nltk, django, the package names are all lowercase. This link will take you there.
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
When an extension module written in C or C++ has an accompanying Python module that provides a higher level (e.g. more object oriented) interface, the C/C++ module has a leading underscore (e.g. _socket ).
I hope this covers all the questions.

Python Variable Naming vs C++ Variable Naming

I am starting to get deeper in learning Python and having a great time doing it but there is one thing that is bothering me. Why is it in C++ answers/tutorial/books are variables namedLikeThis and in Python they are named_like_this?
Is this just personal preference or is it a convention that should be followed for readability/clarity sake? Not a big deal but I don't want to be the weird guy writing annoying looking code.

People writing Python generally follow PEP8 (http://www.python.org/dev/peps/pep-0008/).

To expand on #thebjorn's answer, from PEP8:
Package and Module Names
Modules should have short, all-lowercase names. Underscores can be
used in the module name if it improves readability. Python packages
should also have short, all-lowercase names, although the use of
underscores is discouraged.
...
Class Names
Class names should normally use the CapWords convention.
...
Function Names
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility.
...
Method Names and Instance Variables
Use the function naming rules: lowercase with words separated by
underscores as necessary to improve readability.
...
Constants
Constants are usually defined on a module level and written in all
capital letters with underscores separating words.
I definitely recommend reading PEP8 in its entirety, not only the naming conventions part. You'll be a much better Pythonista for it :)

What is the underscore prefix for python file name?

In cherryPy for example, there are files like:
__init__.py
_cptools.py
How are they different? What does this mean?

__...__ means reserved Python name (both in filenames and in other names). You shouldn't invent your own names using the double-underscore notation; and if you use existing, they have special functionality.
In this particular example, __init__.py defines the 'main' unit for a package; it also causes Python to treat the specific directory as a package. It is the unit that will be used when you call import cherryPy (and cherryPy is a directory). This is briefly explained in the Modules tutorial.
Another example is the __eq__ method which provides equality comparison for a class. You are allowed to call those methods directly (and you use them implicitly when you use the == operator, for example); however, newer Python versions may define more such methods and thus you shouldn't invent your own __-names because they might then collide. You can find quite a detailed list of such methods in Data model docs.
_... is often used as 'internal' name. For example, modules starting with _ shouldn't be used directly; similarly, methods with _ are supposedly-private and so on. It's just a convention but you should respect it.

These, and other, naming conventions are described in detail in Style Guide for Python Code - Descriptive: Naming Styles
Briefly:
__double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g.__init__, __import__ or __file__. Never invent such names; only use them as documented.
_single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose name starts with an underscore.

__init__.py is a special file that, when existing in a folder turns that folder into module. Upon importing the module, __init__.py gets executed. The other one is just a naming convention but I would guess this would say that you shouldn't import that file directly.
Take a look here: 6.4. Packages for an explanation of how to create modules.
General rule: If anything in Python is namend __anything__ then it is something special and you should read about it before using it (e.g. magic functions).

The current chosen answer already gave good explanation on the double-underscore notation for __init__.py.
And I believe there is no real need for _cptools.py notation in a filename. It is presumably an unnecessary extended usage of applying the "single leading underscore" rule from the Style Guide for Python Code - Descriptive: Naming Styles:
_single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose name starts with an underscore.
If anything, the said Style Guide actually is against using _single_leading_underscore.py in filename. Its Package and Module Names section only mentions such usage when a module is implemented in C/C++.
In general, that _single_leading_underscore notation is typically observed in function names, method names and member variables, to differentiate them from other normal methods.
There is few need (if any at all), to use _single_leading_underscore.py on filename, because the developers are not scrapers , they are unlikely to salvage a file based on its filename. They would just follow a package's highest level of APIs (technically speaking, its exposed entities defined by __all__), therefore all the filenames are not even noticeable, let alone to be a factor, of whether a file (i.e. module) would be used.

Python naming conventions for modules

I have a module whose purpose is to define a class called "nib". (and a few related classes too.) How should I call the module itself? "nib"? "nibmodule"? Anything else?

Just nib. Name the class Nib, with a capital N. For more on naming conventions and other style advice, see PEP 8, the Python style guide.

I would call it nib.py. And I would also name the class Nib.
In a larger python project I'm working on, we have lots of modules defining basically one important class. Classes are named beginning with a capital letter. The modules are named like the class in lowercase. This leads to imports like the following:
from nib import Nib
from foo import Foo
from spam.eggs import Eggs, FriedEggs
It's a bit like emulating the Java way. One class per file. But with the added flexibility, that you can allways add another class to a single file if it makes sense.

I know my solution is not very popular from the pythonic point of view, but I prefer to use the Java approach of one module->one class, with the module named as the class.
I do understand the reason behind the python style, but I am not too fond of having a very large file containing a lot of classes. I find it difficult to browse, despite folding.
Another reason is version control: having a large file means that your commits tend to concentrate on that file. This can potentially lead to a higher quantity of conflicts to be resolved. You also loose the additional log information that your commit modifies specific files (therefore involving specific classes). Instead you see a modification to the module file, with only the commit comment to understand what modification has been done.
Summing up, if you prefer the python philosophy, go for the suggestions of the other posts. If you instead prefer the java-like philosophy, create a Nib.py containing class Nib.

nib is fine. If in doubt, refer to the Python style guide.
From PEP 8:
Package and Module Names
Modules should have short, all-lowercase names. Underscores can be used
in the module name if it improves readability. Python packages should
also have short, all-lowercase names, although the use of underscores is
discouraged.
Since module names are mapped to file names, and some file systems are
case insensitive and truncate long names, it is important that module
names be chosen to be fairly short -- this won't be a problem on Unix,
but it may be a problem when the code is transported to older Mac or
Windows versions, or DOS.
When an extension module written in C or C++ has an accompanying Python
module that provides a higher level (e.g. more object oriented)
interface, the C/C++ module has a leading underscore (e.g. _socket).

From PEP-8: Package and Module Names:
Modules should have short, all-lowercase names. Underscores can be
used in the module name if it improves readability.
Python packages should also have short, all-lowercase names, although the use of
underscores is discouraged.
When an extension module written in C or C++ has an accompanying
Python module that provides a higher level (e.g. more object oriented)
interface, the C/C++ module has a leading underscore (e.g. _socket).

What is the naming convention in Python for variable and function?

Coming from a C# background the naming convention for variables and method names are usually either camelCase or PascalCase:
// C# example
string thisIsMyVariable = "a"
public void ThisIsMyMethod()
In Python, I have seen the above but I have also seen underscores being used:
# python example
this_is_my_variable = 'a'
def this_is_my_function():
Is there a more preferable, definitive coding style for Python?

See Python PEP 8: Function and Variable Names:
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
Variable names follow the same convention as function names.
mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility.

The Google Python Style Guide has the following convention:
module_name, package_name, ClassName, method_name, ExceptionName, function_name, GLOBAL_CONSTANT_NAME, global_var_name, instance_var_name, function_parameter_name, local_var_name.
A similar naming scheme should be applied to a CLASS_CONSTANT_NAME

David Goodger (in "Code Like a Pythonista" here) describes the PEP 8 recommendations as follows:
joined_lower for functions, methods,
attributes, variables
joined_lower or ALL_CAPS for
constants
StudlyCaps for classes
camelCase only to conform to
pre-existing conventions

As the Style Guide for Python Code admits,
The naming conventions of Python's
library are a bit of a mess, so we'll
never get this completely consistent
Note that this refers just to Python's standard library. If they can't get that consistent, then there hardly is much hope of having a generally-adhered-to convention for all Python code, is there?
From that, and the discussion here, I would deduce that it's not a horrible sin if one keeps using e.g. Java's or C#'s (clear and well-established) naming conventions for variables and functions when crossing over to Python. Keeping in mind, of course, that it is best to abide with whatever the prevailing style for a codebase / project / team happens to be. As the Python Style Guide points out, internal consistency matters most.
Feel free to dismiss me as a heretic. :-) Like the OP, I'm not a "Pythonista", not yet anyway.

As mentioned, PEP 8 says to use lower_case_with_underscores for variables, methods and functions.
I prefer using lower_case_with_underscores for variables and mixedCase for methods and functions makes the code more explicit and readable. Thus following the Zen of Python's "explicit is better than implicit" and "Readability counts"

There is PEP 8, as other answers show, but PEP 8 is only the styleguide for the standard library, and it's only taken as gospel therein. One of the most frequent deviations of PEP 8 for other pieces of code is the variable naming, specifically for methods. There is no single predominate style, although considering the volume of code that uses mixedCase, if one were to make a strict census one would probably end up with a version of PEP 8 with mixedCase. There is little other deviation from PEP 8 that is quite as common.

further to what #JohnTESlade has answered. Google's python style guide has some pretty neat recommendations,
Names to Avoid
single character names except for counters or iterators
dashes (-) in any package/module name
\__double_leading_and_trailing_underscore__ names (reserved by Python)
Naming Convention
"Internal" means internal to a module or protected or private within a class.
Prepending a single underscore (_) has some support for protecting module variables and functions (not included with import * from). Prepending a double underscore (__) to an instance variable or method effectively serves to make the variable or method private to its class (using name mangling).
Place related classes and top-level functions together in a module. Unlike Java, there is no need to limit yourself to one class per module.
Use CapWords for class names, but lower_with_under.py for module names. Although there are many existing modules named CapWords.py, this is now discouraged because it's confusing when the module happens to be named after a class. ("wait -- did I write import StringIO or from StringIO import StringIO?")
Guidelines derived from Guido's Recommendations

Most python people prefer underscores, but even I am using python since more than 5 years right now, I still do not like them. They just look ugly to me, but maybe that's all the Java in my head.
I simply like CamelCase better since it fits better with the way classes are named, It feels more logical to have SomeClass.doSomething() than SomeClass.do_something(). If you look around in the global module index in python, you will find both, which is due to the fact that it's a collection of libraries from various sources that grew overtime and not something that was developed by one company like Sun with strict coding rules. I would say the bottom line is: Use whatever you like better, it's just a question of personal taste.

Personally I try to use CamelCase for classes, mixedCase methods and functions. Variables are usually underscore separated (when I can remember). This way I can tell at a glance what exactly I'm calling, rather than everything looking the same.

There is a paper about this: http://www.cs.kent.edu/~jmaletic/papers/ICPC2010-CamelCaseUnderScoreClouds.pdf
TL;DR It says that snake_case is more readable than camelCase. That's why modern languages use (or should use) snake wherever they can.

The coding style is usually part of an organization's internal policy/convention standards, but I think in general, the all_lower_case_underscore_separator style (also called snake_case) is most common in python.

I personally use Java's naming conventions when developing in other programming languages as it is consistent and easy to follow. That way I am not continuously struggling over what conventions to use which shouldn't be the hardest part of my project!

Whether or not being in class or out of class:
A variable and function are lowercase as shown below:
name = "John"
def display(name):
print("John")
And if they're more than one word, they're separated with underscore "_" as shown below:
first_name = "John"
def display_first_name(first_name):
print(first_name)
And, if a variable is a constant, it's uppercase as shown below:
FIRST_NAME = "John"

Lenin has told... I'm from Java/C# world too. And SQL as well.
Scrutinized myself in attempts to find first sight understandable examples of complex constructions like list in the dictionary of lists where everything is an object.
As for me - camelCase or their variants should become standard for any language. Underscores should be preserved for complex sentences.

Typically, one follow the conventions used in the language's standard library.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.