Why don't Python built-in libraries follow the naming convention?

This inconsistency is present in some of the main Python built-in libraries. For example:
"foo".startswith("bar") # instead of .starts_with
re.findall("[ab]", "foobar") # instead of .find_all
Is this just a compatibility issue? PEP 8 states that method names should be lowercase, with words separated by underscores as necessary to improve readability.
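A quick check (Python 3, illustrative only) that the standard library exposes only the concatenated spellings mentioned in the question, and that the underscored variants simply do not exist:

```python
import re

# The real spellings have no underscore between the words.
print("foo".startswith("bar"))       # False
print(re.findall("[ab]", "foobar"))  # ['b', 'a']

# The PEP 8-style spellings from the question are not defined.
print(hasattr(str, "starts_with"))   # False
print(hasattr(re, "find_all"))       # False
```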

Related

How does the Jupyter notebook parse Unicode variable names, and why do I get a bug?

One thing that I quite like about working in Python inside a Jupyter notebook is that I can use some Unicode symbols to name my variables. For example, to use Greek letters, I type \alpha followed by Tab and I get α.
I just ran into unexpected behaviour when using a bold capital T: \bfT followed by Tab, which results in 𝐓.
The experiment is the following. Inside a cell (running Python 3) type:
T = 1
𝐓 = 2
print(T) # prints 2
To my surprise, the second line reassigns the variable T, although I would expect T and 𝐓 to be different names. Can somebody please explain what the catch is with using Unicode?
I don't know if it helps, but as another experiment, I can see that the same two symbols as strings are in fact different:
'T'.encode('utf8'), '𝐓'.encode('utf8') # (b'T', b'\xf0\x9d\x90\x93')
How is the notebook processing my variable names?
This behaviour is defined in the Python language specification for identifiers (variable names): https://docs.python.org/3/reference/lexical_analysis.html#identifiers
2.3. Identifiers and keywords
Identifiers (also referred to as names) are described by the following lexical definitions.
The syntax of identifiers in Python is based on the Unicode standard
annex UAX-31, with elaboration and changes as defined below; see also
PEP 3131 for further details.
[...]
All identifiers are converted into the normal form NFKC while parsing;
comparison of identifiers is based on NFKC.
We can confirm that T and 𝐓 are equivalent under NFKC using the standard library unicodedata module.
>>> import unicodedata
>>> unicodedata.normalize('NFKC','𝐓') == 'T'
True
So you should avoid using such similar Unicode characters as distinct variable names in the same scope.
But there are still many Unicode characters that NFKC leaves unchanged and that can safely be used in identifiers, such as λ (the 💩 emoji below is also unchanged by NFKC, but emoji are not valid identifier characters at all):
>>> unicodedata.normalize('NFKC','💩λ')
'💩λ'
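To catch this before it bites, a small helper can group candidate names by their NFKC form; `nfkc_collisions` is a hypothetical name, not part of any library. The `exec` call reproduces the notebook experiment in plain Python, assuming the notebook kernel is CPython 3, which parses identifiers the same way:

```python
import unicodedata

def nfkc_collisions(names):
    """Group names that Python would treat as the same identifier (equal under NFKC)."""
    groups = {}
    for name in names:
        groups.setdefault(unicodedata.normalize("NFKC", name), []).append(name)
    # Keep only the normalized forms that more than one spelling maps to.
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}

print(nfkc_collisions(["T", "𝐓", "λ", "x"]))  # {'T': ['T', '𝐓']}

# The parser applies the same normalization, so both assignments target 'T'.
namespace = {}
exec("T = 1\n𝐓 = 2", namespace)
print(namespace["T"])    # 2
print("𝐓" in namespace)  # False: the name was stored in its normalized form
```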

Two different definitions of the strip method in the official Python (2.7.14rc1) documentation? str.strip([chars]) vs string.strip(s[, chars])

When I searched for the strip method of strings in the official Python (2.7.14rc1) documentation, I found two definitions: str.strip([chars]) and string.strip(s[, chars]).
My question is: What are the differences between the two definitions? Which one should I follow?
str.strip([chars]) is in 5. Built-in Types section: https://docs.python.org/2/library/stdtypes.html#str.strip
string.strip(s[, chars]) is in 7. String Services section: https://docs.python.org/2/library/string.html#string.strip
There are no functional differences between the two, aside from the fact that string.strip() requires an extra import (import string) before it can be called.
In addition, string.strip() no longer exists in Python 3, and the .strip() method that acts directly on the str object is seen far more often than string.strip().
Other than that, you are free to use whichever you would like to use.
As strip() is available as a method on the built-in str type, you can just use that without importing the string module.
And if you look into the code, you can see that string.strip() still uses str.strip() under the hood.
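As far as I know, Python 2's string.strip is a thin wrapper along the lines of the sketch below. This is a reconstruction rather than the 2.7 source verbatim, and `legacy_strip` is a hypothetical name chosen so the sketch runs on Python 3:

```python
def legacy_strip(s, chars=None):
    """Sketch of Python 2's string.strip(s[, chars]): it just delegates to s.strip(chars)."""
    return s.strip(chars)

# Both spellings give identical results, with or without `chars`.
print(legacy_strip("  spam  "), "  spam  ".strip())          # spam spam
print(legacy_strip("xxspamxx", "x"), "xxspamxx".strip("x"))  # spam spam
```

This is why the choice between them is purely stylistic in Python 2, and why dropping the module-level function in Python 3 cost nothing.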

Why an underscore in `float.is_integer`, but not in `str.isnumeric`?

It appears that float.is_integer is the only "is" method with an underscore in its name among built-in types in Python. Examples that don't include an underscore: str.isalnum, str.isalpha, str.isdecimal, str.isdigit, str.isidentifier, str.islower, str.isnumeric, str.isprintable, str.isspace, str.istitle, str.isupper.
Any clues as to why?
By PEP 8, I would expect all these names to include an underscore. But practicality beats purity (PEP 20), so omitting the underscore in frequently used, short names makes sense. However, having both naming conventions at once seems to be a consequence of backward compatibility (with the logging module as the canonical example).
A similar question has been asked on the Python bug tracker:
Compare isinstance, issubclass, and islower to is_integer, is_fifo, and is_enabled. In Python 3.6, of all the names in the standard library starting with is, I count 69 names with the underscore and 91 without. It seems better to pick one way or the other and stick with it. I would recommend using the underscore, for legibility.
And the answer (from R. David Murray, a Python core developer) there was:
Yep, that would be nice. But Python has evolved over time, and we must maintain backward compatibility. The names are what they are.
Given the numbers in the quoted issue, is_integer is clearly not the only "is" name with an underscore in the standard library.
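To see the split on your own interpreter, a short script (hypothetical helper, not from the thread) can sort the public "is" attributes of a few built-in types by naming style. The exact lists depend on the Python version; for example, int.is_integer only exists in recent releases:

```python
def split_is_names(types):
    """Split the public 'is...' attributes of the given types by naming style."""
    underscored, concatenated = [], []
    for t in types:
        for name in dir(t):
            if not name.startswith("is"):
                continue  # also skips dunders such as __init__
            qualified = f"{t.__name__}.{name}"
            if name.startswith("is_"):
                underscored.append(qualified)
            else:
                concatenated.append(qualified)
    return sorted(underscored), sorted(concatenated)

with_underscore, without_underscore = split_is_names([str, bytes, int, float])
print(with_underscore)     # includes 'float.is_integer'
print(without_underscore)  # includes 'str.isnumeric', 'str.isalpha', ...
```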

Can a Python package name (on PyPI) contain a diaeresis/"umlaut"?

Can a Python package name contain an umlaut, i.e. "ä", "ü" or "ö"? Are there limitations and differences (encoding, OS, Python 2 vs 3)?
https://en.wikipedia.org/wiki/Diaeresis_(diacritic)
Python 2.x identifiers may contain only ASCII letters, digits, and underscores.
Python 3.x supports far more characters, including the umlaut and other letters with diaereses. However, it is not recommended to use special characters in your identifier names, since this could make it difficult for other users to use your package or read your identifier names.
https://www.python.org/dev/peps/pep-3131/
https://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html
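The answer above is about import names. The project name you register on PyPI follows a separate rule; my understanding is that it comes from PEP 508 and allows only ASCII letters, digits, ".", "-" and "_". The sketch below checks both, treating that pattern as an assumption; `check_name` is a hypothetical helper, and `str.isidentifier` does not reject keywords:

```python
import re

# Assumption: the ASCII-only project-name pattern from PEP 508.
# re.ASCII keeps IGNORECASE from matching non-ASCII look-alike letters.
PEP508_NAME = re.compile(
    r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE | re.ASCII
)

def check_name(name):
    """Report whether `name` works as a Python 3 import name and as a PyPI project name."""
    return {
        "import_name_ok": name.isidentifier(),
        "pypi_name_ok": PEP508_NAME.match(name) is not None,
    }

print(check_name("grüße"))   # {'import_name_ok': True, 'pypi_name_ok': False}
print(check_name("my_pkg"))  # {'import_name_ok': True, 'pypi_name_ok': True}
```

So under these assumptions a module can be called grüße in Python 3, but the distribution would need an ASCII name such as gruesse.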

What are the requirements for naming python modules?

I've been going through Learn Python The Hard Way as a sort of refresher. Instead of naming each example ex#.py (where # is the number of the exercise), however, I've just been calling them #.py. This worked fine until I got to Exercise 25, which requires you to import a module you just created through the interpreter. When I try this the following happens:
>>> import 25
File "<stdin>", line 1
import 25
^
SyntaxError: invalid syntax
I tried renaming the file to ex25.py and it then worked as expected (>>> import ex25). What I'm wondering is what are the naming requirements for python modules? I had a look at the official documentation here but didn't see it mention any restrictions.
Edit: All three answers by iCodez, Pavel and BrenBarn give good resources and help answer different aspects of this question. I ended up picking iCodez's answer as the correct one simply because it was the first answer.
Modules that you import with the import statement must follow the same naming rules set for variable names (identifiers). Specifically, they must start with either a letter[1] or an underscore and then be composed entirely of letters, digits[2], and/or underscores.
You may also be interested in what PEP 8, the official style-guide for Python code, has to say about module names:
Modules should have short, all-lowercase names. Underscores can be
used in the module name if it improves readability. Python packages
should also have short, all-lowercase names, although the use of
underscores is discouraged.
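[1] In Python 2, letters are the ASCII characters A-Z and a-z; Python 3 also accepts many non-ASCII letters (see PEP 3131).
[2] In Python 2, digits are the ASCII characters 0-9; Python 3 also accepts other Unicode decimal digits, though never as the first character.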
The explicit rules for what is allowed to be a valid identifier (variable, module name etc.) can be found here: https://docs.python.org/dev/reference/lexical_analysis.html#identifiers
In your case, this is the relevant sentence:
Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.x: the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.
Strictly speaking, you can name a Python file anything you want. However, in order to import it using the import statement, the filename needs to be a valid Python identifier: something you could use as a variable name. That means it must use only alphanumerics and underscores, and not start with a digit. This is because the grammar of the import statement requires the module name to be an identifier.
This is why you didn't see the problem until you got to an exercise that requires importing. You can run a Python script with a numeric name from the command line with python 123.py, but you won't be able to import that module.
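To check a filename up front, a small hypothetical helper can apply the rule from these answers, plus a keyword check, since `import class` is also a syntax error. As a side note not made in the answers: importlib.import_module takes the module name as a string, so it can load 25.py even though the import statement cannot. Treat that as a workaround rather than a recommendation:

```python
import importlib
import keyword
import pathlib
import sys
import tempfile

def importable_by_statement(filename):
    """True if `import <stem>` can name this file: an identifier that is not a keyword."""
    stem = pathlib.Path(filename).stem
    return stem.isidentifier() and not keyword.iskeyword(stem)

print(importable_by_statement("25.py"))     # False: starts with a digit
print(importable_by_statement("ex25.py"))   # True
print(importable_by_statement("class.py"))  # False: 'class' is a keyword

# import_module receives the name as a string, so the import statement's
# grammar (which requires an identifier) never gets involved.
with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "25.py").write_text("answer = 42\n")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        mod = importlib.import_module("25")
        print(mod.answer)  # 42
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("25", None)
```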
