What are the requirements for naming python modules?

I've been going through Learn Python The Hard Way as a sort of refresher. Instead of naming each example ex#.py (where # is the number of the exercise), however, I've just been calling them #.py. This worked fine until I got to Exercise 25, which requires you to import a module you just created through the interpreter. When I try this the following happens:
>>> import 25
  File "<stdin>", line 1
    import 25
           ^
SyntaxError: invalid syntax
I tried renaming the file to ex25.py and it then worked as expected (>>> import ex25). What I'm wondering is what are the naming requirements for python modules? I had a look at the official documentation here but didn't see it mention any restrictions.
Edit: All three answers by iCodez, Pavel and BrenBarn give good resources and help answer different aspects of this question. I ended up picking iCodez's answer as the correct one simply because it was the first answer.

Modules that you import with the import statement must follow the same naming rules set for variable names (identifiers). Specifically, they must start with either a letter [1] or an underscore and then be composed entirely of letters, digits [2], and/or underscores.
You may also be interested in what PEP 8, the official style-guide for Python code, has to say about module names:
Modules should have short, all-lowercase names. Underscores can be
used in the module name if it improves readability. Python packages
should also have short, all-lowercase names, although the use of
underscores is discouraged.
[1] Letters are the ASCII characters A-Z and a-z. Python 3 additionally accepts many non-ASCII letters (see PEP 3131).
[2] Digits are the ASCII characters 0-9.

The explicit rules for what is allowed to be a valid identifier (variable, module name etc.) can be found here: https://docs.python.org/dev/reference/lexical_analysis.html#identifiers
In your case, this is the relevant sentence:
Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.x: the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.

Strictly speaking, you can name a Python file anything you want. However, in order to import it using the import statement, the filename needs to be a valid Python identifier --- something you could use as a variable name. That means it must use only alphanumerics and underscores, and not start with a digit. This is because the grammar of the import statement requires the module name to be an identifier.
This is why you didn't see the problem until you got to an exercise that requires importing. You can run a Python script with a numeric name from the command line with python 123.py, but you won't be able to import that module.
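The identifier rule described above can be checked from Python itself. This is a minimal sketch, not an official API: the helper name is_importable_module_name is made up for illustration, and it only handles a single, undotted name.

```python
import keyword


def is_importable_module_name(name: str) -> bool:
    """Return True if `import <name>` would parse as a plain module name.

    Hypothetical helper for illustration: the name must be a valid
    identifier and must not be a reserved keyword such as `class`.
    """
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ("ex25", "25", "_private", "python-code", "class"):
    print(candidate, is_importable_module_name(candidate))
```

A file named 25.py can still be run as a script; the check only concerns what the import statement will accept.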

Related

Importing custom packages in python [duplicate]

Basically when I have a python file like:
python-code.py
and use:
import (python-code)
the interpreter gives me syntax error.
Any ideas on how to fix it? Are dashes illegal in python file names?

You should check out PEP 8, the Style Guide for Python Code:
Package and Module Names Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
Since module names are mapped to file names, and some file systems are case insensitive and truncate long names, it is important that module names be chosen to be fairly short -- this won't be a problem on Unix, but it may be a problem when the code is transported to older Mac or Windows versions, or DOS.
In other words: rename your file :)

One other thing to note in your code is that import is not a function, so import (python-code) should be written import python-code. As some have already mentioned, though, that is parsed as "import python minus code", which is not what you intended. If you really need to import a file with a dash in its name, you can do the following:
python_code = __import__('python-code')
But, as also mentioned above, this is not really recommended. You should change the filename if it's something you control.

TLDR
Dashes are not illegal, but you should not use them for 3 reasons:
1. You need special syntax to import files with dashes
2. Nobody expects a module name with a dash
3. It's against the recommendations of the Python Style Guide
If you definitely need to import a file name with a dash, the special syntax is this:
module_name = __import__('module-name')
Curious about why we need special syntax?
The reason for the special syntax is that when you write import somename you're creating a module object with identifier somename (so you can later use it with e.g. somename.funcname). Of course module-name is not a valid identifier and hence the special syntax that gives a valid one.
You don't get why module-name is not valid identifier?
Don't worry -- I didn't either. Here's a tip to help you: Look at this python line: x=var1-var2. Do you see a subtraction on the right side of the assignment or a variable name with a dash?
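To see the same thing from the tokenizer's point of view, here is a small sketch using the standard library tokenize module. The helper name token_summary is made up for illustration; it simply lists the name and operator tokens in a line of source.

```python
import io
import tokenize


def token_summary(source):
    """Return (token type, text) pairs for the names and operators in source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokens
        if tok.type in (tokenize.NAME, tokenize.OP)
    ]


# The dash is reported as a separate operator token in both lines,
# so "module-name" is never seen as one name.
print(token_summary("import module-name\n"))
print(token_summary("x=var1-var2\n"))
```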
PS
Nothing original in my answer except including what I considered to be the most relevant bits of information from all other answers in one place

The problem is that python-code is not an identifier. The parser sees this as python minus code. Of course this won't do what you're asking. You will need to use a filename that is also a valid python identifier. Try replacing the - with an underscore.

On Python 3 use import_module:
from importlib import import_module
python_code = import_module('python-code')
More generally,
import_module('package.subpackage.module')
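A self-contained way to try this out is sketched below. It assumes nothing about your project: it writes a throwaway file called python-code.py into a temporary directory (the file contents and the GREETING name are invented for the demo), puts that directory on sys.path, and imports it by string.

```python
import sys
import tempfile
from importlib import import_module
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a module whose file name contains a dash (demo content only).
    Path(tmp, "python-code.py").write_text("GREETING = 'hello'\n")

    # import_module searches sys.path, so make the directory visible first.
    sys.path.insert(0, tmp)
    try:
        python_code = import_module("python-code")
    finally:
        sys.path.remove(tmp)

# The module object is bound to a valid identifier of our choosing.
print(python_code.GREETING)
```

Note that the module is registered in sys.modules under the string 'python-code', while the variable holding it uses an underscore.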

You could probably import it through some __import__ hack, but if you don't already know how, you shouldn't. Python module names should be valid variable names ("identifiers") -- that means if you have a module foo_bar, you can use it from within Python (print foo_bar). You wouldn't be able to do so with a weird name (print foo-bar -> syntax error).

Although proper file naming is the best course, if python-code is not under our control, a hack using __import__ is better than copying, renaming, or otherwise messing around with other authors' code. However, I tried and it didn't work unless I renamed the file adding the .py extension. After looking at the doc to derive how to get a description for .py, I ended up with this:
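import imp  # note: the imp module was removed in Python 3.12

# Open outside the try block so the finally clause never sees an unbound name.
python_code_file = open("python-code")
try:
    # ('.py', 'U', 1) is the (suffix, mode, type) description; 1 is imp.PY_SOURCE.
    python_code = imp.load_module('python_code', python_code_file, './python-code', ('.py', 'U', 1))
finally:
    python_code_file.close()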
It created a new file python-codec on the first run.
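On current Python versions the imp module is gone (it was removed in Python 3.12), so the same idea has to be expressed with importlib. The following is a sketch under that assumption, not the original author's method: load_source is a made-up helper name, and the temporary python-code file with its ANSWER value exists only for the demo.

```python
import importlib.util
import tempfile
from importlib.machinery import SourceFileLoader
from pathlib import Path


def load_source(module_name, path):
    """Load a Python source file as a module, even without a .py extension."""
    # Passing the loader explicitly avoids relying on the file suffix.
    loader = SourceFileLoader(module_name, str(path))
    spec = importlib.util.spec_from_loader(module_name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp, "python-code")  # no extension, dash in the name
    script.write_text("ANSWER = 42\n")
    python_code = load_source("python_code", script)

print(python_code.ANSWER)
```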

How does the jupyter notebook parse unicode variable names and why do I get a bug?

One thing that I quite like working in Python inside jupyter notebook is that I can use some unicode symbols to name my variables. For example, to use greek letters, I type \alpha followed by tab and I get α.
I just ran into an unexpected behaviour when using a bold capital T, \bfT followed by tab which results in 𝐓.
The experiment is the following. Inside a cell (running Python 3) type:
T = 1
𝐓 = 2
print(T) # prints 2
To my surprise, the second line is reassigning the variable T but I would expect it to be different from 𝐓. Can somebody please explain what's the catch with using Unicode?
I don't know if it helps, but as another experiment, I can see that the same two symbols as strings are in fact different
'T'.encode('utf8'), '𝐓'.encode('utf8') # (b'T', b'\xf0\x9d\x90\x93')
How is the notebook processing my variable names?

This behaviour is defined in the python language specification for identifiers (variable names). https://docs.python.org/3/reference/lexical_analysis.html#identifiers
2.3. Identifiers and keywords
Identifiers (also referred to as names) are described by the following lexical definitions.
The syntax of identifiers in Python is based on the Unicode standard
annex UAX-31, with elaboration and changes as defined below; see also
PEP 3131 for further details.
[...]
All identifiers are converted into the normal form NFKC while parsing;
comparison of identifiers is based on NFKC.
We can confirm that T and 𝐓 are equivalent under NFKC using the standard library unicodedata module.
>>> import unicodedata
>>> unicodedata.normalize('NFKC','𝐓') == 'T'
True
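So you should avoid using such similar Unicode characters as distinct variable names in the same scope.
But many Unicode characters are left unchanged by NFKC. Letters among them, such as λ, can be used safely in identifiers (an emoji such as 💩 also survives normalization, but it is not a valid identifier character):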
>>> unicodedata.normalize('NFKC','💩λ')
'💩λ'
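The effect can be reproduced outside the notebook, which suggests it comes from Python's parser rather than from Jupyter. This is a small sketch: the bold T is written as an escape sequence so the example does not depend on the editor's encoding, and the names BOLD_T and namespace are chosen just for the demo.

```python
import unicodedata

BOLD_T = "\U0001D413"  # MATHEMATICAL BOLD CAPITAL T, the character typed via \bfT

# Run the two assignments from the question in a fresh namespace.
namespace = {}
exec(f"T = 1\n{BOLD_T} = 2\n", namespace)

print(namespace["T"])        # value stored under the plain name T
print(BOLD_T in namespace)   # whether the bold spelling exists as its own key
print(unicodedata.normalize("NFKC", BOLD_T) == "T")

# str.isidentifier() tells you whether a character may appear in a name at all.
print("λ".isidentifier(), "💩".isidentifier())
```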

Why do Python built-in libraries not follow the naming convention?

This inconsistency appears in some of the main Python built-in libraries. For example:
"foo".startswith("bar") # instead of .starts_with
re.findall("[ab]", "foobar") # instead of .find_all
Is this just a compatibility issue? PEP 8 states that method names should be written in lowercase with words separated by underscores.

How to find undocumented methods in my code?

I am writing documentation for a project and I would like to make sure I did not miss any method. The code is written in Python and I am using PyCharm as an IDE.
Basically, I would need a REGEX to match something like:
def method_name(with, parameters):
    someVar = something()
    ...
but it should NOT match:
def method_name(with, parameters):
    """ The doc string """
    ...
I tried using PyCharm's search with REGEX feature with the pattern ):\s*[^"'] so it would match any line after : that doesn't start with " or ' after whitespace, but it doesn't work. Any idea why?

You mentioned you were using PyCharm: there is an inspection "Missing, empty, or incorrect docstring" that you can enable and will do that for you.
Note that you can then change the severity for it to show up more or less prominently.

There is a tool called pydocstyle which checks if all classes, functions, etc. have properly formatted docstrings.
Example from the README:
$ pydocstyle test.py
test.py:18 in private nested class `meta`:
        D101: Docstring missing
test.py:27 in public function `get_user`:
        D300: Use """triple double quotes""" (found '''-quotes)
test:75 in public function `init_database`:
        D201: No blank lines allowed before function docstring (found 1)
I don't know about PyCharm, but pydocstyle can, for example, be integrated in Vim using the Syntastic plugin.

I don't know python, but I do know my regex.
And your regex has issues. First of all, as comments have mentioned, you may have to escape the closing parenthesis. Secondly, you don't match the new line following the function declaration. Finally, you look for single or double quotations at the START of a line, yet the start of a line contains whitespace.
I was able to match your sample file with \):\s*\n\s*["']. This is a multiline regex. Not all programs are able to match multiline regex. With grep, for example, you'd have to use this method.
A quick explanation of what this regex matches: it looks for a closing parenthesis followed by a colon. Any amount of optional whitespace may follow that. Then there should be a new line followed by any amount of whitespace (indentation, in this case). Finally, there must be a single or double quote. Note that this matches functions that do have docstrings. You'd want to invert this to find those without.
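For anyone who wants to try the pattern and its inversion outside an editor, here is a sketch using Python's re module. The def\s+(\w+)\(.* prefix is an addition for the demo so the function name can be captured, and the negative lookahead is one possible way to invert the match; the sample source is invented.

```python
import re

SOURCE = '''\
def documented(a, b):
    """The doc string."""
    return a + b

def undocumented(a, b):
    some_var = a + b
    return some_var
'''

# The answer's pattern, prefixed so that the function name is captured.
HAS_DOCSTRING = re.compile(r'''def\s+(\w+)\(.*\):\s*\n\s*["']''')

# Inverted: the first non-whitespace character of the body is not a quote.
NO_DOCSTRING = re.compile(r'''def\s+(\w+)\(.*\):\s*\n\s*(?!["'])\S''')

print(HAS_DOCSTRING.findall(SOURCE))
print(NO_DOCSTRING.findall(SOURCE))
```

Like any regex approach, this is only a heuristic: it will be confused by signatures that span several lines or by comments placed before the docstring.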

In case PyCharm is not available, there is a little tool called ckdoc written in Python 3.5.
Given one or more files, it finds modules, classes and functions without a docstring. It doesn't search in imported built-in or external libraries – it only considers objects defined in files residing in the same folder as the given file, or subfolders of that folder.
Example usage (after removing some docstrings)
> ckdoc/ckdoc.py "ckdoc/ckdoc.py"
ckdoc/ckdoc.py
module
ckdoc
function
Check.documentable
anykey_defaultdict.__getitem__
group_by
namegetter
type
Check
There are cases when it doesn't work. One such case is when using Anaconda with modules. A possible workaround in that case is to use ckdoc from Python shell. Import necessary modules and then call the check function.
> import ckdoc, main
> ckdoc.check(main)
/tmp/main.py
module
main
function
main
/tmp/custom_exception.py
type
CustomException
function
CustomException.__str__
False
The check function returns True if there are no missing docstrings.
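If none of these tools is at hand, a rough version of the same check can be sketched with the standard library ast module. This is not how ckdoc or pydocstyle work internally (that is not stated here); find_undocumented and the SAMPLE source are made up for illustration.

```python
import ast

SAMPLE = '''\
def documented():
    """Has a docstring."""

class Undocumented:
    def method(self):
        pass
'''


def find_undocumented(source):
    """Return sorted (line number, name) pairs for defs and classes lacking a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append((node.lineno, node.name))
    return sorted(missing)


for lineno, name in find_undocumented(SAMPLE):
    print(f"line {lineno}: {name} has no docstring")
```

Because it parses the code rather than matching text, this handles multi-line signatures, but it only sees what is in the source string you give it.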

Why are underscores better than hyphens for file names?

From Building Skills in Python:
A file name like exercise_1.py is better than the name exercise-1.py. We can run both programs equally well from the command line, but the name with the hyphen limits our ability to write larger and more sophisticated programs.
Why is this?

The issue here is that importing files with the hyphen-minus (the default keyboard key -; U+002D) in their name doesn't work since it represents minus signs in Python. So, if you had your own module you wanted to import, it shouldn't have a hyphen in its name:
>>> import test-1
  File "<stdin>", line 1
    import test-1
               ^
SyntaxError: invalid syntax
>>> import test_1
>>>
Larger programs tend to be logically separated into many different modules, hence the quote
the name with the hyphen limits our ability to write larger and more sophisticated programs.
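The difference can also be checked without creating any files, because the failure happens when the statement is parsed, before Python looks for the module. A minimal sketch, with parses as a made-up helper name:

```python
def parses(statement):
    """Return True if the statement compiles, i.e. is syntactically valid."""
    try:
        # compile() only parses and compiles; it does not run the import.
        compile(statement, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(parses("import test-1"))
print(parses("import test_1"))
```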

From that very document (p.368, Section 30.2 'Module Definition'):
Note that a module name must be a valid Python name... A module's name is limited to letters, digits and "_"s.
