Within a Python package, what is the generally accepted workflow for debugging and testing?
Currently I have a test folder that sits at the same level as the project src folder. I want the test files (unit tests and others) to call the latest developed Python code in the same-level folder. However, to make this work I need to do things such as adding folders to the path and using relative imports, and this seems to break for other project developers.
The alternative is to install the code locally using pip install. But this is a hassle to do every time I want to test.
Can someone please recommend a workflow that is safe and efficient for testing within a project? If the first of these is preferred, what should I do regarding imports and the path to get the test code to call the local project code in a reliable way that will work for everyone?
Doing the following does not break the project for other developers, AFAIK.
For me, the following layout works:
project
|--src # directory for all of your code
|--test # directory for tests
...
Then I have the following lines of code (before importing from src) in each test .py file:
import sys
sys.path.append("./src")
Finally, I execute the tests from the project directory.
EDIT
When using pytest, you can actually move the path-append statement into your conftest.py file. That way you don't have to add it to each test .py file.
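A minimal conftest.py for the layout above might look like this (a sketch):
import os
import sys

# conftest.py at the project root: pytest imports this before collecting
# tests, so src/ lands on sys.path once, for every test module
sys.path.append(os.path.join(os.path.dirname(__file__), "src"))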
A possible trick is to make the test folder a package by adding an __init__.py file into it:
import sys
import os.path
# resolve <project>/src relative to this file (test/__init__.py)
sys.path.append(os.path.join(os.path.dirname(os.path.dirname(__file__)), 'src'))
Then, assuming the src folder contains an a.py file, you can import it into the test_a.py file inside test simply as import a, provided you start the tests from the folder containing test (assuming unittest):
python -m unittest test.test_a
The good news if you follow that pattern is that you can run the full list of tests with
python -m unittest discover
because the magic of discover will find the test package and all test*.py files under it.
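A minimal test/test_a.py under this pattern might be (a sketch; the assertion is a placeholder):
import unittest

import a  # src/a.py, importable thanks to the sys.path line in test/__init__.py

class TestA(unittest.TestCase):
    def test_module_loads(self):
        # placeholder check; replace with real assertions about a's API
        self.assertTrue(hasattr(a, '__name__'))

if __name__ == '__main__':
    unittest.main()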
Related
I have a code project with various Python scripts and modules. The folder structure of the GitHub project is something like this:
/data_collection
/analysis
/modules
/helpers
Most of the scripts in data_collection and analysis will import stuff from modules or helpers. The code for doing this, in an example script /data_collection/pull_data.py, would be something like this:
import sys
# NB: '..' is resolved against the current working directory, not this file
sys.path.insert(0, '..')
from modules import my_module
from helpers import my_helper
Now, if I simply run this code from the shell (from the dir that the script is in), it works just fine.
BUT: I want to run this from the crontab. It doesn't work, because crontab's PWD is always the cron user's home dir.
Now, I realise that I could add PWD=/path/to/project at the top of the crontab. But what if I also have other projects' scripts firing from cron?
I also realise that I could reorganise the whole folder structure of the project, perhaps putting all these folders into a folder called app and adding __init__.py to each folder -- but I'm not really in a position to do that at this moment.
So - I wonder, is there a possibility to achieve the following:
retain the relative paths in sys.path.insert within scripts, or perhaps find some solution that avoids the sys.path business altogether (so that it can run without modification on other systems)
be able to run these scripts from the crontab while also running scripts that live in other project directories from the crontab
Many thanks in advance!
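For what it's worth, one common way to satisfy both points is to anchor the path on __file__ instead of on the working directory, so it no longer matters that cron starts in the home dir (a sketch):
# /data_collection/pull_data.py
import os
import sys

# resolve the project root from this file's location, not from the CWD
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from modules import my_module
from helpers import my_helper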
I've created an experimental import library: ultraimport
It allows you to do file-system-based imports with relative (or absolute) paths that will always work, no matter how you run the code or which user is running it (given the user has read access to the files you're trying to import).
In your example script /data_collection/pull_data.py you would then write:
import ultraimport
my_module = ultraimport('__dir__/../modules/my_module.py')
my_helper = ultraimport('__dir__/../helpers/my_helper.py')
No need for any __init__.py, no need to change your directory structure, and no need to touch sys.path. Your current working directory (which you can check by running pwd) plays no role in finding the imported files.
So I structure almost all of my projects like this:
root/
|- scripts/
|- src/
|- etc. ...
I put runnable scripts in scripts/ and importable modules in src/, and by convention run every script from the root directory (so I always stay in root, then type 'python scripts/whatever')
In order to be able to import code from src/, I've decided to start every script with this:
import sys
import os
# use this to make sure we always have the dir we ran from in path
sys.path.append(os.getcwd())
To make sure root/ is always in the path for scripts being run from root.
My question is: is this considered bad style? I like my conventions of always running scripts from the root directory, and keeping my scripts separate from my modules, but it seems like a weird policy to always edit the path variable for every script I write.
If this is considered bad style, could you provide alternative recommendations? Either different ways for me to keep my existing conventions or recommendations for different ways to structure projects would be great!
Thanks!
My recommendation is to use:
root/$ python -m scripts.whatever
With -m you use the dot notation rather than the file path, and you won't need to set up path code in each of the scripts, because -m adds the directory you invoked Python from to sys.path.
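For instance (a sketch; the module names are hypothetical, and on older Pythons you would also need __init__.py files in scripts/ and src/):
# scripts/whatever.py -- run from root/ as: python -m scripts.whatever
# -m puts root/ (the invocation directory) on sys.path, so src is importable
from src import mymodule  # hypothetical module living in src/

if __name__ == '__main__':
    mymodule.main()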
If your file structure also happens to be installed using setup.py and can be found within site-packages, there are some other things to consider:
If you call -m from the root of the directory structure (as I've shown above) it will call the code found in your directories
If you call -m from anywhere else, it will find the installed code from sys.path and call that
This can be a subtle gotcha if you happen to be running an interpreter that has your package installed and you are trying to make changes to your scripts locally and can't figure out why your changes aren't there (this happened to a coworker of mine who wasn't using virtual environments).
After trying to deal with relative imports and reading the many StackOverflow posts about it, I'm starting to realize this is way more complicated than it needs to be.
No one else is using this code but me. I'm not creating a tool or an API, so I see no need to create a "package" or a "module". I just want to organize my code into folders, as opposed to having 50 scripts all in the same directory. It's just ridiculous.
Basically, I just want to have three folders and be able to import scripts from wherever I want. One folder contains some scripts with utility functions, another holds the majority of the code, and another contains some experiments (I'm doing machine learning research).
/project/code/
/project/utils/
/project/experiments/
All I need to do is import python files between folders.
One solution I have found is putting __init__.py files in each directory, including the root directory where the folders live. But then I need to run my experiments from the parent of that directory.
/experiment1.py
/experiment2.py
/project/__init__.py
/project/code/__init__.py
/project/utils/__init__.py
So the above works. But then I have two problems. My experiments don't live in the same folder as the code. This is just annoying, but I guess I can live with it. The bigger problem is that I can't put my experiments in different folders:
/experiments/experiment1/something.py
/experiments/experiment2/something_else.py
/project/__init__.py
/project/code/__init__.py
/project/utils/__init__.py
I suppose I could symlink the project directory into each experiment folder, but that's ridiculous.
The other way of doing it is treating everything like a module:
/project/__init__.py
/project/code/__init__.py
/project/utils/__init__.py
/project/experiments/__init__.py
/project/experiments/experiment1/something.py
/project/experiments/experiment2/something_else.py
But then I have to run my experiments with python -m project.experiments.experiment1.something, which just seems odd to me.
A solution I have found thus far is:
import imp
import os
# load utils/helpful.py by file path (imp is deprecated on Python 3)
currentDir = os.path.dirname(__file__)
filename = os.path.join(currentDir, '../../utils/helpful.py')
helpful = imp.load_source('helpful', filename)
This works, but it is tedious and ugly. I tried creating a helper script to handle this for me, but then os.path.dirname(__file__) points to the wrong place.
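For reference, the same trick spelled with importlib, since imp is deprecated on modern Python (a sketch):
import os
import importlib.util

# load utils/helpful.py by file path, the importlib way
currentDir = os.path.dirname(__file__)
filename = os.path.join(currentDir, '../../utils/helpful.py')
spec = importlib.util.spec_from_file_location('helpful', filename)
helpful = importlib.util.module_from_spec(spec)
spec.loader.exec_module(helpful)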
Surely someone has been in my position trying to organize their python scripts in folders rather than having them all flat in a directory.
Is there a good, simple solution to this problem, or will I have to resort to one of the above?
Running Python files as modules seems odd to me too. You can still keep your experiments in a folder by putting a main.py file in the root directory. The folder tree would then be the following:
/project/experiments
/project/code
/project/utils
/project/main.py
Call your experiments from your main.py file and do your imports there. The __init__.py files should also be inside each folder.
This way, you won't need to run the .py files as modules. Also, the project will have a single entry point, which is very useful in many cases.
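A minimal sketch of such a main.py (the module and function names are assumptions):
# /project/main.py -- the single entry point, run from /project
from experiments.experiment1 import something  # hypothetical experiment module
from utils import helpful                      # hypothetical utility module

if __name__ == '__main__':
    something.main()  # call whichever experiment should run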
So I found this tutorial very useful.
The solution I have come up with is as follows:
project/
.gitignore
setup.py
README.rst
MANIFEST.in
code/
__init__.py
something.py
tests/
__init__.py
tests.py
utils/
__init__.py
utils.py
experiments/
experiment1/
data.json
experiment1.py
experiment2/
data.json
experiment2.py
Then I run python setup.py develop in order to symlink my code so I can import it anywhere else (you can unlink with python setup.py develop --uninstall).
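The setup.py itself can stay minimal; a sketch (name and version are placeholders):
from setuptools import setup, find_packages

setup(
    name='project',
    version='0.1',
    packages=find_packages(),  # finds code/, utils/, tests/ via their __init__.py files
)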
I still haven't decided whether I like my experiments to live inside the project folder or outside. I don't think it really matters since this code is only for my personal use. But I suppose it would be proper for it to live outside...
I wrote a command line app in a single file app_name.py and it works. Now I am breaking it into different .py files for easier management and readability. I have placed all these .py files in a src/ folder, since I see that a lot on GitHub.
app_name/
src/
file1.py
file2.py
cli.py
__init__.py
I have placed all the imports in my __init__.py. The relative imports like from .file1 import function1 are not in __init__.py and are placed within the individual files where they are needed. For example,
#!/usr/bin/env python
# __init__.py
import os
import argparse
# etc...

from .cli import run_cli  # without this import, the call below raises NameError

run_cli()
In cli.py, I have
from .file1 import function1
def run_cli(): pass
if __name__ == '__main__':
run_cli()
This is because when I use app_name <arguments> on the actual command line, __name__ isn't '__main__', so I call run_cli() in __init__.py. This doesn't look right, though, as I will have to call src and not app_name.
I think you may be confusing two things. Having a src directory in your source distribution is a reasonably common, idiomatic thing to do; having that in the installed package or app, not so much.
There are two basic ways to structure this.
First, you can build a Python distribution that installs a package into site-packages, and installs a script into wherever Python scripts go on the PATH. The Python Packaging User Guide covers this in its tutorial on building and distributing packages. See the Layout and Scripts sections, and the samples linked from there.
This will usually give you an installed layout something like this:
<...somewhere system/user/venv .../lib/pythonX.Y/site-packages>
app_name/
file1.py
file2.py
cli.py
__init__.py
<...somewhere.../bin/>
app_name
But, depending on how the user has chosen to install things, it could instead be an egg, or a zipped package, or a wheel, or anything else. You don't care, as long as your code works. In particular, your code can assume that app_name is an importable package.
Ideally, the app_name script on your PATH is an "entry-point" script (pip itself may be a good example on your system), ideally one built on the fly at install time. With setuptools, you can just specify which package it should import and which function in that package it should call, and it will do everything else: making sure the shebang points at the Python actually used at install time, figuring out how to locate the package with pkg_resources and add it to sys.path (not needed by default, but if you don't want it to be importable, you can make that work), and so on.
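With setuptools that looks something like this (a sketch; the package and function names here are assumptions):
from setuptools import setup

setup(
    name='app_name',
    version='1.0',
    packages=['app_name'],  # the package formerly known as src/
    entry_points={
        'console_scripts': [
            # install an app_name script that imports app_name.cli and
            # calls its run_cli() function
            'app_name = app_name.cli:run_cli',
        ],
    },
)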
As mentioned in a comment from the original asker, python-commandline-bootstrap might help you put this solution together more quickly.
The alternative is to keep everything out of site-packages, and make the package specific to your app. In this case, what you basically want to do is:
Install a directory that contains both the package (either as a directory, or zipped up) and the wrapper script.
In the wrapper script, add dirname(abspath(argv[0])) to sys.path before the import.
Symlink the wrapper script into some place on the user's PATH.
In this case, you have to write the wrapper script manually, but then you don't need anything fancy this way.
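A sketch of such a wrapper (the package name is an assumption, and realpath is used rather than abspath so the symlink on the PATH resolves back to the install directory):
#!/usr/bin/env python
import os
import sys

# follow the symlink back to the install directory, then make the
# package sitting next to this script importable
sys.path.insert(0, os.path.dirname(os.path.realpath(sys.argv[0])))

from app_name import cli  # hypothetical package name
cli.run_cli()             # hypothetical entry function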
But often, you don't really want an application to depend on the user having some specific version and setup of Python. You may want to use a tool like pyInstaller, py2exe, py2app, cx_Freeze, zc.buildout, etc. that does all of the above and more. They all do different things, all the way up to the extreme of building a Mac .app bundle directory with a custom, standalone, stripped Python framework and stdlib and a wrapper executable that embeds the framework.
Either way, you really don't want to call the package directory src. That means the package itself will be named src, which is not a great name. If your app is called spamifier, you want to see spamifier in your tracebacks, not src, right?
I am pretty new to Python. Currently I am trying out PyCharm and I am encountering some weird behavior that I can't explain when I run tests.
The project I am currently working on is located in a folder called PythonPlayground. This folder contains some subdirectories, and every folder contains an __init__.py file. Some of the folders contain nose tests.
When I run the tests with the nosetests runner from the command line inside the project directory, I have to put "PythonPlayground" in front of all my local imports. E.g. when importing the module called "model" in the folder "ui", I have to import it like this:
from PythonPlayground.ui.model import *
But when I run the tests from inside PyCharm, I have to remove the leading "PythonPlayground" again, otherwise the tests don't work. Like this:
from ui.model import *
I am also trying out the mock framework, and for some reason this framework always needs the complete name of the module (including "PythonPlayground"). It doesn't matter whether I run the tests from command line or from inside PyCharm:
with patch('PythonPlayground.ui.models.User') as mock:
Could somebody explain the difference in behavior to me? And what is the correct behavior?
I think it happens because PyCharm has its own "copy" of the interpreter, with its own version of sys.path, where your project's root is set one level below the PythonPlayground dir.
You can find the interpreter preferences for your project in PyCharm and set the proper top level there.
P.S. I have the same problem, but in Eclipse + PyDev.