I am trying to release a package which is tracked by git, and I assumed setuptools would help me with that. But if I run
python3 setup.py sdist
I can see that it also copies untracked files (files that I have not added to git) from the package into the archive (which are temp-scripts I use for testing but are not needed for the package itself). Can I somehow ignore them, as I don't want to always remove them before packaging?
I use packages=find_packages() in the setup() and apart from packing too many files, everything seems to be working fine.
An hour of googling only revealed a lot of people trying to exclude certain folders/packages, which is not what I want.
And I don't want to specify these files manually.
I just want to say "please only pack git-versioned files, thank you".
Thanks for any help!
Cheers,
Joschua
Edit: Changed the title to make clear that I did not expect this to be the default behavior.
setuptools is a Python package. git is a completely separate piece of software for version control. The two don't even know about each other. However...
There is a setuptools-git package on PyPI that might help you do what you want.
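A hedged sketch of how it is typically wired up (the project name here is hypothetical): setuptools-git registers a git-aware file finder, so sdist and include_package_data then only pick up files tracked by git.

from setuptools import setup, find_packages

setup(
    name='yourpackage',                  # hypothetical name
    version='0.1',
    packages=find_packages(),
    setup_requires=['setuptools-git'],   # registers the git-aware file finder
    include_package_data=True,
)

With that in place, untracked temp scripts in the package directory should no longer end up in the sdist.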
I'm cleaning up packaging for a python project I didn't create. Currently, it does some explicitly unsupported magic to get its dependencies from a requirements.txt file. The file looks like it may have been generated by pip freeze; there are fixed versions for everything, and many apparently-extraneous packages listed. I am pretty sure some of these aren't real dependencies, but I don't know which ones.
Given just the source tree, how would I figure out, from scratch, what dependencies ought to be included in install_requires?
As a first stab, I'm grepping for non-stdlib import statements. I hope there's a better way.
There's no way to do this perfectly, because Python is too flexible.
But it's usually possible to do it well enough.
You can start with the stdlib's modulefinder.
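A minimal sketch, assuming a single entry-point script (main.py is a hypothetical name):

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('main.py')   # hypothetical entry point of the project

# Top-level names of everything that got imported; filtering out the stdlib
# and the project's own packages is left to you.
print(sorted({name.split('.')[0] for name in finder.modules}))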
Beyond that, a number of projects—mostly projects designed for building binary executables, installers, etc. for Python apps—have come up with heuristics that go even farther.
These usually work. And, when they fail, you usually immediately spot it on your first test. Even if they aren't sufficient, they're at the very least good sample code. Here are a few off the top of my head:
cx_Freeze
py2exe
py2app
pyInstaller
In case you're wondering why it's impossible:
Even setting aside the problem of dependencies in C extension modules, Python is just too flexible to catch all the ways you could import a module via static analysis.
Sure, you'd have to be dealing with code written by someone crazy enough to use explicitly unsupported magic for no good reason… but if you were, there's nothing to stop someone from writing this instead of import lxml: [1]
import codecs, sys  # needed for the obfuscated import below

with open('picture.jpg', encoding='cp500') as f:
    getattr(list(sys.modules.values())[11], codecs.encode('vzcbeg_zbqhyr', 'rot13'))(f.read().strip())
In reality, things aren't going to be that bad. But they could easily be too bad for rg import to be sufficient.
You could try to detect all the imports dynamically with a simple import hook, but that's only guaranteed to work if you can exercise 100% of the code paths.
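For what it's worth, here is a minimal sketch of such a hook: a meta-path finder that only records the requested names and defers to the normal import machinery.

import sys

seen = set()

class ImportLogger:
    def find_spec(self, fullname, path=None, target=None):
        seen.add(fullname.split('.')[0])   # record the top-level package name
        return None                        # let the regular finders do the actual import

sys.meta_path.insert(0, ImportLogger())

# ... exercise as many code paths of the project as you can here ...

print(sorted(seen))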
[1] Of course this only works if importlib was the 12th module loaded, and if picture.jpg is not a JPEG image but a text file whose contents are, in EBCDIC, lxml\n
I've had great results with pipreqs, which will automatically generate a requirements.txt file from your source code.
pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt
I wrote a tool, realreq, specifically for this issue.
You can install it using pip: python3 -m pip install realreq. Using it is as easy as:
realreq -s /path/to/your/source
It will then gather the dependencies actually used in your source code.
I mean, the most effective way would honestly be to go through the code line by line and determine which packages may not be needed, which packages need updates, etc. Both Python 2 and 3 have ModuleFinder, which finds all the modules a script needs to successfully compile and run, but I've never used it before, so I'm not sure how effective it is, especially for what you're doing. If you're interested, though, I'll attach the link below.
https://docs.python.org/3/library/modulefinder.html
The "Python Distribute" guide (was at python-distribute.org, but that registration has lapsed) tells me to include doc/txt files and .py files are excluded in MANIFEST.in file
The sourcedist documentation tells me only sdist uses MANIFEST.in, that it only includes the files you specify, and to include .py files. It also tells me to use python setup.py sdist --manifest-only to generate a MANIFEST, but Python tells me this doesn't exist.
I appreciate that these are from different versions of Python and that the distribution system is in a complete mess, but assuming I am using Python 3 and setuptools (the new one that includes distribute but is now called setuptools, not the old setuptools that was deprecated in favour of distribute, only for distribute to be renamed back to setuptools.....), and that I'm following the 'standard' folder structure and setup.py file:
Do I need a MANIFEST.in?
What should be in it?
When will all these different package systems and methods be made into one single simple process ?
Re: "Do I need a MANIFEST.in?
No, you do not have to use MANIFEST.in. Both, distutils and setuptools are including in source
distribution package all the files mentioned in setup.py - modules, package python files,
README.txt and test/test*.py. If this is all you want to have in distribution package, you do
not have to use MANIFEST.in.
If you want to manipulate (add or remove) default files to include, you have to use MANIFEST.in.
Re: What should be in it?
The procedure is simple:
Make sure that in your setup.py you include (by means of setup arguments) all the files you feel are important for the program to run (modules, packages, scripts...).
Decide whether there are files to add or files to exclude. If neither is needed, there is no need for MANIFEST.in.
If MANIFEST.in is needed, create it. Usually, you add tests*/*.py files there, README.rst if you do not use README.txt, docs files, and possibly some data files for the test suite, if necessary.
For example:
include README.rst
include COPYING.txt
To test it, run python setup.py sdist, and examine the tarball created under dist/.
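For example, assuming the build produced dist/yourpackage-0.1.tar.gz (the actual name depends on your setup.py), you can list what went into it with:
tar tzf dist/yourpackage-0.1.tar.gz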
When will all these different package systems ...
Comparing the situation today with two years ago, it is much, much better: setuptools is the way to go. You can ignore the fact that distutils is a bit broken and is the low-level base for setuptools, as setuptools takes care of hiding these things from you.
EDIT: In my last few projects I have used pbr for building distribution packages, with a three-line setup.py and the rest in setup.cfg and requirements.txt. There is no need to care about MANIFEST.in and other strange stuff, even though the package deserves a bit more documentation. See http://docs.openstack.org/developer/pbr/
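For reference, the three-line setup.py that pbr expects is essentially the following; everything else (name, version, packages, metadata) lives in setup.cfg.

import setuptools

setuptools.setup(
    setup_requires=['pbr'],
    pbr=True,
)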
Old question, new answer:
No, you don't need MANIFEST.in. However, to get setuptools to do what you (usually) mean, you do need to use setuptools_scm, which takes over the role of MANIFEST.in in two key places:
It ensures all relevant files are packaged when running the sdist command (where "all relevant files" is defined as "all files under source control").
It ensures files under source control are included when using include_package_data to pull package data into the build or into bdist_wheel.
The historical understanding of MANIFEST.in is: when you don't have a source control system, you need some other mechanism to distinguish between "source files" and "files that happen to be in your working directory". However, your project is under source control (right??) so there's no need for MANIFEST.in. More info in this article.
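A minimal sketch of what that looks like in setup.py; the package name is hypothetical, and setuptools_scm will also derive the version from your git tags:

from setuptools import setup, find_packages

setup(
    name='yourpackage',                   # hypothetical name
    use_scm_version=True,                 # version is derived from git tags
    setup_requires=['setuptools_scm'],
    packages=find_packages(),
    include_package_data=True,            # package data = whatever git tracks
)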
I am writing a program in Python to be sent to other people, who are running the same Python version; however, there are some 3rd-party modules that need to be installed to use it.
Is there a way to compile it into a .pyc (I only say .pyc because it's a Python compiled file) that has all the dependent modules inside it as well?
So they can run the program without needing to install the modules separately?
Edit:
Sorry if it wasn't clear, but I am aware of things such as cx_freeze etc.; what I'm trying to get is just a single Python file.
So they can just type "python myapp.py" and it will run. No installation of anything, as if all the module code were in my .py file.
If you are on Python 2.3 or later and your dependencies are pure Python:
If you don't want to go the setuptools or distutils route, you can provide a zip file with the .pycs for your code and all of its dependencies. You will have to do a little work to make any complex pathing inside the zip file available (if the dependencies are just lying around at the root of the zip, this is not necessary). Then just add the zip location to your path and it should work just as if the dependency files had been installed.
If your dependencies include .pyds or other binary dependencies you'll probably have to fall back on distutils.
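A minimal sketch of the pure-Python zip approach, where deps.zip and somedependency are hypothetical names:

import sys

sys.path.insert(0, 'deps.zip')   # zip file shipped alongside your script, containing the dependencies
import somedependency            # now resolved from inside the zip via zipimport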
You can simply include .pyc files for the libraries required, but no, a .pyc cannot work as a container for multiple files (unless you collect all the source into one .py file and then compile it).
It sounds like what you're after is the ability for your end users to run one command, e.g. install my_custom_package_and_all_required_dependencies, and have it assemble everything it needs.
This is a perfect use case for distutils, with which you can make manifests for your own code that link out to external dependencies. If your 3rd-party modules are available publicly in a standard format (they should be, and if they're not, it's pretty easy to package them yourself), then this approach has the benefit of allowing you to very easily change which versions of 3rd-party libraries your code runs against (see this section of the above-linked doc). If you're dead set on packaging others' code with your own, you can always include the required files in the .egg you create with distutils.
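A hedged sketch of what declaring those dependencies might look like; strictly speaking, install_requires as shown here is a setuptools argument (setuptools builds on distutils), and the project and requirement names are made up:

from setuptools import setup, find_packages

setup(
    name='my_custom_package',
    version='0.1',
    packages=find_packages(),
    install_requires=[
        'third_party_lib>=1.2,<2.0',   # hypothetical dependency and version range
    ],
)

With this in place, installing the package pulls in the listed dependencies automatically, and changing a version constraint only means editing this list.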
Two options:
build a package that will install the dependencies for them (I don't recommend this if the only dependencies are python packages that are installed with pip)
Use virtual environments. You use an existing python on their system but python modules are installed into the virtualenv.
Or I suppose you could just punt: create a shell script that installs them, and tell them to run it once before they run your stuff.
I'm deploying my Django app with Dotcloud. While developing locally, I had to make changes inside the code of some dependencies (that are in my virtualenv).
So my question is: is there a way to make the same changes on the dependencies (for example django-registration or django_socketio) while deploying on dotcloud?
Thank you for your help.
There are many ways, but not all of them are clean/easy/possible.
If those dependencies are on github, bitbucket, or a similar code repository, you can:
fork the dependency,
edit your fork,
point to the fork in your requirements.txt file.
This will allow you to track further changes to those dependencies, and easily merge your own modifications with future versions.
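For example, a requirements.txt line pointing pip at such a fork might look like this (the user, branch, and project names are hypothetical):
git+https://github.com/yourname/django-registration.git@your-fix-branch#egg=django-registration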
Otherwise, you can include the (modified) dependencies with your code. It's not very clean and increases the size of your app, but that's fine too.
Last but not least, you can write a very hackish postinstall script to locate the .py file to be modified (e.g. import foo ; foopath = foo.__file__), then apply a patch to that file. This would probably cause most sysadmins to cringe in terror, but it's worth mentioning :-)
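A sketch of that last idea, where foo and the patch file name are hypothetical and the standard patch utility is assumed to be available:

import subprocess
import foo  # the dependency whose installed copy needs patching

# Apply a prepared patch directly to the installed module file.
subprocess.check_call(['patch', foo.__file__, 'my_fix.patch'])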
If you are using a requirements.txt, no, there is not a way to do that from PyPI, since Dotcloud is simply downloading the packages you've specified from PyPI, and obviously your changes within your virtualenv are not going to be reflected in the canonical versions of the packages.
In order to use the edited versions of your dependencies, you'll have to bundle them into your code like any other module you've written, and import them from there.
I currently have git and virtualenv set up in a way which exactly suits my needs and, so far, hasn't caused any problems. However, I'm aware that my setup is non-standard and I'm wondering if anyone more familiar with virtualenv's internals can point out if, and where, it's likely to go wrong.
My setup
My virtualenv is inside my git repository, but git is set to ignore the bin and include directories and everything in lib except for the site-packages directory.
More precisely, my .gitignore file looks like this:
*.pyc
# Ignore all the virtualenv stuff except the actual packages
# themselves
/bin
/include
/lib/python*/*
!/lib/python*/site-packages
# Ignore easyinstall and setuptools
/lib/python*/site-packages/easy-install.pth
/lib/python*/site-packages/setuptools.pth
/lib/python*/site-packages/setuptools-*
/lib/python*/site-packages/pip-*
With this arrangement I -- and anyone else working on a checkout of the project -- can use virtualenv and pip as normal but with the following advantages:
If anyone updates or installs a package and pushes their changes, anyone else who pulls those changes automatically gets the update: they don't need to notice that a requirements.txt file has changed or do any post-receive hook magic.
There are no network dependencies: all the code to make the application work lives in the git repository.
I'm aware that this only works with pure-Python packages, but that's all I'm concerned with at the moment.
Does anyone know of any other problems with this approach that I should be aware of?
This is an interesting question. I think the other two answers (thus far) raise good specific points. Clearly you've thought this through and have arrived at a solution you like, but I'll note that there does seem to be a philosophical split here among virtualenv users.
One camp, to which I'd guess you belong, feels that the local VE is part of the project (i.e. it should be under version control). The other feels that the VE should essentially be treated as a development artifact -- that requirements.txt should be part of the project repo, but that you should be able to blow away and recreate the VE as needed.
I just mention this because when I first saw this distinction made, it helped shape my thinking about virtualenv. (I'm in the second camp, FWIW, because it seems simpler and cleaner to me, but that's not to say that being in the first camp is wrong for your particular project.)
If you have any -e items in your requirements.txt (in other words, editable dependencies, as described in the requirements format, that use the git version control system), you will most likely run into issues when your src directory is committed. If you just add /src to your .gitignore then you can avoid this problem, but often the installation just adds a pointer in site-packages to this location, so you need it!
In /lib/python2.7/site-packages in my virtualenv I have absolute paths, for example in Django.egg-link I have /Users/henry/.virtualenv/mowapp/src/django (this is a Mac).
Check to see if you have any absolute paths checked into git, I think that is a major problem with this approach.
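A small sketch for spotting this, assuming the lib/ layout from the question; it flags .egg-link and .pth files under the tracked site-packages that contain absolute paths:

import glob
import os

# .egg-link and .pth files are where editable installs record absolute paths.
candidates = (glob.glob('lib/python*/site-packages/*.egg-link')
              + glob.glob('lib/python*/site-packages/*.pth'))

for path in candidates:
    with open(path) as f:
        for line in f:
            entry = line.strip()
            if entry and os.path.isabs(entry):
                print(path + ': ' + entry)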