Python wheel: same source code but different md5sum

We need to check the md5sum of self-made Python packages, computing it from the resulting *.whl file. The problem is that the md5sum changes on every build, even if there are no changes in the source code. We have also tested this on third-party packages, e.g. django-celery, and get the same behavior.
So the questions are:
What differs if we don't change the source code?
Is it possible to get the same md5sum for the same python builds?
Update:
To illustrate the issue, here are two reports made from two django-celery builds.
The checksums of the build contents are exactly the same (4th column), but the checksums of the *.whl files themselves differ.
Links to the reports:
https://www.dropbox.com/s/0kkbhwd2fgopg67/django_celery-3.1.17-py2-none-any2.htm?dl=0
https://www.dropbox.com/s/vecrq587jjrjh2r/django_celery-3.1.17-py2-none-any1.htm?dl=0

Quoting the relevant PEP (PEP 427, which defines the wheel format):
A wheel is a ZIP-format archive with a specially formatted file name and the .whl extension.
ZIP archives preserve the modification time of each file.
Wheel archives do not contain just source code; they also contain other files and directories that are generated on the fly when the archive is created. Therefore, even if you don't touch your Python source code, the wheel will still contain contents with different modification times.
One way to work around this problem is to unzip the wheel and compute the checksums of the contents.
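As a sketch of that workaround (the wheel filename is just the one from the reports above), hashing every file inside the archive instead of the archive itself should be stable across rebuilds as long as the contents themselves do not change:

import hashlib
import zipfile

# Hash the files inside the wheel rather than the wheel itself;
# the filename below is only a placeholder taken from the example reports.
with zipfile.ZipFile("django_celery-3.1.17-py2-none-any.whl") as whl:
    for name in sorted(whl.namelist()):
        digest = hashlib.md5(whl.read(name)).hexdigest()
        print(digest, name)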

Related

How to exclude git-untracked files with setup.py sdist

I am trying to release a package which is tracked by git, and I assumed setuptools would help me with that. But if I run
python3 setup.py sdist
I can see that it also copies untracked files (files that I have not added to git) from the package into the archive (these are temp scripts I use for testing but are not needed for the package itself). Can I somehow ignore them, since I don't want to have to remove them before packaging every time?
I use packages=find_packages() in the setup() and apart from packing too many files, everything seems to be working fine.
An hour of googling only revealed a lot of people trying to exclude certain folders/packages, which is not what I want.
And I don't want to specify these files manually.
I just want to say "please only pack git-versioned files, thank you".
Thanks for any help!
Cheers,
Joschua
Edit: Changed the title to make clear that I did not expect this to be default behavior.
setuptools is a Python package. git is a completely separate piece of software for version control. The two don't even know about each other. However...
There is a setuptools-git package on PyPI that might help you do what you want to do.
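A minimal setup.py sketch of how that might be wired up; the package name and version are placeholders, and this assumes setuptools-git is available at build time:

from setuptools import setup, find_packages

setup(
    name="mypackage",              # placeholder name
    version="0.1.0",               # placeholder version
    packages=find_packages(),
    # setuptools-git plugs into setuptools' file finders, so sdist
    # should only pick up files that are tracked by git
    setup_requires=["setuptools-git"],
)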

Is there an ldd like command for python

This is the problem: You try to run a python script that you didn't write yourself, and it is missing a module. Then you solve that problem and try again - now another module is missing. And so on.
Is there anything, a command or something, that can go through the python sources and check that all the necessary modules are available - perhaps even going as far as looking up the dependencies of missing modules online (although that may be rather ambitious)? I think of it as something like 'ldd', but of course this is much more like yum or apt-get in its scope.
Please note, by the way, that I'm not talking about package dependencies as handled by pip (I think that's what it's called; I've never used it), but about the logical dependencies in the source code.
There are several packages that analyze code dependencies:
https://docs.python.org/2/library/modulefinder.html
modulefinder seems like what you want: it reports which modules can't be loaded. From the example it looks like it works transitively, but I am not sure (see the short sketch below).
https://pypi.org/project/findimports/
This also analyzes transitive imports; I am not sure, however, what the output is if a module is missing.
... And some more you can find with your favorite search engine
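As an illustration of the first option, a minimal modulefinder sketch (the script path is just a placeholder):

from modulefinder import ModuleFinder

# Point ModuleFinder at the script you are trying to run;
# "their_script.py" is only a placeholder path.
finder = ModuleFinder()
finder.run_script("their_script.py")

print("Modules that could be imported:")
for name in sorted(finder.modules):
    print("  ", name)

print("Modules that could NOT be found:")
for name in sorted(finder.badmodules):
    print("  ", name)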
To answer the original question more directly, I think...
lddcollect is available via pip and looks very good.
Emphases mine:
Typical use case: you have a locally compiled application or library with large number of dependencies, and you want to share this binary. This tool will list all shared libraries needed to run it. You can then create a minimal rootfs with just the needed libraries. Alternatively you might want to know what packages need to be installed to run this application (Debian based systems only for now).
There are two modes of operation.
List all shared library files needed to execute supplied inputs
List all packages you need to apt-get install to execute supplied inputs as well as any shared libraries that are needed but are not under package management.
In the first mode it is similar to ldd, except referenced symbolic links to libraries are also listed. In the second mode shared library dependencies that are under package management are not listed, instead the name of the package providing the dependency is listed.
lddcollect --help
Usage: lddcollect [OPTIONS] [LIBS_OR_DIR]...
Find all other libraries and optionally Debian dependencies listed
applications/libraries require to run.
Two ways to run:
1. Supply single directory on input
- Will locate all dynamic libs under that path
- Will print external libs only (will not print any input libs that were found)
2. Supply paths to individual ELF files on a command line
- Will print input libs and any external libs referenced
Prints libraries (including symlinks) that are referenced by input files,
one file per line.
When --dpkg option is supplied, print:
1. Non-dpkg managed files, one per line
2. Separator line: ...
3. Package names, one per line
Options:
--dpkg / --no-dpkg Lookup dpkg libs or not, default: no
--json Output in json format
--verbose Print some info to stderr
--ignore-pkg TEXT Packages to ignore (list package files instead)
--help Show this message and exit.
I can't test it against my current use case for ldd right now, but I quickly ran it against a binary I've built, and it seems to report the same kind of info, in fact almost twice as many lines!
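For reference, the two modes described above would be invoked roughly like this; the paths are placeholders, and only the --dpkg flag comes from the help text above:
lddcollect /usr/local/lib/libmyapp.so
lddcollect --dpkg /usr/local/bin/myapp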

How to list all files present in a mercurial repository at a given changeset?

I am working on a python script to generate release notes.
I am looking for a way to list all files present in a given changeset.
I am not interested in what changed, but the whole list of files.
For the moment I have thought of two possibilities:
update the repository to the given changeset and get the list of files
customize the output of hg log via a template
Option (1) is not very elegant, and I have not been able to implement (2).
Do you have some suggestions?
I think I found the answer myself:
hg manifest -r <changeset>
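Since the release notes are being generated from a Python script anyway, the same command can be called directly; a rough sketch, assuming hg is on PATH and the script runs inside the repository:

import subprocess

def files_at_changeset(rev):
    # Ask Mercurial for the full manifest at the given changeset.
    result = subprocess.run(
        ["hg", "manifest", "-r", rev],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

print(files_at_changeset("tip"))  # "tip" is just an example revision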

.pyc files generated in different folder

Friends,
I was trying to create a python distribution. I executed the commands:
python setup.py sdist followed by python setup.py install
but the build and dist folders that were created did not contain any .pyc files.
Then I tried to find such a file using Windows search and found that it is present in
C:\Python27\Lib\site-packages
Could anybody tell me what mistake I made in the setup, or whether I missed anything?
Thanks in advance,
Saurabh
The sdist command creates a source distribution, which should not and does not contain .pyc files.
The install command installs the package into your local environment. Creating Python distributions is not its purpose.
If your package is pure Python, then the source distribution is enough to install it anywhere you like; you don't need .pyc files, which depend on the Python version and are therefore less general. bdist or bdist_egg (setuptools) generate, among other things, .pyc files.
You won't get any .pyc files until the first time you run (import) the module. A source distribution is just that. Distributing .pyc files is not usually useful anyway; they are not portable. If you intended to distribute only .pyc files, then you are in for a lovely set of problems when different Python versions and different operating systems are used. Always distribute the source.
For most modules, the time taken to generate them the first time they are used is trivial - it is not worth worrying about.
By the way, when you move to Python 3, the .pyc files are stored in a directory called __pycache__ (since 3.2).
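If you do want the .pyc files generated ahead of time (for example after installation), the standard library's compileall module can byte-compile a whole tree; a small sketch, where the path is only an example:

import compileall

# Byte-compile every .py file under this tree; the path is only an example.
compileall.compile_dir(r"C:\Python27\Lib\site-packages\mypackage", quiet=1)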

Removing compilation metadata from Python

I use Python in one of my products.
I compiled the source code using:
./configure --prefix=/home/myname/python_install
make
make install
I looked inside the python_install directory and noticed that many files (config, .pyc, .pyo) disclose information about my environment (i.e. strings showing where I compiled it: the directory, date, user name, etc.).
I found them using grep -i -r "myname" *
How do I remove this metadata from all those files? I do not want to ship my product with this information.
This is probably not something you have to worry about. Is it a secret where you stored your files? If so, choose a different directory name to begin with; otherwise, I doubt you're going to be able to remove all traces of its location.
By the way, shipping a Python project means an interested party can basically read your Python source, so why worry about the locations of the files?
