Removing compilation metadata from Python

I use Python in one of my products.
I compiled the source code using:
./configure --prefix=/home/myname/python_install
make
make install
I looked inside the python_install directory and noticed that many files (config, .pyc, .pyo) disclose information about my environment (i.e. strings showing where I compiled it: directory, date, name, etc.).
I searched with: grep -i -r "myname" *
How do I remove this metadata from all those files? I do not want to ship my product with this information.
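Much of the build information that shows up in those config files is also exposed at runtime through the standard sysconfig module. A minimal Python 3 sketch that lists the build-time configuration variables whose values mention a given string (the "myname" below is just the placeholder from the grep above):

```python
import sysconfig

def config_vars_mentioning(needle):
    """Return sysconfig build variables whose value contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return {
        name: value
        for name, value in sysconfig.get_config_vars().items()
        # Values can be ints or strings, so compare on their string form.
        if needle in str(value).lower()
    }

if __name__ == "__main__":
    # "myname" stands in for whatever string you grepped for.
    for name, value in sorted(config_vars_mentioning("myname").items()):
        print(f"{name} = {value}")
```

On POSIX builds these values are stored in files that make install copies under the prefix (a Makefile in the config directory and, in newer Python 3 versions, a _sysconfigdata module), which is why grep turns them up.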

This is probably not something you have to worry about. Is it a secret where you stored your files? If so, choose a different directory name to begin with. Otherwise, I doubt you're going to be able to remove all trace of its location.
BTW, shipping a Python project means an interested party could basically read your Python source, so why worry about the locations of the files?
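If the concern is specifically the source paths inside the .pyc files (each compiled code object records the file name it was compiled from), the standard compileall module can rebuild them with a different recorded directory via its ddir argument. A minimal Python 3.7+ sketch; "/opt/myproduct" is a placeholder, and this does nothing about the build variables in the config files:

```python
import compileall
import importlib.util
import marshal

def recompile_with_path(install_dir, recorded_dir="/opt/myproduct"):
    """Recompile every .py under install_dir, recording file names under recorded_dir.

    ddir replaces install_dir in the file name stored in each .pyc, which is
    what shows up in tracebacks when the source file is not present.
    """
    # force=True rewrites existing .pyc files; quiet=1 prints errors only.
    return compileall.compile_dir(install_dir, ddir=recorded_dir, force=True, quiet=1)

def recorded_filename(py_path):
    """Read back the file name stored in the cached .pyc for py_path."""
    pyc_path = importlib.util.cache_from_source(py_path)
    with open(pyc_path, "rb") as f:
        data = f.read()
    code = marshal.loads(data[16:])  # skip the 16-byte .pyc header (Python 3.7+)
    return code.co_filename
```

Running recorded_filename on a module before and after recompile_with_path shows the difference: your build directory before, the placeholder directory after.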

Related

Where to store large resource files with python/pypi packages?

I have the following problem: I'm developing a Python package that needs a somewhat large static data file for its operation (currently around 70 MB, and it may grow over time).
This isn't excessively large, but it's likely beyond what PyPI will accept, so simply shipping the file as a resource file inside the package is not really an option. It also doesn't compress very well.
So I'm planning to do the following: I'll store the file somewhere it can be downloaded via HTTPS and add a command to the tool that downloads the extra data. (I.e. something like a command-line tool with a --fetch-operational-data parameter that one would call once after installation and occasionally afterwards for updates, though updates of that file are expected to be rare.)
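The download step behind a flag like --fetch-operational-data can be sketched with the standard library alone. The URL is hypothetical; the data is streamed into a temporary file in the destination directory and then moved into place with os.replace, so an interrupted download should not leave a half-written data file behind:

```python
import contextlib
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical location of the operational data.
DATA_URL = "https://example.com/mytool/operational-data.bin"

def fetch_operational_data(dest, url=DATA_URL):
    """Download url to dest, replacing any existing file only on success."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Temporary file in the same directory, so os.replace is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, out)  # copies in chunks, not all 70 MB at once
        os.replace(tmp_name, dest)
    except BaseException:
        # Clean up the partial download, then re-raise the original error.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    return dest
```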
However, this raises the question of where to store that data, and I feel there's no really good option.
"Usually" package resource files are managed with importlib_resources, which can access files stored within module directories. But functions like open_binary are all read-only, and while one could probably get the path and write there, this likely goes against how it is meant to be used (e.g. a major selling point of the importlib functionality is that it works with zipped packages, and writing would obviously break there).
Alternatively, one could use a dot directory (~/.mytool/). However, this means there's no good way to install the data globally.
On the other hand, there could be a system-wide directory (/var/lib/mytool?), but then a user couldn't use the package. One could try to autodetect whether the data is in /var/lib, fall back to ~/.mytool, and have the update command write to whichever is writable.
Furthermore, the tool can currently be run straight from its git repo, which adds another complication: when run from there, should it download the file into an extra directory inside the repo, or also use /var/lib/mytool or ~/.mytool?
Whatever I would do, it feels like an ugly hack. Any good ideas?
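The fallback strategy described in the question (system-wide directory first, then a per-user dot directory, writing to whichever is writable) might look like the sketch below. /var/lib/mytool and ~/.mytool come from the question; the MYTOOL_DATA_DIR override, which could also cover running from a git checkout, and the data file name are hypothetical:

```python
import os
from pathlib import Path

DATA_NAME = "operational-data.bin"  # hypothetical file name

def candidate_dirs():
    """Directories to search, most specific first."""
    dirs = []
    override = os.environ.get("MYTOOL_DATA_DIR")  # hypothetical override variable
    if override:
        dirs.append(Path(override))
    dirs.append(Path("/var/lib/mytool"))
    dirs.append(Path.home() / ".mytool")
    return dirs

def find_data_file(dirs=None, name=DATA_NAME):
    """Return the first existing data file, or None if it has not been fetched yet."""
    search = candidate_dirs() if dirs is None else dirs
    for d in search:
        path = Path(d) / name
        if path.is_file():
            return path
    return None

def writable_data_dir(dirs=None):
    """Return the first candidate directory that exists (or can be created) and is writable."""
    search = candidate_dirs() if dirs is None else dirs
    for d in search:
        d = Path(d)
        try:
            d.mkdir(parents=True, exist_ok=True)
        except OSError:
            continue  # e.g. no permission to create /var/lib/mytool
        if os.access(d, os.W_OK):
            return d
    raise PermissionError("no writable data directory found")
```

The update command would write into writable_data_dir(), while normal runs only call find_data_file(); a root install therefore populates /var/lib/mytool, and an unprivileged user ends up with ~/.mytool.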

Converting a python package into a single importable file

Is there a way to convert a Python package, i.e. a folder of Python files, into a single file that can be copied and then directly imported into a Python script without needing to run any extra shell commands? I know it is possible to zip all of the files and then unzip them from Python when they are needed, but I'm hoping that there is a more elegant solution.
It's not totally clear what the question is. I could interpret it two ways.
If you are looking to manage the symbols from many modules in a more organized way:
You'll want to put an __init__.py file in your directory and make it a package. In it you can define the symbols for your package, and create a graceful import packagename behavior. Details on packages.
If you are looking to make your code portable to another environment:
One way or the other, the package needs to be accessible in whatever environment it is run in. That means it either needs to be installed in the python environment (likely using pip), copied into a location that is in a subdirectory relative to the running code, or in a directory that is listed in the PYTHONPATH environment variable.
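Related to the PYTHONPATH point: entries on the Python path do not have to be directories. Python's built-in zipimport support can import pure-Python modules and packages straight from a .zip file listed on sys.path, with no unzipping step (C extensions cannot be loaded this way). A minimal sketch; mypkg and the archive name are placeholders:

```python
import zipfile
from pathlib import Path

def bundle_package(pkg_dir, zip_path):
    """Write every .py file of the package at pkg_dir into a zip archive.

    Archive names are relative to the package's parent directory, so the
    archive contains e.g. "mypkg/__init__.py" and "mypkg/util.py".
    """
    pkg_dir = Path(pkg_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(pkg_dir.rglob("*.py")):
            zf.write(path, path.relative_to(pkg_dir.parent).as_posix())
    return Path(zip_path)

# In the script that uses the bundle (placeholder names):
#   import sys
#   sys.path.insert(0, "mypkg.zip")
#   import mypkg
```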
The most straightforward way to package up code and make it portable is to use setuptools to create a portable package that can be installed into any Python environment. The manual page for Packaging Projects gives the details of how to go about building a package archive, and optionally uploading it to PyPI for public distribution. If it is for private use, the resulting archive can be passed around without uploading it to the public repository.

Is there an ldd like command for python

This is the problem: You try to run a python script that you didn't write yourself, and it is missing a module. Then you solve that problem and try again - now another module is missing. And so on.
Is there anything, a command or something, that can go through the python sources and check that all the necessary modules are available - perhaps even going as far as looking up the dependencies of missing modules online (although that may be rather ambitious)? I think of it as something like 'ldd', but of course this is much more like yum or apt-get in its scope.
Please note, BTW, I'm not talking about package dependencies like in pip (I think that's what it's called; I've never used it), but about the logical dependencies in the source code.
There are several packages that analyze code dependencies:
https://docs.python.org/2/library/modulefinder.html
Modulefinder seems like what you want: it reports which modules can't be loaded. It also works transitively, since it scans the code of every module it finds, so dependencies of dependencies are covered as well.
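A minimal ModuleFinder sketch (Python 3 standard library); the script path comes from the command line. run_script analyzes the script's bytecode without executing it, and the badmodules dictionary is keyed by the names that could not be found:

```python
from modulefinder import ModuleFinder

def missing_modules(script_path):
    """Return sorted names of modules imported (directly or transitively)
    by script_path that ModuleFinder could not locate."""
    finder = ModuleFinder()
    finder.run_script(script_path)  # scans bytecode; does not run the script
    return sorted(finder.badmodules)

if __name__ == "__main__":
    import sys
    for name in missing_modules(sys.argv[1]):
        print("missing:", name)
```

Expect some noise: imports that other modules guard with try/except or platform checks (for example Windows-only modules when running on Linux) also end up in badmodules.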
https://pypi.org/project/findimports/
This also analyzes transitive imports; however, I am not sure what the output is if a module is missing.
... And there are more you can find with your favorite search engine.
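For a quick, non-transitive check using only the standard library, you can also parse a script's import statements with ast and ask importlib whether each top-level module can be found. This is a swapped-in technique, not one of the tools above; it skips relative imports and cannot see imports built dynamically at runtime:

```python
import ast
import importlib.util

def missing_top_level_imports(source):
    """Return sorted top-level module names imported in `source` that cannot be found."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative ones are skipped.
            names.add(node.module.split(".")[0])
    # find_spec on a top-level name only searches sys.path; it does not import it.
    return sorted(name for name in names if importlib.util.find_spec(name) is None)
```

Usage: with open("script.py") as f: print(missing_top_level_imports(f.read()))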
To answer the original question more directly, I think...
lddcollect is available via pip and looks very good.
From the project description:
Typical use case: you have a locally compiled application or library with large number of dependencies, and you want to share this binary. This tool will list all shared libraries needed to run it. You can then create a minimal rootfs with just the needed libraries. Alternatively you might want to know what packages need to be installed to run this application (Debian based systems only for now).
There are two modes of operation:
1. List all shared library files needed to execute supplied inputs
2. List all packages you need to apt-get install to execute supplied inputs, as well as any shared libraries that are needed but are not under package management.
In the first mode it is similar to ldd, except referenced symbolic links to libraries are also listed. In the second mode shared library dependencies that are under package management are not listed, instead the name of the package providing the dependency is listed.
lddcollect --help
Usage: lddcollect [OPTIONS] [LIBS_OR_DIR]...
  Find all other libraries and optionally Debian dependencies listed
  applications/libraries require to run.
  Two ways to run:
  1. Supply single directory on input
     - Will locate all dynamic libs under that path
     - Will print external libs only (will not print any input libs that were found)
  2. Supply paths to individual ELF files on a command line
     - Will print input libs and any external libs referenced
  Prints libraries (including symlinks) that are referenced by input files,
  one file per line.
  When --dpkg option is supplied, print:
  1. Non-dpkg managed files, one per line
  2. Separator line: ...
  3. Package names, one per line
Options:
  --dpkg / --no-dpkg  Lookup dpkg libs or not, default: no
  --json              Output in json format
  --verbose           Print some info to stderr
  --ignore-pkg TEXT   Packages to ignore (list package files instead)
  --help              Show this message and exit.
I can't test it against my current use case for ldd right now, but I quickly ran it against a binary I've built, and it seems to report the same kind of info, in fact almost twice as many lines!

Python packaging: create directory in homedir during installation

I am creating a Python package whose features include logging certain actions in a database. Thus I would like to store the database in a location such as ~/.package-name/database.sqlite. Additionally, ~/.package-name could be a directory to hold configuration files.
What is the best practice for doing this? I am using setuptools to handle package installation. I imagine that within one of my modules, I would have code that checks for the existence of the database file and config file(s), creating them if necessary.
The setuptools documentation states:
you can’t actually install data files to some arbitrary location on a user’s machine; this is a feature, not a bug. You can always include a script in your distribution that extracts and copies your the documentation or data files to a user-specified location, at their discretion.
It seems that I cannot create the location ~/.package-name using setuptools. So should I create this directory the first time the user runs the program by checking for the directory and invoking a script or function?
Is there a standard example I could look at? I had some trouble searching for this.
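The first-run approach from the question can be sketched with pathlib and sqlite3 from the standard library: create the directory and the database lazily the first time the package needs them, instead of at install time. The ~/.package-name/database.sqlite location comes from the question; the function names and the table schema are hypothetical:

```python
import sqlite3
from pathlib import Path

def app_dir(base=None):
    """Return ~/.package-name (or base/.package-name), creating it if needed."""
    root = Path(base) if base is not None else Path.home()
    path = root / ".package-name"
    path.mkdir(mode=0o700, parents=True, exist_ok=True)
    return path

def open_database(base=None):
    """Open the logging database, creating the file and table on first use."""
    conn = sqlite3.connect(str(app_dir(base) / "database.sqlite"))
    # Hypothetical schema; IF NOT EXISTS makes this safe to run on every start.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS actions ("
        " id INTEGER PRIMARY KEY,"
        " happened_at TEXT DEFAULT CURRENT_TIMESTAMP,"
        " action TEXT NOT NULL)"
    )
    conn.commit()
    return conn
```

Because every step is idempotent, the package can simply call open_database() wherever it needs the database; there is no separate "first run" flag to track. If you prefer OS-conventional locations over a dot directory, the third-party platformdirs package provides user_data_dir() for that.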

Python 2.4 OAuth compatibility

I have installed python oauth on my Python 2.4 platform; however, making the python twitter package work requires some tweaks in oauth. I am quite new to Python, but I assume I cannot alter the egg. How do I install a non-egg version, and how do I remove the egg safely?
Python eggs (like java jar files) use the zip format. So to answer your question on how to make your tweaks:
Find the file location
Navigate to location, make a backup copy
If the file is stored as oauth.egg, unzip it
Start modifying!
Find the egg location
Open up a python interpreter and run the following:
>>> import oauth
>>> oauth.__file__
'/usr/lib/python2.6/dist-packages/oauth/__init__.pyc'
Your path will differ, but that will tell you where to look. Often the source code will be unpacked and available in the same directory as a .py file, in this case oauth.py.
(By the way, the __file__ attribute is available on almost all modules; the main exception is built-in modules compiled into the interpreter, such as sys. That should not be your case with oauth.)
I'll skip the file navigation, backup, and unzip details, as those will depend on your system.
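For the unzip step, one platform-independent option is Python's own zipfile module, since an egg is an ordinary zip archive. A minimal sketch in Python 3 syntax (Python 2.4's zipfile has no extractall, so this would need adapting there); the egg name is a placeholder:

```python
import zipfile

def unpack_egg(egg_path, dest_dir):
    """Extract a zipped .egg into dest_dir and return the archive's member names."""
    with zipfile.ZipFile(egg_path) as egg:
        names = egg.namelist()
        egg.extractall(dest_dir)  # creates dest_dir and subdirectories as needed
    return names

# Example (placeholder paths):
#   unpack_egg("oauth-1.0-py2.4.egg", "oauth-unpacked")
```

After extracting, the modified copy has to be the one Python finds first on sys.path; otherwise the original egg is still what gets imported.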
Removing a Python Egg Safely
I'm afraid my knowledge is lacking here. Removing the egg file is easy, but I'm not at all sure how to check whether other packages depend on it, other than running $ ack python.module.to.remove across your Python library. But here are some basic facts that may help:
Directories that contain an __init__.py file are treated as packages, importable as long as their parent directory is on the Python path. See Modules and Packages
Python eggs will add a .pth file containing additional places to add to the path.
>>> import sys; sys.path will show every directory that Python searches for modules/packages.
The PYTHONPATH environment variable can be configured to add paths you choose to the python search path
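The .pth mechanism mentioned above can be seen in action with site.addsitedir, which adds a directory to sys.path and processes any .pth files in it, appending each existing directory listed there. A small Python 3 sketch using throwaway temporary directories:

```python
import os
import site
import sys
import tempfile

def demo_pth():
    """Create a site dir whose .pth file points at another dir; return both paths."""
    site_dir = tempfile.mkdtemp()
    extra_dir = tempfile.mkdtemp()
    # Each plain line in a .pth file names a directory to add to sys.path.
    with open(os.path.join(site_dir, "example.pth"), "w") as f:
        f.write(extra_dir + "\n")
    site.addsitedir(site_dir)
    return site_dir, extra_dir

if __name__ == "__main__":
    site_dir, extra_dir = demo_pth()
    print(site_dir in sys.path, extra_dir in sys.path)  # prints: True True
```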
P.S. If you are new to Python, I highly recommend finding out more about IPython. It makes the Python interpreter much nicer to deal with.
Good luck and welcome to Python!
