I've locally installed a Python package via pip into a virtualenv. I'd like to modify it (not monkey-patch or subclass it, but modify it deeply) and keep it in my source control system, referencing it from there without installing it. Maybe later I'd like to package it again, so I want to keep all the files needed to build the package, not only the Python sources.
Should I just copy it into my project folder and uninstall it from the virtualenv?
Two points. One, are the changes you're planning to make useful to anyone else? If so, you might consider cloning the source repo, making your changes and submitting a PR. Even if it isn't merged immediately, you can use setup.py to create a local package and install that in your virtualenv.
And two, are you planning to use these changes in just one project, or in many? If it's just one project, dropping the code into your repo and deeply modifying it is probably fine (although you need to confirm the license allows you to do so). If you can foresee using it in multiple projects, you're better off giving it its own repo and packaging it via setup.py, as sketched below.
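If you go the packaging route, a minimal setup.py at the root of your modified copy is enough to build it and install it into the virtualenv; the name and version below are placeholders:

    # setup.py at the root of your modified copy -- name and version are placeholders
    from setuptools import setup, find_packages

    setup(
        name="somepackage",
        version="1.2.3+local",        # PEP 440 local version, so it won't be mistaken for the PyPI release
        packages=find_packages(),
        include_package_data=True,    # together with a MANIFEST.in, keeps the non-Python files needed for repackaging
    )

Then pip install /path/to/your/copy (or pip install -e /path/to/your/copy for an editable install) puts your modified version into the virtualenv instead of the PyPI one.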
I'm starting a brand new project. I'm splitting code for different functions (e.g., utils, database, main application) into their own respective packages, so that other new projects can just add them as dependencies and import them in the future. The packages may have cross-dependencies (e.g., database depends on utils).
I know that I can build each component as a Python package and use pip to manage the dependencies. However, as I'm starting from scratch, I will be making active changes to all packages at the same time, so packaging them as "proper packages" seems quite inefficient. I envisage that I will need to add a new function to, say, utils, increment the version, use it in database, then realise that I need another new function that belongs in utils, increment the version again, and so on.
What would be the best way to structure the project in this scenario? I'm using conda, Python 3.10 and VSCode if that matters.
I suggest the package approach you are already considering. The key piece you're missing is editable installs for all your local packages.
Run pip install -e . in each package's root directory.
Editable installs reflect your changes in your environment right away.
Since you are using conda, you probably want conda develop ., as this answer suggests.
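As a rough sketch (the names here are invented), each component gets its own minimal setup.py, with cross-dependencies declared in install_requires:

    # database/setup.py -- minimal sketch; utils/setup.py looks the same minus install_requires
    from setuptools import setup, find_packages

    setup(
        name="myproject-database",              # invented names
        version="0.1.0",
        packages=find_packages(),
        install_requires=["myproject-utils"],   # cross-dependency on your utils package
    )

After running pip install -e . in utils/ first and then in database/, a new function added to utils is immediately importable from database, with no version bump or reinstall.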
I have made a few Python projects for work that all revolve around extracting data, performing Pandas manipulations, and exporting to Excel. Obviously, there are common functions I keep reusing. I've saved these into utils.py, and I copy-paste utils.py into each new project.
Whenever I change utils.py, I need to make sure I change it in my other projects as well, which is an error-prone process.
What would you suggest?
Currently, I create a new directory for each project, like this:
/PyCharm Projects
--/CollegeBoard
----/venv
----/CollegeBoard.py
----/Utils.py
----/Paths.py
--/BoxTracking
----/venv
----/BoxTracking.py
----/Utils.py
----/Paths.py
I'm wondering if this is the most effective way to structure and version-control my work. Since I have many imports in common too, would a directory structure like this be better?
/Projects
--/Reporting
----/venv
----/Collegeboard
------/Collegeboard.py
------/paths.py
----/BoxTracking
------/BoxTracking.py
------/paths.py
----/Utils.py
I would appreciate any related resources.
Instead of putting a copy of utils.py into each of your projects, make utils.py into a package with its own dedicated repository/folder somewhere. I'd recommend renaming it to something less generic, such as "zhous_utils".
In that dedicated repository for zhous_utils, you can create a setup.py file and use it to install the current version of zhous_utils into your Python environment. That way you can import zhous_utils into any other Python script on your PC, just like you would import pandas or any other package you've installed.
Check out this stackoverflow thread: What is setup.py?
Once you understand setup.py, you will understand how to make and install your own packages, so that you can import them into any Python script on your PC. That way all the source code for zhous_utils is centralized in one folder on your PC, which you can update whenever you want and then re-install the package.
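For example, a bare-bones setup.py for zhous_utils might look like this (the version and dependencies are placeholders):

    # zhous_utils/setup.py -- bare-bones sketch, metadata is a placeholder
    from setuptools import setup, find_packages

    setup(
        name="zhous_utils",
        version="0.1.0",
        packages=find_packages(),
        install_requires=["pandas"],   # assuming the shared helpers build on pandas, as described above
    )

After pip install /path/to/zhous_utils (or pip install -e /path/to/zhous_utils for an editable install, so edits take effect immediately), every project can simply import zhous_utils, just like it imports pandas.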
Now, of course, there are some potential challenges/downsides to this. If you install zhous_utils to your computer and then import and use it in one of your other projects, you've made zhous_utils a dependency of that project. That means that if you want to share that project with other people and let them work on it or use it in some way, they will need to install zhous_utils too. Just be aware of that. It won't be an issue if you're the only one developing the projects that import zhous_utils.
I'm working by myself right now, but am looking at ways to scale my operation.
I'd like to find an easy way to version my Python distribution, so that I can recreate it very easily. Is there a tool to do this? Or can I add /usr/local/lib/python2.7/site-packages/ (or whatever) to an svn repo? This doesn't solve the problems with PATHs, but I can always write a script to alter the path. Ideally, the solution would be to build my Python env in a VM, and then hand copies of the VM out.
How have other people solved this?
virtualenv + requirements.txt are your friends.
You can create several virtual Python environments for your projects, each containing exactly the library versions you need. (Tip: pip freeze spits out your exact library versions in requirements.txt format.)
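For illustration, a requirements.txt produced by pip freeze might look like this (the packages and versions are made up):

    # requirements.txt -- illustrative only
    Django==1.4.5
    psycopg2==2.4.6
    requests==1.2.0

Running pip install -r requirements.txt inside a fresh virtualenv then recreates exactly that set of libraries.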
Find a good reference to virtualenv here: http://simononsoftware.com/virtualenv-tutorial/ (it's from this question Comprehensive beginner's virtualenv tutorial?).
Alternatively, if you just want to distribute your code together with its libraries, PyInstaller is worth a try. It can package everything into a self-contained executable, so you don't even have to install anything afterwards.
You want to use virtualenv. It lets you create an application-specific directory for installed packages. You can also use pip to generate a requirements.txt and install from it.
With the same goal, i.e. having exactly the same Python distribution as my colleagues, I tried creating a virtual environment on a network drive, so that all of us could use it without anyone making a local copy.
The idea was to share the same packages installed in a shared folder.
Outcome: Python ran so unbearably slowly that it could not be used, and installing a package was also very sluggish.
So it looks like there is no way around virtualenv and a requirements file. (Even if, unfortunately, it does not always work smoothly on Windows and requires manual installation of some packages and dependencies, at least at the time of writing.)
I'm deploying my Django app with Dotcloud. While developing locally, I had to make changes to the code of some dependencies (which are in my virtualenv).
So my question is: is there a way to make the same changes to the dependencies (for example django-registration or django_socketio) when deploying on dotcloud?
Thank you for your help.
There are many ways, but not all of them are clean/easy/possible.
If those dependencies are on github, bitbucket, or a similar code repository, you can:
fork the dependency,
edit your fork,
point to the fork in your requirements.txt file (example below).
This will allow you to track further changes to those dependencies, and easily merge your own modifications with future versions.
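A requirements.txt line pointing at a fork looks like this (the user and branch names are placeholders; django-registration is just the example from the question):

    git+https://github.com/yourname/django-registration.git@your-fix-branch#egg=django-registration

pip will then install that dependency straight from your repository instead of from PyPI.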
Otherwise, you can include the (modified) dependencies with your code. It's not very clean and increases the size of your app, but that's fine too.
Last but not least, you can write a very hackish post-install script to locate the .py file to be modified (e.g. import foo ; foopath = foo.__file__) and then apply a patch to that file. This would probably cause most sysadmins to cringe in terror, but it's worth mentioning :-)
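A sketch of that hack (the target file name and the snippets below are placeholders) might look like this:

    # postinstall_patch.py -- hackish sketch; the file name and snippets are placeholders
    import os
    import django_socketio                      # the dependency you need to tweak

    # Locate the installed package, then the specific file to modify
    pkg_dir = os.path.dirname(django_socketio.__file__)
    target = os.path.join(pkg_dir, "channels.py")   # hypothetical file inside the package

    with open(target) as f:
        source = f.read()

    # Apply the change in place; a real script might instead run "patch" with a proper diff
    with open(target, "w") as f:
        f.write(source.replace("OLD_SNIPPET", "NEW_SNIPPET"))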
If you are using a requirements.txt, no, there is no way to do that from PyPI, since Dotcloud simply downloads the packages you've specified from PyPI, and your changes within your virtualenv are obviously not reflected in the canonical versions of those packages.
In order to use the edited versions of your dependencies, you'll have to bundle them into your code like any other module you've written, and import them from there.
I generally don't bother to install Python modules. I use web2py, and just dump them in the modules folder and let it take care of the local imports. It always seemed like the most straightforward way of doing things; handling dependencies at a system-wide level never felt right, and I never felt like messing with virtual envs.
On one of my other questions, the answerer said
Generally, the best practice for 3rd party modules is to install them via pip or easy_install (preferably in a virtualenv), if they're available on PyPI, rather than copying them somewhere onto your PYTHONPATH. ... [because that] runs the install script hooks necessary to install executable scripts, build C extensions, etc., that isn't done by just copying in a module.
I don't fully understand this. I always thought it was more of a preference, but is it true that it's better practice to install 3rd party modules, and am I potentially causing problems by not doing that? Does using a framework like web2py make a difference?
It depends on the module and what you want to use it for. Some packages come with useful command line tools, which may only be available to you if you install them appropriately.
Conversely, if you're writing code which is to be distributed to environments you don't have much control over, you often have to keep a copy of the code locally within your project, as the target environment may not have the package... web projects often fall into this category, depending on your serving environment, of course.