I work on an open source project that consists of a python API along with numerous extension modules which are installed as separate python packages. All of these extensions live in their own github repositories.
I am currently trying to improve the appearance of our documentation by using a different sphinx theme (https://sphinx-themes.org/) as well as some custom CSS I've added myself. I have now updated the core repo to use the new theme. Here are the docs for the core package.
The problem is that under our current setup I would have to make about 15 separate (but almost identical) pull requests to update the theme in all of the different repositories. Whats more any changes I do make in the future will have to be made to each individual repository. This is clearly way too laborious not an optimal way of working whatsoever.
One solution I thought about was to have the documentation theme in its own repository and have the different projects pull in the shared theme somehow. Although I'm now sure how best to execute this. Any suggestions would be appreciated.
We had the same problem when developing Trading Strategy documentation.
Eventually, we gave up finding a very good solution for this, and settle to have all documentation to a single Git repository. Any package is a git submodule dependency in this repository.
Then we have a manual script to update dependencies and make a new commit and docs build, and also the same as a Github hook in CI actions.
You could create a Python package of your theme, making sure you include its parent theme as a dependency in your requirements. sphinx-book-theme does just that.
Related
Rather by accident I found myself in a situation in a previous role where the previous admin apparently installed "Python bindings" of InfluxDB and Docker-Compose and magically both applications where available on the systems while I was sure that they where written in Go.
I had a few issues with that:
It's incomprehensible what happens here, there should be some go binary belonging to the application but I can't find it by name, I doubt that docker-compose and influxdb have been rewritten in Python just to have one more option available while at least docker-compose static binaries are available on Github for direct download. It doesn't make a lot of sense to me.
Undermining security guidelines set by the organization and best practices for systems administration.
Dependency Confusion
Links to the packages on PyPI:
https://pypi.org/project/influxdb/
https://pypi.org/project/docker-compose/
I haven't looked into Python wheels and packaging before beyond Debian packaging, I just got curious again the get to the bottom of this strange usage pattern.
Docker-Compose refers to https://github.com/docker/compose a project consisting of 95.5% Go code according to GitHub, which isn't really helpful since the source package and wheel package on PyPI look completely different and at first sight I'm overwhelmed by the amount of Python files. InfluxDB seems to be a better example but I would really appreciate help from a Python developer or package maintainer explaining to me what happening there. Thanks.
Edit 2022-09-10:
From the show notes of Security Now 887: https://www.grc.com/sn/sn-887-notes.pdf
a researcher at Checkmarx noted in a technical report they published last week that “A worrying
feature in pip/PyPI allows code to automatically run when developers are merely downloading a
package.” He added that the feature is alarming because “a great deal of the malicious packages
we are finding in the wild use this feature of code execution upon installation to achieve higher
infection rates.”
With my preexisting misconception about some PyPI packages like docker-compose, that sounded alarming to me.
The following article mentions that compiled libraries from C, Rust, Go and others can be bundled in packages, but no applications "hidden" as artifacts, which I assumed. https://realpython.com/python-wheels/
I have to find a solution for sharing code between two big Django projects. The main things to share are models and serializers and template tags. I've came up with 3 different solutions and I need you to find pro and cons to be able to make a choice.
I'll list you the solutions I found:
git submodules
Create a repository where to store my *.py files and include them as a django app such as 'common_deps'
Even if this is the purpose of git submodules there are a bit hard to use and its easy to fall into traps.
python package
Create a python package to store my *.py files.
It seems to be the best option to me event if that means that I'll need to change my requirements.txt file on my projects on each new release.
Simple git repository
Create a new repository to store my *.py files and include them as a django app such as 'common_deps'. Then add it to my PYTHON_PATH
I need some advices, I haven't chosen yet. I'm just telling myself that git submodules seems to be a bas idea.
Tell me guys.
I will definitely go with the second option you listed - packaging your app. If you follow steps in the Packaging your app part of official Django tutorial, you'll get tar.gz file which will allow you to include your app in any project you want by simply installing (e.g. with pip) to the virtual env connected with the project or globally
I will go with python package, after all this is what it is for.
I'm building a webapp using Django which needs to have two different versions: an Enterprise version and a standard public version. Up until now, I've been only developing the Enterprise version and am now looking for the best way to separate the two versions in the simplest way while avoiding duplication of code as much as possible. The main difference between the two versions will be that they need different URLs and different Views. I intend to differentiate based on subdomain using a multi-tenant architecture, where the www.example.com is the public version, and company1.example.com hits the enterprise version.
I've come up with a couple potential solutions, but I'm not happy with any of them.
Separate Git repositories and entirely separate projects, with all common code duplicated. This much duplication of code is bound to be error prone where things will get out of sync and is expected to be ridden with copy-paste mistakes. This is a last-resort solution.
Separate Git repositories, with common code shared via Git Submodules (a single common 'base' repository containing base models and shared views). I've read horror stories about git submodules, though, so I'm wary of this solution.
Single Git repository containing multiple 'project' folders (public/enterprise) each with their own base urls.py, settings.py, wsgi.py, etc...) and multiple manage.py files to choose which "Project" to run. I'm afraid that this solution would become an utter mess because it wouldn't be possible to have the public and enterprise versions use different versions of the common library if one needs an update before the other.
Separate Git repositories, with all shared code developed as 'Re-usable apps' and installed into the python path. This would be a somewhat clean solution, but would be difficult to work with any time changes needed to be made to the common modules.
Single project where all features are managed via conditional logic in the views. This would be most prone to bugs and confusion of all, and I'd prefer to avoid this solution.
Does anyone have any experience with this type of solution or could anyone help me find the best solution to this problem?
What about "a single Git repository, with all shared code developed as 'Re-usable apps'"? That is configure the options enabled with the INSTALLED_APPS setting.
First you need to decide on your release process. If you intend on releasing both versions simultaneously, using the one git repository makes sense.
An overriding concern might be if you have different distribution requirements for the code, e.g. if you want the code in the public version to be publicly available and the enterprise version to be private. Then you might have to use two git repositories.
Have you looked into using git subtree? It's an alternative to submodules, and it makes the process a little less complicated. I think Atlassian does a great job of explaining how it's used and the pros and cons. A few examples are:
"Contents of the module can be modified without having a separate repository copy of the dependency somewhere else."
"The sub-project’s code is available right after the clone of the super project is done."
"Management of a simple workflow is easy."
The Atlassian link is here.
Here's also a link to git-subtree's description file.
Probably the best solution is to identify exactly which code is shared between the two projects and make that a reusable app.
Then each installation can install that django app, and then has their own site specific code as well.
I'm current working on a python based Google App Engine project. Specifically, I'm using Flask for the application. I'm wondering what the accepted method of including external python modules is, specifically when it comes to the repository. From what I can tell, including other people's code in my repository is bad form for several reasons. However, other people will be working on the same repository, so we should be using the same external modules to insure the same results.
Specifically, I need to include Flask (and its dependencies) to my application. The easiest way to do this with Google App Engine is just to throw them into the root level:
MyProject
app.yaml
main.py
MyApp
Flask
...
What is the proper way to bring in these external modules in such a project? Both a generalized answer and one specific to my case would be useful. Also, any other related recommendations would be appreciated. Thank you much.
While it is indeed possible to include third party libraries as submodules or symlinks from external repositories, in practice it's not a good idea. Here are two scenarios on what could go wrong:
If the third party library releases a new version that breaks the functionality, you will have to either make all the necessary changes to meet the new requirements or simply find the previous version to keep working and break the external connection. Usually this happens when you are very close to deadlines.
If the third party library releases a new version and one of your colleagues is upgraded and made all the necessary changes to support the new version, on your side the code will be broken until you will upgrade as well.
The above examples are much more visible in big projects with lots of dependencies and as more people joining the project in the long run it becomes a huge problem! I could come up with more examples, but I think you can see the point.
Your best option is to include the external libraries into your repository, which also has the advantage that you are able to have the whole project up and running on a new machine without many dependencies. There are many ways on how to organize your third party libraries and all of them needs to be included on the same or deeper level with your app.yaml file. Just as #dragonx mentioned include only the core library code.
Also do not afraid putting stuff into your repository cause space is not an issue today and these libraries usually not updating that often so your repository size is not getting too much bigger over time.
Since you mentioned Flask on Google App Engine, you can check out my gae-init project, where you can see in practice how the external libraries are organised.
You're actually asking two questions here.
How do I include the external library in my GAE project?
You've got the right idea. Whatever way you go about it, you must somehow include Flask and its dependencies in the root of your GAE project. One way is to put a copy directly in there.
The second way is to use a symbolic link to the folder that contains the external library. I'm not sure about Flask, but often times external repos contain the actual library code in a subdirectory - so often you don't want the root of the repo in your GAE app, just the root of the actual source. In this case, it's easier to put a symlink that links to the source folder.
How do I manage external libraries in my source repo?
This is a harder question to answer since it depends what source control tool you're using. Yes, you do want to have everyone use the same versions of external libraries, and they should be included in your source control somehow.
If you're using git, git submodule is the way to go. It's a bit confusing to start with but it'll get the job done.
I'd recommend a repo structure that looks something like this
repo/
thirdparty/
flask/
other_dependency/
another_dependency/
README.TXT
setup.py
src/
app/
app.yaml
your_source.py
softlink_to_flask
softlink_to_other_dependency
softlink_to_another_dependency_src
In this example you keep the source to your external libraries in the thirdparty folder. These may be git submodules. In the app folder you have your source, and softlinks to the appropriate files that are actually needed for your app to run. In this case, the actual code for another_dependency may be in the another_dependency/src folder rather than the actual root of another dependency. This way you don't need to include the unnecessary files in your deployment folder, but you can still keep the entire library in your repo.
You can't just create requirements.txt and put it to GAE. Your code must include all pure python libraries that used your project and doesn't supported by GAE (https://developers.google.com/appengine/docs/python/tools/libraries27).
If you look at flask deploy example for GAE (http://flask.pocoo.org/docs/quickstart/#deploying-to-a-web-server and https://github.com/kamalgill/flask-appengine-template) you can find some dependencies like flask, werkzeug and etc. and all this dependencies you must push to GAE server.
So I see three solutions:
Use local requirements for local development and make custom build function that will download all dependencies, put with your application and upload to GAE server.
Add tools for local deployment when you just start project that put required libraries with your application (don't forget about .gitignore).
Use something like git submodules to requirements repositories.
There is two case for using python third party packages in google app engine project:
If your library is one of the supported runtime-provided third-party libraries of GAE section
just add it to your app.yml file under libraries
libraries:
- name: package_name
version: latest
Add your code
import pack_name
Sometimes you need to install the package with
pip install package_name
Make sure you're using the right interpreter, by using
pip freeze
you can make sure the package is installed successfully to the right path.
Otherwise, if GAE does not support you library, you need to download it manually and save it locally under root/Lib directory:
or through GIT
or through pip (pip install package_name -t path/to/your/Lib/dir)
After that, we should declare Lib directory as source dir in pycharm
pycharm->preferences->Project Structure
Choose Lib directory and mark it as source.
Then, import it.
import pack_name
Pay attention that when you're doing the import, you choosing the local path and not your python path.
In general, that's recommended to have requirements.txt file, that includes all the used packages names, and then the pycharm will recognize the uninstalled packages and suggest you to install them.
Good Luck
One issue that comes up during Pinax development is dealing with development versions of external apps. I am trying to come up with a solution that doesn't involve bringing in the version control systems. Reason being I'd rather not have to install all the possible version control systems on my system (or force that upon contributors) and deal the problems that might arise during environment creation.
Take this situation (knowing how Pinax works will be beneficial to understanding):
We are beginning development on a new version of Pinax. The previous version has a pip requirements file with explicit versions set. A bug comes in for an external app that we'd like to get resolved. To get that bug fix in Pinax the current process is to simply make a minor release of the app assuming we have control of the app. Apps we don't have control we just deal with the release cycle of the app author or force them to make releases ;-) I am not too fond of constantly making minor releases for bug fixes as in some cases I'd like to be working on new features for apps as well. Of course branching the older version is what we do and then do backports as we need.
I'd love to hear some thoughts on this.
Could you handle this using the "==dev" version specifier? If the distribution's page on PyPI includes a link to a .tgz of the current dev version (such as both github and bitbucket provide automatically) and you append "#egg=project_name-dev" to the link, both easy_install and pip will use that .tgz if ==dev is requested.
This doesn't allow you to pin to anything more specific than "most recent tip/head", but in a lot of cases that might be good enough?
I meant to mention that the solution I had considered before asking was to put up a Pinax PyPI and make development releases on it. We could put up an instance of chishop. We are already using pip's --find-links to point at pypi.pinaxproject.com for packages we've had to release ourselves.
Most open source distributors (the Debians, Ubuntu's, MacPorts, et al) use some sort of patch management mechanism. So something like: import the base source code for each package as released, as a tar ball, or as a SCM snapshot. Then manage any necessary modifications on top of it using a patch manager, like quilt or Mercurial's Queues. Then bundle up each external package with any applied patches in a consistent format. Or have URLs to the base packages and URLs to the individual patches and have them applied during installation. That's essentially what MacPorts does.
EDIT: To take it one step further, you could then version control the set of patches across all of the external packages and make that available as a unit. That's quite easy to do with Mercurial Queues. Then you've simplified the problem to just publishing one set of patches using one SCM system, with the patches applied locally as above or available for developers to pull and apply to their copies of the base release packages.
EDIT: I am not sure I am reading your question correctly so the following may not answer your question directly.
Something I've considered, but haven't tested, is using pip's freeze bundle feature. Perhaps using that and distributing the bundle with Pinax would work? My only concern would be how different OS's are handled. For example, I've never used pip on Windows, so I wouldn't know how a bundle would interact there.
The full idea I hope to try is creating a paver script that controls management of the bundles, making it easy for users to upgrade to newer versions. This would require a bit of scaffolding though.
One other option may be you keeping a mirror of the apps you don't control, in a consistent vcs, and then distributing your mirrored versions. This would take away the need for "everyone" to have many different programs installed.
Other than that, it seems the only real solution is what you guys are doing, there isn't a hassle-free way that I've been able to find.