GAE - Including external python modules without adding them to the repository?

I'm currently working on a Python-based Google App Engine project. Specifically, I'm using Flask for the application. I'm wondering what the accepted method of including external Python modules is, specifically when it comes to the repository. From what I can tell, including other people's code in my repository is bad form for several reasons. However, other people will be working on the same repository, so we should be using the same external modules to ensure the same results.
Specifically, I need to include Flask (and its dependencies) in my application. The easiest way to do this with Google App Engine is just to throw them into the root level:
MyProject
    app.yaml
    main.py
    MyApp
    Flask
    ...
What is the proper way to bring in these external modules in such a project? Both a generalized answer and one specific to my case would be useful. Also, any other related recommendations would be appreciated. Thank you much.

While it is indeed possible to include third-party libraries as submodules or symlinks from external repositories, in practice it's not a good idea. Here are two scenarios of what could go wrong:
1. If the third-party library releases a new version that breaks functionality you rely on, you will have to either make all the changes needed to support the new version, or find the previous version to keep working and break the external connection. Usually this happens when you are very close to a deadline.
2. If the third-party library releases a new version and one of your colleagues upgrades and makes all the changes needed to support it, the code will be broken on your side until you upgrade as well.
These problems are much more visible in big projects with lots of dependencies, and as more people join the project they become a huge problem in the long run. I could come up with more examples, but I think you can see the point.
Your best option is to include the external libraries in your repository, which also has the advantage that you can get the whole project up and running on a new machine without installing many dependencies. There are many ways to organize your third-party libraries, and all of them need the libraries to be at the same level as your app.yaml file or deeper. Just as @dragonx mentioned, include only the core library code.
Also, do not be afraid of putting things into your repository: space is not an issue today, and these libraries usually are not updated that often, so your repository will not grow much over time.
Since you mentioned Flask on Google App Engine, you can check out my gae-init project, where you can see in practice how the external libraries are organised.
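A minimal sketch of how bundled libraries can be made importable, assuming they live in a folder named lib next to app.yaml. The folder name and the helper add_vendor_dir are illustrative and not taken from gae-init. On the Python 2.7 runtime, code like this would typically go in appengine_config.py, and the SDK's google.appengine.ext.vendor module offers a similar helper.

```python
import os
import sys


def add_vendor_dir(directory):
    """Put a directory of bundled libraries at the front of sys.path.

    Returns the absolute path that was added. Raises ValueError if the
    directory does not exist, so a missing checkout fails loudly.
    """
    path = os.path.abspath(directory)
    if not os.path.isdir(path):
        raise ValueError("vendor directory not found: %s" % path)
    if path not in sys.path:
        # Insert first so the bundled copy wins over any system-wide install.
        sys.path.insert(0, path)
    return path
```

Calling add_vendor_dir("lib") before the first import of flask would make the bundled copy the one that gets imported.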

You're actually asking two questions here.
How do I include the external library in my GAE project?
You've got the right idea. Whatever way you go about it, you must somehow include Flask and its dependencies in the root of your GAE project. One way is to put a copy directly in there.
The second way is to use a symbolic link to the folder that contains the external library. I'm not sure about Flask, but often times external repos contain the actual library code in a subdirectory - so often you don't want the root of the repo in your GAE app, just the root of the actual source. In this case, it's easier to put a symlink that links to the source folder.
How do I manage external libraries in my source repo?
This is a harder question to answer since it depends on what source control tool you're using. Yes, you do want to have everyone use the same versions of external libraries, and they should be included in your source control somehow.
If you're using git, git submodule is the way to go. It's a bit confusing to start with but it'll get the job done.
I'd recommend a repo structure that looks something like this:
repo/
    thirdparty/
        flask/
        other_dependency/
        another_dependency/
            README.TXT
            setup.py
            src/
    app/
        app.yaml
        your_source.py
        softlink_to_flask
        softlink_to_other_dependency
        softlink_to_another_dependency_src
In this example you keep the source of your external libraries in the thirdparty folder. These may be git submodules. In the app folder you have your source, and softlinks to the folders that are actually needed for your app to run. In this case, the actual code for another_dependency may be in the another_dependency/src folder rather than in the root of another_dependency. This way you don't need to include unnecessary files in your deployment folder, but you can still keep the entire library in your repo.
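The linking step in this layout could be scripted roughly as follows. This is a sketch only: the link_dependencies helper and the mapping of link names to source folders are hypothetical, and os.symlink may need extra privileges on Windows.

```python
import os


def link_dependencies(app_dir, links):
    """Create symlinks inside app_dir pointing at third-party source folders.

    links maps a link name (created inside app_dir) to the folder that
    holds the importable library code. Existing entries are left untouched.
    Returns the list of link paths that were created.
    """
    created = []
    for name, target in sorted(links.items()):
        link_path = os.path.join(app_dir, name)
        if os.path.lexists(link_path):
            continue  # already linked (or a real folder is in the way)
        # Use a relative target so the link still works when the repo
        # is cloned to a different location.
        relative_target = os.path.relpath(target, start=app_dir)
        os.symlink(relative_target, link_path)
        created.append(link_path)
    return created
```

For the tree above, a call such as link_dependencies("app", {"another_dependency": "thirdparty/another_dependency/src"}) would link only the source folder, not the whole checkout.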

You can't just create a requirements.txt and upload it to GAE. Your code must include all the pure-Python libraries that your project uses and that GAE does not provide (https://developers.google.com/appengine/docs/python/tools/libraries27).
If you look at the Flask deployment examples for GAE (http://flask.pocoo.org/docs/quickstart/#deploying-to-a-web-server and https://github.com/kamalgill/flask-appengine-template), you will find dependencies such as flask and werkzeug, and all of these dependencies must be uploaded to the GAE server.
So I see three solutions:
1. Use a local requirements file for local development, and write a custom build function that downloads all the dependencies, places them alongside your application, and uploads everything to the GAE server.
2. Add tooling for local deployment when you start the project that places the required libraries alongside your application (don't forget about .gitignore).
3. Use something like git submodules pointing to the repositories of your requirements.
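The first of these solutions could look roughly like this: a small build step that asks pip to install everything listed in requirements.txt into a folder that is uploaded together with the application. The function names and the lib folder are assumptions made for illustration; pip's -r and -t (--target) options are real.

```python
import subprocess
import sys


def pip_vendor_command(requirements="requirements.txt", target="lib"):
    """Build the pip command that installs all requirements into target."""
    return [
        sys.executable, "-m", "pip", "install",
        "-r", requirements,  # read the dependency list from this file
        "-t", target,        # install into this folder, not site-packages
    ]


def vendor_dependencies(requirements="requirements.txt", target="lib"):
    """Run the command; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_vendor_command(requirements, target))
```

The target folder would then be ignored by version control and made importable before the app starts.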

There are two cases for using third-party Python packages in a Google App Engine project:
1. If your library is one of the supported third-party libraries provided by the GAE runtime, just add it to your app.yaml file under libraries:
libraries:
- name: package_name
  version: latest
Then import it in your code:
import package_name
To use the package locally, you may also need to install it with
pip install package_name
Make sure you're using the right interpreter. By running
pip freeze
you can check that the package was installed successfully for that interpreter.
2. Otherwise, if GAE does not support your library, you need to download it manually and save it locally under the root/Lib directory, either through Git or through pip (pip install package_name -t path/to/your/Lib/dir).
After that, declare the Lib directory as a source directory in PyCharm:
PyCharm -> Preferences -> Project Structure
Choose the Lib directory and mark it as a source.
Then import it:
import package_name
Note that this import should resolve to the local copy in Lib, not to a package elsewhere on your Python path.
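One way to check which copy of a package is actually picked up is to ask Python where it would load the module from. The helper name module_origin is made up for this sketch, and importlib.util.find_spec requires Python 3.4 or later; on older interpreters, inspecting the module's __file__ attribute after import gives similar information.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from the current sys.path
    return spec.origin
```

If the returned path is not inside the project's Lib directory, the import is resolving to some other installation.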
In general, it is recommended to have a requirements.txt file that lists the names of all the packages you use; PyCharm will then recognize any that are not installed and suggest installing them.
Good luck.

Related

How to use a utils across projects and ensure it is updated

I have made a few Python projects for work that all revolve around extracting data, performing Pandas manipulations, and exporting to Excel. Obviously, there are common functions I've been reusing. I've saved these into utils.py, and I copy and paste utils.py into each new project.
Whenever I change utils.py, I need to make sure I change it in my other projects too, which is an error-prone process.
What would you suggest?
Currently, I create a new directory for each project, so
/PyCharm Projects
--/CollegeBoard
----/venv
----/CollegeBoard.py
----/Utils.py
----/Paths.py
--/BoxTracking
----/venv
----/BoxTracking.py
----/Utils.py
----/Paths.py
I'm wondering if this is the most effective way to structure/version control my work. Since I have many imports in common, too, would a directory like this be better?
/Projects
--/Reporting
----/venv
----/Collegeboard
------/Collegeboard.py
------/paths.py
----/BoxTracking
------/BoxTracking.py
------/paths.py
----/Utils.py
I would appreciate any related resources.

Instead of putting a copy of utils.py into each of your projects, make utils.py into a package with its own dedicated repository/folder somewhere. I'd recommend renaming it to something less generic, such as "zhous_utils".
In that dedicated repository for zhous_utils, you can create a setup.py file and use it to install the current version of zhous_utils into your Python installation. That way you can import zhous_utils in any other Python script on your PC, just like you would import pandas or any other package you've installed.
Check out this Stack Overflow thread: What is setup.py?
When you understand setup.py, then you will understand how to make and install your own packages so that you can import those installed packages to any python script on your PC. That way all source code for zhous_utils is centralized to just one folder on your PC, which you can update whenever you want and re-install the package.
Now, of course, there are some potential challenges/downsides to this. If you install zhous_utils to your computer and then import and use zhous_utils in one of your other projects, then you've just made zhous_utils into a dependency of that project. That means that if you want to share that project with other people and let them work on it as well or use it in some way, then they will need to install zhous_utils. Just be aware of that. This won't be an issue if you're the only one interacting/developing the source code of the projects you intend to import zhous_utils into.
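A minimal setup.py for such a package might look like the sketch below. It assumes the shared code lives in a single file named zhous_utils.py next to setup.py; the version number and the pandas dependency are placeholders, not details from the question.

```python
# setup.py, placed next to zhous_utils.py in the dedicated repository
import sys

SETUP_KWARGS = {
    "name": "zhous_utils",
    "version": "0.1.0",              # placeholder version number
    "py_modules": ["zhous_utils"],   # a single-file module, no package folder
    "install_requires": ["pandas"],  # assumed dependency of the helpers
}

if __name__ == "__main__" and len(sys.argv) > 1:
    # Only call setup() when a command such as "install" was passed.
    from setuptools import setup
    setup(**SETUP_KWARGS)
```

Running pip install -e . in that folder installs the package in editable (development) mode, so edits to zhous_utils.py are picked up by every project that imports it without reinstalling.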

How do I effectively make changes to a 3rd party django app?

I am working on a Django application which uses django-leaflet, but this question applies to any Python library. I want to change some django-leaflet code to see if the changes would solve a problem we are having. What are my options? Do I need to create an example app in the django-leaflet repository and perform my modify-test loop there? Or do I need to upload individually renamed versions of django-leaflet to PyPI?

You can modify your 3rd party app by uninstalling it using pip uninstall, then copying (or git cloning) the app into your source tree. You may need to temporarily add a line like sys.path.append("./django-leaflet") to your manage.py file so that the 3rd party modules will be in scope.
Once you are happy with your changes, you can send them to the original author as a pull request or upload your own version of the app.

You can always change the code directly in site-packages/, although that requires a certain level of attention to detail to prevent shooting yourself in the foot.
Other than that you can check out the code and, from the directory containing the 3rd party package's setup.py, do
pip install -e .
(which is similar to, but better than, python setup.py develop)
This will install a link to the sources in site-packages/, so you can do the modify/test loop in the 3rd party package and run tests in your own package.
The advantage is that you'll have VCS support for your changes.

Most efficient solution for sharing code between two django projects

I have to find a solution for sharing code between two big Django projects. The main things to share are models, serializers, and template tags. I've come up with three different solutions, and I need help finding the pros and cons of each to be able to make a choice.
Here are the solutions I found:
1. git submodules
Create a repository to store my *.py files and include it as a Django app such as 'common_deps'.
Even though this is the purpose of git submodules, they are a bit hard to use and it's easy to fall into traps.
2. Python package
Create a Python package to store my *.py files.
This seems to be the best option to me, even if it means that I'll need to change the requirements.txt file in my projects on each new release.
3. Simple git repository
Create a new repository to store my *.py files and include it as a Django app such as 'common_deps'. Then add it to my PYTHONPATH.
I need some advice; I haven't chosen yet. I'm just telling myself that git submodules seem to be a bad idea.
Tell me what you think.

I would definitely go with the second option you listed: packaging your app. If you follow the steps in the "Packaging your app" part of the official Django tutorial, you'll get a tar.gz file that lets you include your app in any project by simply installing it (e.g. with pip), either into the virtualenv associated with the project or globally.

I would go with the Python package; after all, this is what packages are for.

Python web app that can download itself

I'm writing a small web app, and I'd like it to include the ability to download itself. Ideally, users would be able to "pip install" the full app, while users of a running instance would be able to download a version of it to use themselves (perhaps with reduced functionality or without some of the less essential dependencies).
I'm currently using Bottle as I'd like to keep everything as close to the standard library as possible. Users could be on different platforms or Python versions, which are other reasons for minimising the use of extra modules. (Though I'll assume 2.7 or 3.3 will be in use regardless of platform).
My current thinking is to have the app use __file__ or similar and zip itself up. It could also use setuptools/distribute and call sdist on itself. Users could then execute the zip file, or install the app using the source distribution. (ideally I'd like to provide both of these options).
The app would include aggressive import checking to fall back to available modules, with Bottle being the only requirement (and it would be included in the downloaded file).
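The import fallback mentioned here could be sketched as below. The helper first_available is a hypothetical name, and the candidate modules passed to it are only examples.

```python
import importlib


def first_available(module_names):
    """Import and return the first module in module_names that is installed.

    Raises ImportError if none of the candidates can be imported.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed here; try the next candidate
    raise ImportError("none of these modules are available: %s"
                      % ", ".join(module_names))
```

For example, first_available(["cherrypy", "wsgiref.simple_server"]) would prefer CherryPy when it is installed and otherwise fall back to the standard library's WSGI server.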
Can anyone think of a robust approach to providing this functionality?
Update: users of the app cannot be guaranteed to have internet access at all times, hence the requirement for being able to download a version of the app from someone who has previously installed it. Python experience cannot be assumed either, hence the idea of letting users run python myApp.zip to run their own version.
Update II: as the level of python experience also cannot be guaranteed I'd want the simplest way for a user to get a mostly working version of the app. Experienced users would then be free to 'upgrade' the app by installing their own choice of additional modules. The vast majority of these would be different servers to host the app from (CherryPy, Twisted, etc) and so would not strictly count as a dependency but a "nice to have".
Update III: based on the answer below I will look into a PyPI/buildout based solution but would still be interested in whether there is a specific solution to the above approach.
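For the zip-based idea, the standard library's zipapp module (Python 3.5 and later, so newer than the versions mentioned above) can build a runnable archive from a source folder. This is a sketch under the assumption that the folder contains a __main__.py entry point; the function name and paths are illustrative.

```python
import zipapp


def build_runnable_zip(source_dir, target_path):
    """Pack source_dir into a zip archive that Python can execute directly.

    source_dir must contain a __main__.py, which runs when the archive
    is passed to the interpreter.
    """
    zipapp.create_archive(source_dir, target=target_path)
    return target_path
```

The resulting file can be started with python myApp.zip, and the running app could serve it as a download.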

Just package your app and put it on PyPI. Trying to automatically package the code running on the server seems over-engineered. Then you can let people use pip to install your app. In your app, provide a link to the PyPI page.
Then you can also declare dependencies in setup.py, and pip will install them for you. It seems like you are trying to build your own packaging infrastructure, but you don't have to. Use what's out there.

Change dependencies code on dotcloud. Django

I'm deploying my Django app with Dotcloud. While developing locally, I had to make changes inside the code of some dependencies (that are in my virtualenv).
So my question is: is there a way to make the same changes on the dependencies (for example django-registration or django_socketio) while deploying on dotcloud?
Thank you for your help.

There are many ways, but not all of them are clean/easy/possible.
If those dependencies are on github, bitbucket, or a similar code repository, you can:
fork the dependency,
edit your fork,
point to the fork in your requirements.txt file.
This will allow you to track further changes to those dependencies, and easily merge your own modifications with future versions.
Otherwise, you can include the (modified) dependencies with your code. It's not very clean and increases the size of your app, but that's fine too.
Last but not least, you can write a very hackish postinstall script, to locate the .py file to be modified (e.g. import foo ; foopath = foo.__file__), then apply a patch on that file. This would probably cause most sysadmins to cringe in terror, but it's worth mentioning :-)
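A sketch of that last approach, assuming a simple text replacement is enough. The function name and the old/new strings are hypothetical, and a real postinstall script would more likely apply a proper patch file. It locates the file with importlib.util.find_spec (Python 3.4+) instead of importing the module and reading its __file__ attribute.

```python
import importlib.util


def patch_installed_module(module_name, old, new):
    """Replace old with new in the source file of an installed module.

    Returns the path of the patched file. Raises ValueError if the
    module has no source file or the text to replace is not present.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin or not spec.origin.endswith(".py"):
        raise ValueError("no source file found for %s" % module_name)
    with open(spec.origin) as handle:
        source = handle.read()
    if old not in source:
        raise ValueError("expected text not found; wrong version installed?")
    with open(spec.origin, "w") as handle:
        handle.write(source.replace(old, new))
    return spec.origin
```

Failing when the expected text is missing guards against silently patching a different version of the dependency.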

If you are using a requirements.txt, then no, there is no way to do that from PyPI: Dotcloud simply downloads the packages you've specified from PyPI, and the changes you made within your virtualenv are obviously not reflected in the canonical versions of those packages.
In order to use the edited versions of your dependencies, you'll have to bundle them into your code like any other module you've written, and import them from there.
