Referencing an external library in a Python appengine project, using Pydev/Eclipse

I started developing in Python a couple of months ago, coming from a C# and Java background.
I'm currently working on two different Python/App Engine applications and, as often happens in such cases, both applications share common code, so I would like to refactor and move the common/generic code into a shared place.
In either Java or C# I'd just create a new library project, move the code into the new project and add a reference to the library from the main projects.
I tried the same in Python, but I am unable to make it work.
I am using Eclipse with Pydev plugin.
I've created a new Pydev project, moved the code, and attempted to:
reference the library project from the main projects (using Project Properties -> Project References)
add the library src folder to the main projects (in this case I get an error - I presume it's not possible to leave the project boundaries when adding an existing source folder)
add as external library (pretty much the same as google libraries are defined, using Properties -> External libraries)
Import as link (from Import -> File System and enabling "Create links in workspace")
In all cases I am able to reference the library code while developing, but when I start debugging, the appengine development server throws an exception because it can't find what I have moved into a separate library project.
Of course I've searched a lot for a solution, but it looks like nobody has experienced the same problem - or maybe nobody needs to do the same :)
The closest solution I've been able to find is to add an Ant script that zips the library sources and copies them into the target project - but this way debugging is a pain, as I am unable to step into the library code.
Any suggestion?
Needless to say, the proposed solution must take into account that the library code has to be included in the upload process to appengine...
Thanks

The dev_appserver and the production environment don't have any concept of projects or libraries, so you need to structure your app so that all the necessary libraries are under the application's root. The easiest way to do this, usually, is to symlink them in as subdirectories, or worst-case, to copy them (or, using version control, make them sub-repositories).
How that maps to operations in your IDE depends on the IDE, but in general, it's probably easiest to get the app structured as you need it on disk, and work backwards from that to get your IDE set up how you like it.
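As a rough sketch of the symlink approach, assuming a Unix-like system and purely hypothetical directory names (shared_lib and my_app are placeholders, not anything from the question), a small helper could create the link under the application's root:

```python
import os
import tempfile


def link_library(app_root, library_dir, link_name=None):
    """Symlink library_dir into app_root and return the path of the link."""
    link_name = link_name or os.path.basename(os.path.normpath(library_dir))
    link_path = os.path.join(app_root, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link left by an earlier run
    os.symlink(os.path.abspath(library_dir), link_path)
    return link_path


if __name__ == "__main__":
    # Demo in a scratch directory so that no real project is touched.
    workspace = tempfile.mkdtemp()
    library = os.path.join(workspace, "shared_lib")  # hypothetical library project
    app = os.path.join(workspace, "my_app")          # hypothetical app root
    os.makedirs(library)
    os.makedirs(app)
    print(link_library(app, library))
```

Because the link lives inside the app root, the library appears as an ordinary subdirectory of the application, while the sources stay in their own project.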

Related

django: taking DRY & code reuse to the next level?

In the django culture, I have encountered the concept of app reuse but not snippet reuse. Here is an example of what I mean by snippet reuse: I have a function getDateTimeObjFromString( sDateTime ), obviously you pass a string date time and it returns a python date-time object.
Back in the late 1980's or early 1990's, I was exposed to the idea of snippet reuse at a FoxPro developers conference. If you write code for a specific problem and find it is useful elsewhere in your project, move it to a project library. If you find that code is useful for other projects, move it to a generic library that can be accessed by all projects.
(At the FoxPro DevCon, they did not call it snippet reuse. I coined that term to make clear that I am referring to reuse of chunks of code smaller than an entire app. The FoxPro DevCon was long ago, I do not remember exactly what they called it.)
I read the most recent "Two Scoops of Django", and it does mention reusing snippets within a single project but I did not find any mention of the concept of snippet reuse across multiple projects.
I wrote and used getDateTimeObjFromString() long before I tackled my django app. It is in packages I keep under /home/Common/pyPacks. On my computers, I set PYTHONPATH=/home/Common/pyPacks, so every project can access the code there. The code for getDateTimeObjFromString() is under a Time subdirectory in a file named Convert.py. So to use the code in any project:
from Time.Convert import getDateTimeObjFromString
My django app downloads data from an API, and that data includes timestamps. It would be nice if the API sent python date time objects, but what you get are strings. Hence the utility of getDateTimeObjFromString().
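For illustration only, a helper of this kind might look like the sketch below. The format string is an assumption, since the actual timestamp layout of the API is not shown here, and the sFormat parameter is a hypothetical addition:

```python
from datetime import datetime

# Assumed timestamp layout; the real API's format is not given in this post.
DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"


def getDateTimeObjFromString(sDateTime, sFormat=DEFAULT_FORMAT):
    """Parse a date-time string into a datetime object."""
    return datetime.strptime(sDateTime, sFormat)


if __name__ == "__main__":
    print(getDateTimeObjFromString("2016-10-01 12:30:00"))
```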
This is just one example, there are many little functions under /home/Common/pyPacks that I found convenient to access in my django project.
Yes /home/Common/pyPacks are under version control in github and yes I deploy on any particular machine via git pull.
When working on my django project from a development computer, PYTHONPATH works, and I can import the packages. But then I tried running my django app on a server via wsgi.py -- PYTHONPATH is disabled. I can set PYTHONPATH both at the OS and Apache2 level, but python ignores it, and the packages cannot be imported.
I do not want to bother with making my personal generic library an official python package under PyPI.
Does the django community expect me to copy and paste?
I arrived at a work around: make /home/Common/pyPacks a pseudo site-package by putting a "pyPks" symlink in the virtual environment's site-packages directory to /home/Common/pyPacks, adding "pyPks" to the INSTALLED_APPS, then changing all the import statements as follows:
original:
from Time.Convert import getDateTimeObjFromString
work around update:
from pyPks.Time.Convert import getDateTimeObjFromString
I also had to update all my generic library files to handle both absolute imports via PYTHONPATH and relative imports.
How to fix “Attempted relative import in non-package” even with __init__.py
Is there a better way to reuse snippets in a django project?

Basics of setting up a Spyder workspace and projects

I have searched for a basic tutorial regarding workspaces and projects in the Spyder IDE. What I want to understand is the basic concepts of how to use the workspace and projects to organize my code. It seems that this is perhaps considered basic programming skill, and that is the reason why I have trouble finding any kind of overview. This page seems to be related, but is actually about Eclipse and rather sparse. The Pythonxy tutorial and the documentation for Spyder do not go into any detail. Neither does the Anaconda documentation.
The questions I have are:
When should I set up a new workspace (if ever)?
When do I create a new project?
How does the PYTHONPATH depend on my workspace and project settings? Is it the same in all cases or can I customize it per workspace/project?
Are there other settings apart from the PYTHONPATH that I should configure?
How specific are the answers above to Spyder? Would it be the same for other IDEs, like Eclipse?
I am running Spyder on 64-bit Windows 7, as part of the Anaconda package.

Update Oct 2016: Spyder 3 now has project facilities similar to those of other IDEs (especially RStudio).
Now, if you have a folder with scripts, you can go to
Projects > New Projects > Existing Directory
to import it. The selected directory will be set as the base directory for the project.

I use Spyder for data analysis and have just started using the project workspace. I believe it lets you write better code because of the organization it imposes. A previous post stated that "This can be helpful in web development", which is true, because web development requires good software engineering given the complexity of the files and how they interact with each other. The same organization and structure can be used in data analysis as well.
Often, data analysts that use Anaconda have an engineering or science background, not necessarily software engineering or computer science. This means that good software engineering principles may be missing (myself included). Setting up a workspace does one critical thing that I believe is missing from the discussion. It adds the workspace to the system path. Set up a project and then try
import sys
print(sys.path)
You will see your project's directory in the list, which means it is on the Python path. This means I can break up my project and import functions from different files within my project. This is highly beneficial when analysis becomes complex or you want to create some type of larger model that will be used on a regular basis. I can create all of my functions in one file, maybe functions for plots in another and then import them in a separate script file.
in myScript.py
from myFunctions import func1
from myFunctions import func2
from myPlots import histPlot
This is a much cleaner approach to data analysis and allows you to focus on one specific task at a time.
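To see why the path entry matters, here is a small self-contained sketch. It uses a scratch directory as a stand-in for the project folder and adds it to sys.path by hand, which is my assumption about what the project setup does for you; the body of func1 is invented for the demo:

```python
import importlib
import os
import sys
import tempfile

# Scratch directory standing in for a Spyder project folder (assumption).
project_dir = tempfile.mkdtemp()

# A tiny helper module playing the role of myFunctions.py.
with open(os.path.join(project_dir, "myFunctions.py"), "w") as handle:
    handle.write("def func1(x):\n    return x * 2\n")

# Done by hand here; with a project set up, the directory is already listed.
if project_dir not in sys.path:
    sys.path.insert(0, project_dir)
importlib.invalidate_caches()

# The import works only because project_dir is on sys.path.
from myFunctions import func1

print(func1(21))
```

Without the sys.path entry, the import line fails with an ImportError, which is exactly the situation the project setup avoids.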
IPython provides the %autoreload magic, so you can work on your functions, go back to your script file, and have the changed modules reloaded each time you fix an error. I haven't tried this yet, as the majority of my work is in 2.7, but it would seem to add even greater flexibility when developing.
So when should you do this? I think it is always a good idea, I just started using this setup and I will never go back!

In my experience, setting up a workspace in Spyder is not always necessary.
A workspace is a space on your computer where you create and save all the files you work in. Workspaces usually help in managing your project files.
Once you create a workspace in Spyder, a pane called "Project Explorer" opens up inside Spyder. There you see in real-time the files of your project. For instance, if you generate a file with Python, it will show in that pane.
The pane lets you keep the files organized, filter them, etc. This can be useful for web development, for example, because it helps you keep your content organized.
I use Python to handle files (e.g. csv) and work with data (data analysis), and I find no use in the workspace feature.
Moreover, if you delete a file in the Project Explorer pane, the file cannot be found in the Windows recycle bin.

One critical piece of information that appears to be missing from the Spyder documentation is how to create a new workspace in the first place. When no workspace exists after installing Spyder, creating your first project automatically initiates the creation of a workspace (at least in the Anaconda 3 distribution). However, it is not as obvious how to create a new workspace when a workspace already exists.
This is the only method I have found for creating a new workspace:
(1) Select the Project explorer window in Spyder. If this window or tab doesn't appear anywhere in the Spyder application, use View > Panes > Project explorer to enable the window.
(2) Click on the folder icon in the upper-right corner of the Project explorer window. This icon brings up a dialog that can create a new workspace. The dialog allows selection of a directory for the .spyderworkspace file.

GAE - Including external python modules without adding them to the repository?

I'm currently working on a Python-based Google App Engine project. Specifically, I'm using Flask for the application. I'm wondering what the accepted method of including external Python modules is, specifically when it comes to the repository. From what I can tell, including other people's code in my repository is bad form for several reasons. However, other people will be working on the same repository, so we should be using the same external modules to ensure the same results.
Specifically, I need to include Flask (and its dependencies) in my application. The easiest way to do this with Google App Engine is just to throw them into the root level:
MyProject/
    app.yaml
    main.py
    MyApp/
    Flask/
    ...
What is the proper way to bring in these external modules in such a project? Both a generalized answer and one specific to my case would be useful. Also, any other related recommendations would be appreciated. Thank you much.

While it is indeed possible to include third party libraries as submodules or symlinks from external repositories, in practice it's not a good idea. Here are two scenarios on what could go wrong:
If the third party library releases a new version that breaks functionality, you will have to either make all the changes needed to meet the new requirements, or find the previous version to keep working and break the external connection. Usually this happens when you are very close to a deadline.
If the third party library releases a new version and one of your colleagues upgrades and makes all the changes needed to support it, the code on your side will be broken until you upgrade as well.
These problems are much more visible in big projects with lots of dependencies, and as more people join the project they become a huge problem in the long run. I could come up with more examples, but I think you can see the point.
Your best option is to include the external libraries in your repository, which also has the advantage that you can get the whole project up and running on a new machine without many dependencies. There are many ways to organize your third party libraries, and all of them need to be placed at the same level as your app.yaml file or deeper. Just as #dragonx mentioned, include only the core library code.
Also, do not be afraid of putting things into your repository: space is not an issue today, and these libraries are usually not updated that often, so your repository will not grow much over time.
Since you mentioned Flask on Google App Engine, you can check out my gae-init project, where you can see in practice how the external libraries are organised.

You're actually asking two questions here.
How do I include the external library in my GAE project?
You've got the right idea. Whatever way you go about it, you must somehow include Flask and its dependencies in the root of your GAE project. One way is to put a copy directly in there.
The second way is to use a symbolic link to the folder that contains the external library. I'm not sure about Flask, but often times external repos contain the actual library code in a subdirectory - so often you don't want the root of the repo in your GAE app, just the root of the actual source. In this case, it's easier to put a symlink that links to the source folder.
How do I manage external libraries in my source repo?
This is a harder question to answer since it depends what source control tool you're using. Yes, you do want to have everyone use the same versions of external libraries, and they should be included in your source control somehow.
If you're using git, git submodule is the way to go. It's a bit confusing to start with but it'll get the job done.
I'd recommend a repo structure that looks something like this
repo/
    thirdparty/
        flask/
        other_dependency/
        another_dependency/
            README.TXT
            setup.py
            src/
    app/
        app.yaml
        your_source.py
        softlink_to_flask
        softlink_to_other_dependency
        softlink_to_another_dependency_src
In this example you keep the source to your external libraries in the thirdparty folder. These may be git submodules. In the app folder you have your source, and softlinks to the appropriate files that are actually needed for your app to run. In this case, the actual code for another_dependency may be in the another_dependency/src folder rather than the actual root of another dependency. This way you don't need to include the unnecessary files in your deployment folder, but you can still keep the entire library in your repo.

You can't just create a requirements.txt and upload it to GAE. Your code must include all pure-Python libraries that your project uses and that are not supported by GAE (https://developers.google.com/appengine/docs/python/tools/libraries27).
If you look at the Flask deployment examples for GAE (http://flask.pocoo.org/docs/quickstart/#deploying-to-a-web-server and https://github.com/kamalgill/flask-appengine-template), you will find dependencies such as flask and werkzeug, and all of these dependencies must be pushed to the GAE server.
So I see three solutions:
Use local requirements for local development, and write a custom build function that downloads all dependencies, puts them alongside your application, and uploads everything to the GAE server.
Add tools for local deployment when you start the project that put the required libraries alongside your application (don't forget about .gitignore).
Use something like git submodules pointing to the dependencies' repositories.

There are two cases for using third-party Python packages in a Google App Engine project:
If your library is one of the runtime-provided third-party libraries supported by GAE,
just add it to your app.yaml file under libraries
libraries:
- name: package_name
  version: latest
Add your code
import pack_name
Sometimes you need to install the package with
pip install package_name
Make sure you're using the right interpreter, by using
pip freeze
you can make sure the package is installed successfully to the right path.
Otherwise, if GAE does not support your library, you need to download it manually and save it locally under the root/Lib directory:
either through Git
or through pip (pip install package_name -t path/to/your/Lib/dir)
After that, declare the Lib directory as a source directory in PyCharm
PyCharm -> Preferences -> Project Structure
Choose Lib directory and mark it as source.
Then, import it.
import pack_name
Note that when you do the import, the package is picked up from the local path and not from your Python path.
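Outside the IDE, the same idea can be sketched in plain Python. This is a generic sys.path sketch under my own assumptions, not GAE's or PyCharm's mechanism; add_lib_dir, is_vendored and the module name pack_name_demo are hypothetical names made up for the example:

```python
import importlib
import os
import sys
import tempfile


def add_lib_dir(app_root, lib_name="Lib"):
    """Put app_root/lib_name at the front of sys.path and return its path."""
    lib_dir = os.path.join(app_root, lib_name)
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    importlib.invalidate_caches()
    return lib_dir


def is_vendored(module, lib_dir):
    """Return True if the module was loaded from inside lib_dir."""
    module_file = os.path.realpath(module.__file__)
    return module_file.startswith(os.path.realpath(lib_dir) + os.sep)


if __name__ == "__main__":
    # Scratch app root with a hypothetical vendored module, for demonstration.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "Lib"))
    with open(os.path.join(root, "Lib", "pack_name_demo.py"), "w") as handle:
        handle.write("VALUE = 1\n")
    lib = add_lib_dir(root)
    module = importlib.import_module("pack_name_demo")
    print(is_vendored(module, lib))
```

Checking the module's __file__ like this is a quick way to confirm that the import resolved to the local copy rather than to a copy installed elsewhere.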
In general, it is recommended to have a requirements.txt file that lists all the packages used; PyCharm will then recognize the uninstalled packages and suggest installing them.
Good Luck

Using Eclipse PyDev's search function with external libraries

I find PyDev's search function incredibly useful and use it regularly to navigate around my projects. I've got my interpreters set up correctly so PyDev knows about the external libraries that my code uses, and even lets me follow references into the library modules. This is great, obviously, but I also want to be able to search the external libraries like I can search my own code.
There's a similar question pertaining to Java development here: How do I search Libraries in eclipse?
Is there anything out there for PyDev?

I use two different approaches to allow searching in my library code:
When I am using virtualenv, I keep all my code under myproject/src and add it and myproject/lib/python2.7/site-packages/ as PyDev source folders. (Be sure to set your Python interpreter to myproject/bin/python as well.)
In other cases, I use two different PyDev projects. The first (myproject) includes my code. The second one is called myproject-lib and includes the libraries as its source paths (.../site_packages). The first project references the second project (and I usually keep both of them in one workspace). This works great with virtualenv, but I believe that you can also create a PyDev project for your system-wide Python. Make sure you use the same Python interpreter in both projects.
Now you can quickly and easily use Open Resource (CTRL+T) and the Globals Browser (CTRL+Shift+T) to lookup your libs.

I'm afraid PyDev doesn't support this yet. I created a feature request for this at https://jira.appcelerator.org/browse/APSTUD-7405. Meanwhile, you could link folders of external libraries to your project.

How to organise the file structure of my already working plugin system?

I am working on a project whose main design guiding principle is extensibility.
I implemented a plugin system by defining a metaclass that registers - with a class method - the class name of any plugin that gets loaded (each type of plugin inherits from a specific class defined in the core code, as there are different types of plugins in the application). Basically this means that a developer will have to define his class as
class PieChart(ChartPluginAncestor):
    # Duck typing:
    # Implement compulsory methods for Plugins
    # extending Chart functionality
and the main program will know of its presence because PieChart will be included in the list of registered plugins available at ChartPluginAncestor.plugins.
Since the mounting method is a class method, all plugins get registered when their class code is loaded into memory (so even before an object of that class is instantiated).
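To make the mechanism concrete, here is a minimal sketch of a registering metaclass, written in Python 3 syntax. It is not my actual code: it registers from the metaclass's __init__ rather than through a class method, and PluginMount and draw are names made up for the example.

```python
class PluginMount(type):
    """Metaclass that records every subclass of a plugin ancestor."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if not hasattr(cls, "plugins"):
            # First class built with this metaclass: the ancestor itself.
            cls.plugins = []
        else:
            # Any later class is a plugin; register it at definition time.
            cls.plugins.append(cls)


class ChartPluginAncestor(metaclass=PluginMount):
    """Base class that all chart plugins inherit from."""


class PieChart(ChartPluginAncestor):
    def draw(self):  # hypothetical compulsory method
        return "pie"


# No PieChart object has been created, yet the class is already registered.
print([plugin.__name__ for plugin in ChartPluginAncestor.plugins])
```

Each ancestor class gets its own plugins list, so different plugin types stay in separate registries.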
The system works good enough™ for me (although I am always open to suggestions on how to improve the architecture!) but I am now wondering what would be the best way to manage the plugin files (i.e. where and how the files containing the plugins should be stored).
So far I am using - for developing purposes - a package that I called "plugins". I put all my *.py files containing plugins classes in the package directory, and I simply issue import plugins in the main.py file, for all the plugins to get mounted properly.
EDIT: Jeff pointed out in the comments that with import plugins alone, the classes contained in the various modules of the package won't be readily available (I did not realise this as I was - for debugging purposes - importing each class separately with from plugins.myAI import AI).
However this system is only good while I am developing and testing the code, as:
Plugins might come with their own unittests, and I do not want to load those in memory.
All plugins are currently loaded into memory, but indeed there are certain plugins which are alternative versions of the same feature, so you really just need to know that you can switch between the two, but you want to load into memory just the one you picked from the config pane.
At some point, I will want to have a double location for installing plugins: a system-wide location (for example somewhere under /usr/local/bin/) and a user-specific one (for example somewhere under /home/<user>/.myprogram/).
So my questions are really - perhaps - three:
Plugin container: what is the most sensible choice for my goal? Single files? Packages? A simple directory of .py files?
Recognise the presence of plugins without necessarily loading (importing) them: what is a smart way to use Python introspection to do so?
Placing plugins in two different locations: is there a standard way / best practice (under gnu/linux, at least) to do that?

The question is hard to address, because the needs are complex.
Anyway I will try with some suggestions.
About
"Placing plugins in two different locations: is there a standard way / best practice (under gnu/linux, at least) to do that?"
A good approach is virtualenv. Virtualenv is a Python tool for building "isolated" Python installations. It is a better way to get separate projects working side by side.
You get a brand new site-packages directory where you can put your plugins together with the relevant project modules.
Give it a try: http://pypi.python.org/pypi/virtualenv
"Plugin container: what is the most sensible choice for my goal? single files? packages? a simple directory of .py files?"
A good approach is a Python package which can do a "self registration" upon import: simply define a proper __init__.py inside the package directory.
An example can be http://www.qgis.org/wiki/Writing_Python_Plugins
and also the API described here http://twistedmatrix.com/documents/current/core/howto/plugin.html
See also http://pypi.python.org/pypi/giblets/0.2.1
"Giblets is a simple plugin system based on the component architecture of Trac. In a nutshell, giblets allows you to declare interfaces and discover components that implement them without coupling.
Giblets also includes plugin discovery based on file paths or entry points along with flexible means to manage which components are enabled or disabled in your application."

I also have a plugin system with three types of plugins, though I don't claim to have done it well. You can see some details here.
For internal plugins, I have a package (e.g., MethodPlugins) and in this package is a module for each plugin (e.g., MethodPlugins.IRV). Here is how I load the plugins:
Load the package (import MethodPlugins)
Use pkgutil.iter_modules to load all the modules there (e.g., MethodPlugins.IRV)
All the plugins descend from a common base class, so I can use __subclasses__() to identify them all.
I believe this would allow you to recognize plugins without actually loading them, though I don't do that as I just load them all.
For external plugins, I have a specified directory where users can put them, and I use os.listdir to import them. The user is required to use the right base class so I can find them.
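The internal-plugin steps above can be sketched roughly as follows. This is a simplified illustration rather than my actual code; list_plugin_names, load_plugins and the demo_plugins package are names invented for the example. Note that pkgutil.iter_modules only lists module names, so the first function finds plugins without importing them.

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def list_plugin_names(package):
    """Names of the modules in a package, found without importing them."""
    return sorted(name for _finder, name, _ispkg
                  in pkgutil.iter_modules(package.__path__))


def load_plugins(package, base_class):
    """Import every module in the package, then return the direct subclasses."""
    for name in list_plugin_names(package):
        importlib.import_module(package.__name__ + "." + name)
    return base_class.__subclasses__()


if __name__ == "__main__":
    # Build a throwaway package with one plugin module, for demonstration.
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_plugins")
    os.makedirs(pkg_dir)
    files = {
        "__init__.py": "class Base(object):\n    pass\n",
        "irv.py": "from demo_plugins import Base\n\nclass IRV(Base):\n    pass\n",
    }
    for filename, source in files.items():
        with open(os.path.join(pkg_dir, filename), "w") as handle:
            handle.write(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    package = importlib.import_module("demo_plugins")
    print(list_plugin_names(package))
    print([cls.__name__ for cls in load_plugins(package, package.Base)])
```

One limitation of this sketch: __subclasses__() returns only direct subclasses, so plugins that inherit from other plugins would need a recursive walk.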
I would be interested in improving this as well, but it also works good enough for me. :)
