Is anyone using NLTK with GAE? From this thread it appears that GAE does not support NLTK (Special installation tricks needed.) Do you know any other lightweight similar Python module? Thanks.
GAE supports pretty much any "pure" Python modules which don't try to access files or sockets or other system level utilities. The poster from your link was mostly just trying to minimize the number of modules they included. They expressed a trial and error approach to figuring out which NLTK modules would be needed for their application. A slightly faster approach would be to download the whole NLTK package and move in all the ".py" files rather than just one at a time. There's no big downside to including modules you won't be using.
However this process is something of a fact of life with GAE. Any modules that aren't directly included in the GAE libraries need to be installed manually, and they need to be checked for any deviations from the GAE sandbox restrictions. See this.
A quick glance at the NLTK source code suggests that modules that depend on "mallet" in particular might be problematic, since this is compiled Java code.
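As a rough first pass before uploading a package, you could scan it for files that are clearly not pure Python. This is only a sketch: the extension list and the function name find_suspect_files are my own assumptions, not an official GAE check, and it will not catch pure-Python code that opens sockets or writes files.

```python
import os

# Extensions that usually indicate compiled or non-Python artifacts.
# This list is an assumption for illustration, not an official GAE rule.
SUSPECT_EXTENSIONS = {".so", ".pyd", ".dll", ".dylib", ".jar", ".class"}


def find_suspect_files(package_dir):
    """Return sorted relative paths of files that look non-pure-Python."""
    hits = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in SUSPECT_EXTENSIONS:
                full_path = os.path.join(root, name)
                hits.append(os.path.relpath(full_path, package_dir))
    return sorted(hits)
```

Anything this reports (a .jar under the package tree, for instance) is a candidate for leaving out or testing separately.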
Would it be possible to create a python module that lazily downloads and installs submodules as needed? I've worked with "subclassed" modules that mimic real modules, but I've never tried to do so with downloads involved. Is there a guaranteed directory that I can download source code and data to, that the module would then be able to use on subsequent runs?
To make this more concrete, here is the ideal behavior:
User runs pip install magic_module and the lightweight magic_module is installed to their system.
User runs the code import magic_module.alpha
The code goes to a predetermined URL, is told that there is an "alpha" subpackage, and is then given the URLs of alpha.py and alpha.csv files.
The system downloads these files to somewhere that it knows about, and then loads the alpha module.
On subsequent runs, the user is able to take advantage of the downloaded files to skip the server trip.
At some point down the road, the user could run something like import magic_module.alpha; magic_module.alpha._upgrade() from the command line to clear the cache and get the latest version.
Is this possible? Is this reasonable? What kinds of problems will I run into with permissions?
Doable, certainly. The core feature will probably be import hooks. The relevant module would be importlib in python 3.
Extending the import mechanism is needed when you want to load modules that are stored in a non-standard way. Examples include [...] modules that are loaded from a database over a network.
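To give an idea of what an import hook looks like, here is a minimal Python 3 sketch of a meta path finder that serves magic_module.<name> from a cache directory and "downloads" on a miss. Everything named here is hypothetical: FAKE_SERVER is an in-memory stand-in for the remote server, fetch_source would be an HTTPS request in real life, and the parent package is faked in sys.modules only so the example runs without installing anything. It ignores data files, upgrades, and every caveat listed further down.

```python
import importlib.abc
import importlib.util
import os
import sys
import tempfile
import types

# Hypothetical stand-in for the remote server: submodule name -> source text.
FAKE_SERVER = {"alpha": "VALUE = 42\n"}


def fetch_source(name):
    """Pretend to download a submodule; a real version would use HTTPS."""
    return FAKE_SERVER[name]


class LazyFinder(importlib.abc.MetaPathFinder):
    """Resolve magic_module.<name> from a cache directory, fetching on a miss."""

    prefix = "magic_module."

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.prefix):
            return None  # not ours; let the normal finders handle it
        name = fullname[len(self.prefix):]
        if name not in FAKE_SERVER:
            return None
        cached = os.path.join(self.cache_dir, name + ".py")
        if not os.path.exists(cached):
            # Cache miss: fetch once and keep the file for later imports.
            with open(cached, "w") as handle:
                handle.write(fetch_source(name))
        return importlib.util.spec_from_file_location(fullname, cached)


# A fresh temp dir keeps the demo self-contained; a real package would need a
# stable per-user directory so the cache survives between runs.
cache_dir = tempfile.mkdtemp(prefix="magic_module_")

# Fake the installed parent package; __path__ marks it as a package.
parent = types.ModuleType("magic_module")
parent.__path__ = []
sys.modules["magic_module"] = parent

sys.meta_path.insert(0, LazyFinder(cache_dir))

import magic_module.alpha

print(magic_module.alpha.VALUE)  # prints 42
```

After the first import, alpha.py sits in the cache directory, so a later import with the same cache directory would skip fetch_source entirely.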
Convenient, probably not. The import machinery is one of the parts of python that has seen several changes over releases. It's undergoing a full refactoring right now, with most of the existing things being deprecated.
Reasonable, well it's up to you. Here are some caveats I can think of:
Tricky to get right, especially if you have to support several python versions.
What about error handling? Should applications be prepared for an import to fail in normal circumstances? Should they degrade gracefully? Or just crash and spew a traceback?
Security? Basically you're downloading code from someplace, how do you ensure the connection is not being hijacked?
How about versioning? If you update some of the remote modules, how can you make the application download the correct version?
Dependencies? Pushing of security updates? Permissions management?
Summing it up, you'll have to solve most of the issues of a package manager, along with securing downloads and permissions issues of course. All those issues are tricky to begin with, easy to get wrong with dire consequences.
So with all that in mind, it really comes down to how much resources you deem worth investing into that, and what value that adds over a regular use of readily available tools such as pip.
(the permission question cannot really be answered until you come up with a design for your package manager)
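On the security point, one common piece of the puzzle is to check each downloaded file against a known digest before importing it. The sketch below assumes the lightweight installed package ships with (or securely obtains) a SHA-256 digest per file; that assumption and the name verify_download are mine. It only covers integrity of the download, not versioning, dependencies, or permissions.

```python
import hashlib
import hmac


def verify_download(payload, expected_sha256):
    """Return True if payload (bytes) hashes to the expected hex digest."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A loader would refuse to write the file to its cache, and raise ImportError, whenever this returns False.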
I know this is possible to do using additional libraries such as win32com or python-pptx, but I was wondering if anyone knew of a way to insert an image into a PowerPoint slide using only the standard library. Lots of googling has indicated that the best solution is probably win32com, but since I cannot guarantee that every system this script will be deployed to will have win32com, I am looking for an implementation that relies only on libraries every system with a standard Python 2.7 install will have.
It is probably possible to modify a .pptx file with the standard library without much effort: this new generation of files is meant to be zip-compressed XML plus external image files, and can be handled by the zipfile module and standard XML parsers.
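As a starting point, here is a sketch of the zipfile half of that idea: copying a .pptx archive while adding an image under ppt/media/. The function names are my own. Note that this alone will not make the picture show up; as far as I know the slide XML also needs a picture element, and the slide's relationships part and [Content_Types].xml need matching entries, which is not shown here.

```python
import zipfile


def list_slides(src):
    """Return the slide XML parts inside a .pptx (path or file-like object)."""
    with zipfile.ZipFile(src) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith("ppt/slides/") and name.endswith(".xml")
        )


def add_media_file(src, dst, media_name, image_bytes):
    """Copy the archive at src to dst, adding image_bytes as ppt/media/<media_name>."""
    with zipfile.ZipFile(src) as zin:
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            # Zip archives cannot be edited in place, so copy every entry over.
            for item in zin.infolist():
                zout.writestr(item, zin.read(item.filename))
            zout.writestr("ppt/media/" + media_name, image_bytes)
```

Both functions accept either a filename or an open binary file object, so they can be tried on a copy of a real presentation without touching the original.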
Legacy .ppt files, however, are a closed binary format, with little documentation and hundreds of corner cases. It would always "be possible" to change them, since they are still just bytes, but it would take considerable effort.
That said, starting with Python 3.4, the package installer pip ships by default with the language install. Probably the best way to go would be to script the installation of external libraries with the built-in pip; that way one would not have to avoid external libraries altogether.
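Scripting that could look roughly like the following. This is a sketch for Python 3 with pip available; ensure_module is a name I made up, and it assumes the machine has network access and permission to install packages.

```python
import importlib
import subprocess
import sys


def ensure_module(module_name, package_name=None):
    """Import module_name, installing it with pip first if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Run pip through the current interpreter so the package lands in
        # the same environment this script is running in.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", package_name or module_name]
        )
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

For the library mentioned in the question this would be called as ensure_module("pptx", "python-pptx"), since the package name on the index differs from the import name.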
New to Python, so excuse my lack of specific technical jargon. Pretty simple question really, but I can't seem to grasp or understand the concept.
It seems that a lot of modules require using pip or easy_install and running setup.py to "install" into your Python installation or your virtualenv. What is the difference between installing a module and simply taking it and importing it into another script? It seems that you access the modules the same way.
Thanks!
It's like the difference between:
Uploading a photo to the internet
Linking the photo URL inside an HTML page
Installing puts the code somewhere python expects those kinds of things to be, and the import statement says "go look there for something named X now, and make the data available to me for use".
For a single module, it usually doesn't make any difference. For complicated webs of modules, though, an installation program may do many things that wouldn't be immediately obvious. For example, it may also copy data files into locations where the new modules can find them, put executables (binary libraries, or DLLs on Windows, for example) where the new modules can find them, do different things depending on which version of Python you have, and so on.
If deploying a web of modules were always easy, nobody would have written setup programs to begin with ;-)
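The "somewhere python expects" part can be seen directly with sys.path. In this small sketch (the module name demo_mod_xyz is made up), the same file is invisible to import until its directory is on the search path, which is essentially what installing into site-packages arranges for you.

```python
import importlib
import os
import sys
import tempfile

# Write a one-line module into a directory Python does not know about.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "demo_mod_xyz.py"), "w") as handle:
    handle.write("GREETING = 'hello'\n")

# The import fails because the directory is not on sys.path.
try:
    importlib.import_module("demo_mod_xyz")
    found_before = True
except ImportError:
    found_before = False

# Putting the directory on sys.path makes the very same file importable.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
demo = importlib.import_module("demo_mod_xyz")

print(found_before, demo.GREETING)  # prints: False hello
```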
Is there a place that lists standard library and 3rd party modules that work with IronPython? If not, please let me know here.
Here are some I have tried:
cherrypy - works with fepy - example
comtypes: has no hope until ctypes is functional which seems far off still.
dulwich: builds after removing optional extensions from setup.py file. Imports after adding in jdhardy's zlib and subprocess modules. Seems to pass its own tests.
numpy, parts of scipy: pytools
rpyc works out of the box. Awesome library, so you can remotely use CPython objects from IronPython and vice versa.
The official IronPython website has a page that lists the compatibility status of third-party libraries.
However, currently only two libraries are listed. If you're a third-party library developer and you know how well your library works with IronPython, it would be great to add it there.
pywin32 and PyODBC go away, replaced by the FCL (optionally using one of the ODBC .NET data providers if you're married to ODBC). CherryPy is pure Python and so should mostly work; I'm sure the developers would be interested in hearing about any problems. For NumPy there's Ironclad.
numpy and parts of scipy now work with ironpython: http://pytools.codeplex.com/
I am new at writing APIs in Python, or in any language for that matter. I was hoping to get pointers on how I can create an API that can be installed using the setup.py method and used in other Python projects. Something similar to the twitterapi.
I have already created and coded all the methods I want to include in the API. I just need to know how to implement the installation so others can use my code to build on ideas they may have, or whether I need to format the code a certain way to facilitate installation.
I learn best with examples or tutorials.
Thanks so much.
It's worth noting that this part of python is undergoing some changes right now. It's all a bit messy. The most current overview I know of is the Hitchhiker's Guide to Packaging: http://guide.python-distribute.org/
The current state of packaging section is important: http://guide.python-distribute.org/introduction.html#current-state-of-packaging
The python packaging world is a mess (like poswald said). Here's a brief overview along with a bunch of pointers. Your basic problem (using setup.py etc.) is solved by reading the distutils guide which msw has mentioned in his comment.
Now for the dirt. The basic infrastructure for distribution in the Python standard library is distutils, referred to above. It's limited in some ways, so a series of extensions called setuptools was written on top of it. Besides extending the functionality, setuptools provided a command-line installer called "easy_install".
Setuptools maintenance was not too great, so it was forked, and a more active branch called "distribute" was set up; it is the preferred alternative right now. In addition to this, a replacement for easy_install named pip was created, which is more modular and useful.
Now there's a huge project going which attempts to fold in all changes from distribute and stuff into a unified library that will go into the stdlib. It's tentatively called "distutils2".
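Since the question asked for an example: a minimal layout is just a package directory next to a setup.py. The sketch below writes such a skeleton to disk so you can look at it; the project name myapi, the version, and the description are placeholders I picked, and the setup.py uses setuptools' setup() and find_packages() rather than anything specific to your code.

```python
import os
import tempfile

# Contents of a minimal setup.py; "myapi" is a hypothetical project name.
SETUP_PY = '''\
from setuptools import setup, find_packages

setup(
    name="myapi",
    version="0.1.0",
    description="Example API wrapper",
    packages=find_packages(),
)
'''


def write_skeleton(root):
    """Create root/myapi/__init__.py and root/setup.py; return root's entries."""
    package_dir = os.path.join(root, "myapi")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
        handle.write("# the public functions of the API go here\n")
    with open(os.path.join(root, "setup.py"), "w") as handle:
        handle.write(SETUP_PY)
    return sorted(os.listdir(root))


project_root = tempfile.mkdtemp()
print(write_skeleton(project_root))  # prints ['myapi', 'setup.py']
```

From inside that directory, python setup.py install (or pip install .) installs the package, after which other projects can simply import myapi.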