For my GAE app I need to do some natural language processing to extract the subject and object from an input sentence.
Apparently NLTK can't be installed (easily) on GAE so I am looking for another solution.
I noticed GAE comes with Antlr3, but from browsing its documentation it seems to solve a different kind of grammar problem.
Any ideas?
You can easily build an NLTK RPC server on some machine and access it.
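A minimal sketch of what that server could look like, using only the standard library's xmlrpc.server (the host, port and function name here are placeholders; it assumes NLTK and its data packages are installed on that machine):

# rpc_server.py - runs on the machine where NLTK is installed (pip install nltk).
# Also fetch the tokenizer/tagger data once:
#   python -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"
from xmlrpc.server import SimpleXMLRPCServer

import nltk

def pos_tag(sentence):
    # Tokenize and part-of-speech tag a sentence; subject/object extraction
    # can then be done over the tagged output.
    return nltk.pos_tag(nltk.word_tokenize(sentence))

server = SimpleXMLRPCServer(("0.0.0.0", 8000))
server.register_function(pos_tag, "pos_tag")
server.serve_forever()

Your GAE app can then call it over HTTP, e.g. xmlrpc.client.ServerProxy("http://your-server:8000").pos_tag("The dog chased the cat").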
Another option is to find a web-based service that already does this (such as OpenCalais).
With regards to the NLTK problem specifically, my solution would probably be to fix the weird imports that NLTK is doing, and use that as originally planned. When you're done, submit a patch of course.
That said, if this ultimately involves touching the data store, the answer is that it probably can't be done in a performant way, unless your data set is small or for some reason your NLP stuff doesn't need to hit some kind of full-text index. The GAE guys are working on it, but they have indicated that no one should be expecting a quick resolution to this particular issue.
Firstly, apologies for the very basic question. I have looked into other answers but they haven't quite answered what I'm after. I'm confident designing a site in HTML/CSS and have very very basic knowledge of Python.
I want to run a very basic Python script on my website. It analyses tweets about a specific topic, and then posts a sentiment analysis score. I want it to run this sentiment analysis every hour and cache the score.
I have a working Python script which does this in Jupyter Notebook. Could you give me an overview of how I would make this script function online and cache the results? I've read into using Python web frameworks, but from my limited understanding, they seem like overkill?
Thank you for your help!
Could you give me an overview of how I would make this script function online
The key thing is to decouple the two parts of your system:
Producing the data
Showing it in a website.
So the first thing to do is have your sentiment-analysis script push its value to a database. The database could be something as simple as a csv file, or it could be a key/value store, or something like MySQL or CouchDB (or hundreds of other choices).
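For example, a hypothetical sketch (run_sentiment_analysis() stands in for your existing notebook code, and sqlite is just one easy choice of store):

# update_score.py - run hourly, e.g. from cron:  0 * * * * python update_score.py
import sqlite3
import time

def save_score(score, db_path="scores.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS scores (ts INTEGER, score REAL)")
    conn.execute("INSERT INTO scores VALUES (?, ?)", (int(time.time()), score))
    conn.commit()
    conn.close()

score = run_sentiment_analysis()  # your existing Jupyter logic, wrapped in a function
save_score(score)

Running it from cron (or any scheduler) gives you the "every hour" part, and the stored row is your cache.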
Over on the website you have to make a decision between:
Server-side
Client-side
If the former, you could program in Python if that is what you are most familiar with. Whatever language/framework combination you go for, there will be an example tutorial showing how to read a value from a database and display it: it is just about the most fundamental thing (there is a short sketch of this below).
If client-side you will usually be programming in JavaScript. Again you need to choose a framework, but again you should easily be able to find a tutorial to follow.
(Unless you have a good reason to prefer server-side, such as familiarity with an existing framework, or security issues with accessing your database, I'd go with a client-side approach.)
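If you do go server-side, the "read a value and display it" step really is small. A sketch assuming Flask and the scores.db file from the earlier snippet:

# app.py - serves the most recent cached score; nothing is recomputed per request.
import sqlite3

from flask import Flask

app = Flask(__name__)

@app.route("/")
def show_score():
    conn = sqlite3.connect("scores.db")
    row = conn.execute("SELECT score FROM scores ORDER BY ts DESC LIMIT 1").fetchone()
    conn.close()
    return "No score yet" if row is None else "Latest sentiment score: %.2f" % row[0]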
I've read into using Python web frameworks... overkill?
Yes and no. You are going to need some kind of database, and some kind of framework. It would be good to understand the basics of web security, too. If the sentiment analysis is your major goal, all that is going to be a distraction, and it might be better to find a friend who already knows web programming to work with. Or just find a tutorial that is very close to what you want to do, and adapt that.
(P.S. I was going to flag your question as "too broad", but you did ask for an overview, so I hope this helps.)
I'm tasked with creating the Google Maps store locator for our website, and so far all I've been able to find is old PHP tutorials and some new App Engine apps.
The apps look great. They seem to function as designed, and it looks like this is the way I need to proceed. I even found demos here and here, and both are perfect.
The problem is that I'm not yet at the level to understand them, so I can't really learn from them and start implementing my own app for our stores. I do plan on using them to learn, but at the moment I'm not getting much out of examining the code.
Is there anything I can use in the meantime, as a plugin option, while I learn this? Are there any Python tutorials out there hiding somewhere? I can learn from these demos, but I really need something for the time being while I'm figuring it all out.
This demo from 2008 might be a bit old, but it will put you on the right track.
There is also a store locator in geodatastore. Demo
I'm working on a project that is quite search-oriented. Basically, users will add content to the site, and this content should be immediately available in the search results. The project is still in development.
Up until now, I've been using Haystack with Xapian. One thing I'm worried about is the performance of the website once a lot of content is available. Indexing will have to occur very frequently if I want to emulate real-time search.
I was reading up on MongoDB recently. I haven't found a satisfying answer to my question, but I have the feeling that MongoDB might be of help for the real-time search indexing issue I expect to encounter. Is this correct? In other words, would the search functionality available in MongoDB be more suited for a real-time search function?
The content that will be available on the site is large unstructured text (including HTML) and related data (prices, tags, datetime info).
Thanks in advance,
Laundro
I don't know much about MongoDB, but I'm using Sphinx Search with great success - a simple, powerful and very fast tool for full-text indexing and search. It also provides a Python wrapper out of the box.
It would be easier to pick up if Haystack provided bindings for it; unfortunately, Sphinx bindings are still on the wish list.
Nevertheless, setting Sphinx up is so quick (I did it in a few hours, for an existing in-production Django-based CRM) that maybe you can give it a try before switching to a more generic solution.
MongoDB is not really a "dedicated full text search engine". Based on its full-text search docs, you can only create an array of tags that duplicates the string data or other columns, which with many elements (hundreds or thousands) can make inserts very expensive.
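To make that concrete, the workaround those docs describe amounts to something like this (a sketch using pymongo; the collection and field names are made up):

from pymongo import MongoClient
import re

articles = MongoClient().mydb.articles
articles.create_index("_keywords")  # multikey index over the tag array

def save_article(title, body):
    # Every word of the text is duplicated into _keywords; this duplication
    # is the insert-time cost mentioned above for large documents.
    keywords = list(set(re.findall(r"\w+", (title + " " + body).lower())))
    articles.insert_one({"title": title, "body": body, "_keywords": keywords})

save_article("Real-time search", "User content should be searchable immediately.")
print(articles.find_one({"_keywords": "searchable"})["title"])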
I agree with Tomasz: Sphinx Search can do what you need. Use real-time indexes if you want it to be truly real-time, or delta indexes if a delay of several seconds is acceptable.
I would like to provide my Python GAE website in the user's own language, using only the tools available directly in App Engine. For that, I would like to use GNU gettext files (.po and .mo files).
Has someone successfully combined Python Google App Engine and gettext files? If so, could you please provide the steps you used?
I had started a discussion in GAE's Google group, but couldn't extract from it how to do what I want: I don't want to add external dependencies, like Babel (suggested in the discussion). I want to use plain vanilla Google App Engine, so no manual updates of Django or that kind of thing.
At first, I will start using the language sent by the browser, so no need to manually force the language by using cookies etc. However, I might add a language changing feature later, once the basic internationalization works.
As a background note to give you more details about what I'm trying to do, I would like to internationalize Issue Tracker Tracker, an open source application I've hosted on Launchpad. I plan to use Launchpad's translation platform (explaining why I'd like to use .mo files). You can have a look at the source code in its Bazaar branch (sorry no link due to stackoverflow spam prevention limit for new users...)
Thanks for helping me advance on this project!
As my needs were simple, I used a quick hack instead of the (unavailable) gettext. I created a file with string translations, translate.py, approximately like this:
en={}
ru={}
en['default_site_title']=u"Site title in English"
ru['default_site_title']=u"Название сайта по-русски"
Then in the main code I defined a function which returns the dictionary of translations for the most suitable language from a list (the first language that has a translation is used, with English as the fallback):
import translate
def get_messages(languages=[]):
    msgs = translate.en
    for lang in languages:
        if hasattr(translate, lang):
            msgs = getattr(translate, lang)
            break
    return msgs
Usage:
msgs = get_messages(["it","ru","en"])
hi = msgs['hello_message'] % 'yourname'
I also defined a helper function which extracts a list of languages from Accept-Language header.
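In case it's useful, one way such a helper might look (a sketch; real Accept-Language parsing has more corner cases):

def parse_accept_language(header):
    # Turn e.g. "ru,en-US;q=0.8,en;q=0.6" into ["ru", "en"] (highest q first).
    weighted = []
    for part in header.split(","):
        pieces = part.strip().split(";")
        code = pieces[0].split("-")[0].strip().lower()  # "en-US" -> "en"
        q = 1.0
        for piece in pieces[1:]:
            if piece.strip().startswith("q="):
                try:
                    q = float(piece.strip()[2:])
                except ValueError:
                    pass
        weighted.append((q, code))
    result = []
    for q, code in sorted(weighted, reverse=True):
        if code and code not in result:
            result.append(code)
    return result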
It's not the most flexible solution, but it doesn't have any external dependencies and works for me (in a toy project). I think translate.py may be generated automatically from gettext files.
In case you want to see more, my actual source is here.
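For the original goal of reading real .mo files with no external dependencies, note that Python's standard-library gettext module can load compiled catalogs directly, as long as they are deployed with the app. A minimal sketch, assuming catalogs at locale/<lang>/LC_MESSAGES/messages.mo:

import gettext

def get_translator(languages):
    # Return a gettext-style _() for the first requested language
    # that has a compiled catalog.
    try:
        return gettext.translation("messages", localedir="locale",
                                   languages=languages).gettext
    except OSError:  # no .mo file found for any requested language
        return lambda s: s  # fall back to the untranslated msgid

_ = get_translator(["ru", "en"])
print(_("Site title"))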
You can use the Django internationalisation tool, like explained here.
They are also saying that there is no easy way to do this.
I hope that helps you :)
I'm wondering about the best way to build an interface to Yahoo Chat; I haven't found anything that looks particularly easy yet. One thought is to build it all from scratch; another is to reuse code from open source software. I could use something like zinc, but that may be more complex than it needs to be. Another option would be to find a library that supports it, but I haven't seen one. What are your thoughts on how to proceed, and which way would be best? I'm not necessarily looking for the fastest route, as this is a bit of a learning project for me.
Python-purple is a Python API for accessing libpurple, the Pidgin backend. It will give you access to all the IM networks Pidgin supports, including Y!Messenger, MSN Messenger, Jabber/GTalk/XMPP, and more...