Elasticsearch client for Python: advice?

I'm writing some scripts for our sales people to query an Elasticsearch index through Python. (Eventually the script will update lead info in our Salesforce DB.)
I have been using the urllib2 module, with simplejson, to pull results. The problem is that this approach doesn't seem to be holding up: the scripts are taking longer and longer to run.
Questions:
Does anyone have any opinions (opinions, on the internet???) about Elasticsearch clients for Python? Specifically, I've found pyes and pyelasticsearch via elasticsearch.org. How do these two stack up?
How good or bad is my current approach of dynamically building the query and running it via self.raw_results = simplejson.load(urllib2.urlopen(self.query))?
Any advice is greatly appreciated!

We use pyes, and it's pretty neat. It also lets you use the Thrift protocol, which is faster than the REST interface.
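For reference, a minimal sketch of basic pyes usage, assuming a local node (the host, index, and field names here are hypothetical; a node with the Thrift transport plugin typically listens on port 9500, plain REST on 9200):

    import pyes

    conn = pyes.ES("127.0.0.1:9500")
    query = pyes.TermQuery("company", "acme")

    # Depending on the pyes version, the keyword is indices= or indexes=,
    # and the result is either an iterable ResultSet or a plain dict.
    results = conn.search(query, indices=["leads"])
    for hit in results:
        print hit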

It sounds like you have an issue unrelated to the client. If you can pare down what's being sent to ES and represent it as a simple curl command, it will make what's actually running slowly more apparent. I suspect we just need to tweak your query to make sure it's optimal for your context.
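For example, here is a rough way to time the round-trip of the urllib2 call from the question against a pared-down query (the host, index, and field names are hypothetical):

    import time
    import urllib2
    import simplejson

    url = "http://localhost:9200/leads/_search"
    body = simplejson.dumps({"query": {"term": {"company": "acme"}}})

    # Time just the HTTP round-trip to see whether the query itself
    # is the slow part.
    start = time.time()
    raw_results = simplejson.load(urllib2.urlopen(url, body))
    print "took %.3fs, %s total hits" % (
        time.time() - start, raw_results["hits"]["total"])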

Related

How to turn a simple Python script into a basic web app?

I would like to know what is the fastest way to turn a simple Python script into a basic web app.
For example, say I would like to create a web app that takes a keyword from the user and displays the most retweeted tweet on Twitter. If I write a Python script that is capable of performing that task using Twitter's API, how would I go about turning it into a web app for people to access?
I have looked at frameworks such as Django, but it would take me weeks or months to learn how to use it. I just need something quick and simple. Any such alternatives?
Make a CGI script out of it. You basically get the request information from the web server via environment variables, and you print the desired HTML to stdout. Helper libraries such as Werkzeug can abstract away the handling of the environment variables by wrapping them in a Request object.
This technique is quite outdated and isn't normally used nowadays, as the script has to be run on every request and thus incurs the startup cost every time.
Nevertheless, it may actually be a good solution for you, because it is quick and every web server supports it.
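To make that concrete, a minimal sketch of such a CGI script (Python 2, to match the rest of this thread; the Twitter lookup is stubbed out, since it depends on your existing code):

    #!/usr/bin/env python
    import cgi

    # Read the submitted form fields from the request.
    form = cgi.FieldStorage()
    keyword = form.getvalue("keyword", "")

    # Call into your existing script here; this stub stands in for the
    # actual Twitter lookup.
    result = "most retweeted tweet for %r would go here" % keyword

    # A CGI script responds by printing headers, a blank line, then the body.
    print "Content-Type: text/html"
    print
    print "<html><body><p>%s</p></body></html>" % cgi.escape(result)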

Google Apps Python script

I want to be able to give out a form with essentially 4 inputs. Each time it is submitted, I would like it to trigger a Python script.
I'm not totally sure where to start here. I could have the Python code live on a server somewhere and have the Google Apps Script trigger it, but ideally I could do this without having to host my code somewhere else. I also would like to avoid paying for anything...
Any and all advice would be appreciated. Please assume I have only a small amount of knowledge about this kind of stuff.
Check out this tutorial in the App Engine documentation:
https://developers.google.com/appengine/docs/python/gettingstartedpython27/introduction
It will help you get set up with Python and webapp2 (for your forms!) and with storing the data to the datastore if you wish. You could expand and modify the guestbook tutorial they walk you through so that your application/script does exactly what you need. It's an excellent tutorial to get started with, even if you don't have much knowledge of Python or App Engine.
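To give a flavour of where the tutorial leads, here is a rough webapp2 sketch of a four-input form handler (the field names and the datastore model are hypothetical, not from the tutorial):

    import webapp2
    from google.appengine.ext import db

    class Submission(db.Model):
        field1 = db.StringProperty()
        field2 = db.StringProperty()
        field3 = db.StringProperty()
        field4 = db.StringProperty()

    class FormHandler(webapp2.RequestHandler):
        def get(self):
            # Serve the form itself.
            self.response.write(
                '<form method="post">'
                '<input name="field1"><input name="field2">'
                '<input name="field3"><input name="field4">'
                '<input type="submit"></form>')

        def post(self):
            # Each submission stores the inputs; this is also where you
            # would trigger your own Python code.
            Submission(field1=self.request.get('field1'),
                       field2=self.request.get('field2'),
                       field3=self.request.get('field3'),
                       field4=self.request.get('field4')).put()
            self.response.write('Thanks!')

    app = webapp2.WSGIApplication([('/', FormHandler)])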

Running Python scrapers and a database in an online environment

I hope this question is not stupid, but I am a beginner and I can hardly find good suggestions for what I need.
I run Python web scrapers that crawl data from several platforms. So far the whole system runs locally on my Mac. As I will be travelling and need to check on my scrapers, which will operate daily for at least a month, I need to find another solution.
My requirements are:
* Storage for a MySQL database (I would love to use phpMyAdmin to administer it)
* Enough processing power for the Python scrapers
* The necessary Python modules can be installed
* Data can be accessed at any time and data dumps can be created
I have already found providers like ActiveState (http://www.activestate.com/) and PiCloud (http://www.picloud.com/). They seem to be solutions for running the code.
Does anyone have experience with them?
Can somebody give me a tip for good storage that can communicate with these platforms?
As I am a beginner, I really need something that is easy to use. Thanks!
I usually use Amazon Web Services, Heroku, or Google App Engine, though your choice will depend on your level of expertise and your budget.
Heroku is the easiest to use but the most costly, followed by GAE; AWS is the hardest to use but the cheapest.

Ideas on how to retrieve and analyze server logs with Python?

To start off, this desktop app is really to give myself an excuse to learn Python and how a GUI works.
I'm trying to help my clients visualize how much bandwidth they are going through, when it's happening, and where their visitors are. All of this would be displayed with graphs or whatever would be most convenient. (Down the road, I'd like to add CPU/memory usage.)
I was thinking the easiest way would be for the app to connect via SFTP, download the specified log, and then use regular expressions to filter out the necessary information.
I was thinking of starting out with:
Python 2.6
PySide
Paramiko
I was looking at Twisted for the SFTP part, but I thought keeping it simple for now would be a better choice.
Does this seem right? Should I be trying to use SFTP? Or should I try to interact with some subdomain on my site to push the logs to the client (e.g. app.mysite.com)?
How about regular expressions to parse the logs?
SFTP or shelling out to rsync seems like a reasonable way to retrieve the logs. As for parsing them, regular expressions are what most people tend to use (a small sketch follows below). However, there are other approaches, too. For instance:
Parse Apache logs to SQLite database
Using pyparsing to parse logs. This one is parsing a different kind of log file, but the approach is still interesting.
Parsing Apache access logs with Python. The author actually wrote a little parser, which is available in an apachelogs module.
You get the idea.
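As a concrete starting point for the bandwidth use case, a rough sketch combining Paramiko for the SFTP fetch with a regular expression over the downloaded log (the host, credentials, and paths are made up):

    import re
    import paramiko

    # Fetch the log over SFTP.
    transport = paramiko.Transport(("example.com", 22))
    transport.connect(username="user", password="secret")
    sftp = paramiko.SFTPClient.from_transport(transport)
    sftp.get("/var/log/apache2/access.log", "access.log")
    sftp.close()
    transport.close()

    # Regex for the Apache common/combined log format.
    LINE_RE = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)')

    # Tally the bytes served as a first stab at the bandwidth numbers.
    total_bytes = 0
    with open("access.log") as f:
        for line in f:
            m = LINE_RE.match(line)
            if m and m.group("size").isdigit():
                total_bytes += int(m.group("size"))
    print "Total bytes served:", total_bytes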

Small "embeddable" database that can also be synced over the network?

I am looking for a small database that can be "embedded" into my Python application without running a separate server, as one can do with SQLite or Metakit. I don't need an SQL database; in fact, storing free-form data like Python dictionaries or JSON is preferable.
The other requirement is to be able to run an instance of the database on a server and have instances of my application (clients) sync the database with the server (two-way), similar to what CouchDB replication can do.
Is there a database that will do this?
From what you describe, it sounds like you could get by using pickle and FTP.
If you don't need an SQL database, what's wrong with CouchDB? You can spawn a local process to serve the DB, and you could easily write a server wrapper to allow access only from your app. I'm not sure about the access story, but I believe the latest Ubuntu uses CouchDB for synchronizable user-level data.
Seems like the perfect job for CouchDB: two-way sync is incredibly easy, and schema-less JSON documents are the native format. If you're using Python, couchdb-python is a great way to work with CouchDB.
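To show how little code that takes, a minimal couchdb-python sketch (the URLs and database name are hypothetical):

    import couchdb

    server = couchdb.Server("http://localhost:5984/")
    db = server.create("app_data")  # or server["app_data"] if it exists

    # Free-form JSON documents are the native format.
    doc_id, rev = db.save({"type": "note", "text": "hello"})

    # One replication call per direction gives you two-way sync.
    remote = "http://remote.example.com:5984/app_data"
    server.replicate("app_data", remote)
    server.replicate(remote, "app_data")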
Do you need clients to work offline and then resync when they reconnect to the network? I don't know if MongoDB can handle the offline-client scenario, but if the client is online all the time, MongoDB might be a good solution too. It has pretty good Python support. It is still a separate process, but perhaps easier to get running on Windows than CouchDB.
BerkeleyDB might be another option to check out, and it's lightweight enough. easy_install bsddb3 if you need a Python interface.
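For completeness, a tiny sketch of the dict-style bsddb3 interface; note that Berkeley DB only covers the embedded-storage half, so the network sync would be up to you:

    import bsddb3

    # "c" creates the file if it does not exist; keys and values are
    # plain byte strings, so serialize dicts yourself (e.g. as JSON).
    db = bsddb3.hashopen("app.db", "c")
    db["user:1"] = '{"name": "alice"}'
    print db["user:1"]
    db.close()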
HSQLDB does this, but unfortunately it's Java rather than Python.
Firebird SQL might be closer to what you want, since it does seem to have a Python interface.
