Choosing a development stack for web based data vis/mining - python

I would like to know what various folks in the community think about tool/language choices for a small team (3-5 developers) working to build simple data-driven applications. We want to do data munging, analysis, and data visualization.
We will likely end up with Hadoop on the data-crunching end and JavaScript on the front end. Ideally we want some level of R integration too.
My best thought at the moment is Django, Python, using R with Rpy (http://rpy.sourceforge.net/) and Boto (http://code.google.com/p/boto/).
Are there other good alternatives? Would there be any significant down/up sides to trying to go a JVM route instead? What tools would you use and why?

Instead of the JVM, you could also check out rApache and Rserve. That said, I have no idea what extra you would get from them compared with using Rpy.

Related

How to provide common web interface to various R and python programs?

We have a large, interactive R program that we would like to interface with Shiny. There is a small Python program we would also like to create an interface for alongside it. There are no dependencies between the two sets of code, but as a research institute we'd like to provide a common interface, since the two programs might be accessed by the same users. What is a good way to go about it? Is it better to consolidate under Python/Django and use rpy2, or to make system calls to the Python program through R's Shiny interface? Are there better alternatives, or recommended practices?
Django would be overkill.
rpy2 is a good option for small modules containing simpler methods.
Flask is another good option on the Python side. Programmers can transmit files or even build simple web interfaces. I prefer this method. Tell your students/colleagues to define fixed APIs and a fixed response format (JSON or XML), and even a new scholar won't have to spend time thinking about how to make it work. Just tell him the APIs and he can work with them just as he would with Alchemy and similar interfaces.
Shiny is a good option for building web-interfaces on R side. A quick tutorial that works. http://shiny.rstudio.com/tutorial/lesson2/
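To make the "fixed API and response format" advice concrete, here is a minimal sketch. It deliberately uses only the standard library's WSGI tools as a stand-in for a Flask route, so it runs without installing anything. The endpoint path `/api/summary`, the `summary` function, the `call` helper and the `{"status": ..., "result": ...}` envelope are all assumptions made up for illustration, not anything from the question.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def summary(params):
    """Hypothetical analysis function: mean of a comma-separated list."""
    values = [float(v) for v in params.get("values", "").split(",") if v]
    mean = sum(values) / len(values) if values else None
    return {"n": len(values), "mean": mean}


# Fixed API: every endpoint is listed here and returns a JSON-serialisable dict.
ROUTES = {"/api/summary": summary}


def app(environ, start_response):
    """WSGI app that wraps every answer in the same JSON envelope."""
    handler = ROUTES.get(environ.get("PATH_INFO", ""))
    if handler is None:
        status = "404 Not Found"
        payload = {"status": "error", "message": "unknown endpoint"}
    else:
        query = parse_qs(environ.get("QUERY_STRING", ""))
        params = {key: vals[0] for key, vals in query.items()}
        status = "200 OK"
        payload = {"status": "ok", "result": handler(params)}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(path, query=""):
    """Invoke the app in-process (no server) and return (status, parsed JSON)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["QUERY_STRING"] = query
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

Because the envelope is the same for every endpoint, a Shiny app, a browser script or another Python program can all consume it the same way. To actually serve it, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` should be enough for local experiments.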

recommendations for report generator (Python or web service)

I maintain several web applications and I'd like to add some "nice" reporting/analytics pages. Building that once is simple enough (e.g. using flot or similar plotting libraries), but it seems like there should be a report generation library out there which "just" generates the necessary graphs without much coding and offers some filtering ability.
There are some tools out there but for some reason there was never a good fit:
must work on Linux
open source preferred, though closed source works as well as long as the pricing model is also suitable for small installs
Python API required (or external services using standard web protocols)
I realize that this is not exactly a unique question but I couldn't find other stackoverflow questions with the same scope. Any pointers appreciated.
Update (2012-08-09, 15:10 UTC): I realized I did not state some more requirements/wishes:
web interface to access reports
access control: Each user can only get reports on his own data (simple to do with a library, might be hard with an external server)
filtering: I need interactive filtering of values based on some parameters (e.g. "only events in this time frame", "only in place X").
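For what it's worth, the access-control and filtering requirements above are small when done in a library. The following is only a sketch under assumed data: the `EVENTS` records, their field names (`owner`, `place`, `when`, `value`) and the `report_rows` helper are hypothetical and stand in for whatever the real applications store.

```python
from datetime import datetime

# Hypothetical event records; a real application would query its database.
EVENTS = [
    {"owner": "alice", "place": "X", "when": datetime(2012, 8, 1), "value": 3},
    {"owner": "alice", "place": "Y", "when": datetime(2012, 8, 5), "value": 7},
    {"owner": "bob", "place": "X", "when": datetime(2012, 8, 2), "value": 5},
]


def report_rows(events, user, start=None, end=None, place=None):
    """Return only the user's own events, optionally narrowed by time and place.

    The time frame is half-open: start <= when < end.
    """
    rows = []
    for event in events:
        if event["owner"] != user:  # access control: own data only
            continue
        if start is not None and event["when"] < start:
            continue
        if end is not None and event["when"] >= end:
            continue
        if place is not None and event["place"] != place:
            continue
        rows.append(event)
    return rows
```

The rows that come back could then be handed to flot or any other plotting library; the filter parameters map directly onto the query parameters of a report page.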
Windward* is one software company that offers a solution that seems to meet most of your needs. They offer a Python API through either Jython or a RESTful API (their Java Engine and Javelin, respectively), and their main strength is that template design is done in Microsoft Office, so reports can be very flexible and are easy to put together (most people already know how to use Word, so there's also much less learning curve than other solutions out there). You can add dynamic filters that take parameters at runtime or change on-the-fly, you can output to a variety of formats including HTML and PDF, and it works with pretty much every major datasource. For a web interface, you can either build your own and easily integrate reporting into it (Engine) or buy one pre-built and modify it to your specifications (Javelin).
On the downside, they are closed-source and without knowing more about your setup, it would be difficult for me to say whether their pricing would work. Might be worth a look, though--the links above and their documentation wiki are probably good places to start looking to see if you're a fit.
*Disclaimer: I work for Windward. I do believe they are one of the better reporting packages out there, but there are others that may fit your needs too.

Web interface to a system of stochastic differential equations

I have a system of stochastic differential equations implemented in MATLAB. Just 4 variables integrated with Euler-Maruyama, so nothing too fancy... the technical details aren't important for this question though.
How would you recommend I go about building a web interface (i.e., let people change the parameters and initial conditions in their web browser and then display the results)?
The first step should be translating the code to numpy/matplotlib, right?
Should I be saving the output as an image or doing some fancy HTML5 plotting stuff?
Are there any publicly available tools/frameworks that will make it easy to build a nice web UI for this kind of thing?
Any tips on where to host this kind of thing, or am I basically limited to setting up my own server?
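For reference, the Euler-Maruyama step itself is short in Python. The sketch below uses only the standard library instead of numpy so that it runs anywhere; the function name `euler_maruyama`, the diagonal-noise assumption and the example drift/diffusion functions are illustrative assumptions, not the MATLAB model from the question.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, dt, n_steps, seed=0):
    """Integrate dX = drift(X) dt + diffusion(X) dW with independent noise
    per component (diagonal noise). Returns the path as a list of tuples."""
    rng = random.Random(seed)  # seeded so a web request is reproducible
    sqrt_dt = math.sqrt(dt)
    x = list(x0)
    path = [tuple(x)]
    for _ in range(n_steps):
        a = drift(x)
        b = diffusion(x)
        x = [xi + ai * dt + bi * sqrt_dt * rng.gauss(0.0, 1.0)
             for xi, ai, bi in zip(x, a, b)]
        path.append(tuple(x))
    return path


# Hypothetical 4-variable example: each variable relaxes towards zero
# with constant noise (an Ornstein-Uhlenbeck-like toy system).
def example_drift(x):
    return [-xi for xi in x]


def example_diffusion(x):
    return [0.2 for _ in x]


path = euler_maruyama(example_drift, example_diffusion,
                      x0=[1.0, 0.5, -0.5, -1.0], dt=0.01, n_steps=1000)
```

In a web interface, `x0`, `dt`, `n_steps` and the model parameters are exactly what the form in the browser would supply, and `path` is what gets plotted, either rendered to an image on the server or sent to the browser as JSON for client-side plotting.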
If there are no constraints on the target language, I'd simply translate to R and use RApache. There are plenty of libraries to support this. In fact, you may not need to reimplement much code, given what's available in R libraries.
The reason I suggest R is that I've ported a lot of Matlab code to make it reusable or open, and R has usually been the easiest target for me, due to the libraries already developed.
EDIT/UPDATE: I overlooked using RStudio as a server. That might be the easiest way to go. See this page: http://www.rstudio.org/docs/server/getting_started.
Regarding where to set this up, you could look at using Amazon's "micro instances", if the amount of computation is very limited. There are also some new startups doing cloud stuff. One choice might be http://cloudnumbers.com/.
So, I'd recommend:
Post a question about the SDE stuff (to get pointers to the right package(s) on CRAN).
Install RStudio and play with the package
Try out RStudio server
Look for hosting :)
...
Profit! :) :)
I would not recommend saving stuff to disk and re-loading it. It's best to have things as smoothly coupled as possible, so that you don't have to code stuff to maintain a state on the server or browser side.
RStudio is quite cool if you want to take the R route. If you want to stick with Python, I recommend taking a look at Femhub. It's perhaps the most mature and well-developed web interface for doing numerical computations in Python. Just take a look at the "Published worksheets" examples to see what it is capable of.

Will python provide enough performance for a proxy?

I want to start writing a http proxy that will modify responses according to some rules/filters I will configure. However, before I start coding it, I want to make sure I'm making the right choice in going with Python. Later, this tool would have to be able to process a lot of requests, so, I would like to know I can count on it later on to be able to perform when "push comes to shove".
As long as the bulk of the processing uses Python's built-in modules, it should be fine as far as performance is concerned. The biggest strength of Python is its clear syntax and ease of testing/maintainability. If you find that one section of your code is slowing down the process, you can rewrite that section as a C module, while keeping the bulk of your control code in Python.
However if you're looking to make the most optimized Python Code you may want to check out this SO post.
Yes, I think you will find Python to be perfectly adequate for your needs. There's a huge number of web frameworks, WSGI libraries, etc. to choose from, or learn from when building your own.
There's an interesting post on the Python History blog about how Python was supporting high performance websites in 1996.
This will depend on the library you use more than the language itself. The twisted framework is known to scale well.
Here's a proxy server example in python/twisted to get you started.
Bottomline: choose your third party tools wisely and I'm sure you'll be fine.
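Whichever networking library ends up doing the I/O, the "modify responses according to rules/filters" part can be kept as plain Python that is independent of it. Below is a minimal sketch of such a rule pipeline; the `Response` class, the rule functions and the `Via` value are all hypothetical names invented for this example, and no actual proxying is done here.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Response:
    """Simplified stand-in for an upstream HTTP response."""
    status: int
    headers: Dict[str, str]
    body: bytes


# A rule is a (predicate, transform) pair.
Rule = Tuple[Callable[[Response], bool], Callable[[Response], Response]]


def is_html(resp: Response) -> bool:
    return resp.headers.get("Content-Type", "").startswith("text/html")


def strip_scripts(resp: Response) -> Response:
    """Example filter: remove <script> elements from the body."""
    body = re.sub(rb"<script\b.*?</script>", b"", resp.body,
                  flags=re.DOTALL | re.IGNORECASE)
    return Response(resp.status, dict(resp.headers), body)


def add_via_header(resp: Response) -> Response:
    """Example filter: mark the response as having passed through the proxy."""
    headers = dict(resp.headers)
    headers["Via"] = "1.1 example-proxy"
    return Response(resp.status, headers, resp.body)


RULES: List[Rule] = [
    (is_html, strip_scripts),
    (lambda resp: True, add_via_header),
]


def apply_rules(resp: Response, rules: List[Rule] = RULES) -> Response:
    """Run every matching rule in order, feeding each result to the next."""
    for matches, transform in rules:
        if matches(resp):
            resp = transform(resp)
    return resp
```

Keeping the rules as pure functions makes them easy to unit test and to profile separately from the network layer, which helps when deciding whether any hot spot is worth moving to C. A real proxy would also have to fix up headers such as Content-Length after changing the body.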
Python performs pretty well for most tasks, but you'll need to change the way you program if you're used to other languages. See Python is not Java for more info.
If plain old CPython doesn't give the performance you need, you have other options as well.
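As has been mentioned, you can extend it in C (using a tool like SWIG or Pyrex, or Pyrex's successor Cython). I also hear good things about PyPy; bear in mind that the restricted subset of Python (RPython) is what PyPy itself is written in, while the interpreter aims to run ordinary Python code. Lastly, a lot of people have used psyco to speed up performance, though it is no longer maintained.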

What are the benefits of using Python for web programming?

What makes Python stand out for use in web development? What are some examples of highly successful uses of Python on the web?
Django is, IMHO, one of the major benefits of using Python. Model your domain, code your classes, and voila, your ORM is done, and you can focus on the UI. Add in the ease of templating with the built-in templating language (or one of many others you can use as well), and it becomes very easy to whip up effective web applications in no time. Throw in the built-in admin interface, and it's a no-brainer.
Certainly one successful use of Python on the web is Google App Engine. Site authors write code in (a slightly restricted subset of) Python, which is then executed by the App Engine servers in a distributed and scalable manner.
Quotes about Python:
"Python is fast enough for our site and allows us to produce maintainable features in record times, with a minimum of developers," said Cuong Do, Software Architect, YouTube.com.
YouTube uses a lot of Python and is probably the best example of a Python success story.
A great example of a Django success story is the Washington Post, who recently shared a big list of applications they have developed:
http://push.cx/2009/washington-post-update
www.lawrence.com and www.ljworld.com are two of the first sites to use Django (before it was even open source).
djangositeoftheweek.com has a bunch of good case studies.
www.everyblock.com is another great example.
Finally, http://www.djangosites.org/ links to nearly 2,000 other Django powered sites.
Short answer: the diversity of tools readily available and freedom of choice.
This sounds like a simple question, but it really isn't. Python is very good for web development, as has been shown by the oh-so-famous Google App Engine, Plone and Django. One has to point out, though, that developing in Python demands a lot more from the developer than PHP does, but it gives a lot more back as well.
The entry level for actually producing something is higher. This is because there are a bunch of different tools for doing web development with Python. Choosing the web development framework can be a hard decision for an inexperienced developer.
Having a lot of different tools is a double-edged sword. To some extent it gives you the freedom to pick the one you want, but then again, how do you really know which one is good for what you're doing? This brings me to my point. Python stands out from the mass by not having a standard or de facto web development library. While this is pretty much against the principle of having only one simple way of doing one thing, it also brings us a wide variety of tools with different design choices. At first this might feel very frustrating, because it would be so much easier if somebody had made the choice for you, but now that you're left to make the choice you actually might have to think about what you're doing and what would fit. ...or you might just end up picking one and blowing your head off after you've realized that you made the wrong choice. Either way, you've made the choice and no one else.
Furthermore,
Python is strong both in web development and in data analytics and machine learning. For example, scikit-learn, SciPy and NumPy are very strong. In some cases, it can be very useful to have both elements on the same server.
For example http://rankmytweet.com uses this a lot.
Trac (bug tracker) and MoinMoin (wiki) are two web-based Python tools that I find invaluable.
GNU Mailman is another project written in Python that is widely successful.
As many have pointed out, Django is a great reason to use Python...so in order to figure out why Python is great for web development, the best bet is to look at why it is a good language to build a framework like Django.
IMHO Python combines the cleanest, or at least one of the cleanest, metaprogramming models of any language with a very pure object orientation. This not only makes it possible to write extremely general abstractions that are easy to use, but also allows the abstractions to combine relatively cleanly with others. This is harder to do in languages that take a code-generation based approach to metaprogramming (e.g. Ruby).
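As a rough illustration of the kind of metaprogramming meant here, the sketch below builds a tiny declarative "model" layer with descriptors and `__init_subclass__`. It is not how Django is implemented (Django's models are built by a metaclass and do far more); the `Field`, `Model` and `Article` names are made up for this example.

```python
class Field:
    """Descriptor that type-checks assignments and supplies a default."""

    def __init__(self, kind, default=None):
        self.kind = kind
        self.default = default

    def __set_name__(self, owner, name):
        # Called automatically when the owning class body is executed.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        if not isinstance(value, self.kind):
            raise TypeError(f"{self.name} must be {self.kind.__name__}")
        instance.__dict__[self.name] = value


class Model:
    """Base class that discovers the Field attributes of each subclass."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._fields = [name for name, value in vars(cls).items()
                       if isinstance(value, Field)]

    def __init__(self, **values):
        for name, value in values.items():
            if name not in self._fields:
                raise AttributeError(f"unknown field {name!r}")
            setattr(self, name, value)

    def to_dict(self):
        return {name: getattr(self, name) for name in self._fields}


class Article(Model):
    title = Field(str)
    views = Field(int, default=0)


article = Article(title="Hello")
```

The user-facing class (`Article`) stays purely declarative, while validation and introspection live in the base classes. No code is generated; the abstractions are ordinary objects, which is part of why they combine cleanly with others.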
Dynamic languages are in general good for web apps because of the speed of development. Python in particular has two advantages over most of them:
"batteries included" means lots of available libraries
Django. For me this is the only reason why I use Python instead of Lua (which I like a lot more).
Besides the frameworks...
Python's pervasive support for Unicode should make i18n much smoother.
A sane namespace system makes debugging much nicer, because it's typically easier to find where things are defined.
Python's inability to function as a standalone templating language should discourage the mixture of HTML with model code
Great standard library
Other examples of Python sites are Reddit and YouTube.
