Running Hadoop MapReduce Django apps on a Heroku dyno - python

Is it readily possible to integrate a Hadoop client with Python (Django) MapReduce apps/scripts remotely (on Heroku dynos or from a free cluster), as is done locally in these examples:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
http://blog.matthewrathbone.com/2013/11/17/python-map-reduce-on-hadoop---a-beginners-tutorial.html
Hadoop and Django, is it possible?
This Heroku add-on led me to believe it might be possible: https://devcenter.heroku.com/articles/treasure-data. But the add-on isn't free, and the learning-curve-to-cost ratio does not make it an obvious investment for me.
My motivation to cross the Heroku/Django/Hadoop bridge is to upgrade my current Django apps with social media mining features.

I doubt that you can install Hadoop on Heroku, but even if you could, what would be the point?
Hadoop makes distributed computing easy, but if you run it on the Heroku free tier you will have a cluster of one, maybe two, dynos. To harness the power of Hadoop you need more hardware. Also, Heroku dynos have only 512 MB of RAM.
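For context on what the linked tutorials cover: Hadoop Streaming runs the mapper and reducer as ordinary Python scripts that read stdin and write tab-separated key/value pairs to stdout, so the Python part is easy to write anywhere; the hard part is the cluster. A minimal word-count sketch (file names are just placeholders) looks roughly like this:

```python
#!/usr/bin/env python
# mapper.py -- emits "word<TAB>1" for every word on stdin (Hadoop Streaming convention)
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))
```

```python
#!/usr/bin/env python
# reducer.py -- sums the counts per word; Hadoop sorts the mapper output by key first
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))
```

On a real cluster these are submitted with the streaming jar (roughly hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input ... -output ...), and that cluster is exactly what one free dyno cannot provide.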

Related

Build a Docker image for Google BigQuery

I have a huge amount of data (hundreds of gigabytes) on Google BigQuery, and for ease of use (many post-query treatments) I'm working with the bigquery Python package. The problem is that I have to rerun all my queries whenever I shut my laptop down, which is very expensive as my dataset is about one terabyte. I thought of Google Compute Engine, but this is a poor solution as I would still be paying for my machines if I don't stop them. My last idea is to run a Docker image on our own sandbox, which is cheaper and can do exactly what I'm looking for. So I would like to know if someone has ever built a Docker image for BigQuery? Thanks for helping!
We package all of our Python/BigQuery projects into Docker containers and push them to Google Container Registry.
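As an illustration only (the file names and the project id used below are placeholders, not something from this answer), such an image can be as small as:

```dockerfile
# Minimal image for a Python + BigQuery job; run_queries.py is a placeholder name.
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt   # e.g. google-cloud-bigquery
COPY . .
CMD ["python", "run_queries.py"]
```

It is then built and pushed with something like docker build -t gcr.io/YOUR_PROJECT/bq-job . followed by docker push gcr.io/YOUR_PROJECT/bq-job.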
Automated scheduling, dependency graphing, and logging can be handled with Google Cloud Composer (Airflow). It's pretty simple to get set up, and Airflow has a KubernetesPodOperator that lets you specify a Python file to run inside your Docker image on GCR (a minimal DAG sketch follows the links below). You can use this workflow to make sure all of your queries and Python scripts run on GCP without having to worry about Google Compute Engine or any DevOps-type concerns.
https://cloud.google.com/composer/docs/how-to/using/using-kubernetes-pod-operator
https://cloud.google.com/composer/
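A minimal sketch of such a DAG, assuming Airflow 1.10-style import paths as used by classic Cloud Composer and a hypothetical image and entry point, might look like:

```python
# Sketch: run a GCR container on a schedule via the KubernetesPodOperator.
from datetime import datetime

from airflow import DAG
# Import path for Airflow 1.10 / classic Cloud Composer; newer releases move this
# operator into the cncf.kubernetes provider package.
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

with DAG(
    dag_id="bigquery_batch_job",        # hypothetical name
    schedule_interval="@daily",
    start_date=datetime(2019, 1, 1),
    catchup=False,
) as dag:
    run_queries = KubernetesPodOperator(
        task_id="run_queries",
        name="run-queries",
        namespace="default",
        image="gcr.io/YOUR_PROJECT/bq-job:latest",  # the image pushed to GCR
        cmds=["python", "run_queries.py"],          # hypothetical entry point
    )
```

Composer then triggers the DAG on its schedule and the pod runs your GCR image on the environment's Kubernetes cluster, so nothing depends on a laptop staying powered on.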

Deployment on Google App Engine - Django, Vagrant, Ansible

I want to deploy a Django project on Google App Engine.
Here is the current situation:
The code is on GitHub.
The Django project was set up using Vagrant, Ansible, and VirtualBox.
I am completely new to cloud-based deployments.
I need help to achieve this.
I checked the Google docs, but there are a couple of options for Django-related deployment and I am not sure which to pick given my Vagrant and Ansible setup.
Your question is a bit too generic as it stands - answering here rather than in a comment for clarity.
If you're talking about deploying to GAE (Google App Engine), then most likely you cannot reuse the Ansible scripts you've been writing for Vagrant. While it may be possible to use Ansible to deploy on GAE, most people I know use the standard Google procedure to deploy their app (essentially an app.yaml plus gcloud app deploy; see the sketch below).
If you plan to use GCE (Google Compute Engine, a layer down in the infrastructure), you would be able to use your existing Ansible provisioning scripts (maybe with slight modifications); follow along with the Ansible documentation.
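For reference, the standard App Engine procedure boils down to adding an app.yaml next to manage.py and running gcloud app deploy. A minimal sketch for the Python 3 standard environment ("mysite" is a placeholder project/module name, not from the question):

```yaml
# app.yaml - minimal sketch for Django on App Engine standard (Python 3 runtime)
runtime: python39
entrypoint: gunicorn -b :$PORT mysite.wsgi   # gunicorn must be listed in requirements.txt

handlers:
- url: /static
  static_dir: static/
- url: /.*
  script: auto

env_variables:
  DJANGO_SETTINGS_MODULE: "mysite.settings"
```

After that, gcloud app deploy from the project root uploads and deploys the app; the Vagrant/VirtualBox setup remains a purely local development convenience.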

Run multiple Python scripts in Azure (using Docker?)

I have a Python script that consumes an Azure queue, and I would like to scale this easily inside Azure infrastructure. I'm looking for the easiest solution possible to
run the Python script in an environment that is as managed as possible
have a centralized way to see the scripts running and their output, and easily scale the number of running scripts through a GUI or something very easy to use
I'm looking at Docker at the moment, but this seems very complicated for the extremely simple task I'm trying to achieve. What possible approaches are known for doing this? An added bonus would be if I could scale with respect to the number of items on the queue, but it is fine if we can just control the degree of parallelism manually.
You should have a look at Azure Web Apps, which also supports Python.
This would be a managed and scalable environment, and it also supports background tasks (WebJobs) with central logging.
Azure Web Apps also offers a free plan for development and testing.
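Whether it runs as a continuous WebJob or inside a container, the script itself stays a plain polling loop. A minimal sketch, assuming the azure-storage-queue (v12) package and placeholder names (not something from the question):

```python
# Sketch of a queue-consuming worker; queue name, env var, and process() are placeholders.
import os
import time

from azure.storage.queue import QueueClient


def process(body):
    # placeholder for whatever work each message should trigger
    print("processing:", body)


queue = QueueClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "work-items"
)

while True:
    handled = 0
    for msg in queue.receive_messages():
        process(msg.content)
        queue.delete_message(msg)   # only remove a message once it has been handled
        handled += 1
    if handled == 0:
        time.sleep(5)               # back off while the queue is empty
```

Scaling out then simply means running more copies of this loop, which is what the hosting options below differ on.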
In my experience, CoreOS on Azure can satisfy your needs. You can refer to the doc https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-coreos-how-to/ to learn how to get started.
CoreOS is a Linux distribution built for running applications as Docker (Linux) containers, and you can access it remotely via an SSH client such as PuTTY. To get going with Docker, a basic Docker tutorial will quickly teach you enough simple usage to run Python scripts.
Sounds to me like you are describing something like a microservices architecture. From that perspective, Docker is a great choice. I recommend you consider an orchestration framework such as Apache Mesos or Docker Swarm, which will allow you to run your containers on a cluster of VMs with the ability to easily scale, deploy new versions, roll back, and implement load balancing. The schedulers Mesos supports (Marathon and Chronos) also have a web UI. I believe you can also implement the kind of triggered scaling you describe, but that will probably not be available off the shelf.
This does seem like a bit of a learning curve, but I think it is worth it, especially once you start considering the complexities of deploying new versions (with possible rollbacks), monitoring failures, and even integrating things like Jenkins and continuous delivery.
For Azure, an easy way to deploy and configure a Mesos or Swarm cluster is to use Azure Container Service (ACS), which does all the hard work of configuring the cluster for you. Additional info here: https://azure.microsoft.com/en-us/documentation/articles/container-service-intro/

How to deploy odoo on OpenShift?

I'm a bit confused with OpenShift... So far I've created an app and added Python and PostgreSQL, I've downloaded odoo-master from SourceForge, and I've cloned the git repository. What do I have to do next? Should I just copy the odoo-master folder into the folder that git created, then run "git add ." -> "git commit -m 'odoo added'" -> "git push"?
Other questions that I have:
When I add a new folder in the app folder, how do I tell OpenShift to run the files that are inside that folder?
What is OpenShift Origin? (Please keep it clear for a newbie.)
What can I do with an OpenShift cartridge?
Is a cartridge a client tool? What are the client tools and what are they for?
If the question is unclear please ask me; my native language is not English and my technical vocabulary is not too great.
I can answer some of the questions.
OpenShift Origin
This is the upstream home of the OpenShift source code, so you get new features there first. After a lot of formal QA, the source is moved into Enterprise and Online. In the end, the features you see in OpenShift Origin should land in Online and Enterprise, as it is the same codebase. You can run Origin on your local system, for example as a Docker image inside a virtual machine.
OpenShift-Specific Terminology
Application
This is your typical web application that will run on OpenShift. At this time, OpenShift is focused on hosting web applications. In your case it is Odoo (OpenERP).
Gear
A gear is a server container with a set of resources that allows you to run your applications. Your gears run on OpenShift in the cloud. There are currently three gear types on OpenShift Online: small, medium, and large. Each size provides 1 GB of disk space by default. The large gear has 2 GB of RAM, the medium gear has 1 GB of RAM, and the small and small.highcpu gears have 512 MB of RAM.
Cartridge
To get a gear to do anything, you need to add a cartridge. Cartridges are the plug-ins that house the framework or components that can be used to create and run an application.
Basically, OpenShift splits its runtime environments across different cartridges. Cartridges can be web frameworks, databases, monitoring services, or connectors to external backends. In the case of Odoo, you need the Python and PostgreSQL cartridges.
Python is a standalone cartridge; PostgreSQL is an embedded cartridge.
You can control your cloud environment through the OpenShift client tools, known as rhc, or through the web console.
With rhc it is easy to create and deploy applications, manage domains, and control access to your OpenShift applications, giving you complete control of your cloud environment. Consider it an SSH-style client for your OpenShift server (a short example of the workflow follows below).
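For completeness, the typical rhc workflow for an app like the one in the question would look roughly like this (the app name and cartridge versions are assumptions; check rhc cartridge list for what your account actually offers):

```bash
rhc app create odoo python-2.7            # create the app (gear) with the Python cartridge
rhc cartridge add postgresql-9.2 -a odoo  # embed the PostgreSQL cartridge
cd odoo                                   # the git repo rhc cloned for you
# copy the odoo-master sources into this repo, then deploy with git:
git add .
git commit -m "odoo added"
git push                                  # the push triggers the build/deploy hooks on the gear
```

This matches the git add/commit/push steps from the question; the push itself is what deploys the code to the gear.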
You will need to install the Odoo dependencies from the OpenShift temporary data directory in order to run Odoo (I have not tried Odoo on OpenShift yet).

Python resources for cloud computing learning?

Is there a book or resource for learning cloud computing in Python or Scala? I know Django and App Engine, but I am not that interested in learning more about a client framework; I'm interested in learning the core concepts.
Steve Marx published a blog post describing a Python sample running in Windows Azure with the Rocket web server. The code is on GitHub.
This will show you some interesting elements of setting up a python app in Windows Azure, including startup tasks. You'll still want to take a look at the Windows Azure Platform Training Kit to get a deeper understanding of Windows Azure.
