Minimum system requirements to run the Recommendation engine in PredictionIO - python

I tried to integrate PredictionIO with my app. I used the Recommendation engine deployment as described in the Quick Start on the PredictionIO website.
I faced a lot of issues, but I was able to build the engine.
I then tried to train the model using pio train, but it failed with "java.lang.StackOverflowError", which I took to mean the server does not have enough memory. I tried to increase the memory with pio train -- --driver-memory 5g --executor-memory 5g, but I still get the same error
(I am using a 4-core, 6 GB RAM Ubuntu 14.04 server).
So I want to know: what are the minimum server requirements for PredictionIO?

The minimum requirements can be found here.

Related

Deploy TensorFlow model to server?

I am trying to deploy a Python ML app (made using Streamlit) to a server. This app essentially loads a NN model that I previously trained and makes classification predictions using this model.
The problem I am running into is that because TensorFlow is such a large package (at least 150MB for the latest tensorflow-cpu version) the hosting service I am trying to use (Heroku) keeps telling me that I exceed the storage limit of 300MB.
I was wondering if anyone else had similar problems or an idea of how to fix/get around this issue?
What I've tried so far
I've already tried replacing the tensorflow requirement with tensorflow-cpu, which did significantly reduce the size, but it was still too big.
I also tried downgrading to tensorflow-cpu==2.1.0, which finally worked, but then I ran into issues on model.load() (which I think might be related to the downgrade, since it works fine locally).
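For reference, a minimal sketch of the kind of Streamlit + TensorFlow app described in the question; the model path "model.h5", the input handling, and the caching decorator (available in recent Streamlit versions) are assumptions made only for illustration:

# app.py - minimal sketch of a Streamlit app that loads a trained Keras model
# and serves classification predictions. "model.h5" is a placeholder path.
import numpy as np
import streamlit as st
import tensorflow as tf

@st.cache_resource  # load the trained network once per server process
def load_model():
    return tf.keras.models.load_model("model.h5")

model = load_model()
st.title("Classifier demo")

# Take a comma-separated feature vector from the user and classify it.
raw = st.text_input("Feature values (comma separated)", "0.1, 0.2, 0.3")
if st.button("Predict"):
    x = np.array([[float(v) for v in raw.split(",")]])
    probs = model.predict(x)
    st.write("Predicted class:", int(np.argmax(probs, axis=1)[0]))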
I faced the same problem last year. I know this does not answer your Heroku-specific question, but my solution was to use Docker with AWS Elastic Beanstalk. It worked out cheaper than Heroku and I had fewer issues with deployment. I can guide you through it if you are interested.
You might have multiple copies of the modules downloaded. I would recommend opening a file explorer and checking the actual directory of the downloaded modules.
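A small sketch of that check done in Python instead of a file explorer, listing installed packages by on-disk size (note that site.getsitepackages() may behave differently inside some virtualenvs):

# List the largest directories in site-packages by size, to see which
# installed packages are blowing up the deployment.
import os
import site

def dir_size_mb(path):
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass
    return total / (1024.0 * 1024.0)

for packages_dir in site.getsitepackages():
    if not os.path.isdir(packages_dir):
        continue
    sizes = [(dir_size_mb(os.path.join(packages_dir, d)), d)
             for d in os.listdir(packages_dir)
             if os.path.isdir(os.path.join(packages_dir, d))]
    for size, name in sorted(sizes, reverse=True)[:10]:
        print("%8.1f MB  %s" % (size, name))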

How to get reproducible Python Apache Beam Dataflow environment builds?

Currently I build our Google Dataflow Python environment using the official example setup.py: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py
The problems with this approach are:
OS compatibility issues: I am developing on a Mac while Dataflow instances are based on Ubuntu, and setup.py is quite painful here, as it doesn't seem like the right tool to encapsulate the environment.
DataflowRunner runs take about 20-25 minutes before environment problems are identified.
I think a good solution to these problems would be a Docker image that mirrors the Dataflow environment, with the DirectRunner running on that image.
It seems to me that templates (https://cloud.google.com/dataflow/docs/guides/templates/overview) could help with executing from different environments, though I don't think they provide enough insight into the build process.
I am not sure where to find a Docker image I could use for this. Are there any better ways to reproducibly build Dataflow Python environments?
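For reference, a minimal sketch of the setup.py approach mentioned above, loosely modeled on the linked juliaset example; the package name and pinned versions are placeholders, and the point is simply that exact pins make the Mac and Ubuntu Dataflow environments resolve the same builds:

# setup.py - minimal sketch of pinning the Dataflow worker environment,
# loosely based on the official juliaset example linked above.
import setuptools

REQUIRED_PACKAGES = [
    # Exact pins so local (Mac) and Dataflow (Ubuntu) workers install the
    # same versions; these particular versions are only placeholders.
    "numpy==1.21.6",
    "pandas==1.3.5",
]

setuptools.setup(
    name="my_dataflow_pipeline",  # placeholder name
    version="0.0.1",
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)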

Tensorflow code on google cloud platform

I am facing a strange issue. I have working code for a very simple neural network, which I run on my laptop. It is kind of slow, but OK. I then created a 24-core (Linux) instance on Google Cloud and ran the same code. It seems to take almost the same time. I expected it to be a lot faster. Any idea why this could be the case? I am using a standard, vanilla pip installation of CPU TensorFlow. Nothing fancy.
Would appreciate any ideas...
Best, Umberto
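One thing worth checking is whether TensorFlow is actually configured to use the extra cores; a small diagnostic sketch, written against the TensorFlow 2.x API as an assumption (1.x exposes the same settings through tf.ConfigProto):

# Check how many cores Python sees and how many threads TensorFlow is
# configured to use (0 means "let TensorFlow decide").
import multiprocessing
import tensorflow as tf

print("CPU cores visible:", multiprocessing.cpu_count())
print("intra-op threads:", tf.config.threading.get_intra_op_parallelism_threads())
print("inter-op threads:", tf.config.threading.get_inter_op_parallelism_threads())

# These can be set explicitly, but only before any ops are created, e.g.:
# tf.config.threading.set_intra_op_parallelism_threads(24)
# tf.config.threading.set_inter_op_parallelism_threads(2)

Even with the threads configured, a very simple network may spend its time on operations too small to parallelize, which could also explain why 24 cores are not much faster than a laptop.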

Set up spark using an external virtual machine

I am not as much of a computer person as many others on here; I majored in math, with MATLAB as my main computer experience. I recently got involved with Apache Spark through the excellent edX course offered by Berkeley.
The method they used for setting up Spark was provided in a great step-by-step guide. It involved downloading Oracle VM VirtualBox with an Ubuntu 32-bit VM, then using Vagrant (again, I'm not hugely computer-y, so I'm not 100% sure how this works or what it is) to connect it to an IPython notebook. This let me access Spark over the internet and code in Python with PySpark, which is exactly what I want to do.
Everything was going very well until the second lab exercise, when it became apparent that my Windows laptop has insufficient free memory (just 3 GB, and it is four years old): it continually froze and crashed when trying to work with large datasets.
Apparently it is not possible to run a VM inside a VM, so I have spent most of today looking for alternative ways of setting up Spark, to no avail; the guides are all aimed at someone with more computer knowledge than I have.
My (likely naive) idea now is to rent an external machine that I can interface with through my Windows laptop exactly as before, but with the virtual machine operating outside of my laptop's memory, i.e. in the cloud (running Ubuntu, Windows, etc.). Essentially, I want to move the Oracle VM VirtualBox setup to an outside machine to rid my computer of the memory burden and keep using the IPython notebook as before.
How can I set up a virtual machine to use for the computational side of Spark in an IPython notebook?
Or is there an alternate method that would be simple to follow?
Don't run VMs. Instead:
Download the latest Spark version. (1.4.1 at the moment.)
Extract the archive.
Run bin/pyspark.cmd.
It's not an IPython Notebook, but you can run Python code against a local Spark instance.
If you want a beefier instance, do the same on a beefy remote machine. For example, an EC2 m4.2xlarge is $0.50 per hour, with 8 cores and 30 GB of RAM.
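As a quick check that the local instance works, something like the following can be run in the shell started by bin/pyspark.cmd, where a SparkContext is already available as sc:

# The pyspark shell predefines `sc`; a tiny sanity check against the local instance.
squares = sc.parallelize(range(1, 1001)).map(lambda x: x * x)
print(squares.take(5))  # [1, 4, 9, 16, 25]
print(squares.sum())    # 333833500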

Running Hadoop MapReduce Django apps on a Heroku dyno

Is it readily possible to integrate a Hadoop client with Python (Django) MapReduce apps/scripts remotely (on Heroku dynos or from a free cluster), as is done locally in these examples:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
http://blog.matthewrathbone.com/2013/11/17/python-map-reduce-on-hadoop---a-beginners-tutorial.html
Hadoop and Django, is it possible?
This Heroku add-on led me to believe it might be possible: https://devcenter.heroku.com/articles/treasure-data. But the add-on isn't free, and the learning-curve-to-cost ratio does not make it an obvious investment for me.
My motivation to cross the Heroku/Django/Hadoop bridge is to upgrade my current Django apps with social media mining features.
I doubt that you can install Hadoop on Heroku, but even if you can, what's the point?
Hadoop makes distributed computing easy, but if you run it on the Heroku free tier you will have a cluster of one, maybe two, dynos. To harness the power of Hadoop you need more hardware. Also, Heroku dynos have 512 MB of RAM...
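For context, the tutorials linked in the question use Hadoop Streaming, where the mapper and reducer are ordinary Python scripts that read stdin and write stdout; a minimal word-count mapper in that style (a sketch, not tied to any particular cluster setup):

#!/usr/bin/env python
# mapper.py - minimal Hadoop Streaming word-count mapper in the style of the
# linked tutorials: read lines from stdin and emit "word<TAB>1" pairs.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t1" % word)

A matching reducer script sums the counts per word; both scripts are shipped to the cluster via the hadoop-streaming jar, which is part of why a single 512 MB dyno gives you none of the benefit.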
