OLAP Server for NodeJS

OLAP Server for NodeJS - python

I have been looking for ways to provide analytics for an app which is powered by a REST server written in NodeJS and MySQL. I discovered OLAP, which can actually make this much easier.
I also found a Python library that provides an OLAP HTTP server called 'Slicer':
http://cubes.databrewery.org/
Can someone explain how this works? Does this mean I have to update my schema and create what are called fact tables?
Can this be used in conjunction with my NodeJS app? Any examples? I have only created single-server apps. Would Python reside on the same server as NodeJS? How would it start? ('forever app.js' is my default script)
If I can't use Python since I have no experience, what are the basics of doing it in NodeJS?
My model is basically a list of words, so the OLAP queries I have are: words made in days, weeks, months; of length 2, 5, 10 letters; in languages eng, french, german, etc.
Ideas, hints and guidance much appreciated!

As you found out, Cubes provides an HTTP OLAP server (the slicer tool).
Can someone explain how this works?
It is an OLAP server, so you can issue OLAP queries to it. The API is REST/JSON based, so you can easily query the server from Javascript, NodeJS, Python or any other language of your choice via HTTP.
OLAP queries are based on a model of "facts" and "dimensions". You can, for example, query "the total sales amount for a given country and product, itemized by month".
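To make the HTTP side concrete, here is a minimal sketch of how a client might build an aggregate request for Slicer. It assumes a cube named `words` with `language`, `length` and `date` dimensions (all hypothetical, taken from the asker's word model) and a server on localhost port 5000. `build_aggregate_url` is an illustrative helper, not part of Cubes.

```python
from urllib.parse import urlencode


def build_aggregate_url(base_url, cube, cuts=None, drilldown=None):
    """Build a Slicer-style aggregate URL for the given cube.

    `cuts` maps a dimension name to the value to filter on;
    `drilldown` names the dimension (and optionally level) to itemize by.
    """
    params = {}
    if cuts:
        # Several cuts are joined with "|", each written as dimension:value.
        params["cut"] = "|".join(f"{dim}:{value}" for dim, value in cuts.items())
    if drilldown:
        params["drilldown"] = drilldown
    # Keep ":" and "|" readable instead of percent-encoding them.
    query = urlencode(params, safe=":|")
    url = f"{base_url}/cube/{cube}/aggregate"
    return f"{url}?{query}" if query else url


# Hypothetical query: English five-letter words, itemized by month.
url = build_aggregate_url(
    "http://localhost:5000",
    "words",
    cuts={"language": "eng", "length": 5},
    drilldown="date:month",
)
print(url)
```

Any HTTP client (including one in NodeJS) can then GET that URL and parse the JSON response.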
Does this mean I have to update my schema. And create what is called fact tables?
OLAP queries are built around the concepts of Facts and Dimensions.
OLAP-oriented data warehousing strategies often involve the creation of these Fact and Dimension tables, building what is called a Star Schema or a Snowflake Schema. These schemas offer better performance for OLAP-type queries on relational databases. Data is often loaded by what is called an ETL process (it can be a simple script) that loads data in the appropriate form.
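As a rough illustration of the idea (not of how Cubes implements it), the sketch below treats each created word as a fact and counts facts grouped by two dimensions, month and language. The rows, field names and the `aggregate` helper are made up for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical fact rows: one row per word created.
facts = [
    {"created": date(2024, 1, 3), "language": "eng", "length": 5},
    {"created": date(2024, 1, 17), "language": "eng", "length": 5},
    {"created": date(2024, 1, 20), "language": "french", "length": 2},
    {"created": date(2024, 2, 1), "language": "eng", "length": 10},
]


def by_month(row):
    """Dimension: the year-month the word was created in."""
    return row["created"].strftime("%Y-%m")


def by_language(row):
    """Dimension: the language of the word."""
    return row["language"]


def aggregate(rows, *dimensions):
    """Count facts grouped by the values of the given dimension functions."""
    counts = Counter()
    for row in rows:
        key = tuple(dimension(row) for dimension in dimensions)
        counts[key] += 1
    return dict(counts)


result = aggregate(facts, by_month, by_language)
print(result)
```

An OLAP server answers the same kind of question, but pushes the grouping down to the database instead of looping in application code.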
The Python Cubes framework, however, does not force you to alter your schema or create an alternate one. It has a SQL backend which allows you to define your model (in terms of Facts and Dimensions) without the need of changing the actual database model. This is the documentation for the model definition: https://pythonhosted.org/cubes/model.html .
However, in some cases you may still prefer to define a schema for Data Mining and use a transformation process to load data periodically. It depends on your needs, the amount of data you have, performance considerations, etc.
With Cubes you can also use other non-RDBMS backends (e.g. MongoDB), some of which offer built-in aggregation capabilities that OLAP servers like Cubes can leverage.
Can this be used in conjunction with my NodeJS App?
You can issue queries to your Cubes Slicer server from NodeJS.
Any examples?
There is a Javascript client library to query Cubes. You probably want to use this one: https://github.com/Stiivi/cubes.js/
I don't know of any examples using NodeJS. You can try to get some inspiration from the included AngularJS application in Cubes (https://github.com/Stiivi/cubes/tree/master/incubator). Another client tool is CubesViewer which may be of use to you while building your model: http://jjmontesl.github.io/cubesviewer/ .
I have only created single-server apps. Would Python reside on the same server as NodeJS? How would it start? ('forever app.js' is my default script)
You would run the Cubes Slicer server as a web application (directly from your web server, e.g. Apache). For example, with Apache, you would use mod_wsgi, which allows Apache to serve Python applications.
Slicer can also run as a small web server in a standalone process, which is very handy during development (but I wouldn't recommend it for production environments). In this case, it will be listening on a different port (typically: http://localhost:5000 ).
If I can't use Python since I have no experience, what are the basics of doing it in NodeJS?
You don't really need to write any Python at all. You can configure and use Python Cubes as an OLAP server, and run queries from Javascript code (e.g. directly from the browser). From the client's point of view, it is like a database system which you can query via HTTP and get responses from in JSON format.

Related

Mongoose Schemas and inserting via python

I'm using a Nodejs server for a WebApp and Mongoose is acting as the ORM.
I've got some hooks that fire when data is inserted into a certain collection.
I want those hooks to fire when a Python script inserts into the MongoDB instance. So if I have a pre-save hook, it would modify the Python script's insert according to that hook.
Is this possible? If so, How do I do it?
If not, please feel free to explain to me why this is impossible and/or why I'm stupid.
EDIT: I came back to this question some months later and cringed at just how green I was when I asked it. All I really needed was to create an API endpoint/flag on the NodeJS server specifically for automated tasks like the Python script to send data to, and have Mongoose in NodeJS land structure it.

It is impossible because Python and NodeJS are two different runtimes: separate, isolated processes which don't have access to each other's memory.
Mongoose is a NodeJS ORM: a library that maps Javascript objects to MongoDB documents and handles queries to the database.
All Mongoose hooks belong to Javascript space. They are executed on Javascript objects before Mongoose sends any request to Mongo. Two consequences follow: no other process can interfere with these hooks, not even another NodeJS process, and once the query reaches MongoDB it is final, with no more hooks and no more modifications.
Neither Python nor Mongo is aware of Mongoose hooks. All queries to Mongo are initiated on the client side: a script sends a request to modify the state of the database or to query it.
The only way to trigger Javascript code execution from an update on the MongoDB side is to use change streams.
Change streams are not Mongoose hooks, but they can be used to hook into updates on the Mongo side. It's a somewhat more advanced use of the database. It comes with additional requirements for the Mongo setup, the size of the oplog, availability of the change stream clients, error handling, etc.
You can learn more about change streams here: https://docs.mongodb.com/manual/changeStreams/ . I would strongly recommend seeking professional advice to architect such a setup, to avoid frustration and unexpected behaviour.
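For a feel of what consuming a change stream looks like, here is a minimal sketch. It uses pymongo's `Collection.watch()` rather than the Node driver, purely to keep the example in Python; the Node driver offers an equivalent. `watch_inserts` and its `max_events` parameter are hypothetical names for this illustration, and a real listener would also need reconnect and error handling.

```python
def watch_inserts(collection, handler, max_events=None):
    """Call handler(document) for each insert seen on the collection's change stream.

    `collection` is expected to be a pymongo Collection. Change streams
    require MongoDB to run as a replica set or sharded cluster.
    Returns the number of insert events handled.
    """
    # Only pass insert events through the stream.
    pipeline = [{"$match": {"operationType": "insert"}}]
    handled = 0
    with collection.watch(pipeline) as stream:
        for change in stream:
            # For inserts, the new document is under "fullDocument".
            handler(change["fullDocument"])
            handled += 1
            if max_events is not None and handled >= max_events:
                break
    return handled


# Usage sketch (assumes a local replica set and a hypothetical "words" collection):
# from pymongo import MongoClient
# words = MongoClient("mongodb://localhost:27017")["app"]["words"]
# watch_inserts(words, print)
```

Note that this reacts after the insert has happened; unlike a Mongoose pre-save hook, it cannot modify the document on its way in.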

Mongo itself does not support hooks as a feature; Mongoose gives you the out-of-the-box hooks you've mentioned. So what can you do to make it work in Python?
Use an existing framework like Python's Eve. Eve gives you database hooks, much like Mongoose does. However, Eve is a REST API framework, which from your description doesn't sound like what you're looking for. Unfortunately I do not know of any package that's a perfect fit for your needs (if you do find one, it would be great if you shared a link in your question).
Build your own custom wrapper like this one. You can build a custom wrapper class quickly and implement your own logic very easily.
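A custom wrapper along those lines could look roughly like the sketch below. `HookedCollection` and `InMemoryCollection` are hypothetical names; the wrapper only assumes the wrapped object has an `insert_one(document)` method, which a pymongo collection does. The hooks here would have to be re-implemented in Python, since they cannot reuse the Mongoose ones.

```python
class HookedCollection:
    """Wrap a collection-like object and run pre-save hooks before each insert."""

    def __init__(self, collection):
        self._collection = collection
        self._pre_save = []

    def pre_save(self, func):
        """Register func(document) -> document. Can be used as a decorator."""
        self._pre_save.append(func)
        return func

    def insert_one(self, document):
        doc = dict(document)  # copy so the caller's dict is not mutated
        for hook in self._pre_save:
            doc = hook(doc)
        return self._collection.insert_one(doc)


class InMemoryCollection:
    """Stand-in for a real collection so the sketch runs without a database."""

    def __init__(self):
        self.documents = []

    def insert_one(self, document):
        self.documents.append(document)
        return document


store = InMemoryCollection()
words = HookedCollection(store)


@words.pre_save
def add_length(doc):
    # Example hook: derive a field before the document is stored.
    doc["length"] = len(doc["word"])
    return doc


words.insert_one({"word": "bonjour", "language": "french"})
print(store.documents)
```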

Model implementation in production with python

I built a machine learning model of binary classification in python.
It works on my laptop (e.g. command line tool). Now I want to deploy it in production on a separate server in my company. It has to take inputs from another server (C# application), make some calculations and return outputs back to it.
My question is: what are the best practices for doing such a thing in production? As far as I know, it can be done through a TCP/IP connection.
I am new in this field and I don't know the terms used here.
So can anybody guide me?
Thanks.

I would say it depends on your infrastructure and on how the other application (C#) can communicate.
The easiest way, in my opinion, would be through a REST API (HTTP requests). There are tools in different languages for creating REST endpoints and calling them easily.
For example, in Python, you can request the content of a URL as described in "What is the quickest way to HTTP GET in Python?".
But it depends on what you have on the C# side. Can you update the C# code?
Here is a range of solutions:
REST API: need to expose REST endpoints on the communicating "service".
in C#: https://learn.microsoft.com/en-us/aspnet/web-api/overview/older-versions/build-restful-apis-with-aspnet-web-api
in Python, I would recommend the Django framework if you need to create a server (but if the Python side only makes requests and does not act as a server, you may not need it)
a message queue like RabbitMQ or ZeroMQ, but it requires an external service to manage queues and messages
a TCP/IP socket like you suggested, but then you have to manage those connections yourself
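As a rough illustration of the REST option, the sketch below exposes a single `/predict` endpoint using only the Python standard library (swapped in here for Django to keep the example self-contained). The `predict` function is a placeholder standing in for the trained classifier, and the endpoint path, port and JSON shape are assumptions for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder for the real model: label 1 if the feature sum is positive."""
    return 1 if sum(features) > 0 else 0


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            label = predict(payload["features"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "Expected a JSON object with a 'features' list")
            return
        body = json.dumps({"label": label}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def make_server(host="127.0.0.1", port=8000):
    """Create (but do not start) the HTTP server."""
    return HTTPServer((host, port), PredictHandler)


# To run the service: make_server().serve_forever()
```

The C# application would then POST a body such as `{"features": [0.5, 1.5]}` to `/predict` and read the label from the JSON response. For production you would normally put this behind a proper WSGI/ASGI server rather than `http.server`.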

Running complex calculations (using python/pandas) in a Django server

I have developed a RESTful API using the Django-rest-framework in python. I developed the required models, serialised them, set up token authentication and all the other due diligence that goes along with it.
I also built a front-end using Angular, hosted on a different domain. I setup CORS modifications so I can access the API as required. Everything seems to be working fine.
Here is the problem. The web app I am building is a financial application that should allow the user to run some complex calculations on the server and send the results to the front-end app so they can be rendered into charts and other formats. I do not know how or where to put these calculations.
I chose Django for the back-end as I expected that python would help me run such calculations wherever required. Basically, when I call a particular api link on the server, I want to be able to retrieve data from my database, from multiple tables if required, and use the data to run some calculations using python or a library of python (pandas or numpy) and serve the results of the calculations as response to the API call.
If this is a daunting task, I at least want to be able to use the API to retrieve data from the tables to the front-end, process the data a little using JS, and send it to a python function located on the server with this processed data, and this function would run the necessary complex calculations and respond with results which would be rendered into charts / other formats.
Can anyone point me in a direction to move from here? I looked for resources online but I think I am unable to find the correct keywords to search for them. I just want a skeleton of code to integrate into my current backend, which I can use to call the Python scripts I write to run these calculations.
Thanks in advance.

I assume your question is about "how do I do these calculations in the restful framework for django?", but I think in this case you need to move away from that idea.
You did everything correctly but RESTful APIs serve resources -- basically your model.
A computation however is nothing like that. As I see it, you have two ways of achieving what you want:
1) Write a model that represents the results of a computation and is served using the RESTful framework, thus your computation being a resource (can work nicely if you store the results in your database as a way of caching)
2) Add a route/endpoint to your api, that is meant to serve results of that computation.
Path 1: Computation as Resource
Create a model, that handles the computation upon instantiation.
You could even set up an inheritance structure for computations and implement an interface for your computation models.
This way, when the resource is requested and the restful framework wants to serve this resource, the computational result will be served.
Path 2: Custom Endpoint
Add a route for your computation endpoints like /myapi/v1/taxes/compute.
In the underlying controller of this endpoint, you will load up the models you need for your computation, perform the computation, and serve the result however you like it (probably a json response).
You can still implement computations with the above mentioned inheritance structure. That way, you can instantiate the Computation object based on a parameter (in the above case taxes).
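The inheritance structure and the "pick a computation by parameter" idea might be sketched like this, independent of Django. `Computation`, `Taxes`, `COMPUTATIONS` and `compute` are hypothetical names, and the tax formula is a placeholder for the real pandas/numpy logic.

```python
from abc import ABC, abstractmethod


class Computation(ABC):
    """Common interface for every server-side calculation."""

    @abstractmethod
    def run(self, rows):
        """Take the loaded data and return a JSON-serialisable result."""


class Taxes(Computation):
    def __init__(self, rate=0.2):
        self.rate = rate

    def run(self, rows):
        # Placeholder logic; real code could build a pandas DataFrame here.
        total = sum(row["amount"] for row in rows)
        return {"total": total, "tax": round(total * self.rate, 2)}


# Maps the URL parameter (e.g. "taxes" in /myapi/v1/taxes/compute) to a class.
COMPUTATIONS = {"taxes": Taxes}


def compute(name, rows, **params):
    """Instantiate the computation selected by name and run it on the rows."""
    computation_class = COMPUTATIONS.get(name)
    if computation_class is None:
        raise ValueError(f"Unknown computation: {name}")
    return computation_class(**params).run(rows)


# In a Django view, `rows` would come from the ORM and the returned dict
# could be sent back as the JSON response.
result = compute("taxes", [{"amount": 100.0}, {"amount": 250.0}])
print(result)
```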
Does this give you an idea?

Python client app model communication with a JSON API

Sorry in advance for my strange English.
I have to develop a client application in Python that communicates with a PHP server that uses JSON for data exchange.
There are many Python frameworks that let you implement the MVC pattern, in particular with structured models for data handling, but all these model structures talk directly to a database in SQL.
My aim is to use a single server that serves data through a JSON API to all kinds of devices and platforms.
So, in my Python application, I would like to write a syncing model store that talks directly to my JSON server, just as an ExtJS 4 app does, using a framework or library that makes this easy to implement.
Does anybody know of any tools that permit this?

If I understood your question correctly, you're looking for a proxying solution to put between application server clients. It may not be 100% fit but I'd suggest looking at Ext.Direct remoting that's built in Ext JS; RPC should work fine if you don't have to publish and maintain your API. As for proxying, take a look at RPC::ExtDirect::Client. It's an Ext.Direct client implementation in Perl; I developed it mostly for testing purposes but it can probably be used for proxying, too.
On a side note, I'm not sure why exactly you would want to implement such an architecture at all. It sounds overcomplicated for no good purpose.

How to make a cost effective but scalable site?

We are doing a portal technology assessment: we will be creating a placement portal for campuses and industry to help place students. The portal will handle large volumes of data and many people logging in, approximately 1000 concurrent users per day.
What technology should I use? PHP with CakePHP as a framework, Ruby on Rails, ASP.NET, Python, or should I opt for cloud computing? Which of those is the most cost-effective?

Any of those will do; it really depends on what you know. If you're comfortable with Python, use Django. If you like Ruby, go with RoR. These modern frameworks are built to scale, so unless you are developing something on the scale of Facebook, they should suffice.
I personally recommend nginx as your main server to host static content and possibly reverse-proxy to Django/mod_wsgi/Apache2.
Another important aspect is caching. Make sure to use something like memcached, and make sure the framework has some sort of plugin for it or that it is easy to attach.
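The usual way to use a cache like memcached is the cache-aside pattern: look in the cache first, and only hit the database on a miss. Below is a minimal sketch with an in-process `TTLCache` standing in for memcached; `TTLCache`, `cached` and `load_placements` are hypothetical names for the illustration.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for memcached: get/set with an expiry time."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value, ttl=60):
        self._store[key] = (value, time.monotonic() + ttl)


def cached(cache, key, loader, ttl=60):
    """Cache-aside: return the cached value, or load, store and return it."""
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, ttl)
    return value


calls = []


def load_placements():
    # Stands in for an expensive database query.
    calls.append("db")
    return ["Acme", "Globex"]


cache = TTLCache()
first = cached(cache, "placements", load_placements)
second = cached(cache, "placements", load_placements)  # served from the cache
print(first, len(calls))
```

Because the loader is passed in as a function, swapping between a real DB call and a cached one is a one-line change at the call site.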

Language choice is important: choose the language that you and your team feel most comfortable with, since you will be developing a mid-to-large application. Use a framework, of course: with Python it should be Django; with ASP.NET, either .NET or MVC.NET, whichever you are more comfortable with; with Ruby, RoR; and with PHP there is a very large number of frameworks.
1000 concurrent users is not that many, although it depends on what the users will do. Places where users fetch large amounts of data are best cached, with any caching engine you like. Design the application so that you can easily swap between real DB calls and calls to the cache. For that, use data objects; for logins, for example, create an array of objects if you need it. When a user logs in, save some information in cookies, for example their last login, password in case they want to change it, email and so on, so that you make fewer read calls (SELECT queries) to the DB.
Use a cookieless domain for static content like images, JS and CSS files. On this domain, set up the fastest system you can on the simplest server you can, probably something Linux based.
For servers, the best advice is either to get a large machine and set up virtual machines on it with VMware or another Linux-based solution, or to get a few servers. The latter is better, because if the one big server goes down you lose everything, whereas if one of several goes down you can still do some work, especially if you set up railroad mode. Railroad mode is simple: you set up the application server (IIS or Apache) on one server and make it the master, while you set up SQL on the same server and make it the slave. On the other server you set up SQL as the master and the application server as the slave. So server one serves IIS/Apache and the other one SQL; if one goes down, you just need to change a line in the hosts file to point things somewhere else (I don't know how to do that in Linux).
Use the last server for static content.
Cloud computing you will use whether you want to or not. You will share resources with other applications, for instance the Google API for jQuery and jQuery UI, but you are creating a unique application, and I don't believe basing the core of the application on cloud computing will do any good. Do make good use of the large sites' CDNs.
