Is there an equivalent of redis command pipeline in MongoDB? - python

I am debugging a slow API endpoint in a web app which uses mongodb as storage. It turns out the request send 8 different queries to MongoDB, and group the data together to return. The MongoDB lives on another host, so the request involves 8 roundtrips.
These 8 requests don't have any dependency among themselves, so if I can send the 8 queries in a batch, or in parallel, a lot of time can be saved.
I am wondering if Mongo supports something like Redis's pipeline, or maybe send a script (like a lua script in Redis) for fetching data, so that I can get all data in one go?
If not, is there a way to send the querys in parallel? (The app is based on python/tornado/pymongo)

There are a number of options for this kind of thing in mongodb all of which can be accessed by the python driver pymonogo.
The best way (IMHO) is the aggregation framework which allows you to build a pipeline. However, some of the functionality is limited by mongodb version and whether you have sharded clusters.
Other options include map-reduce or simple operators.

AFAIK, there is no pipeline-like mechanism in MongoDB.
I would try using server-side scripts.

Related

Mongoose Schemas and inserting via python

I'm using a Nodejs server for a WebApp and Mongoose is acting as the ORM.
I've got some hooks that fire when data is inserted into a certain collection.
I want those hooks to fire when a python script inserts into the mongoDB instance. So if I have a pre save hook, it would modify the python scripts insert according to that hook.
Is this possible? If so, How do I do it?
If not, please feel free to explain to me why this is impossible and/or why I'm stupid.
EDIT: I came back to this question some months later and cringed just at how green I was when I asked it. All I really needed done was to create an API endpoint/flag on the NodeJS server that is specifically for automated tasks like the python script to send data to, and have mongoose in NodeJS land structure.
It is impossible because python and nodejs are 2 different runtimes - separate isolated processes which don't have access to each other memories.
Mongoose is a nodejs ORM - a library that maps Javascript objects to Mongodb documents and handles queries to the database.
All mongoose hooks belong to javascript space. They are executed on javascript objects before Mongoose sends any request to mongo. 2 outcomes from there: no other process can mess up with these hooks, not even another nodejs, and once the query reaches mongodb it's final, no more hooks, no more modifications.
One said a picture worth 100 words:
Neither python nor mongo are aware about mongoose hooks. All queries to mongo are initiated on the client side - a script sends a request to modify state of the database or to query state of the database.
The only way to trigger a javascript code execution from an update on mongodb side is to use change streams
Change streams are not mongoose hooks but can be used to hook into the updates on mongo side. It's a bit more advanced use of the database. It comes with additional requirements for mongo set up, size of the oplog, availability of the changestream clients, error handling etc.
You can learn more about change streams here https://docs.mongodb.com/manual/changeStreams/ I would strongly recommend to seek professional advice to architect such set up to avoid frustration and unexpected behaviour.
Mongo itself does not support hooks as a feature, mongoose gives you out of the box hooks you can use as you've mentioned. So what can you do to make it work in python?
Use an existing framework like python's eve, eve gives you database hooks, much like mongoose does. Now eve is a REST api framework which from your description doesn't sound like what you're looking for. Unfortunately I do not know of any package that's a perfect fit to your needs (if you do find one it would be great if you share a link in your question).
Build your own custom wrapper like this one. You can just built a custom wrapper class real quick and implement your own logic very easily.

Running complex calculations (using python/pandas) in a Django server

I have developed a RESTful API using the Django-rest-framework in python. I developed the required models, serialised them, set up token authentication and all the other due diligence that goes along with it.
I also built a front-end using Angular, hosted on a different domain. I setup CORS modifications so I can access the API as required. Everything seems to be working fine.
Here is the problem. The web app I am building is a financial application that should allow the user to run some complex calculations on the server and send the results to the front-end app so they can be rendered into charts and other formats. I do not know how or where to put these calculations.
I chose Django for the back-end as I expected that python would help me run such calculations wherever required. Basically, when I call a particular api link on the server, I want to be able to retrieve data from my database, from multiple tables if required, and use the data to run some calculations using python or a library of python (pandas or numpy) and serve the results of the calculations as response to the API call.
If this is a daunting task, I at least want to be able to use the API to retrieve data from the tables to the front-end, process the data a little using JS, and send it to a python function located on the server with this processed data, and this function would run the necessary complex calculations and respond with results which would be rendered into charts / other formats.
Can anyone point me to a direction to move from here? I looked for resources online but I think I am unable to find the correct keywords to search for them. I just want a shell code kind of a thing to integrate into my current backed using which I can call some python scripts that I write to run these calculations.
Thanks in advance.
I assume your question is about "how do I do these calculations in the restful framework for django?", but I think in this case you need to move away from that idea.
You did everything correctly but RESTful APIs serve resources -- basically your model.
A computation however is nothing like that. As I see it, you have two ways of achieving what you want:
1) Write a model that represents the results of a computation and is served using the RESTful framework, thus your computation being a resource (can work nicely if you store the results in your database as a way of caching)
2) Add a route/endpoint to your api, that is meant to serve results of that computation.
Path 1: Computation as Resource
Create a model, that handles the computation upon instantiation.
You could even set up an inheritance structure for computations and implement an interface for your computation models.
This way, when the resource is requested and the restful framework wants to serve this resource, the computational result will be served.
Path 2: Custom Endpoint
Add a route for your computation endpoints like /myapi/v1/taxes/compute.
In the underlying controller of this endpoint, you will load up the models you need for your computation, perform the computation, and serve the result however you like it (probably a json response).
You can still implement computations with the above mentioned inheritance structure. That way, you can instantiate the Computation object based on a parameter (in the above case taxes).
Does this give you an idea?

OLAP Server for NodeJS

I have been looking for ways to provide analytics for an app which is powered by REST server written in NodeJs and MySQL. Discovered OLAP which can actually make this much easier.
And found a python library that provides an OLAP HTTP server called 'Slicer'
http://cubes.databrewery.org/
Can someone explain how this works? Does this mean I have to update my schema. And create what is called fact tables?
Can this be used in conjunction with my NodeJS App? Any examples? Since I have only created single server apps. Would python reside on the same nodejs server. How will it start? ('forever app.js' is my default script)
If I cant use python since I have no exp, what are basics to do it in Nodejs?
My model is basically list of words, so the olap queries I have are words made in days,weeks,months of length 2,5,10 letters in languages eng,french,german etc
Ideas, hints and guidance much appreciated!
As you found out, CUbes provides an HTTPS OLAP server (the slicer tool).
Can someone explain how this works?
As an OLAP server, you can issue OLAP queries to the server. The API is REST/JSON based, so you can easily query the server from Javascript, nodejs, Python or any other language of your choice via HTTP.
The server can answer OLAP queries. OLAP queries are based on a model of "facts" and "dimensions". You can for example query "the total sales amount for a given country and product, itemized by moonth".
Does this mean I have to update my schema. And create what is called fact tables?
OLAP queries are is built around the Facts and Dimension concepts.
OLAP-oriented datawarehousing strategies often involve the creation of these Fact and Dimension tables, building what is called a Star Schema or a Snowflake Schema. These schemas offer better performance for OLAP-type queries on relational databases. Data is often loaded by what is called an ETL process (it can be a simple script) that loads data in the appropriate form.
The Python Cubes framework, however, does not force you to alter your schema or create an alternate one. It has a SQL backend which allows you to define your model (in terms of Facts and Dimensions) without the need of changing the actual database model. This is the documentation for the model definition: https://pythonhosted.org/cubes/model.html .
However, in some cases you may still prefer to define a schema for Data Mining and use a transformation process to load data periodically. It depends on your needs, the amount of data you have, performance considerations, etc...
With Cubes you can also use other non RDBMS backends (ie MongoDB), some of which offer built-in aggregation capabilities that OLAP servers like Cubes can leverage.
Can this be used in conjunction with my NodeJS App?
You can issue queries to your Cubes Slicer server from NodeJS.
Any examples?
There is a Javascript client library to query Cubes. You probably want to use this one: https://github.com/Stiivi/cubes.js/
I don't know of any examples using NodeJS. You can try to get some inspiration from the included AngularJS application in Cubes (https://github.com/Stiivi/cubes/tree/master/incubator). Another client tool is CubesViewer which may be of use to you while building your model: http://jjmontesl.github.io/cubesviewer/ .
Since I have only created single server apps. Would python reside on the same nodejs server. How will it start? ('forever app.js' is my default script)
You would run Cubes Slicer server as a web application (directly from your web server, ie. Apache). For example, with Apache, you would use apache-wsgi mod which allows to serve python applications.
Slicer can also run as a small web server in a standalone process, which is very handy during development (but I wouldn't recommend for production environments). In this case, it will be listening on a different port (typically: http://localhost:5000 ).
If I cant use python since I have no exp, what are basics to do it in Nodejs?
You don't really need to use Python at all. You can configure and use Python Cubes as OLAP server, and run queries from Javascript code (ie. directly from the browser). From the client point of view, is like a database system which you can query via HTTP and get responses in JSON format.

Web server using python

I am trying to develop a multithreaded web server, it has the following task:
Collect data from various data sources (API calls), I was planning to do this using multiple threads.
Store the collected data in a memory data structure
Do some processing on the data structure using another thread
This data structure would be queried by the multiple clients; maybe I could also make separate threads for each client request.
Now regarding language and platform, I was considering either python or JAVA. I did some research on Flask framework for python, but I do not know how it will accommodate the multithreaded nature of web server.
Please suggest how I could achieve the above functionality in my project.
Flask, with some of the available addons, is very suited for what you want to do. Keep in mind that flask is pure python, and therefore you can access any of the excellent available python libraries.
As far as I understand what you have in mind, you can:
1- define a url that, when visited, executes the data gathering from external sources by means of, e.g. python-requests (http://docs.python-requests.org/en/latest/)
2- do the same periodically by scheduling the function above
3- store the collected data in a (e.g.) Redis database (which is memory based) or one of the many available databases (all of the nosql dbs have python bindings that you can access from a flask application)
4- define urls for the visiting clients to access the latest versions of the data. You will just need to define the data extraction functions (from redis or whatever you decide to use) and design a nice template to show them.
Flask/Werkzeug will take care of the multithreading necessary to handle simultaneous requests from different clients.

Django Passing data between views

I was wondering what is the 'best' way of passing data between views. Is it better to create invisible fields and pass it using POST or should I encode it in my URLS? Or is there a better/easier way of doing this? Sorry if this question is stupid, I'm pretty new to web programming :)
Thanks
There are different ways to pass data between views. Actually this is not much different that the problem of passing data between 2 different scripts & of course some concepts of inter-process communication come in as well. Some things that come to mind are -
GET request - First request hits view1->send data to browser -> browser redirects to view2
POST request - (as you suggested) Same flow as above but is suitable when more data is involved
Django session variables - This is the simplest to implement
Client-side cookies - Can be used but there is limitations of how much data can be stored.
Shared memory at web server level- Tricky but can be done.
REST API's - If you can have a stand-alone server, then that server can REST API's to invoke views.
Message queues - Again if a stand-alone server is possible maybe even message queues would work. i.e. first view (API) takes requests and pushes it to a queue and some other process can pop messages off and hit your second view (another API). This would decouple first and second view API's and possibly manage load better.
Cache - Maybe a cache like memcached can act as mediator. But then if one is going this route, its better to use Django sessions as it hides a whole lot of implementation details but if scale is a concern, memcached or redis are good options.
Persistent storage - store data in some persistent storage mechanism like mysql. This decouples your request taking part (probably a client facing API) from processing part by having a DB in the middle.
NoSql storages - if speed of writes are in other order of hundreds of thousands per sec, then MySql performance would become bottleneck (there are ways to get around by tweaking mysql config but its not easy). Then considering NoSql DB's could be an alternative. e.g: dynamoDB, Redis, HBase etc.
Stream Processing - like Storm or AWS Kinesis could be an option if your use-case is real-time computation. In fact you could use AWS Lambda in the middle as a server-less compute module which would read off and call your second view API.
Write data into a file - then the next view can read from that file (real ugly). This probably should never ever be done but putting this point here as something that should not be done.
Cant think of any more. Will update if i get any. Hope this helps in someway.

Categories

Resources